Generating the Julia Set using C++ and OpenCL

For the sake of keeping things short as possible I will skip redundant explanations what Julia Set and fractals are in general. There is plenty documentation on internet about it and let’s face it you are probably here just for the codez.  Just to make sure we are on the same page this is the picture of Julia set and what we will generate with our implementation. Colors are completely arbitrary. 

JuliaSet

Task here is to generate 2D image of Julia set where each individual pixel in image is represented with following equation:

zk+1 = zk 2 + c

Z and c represent 2D points with (x,y) values.  Each pixel is given an initial value for z and we will refer to as z0. This initial value is determined by the pixel’s position in the range z spans.

The value of c is constant for all the pixels. The above equation is applied to each pixel and is evaluated iteratively (in a loop). In each iteration, z is updated according to the above equation (the list of values zk refers to the orbit, or dwell of z). This creates a sequence of z values for each pixel. The first 5 values in a sequence are…

z0

z1 = z0 2 + c

z2 = z1 2 + c

z3 = z2 2 + c

z4 = z3 2 + c

Once we’ve calculated a new value for z in a given iteration, we calculate the length of z. If the length is greater than some threshold (usually 2.0), we stop iterating and use the index of the current iteration (k) as an index to look-up an array of colours to determine the colour for the pixel being processed. If we never exceed the threshold and k reaches a given maximum number of iterations then we colour the pixel black.

The image plane (x, y) is actually a complex plane and z and c are complex numbers. Complex numbers are like 2D points (with x and y values as noted above), but the rules for multiplying complex numbers is different. Multiplying “real” numbers (as you’ve done in maths before) simply scales the number along the “number line”, that is, the real axis. However, multiplying complex numbers (2D points on the plane) actually rotates the points in the plane as well as scales their position.

So this is basic intro into what code actually does. I will assume reader is familiar with setting up OpenCL, both for ATI/AMD and Nvidia graphic cards. Very briefly, for AMD cards you should download OpenCL SDK from here and for Nvidia ones CUDA toolkit form here. If we are talking about Windows and Visual Studio as C++ IDE then for Nvidia cards you should add following to “Additional include directories”:

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\include

and

C:\Program Files (x86)\AMD APP SDK\2.9\include

for AMD based cards. This is where your OpenCL header files will live after SDK inhalation. Same goes for “Library directories”

C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v5.5\lib\Win32

and

C:\Program Files (x86)\AMD APP SDK\2.9\lib\x86

Of course paths can differ if your version and installation path is different then mine.

Last “setup” thing to do is to link against “OpenCL.lib”, actual OpenCL library. This is the point where you are good to go and write, compile and execute your OpenCL kernels.

In my particular example I am using “Windows Imaging Component” to manage bitmaps (load, save to file , etc.).  Basically entire application, if I may call it that, is split into three files, main.cpp where application logic is, julia_kernels.cl – OpenCL implementation for generating Julia set images and imageio.cpp which is basically wrapper around aforementioned  “WIC”.

Application allows to execute code both on CPU and GPU to be able to measure speed gains when using OpenCL over standard linear CPU implementation.

main.cpp

#include 
#include 
#include 
#include 
#include "imageio.h"
#include "setup_cl.h"
#include 

#include 
#include 

using namespace std;


//
// support structs (reflect vector types in OpenCL)
//

struct float2 {

	float x, y;

	float2(const float _x, const float _y) : x(_x), y(_y) {}
};

struct float3 {

	float x, y, z;

	float3(const float _x, const float _y, const float _z) : x(_x), y(_y), z(_z) {}
};

struct float4 {

	float x, y, z, w;

	float4(const float _x, const float _y, const float _z, const float _w) : x(_x), y(_y), z(_z), w(_w) {}
};


struct pixel_color {
	float R;
	float G;
	float B;
	float A;
};

pixel_color colors[16];

void InitPixelColors()
{
	pixel_color black = { 0.0f, 0.0f, 0.0f, 255.0f };
	pixel_color blue = { 9.0f, 26.0f, 236.0f, 255.0f };
	pixel_color brown = { 163.0f, 107.0f, 7.0f, 255.0f };
	pixel_color green = { 8.0f, 240.0f, 6.0f, 255.0f };
	pixel_color magenta = { 250.0f, 32.0f, 130.0f, 255.0f };
	pixel_color orange = { 250.0f, 107.0f, 6.0f, 255.0f };
	pixel_color red = { 250.0f, 0.0f, 0.0f, 255.0f };
	pixel_color darkGrey = { 50.0f, 50.0f, 50.0f, 255.0f };
	pixel_color lightBlue = { 50.0f, 170.0f, 200.0f, 255.0f };
	pixel_color lightGreen = { 50.0f, 240.0f, 175.0f, 255.0f };
	pixel_color lightYellow = { 245.0f, 250.0f, 140.0f, 255.0f };
	pixel_color violetRed = { 245.0f, 150.0f, 180.0f, 255.0f };
	pixel_color ivory = { 255.0f, 250.0f, 240.0f, 255.0f };
	pixel_color yellow = { 235.0f, 255.0f, 15.0f, 255.0f };
	pixel_color cyan = { 0.0f, 240.0f, 240.0f, 255.0f };
	pixel_color lime = { 191.0f, 255.0f, 0.0f, 255.0f };

	colors[0] = black;
	colors[1] = blue;
	colors[2] = brown;
	colors[3] = green;
	colors[4] = magenta;
	colors[5] = orange;
	colors[6] = red;
	colors[7] = darkGrey;
	colors[8] = lightBlue;
	colors[9] = lightGreen;
	colors[10] = lightYellow;
	colors[11] = violetRed;
	colors[12] = ivory;
	colors[13] = yellow;
	colors[14] = cyan;
	colors[15] = lime;
}

pixel_color getPixelColor(int val){

	return colors[val];
}

void generate_julia_set_image(
	cl_context context, 
	cl_device_id device, 
	cl_command_queue commandQueue, 
	cl_program program, 
	float c_component_real_part, 
	float c_component_imaginary_part,
	int imageWidth,
	int imageHeight,
	int numIterations
	)
{
	cl_int err;
	cl_kernel kernel = 0;
	cl_mem outputImage = 0;
	BGRA8 *result;
	pixel_color pixelColor;
	pixelColor.R = 1.0;
	pixelColor.G = 1.0;
	pixelColor.B = 0.0;
	pixelColor.A = 1.0;
	cl_mem pixelColorBuffer;

	float2 C_component(c_component_real_part, c_component_imaginary_part);
	cl_mem c_componentBuffer;

	try
	{
		kernel = clCreateKernel(program, "generate_image", 0);
		if (!kernel)
			throw(string("could not create kernel"));

		// setup output image
		cl_image_format outputFormat;

		outputFormat.image_channel_order = CL_BGRA; // store a BGRA image
		outputFormat.image_channel_data_type = CL_UNORM_INT8;// Each component is 8 bits in the range [0, 1]

		outputImage = clCreateImage2D(context, CL_MEM_WRITE_ONLY, &outputFormat, imageWidth, imageHeight, 0, 0, &err);

		pixelColorBuffer = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(pixel_color), &pixelColor, 0);
		c_componentBuffer = clCreateBuffer(context, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR, sizeof(float2), &C_component, 0);

		if (!pixelColorBuffer){
			throw(string("pixel buffer failed"));
		}

		clSetKernelArg(kernel, 0, sizeof(cl_mem), &pixelColorBuffer);
		clSetKernelArg(kernel, 1, sizeof(cl_mem), &outputImage);
		clSetKernelArg(kernel, 2, sizeof(cl_mem), &c_componentBuffer);
		clSetKernelArg(kernel, 3, sizeof(int), &numIterations);


		// setup worksize arrays
		size_t globalWorkSize[2] = { imageWidth, imageHeight };

		// setup event (for profiling)
		cl_event doneEvent;

		// enqueue kernel
		err = clEnqueueNDRangeKernel(commandQueue, kernel, 2, 0, globalWorkSize, 0, 0, 0, &doneEvent);

		// block until kernel finishes and report time taken to run the kernel
		clWaitForEvents(1, &doneEvent);
		cl_ulong startTime = (cl_ulong)0;
		cl_ulong endTime = (cl_ulong)0;

		clGetEventProfilingInfo(doneEvent, CL_PROFILING_COMMAND_START, sizeof(cl_ulong), &startTime, 0);
		clGetEventProfilingInfo(doneEvent, CL_PROFILING_COMMAND_END, sizeof(cl_ulong), &endTime, 0);

		double tdelta = (double)(endTime - startTime);

		cout << "time taken (in seconds) to create the image = " << (tdelta * 1.0e-9) << endl;

		// extract the resulting image from OpenCL
		result = (BGRA8*)malloc(imageWidth * imageHeight * sizeof(BGRA8));
		if (!result)
			throw("cannot create output image buffer to save to disk");

		size_t origin[3] = { 0, 0, 0 };
		size_t region[3] = { imageWidth, imageHeight, 1 };

		err = clEnqueueReadImage(commandQueue, outputImage, CL_TRUE, origin, region, 0, 0, result, 0, 0, 0);

		wstring filename(L"gpu_" + to_wstring(imageWidth) + L"x" + to_wstring(imageHeight) + L"_" + to_wstring(255) + L".bmp");
		saveImageBGRA8(imageWidth, imageHeight, result, filename);
	}
	catch (std::string err)
	{
		cout << "Error has occured: " << err << std::endl;
	}
	
	/* Deallocate resources */
	clReleaseMemObject(pixelColorBuffer);
	clReleaseMemObject(outputImage);
	clReleaseMemObject(c_componentBuffer);
	clReleaseKernel(kernel);
	
}



double GetComplexLenght(complex input)
{
	return std::sqrt(std::pow((std::real(input)), 2) + std::pow((std::imag(input)), 2));
}

void RunOnCPU(int imageWidth, int imageHeight, int numIterations, double c_component_real_part, double c_component_imaginary_part)
{
	std::chrono::time_point start, end;
	start = std::chrono::system_clock::now();

	cout << "\n\nRunnig  on CPU\n" << endl;

	BGRA8* imageBuffer = (BGRA8*)malloc(imageWidth * imageHeight * sizeof(BGRA8));

	complex Z(0.0, 0.0);
	complex C(c_component_real_part, c_component_imaginary_part);


	for (int i = 0; i < imageHeight; i++)
	{
		for (int j = 0; j < imageWidth; j++)
		{
			Z = complex((-imageWidth / 2 + j) / (imageWidth / 4.0), (-imageHeight / 2 + i) / (imageHeight / 4.0));
			int iteration = 0;

			for (int k = 0; k < numIterations; k++)
			{
				Z = std::pow(Z, 2) + C;
				iteration++;
				if (GetComplexLenght(Z) > 2)
				{
					break;
				}
			}

			int pixelIndex = j + (i * imageWidth);
			if (iteration == numIterations)
			{
				imageBuffer[pixelIndex].r = 0;
				imageBuffer[pixelIndex].g = 0;
				imageBuffer[pixelIndex].b = 0;
				imageBuffer[pixelIndex].a = 255;
			}
			else
			{
				pixel_color pc = getPixelColor(iteration % 16);
				imageBuffer[pixelIndex].r = (BYTE)pc.R;
				imageBuffer[pixelIndex].g = (BYTE)pc.G;
				imageBuffer[pixelIndex].b = (BYTE)pc.B;
				imageBuffer[pixelIndex].a = (BYTE)pc.A;
			}
		}
	}

	end = std::chrono::system_clock::now();

	std::chrono::duration elapsed_seconds = end - start;
	std::time_t end_time = std::chrono::system_clock::to_time_t(end);

	std::cout << "\nFinished computation on CPU at " << std::ctime(&end_time)
		<< "elapsed time: " << elapsed_seconds.count() << "s\n\n\n";

	wstring filename(L"cpu_" + to_wstring(imageWidth) + L"x" + to_wstring(imageHeight) + L"_" + to_wstring(numIterations) + L".bmp");
	saveImageBGRA8(imageWidth, imageHeight, imageBuffer, filename);
	free(imageBuffer);
}

int RunOnGPU(int imageWidth, int imageHeight, int numIterations, double c_component_real_part, double c_component_imaginary_part)
{
	cout << "\n\nRunnig  on GPU\n" << endl;
	// 1. Create and validate OpenCL context
	cl_context context = createContext();

	if (!context)
	{
		cout << "cl context not created\n";
		shutdownCOM();
		return 1;
	}

	// 2. Get a list of devices associated with the context
	size_t deviceBufferSize;
	cl_int errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, 0, 0, &deviceBufferSize);
	cl_device_id* contextDevices = (cl_device_id*)malloc(deviceBufferSize);
	errNum = clGetContextInfo(context, CL_CONTEXT_DEVICES, deviceBufferSize, contextDevices, 0);

	// use the first device in the list - should be GPU since we requested this during context setup
	cl_device_id device = contextDevices[0];

	// 3. Create and vailedate program object based on julia_kernels.cl
	cl_program program = createProgram(context, device, "julia_kernels.cl");

	if (!program)
	{
		cout << "Could not create program object";
		shutdownCOM();
		return 1;
	}

	// 4. create and validate the command queue (for first device in context)
	// Note: add profiling flag so we can get timing data from events
	cl_command_queue commandQueue = clCreateCommandQueue(context, device, CL_QUEUE_PROFILING_ENABLE, 0);
	if (!commandQueue) {

		cout << "command queue not created\n";
		shutdownCOM();
		return 1;
	}

	// call function to carry out task
	generate_julia_set_image(context, device, commandQueue, program, c_component_real_part, c_component_imaginary_part, imageWidth, imageHeight, numIterations);

	/* Deallocate resources */
	clReleaseCommandQueue(commandQueue);
	clReleaseProgram(program);
	clReleaseContext(context);
}

void Test_800x600_255(double c_component_real_part, double c_component_imaginary_part)
{
	int imageWidth = 800;
	int imageHeight = 600;
	int numIterations = 255; 
	RunOnCPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
	RunOnGPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
}

void Test_1024x768_255(double c_component_real_part, double c_component_imaginary_part)
{
	int imageWidth = 1024;
	int imageHeight = 768;
	int numIterations = 255;
	RunOnCPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
	RunOnGPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
}

void Test_1680x1050_255(double c_component_real_part, double c_component_imaginary_part)
{
	int imageWidth = 1680;
	int imageHeight = 1050;
	int numIterations = 255;
	RunOnCPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
	RunOnGPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
}

void Test_6400x3200_255(double c_component_real_part, double c_component_imaginary_part)
{
	int imageWidth = 6400;
	int imageHeight = 3200;
	int numIterations = 255;
	RunOnCPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
	RunOnGPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
}

void TestScalability()
{
	int imageWidth = 6400;
	int imageHeight = 3200;
	double c_component_real_part = -0.805;
	double c_component_imaginary_part = 0.156;
	int numIterations = 255;

	while (RunOnGPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part) == 0)
	{
		imageWidth += 1000;
		imageHeight += 1000;
	}
}

void RunAutomaticTests()
{
	cout << "Runnig automatic tests!" << endl;
	double c_component_real_part = -0.805;
	double c_component_imaginary_part = 0.156;

	// Run speed comapre tests
	Test_800x600_255(c_component_real_part, c_component_imaginary_part);
	Test_1024x768_255(c_component_real_part, c_component_imaginary_part);
	Test_1680x1050_255(c_component_real_part, c_component_imaginary_part);
	Test_6400x3200_255(c_component_real_part, c_component_imaginary_part);

	// Test Scalability - How big images can we create
	TestScalability();

	cout << "Test run finished!" << endl;
}

int main(int argc, char **argv)
{
	// initialise COM so we can export image data using WIC
	initCOM();

	InitPixelColors();

	// clear screen
	system("cls");
	cout << "***************************************************************\n";
	cout << "*             Julia set generator                             *\n";
	cout << "***************************************************************\n\n\n";

	cout << "Do you want to run automatic tests both on GPU and CPU (y/n)? ";
	char answer;
	cin >> answer;

	if (answer == 'y' || answer == 'Y'){
		RunAutomaticTests();
	}
	else
	{
		cout << "Enter image width: ";
		int imageWidth = 800;
		cin >> imageWidth;
		cout << "\nEnter image height: ";
		int imageHeight = 600;
		cin >> imageHeight;

		float c_component_real_part;
		cout << "\nEnter real part of C component: ";
		cin >> c_component_real_part;

		float c_component_imaginary_part;
		cout << "\nEnter imaginary part of C component: ";
		cin >> c_component_imaginary_part;

		int numIterations = 255;
		cout << "\nEnter the number of iterations: ";
		cin >> numIterations;

		RunOnCPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);

		RunOnGPU(imageWidth, imageHeight, numIterations, c_component_real_part, c_component_imaginary_part);
	}
	//done
	shutdownCOM();

	return 0;
}

And this is actual OpenCL kernel implementation:

// Julia set structs and kernels
typedef struct pixel_color {
	float		R; 
	float		G; 
	float		B;
	float		A;
} pixel_color;

//2 component vector to hold the real and imaginary parts of a complex number:
typedef float2 cfloat;

#define I ((cfloat)(0.0, 1.0))
#define M_PI 3.14159265358979323846

/*
 * Return Real (Imaginary) component of complex number:
 */
inline float real(cfloat a){
     return a.x;
}
inline float imag(cfloat a){
     return a.y;
}

/*
 * Get the modulus of a complex number (its length):
 */
inline float cmod(cfloat a){
    return (sqrt(a.x*a.x + a.y*a.y));
}

// Add two complex numbers
inline cfloat cadd(cfloat a, cfloat b){
	return (cfloat)( a.x + b.x, a.y + b.y);
}

/*
 * Get the argument of a complex number (its angle):
 * http://en.wikipedia.org/wiki/Complex_number#Absolute_value_and_argument
 */
inline float carg(cfloat a){
    if(a.x > 0){
        return atan(a.y / a.x);

    }else if(a.x < 0 && a.y >= 0){
        return atan(a.y / a.x) + M_PI;

    }else if(a.x < 0 && a.y < 0){
        return atan(a.y / a.x) - M_PI;

    }else if(a.x == 0 && a.y > 0){
        return M_PI/2;

    }else if(a.x == 0 && a.y < 0){
        return -M_PI/2;

    }else{
        return 0;
    }
}


/*
 * Multiply two complex numbers:
 *
 *  a = (aReal + I*aImag)
 *  b = (bReal + I*bImag)
 *  a * b = (aReal + I*aImag) * (bReal + I*bImag)
 *        = aReal*bReal +I*aReal*bImag +I*aImag*bReal +I^2*aImag*bImag
 *        = (aReal*bReal - aImag*bImag) + I*(aReal*bImag + aImag*bReal)
 */
inline cfloat  cmult(cfloat a, cfloat b){
    return (cfloat)( a.x*b.x - a.y*b.y, a.x*b.y + a.y*b.x);
}

/*
 *  Square root of complex number.
 *  Although a complex number has two square roots, numerically we will
 *  only determine one of them -the principal square root, see wikipedia
 *  for more info: 
 *  http://en.wikipedia.org/wiki/Square_root#Principal_square_root_of_a_complex_number
 */
 inline cfloat csqrt(cfloat a){
     return (cfloat)( sqrt(cmod(a)) * cos(carg(a)/2),  sqrt(cmod(a)) * sin(carg(a)/2));
 }

 inline pixel_color getPixelColor(int val){
	pixel_color colors[16];
	pixel_color black = { 0.0f, 0.0f, 0.0f, 1.0f };
	pixel_color blue = { 9.0f, 26.0f, 236.0f, 1.0f };
	pixel_color brown = { 163.0f, 107.0f, 7.0f, 1.0f };
	pixel_color green = { 8.0f, 240.0f, 6.0f, 1.0f };
	pixel_color magenta = { 250.0f, 32.0f, 130.0f, 1.0f };
	pixel_color orange = { 250.0f, 107.0f, 6.0f, 1.0f };
	pixel_color red = { 250.0f, 0.0f, 0.0f, 1.0f };
	pixel_color darkGrey = { 50.0f, 50.0f, 50.0f, 1.0f };
	pixel_color lightBlue = { 50.0f, 170.0f, 200.0f, 1.0f };
	pixel_color lightGreen = { 50.0f, 240.0f, 175.0f, 1.0f };
	pixel_color lightYellow = { 245.0f, 250.0f, 140.0f, 1.0f };
	pixel_color violetRed = { 245.0f, 150.0f, 180.0f, 1.0f };
	pixel_color ivory = { 255.0f, 250.0f, 240.0f, 1.0f };
	pixel_color yellow = { 235.0f, 255.0f, 15.0f, 1.0f };
	pixel_color cyan = { 0.0f, 240.0f, 240.0f, 1.0f };
	pixel_color lime = { 191.0f, 255.0f, 0.0f, 1.0f };
	
	colors[0] = black;
	colors[1] = blue;
	colors[2] = brown;
	colors[3] = green;
	colors[4] = magenta;
	colors[5] = orange;
	colors[6] = red;
	colors[7] = darkGrey;
	colors[8] = lightBlue;
	colors[9] = lightGreen;
	colors[10] = lightYellow;
	colors[11] = violetRed;
	colors[12] = ivory;
	colors[13] = yellow;
	colors[14] = cyan;
	colors[15] = lime;

	pixel_color normalizedColorVal = { colors[val].R / 255.0f, colors[val].G / 255.0f, colors[val].B / 255.0f, 1.0f };

	return normalizedColorVal;
 }

__kernel void generate_image(const global pixel_color *pixelColor, write_only image2d_t outputImage , const global float2 *c_component,  int numIterations)
{
	// get id of element in array
	int x = get_global_id(0);
	int y = get_global_id(1);
	int w = get_global_size(0);
	int h = get_global_size(1);

	pixel_color pc = *pixelColor;
	//pc.R = c_component[0].x;
	//pc.G = c_component[0].y;

	cfloat Z = { ( -w / 2 + x) / (w/4.0f) , ( -h / 2 + y) / (h/4.0f) };
	cfloat C = { c_component[0].x, c_component[0].y };
	int iteration = 0;

	while (iteration < numIterations)
	{
		 cfloat Zpow2 = cmult(Z, Z); 
		 cfloat Zn = cadd(Zpow2, C);
		 Z.x = Zn.x;
		 Z.y = Zn.y;
		 iteration++;
		 if(cmod(Z) > 2)
		 {
			break;
		 }
	}

	// threshold reached mark pixel as black
	if (iteration == numIterations)
	{
		pc.R = 0.0f;
		pc.G = 0.0f;
		pc.B = 0.0f;
	}
	else
	{
		pc = getPixelColor(iteration % 16);
	}

	// RGBA
	float4 color = (float4)(pc.R, pc.G, pc.B, pc.A);

	write_imagef(outputImage, (int2)(x, y), color);
}

Here is the example application running.

juliaset generator

I don't want to get into details about how each line of code works. I’ve tried to comment on important parts of code. If you have any questions about some parts of the code hit me in comments and I’ll be glad to clear things up.

Here is the entire solution available for download.

JuliaSet.zip (19.74 kb)

About me

Bizic Bojan is Co-Founder of Amida IT-Services GmbH and Software Architect with focus on .NET, C++, Python and Cloud Native solutions. 

 

Disclaimer:

The opinions expressed herein are my own personal opinions and do not represent my employer’s view in any way.

Creative Commons License
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported License.