### Julia programming language

Note: working with Julia commit 2013-07-26 (0.2.0-prerelease+2820)Julia is a new high-level, high-performance dynamic programming language for technical computing.

High-performance assures fast

*execution*, whereas dynamic languages enable fast

*development*.

Both used to be contradictionary features of programming languages, since dynamic basically means no compilation before execution, and that means: More work at run-time. However, since Just-In-Time-Compilers got better and better, now there is a way to have dynamic and high perfomance languages. Julia is one of those languages. They use the LLVM as JIT-compiler - which I think is a pretty neat idea.

Installation is moderately easy as is described here . There are some precompiled packages here, but since Julia is pretty new language, it is best to do a git clone . For that, you might want to update your GCC, since I was not able to compile it with 4.1.2 which comes standard with OSX.

### Julia sets

### So somehow I could not resist in writing a program calculating a Julia set in Julia. Since there was already the mandelbrot example here, it was easy to change it to calculate Julia sets:

**julia.jl**

# the julia iteration function function julia(z, maxiter::Int64) c=-0.8+0.156im for n = 1:maxiter if abs(z) > 2 return n-1 end z = z^2 + c end return maxiter end

Since Julia has built-in support for complex numbers, we can now calculate, for example julia(0.5+0.2im, 256) .

### Sequential Calculation

To calculate a whole picture, we need some for loops like this:**calcJulia.jl**

# load image support include("myimage.jl") # load our julia function include("julia.jl") # create a 1000x500 Array for our picture h = 500 w = 1000 m = Array(Int64, h, w) # time measurements print("starting...\n") tStart=time() # for every pixel for y=1:h, x=1:w # translate numbers [1:w, 1:h] -> -2:2 + -1:1 im c = complex((x-w/2)/(h/2), (y-h/2)/(h/2)) # call our julia function m[y,x] = julia(c, 256) end tStop = time() # write the ppm-file myppmwrite(m, "julia.ppm") print("done. took ", tStop-tStart, " seconds\n");

Unfortuately, there is no image.jl anymore shipped with julia anymore, so here is a very easy version of it:

**myimage.jl**

function myppmwrite(img, file::String) s = open(file, "w") write(s, "P6\n") n, m = size(img) write(s, "$m $n 255\n") for i=1:n, j=1:m p = img[i,j] write(s, uint8(p)) write(s, uint8(p)) write(s, uint8(p)) end close(s) end

Together with the julia.jl file, you can execute above script with

julia calcJulia.jl

There is an even easier way to calculate the array m:

m = [ julia(complex(r,i)) for i=-1.0:.01:1.0, r=-2.0:.01:0.5 ];

(taken from extras/image.jl). However, above approach yields more insight on the parallel algorithm.### Parallel Calculation

Parallel programming is not as easy as in, for example OpenMP. We are creating a distributed array here (darray) and initialize it with the function parJuliaInit, which has to calculate its local part of the array. Because every processor needs to know the init function parJuliaInit and julia, we need to use the**@everywhere**command for the load and the function definition (@everywhere is not explained in the docs yet):

**parJulia.jl**

include("myimage.jl") # we need julia.jl everywhere @everywhere include("julia.jl") @everywhere w=1000 @everywhere h=500 # the parallel init function, needed everywhere @everywhere function parJuliaInit(I) # create our local patch d=(size(I[1], 1), size(I[2], 1)) m = Array(Int, d) # we need to shift our image xmin=I[2][1] ymin=I[1][1] for x=I[2], y=I[1] c = complex((x-w/2)/(h/2), (y-h/2)/(h/2)) m[y-ymin+1, x-xmin+1] = julia(c, 256) end return m end print("starting...\n") tStart = time() # create a distributed array, initialize with julia values Dm = DArray(parJuliaInit, (h, w)) tStop = time() # convert into a local array m = convert(Array, Dm) # write the picture myppmwrite(m, "julia.ppm") # print out time print("done. took ", tStop-tStart, " seconds\n");

### Results

You can run that code with

julia -p 4 parJulia.jl

Where you replace 4 with your number of processors.

Here's a comparison of different processor sizes and algorithms for different picture sizes calculated on a MacPro with 2x 2.26 Ghz Quad-Core Xeon processors (= 8 processors in total):

code | #processors | time 500x1000 | time 2000x4000 |
---|---|---|---|

calcJulia.jl | 1 | 2.20 s | 34.50 s |

parJulia.jl | 1 | 3.16 s | 46.26 s |

parJulia.jl | 2 | 1.83 s | 23.64 s |

parJulia.jl | 4 | 0.92 s | 7.28 s |

parJulia.jl | 5 | 0.83 s | 5.27 s |

parJulia.jl | 8 | 0.85 s | 2.97 s |

parJulia.jl | 10 | 0.81 s | 2.50 s |

The table shows the difficulties in parallel computing: Even if the algorithm scales perfectly in theory, the execution on N processors is not always N times faster. This is because of not all parts of your implementation and your

Note: Syntax highlighting is done with Syntax Highlighter and this little beta JavaScript configuration file for the Julia language.

Syntax Highlighting should now work.

Fixed sequential calculation.

Fixed parallel calculation. load() is not @everywhere by default.

Now working with Julia commit 2013-07-26, which changed DArray syntax and require.

*hardware*might scale perfectly, especially not for all problem sizes (very large or very small). The drastically difference of access rates for the L1/L2/RAM memory can even lead to results like the parallel calculation of the 2000x4000 image, which is 15 times faster on 8 processors than on one.### Conclusion

Julia is a nice language with a MATLAB-style syntax which could have a big influence on scientific computing. Many applications are operating on the memory bandwith limit, or the communication bandwith limit, so with a JIT compiler, those would be just fine.Note: Syntax highlighting is done with Syntax Highlighter and this little beta JavaScript configuration file for the Julia language.

**Update 1:**Syntax Highlighting should now work.

**Update 2:**Fixed sequential calculation.

**Update 3:**Fixed parallel calculation. load() is not @everywhere by default.

**Update 4:**Now working with Julia commit 2013-07-26, which changed DArray syntax and require.

This example only creates one dArray and only produces 1/4 of the Julia set image. How do I set up four darrays to produce the complete image?

ReplyDeleteHi Jim,

ReplyDeletesorry, I just checked that with the newest mac binaries, it does not work. Problem is that it's not loading the parJuliaInit on all cores. It's the same problem they describe here http://docs.julialang.org/en/latest/manual/parallel-computing/ , and they say the solution is to put it in an external file an use load("thefile.jl"). Somehow there seems to be a big in this routine recently.a

As I said in the article, Julia is pretty new and continously under development, best would be to do a git clone, they're not updating the mac binary very often.

- Martin

i mean bug.

Deletehi Jim, I fixed the parallel code (and the serial one o_O). hope it works now for you!

ReplyDeleteI just ran your latest implementation. Works great. This example gives more insight on how to apply DArrays.

ReplyDeleteThanks