Parallel Programming Language Development

Forum for anything not suitable for the other forums.

Parallel Programming Language Development

Postby Aaahm1975 » Tue Jun 02, 2015 12:36 pm

Hi folks

I have been fascinated by the parallella board since I discovered its existence a couple of days ago. I watched the presentations from Japan this weekend and saw a talk that Andreas gave at the Erlang conference.

Andreas, you are a visionary, a pioneer who can clearly see how computing should develop. I was particularly interested in your plans for future Epiphany-V and onwards improvements. I liked the fact that you are honest about your perceived failures, at times when things did not go to plan. I am glad that your successes have triumphed. To the people who bought the Parallella board, and even more so those who use it and try to do things with it, you have my admiration and great respect.

I am getting back into programming 20 years after two years of computer science studies, where I learned programming in Turbo Pascal. I hated Turbo Pascal, and now that I am learning Python I see how simple languages can be.

So now onto the topic.

Andreas, you mentioned that a parallel programming book is being written this year. What are your plans with regard to this?

You also mentioned that we need a new programming language. How do you think this can be initiated? Could one not approach a person like Guido van Rossum and get Python redesigned to be such a language?

At any rate, I am supportive of your work and believe that it has a bright future. It's ludicrous to me that certain phone app startups get more funding from venture capitalists than you do, and yet what they offer is meaningless in the greater scheme of things.
Aaahm1975
 
Posts: 1
Joined: Tue Jun 02, 2015 6:04 am

Re: Parallel Programming Language Development

Postby dobkeratops » Fri Jun 05, 2015 7:01 pm

>> You also mentioned that we need a new programming language. How do you think this can be initiated? Could one not approach a person like Guido van Rossum and get Python redesigned to be such a language?

I've only just discovered parallella myself. Having encountered CELL in gamedev (and been through the pain of adapting single-threaded, large-memory code to local stores/DMA/parallelism), this is giving me a nostalgic buzz but also caution. Local stores used to be quite common in games machines.

The same observation was made then ('it needs a new language'), and the chip simply died before such a language appeared. Another similar chip, AGEIA PhysX, aimed at the same role, suffered the same fate. I seem to remember IBM having some project called 'Octopiler' ..

(I remember mentioning erlang at the time.)
It was squeezed out by multicore CPUs and GPUs getting more general. Originally in games it was intended to do a larger chunk of the rendering (vertex processing), but in practice it often just ended up making it awkward to implement what gets done with traditional multicore now. The volume of code ending up on the SPUs was quite large.

It seems potential 'languages' are out there (Erlang, Haskell), but I think it's still difficult to map algorithms to NUMA hardware.. perhaps it's for the same reason C++ still persists despite GC'd languages: memory allocation is actually part of the algorithm, and for a NUMA machine, how you distribute data-structures between processors is *also* part of the algorithm.

There's a new language, Rust, which is very similar to C++ but with some tweaks that make it slightly more suited to parallel programming (e.g. better immutability). But I still think you'd have to go further, with something that doesn't necessarily deal with addresses directly, allowing the compiler to split up jobs & data structures itself. It does have some concepts that might map onto this, though: ownership of objects moving between threads..
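For what it's worth, the 'ownership of objects moving between threads' idea already has a rough analogue in C++11/14 with move-only types; a minimal host-side sketch (the function name is mine, nothing Parallella-specific):

```cpp
#include <memory>
#include <numeric>
#include <thread>
#include <vector>

// Sum a buffer on a worker thread. The unique_ptr is move-only, so the
// buffer has exactly one owner at any moment; after the call, the caller's
// pointer is null. Rust enforces this discipline at compile time by default.
long sum_on_worker(std::unique_ptr<std::vector<int>> buf) {
    long result = 0;
    std::thread worker([b = std::move(buf), &result] {
        result = std::accumulate(b->begin(), b->end(), 0L);
    });
    worker.join();
    return result;
}
```

After `long s = sum_on_worker(std::move(buf));` the caller's `buf` is null; Rust just turns any further use of `buf` into a compile error rather than a runtime null dereference.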

Since the hell/excitement of CELL, C++ has gotten better with lambdas. We seriously needed them back then.

I'd be curious to know more: what's the thinking in this community? Are there any efforts in progress?

I personally started my own language project a while back (taking inspiration from C++ & Rust), but it wouldn't be suited to this.

I'm quite tempted to grab one of these boards, but am worried about it being a tiny 'mental niche'. But let's see what this community is like...
dobkeratops
 
Posts: 189
Joined: Fri Jun 05, 2015 6:42 pm
Location: uk

Re: Parallel Programming Language Development

Postby piotr5 » Fri Jun 05, 2015 9:59 pm

I disagree about the need for a new language. c++ hasn't been put to full use yet; especially in the area of parallel programming, some non-standard attributes might aid the compiler in transforming sequential programs into parallel programs. also c++ offers some means of data-flow tracking, you'd just need a non-standard compiler to make use of that more transparently.

the only area of c++ where I truly am disappointed is that I frequently have difficulties with code-refactoring. suppose you have a loop which iterates over all elements of a tree. then you need to write an iterator that does exactly the same, i.e. you have c code and want to create a c++ object out of it. how do you do it? the answer is, you use the boost library: there is a possibility to interrupt the current loop and continue elsewhere, effectively creating an iterator out of the old c loop. but is that solution really equal to re-writing the thing from scratch? and there are many more situations where code-reuse is impossible because of structural differences.

what I like the most about c++ is that it doesn't need any garbage-collection. you create an object by putting a declaration into the function using it, and it unfolds into whatever additional memory is needed beyond the stack. then, when you don't need it anymore, let your function end so the now-useless data is declared garbage; at the same time your function returns, so the memory will look the same as before the call. if your function did return an object, that object too gets destroyed and re-created anew as soon as the cleaning is finished. same with passing objects as parameters: when the object has been created just for the sake of passing data, that data will be extracted into whatever new objects, and the old object is partially destroyed as soon as it isn't needed anymore -- only the data on the heap is kept till the function returns.

if you do make sure data is alive only for as long as it's needed, your heap will never become fragmented, because it's stored much like the stack structure underlying local variables. the only problem is, there is a lack of libs which implement that kind of heap-management, so calling other libs could make garbage collection necessary. but thanks to r-value passing this has now been reduced...
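a minimal sketch of that scope-based lifetime: the vector's heap block dies deterministically with its owning stack object, or is moved out on return, no collector involved:

```cpp
#include <vector>

// Scope-based lifetime: the heap block behind 'v' is owned by a stack
// object, and on return it is moved out (or the move is elided entirely);
// nothing here ever needs a garbage collector.
std::vector<int> make_squares(int n) {
    std::vector<int> v;                        // stack object owning heap memory
    for (int i = 0; i < n; ++i) v.push_back(i * i);
    return v;                                  // moved, not copied
}
```

a caller writes `auto sq = make_squares(4);` and `sq` now owns the heap block; when `sq` goes out of scope, the memory is freed at that exact point.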

as for parallel programming, once you figure out your current core's address, you could just pass a lambda function F into some class: declare it, and F will run on another core for as long as your function is active; or alternatively, after your function has finished, the end of F is awaited. and again alternatively, while waiting for F to finish, some other tasks could be started...
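on a host that shape can be sketched with std::async standing in for 'run F on another core' (the epiphany esdk would place F on a specific core instead; this only shows the overlap-then-await structure):

```cpp
#include <future>
#include <numeric>
#include <vector>

// The lambda F runs concurrently with the caller's own work; f.get()
// is the point where the end of F is awaited.
long parallel_sum(const std::vector<long>& v) {
    auto mid = v.begin() + v.size() / 2;
    std::future<long> f = std::async(std::launch::async,
        [&] { return std::accumulate(v.begin(), mid, 0L); });  // F: lower half
    long upper = std::accumulate(mid, v.end(), 0L);            // overlapped work
    return upper + f.get();                                    // await F here
}
```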

as for splitting objects, it's true that in c++ you cannot describe objects in a natural language; you must prepare for the various eventualities manually. if some task requires one part of the data and another thread requires another part, put each part on the heap, and move it around among the cores when it's needed. the only thing c++ is lacking is a command for pre-loading data to where it's needed, i.e. if some future function needs some data, the compiler needs to make the program initiate the dma-transfer earlier, or already initiate the correct placement-new during creation. so you'd need an attribute for performing a command at some earlier point in time: in your program you define some dependency, and at some other point of the program this dependency will trigger some function, either when asked for it or after a fixed number of assembler-cycles. not really sure how this could be solved in standard c++. but then, I don't know of any other programming language allowing such back-in-time programming...
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: Parallel Programming Language Development

Postby dobkeratops » Fri Jun 05, 2015 11:38 pm

I guess lambdas help a *lot*.. the problem we had before with CELL was that your algorithm got conflated with details of transfers and other optimisations; the implementation was severely butchered.

I guess the Parallella's individual cores may not be so much of a nightmare, since they can directly read each other's memory.

so what you're saying is in line with the idea that 'how you map onto memory is part of the algorithm' (what you're trying to describe with your program). however, it would be nice to express the program's algorithms in a purer form, and put those mappings in as hints or another layer of information. then you could compile the same algorithm for different architectures more easily.
Last edited by dobkeratops on Sat Jun 06, 2015 8:47 am, edited 1 time in total.
dobkeratops
 
Posts: 189
Joined: Fri Jun 05, 2015 6:42 pm
Location: uk

Re: Parallel Programming Language Development

Postby piotr5 » Sat Jun 06, 2015 5:48 am

unfortunately, with some experience you will learn that a pure form of an algorithm never exists. suppose you have a 128-bit register and you can address it in 8-bit chunks. it's logical that you somehow could implement parallel algorithms for 8-bit values on that. try it and you'll notice that the pure form is ambiguous.

for example, simd in general and this kind of programming specifically doesn't have actual branching for individual vector-entries. but you still could implement branching by adding some nop: while all other values of the vector are multiplied with a meaningful value, the one where you wish to skip the multiplication is set to 1. same with adding numbers and the number 0. thereby every if-statement translates into a function which results in 1 on some vectors and in 0 on others. of course you'd like to write "if" in such cases for readability. so again we have the problem that the compiler has no precognition: for vectors that later will be added, some values are being transformed into 0, and for vectors that will be multiplied, something is transformed into 1. how could the compiler know up front which to use? well, it would be possible if you wrote the actual branch to be conditionally executed before the actual condition, like in perl -- unfortunately in perl some other language components are missing to alter the meaning of the if-statement.

many mathematical algorithms have an additional ambiguity: they could be implemented in different ways, and which way to use mainly depends on the context you're addressing. so basically, if the compiler had to choose from context which method to use, then the algorithm would need to be stated in all possible versions at once.
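a scalar sketch of that branchless 'if': instead of `if (a[i] > 0) a[i] *= 3;`, every lane multiplies, and lanes where the branch is not taken multiply by the neutral element 1 (for an addition it would be 0):

```cpp
#include <array>
#include <cstdint>

// Per-lane branching in SIMD style: no lane actually branches; the lanes
// that would skip the multiply just multiply by 1, turning it into a nop.
std::array<int32_t, 4> times3_if_positive(std::array<int32_t, 4> a) {
    for (int i = 0; i < 4; ++i) {
        int32_t cond = (a[i] > 0) ? 1 : 0;       // 1 where the 'if' is taken
        int32_t factor = cond * 3 + (1 - cond);  // 3 where taken, 1 elsewhere
        a[i] *= factor;
    }
    return a;
}
```

a real simd version would compute `cond` with a vector compare producing a mask, but the neutral-element trick is the same.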

so, yes, in c++ it's possible to hide the gory details of how memory is getting used, put them into another file or a lib. and no, when working around limitations of some hardware-setup, those gory details become structurally part of your program and cannot be hidden.

c++ was designed for expressing mathematical algorithms in pure form: you are free to program operators to work the way you want, so that formulas appear in a readable way and thereby documentation isn't needed. but c++ still lacks the ability to express in sourcecode why the algorithm actually works. it's possible to make the compiler create the coefficients of some newton-method based on an algorithm for how to calculate them. maybe you can tell the compiler to actually apply newton to some mathematical formula. but how do you express the mathematical knowledge behind the newton algorithm? what is the pure form of newton's method? how do you document why that approximation works? newton is about intersecting a tangent with the x-axis to find the zeroes. so you need to transform the formula in a way that it's zero at the value you wish to calculate, i.e. invert your function. then there's a method for calculating how many iterations you'd need and how much precision is required for all these calculations. also it might be useful to calculate 2 iterations in a single step if this would reduce the actual work (with the help of multiply-add). expressing all this is possible in c++ (although nobody has ever done it).

what doesn't work well is to make the compiler re-arrange the calculations in a way that the result is less precise but faster. this way the algorithm would stop being deterministic. every time you switch around brackets in your float calculations, you also change the result, operations on float are not associative...
piotr5
 
Posts: 230
Joined: Sun Dec 23, 2012 2:48 pm

Re: Parallel Programming Language Development

Postby dobkeratops » Sat Jun 06, 2015 2:43 pm

>> this way the algorithm would stop being deterministic. every time you switch around brackets in your float calculations, you also change the result, operations on float are not associative...

I imagine only transformations that are deterministic would be allowed; re-ordering how the algorithm maps onto registers, time and space should IMO be possible separately from specifying the actual calculations, and even the indexing, with the functional mentality of map/reduce/filter. (I'm not a LISP or Haskell fanatic, but I definitely see the reasoning behind them after having done all the manual pipelining for the 360..)

I draw the analogy of register allocation:

we used to code in asm, specifying exactly how the algorithm mapped onto registers & stack;
eventually, the compilers got good enough that we trusted them to do that for us, and started using C/C++; however we still sometimes examine or profile the output, staying aware of what's going on, checking for register spill and moving things around for different cache behaviour. and whilst the compilers were improving, we had the 'register' keyword (or even pragmas for specific register allocation).
So the next step in this trend would be to express things in a more functional way, but then examine how it compiles to C, profile it, and tweak the mapping with hints perhaps..
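For instance, the 'more functional way' step could be as mundane as stating a dot product as a reduce over standard algorithms instead of an explicit indexed loop, leaving the schedule (unrolling, vectorisation, register use) entirely to the compiler until the profile says otherwise:

```cpp
#include <numeric>
#include <vector>

// Dot product stated as a reduce over paired lanes rather than an indexed
// loop: the calculation is specified, the mapping onto registers and time
// is left to the compiler, to be inspected or tweaked only afterwards.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    return std::inner_product(a.begin(), a.end(), b.begin(), 0.0);
}
```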
dobkeratops
 
Posts: 189
Joined: Fri Jun 05, 2015 6:42 pm
Location: uk

