Great, thanks for trying it, reporting bugs, and especially for posting some code.
The code is very much beta, and I will look into and fix the host-to-device communication issue you mentioned for p2p and bcast. Please let me know of any other issues you find.
Yes, getting the host id from a device is a good point. I suppose the solution would be for me to add num_device_cores and num_host_cores calls (which should be quite trivial) and then a mapping function which translates the relative host id to the absolute core id (the host core ids always follow the device ids).
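As a rough sketch of what such a mapping might look like, here it is in plain Python rather than ePython's actual API. The function name host_to_core_id is hypothetical, and the two core counts are passed in as ordinary integers instead of coming from the proposed num_device_cores and num_host_cores calls. The only assumption taken from the post is that host core ids come directly after the device core ids.

```python
def host_to_core_id(relative_host_id, num_device_cores, num_host_cores):
    """Translate a relative host id (0-based) into an absolute core id.

    Hypothetical helper: assumes absolute ids 0..num_device_cores-1 are
    device cores and the host cores follow immediately after them.
    """
    if not 0 <= relative_host_id < num_host_cores:
        raise ValueError("relative host id out of range")
    return num_device_cores + relative_host_id


# Example: 16 device cores and 4 host cores (illustrative numbers only).
first_host = host_to_core_id(0, 16, 4)
last_host = host_to_core_id(3, 16, 4)
```

With those illustrative counts, the first host core would map to absolute id 16 and the last to 19.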
Due to the limited memory per core on the Epiphany, ePython does the lexing and parsing on the host (using Flex and Bison) to "compile" the program into a byte code representation (which can be written out via command line args). This is designed to have as small a memory footprint as possible and is transferred onto the device via memory copy. The device cores (and host threads, if selected) run an interpreter which then executes the byte code itself.
An additional thread on the host acts as a "monitor". Device cores communicate with it via memory copy to do I/O, maths functions such as cos and tan (as we don't want to put a maths library onto the cores), string handling, etc.
By default it tries to put the variable values in core memory too, but for big arrays this is not always possible. There is some logic to switch the variables and/or byte code into shared memory if they don't fit into the device memory (this can also be done explicitly via command line switches), but this obviously comes at a performance penalty. The penalty is quite noticeable actually, and ePython provides a timing command line option where you can see the impact of this.
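To make the host/device split described above a bit more concrete, here is a toy sketch in plain Python. It is not ePython's real byte code format or interpreter. The opcodes, the postfix input, and the function names compile_postfix, monitor and interpret are all made up for illustration. It only shows the shape of the idea: the host turns source into compact byte code, a small interpreter executes it, and maths calls are handed off to a monitor instead of being carried on the core.

```python
import math

# Hypothetical opcodes, invented for this sketch only.
PUSH, ADD, MUL, CALL_MONITOR = range(4)


def compile_postfix(tokens):
    """'Host side': turn a postfix token list into a flat byte code list."""
    code = []
    for tok in tokens:
        if isinstance(tok, (int, float)):
            code += [PUSH, tok]
        elif tok == "+":
            code.append(ADD)
        elif tok == "*":
            code.append(MUL)
        elif tok in ("cos", "tan"):
            code += [CALL_MONITOR, tok]
        else:
            raise ValueError("unknown token: %r" % (tok,))
    return code


def monitor(request, arg):
    """Stand-in for the host monitor: services maths the core does not carry."""
    return {"cos": math.cos, "tan": math.tan}[request](arg)


def interpret(code, monitor_fn):
    """'Device side': a minimal stack interpreter for the byte code above."""
    stack = []
    pc = 0
    while pc < len(code):
        op = code[pc]
        if op == PUSH:
            stack.append(code[pc + 1])
            pc += 2
        elif op == ADD:
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
            pc += 1
        elif op == MUL:
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
            pc += 1
        elif op == CALL_MONITOR:
            # Delegate to the monitor rather than computing locally.
            stack.append(monitor_fn(code[pc + 1], stack.pop()))
            pc += 2
        else:
            raise ValueError("bad opcode: %r" % (op,))
    return stack.pop()


# cos(0) * 2 + 1, written in postfix form.
byte_code = compile_postfix([0, "cos", 2, "*", 1, "+"])
result = interpret(byte_code, monitor)
```

In the real system the byte code would be copied into core (or shared) memory and the monitor call would go through a memory copy to the host thread; here both are ordinary function calls.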
In terms of performance, it is currently quite slow, but the whole idea with it is education: to get people writing parallel code and running it on this architecture really quickly. I could imagine that, based on this, they would then explore some other tools and build on the initial knowledge gained. I think there are plenty of places where the code could be sped up, and lots of additional functionality which could be added too.
Thanks,
Nick