I'd like to build a ring buffer between each eCore and the host. I do not see any atomic operations available and, worse, I see no way to introduce fences on Epiphany.
I would like at least the fences normally available on a weakly ordered architecture (acquire/release semantics), but the eCores appear to support an almost fully relaxed model.
About the only thing I think I can rely on (please confirm!) is that writes are ordered when an eCore targets the external shared memory area.
Note that I'm not concerned with eCore <--> eCore communication across local memories, only with eCore <--> ARM communication through shared memory over the mesh.
I have seen the trick on this forum of implementing a release barrier by writing and then polling on the written address until we can verify that the value is durable (i.e., the write finished).
Is that the best tool we have on Epiphany for implementing fences?
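To make sure we are talking about the same trick, here is a minimal sketch of the write-then-poll idea as I understand it. The names `store_and_confirm` and `publish` are mine and purely hypothetical, not part of any Epiphany SDK. The whole thing rests on an assumption I have not been able to confirm: that a read of an address cannot return the new value until the earlier write to that address has actually landed.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical helper: store `value` at `addr`, then re-read the same
 * address until the value is observed or `max_polls` reads have been made.
 * Returns true if a read-back matched.
 * ASSUMPTION (unconfirmed): a matching read-back implies the write has
 * completed at its destination. */
static bool store_and_confirm(volatile uint32_t *addr, uint32_t value,
                              uint32_t max_polls)
{
    *addr = value;                      /* the write we want to "release" */
    for (uint32_t i = 0; i < max_polls; ++i) {
        if (*addr == value)             /* volatile forces a real re-read */
            return true;
    }
    return false;                       /* gave up: write not confirmed */
}

/* Publish a payload, then raise a flag, confirming each write in turn so
 * that the flag is never issued before the payload has been confirmed. */
static bool publish(volatile uint32_t *payload, volatile uint32_t *flag,
                    uint32_t data, uint32_t max_polls)
{
    if (!store_and_confirm(payload, data, max_polls))
        return false;
    return store_and_confirm(flag, 1u, max_polls);
}
```

On the eCore side, `payload` and `flag` would point into the external shared memory region. Reading back over the mesh costs a full round trip per confirmed write, which is why I'm hoping there is something cheaper.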
Anyway, I think I can at least build a simple 1:1 FIFO per eCore but I'm just checking to see if anyone has any other tricks. For example, can I implement memory barriers using the DMA engine?
Are there any memory ordering guarantees, at least for DMAs, regardless of how inefficient this might be?
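For reference, the simple 1:1 FIFO I have in mind looks roughly like the sketch below. It avoids atomics by giving each index exactly one writer: the producer owns `head`, the consumer owns `tail`. The type `ring_t`, the functions `ring_push` and `ring_pop`, and the capacity `RING_SLOTS` are hypothetical names for illustration. It is only correct if the slot write is guaranteed to become visible before the `head` update, which is exactly the write-ordering guarantee I am asking about.

```c
#include <stdint.h>
#include <stdbool.h>

#define RING_SLOTS 8u   /* hypothetical capacity; must be a power of two */

typedef struct {
    volatile uint32_t head;             /* written only by the producer */
    volatile uint32_t tail;             /* written only by the consumer */
    volatile uint32_t slot[RING_SLOTS]; /* payload words */
} ring_t;

/* Producer side. Returns false if the ring is full. */
static bool ring_push(ring_t *r, uint32_t v)
{
    uint32_t head = r->head;
    if (head - r->tail == RING_SLOTS)   /* unsigned wrap-around is fine */
        return false;
    r->slot[head & (RING_SLOTS - 1u)] = v;
    /* ASSUMPTION: the slot write above reaches shared memory before the
     * index write below. A release fence would go here if one existed. */
    r->head = head + 1u;
    return true;
}

/* Consumer side. Returns false if the ring is empty. */
static bool ring_pop(ring_t *r, uint32_t *out)
{
    uint32_t tail = r->tail;
    if (r->head == tail)
        return false;
    /* An acquire fence would go here on a weakly ordered consumer. */
    *out = r->slot[tail & (RING_SLOTS - 1u)];
    r->tail = tail + 1u;                /* hand the slot back to the producer */
    return true;
}
```

One `ring_t` per eCore would live in the shared region, with the eCore calling `ring_push` and the ARM host calling `ring_pop` (or the reverse for the other direction). The indices run freely and are masked on access, so full and empty are distinguishable without wasting a slot.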
I have been able to achieve efficient communication using the Cell SPU atomic cache and LL/SC instructions as well as unified virtual memory on GPUs. I'm just trying to get a similar communications channel working on Epiphany.
If I can't build lock-free structures on Epiphany then I will probably not be spending much time with it.
Thanks for any help or clarification you can provide.