by dar » Mon Sep 02, 2013 12:47 pm
[quote]Regarding local memory, is this an issue due to the way cores in the same 'work group' wouldn't all be able to communicate with each other at the same speed, only with their neighbors, combined with the fact that they don't have a normal shared memory but instead share their banks?[/quote]
The issue of memory architecture is complicated, and it defines almost everything about Epiphany. Basically, OpenCL uses address space qualifiers to describe two distinct characteristics of a memory architecture: physical locality and visibility. This makes sense for GPUs, since there the two are aligned. Epiphany exposes the limitations of this terminology. The memory one would consider "local" on Epiphany, which I refer to as "core-local" memory, is physically local to the single core executing a work item in a workgroup. However, that memory is also visible to every other core, with very fast, albeit non-uniform, access.

An effective way to use this core-local memory is to allocate it like "OpenCL private" memory for each core, and then allow data to be exchanged between cores using extensions. It is possible to implement the OpenCL local memory concept, and we do, but it is neither efficient nor a good way to use the processor. In theory it can be used by following various "rules" for access, analogous to the rules we learned for GPUs, but since this is not a GPU, the rules will be different. The entire exercise is a detour from simply understanding the memory architecture and writing your algorithms accordingly. It is a trade-off between portability and performance. Performance is not portable with OpenCL anyway, so there is nothing special about Epiphany in that regard.