Does OpenCL __local memory (declared as __local in the kernel function signature, and set with a NULL arg_value and the appropriate size in clSetKernelArg()) map to the 32K local memory of each Epiphany RISC core? I ask because I launched a kernel on every core and had each instance write the address of its __local pointer into a global memory buffer. The addresses were all the same, and all had the high-order 12 bits set to 0x808.
I'm confused. In one sense, as per OpenCL, local memory should be shared by all work items in a work group. For OpenCL on Epiphany via COPRTHR there is only 1 work group with at most 16 work items, each work item corresponding to a RISC core. So __local buffers should indeed be shared by all Epiphany cores, as OpenCL requires. The result I got is 'correct' in that sense, except that each core pays a different access cost: the __local buffer sits in the local storage of one core, so it is not local for any other core, which has to reach it through the inter-core mesh.
How do I make use of each RISC core's own 32K local memory in an OpenCL kernel, to speed up execution by avoiding repeated access to global memory?
Thanks for the help.