templates generating epiphany code is the topic, so let's look at an actual example: the cout object and its friends.
if the part you want compiled for epiphany contains cout, what should happen? ideally linux would create a new terminal for each epiphany core the code is running on, and the cout object, used inside parallelized loops or wherever, would tell the arm which message with which parameters to send to the newly created console.
of course this means abandoning the stdlib implementation of cout: you can't use a single global object for that. on the other hand, object creation on the epiphany would then have to trigger activity in the linux kernel (creation of a new device).
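to make the idea a bit more concrete, here is a minimal sketch of a per-core replacement for cout. it is an assumption about how this could look, not an existing epiphany api: `console_message`, `host_channel`, `core_streambuf` and `core_ostream` are made-up names, and the channel to the arm is faked with a plain vector so the code runs on any machine. on real hardware `host_channel::send` would have to write into a mailbox or shared-memory buffer instead.

```cpp
#include <cassert>
#include <ostream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// hypothetical message as the host (arm) would receive it:
// which core wrote which line of text.
struct console_message {
    int core_id;
    std::string text;
};

// stand-in for the real core-to-host channel (hypothetical).
// here it only records messages so the sketch is runnable anywhere.
struct host_channel {
    std::vector<console_message> inbox;

    void send(int core_id, std::string text) {
        inbox.push_back({core_id, std::move(text)});
    }
};

// stream buffer that collects characters locally and forwards
// one message per completed line to the host.
class core_streambuf : public std::streambuf {
public:
    core_streambuf(int core_id, host_channel& host)
        : core_id_(core_id), host_(host) {
        assert(core_id >= 0);
    }

protected:
    // called for every character, because no put area is set up
    int_type overflow(int_type ch) override {
        if (traits_type::eq_int_type(ch, traits_type::eof()))
            return traits_type::not_eof(ch);
        const char c = traits_type::to_char_type(ch);
        if (c == '\n')
            flush_line();
        else
            line_.push_back(c);
        return ch;
    }

    // called on std::flush / std::endl: send any unfinished line
    int sync() override {
        if (!line_.empty())
            flush_line();
        return 0;
    }

private:
    void flush_line() {
        host_.send(core_id_, line_);
        line_.clear();
    }

    int core_id_;
    host_channel& host_;
    std::string line_;
};

// one stream object per core instead of one global cout
class core_ostream : public std::ostream {
public:
    core_ostream(int core_id, host_channel& host)
        : std::ostream(nullptr), buf_(core_id, host) {
        rdbuf(&buf_);  // attach the buffer once it is constructed
    }

private:
    core_streambuf buf_;
};
```

usage would be something like `core_ostream out(3, host); out << "hello " << 42 << '\n';`, after which the host side finds one message tagged with core 3. the part this sketch leaves open is exactly the hard one: who creates the terminal on the linux side when such an object is constructed.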
another lesson from this example: apart from local and shared memory, a parallel architecture also has hidden memory to cope with. you said triangles are stored on the gpu in games; that is a good example of such hidden memory. the gpu has its own address space, and accessing it is done either through a mirrored virtual address space or by sending messages to the device. either way, from the cpu's side this memory should be treated as neither readable nor writable: the cpu doesn't read or write it directly, instead its address is just passed to the device for moving around, transforming or whatever. it's a bit like playing chess through a keyhole.
so judging from your description this kind of memory has a name; what is it called? what do you call pointers to memory locations inside a gpu? what do you call pointers stored inside the gpu that point into the cpu's address space? imho the first object we need is exactly this kind of pointer, implemented in c++...
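a rough sketch of what such a pointer could look like, under a lot of assumptions: `remote_ptr`, `host_space`, `device_space` and `fake_device` are invented names, the 32-bit address width is a guess, and the "device" is just a byte vector so the example runs without hardware. for comparison, cuda calls the two sides host and device, and thrust ships a `device_ptr` wrapper; this sketch is not that api. the key point is that the pointer has no `operator*`: you can only offset it, compare it, and hand it to transfer functions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// tags naming the address space a pointer belongs to (hypothetical).
// remote_ptr<T, host_space> would be the mirror case: a pointer
// stored on the device that refers to cpu memory.
struct host_space {};
struct device_space {};

// an address in a foreign address space. deliberately has no
// operator* and no operator->, so it cannot be dereferenced here.
template <class T, class Space>
class remote_ptr {
public:
    remote_ptr() = default;
    explicit remote_ptr(std::uint32_t addr) : addr_(addr) {}

    std::uint32_t address() const { return addr_; }

    // element-wise offset, like ordinary pointer arithmetic
    remote_ptr operator+(std::size_t n) const {
        return remote_ptr(
            addr_ + static_cast<std::uint32_t>(n * sizeof(T)));
    }

    friend bool operator==(remote_ptr a, remote_ptr b) {
        return a.addr_ == b.addr_;
    }
    friend bool operator!=(remote_ptr a, remote_ptr b) {
        return !(a == b);
    }

private:
    std::uint32_t addr_ = 0;
};

// simulated device memory with a trivial bump allocator.
// a real implementation would send messages to the device instead.
class fake_device {
public:
    explicit fake_device(std::size_t bytes) : mem_(bytes) {}

    template <class T>
    remote_ptr<T, device_space> allocate(std::size_t count) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "only plain data can be copied byte-wise");
        const std::size_t bytes = count * sizeof(T);
        assert(next_ + bytes <= mem_.size());
        remote_ptr<T, device_space> p(static_cast<std::uint32_t>(next_));
        next_ += bytes;
        return p;
    }

    // host -> device copy
    template <class T>
    void upload(remote_ptr<T, device_space> dst, const T* src,
                std::size_t count) {
        assert(dst.address() + count * sizeof(T) <= mem_.size());
        std::memcpy(mem_.data() + dst.address(), src, count * sizeof(T));
    }

    // device -> host copy
    template <class T>
    void download(T* dst, remote_ptr<T, device_space> src,
                  std::size_t count) const {
        assert(src.address() + count * sizeof(T) <= mem_.size());
        std::memcpy(dst, mem_.data() + src.address(), count * sizeof(T));
    }

private:
    std::vector<unsigned char> mem_;
    std::size_t next_ = 0;
};
```

with this, `auto p = dev.allocate<float>(3); dev.upload(p, data, 3);` compiles, while `*p` does not, which is the keyhole from above expressed in the type system. putting the address space into the type also means a device pointer can't be passed by accident where a host pointer is expected.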