http://www.kalrayinc.com
This seems to be a very similar idea, including the ability to extend the grid with multiple chips on a card.
They seem to have more structure: a grid of clusters. Does that mean more yield problems (i.e. if you have a defect, are you more likely to lose a whole cluster)?
I guess Epiphany's ability to load/store across the network could give the same benefit as having clusters (i.e. you could think of a 2x2 block of cores as one unit with a larger scratchpad and slightly longer latency, like an OpenCL workgroup), which seems more versatile to me?
They have 2 MB per 16-core cluster(?). I guess that's 128 KB per core, and they have to cache that. Does that mean Epiphany's compute density is higher? It sounds like a significantly more complex chip all round.
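To check the per-core figure, here is a quick sketch of the arithmetic. The 2 MB and 16-core numbers are the (uncertain) ones quoted above, and `per_core_kib` is just a name made up for illustration.

```python
def per_core_kib(cluster_bytes, cores):
    """Even share of a cluster's memory per core, in KiB."""
    return cluster_bytes / cores / 1024

# Assumed figures from the post: 2 MB shared by a 16-core cluster.
share = per_core_kib(2 * 1024 * 1024, 16)
print(share)
```

That is only an even split for comparison purposes; the cores presumably share the whole 2 MB rather than owning a fixed slice each.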
Maybe it maps more closely to OpenCL (but I wondered whether making 2x2 tiles equivalent to workgroups, and using a fraction of each scratchpad as 'local memory', would help Epiphany map better).
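A rough sketch of what that 2x2 grouping might look like, just as index arithmetic. Everything here is hypothetical: the function names, the 4x4 grid, the 32 KB scratchpad size and the 25% fraction are assumptions for illustration, not anything from an Epiphany or OpenCL SDK.

```python
TILE = 2  # 2x2 cores per workgroup, as proposed above

def workgroup_of(row, col, grid_cols, tile=TILE):
    """Map a core's (row, col) to (workgroup id, local id inside the tile)."""
    tiles_per_row = grid_cols // tile
    group_id = (row // tile) * tiles_per_row + (col // tile)
    local_id = (row % tile) * tile + (col % tile)
    return group_id, local_id

def pooled_local_bytes(scratchpad_bytes, fraction, tile=TILE):
    """'Local memory' per workgroup if each core donates a fraction of its scratchpad."""
    per_core = int(scratchpad_bytes * fraction)
    return per_core * tile * tile

# Assumed 4x4 grid: list which workgroup each core would land in.
for r in range(4):
    print([workgroup_of(r, c, grid_cols=4)[0] for c in range(4)])

# Assumed 32 KB scratchpad per core, a quarter of it donated as 'local memory'.
print(pooled_local_bytes(32 * 1024, 0.25))
```

The pooled region would still be physically spread over four scratchpads, so three quarters of it sits one hop away for any given core. That is the "slightly longer latency" trade-off.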
I guess it should be possible to have a programming model that would work 'very well' on both (MPI?).