https://www.youtube.com/watch?v=Ey-inJ9Dz6Q
Seems to be a language that can automatically map arrays to a partitioned global address space, and (if I've understood correctly?) iterations over those arrays are automatically distributed across threads (or processors) tied to subsets of the address range. It's sort of data-parallel, but with controlled locality.
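To make that model concrete, here's a minimal C++/std::thread sketch (not the language from the talk, purely illustrative) of the owner-computes idea: the index space of a "global" array is partitioned into contiguous slices, and each thread iterates only over the slice it owns.

```cpp
// Sketch of owner-computes data parallelism with locality:
// the index space is split into contiguous slices, and each
// thread touches only the slice it owns.
#include <cstddef>
#include <thread>
#include <vector>

int main() {
    const std::size_t n = 1 << 20;   // "global" array size
    const std::size_t nthreads = 4;  // stand-in for cores/locales
    std::vector<double> a(n, 1.0), b(n, 2.0), c(n, 0.0);

    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < nthreads; ++t) {
        workers.emplace_back([&, t] {
            // Each thread owns one contiguous sub-range of indices.
            const std::size_t lo = t * n / nthreads;
            const std::size_t hi = (t + 1) * n / nthreads;
            for (std::size_t i = lo; i < hi; ++i)
                c[i] = a[i] + b[i];  // only the owned slice is touched
        });
    }
    for (auto& w : workers) w.join();
    return 0;
}
```

In an actual PGAS implementation each slice would live in a physically separate memory (a node's RAM, or a core's scratchpad), and reading an index outside your own slice would turn into communication rather than a plain load/store.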
I suppose as it stands this still isn't really a perfect match for the Epiphany chip, where the scratchpads are more analogous to L1 cache (and you'd really be using DMA to work with off-chip global memory); but I guess it might apply more to the vision of future versions with large amounts of 3D memory per core, on chip. I speculate you might be able to make an OpenCL implementation work like this (but it would be horrendously complex to do).
The talk is interesting, covering the use cases where PGAS offers potential advantages over MPI and shared memory.