I've just come across a paper from Paralant describing their Insight library for Epiphany. It contains a super description of how to get the best from the Epiphany architecture, and includes measured speedups for various matrix sizes on a 16-core Epiphany at 1 GHz/core.
The bottom line is that it tops out at 7 Gflops for multiplication of 64x64 matrices - not at all shabby! [edit: as pointed out by EggsBackonandSpam below, that is on just 4 cores]
This paper is very well written and could be used as a tutorial on getting the best from the Epiphany architecture.
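
For a rough sense of what 7 Gflops on 4 cores means, here is a quick back-of-envelope sketch. It assumes each core can retire one fused multiply-add (2 flops) per cycle at 1 GHz; that is my assumption, not a number taken from the paper, and the function names are made up for illustration.

```python
# Back-of-envelope check of the quoted 7 Gflops figure.
# ASSUMPTION (not from the paper): each core retires one fused
# multiply-add, i.e. 2 flops, per clock cycle.

def matmul_flops(n: int) -> int:
    """Flops for a dense n x n times n x n multiply: n^3 multiplies plus n^3 adds."""
    return 2 * n ** 3

def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int = 2) -> float:
    """Theoretical peak in Gflops under the 2-flops-per-cycle assumption."""
    return cores * clock_ghz * flops_per_cycle

measured_gflops = 7.0  # figure quoted in the post
n = 64                 # matrix dimension quoted in the post
cores = 4              # per the edit in the post
clock_ghz = 1.0        # per the post

flops = matmul_flops(n)
peak = peak_gflops(cores, clock_ghz)
efficiency = measured_gflops / peak
time_us = flops / (measured_gflops * 1e9) * 1e6

print(f"{flops} flops per {n}x{n} multiply")
print(f"assumed peak on {cores} cores: {peak:.1f} Gflops")
print(f"fraction of assumed peak: {efficiency:.1%}")
print(f"time per multiply at the measured rate: {time_us:.1f} us")
```

If that assumption holds, 7 Gflops on 4 cores is 87.5% of the theoretical peak, which makes the result look even better than it does against the full 16-core chip.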