Many interesting use cases for Epiphany would require more than 16K of code (if one uses half of the SRAM for code), and running code directly from DRAM on the host is far too slow. Since the architecture lacks hardware code/data caching, one idea would be to implement code caching to SRAM in software.
My idea is roughly this: the compiler divides the compiled code into "pages" (I'm thinking 1K, but the optimal size would have to be found by testing), so that the code of a function never spans two pages. All pages are always present in shared DRAM, but the code that's currently running on a core will be in that core's SRAM. When a function wants to call another function it instead jumps to helper code that checks whether the target function is already in SRAM, copies in the right page from DRAM if not, and then calls it. (Or maybe it could even copy from neighbouring cores, but that would probably require a lot more logic, and I want the helper code small and simple.)
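To make the helper concrete, here is a rough sketch of what it might do, simulated on the host so it can actually be compiled and run. All the names, the page-table layout, and the round-robin eviction are my own assumptions, not a worked-out design; on the real hardware `sram` and `dram` would be actual core-local and shared addresses:

```c
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 1024          /* 1K pages, as suggested above        */
#define NUM_SLOTS 8             /* 8K of SRAM reserved for cached code */
#define NUM_PAGES 64            /* total code pages living in DRAM     */

/* Stand-ins for core-local SRAM and shared DRAM. */
static uint8_t sram[NUM_SLOTS][PAGE_SIZE];
static uint8_t dram[NUM_PAGES][PAGE_SIZE];

static int resident[NUM_SLOTS]; /* which DRAM page each slot holds, -1 = empty */
static int locked[NUM_SLOTS];   /* locked slots are never evicted              */
static int next_victim;         /* trivial round-robin eviction                */

void cache_init(void)
{
    for (int s = 0; s < NUM_SLOTS; s++) {
        resident[s] = -1;
        locked[s] = 0;
    }
    next_victim = 0;
}

/* The routine cross-page calls would be routed through: make sure
 * `page` is present in SRAM and return its address there. */
void *ensure_page(int page)
{
    for (int s = 0; s < NUM_SLOTS; s++)
        if (resident[s] == page)
            return sram[s];                    /* hit: call directly */

    /* Miss: pick the next unlocked slot and copy the page in.
     * (Assumes at least one slot is always unlocked.) */
    while (locked[next_victim])
        next_victim = (next_victim + 1) % NUM_SLOTS;
    int s = next_victim;
    next_victim = (next_victim + 1) % NUM_SLOTS;

    memcpy(sram[s], dram[page], PAGE_SIZE);
    resident[s] = page;
    return sram[s];
}
```

Locking a hot page would then just mean setting its `locked` flag so eviction skips it. The real helper would of course have to be much leaner (probably hand-written assembly), and it would jump into the copied page at the callee's offset rather than return a pointer.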
Of course, some frequently called functions may need to be locked in SRAM and called directly, so that calling them is really fast. Or if function A very often calls function B, the compiler could put them on the same page so the call (using relative addressing) is overhead free. At first all this would need to be done with explicit code directives, but collecting runtime profile information to use at recompile time would be an interesting improvement. I also think that functions currently on the stack should be locked, so returns can always be done directly.
Is this just a crazy idea, or could it be turned into something useful? I think it would help for code where the "main loop logic" fits in, say, 8K, but now and then needs to call a function from a large set of functions. (Think of an interpreter for a reasonably complex language.) If one elaborates on the idea it could be extended into a software MMU, shared libraries, or even an OS. But I think just a (manually tuned) code cache would be a good start, and enough in many cases.