I'm just starting out with the Parallella board and haven't worked on a project with it before, so I don't have much experience yet. Please be patient with me.

What I'm trying to do is run an FFT algorithm on the board. I've been using the kissfft library, working exclusively on the ARM side of the board, and that seems to work fine. As a next step, however, I want to use the Epiphany cores to execute the FFT. What's different from other threads I've seen is that I don't want to parallelize the FFT itself; I just want to run separate FFTs on separate data, in parallel.
Is a working version of what I described doable? When I try to run kissfft on the cores, the program seems to hang (possibly because of the memory allocation kissfft does internally?).