The OpenCL startup time is problematic because the eSDK serial loader is slow. This is being addressed in COPRTHR 2 (https://arxiv.org/abs/1604.04207). It's going to get much better soon.
Alternatively, you need to use a technique called "persistent threads" if your clforka() call is inside a loop. Basically, move the loop out of the host code and into the kernel (device) code, so the kernel is launched once instead of once per iteration. I have not looked at your code.
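Since I haven't seen your code, here is only a rough, host-only C model of the idea, not real COPRTHR/STDCL code. Everything in it (`fake_device`, `step`, `run_host_loop`, `run_persistent`) is a hypothetical name made up for illustration; the `launches` counter just stands in for the startup cost you pay each time clforka() launches the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the device: it only counts kernel launches.
   Each launch models one pass through the slow loader. */
typedef struct {
    int launches;
} fake_device;

/* Hypothetical per-iteration work: double every element. */
void step(float *data, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        data[i] *= 2.0f;
}

/* Pattern A: the loop lives in host code, so the kernel is launched
   (and the startup cost is paid) once per iteration. */
void run_host_loop(fake_device *dev, float *data, size_t n, int iters)
{
    assert(iters >= 0);
    for (int it = 0; it < iters; ++it) {
        dev->launches++;      /* models one clforka() call */
        step(data, n);        /* models the kernel body */
    }
}

/* Pattern B ("persistent threads"): the kernel is launched once and
   the loop runs inside the kernel body. */
void run_persistent(fake_device *dev, float *data, size_t n, int iters)
{
    assert(iters >= 0);
    dev->launches++;          /* models a single clforka() call */
    for (int it = 0; it < iters; ++it)
        step(data, n);        /* the loop is now part of the kernel */
}
```

Both patterns produce the same data, but for N iterations pattern A counts N launches and pattern B counts one, which is where the savings would come from if launch overhead dominates.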