While the doc specifies the number of cycles per type of instruction, I could not find any info on the following:
1) Latency of a floating-point SW exception, ...?
2) Latency of the event counter interrupt (number of cycles between the counter reaching zero and the execution of the interrupt, assuming that no higher-priority interrupt showed up and interrupts are not disabled)?
3) Latency of test&set with a read in the same core? In a directly adjacent core (Manhattan distance)? Also, does that instruction have its own logic when reaching out to a distant core, or does a non-local read-modify-write happen?
Thanks!