What is ARM NEON? “The ARM® NEON™ general-purpose SIMD engine …” In other words, it is an extended instruction set, similar to SSE, SSE2, etc. on x86 CPUs.
From time to time, a friend of mine asked me: “What do you think about ARM NEON optimization for your 3D math functions?”
My answers were:
- FPS in my project is in the normal range
- The profiler doesn’t show hot spots in the math functions
- The data has to be aligned to 16 bytes, and my code was not ready for this
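To illustrate the last point, here is a minimal sketch of what 16-byte alignment looks like in C++11 and later. The `Mat4` type and the `isAligned16` and `requireAligned16` helpers are hypothetical names chosen for illustration, not the project's actual code, and the sketch assumes a 4-byte `float`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical 4x4 matrix type (assumption, not from the original project).
// alignas(16) asks the compiler to place every instance, including array
// elements, on a 16-byte boundary.
struct alignas(16) Mat4 {
    float m[16];
};

// Returns true when the address is a multiple of 16 bytes.
inline bool isAligned16(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

// Debug-time guard that a SIMD code path could call before loading data.
inline void requireAligned16(const void* p) {
    assert(isAligned16(p));
}

// Compile-time checks: the type is 16-byte aligned and has no padding
// (assuming sizeof(float) == 4).
static_assert(alignof(Mat4) == 16, "Mat4 must be 16-byte aligned");
static_assert(sizeof(Mat4) == 64, "Mat4 should be exactly 16 floats");
```

Stack and static instances of such a type are aligned automatically. Heap allocations made through plain `new` only honor the over-alignment from C++17 onward, so older code typically needs an aligned allocator, which is part of why existing math code may not be "ready" for this requirement.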
A few weeks ago I added FSAA (full-screen antialiasing) to the game, and FPS immediately fell below 20. That was a problem. After one week of optimizations, FPS was back up to 25. FSAA ate all of my GPU power, so the only way left to improve performance was to optimize the CPU code.
Usually, when I ran the Xcode profiler, I saw ~10% of CPU time inside the matrix palette skinning block. This code already looked well optimized, so my attention shifted to other places. One week ago my friend came to me and said: “Hey, yesterday I spent a lot of time learning the ARM NEON assembly instructions, and I feel like I can help you write that code. Let’s try to optimize your matrix palette skinning block.”
We sat down together at my laptop and got started.
Plain C++ code for matrix palette skinning: