From ~550 cycles (C version) to 170 (SSE/x86-64), 206 (MMX/x86-32) and 196 (SSE2/x86-32) cycles.

(cherry picked from commit 81f2a3f4ffcc6935b8b8ada4954700b3f333ae4f)