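On ARM, SSE instructions are not available, which is a real headache for programmers whose code relies on SSE acceleration. Fortunately I found the following online: sets of function interfaces, implemented with the NEON instruction set, that stand in for the SSE ones. Sharing them here. Thanks to GitHub; let's help each other and improve together!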
https://github.com/crawlingcn/sse-to-neon
For more information, visit http://blog.crawling.cn/?p=20
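As a rough illustration of what such a replacement header does, here is a minimal sketch. It is not taken from any of the repositories in this post, and the helper add4 is a hypothetical name used only for this example. On ARM it defines a few SSE names on top of NEON intrinsics; on x86 it simply includes the real SSE header, so the calling code stays the same.

```c
/* Minimal sketch of an SSE-on-NEON shim (assumption: only three
   intrinsics are wrapped here; real headers cover many more). */
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>

typedef float32x4_t __m128;  /* four packed floats, like SSE's __m128 */

/* Unaligned load of 4 floats */
static inline __m128 _mm_loadu_ps(const float *p) { return vld1q_f32(p); }

/* Unaligned store of 4 floats */
static inline void _mm_storeu_ps(float *p, __m128 a) { vst1q_f32(p, a); }

/* Lane-wise addition */
static inline __m128 _mm_add_ps(__m128 a, __m128 b) { return vaddq_f32(a, b); }
#else
#include <xmmintrin.h>       /* real SSE on x86 */
#endif

/* Hypothetical helper: SSE-style code that builds unchanged on both targets */
static inline void add4(const float *a, const float *b, float *out) {
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(out, _mm_add_ps(va, vb));
}
```

Calling add4 on two arrays of four floats writes their lane-wise sum to out, whichever branch of the #if was compiled.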
This one has no rcp (reciprocal) computation; 928 lines, a concrete implementation:
https://github.com/jjsu/SSE2NEON/blob/master/include/sse2neon.h
185 lines, no rcp, a concrete implementation:
https://github.com/otim/SSE-to-NEON/blob/master/sse_to_neon.hpp
This one has no '_mm_loadu_ps':
https://github.com/merckhung/sse2neon/blob/master/SSE2NEON.h
This one is more complete, but it requires #define ANDROID and is still not fully complete:
https://github.com/lucien-ye/sse-neon/blob/master/sse_neon.hpp
These two are similar and come with test examples:
Compile error: '_mm_loadu_ps' was not declared
https://github.com/merckhung/sse2neon
https://github.com/noxo/sse2neon
https://github.com/noxo/sse2neon/blob/master/SSE2NEON.h
1197 lines, with a concrete implementation and test examples:
https://github.com/jratcliff63367/sse2neon/blob/master/SSE2NEON.h
This one seems to be complete:
https://github.com/TuringKi/fDSST_cpp
https://github.com/TuringKi/fDSST_cpp/blob/master/src/SSE2NEON.h
This one goes the other way, NEON to SSE:
https://github.com/intel/ARM_NEON_2_x86_SSE/blob/master/NEON_2_SSE.h
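NEON versions:
// Approximate reciprocal: hardware estimate plus one Newton-Raphson step
RETf RCP(const __m128 x) {
    __m128 recip = vrecpeq_f32(x);
    recip = vmulq_f32(recip, vrecpsq_f32(recip, x));
    return recip;
}
// Square root (vsqrtq_f32 requires AArch64)
RETf SQRT(const __m128 x) {
    return vsqrtq_f32(x);
}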
// This version is more accurate: estimate plus two Newton-Raphson steps
RETf RCPSQRT(const __m128 x) {
    __m128 e = vrsqrteq_f32(x);
    e = vmulq_f32(e, vrsqrtsq_f32(x, vmulq_f32(e, e)));
    e = vmulq_f32(e, vrsqrtsq_f32(x, vmulq_f32(e, e)));
    return e;
}
On PC (x86 SSE):
RETf RCPSQRT(const simdqf x) { return _mm_rsqrt_ps(x); }
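
To see why the NEON RCP and RCPSQRT add multiply steps after the initial estimate, the refinement can be written out in scalar form. This is only a sketch for illustration: the function names rcp_refine, rsqrt_refine and rel_err are hypothetical, and the code uses plain floats rather than NEON vectors. vrecpsq_f32(r, x) computes 2 - x*r and vrsqrtsq_f32(x, e*e) computes (3 - x*e*e)/2 per lane, so each vmulq_f32 in the NEON code is one Newton-Raphson step.

```c
#include <math.h>

/* One Newton-Raphson step for 1/x: r' = r * (2 - x*r).
   Scalar form of vmulq_f32(r, vrecpsq_f32(r, x)). */
static float rcp_refine(float x, float r) {
    return r * (2.0f - x * r);
}

/* One Newton-Raphson step for 1/sqrt(x): e' = e * (3 - x*e*e) / 2.
   Scalar form of vmulq_f32(e, vrsqrtsq_f32(x, vmulq_f32(e, e))). */
static float rsqrt_refine(float x, float e) {
    return e * (3.0f - x * e * e) * 0.5f;
}

/* Hypothetical helper: relative error of an estimate against a reference */
static float rel_err(float est, float ref) {
    return fabsf(est - ref) / fabsf(ref);
}
```

Starting from a deliberately crude guess (for example 0.48 for 1/sqrt(4)), each call to rsqrt_refine roughly squares the relative error, which is why two steps give a noticeably more accurate result than the raw estimate alone.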