MMX SSE 3D NOW! Pro..
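331:Anonymous
07/06/01 00:14:39
Hmm.
>>327
I don't recall anyone claiming that.
My question is whether financial-engineering problems are really the kind where SIMD makes a critical difference,
or whether they're just simple arithmetic.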
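332:Anonymous
07/06/01 00:43:42
For example, for probability distribution functions like the Normdist provided in Excel,
you can buy a numerical library, but you can't tune the trade-off between speed and accuracy.
Coding it in C means heavy use of the four basic arithmetic operations plus EXP. Programmed the ordinary way it's quite slow, but with SIMD it gets
much faster. It would be nice if SIMD had an EXP instruction.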
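333:・∀・）っ-
07/06/01 00:59:18
That's what Newton's method
and Taylor expansions are for.
Didn't Intel release a math library?
The x87 ones are just macro-instructions anyway.
By the way, what you get from divps and the like is an approximation with low precision.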
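334:Anonymous
07/06/01 01:06:33
Is Newton's method ever actually faster
than the standard library function?
Newton's method is used a lot for solving equations, but there's the initial-value problem.
How good MKL's EXP is probably depends on how you use it. When you need results for a lot of data at once it's probably fast,
but otherwise the slow access becomes the bottleneck and it doesn't help much.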
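335:・∀・）っ-
07/06/01 01:12:44
If you build it out of SIMD operations, beating the throughput of the standard functions is common.
Well, it only makes sense precisely because you're using SIMD.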
336:Anonymous
07/06/01 01:24:31
Huh, really.
Then it's worth a try.
Newton's method for exp
looks easy...
What should I use for the initial value, though?
337:Anonymous
07/06/01 01:33:41
Can financial problems really be handled with Monte Carlo, which converges so slowly?
338:Anonymous
07/06/01 01:37:50
If you can avoid it, so much the better.
If there's no way to solve it analytically, Monte Carlo is the only option.
That's why it gets used.
339:Anonymous
07/06/01 09:24:19
So the discussion ended up drifting toward things that aren't directly related to finance.
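340:Anonymous
07/06/02 00:29:40
>>335
Come to think of it, with f(x)=exp(x) we have f'(x)=exp(x), so finding exp with Newton's method doesn't work.

So I tried a Taylor expansion instead:
・Taylor expansion: exp(x) = Σ(n=0..∞) x^n/n!
・Set up all the factorials as initial values
0!:n0=1
1!:n1=1
2!:n2=2
・・・
x: input value
expsse: the value being computed
x=_mm_mul_pd(x,x);
expsse=_mm_add_pd(expsse, _mm_div_pd(x, n0));
x=_mm_mul_pd(x,x);
expsse=_mm_add_pd(expsse, _mm_div_pd(x, n1));
x=_mm_mul_pd(x,x);
expsse=_mm_add_pd(expsse, _mm_div_pd(x, n2));
・・・
Something like that. I wrote it out without a for loop up to n=20, and I think the accuracy came to about 9 digits.
But it was extremely slow.
I didn't have time to look into it closely. Is there a problem with my code?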
341:Anonymous
07/06/05 01:32:46
It seems SSSE3 intrinsics can be used in Visual C++ 2005.
342:・∀・）っ-
07/06/05 06:08:53
When did SSSE3 become usable?
The inline assembler's mnemonics work, though.
343:Anonymous
07/06/05 14:01:57
>342
You're the one who brought the topic up in >333, so take responsibility and follow up on >340.
344:・∀・）っ-
07/06/05 19:50:20
There's a latency chain: mulpd → divpd → addpd.
If there are several pieces of data to process, interleaving them raises throughput.
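345:Anonymous
07/06/05 20:33:01
>>344
By that logic, the SPE's scalar ops would probably also manage a throughput of one per clock.
Then again, apply that logic to x86 and you run out of registers (snip)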
346:・∀・）っ-
07/06/05 20:34:12
Register renaming.
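347:・∀・）っ-
07/06/05 20:39:13
Zeroing a register with XOR (applied to SIMD registers too since Core 2)
or moving data (partial-register moves don't count) gives the register renamer a hint.
Internally there are about 80 SIMD registers, so it should work reasonably well, no?
Cell's execution units have twice the latency of Intel's architecture.
The SPE's scalar performance is poor because there's a limit to how much latency you can hide by running
in parallel data that can't even be vectorized in the first place.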
348:Anonymous
07/06/05 21:00:36
The execution units stall first.
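349:Anonymous
07/06/05 21:06:06
Sorry to butt in.
I'm curious how people who talk about register renaming actually write their code.
Unless the computation is really compact, I'd think you have to spill results out of the registers.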
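350:Anonymous
07/06/05 21:36:42
I think the real problem is that you can't get enough accuracy.
Making it faster isn't that hard:
replace the division with multiplication by the reciprocal
↓
a4*x^4 +a3*x^3 +a2*x^2 +a1*x + a0
↓
(((a4*x+a3)*x+a2)*x+a1)*x+a0
↓
tmp=a[N]
for(i=N-1;i>=0;i--) tmp=tmp*x+a[i]
↓
(initialization omitted)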
LOOP:
mulpd xmm4, xmm0
mulpd xmm5, xmm1
mulpd xmm6, xmm2
mulpd xmm7, xmm3
addpd xmm4, [esi]
addpd xmm5, [esi]
addpd xmm6, [esi]
addpd xmm7, [esi]
add esi, 16
sub ecx, 1
jnz LOOP
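Below is a C/SSE2 sketch of the Horner loop above with two independent chains interleaved, which is what 344 and 358 suggest for hiding the mulpd/addpd latency. The coefficient layout (a[0] is the constant term) and the name horner4_pd are assumptions made for illustration.

```c
#include <emmintrin.h>

/* p(x) = a[0] + a[1]*x + ... + a[deg]*x^deg for four doubles held in
   two registers.  The two Horner chains do not depend on each other,
   so the CPU can overlap their mulpd/addpd latencies. */
static void horner4_pd(const double *a, int deg,
                       __m128d x01, __m128d x23,
                       __m128d *y01, __m128d *y23)
{
    __m128d t01 = _mm_set1_pd(a[deg]);
    __m128d t23 = t01;
    int i;
    for (i = deg - 1; i >= 0; i--) {
        __m128d c = _mm_set1_pd(a[i]);
        t01 = _mm_add_pd(_mm_mul_pd(t01, x01), c);  /* chain 1 */
        t23 = _mm_add_pd(_mm_mul_pd(t23, x23), c);  /* chain 2, independent */
    }
    *y01 = t01;
    *y23 = t23;
}
```

Within one chain, each step still has to wait for the previous multiply-add. The gain comes only from having more than one chain in flight.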
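351:・∀・）っ-
07/06/05 21:50:00
No, division is an approximate instruction anyway, so just replacing it with multiplication
should keep the accuracy reasonably intact, shouldn't it?
Intel expands x87 double precision internally to 80-bit precision for its calculations,
but SSE seemed to disregard precision.
If you need accuracy, x87 may be the better choice.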
352:Anonymous
07/06/05 22:11:04
The approximate ones are rcpps and rsqrtps.
divps isn't an approximation, is it?
353:・∀・）っ-
07/06/05 22:21:41
No, it's approximate. Internally it just multiplies by an approximate reciprocal.
354:・∀・）っ-
07/06/05 22:22:19
By the way, double-precision division takes dozens of clocks.
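355:Anonymous
07/06/05 22:28:36
>>353
What's your source?
Even the latency of rcpps plus the latency of mulps
is much shorter than the latency of divps.
Intel's instruction set manual
explicitly says that rcpps and rsqrtps return approximations,
but it says nothing about divps being an approximation.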
356:Anonymous
07/06/05 22:32:08
You're comparing against x87, aren't you?
Sure, at 80 bits it takes dozens of clocks.
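357:Anonymous
07/06/05 23:08:46
>>350
Sorry, could you explain something?
(I'm an amateur at assembly.) My understanding was that a polynomial has to be evaluated as a repeated add → multiply sequence,
so is it really possible to do the four multiplies first and then the four adds?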
358:>>357
07/06/05 23:13:47
>>344
> If there are several pieces of data to process, interleaving them raises throughput.
359:Anonymous
07/06/05 23:31:26
>>358
So that's the trick.
TNX
360:・∀・）っ-
07/06/06 00:30:48
Is it just me, or do the formulas in >>340 and >>350 not match?
361:・∀・）っ-
07/06/06 00:31:53
>>355
Sorry, maybe you're right.
Still, I don't think it was precise enough to be fully exact.
362:Anonymous
07/06/06 00:34:38
>>360
Of course. That's the difference between a Taylor expansion and a Chebyshev-type approximation.
363:350
07/06/06 01:01:20
>>360
The Taylor expansion formula in >>340 doesn't match its intrinsics implementation.
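364:340
07/06/06 19:11:56
>>344
Thanks.
With my limited knowledge I don't fully get it, but I take it to mean that the instruction order needs some care,
and that if other data can be processed at the same time, you process it at the same time.
>>363
You're right, the formula is wrong. It was a typo, so don't worry about it.
Anyway, for exp() it looks faster to use an approximating polynomial.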
365:340
07/06/06 19:15:18
By the way, the people who build financial systems know almost nothing about these techniques.
What fields are all of you working in?
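366:Anonymous
07/06/06 21:54:11
Signal processing or games, I'd guess.
Let me ask you in return: in finance, is speeding things up worth the cost?
You'd need accuracy, and reliability seems to matter more than speed, doesn't it?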
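367:340
07/06/06 23:23:51
Finance seems to have recently started adopting grid computing,
but I think HPC in this field is still largely unexplored.
SIMD, MPI and the like are used in some foreign package software.
When users ask the SEs "can't you make the computation faster?",
the SEs, unaware of their own ignorance, often just insist "that would take an enormous amount of work" and refuse.
As a result, most users think speeding things up is impossible.
Under these circumstances, as I mentioned before, there is real demand for speed in areas like risk management and derivatives,
and I think HPC adoption will start from there.
At present it's taken for granted that even results you need right away take days to compute...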
368:・∀・）っ-
07/06/06 23:36:08
Actually, I looked at msdn2,
and it seems VC++'s math functions use SSE2 when SSE2 is available?
Frankly, trying to beat an existing optimized routine is pretty reckless.
369:340
07/06/06 23:57:04
exp() isn't much of an issue, so that's fine.
The real problem is what I wrote above.
Anyone here want to work on financial systems?
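370:Anonymous
07/06/07 01:16:32
I have no intention of doing it for a living, but it's interesting as a topic.
My impression is that the heavy parts are things like DBs, not the kind of thing you'd vectorize with SIMD.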
371:Anonymous
07/06/07 01:19:32
Doesn't that depend on the application?
372:Anonymous
07/06/07 01:23:59
Write timing also needs attention for overall performance.
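373:Anonymous
07/06/18 23:11:59
【Temp-worker negative-thinking checklist】
If three or more apply to you, your personality is warped and
you are living the life of a negative, losing temp worker.
You praise and use the tools the client's regular employees made, even when they're wrong
You praise the opinions of client employees with authority, even when they're wrong
You must always go to lunch with the client's employees
You're happy when the client says "please keep working here forever (for low pay)"
There's no way you could work at your own company
When the problems of temp work come up, you get emotional and argue back
You hate people who point out the problems of temp work
You want the client to pull you along, not only at work but in your private life too
You respect the client's regular employees who pull you along
Naturally you don't know your own contract rate; you must never ask about the unit price
Naturally your lifetime income is lower than the client's regular employees'
What matters is wagging your tail at the client and getting them to like you
Being short is an advantage for temp work, because the client will like you more easily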
374:Anonymous
07/06/22 00:18:26
Hey, can this be written with SSE?
#include <string>

int RSHash(const std::string& cr)
{
    unsigned int b = 378551;  // Random ranges I've chosen (can be modified)
    unsigned int a = 63689;
    unsigned int hash = 0;    // Output hash (unsigned, so overflow wraps instead of being undefined)
    for (std::string::size_type i = 0; i < cr.length(); i++)  // Loop over each character
    {
        hash = hash * a + cr[i];  // The hashing step
        a = a * b;
    }
    return (int)(hash & 0x7FFFFFFF);  // Return the hashed string
}
375:Anonymous
07/06/22 00:43:56
PMULUDQ, then.
Could that be used if you process two strings at the same time?
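376:・∀・）っ-
07/06/22 00:54:28
int is 32 bits, right?
MMX/SSE2 multiplication only goes up to 16 bits × 16 bits,
so forcing it in would probably just make things slower.
If the dependencies could be cleanly untangled so that pmaddwd applies, though, it might get faster.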
377:・∀・）っ-
07/06/22 00:55:54
>>375
Oh right, there's that. It completely slipped my mind.
378:・∀・）っ-
07/06/22 01:07:24
Unrolling the loop body 2x gives this?
hash = (hash * a * a * b) + (cr[i] * a * b) + cr[i+1];
a = a * b * b;
Unroll it further and there are quite a few spots that look parallelizable, but, well...
My head hurts.
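Here is a C sketch that checks the algebra in 378. It keeps the reference loop from 374 and a two-characters-per-iteration version, with a scalar step for an odd tail. It assumes unsigned 32-bit arithmetic (so wraparound is well defined) and unsigned char input. The names rs_hash and rs_hash_unrolled2 are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference loop from 374, with unsigned arithmetic. */
static uint32_t rs_hash(const unsigned char *s, size_t len)
{
    uint32_t b = 378551, a = 63689, hash = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        hash = hash * a + s[i];
        a *= b;
    }
    return hash & 0x7FFFFFFF;
}

/* Two characters per iteration, per 378:
   h' = h*(a*a*b) + c0*(a*b) + c1,  a' = a*b*b.
   The multipliers depend only on a and b, not on h, so they can be
   computed in parallel with the hash update. */
static uint32_t rs_hash_unrolled2(const unsigned char *s, size_t len)
{
    const uint32_t b = 378551, bb = 378551u * 378551u;  /* b*b mod 2^32 */
    uint32_t a = 63689, hash = 0;
    size_t i = 0;
    for (; i + 1 < len; i += 2) {
        uint32_t ab = a * b;
        hash = hash * (a * ab) + s[i] * ab + s[i + 1];
        a *= bb;
    }
    if (i < len)                 /* odd length: one final single step */
        hash = hash * a + s[i];
    return hash & 0x7FFFFFFF;
}
```

Both functions give the same result modulo 2^32. Whether the unrolled form is actually faster depends on the compiler and CPU.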
379:Anonymous
07/06/22 01:46:48
I'd like to get started with SSE.
Where did everyone start?
380:341
07/06/22 02:34:18
Usage
By including intrin.h, you can use intrinsics up to SSE3.
SSSE3 intrinsics become usable if you declare each instruction yourself.
Example
#include <intrin.h>
/* MMX */
__MACHINEX86X_NOWIN64(__m64 _mm_abs_pi8(__m64))
__MACHINEX86X_NOWIN64(__m64 _mm_sign_pi8(__m64, __m64))
__MACHINEX86X_NOWIN64(__m64 _mm_alignr_pi8(__m64,__m64, int))
/* XMM */
__MACHINEX86X_NOIA64(__m128i _mm_abs_epi8(__m128i))
__MACHINEX86X_NOIA64(__m128i _mm_sign_epi8(__m128i, __m128i))
__MACHINEX86X_NOIA64(__m128i _mm_alignr_epi8(__m128i,__m128i, int))
Problem
No error is reported even when the third argument of palignr is a constant
381:341
07/06/22 02:38:44
Correction to >>380
Problem
No error is reported when the third argument of palignr is not specified as a constant
382:Anonymous
07/06/22 15:15:51
>>379
Personally, I started with things like this and checked how they behaved.
(With this syntax it's confusing whether a is a pointer or an array, though.)
#include<stdio.h>
int main()
{
int i,a[]={12,23,34,45};
__asm{
movdqu xmm0,a
paddd xmm0,xmm0
movdqu a,xmm0
}
for (i=0; i<4; i++) printf("%d\n",a[i]);
return 0;
}
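The same test can be written with SSE2 intrinsics instead of MSVC inline assembly. That version also builds for x64, where MSVC has no inline asm. This is a minimal sketch, and the function name is made up.

```c
#include <emmintrin.h>

/* Intrinsics version of the inline-asm test above. */
static void double_4_ints(int a[4])
{
    __m128i v = _mm_loadu_si128((const __m128i *)a);  /* movdqu xmm0, a   */
    v = _mm_add_epi32(v, v);                          /* paddd xmm0, xmm0 */
    _mm_storeu_si128((__m128i *)a, v);                /* movdqu a, xmm0   */
}
```

Calling it on {12, 23, 34, 45} and printing the array should give 24, 46, 68, 90, the same as the asm version.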
383:・∀・）っ-
07/06/22 23:23:53
>>381
Sorry, I don't follow. If there's no error, what code does it emit?
384:341
07/06/23 00:28:50
With a constant
0f 3a 0f c1 01 palignr mm0, mm1, 1
With a non-constant
0f 3a 0f c1 ac palignr mm0, mm1, DWORD PTR _i$[ebp]
385:Anonymous
07/06/23 09:10:46
I bought "IA-32 SIMD Reference".
It's packed with diagrams.
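386:Anonymous
07/06/24 09:40:42
>>340
Rewrite it as exp(x)=2^(y1+y2) (y1 an integer, y2 the fractional part).
Computing y2 with a polynomial approximation makes it fast.
If about 10 digits of accuracy is enough, degree 7 will do.
>>465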
387:Anonymous
07/06/28 12:38:59
>exp(x)=2^(y1+y2) (y1 an integer, y2 the fractional part)
If you do that, it seems better to choose y1 as an integer and y2 in the range -0.5<=y2<0.5.
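The following is a minimal SSE2 sketch of the range reduction in 386 and 387. It writes x = y1·ln2 + t with y1 = round(x/ln2), which is equivalent to y2 = t/ln2 in [-0.5, 0.5]. It then uses exp(x) = 2^y1 · e^t. The polynomial here is plain Taylor (degree 12). That choice is an assumption for simplicity: a fitted minimax polynomial like the degree-7 one mentioned in 386 needs fewer terms. There is no overflow or underflow handling, and the input is assumed to be roughly -700 < x < 700.

```c
#include <emmintrin.h>

static __m128d exp_pd(__m128d x)
{
    const __m128d log2e = _mm_set1_pd(1.4426950408889634);
    const __m128d ln2   = _mm_set1_pd(0.6931471805599453);
    /* y1 = round(x * log2(e)); cvtpd2dq uses the MXCSR rounding mode,
       round-to-nearest by default.  Lanes 2,3 of k are zero. */
    __m128i k  = _mm_cvtpd_epi32(_mm_mul_pd(x, log2e));
    __m128d kd = _mm_cvtepi32_pd(k);
    __m128d t  = _mm_sub_pd(x, _mm_mul_pd(kd, ln2));   /* |t| <= ~0.347 */

    /* e^t = sum_{n=0}^{12} t^n / n!, in Horner form. */
    double c = 1.0 / 479001600.0;                      /* 1/12! */
    __m128d p = _mm_set1_pd(c);
    int n;
    for (n = 12; n >= 1; n--) {
        c *= n;                                         /* now 1/(n-1)! */
        p = _mm_add_pd(_mm_mul_pd(p, t), _mm_set1_pd(c));
    }

    /* 2^y1: put y1 + 1023 into the exponent field of each double. */
    k = _mm_add_epi32(k, _mm_set_epi32(0, 0, 1023, 1023));
    k = _mm_unpacklo_epi32(k, _mm_setzero_si128());    /* {k0, 0, k1, 0} */
    k = _mm_slli_epi64(k, 52);
    return _mm_mul_pd(p, _mm_castsi128_pd(k));
}
```

For full double precision over the whole range, t is usually computed with ln2 split into a high and a low part (Cody-Waite). The single-constant version above loses a few bits for large |x|.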
388:Anonymous
07/06/29 01:19:41
RWV
Right.
By the way, when y1 is positive you can do it with a bit shift,
but is there a good way for negative values?
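One common answer, sketched in scalar C: build 2^y1 directly as a double by writing the biased exponent y1 + 1023 into bits 52..62. Unlike 1 << y1, this works the same way for negative y1, provided the value stays in the normal range. The function name pow2i is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* 2^n as a double for -1022 <= n <= 1023 (normal range only). */
static double pow2i(int n)
{
    uint64_t bits = (uint64_t)(n + 1023) << 52;  /* sign 0, mantissa 0 */
    double d;
    memcpy(&d, &bits, sizeof d);                 /* reinterpret the bits */
    return d;
}
```

The SIMD equivalent is an integer add followed by a shift into the exponent field, as in the two-lane exp sketch after 387.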
389:Anonymous
07/06/29 12:06:17
IA-32 SIMD Reference Book
So a book like this is out, huh.
390:Anonymous
07/06/29 13:24:55
>>303-309
391:Anonymous
07/07/03 22:32:30
From flipping through it at a store, it seems to cover up to SSSE3,
but how do they expect to sell copies with that?
392:Anonymous
07/07/08 00:12:54
Could someone give a logical explanation of why SSE2 has no rcppd or rsqrtpd?
393:Anonymous
07/07/08 00:20:39
Probably because there's no need to bother with approximations in double.
394:Anonymous
07/07/08 01:16:16
For SSE's rcpps and rsqrtps,
the manual says
|relative error| ≤ 1.5×2^-12,
so that means they only have about 12 bits of precision, right?
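395:Anonymous
07/07/08 01:17:58
Because you can build it from rcpps. It takes twice as many instructions, though.
And the reason is probably that they weren't going to spend that many transistors on double precision.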
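396:395
07/07/08 01:27:58
Let me add to that. The point about not spending transistors stands,
but the mechanism should just be a table lookup.
So there's no way they could provide a table covering 24 or 32 bits.
People who use double precision care about accuracy,
and divpd exists, so if you really want a fast reciprocal, refine rcpps's result in software.
Providing microcode for it would be silly. That's the idea.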
397:Anonymous
07/07/08 01:42:11
>>396
You're lecturing away, but
in the end, how is that any different from 393?
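398:395
07/07/08 08:28:15
>>397
No difference. But with just 393, 392 would find it hard to accept.
If you consider that the table needs
12bit -> 4,096
24bit -> about 16.77 million
entries, it makes intuitive sense.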
399:・∀・）っ-
07/07/08 13:13:44
Well, you could probably do it by combining microcode,
but it would take a lot of clocks and still only be an 【approximation】.
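400:Anonymous
07/07/09 22:20:07
>>389
I bought this book. Since it's called a "Reference", I expected it to list every instruction, but it doesn't.
I bought it hoping for something reference-like to keep at hand... what a letdown.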
401:¢hÌAú{
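07/07/15 21:35:53
Lately the phrase "boomerang temp" has been heard in workplaces.
It refers to temps who, despite having their dispatch contract cut, come crying "I got cut at my previous client too and can't make a living"
and get re-contracted.
This morning a temp we cut half a year ago showed up for work, and everyone was shocked.
Apparently they went behind everyone's back, pleaded with someone high up, and got re-contracted, without a word to the people in the same section...
Aren't you ashamed as a human being, going that far to cling to the client that dropped you?
If you're going to talk about "building skills through temp work" or "raising your income through temp work", don't cling to one place;
move around between several companies.
If you keep doing monotonous temp-level work at one company, there's no way your skills will improve.
You make up for the income temp work can't give you by sponging off your parents,
stay on good terms only with people who have authority while trembling that you might get cut any day,
and when told your contract is ending, you beg and cry. What a miserable life.
Wouldn't you be better off dead?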
402:Anonymous
07/07/19 19:33:12
>>400
URL (download.intel.com)
This will do.
Beyond that, reading the headers for the built-in functions should get you most of the way.
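403:Anonymous
07/07/21 21:32:10
【Temp-worker negative-thinking checklist】
If three or more apply to you, your personality is warped and
you are living the life of a negative, losing temp worker.
You praise and use the tools the client's regular employees made, even when they're wrong
You praise the opinions of client employees with authority, even when they're wrong
A spec is something handed down orally by regular employees
If you couldn't implement an orally transmitted spec as intended, that's your own fault
You must always go to lunch with the client's employees
Even if a problem comes up in your own work, solving it isn't a temp's job
You're happy when the client says "please keep working here forever (for low pay)"
There's no way you could work at your own company
When the problems of temp work come up, you get emotional and argue back
You hate people who point out the problems of temp work
You want the client to pull you along, not only at work but in your private life too
You respect the client's regular employees who pull you along
Naturally you don't know your own contract rate; you must never ask about the unit price
Naturally your lifetime income is lower than the client's regular employees'
Being short is an advantage for temp work, because the client will like you more easily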
404:Anonymous
07/07/27 18:34:11
Am I right that with SSE2 you can't shift each element of a vector
by its own independent shift amount?
a[0] <<= b[0]
a[1] <<= b[1]
a[2] <<= b[2]
a[3] <<= b[3]
Something like that is what I want to do.
405:Anonymous
07/07/27 20:05:42
>>404
It'll be slower, but you could multiply instead of shifting.
406:Anonymous
07/07/27 23:47:27
Unfortunately, that ended up slower than scalar code. Hmm.
407:Anonymous
07/07/28 09:54:29
I see. It looks like an interesting problem, so I'd like to find some way to speed it up.
408:・∀・）っ-
07/07/29 02:13:32
>>404
Yeah. AltiVec can do that, you know.
409:・∀・）っ-
07/07/29 02:26:32
You lose the low bits, but how about this?
cvtpi2ps → manipulate the exponent → cvttps2pi
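Combining the ideas in 405 and 409 gives the following SSE2-only sketch of a per-element left shift for 32-bit lanes. It builds the float 2^b by writing b into the exponent field, converts that to the integer 1 << b, and then multiplies. SSE2 has no 32-bit low multiply (pmulld is SSE4.1), so the multiply uses pmuludq on the even and odd lanes. The sketch assumes every shift count is in 0..31, and the function name is hypothetical. For b = 31, cvttps2dq overflows to 0x80000000, which happens to be exactly 1 << 31.

```c
#include <emmintrin.h>

/* a[i] << b[i] for four 32-bit lanes, 0 <= b[i] <= 31. */
static __m128i sllv_epi32_sse2(__m128i a, __m128i b)
{
    /* float 2^b: exponent field = b + 127 */
    __m128i expo = _mm_add_epi32(_mm_slli_epi32(b, 23), _mm_set1_epi32(0x3F800000));
    __m128i pow2 = _mm_cvttps_epi32(_mm_castsi128_ps(expo));   /* 1 << b */
    /* low 32 bits of a * pow2: pmuludq on lanes 0,2 and on lanes 1,3 */
    __m128i even = _mm_mul_epu32(a, pow2);
    __m128i odd  = _mm_mul_epu32(_mm_srli_epi64(a, 32), _mm_srli_epi64(pow2, 32));
    even = _mm_shuffle_epi32(even, _MM_SHUFFLE(0, 0, 2, 0));    /* e0, e2 -> lanes 0,1 */
    odd  = _mm_shuffle_epi32(odd,  _MM_SHUFFLE(0, 0, 2, 0));    /* o1, o3 -> lanes 0,1 */
    return _mm_unpacklo_epi32(even, odd);                       /* e0, o1, e2, o3 */
}
```

This is the "multiply instead of shift" approach from 405. Whether it beats four scalar shifts depends on the CPU, as 406 found.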
410:341
07/08/25 01:58:03
VC++ 2008's intrinsics seem to support
Intel SSSE3/SSE4.1/SSE4.2 and AMD ABM/SSE4a.
411:・∀・）っ-
07/08/25 03:04:10
In Intel terms, that'd be smmintrin.h.
What, all of them?
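412:Anonymous
07/08/26 11:25:57
Excuse me.
The MMX shift instructions only take an immediate or an MM register as the source operand, right?
I have a feeling that writing a general-purpose register (eax etc.) there didn't work properly, but
with both VC6 and VC8 the following code goes through without an error.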
psllq mm0 , eax
Am I misunderstanding something?
413: 0uxK91AxII
07/08/26 13:46:03
>>412
Try disassembling it.
414:Anonymous
07/08/26 14:50:52
>>413
Where I wrote eax, it had become mm0.
An inline assembler bug? It's been that way since VC6,
so maybe there's some reason it's still left alone in VC8.
415:・∀・）っ-
07/08/26 17:02:22
mm0 and eax are both 0 in the 3-bit register encoding.
Whether that field means a general-purpose, FP/MM or XMM register is decided by the opcode.
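416:Anonymous
07/08/26 17:31:27
>>415
Even if that's why it disassembles as mm0, don't you think it's strange
that the assembler accepts eax without an error in a place where,
by the assembler's own grammar, eax shouldn't be written?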
417:・∀・）っ-
07/08/26 17:32:25
What happens with movd eax, mm0 and
movd mm0, eax, then? Those should be distinguished by the opcode.
418:Anonymous
07/08/26 17:44:31
Those instructions behave just as they look, so there's nothing in particular to say.
419:Anonymous
07/08/30 17:09:15
>>404
With SSE5, which AMD has announced, independent per-element shift amounts become possible.
SIMD rotate instructions have been added too.
URL (developer.amd.com)
420:Anonymous
07/09/02 04:12:12
Just make 32-bit FP and integer AltiVec-compatible already.
For a Mac user like me who reluctantly moved from AltiVec to SSE, SSE is far too awkward to use.
421:Anonymous
07/09/02 04:17:13
Still, I thought it was kind of nice that load/store alignment is easier to deal with.
422:EÍEjÁ-R:c-
07/09/04 07:15:10
SSE4.1 makes moving data between XMM and general-purpose registers easy, so make do without SIMD shifts and rotates.
423:Anonymous
07/09/05 23:49:14
>>422
If it were already out I could make do, but it isn't out yet.
424:EÍEjÁ-R:c-
07/09/06 00:47:16
Say that and, well, not even Barcelona is out yet, let alone Bulldozer.
425:Anonymous
07/10/19 17:12:17
What kind of usage are MMX's pmull and pmulh meant for?
They look kind of awkward to use...
426:Anonymous
07/10/19 19:17:59
>>425
pmull : when 16 bits are enough for the product
pmulh : when you want a fixed-point multiply with 16 fractional bits
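Here is a small SSE2 illustration of the pmulh use case. It scales eight unsigned 16-bit values by a Q0.16 fraction f/65536, which gives out = (v · f) >> 16. It deliberately uses the unsigned variant _mm_mulhi_epu16 (pmulhuw) rather than the signed pmulhw the thread mentions, so that f can cover the whole range 0 to 0.99998. The helper name is hypothetical.

```c
#include <emmintrin.h>
#include <stdint.h>

/* out[i] = (v[i] * f) >> 16 for eight unsigned 16-bit lanes,
   i.e. v scaled by the fraction f / 65536. */
static __m128i scale_q16_epu16(__m128i v, uint16_t f)
{
    return _mm_mulhi_epu16(v, _mm_set1_epi16((short)f));
}
```

For example, f = 0x8000 halves every lane and f = 0xC000 multiplies by 0.75. The matching low-half instruction, _mm_mullo_epi16 (pmullw), is the one to use when the product itself fits in 16 bits.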
427:ヽ・´∀｀・,,)っ━━━━━━
07/10/20 01:26:21
Shall I plug xbyak?
428:Anonymous
07/10/24 22:36:46
IA-32 SIMD Reference Book, Volume 2
URL (www.cutt.co.jp)
Covers up to SSSE3
No SSE4
429:Anonymous
07/10/24 22:52:02
That's what I've been saying.
430:Anonymous
07/10/25 01:49:44
So that's what it covers. Seriously, no thanks.
431:ヽ・´∀｀・,,)っ━━━━━━
07/10/25 01:55:23
The SDK for 45nm Next Generation Intel Core 2 Processor Family and Intel SSE4
comes with the full set of SSE4.1/4.2 manuals and an emulation DLL, so that's plenty.
432:Anonymous
07/10/27 05:20:50
Please show me C code that uses SSE for cases where a single variable is shared, such as computing a sum.
433:ヽ・´∀｀・,,)っ━━━━━━
07/10/28 02:23:19
A sum is easy.
#include <pmmintrin.h>  // SSE3, for _mm_hadd_ps
__declspec(align(16)) float a[100];
float sum = 0.0f;
// before SIMD
for (int i = 0; i < 100; i++) {
    sum += a[i];
}
// after SIMD
__m128 sumx = _mm_setzero_ps();
for (int i = 0; i < 100; i++) {
    sumx = _mm_add_ps(sumx, *(__m128*)&a[i*4]);
}
sumx = _mm_hadd_ps(sumx, sumx);
sumx = _mm_hadd_ps(sumx, sumx);
_mm_store_ss(&sum, sumx);
434:ヽ・´∀｀・,,)っ━━━━━━
07/10/28 02:24:14
Correction
// after SIMD
__m128 sumx = _mm_setzero_ps();
for (int i = 0; i < 100; i += 4) {
    sumx = _mm_add_ps(sumx, *(__m128*)&a[i]);  // a is 16-byte aligned
}
sumx = _mm_hadd_ps(sumx, sumx);  // (s0+s1, s2+s3, s0+s1, s2+s3)
sumx = _mm_hadd_ps(sumx, sumx);  // total in every lane
_mm_store_ss(&sum, sumx);
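If SSE3 (haddps) can't be assumed, the same reduction can be sketched with SSE only. This version uses unaligned loads and a scalar tail for lengths that aren't a multiple of 4. The function name is hypothetical. Because the additions are reordered, the result can differ in the last bits from a sequential loop.

```c
#include <emmintrin.h>
#include <stddef.h>

/* Sum of n floats: four partial sums in one register, a shuffle-based
   horizontal add (no haddps), then a scalar tail. */
static float sum_ps(const float *a, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    float s;
    for (; i + 4 <= n; i += 4)
        acc = _mm_add_ps(acc, _mm_loadu_ps(a + i));
    acc = _mm_add_ps(acc, _mm_movehl_ps(acc, acc));        /* lane0: s0+s2, lane1: s1+s3 */
    acc = _mm_add_ss(acc, _mm_shuffle_ps(acc, acc, _MM_SHUFFLE(1, 1, 1, 1)));
    s = _mm_cvtss_f32(acc);
    for (; i < n; i++)                                     /* leftover elements */
        s += a[i];
    return s;
}
```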
435:Anonymous
07/10/31 21:29:38
Isn't there an intrinsic that corresponds to movq?
436:Anonymous
07/10/31 21:57:18
_mm_loadl_epi64
437:Anonymous
07/10/31 22:03:36
°(EÖEÉ)ÉGb
438:Anonymous
07/11/01 00:19:03
__m128i _mm_loadl_epi64(__m128i*)
__m128i _mm_move_epi64(__m128i)
void _mm_storel_epi64(__m128i*, __m128i)
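Here is a short sketch of those movq-style intrinsics in use. _mm_loadl_epi64 reads 8 bytes into the low half and zeroes the high half, and _mm_storel_epi64 writes only the low 8 bytes. The byte-increment operation and the function name are just an illustration.

```c
#include <emmintrin.h>
#include <stdint.h>

/* Copy eight bytes from src to dst, adding 1 to each (paddb). */
static void add1_8bytes(const uint8_t *src, uint8_t *dst)
{
    __m128i v = _mm_loadl_epi64((const __m128i *)src);  /* movq xmm, m64 */
    v = _mm_add_epi8(v, _mm_set1_epi8(1));
    _mm_storel_epi64((__m128i *)dst, v);                /* movq m64, xmm */
}
```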
439:Anonymous
07/11/01 02:28:07
Looking at the SSE4 benchmarks makes me a little worried.
440:ヽ・´∀｀・,,)っ━━━━━━
07/11/01 23:00:54
>>435
There isn't one for MMX.
__m64 mm0 = a[i] already expands into a load. Basically, leave it to the compiler.
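441:Anonymous
07/11/14 19:03:24
This is about x86.
I'm writing a program in which a function that manipulates a single 64-bit integer global variable
gets called very frequently.
Looking at the assembly, it seems to operate on it 32 bits at a time.
Would it get faster if I used MMX to operate on all 64 bits at once?
If it would, I'm thinking of studying MMX.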
442:Anonymous
07/11/14 19:57:23
It won't.
443:Anonymous
07/11/14 19:59:23
If it's only bit operations like and/or, it's not completely out of the question.
444:Anonymous
07/11/14 22:10:28
>>441
Why not just do it with double?
445:ヽ・´∀｀・,,)っ━━━━━━
07/11/14 23:38:44
Inlining the function call would probably do more good.
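446:Anonymous
07/12/22 23:14:35
To check whether two 4-byte values match in an arbitrary number of bits counting from the top,
could SSE be put to good use? Checking one bit at a time
seems way too dumb, so:
uint32_t chk_bit(uint32_t master, uint32_t src, uint32_t bit){
    /* what goes in here? */
}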
447:ヽ・´∀｀・,,)っ━━━━━━
07/12/22 23:53:57
[inside]
return ((master ^ src) >> (32 - bit)) != 0;
448:Anonymous
07/12/24 00:16:33
When using an xmm register as a 128-bit int,
is there a fast way to check whether a particular bit is set?
SSE4.1 introduces the ptest instruction, but I can't wait until then.
449:ヽ・´∀｀・,,)っ━━━━━━
07/12/24 00:30:38
Assuming xmm0 holds the value to test:
movdqa xmm1, [pattern]
movdqa xmm2, xmm0
pand xmm2, xmm1
pcmpeqb xmm2, xmm1
pmovskb eax, xmm0
test eax, eax
If eax is nonzero after this, the bit is set.
450:ヽ´∀｀,,)っ━━━━━━
07/12/24 00:31:51
Please correct that to pmovmskb.
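A related SSE2 intrinsics sketch compares the masked value with zero rather than with the pattern. pcmpeqb against zero marks every byte of x & mask that is entirely zero, and pmovmskb collects those marks into 16 bits. A result of 0xFFFF therefore means no masked bit is set. The function name is hypothetical.

```c
#include <emmintrin.h>

/* Nonzero if any bit of (x & mask) is set.  SSE2 only; ptest needs SSE4.1. */
static int any_bit_set(__m128i x, __m128i mask)
{
    __m128i masked  = _mm_and_si128(x, mask);
    __m128i is_zero = _mm_cmpeq_epi8(masked, _mm_setzero_si128());
    return _mm_movemask_epi8(is_zero) != 0xFFFF;
}
```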
451:Anonymous
07/12/24 00:46:27
THX
So I really have no choice but to move it out with pmovmskb...
Until Nehalem, is it best to stick with two 64-bit ints?
452:Anonymous
07/12/24 11:21:12
Isn't ptest in Penryn?
453:Anonymous
07/12/24 13:14:13
LARGE_INTEGER otime, ntime;
(snip)
double dt = (double)((ntime.QuadPart - otime.QuadPart) * (1000.0f / f.QuadPart));
How would this look written with SSE in VC8's inline assembler?
454:Anonymous
07/12/24 19:38:49
However you look at it, that's scalar arithmetic.
455:Anonymous
07/12/24 20:04:35
The SSE instruction that converts a 64-bit integer to a 64-bit float
only exists in 64-bit mode,
and VC doesn't support inline assembly in programs for 64-bit mode.
456:ヽ´∀｀,,)っ━━━━━━
07/12/24 20:30:13
I'd say just read the documentation for *intrin.h.
457:Anonymous
07/12/24 20:32:45
So SSE is no good here, right?
458:ヽ´∀｀,,)っ━━━━━━
07/12/24 20:37:16
Nope.
459:Anonymous
07/12/24 20:58:49
Using it would just be extra hassle.
460:Anonymous
07/12/24 21:00:24
When building a firewall,
would using SSE to analyze packets
also just be extra hassle?
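461:ヽ´∀｀,,)っ━━━━━━
07/12/24 21:02:06
Isn't it about time we made an FAQ template?
SSE really shines when you apply the same operation to large amounts of data;
for fine-grained work, ordinary x86 instructions are actually faster.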
462:ヽ´∀｀,,)っ━━━━━━
07/12/24 21:03:24
>>460
I've never heard of it being suited to that kind of thing.
463:Anonymous
07/12/24 21:55:34
What about computing a single 64-bit or 128-bit value many times?
464:Anonymous
07/12/24 21:57:38
If you're dealing with sizes that don't fit in a general-purpose register, wouldn't it help?
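465:ヽ・´∀｀・,,)っ━━━━━━
07/12/24 22:04:42
It's fundamentally meant for handling four 32-bit values at once; for multi-precision arithmetic you're better off working in 64-bit mode with general-purpose registers.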
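As an illustration of the general-purpose-register approach in 465, here is a minimal 128-bit unsigned add in portable C on two 64-bit halves. The carry out of the low half is detected through unsigned wraparound. The struct and function names are hypothetical. A compiler targeting x86-64 can turn this into an add/adc pair.

```c
#include <stdint.h>

typedef struct { uint64_t lo, hi; } u128;

/* 128-bit unsigned add; the carry is 1 when the low half wrapped. */
static u128 add_u128(u128 a, u128 b)
{
    u128 r;
    r.lo = a.lo + b.lo;
    r.hi = a.hi + b.hi + (r.lo < a.lo);
    return r;
}
```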
466:Anonymous
07/12/24 22:15:44
( ¥Í¥)ÂV¿ ͪ°Íª°Íª°Íª°Íª°
467:Anonymous
07/12/25 17:49:27
You probably mean things like PADDQ, but even on C2D the latency is 2.
I wonder if it's implemented with something like carry select.
468:451
07/12/26 11:18:07
>>452
Sorry for the late reply.
Isn't ptest SSE4.1, and wasn't Penryn SSE4?
469:Anonymous
07/12/26 11:52:30
SSE4 is the name covering both SSE4.1 and SSE4.2 (strictly speaking, that's how it ended up)
Penryn has only 4.1
Nehalem has both 4.1 and 4.2
So isn't >>452 right?
470:451
07/12/26 13:20:32
>>469
That does seem to be the case. THX.
URL (pc.watch.impress.co.jp)
With Penryn I won't have to wait that long. What should I do...
471:451
07/12/26 13:22:02
>I won't have to wait that long
I mean waiting for the price to come down to a reasonable level.
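472:Anonymous
07/12/26 13:25:45
Given an array of 64-bit floats, I want to find the element with the largest absolute value.
Can this be done quickly with SSE2?
I normally use VC++, but if it can be made faster I'm thinking of trying assembly too.
The OS is Vista (64-bit).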
473:Anonymous
07/12/26 13:54:20
>>472
VC++ for x64 uses SSE2 by default.
Compile the code below with cl /c /O2 /FAsc test.c and look at test.cod: it generates reasonably decent code.
double maxabs(double array[], int size) {
    double ans = 0.0;
    for (int i=0; i<size; i++) ans = __max(ans, abs(array[i]));
    return ans;
}
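474:Anonymous
07/12/26 18:30:53
>>473
In C it fails at "for (int i=0;", so I added
#include <cmath>
#include <cstdlib>
and compiled it as a .cpp file.
It uses movsdx, andpd, comisd
and so on. I guess that's SSE2?
But I can't tell whether this is really fast code.
Intel sells a library called MKL, and it's impressive.
I only tried part of the matrix routines (LU decomposition), but it's about five times faster than
a program from a numerical-methods book compiled with VC++.
It's unlikely they developed a new algorithm academically, so I think
their implementation-level skill is what's high.
A factor of five is a big gap, so I'd like to study SSE and the like to narrow it somehow.
Would that be a lot of work?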
475:Anonymous
07/12/26 18:38:45
Do you make correlated random numbers?
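476:451
07/12/26 20:38:30
>>472
It's aimed at the P4 and a bit old, but:
"Detecting the maximum/minimum element of a double-precision floating-point vector and its index using Streaming SIMD Extensions 2 (SSE2)" URL (download.intel.com)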
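477:472
07/12/26 23:15:38
>>476
What I want is the maximum absolute value, so it's a little different, but it's quite helpful.
There's the approach of finding both the maximum and the minimum and taking whichever has the larger absolute value,
and the approach of computing absolute values while finding the maximum.
Well, I'll have to experiment with various things before I know which is better.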
478:ヽ・´∀｀・,,)っ━━━━━━
07/12/27 02:18:53
>>473
I suspect VC++ only uses maxsd.
Based on >>473, assuming the array has an even number of elements and is aligned on a 128-bit boundary, something like this.
SSE3 is probably available these days, right? Not that I use it for much.
#include <pmmintrin.h>
double maxabs(double array[], int size) {
    static const union {
        __int64 a[2];   // integer member first, so the initializer sets the bit pattern
        __m128d pd;
    } mask = { { 0x7FFFFFFFFFFFFFFFi64, 0x7FFFFFFFFFFFFFFFi64 } };
    double ans;
    __m128d ans_pd = _mm_setzero_pd();
    for (int i = 0; i < size; i += 2)
        ans_pd = _mm_max_pd(ans_pd, _mm_and_pd(mask.pd, *((__m128d*)&array[i])));
    // max of the two lanes (a horizontal add would sum them instead)
    ans_pd = _mm_max_sd(ans_pd, _mm_unpackhi_pd(ans_pd, ans_pd));
    _mm_store_sd(&ans, ans_pd);
    return ans;
}
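479:Anonymous
07/12/27 23:05:22
>>474
SIMD is simple and looks fun, so it's what people notice first, but it's really the last resort.
For element-wise arithmetic between arrays, do all the operations together in one pass so that each result is stored only once.
Loops over two-dimensional data such as matrices are processed in rectangular blocks that fit in the cache.
Get those right and, even without SIMD, the former gives up to about 2x and the latter an immeasurable speedup.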
480:474
07/12/28 23:06:10
>>479
I understand the idea, but how do you actually do it?
Here's a short program I'd like to speed up.
#include <cmath>
double LU(int n, double** a, int* ip)
{
int i,j,k,ii,ik;
double t,u,det,*w;
w=new double[n];
for(k=0;k<n;k++){
ip[k]=k;
for(j=0,u=0;j<n;j++){ t=fabs(a[k][j]); if(t>u) u=t; }
if(u==0){ delete[]w; return 0; }
w[k]=1/u;
}
det=1;
for(k=0;k<n;k++){
u=-1;
481:474
07/12/28 23:08:38
(continued)
for(i=k;i<n;i++){
ii=ip[i];
t=fabs(a[ii][k])*w[ii];
if(t>u){ u=t; j=i; }
}
ik=ip[j];
if(j!=k){
ip[j]=ip[k]; ip[k]=ik;
det=-det;
}
u=a[ik][k]; det*=u;
if(u==0){ delete[]w; return 0; }
for(i=k+1;i<n;i++){
ii=ip[i];
t=(a[ii][k]/=u);
for(j=k+1;j<n;j++) a[ii][j]-=t*a[ik][j];
}
}
delete[]w;
return det;
}
The last for loop is a triple loop, so this area seems to matter most for speed.
482:Anonymous
07/12/28 23:33:40
Raising cache efficiency looks like it would speed it up a lot.
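483:479
07/12/29 23:23:23
I'm embarrassed to say I've never implemented LU decomposition, but
in the triple loop the row a[ii] switches depending on i,
and the row a[ik] switches depending on k, right?
The a[ik] row switches only once per k, so it stays in the cache the whole time,
but the a[ii] row switches whenever i changes, even for the same k.
The a[ii] rows are accessed up to k times, so you'd rather not switch them if you can help it.
So you strike a compromise: switch a[ik] a bit more often in exchange for switching a[ii] a bit more slowly.
What striking that balance means is this:
for a two-dimensional array of 1000x1000 elements,
instead of processing it one row at a time, you treat it as, say, 10x10 tiles of 100x100 each.
Rather than processing (0, 0)-(999, 0) and then (0, 1)-(999, 1),
you process (0, 0)-(99, 0), then (0, 1)-(99, 1) ... (0, 99)-(99, 99), then (100, 0)-(199, 0), and so on.
Here the loop is triple, so rewriting it gives a six-deep loop, and your head gets tangled easily.
On top of that, the rows seem to be swapped around by ip, so I'm not sure this technique can be used. Sorry.
If you could work out in advance which rows are needed from how ip changes, it would probably work.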
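The tiling idea in 483 is easier to see on plain matrix multiplication than on pivoted LU. Here is a C sketch: the three loops become six, and each BS x BS tile of A, B and C is reused while it is still in cache. BS = 64 is an arbitrary example value, and the function name is hypothetical. Applying the same idea to the LU code in 480 and 481 would also have to deal with the ip row permutation, which this sketch does not.

```c
#include <stddef.h>

#define BS 64  /* tile size: an assumption, tune for the actual cache */

/* C += A * B for n x n row-major matrices, processed tile by tile. */
static void matmul_blocked(size_t n, const double *A, const double *B, double *C)
{
    size_t ii, jj, kk, i, j, k;
    for (ii = 0; ii < n; ii += BS)
        for (kk = 0; kk < n; kk += BS)
            for (jj = 0; jj < n; jj += BS)
                /* one tile: rows ii.., inner index kk.., columns jj.. */
                for (i = ii; i < ii + BS && i < n; i++)
                    for (k = kk; k < kk + BS && k < n; k++) {
                        double aik = A[i * n + k];
                        for (j = jj; j < jj + BS && j < n; j++)
                            C[i * n + j] += aik * B[k * n + j];
                    }
}
```

The innermost loop runs over contiguous memory in both B and C, which also makes it a good candidate for the compiler's SSE2 auto-vectorization.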
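484:474
07/12/30 21:31:17
>>483
Thanks for the comments.
I tested the program from 480 (LU1) in a bit more detail
against a version using Intel's MKL (LU2).
Size(n)  LU1(VC++)   LU2(MKL)    Ratio(LU1/LU2)
4        0.218 μs    1.140 μs    0.191
8        0.796 μs    2.680 μs    0.297
16       4.087 μs    7.460 μs    0.548
32       0.0246 ms   0.0204 ms   1.21
64       0.174 ms    0.0656 ms   2.65
128      1.31 ms     0.271 ms    4.83
256      10.2 ms     1.435 ms    7.11
512      82.4 ms     9.13 ms     9.03
1024     780 ms      66.1 ms     11.8
2048     7.58 s      0.501 s     15.1
4096     60.9 s      3.79 s      16.1
8192     486.5 s     29.9 s      16.3
CPU: Intel Q6700 (3.4GHz)
For small sizes, LU1 should also use the cache efficiently,
and as expected the gap to LU2 shrinks; for very small n, LU1 is actually faster.
LU2 probably does more complex processing, which makes it slower there.
When n gets very large, LU1's poor cache efficiency lets LU2 pull far ahead.
If you're aiming for speed, you should work on cache efficiency before thinking about SSE optimization.
Which would make blocking the nested loops, as in 483, unavoidable?
Well, it looks hard, so I'll think about a concrete approach for a while.
If you have a good idea, please comment again. (Though this is drifting a bit off-topic for the thread.)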
485:Anonymous
07/12/30 22:03:51
This is, after all, a thread about extension instructions.
If we move, maybe here?
œÚØÝž(techÂ)l50
486:Anonymous
07/12/30 22:07:51
If we tie it in with software prefetching,
maybe it won't count as off-topic.
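Here is a small sketch of software prefetching with the SSE intrinsic _mm_prefetch (prefetcht0 with _MM_HINT_T0). It issues one prefetch per 8 doubles, which is about one per 64-byte cache line. The prefetch distance of 64 elements is only a starting guess and has to be measured. For a plain sequential scan like this, the hardware prefetcher often makes it unnecessary; it matters more for strided or blocked access. The name sum_prefetch is hypothetical.

```c
#include <xmmintrin.h>
#include <stddef.h>

#define DIST 64  /* prefetch distance in elements: an assumption to tune */

static double sum_prefetch(const double *a, size_t n)
{
    double s = 0.0;
    size_t i;
    for (i = 0; i < n; i++) {
        if ((i & 7) == 0 && i + DIST < n)   /* roughly once per 64-byte line */
            _mm_prefetch((const char *)(a + i + DIST), _MM_HINT_T0);
        s += a[i];
    }
    return s;
}
```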
487:Anonymous
08/01/16 18:30:07
Is there a way to compute the NOT of an xmm register in a single instruction? XOR with ~0 doesn't count.
488:487
08/01/16 20:45:05
Bump.
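489:Anonymous
08/01/16 21:28:47
pandn won't do it either, will it.
In that case, about the best you can do is merge it with the surrounding instructions so that it effectively costs a single instruction.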
490:Anonymous
08/01/17 00:08:24
Looks like there isn't one. Considering that you'd have to read an all-1s constant,
is it faster to make the all-1s with cmpeq xmm0,xmm0?
491:ヽ・´∀｀・,,)っ━━━━━━
08/01/17 00:58:10
Loading an all-1s constant gets register-renamed, so it may be able to execute ahead of time.
It's case by case.
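Here is the two-instruction NOT from 490 written as SSE2 intrinsics. pcmpeqd x,x produces all ones without touching memory, and pxor then inverts. There is no single-instruction SSE2 NOT. As 491 notes, loading an all-ones constant instead can be better in some surrounding code. The function name is hypothetical.

```c
#include <emmintrin.h>

/* Bitwise NOT of a 128-bit value. */
static __m128i not_si128(__m128i x)
{
    __m128i ones = _mm_cmpeq_epi32(x, x);  /* pcmpeqd: all bits set */
    return _mm_xor_si128(x, ones);         /* pxor */
}
```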
492:Anonymous
08/02/19 10:43:09
I want to write a median filter.
How should I write the code so that the compiler can easily use SSE?
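493:Anonymous
08/02/20 07:38:55
Auto-vectorizing a median filter is hard.
If you turn it into something parallel, like the sorting algorithms people run on GPUs, the compiler might manage it.
In any case, aim for the innermost loop to consist of operations between vectors.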
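Here is a minimal example of the branch-free style 493 recommends. It computes the median of three for 16 unsigned bytes at once using only min and max: med(a, b, c) = max(min(a, b), min(max(a, b), c)). Written like this, with no if, it maps directly to pminub/pmaxub. A 3x3 median filter can be built from such min/max networks over neighbouring pixels. The function name is hypothetical.

```c
#include <emmintrin.h>

/* Per-byte median of three unsigned 8-bit vectors, no branches. */
static __m128i median3_epu8(__m128i a, __m128i b, __m128i c)
{
    __m128i lo = _mm_min_epu8(a, b);
    __m128i hi = _mm_max_epu8(a, b);
    return _mm_max_epu8(lo, _mm_min_epu8(hi, c));
}
```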
494:Anonymous
08/06/19 00:27:56
Does anyone know a site that shows examples
of doing matrix computations with SSE?
495:Anonymous
08/06/19 03:21:26
URL (www.google.co.jp)
496:Anonymous
08/06/19 22:39:47
a1=3.1 a2=4.0 a3=5.5 a4=6.1
r1 = a1^2 + a2^2 + a3^2 - a4^2
r2 = a1+a2 - a3*a4 +a1+a3 + a2*a4
r = r1+r2
I want to solve something like this with SSE. How should I go about it?
497:Anonymous
08/06/21 16:25:20
With dvec.h,
how do you check whether the result of cmpeq() is all zeros?
498:Anonymous
08/06/21 20:26:50
if ( _mm_movemask_pd(cmpeq(a, b)) == 0 )
499:Anonymous
08/06/21 23:31:44
Hmmm...
How do you get the absolute value of a double?
500:Anonymous
08/06/22 00:08:16
Is this homework? Use your head a bit.
Method 1 (no load):
// a = a < 0 ? -a : a;
F64vec2 z = _mm_setzero_pd();
a = select_lt(a, z, z-a, a);
Method 2 (with a load):
static const __int64 _0x7FFFFFFFFFFFFFFFLL = 0x7FFFFFFFFFFFFFFFLL;
F64vec2 m(*(double *)&_0x7FFFFFFFFFFFFFFFLL);
a &= m;
501:Anonymous
08/06/22 00:11:01
My mistake. m should be
static const F64vec2 m = (rest omitted)
502:Anonymous
08/06/22 00:17:45
>>500
Oh, so there are ways like that.
I'm so dim you may be fed up with me, but thanks, and sorry for the trouble.
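503:Anonymous
08/06/22 01:06:49
No problem.
By the way, if the absolute value is inside a loop and computed many times, use the load version;
if you only take the absolute value a few times, the no-load version is more efficient.
Also, lately I've been lamenting the low precision of rcp_nr() and rsqrt_nr(),
and was wondering whether to add another refinement step or just use divps.
But googling, it seems divps and sqrtps take only 6 cycles these days. Isn't that a typo? That's too fast.
I'm starting to feel that rcp_nr(), if not rsqrt_nr(), is an unwanted child.
In the first place, I'd like to demand to know why rcpps and rsqrtps only got 12 bits, like VMX.
Processors get faster every year, so they should have provided more than 12 bits even if the table seemed too large.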
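For reference, here is what a refinement step like rcp_nr() typically looks like, written as a sketch with SSE intrinsics. It takes rcpps (about 12 bits, |relative error| ≤ 1.5×2^-12) and applies one Newton-Raphson step x1 = x0·(2 − a·x0), which roughly squares the error and gives about 22 to 23 bits. The name rcp_nr_ps is hypothetical, and this is not necessarily how Intel's rcp_nr() is implemented. The result is still not IEEE-exact like divps, and inputs of 0 or infinity are not handled.

```c
#include <xmmintrin.h>

/* 1/a for four floats: rcpps plus one Newton-Raphson step. */
static __m128 rcp_nr_ps(__m128 a)
{
    __m128 x0  = _mm_rcp_ps(a);                 /* ~12-bit estimate */
    __m128 two = _mm_set1_ps(2.0f);
    return _mm_mul_ps(x0, _mm_sub_ps(two, _mm_mul_ps(a, x0)));
}
```

Whether this beats divps depends on the CPU. As the posts note, divps latency has dropped a lot on newer processors.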
504:Anonymous
08/06/22 10:06:41
Isn't the 6 cycles only for simple cases like dividing by 2.0?
505:Anonymous
08/06/22 10:39:18
What should you do about division by zero?
Check for it every time?
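506:Anonymous
08/06/22 11:39:56
>>504
The information is old, but 248966.pdf also says 13 cycles. Even that is more than fast enough.
Not long ago it was 40 cycles.
This must be using the 12-bit rcpps internally. If it takes the reciprocal and needs only one refinement step, 13 cycles makes sense.
>>505
Division by zero is mathematically undefined.
Regardless of programming, you should catch it the moment you write the formula on paper.
Since it's undefined, you have to supply the answer yourself.
1. Recognize that dividing by zero means the formula was wrong, and change the method.
2. Use logic in which the denominator can never become 0.
3. Provide a fallback for when the denominator becomes 0.
3-1. Add a small offset, as in c = a / (b + 0.001).
3-2. Impose a lower bound, as in c = a / max(b, 1).
3-3. Handle it as a special case, as in c = b == 0 ? 0 : a/b.
4. Check whether the result is NaN, and recognize afterwards that the calculation failed.
Option 3 has to be built into the program. For 3-2 and 3-3, if you know in advance, you can deal with it there; if you don't know until just before the division, check every time.
In some cases, 4 matters more than forcing a workaround.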
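Option 3-3 can be done for four floats without a branch, sketched here with SSE intrinsics. The code divides everything and then clears the lanes where b == 0. Those lanes still produce inf or NaN inside the division, and they set the divide-by-zero flag in MXCSR, which is masked by default. The andnot simply discards them. The function name is hypothetical.

```c
#include <xmmintrin.h>

/* c = (b == 0) ? 0 : a / b, for four floats, branch-free. */
static __m128 safe_div_ps(__m128 a, __m128 b)
{
    __m128 b_is_zero = _mm_cmpeq_ps(b, _mm_setzero_ps());  /* all-ones where b == 0 */
    return _mm_andnot_ps(b_is_zero, _mm_div_ps(a, b));     /* (~mask) & (a / b) */
}
```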
507:Anonymous
08/06/22 12:42:58
So gcc doesn't have #include <dvec.h>.
Guess I'll have to use icc.
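508:Anonymous
08/06/22 13:32:06
>>505
Instead of checking every time, you can wrap the code in __try { }
and catch the divide-by-zero exception with __except().
You do have to set things up with fldcw or ldmxcsr so that the exception is actually raised, though.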
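509:Anonymous
08/06/25 14:48:40
When several threads use MMX instructions at the same time,
do I need to call emms in each thread?
Or, as below, is it OK to call emms just once
after leaving the threads?
#pragma omp parallel for
for (int i = 0; i < hoge; i++) {
    // MMX used here (no x87 instructions)
}
_mm_empty();  // is calling emms just this once OK?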
510:509
08/06/25 16:10:01
I'd be scared if something went wrong,
so I decided to call emms in each thread.
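Here is a sketch of the per-thread arrangement that 510 settled on. Each thread calls _mm_empty() at the end of its own part of the parallel region, because every thread has its own x87/MMX register state. The saturating int16 add and the function name are illustrative assumptions. The code needs a compiler that provides MMX intrinsics: 32-bit MSVC, or GCC/Clang on x86 or x86-64; 64-bit MSVC does not support __m64. Without OpenMP enabled, the pragmas are ignored and the loop simply runs on one thread.

```c
#include <mmintrin.h>
#include <stdint.h>
#include <string.h>

/* out[i] = saturate(a[i] + b[i]) for int16 arrays; n must be a multiple of 4. */
static void add_sat_i16(const int16_t *a, const int16_t *b, int16_t *out, int n)
{
    #pragma omp parallel
    {
        int i;
        #pragma omp for
        for (i = 0; i < n; i += 4) {
            __m64 va, vb, vr;
            memcpy(&va, a + i, sizeof va);
            memcpy(&vb, b + i, sizeof vb);
            vr = _mm_adds_pi16(va, vb);        /* paddsw */
            memcpy(out + i, &vr, sizeof vr);
        }
        _mm_empty();  /* emms in every thread, before any x87 code runs there */
    }
}
```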
511:Anonymous
08/06/26 02:27:18
MMX doesn't support multithreading, though.
512:Anonymous
08/06/26 22:59:19
Isn't that restriction OS-dependent?
513:Anonymous
08/06/26 23:10:16
No, neither SSE nor MMX supports multi-core or multithreading.
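514:Anonymous
08/06/27 00:40:22
By "doesn't support multithreading", do you mean handling of context switches?
MMX/3DNow share registers with the x87 FPU, so there's no problem there,
and SSE has dedicated instructions, FXSAVE/FXRSTOR, so that seems fine too.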
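515:Anonymous
08/06/27 00:50:59
That's not it.
With multiple processes, the SSE/MMX registers probably get saved on a context switch,
but multithreading runs several threads inside one process,
so only the general-purpose registers get saved.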
516:Anonymous
08/06/27 01:03:32
>>515
Where did you get that idea?
Show a source.
517:Anonymous
08/06/27 01:10:51
It depends on the threading implementation,
but at least Linux should be fine.
I'd really like to know an example where it fails.
518:Anonymous
08/06/27 02:14:56
Windows 95 or something?
It doesn't even protect processes (snip)
No need for that kind of retort.
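519:Anonymous
08/06/27 02:31:10
>>515
1. As >>514 wrote, MMX registers = x87 registers,
   so if the MMX registers aren't saved, wouldn't the same apply to floating-point arithmetic?
2. Unless you're a privileged user, you can't disable context switching.
   Otherwise anyone could freeze the system (e.g. Windows 3.1).
   If it can't be controlled, then at what point are you supposed to save the registers?
3. >With multiple processes ... the ... registers get saved
   So then, what makes it not support multi-core?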
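520:Anonymous
08/06/27 03:03:41
Some people can't tell signals (interrupts) from threads, huh.
Sharing the MMX registers with the FPU was a trick
to avoid needing any OS support, but these days it's regarded as a mechanism that kind of backfired.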
521:519
08/06/27 08:30:11
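Oops, >>515 says it isn't supported (can't be),
yet in point 2 I ended up asking how to do it.
I'll withdraw that question and add this to point 3 instead:
Does "not supporting multi-core" mean it's OK on a multiprocessor (n cores)?
And what about SMT (Hyper-Threading)?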
522:Anonymous
08/06/27 08:33:32
Basically, it's only usable on a single core.
523:Anonymous
08/06/27 10:26:51
A new breed of lunatic is running wild.
Show me a document that explicitly says it doesn't work.
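524:Anonymous
08/06/28 01:13:58
>>522
First, a correction: the term "n cores" was a mistake for single core.
That doesn't seem to have quite come across, so I'll ask again.
Can MMX/SSE be used on a multiprocessor system
with several single-core CPUs?
And what about a CPU that is single-core but supports SMT,
so that it behaves like a pseudo-multiprocessor?
Concretely: can MMX be used on a Pentium 4 with HT enabled?
I'd like the reason, not just the conclusion.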
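525:Anonymous
08/06/28 01:31:34
Adding this to avoid misunderstanding.
To keep things simple, I'd like to ask about the multi-process case.
For now, leave multithreading aside on this point.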
526:Anonymous
08/06/28 10:19:45
In the multi-process case:
→ it runs on CPU 1 only
→ with two CPUs, only one of them
   with four, only one of them
527:Anonymous
08/06/28 11:15:05
Enough of the specs in your head. Show documents, documents.
528:ヽ・´∀｀・,,)っ━━━━━━
08/06/28 14:34:24
>>496 Late reply, but which part is hard?
This is the single-precision case:
;; r1 = a1^2 + a2^2 + a3^2 - a4^2
movaps xmm0, xmmword ptr [a1_a2_a3_a4] ;; a1, a2, a3, a4 stored side by side
movaps xmm1, xmm0
xorps xmm1, xmmword ptr [MASK_PPPN] ;; MASK_PPPN = { 0x00000000, 0x00000000, 0x00000000, 0x80000000 }
;; now xmm1 = { a1, a2, a3, -a4 }
dpps xmm0, xmm1, 0xF1 ;; dot product
;; without SSE4.1, replace dpps with this (requires SSE3)
;; mulps xmm0, xmm1
;; haddps xmm0, xmm0
;; haddps xmm0, xmm0
;; and if SSE3 isn't available either... well...
;; r2 doesn't parallelize much, so plain scalar code is fine
;; r2 = a1+a2 - a3*a4 +a1+a3 + a2*a4 = a1+a1+a2+a3 + (a2-a3)*a4
movss xmm1, dword ptr [a1]
addss xmm1, xmm1 ;; a1+a1
movss xmm2, dword ptr [a2]
addss xmm1, xmm2 ;; (a1+a1) + a2
movss xmm3, dword ptr [a3]
addss xmm1, xmm3 ;; (a1+a1+a2) + a3
subss xmm2, xmm3 ;; a2 - a3
mulss xmm2, dword ptr [a4] ;; (a2-a3)*a4
addss xmm1, xmm2 ;; r2 = (a1+a1+a2+a3) + ((a2-a3)*a4)
;; and finally
addss xmm0, xmm1 ;; r = r1 + r2
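Here is the same computation as a C sketch with SSE1 intrinsics only. r1 is one mulps followed by two shuffle-and-add steps, in place of dpps or haddps. r2 uses the rewritten form 2·a1 + a2 + a3 + (a2 − a3)·a4 in scalar code, as the assembly above does. The function name is hypothetical. With the values from 496, the result should be about 25.2 (r1 = 18.65, r2 = 6.55), up to float rounding.

```c
#include <xmmintrin.h>

static float eval_r(float a1, float a2, float a3, float a4)
{
    __m128 v = _mm_setr_ps(a1, a2, a3, a4);
    __m128 w = _mm_setr_ps(a1, a2, a3, -a4);
    __m128 p = _mm_mul_ps(v, w);                                     /* a1^2, a2^2, a3^2, -a4^2 */
    p = _mm_add_ps(p, _mm_movehl_ps(p, p));                          /* lane0: p0+p2, lane1: p1+p3 */
    p = _mm_add_ss(p, _mm_shuffle_ps(p, p, _MM_SHUFFLE(1, 1, 1, 1))); /* lane0: r1 */
    float r1 = _mm_cvtss_f32(p);
    float r2 = 2.0f * a1 + a2 + a3 + (a2 - a3) * a4;                 /* scalar, as in the asm */
    return r1 + r2;
}
```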