MMX moves (Developers)
Caveat: I am far from an expert! You'll be hard-pressed to find anyone who can tell you about this stuff with 100% certainty. (I've looked; it's complex, and the advice differs everywhere!)
> Am I doing something wrong, or are the simple MMX moves slower than normal 386
> moves?
>
> {block A}
> @mmxloop: movq mm0,ds:[esi];movq es:[edi],mm0
> add esi,8;sub ecx,8;add edi,8;cmp ecx,8;jge @mmxloop
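Segment overrides always cost something: at the very least an extra prefix byte per instruction, and on older CPUs a prefix can cost decode time as well. DS: is already the default for [esi], so that override is redundant (even though some dumb assemblers will stick it in there anyway, wasting space). I think an MMX copy like this only really helps on large blocks of data. (You could also try putting 8 into a spare register and using that instead of the immediate value. Not sure how much that'd help, though.)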
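For comparison, here is a rough portable C sketch of what block A's loop is meant to do: copy 8 bytes per pass. The name copy_qwords is made up for illustration, and a 64-bit load/store pair stands in for the two movq instructions. Unlike the fragment as quoted, it tests the count before the first copy and finishes the 0..7 leftover bytes, which the quoted loop doesn't show. Treat it as a sketch under those assumptions, not a tuned routine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the thread): copy n bytes, 8 at a time,
   then finish the 0..7 leftover bytes one by one.
   The buffers must not overlap. */
static void copy_qwords(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    assert(d != NULL && s != NULL);

    while (n >= 8) {          /* test first, so short counts copy nothing here */
        uint64_t q;
        memcpy(&q, s, 8);     /* stands in for: movq mm0,[esi] */
        memcpy(d, &q, 8);     /* stands in for: movq [edi],mm0 */
        s += 8;
        d += 8;
        n -= 8;
    }
    while (n > 0) {           /* tail bytes the quoted loop leaves uncopied */
        *d++ = *s++;
        n--;
    }
}
```

A decent compiler will usually turn the two fixed-size memcpy calls into single 8-byte moves, so this also makes a handy baseline when timing the hand-written versions.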
> {block B}
> shr ecx,1;pushf;shr ecx,1;rep movsd;adc ecx,ecx
> rep movsw;popf;adc ecx,ecx;rep movsb
Since you're almost certainly writing this for a 686-class CPU, its internal register renaming helps a lot (even for the stack and flags). So it will be harder to beat the plain "rep movsb" (which is fairly fast on semi-modern, superscalar, out-of-order machines).
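In case the flag juggling in block B looks opaque: the two "shr ecx,1" instructions peel off the low two bits of the byte count into the carry flag, and each "adc ecx,ecx" (with ecx already 0 after a rep) turns a saved carry back into a count of 0 or 1. A small C sketch of that arithmetic, with made-up names (split, split_count) purely for illustration:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration of how block B divides a byte count. */
struct split {
    size_t dwords;  /* count for rep movsd */
    size_t words;   /* count for rep movsw (0 or 1) */
    size_t bytes;   /* count for rep movsb (0 or 1) */
};

static struct split split_count(size_t n)
{
    struct split s;
    s.bytes  = n & 1;         /* carry from the first shr, saved by pushf */
    s.words  = (n >> 1) & 1;  /* carry from the second shr */
    s.dwords = n >> 2;        /* what is left in ecx for rep movsd */
    /* the three pieces always add back up to the original count */
    assert(s.dwords * 4 + s.words * 2 + s.bytes == n);
    return s;
}
```

For example, a count of 11 splits into 2 dwords, 1 word and 1 byte (8 + 2 + 1).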