MMX moves (Developers)
Try copying 8-byte-aligned 64-byte blocks with this:
; load one 64-byte block from DS:ESI into mm0..mm7
movq mm0,ds:[esi]
movq mm1,ds:[esi+32]
movq mm2,ds:[esi+8]
movq mm3,ds:[esi+40]
movq mm4,ds:[esi+16]
movq mm5,ds:[esi+48]
movq mm6,ds:[esi+24]
movq mm7,ds:[esi+56]
; store the block to ES:EDI at the same offsets
movq es:[edi],mm0
movq es:[edi+32],mm1
movq es:[edi+8],mm2
movq es:[edi+40],mm3
movq es:[edi+16],mm4
movq es:[edi+48],mm5
movq es:[edi+24],mm6
movq es:[edi+56],mm7
At least one of ESI or EDI should be 8-byte aligned; if both are aligned you will get maximum speed.
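
If you want to try the access pattern and the alignment check outside of assembly, here is a rough portable C sketch. It is not MMX and says nothing about speed: it only mirrors the same load order (offsets 0, 32, 8, 40, 16, 48, 24, 56) using plain 8-byte words. The names is_aligned8 and copy_blocks64 are made up for this example, and the length is assumed to be a whole number of 64-byte blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: nonzero if p sits on an 8-byte boundary. */
static int is_aligned8(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Hypothetical helper: copy `blocks` 64-byte blocks from src to dst.
   For each block, all eight 8-byte words are loaded first and then all
   eight are stored, using the same offset order as the movq sequence.
   Fixed-size memcpy keeps the code valid even for unaligned pointers. */
static void copy_blocks64(void *dst, const void *src, size_t blocks)
{
    static const size_t order[8] = {0, 32, 8, 40, 16, 48, 24, 56};
    unsigned char *d = dst;
    const unsigned char *s = src;

    assert(dst != NULL && src != NULL);

    while (blocks--) {
        uint64_t q[8];                      /* stands in for mm0..mm7 */
        for (int i = 0; i < 8; i++)
            memcpy(&q[i], s + order[i], 8); /* load phase */
        for (int i = 0; i < 8; i++)
            memcpy(d + order[i], &q[i], 8); /* store phase */
        s += 64;
        d += 64;
    }
}
```

A caller could test is_aligned8(src) and is_aligned8(dst) first and, if neither is aligned, copy a few leading bytes one at a time until one of them is.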