Instruction set: x86

From ScienceZero

Registers

IA-32 Architecture

General Purpose integer   (32 bit)  EAX EBX ECX EDX ESI EDI EBP ESP
Floating point registers  (80 bit)  st(0)-st(7) (8 level stack)
MMX integer registers     (64 bit)  MM0 - MM7 (shares bits with the FPU stack)
SSE floating point       (128 bit)  XMM0 - XMM7
Segment Registers         (16 bit)  CS  DS  SS  ES  FS  GS
Status and control        (32 bit)  EFLAGS  EIP  MXCSR

EAX - Accumulator for operands and results data.
EBX - Pointer to data in the DS segment.
ECX - Counter for string and loop operations.
EDX - I/O pointer.
ESI - Pointer to data in the segment pointed to by the DS register;
      source pointer for string operations.
EDI - Pointer to data (or destination) in the segment pointed to by the ES register;
      destination pointer for string operations.
ESP - Stack pointer (in the SS segment).
EBP - Pointer to data on the stack (in the SS segment).

AX BX CX DX - 16 bit register sharing bits with the 32 bit registers*.
AH AL BH BL CH CL DH DL - 8 bit registers sharing bits with the 16 bit registers*.

31     24 23     16 15      8  7      0
 00000000  00000000  00000000  00000000
 <--------- EAX EBX ECX EDX ---------->
                     <-- AX BX CX DX ->
                     <- AH ->  <- AL ->
                     <- BH ->  <- BL ->
                     <- CH ->  <- CL ->
                     <- DH ->  <- DL ->
 <--------- ESI EDI EBP ESP ---------->
                     <-- SI DI BP SP ->

*Using the 16 and 8 bit registers can be very slow on some processors and should be avoided if possible.
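As a rough illustration of the bit sharing shown in the diagram above, the following Python sketch models one general purpose register and its 16 and 8 bit views. The class name Reg32 and the attribute names e, x, h and l are hypothetical and chosen only for this example; it models the aliasing, not timing or flags.

```python
class Reg32:
    """Hypothetical model of one 32-bit register (e.g. EAX) and its sub-registers."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF          # full 32-bit value (EAX)

    @property
    def x(self):                             # bits 15..0 (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, v):
        # Writing the 16-bit view leaves bits 31..16 untouched.
        self.e = (self.e & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def h(self):                             # bits 15..8 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, v):
        self.e = (self.e & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def l(self):                             # bits 7..0 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, v):
        self.e = (self.e & 0xFFFFFF00) | (v & 0xFF)


eax = Reg32(0x12345678)
eax.h = 0xAB                                 # like: mov ah, 0ABh
eax.l = 0xCD                                 # like: mov al, 0CDh
print(hex(eax.e), hex(eax.x))                # prints: 0x1234abcd 0xabcd
```

Only the low 16 bits change; the upper half of the 32 bit register keeps its old value.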

x86 Integer Instructions

This is the full 8086/8088 instruction set. Most, if not all, of these instructions are also available in 32-bit mode, where they operate on 32-bit registers (EAX, EBX, etc.) and values instead of their 16-bit counterparts (AX, BX, etc.). See also x86 assembly language for a quick tutorial.

Original 8086/8088 instructions

Instruction Meaning Notes
AAA ASCII adjust AL after addition used with unpacked binary-coded decimal
AAD ASCII adjust AX before division buggy in the original instruction set, but "fixed" in the NEC V20, causing a number of incompatibilities
AAM ASCII adjust AX after multiplication
AAS ASCII adjust AL after subtraction
ADC Add with carry
ADD Add
AND Logical AND
CALL Call procedure
CBW Convert byte to word
CLC Clear carry flag
CLD Clear direction flag
CLI Clear interrupt flag
CMC Complement carry flag
CMP Compare operands
CMPSB Compare bytes in memory
CMPSW Compare words
CWD Convert word to doubleword
DAA Decimal adjust AL after addition (used with packed binary coded decimal)
DAS Decimal adjust AL after subtraction
DEC Decrement by 1
DIV Unsigned divide
ESC Used with floating-point unit
HLT Enter halt state
IDIV Signed divide
IMUL Signed multiply
IN Input from port
INC Increment by 1
INT Call to interrupt
INTO Call to interrupt if overflow
IRET Return from interrupt
Jcc Jump if condition (JA, JAE, JB, JBE, JC, JCXZ, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ)
JMP Jump
LAHF Load flags into AH register
LDS Load pointer using DS
LEA Load Effective Address
LES Load ES with pointer
LOCK Assert BUS LOCK# signal (for multiprocessing)
LODSB Load byte
LODSW Load word
LOOP/LOOPx Loop control (LOOPE, LOOPNE, LOOPNZ, LOOPZ)
MOV Move
MOVSB Move byte from string to string
MOVSW Move word from string to string
MUL Unsigned multiply
NEG Two's complement negation
NOP No operation
NOT Negate the operand, logical NOT
OR Logical OR
OUT Output to port
POP Pop data from stack
POPF Pop data into flags register
PUSH Push data onto stack
PUSHF Push flags onto stack
RCL Rotate left (with carry)
RCR Rotate right (with carry)
REPxx Repeat CMPS/MOVS/SCAS/STOS (REP, REPE, REPNE, REPNZ, REPZ)
RET Return from procedure
RETN Return from near procedure
RETF Return from far procedure
ROL Rotate left
ROR Rotate right
SAHF Store AH into flags
SAL Shift Arithmetically left (multiply)
SAR Shift Arithmetically right (signed divide)
SBB Subtraction with borrow
SCASB Compare byte string
SCASW Compare word string
SHL Shift left (multiply)
SHR Shift right (unsigned divide)
STC Set carry flag
STD Set direction flag
STI Set interrupt flag
STOSB Store byte in string
STOSW Store word in string
SUB Subtraction
TEST Logical compare (AND)
WAIT Wait until not busy Waits until BUSY# pin is inactive (used with floating-point unit)
XCHG Exchange data
XLAT Table look-up translation
XOR Exclusive OR
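The difference between SHR (unsigned divide) and SAR (signed divide) in the table above can be sketched in Python for 32-bit operands. The function names are hypothetical, and the sketch assumes a shift count between 0 and 31; flags are not modelled.

```python
def shr32(value, count):
    """Logical shift right: zeros enter from the left (unsigned divide by 2**count)."""
    return (value & 0xFFFFFFFF) >> count


def sar32(value, count):
    """Arithmetic shift right: the sign bit is copied into the vacated bits."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        value -= 1 << 32                     # reinterpret as a negative Python int
    return (value >> count) & 0xFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


minus_seven = 0xFFFFFFF9
print(hex(shr32(minus_seven, 1)))            # prints: 0x7ffffffc
print(to_signed32(sar32(minus_seven, 1)))    # prints: -4
```

Note that SAR rounds toward minus infinity (-7 becomes -4), whereas IDIV truncates toward zero (-7 / 2 gives -3), so the two are not interchangeable for negative values.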

Added with 80186/80188

Instruction Meaning Notes
BOUND Check if r16 or r32 (array index) is within bounds specified by m16&16 or m32&32
ENTER Create a <nested> stack frame for a procedure
INSB Input byte from port DX into ES:(E)DI
INSW Input word from port DX into ES:(E)DI
LEAVE Set SP to BP, then pop BP or Set ESP to EBP, then pop EBP
OUTSB Output byte from memory location specified in DS:(E)SI to I/O port specified in DX
OUTSW Output word from memory location specified in DS:(E)SI to I/O port specified in DX
POPA Pop DI, SI, BP, BX, DX, CX, and AX
PUSHA Push AX, CX, DX, BX, original SP, BP, SI, and DI
PUSHW

Added with 80286

Instruction Meaning Notes
ARPL Adjust RPL of r/m16 to not less than RPL of r16
CLTS Clears TS flag in CR0
LAR r16 <- r/m16 masked by FF00H or r32 <- r/m32 masked by 00FxFF00H
LGDT Load m into GDTR
LIDT Load m into IDTR
LLDT Load segment selector r/m16 into LDTR
LMSW Loads r/m16 in machine status word of CR0
LOADALL LOADALL loads all of the CPU registers, including the "hidden" software-invisible registers. At the completion of a LOADALL instruction, the entire CPU state is defined according to the LOADALL data table. Undocumented, emulated in BIOS on some computers.
LSL Load: r16 <- segment limit, selector r/m16 or Load: r32 <- segment limit, selector r/m32
LTR Load r/m16 into task register
SGDT Store GDTR to m
SIDT Store IDTR to m
SLDT Stores segment selector from LDTR in r/m16 or Store segment selector from LDTR in low-order 16 bits of r/m32
SMSW Store machine status word to r/m16 or Store machine status word in low-order 16 bits of r32/m16; high-order 16 bits of r32 are undefined
STR Stores segment selector from TR in r/m16
VERR Set ZF=1 if segment specified with r/m16 can be read
VERW Set ZF=1 if segment specified with r/m16 can be written

Added with 80386

Instruction Meaning Notes
BSF Bit scan forward on r/m
BSR Bit scan reverse on r/m
BT Bit Test, Store selected bit in CF flag
BTC Store selected bit in CF flag and complement
BTR Store selected bit in CF flag and clear
BTS Store selected bit in CF flag and set
CDQ EDX:EAX <- sign-extend of EAX
CMPSD Compares doubleword at address DS:(E)SI with doubleword at address ES:(E)DI and sets the status flags accordingly
CWDE EAX <- sign-extend of AX
INSD
IRETD Interrupt return (32-bit operand size)
IRETDF
IRETF
JECXZ Jump short if ECX register is 0
LFS Load FS:r16 or FS:r32 with far pointer from memory
LGS Load GS:r16 or GS:r32 with far pointer from memory
LSS Load SS:r16 or SS:r32 with far pointer from memory
LODSD Load doubleword at address DS:(E)SI into EAX
LOOPD
LOOPED
LOOPNED
LOOPNZD
LOOPZD
MOVSD
MOVSX Move with Sign-Extension
MOVZX Move with Zero-Extend
OUTSD Output doubleword from memory location specified in DS:(E)SI to I/O port specified in DX
POPAD Pop EDI, ESI, EBP, EBX, EDX, ECX, and EAX
POPFD Pop top of stack into EFLAGS
PUSHAD Push EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI
PUSHD
PUSHFD
SCASD Compare EAX with doubleword at ES:(E)DI and set status flags
SETcc Conditionally set byte.
SHLD Double Precision Shift Left
SHRD Double Precision Shift Right
STOSD Store EAX at address ES:(E)DI
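A few of the 80386 additions are easy to model on plain integers. The Python sketch below shows what BSF, BSR, MOVZX and MOVSX compute for 32-bit destinations; the function names are hypothetical, and returning None for a zero source is only this example's way of marking the case where the processor sets ZF and the destination is not defined.

```python
def bsf(value):
    """Index of the lowest set bit, or None if the source is zero."""
    value &= 0xFFFFFFFF
    if value == 0:
        return None
    return (value & -value).bit_length() - 1   # isolate lowest set bit


def bsr(value):
    """Index of the highest set bit, or None if the source is zero."""
    value &= 0xFFFFFFFF
    if value == 0:
        return None
    return value.bit_length() - 1


def movzx8(byte):
    """Zero-extend an 8-bit value to 32 bits."""
    return byte & 0xFF


def movsx8(byte):
    """Sign-extend an 8-bit value to 32 bits."""
    byte &= 0xFF
    return byte | 0xFFFFFF00 if byte & 0x80 else byte


print(bsf(0b101000), bsr(0b101000))          # prints: 3 5
print(hex(movzx8(0x80)), hex(movsx8(0x80)))  # prints: 0x80 0xffffff80
```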

Added with 80486

Instruction Meaning Notes
BSWAP Reverses the byte order of a 32-bit register.
CMPXCHG Compare and Exchange
CPUID Returns processor identification and feature information to the EAX, EBX, ECX, and EDX registers, according to the input value entered initially in the EAX register.
INVD Flush internal caches; initiate flushing of external caches.
INVLPG Invalidate TLB Entry for page that contains m
RSM Resume operation of interrupted program
WBINVD Write back and flush Internal caches; initiate writing-back and flushing of external caches.
XADD Exchange and Add
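The following Python sketch gives a simplified picture of BSWAP, CMPXCHG and XADD on 32-bit values. Memory is represented by a plain dict, which is an assumption of this example, as are the function names; the atomicity that a LOCK prefix provides on real hardware is not modelled.

```python
def bswap(value):
    """Reverse the byte order of a 32-bit value."""
    value &= 0xFFFFFFFF
    return int.from_bytes(value.to_bytes(4, "little"), "big")


def cmpxchg(mem, addr, eax, src):
    """Compare EAX with mem[addr]; return the resulting (ZF, EAX)."""
    if mem[addr] == eax:
        mem[addr] = src                      # equal: store src, ZF = 1
        return True, eax
    return False, mem[addr]                  # not equal: load EAX, ZF = 0


def xadd(mem, addr, src):
    """Add src to mem[addr]; return the old memory value (new content of src)."""
    old = mem[addr]
    mem[addr] = (old + src) & 0xFFFFFFFF
    return old


mem = {0: 5}
print(hex(bswap(0x12345678)))                # prints: 0x78563412
print(cmpxchg(mem, 0, 5, 9), mem[0])         # prints: (True, 5) 9
print(cmpxchg(mem, 0, 5, 1), mem[0])         # prints: (False, 9) 9
print(xadd(mem, 0, 1), mem[0])               # prints: 9 10
```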

Added with Pentium

Instruction Meaning Notes
CMPXCHG8B Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX.
RDMSR Load MSR specified by ECX into EDX:EAX
RDPMC* Read performance-monitoring counter specified by ECX into EDX:EAX RDPMC was introduced in the Pentium Pro processor and the Pentium processor with MMX technology.
RDTSC Read time-stamp counter into EDX:EAX
WRMSR Write the value in EDX:EAX to MSR specified by ECX


Added with Pentium Pro

Instruction Meaning Notes
CMOVcc Conditional move from register or memory to register.
SYSENTER Transition to System Call Entry Point Equivalent for AMD is SYSCALL
SYSEXIT Transition from System Call Entry Point Equivalent for AMD is SYSRET

Added with Pentium III

as part of the SSE branding

Instruction Meaning Notes
MASKMOVQ Selectively write bytes from mm1 to memory location using the byte mask in mm2
MOVNTPS Move packed single-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy.
MOVNTQ Move quadword from mm to m64, minimizing pollution in the cache hierarchy.
PREFETCHT0 Move data specified by address closer to the processor using T0 hint.
PREFETCHT1 Move data specified by address closer to the processor using T1 hint.
PREFETCHT2 Move data specified by address closer to the processor using T2 hint.
PREFETCHNTA Move data specified by address closer to the processor using NTA hint.
SFENCE Serializes store operations. (for Cacheability and Memory Ordering)

Added with Pentium 4

as part of the SSE2 branding

Instruction Meaning Notes
CLFLUSH Flushes cache line containing m8.
LFENCE Serializes load operations.
MASKMOVDQU Selectively write bytes from xmm1 to memory location using the byte mask in xmm2.
MFENCE Serializes load and store operations.
MOVNTDQ Move double quadword from xmm to m128, minimizing pollution in the cache hierarchy.
MOVNTI Move doubleword from r32 to m32, minimizing pollution in the cache hierarchy.
MOVNTPD Move packed double-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy.
PAUSE Delays execution of next instruction an implementation-specific amount of time.

Added with Pentium 4 supporting SSE3

only on processors supporting Hyper-threading
as part of the SSE3 branding

Instruction Meaning Notes
MONITOR Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be of a write-back memory caching type.
MWAIT A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events.

Added with Pentium 4 6x2

VMX is intended to support virtualization of processor hardware and a system software layer acting as a host to multiple guest software environments. The virtual-machine extensions (VMX) include five instructions that manage the virtual-machine control structure (VMCS) and five instructions that manage VMX operation. Additional details of VMX are described in IA-32 Intel Architecture Software Developer’s Manual, Volume 3B.

Instruction Meaning Notes
VMPTRLD This instruction takes a single 64-bit source operand that is in memory. It makes the referenced VMCS active and current, loading the current-VMCS pointer with this operand and establishes the current VMCS based on the contents of VMCS-data area in the referenced VMCS region.
VMPTRST This instruction takes a single 64-bit destination operand that is in memory. The current-VMCS pointer is stored into the destination operand.
VMCLEAR This instruction takes a single 64-bit operand that is in memory. The instruction sets the launch state of the VMCS referenced by the operand to “clear”, renders that VMCS inactive, and ensures that data for the VMCS have been written to the VMCS-data area in the referenced VMCS region.
VMREAD This instruction reads a component from the VMCS (the encoding of that field is given in a register operand) and stores it into a destination operand that may be a register or in memory.
VMWRITE This instruction writes a component to the VMCS (the encoding of that field is given in a register operand) from a source operand that may be a register or in memory.
VMCALL This instruction allows a guest in VMX non-root operation to call the VMM for service. A VM exit occurs, transferring control to the VMM.
VMLAUNCH This instruction launches a virtual machine managed by the VMCS. A VM entry occurs, transferring control to the VM.
VMRESUME This instruction resumes a virtual machine managed by the VMCS. A VM entry occurs, transferring control to the VM.
VMXOFF This instruction causes the processor to leave VMX operation.
VMXON This instruction takes a single 64-bit source operand that is in memory. It causes a logical processor to enter VMX root operation and to use the memory referenced by the operand to support VMX operation.

Added with x86-64

Instruction Meaning Notes
CMPXCHG16B Compare and Exchange Sixteen Bytes

x87 Floating-point Instructions

Original 8087 instructions

Instruction Meaning Notes
F2XM1 Replace ST(0) with (2^ST(0) - 1)
FABS Replace ST with its absolute value.
FADD Add m to ST or add ST to ST(i) or ST(i) to ST
FADDP Add ST(0) to ST(i), store result in ST(i), and pop the register stack.
FBLD Convert BCD value to real and push onto the FPU stack.
FBSTP Store ST(0) in m80bcd and pop ST(0).
FCHS Complements sign of ST(0)
FCLEX Clear floating-point exception flags after checking for pending unmasked floating-point exceptions.
FCOM Compare ST(0) with m or ST(i)
FCOMP Compare ST(0) with m or ST(i), pop the stack.
FCOMPP Compare ST(0) with ST(1) and pop register stack twice.
FDECSTP Decrement TOP field in FPU status word.
FDISI Sets the interrupt enable mask in the control word.
FDIV Divide ST(0) by m and store result in ST(0), Divide ST(0) by ST(i) and store result in ST(0), Divide ST(i) by ST(0) and store result in ST(i)
FDIVP Divide ST(i) by ST(0), store result in ST(i), and pop the register stack, Divide ST(1) by ST(0), store result in ST(1), and pop the register stack.
FDIVR Reverse Divide
FDIVRP Divide ST(0) by ST(i), store result in ST(i), and pop the register stack
FENI Clears the interrupt enable mask in the control word.
FFREE Sets tag for ST(i) to empty.
FIADD Integer add m to ST(0) and store result in ST(0).
FICOM Integer compare ST(0) with m.
FICOMP Integer compare ST(0) with m, pop stack.
FIDIV Integer divide ST(0) by m and store result in ST(0)
FIDIVR Integer divide m by ST(0) and store result in ST(0)
FILD Integer push m onto the FPU register stack.
FIMUL Integer multiply ST(0) by m and store result in ST(0)
FINCSTP Increment the TOP field in the FPU status register
FINIT Initialize FPU after checking for pending unmasked floating-point exceptions.
FIST Integer store ST(0) in m.
FISTP Integer store ST(0) in m, pop the stack.
FISUB Integer subtract m from ST(0) and store result in ST(0)
FISUBR Integer subtract ST(0) from m and store result in ST(0)
FLD Push m or ST(i) onto the FPU register stack.
FLD1 Push +1.0 onto the FPU register stack.
FLDCW Load FPU control word from m2byte.
FLDENV Load FPU environment from m14byte or m28byte.
FLDENVW
FLDL2E Push log2(e) onto the FPU register stack.
FLDL2T Push log2(10) onto the FPU register stack.
FLDLG2 Push log10(2) onto the FPU register stack.
FLDLN2 Push loge(2) onto the FPU register stack.
FLDPI Push π onto the FPU register stack.
FLDZ Push +0.0 onto the FPU register stack.
FMUL Multiply ST(0) by m or ST(i) and store result in ST(0), or multiply ST(i) by ST(0) and store result in ST(i)
FMULP Multiply ST(i) by ST(0), store result in ST(i), and pop the register stack
FNCLEX Clear floating-point exception flags without checking for pending unmasked floating-point exceptions.
FNDISI Sets the interrupt enable mask in the control word.
FNENI Clears the interrupt enable mask in the control word.
FNINIT Initialize FPU without checking for pending unmasked floating-point exceptions.
FNOP No operation is performed.
FNSAVE Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
FNSAVEW
FNSTCW Store FPU control word to m2byte without checking for pending unmasked floating-point exceptions.
FNSTENV Store FPU environment to m14byte or m28byte without checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
FNSTENVW
FNSTSW Store FPU status word at m2byte or in AX without checking for pending unmasked floating-point exceptions.
FPATAN Replace ST(1) with arctan(ST(1)/ST(0)) and pop the register stack
FPREM Replace ST(0) with the remainder obtained from dividing ST(0) by ST(1)
FPTAN Replace ST(0) with its tangent and push 1 onto the FPU stack.
FRNDINT Round ST(0) to an integer.
FRSTOR Load FPU state from m94byte or m108byte.
FRSTORW
FSAVE Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
FSAVEW
FSCALE Scale ST(0) by ST(1).
FSQRT Computes square root of ST(0) and stores the result in ST(0)
FST Copy ST(0) to m or ST(i)
FSTCW Store FPU control word to m2byte after checking for pending unmasked floating-point exceptions.
FSTENV Store FPU environment to m14byte or m28byte after checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
FSTENVW
FSTP Copy ST(0) to m or ST(i) and pop register stack.
FSTSW Store FPU status word at m2byte or in AX after checking for pending unmasked floating-point exceptions.
FSUB Subtract m from ST(0) and store result in ST(0) or Subtract ST(i) from ST(0) and store result in ST(0) or Subtract ST(0) from ST(i) and store result in ST(i)
FSUBP Subtract ST(0) from ST(i), store result in ST(i), and pop register stack
FSUBR Subtract ST(0) from m32real or ST(i) and store result in ST(0) or Subtract ST(i) from ST(0) and store result in ST(i)
FSUBRP Subtract ST(i) from ST(0), store result in ST(i), and pop register stack
FTST Compare ST(0) with 0.0.
FWAIT Check pending unmasked floating-point exceptions.
FXAM Classify value or number in ST(0)
FXCH Exchange the contents of ST(0) and ST(i)
FXTRACT Separate value in ST(0) into exponent and significand, store exponent in ST(0), and push the significand onto the register stack.
FYL2X Replace ST(1) with (ST(1) * log2(ST(0))) and pop the register stack
FYL2XP1 Replace ST(1) with (ST(1) * log2(ST(0) + 1.0)) and pop the register stack
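The stack behaviour of the x87 instructions can be sketched with a small Python model. The class X87Stack and its method names are hypothetical; the model uses Python's 64-bit floats rather than the 80-bit format, ignores tags and exceptions, and covers only FLD, FADDP, F2XM1 and FYL2X.

```python
import math


class X87Stack:
    """Hypothetical, simplified model of the 8-level x87 register stack."""

    def __init__(self):
        self.st = []                         # st[-1] plays the role of ST(0)

    def fld(self, x):
        """Push a value; the real FPU signals stack overflow beyond 8 entries."""
        if len(self.st) == 8:
            raise OverflowError("x87 stack overflow")
        self.st.append(float(x))

    def faddp(self, i=1):
        """ST(i) = ST(i) + ST(0), then pop."""
        self.st[-1 - i] += self.st[-1]
        self.st.pop()

    def f2xm1(self):
        """ST(0) = 2**ST(0) - 1 (the hardware expects -1.0 <= ST(0) <= +1.0)."""
        self.st[-1] = 2.0 ** self.st[-1] - 1.0

    def fyl2x(self):
        """ST(1) = ST(1) * log2(ST(0)), then pop."""
        self.st[-2] = self.st[-2] * math.log2(self.st[-1])
        self.st.pop()


fpu = X87Stack()
fpu.fld(3.0)
fpu.fld(8.0)
fpu.fyl2x()                                  # 3 * log2(8) = 9
fpu.fld(1.0)
fpu.f2xm1()                                  # 2**1 - 1 = 1
fpu.faddp()                                  # 9 + 1 = 10
print(fpu.st)                                # prints: [10.0]
```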

Added with 80287

Instruction Meaning Notes
FSETPM Set Protected Mode Only used on 80287

Added with 80387

Instruction Meaning Notes
FCOS Replace ST(0) with its cosine
FLDENVD Load FPU environment from m28byte.
FNSAVED Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
FNSTENVD Store FPU environment to m14byte or m28byte without checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
FPREM1 Replace ST(0) with the IEEE remainder obtained from dividing ST(0) by ST(1)
FRSTORD Load FPU state from m94byte or m108byte.
FSAVED Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
FSIN Replace ST(0) with its sine.
FSINCOS Compute the sine and cosine of ST(0); replace ST(0) with the sine, and push the cosine onto the register stack.
FSTENVD Store FPU environment to m14byte or m28byte after checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
FUCOM Performs an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results.
FUCOMP Performs an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results and pops the stack.
FUCOMPP Performs an unordered comparison of the contents of registers ST(0) and ST(1), sets condition code flags C0, C2, and C3 in the FPU status word according to the results, and pops the stack twice.

Added with Pentium Pro

Instruction Meaning Notes
FCMOVB Move if below (CF=1)
FCMOVBE Move if below or equal (CF=1 or ZF=1)
FCMOVE Move if equal (ZF=1)
FCMOVNB Move if not below (CF=0)
FCMOVNBE Move if not below or equal (CF=0 and ZF=0)
FCMOVNE Move if not equal (ZF=0)
FCMOVNU Move if not unordered (PF=0)
FCMOVU Move if unordered (PF=1)
FCOMI Compare ST(0) with ST(i) and set status flags accordingly
FCOMIP Compare ST(0) with ST(i), set status flags accordingly, and pop register stack
FUCOMI Compare ST(0) with ST(i), check for ordered values, and set status flags accordingly
FUCOMIP Compare ST(0) with ST(i), check for ordered values, set status flags accordingly, and pop register stack
FXRSTOR Loads x87 FPU, MMX™ technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2 state from m512byte.
FXSAVE Saves x87 FPU, MMX™ technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2 state to m512byte.
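The interaction between FCOMI and the FCMOVcc conditions listed above can be sketched as follows. The function names are hypothetical; the flag values follow the usual convention where an unordered result (a NaN operand) sets ZF, PF and CF together. Exceptions and the register stack are not modelled.

```python
import math


def fcomi(st0, sti):
    """Return (ZF, PF, CF) as a comparison of ST(0) with ST(i) would set them."""
    if math.isnan(st0) or math.isnan(sti):
        return (1, 1, 1)                     # unordered
    if st0 > sti:
        return (0, 0, 0)
    if st0 < sti:
        return (0, 0, 1)
    return (1, 0, 0)                         # equal


def fcmovnbe(st0, sti, flags):
    """New ST(0): take ST(i) if not below or equal (CF=0 and ZF=0)."""
    zf, pf, cf = flags
    return sti if (cf == 0 and zf == 0) else st0


def branchless_min(a, b):
    """Minimum of two ordered values using compare plus conditional move."""
    return fcmovnbe(a, b, fcomi(a, b))


print(branchless_min(5.0, 2.0))              # prints: 2.0
print(branchless_min(1.0, 2.0))              # prints: 1.0
```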

Added with Pentium 4 supporting SSE3

as part of the SSE3 branding

FISTTP Store ST as a signed integer (truncate) in m and pop ST

SIMD Instructions

MMX instructions

added with Pentium MMX

Instruction Meaning Notes
EMMS Empty MMX state This instruction must be executed before executing floating-point instructions.
MOVD Move doubleword
MOVQ Move quadword
PACKSSDW Pack doublewords into words (signed with saturation)
PACKSSWB Pack words into bytes (signed with saturation)
PACKUSWB Pack words into bytes (unsigned with saturation)
PADDB Add with wrap-around on byte
PADDD Add with wrap-around on doubleword
PADDSB Add signed with saturation on byte
PADDSW Add signed with saturation on word
PADDUSB Add unsigned with saturation on byte
PADDUSW Add unsigned with saturation on word
PADDW Add with wrap-around on word
PAND Bitwise AND
PANDN Bitwise AND NOT
PCMPEQB Packed compare for equality byte
PCMPEQD Packed compare for equality double word
PCMPEQW Packed compare for equality word
PCMPGTB Packed compare greater than byte
PCMPGTD Packed compare greater than double word
PCMPGTW Packed compare greater than word
PMADDWD Packed multiply on words and add resulting pairs
PMULHW Packed multiply high on words
PMULLW Packed multiply low on words
POR Bitwise OR
PSLLD Packed shift left logical doubleword by amount specified in MMX register or by immediate value
PSLLQ Packed shift left logical quadword by amount specified in MMX register or by immediate value
PSLLW Packed shift left logical word by amount specified in MMX register or by immediate value
PSRAD Packed shift right arithmetic doubleword by amount specified in MMX register or by immediate value
PSRAW Packed shift right arithmetic word by amount specified in MMX register or by immediate value
PSRLD Packed shift right logical doubleword by amount specified in MMX register or by immediate value
PSRLQ Packed shift right logical quadword by amount specified in MMX register or by immediate value
PSRLW Packed shift right logical word by amount specified in MMX register or by immediate value
PSUBB Subtract with wrap-around on byte
PSUBD Subtract with wrap-around on doubleword
PSUBSB Subtract signed with saturation on byte
PSUBSW Subtract signed with saturation on word
PSUBUSB Subtract unsigned with saturation on byte
PSUBUSW Subtract unsigned with saturation on word
PSUBW Subtract with wrap-around on word
PUNPCKHBW Unpack (interleave) high-order bytes from MMX register
PUNPCKHDQ Unpack (interleave) high-order doublewords from MMX register
PUNPCKHWD Unpack (interleave) high-order words from MMX register
PUNPCKLBW Unpack (interleave) low-order bytes from MMX register
PUNPCKLDQ Unpack (interleave) low-order doublewords from MMX register
PUNPCKLWD Unpack (interleave) low-order words from MMX register
PXOR Bitwise XOR

Extended MMX (MMX+) instructions

added with 6x86MX from Cyrix; supported on other CPUs too, e.g. Extended MMX on Athlon 64

3DNow! instructions

added with K6-2

Instruction Meaning Notes
FEMMS Faster Enter/Exit of the MMX or floating-point state.
PAVGUSB Average of unsigned packed 8-bit values.
PF2ID Converts packed floating-point operand to packed 32-bit integer.
PFACC Floating-point accumulate.
PFADD Packed, floating-point addition.
PFCMPEQ Packed floating-point comparison, equal to.
PFCMPGE Packed floating-point comparison, greater than or equal to.
PFCMPGT Packed floating-point comparison, greater than.
PFMAX Packed floating-point maximum.
PFMIN Packed floating-point minimum.
PFMUL Packed floating-point multiplication.
PFRCP Floating-point reciprocal approximation.
PFRCPIT1 Packed floating-point reciprocal, first iteration step.
PFRCPIT2 Packed floating-point reciprocal/reciprocal square root, second iteration step.
PFRSQIT1 Packed floating-point reciprocal square root, first iteration step.
PFRSQRT Floating-point reciprocal square root approximation.
PFSUB Packed floating-point subtraction.
PFSUBR Packed floating-point reverse subtraction.
PI2FD Packed 32-bit integer to floating-point conversion.
PMULHRW Multiply signed packed 16-bit values with rounding and store the high 16 bits.
PREFETCH Prefetch processor cache line into L1 data cache (Dcache).
PREFETCHW Prefetch processor cache line into L1 data cache (Dcache) with intent to write.

3DNow!+ instructions

added with Athlon

Instruction Meaning Notes
PF2IW Packed Floating-Point to Integer Word Conversion with Sign Extend
PFNACC Packed Floating-Point Negative Accumulate
PFPNACC Packed Floating-Point Mixed Positive-Negative Accumulate
PI2FW Packed Integer Word to Floating-Point Conversion
PSWAPD Packed Swap Doubleword

SSE (Streaming SIMD Extensions) instructions

added with Pentium III
also see integer instructions added with Pentium III

SSE SIMD Floating-Point Instructions

Instruction Meaning Notes
ADDPS Add packed single-precision floating-point values from xmm2/m128 to xmm1.
ADDSS Add the low single-precision floating-point value from xmm2/m32 to xmm1.
ANDNPS Bitwise logical AND NOT of xmm2/m128 and xmm1.
ANDPS Bitwise logical AND of xmm2/m128 and xmm1.
CMPPS Compare packed single-precision floating-point values from xmm2/mem with packed single-precision floating-point values in xmm1 register using imm8 as comparison predicate.
CMPSS Compare low single-precision floating-point value from xmm2/m32 with low single-precision floating-point value in xmm1 register using imm8 as comparison predicate.
COMISS Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly.
CVTPI2PS Convert two signed doubleword integers from mm/m64 to two single-precision floating-point values in xmm.
CVTPS2PI Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm.
CVTSI2SS Convert one signed doubleword integer from r/m32 to one single-precision floating-point number in xmm.
CVTSS2SI Convert one single-precision floating-point number from xmm/m32 to one signed doubleword integer in r32.
CVTTPS2PI Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm using truncation.
CVTTSS2SI Convert one single-precision floating-point number from xmm/m32 to one signed doubleword integer r32 using truncation.
DIVPS Divide packed single-precision floating-point values in xmm1 by packed single-precision floating-point values xmm2/m128.
DIVSS Divide low single-precision floating-point value in xmm1 by low single-precision floating-point value in xmm2/m32
LDMXCSR Load Streaming SIMD Extension control/status word from m32.
MAXPS Return the maximum single-precision floating-point values between xmm2/m128 and xmm1.
MAXSS Return the maximum scalar single-precision floating-point value between xmm2/mem32 and xmm1.
MINPS Return the minimum single-precision floating-point values between xmm2/m128 and xmm1.
MINSS Return the minimum scalar single-precision floating-point value between xmm2/mem32 and xmm1.
MOVAPS Move packed single-precision floating-point numbers from/to xmm2/m128 to/from xmm1.
MOVHLPS Move two packed single-precision floating-point values from high quadword of xmm2 to low quadword of xmm1.
MOVHPS Move two packed single-precision floating-point values from/to m64 to/from high quadword of xmm.
MOVLHPS Move two packed single-precision floating-point values from low quadword of xmm2 to high quadword of xmm1.
MOVLPS Move two packed single-precision floating-point values from/to m64 to/from low quadword of xmm.
MOVMSKPS Extract 4-bit sign mask from xmm and store in r32.
MOVNTPS Move packed single-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy.
MOVSS Move scalar single-precision floating-point value from/to xmm2/m32 to/from xmm1 register.
MOVUPS Move packed single-precision floating-point numbers from/to xmm2/m128 to/from xmm1.
MULPS Multiply packed single-precision floating-point values in xmm2/mem by xmm1.
MULSS Multiply the low single-precision floating-point value in xmm2/mem by the low single-precision floating-point value in xmm1.
ORPS Bitwise OR of xmm2/m128 and xmm1
RCPPS Returns to xmm1 the packed approximations of the reciprocals of the packed single-precision floating-point values in xmm2/m128.
RCPSS Returns to xmm1 an approximation of the reciprocal of the low single-precision floating-point value in xmm2/m32.
RSQRTPS Returns to xmm1 the packed approximations of the reciprocals of the square roots of the packed single-precision floating-point values in xmm2/m128.
RSQRTSS Returns to xmm1 an approximation of the reciprocal of the square root of the low single-precision floating-point value in xmm2/m32.
SHUFPS Shuffle packed single-precision floating-point values selected by imm8 from xmm1 and xmm2/m128 to xmm1.
SQRTPS Computes square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.
SQRTSS Computes square root of the low single-precision floating-point value in xmm2/m32 and stores the results in xmm1.
STMXCSR Store Multimedia Extended Control Status Register.
SUBPS Subtract packed single-precision floating-point values in xmm2/mem from xmm1.
SUBSS Subtract the lower single-precision floating-point numbers in xmm2/m32 from xmm1.
UCOMISS Compare lower single-precision floating-point number in xmm1 register with lower single-precision floating-point number in xmm2/mem and set the status flags accordingly.
UNPCKHPS Interleaves single-precision floating-point values from the high quadwords of xmm1 and xmm2/mem into xmm1.
UNPCKLPS Interleaves single-precision floating-point values from the low quadwords of xmm1 and xmm2/mem into xmm1.
XORPS Bitwise exclusive-OR of xmm2/m128 and xmm1.
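How SHUFPS uses its imm8 operand is easier to see in code. In this Python sketch each XMM register is a list of four values with index 0 as the lowest lane; the function name and the list representation are assumptions of the example. The two low result lanes are selected from the destination and the two high lanes from the source, with two bits of imm8 per lane.

```python
def shufps(dst, src, imm8):
    """Return the new contents of dst after SHUFPS dst, src, imm8."""
    return [
        dst[imm8 & 3],                       # lane 0: bits 1..0 select from dst
        dst[(imm8 >> 2) & 3],                # lane 1: bits 3..2 select from dst
        src[(imm8 >> 4) & 3],                # lane 2: bits 5..4 select from src
        src[(imm8 >> 6) & 3],                # lane 3: bits 7..6 select from src
    ]


a = [0.0, 1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0, 7.0]
print(shufps(a, b, 0x1B))                    # prints: [3.0, 2.0, 5.0, 4.0]
print(shufps(a, a, 0x00))                    # prints: [0.0, 0.0, 0.0, 0.0]
```

Using the same register for both operands, as in the second call, is a common way to broadcast or reverse the lanes of a single register.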

SSE SIMD Integer Instructions

Instruction Meaning Notes
PAVGB Average packed unsigned byte integers from xmm2/m128 and xmm1, with rounding.
PAVGW Average packed unsigned word integers from xmm2/m128 and xmm1, with rounding.
PEXTRW Extract the word specified by imm8 from xmm and move it to a r32.
PINSRW Move the low word of r32 or from m16 into xmm at the word position specified by imm8.
PMAXSW Compare signed word integers in xmm2/m128 and xmm1 for maximum values.
PMAXUB Compare unsigned byte integers in xmm2/m128 and xmm1 for maximum values.
PMINSW Compare signed word integers in xmm2/m128 and xmm1 for minimum values.
PMINUB Compare unsigned byte integers in xmm2/m128 and xmm1 for minimum values.
PMOVMSKB Move the byte mask of xmm to r32.
PSADBW Absolute difference of packed unsigned byte integers from xmm2/m128 and xmm1; the 8 low differences and 8 high differences are then summed separately to produce two word integer results.
PSHUFW Shuffle the words in mm2/m64 based on the encoding in imm8 and store the result in mm1.
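Two of the integer operations above, PAVGB and PSADBW, can be sketched on lists of unsigned bytes. The function names and the list representation are assumptions of this example; register width is implied only by the length of the lists.

```python
def pavgb(a, b):
    """Average unsigned bytes with rounding: (x + y + 1) >> 1 per lane."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]


def psadbw(a, b):
    """Sum of absolute differences, one result per group of 8 bytes."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return [sum(diffs[i:i + 8]) for i in range(0, len(diffs), 8)]


print(pavgb([1, 255], [2, 255]))                        # prints: [2, 255]
print(psadbw([10] * 8, [7, 13, 10, 0, 10, 10, 10, 10])) # prints: [16]
```

The intermediate sum in pavgb is wider than 8 bits, so the average of two large values does not overflow.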

SSE2 instructions

added with Pentium 4
also see integer instructions added with Pentium 4

SSE2 SIMD Floating-Point Instructions

Instruction Meaning Notes
ADDPD Add packed double-precision floating-point values from xmm2/m128 to xmm1.
ADDSD Add the low double-precision floating-point value from xmm2/m64 to xmm1.
ANDNPD Bitwise logical AND NOT of xmm2/m128 and xmm1.
ANDPD Bitwise logical AND of xmm2/m128 and xmm1.
CMPPD Compare packed double-precision floating-point numbers from xmm2/m128 with packed double-precision floating-point numbers in xmm1, using imm8 as comparison predicate.
CMPSD* Compare low double-precision floating-point value from xmm2/m64 with low double-precision floating-point value in xmm1 register using imm8 as comparison predicate.
COMISD Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly.
CVTDQ2PD Convert two packed signed doubleword integers from xmm2/m128 to two packed double-precision floating-point values in xmm1.
CVTDQ2PS Convert four packed signed doubleword integers from xmm2/m128 to four packed single-precision floating-point values in xmm1.
CVTPD2DQ Convert two packed double-precision floating-point values from xmm2/m128 to two packed signed doubleword integers in xmm1.
CVTPD2PI Convert two packed double-precision floating-point numbers from xmm/m128 to two packed signed doubleword integers in mm.
CVTPD2PS Convert two double-precision floating-point values in xmm2/m128 to two single-precision floating-point values in xmm1.
CVTPI2PD Convert two signed doubleword integers from mm/mem64 to two double-precision floating-point values in xmm.
CVTPS2DQ Convert four packed single-precision floating-point values from xmm2/m128 to four packed signed doubleword integers in xmm1.
CVTPS2PD Convert two packed single-precision floating-point values in xmm2/m64 to two packed double-precision floating-point values in xmm1.
CVTSD2SI Convert one double-precision floating-point number from xmm/m64 to one signed doubleword integer r32.
CVTSD2SS Convert one double-precision floating-point value in xmm2/m64 to one single-precision floating-point value in xmm1.
CVTSI2SD Convert one signed doubleword integer from r/m32 to one double-precision floating-point value in xmm.
CVTSS2SD Convert one single-precision floating-point value in xmm2/m32 to one double-precision floating-point value in xmm1.
CVTTPD2DQ Convert two packed double-precision floating-point values from xmm2/m128 to two packed signed doubleword integers in xmm1 using truncation.
CVTTPD2PI Convert two packed double-precision floating-point numbers from xmm/m128 to two packed signed doubleword integers in mm using truncation.
CVTTPS2DQ Convert four packed single-precision floating-point values from xmm2/m128 to four packed signed doubleword integers in xmm1 using truncation.
CVTTSD2SI Convert one double-precision floating-point number from xmm/m64 to one signed doubleword integer r32 using truncation.
DIVPD Divide packed double-precision floating-point values in xmm1 by packed double-precision floating-point values xmm2/m128.
DIVSD Divide low double-precision floating-point value in xmm1 by low double-precision floating-point value in xmm2/mem64.
MAXPD Return the maximum double-precision floating-point values between xmm2/m128 and xmm1.
MAXSD Return the maximum scalar double-precision floating-point value between xmm2/mem64 and xmm1.
MINPD Return the minimum double-precision floating-point values between xmm2/m128 and xmm1.
MINSD Return the minimum scalar double-precision floating-point value between xmm2/mem64 and xmm1.
MOVAPD Move Aligned Packed Double-Precision Floating-Point Values.
MOVHPD Move High Packed Double-Precision Floating-Point Value.
MOVLPD Move Low Packed Double-Precision Floating-Point Value.
MOVMSKPD Extract 2-bit sign mask from xmm and store in r32.
MOVSD* Move Scalar Double-Precision Floating-Point Value.
MOVUPD Move Unaligned Packed Double-Precision Floating-Point Values.
MULPD Multiply packed double-precision floating-point values in xmm2/m128 by xmm1.
MULSD Multiply the low double-precision floating-point value in xmm2/mem64 by low double-precision floating-point value in xmm1.
ORPD Bitwise OR of xmm2/m128 and xmm1.
SHUFPD Shuffle packed double-precision floating-point values selected by imm8 from xmm1 and xmm2/m128 to xmm1.
SQRTPD Computes square roots of the packed double-precision floating-point values in xmm2/m128 and stores the results in xmm1.
SQRTSD Computes square root of the low double-precision floating-point value in xmm2/m64 and stores the results in xmm1.
SUBPD Subtract packed double-precision floating-point values in xmm2/m128 from xmm1.
SUBSD Subtracts the low double-precision floating-point numbers in xmm2/mem64 from xmm1.
UCOMISD Compares (unordered) the low double-precision floating-point values in xmm1 and xmm2/m64 and sets the EFLAGS accordingly.
UNPCKHPD Interleaves double-precision floating-point values from the high quadwords of xmm1 and xmm2/m128.
UNPCKLPD Interleaves double-precision floating-point values from the low quadwords of xmm1 and xmm2/m128.
XORPD Bitwise exclusive-OR of xmm2/m128 and xmm1.
  • CMPSD and MOVSD share their mnemonics with the string instructions CMPSD (CMPS) and MOVSD (MOVS). The SSE2 forms operate on scalar double-precision floating-point values, whereas the string forms operate on integer strings.
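
The imm8 comparison predicate used by CMPPD and the SSE2 CMPSD can be sketched in portable C. The model below covers the eight SSE2 predicates (imm8 values 0 to 7) for one double-precision element and returns the all-ones or all-zeros mask that the instruction writes to the destination. The function name cmp_pred_model is hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one CMPPD/CMPSD element comparison.
   Returns an all-ones mask when the predicate holds, otherwise zero. */
uint64_t cmp_pred_model(double a, double b, unsigned imm8)
{
    int unordered = (a != a) || (b != b);   /* true if either operand is NaN */
    int result;

    switch (imm8 & 7) {
    case 0:  result = (a == b);  break;     /* EQ    */
    case 1:  result = (a < b);   break;     /* LT    */
    case 2:  result = (a <= b);  break;     /* LE    */
    case 3:  result = unordered; break;     /* UNORD */
    case 4:  result = !(a == b); break;     /* NEQ   */
    case 5:  result = !(a < b);  break;     /* NLT   */
    case 6:  result = !(a <= b); break;     /* NLE   */
    default: result = !unordered; break;    /* ORD   */
    }
    return result ? UINT64_MAX : 0;
}
```

Note that the negated predicates (NEQ, NLT, NLE) are true when an operand is NaN, so NLT is not the same test as "greater than or equal".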

SSE2 SIMD Integer Instructions

Instruction Meaning Notes
MOVDQ2Q Move low quadword from xmm to mmx register.
MOVDQA Move aligned double quadword from/to xmm2/m128 to/from xmm1.
MOVDQU Move unaligned double quadword from/to xmm2/m128 to/from xmm1.
MOVQ2DQ Move quadword from mmx to low quadword of xmm.
PADDQ Add packed quadword integers from xmm2/m128 to xmm1.
PMULUDQ Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results in xmm1.
PSHUFHW Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.
PSHUFLW Shuffle the low words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.
PSHUFD Shuffle the doublewords in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.
PSLLDQ Shift left xmm1 by imm8 bytes, clearing low-order bytes.
PSRLDQ Shift right xmm1 by imm8 bytes, clearing high-order bytes.
PUNPCKHQDQ Interleave high quadwords of xmm1 and xmm2/m128 into xmm1 register.
PUNPCKLQDQ Interleave low quadwords of xmm1 and xmm2/m128 into xmm1 register.
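
The doubleword selection of PSHUFD and the byte-wise shift of PSLLDQ can be modelled in portable C as below. This is a sketch of the semantics, assuming index 0 is the lowest element of the register; pshufd_model and pslldq_model are hypothetical names.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of PSHUFD dst, src, imm8: each 2-bit field of imm8
   selects which source doubleword goes to the matching destination slot. */
void pshufd_model(uint32_t dst[4], const uint32_t src[4], uint8_t imm8)
{
    uint32_t r[4];
    for (int i = 0; i < 4; i++)
        r[i] = src[(imm8 >> (2 * i)) & 3];
    memcpy(dst, r, sizeof r);       /* allows dst and src to be the same array */
}

/* Hypothetical model of PSLLDQ reg, imm8: shift the whole 128-bit value
   left by imm8 bytes; counts above 15 clear the register. */
void pslldq_model(uint8_t reg[16], unsigned imm8)
{
    unsigned n = imm8 > 15 ? 16 : imm8;
    memmove(reg + n, reg, 16 - n);  /* byte i moves to position i + n */
    memset(reg, 0, n);              /* vacated low-order bytes become zero */
}
```

With src = {10, 11, 12, 13} and imm8 = 0x1B, pshufd_model reverses the order and produces {13, 12, 11, 10}.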

SSE3 instructions

added with Pentium 4 SSE3
also see integer and floating-point instructions added with Pentium 4 SSE3

SSE3 SIMD Floating-Point Instructions

Instruction Meaning Notes
ADDSUBPD Add/Subtract packed DP FP numbers from XMM2/Mem to XMM1.
ADDSUBPS Add/Subtract packed SP FP numbers from XMM2/Mem to XMM1.
HADDPD Add horizontally packed DP FP numbers from XMM2/Mem to XMM1.
HADDPS Add horizontally packed SP FP numbers from XMM2/Mem to XMM1.
HSUBPD Subtract horizontally packed DP FP numbers in XMM2/Mem from XMM1.
HSUBPS Subtract horizontally packed SP FP numbers in XMM2/Mem from XMM1.
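
What "add/subtract" and "horizontally" mean for the single-precision forms can be sketched in portable C. The models below assume index 0 is the lowest element of the register; the names addsubps_model and haddps_model are hypothetical.

```c
/* Hypothetical model of ADDSUBPS dst, src: even-numbered elements are
   subtracted, odd-numbered elements are added. */
void addsubps_model(float dst[4], const float src[4])
{
    dst[0] = dst[0] - src[0];
    dst[1] = dst[1] + src[1];
    dst[2] = dst[2] - src[2];
    dst[3] = dst[3] + src[3];
}

/* Hypothetical model of HADDPS dst, src: adjacent pairs within each
   operand are summed; destination pairs fill the low half of the result,
   source pairs fill the high half. */
void haddps_model(float dst[4], const float src[4])
{
    float r[4];
    r[0] = dst[0] + dst[1];
    r[1] = dst[2] + dst[3];
    r[2] = src[0] + src[1];
    r[3] = src[2] + src[3];
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}
```

With dst = {1, 2, 3, 4} and src = {10, 20, 30, 40}, addsubps_model yields {-9, 22, -27, 44} and haddps_model yields {3, 7, 30, 70}.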

SSE3 SIMD Integer Instructions

Instruction Meaning Notes
MOVDDUP Move 64 bits representing the lower DP data element from XMM2/Mem to XMM1 register and duplicate.
MOVSHDUP Move 128 bits representing packed SP data elements from XMM2/Mem to XMM1 register and duplicate high.
MOVSLDUP Move 128 bits representing packed SP data elements from XMM2/Mem to XMM1 register and duplicate low.
LDDQU Load 128 bits from Mem to XMM register.
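
The duplication patterns of MOVDDUP, MOVSHDUP and MOVSLDUP can be written out as a portable C model. This is only a sketch of which element lands where, assuming index 0 is the lowest element; the function names are hypothetical.

```c
/* Hypothetical model of MOVDDUP: the low double of the source is
   copied into both halves of the destination. */
void movddup_model(double dst[2], const double src[2])
{
    double low = src[0];
    dst[0] = low;
    dst[1] = low;
}

/* Hypothetical model of MOVSHDUP: the odd-numbered (high) single of
   each pair is duplicated. */
void movshdup_model(float dst[4], const float src[4])
{
    float hi0 = src[1], hi1 = src[3];
    dst[0] = hi0; dst[1] = hi0;
    dst[2] = hi1; dst[3] = hi1;
}

/* Hypothetical model of MOVSLDUP: the even-numbered (low) single of
   each pair is duplicated. */
void movsldup_model(float dst[4], const float src[4])
{
    float lo0 = src[0], lo1 = src[2];
    dst[0] = lo0; dst[1] = lo0;
    dst[2] = lo1; dst[3] = lo1;
}
```

For src = {1, 2, 3, 4}, movshdup_model produces {2, 2, 4, 4} and movsldup_model produces {1, 1, 3, 3}.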

SSSE3 instructions

added with Xeon 5100 series and Core 2

Instruction Meaning Notes
PSIGNW Negate each packed word in the destination if the corresponding word in the source is negative; set it to zero if the source word is zero.
PSIGND Negate each packed doubleword in the destination if the corresponding doubleword in the source is negative; set it to zero if the source doubleword is zero.
PSIGNB Negate each packed byte in the destination if the corresponding byte in the source is negative; set it to zero if the source byte is zero.
PSHUFB Packed Shuffle Bytes.
PMULHRSW Packed Multiply High with Round and Scale.
PMADDUBSW Multiply and Add Packed Signed and Unsigned Bytes.
PHSUBW Packed Horizontal Subtract Word.
PHSUBSW Packed Horizontal Subtract and Saturate Words.
PHSUBD Packed Horizontal Subtract Doubleword.
PHADDW Packed Horizontal Add Word.
PHADDSW Packed Horizontal Add and Saturate Words.
PHADDD Packed Horizontal Add Doubleword.
PALIGNR Packed Align Right.
PABSW Packed Absolute Value Word.
PABSD Packed Absolute Value Doubleword.
PABSB Packed Absolute Value Byte.
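
The short names in this table hide some arithmetic detail. The portable C sketch below models one lane of PSIGNB and PMULHRSW and the byte extraction of PALIGNR on 128-bit operands. It assumes index 0 is the lowest byte and that right-shifting a negative signed integer is an arithmetic shift, which holds on common compilers but is implementation-defined in C. All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of one PSIGNB lane: a is the destination byte,
   b is the source byte whose sign controls the result. */
int8_t psignb_lane(int8_t a, int8_t b)
{
    if (b < 0)
        return (int8_t)(-a);
    if (b == 0)
        return 0;
    return a;
}

/* Hypothetical model of one PMULHRSW lane: 16x16 signed multiply, then
   keep bits 30:15 of the product after rounding. */
int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;
    return (int16_t)(((product >> 14) + 1) >> 1);
}

/* Hypothetical model of PALIGNR dst, src, imm8: concatenate dst (high)
   and src (low), shift right by imm8 bytes, keep the low 16 bytes. */
void palignr_model(uint8_t dst[16], const uint8_t src[16], unsigned imm8)
{
    uint8_t tmp[32], r[16];
    for (unsigned i = 0; i < 16; i++) {
        tmp[i] = src[i];
        tmp[16 + i] = dst[i];
    }
    for (unsigned i = 0; i < 16; i++)
        r[i] = (imm8 < 32 && i + imm8 < 32) ? tmp[i + imm8] : 0;
    for (unsigned i = 0; i < 16; i++)
        dst[i] = r[i];
}
```

Treating the words as Q15 fixed-point fractions, pmulhrsw_lane(16384, 16384) returns 8192, that is 0.5 × 0.5 = 0.25.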

External links