Instruction set: x86

Registers

IA-32 Architecture

General Purpose integer   (32 bit)  EAX EBX ECX EDX ESI EDI EBP ESP
Floating point registers  (80 bit)  st(0)-st(7) (8 level stack)
MMX integer registers     (64 bit)  MM0 - MM7 (shares bits with the FPU stack)
SSE floating point       (128 bit)  XMM0 - XMM7
Segment Registers         (16 bit)  CS  DS  SS  ES  FS  GS
Status and control        (32 bit)  EFLAGS  EIP  MXCSR

EAX - Accumulator for operands and results data.
EBX - Pointer to data in the DS segment.
ECX - Counter for string and loop operations.
EDX - I/O pointer.
ESI - Pointer to data in the segment pointed to by the DS register;
      source pointer for string operations.
EDI - Pointer to data (or destination) in the segment pointed to by the ES register;
      destination pointer for string operations.
ESP - Stack pointer (in the SS segment).
EBP - Pointer to data on the stack (in the SS segment).

AX BX CX DX - 16 bit register sharing bits with the 32 bit registers^*.
AH AL BH BL CH CL DH DL - 8 bit registers sharing bits with the 16 bit registers^*.

31     24 23     16 15      8  7      0
 00000000  00000000  00000000  00000000
 <--------- EAX EBX ECX EDX ---------->
                     <-- AX BX CX DX ->
                     <- AH ->  <- AL ->
                     <- BH ->  <- BL ->
                     <- CH ->  <- CL ->
                     <- DH ->  <- DL ->
 <--------- ESI EDI EBP ESP ---------->
                     <-- SI DI BP SP ->

^*Using the 16- and 8-bit registers can be slow on some processors (for example, through partial-register stalls on P6-family cores) and should be avoided where possible.
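
The aliasing shown in the diagram above can be modeled in a few lines of Python. This is only an illustrative sketch: the `Reg32` class and its attribute names (`e`, `x`, `h`, `l`) are invented for this example and do not come from any real library.

```python
class Reg32:
    """Toy model of a 32-bit register such as EAX with its AX/AH/AL views."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF   # full 32-bit value (EAX)

    @property
    def x(self):                      # low 16 bits (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, v):                   # writing AX leaves bits 31..16 untouched
        self.e = (self.e & 0xFFFF0000) | (v & 0xFFFF)

    @property
    def h(self):                      # bits 15..8 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, v):
        self.e = (self.e & 0xFFFF00FF) | ((v & 0xFF) << 8)

    @property
    def l(self):                      # bits 7..0 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, v):
        self.e = (self.e & 0xFFFFFF00) | (v & 0xFF)


eax = Reg32(0x12345678)
eax.h = 0xAB          # like MOV AH, 0ABh
eax.l = 0xCD          # like MOV AL, 0CDh
print(hex(eax.e))     # 0x1234abcd
print(hex(eax.x))     # 0xabcd
```

Writing to the 8-bit or 16-bit view changes only the corresponding bits of the 32-bit value, which is exactly what the diagram expresses.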

x86 Integer Instructions

This is the full 8086/8088 instruction set. Most, if not all, of these instructions are also available in 32-bit mode, where they operate on 32-bit registers (EAX, EBX, etc.) and values instead of their 16-bit counterparts (AX, BX, etc.). See also x86 assembly language for a quick tutorial.

Original 8086/8088 instructions

Instruction	Meaning	Notes
AAA	ASCII adjust AL after addition	used with unpacked binary-coded decimal
AAD	ASCII adjust AX before division	buggy in the original instruction set, but "fixed" in the NEC V20, causing a number of incompatibilities
AAM	ASCII adjust AX after multiplication
AAS	ASCII adjust AL after subtraction
ADC	Add with carry
ADD	Add
AND	Logical AND
CALL	Call procedure
CBW	Convert byte to word
CLC	Clear carry flag
CLD	Clear direction flag
CLI	Clear interrupt flag
CMC	Complement carry flag
CMP	Compare operands
CMPSB	Compare bytes in memory
CMPSW	Compare words
CWD	Convert word to doubleword
DAA	Decimal adjust AL after addition	(used with packed binary coded decimal)
DAS	Decimal adjust AL after subtraction
DEC	Decrement by 1
DIV	Unsigned divide
ESC	Used with floating-point unit
HLT	Enter halt state
IDIV	Signed divide
IMUL	Signed multiply
IN	Input from port
INC	Increment by 1
INT	Call to interrupt
INTO	Call to interrupt if overflow
IRET	Return from interrupt
Jcc	Jump if condition	(JA, JAE, JB, JBE, JC, JCXZ, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ)
JMP	Jump
LAHF	Load flags into AH register
LDS	Load pointer using DS
LEA	Load Effective Address
LES	Load ES with pointer
LOCK	Assert BUS LOCK# signal	(for multiprocessing)
LODSB	Load byte
LODSW	Load word
LOOP/LOOPx	Loop control	(LOOPE, LOOPNE, LOOPNZ, LOOPZ)
MOV	Move
MOVSB	Move byte from string to string
MOVSW	Move word from string to string
MUL	Unsigned multiply
NEG	Two's complement negation
NOP	No operation
NOT	Invert each bit of the operand (one's complement), logical NOT
OR	Logical OR
OUT	Output to port
POP	Pop data from stack
POPF	Pop data into flags register
PUSH	Push data onto stack
PUSHF	Push flags onto stack
RCL	Rotate left (with carry)
RCR	Rotate right (with carry)
REPxx	Repeat CMPS/MOVS/SCAS/STOS	(REP, REPE, REPNE, REPNZ, REPZ)
RET	Return from procedure
RETN	Return from near procedure
RETF	Return from far procedure
ROL	Rotate left
ROR	Rotate right
SAHF	Store AH into flags
SAL	Shift Arithmetically left (multiply)
SAR	Shift Arithmetically right (signed divide)
SBB	Subtraction with borrow
SCASB	Compare byte string
SCASW	Compare word string
SHL	Shift left (multiply)
SHR	Shift right (unsigned divide)
STC	Set carry flag
STD	Set direction flag
STI	Set interrupt flag
STOSB	Store byte in string
STOSW	Store word in string
SUB	Subtraction
TEST	Logical compare (AND)
WAIT	Wait until not busy	Waits until BUSY# pin is inactive (used with floating-point unit)
XCHG	Exchange data
XLAT	Table look-up translation
XOR	Exclusive OR
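
As an illustration of how the carry flag links instructions from this table together, the following sketch models ADD followed by ADC to add two 32-bit numbers 16 bits at a time, as 8086 code would have to. The helper names `add16` and `add32_with_16bit_ops` are hypothetical, and only the carry flag is modeled.

```python
MASK16 = 0xFFFF

def add16(a, b, carry_in=0):
    """Model ADD (carry_in=0) or ADC (carry_in=CF) on 16-bit operands.

    Returns (result, carry_out); the other flags are not modeled.
    """
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16

def add32_with_16bit_ops(x, y):
    """Add two 32-bit values as an 8086 would: ADD the low words, ADC the high words."""
    lo, cf = add16(x & MASK16, y & MASK16)     # ADD AX, BX
    hi, cf = add16(x >> 16, y >> 16, cf)       # ADC DX, CX
    return (hi << 16) | lo, cf

result, carry = add32_with_16bit_ops(0x0001FFFF, 0x00000001)
print(hex(result), carry)   # 0x20000 0
```

The carry out of the low-word addition is consumed by ADC, so the pair behaves like one wider addition.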

Added with 80186/80188

Instruction	Meaning	Notes
BOUND	Check if r16 or r32 (array index) is within bounds specified by m16&16 or m32&32
ENTER	Create a (possibly nested) stack frame for a procedure
INSB	Input byte from port DX into ES:(E)DI
INSW	Input word from port DX into ES:(E)DI
LEAVE	Set SP to BP, then pop BP or Set ESP to EBP, then pop EBP
OUTSB	Output byte from memory location specified in DS:(E)SI to I/O port specified in DX
OUTSW	Output word from memory location specified in DS:(E)SI to I/O port specified in DX
POPA	Pop DI, SI, BP, BX, DX, CX, and AX
PUSHA	Push AX, CX, DX, BX, original SP, BP, SI, and DI
PUSHW
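
The push and pop orders listed for PUSHA and POPA can be checked with a small model. This is a sketch under simplifying assumptions: registers live in a plain dictionary, the stack is a dictionary keyed by address, and the function names are made up for the example.

```python
PUSHA_ORDER = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def pusha(regs, stack):
    """Push all eight 16-bit registers; the SP value pushed is the one before PUSHA."""
    original_sp = regs["SP"]
    for name in PUSHA_ORDER:
        value = original_sp if name == "SP" else regs[name]
        regs["SP"] = (regs["SP"] - 2) & 0xFFFF
        stack[regs["SP"]] = value

def popa(regs, stack):
    """Pop in reverse order; the stored SP image is skipped, not loaded."""
    for name in reversed(PUSHA_ORDER):
        value = stack[regs["SP"]]
        regs["SP"] = (regs["SP"] + 2) & 0xFFFF
        if name != "SP":
            regs[name] = value

regs = {"AX": 1, "CX": 2, "DX": 3, "BX": 4, "SP": 0x100, "BP": 5, "SI": 6, "DI": 7}
stack = {}
saved = dict(regs)
pusha(regs, stack)
regs.update(AX=0, CX=0, DX=0, BX=0, BP=0, SI=0, DI=0)   # clobber everything but SP
popa(regs, stack)
print(regs == saved)   # True
```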

Added with 80286

Instruction	Meaning	Notes
ARPL	Adjust RPL of r/m16 to not less than RPL of r16
CLTS	Clears TS flag in CR0
LAR	r16 <- r/m16 masked by FF00H or r32 <- r/m32 masked by 00FxFF00H
LGDT	Load m into GDTR
LIDT	Load m into IDTR
LLDT	Load segment selector r/m16 into LDTR
LMSW	Loads r/m16 in machine status word of CR0
LOADALL	LOADALL loads all of the CPU registers, including the "hidden" software-invisible registers. At the completion of a LOADALL instruction, the entire CPU state is defined according to the LOADALL data table.	Undocumented, emulated in BIOS on some computers.
LSL	Load: r16 <- segment limit, selector r/m16 or Load: r32 <- segment limit, selector r/m32
LTR	Load r/m16 into task register
SGDT	Store GDTR to m
SIDT	Store IDTR to m
SLDT	Stores segment selector from LDTR in r/m16 or Store segment selector from LDTR in low-order 16 bits of r/m32
SMSW	Store machine status word to r/m16 or Store machine status word in low-order 16 bits of r32/m16; high-order 16 bits of r32 are undefined
STR	Stores segment selector from TR in r/m16
VERR	Set ZF=1 if segment specified with r/m16 can be read
VERW	Set ZF=1 if segment specified with r/m16 can be written

Added with 80386

Instruction	Meaning	Notes
BSF	Bit scan forward on r/m
BSR	Bit scan reverse on r/m
BT	Bit Test, Store selected bit in CF flag
BTC	Store selected bit in CF flag and complement
BTR	Store selected bit in CF flag and clear
BTS	Store selected bit in CF flag and set
CDQ	EDX:EAX <- sign-extend of EAX
CMPSD	Compares doubleword at address DS:(E)SI with doubleword at address ES:(E)DI and sets the status flags accordingly
CWDE	EAX <- sign-extend of AX
INSD
IRETD	Interrupt return (32-bit operand size)
IRETDF
IRETF
JECXZ	Jump short if ECX register is 0
LFS	Load FS:r16 or FS:r32 with far pointer from memory
LGS	Load GS:r16 or GS:r32 with far pointer from memory
LSS	Load SS:r16 or SS:r32 with far pointer from memory
LODSD	Load doubleword at address DS:(E)SI into EAX
LOOPD
LOOPED
LOOPNED
LOOPNZD
LOOPZD
MOVSD
MOVSX	Move with Sign-Extension
MOVZX	Move with Zero-Extend
OUTSD	Output doubleword from memory location specified in DS:(E)SI to I/O port specified in DX
POPAD	Pop EDI, ESI, EBP, EBX, EDX, ECX, and EAX
POPFD	Pop top of stack into EFLAGS
PUSHAD	Push EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI
PUSHD
PUSHFD
SCASD	Compare EAX with doubleword at ES:(E)DI and set status flags
SETcc	Conditionally set byte.
SHLD	Double Precision Shift Left
SHRD	Double Precision Shift Right
STOSD	Store EAX at address ES:(E)DI
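
A few of the 80386 additions are easy to express as pure functions. The sketch below models BSF, BSR, MOVZX and MOVSX for illustration only. The function names are hypothetical, and returning `None` for a zero source is a stand-in for the hardware behaviour (ZF set, destination undefined).

```python
def bsf(value):
    """Bit scan forward: index of the lowest set bit, or None if value is 0."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1   # isolate lowest set bit

def bsr(value):
    """Bit scan reverse: index of the highest set bit, or None if value is 0."""
    if value == 0:
        return None
    return value.bit_length() - 1

def movzx8(byte):
    """MOVZX from an 8-bit source: upper bits become zero."""
    return byte & 0xFF

def movsx8(byte, width=32):
    """MOVSX from an 8-bit source: bit 7 is copied into all upper bits."""
    byte &= 0xFF
    if byte & 0x80:
        return byte | (((1 << width) - 1) & ~0xFF)
    return byte

print(bsf(0b10100), bsr(0b10100))             # 2 4
print(hex(movzx8(0x80)), hex(movsx8(0x80)))   # 0x80 0xffffff80
```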

Added with 80486

Instruction	Meaning	Notes
BSWAP	Reverses the byte order of a 32-bit register.
CMPXCHG	Compare and Exchange
CPUID	Returns processor identification and feature information to the EAX, EBX, ECX, and EDX registers, according to the input value entered initially in the EAX register.
INVD	Flush internal caches; initiate flushing of external caches.
INVLPG	Invalidate TLB Entry for page that contains m
RSM	Resume operation of interrupted program
WBINVD	Write back and flush Internal caches; initiate writing-back and flushing of external caches.
XADD	Exchange and Add
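
BSWAP and XADD can be sketched as follows. The function names are invented for this example, registers are represented as plain integers, and flags are not modeled.

```python
def bswap32(value):
    """BSWAP: reverse the byte order of a 32-bit value."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")

def xadd32(dest, src):
    """XADD dest, src: dest receives the sum, src receives the old dest.

    Returns (new_dest, new_src).
    """
    return (dest + src) & 0xFFFFFFFF, dest & 0xFFFFFFFF

print(hex(bswap32(0x12345678)))   # 0x78563412
print(xadd32(10, 5))              # (15, 10)
```

Because XADD hands back the previous value of the destination, it is commonly used with a LOCK prefix as a fetch-and-add primitive.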

Added with Pentium

Instruction	Meaning	Notes
CMPXCHG8B	Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX.
RDMSR	Load MSR specified by ECX into EDX:EAX
RDPMC*	Read performance-monitoring counter specified by ECX into EDX:EAX	RDPMC was introduced in the Pentium Pro processor and the Pentium processor with MMX technology.
RDTSC	Read time-stamp counter into EDX:EAX
WRMSR	Write the value in EDX:EAX to MSR specified by ECX
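
The CMPXCHG8B description above translates almost line for line into code. In this sketch memory is a dictionary holding one 64-bit integer per address, and the function name and return convention are assumptions made for the example. Atomicity, which is the point of the real instruction, is not modeled.

```python
def cmpxchg8b(memory, addr, edx, eax, ecx, ebx):
    """Model CMPXCHG8B m64. Returns (zf, edx, eax) after the instruction."""
    expected = (edx << 32) | eax
    current = memory[addr]
    if current == expected:
        memory[addr] = (ecx << 32) | ebx      # equal: store ECX:EBX, set ZF
        return 1, edx, eax
    # not equal: clear ZF and load the current m64 into EDX:EAX
    return 0, current >> 32, current & 0xFFFFFFFF

mem = {0x1000: 0x0000000100000002}

# First attempt fails because EDX:EAX (0:0) does not match memory.
zf, edx, eax = cmpxchg8b(mem, 0x1000, 0, 0, 0xAAAA, 0xBBBB)
print(zf, edx, eax)               # 0 1 2

# Retrying with the values just loaded succeeds.
zf, edx, eax = cmpxchg8b(mem, 0x1000, edx, eax, 0xAAAA, 0xBBBB)
print(zf, hex(mem[0x1000]))       # 1 0xaaaa0000bbbb
```

The fail-then-retry pattern in the usage lines is the usual shape of a compare-and-swap loop.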

Added with Pentium Pro

Instruction	Meaning	Notes
CMOVcc	Conditional move memory to register or register to register.
SYSENTER	Transition to System Call Entry Point	Equivalent for AMD is SYSCALL
SYSEXIT	Transition from System Call Entry Point	Equivalent for AMD is SYSRET

Added with Pentium III

as part of the SSE branding

Instruction	Meaning	Notes
MASKMOVQ	Selectively write bytes from mm1 to memory location using the byte mask in mm2
MOVNTPS	Move packed single-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy.
MOVNTQ	Move quadword from mm to m64, minimizing pollution in the cache hierarchy.
PREFETCHT0	Move data specified by address closer to the processor using T0 hint.
PREFETCHT1	Move data specified by address closer to the processor using T1 hint.
PREFETCHT2	Move data specified by address closer to the processor using T2 hint.
PREFETCHNTA	Move data specified by address closer to the processor using NTA hint.
SFENCE	Serializes store operations.	(for Cacheability and Memory Ordering)

Added with Pentium 4

as part of the SSE2 branding

Instruction	Meaning	Notes
CLFLUSH	Flushes cache line containing m8.
LFENCE	Serializes load operations.
MASKMOVDQU	Selectively write bytes from xmm1 to memory location using the byte mask in xmm2.
MFENCE	Serializes load and store operations.
MOVNTDQ	Move double quadword from xmm to m128, minimizing pollution in the cache hierarchy.
MOVNTI	Move doubleword from r32 to m32, minimizing pollution in the cache hierarchy.
MOVNTPD	Move packed double-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy.
PAUSE	Delays execution of next instruction an implementation-specific amount of time.

Added with Pentium 4 supporting SSE3

only on processors supporting Hyper-threading
as part of the SSE3 branding

Instruction	Meaning	Notes
MONITOR	Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be of a write-back memory caching type.
MWAIT	A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events.

Added with Pentium 4 6x2

VMX is intended to support virtualization of processor hardware and a system software layer acting as a host to multiple guest software environments. The virtual-machine extensions (VMX) include five instructions that manage the virtual-machine control structure (VMCS) and five instructions that manage VMX operation. Additional details of VMX are described in the IA-32 Intel Architecture Software Developer’s Manual, Volume 3B.

Instruction	Meaning	Notes
VMPTRLD	This instruction takes a single 64-bit source operand that is in memory. It makes the referenced VMCS active and current, loading the current-VMCS pointer with this operand and establishes the current VMCS based on the contents of VMCS-data area in the referenced VMCS region.
VMPTRST	This instruction takes a single 64-bit destination operand that is in memory. The current-VMCS pointer is stored into the destination operand.
VMCLEAR	This instruction takes a single 64-bit operand that is in memory. The instruction sets the launch state of the VMCS referenced by the operand to “clear”, renders that VMCS inactive, and ensures that data for the VMCS have been written to the VMCS-data area in the referenced VMCS region.
VMREAD	This instruction reads a component from the VMCS (the encoding of that field is given in a register operand) and stores it into a destination operand that may be a register or in memory.
VMWRITE	This instruction writes a component to the VMCS (the encoding of that field is given in a register operand) from a source operand that may be a register or in memory.
VMCALL	This instruction allows a guest in VMX non-root operation to call the VMM for service. A VM exit occurs, transferring control to the VMM.
VMLAUNCH	This instruction launches a virtual machine managed by the VMCS. A VM entry occurs, transferring control to the VM.
VMRESUME	This instruction resumes a virtual machine managed by the VMCS. A VM entry occurs, transferring control to the VM.
VMXOFF	This instruction causes the processor to leave VMX operation.
VMXON	This instruction takes a single 64-bit source operand that is in memory. It causes a logical processor to enter VMX root operation and to use the memory referenced by the operand to support VMX operation.

Added with x86-64

Instruction	Meaning	Notes
CMPXCHG16B	Compare and Exchange Sixteen Bytes

x87 Floating Point Instructions

Original 8087 instructions

Instruction	Meaning	Notes
F2XM1	Replace ST(0) with (2^ST(0) - 1)
FABS	Replace ST with its absolute value.
FADD	Add m to ST or add ST to ST(i) or ST(i) to ST
FADDP	Add ST(0) to ST(i), store result in ST(i), and pop the register stack.
FBLD	Convert BCD value to real and push onto the FPU stack.
FBSTP	Store ST(0) in m80bcd and pop ST(0).
FCHS	Complements sign of ST(0)
FCLEX	Clear floating-point exception flags after checking for pending unmasked floating-point exceptions.
FCOM	Compare ST(0) with m or ST(i)
FCOMP	Compare ST(0) with m or ST(i), pop the stack.
FCOMPP	Compare ST(0) with ST(1) and pop register stack twice.
FDECSTP	Decrement TOP field in FPU status word.
FDISI	Sets the interrupt enable mask in the control word.
FDIV	Divide ST(0) by m and store result in ST(0), Divide ST(0) by ST(i) and store result in ST(0), Divide ST(i) by ST(0) and store result in ST(i)
FDIVP	Divide ST(i) by ST(0), store result in ST(i), and pop the register stack, Divide ST(1) by ST(0), store result in ST(1), and pop the register stack.
FDIVR	Reverse Divide
FDIVRP	Divide ST(0) by ST(i), store result in ST(i), and pop the register stack
FENI	Clears the interrupt enable mask in the control word.
FFREE	Sets tag for ST(i) to empty.
FIADD	Integer add m to ST(0) and store result in ST(0).
FICOM	Integer compare ST(0) with m.
FICOMP	Integer compare ST(0) with m, pop the stack.
FIDIV	Integer divide ST(0) by m and store result in ST(0)
FIDIVR	Integer divide m by ST(0) and store result in ST(0)
FILD	Integer push m onto the FPU register stack.
FIMUL	Integer multiply ST(0) by m and store result in ST(0)
FINCSTP	Increment the TOP field in the FPU status register
FINIT	Initialize FPU after checking for pending unmasked floating-point exceptions.
FIST	Integer store ST(0) in m.
FISTP	Integer store ST(0) in m, pop the stack.
FISUB	Integer subtract m from ST(0) and store result in ST(0)
FISUBR	Integer subtract ST(0) from m and store result in ST(0)
FLD	Push m or ST(i) onto the FPU register stack.
FLD1	Push +1.0 onto the FPU register stack.
FLDCW	Load FPU control word from m2byte.
FLDENV	Load FPU environment from m14byte or m28byte.
FLDENVW
FLDL2E	Push log₂e onto the FPU register stack.
FLDL2T	Push log₂10 onto the FPU register stack.
FLDLG2	Push log₁₀2 onto the FPU register stack.
FLDLN2	Push logₑ2 onto the FPU register stack.
FLDPI	Push π onto the FPU register stack.
FLDZ	Push +0.0 onto the FPU register stack.
FMUL	Multiply ST(0) by m or ST(i) and store result in ST(0), or multiply ST(i) by ST(0) and store result in ST(i).
FMULP	Multiply ST(i) by ST(0), store result in ST(i), and pop the register stack.
FNCLEX	Clear floating-point exception flags without checking for pending unmasked floating-point exceptions.
FNDISI	Sets the interrupt enable mask in the control word.
FNENI	Clears the interrupt enable mask in the control word.
FNINIT	Initialize FPU without checking for pending unmasked floating-point exceptions.
FNOP	No operation is performed.
FNSAVE	Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
FNSAVEW
FNSTCW	Store FPU control word to m2byte without checking for pending unmasked floating-point exceptions.
FNSTENV	Store FPU environment to m14byte or m28byte without checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
FNSTENVW
FNSTSW	Store FPU status word at m2byte or in AX without checking for pending unmasked floating-point exceptions.
FPATAN	Replace ST(1) with arctan(ST(1)/ST(0)) and pop the register stack
FPREM	Replace ST(0) with the remainder obtained from dividing ST(0) by ST(1)
FPTAN	Replace ST(0) with its tangent and push 1 onto the FPU stack.
FRNDINT	Round ST(0) to an integer.
FRSTOR	Load FPU state from m94byte or m108byte.
FRSTORW
FSAVE	Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
FSAVEW
FSCALE	Scale ST(0) by ST(1).
FSQRT	Computes square root of ST(0) and stores the result in ST(0)
FST	Copy ST(0) to m or ST(i)
FSTCW	Store FPU control word to m2byte after checking for pending unmasked floating-point exceptions.
FSTENV	Store FPU environment to m14byte or m28byte after checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
FSTENVW
FSTP	Copy ST(0) to m or ST(i) and pop register stack.
FSTSW	Store FPU status word at m2byte or in AX after checking for pending unmasked floating-point exceptions.
FSUB	Subtract m from ST(0) and store result in ST(0) or Subtract ST(i) from ST(0) and store result in ST(0) or Subtract ST(0) from ST(i) and store result in ST(i)
FSUBP	Subtract ST(0) from ST(i), store result in ST(i), and pop register stack
FSUBR	Subtract ST(0) from m32real or ST(i) and store result in ST(0) or Subtract ST(i) from ST(0) and store result in ST(i)
FSUBRP	Subtract ST(i) from ST(0), store result in ST(i), and pop register stack
FTST	Compare ST(0) with 0.0.
FWAIT	Check pending unmasked floating-point exceptions.
FXAM	Classify value or number in ST(0)
FXCH	Exchange the contents of ST(0) and ST(i)
FXTRACT	Separate value in ST(0) into exponent and significand, store exponent in ST(0), and push the significand onto the register stack.
FYL2X	Replace ST(1) with (ST(1) * log₂ST(0)) and pop the register stack
FYL2XP1	Replace ST(1) with (ST(1) * log₂(ST(0) + 1.0)) and pop the register stack
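
Many of the descriptions above refer to pushing onto and popping from the FPU register stack. The following toy model may help in reading them. It is a sketch only: the `X87Stack` class is invented for this example, values are ordinary Python floats rather than 80-bit extended precision, and a Python exception stands in for the hardware's stack-overflow condition.

```python
import math

class X87Stack:
    """Toy model of the x87 register stack; st[0] is ST(0), the top of stack."""

    def __init__(self):
        self.st = []

    def fld(self, value):        # FLD m: push a value
        if len(self.st) == 8:
            raise OverflowError("the x87 stack holds only 8 registers")
        self.st.insert(0, float(value))

    def fxch(self, i=1):         # FXCH ST(i): exchange ST(0) and ST(i)
        self.st[0], self.st[i] = self.st[i], self.st[0]

    def faddp(self, i=1):        # FADDP ST(i), ST(0): ST(i) += ST(0), then pop
        self.st[i] += self.st[0]
        self.st.pop(0)

    def fsubp(self, i=1):        # FSUBP ST(i), ST(0): ST(i) -= ST(0), then pop
        self.st[i] -= self.st[0]
        self.st.pop(0)

    def fpatan(self):            # FPATAN: ST(1) = arctan(ST(1)/ST(0)), then pop
        self.st[1] = math.atan2(self.st[1], self.st[0])
        self.st.pop(0)

    def fstp(self):              # FSTP m: return ST(0) and pop it
        return self.st.pop(0)

fpu = X87Stack()
fpu.fld(1.0)                     # y, ends up in ST(1)
fpu.fld(1.0)                     # x, in ST(0)
fpu.fpatan()
angle = fpu.fstp()
print(angle == math.atan2(1.0, 1.0))   # True

fpu.fld(10.0)
fpu.fld(4.0)
fpu.fsubp()                      # 10.0 - 4.0
print(fpu.fstp())                # 6.0
```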

Added with 80287

Instruction	Meaning	Notes
FSETPM	Set Protected Mode	Only used on 80287

Added with 80387

Instruction	Meaning	Notes
FCOS	Replace ST(0) with its cosine
FLDENVD	Load FPU environment from m28byte.
FNSAVED	Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
FNSTENVD	Store FPU environment to m14byte or m28byte without checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
FPREM1	Replace ST(0) with the IEEE remainder obtained from dividing ST(0) by ST(1)
FRSTORD	Load FPU state from m94byte or m108byte.
FSAVED	Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
FSIN	Replace ST(0) with its sine.
FSINCOS	Compute the sine and cosine of ST(0); replace ST(0) with the sine, and push the cosine onto the register stack.
FSTENVD	Store FPU environment to m14byte or m28byte after checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions.
FUCOM	Performs an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results.
FUCOMP	Performs an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results and pops the stack.
FUCOMPP	Performs an unordered comparison of the contents of register ST(0) and ST(1) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results and pops the stack twice.
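
The difference between FPREM (8087) and FPREM1 (80387) is the rounding applied to the quotient. Python's `math.fmod` and `math.remainder` use the same two conventions, so they can illustrate the final results. This is a sketch of the arithmetic only: real hardware may need several iterations to finish a reduction, and that partial-remainder behaviour is not modeled here.

```python
import math

def fprem(st0, st1):
    """FPREM result after full reduction: quotient truncated toward zero."""
    return math.fmod(st0, st1)

def fprem1(st0, st1):
    """FPREM1 result after full reduction: quotient rounded to nearest (IEEE 754)."""
    return math.remainder(st0, st1)

print(fprem(7.0, 4.0))    # 3.0   (7 - 1*4)
print(fprem1(7.0, 4.0))   # -1.0  (7 - 2*4)
```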

Added with Pentium Pro

Instruction	Meaning	Notes
FCMOVB	Move if below (CF=1)
FCMOVBE	Move if below or equal (CF=1 or ZF=1)
FCMOVE	Move if equal (ZF=1)
FCMOVNB	Move if not below (CF=0)
FCMOVNBE	Move if not below or equal (CF=0 and ZF=0)
FCMOVNE	Move if not equal (ZF=0)
FCMOVNU	Move if not unordered (PF=0)
FCMOVU	Move if unordered (PF=1)
FCOMI	Compare ST(0) with ST(i) and set status flags accordingly
FCOMIP	Compare ST(0) with ST(i), set status flags accordingly, and pop register stack
FUCOMI	Compare ST(0) with ST(i), check for ordered values, and set status flags accordingly
FUCOMIP	Compare ST(0) with ST(i), check for ordered values, set status flags accordingly, and pop register stack
FXRSTOR	Loads x87 FPU, MMX™ technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2 state from m512byte.
FXSAVE	Saves x87 FPU, MMX™ technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2 state to m512byte.

Added with Pentium 4 supporting SSE3

as part of the SSE3 branding

Instruction	Meaning	Notes
FISTTP	Store ST as a signed integer (truncate) in m and pop ST

SIMD Instructions

MMX instructions

added with Pentium MMX

Instruction	Meaning	Notes
EMMS	Empty MMX state	This instruction must be executed before executing floating-point instructions.
MOVD	Move doubleword
MOVQ	Move quadword
PACKSSDW	Pack doublewords into words (signed with saturation)
PACKSSWB	Pack words into bytes (signed with saturation)
PACKUSWB	Pack words into bytes (unsigned with saturation)
PADDB	Add with wrap-around on byte
PADDD	Add with wrap-around on doubleword
PADDSB	Add signed with saturation on byte
PADDSW	Add signed with saturation on word
PADDUSB	Add unsigned with saturation on byte
PADDUSW	Add unsigned with saturation on word
PADDW	Add with wrap-around on word
PAND	Bitwise AND
PANDN	Bitwise AND NOT
PCMPEQB	Packed compare for equality byte
PCMPEQD	Packed compare for equality double word
PCMPEQW	Packed compare for equality word
PCMPGTB	Packed compare greater than byte
PCMPGTD	Packed compare greater than double word
PCMPGTW	Packed compare greater than word
PMADDWD	Packed multiply on words and add resulting pairs
PMULHW	Packed multiply high on words
PMULLW	Packed multiply low on words
POR	Bitwise OR
PSLLD	Packed shift left logical doubleword by amount specified in MMX register or by immediate value
PSLLQ	Packed shift left logical quadword by amount specified in MMX register or by immediate value
PSLLW	Packed shift left logical word by amount specified in MMX register or by immediate value
PSRAD	Packed shift right arithmetic doubleword by amount specified in MMX register or by immediate value
PSRAW	Packed shift right arithmetic word by amount specified in MMX register or by immediate value
PSRLD	Packed shift right logical doubleword by amount specified in MMX register or by immediate value
PSRLQ	Packed shift right logical quadword by amount specified in MMX register or by immediate value
PSRLW	Packed shift right logical word by amount specified in MMX register or by immediate value
PSUBB	Subtract with wrap-around on byte
PSUBD	Subtract with wrap-around on doubleword
PSUBSB	Subtract signed with saturation on byte
PSUBSW	Subtract signed with saturation on word
PSUBUSB	Subtract unsigned with saturation on byte
PSUBUSW	Subtract unsigned with saturation on word
PSUBW	Subtract with wrap-around on word
PUNPCKHBW	Unpack (interleave) high-order bytes from MMX register
PUNPCKHDQ	Unpack (interleave) high-order doublewords from MMX register
PUNPCKHWD	Unpack (interleave) high-order words from MMX register
PUNPCKLBW	Unpack (interleave) low-order bytes from MMX register
PUNPCKLDQ	Unpack (interleave) low-order doublewords from MMX register
PUNPCKLWD	Unpack (interleave) low-order words from MMX register
PXOR	Bitwise XOR
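
The table distinguishes wrap-around arithmetic from signed and unsigned saturation. The sketch below shows the difference for byte lanes, plus the multiply-and-add-pairs behaviour of PMADDWD. Lanes are represented as Python lists of integers, which is an assumption of this example and not how the registers are laid out in hardware.

```python
def paddb(a, b):
    """PADDB: wrap-around add on unsigned byte lanes."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]

def paddusb(a, b):
    """PADDUSB: unsigned saturating add; results clamp at 255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]

def paddsb(a, b):
    """PADDSB: signed saturating add; lanes are integers in -128..127."""
    return [max(-128, min(127, x + y)) for x, y in zip(a, b)]

def pmaddwd(a, b):
    """PMADDWD: multiply signed word lanes, then add adjacent pairs of products.

    The single wrap-around case (both pairs equal to -32768) is not modeled.
    """
    products = [x * y for x, y in zip(a, b)]
    return [products[i] + products[i + 1] for i in range(0, len(products), 2)]

print(paddb([250, 1], [10, 2]))              # [4, 3]
print(paddusb([250, 1], [10, 2]))            # [255, 3]
print(paddsb([100, -100], [100, -100]))      # [127, -128]
print(pmaddwd([1, 2, 3, 4], [5, 6, 7, 8]))   # [17, 53]
```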

Extended MMX (MMX+) instructions

added with 6x86MX from Cyrix; supported on other CPUs too, e.g. as Extended MMX on Athlon 64

3DNow! instructions

added with K6-2

Instruction	Meaning	Notes
FEMMS	Faster Enter/Exit of the MMX or floating-point state.
PAVGUSB	Average of unsigned packed 8-bit values.
PF2ID	Converts packed floating-point operand to packed 32-bit integer.
PFACC	Floating-point accumulate.
PFADD	Packed, floating-point addition.
PFCMPEQ	Packed floating-point comparison, equal to.
PFCMPGE	Packed floating-point comparison, greater than or equal to.
PFCMPGT	Packed floating-point comparison, greater than.
PFMAX	Packed floating-point maximum.
PFMIN	Packed floating-point minimum.
PFMUL	Packed floating-point multiplication.
PFRCP	Floating-point reciprocal approximation.
PFRCPIT1	Packed floating-point reciprocal, first iteration step.
PFRCPIT2	Packed floating-point reciprocal/reciprocal square root, second iteration step.
PFRSQIT1	Packed floating-point reciprocal square root, first iteration step.
PFRSQRT	Floating-point reciprocal square root approximation.
PFSUB	Packed floating-point subtraction.
PFSUBR	Packed floating-point reverse subtraction.
PI2FD	Packed 32-bit integer to floating-point conversion.
PMULHRW	Multiply signed packed 16-bit values with rounding and store the high 16 bits.
PREFETCH	Prefetch processor cache line into L1 data cache (Dcache).
PREFETCHW	Prefetch processor cache line into L1 data cache (Dcache) in anticipation of a write.

3DNow!+ instructions

added with Athlon

Instruction	Meaning	Notes
PF2IW	Packed Floating-Point to Integer Word Conversion with Sign Extend
PFNACC	Packed Floating-Point Negative Accumulate
PFPNACC	Packed Floating-Point Mixed Positive-Negative Accumulate
PI2FW	Packed Integer Word to Floating-Point Conversion
PSWAPD	Packed Swap Doubleword

SSE instructions

added with Pentium III
also see integer instructions added with Pentium III

SSE SIMD Floating-Point Instructions

Instruction	Meaning	Notes
ADDPS	Add packed single-precision floating-point values from xmm2/m128 to xmm1.
ADDSS	Add the low single-precision floating-point value from xmm2/m32 to xmm1.
ANDNPS	Bitwise logical AND NOT of xmm2/m128 and xmm1.
ANDPS	Bitwise logical AND of xmm2/m128 and xmm1.
CMPPS	Compare packed single-precision floating-point values from xmm2/mem with packed single-precision floating-point values in xmm1 register using imm8 as comparison predicate.
CMPSS	Compare low single-precision floating-point value from xmm2/m32 with low single-precision floating-point value in xmm1 register using imm8 as comparison predicate.
COMISS	Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly.
CVTPI2PS	Convert two signed doubleword integers from mm/m64 to two single-precision floating-point values in xmm.
CVTPS2PI	Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm.
CVTSI2SS	Convert one signed doubleword integer from r/m32 to one single-precision floating-point number in xmm.
CVTSS2SI	Convert one single-precision floating-point number from xmm/m32 to one signed doubleword integer in r32.
CVTTPS2PI	Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm using truncation.
CVTTSS2SI	Convert one single-precision floating-point number from xmm/m32 to one signed doubleword integer r32 using truncation.
DIVPS	Divide packed single-precision floating-point values in xmm1 by packed single-precision floating-point values xmm2/m128.
DIVSS	Divide low single-precision floating-point value in xmm1 by low single-precision floating-point value in xmm2/m32
LDMXCSR	Load Streaming SIMD Extension control/status word from m32.
MAXPS	Return the maximum single-precision floating-point values between xmm2/m128 and xmm1.
MAXSS	Return the maximum scalar single-precision floating-point value between xmm2/mem32 and xmm1.
MINPS	Return the minimum single-precision floating-point values between xmm2/m128 and xmm1.
MINSS	Return the minimum scalar single-precision floating-point value between xmm2/mem32 and xmm1.
MOVAPS	Move aligned packed single-precision floating-point numbers between xmm2/m128 and xmm1.
MOVHLPS	Move two packed single-precision floating-point values from high quadword of xmm2 to low quadword of xmm1.
MOVHPS	Move two packed single-precision floating-point values from/to m64 to/from high quadword of xmm.
MOVLHPS	Move two packed single-precision floating-point values from low quadword of xmm2 to high quadword of xmm1.
MOVLPS	Move two packed single-precision floating-point values from/to m64 to/from low quadword of xmm.
MOVMSKPS	Extract 4-bit sign mask from xmm and store in r32.
MOVNTPS	Move packed single-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy.
MOVSS	Move scalar single-precision floating-point value between xmm2/m32 and xmm1 register.
MOVUPS	Move unaligned packed single-precision floating-point numbers between xmm2/m128 and xmm1.
MULPS	Multiply packed single-precision floating-point values in xmm2/mem by xmm1.
MULSS	Multiply the low single-precision floating-point value in xmm2/mem by the low single-precision floating-point value in xmm1.
ORPS	Bitwise OR of xmm2/m128 and xmm1
RCPPS	Returns to xmm1 the packed approximations of the reciprocals of the packed single-precision floating-point values in xmm2/m128.
RCPSS	Returns to xmm1 an approximation of the reciprocal of the low single-precision floating-point value in xmm2/m32.
RSQRTPS	Returns to xmm1 the packed approximations of the reciprocals of the square roots of the packed single-precision floating-point values in xmm2/m128.
RSQRTSS	Returns to xmm1 an approximation of the reciprocal of the square root of the low single-precision floating-point value in xmm2/m32.
SHUFPS	Shuffle packed single-precision floating-point values selected by imm8 from xmm1 and xmm2/m128 to xmm1.
SQRTPS	Computes square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1.
SQRTSS	Computes square root of the low single-precision floating-point value in xmm2/m32 and stores the results in xmm1.
STMXCSR	Store Multimedia Extended Control Status Register.
SUBPS	Subtract packed single-precision floating-point values in xmm2/mem from xmm1.
SUBSS	Subtract the low single-precision floating-point value in xmm2/m32 from xmm1.
UCOMISS	Compare low single-precision floating-point value in xmm1 register with low single-precision floating-point value in xmm2/mem and set the status flags accordingly.
UNPCKHPS	Interleaves single-precision floating-point values from the high quadwords of xmm1 and xmm2/mem into xmm1.
UNPCKLPS	Interleaves single-precision floating-point values from the low quadwords of xmm1 and xmm2/mem into xmm1.
XORPS	Bitwise exclusive-OR of xmm2/m128 and xmm1.
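
The imm8 operand of SHUFPS packs four 2-bit lane selectors: the two low result lanes are taken from the first operand and the two high result lanes from the second. The sketch below models this and the sign-mask extraction of MOVMSKPS. Registers are represented as lists of four floats with index 0 as the lowest lane, which is a convention chosen for this example.

```python
import math

def shufps(xmm1, xmm2, imm8):
    """SHUFPS xmm1, xmm2, imm8 on lists of four floats (index 0 = lowest lane)."""
    return [
        xmm1[imm8 & 3],           # result lane 0 <- xmm1, selector bits 1:0
        xmm1[(imm8 >> 2) & 3],    # result lane 1 <- xmm1, selector bits 3:2
        xmm2[(imm8 >> 4) & 3],    # result lane 2 <- xmm2, selector bits 5:4
        xmm2[(imm8 >> 6) & 3],    # result lane 3 <- xmm2, selector bits 7:6
    ]

def movmskps(xmm):
    """MOVMSKPS: collect the sign bit of each lane into a 4-bit mask."""
    mask = 0
    for i, x in enumerate(xmm):
        if math.copysign(1.0, x) < 0:   # true for negative values and -0.0
            mask |= 1 << i
    return mask

a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
print(shufps(a, b, 0x1B))                      # [4.0, 3.0, 6.0, 5.0]
print(shufps(a, a, 0x00))                      # [1.0, 1.0, 1.0, 1.0]
print(bin(movmskps([1.0, -2.0, 3.0, -0.0])))   # 0b1010
```

Using the same register for both operands, as in the second call, is the usual way to broadcast or permute lanes within one register.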

SSE SIMD Integer Instructions

Instruction	Meaning	Notes
PAVGB	Average packed unsigned byte integers from mm2/m64 and mm1, with rounding.
PAVGW	Average packed unsigned word integers from mm2/m64 and mm1, with rounding.
PEXTRW	Extract the word specified by imm8 from mm and move it to a r32.
PINSRW	Move the low word of r32 or from m16 into mm at the word position specified by imm8.
PMAXSW	Compare signed word integers in mm2/m64 and mm1 for maximum values.
PMAXUB	Compare unsigned byte integers in mm2/m64 and mm1 for maximum values.
PMINSW	Compare signed word integers in mm2/m64 and mm1 for minimum values.
PMINUB	Compare unsigned byte integers in mm2/m64 and mm1 for minimum values.
PMOVMSKB	Move the byte mask of mm to r32.
PSADBW	Absolute difference of packed unsigned byte integers from mm2/m64 and mm1; the 8 differences are then summed to produce a word integer result.
PSHUFW	Shuffle the words in mm2/m64 based on the encoding in imm8 and store the result in mm1.
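
Two of the integer instructions above have arithmetic that is easy to get wrong by one: PAVGB rounds halves up, and PSADBW produces one sum per group of eight bytes. A minimal sketch follows, with lanes represented as Python lists of unsigned byte values (an assumption made for this example).

```python
def pavgb(a, b):
    """PAVGB: per-byte unsigned average with rounding, (x + y + 1) >> 1."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

def psadbw(a, b):
    """PSADBW: sum of absolute byte differences, one sum per group of 8 bytes."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    return [sum(diffs[i:i + 8]) for i in range(0, len(diffs), 8)]

print(pavgb([1, 255], [2, 255]))                     # [2, 255]
print(psadbw([0] * 8, [1, 2, 3, 4, 5, 6, 7, 8]))     # [36]
```

The intermediate sum in `pavgb` is wider than a byte, so an average of two large values does not overflow.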

SSE2 instructions

added with Pentium 4
also see integer instructions added with Pentium 4

SSE2 SIMD Floating-Point Instructions

Instruction	Meaning	Notes
ADDPD	Add packed double-precision floating-point values from xmm2/m128 to xmm1.
ADDSD	Add the low double-precision floating-point value from xmm2/m64 to xmm1.
ANDNPD	Bitwise logical AND NOT of xmm2/m128 and xmm1.
ANDPD	Bitwise logical AND of xmm2/m128 and xmm1.
CMPPD	Compare packed double-precision floating-point numbers from xmm2/m128 with packed double-precision floating-point numbers in xmm1, using imm8 as comparison predicate.
CMPSD*	Compare low double-precision floating-point value from xmm2/m64 with low double-precision floating-point value in xmm1 register using imm8 as comparison predicate.
COMISD	Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly.
CVTDQ2PD	Convert two packed signed doubleword integers from xmm2/m128 to two packed double-precision floating-point values in xmm1.
CVTDQ2PS	Convert four packed signed doubleword integers from xmm2/m128 to four packed single-precision floating-point values in xmm1.
CVTPD2DQ	Convert two packed double-precision floating-point values from xmm2/m128 to two packed signed doubleword integers in xmm1.
CVTPD2PI	Convert two packed double-precision floating-point numbers from xmm/m128 to two packed signed doubleword integers in mm.
CVTPD2PS	Convert two double-precision floating-point values in xmm2/m128 to two single-precision floating-point values in xmm1.
CVTPI2PD	Convert two signed doubleword integers from mm/mem64 to two double-precision floating-point values in xmm.
CVTPS2DQ	Convert four packed single-precision floating-point values from xmm2/m128 to four packed signed doubleword integers in xmm1.
CVTPS2PD	Convert two packed single-precision floating-point values in xmm2/m64 to two packed double-precision floating-point values in xmm1.
CVTSD2SI	Convert one double-precision floating-point number from xmm/m64 to one signed doubleword integer r32.
CVTSD2SS	Convert one double-precision floating-point value in xmm2/m64 to one single-precision floating-point value in xmm1.
CVTSI2SD	Convert one signed doubleword integer from r/m32 to one double-precision floating-point value in xmm.
CVTSS2SD	Convert one single-precision floating-point value in xmm2/m32 to one double-precision floating-point value in xmm1.
CVTTPD2DQ	Convert two packed double-precision floating-point values from xmm2/m128 to two packed signed doubleword integers in xmm1 using truncation.
CVTTPD2PI	Convert two packed double-precision floating-point numbers from xmm/m128 to two packed signed doubleword integers in mm using truncation.
CVTTPS2DQ	Convert four packed single-precision floating-point values from xmm2/m128 to four packed signed doubleword integers in xmm1 using truncation.
CVTTSD2SI	Convert one double-precision floating-point number from xmm/m64 to one signed doubleword integer r32 using truncation.
DIVPD	Divide packed double-precision floating-point values in xmm1 by packed double-precision floating-point values xmm2/m128.
DIVSD	Divide low double-precision floating-point value in xmm1 by low double-precision floating-point value in xmm2/mem64.
MAXPD	Return the maximum double-precision floating-point values between xmm2/m128 and xmm1.
MAXSD	Return the maximum scalar double-precision floating-point value between xmm2/mem64 and xmm1.
MINPD	Return the minimum double-precision floating-point values between xmm2/m128 and xmm1.
MINSD	Return the minimum scalar double-precision floating-point value between xmm2/mem64 and xmm1.
MOVAPD	Move Aligned Packed Double-Precision Floating-Point Values.
MOVHPD	Move High Packed Double-Precision Floating-Point Value.
MOVLPD	Move Low Packed Double-Precision Floating-Point Value.
MOVMSKPD	Extract 2-bit sign mask from xmm and store in r32.
MOVSD*	Move Scalar Double-Precision Floating-Point Value
MOVUPD	Move Unaligned Packed Double-Precision Floating-Point Values.
MULPD	Multiply packed double-precision floating-point values in xmm2/m128 by xmm1.
MULSD	Multiply the low double-precision floating-point value in xmm2/mem64 by low double-precision floating-point value in xmm1.
ORPD	Bitwise OR of xmm2/m128 and xmm1.
SHUFPD	Shuffle packed double-precision floating-point values selected by imm8 from xmm1 and xmm2/m128 to xmm1.
SQRTPD	Computes square roots of the packed double-precision floating-point values in xmm2/m128 and stores the results in xmm1.
SQRTSD	Computes square root of the low double-precision floating-point value in xmm2/m64 and stores the results in xmm1.
SUBPD	Subtract packed double-precision floating-point values in xmm2/m128 from xmm1.
SUBSD	Subtracts the low double-precision floating-point numbers in xmm2/mem64 from xmm1.
UCOMISD	Compares (unordered) the low double-precision floating-point values in xmm1 and xmm2/m64 and sets the EFLAGS accordingly.
UNPCKHPD	Interleaves double-precision floating-point values from the high quadwords of xmm1 and xmm2/m128.
UNPCKLPD	Interleaves double-precision floating-point values from the low quadwords of xmm1 and xmm2/m128.
XORPD	Bitwise exclusive-OR of xmm2/m128 and xmm1.

CMPSD and MOVSD share their mnemonics with the string instructions CMPSD (CMPS) and MOVSD (MOVS); the former operate on scalar double-precision floating-point values, whereas the latter operate on integer strings.
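
The imm8 comparison predicate used by CMPPD and the floating-point CMPSD can be sketched as a lookup table. This is a minimal Python model under stated assumptions: only the low three bits of imm8 are modeled, the names `PREDICATES`, `ALL_ONES` and `cmppd` are hypothetical, and each result lane is shown as a 64-bit integer mask rather than a double.

```python
import math

# Predicate encodings 0..7; the "not" forms are true when an operand is NaN.
PREDICATES = {
    0: ("EQ",    lambda a, b: a == b),
    1: ("LT",    lambda a, b: a < b),
    2: ("LE",    lambda a, b: a <= b),
    3: ("UNORD", lambda a, b: math.isnan(a) or math.isnan(b)),
    4: ("NEQ",   lambda a, b: not (a == b)),
    5: ("NLT",   lambda a, b: not (a < b)),
    6: ("NLE",   lambda a, b: not (a <= b)),
    7: ("ORD",   lambda a, b: not (math.isnan(a) or math.isnan(b))),
}

ALL_ONES = 0xFFFFFFFFFFFFFFFF  # a lane whose comparison was true

def cmppd(xmm1, xmm2, imm8):
    """Model of CMPPD: each result lane is all ones or all zeros."""
    _, test = PREDICATES[imm8 & 7]
    return [ALL_ONES if test(a, b) else 0 for a, b in zip(xmm1, xmm2)]

nan = float("nan")
print([hex(m) for m in cmppd([1.0, 2.0], [2.0, 2.0], 1)])   # LT
print([hex(m) for m in cmppd([nan, 1.0], [1.0, 1.0], 3)])   # UNORD
```

Because the result is a mask rather than a flag, it is typically combined with ANDPD, ANDNPD and ORPD to select values per lane without branching.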

SSE2 SIMD Integer Instructions

Instruction	Meaning	Notes
MOVDQ2Q	Move low quadword from xmm to mmx register.
MOVDQA	Move aligned double quadword from/to xmm2/m128 to/from xmm1.
MOVDQU	Move unaligned double quadword from/to xmm2/m128 to/from xmm1.
MOVQ2DQ	Move quadword from mmx to low quadword of xmm.
PADDQ	Add packed quadword integers in xmm2/m128 to xmm1.
PMULUDQ	Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results in xmm1.
PSHUFHW	Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.
PSHUFLW	Shuffle the low words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.
PSHUFD	Shuffle the doublewords in xmm2/m128 based on the encoding in imm8 and store the result in xmm1.
PSLLDQ	Shift left xmm1 by imm8 bytes, clearing low-order bits.
PSRLDQ	Shift right xmm1 by imm8 bytes, clearing high-order bits.
PUNPCKHQDQ	Interleave high quadwords of xmm1 and xmm2/m128 into xmm1 register.
PUNPCKLQDQ	Interleave low quadwords of xmm1 and xmm2/m128 into xmm1 register.
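
The "encoding in imm8" used by PSHUFD is four 2-bit source indices, one per destination doubleword. The Python sketch below models that selection, together with the byte-wise shift of PSLLDQ. The function names are hypothetical, and registers are shown as lists with the lowest element first.

```python
def pshufd(src, imm8):
    """Model of PSHUFD: destination dword i takes the source dword
    selected by bits [2*i+1 : 2*i] of imm8."""
    return [src[(imm8 >> (2 * i)) & 3] for i in range(4)]

def pslldq(src_bytes, imm8):
    """Model of PSLLDQ on 16 bytes (lowest byte first): shift toward the
    high end by imm8 bytes, filling with zeros. Counts above 15 clear all."""
    n = min(imm8, 16)
    return [0] * n + src_bytes[:16 - n]

d = [10, 11, 12, 13]
print(pshufd(d, 0xE4))   # [10, 11, 12, 13]  identity (11 10 01 00)
print(pshufd(d, 0x1B))   # [13, 12, 11, 10]  reversed (00 01 10 11)
print(pshufd(d, 0x00))   # [10, 10, 10, 10]  broadcast of dword 0
```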

SSE3 instructions

added with Pentium 4 SSE3
also see integer and floating-point instructions added with Pentium 4 SSE3

SSE3 SIMD Floating-Point Instructions

Instruction	Meaning	Notes
ADDSUBPD	Add/Subtract packed DP FP numbers from XMM2/Mem to XMM1.
ADDSUBPS	Add/Subtract packed SP FP numbers from XMM2/Mem to XMM1.
HADDPD	Add horizontally packed DP FP numbers from XMM2/Mem to XMM1.
HADDPS	Add horizontally packed SP FP numbers from XMM2/Mem to XMM1.
HSUBPD	Subtract horizontally packed DP FP numbers in XMM2/Mem from XMM1.
HSUBPS	Subtract horizontally packed SP FP numbers in XMM2/Mem from XMM1.
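
"Horizontal" here means that the arithmetic combines adjacent lanes within one operand instead of matching lanes across operands. A small Python model of the single-precision forms illustrates the lane layout; the function names are hypothetical and index 0 is taken to be the lowest lane.

```python
def addsubps(xmm1, xmm2):
    """Model of ADDSUBPS: subtract in even lanes, add in odd lanes."""
    return [a - b if i % 2 == 0 else a + b
            for i, (a, b) in enumerate(zip(xmm1, xmm2))]

def haddps(xmm1, xmm2):
    """Model of HADDPS: pairwise sums of xmm1, then pairwise sums of xmm2."""
    return [xmm1[0] + xmm1[1], xmm1[2] + xmm1[3],
            xmm2[0] + xmm2[1], xmm2[2] + xmm2[3]]

def hsubps(xmm1, xmm2):
    """Model of HSUBPS: lower lane minus upper lane of each adjacent pair."""
    return [xmm1[0] - xmm1[1], xmm1[2] - xmm1[3],
            xmm2[0] - xmm2[1], xmm2[2] - xmm2[3]]

a = [1.0, 2.0, 3.0, 4.0]
b = [10.0, 20.0, 30.0, 40.0]
print(addsubps(a, b))   # [-9.0, 22.0, -27.0, 44.0]
print(haddps(a, b))     # [3.0, 7.0, 30.0, 70.0]
print(hsubps(a, b))     # [-1.0, -1.0, -10.0, -10.0]
```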

SSE3 SIMD Integer Instructions

Instruction	Meaning	Notes
MOVDDUP	Move 64 bits representing the lower DP data element from XMM2/Mem to XMM1 register and duplicate.
MOVSHDUP	Move 128 bits representing packed SP data elements from XMM2/Mem to XMM1 register and duplicate high.
MOVSLDUP	Move 128 bits representing packed SP data elements from XMM2/Mem to XMM1 register and duplicate low.
LDDQU	Load 128 bits from Mem to XMM register.
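
The three duplicating moves differ only in which source lanes they replicate. As a rough illustration, the following Python model shows the resulting lane order; the function names are hypothetical and index 0 is the lowest lane.

```python
def movsldup(src):
    """Model of MOVSLDUP: duplicate the even (low) single-precision lanes."""
    return [src[0], src[0], src[2], src[2]]

def movshdup(src):
    """Model of MOVSHDUP: duplicate the odd (high) single-precision lanes."""
    return [src[1], src[1], src[3], src[3]]

def movddup(src):
    """Model of MOVDDUP: copy the low double-precision lane into both lanes."""
    return [src[0], src[0]]

s = [1.0, 2.0, 3.0, 4.0]
print(movsldup(s))          # [1.0, 1.0, 3.0, 3.0]
print(movshdup(s))          # [2.0, 2.0, 4.0, 4.0]
print(movddup([5.0, 6.0]))  # [5.0, 5.0]
```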

SSSE3 instructions

added with Xeon 5100 series and Core 2

Instruction	Meaning	Notes
PSIGNW	Negate each packed signed word of the destination if the corresponding word of the source is negative; zero it if the source word is zero.
PSIGND	Negate each packed signed doubleword of the destination if the corresponding doubleword of the source is negative; zero it if the source doubleword is zero.
PSIGNB	Negate each packed signed byte of the destination if the corresponding byte of the source is negative; zero it if the source byte is zero.
PSHUFB	Packed Shuffle Bytes.
PMULHRSW	Packed Multiply High with Round and Scale.
PMADDUBSW	Multiply and Add Packed Signed and Unsigned Bytes.
PHSUBW	Packed Horizontal Subtract Word.
PHSUBSW	Packed Horizontal Subtract and Saturate Words.
PHSUBD	Packed Horizontal Subtract Doubleword.
PHADDW	Packed Horizontal Add Word.
PHADDSW	Packed Horizontal Add and Saturate Words.
PHADDD	Packed Horizontal Add Doubleword.
PALIGNR	Packed Align Right.
PABSW	Packed Absolute Value Word.
PABSD	Packed Absolute Value Doubleword.
PABSB	Packed Absolute Value Byte.
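
Several of the SSSE3 entries are terse, so a worked model may help. The Python sketch below models PSIGNW and the 128-bit form of PSHUFB. The function names and the `_wrap16` helper are hypothetical, and registers are shown as lists with the lowest element first.

```python
def _wrap16(x):
    """Reduce an integer to a signed 16-bit value (two's complement)."""
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x

def psignw(dst, src):
    """Model of PSIGNW: negate, zero, or keep each word of dst depending on
    whether the matching word of src is negative, zero, or positive."""
    out = []
    for a, b in zip(dst, src):
        if b < 0:
            out.append(_wrap16(-a))   # -(-32768) wraps back to -32768
        elif b == 0:
            out.append(0)
        else:
            out.append(a)
    return out

def pshufb(dst, ctrl):
    """Model of 128-bit PSHUFB: each control byte selects a byte of dst by
    its low 4 bits, or yields zero when its top bit is set."""
    return [0 if c & 0x80 else dst[c & 0x0F] for c in ctrl]

print(psignw([5, -7, 9, -32768], [-1, 3, 0, -2]))   # [-5, -7, 0, -32768]
table = list(range(100, 116))
print(pshufb(table, [15, 0, 0x80, 3] + [0] * 12)[:4])   # [115, 100, 0, 103]
```

Because the control operand of PSHUFB is a register or memory value rather than an immediate, the byte permutation can be chosen at run time, which is why it is often used as a 16-entry byte lookup table.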

External links

8086/80186/80286/80386/80486 Instruction Set
Free IA-32 documentation, provided by Intel
Netwide Assembler x86 Instruction Reference
