Difference between revisions of "Instruction set: x86"
Revision as of 08:58, 17 February 2007
Contents
- 1 Registers
- 2 x86 Integer Instructions
- 2.1 Original 8086/8088 instructions
- 2.2 Added with 80186/80188
- 2.3 Added with 80286
- 2.4 Added with 80386
- 2.5 Added with 80486
- 2.6 Added with Pentium
- 2.7 Added with Pentium Pro
- 2.8 Added with Pentium III
- 2.9 Added with Pentium 4
- 2.10 Added with Pentium 4 supporting SSE3
- 2.11 Added with Pentium 4 6x2
- 2.12 Added with x86-64
- 3 x87 Floating Point Instructions
- 4 SIMD Instructions
- 5 External links
Registers
IA-32 Architecture
General-purpose integer registers (32 bit): EAX EBX ECX EDX ESI EDI EBP ESP
Floating-point registers (80 bit): st(0)-st(7) (8-level stack)
MMX integer registers (64 bit): MM0 - MM7 (share bits with the FPU stack)
SSE floating-point registers (128 bit): XMM0 - XMM7
Segment registers (16 bit): CS DS SS ES FS GS
Status and control (32 bit): EFLAGS EIP MXCSR

EAX - Accumulator for operands and results data.
EBX - Pointer to data in the DS segment.
ECX - Counter for string and loop operations.
EDX - I/O pointer.
ESI - Pointer to data in the segment pointed to by the DS register; source pointer for string operations.
EDI - Pointer to data (or destination) in the segment pointed to by the ES register; destination pointer for string operations.
ESP - Stack pointer (in the SS segment).
EBP - Pointer to data on the stack (in the SS segment).

AX BX CX DX - 16-bit registers sharing bits with the 32-bit registers*.
AH AL BH BL CH CL DH DL - 8-bit registers sharing bits with the 16-bit registers*.

 31    24 23    16 15     8 7      0
 00000000 00000000 00000000 00000000
 <--------- EAX EBX ECX EDX ---------->
                   <-- AX BX CX DX ->
                   <- AH -> <- AL ->
                   <- BH -> <- BL ->
                   <- CH -> <- CL ->
                   <- DH -> <- DL ->
 <--------- ESI EDI EBP ESP ---------->
                   <-- SI DI BP SP ->

*Using the 16 and 8 bit registers can be very slow on some processors and should be avoided if possible.
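The way the 8-bit and 16-bit names alias parts of a 32-bit register can be sketched in a few lines of Python. This is only a toy model for illustration: the class name `Reg32` and its attribute names (`e`, `x`, `h`, `l`) are assumptions made here, not part of any real tool.

```python
class Reg32:
    """Toy model of one 32-bit general-purpose register such as EAX."""

    def __init__(self, value=0):
        self.e = value & 0xFFFFFFFF  # full 32-bit view (EAX)

    @property
    def x(self):  # low 16 bits (AX)
        return self.e & 0xFFFF

    @x.setter
    def x(self, value):
        self.e = (self.e & 0xFFFF0000) | (value & 0xFFFF)

    @property
    def h(self):  # bits 15..8 (AH)
        return (self.e >> 8) & 0xFF

    @h.setter
    def h(self, value):
        self.e = (self.e & 0xFFFF00FF) | ((value & 0xFF) << 8)

    @property
    def l(self):  # bits 7..0 (AL)
        return self.e & 0xFF

    @l.setter
    def l(self, value):
        self.e = (self.e & 0xFFFFFF00) | (value & 0xFF)


eax = Reg32(0x12345678)
eax.h = 0xAB  # comparable to MOV AH, 0ABh
eax.l = 0xCD  # comparable to MOV AL, 0CDh
```

Writing to the 8-bit or 16-bit view changes only the matching bits of the 32-bit value; the upper 16 bits stay untouched.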
x86 Integer Instructions
This is the full 8086/8088 instruction set. Most, if not all, of these instructions are also available in 32-bit mode, where they operate on 32-bit registers (eax, ebx, etc.) and values instead of their 16-bit counterparts (ax, bx, etc.). See also x86 assembly language for a quick tutorial for this chip.
Original 8086/8088 instructions
Instruction | Meaning | Notes |
---|---|---|
AAA | ASCII adjust AL after addition | used with unpacked binary-coded decimal |
AAD | ASCII adjust AX before division | buggy in the original instruction set, but "fixed" in the NEC V20, causing a number of incompatibilities |
AAM | ASCII adjust AX after multiplication | |
AAS | ASCII adjust AL after subtraction | |
ADC | Add with carry | |
ADD | Add | |
AND | Logical AND | |
CALL | Call procedure | |
CBW | Convert byte to word | |
CLC | Clear carry flag | |
CLD | Clear direction flag | |
CLI | Clear interrupt flag | |
CMC | Complement carry flag | |
CMP | Compare operands | |
CMPSB | Compare bytes in memory | |
CMPSW | Compare words | |
CWD | Convert word to doubleword | |
DAA | Decimal adjust AL after addition | (used with packed binary coded decimal) |
DAS | Decimal adjust AL after subtraction | |
DEC | Decrement by 1 | |
DIV | Unsigned divide | |
ESC | Used with floating-point unit | |
HLT | Enter halt state | |
IDIV | Signed divide | |
IMUL | Signed multiply | |
IN | Input from port | |
INC | Increment by 1 | |
INT | Call to interrupt | |
INTO | Call to interrupt if overflow | |
IRET | Return from interrupt | |
Jcc | Jump if condition | (JA, JAE, JB, JBE, JC, JCXZ, JE, JG, JGE, JL, JLE, JNA, JNAE, JNB, JNBE, JNC, JNE, JNG, JNGE, JNL, JNLE, JNO, JNP, JNS, JNZ, JO, JP, JPE, JPO, JS, JZ) |
JMP | Jump | |
LAHF | Load flags into AH register | |
LDS | Load pointer using DS | |
LEA | Load Effective Address | |
LES | Load ES with pointer | |
LOCK | Assert BUS LOCK# signal | (for multiprocessing) |
LODSB | Load byte | |
LODSW | Load word | |
LOOP/LOOPx | Loop control | (LOOPE, LOOPNE, LOOPNZ, LOOPZ) |
MOV | Move | |
MOVSB | Move byte from string to string | |
MOVSW | Move word from string to string | |
MUL | Unsigned multiply | |
NEG | Two's complement negation | |
NOP | No operation | |
NOT | Negate the operand, logical NOT | |
OR | Logical OR | |
OUT | Output to port | |
POP | Pop data from stack | |
POPF | Pop data into flags register | |
PUSH | Push data onto stack | |
PUSHF | Push flags onto stack | |
RCL | Rotate left (with carry) | |
RCR | Rotate right (with carry) | |
REPxx | Repeat CMPS/MOVS/SCAS/STOS | (REP, REPE, REPNE, REPNZ, REPZ) |
RET | Return from procedure | |
RETN | Return from near procedure | |
RETF | Return from far procedure | |
ROL | Rotate left | |
ROR | Rotate right | |
SAHF | Store AH into flags | |
SAL | Shift Arithmetically left (multiply) | |
SAR | Shift Arithmetically right (signed divide) | |
SBB | Subtraction with borrow | |
SCASB | Compare byte string | |
SCASW | Compare word string | |
SHL | Shift left (multiply) | |
SHR | Shift right (unsigned divide) | |
STC | Set carry flag | |
STD | Set direction flag | |
STI | Set interrupt flag | |
STOSB | Store byte in string | |
STOSW | Store word in string | |
SUB | Subtraction | |
TEST | Logical compare (AND) | |
WAIT | Wait until not busy | Waits until BUSY# pin is inactive (used with floating-point unit) |
XCHG | Exchange data | |
XLAT | Table look-up translation | |
XOR | Exclusive OR |
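As an illustration of how ADD and ADC cooperate, the following Python sketch adds two 32-bit values using only 16-bit additions, passing the carry flag from the low word to the high word. The helper names `add16` and `add32_with_16bit_ops` are hypothetical and exist only for this example.

```python
MASK16 = 0xFFFF


def add16(a, b, carry_in=0):
    """Return (result, carry_out) of a 16-bit ADD (carry_in=0) or ADC (carry_in=CF)."""
    total = (a & MASK16) + (b & MASK16) + carry_in
    return total & MASK16, total >> 16


def add32_with_16bit_ops(x, y):
    """Add two 32-bit values the way 8086 code would: ADD the low words, ADC the high words."""
    lo, cf = add16(x & MASK16, y & MASK16)                    # ADD on the low words
    hi, cf = add16((x >> 16) & MASK16, (y >> 16) & MASK16, cf)  # ADC on the high words
    return (hi << 16) | lo, cf
```

The second value returned is the final carry flag, which would feed a further ADC if the operands were wider still.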
Added with 80186/80188
Instruction | Meaning | Notes |
---|---|---|
BOUND | Check if r16 or r32 (array index) is within bounds specified by m16&16 or m32&32 | |
ENTER | Create a <nested> stack frame for a procedure | |
INSB | Input byte from port DX into ES:(E)DI | |
INSW | Input word from port DX into ES:(E)DI | |
LEAVE | Set SP to BP, then pop BP or Set ESP to EBP, then pop EBP | |
OUTSB | Output byte from memory location specified in DS:(E)SI to I/O port specified in DX | |
OUTSW | Output word from memory location specified in DS:(E)SI to I/O port specified in DX | |
POPA | Pop DI, SI, BP, BX, DX, CX, and AX | |
PUSHA | Push AX, CX, DX, BX, original SP, BP, SI, and DI | |
PUSHW |
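The push and pop order of PUSHA and POPA listed above can be modelled as follows. This is a simplified sketch: registers are a plain dictionary, the stack is a dictionary from address to 16-bit word, and segment handling is ignored. All names are chosen for this example.

```python
PUSHA_ORDER = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]


def pusha(regs, stack):
    """Push the eight registers; the SP slot holds SP's value from before the first push."""
    original_sp = regs["SP"]
    for name in PUSHA_ORDER:
        value = original_sp if name == "SP" else regs[name]
        regs["SP"] = (regs["SP"] - 2) & 0xFFFF
        stack[regs["SP"]] = value


def popa(regs, stack):
    """Pop in reverse order; the saved SP slot is skipped rather than loaded."""
    for name in reversed(PUSHA_ORDER):
        value = stack[regs["SP"]]
        regs["SP"] = (regs["SP"] + 2) & 0xFFFF
        if name != "SP":
            regs[name] = value
```

After a matching `pusha`/`popa` pair, SP ends up at its starting value simply because eight words were pushed and eight were popped.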
Added with 80286
Instruction | Meaning | Notes |
---|---|---|
ARPL | Adjust RPL of r/m16 to not less than RPL of r16 | |
CLTS | Clears TS flag in CR0 | |
LAR | r16 <- r/m16 masked by FF00H or r32 <- r/m32 masked by 00FxFF00H | |
LGDT | Load m into GDTR | |
LIDT | Load m into IDTR | |
LLDT | Load segment selector r/m16 into LDTR | |
LMSW | Loads r/m16 in machine status word of CR0 | |
LOADALL | LOADALL loads all of the CPU registers, including the "hidden" software-invisible registers. At the completion of a LOADALL instruction, the entire CPU state is defined according to the LOADALL data table. | Undocumented, emulated in BIOS on some computers. |
LSL | Load: r16 <- segment limit, selector r/m16 or Load: r32 <- segment limit, selector r/m32 | |
LTR | Load r/m16 into task register | |
SGDT | Store GDTR to m | |
SIDT | Store IDTR to m | |
SLDT | Stores segment selector from LDTR in r/m16 or Store segment selector from LDTR in low-order 16 bits of r/m32 | |
SMSW | Store machine status word to r/m16 or Store machine status word in low-order 16 bits of r32/m16; high-order 16 bits of r32 are undefined | |
STR | Stores segment selector from TR in r/m16 | |
VERR | Set ZF=1 if segment specified with r/m16 can be read | |
VERW | Set ZF=1 if segment specified with r/m16 can be written |
Added with 80386
Instruction | Meaning | Notes |
---|---|---|
BSF | Bit scan forward on r/m | |
BSR | Bit scan reverse on r/m | |
BT | Bit Test, Store selected bit in CF flag | |
BTC | Store selected bit in CF flag and complement | |
BTR | Store selected bit in CF flag and clear | |
BTS | Store selected bit in CF flag and set | |
CDQ | EDX:EAX <- sign-extend of EAX | |
CMPSD | Compares doubleword at address DS:(E)SI with doubleword at address ES:(E)DI and sets the status flags accordingly | |
CWDE | EAX <- sign-extend of AX | |
INSD | Input doubleword from port DX into ES:(E)DI | |
IRETD | Interrupt return (32-bit operand size) | |
IRETDF | ||
IRETF | ||
JECXZ | Jump short if ECX register is 0 | |
LFS | Load FS:r16 or FS:r32 with far pointer from memory | |
LGS | Load GS:r16 or GS:r32 with far pointer from memory | |
LSS | Load SS:r16 or SS:r32 with far pointer from memory | |
LODSD | Load doubleword at address DS:(E)SI into EAX | |
LOOPD | ||
LOOPED | ||
LOOPNED | ||
LOOPNZD | ||
LOOPZD | ||
MOVSD | Move doubleword from address DS:(E)SI to address ES:(E)DI | |
MOVSX | Move with Sign-Extension | |
MOVZX | Move with Zero-Extend | |
OUTSD | Output doubleword from memory location specified in DS:(E)SI to I/O port specified in DX | |
POPAD | Pop EDI, ESI, EBP, EBX, EDX, ECX, and EAX | |
POPFD | Pop top of stack into EFLAGS | |
PUSHAD | Push EAX, ECX, EDX, EBX, original ESP, EBP, ESI, and EDI | |
PUSHD | ||
PUSHFD | Push EFLAGS onto the stack | |
SCASD | Compare EAX with doubleword at ES:(E)DI and set status flags | |
SETcc | Conditionally set byte. | |
SHLD | Double Precision Shift Left | |
SHRD | Double Precision Shift Right | |
STOSD | Store EAX at address ES:(E)DI |
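A rough Python model of four of the instructions above (BSF, BSR, MOVZX, MOVSX) may help make the one-line descriptions concrete. The function names mirror the mnemonics, but the signatures are assumptions for this sketch; returning `None` for a zero source stands in for the CPU setting ZF and leaving the destination undefined.

```python
def bsf(value):
    """Index of the lowest set bit, or None when value is 0."""
    if value == 0:
        return None
    return (value & -value).bit_length() - 1


def bsr(value):
    """Index of the highest set bit, or None when value is 0."""
    if value == 0:
        return None
    return value.bit_length() - 1


def movzx(value, src_bits):
    """Zero-extend: keep the low src_bits bits, upper bits become 0."""
    return value & ((1 << src_bits) - 1)


def movsx(value, src_bits, dst_bits=32):
    """Sign-extend: copy the source's top bit into all upper destination bits."""
    value &= (1 << src_bits) - 1
    if value >> (src_bits - 1):   # sign bit set
        value -= 1 << src_bits    # interpret as a negative number
    return value & ((1 << dst_bits) - 1)
```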
Added with 80486
Instruction | Meaning | Notes |
---|---|---|
BSWAP | Reverses the byte order of a 32-bit register. | |
CMPXCHG | Compare and Exchange | |
CPUID | Returns processor identification and feature information to the EAX, EBX, ECX, and EDX registers, according to the input value entered initially in the EAX register. | |
INVD | Flush internal caches; initiate flushing of external caches. | |
INVLPG | Invalidate TLB Entry for page that contains m | |
RSM | Resume operation of interrupted program | |
WBINVD | Write back and flush Internal caches; initiate writing-back and flushing of external caches. | |
XADD | Exchange and Add |
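The data movement performed by BSWAP, CMPXCHG and XADD can be sketched in Python as below. Memory is modelled as a plain list of 32-bit values, and the LOCK prefix and atomicity are ignored entirely; the function signatures are assumptions made for this illustration.

```python
def bswap32(value):
    """Reverse the byte order of a 32-bit value."""
    return int.from_bytes((value & 0xFFFFFFFF).to_bytes(4, "little"), "big")


def cmpxchg(memory, addr, accumulator, source):
    """Compare accumulator with memory[addr]; return (new accumulator, ZF).

    If equal: store source to memory and set ZF.
    Otherwise: load the memory value into the accumulator and clear ZF.
    """
    if memory[addr] == accumulator:
        memory[addr] = source
        return accumulator, 1
    return memory[addr], 0


def xadd(memory, addr, source):
    """Memory receives the sum; the returned value is what the source register receives."""
    old = memory[addr]
    memory[addr] = (old + source) & 0xFFFFFFFF
    return old
```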
Added with Pentium
Instruction | Meaning | Notes |
---|---|---|
CMPXCHG8B | Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX. | |
RDMSR | Load MSR specified by ECX into EDX:EAX | |
RDPMC* | Read performance-monitoring counter specified by ECX into EDX:EAX | RDPMC was introduced in the Pentium Pro processor and the Pentium processor with MMX technology. |
RDTSC | Read time-stamp counter into EDX:EAX | |
WRMSR | Write the value in EDX:EAX to MSR specified by ECX |
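The CMPXCHG8B description above translates almost directly into code. The sketch below models memory as a dictionary holding one 64-bit integer per address and ignores atomicity; the function signature is an assumption for this example.

```python
MASK32 = 0xFFFFFFFF


def cmpxchg8b(memory, addr, edx, eax, ecx, ebx):
    """Return (edx, eax, zf) after comparing EDX:EAX with the 64-bit value at memory[addr]."""
    expected = ((edx & MASK32) << 32) | (eax & MASK32)
    current = memory[addr]
    if current == expected:
        # Equal: store ECX:EBX to memory and set ZF.
        memory[addr] = ((ecx & MASK32) << 32) | (ebx & MASK32)
        return edx, eax, 1
    # Not equal: load the memory value into EDX:EAX and clear ZF.
    return current >> 32, current & MASK32, 0
```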
Added with Pentium Pro
Instruction | Meaning | Notes |
---|---|---|
CMOVcc | Conditional move from register or memory to register. | |
SYSENTER | Transition to System Call Entry Point | Equivalent for AMD is SYSCALL |
SYSEXIT | Transition from System Call Entry Point | Equivalent for AMD is SYSRET |
Added with Pentium III
as part of the SSE branding
Instruction | Meaning | Notes |
---|---|---|
MASKMOVQ | Selectively write bytes from mm1 to memory location using the byte mask in mm2 | |
MOVNTPS | Move packed single-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy. | |
MOVNTQ | Move quadword from mm to m64, minimizing pollution in the cache hierarchy. | |
PREFETCHT0 | Move data specified by address closer to the processor using T0 hint. | |
PREFETCHT1 | Move data specified by address closer to the processor using T1 hint. | |
PREFETCHT2 | Move data specified by address closer to the processor using T2 hint. | |
PREFETCHNTA | Move data specified by address closer to the processor using NTA hint. | |
SFENCE | Serializes store operations. | (for Cacheability and Memory Ordering) |
Added with Pentium 4
as part of the SSE2 branding
Instruction | Meaning | Notes |
---|---|---|
CLFLUSH | Flushes cache line containing m8. | |
LFENCE | Serializes load operations. | |
MASKMOVDQU | Selectively write bytes from xmm1 to memory location using the byte mask in xmm2. | |
MFENCE | Serializes load and store operations. | |
MOVNTDQ | Move double quadword from xmm to m128, minimizing pollution in the cache hierarchy. | |
MOVNTI | Move doubleword from r32 to m32, minimizing pollution in the cache hierarchy. | |
MOVNTPD | Move packed double-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy. | |
PAUSE | Delays execution of next instruction an implementation-specific amount of time. |
Added with Pentium 4 supporting SSE3
only on processors supporting Hyper-threading
as part of the SSE3 branding
Instruction | Meaning | Notes |
---|---|---|
MONITOR | Sets up a linear address range to be monitored by hardware and activates the monitor. The address range should be of a write-back memory caching type. | |
MWAIT | A hint that allows the processor to stop instruction execution and enter an implementation-dependent optimized state until occurrence of a class of events. |
Added with Pentium 4 6x2
VMX is intended to support virtualization of processor hardware and a system software layer acting as a host to multiple guest software environments. The virtual-machine extensions (VMX) include five instructions that manage the virtual-machine control structure (VMCS) and five instructions that manage VMX operation. Additional details of VMX are described in IA-32 Intel Architecture Software Developer’s Manual, Volume 3B.
Instruction | Meaning | Notes |
---|---|---|
VMPTRLD | This instruction takes a single 64-bit source operand that is in memory. It makes the referenced VMCS active and current, loading the current-VMCS pointer with this operand and establishes the current VMCS based on the contents of VMCS-data area in the referenced VMCS region. | |
VMPTRST | This instruction takes a single 64-bit destination operand that is in memory. The current-VMCS pointer is stored into the destination operand. | |
VMCLEAR | This instruction takes a single 64-bit operand that is in memory. The instruction sets the launch state of the VMCS referenced by the operand to “clear”, renders that VMCS inactive, and ensures that data for the VMCS have been written to the VMCS-data area in the referenced VMCS region. | |
VMREAD | This instruction reads a component from the VMCS (the encoding of that field is given in a register operand) and stores it into a destination operand that may be a register or in memory. | |
VMWRITE | This instruction writes a component to the VMCS (the encoding of that field is given in a register operand) from a source operand that may be a register or in memory. | |
VMCALL | This instruction allows a guest in VMX non-root operation to call the VMM for service. A VM exit occurs, transferring control to the VMM. | |
VMLAUNCH | This instruction launches a virtual machine managed by the VMCS. A VM entry occurs, transferring control to the VM. | |
VMRESUME | This instruction resumes a virtual machine managed by the VMCS. A VM entry occurs, transferring control to the VM. | |
VMXOFF | This instruction causes the processor to leave VMX operation. | |
VMXON | This instruction takes a single 64-bit source operand that is in memory. It causes a logical processor to enter VMX root operation and to use the memory referenced by the operand to support VMX operation. |
Added with x86-64
Instruction | Meaning | Notes |
---|---|---|
CMPXCHG16B | Compare and Exchange Sixteen Bytes |
x87 Floating Point Instructions
Original 8087 instructions
Instruction | Meaning | Notes |
---|---|---|
F2XM1 | Replace ST(0) with (2^ST(0) - 1) | |
FABS | Replace ST with its absolute value. | |
FADD | Add m to ST or add ST to ST(i) or ST(i) to ST | |
FADDP | Add ST(0) to ST(i), store result in ST(i), and pop the register stack. | |
FBLD | Convert BCD value to real and push onto the FPU stack. | |
FBSTP | Store ST(0) in m80bcd and pop ST(0). | |
FCHS | Complements sign of ST(0) | |
FCLEX | Clear floating-point exception flags after checking for pending unmasked floating-point exceptions. | |
FCOM | Compare ST(0) with m or ST(i) | |
FCOMP | Compare ST(0) with m or ST(i), pop the stack. | |
FCOMPP | Compare ST(0) with ST(1) and pop register stack twice. | |
FDECSTP | Decrement TOP field in FPU status word. | |
FDISI | Sets the interrupt enable mask in the control word. | |
FDIV | Divide ST(0) by m and store result in ST(0), Divide ST(0) by ST(i) and store result in ST(0), Divide ST(i) by ST(0) and store result in ST(i) | |
FDIVP | Divide ST(i) by ST(0), store result in ST(i), and pop the register stack, Divide ST(1) by ST(0), store result in ST(1), and pop the register stack. | |
FDIVR | Reverse Divide | |
FDIVRP | Divide ST(0) by ST(i), store result in ST(i), and pop the register stack | |
FENI | Clears the interrupt enable mask in the control word. | |
FFREE | Sets tag for ST(i) to empty. | |
FIADD | Integer add m to ST(0) and store result in ST(0). | |
FICOM | Integer compare ST(0) with m. | |
FICOMP | Integer compare ST(0) with m, pop stack. | |
FIDIV | Integer divide ST(0) by m and store result in ST(0) | |
FIDIVR | Integer divide m by ST(0) and store result in ST(0) | |
FILD | Integer push m onto the FPU register stack. | |
FIMUL | Integer multiply ST(0) by m and store result in ST(0) | |
FINCSTP | Increment the TOP field in the FPU status register | |
FINIT | Initialize FPU after checking for pending unmasked floating-point exceptions. | |
FIST | Integer store ST(0) in m. | |
FISTP | Integer store ST(0) in m, pop the stack. | |
FISUB | Integer subtract m from ST(0) and store result in ST(0) | |
FISUBR | Integer subtract ST(0) from m and store result in ST(0) | |
FLD | Push m or ST(i) onto the FPU register stack. | |
FLD1 | Push +1.0 onto the FPU register stack. | |
FLDCW | Load FPU control word from m2byte. | |
FLDENV | Load FPU environment from m14byte or m28byte. | |
FLDENVW | ||
FLDL2E | Push log2(e) onto the FPU register stack. | |
FLDL2T | Push log2(10) onto the FPU register stack. | |
FLDLG2 | Push log10(2) onto the FPU register stack. | |
FLDLN2 | Push loge(2) onto the FPU register stack. | |
FLDPI | Push π onto the FPU register stack. | |
FLDZ | Push +0.0 onto the FPU register stack. | |
FMUL | Multiply ST(0) by m or ST(i) and store result in ST(0), or multiply ST(i) by ST(0) and store result in ST(i) | |
FMULP | Multiply ST(i) by ST(0), store result in ST(i), and pop the register stack | |
FNCLEX | Clear floating-point exception flags without checking for pending unmasked floating-point exceptions. | |
FNDISI | Sets the interrupt enable mask in the control word. | |
FNENI | Clears the interrupt enable mask in the control word. | |
FNINIT | Initialize FPU without checking for pending unmasked floating-point exceptions. | |
FNOP | No operation is performed. | |
FNSAVE | Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then re-initialize the FPU. | |
FNSAVEW | ||
FNSTCW | Store FPU control word to m2byte without checking for pending unmasked floating-point exceptions. | |
FNSTENV | Store FPU environment to m14byte or m28byte without checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions. | |
FNSTENVW | ||
FNSTSW | Store FPU status word at m2byte or in AX without checking for pending unmasked floating-point exceptions. | |
FPATAN | Replace ST(1) with arctan(ST(1)/ST(0)) and pop the register stack | |
FPREM | Replace ST(0) with the remainder obtained from dividing ST(0) by ST(1) | |
FPTAN | Replace ST(0) with its tangent and push 1 onto the FPU stack. | |
FRNDINT | Round ST(0) to an integer. | |
FRSTOR | Load FPU state from m94byte or m108byte. | |
FRSTORW | ||
FSAVE | Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then re-initialize the FPU. | |
FSAVEW | ||
FSCALE | Scale ST(0) by ST(1). | |
FSQRT | Computes square root of ST(0) and stores the result in ST(0) | |
FST | Copy ST(0) to m or ST(i) | |
FSTCW | Store FPU control word to m2byte after checking for pending unmasked floating-point exceptions. | |
FSTENV | Store FPU environment to m14byte or m28byte after checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions. | |
FSTENVW | ||
FSTP | Copy ST(0) to m or ST(i) and pop register stack. | |
FSTSW | Store FPU status word at m2byte or in AX after checking for pending unmasked floating-point exceptions. | |
FSUB | Subtract m from ST(0) and store result in ST(0) or Subtract ST(i) from ST(0) and store result in ST(0) or Subtract ST(0) from ST(i) and store result in ST(i) | |
FSUBP | Subtract ST(0) from ST(i), store result in ST(i), and pop register stack | |
FSUBR | Subtract ST(0) from m32real or ST(i) and store result in ST(0) or Subtract ST(i) from ST(0) and store result in ST(i) | |
FSUBRP | Subtract ST(i) from ST(0), store result in ST(i), and pop register stack | |
FTST | Compare ST(0) with 0.0. | |
FWAIT | Check pending unmasked floating-point exceptions. | |
FXAM | Classify value or number in ST(0) | |
FXCH | Exchange the contents of ST(0) and ST(i) | |
FXTRACT | Separate value in ST(0) into exponent and significand, store exponent in ST(0), and push the significand onto the register stack. | |
FYL2X | Replace ST(1) with (ST(1) * log2(ST(0))) and pop the register stack | |
FYL2XP1 | Replace ST(1) with (ST(1) * log2(ST(0) + 1.0)) and pop the register stack |
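The stack discipline behind these descriptions (push, operate on ST(0) and ST(1), pop) can be illustrated with a small Python model. It covers only a handful of instructions, uses ordinary double precision instead of the 80-bit format, and ignores tag words, exceptions and the operand range limits of the real instructions. The class `X87Stack` is hypothetical.

```python
import math


class X87Stack:
    """Toy model of the x87 register stack; st[0] plays the role of ST(0)."""

    def __init__(self):
        self.st = []

    def fld(self, value):
        """FLD: push a value onto the register stack (at most eight entries)."""
        if len(self.st) == 8:
            raise OverflowError("x87 register stack is full")
        self.st.insert(0, float(value))

    def faddp(self):
        """FADDP: ST(1) <- ST(1) + ST(0), then pop."""
        top = self.st.pop(0)
        self.st[0] += top

    def fxch(self, i=1):
        """FXCH: exchange ST(0) and ST(i)."""
        self.st[0], self.st[i] = self.st[i], self.st[0]

    def fyl2x(self):
        """FYL2X: ST(1) <- ST(1) * log2(ST(0)), then pop."""
        top = self.st.pop(0)
        self.st[0] *= math.log2(top)

    def f2xm1(self):
        """F2XM1: ST(0) <- 2**ST(0) - 1."""
        self.st[0] = 2.0 ** self.st[0] - 1.0


fpu = X87Stack()
fpu.fld(3.0)   # y
fpu.fld(8.0)   # x
fpu.fyl2x()    # leaves y * log2(x) in ST(0)
```

Instructions that "pop the register stack" leave their result in what was ST(1), which then becomes the new ST(0).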
Added with 80287
Instruction | Meaning | Notes |
---|---|---|
FSETPM | Set Protected Mode | Only used on 80287 |
Added with 80387
Instruction | Meaning | Notes |
---|---|---|
FCOS | Replace ST(0) with its cosine | |
FLDENVD | Load FPU environment from m28byte. | |
FNSAVED | Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then re-initialize the FPU. | |
FNSTENVD | Store FPU environment to m14byte or m28byte without checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions. | |
FPREM1 | Replace ST(0) with the IEEE remainder obtained from dividing ST(0) by ST(1) | |
FRSTORD | Load FPU state from m94byte or m108byte. | |
FSAVED | Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then re-initialize the FPU. | |
FSIN | Replace ST(0) with its sine. | |
FSINCOS | Compute the sine and cosine of ST(0); replace ST(0) with the sine, and push the cosine onto the register stack. | |
FSTENVD | Store FPU environment to m14byte or m28byte after checking for pending unmasked floating-point exceptions. Then mask all floating-point exceptions. | |
FUCOM | Performs an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results. | |
FUCOMP | Performs an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results and pops the stack. | |
FUCOMPP | Performs an unordered comparison of the contents of register ST(0) and ST(1) and sets condition code flags C0, C2, and C3 in the FPU status word according to the results and pops the stack twice. |
Added with Pentium Pro
Instruction | Meaning | Notes |
---|---|---|
FCMOVB | Move if below (CF=1) | |
FCMOVBE | Move if below or equal (CF=1 or ZF=1) | |
FCMOVE | Move if equal (ZF=1) | |
FCMOVNB | Move if not below (CF=0) | |
FCMOVNBE | Move if not below or equal (CF=0 and ZF=0) | |
FCMOVNE | Move if not equal (ZF=0) | |
FCMOVNU | Move if not unordered (PF=0) | |
FCMOVU | Move if unordered (PF=1) | |
FCOMI | Compare ST(0) with ST(i) and set status flags accordingly | |
FCOMIP | Compare ST(0) with ST(i), set status flags accordingly, and pop register stack | |
FUCOMI | Compare ST(0) with ST(i), check for ordered values, and set status flags accordingly | |
FUCOMIP | Compare ST(0) with ST(i), check for ordered values, set status flags accordingly, and pop register stack | |
FXRSTOR | Loads x87 FPU, MMX™ technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2 state from m512byte. | |
FXSAVE | Saves x87 FPU, MMX™ technology, Streaming SIMD Extensions, and Streaming SIMD Extensions 2 state to m512byte. |
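FCOMI and the FCMOVcc instructions work as a pair: the compare writes ZF, PF and CF in EFLAGS, and the conditional move tests them using the conditions listed in the table. A Python sketch of that interaction follows; the function names and the flag-tuple representation are assumptions for this example, and exception behaviour (the difference between FCOMI and FUCOMI on NaN operands) is not modelled.

```python
import math


def fcomi(st0, sti):
    """Return (ZF, PF, CF) as set when comparing ST(0) with ST(i)."""
    if math.isnan(st0) or math.isnan(sti):
        return (1, 1, 1)  # unordered
    if st0 > sti:
        return (0, 0, 0)
    if st0 < sti:
        return (0, 0, 1)
    return (1, 0, 0)      # equal


FCMOV_CONDITIONS = {
    "FCMOVB": lambda zf, pf, cf: cf == 1,
    "FCMOVBE": lambda zf, pf, cf: cf == 1 or zf == 1,
    "FCMOVE": lambda zf, pf, cf: zf == 1,
    "FCMOVNB": lambda zf, pf, cf: cf == 0,
    "FCMOVNBE": lambda zf, pf, cf: cf == 0 and zf == 0,
    "FCMOVNE": lambda zf, pf, cf: zf == 0,
    "FCMOVNU": lambda zf, pf, cf: pf == 0,
    "FCMOVU": lambda zf, pf, cf: pf == 1,
}


def fcmov(mnemonic, st0, sti, flags):
    """Return the new ST(0): ST(i) if the condition holds, else the old ST(0)."""
    return sti if FCMOV_CONDITIONS[mnemonic](*flags) else st0
```

For instance, FCOMI followed by FCMOVB replaces ST(0) with ST(i) when ST(0) is below ST(i), which yields the larger of the two values without a branch.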
Added with Pentium 4 supporting SSE3
as part of the SSE3 branding
Instruction | Meaning | Notes |
---|---|---|
FISTTP | Store ST as a signed integer (truncate) in m and pop ST | |
SIMD Instructions
MMX instructions
added with Pentium MMX
Instruction | Meaning | Notes |
---|---|---|
EMMS | Empty MMX state | This instruction must be executed before executing floating-point instructions. |
MOVD | Move doubleword | |
MOVQ | Move quadword | |
PACKSSDW | Pack doublewords into words (signed with saturation) | |
PACKSSWB | Pack words into bytes (signed with saturation) | |
PACKUSWB | Pack words into bytes (unsigned with saturation) | |
PADDB | Add with wrap-around on byte | |
PADDD | Add with wrap-around on doubleword | |
PADDSB | Add signed with saturation on byte | |
PADDSW | Add signed with saturation on word | |
PADDUSB | Add unsigned with saturation on byte | |
PADDUSW | Add unsigned with saturation on word | |
PADDW | Add with wrap-around on word | |
PAND | Bitwise AND | |
PANDN | Bitwise AND NOT | |
PCMPEQB | Packed compare for equality byte | |
PCMPEQD | Packed compare for equality double word | |
PCMPEQW | Packed compare for equality word | |
PCMPGTB | Packed compare greater than byte | |
PCMPGTD | Packed compare greater than double word | |
PCMPGTW | Packed compare greater than word | |
PMADDWD | Packed multiply on words and add resulting pairs | |
PMULHW | Packed multiply high on words | |
PMULLW | Packed multiply low on words | |
POR | Bitwise OR | |
PSLLD | Packed shift left logical doubleword by amount specified in MMX register or by immediate value | |
PSLLQ | Packed shift left logical quadword by amount specified in MMX register or by immediate value | |
PSLLW | Packed shift left logical word by amount specified in MMX register or by immediate value | |
PSRAD | Packed shift right arithmetic doubleword by amount specified in MMX register or by immediate value | |
PSRAW | Packed shift right arithmetic word by amount specified in MMX register or by immediate value | |
PSRLD | Packed shift right logical doubleword by amount specified in MMX register or by immediate value | |
PSRLQ | Packed shift right logical quadword by amount specified in MMX register or by immediate value | |
PSRLW | Packed shift right logical word by amount specified in MMX register or by immediate value | |
PSUBB | Subtract with wrap-around on byte | |
PSUBD | Subtract with wrap-around on doubleword | |
PSUBSB | Subtract signed with saturation on byte | |
PSUBSW | Subtract signed with saturation on word | |
PSUBUSB | Subtract unsigned with saturation on byte | |
PSUBUSW | Subtract unsigned with saturation on word | |
PSUBW | Subtract with wrap-around on word | |
PUNPCKHBW | Unpack (interleave) high-order bytes from MMX register | |
PUNPCKHDQ | Unpack (interleave) high-order doublewords from MMX register | |
PUNPCKHWD | Unpack (interleave) high-order words from MMX register | |
PUNPCKLBW | Unpack (interleave) low-order bytes from MMX register | |
PUNPCKLDQ | Unpack (interleave) low-order doublewords from MMX register | |
PUNPCKLWD | Unpack (interleave) low-order words from MMX register | |
PXOR | Bitwise XOR |
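
The difference between the "wrap-around" and "saturation" variants in the table above can be modelled in a few lines. This is only an illustrative sketch in Python, not real MMX code: each list element stands for one 8-bit lane, and the function names simply reuse the mnemonics for readability.

```python
# Illustrative model only: each list element stands for one 8-bit lane
# of an MMX register. The function names are hypothetical helpers that
# reuse the instruction mnemonics.

def paddb(a, b):
    """Wrap-around add: keep only the low 8 bits of each lane sum."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]

def paddusb(a, b):
    """Unsigned saturating add: clamp each lane sum to 0..255."""
    return [min(x + y, 0xFF) for x, y in zip(a, b)]

def paddsb(a, b):
    """Signed saturating add: lanes hold -128..127, sums are clamped."""
    return [max(-128, min(127, x + y)) for x, y in zip(a, b)]
```

For example, adding the unsigned lanes 200 and 100 gives 44 with wrap-around but 255 with unsigned saturation.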
MMX+ instructions
added with 6x86MX from Cyrix; also supported on other CPUs, e.g. as Extended MMX on Athlon 64
3DNow! instructions
added with K6-2
Instruction | Meaning | Notes |
---|---|---|
FEMMS | Faster Enter/Exit of the MMX or floating-point state. | |
PAVGUSB | Average of unsigned packed 8-bit values. | |
PF2ID | Converts packed floating-point operand to packed 32-bit integer. | |
PFACC | Floating-point accumulate. | |
PFADD | Packed, floating-point addition. | |
PFCMPEQ | Packed floating-point comparison, equal to. | |
PFCMPGE | Packed floating-point comparison, greater than or equal to. | |
PFCMPGT | Packed floating-point comparison, greater than. | |
PFMAX | Packed floating-point maximum. | |
PFMIN | Packed floating-point minimum. | |
PFMUL | Packed floating-point multiplication. | |
PFRCP | Floating-point reciprocal approximation. | |
PFRCPIT1 | Packed floating-point reciprocal, first iteration step. | |
PFRCPIT2 | Packed floating-point reciprocal/reciprocal square root, second iteration step. | |
PFRSQIT1 | Packed floating-point reciprocal square root, first iteration step. | |
PFRSQRT | Floating-point reciprocal square root approximation. | |
PFSUB | Packed floating-point subtraction. | |
PFSUBR | Packed floating-point reverse subtraction. | |
PI2FD | Packed 32-bit integer to floating-point conversion. | |
PMULHRW | Multiply signed packed 16-bit values with rounding and store the high 16 bits. | |
PREFETCH | Prefetch processor cache line into L1 data cache (Dcache). | |
PREFETCHW | Prefetch processor cache line into L1 data cache (Dcache) with intent to write. |
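
Two of the less obvious entries above, PFACC and PMULHRW, can be sketched as follows. This is a Python model under stated assumptions rather than real 3DNow! code: a register is represented as a `(low, high)` pair of floats for PFACC and as a list of signed 16-bit integers for PMULHRW, and the function names are hypothetical helpers.

```python
# Illustrative model only; not real 3DNow! code.

def pfacc(dst, src):
    """Accumulate: the low result is the sum of both halves of dst,
    the high result is the sum of both halves of src."""
    return (dst[0] + dst[1], src[0] + src[1])

def pmulhrw(a, b):
    """Multiply signed 16-bit lanes, add 0x8000 to round, and keep
    the high 16 bits of each 32-bit product."""
    return [((x * y) + 0x8000) >> 16 for x, y in zip(a, b)]
```

Python floats are double precision, so `pfacc` only approximates the single-precision arithmetic of the hardware.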
3DNow!+ instructions
added with Athlon
Instruction | Meaning | Notes |
---|---|---|
PF2IW | Packed Floating-Point to Integer Word Conversion with Sign Extend | |
PFNACC | Packed Floating-Point Negative Accumulate | |
PFPNACC | Packed Floating-Point Mixed Positive-Negative Accumulate | |
PI2FW | Packed Integer Word to Floating-Point Conversion | |
PSWAPD | Packed Swap Doubleword |
SSE instructions
added with Pentium III
also see integer instructions added with Pentium III
SSE SIMD Floating-Point Instructions
Instruction | Meaning | Notes |
---|---|---|
ADDPS | Add packed single-precision floating-point values from xmm2/m128 to xmm1. | |
ADDSS | Add the low single-precision floating-point value from xmm2/m32 to xmm1. | |
ANDNPS | Bitwise logical AND NOT of xmm2/m128 and xmm1. | |
ANDPS | Bitwise logical AND of xmm2/m128 and xmm1. | |
CMPPS | Compare packed single-precision floating-point values from xmm2/mem with packed single-precision floating-point values in xmm1 register using imm8 as comparison predicate. | |
CMPSS | Compare low single-precision floating-point value from xmm2/m32 with low single-precision floating-point value in xmm1 register using imm8 as comparison predicate. | |
COMISS | Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly. | |
CVTPI2PS | Convert two signed doubleword integers from mm/m64 to two single-precision floating-point values in xmm. | |
CVTPS2PI | Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm. | |
CVTSI2SS | Convert one signed doubleword integer from r/m32 to one single-precision floating-point number in xmm. | |
CVTSS2SI | Convert one single-precision floating-point number from xmm/m32 to one signed doubleword integer in r32. | |
CVTTPS2PI | Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm using truncation. | |
CVTTSS2SI | Convert one single-precision floating-point number from xmm/m32 to one signed doubleword integer r32 using truncation. | |
DIVPS | Divide packed single-precision floating-point values in xmm1 by packed single-precision floating-point values xmm2/m128. | |
DIVSS | Divide low single-precision floating-point value in xmm1 by low single-precision floating-point value in xmm2/m32. | |
LDMXCSR | Load Streaming SIMD Extension control/status word from m32. | |
MAXPS | Return the maximum single-precision floating-point values between xmm2/m128 and xmm1. | |
MAXSS | Return the maximum scalar single-precision floating-point value between xmm2/mem32 and xmm1. | |
MINPS | Return the minimum single-precision floating-point values between xmm2/m128 and xmm1. | |
MINSS | Return the minimum scalar single-precision floating-point value between xmm2/mem32 and xmm1. | |
MOVAPS | Move aligned packed single-precision floating-point numbers from/to xmm2/m128 to/from xmm1. | |
MOVHLPS | Move two packed single-precision floating-point values from high quadword of xmm2 to low quadword of xmm1. | |
MOVHPS | Move two packed single-precision floating-point values from/to m64 to/from high quadword of xmm. | |
MOVLHPS | Move two packed single-precision floating-point values from low quadword of xmm2 to high quadword of xmm1. | |
MOVLPS | Move two packed single-precision floating-point values from/to m64 to/from low quadword of xmm. | |
MOVMSKPS | Extract 4-bit sign mask from xmm and store in r32. | |
MOVNTPS | Move packed single-precision floating-point values from xmm to m128, minimizing pollution in the cache hierarchy. | |
MOVSS | Move scalar single-precision floating-point value from/to xmm2/m32 to/from xmm1 register. | |
MOVUPS | Move unaligned packed single-precision floating-point numbers from/to xmm2/m128 to/from xmm1. | |
MULPS | Multiply packed single-precision floating-point values in xmm2/mem by xmm1. | |
MULSS | Multiply the low single-precision floating-point value in xmm2/mem by the low single-precision floating-point value in xmm1. | |
ORPS | Bitwise OR of xmm2/m128 and xmm1. | |
RCPPS | Returns to xmm1 the packed approximations of the reciprocals of the packed single-precision floating-point values in xmm2/m128. | |
RCPSS | Returns to xmm1 an approximation of the reciprocal of the low single-precision floating-point value in xmm2/m32. | |
RSQRTPS | Returns to xmm1 the packed approximations of the reciprocals of the square roots of the packed single-precision floating-point values in xmm2/m128. | |
RSQRTSS | Returns to xmm1 an approximation of the reciprocal of the square root of the low single-precision floating-point value in xmm2/m32. | |
SHUFPS | Shuffle packed single-precision floating-point values selected by imm8 from xmm1 and xmm2/m128 to xmm1. | |
SQRTPS | Computes square roots of the packed single-precision floating-point values in xmm2/m128 and stores the results in xmm1. | |
SQRTSS | Computes square root of the low single-precision floating-point value in xmm2/m32 and stores the result in xmm1. | |
STMXCSR | Store Multimedia Extended Control Status Register. | |
SUBPS | Subtract packed single-precision floating-point values in xmm2/mem from xmm1. | |
SUBSS | Subtract the low single-precision floating-point value in xmm2/m32 from xmm1. | |
UCOMISS | Compare lower single-precision floating-point number in xmm1 register with lower single-precision floating-point number in xmm2/mem and set the status flags accordingly. | |
UNPCKHPS | Interleaves single-precision floating-point values from the high quadwords of xmm1 and xmm2/mem into xmm1. | |
UNPCKLPS | Interleaves single-precision floating-point values from the low quadwords of xmm1 and xmm2/mem into xmm1. | |
XORPS | Bitwise exclusive-OR of xmm2/m128 and xmm1. |
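
The imm8 selection used by SHUFPS is easier to follow with a small model. The sketch below is illustrative Python, not real SSE code: registers are 4-element lists with index 0 as the lowest lane, and `shufps` is a hypothetical helper named after the instruction. The two low result lanes are picked from the destination operand and the two high result lanes from the source operand, two selector bits per lane.

```python
# Illustrative model only: index 0 is the lowest 32-bit lane.

def shufps(dst, src, imm8):
    """Select lanes using two bits of imm8 per result lane."""
    return [
        dst[imm8 & 3],          # result lane 0 comes from dst
        dst[(imm8 >> 2) & 3],   # result lane 1 comes from dst
        src[(imm8 >> 4) & 3],   # result lane 2 comes from src
        src[(imm8 >> 6) & 3],   # result lane 3 comes from src
    ]
```

Passing the same register as both operands with `imm8 = 0x1B` reverses the four lanes.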
SSE SIMD Integer Instructions
Instruction | Meaning | Notes |
---|---|---|
PAVGB | Average packed unsigned byte integers from xmm2/m128 and xmm1, with rounding. | |
PAVGW | Average packed unsigned word integers from xmm2/m128 and xmm1, with rounding. | |
PEXTRW | Extract the word specified by imm8 from xmm and move it to a r32. | |
PINSRW | Move the low word of r32 or from m16 into xmm at the word position specified by imm8. | |
PMAXSW | Compare signed word integers in xmm2/m128 and xmm1 for maximum values. | |
PMAXUB | Compare unsigned byte integers in xmm2/m128 and xmm1 for maximum values. | |
PMINSW | Compare signed word integers in xmm2/m128 and xmm1 for minimum values. | |
PMINUB | Compare unsigned byte integers in xmm2/m128 and xmm1 for minimum values. | |
PMOVMSKB | Move the byte mask of xmm to r32. | |
PSADBW | Absolute difference of packed unsigned byte integers from xmm2/m128 and xmm1; the 8 low differences and 8 high differences are then summed separately to produce two word integer results. | |
PSHUFW | Shuffle the words in mm2/m64 based on the encoding in imm8 and store the result in mm1. |
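
As a rough illustration of two entries above, the rounding average (PAVGB) and the sum of absolute differences (PSADBW) can be modelled as follows. This is a Python sketch, not real SSE code: lanes are unsigned bytes in a list, `psadbw` models a single group of eight bytes, and both function names are hypothetical helpers.

```python
# Illustrative model only: each list element is one unsigned 8-bit lane.

def pavgb(a, b):
    """Average with rounding: (x + y + 1) >> 1 for each lane."""
    return [(x + y + 1) >> 1 for x, y in zip(a, b)]

def psadbw(a, b):
    """Sum of absolute differences over one group of eight byte lanes."""
    assert len(a) == 8 and len(b) == 8
    return sum(abs(x - y) for x, y in zip(a, b))
```

The `+ 1` before the shift is what makes the average round halves upward instead of truncating.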
SSE2 instructions
added with Pentium 4
also see integer instructions added with Pentium 4
SSE2 SIMD Floating-Point Instructions
Instruction | Meaning | Notes |
---|---|---|
ADDPD | Add packed double-precision floating-point values from xmm2/m128 to xmm1. | |
ADDSD | Add the low double-precision floating-point value from xmm2/m64 to xmm1. | |
ANDNPD | Bitwise logical AND NOT of xmm2/m128 and xmm1. | |
ANDPD | Bitwise logical AND of xmm2/m128 and xmm1. | |
CMPPD | Compare packed double-precision floating-point numbers from xmm2/m128 with packed double-precision floating-point numbers in xmm1, using imm8 as comparison predicate. | |
CMPSD* | Compare low double-precision floating-point value from xmm2/m64 with low double-precision floating-point value in xmm1 register using imm8 as comparison predicate. | |
COMISD | Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly. | |
CVTDQ2PD | Convert two packed signed doubleword integers from xmm2/m128 to two packed double-precision floating-point values in xmm1. | |
CVTDQ2PS | Convert four packed signed doubleword integers from xmm2/m128 to four packed single-precision floating-point values in xmm1. | |
CVTPD2DQ | Convert two packed double-precision floating-point values from xmm2/m128 to two packed signed doubleword integers in xmm1. | |
CVTPD2PI | Convert two packed double-precision floating-point numbers from xmm/m128 to two packed signed doubleword integers in mm. | |
CVTPD2PS | Convert two double-precision floating-point values in xmm2/m128 to two single-precision floating-point values in xmm1. | |
CVTPI2PD | Convert two signed doubleword integers from mm/mem64 to two double-precision floating-point values in xmm. | |
CVTPS2DQ | Convert four packed single-precision floating-point values from xmm2/m128 to four packed signed doubleword integers in xmm1. | |
CVTPS2PD | Convert two packed single-precision floating-point values in xmm2/m64 to two packed double-precision floating-point values in xmm1. | |
CVTSD2SI | Convert one double-precision floating-point number from xmm/m64 to one signed doubleword integer r32. | |
CVTSD2SS | Convert one double-precision floating-point value in xmm2/m64 to one single-precision floating-point value in xmm1. | |
CVTSI2SD | Convert one signed doubleword integer from r/m32 to one double-precision floating-point value in xmm. | |
CVTSS2SD | Convert one single-precision floating-point value in xmm2/m32 to one double-precision floating-point value in xmm1. | |
CVTTPD2DQ | Convert two packed double-precision floating-point values from xmm2/m128 to two packed signed doubleword integers in xmm1 using truncation. | |
CVTTPD2PI | Convert two packed double-precision floating-point numbers from xmm/m128 to two packed signed doubleword integers in mm using truncation. | |
CVTTPS2DQ | Convert four packed single-precision floating-point values from xmm2/m128 to four packed signed doubleword integers in xmm1 using truncation. | |
CVTTSD2SI | Convert one double-precision floating-point number from xmm/m64 to one signed doubleword integer r32 using truncation. | |
DIVPD | Divide packed double-precision floating-point values in xmm1 by packed double-precision floating-point values xmm2/m128. | |
DIVSD | Divide low double-precision floating-point value in xmm1 by low double-precision floating-point value in xmm2/mem64. | |
MAXPD | Return the maximum double-precision floating-point values between xmm2/m128 and xmm1. | |
MAXSD | Return the maximum scalar double-precision floating-point value between xmm2/mem64 and xmm1. | |
MINPD | Return the minimum double-precision floating-point values between xmm2/m128 and xmm1. | |
MINSD | Return the minimum scalar double-precision floating-point value between xmm2/mem64 and xmm1. | |
MOVAPD | Move Aligned Packed Double-Precision Floating-Point Values. | |
MOVHPD | Move High Packed Double-Precision Floating-Point Value. | |
MOVLPD | Move Low Packed Double-Precision Floating-Point Value. | |
MOVMSKPD | Extract 2-bit sign mask from xmm and store in r32. | |
MOVSD* | Move Scalar Double-Precision Floating-Point Value | |
MOVUPD | Move Unaligned Packed Double-Precision Floating-Point Values. | |
MULPD | Multiply packed double-precision floating-point values in xmm2/m128 by xmm1. | |
MULSD | Multiply the low double-precision floating-point value in xmm2/mem64 by low double-precision floating-point value in xmm1. | |
ORPD | Bitwise OR of xmm2/m128 and xmm1. | |
SHUFPD | Shuffle packed double-precision floating-point values selected by imm8 from xmm1 and xmm2/m128 to xmm1. | |
SQRTPD | Computes square roots of the packed double-precision floating-point values in xmm2/m128 and stores the results in xmm1. | |
SQRTSD | Computes square root of the low double-precision floating-point value in xmm2/m64 and stores the results in xmm1. | |
SUBPD | Subtract packed double-precision floating-point values in xmm2/m128 from xmm1. | |
SUBSD | Subtracts the low double-precision floating-point numbers in xmm2/mem64 from xmm1. | |
UCOMISD | Compares (unordered) the low double-precision floating-point values in xmm1 and xmm2/m64 and sets the EFLAGS accordingly. | |
UNPCKHPD | Interleaves double-precision floating-point values from the high quadwords of xmm1 and xmm2/m128. | |
UNPCKLPD | Interleaves double-precision floating-point values from the low quadwords of xmm1 and xmm2/m128. | |
XORPD | Bitwise exclusive-OR of xmm2/m128 and xmm1. |
- The mnemonics marked * (CMPSD and MOVSD) share their names with the string instruction mnemonics CMPSD (CMPS) and MOVSD (MOVS); however, the former refer to scalar double-precision floating-point values, whereas the latter refer to integer strings.
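
The table lists both rounding and truncating conversions (for example CVTSD2SI and CVTTSD2SI). The sketch below models the difference in Python. It assumes the default MXCSR rounding mode (round to nearest, ties to even) for the non-truncating form and a 32-bit destination; the function names are hypothetical helpers, and results that cannot be represented return the "integer indefinite" value 80000000h.

```python
import math

# Illustrative model only. Assumes the default rounding mode
# (round to nearest even) and a 32-bit destination register.
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INTEGER_INDEFINITE = -2**31  # bit pattern 0x80000000

def _fit_int32(value):
    """Return value if it fits in 32 bits, else the indefinite value."""
    return value if INT32_MIN <= value <= INT32_MAX else INTEGER_INDEFINITE

def cvtsd2si(x):
    """Convert using round-to-nearest-even (Python's round() behaviour)."""
    if not math.isfinite(x):
        return INTEGER_INDEFINITE
    return _fit_int32(round(x))

def cvttsd2si(x):
    """Convert by truncating toward zero."""
    if not math.isfinite(x):
        return INTEGER_INDEFINITE
    return _fit_int32(math.trunc(x))
```

With these definitions 3.9 converts to 4 with rounding but to 3 with truncation, and 2.5 rounds to 2 because ties go to the even integer.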
SSE2 SIMD Integer Instructions
Instruction | Meaning | Notes |
---|---|---|
MOVDQ2Q | Move low quadword from xmm to mmx register. | |
MOVDQA | Move aligned double quadword from/to xmm2/m128 to/from xmm1. | |
MOVDQU | Move unaligned double quadword from/to xmm2/m128 to/from xmm1. | |
MOVQ2DQ | Move quadword from mmx to low quadword of xmm. | |
PADDQ | Add packed quadword integers from xmm2/m128 to xmm1. | |
PMULUDQ | Multiply packed unsigned doubleword integers in xmm1 by packed unsigned doubleword integers in xmm2/m128, and store the quadword results in xmm1. | |
PSHUFHW | Shuffle the high words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | |
PSHUFLW | Shuffle the low words in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | |
PSHUFD | Shuffle the doublewords in xmm2/m128 based on the encoding in imm8 and store the result in xmm1. | |
PSLLDQ | Shift left xmm1 by imm8 bytes, clearing low-order bytes. | |
PSRLDQ | Shift right xmm1 by imm8 bytes, clearing high-order bytes. | |
PUNPCKHQDQ | Interleave high quadwords of xmm1 and xmm2/m128 into xmm1 register. | |
PUNPCKLQDQ | Interleave low quadwords of xmm1 and xmm2/m128 into xmm1 register. |
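
PSLLDQ and PSRLDQ shift the whole 128-bit register by a number of bytes, not bits. A minimal Python model, assuming the register is represented as a 128-bit unsigned integer and using hypothetical helper names, might look like this:

```python
# Illustrative model only: a 128-bit register is a Python int.
MASK128 = (1 << 128) - 1

def pslldq(x, imm8):
    """Shift left by imm8 bytes; counts above 15 clear the register."""
    if imm8 > 15:
        return 0
    return (x << (8 * imm8)) & MASK128

def psrldq(x, imm8):
    """Shift right by imm8 bytes; counts above 15 clear the register."""
    if imm8 > 15:
        return 0
    return (x & MASK128) >> (8 * imm8)
```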
SSE3 instructions
added with Pentium 4 SSE3
also see integer and floating-point instructions added with Pentium 4 SSE3
SSE3 SIMD Floating-Point Instructions
Instruction | Meaning | Notes |
---|---|---|
ADDSUBPD | Add/Subtract packed DP FP numbers from XMM2/Mem to XMM1. | |
ADDSUBPS | Add/Subtract packed SP FP numbers from XMM2/Mem to XMM1. | |
HADDPD | Add horizontally packed DP FP numbers from XMM2/Mem to XMM1. | |
HADDPS | Add horizontally packed SP FP numbers from XMM2/Mem to XMM1. | |
HSUBPD | Subtract horizontally packed DP FP numbers in XMM2/Mem from XMM1. | |
HSUBPS | Subtract horizontally packed SP FP numbers in XMM2/Mem from XMM1. |
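
The "horizontal" and "add/subtract" operations above differ from ordinary packed arithmetic in which lanes they combine. The following Python sketch models the single-precision forms; registers are 4-element lists with index 0 as the lowest lane, the function names are hypothetical helpers, and Python's double-precision floats only approximate the hardware's single-precision results.

```python
# Illustrative model only: index 0 is the lowest 32-bit lane.

def haddps(dst, src):
    """Horizontal add: sum adjacent pairs, dst pairs first, then src."""
    return [dst[0] + dst[1], dst[2] + dst[3],
            src[0] + src[1], src[2] + src[3]]

def addsubps(dst, src):
    """Subtract in even lanes, add in odd lanes."""
    return [dst[0] - src[0], dst[1] + src[1],
            dst[2] - src[2], dst[3] + src[3]]
```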
SSE3 SIMD Integer Instructions
Instruction | Meaning | Notes |
---|---|---|
MOVDDUP | Move 64 bits representing the lower DP data element from XMM2/Mem to XMM1 register and duplicate. | |
MOVSHDUP | Move 128 bits representing packed SP data elements from XMM2/Mem to XMM1 register and duplicate high. | |
MOVSLDUP | Move 128 bits representing packed SP data elements from XMM2/Mem to XMM1 register and duplicate low. | |
LDDQU | Load 128 bits from Mem to XMM register. |
SSSE3 instructions
added with Xeon 5100 series and Core 2
Instruction | Meaning | Notes |
---|---|---|
PSIGNW | Negate each packed word of the destination if the corresponding word of the source is negative; zero it if the source word is zero. | |
PSIGND | Negate each packed doubleword of the destination if the corresponding doubleword of the source is negative; zero it if the source doubleword is zero. | |
PSIGNB | Negate each packed byte of the destination if the corresponding byte of the source is negative; zero it if the source byte is zero. | |
PSHUFB | Packed Shuffle Bytes. | |
PMULHRSW | Packed Multiply High with Round and Scale. | |
PMADDUBSW | Multiply and Add Packed Signed and Unsigned Bytes. | |
PHSUBW | Packed Horizontal Subtract Word. | |
PHSUBSW | Packed Horizontal Subtract and Saturate Words. | |
PHSUBD | Packed Horizontal Subtract Doubleword. | |
PHADDW | Packed Horizontal Add Word. | |
PHADDSW | Packed Horizontal Add and Saturate Words. | |
PHADDD | Packed Horizontal Add Doubleword. | |
PALIGNR | Packed Align Right. | |
PABSW | Packed Absolute Value Word. | |
PABSD | Packed Absolute Value Doubleword. | |
PABSB | Packed Absolute Value Byte. |
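
Two of the terser entries above, PSHUFB and PSIGNB, can be sketched in Python as follows. This is an illustrative model, not real SSSE3 code: registers are lists of 16 byte lanes, the 128-bit form is assumed, and the function names are hypothetical helpers.

```python
# Illustrative model only: 16 byte lanes, index 0 is the lowest lane.

def pshufb(src, mask):
    """Each mask byte selects a source byte by its low 4 bits;
    a mask byte with bit 7 set produces zero instead."""
    return [0 if m & 0x80 else src[m & 0x0F] for m in mask]

def psignb(a, b):
    """Negate a's lane when b's lane is negative, zero it when b's
    lane is zero, and keep it otherwise. Lanes are signed bytes."""
    out = []
    for x, s in zip(a, b):
        if s < 0:
            # Negate with 8-bit wrap-around, so -128 stays -128.
            out.append(((-x + 128) & 0xFF) - 128)
        elif s == 0:
            out.append(0)
        else:
            out.append(x)
    return out
```

A mask of `[15, 14, ..., 0]` makes `pshufb` reverse the byte order of the register.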