CS232: Computer Architecture II
This set of notes provides an overview of the x86 instruction set architecture (ISA) and its use in modern software. The goal is to familiarize you with the ISA well enough that you can write simple programs and comfortably read disassembled binary code. For the sake of simplicity, substantial portions of the ISA are ignored completely. The notes follow the assembly notation of the GNU tools, including the assembler as (used by the compiler gcc) and the debugger gdb. Other tools may define other notations, but such differences are merely cosmetic so long as you pay attention to which notation you are using at the time.
The Basics: Registers, Data Types, and Memory
You may have heard or seen the term “Reduced Instruction Set Computing,” or RISC, and its counterpart, “Complex Instruction Set Computing,” or CISC. While these terms were never entirely clear and have been further muddied by years of marketing, the x86 ISA is certainly far more complex than that of MIPS. On the other hand, much of the complexity stems from backwards compatibility, which is mostly irrelevant to someone writing code today. Furthermore, we need to use only a limited subset of the ISA in this class.
Modern flavors of x86, also called IA32 (Intel Architecture, 32-bit), have eight 32-bit integer registers. The registers are not entirely general-purpose: some instructions restrict your choice of register operands to fewer than eight. A couple of other special-purpose 32-bit registers are also available, namely the instruction pointer (program counter) and the flags (condition codes); we shall ignore the floating-point and multimedia registers. Unlike the registers of most RISC machines, the x86 registers have names stemming from their historical special purposes, as described below.
Register   Historical purpose
%eax       accumulator (for adding, multiplying, etc.)
%ebx       base (address of array in memory)
%ecx       count (of loop iterations)
%edx       data (e.g., second operand for binary operations)
%esi       source index (for string copy or array access)
%edi       destination index (for string copy or array access)
%ebp       base pointer (base of current stack frame)
%esp       stack pointer (top of stack)
%eip       instruction pointer (program counter)
%eflags    flags (condition codes and other things)

Register subdivisions (32-bit register: bits 31-0; 16-bit part: bits 15-0; 8-bit high: bits 15-8; 8-bit low: bits 7-0):
32-bit   16-bit   8-bit high   8-bit low
EAX      AX       AH           AL
EBX      BX       BH           BL
ECX      CX       CH           CL
EDX      DX       DH           DL
ESI      SI       (none)       (none)
EDI      DI       (none)       (none)
EBP      BP       (none)       (none)
ESP      SP       (none)       (none)
The character “%” denotes a register in assembly code and is not considered part of the register name itself; note also that register names are not case sensitive. The letter “E” in each name indicates that the “extended” (32-bit) version of the register is desired, extended from the original 16 bits. Registers can also hold 16- and 8-bit values, which is useful when writing smaller values to memory or I/O ports. As shown above, the low 16 bits of a register are accessed by dropping the “E” from the register name, e.g., %si. Finally, the two 8-bit halves of the low 16 bits of the first four registers (%eax, %ebx, %ecx, and %edx) can be used as 8-bit registers by replacing “X” with “H” (high) or “L” (low), e.g., %ah and %al.
The x86 ISA supports both 2’s complement and unsigned integers in widths of 32, 16, and 8 bits, single- and double-precision IEEE floating-point, 80-bit Intel floating-point, ASCII strings, and binary-coded decimal (BCD). Most instructions are independent of data type (adding two 32-bit values, for instance, produces the same bits whether the operands are signed or unsigned), but some require that you select the proper instruction for the data types of the operands. For example, try multiplying the 32-bit representations of -1 and 1 to produce a 64-bit result: treated as signed values, the product is -1, or 0xFFFFFFFFFFFFFFFF; treated as unsigned values, the bit pattern 0xFFFFFFFF represents 4,294,967,295, and the product is 0x00000000FFFFFFFF.
Use of memory is more flexible in x86 than in MIPS: in addition to load and store operations, many x86 operations accept memory locations as operands. For example, a single instruction can read the value in a memory location, add a constant, and store the sum back to the same location. With x86, memory is 8-bit (byte) addressable and uses 32-bit addresses, although few machines...