CIS273
Course Goals
• Develop an understanding of principles and design tradeoffs in modern computers
• Develop a basic understanding of computer performance
• Gain a detailed understanding of processor design for a given instruction set architecture
• Gain an understanding of memory organization and memory hierarchies

What to Expect from Midterm 2
• T-F / MCQ – 20 points
• Short Answer Questions – 30 points
• Pipelining – 35 points
  – Performance
  – Parallelism
  – Hazards
• Scheduling – 15 points

Rules
• 1-page cheat sheet, front and back
• No electronic devices, including calculators
• No slide printouts
• No collaboration
  – Automatic "zero" when caught
• Be there on time; you won't be able to enter after 2 PM
• 50 minutes exactly, no extra time

Chapter 4: The Processor

The Main Control Unit
• Control signals derived from instruction

Type        Fields (bit positions)
R-type      0 (31:26)         rs (25:21)  rt (20:16)  rd (15:11)  shamt (10:6)  funct (5:0)
Load/Store  35 or 43 (31:26)  rs (25:21)  rt (20:16)  address (15:0)
Branch      4 (31:26)         rs (25:21)  rt (20:16)  address (15:0)

• Bits 31:26: opcode
• rs (25:21): always read
• rt (20:16): read, except for load
• rd (15:11) for R-type, rt (20:16) for load: write
• address (15:0): sign-extend and add

Chapter 2: Instructions: Language of the Computer

Translation and Startup (§2.12 Translating and Starting a Program)
• Many compilers produce object modules directly
• Static linking

Producing an Object Module
• Assembler (or compiler) translates program into machine instructions
• Provides information for building a complete program from the pieces
  – Header: describes contents of object module
  – Text segment: translated instructions
  – Static data segment: data allocated for the life of the program
  – Relocation info: for contents that depend on absolute location of loaded program
  – Symbol table: global definitions and external refs
  – Debug info: for associating with source code

Linking Object Modules
• Produces an executable image
  1. Merges segments
  2. Resolves labels (determines their addresses)
  3. Patches location-dependent and external refs
• Could leave location dependencies for fixing by a relocating loader
  – But with virtual memory, no need to do this
  – Program can be loaded into absolute location in virtual memory space

Loading a Program
• Load from image file on disk into memory
  1. Read header to determine segment sizes
  2. Create virtual address space
  3. Copy text and initialized data into memory
     • Or set page table entries so they can be faulted in
  4. Set up arguments on stack
  5. Initialize registers (including $sp, $fp, $gp)
  6. Jump to startup routine
     • Copies arguments to $a0, … and calls main
     • When main returns, do exit syscall

Dynamic Linking
• Only link/load library procedure when it is called
  – Requires procedure code to be relocatable
  – Avoids image bloat caused by static linking of all (transitively) referenced libraries
  – Automatically picks up new library versions

Lazy Linkage
• Indirection table
• Stub: loads routine ID, jumps to linker/loader
• Linker/loader code
• Dynamically mapped code

Starting Java Applications
• Simple portable instruction set for the JVM
• Interprets bytecodes
• Compiles bytecodes of "hot" methods into native code for host machine

Chapter 4: The Processor

Pipelining Analogy (§4.6 An Overview of Pipelining)
• Pipelined laundry: overlapping execution
  – Parallelism improves performance
• Four loads:
  – Speedup = 8/3.5 = 2.3
• Non-stop:
  – Speedup = 2n/(0.5n + 1.5) ≈ 4 for large n (4 stages)

MIPS Pipeline
• Five stages, one step per stage
  1. IF: Instruction fetch from memory
  2. ID: Instruction decode & register read
  3. EX: Execute operation or calculate address
  4. MEM: Access memory operand
  5. WB: Write result back to register

Forwarding Paths
• Most datapaths are capable of forwarding:
  – EX to ID
  – MEM to ID
• Some datapaths are ALSO capable of forwarding:
  – Output of EX to input of EX

Pipeline Performance
• Assume time for stages is
  – 100 ps for register read or write
  – 200 ps for other stages
• Compare pipelined datapath with single-cycle datapath

Instr     Instr fetch  Register read  ALU op  Memory access  Register write  Total time
lw        200 ps       100 ps         200 ps  200 ps         100 ps          800 ps
sw        200 ps       100 ps         200 ps  200 ps         -               700 ps
R-format  200 ps       100 ps         200 ps  -              100 ps          600 ps
beq       200 ps       100 ps         200 ps  -              -               500 ps
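
The stage times in the table above can be turned into a quick comparison of the two datapaths. The sketch below is only an illustration under stated assumptions: the single-cycle clock period is set by the slowest instruction, the pipelined clock period is set by the slowest stage, the pipeline has the five stages listed earlier, and there are no hazards or stalls. The function and variable names are hypothetical and chosen only for this example.

```python
# Stage latencies in picoseconds, copied from the table above.
# A stage that an instruction class does not use is simply omitted.
STAGE_PS = {
    "lw":       {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200, "reg_write": 100},
    "sw":       {"fetch": 200, "reg_read": 100, "alu": 200, "mem": 200},
    "R-format": {"fetch": 200, "reg_read": 100, "alu": 200, "reg_write": 100},
    "beq":      {"fetch": 200, "reg_read": 100, "alu": 200},
}

PIPELINE_STAGES = 5  # IF, ID, EX, MEM, WB


def single_cycle_period(stage_ps):
    """Clock period of a single-cycle datapath: the slowest instruction's total time."""
    return max(sum(stages.values()) for stages in stage_ps.values())


def pipelined_period(stage_ps):
    """Clock period of a pipelined datapath: the slowest individual stage."""
    return max(lat for stages in stage_ps.values() for lat in stages.values())


def total_time(n_instr, period, stages=1):
    """Time to finish n_instr instructions, assuming no stalls.

    With stages=1 this models the single-cycle datapath; with more stages,
    the first instruction needs `stages` cycles and each later one adds a cycle.
    """
    return (stages - 1 + n_instr) * period


def speedup(n_instr):
    """Ideal speedup of the pipelined datapath over the single-cycle one."""
    single = total_time(n_instr, single_cycle_period(STAGE_PS))
    piped = total_time(n_instr, pipelined_period(STAGE_PS), PIPELINE_STAGES)
    return single / piped


if __name__ == "__main__":
    print("single-cycle period (ps):", single_cycle_period(STAGE_PS))
    print("pipelined period (ps):   ", pipelined_period(STAGE_PS))
    for n in (3, 1_000_000):
        print(f"speedup for {n} instructions: {speedup(n):.3f}")
```

For three back-to-back lw instructions this gives 3 × 800 = 2400 ps single-cycle versus (5 − 1 + 3) × 200 = 1400 ps pipelined. As the instruction count grows, the ideal speedup approaches 800/200 = 4 rather than 5, because the stages are not balanced.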