CS318 Programming Assignment 2
Assembler

Upload your completed Assembler.java file to the Blackboard page for this assignment.

Instructions:
• Download the ZIP compressed file from this assignment page in Blackboard (CS318_Prog2.zip), and extract the files.
• The starter code files are Assembler.java, LabelOffset.java, and Opcode.java.
• The solution from Assignment 1 is provided as a Java Archive (JAR) file: prog1.jar. This JAR contains the completed classes from Assignment 1. The Binary class will be useful for Assignment 2.
• The Javadoc documentation for the Binary, ALU, LabelOffset, and Opcode classes is included in the Prog2Doc folder in the ZIP file.
• Complete the two methods in the Assembler class (pass1 and pass2), as described below and indicated by the comments. You must follow the instructions for the actions that each of these methods must perform.
• Do not change the assemble method in the Assembler class; do not change the LabelOffset or Opcode classes.
• You may write additional private methods in the Assembler class.
• You are provided with a Java test program (TestAssembler.java) that runs the assembler over four assembly code programs and compares your program's output with the correct output files. The tests increase in complexity. Work on passing Test 1, then Test 2, etc.

Guidelines for working with others on this assignment:
It is strongly recommended that you work with a partner on this assignment. Your partner is the only person with whom you may share your code solution. You may discuss ideas about how to do things in Java with any of your classmates, tutors, and other students. This assignment requires the use of files, Strings, and other Java concepts in ways that may be new to you. You may communicate with others about how to handle these Java concepts.

Overview:
This is the second of four assignments in which we are building a simulation of a simple computer.
In this assignment, you will write the Assembler that translates an assembly language program into machine code. The input to the Assembler is a file with assembly language code. The Assembler has two output files: the binary data segment (.data file) and the binary code segment (.code file).

Our assembler supports an assembly language that is a restricted version of the A64 language. It supports the eight A64 instructions listed below, as well as a halt (HLT) instruction that indicates the end of the program. To differentiate our simulated computer from the ARMv8 emulator, the registers in our simulated computer are named R0 through R31.

We also support a restricted version of the data segment. The only type is a 32-bit word type, and the values will always be signed decimal integers. The data segment will not contain any labels. References to the data segment from within the code segment assume that the first address of the data segment is offset zero.

Instructions supported by our simulated computer
• Add values in Rm and Rn, put result in Rd: ADD Rd,Rm,Rn
• Subtract the value in Rn from the value in Rm, put result in Rd: SUB Rd,Rm,Rn
• Logical AND of values in Rm and Rn, put result in Rd: AND Rd,Rm,Rn
• Logical OR of values in Rm and Rn, put result in Rd: ORR Rd,Rm,Rn
• Load value from memory address (value in Rm plus literal offset) into Rd: LDR Rd,[Rm,#N]
• Store value to memory address (value in Rm plus literal offset) from Rd: STR Rd,[Rm,#N]
• Branch to label if contents of Rm are zero: CBZ Rm,label
• Unconditional branch to label: B label
• End of program (halt): HLT

Machine Language format for the Instructions
This is the human-readable format with the MSB on the left and the LSB (bit at index zero) on the right. The letter 'B' indicates a bit that will be filled in with a 0 or a 1.

Instr.     opcode          Source Reg. 2   Shift Amount   Source Reg. 1   Dest. Reg.
bit range  [31-21]         [20-16]         [15-10]        [9-5]           [4-0]
ADD        100 0101 1000   BBBBB           000000         BBBBB           BBBBB
SUB        110 0101 1000   BBBBB           000000         BBBBB           BBBBB
AND        100 0101 0000   BBBBB           000000         BBBBB           BBBBB
ORR        101 0101 0000   BBBBB           000000         BBBBB           BBBBB

Instr.     opcode          Immediate       11-10     Base Reg.   Value Reg.
bit range  [31-21]         [20-12]         [11-10]   [9-5]       [4-0]
LDR        111 1100 0010   B BBBB BBBB     00        BBBBB       BBBBB
STR        111 1100 0000   B BBBB BBBB     00        BBBBB       BBBBB

Instr.     opcode      Immediate                  Register
bit range  [31-24]     [23-5]                     [4-0]
CBZ        1011 0100   BBB BBBB BBBB BBBB BBBB    BBBBB

Instr.     opcode    Immediate
bit range  [31-26]   [25-0]
B          000101    BB BBBB BBBB BBBB BBBB BBBB BBBB

Instr.     opcode          not used
bit range  [31-21]         [20-0]
HLT        110 1010 0010   0 0000 0000 0000 0000 0000

Implementation Overview:

Assembler pass1 method:
Read the assembly code file and determine the size (in bytes) of the data and code segments. Also create a list of the labels in the code segment and their relative offsets. At the end of pass 1, the integer size of the data and code segments must be written to their respective output files.
• Data segment: The only data type is the 4-byte (32-bit) word type. The size of the data segment is the number of values multiplied by 4 bytes per value. The .word directive will be the first token on each line. The remaining tokens on each line are a comma-separated list of signed decimal values. The data segment has an arbitrary number of lines, and each line has an arbitrary number of values. There are no labels in the data segment.
• Code segment: Each instruction will be stored in 4 bytes (32 bits). The size of the code segment is the number of instructions multiplied by 4 bytes per instruction.
• Code segment labels: A label is a string of letters that ends with a colon. The relative offset of a label is 4 × (number of instructions before the label). This sets the relative offset of the label as the number of bytes in memory from the beginning of the program to the instruction that follows the label.
Use the LabelOffset struct to store the text of the label and its offset value.

Assembler pass2 method:
Read the assembly code file and write the binary data segment and binary machine language code segment to their respective files. Each line of the binary files contains 1 byte (8 bits), where bit 0 is on the left and bit 7 is on the right. The 4 bytes of each number or instruction are written in little-endian order, with the least significant byte first.

For example, take the 32-bit string (shown in human-readable format with bit 0 on the right):

1000 0101 1010 1100 0011 0101 1010 1100

It is written to the binary file with the least significant byte first, and each byte written with bit 0 on the left:

           bit 0  bit 1  bit 2  bit 3  bit 4  bit 5  bit 6  bit 7
1st byte   0      0      1      1      0      1      0      1
2nd byte   1      0      1      0      1      1      0      0
3rd byte   0      0      1      1      0      1      0      1
4th byte   1      0      1      0      0      0      0      1

Assumptions:
• Assume that the assembly language code files used for testing will have the correct format.
• The literal values in the assembly language code (data segment and code segment) are always in signed decimal representation and are within the valid range. You may use the conversion methods from the Assignment 1 Binary class. When you need a binary value that is less than 32 bits (such as a 5-bit register number), you may assume that the lowest bits in the 32-bit representation are correct.
• The register numbers in the assembly language code are valid (between 0 and 31, inclusive).
• Valid file pathnames are provided. The Assembler class may throw FileNotFoundException, IOException, and any other exceptions that are related to file I/O problems.

Suggestions:
• In pass 1, write the size of the data and code segments to their respective files. In pass 2, make sure to open these files in append mode so that you do not overwrite what was written during the first pass. The FileWriter and PrintWriter classes allow for opening a file in append mode.
• One approach for reading the input file is to use the Scanner class and read one line at a time into a String object using the nextLine method. Use the String class trim method to remove leading and trailing whitespace from the string. Then use other String class methods to extract the information you need from the string. Some String methods that may be useful are split and substring.
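
The sketch below illustrates three of the mechanics described in this handout: computing label offsets (pass 1), packing the fields of the ADD/SUB/AND/ORR format, and producing the four byte lines for one 32-bit word (pass 2). It is only an illustration under stated assumptions, not the required solution. The class and method names (AssemblerSketch, labelOffsets, encodeRegisterFormat, toFileLines) are hypothetical, and it uses plain int arithmetic because the Binary class API is not shown here. It also assumes that each label sits on its own line and that a byte line is eight digits with no separators; check both against the provided test files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class AssemblerSketch {

    // Pass 1 idea: a label's offset is 4 bytes times the number of
    // instructions that precede it. ASSUMPTION: codeLines holds only the
    // code segment, and every label is alone on its line.
    static Map<String, Integer> labelOffsets(List<String> codeLines) {
        Map<String, Integer> offsets = new LinkedHashMap<>();
        int instructionCount = 0;
        for (String raw : codeLines) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            if (line.endsWith(":")) {
                String label = line.substring(0, line.length() - 1);
                offsets.put(label, 4 * instructionCount);
            } else {
                instructionCount++;
            }
        }
        return offsets;
    }

    // Packs the register-format fields from the first table:
    // opcode [31-21], Source Reg. 2 [20-16], Shift Amount [15-10] (always 0),
    // Source Reg. 1 [9-5], Dest. Reg. [4-0].
    static int encodeRegisterFormat(int opcode, int sourceReg2, int sourceReg1, int destReg) {
        return (opcode << 21) | (sourceReg2 << 16) | (sourceReg1 << 5) | destReg;
    }

    // Pass 2 output idea: least significant byte first, and within each
    // byte, bit 0 first. ASSUMPTION: no spaces between the eight digits.
    static List<String> toFileLines(int word) {
        List<String> lines = new ArrayList<>();
        for (int byteIndex = 0; byteIndex < 4; byteIndex++) {
            StringBuilder sb = new StringBuilder();
            for (int bit = 0; bit < 8; bit++) {
                sb.append((word >>> (8 * byteIndex + bit)) & 1);
            }
            lines.add(sb.toString());
        }
        return lines;
    }

    public static void main(String[] args) {
        // The handout's example word 1000 0101 1010 1100 0011 0101 1010 1100.
        for (String line : toFileLines(0x85AC35AC)) {
            System.out.println(line);
        }
        // prints:
        // 00110101
        // 10101100
        // 00110101
        // 10100001
    }
}
```

Running main reproduces the four bytes of the byte-order example in the pass2 description. In the actual assignment, the conversion methods of the provided Binary class would likely take the place of the shifts and masks used here.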