Computer Organization and Systems
Assignment 2: Record Reader
CSE30: Computer Organization and Systems, Summer Session 1
Please read over the entire assignment before starting to get a sense of what you will need to
get done in the next week. REMEMBER: Everyone procrastinates but it is important to know
that you are procrastinating and still leave yourself enough time to finish. Start early, start often.
You MUST run the assignment on the pi-cluster. You HAVE to SSH: You will not be able to
compile or run the assignment otherwise. The assignments are getting longer as this course
progresses.
ACADEMIC INTEGRITY REMINDER: you should do this assignment entirely on
your own with help only from course staff. Consulting with other students (past
or present) who are not in the course may result in an academic integrity violation
which can have serious consequences.
Need help or instructions? See CSE 30 FAQs
Table of Contents
1. Learning Goals
2. Assignment Overview
3. Getting Started
a. Developing Your Program
b. Running Your Program
4. How the Program Works
5. Submission and Grading
a. Submitting
b. Grading Breakdown [50 pts]
Learning Goals
● Parsing input
● Using pointers in C
● Using arrays of pointers
● Allocating memory at runtime with malloc()
Assignment Overview
A common format for moving data between systems is called a “Delimiter Separated Values”
(DSV) file. A DSV file uses a delimiter (',', '#', ' ', '\t', etc.) to separate values on a line
of data [1]. Each line of the file, or “data record”, ends with a newline [2] ('\n'). Each record
consists of one or more data fields (columns) separated by the delimiter. Fields are numbered
from left to right starting with column 1. A DSV file stores tabular data (numbers and text) in
plain text (ASCII strings). In a proper DSV file, each line will always have the same number of
fields (columns).
In the example above we have a sample of a Call Detail Record (CDR) in DSV format with the
delimiter as a comma. A CDR file describes a cell phone voice call from one phone to another
phone. Each record has 10 fields or columns. In this example, the first record of the file holds the
label for each column (field). Any column can be empty. The last column is never followed
by a ','; instead, every record ends with a '\n'.
The data that you will deal with in this assignment is sent in a DSV file where the delimiter is
whitespace. You must pick out the right columns from these DSVs to analyze.
[1] You may have noticed this option when saving a spreadsheet (Save As CSV file).
[2] In Unix systems, lines generally end with '\n', which is ASCII 0x0a (linefeed). On Windows systems,
the end of line is sometimes 2 characters, "\r\n": ASCII 0x0d (carriage return) followed by ASCII 0x0a
(linefeed). The Wikipedia page describes this awkward situation. Windows’ use of CRLF dates back to
MS-DOS (1981). Unix, of course, is older. TOPS-10 (c. 1967) used CRLF, so it predates Unix. The terms
Linefeed and Carriage Return refer to old teletypes: a linefeed would scroll the paper to the next line and
a carriage return would move the print head to the left side of the paper. The Teletype Model 33 is a
famous one. (Professor Chin used this in elementary school!)
Getting Started
Developing Your Program
For this class, you MUST compile and run your programs on the pi-cluster. To access the
server, you will need your cs30wi22xx student account, and know how to SSH into the cluster.
1. Download the files in the repository, either directly from pi-cluster or, if that doesn’t work,
by downloading the repo locally and using scp to copy the folder to pi-cluster.
2. Fill out the fields in the README before turning in.
Running Your Program
We’ve provided you with a Makefile so compiling your program should be easy!
Additionally, we’ve placed the reference solution binary at:
/home/linux/ieng6/cs30wi22/public/bin/reader-a2-ref
You can use reader-a2-ref as a command as follows:
/home/linux/ieng6/cs30s122/public/bin/reader-a2-ref -c num_cols <list of cols ...> < input-file
Makefile: The provided Makefile builds a reader executable from the provided source files.
Compile it by typing make in the terminal. By default, it compiles with warnings turned
on. To compile without warnings, run make WARN= instead. Run make clean to remove files
generated by make.
How the Program Works
You will be given the number of columns in each record of the file and a list of column indices.
There may also be a flag, -c. Read input from stdin, parse it as a DSV, and print the
specified columns in the given order. If the -c flag was provided, you also need to print
statistics about the number of records in the input and the longest field (in terms of number of
characters).
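The statistics described above can be gathered in a single pass over the input. The following is only a sketch under stated assumptions: the names struct stats and collect_stats are hypothetical, every field of every record is assumed to count toward the longest field, and the output format of the statistics is not shown because this handout does not specify it.

```c
#define _POSIX_C_SOURCE 200809L  /* exposes getline() on strict compilers */
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical container for the two statistics named in the text. */
struct stats {
    long records;          /* number of lines read */
    size_t longest_field;  /* length of the longest field, in characters */
};

/*
 * collect_stats (hypothetical name): reads every record from `in`,
 * counting records and tracking the longest run of non-whitespace
 * characters. ASSUMPTION: every field of every record is considered.
 */
void collect_stats(FILE *in, struct stats *st)
{
    char *line = NULL;   /* getline() allocates and grows this buffer */
    size_t cap = 0;

    st->records = 0;
    st->longest_field = 0;

    while (getline(&line, &cap, in) != -1) {
        size_t run = 0;  /* length of the field currently being scanned */

        st->records++;
        for (char *p = line; *p != '\0'; p++) {
            if (*p == ' ' || *p == '\t' || *p == '\n') {
                run = 0;                 /* delimiter or end of record */
            } else if (++run > st->longest_field) {
                st->longest_field = run;
            }
        }
    }
    free(line);  /* one buffer is reused for all lines, so free it once */
}
```

Calling collect_stats(stdin, &st) would process whatever is typed or redirected into the program. Compare your numbers against the reference solution before relying on the assumption about which fields are counted.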
Executable called with stdin from the command-line:
./reader [-c] num_cols <list of cols ...>
The angle brackets around “list of cols ...” indicate that it is a mandatory list.
It may look like the program hangs. It is waiting for you to type input and enter it on the
command-line. Type what you want and press enter to send that line to stdin. If you want a
more streamlined way to enter text, redirect stdin as shown below.
Executable called with stdin redirected from a file:
./reader [-c] num_cols <list of cols ...> < input_filename
“< input_filename” is not part of the command-line arguments, and you will find neither
“<” nor “input_filename” in argv! The “<” is the input redirection operator. It is not part of C;
it is a feature of the bash shell, which sends the contents of input_filename to your
program’s stdin. You can read more about redirection operators on GNU.org.
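One way to convince yourself of this is to print everything in argv. The helper below is a small sketch; print_args is a hypothetical name and is not part of the starter code.

```c
#include <stdio.h>

/*
 * print_args (hypothetical helper): prints each command-line argument
 * on its own line and returns how many were printed.
 */
int print_args(int argc, char *argv[])
{
    int i;

    for (i = 0; i < argc; i++)
        printf("argv[%d] = %s\n", i, argv[i]);
    return i;
}
```

If main() calls print_args(argc, argv) and the program is started with a redirection at the end of the command, the shell consumes the redirection before the program starts, so the printed list stops at the last column index.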
Delimiters
In our DSVs, fields are delimited by a run of whitespace of any length. For example, 2 spaces
('  ') form a single delimiter, as does 1 space (' '), 1 tab ('\t'), 3 tabs ('\t\t\t'), 1 space
followed by 1 tab (' \t'), or any similar combination. Therefore, the following two DSVs
contain functionally identical records.
It means no worries
For the\trest of your days
It's our problem-free philosophy
Hakuna\t\t Matata!

It means no worries
For the rest of your days
It's our problem-free\tphilosophy
Hakuna \t Matata!
Once you have read one record (line) from stdin, your job is to split it into fields and print out the
columns corresponding to the indices that are passed in.
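Splitting a record on runs of whitespace can be done in place with an array of pointers and no library calls at all. The following is a minimal sketch, not the reference implementation; is_delim and split_record are hypothetical names, and only space and tab are treated as delimiters.

```c
/* Hypothetical helper: space and tab are the delimiters in this sketch. */
static int is_delim(char c)
{
    return c == ' ' || c == '\t';
}

/*
 * split_record (hypothetical name): splits `line` in place.
 * A pointer to the start of each field is stored in fields[], and the
 * first delimiter (or the newline) after each field is overwritten
 * with '\0' so that every field becomes its own C string.
 * Returns the number of fields stored, at most max_fields.
 */
int split_record(char *line, char **fields, int max_fields)
{
    int count = 0;
    char *p = line;

    while (*p != '\0' && *p != '\n' && count < max_fields) {
        fields[count++] = p;              /* a field starts here */
        while (*p != '\0' && *p != '\n' && !is_delim(*p))
            p++;                          /* walk to the end of the field */
        if (*p == '\0')
            break;                        /* end of buffer, no newline */
        if (*p == '\n') {
            *p = '\0';                    /* strip the record terminator */
            break;
        }
        *p++ = '\0';                      /* terminate the field */
        while (is_delim(*p))
            p++;                          /* skip the rest of the run */
    }
    return count;
}
```

Because the fields point into the original buffer, nothing is copied; the pointers are valid only until the buffer is overwritten by the next record or freed.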
Program Implementation
You will be given the following as command-line arguments:
● The number of columns in a record
● A list of column indices
○ This list of indices always comes after the number of columns.
● Optionally: The -c flag. This flag can only appear as the first argument.
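The argument layout above could be decoded roughly as follows. This is a sketch that relies on the input guarantees in this handout; struct options and parse_args are hypothetical names, and the index array is allocated with malloc() because its length is only known at runtime.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical container for the decoded command line. */
struct options {
    int show_stats;   /* 1 if -c was given as argv[1], else 0 */
    int num_cols;     /* number of columns in each record */
    int num_indices;  /* how many column indices follow num_cols */
    int *indices;     /* malloc'd array of indices; the caller frees it */
};

/*
 * parse_args (hypothetical name): fills *opt from argv.
 * Returns 0 on success, -1 if arguments are missing or malloc() fails.
 */
int parse_args(int argc, char *argv[], struct options *opt)
{
    int next = 1;  /* position of the next argument to consume */
    int i;

    opt->show_stats = 0;
    opt->indices = NULL;

    if (argc > 1 && strcmp(argv[1], "-c") == 0) {
        opt->show_stats = 1;
        next = 2;
    }
    if (argc - next < 2)
        return -1;  /* need num_cols plus at least one index */

    opt->num_cols = atoi(argv[next++]);
    opt->num_indices = argc - next;

    opt->indices = malloc(sizeof(int) * (size_t)opt->num_indices);
    if (opt->indices == NULL)
        return -1;

    for (i = 0; i < opt->num_indices; i++)
        opt->indices[i] = atoi(argv[next + i]);
    return 0;
}
```

Note that atoi() handles a leading minus sign, so an argument such as -1 is read as an index and is not confused with a flag, since only argv[1] is compared against -c.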
Note: You may use only the following library functions in your code: malloc(), free(),
atoi(), printf(), fprintf(), getline(), strcmp(), strlen(), and exit(). You may not use
any other library function, such as strtok().
Input guarantees
You do not have to handle degenerate input, that is, input that violates any of these guarantees.
● All records will have at least one column.
● All the records in the file will have the same number of columns.
● No record will begin with whitespace.
● There will not be any empty columns. (In fact, it is impossible to create one: Our delimiter
is any amount of whitespace, and the only remaining possible location, the beginning, is
ruled out because no record can begin with whitespace.)
● The -c flag only appears as the first argument. If it appears, it will be as argv[1].
● The value provided for “number of columns” on the command line will be positive.
● The indices are valid decimal integers.
● The indices provided will be in-bounds, i.e. if num_col is the number of columns in a
record, then the indices will always be in the range [-num_col, num_col - 1]
inclusive.
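The range above includes negative values, so an index has to be turned into a position in the array of fields before it is used. The sketch below rests on an assumption that this handout does not state outright: non-negative indices count from the first column starting at 0, and negative indices count back from the last column. resolve_index is a hypothetical name; confirm the behavior against the reference solution.

```c
/*
 * resolve_index (hypothetical helper). ASSUMPTION, not stated in the
 * handout: index 0 is the first column, and negative indices count
 * back from the end, so -1 is the last column and -num_cols is the
 * first. Returns a position in the range [0, num_cols - 1].
 */
int resolve_index(int idx, int num_cols)
{
    return idx < 0 ? idx + num_cols : idx;
}
```

Under this assumption, with 4 columns the indices 0 and -4 both select the first column, and 3 and -1 both select the last.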