CSE 130: Principles of Computer System Design
Goals
The goal for Assignment 3 is to modify your multi-threaded RPC server to implement recursive name resolution, provide im-
mediate persistence for key-value pairs, and improve overall performance and scalability. It must still provide all of the original
functionality from earlier assignments, as well as the following additional features.
1. Ability for a variable to contain a variable name or a value. This includes setting and retrieving these names.
2. Recursive name resolution: if a variable contains a name, look up that name and use its value instead. Repeat until you hit a
number, or until you exceed a limit.
3. Variables must be persistent before your server acknowledges an operation. We’ll provide details on a simple way to do this;
there are faster, but perhaps more complex ways that you may use.
4. Support for millions of variables (key-value pairs). Your hash table will have to support large size for this.
5. Improving performance of your server. Details below.
As usual, you must have a design document and writeup along with your README.md in your git repository. Your code
must build rpcserver using make.
Programming assignment: Recursive name resolution, persistence, performance
Design document
Before writing code for this assignment, as with every other assignment, you must write up a design document. Your design
document must be called DESIGN.pdf, and must be in PDF.
Your design should describe the design of your code in enough detail that a knowledgeable programmer could duplicate your
work. This includes descriptions of the data structures you use, non-trivial algorithms and formulas, and a description of each
function with its purpose, inputs, outputs, and assumptions it makes about inputs or outputs.
Write your design document before you start writing code. It’ll make writing code a lot easier. Also, if you want help with
your code, the first thing we’re going to ask for is your design document. We’re happy to help you with the design, but we can’t
debug code without a design any more than you can.
You must commit your design document before you commit the code specified by the design document. You’re welcome
to do the design in pieces (e.g., overall design and detailed design of marshaling functions but not of the full server), as long as
you don’t write code for the parts that aren’t well-specified. We expect you to commit multiple versions of the design document;
your commit should specify why you changed the design document if you do this (e.g., “original approach had flaw X”, “detailed
design for module Y”, etc.). If you commit code before it’s designed, or you commit your design a few minutes before the
working code that the design describes, you will lose points. We want you to get in the habit of designing components before
you build them.
Since this assignment builds on the previous assignments, you may include material from earlier design documents in your
design for Assignment 3. Don’t assume that someone has read your earlier design documents! Thus, we expect that you’ll include
design (and code) from your earlier assignments this quarter.
Program functionality
This assignment incorporates all of Assignment 2, including the parts from Assignment 1, and builds on them. You should be
familiar with the material from the earlier assignments, so we’re not going to repeat the parts of the RPC protocol that you’ve
already implemented. However, you’ll be responsible for ensuring that earlier parts work, and we will test the functionality
from all three assignments in our testing suite.
As with earlier assignments, you may only use system calls (read(), write(), send(), recv()) for input or output of
user data; this includes persisting key-value pairs (variables). You may use string functions to manipulate user data, and you may
use fprintf() or similar calls for error messages. Note that string functions like sprintf() and sscanf() don’t perform
user input or output, and are thus allowed.
Your code may be either C or C++, but all source files must have a .cpp suffix and be compiled by clang++ with no errors
or warnings using the following flags: -std=gnu++11 -Wall -Wextra -Wpedantic -Wshadow
Assignment 3, CSE 130, Fall 2020 - 1 - © 2020 Ethan L. Miller
Variables holding variable names
Assignment 3 provides only two new RPC calls: setv and getv. These calls are used to set and get a variable’s value if it’s the
name of another variable. All operands and return values for these RPC calls are variable names, with the same format as used
in Assignment 2 (single byte length, followed by the characters in the name). As usual, if there’s an error, no data is returned
following the response header. setv returns an EINVAL (22) error if either variable name is malformed. getv returns an
ENOENT (2) error if the variable name doesn’t exist, and EFAULT (14) if the variable name exists, but its value isn’t a variable
name (in other words, return EFAULT if the variable’s value is an actual number).
Operator Opcode Operands Returns
getv 0x0108 1 Variable name
setv 0x0109 2 Nothing
Note that this requires that the elements in your hash table have space for either a number or a variable name at any given
time, but never both at the same time. You could do this with two slots, but we recommend using a single slot declared as a
union to save space. Use a single flag bit elsewhere in the key-value structure to indicate what type of value it is.
Recursive resolution
Remember those flags from Assignment 2? We’re adding a fourth flag, 0x80, which requests a recursive name resolution if the
flag is set. This flag may be set for any of the math operations (add, sub, mul, div, mod). If it’s set, your server keeps reading
variables, recursively resolving the name, until it gets a number. If it hits a limit (specified on the command line), it returns an
ELOOP (40) error. Note that this approach means that a client that doesn’t know about recursive name lookups gets the same
results that it expected: operation without recursive name resolution! The opcodes including recursive name resolution are:
Operator Opcode Operands
add (+) 0x0181 2–3
sub (-) 0x0182 2–3
mul (*) 0x0183 2–3
div (/) 0x0184 2–3
mod (%) 0x0185 2–3
If the operation doesn’t have the 0x80 bit set, it does a normal, non-recursive, lookup: find the key in the hash table, and use
the value. However, the value it finds can now be a variable name instead of a number! If this happens on a non-recursive lookup,
it’s an error: EFAULT (14).
Think of recursive name resolution like pointer chasing. Suppose you have the following variables defined:
Variable Value
x foo
y 12
z x
foo 8
Recursive name resolution of foo returns 8, without needing to iterate. Similarly, y resolves to 12. x resolves to 8 (x →
foo→ 8), and z resolves to 8 as well (z→ x→ foo→ 8).
If you get stuck in an infinite loop (it can happen!) or an excessively long chain, you’ll break out of the iteration after a
number of iterations specified on the command line. The default is 50 iterations, but this could be set to 1000 by including the
-I 1000 option.
Persistence
You’re going to keep your hash table from Assignment 2, but you now have to make sure that your variables are persistent after
they’re modified. In other words, you have to be able to ensure that, when you start up, you can read in the state of your server
from when it was last stopped. Keep in mind that your server can be “killed” at any time.
Your server is responsible for ensuring that, when you respond to an RPC, all changes that the RPC made must be reflected
in the file system somehow. Your server will be given a scratch directory using the -d dir option. You may use this directory
in any way you want: creating files, creating subdirectories, anything. If your server is killed, and then restarted using the same
scratch directory, it must start up with the same variables and values it had when it was killed.
The simplest way to do this is to maintain a single file for each variable. The files will be small—even the longest variable
name is only 31 bytes. However, there can be (literally) millions of them. With this approach, you’d associate each variable with
Assignment 3, CSE 130, Fall 2020 - 2 - © 2020 Ethan L. Miller
a file. This will work, and will be easy to code up, but may not be too fast. However, it’s a good start, and will ensure that you
get this part of your code working quickly.
Performance & scalability
Simple things first: you need to support up to 64 worker threads in your pool, specified using -N as before. This should (hopefully)
not be a major change.
Your server needs to support millions of variables. That means you’ll need efficient structures for storing your keys and
values. You may use any techniques you want to reduce the total amount of memory you use for your table, as long as your server
always get the right answers. There are many ways to save memory:
• Use a union for the value field in your hash table.
• Have different size elements in your linked list: use larger ones only if needed. You may need to keep free lists
for each of the sizes to be efficient.
• XOR linked lists. It’s a thing—look it up!
• Shave bits wherever you can.
Do you need to do these things? No. But we’re going to award 20 extra credit points to the server that uses the least memory
on the memory-intensive test, 15 extra credit points to the two next best program, and 10 extra credit points to each of the next
five.
Similarly, we’re going to give extra credit points with the same distribution for the fastest servers, as measured by our
rpcclient. Again, there are multiple ways to make your server faster. The most important is going to be improving on “file
per variable” for storing your values persistently, and improving how you lock your hash table.
In order to qualify for extra credit, your program needs at least 125 (out of 150) points of regular credit. Make sure your
code works before trying to optimize it for extra credit. Submissions that use grace days are eligible for extra credit, but late
submissions without grace days are not.
Yes, we’re being deliberately vague on how to make your server bigger and faster. We expect you to do some research on
this to figure out how. You can meet the minimum requirements with the information we’ve given you, but if you expect extra
credit for having the smallest and/or fastest (yes, you’re eligible for both) server, you need to do some extra work on your own.
You’re welcome to discuss general approaches with others in the class but, as usual, you can’t see each others’ design documents
or code, and you can only take written notes on the discussion.
Testing your code
You should test your code on your own system. You can run the server on localhost using a port number above 1024 (e.g.,
8912). Come up with requests you can make of your server, and try them using the Python program we’ve posted on Canvas.
When you’re ready to submit your code, you should consider cloning a new copy of your repository (from gitlab.soe.ucsc.edu)
to a clean directory to see if it builds properly, and runs as you expect. That’s an easy way to tell if your repository has all of the
right files in it. You can then delete the newly-cloned copy of the directory on your local machine once you’re done with it. As
with other assignments, there will be an automated service on gitlab.soe.ucsc.edu that will allow you to run a subset of
the tests that we’ll run on your code for each assignment. See earlier assignments for details.
README and Writeup
As for previous assignments, your repository must include (README.md) and (WRITEUP.pdf). The README.md file should
be short, and contain any instructions necessary for running your code. You should also list limitations or issues in README.md,
telling a user if there are any known issues with your code.
Your WRITEUP.pdf is where you’ll describe the testing you did on your program and answer any short questions the
assignment might ask. The testing can be unit testing (testing of individual functions or smaller pieces of the program) or
whole-system testing, which involves running your code in particular scenarios.
For Assignment 3, please answer the following questions:
• What did you do to increase concurrency in your server? How does it help improve concurrency?
• How is concurrency affected by recursive name lookup, if at all?
• What additional tests did you do because performance and scalability matter?
Assignment 3, CSE 130, Fall 2020 - 3 - © 2020 Ethan L. Miller
Submitting your assignment
All of your files for Assignment 3 must be in the asgn3 directory in your git repository. Your repository must meet the
Assignment Acceptance Criteria described on Canvas when you submit a commit for grading. It’s OK to commit and push
a repository that doesn’t meet minimum requirements for grading. However, we’ll only grade a commit that meets these
minimum requirements. You must submit the commit ID you want us to grade via Google Form, linked to the assignment
page on Canvas. This must be done before the assignment deadline.
ALL ASSIGNMENTSMUST BE SUBMITTED BY SATURDAY, DECEMBER 12TH AT 8 PM, REGARDLESS OF
GRACE DAYS. Assignments submitted after this time will not be accepted.
Hints
• Start early on the design. You should reuse as much of your design and code from your earlier assignments as
you can.
• Attend section the weeks of November 23rd and 30th for help.
• The rules on system calls for Assignment 1 apply for Assignment 3 as well, with exceptions noted above. You
should also get familiar with libpthread.
• Test your code with lots of clients running in parallel, and with lots of variables. You can create a file with a
million variables, and load it. Most of you should know Python, which is very good at creating such text files.
• Consider using getopt(3) to parse command-line options.
• Test your server using the RPC client we’re providing. There will be a new C++ rpcclient posted before
Thanksgiving. Some of the command line options are slightly different, but it provides the same functionality
as the Python version and is a lot faster. However, we’re distributing it as object code, so it might not run
anywhere except Ubuntu 20.04. We’ll keep the Python rpcclient up to date as well, but we’re going to
rename the Python client to rpcclient.pyc so we can post them both. You’ll need version 3 of either of
them.
• If you need help, use online documentation such as man pages and documentation on Makefiles. You’ll
need to install glibc-doc to get the man pages for pthreads. If you still need help, ask the course staff.
You should be familiar with the rules on academic integrity before you start the assignment.
• Read and follow the Assignment Acceptance Criteria on Canvas.
Grading
As with all of the assignments in this class, we will be grading you on all of the material you turn in, with the approximate
distribution of points as follows: design document (35%); coding practices (15%); functionality (40%); writeup (10%). For
Assignment 3, we’re adding an extra 50% for functionality. This means that Assignment 3 is worth 1.5× Assignments 1 and 2.
The extra functionality testing will allow us to (re-)test functionality from earlier assignments, and include it in your grade here.
If you didn’t get something working earlier, now is your chance to get it working and get credit for it!
If you submit a commit ID without a green checkmark next to it or modify .gitlab-ci.yml in any way, your
maximum grade is 5%. Make sure you submit a commit ID with a green checkmark.