Advanced Computer Systems (ACS)
Final Exam
This is your final 5-day take home exam for Advanced Computer Systems. The exam will be evaluated on the 7-point grading scale with external grading, as announced in the course description.
Hand-ins for this exam must be individual. Cooperation on or discussion of the contents of the exam with other students is strictly forbidden. The solution you provide should reflect your knowledge of the material alone. The exam is open-book and you are allowed to make use of the book and other reading material of the course. If you use on-line sources for any of your solutions, they must be cited appropriately. While we require the individual work policy outlined above, we will allow students to ask clarification questions on the formulation of the exam during the exam period. A clarification question should not include any solution ideas to questions in the exam, but only pertain to the meaning of theproblem formulations and associated assumptions in the exam text itself. These questions will only be accepted through the forums on Absalon, and will be answered by the TAs or your lecturer. The goal of the latter policy is to give all students fair access to the same information during the exam period.
A well-formed solution to this exam should include a PDF file with answers to all scenario questions as well as questions posed in the programming part of the exam. In addition, you must submit your code along with your written solution. Evaluation of the exam will take both into consideration.
Do not get hung up in a single question. It is best to make an effort on every question and write down all your solutions than to get a perfect solution for only one or two of the questions. Nevertheless, keep in mind that a concise and well-written paragraph in your solution is always better than a long paragraph.
Note that your solution has to be submitted via Digital Exam in electronic format. In the unlikely event that Digital Exam is unavailable during the exam period, we will publish the exam in Absalon and expect any issues to be reported to the email [email protected]. It is your responsibility to make sure in time that the upload of your files succeeds. We strongly suggest composing your solution using a text editor or LaTeX and creating a PDF file for submission. Paper submissions will not be accepted, and submissions will only be accepted in Digital Exam.
Learning Goals of ACS
We attempt to touch on many of the learning goals of ACS. Recall that the learning goals of ACS are:
(LG1) Describe the design of transactional and distributed systems, including techniques for modularity,performance, and fault tolerance.
(LG2) Explain how to employ strong modularity through a client-service abstraction as a paradigm tostructure computer systems, while hiding complexity of implementation from clients.
(LG3) Explain techniques for large-scale data processing.
(LG4) Implement systems that include mechanisms for modularity, atomicity, and fault tolerance. (LG5) Structure and conduct experiments to evaluate a system's performance.
(LG6) Discuss design alternatives for a modular computer system, identifying desired system propertiesas well as describing mechanisms for improving performance while arguing for their correctness.
(LG7) Analyze protocols for concurrency control and recovery, as well as for distribution and replication.
(LG8) Apply principles of large-scale data processing to analyze concrete information-processingproblems.
Scenarios
Question 1: Data Processing (LG3, LG8)
A car rental company BestCars maintains an operational database storing information on their customers, cars, as well as rental records.
The management of BestCars would like to analyze the interest of their customers in terms of the size of cars. In particular, they assign a bunch of data analysis tasks to you, the best data scientist and developer in BestCars. One of the tasks is as follows: “Produce a list of ids of the users who have only rent big cars, but not small cars.”
Since the analysis is very heavy, to avoid any interference with the operational system, the necessary data for your task has been pre-extracted outside of the database and stored separately. For the aforementioned task, you are provided with two heap files storing two tables whose schemas are as follows: (1)
Cars(car_id, make, model, size_category), and (2) Rentals(user_id, car_id, time). You can assume thevalues of size_category can be either “small” or “big”. You are provided with one server with memory larger than the square root of any table, but insufficient to hold any one of them completely. This applies to any temporary tables that are generated by your algorithm. Therefore, you have to resort to external memory algorithms to solve the problem. If you need to make any further assumptions in your answers, please state them clearly as part of your solution. Please write down your solutions to the following questions.
1. State an external memory algorithm with the minimum cost in terms of I/O to answer the query above. Argue for the algorithm’s correctness and efficiency.
2. State the I/O cost of the algorithm you designed. You can make your own notation of the size of each attribute in any table.
NOTE 1: To state an algorithm, you can reference existing sort-based or hash-based external memoryalgorithms. You should not state all the steps of these existing algorithms from scratch again, but you should clearly state the steps that you need to change in the algorithms you reference, and also how you change these steps. To describe how you change a step, refer to the step and list the sub-steps that need to be executed to achieve your goal.
NOTE 2: Instead of just using several existing external memory algorithms in sequence as black boxes,you should design a single algorithm that addresses the whole task holistically. That is why in NOTE 1 we expect that you will need to show changes to steps of existing algorithms, if necessary.
NOTE 3: Your solution will be graded partly according to the optimality of your algorithm and the qualityof the justifications. That is to say, a solution with justified lower I/O cost will in general help you get a better grade.
Question 2: Replication (LG7)
Three processes, P1, P2, and P3, replicate a common object. The object is updated asynchronously by three different clients, and each client happens to dispatch its update to a different process out of P1, P2, and P3. After receiving each update, the respective processes incorporates the update in its local replica
and then propagates the new value to the other processes. The precise exchange of messages is shown in the following figure:
Figure 1: Exchange of messages among processes
Note that P1 got a client update at A, P2 at B, and P3 at D; the clients are omitted from the figure to avoid cluttering. The system uses vector clocks to keep track of the object version internally. You may assume that all local clock values start at zero at the beginning of the interaction shown. In addition, assume that only updates from clients cause an increment of a local clock, i.e., not every event causes an increment of the local clock except for original client updates. Therefore, (1,1,1) is the maximum vector clock possible in the given interaction. The latter policy ensures that, if update propagation reaches all processes, then we expect the object’s version to converge as either overwriting or merging through a commutative associative function takes place.
Every time an update is received by a process P, the process chooses one out of the following actions to incorporate the update:
(A1) Keep P’s own value;
(A2) Overwrite P’s value with the latest value from the message;
(A3) Call an application-specific merge function to combine both values consistently.
For each point in the execution from A to K shown in Figure 1, give the value of the vector clock at the corresponding process before the update event is processed, the value of the vector clock received in the incoming message or sent in the outgoing message, the action taken by the process to incorporate the update, and the value of the vector clock after the update is processed. Use a table with the following format to report your answer:
Event |
Vector Clock at |
Message Vector |
Action Taken by |
Vector Clock at |
Process Before |
Clock |
Process |
Process After |
|
A |
… |
… |
… |
… |
B |
… |
… |
… |
… |
… |
… |
… |
… |
… |
In addition to the table, provide a paragraph with a brief justification for your reasoning for constructing the table, i.e., discuss when each action should be taken depending on the vector clock values at the given message and event.
Question 3: ARIES Recovery (LG7)
In the recovery scenario described in the following, we ask you to explain the functioning and justification behind the ARIES recovery algorithm. At the beginning of time, there are no transactions active in the system and no dirty pages. A checkpoint is taken. After that, four transactions, T1, T2, T3, and T4, enter the system and perform various operations. The detailed log follows:
LOG |
|||||
LSN |
PREV_LSN |
XACT_ID |
TYPE |
PAGE_ID |
UNDONEXTLSN |
— |
——– |
——- |
—- |
——- |
———– |
1 |
– |
– |
begin CKPT |
– |
– |
2 |
– |
– |
end CKPT |
– |
– |
3 |
NULL |
T1 |
update |
P3 |
– |
4 |
3 |
T1 |
update |
P1 |
– |
5 |
NULL |
T2 |
update |
P1 |
– |
6 |
NULL |
T3 |
update |
P3 |
– |
7 |
5 |
T2 |
update |
P2 |
– |
8 |
NULL |
T4 |
update |
P2 |
– |
9 |
4 |
T1 |
update |
P4 |
– |
10 |
8 |
T4 |
commit |
– |
– |
11 |
9 |
T1 |
abort |
– |
– |
12 |
7 |
T2 |
abort |
– |
– |
13 |
12 |
T2 |
CLR |
P2 |
A |
14 |
10 |
T4 |
end |
– |
– |
15 |
11 |
T1 |
CLR |
P4 |
B |
16 |
15 |
T1 |
CLR |
P1 |
C |
17 |
13 |
T2 |
CLR |
P1 |
D |
18 |
16 |
T1 |
CLR |
P3 |
E |
19 |
18 |
T1 |
end |
– |
– |
——– XXXXXXX —— CRASH! |
—— XXXXXXX ——– |
A crash occurs at the end of the log as shown. Considering again that the system employs the ARIES recovery algorithm, answer the following questions and justify each of your answers.
1. In the log, the undonextLSN values for several CLRs are represented by the letters A–E. For each letter, state the value of the corresponding undonextLSN and why it has the given value.
2. What are the states of the transaction table and of the dirty page table after the analysis phase of recovery? Explain why.
3. Show the additional contents of the log after the recovery procedure completes. Provide a brief explanation for why any new log records shown need to be added.
4. In the system shown, let us assume that the database is memory-resident, i.e., upon start-up all contents of nonvolatile storage are loaded into volatile storage (which is large enough to cache everything). Thus, no page replacements are ever necessary during normal operation. Furthermore, assume plenty of log buffer space is available and no background page writes are performed. Under these conditions, answer the following:
a. Which log records could trigger writes to stable storage during normal operation in the scenario above? Why?
b. Following the ARIES log record structure introduced in the course for update records, is there any information that would not need to be written to stable storage in such a scenario? If so, explain which information and why. Otherwise, justify why all information in these records needs to reach stable storage.
Programming Task
In this programming task, you will develop a simplified edge-to-cloud distributed interpreter abstraction in a scenario inspired by emerging self-checkout supermarkets. Through the multiple questions below, you will describe your design (LG1), expose abstractions as a services (LG2), design and implement these services (LG4, LG6, LG7), and evaluate your implementation with an experiment (LG5).
As with assignments in this course, you should implement this programming task in Java, compatible with JDK 8. As an RPC mechanism, you are only allowed to use Jetty together with XStream and/or Kryo, and we expect you to abstract the use of these libraries behind clean proxy classes as in the assignments of the course. You are allowed to reuse communication code from the assignments when building your proxy classes. You are also allowed to use skeleton code from Assignment 3 to orchestrate experimental workload and measurements as well as JUnit testing skeleton code from the assignments.
In contrast to a complex code handout, in this exam you will be given simple interfaces to adhere to, described in detail below. We expect the implementation that you provide of these interfaces to follow the description given, and to employ architectural elements and concepts you have learned during the course. We also expect you to respect the usual restrictions given in the qualification assignments with respect to libraries allowed in your implementation (i.e., Jetty, XStream, Kryo, JUnit, and the Java built-in libraries, especially java.util.concurrent).
The Self-Checkout Supermarket Scenario
We focus on a scenario inspired by recent developments in self-checkout supermarkets.1,2 In this near-future scenario, a machine learning component is deployed at each store of the fictitious acertainsupermarket.com firm to automatically identify items that are added or removed from each customer’s cart. These events trigger function calls to an edge service that keeps track of the state of the carts for the multiple customers. Many edge service instances are run to account for geographical distribution and reduce latency on queries to the cart state. As such, an edge service instance can support the activities of a number of nearby supermarkets. However, a supermarket interacts with one and only one edge service instance.3 The edge service instances, in turn, invoke function calls on a centralized cloud service that keeps track of item prices and stocks to support the supply chain operation of the supermarket. These calls to the cloud service comprise queries for price and other details for single items as well as updates to item stocks.