Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
CSSE2002/7023 Assignment
Objectives
To improve your understanding of the time complexity of algorithms and data structures. To design com-
putational solutions to problems and develop problem-solving skills. To improve written communication
skills; in particular the ability to present algorithms clearly, precisely and unambiguously.
Scenario
In this assignment we consider a data center that receives requests for webpages and returns back the
requested pages. The data center receives millions of requests and needs to answer them with minimal
delay. As a result, it stores millions of webpages across servers to ensure it can answer requests in parallel.
The data center has two offerings on hosting sites with different guarantees on availability and response
time. The first one is aimed at critical web applications (e.g., hosted by hospitals and banks) and the
second one for less critical applications that can sustain a small delay. In this assignment you will design
data structures and algorithms for answering webpage requests efficiently over time.
Note that you can work on each problem separately as each problem is independent of others even if
the notation and the scenario is shared. However, it is advisable to read all problems before commencing.
Problem 1 [8 points]
Each request r arrives with the following information represented as a tuple: r:arrival time, r:webaddress,
r:is critical and r:source, where
• r:arrival time is the arrival time in milliseconds (represented as integers);
• r:webaddress is the address of the requested page;
• r:is critical is the offering that was chosen by the owner of webaddress and can either be True to
denote a critical address and False otherwise;
• r:source denotes where the request was made from and where the webpage should be sent back to.
More than one request can arrive at the same time. Hence, every new request that comes has arrival time
that is either the same or higher than arrival time of previous requests. More than one request can arrive
for the same page. However, if more than one request arrives for the same page, their r:source will be all
distinct. Note that is critical is the same for all requests with the same webaddress as it depends on the
webaddress only.
A dispatcher program processes the requests, stores them, decides which request to process next based
on their priorities. In order to determine which request to answer next, the dispatcher chooses the request
depending on its arrival time and its is critical value. Any request r1 with is critical of True is answered
before a request r2 with is critical of False, if r1 arrives at most 5 milliseconds after r2. Ties are broken
arbitrarily.
A. [3 points] Suppose you have access to is higher priority(r1; r2) function that takes two requests r1
and r2 and returns True if r1 should be answered before r2 given the priority rule described above,
and False otherwise. Given is higher priority function one can use a heap to insert a new request and
eject the next request. We refer to this as Approach 1.
Consider Approach 2 that does not use is higher priority. Instead it uses two heaps: one heap to
maintain requests to critical pages (i.e., those with is critical set to True) and one heap for requests
to non-critical pages.
Let m0 and m1 be the number of requests to critical and non-critical webpages, respectively. In
the worst case, what is the number of comparisons that have to be performed by each of the
approaches for inserting and ejecting a new request. Give your answers using asymptotical notation.
Is Approach 1 or Approach 2 more superior asymptotically? Argue your answer.
B. [4 points] As users, we tend to start refreshing our browsers if a webpage is not loading. This results
in a duplicate request coming to a data center with a new arrival time. We call a request rnew a
duplicate of rold if their webaddress and source fields are the same. To avoid creating a duplicate
request, we wish to update an existing request instead. However, the update to a new request time
also results in the update of the priority of this request.
Write a pseudo-code for a function HeapUpdate(H; rold; rnew) that updates a data structure H of
Approach 1 so that rold is replaced with rnew and the data structure is adopted to re ect the change
in the priority. You can assume that rold is in H. When operating on heap, assume that H is stored
as an array, starting at index 1 as defined in Lecture 13 (slides \Heaps as Array”).
(No mark: Think whether this is this a good idea to update requests based on their new time.)
C. [1 point] The data center has k servers to improve the response time and hence can answer up to
k requests in parallel. Consider the following incorrect algorithm for finding top k requests using
Approach 1: the algorithm returns elements H[1];H[2]; : : : ;H[k].
Give two examples of heaps with number of requests m 5 and k 3. In the first example,
the above algorithm would return an incorrect top k requests. In the second example, the above
algorithm would return the correct top k requests.
In your examples, assume that is critical is True in all requests. Your heaps should be given using
array notation, e.g., H = [ ; a; b; c], de nes a heap with root a, left node b and right node c. You
only need to indicate the value of arrival time in your heap and ignore all other elds.