Algorithm Design COMP3027/3927
Pre-tutorial questions
Do you know the basic concepts of this week’s lecture content? These questions are only to test yourself.
They will not be explicitly discussed in the tutorial, and no solutions will be given to them.
1. Divide and Conquer
(a) What is the general idea of a Divide-and-Conquer algorithm? Can you break up the general
structure into three steps?
(b) Why are recurrences usually needed to analyse the running time of a Divide-and-Conquer algorithm?
2. Algorithms
(a) Describe the general idea of the algorithm for maximum contiguous subarray.
(b) What are the three main steps in the algorithm for counting inversions?
(c) In the combine step of the Closest-Pair algorithm,
i. Why do we only need to consider points that are within the δ-strip to the left of the dividing
line and the δ-strip to the right?
ii. Suppose the points in these strips are s_1, . . . , s_k, in ascending order of their y-coordinate. Why is it that we only need to check pairs (s_i, s_j) with |i − j| ≤ 7?
3. Recurrences
(a) State the master method (write down all three cases).
(b) Try to explain the master method using a recursion tree.
4. Review Tutorial 11 and 12 from COMP2123 on solving recurrences using master method as well as
designing divide and conquer algorithms.
Guide for how to answer tutorial questions
“Pre-work”
1. Draw it out
(a) Use this to identify what your algorithm needs to achieve and what information is given to you.
Can you understand and solve a simple instance of the problem?
2. Why use divide and conquer?
(a) Identify whether and why divide and conquer is a good strategy for this problem. When would
you use divide and conquer over a greedy approach?
3. Brainstorming
(a) Algorithms and efficiency aside, how would you and your brain solve this problem? Brute force?
Can you see a pattern?
Design your algorithm
1. Divide
(a) What are your subproblems?
• Ensure these are smaller, independent versions of the exact same problem.
• Check whether you need to process the input in any way to construct the subproblems.
(b) What are your base cases?
• Ensure that the base cases will always be reached in the recursion. The base cases are typically,
but not always, trivial to solve in constant time.
2. Delegate
(a) Recurse on subproblems, aka delegate to the Recursion Fairy.
3. Combine
(a) Assuming you have the correct solution to the subproblems (by inductive hypothesis), how can
you combine them to get a solution to the original problem? It might help to look at simple
instances with one level of recursion, i.e. the subproblems are base cases.
(b) Can you spot a general pattern you can apply to combining general subproblem solutions?
“Post-work”
1. Test your solution
(a) Come up with some edge cases to test your solution on.
2. Argue Correctness
(a) Arguing the correctness of a divide-and-conquer algorithm almost always requires induction
• (To understand why, think about the connection between the structure of a divide-and-conquer algorithm and inductive reasoning.)
3. Analyse the running time of your solution
(a) This will often involve setting up and solving a recurrence using a recursion tree.
(b) Setting up the recurrence:
For this you will need to identify the following:
• The number of recursive calls (k)
• The size of the input passed to each recursive call relative to the current size of the input
(n/d)
• The amount of work performed at each recursive call relative to the size of the input (f(n))
This information can be captured by a recurrence relation as follows:
T(n) = k · T(n/d) + f(n)
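This recurrence can be sanity-checked numerically. The sketch below uses my own illustrative choice of parameters, k = 2, d = 2, f(n) = n (the shape of mergesort's recurrence), and compares T(n) against n log₂ n, which is what the master method predicts here:

```python
import math

# Evaluate T(n) = k*T(n/d) + f(n) top-down, with base case T(1) = 1.
# Parameter values k = 2, d = 2, f(n) = n model a mergesort-style split.
def T(n, k=2, d=2, f=lambda n: n):
    if n <= 1:
        return 1
    return k * T(n // d, k, d, f) + f(n)

# The master method (case 2) predicts Theta(n log n);
# the ratio T(n) / (n log2 n) approaches 1 as n grows.
for n in [2 ** i for i in range(4, 11)]:
    print(n, T(n), round(T(n) / (n * math.log2(n)), 3))
```

For powers of two this recurrence has the exact closed form T(n) = n log₂ n + n, which is a useful check when you solve it with a recursion tree.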
Tutorial
Problem 1
Suppose we are given numbers a, n, where n > 0 is an integer. We wish to calculate the number a^n.
What is the quickest way to do this? How many multiplication operations do we have to perform? Of
course, we may compute 19^8 by calculating 19 × 19 = 361, then 19^3 = 361 × 19 = 6859, then
19^4 = 6859 × 19 = 130321, and so on, until we get 19^8. This takes seven multiplications in total. Is this the
quickest possible? Note that 8 = 2 × 4, so we can also write 19^8 = 19^4 × 19^4. If we compute 19^4 first, and
then square it, we need only one more multiplication. The straightforward method would require four more
multiplications: 19^8 = 19^4 × 19 × 19 × 19 × 19. Similarly, 19^4 = 19^2 × 19^2. So if we calculate 19^2 = 361 with
one multiplication, and 19^4 = 361^2 = 130321 with one more, we get 19^8 = 130321^2 = 16983563041 with the third
multiplication. This cleverer method requires only three multiplications. The method above seems to work
when the exponent n is even. What do we do when it is odd? Say we would like to calculate 19^7. We may
write 7 = 6 + 1, so 19^7 = 19^6 × 19, then 19^6 = 19^3 × 19^3, and finally 19^3 = 19^2 × 19. So 19^3 = 361 × 19 = 6859,
19^6 = 6859^2 = 47045881, and 19^7 = 47045881 × 19 = 893871739. The straightforward method of calculation
requires 6 multiplications, and we needed only 4 here. We can combine the ideas from the two examples
above to get a procedure to calculate the power a^n for any pair a, n.
Algorithm 1 Power
function Power(a, n)
    if n = 1 then
        return a
    end if
    if n is even then
        b = Power(a, n/2)
        return b^2
    else
        b = Power(a, (n − 1)/2)
        return a × b^2
    end if
end function
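The pseudocode translates directly into Python. The following is a minimal sketch (the lowercase name power is mine, not part of the tutorial):

```python
def power(a, n):
    """Compute a**n using O(log n) multiplications; requires n >= 1."""
    if n == 1:
        return a
    if n % 2 == 0:
        b = power(a, n // 2)
        return b * b               # one multiplication (squaring)
    else:
        b = power(a, (n - 1) // 2)
        return a * (b * b)         # two multiplications
```

For example, power(19, 8) returns 16983563041 and power(19, 7) returns 893871739, matching the worked examples above.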
Prove that the algorithm correctly computes a^n. Can you bound the number of multiplications for each n?
Solution: The algorithm Power(a, n) correctly calculates a^n, as long as it correctly
calculates the relevant smaller powers of a. We may prove by induction over n that this
happens for any number a.
Proof. The base case is n = 1, where the algorithm correctly returns a. For the induction
hypothesis, assume that the algorithm is correct for all k with 1 ≤ k ≤ n − 1. Consider
Power(a, n). If n is even, the algorithm returns the value b^2, where b = Power(a, n/2).
Since n > 1 and even, n/2 is an integer, and 1 ≤ n/2 ≤ n − 1. By the induction hypothesis,
Power(a, n/2) correctly returns a^{n/2}, so the algorithm returns b^2 = a^n. If n is odd (n > 1),
then (n − 1)/2 is an integer, and 1 ≤ (n − 1)/2 ≤ n − 1. By the induction hypothesis,
b = Power(a, (n − 1)/2) = a^{(n−1)/2}, and the algorithm correctly returns a × b^2 = a^n in this
case as well. So the algorithm is correct by induction.
Consider the number of multiplications used by the algorithm. It can be described by the
recurrence T(n) = O(1) + T(n/2), which solves to O(log n). A closer analysis shows that the
bound is 2 log_2 n.
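The 2 log₂ n bound can be checked empirically by instrumenting the algorithm to count multiplications. This is a quick sketch; power_count is my own name for the instrumented version:

```python
import math

def power_count(a, n):
    """Return (a**n, number of multiplications used by Power)."""
    if n == 1:
        return a, 0
    if n % 2 == 0:
        b, m = power_count(a, n // 2)
        return b * b, m + 1           # squaring: one multiplication
    b, m = power_count(a, (n - 1) // 2)
    return a * b * b, m + 2           # squaring plus one extra multiply

# Every n up to 10**4 stays within the 2*log2(n) bound.
assert all(power_count(3, n)[1] <= 2 * math.log2(n)
           for n in range(2, 10_001))
```

On the worked examples, power_count(19, 8) reports 3 multiplications and power_count(19, 7) reports 4, exactly as counted by hand above.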
Problem 2
(Due to Jeff Erickson.) For this problem, a subtree of a binary tree means any connected subgraph. A binary
tree is complete if every internal node has two children, and every leaf has exactly the same depth. Describe
and analyze a divide and conquer algorithm to compute the largest complete subtree (LCS) of a given binary
tree. Your algorithm should return the depth of this subtree. See Figure 1 for an example.
Figure 1: The largest complete subtree of this binary tree has depth 3.
Guided Solution: (Note: This guided solution shows how to approach solving this problem with an
initial approach that doesn’t work and then shows how to fix it. This is meant to illustrate the brainstorming
process, not an illustration of how to describe your algorithm for assignment solutions. In your assignment,
simply give us the final algorithm.)
Let’s go through the 3 steps in designing a good divide-and-conquer algorithm:
1. Divide step: What makes this problem seem less straightforward is that our input comes in the form
of a tree rather than a list. However, as you’ll see, trees are actually much easier to perform divide and
conquer on because they are already recursively defined.
Remember that when we divide, we want to divide into problems that are independent, smaller and, most
importantly, the exact same problem. In a binary tree, we know that each node has at most 2 children, each
of which is the root of an independent subtree (which is by definition a smaller tree) upon which we can
apply the exact same problem. So, as a first start, our divide step creates two subproblems: find the depth
of the LCS contained in the left subtree and the depth of LCS contained in the right subtree.
So what then is our base case? There are three attributes we wish to find in a good base case:
1. Trivial
2. Can be solved in constant runtime
3. ALWAYS reachable (no matter which input you start with)
To answer this, we have to ask ourselves “for which kind of tree is it trivial to find both the root of the largest
complete subtree and its depth?”
Since the subproblems are strictly smaller (each subproblem has at least one node fewer than the original
problem), let’s start by considering an empty tree. It seems this satisfies attributes 1 and 2 but what about
attribute 3? Imagine our input is of size n = 1 (a leaf). To get to an empty tree of size n = 0 we must recurse
on the current node’s children. But a leaf does not have any children and hence this base case cannot always
be reached. (Note that we assume here that an empty tree is not a valid input to the problem).
So an empty tree can’t be our (only) base case. What about a leaf (a tree of depth 1) then? The solution is
trivial because the leaf will be the root of the largest complete subtree and its depth is trivially 1. Returning
this takes constant runtime and it is easy to see that if we keep recursing on the children of each current node,
we must eventually reach the leaves. After all, the subproblems are strictly smaller and the only subtrees
containing a single node are the leaves.
2. Delegate step: The delegation step involves recursing on each subproblem. Once we have recursively
called on each subproblem, we can simply assume that it will be solved correctly. Thanks, Recursion Fairy!
Food for thought: Can you see a relation between delegation and the inductive hypothesis in an inductive
argument?
3. Combine step: Now that we have the correct solutions to the subproblems, we need to combine
them to form our solution to the original problem. Let us first think about what the LCS looks like.
Recursive characterization of LCS There are 3 cases:
1. The LCS is contained in the left subtree
2. The LCS is contained in the right subtree
3. The LCS contains the root (which we call r)
If the LCS falls into one of the first two cases, the recursive calls would have found it. It remains to find the
LCS that contains the root.
Finding LCS containing root In order to figure out how to do this we start by exploring a few simple
scenarios:
Figure 2: Scenario 1 - A path
Figure 3: Scenario 2 - An unbalanced binary tree
Figure 4: Scenario 3 - A full binary tree
In Scenario 1 (Figure 2) we can see that at r, the left and right subtrees have nothing in common, and
hence r starts its own complete subtree of depth 1. In Scenario 2 (Figure 3), on the other hand, the left and
right subtrees have exactly one complete level in common, so the LCS rooted at r has depth 1 + 1 = 2. In
Scenario 3 (Figure 4), the left and right subtrees are identical across both levels, and thus the LCS rooted
at r has depth 2 + 1 = 3.
From this we guess that, in general, the depth of the LCS rooted at r is one more than the smallest number
of complete levels that the left and right subtrees have in common. More formally, write LCS(r), LCS(L), LCS(R)
for the depth of the LCS rooted at r, the left child, and the right child, respectively. So, LCS(r) =
min(LCS(L), LCS(R)) + 1. (We will prove this formally in the inductive step of the correctness proof later on.)
Need to solve an extended problem Unfortunately, the recursive calls only give us the LCS contained
in the left and right subtrees, which are not necessarily rooted at the left and right children, respectively.
Thus, in order for the combine step to access this information, we will instead solve the following extended
version of the problem, which asks for not just the LCS but also the LCS rooted at the root of the tree.[1]
1. Divide step for extended problem Our divide step for the extended problem creates the subproblems:
find the depth of the LCS contained in the left subtree and the depth of the LCS rooted at the left child,
and find the depth of the LCS contained in the right subtree and the depth of the LCS rooted at the right
child. For the base case when the tree is a leaf, our algorithm returns 1 and 1.
2. Delegate step for extended problem We recurse on the left and right subtrees.
3. Combine step for extended problem Our recursive calls now also give us LCS(L) and LCS(R)
which lets us compute LCS(r) and consequently, the depth of the LCS contained in the tree. In particular,
our algorithm’s combine step returns LCS(r) = min(LCS(L),LCS(R)) + 1 as the depth of the LCS rooted at
r, and it returns the max of the following 3 quantities as the depth of the LCS contained in the tree: LCS(r),
the depth of the LCS contained in the left subtree and the depth of the LCS in the right subtree. Note that
the last two quantities are also given by the recursive calls.
[1] This is similar to the notion of strengthening the induction hypothesis in induction proofs.
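Putting the three steps together, the extended algorithm can be sketched in Python. The Node class and the name solve are my own; the tutorial only fixes the idea at the pseudocode level. The sketch also guards against nodes with a single child, since the input binary tree need not be full:

```python
class Node:
    def __init__(self, left=None, right=None):
        self.left, self.right = left, right

def solve(v):
    """Return (depth of the LCS contained in v's subtree,
               depth of the LCS rooted at v)."""
    if v.left is None and v.right is None:      # base case: a leaf
        return 1, 1
    if v.left is None or v.right is None:
        # A node with one child cannot root a complete subtree of
        # depth > 1, but a deeper LCS may hide inside the child.
        best_child, _ = solve(v.left or v.right)
        return best_child, 1
    best_l, rooted_l = solve(v.left)     # delegate to the Recursion Fairy
    best_r, rooted_r = solve(v.right)
    rooted = min(rooted_l, rooted_r) + 1         # LCS(r)
    return max(best_l, best_r, rooted), rooted
```

Here solve(root)[0] answers the original problem; the second component is the strengthened information carried up through the recursion.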
Arguing the correctness of your solution: Due to the structure of a divide-and-conquer algorithm, the
easiest way to argue its correctness is often a proof by induction (see Figure 5). Recall that since we
are now solving the extended problem, we need to argue that the algorithm correctly solves the extended
problem, not the original problem. Thus, the statement we want to prove by induction is the following: for
every binary tree with n vertices, our algorithm returns the depth of the LCS contained in the tree as well
as the depth of the LCS rooted at the root of the tree.
Base case: n = 1. When the tree is a leaf, the LCS in the tree as well as the LCS rooted at the
root of the tree consists of the node itself, which has depth 1. Thus, our algorithm is correct.
Inductive hypothesis: Assume that for any rooted binary tree of size 1 to n − 1, the algorithm cor-
rectly finds the LCS contained in the tree as well as the LCS rooted at the root of the tree.
Inductive step: Let T be a tree of size n and let r be its root. The LCS in T must either be rooted at r,
or be entirely contained in the left subtree (T_l) or the right subtree (T_r). Since the left and right subtrees
each have size at most n − 1, by our inductive hypothesis, the recursive calls
left and right subtrees can be of size at most n − 1 each, by our inductive hypothesis, the recursive calls
correctly compute the LCS contained in these subtrees. Let LCS(L) be the depth of the LCS rooted at the
left child of the root and LCS(R) be the depth of the LCS rooted at the right child of the root. Also, by
inductive hypothesis, the recursive calls correctly compute LCS(L) and LCS(R).
It remains to show that LCS(r) = 1 + min(LCS(L),LCS(R)). We can get a complete subtree rooted at
r by taking r and the first min(LCS(L),LCS(R)) levels of the left and right subtrees. Thus, we get that
LCS(r) ≥ 1 + min(LCS(L),LCS(R)). We now need to argue that LCS(r) ≤ 1 + min(LCS(L),LCS(R)). This
is because if we look at the LCS rooted at r, then removing r and the right subtree from it gives us a complete
subtree rooted at the left child of depth LCS(r) − 1. Similarly, there is a complete subtree rooted at the
right child of depth LCS(r) − 1. Thus, both LCS(L) and LCS(R) are at least LCS(r) − 1 which gives us
min(LCS(L),LCS(R)) ≥ LCS(r)− 1 and so LCS(r) ≤ 1 + min(LCS(L),LCS(R)), as desired.
Figure 5: The relation between divide and conquer and inductive reasoning
Analysing the runtime: To calculate the running time we must first understand what it is that makes
up the running time of the algorithm. To solve an instance of size n, the running time is made up of:
• The running time of the recursive calls
• The work performed at the current level
Hence, to define the running time of the algorithm on an input of size n (n being the number of nodes in the
tree), we start by identifying, first, the recursive calls made and the subset of the input upon which they are
made and, second, the work done at this level (f(n)). In this case:
1. We make two recursive calls at each level, one on an input of size n_l (the size of the left subtree) and
the other on an input of size n_r (the size of the right subtree).
2. The work done at this level comes (in this case) from the combine step and adds up to a constant time
comparison and arithmetic operation.
This results in the following recurrence:
T(n) = T(n_l) + T(n_r) + O(1)
From here there are many ways to figure out the runtime. Since we will be recursing on every node in the
tree, a recursion tree is a natural way to work it out.
Let’s start with the root level of the recursion tree. We know that at the root the number of nodes left to
process is n. At this level, the amount of work done takes O(1) time and the rest of the work is passed down
the recursion tree (see Figure 6).
Figure 6: Root level of the recursion tree
Next, we can look at the subtree of size n_l. We know that at its root we again perform O(1) work
and delegate the rest of the work to its left and right subtrees. Repeating this on both children each time,
we can see that O(1) work is being done for every node in the tree (see Figure 7). We thus conclude that
n · O(1) work is being done in total, which gives us an O(n) runtime.
Figure 7: Full recursion tree (O(1) work at every node)
Problem 3
(Due to Jeff Erickson.) Let n ≥ 2 be a power of two. Suppose you are given an n × n checkerboard with one
(arbitrarily chosen) square removed. Describe and analyze an algorithm to compute a tiling of the board
without gaps or overlaps by L-shaped tiles, each composed of 3 squares. Your input is the integer n and two
integers representing the row and column of the missing square. The output is a list of the positions and
orientations of (n^2 − 1)/3 tiles. Your algorithm should run in O(n^2) time (note that this is linear in the size
of the checkerboard). Hint: prove that such a tiling always exists by induction on n.
Solution: Consider the following recursive algorithm. Base case: n = 2. It is clear that a
2 × 2 checkerboard with one square missing can be tiled.
Divide step: Find the n/2 × n/2 quadrant of the checkerboard containing the missing square.
Then place one tile in the middle of the checkerboard, oriented in such a way that it has
exactly one square in each of the other n/2 × n/2 quadrants of the checkerboard. Remove the
squares covered by this tile from the checkerboard. Each of the quadrants now has exactly
one missing square, so we can recurse on each quadrant. See Figure 8 for an illustration.
Combine step: return the lists from the recursive steps, adding the position and orientation
of the tile we placed in the middle of the checkerboard.
Proof of correctness by induction on n. Base case (n = 2): the checkerboard has exactly
three squares, arranged in an L-shape, that need to be covered. Inductive case: suppose
that the algorithm is correct for n/2 × n/2 checkerboards. Then the recursive calls correctly
tile the checkerboard except for the three squares in the middle that were removed. The
combine step adds a tile covering these three squares, so we have a valid tiling.
The divide and combine steps take O(1) time, as they boil down to finding the quadrant
containing the missing square. Thus, the recurrence relation is T(n) = 4T(n/2) + O(1),
which solves to O(n^2) by the Master Theorem.
(See also Chapter 5.1.5 of this book. This is also a good example of strengthening the
induction hypothesis.)
Figure 8: X marks the missing square.
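The recursion can be implemented directly. In the sketch below, the coordinate conventions and the name tile are my own: rows and columns run from 0 to n − 1, and each L-tile is reported as the three (row, column) squares it covers, which also pins down its orientation:

```python
def tile(n, r, c, top=0, left=0, tiles=None):
    """Tile the n x n board whose top-left corner is (top, left) and
    whose missing square is (r, c).  Appends each L-tile to `tiles`."""
    if tiles is None:
        tiles = []
    if n == 2:
        # Base case: the three remaining squares form one L-tile.
        tiles.append([(i, j)
                      for i in (top, top + 1)
                      for j in (left, left + 1)
                      if (i, j) != (r, c)])
        return tiles
    h = n // 2
    mid_r, mid_c = top + h, left + h
    # Centre square of each of the four quadrants.
    centres = {(0, 0): (mid_r - 1, mid_c - 1), (0, 1): (mid_r - 1, mid_c),
               (1, 0): (mid_r, mid_c - 1),     (1, 1): (mid_r, mid_c)}
    qr, qc = (0 if r < mid_r else 1), (0 if c < mid_c else 1)
    # Place one tile on the centre squares of the three quadrants that
    # do NOT contain the missing square.
    tiles.append([sq for q, sq in centres.items() if q != (qr, qc)])
    # Each quadrant now has exactly one "missing" square: recurse.
    for q, sq in centres.items():
        miss = (r, c) if q == (qr, qc) else sq
        tile(h, *miss, top + q[0] * h, left + q[1] * h, tiles)
    return tiles
```

Calling tile(n, r, c) returns the full list of (n² − 1)/3 tiles; the recursion matches the T(n) = 4T(n/2) + O(1) analysis above.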
Problem 4
(Exercise 6, Chapter 5 from textbook Algorithm Design) Let T be a complete binary tree with n vertices.
Each vertex v has a distinct weight w(v). A vertex is a local minimum if its weight is less than the weights of
its neighbors. Give a divide-and-conquer algorithm that finds a local minimum in time O(log n).
Solution: The algorithm starts at the root and does the following recursively. If the current
vertex is an internal vertex with smaller weight than both of its children, return the current
vertex as a local minimum; otherwise, recurse on the child with the smaller weight. If the
current vertex is a leaf, then return it as a local minimum.
To see why this takes O(log n) time, note that we take O(1) time to decide whether to
return the current vertex or to recurse on one of its two children, and whenever we recurse,
we go down one level of the binary tree. Since a complete binary tree with n vertices has
O(log n) levels, the total time taken by this algorithm is O(log n).
The proof of correctness is easiest to see without using induction (see below for an induction
proof). Suppose, towards a contradiction, that the algorithm returns a vertex v that is not
a local minimum. By definition of the algorithm, v cannot be an internal vertex: we return
an internal vertex only after checking that its weight is smaller than both children's, and
since we only ever descend to a child whose weight is smaller than the current vertex's, any
returned internal vertex also has smaller weight than its parent. Thus, v must be a leaf.
But then the algorithm reached v only because v's parent has larger weight than v. Since v
is a leaf, its parent is its only neighbor. Thus, v is a local minimum.
Here’s a proof of correctness via induction.
Base case (n = 1): The algorithm correctly returns the single vertex as a local minimum.
Inductive case (n > 1): Suppose the algorithm is correct for complete binary trees with
fewer than n vertices. If the root r is a local minimum, then the algorithm correctly returns
the root. Otherwise, suppose the algorithm recursed on r’s child u, and let v be the vertex
returned by the recursive call. There are two cases: either v = u is a child of the root r, or
not. In the latter case, correctness follows from the induction hypothesis. In the former
case, the induction hypothesis only tells us that v has smaller weight than both of v’s
children. We still need to show that v has smaller weight than its parent r. But this follows
from the fact that we recursed on v precisely because v has smaller weight than r.
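The descent can also be written iteratively, which makes the O(log n) bound transparent. This is a sketch; the Node fields and names are my own:

```python
class Node:
    def __init__(self, weight, left=None, right=None):
        self.weight, self.left, self.right = weight, left, right

def local_min(v):
    """Walk down the tree, always moving to a strictly smaller child."""
    while True:
        children = [c for c in (v.left, v.right) if c is not None]
        if not children:                 # leaf: its parent is larger
            return v
        smallest = min(children, key=lambda c: c.weight)
        if v.weight < smallest.weight:   # smaller than both children
            return v
        # Weights are distinct, so smallest.weight < v.weight here,
        # preserving the invariant that v beats its parent.
        v = smallest
```

Each iteration does O(1) work and descends one level, so with O(log n) levels the total time is O(log n).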
Problem 5
[COMP3927 only] Recall the streaming algorithm for majority. Let k > 0 be the value of the counter at
the end of the stream. In the lecture, we saw an example where k = 1 and yet there is no absolute majority.
For what values of k can we be sure that there is an absolute majority?
Solution: Recall that m is the length of the stream and the domain is {1, . . . , n}. If
k ≥ m/2 + 1, then there is an absolute majority since there are at least k occurrences of
the same item. If k ≤ m/2, it is possible that there is no absolute majority. For example,
suppose m is even and n = 3. Now consider a stream in which the first m/2 elements of
the stream consist of the numbers 1 and 2 alternating between each other, followed by m/2
occurrences of the number 3. After processing the first half of the stream, the counter is 0
and increases to m/2 at the end of the stream. However, there is no absolute majority.
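The counterexample can be checked directly against the majority-vote counter from lecture. The sketch below uses the standard Boyer-Moore update rule and my own concrete choice m = 12:

```python
def majority_counter(stream):
    """Run the Boyer-Moore majority-vote pass; return (candidate, counter)."""
    candidate, count = None, 0
    for x in stream:
        if count == 0:
            candidate, count = x, 1
        elif x == candidate:
            count += 1
        else:
            count -= 1
    return candidate, count

m = 12
# First half alternates 1 and 2; second half is all 3s.
stream = [1, 2] * (m // 4) + [3] * (m // 2)
```

After the alternating half the counter is 0, and the 3s then drive it up to k = m/2; yet 3 occurs only m/2 times, which is not an absolute majority.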
Problem 6
[COMP3927 only] Can you prove a lower bound on the approximation ratio of the “MST-approach”
(doubling the MST) for the Travelling Salesman Problem?
Solution: The following instance proves a lower bound of 2(1 − 1/n). Consider a complete
graph G on n vertices with a subgraph H consisting of a cycle on n − 1 vertices together
with edges from the remaining vertex to all other vertices, i.e. H is the union of a cycle and
a star. Set the edges in H to have length 1 and all edges of G not in H to have length 2.
There is a Hamiltonian tour of length exactly n that follows edges in H. One possible MST
is the star. If the algorithm chooses the wrong Euler tour, for instance one that does not
visit the leaves of the star in the same order as the cycle, then after shortcutting, it might
end up with a Hamiltonian tour in which all but 2 edges have length 2. This gives a solution
of length 2n − 2.
Figure 9: Left figure illustrates H, middle figure illustrates MST and a good Eulerian tour that results in
the optimal Hamiltonian tour, right figure illustrates bad Eulerian tour.
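Both tour lengths can be verified on a concrete instance. The encoding below is my own: cycle vertices are 0, …, n − 2 (consecutive modulo n − 1), and vertex n − 1 is the star centre:

```python
n = 8                       # number of vertices in the complete graph

def length(u, v):
    """Edge lengths: 1 on the cycle of H and on the star, 2 elsewhere."""
    star = (u == n - 1 or v == n - 1)
    on_cycle = (not star) and abs(u - v) in (1, n - 2)
    return 1 if (star or on_cycle) else 2

def tour_len(t):
    return sum(length(t[i], t[(i + 1) % n]) for i in range(n))

# Optimal tour: follow the cycle, detouring through the centre once.
good = [0, n - 1, 1, 2, 3, 4, 5, 6]
# Bad shortcut of the doubled star's Euler tour: leaves visited in an
# order unrelated to the cycle, so only the two centre edges have length 1.
bad = [n - 1, 0, 2, 4, 6, 1, 3, 5]
```

Here tour_len(good) is n and tour_len(bad) is 2n − 2, so the ratio is (2n − 2)/n = 2(1 − 1/n), matching the claimed lower bound.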