COMP3557-WE01 Section A Design of Algorithms and Data Structures
Section A Design of Algorithms and Data Structures
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
COMP3557-WE01
Section A Design of Algorithms and Data Structures
Question 1
(a) Basics
i. Is it possible for two events A,B to be independent, B to be not
independent of another event C, yet C to be independent of A?
Justify your answer (if your answer is “yes” then do so by providing
an explicit example). [6 Marks]
ii. Suppose you have a probability space Ω and a probability distribution
P on Ω such that every pair of events A,B ⊆ Ω is independent. Prove
that there must exist an event ω ∈ Ω with P (ω) = 1. [10 Marks]
(b) Universal hashing
Consider the following simple construction of a family of universal hash
functions. Assume that our keys are k bits long and view them as (vertical)
bit vectors of length k. Further assume that the size of our hash table is
n = 2b for some integer b.
Pick a random 0/1 matrix h of size b×k, and let h(x) = h ·x with addition
mod 2, e.g., for k = 4 and b = 3:
1 0 0 0
0 1 1 1
1 1 1 0
·
1
0
1
0
=
1
1
0
matrix h key x h(x)
size b× k length k length b
Prove that the family of hash functions defined like this (that is, drawing
a function from the set means creating a random matrix as described) is
2-universal. [20 Marks]
(c) Cuckoo hashing
Suppose that in cuckoo hashing we have two separate hash tables T1 and
T2 of the same size, and two hash functions h1 and h2, where hi operates
on Ti for i = 1, 2. Assume that when we wish to insert an element x it
will be placed at position h1(x) in T1. If that cell is already occupied then
continued
Page 2 of 3 COMP3557-WE01
the previous occupant y will be displaced to its other choice of location, at
position h2(y) in T2. If that location is occupied as well then its previous
occupant z will be displaced to its respective other location, at position
h1(z) in T1, and so on. Having two tables, as opposed to one bigger one
as discussed in class, does not meaningfully change the protocol.
i. Suppose we have two tables T1 and T2, each of size 5. Further
suppose we have two hash functions
h1 : x 7→ x mod 5 and h2 : x 7→ bx/5c mod 5.
Insert the elements 3, 16, 18, 23, 1, in this order, into empty hash
tables. Illustrate the contents of the tables after each insertion.
[7 Marks]
ii. Find another key x such that upon insertion of x (after the sequence
from above has been inserted) the protocol does not terminate (if we
do not place an explicit bound on the number of attempts). Illustrate
the contents of the tables after each displacement. [8 Marks]
(d) Bloom filters
Suppose you have Bloom filters B1 and B2 of the same size, using the
same hash functions. Further suppose that you have hashed a set S1 into
B1, and a set S2 into B2.
Now consider another Bloom filter B3 that is the bit-wise AND of the bits
of B1 and B2.
i. Explain why B3 is not in general identical to a Bloom filter into which
you hash the elements of S1 ∩ S2. [8 Marks]
ii. Does B3 at least correctly represent the set S1 ∩S2 in the sense that
one gets a positive answer for membership queries of all elements
actually in this set? [8 Marks]
iii. Consider the digits in your CIS username as an integer number m,
e.g., if your username were “kqlz36” then that would be m = 36. If
you find m < 10 then add 20 to that value.
Suppose we want to store a set S of m elements, drawn from a
universe U of 12000 possible keys, in a Bloom filter of exactly n = 113
bits. For this problem instance, what is the best choice for the number
of hash functions (the parameter k in the lecture)? That is, what
continued
Page 3 of 3 COMP3557-WE01
value of k gives the smallest possible probability that a key not in S
is a false positive? What is the probability of a false positive for this
choice of k? [8 Marks]
(e) PRAM algorithms
Consider the following CREW PRAM algorithm for the parallel prefix prob-
lem. Unlike the algorithm that we have seen in class, this one is based
on the list ranking technique. Let the input be x1, . . . , xn, and the task
is to compute yi = x1 ⊗ · · · ⊗ xi for all 1 ≤ i ≤ n, for some associative
operation ⊗.
We assume the input is given as a linear list, where Next[i], 1 ≤ i ≤ n,
denotes the successor of the element xi in the list, with Next[n] = NIL.
We assume that processor Pi is responsible for element xi.
We have two auxiliary arrays Q[1 . . . n] and Value[1 . . . n] in the shared
memory.
1: for Pi, 1 ≤ i ≤ n in parallel do
2: Q[i]← Next[i]
3: Value[i]← xi
4: for r = 1 to dlog ne do
5: if Q[i] 6= NIL then
6: Value[Q[i]]← Value[i]⊗ Value[Q[i]]
7: Q[i]← Q[Q[i]]
8: end if
9: end for
10: end for
Your task is to prove that this algorithm correctly computes all prefix sums
yi for 1 ≤ i ≤ n (it should be obvious that it runs in time O(log n), using
O(n) many processors).
It might be helpful to consider a two-fold inductive hypothesis, much like in
the proof sketch for list ranking, one part for the Value array, and another
for the Q array. [25 Marks]