CSCC46 Social and Information Networks
In other words, we don’t expect you to prove basic tree properties by induction; just provide
some sound reasoning.
(a) (2 points) Write h(T ) in terms of N .
(b) (3 points) Next we want to define the notion of “tree distance.” The intuition we want to capture
is that students who share the same department are closer than, for example, students who only
share the same school. For instance, in the tree in Figure 1, nodes u and t are “closer” than nodes u
and s. We formalize the notion of “distance” as follows.
Given two network nodes (leaf nodes) v and w, let L(v, w) denote the subtree of T rooted at the
lowest common ancestor of v and w, and h(v, w) denote its height (that is, h(L(v, w))). In Figure 1,
L(u, t) is the tree in the circle and h(u, t) = 2. Note that we can think of h(v, w) as a “distance”
between nodes v and w.
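To make the definition concrete, here is a minimal Python sketch (our own illustration, not part of the question) that computes h(v, w) for two leaves of a complete b-ary tree, assuming leaves are indexed 0, ..., N − 1 from left to right; the function name tree_distance and that indexing convention are choices made only for this example.

```python
def tree_distance(v, w, b):
    """h(v, w): height of the subtree rooted at the lowest common ancestor
    of leaves v and w in a complete b-ary tree (leaves indexed 0..N-1)."""
    d = 0
    # Walk both leaves up one level at a time until they meet.
    while v != w:
        v //= b
        w //= b
        d += 1
    return d

# Example: with b = 2 and N = 8 leaves, leaves 0 and 1 share a parent
# (distance 1), while leaves 0 and 7 only meet at the root (distance 3).
assert tree_distance(0, 1, b=2) == 1
assert tree_distance(0, 7, b=2) == 3
```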
For a given node v, what is the maximum possible value of h(v, w)?
(c) (5 points) Given a value d and a network node v, show that there are $b^d - b^{d-1}$ nodes w satisfying
$h(v, w) = d$.
2.2 This part helps you design a decentralized search algorithm in the network.
We will generate a random network on the leaf nodes in a way that models the observation that a node
is more likely to know “close” nodes than “distant” nodes according to our university organizational
hierarchy captured by the tree T. For a node v, we define a probability distribution over the other nodes w
to which v creates an edge:
$$p_v(w) = \frac{1}{Z}\, b^{-h(v,w)}$$
where $Z = \sum_{w \neq v} b^{-h(v,w)}$ is a normalizing constant. By symmetry, all nodes v have the same normalizing
constant. Next, we set some parameter k and ensure that every node v has exactly k outgoing edges.
We do this with the following procedure. For each node v, we repeatedly sample a random node w
according to pv and create edge (v, w) in the network. We continue this until v has exactly k neighbors.
Equivalently, after we add an edge from v to w, we can set $p_v(w)$ to 0 and renormalize with a new Z to
ensure that $\sum_{w} p_v(w) = 1$. This results in a k-regular directed network.
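The procedure above translates almost directly into code. Below is a minimal Python sketch of it, again assuming a complete b-ary tree with leaves indexed 0, ..., N − 1; the function names and parameter values are our own choices for illustration.

```python
import random

def tree_distance(v, w, b):
    """h(v, w): height of the subtree rooted at the LCA of leaves v and w."""
    d = 0
    while v != w:
        v, w, d = v // b, w // b, d + 1
    return d

def sample_out_edges(v, n_leaves, b, k):
    """Sample k distinct out-neighbours of leaf v, each drawn with probability
    proportional to b**(-h(v, w)), as in the procedure described above."""
    others = [w for w in range(n_leaves) if w != v]
    weights = [b ** -tree_distance(v, w, b) for w in others]
    neighbours = set()
    while len(neighbours) < k:
        # Re-drawing an already-chosen neighbour is ignored, which is equivalent
        # to setting p_v(w) to 0 for chosen nodes and renormalizing.
        neighbours.add(random.choices(others, weights=weights)[0])
    return neighbours

# Example: b = 2, N = 16 leaves, k = 3 outgoing edges for node 0.
print(sample_out_edges(v=0, n_leaves=16, b=2, k=3))
```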
(a) (7 points) Show that $Z \leq \log_b N$. (Hint: use the results in parts 2.1(a) and 2.1(b), and group
nodes by their distance. For each distance d, you know exactly how many nodes are at distance d
and you have a lower bound for the probability of linking to them).
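Before writing the proof, it can help to check the claim numerically. The short sketch below (our own sanity check, not required by the question) computes Z by brute force for a few small complete b-ary trees, using the same leaf-indexing convention as the sketches above, and compares it with $\log_b N$:

```python
def tree_distance(v, w, b):
    d = 0
    while v != w:
        v, w, d = v // b, w // b, d + 1
    return d

for b, height in [(2, 6), (3, 5), (5, 4)]:
    n_leaves = b ** height          # N = b^height, so log_b N = height
    v = 0                           # by symmetry, every leaf has the same Z
    Z = sum(b ** -tree_distance(v, w, b) for w in range(n_leaves) if w != v)
    print(f"b={b}, N={n_leaves}: Z = {Z:.4f} <= log_b N = {height}")
```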
(b) (8 points) For a node v and the target t, let T′ be the subtree of L(v, t) satisfying:
• T′ is of height h(v, t) − 1,
• T′ contains t,
• T′ does not contain v.
For instance, in Figure 1, T′ of L(s, t) is the tree in the circle. Let us consider an edge e from v to a
random node u sampled from $p_v$. We say that e points to T′ if u is one of the leaf nodes of T′. Show
that the probability of e pointing to T′ is no less than $\frac{1}{b \log_b N}$. (Hint: Note that all of the nodes in
T′ are the same distance away from v, by our notion of tree distance that we defined in 2.1(b). How
many nodes are in T′?)
2.3 (15 points) Let the out-degree k for each node be $c \cdot (\log_b N)^2$, where c and b are constants. Show that
when N grows very large, the probability that v has no edge pointing to T′ is asymptotically no more
than $N^{-\theta}$, where θ is a positive constant which you need to compute. (Hints: Use the result in part
2.2(b) and recall that each of the k edges is created independently. Also, use $\lim_{x \to \infty} \left(1 - \frac{1}{x}\right)^x = \frac{1}{e}$.)
Argue why the above result indicates that for any node v, we can, with high probability, find an edge to
a (leaf) node u satisfying h(u, t) < h(v, t).
2.4 (10 points) Show that starting from any (leaf) node s, within $O(\log_b N)$ steps, we can reach any (leaf)
node t. You do not need to prove it in a strict probabilistic argument. You can just assume that for any
(leaf) node v, you can always get to a (leaf) node u satisfying h(u, t) < h(v, t) and argue why you can
reach t in $O(\log_b N)$ steps.
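For intuition about parts 2.3 and 2.4, here is a minimal Python sketch of the greedy routing loop they describe (our own illustration; graph is assumed to be a dict mapping each leaf to its set of out-neighbours, e.g. as produced by the edge-sampling sketch above):

```python
def tree_distance(v, w, b):
    d = 0
    while v != w:
        v, w, d = v // b, w // b, d + 1
    return d

def greedy_search(graph, s, t, b):
    """Route from leaf s to leaf t, always following an out-edge that strictly
    decreases the tree distance to t (2.4 asks you to argue why this loop needs
    only O(log_b N) steps)."""
    path = [s]
    v = s
    while v != t:
        closer = [u for u in graph[v] if tree_distance(u, t, b) < tree_distance(v, t, b)]
        if not closer:
            return None  # greedy search is stuck (2.3 bounds how unlikely this is)
        v = min(closer, key=lambda u: tree_distance(u, t, b))
        path.append(v)
    return path
```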
Power Laws [25 pts]
3.1 [10 pts] The main summary statistics of distributions are called moments. For example, the mean
is the first moment, and the variance is the second central moment. In Lecture 6, we calculated the
mean of a power law distribution. In this question, calculate the third moment of a power law
distribution, which is closely related to its skewness. That is, calculate
$$\int_{x_{\min}}^{\infty} x^3 p(x)\, dx,$$
where
$$p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}.$$
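As a quick sanity check on this density (our own aside, not part of the question), it should integrate to 1 over [x_min, ∞) for any α > 1; the parameter values and the use of scipy below are illustrative choices:

```python
from scipy.integrate import quad

def power_law_pdf(x, alpha, xmin):
    """p(x) = ((alpha - 1) / xmin) * (x / xmin) ** (-alpha) for x >= xmin."""
    return (alpha - 1) / xmin * (x / xmin) ** (-alpha)

alpha, xmin = 2.5, 1.0
total, _ = quad(power_law_pdf, xmin, float("inf"), args=(alpha, xmin))
print(total)  # should print (numerically) 1.0
```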
3.2 [15 pts] As we learned in class, the power law can be a great fit for the heavy-tailed data we
encounter in real-world networks. But one problem we have to overcome is: what parameters are the
best fit? The power law is characterized by the minimum x-value $x_{\min}$ and the slope α. In this
question, we’ll learn how to estimate α from data.
Say you have a dataset $\{x_i \mid i \in \{1, \dots, n\}\}$ of n numbers, and you suspect these numbers come from
a power law distribution and want to find the best-fit α. The strategy we’re going to use to estimate
α is called the maximum likelihood approach. We’re going to compute the log-likelihood of the
data given a power law model, then calculate which value of α maximizes this log-likelihood (we
take the log of the likelihood to simplify the math).
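For intuition, the sketch below maximizes the log-likelihood numerically over a grid of α values; it is our own illustration (the synthetic data, the grid, and the true α = 2.5 are arbitrary choices), and you can later use something like it to sanity-check the closed-form answer you derive in part (b).

```python
import math
import random

def log_likelihood(alpha, xs, xmin):
    """L(alpha) = sum_i ln p(x_i) under the power law density defined in 3.1."""
    return sum(math.log((alpha - 1) / xmin) - alpha * math.log(x / xmin) for x in xs)

# Synthetic power-law data via inverse-transform sampling (true alpha = 2.5).
xmin, true_alpha = 1.0, 2.5
xs = [xmin * (1 - random.random()) ** (-1 / (true_alpha - 1)) for _ in range(10_000)]

# Grid search over candidate alpha values in (1, 5].
grid = [1.01 + 0.01 * i for i in range(400)]
alpha_hat = max(grid, key=lambda a: log_likelihood(a, xs, xmin))
print(alpha_hat)  # should land close to the true value 2.5
```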
(a) [5 pts] Let $L(\alpha) = \ln\left(\prod_{i=1}^{n} p(x_i)\right)$ be the log-likelihood of the data as a function of α. Plug in the
definition of a power law for $p(x_i)$ (shown above in Question 3.1) and simplify the expression
for $L(\alpha)$.
(b) [10 pts] Now take the derivative of L(α) with respect to α and set it to zero:
$$\frac{dL(\alpha)}{d\alpha} = 0$$
This is now a simple single-variable calculus problem. Solve this equation to get the value of α
that maximizes the log-likelihood of the data. That is the value of α that fits the data best.