Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
Large Scale Social and Complex Networks: Design and Algorithms
ECE 232E
Project 1
Random Graphs and Random Walks
One can use igraph library1 to generate different networks and measure various properties of
a given network. The library has R and Python implementations. You may choose either lan-
guage that you prefer. However, for this project, using R is recommended, as some functionality
might not be implemented for the Python version of the package.
Submission: Please submit your report to Gradescope for grading. Meanwhile, upload a
zip file containing your report and codes to BruinLearn. Name the submission with your
group members’ UIDs.
1
1. Generating Random Networks
1. Create random networks using Erdo¨s-Re´nyi (ER) model
(a) Create undirected random networks with n = 1000 nodes, and the probability p for
drawing an edge between two arbitrary vertices 0.003, 0.004, 0.01, 0.05, and 0.1.
Plot the degree distributions. What distribution is observed? Explain why. Also,
report the mean and variance of the degree distributions and compare them to the
theoretical values.
Hint Useful function(s): sample_gnp , degree , degree_distribution , plot
(b) For each p and n = 1000, answer the following questions:
Are all random realizations of the ER network connected? Numerically estimate the
probability that a generated network is connected. For one instance of the networks
with that p, find the giant connected component (GCC) if not connected. What is
the diameter of the GCC?
Hint Useful function(s): is_connected , clusters , diameter
(c) It turns out that the normalized GCC size (i.e., the size of the GCC as a fraction of
the total network size) is a highly nonlinear function of p, with interesting properties
occurring for values where p = O( 1
n
) and p = O( lnn
n
).
For n = 1000, sweep over values of p from 0 to a pmax that makes the network almost
surely connected and create 100 random networks for each p. pmax should be roughly
determined by yourself. Then scatter plot the normalized GCC sizes vs p. Plot a line
of the average normalized GCC sizes for each p along with the scatter plot.
i. Empirically estimate the value of p where a giant connected component starts to
emerge (define your criterion of “emergence”)? Do they match with theoretical
values mentioned or derived in lectures?
ii. Empirically estimate the value of p where the giant connected component takes
up over 99% of the nodes in almost every experiment.
(d) i. Define the average degree of nodes c = n × p = 0.5. Sweep over the number of
nodes, n, ranging from 100 to 10000. Plot the expected size of the GCC of ER
networks with n nodes and edge-formation probabilities p = c/n, as a function
of n. What trend is observed?
ii. Repeat the same for c = 1.
iii. Repeat the same for values of c = 1.1, 1.2, 1.3, and show the results for these
three values in a single plot.
iv. What is the relation between the expected GCC size and n in each case?
2. Create networks using preferential attachment model
(a) Create an undirected network with n = 1000 nodes, with preferential attachment
model, where each new node attaches to m = 1 old nodes. Is such a network always
connected?
Hint Useful function(s): sample_pa ( barabasi.game )
(b) Use fast greedy method to find the community structure. Measure modularity.
Hint Useful function(s): cluster_fast_greedy , modularity
(c) Try to generate a larger network with 10000 nodes using the same model. Compute
modularity. How is it compared to the smaller network’s modularity?
(d) Plot the degree distribution in a log-log scale for both n = 1000, 10000, then estimate
the slope of the plot using linear regression.
2
(e) In the two networks generated in 2(d), perform the following:
Randomly pick a node i, and then randomly pick a neighbor j of that node. Plot the
degree distribution of nodes j that are picked with this process, in the log-log scale.
Is the distribution linear in the log-log scale? If so, what is the slope? How does this
differ from the node degree distribution?
Hint Useful function(s): sample
(f) Estimate the expected degree of a node that is added at time step i for 1 ≤ i ≤ 1000.
Show the relationship between the age of nodes and their expected degree through
an appropriate plot.
(g) Repeat the previous parts for m = 2, and m = 5. Compare the results of each part
for different values of m.
(h) Again, generate a preferential attachment network with n = 1000, m = 1. Take its
degree sequence and create a new network with the same degree sequence, through
stub-matching procedure. Plot both networks, mark communities on their plots,
and measure their modularity. Compare the two procedures for creating random
power-law networks.
Hint In case that fastgreedy community detection fails because of self-loops, you
may use “walktrap” community detection.
Useful function(s): sample_degseq
3. Create a modified preferential attachment model that penalizes the age of a node
(a) Each time a new vertex is added, it creates m links to old vertices and the probability
that an old vertex is cited depends on its degree (preferential attachment) and age.
In particular, the probability that a newly added vertex connects to an old vertex is
proportional to:
P [i] ∼ (ckαi + a)(dlβi + b),
where ki is the degree of vertex i in the current time step, and li is the age of vertex
i. Produce such an undirected network with 1000 nodes and parameters m = 1,
α = 1, β = −1, and a = c = d = 1, b = 0. Plot the degree distribution. What is the
power law exponent?
Hint Useful function(s): sample_pa_age
(b) Use fast greedy method to find the community structure. What is the modularity?