In this assignment, you will be writing programs for generating temporal snapshots of a network and performing basic link analysis of the network. Then, you will work on methods for link prediction in these networks. Furthermore, you will experiment with community detection methods. You will also prepare a technical document where you briefly report about the data, methods, measurements, experiments, and results of the assignment. Important Notes: • If you are working in a pair, only one of you should submit. The first page of your report should clearly include information of all team members (first name, last name, login, student#, email). • You can write code of your own or use available graph libraries (NetworkX, SNAP, JUNG, etc.). • Your report should be no more than 5 pages. To get full marks, your code must be well-documented, and the report should be well organized, using proper technical language and suitable style (sections, figures, tables). Formatting and Style All reports should be formatted according to the ACM SIG conference proceedings template in LaTex and prepared using Overleaf, a free collaborative authoring tool. When you have completed the assignment, move these two files in a directory (e.g., assignment1/), and use the following command within that directory to electronically submit your files: % submit 4414 a2 a2-code.zip a2-report.pdf You may submit your files as many times as you wish prior to the submission deadline. Make sure you name your files exactly as stated (including lower/upper case letters). You may check the status of your submission using the command: % submit -l 4414 a2 • Make sure you have submitted the correct version; new or missing files will not be accepted after the due date. A. Temporal Graphs (20%) Consider the dataset “dblp_coauthorship.json.gz” available at the following website: Based on this dataset, create the following graphs: (a) dblp2005: An undirected unweighted graph that represents the DBLP co-authorships of the year 2005. Each node represents an author and each edge represents that two authors co-authored at least one paper in 2005. (b) dblp2006: An undirected unweighted graph that represents the DBLP co-authorships of the year 2006. Each node represents an author and each edge represents that two authors co-authored at least one paper in 2006. (c) dblp2005w: An weighted version of the dblp2005 graph, where the weight represents the number of papers that two authors co-authored in 2005. B. Node and Edge Importance in Graphs (30%) For each of the graphs above (dblp2005, dblp2006, dblp2005w) report: i. the names and scores of the 50 most important authors based on PageRank scores ii. the pairs of author names of the 20 most important edges based on edge betweenness scores Briefly comment on the results and how they compare with each other. Use small text font to fit the results in the report. C. Link Prediction in Graphs (40%) Given the co-authorship network of 2005, we want to predict the new co-authorships that will occur in 2006 (i.e., the new edges). Your prediction will be based on a number of prediction methods proposed in the literature and your task is to evaluate their performance. The steps to follow are described below: i. based on dblp2005 create a graph dblp2005-core that includes nodes with degree d>=3. ii. based on dblp2006 create a graph dblp2006-core that includes nodes with degree d>=3. iii. compute the list of friends-of-friends FoF in dblp2005-core; this is the list of pairs of nodes that are exactly two-hops away in the network. FoF is the list of candidate edges to consider for the prediction problem. iv. compute the set of edges T that do not exist in dblp2005-core but exist in dblp2006-core. This is the set of target edges that we would ideally be able to predict. v. compute the set of predicted edges P according to the following link prediction methods: a. RD: random predictor b. CN: common neighbors c. JC: jaccard coefficient d. PA: preferential attachment e. AA: adamic/adar vi. compute the precision at k evaluation metric for values of k={10, 20, 50, 100, |T|}, denoted as P@10, P@20, P@50, P@100, P@T for each method. Briefly comment on the results and how they compare with each other. For all graphs above, you need to obtain and work only on the giant connected component. Report the number of nodes and edges of the graphs (#nodes, #edges). D. Community Detection in Graphs (10%) Pick the co-authorship network of 2005 and apply the Girvan–Newman community detection method to derive communities. The Girvan-Newman method iteratively removes the edge with the highest edge betweenness to derive communities (hierarchical method that creates a dendrogram). Stop the iterative process once the current number of communities is greater than k=10. Report the size (number of nodes) of the 10 (or more) communities currently found in a descending order (first the size of the largest community and last the size of the smallest community). Briefly comment on the results.