There will be 3 homework sets for this class, each of which comprises a blend of analytical and computational questions for you to get some hands-on training on network analysis. For the computational part you are typically required to write Matlab code implementing some the methods discussed in class, with the goal of understanding properties of a real-world network. A starter's guide to Matlab can be found here. It is perfectly fine if you prefer to use other programming languages, or leverage network analysis software of your choice.
The list of homework sets and solutions follows. You can refer to the end of the page for matters about logistics and the policy on grading and collaboration.
This first assignment is about simple manipulations of network graphs, with emphasis on the representation using adjacency matrices. You are asked to prove some properties of the Laplacian matrix and to experiment on how its spectrum can be useful for graph partitioning, as well as to explore the information conveyed by the powers of the adjacency matrix. In the second-to-last problem you will study the email graph of the Enron corporation that was made public during the federal investigation, by computing some simple statistics about the network graph using Matlab (or the software package of your preference). To that end, you will need the following Enron corpus dataset. You are required to submit a writeup of your solutions (typing is not mandatory), including a printout of the (e.g., Matalb) code you developed. Due Thursday February 9 in class. (If you are done before the deadline, you can just slide your solutions writeup under the door of my office in Hopeman 413.)
This second assignment is about descriptive analysis and properties of large network graphs. You are first asked to establish a simple identity about closeness centrality, and compute the PageRank of a toy graph through the stationary distribution of an associated Markov chain. Next, you will explore the Perron-Frobenius theorem and its implications on the limiting behavior of powers of transition probability matrices, which naturally arise with iterative methods for computing PageRank values. While the scaled variant of PageRank is a mechanism implemented to avoid `spider traps', you will investigate how this technique can be hacked by spammers to promote their own webpages. You will then generate power-law distributed data, and compare the performance of three estimators of the power-law exponent. Finally, you will have to simulate a realization of the so-termed Chinese restaurant process, which exhibits preferential attachment (i.e., rich-gets-richer) dynamics. Due Thursday March 23 in class.
This third assignment is about modeling and inference of network graphs. First you will investigate a phase-transition behavior with regards to the emergence (or lack) of a giant connected component in the Erdos-Renyi random graph. You are then asked to derive a simple expressions for the partial correlation among two random variables given a third one, and then use this result to verify that inference of edges in Gaussian graphical models can be equivalently achieved via linear regression. In the last problem you will study a network of Internet blogs on US politics, by trying out differnt graph bisection algorithms in Matlan to identify liberal (i.e., Democrat) and conservative (i.e., Republican) "blogger communities". To that end, you will need the following US political blogs dataset. You are required to submit a writeup of your solutions (typing is not mandatory), including a printout of the Matalb code you developed. Due Thursday Apil 20 in class. (If you are done before the deadline, you can just slide your solutions writeup under the door of my office in Hopeman 413.)
The specification of the deliverables will be clearly indicated in each assignment. Homework sets are typically due by Thursdays in class as per the calendar above. I ask that you please make your best effort to meet the deadline. If you can't, for whatever reason, I have no problem granting homework extensions. That said, do not abuse that prerogative, if you are late once in the semester, that's OK. If you are late more than that you need to organize your time better.
Policy on grading and collaboration
Each of the 4 homework sets contributes 20/3=7% towards your final grade. The purpose of homework is to help you in the learning process. Therefore, it is foolish of you and also unethical to hand in the solution of others as your own. While I am quite confident this will not be an issue, if you engage in unethical behavior you will get no points for the assignment and I will recommend that you drop the class and retake it next year. These cases will also be dealt with according to the University of Rochester's Academic Honesty Policy.
Going back to the topic of learning, collaboration with peers is allowed and encouraged. If you are struggling with the material, your classmates are possibly a more valuable resource than myself. If you are doing well in the class, there is no better use of your time than helping your peers. It is a very surprising fact of life that happiness is to be found when you give to someone, not when you receive from someone. Work on the assignment and try to make progress. If you're stuck, go talk with a classmate. If you are still stuck go talk with your TA. Then come talk with me; or do not hesitate to send me an email with a concrete question that I will reply within the day. After you submit your solution look at mine, it may teach you one or two extra things.