Network Science Analytics is divided into five thematic blocks: Introduction; Graph theory, probability and statistical inference review; Descriptive analysis and properties of large networks; Sampling, modeling and inference of networks; and Processes evolving over network graphs. In this page you will find the lecture slides we use to cover the material in each of these blocks, in addition to suggested additional reading including research papers and book chapters. Links to the slides are also available from the class schedule table.


Acknowledgment: The lecture slides here were originally prepared by myself during the first offering of this class in Spring'14. I have often borrowed figures and material organization ideas from lecture slides available in the websites of a few other great Network Science courses listed under links and resources. Since the lecture material follows closely the contents in the book

  • Eric D. Kolaczyk, "Statistical Analysis of Network Data: Methods and Models," Springer

I have utilized figures from the text that can be downloaded here. So if you like the slides and find them useful due credit as well as my gratitude goes to those creating that original material.

Introduction


This first lecture outlines the organizational aspects of the class as well as its contents.

  • Slides for this introductory block, which I will cover in the first class. We introduce the notion of network and present a "birds-eye" view of the cross-disciplinary area known as Network Science, starting with a historical background. We will continue with a series of motivating examples according to the common taxonomy of technological, social, information and biological networks. In the process, we highlight a series of intriguing relevant questions as well as analysis and computational challenges. (Updated 01/19/17)

Suggested readings:

  • Introduction and Overview from Eric D. Kolaczyk, "Statistical Analysis of Network Data: Methods and Models." Available for download from the UR network.
  • Overview from D. Easley and J. Kleinberg, "Networks, Crowds, and Markets: Reasoning About a Highly Connected World".
  • The "new" science of networks by D. Watts. Accesible review of the major findings in the emerging field of Network Science, with a brief discussion of their relationship with previous work in the social and mathematical sciences.

Optional readings:

Graph theory, probability and statistical inference review


This block is a fast-paced review of the technical background for the forthcoming material on networks analysis. It is intended as a refresher and to ensure everyone is on the same page. I will only cover the required background on graph theory and statistical inference in class. Regarding probability theory, you can get up to speed by going over the posted slides from ECE440 - Introduction to Random Processes.

  • First set of slides covering basic notions and definitions relating to undirected and directed graphs, movement in a graph and connectivity, as well as the emergence of a giant connected component in many real-world graphs. We then outline graph families including complete, regular, bipartite, trees and planar graphs. Also useful are notions of algebraic graph theory, such as the adjacency, incidence and Laplacian matrices of a graph, their relationships and spectral properties. We wrap up with graph data structures and algorithms, and describe breadth-first search to e.g., calculate distances from a given vertex. I plan to cover this material in two lectures. (Updated 02/09/17)

  • Second set of slides reviewing basic elements of statistical inference such as parametric and nomparametric models, and the prototypical problems of estimation, prediction and hypothesis testing. We will outline the concepts of point estimate, confidence interval and test statistic, as well as classical estimators such as the method of moments, maximum likelihood, least-squares, and maximum a posteriori (MAP). All these methods will be discussed in the context of two classical problems: inference of the mean of a distribution, and regression and prediction with linear models. I expect to cover this material in two lectures. (Updated 02/09/17)

Suggested readings:

  • Preliminaries from Eric D. Kolaczyk, "Statistical Analysis of Network Data: Methods and Models." Available for download from the UR network.
  • Mathematics of networks from M. E. J. Newman, "Networks: An Introduction." Available for download from the UR network.

Optional readings:

  • Graphs from D. Easley and J. Kleinberg, "Networks, Crowds, and Markets: Reasoning About a Highly Connected World".
  • First set of slides from ECE440's probability review covering sigma-algebras and the axiomatic definition of probability, conditional probability, and the notion of independence between events. We will then introduce random variables -- both discrete and continuous -- and commonly used distributions. We finish off with expectations, and joint distributions useful to study systems with multiple random inputs.
  • Second set of slides from ECE440's probability review covering Markov's and Chebyshev's inequalities, and different notions of convergence for sequences of random variables (random processes). These notions of convergence allows one to establish and understand remarkable limit theorems, such as the law of large numbers and the central limit theorem. This second block is concluded with conditional probabilities and expectations, with emphasis on their usefulness for various calculations.
  • Notes on probability and statistics prepared by Carlos Fernandez-Granda.

Descriptive analysis and properties of large networks


This block develops methods for the descriptive analysis of network data. The coverage goes all the way from the initial tasks of collecting and converting network measurements into a network graph representation (possibly for the sake of visualization), up to those methods for description and summarization of structure and indentification of patterns in such network graphs.

  • First set of slides about network mapping, meaning the production of a network-based visualization of a complex system. We introduce three basic stages in the mapping process, namely the collection of relational data, the construction of the network graph representation and the graph visualization. The whole process is further illustrated through a couple of case studies: mapping the backbone of "Science", and visualizing the router-level Internet via the k-core decomposition. I expect to cover this material in two lectures. (Updated 02/06/17)

  • Second set of slides about degree distributions, power laws and popularity as a network phenomenon. We introduce the Erdos-Renyi random graph model and evaluate its exponentially-decaying degree distribution. Interestingly, empirical studies suggest that many real-world complex networks exhibit right-skewed, heavy-tailed degree distributions best modeled as power laws. We then explain the related notion of scale-free network, and discuss common methods to visualize and fit power-law degree distributions. We conclude with the study of popularity and the description of the preferential attachment ("rich-gets-richer") model, as a simple generative network model giving rise to power-law degree distributions. I expect to cover this material in two lectures. (Updated 02/22/17)

  • Third set of slides about centrality measures used to judge the importance of individual vertices in a network graph. We introduce and compare three classical measures such as closeness, betweenness, and eigenvector centrality. We then continue with link analysis algorithms for web search, and describe Hubs and Authorities as well as the PageRank algorithm for ranking nodes in graphs. After a brief review of basic definitions and results regarding Markov chains, we analyze PageRank as an equiprobable random walk in the network graph. This interpretation allows us to leverage Markov chain machinery to understand the behavior of a couple algorithmic variants. I expect to cover this materia in three lectures. (Updated 02/24/17)

  • Fourth set of slides about characterzation of network cohesion. We introduce first the notion of cohesive subgroups which is naturally captured by cliques, and describe computationally affordable clique relaxations by familiarity and reachability. We continue with measures of local density such as the clustering coefficient, quantifying the extent of relationships among "equals" (homophily) through assortative coefficients; and network robustness captured by connectivity notions. We conlude with a case study about network-based analysis of epileptic seizure data in humans, leveraging several summaries of network characteristics we discussed so far. I expect to cover this material in one lecture. (Updated 02/28/17)

Suggested readings:

Optional readings:

Sampling, modeling and inference of networks


This block describes modeling and inferential tasks for network data analytics. Smoothly transitioning from descriptive methods we first study network community identification, and then explore the effects of sampling on the extent to which characteristics of an observed network graph reflect the corresponding structural properties of the underlying system being studied. We then present models for describing an observed network graph, while we finally consider the problem of inferring a network graph based on incomplete or indirect measurements.

  • First set of slides about network community detection. We start by describing the famous "strength of weak ties" hypothesis from sociology as a conceptual means of supporting the community structure found in networks. After going over a few examples of network communities, we delve into the problem of network community detection and its formulation as a clustering or graph partitioning problem. We will describe several community detection algorithms in some detail, including: (i) Girvan-Newman's method; (ii) hierarchical clustering; (iii) modularity optimization; and (iv) spectral graph partitioning. I expect to cover this material in three lectures. (Updated 03/03/17)

  • Second set of slides about sampling and estimation in network graphs. We first introduce the fundamental problem of estimating a network summary characteristic from sampled graph data, and illustrate some of the inheret challenges faced. We then review basic concepts of statistical sampling theory including simple random sampling with(out) replacement and inclusion probabilites, leading to the Horvitz-Thompson estimator of totals as well as the capture-recapture method for estimation of group size. Several network graph sampling designs are then discussed, such as snowball and traceroute sampling. Horvitz-Thompson theory is subsequenly applied to estimation of vertex, edge, and higher-order totals. We conclude with the problems of estimating "hidden populations" and degree distributions. I expect to cover this material in two lectures. (Updated 03/21/17)

  • Third set of slides about modeling network graphs. We introduce a number of important classes of network graph models: (i) random graph models; (ii) small-world models; (iii) network growth models; and (iv) exponential random graph models. The emphasis is on model construction, simulation, and inference of model parameters. Throughout, we illustrate some of the statistical uses to which these models have been put, including the detection of network motifs, the evaluation of proposed network generative mechanisms, and the assessment of potential predictive factors of relational ties. I expect to cover this material in three lectures. (Updated 04/11/17)

  • Fourth set of slides about network topology inference, where a portion of the network graph is unobserved and we wish to infer it from measurements. We start with the link prediction problem and describe approaches based on informal scoring of potential edges, as well as classification methods possibly accounting for latent variables. Moving on to inference of association networks, we discuss (partial) correlation networks and Gaussian graphical models. In the process, we illustrate their usefulness in identifying gene-regulatory interactions from microarray data. We conclude with tomoographic network topology inference, whereby measurements are only available in the "perimeter" of the graph and it is necessary to infer the presence or absence of edges and vertices in the "interior". I expect to cover this material in three lectures. (Updated 04/18/17)

Suggested readings:

Optional readings:

Processes evolving over network graphs


  • First set of slides about prediction of static processes defined on network graphs. We start with simple nearest-neighbor prediction, and then transition to more sophisticated model-based approaches that describe vertex attributes in terms of the given network graph structure. Specifically, we study Markov random field models in some detail with emphasis on estimation of model parameters and sampling-based prediction. We conclude with kernel-based regression on graphs, and give examples of kernels that derive from the topology of the network graph to capture similarities among vertex attributes. I expect to cover this material in two lectures. (Updated 04/26/17)

  • Second set of slides about modeling of epidemic processes. Our discussion kicks off with the overly simplistic, yet insightful branching process model. In this case, we show that the epidemic exhibits a dichotomous spreading behavior depending the value of the so-termed reproductive number - either it dies off after finite time or it goes on forever. We then discuss the widely adopted SIR model and its stochastic description in terms of continuous-time Markov chains, while touching upon aspects of simulation and inference of model parameters. We finally revisit the SIR model when allowing for arbitrary contact networks, draw connections with the SIS variant which allows for reinfections, and show that the SIRS model captures the observed tendency for epidemics of certain diseases to synchronize across a population. I expect to cover this material in two lectures. (Updated 05/02/17)

  • Third set of slides about analysis of network flow data. We start with the class of gravity models used to describe observations of the traffic matrix, both to obtain an uderstanding as to how potentially relevant factors affect flow volumes and to be able to make predictions of future flow volumes. We continue with the fundamental problem of traffic matrix estimation, where the goal is to recover unobserved traffic matrix entries from ubiquitous link totals. Here we discuss three statistical methods for traffic estimation, deriving from principles of least squares and Gaussian models, from Poisson models and from entropy minimization. We conclude with the problem of modeling and inference of network cost parameters, studied both at the level of links and of paths. Throughout, the techniques will be illustrated with several case studies including Internet traffic matrix estimation, unveiling of network traffic anomalies, link-count prediction and dynamic delay cartography. I expect to cover this material in three lectures. (Updated 04/26/16)

Suggested readings:

Optional readings: