research

I study community detection in networks with Karl Rohe, primarily using spectral methods. The central idea behind these methods is to estimate parameters in stochastic blockmodels (and more generally random dot product graphs) by taking the singular value decomposition of matrix representations of graphs. Much of this work is motivated by applied problems in social networks, especially on Twitter.

Outside of network analysis, I’m excited about (word) embeddings, GAMs, (semi-parametric) causal inference, statistical software design, and the philosophical foundations of statistics. In addition to my own projects, I spent a fair amount of time writing code and providing computational support for other ongoing projects in my lab.

Ongoing research projects

Latent topics in citation networks: Based on the observation that newer documents can cite older documents, but older documents cannot cite newer documents, we propose a new approach to estimating latent factors in citation networks. First, we extend spectral clustering to networks with missing edges, using ideas from matrix completion. Naive application of these ideas to citation networks is computationally infeasible so we also come up with some computational tricks to make things fast.

Understanding localization of the regularized graph Laplacian: Spectral decompositions of networks are susceptible to localization, or the tendency of eigenvectors to concentrate disproportionately on a small number of nodes. Eigenvectors of the adjacency matrix localize on high degree nodes, and eigenvectors of the graph Laplacian localize on low degree nodes. Localization is undesirable from both a statistical and a computational point of view, and prior work has proposed regularizing the graph Laplacian by artificially increasing the degree of each node by \(\tau \ge 0\). We show that, analogously to ridge regression, the regularization parameter \(\tau\) controls interpolation between two estimates: the eigenvectors of the graph Laplacian and the eigenvectors of the adjacency matrix. We propose a measure of eigenvector localization that we have found to be a useful diagnostic in applied work.

I am also involved in some work to interpret word embeddings, to quantify the amount of attention extended to Twitter accounts by members of the media, to estimate the rank of adjacency matrices, and to identify causal effects in social media feedback loops.

Research software

None of these packages are on CRAN at the moment. I would characterize them all as approximately 75 percent done; the key functionality works but documentation, tests, and other user-friendly elements may be missing.

  • aPPR helps you calculate approximate personalized pageranks from large graphs, including those that can only be queried via an API. aPPR additionally performs degree correction and regularization, allowing users to recover blocks from stochastic blockmodels. Leverages socialsampler and twittercache to cache targeted network samples as the personalized pagerank calculations run. paper

  • vsp performs semi-parametric estimation of latent factors in random-dot product graphs by computing varimax rotations of the spectral embeddings of graphs. The resulting factors are sparse and interpretable, and the estimation procedure is likely the most computationally efficient procedure to obtain a consistent estimates for community membership in stochastic blockmodels at the moment. paper

  • fastRG samples random-dot product graphs much faster than naive sampling procedures and is especially useful when running simulation studies. See the paper for a description of the fastRG core algorithm.

  • fastadi performs self-tuning matrix completion via adaptive thresholding, often outperforming softImpute. See the paper for algorithmic and theoretical details. I have also extended this work to work with matrices where the entire upper triangle is observed as part of my citations project.

Outside of research, I am involved in a number of open source projects in the tidyverse orbits. In particular, I maintain the broom package, and for my contributions am an author on the tidyverse paper. I intermittently participate in the Stan and ROpenSci communities as well.

Publications

Submitted

  1. Alex Hayes, Karl Rohe. Localization, interpolation & the spectral decomposition of the regularized Graph Laplacian

In preparation

  1. Alex Hayes, Karl Rohe, Alex Tahk. Computationally efficient spectral analysis of citation networks, with an application to U.S. Supreme Court opinions.

Published

  1. Hadley Wickham, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, Alex Hayes, Lionel Henry, Jim Hester, Max Kuhn, Thomas Lin Pedersen, Evan Miller, Kirill Müller, David Robinson, Dana Paige Seidel, Vitalie Spinu, Kohske Takahashi, Davis Vaughan, Claus Wilke, Kara Woo, Hiroaki Yutani. Welcome to the Tidyverse. Journal of Open Source Software. 2019. pdf