code

My research code, as well as miscellaneous personal projects in various stages of completion, lives on my GitHub. I primarily use R, and most of my work as a developer is on methods packages. I am also a proficient Python user, and have passing exposure to SQL, Julia, and C++.

research software

  • aPPR approximates Personalized PageRanks in large graphs, including those that can only be queried via an API, such as the Twitter following graph. aPPR additionally performs degree correction and regularization, allowing users to recover blocks from stochastic blockmodels (see Chen, Zhang, and Rohe 2020). You can combine aPPR with the neocache backend to sample large portions of the Twitter following graph with high Personalized PageRanks around seed nodes (joint work with Nathan Kolbow). I strongly believe that researchers are not analyzing the Twitter following graph enough and I am happy to help you use these packages to collect and analyze that data. Sometimes the code does go stale due to changes in the Twitter API – let me know when this happens and I’ll push a bugfix as fast as I can. slides

  • vsp performs semi-parametric estimation of latent factors in random dot product graphs by computing varimax rotations of the spectral embeddings of graphs. The resulting factors are sparse and interpretable. Rohe and Zeng (2022+) developed the theory; I then ended up using varimax rotation a lot in my own data analysis and wrapped some of the infrastructure I developed into this package. I am committed to maintaining this package and will respond quickly to feature requests or questions about how you might use it in your own research.

  • fastRG samples large, sparse random dot product graphs very efficiently and is especially useful when running simulation studies for spectral network estimators. The fastRG sampling algorithm is described in Rohe et al. (2018). I am committed to maintaining this package and will respond quickly to feature requests or questions about how you might use it in your own research.

  • fastadi is a proof-of-concept implementation of AdaptiveImpute, a self-tuning matrix completion algorithm with adaptive thresholding that is closely related to softImpute (Cho, Kim, and Rohe 2019, 2018). I extended AdaptiveImpute to the computationally challenging case where the entire upper triangle is observed as part of my work with Karl Rohe on citation networks. This is research code rather than code intended for broad consumption. I make no commitments to maintaining or improving this code unless something about it is blocking an ongoing research project.
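
For readers unfamiliar with Personalized PageRank, the quantity that aPPR approximates, here is a minimal Python sketch that computes it by plain power iteration on a tiny graph held in memory. This is not the approximation algorithm aPPR uses, and it does no degree correction or regularization. The function name `personalized_pagerank`, the teleportation probability `alpha = 0.15`, the choice to send mass from dead-end nodes back to the seed, and the toy graph are all assumptions made for illustration.

```python
def personalized_pagerank(adjacency, seed, alpha=0.15, tol=1e-10, max_iter=1000):
    """Personalized PageRank by power iteration (illustrative sketch only).

    adjacency: dict mapping each node to a list of its out-neighbors;
               every neighbor must also appear as a key.
    seed:      the node the random walk teleports back to.
    alpha:     teleportation probability (assumed value, not taken from aPPR).
    """
    nodes = list(adjacency)
    p = {v: 0.0 for v in nodes}
    p[seed] = 1.0  # start with all probability mass on the seed

    for _ in range(max_iter):
        nxt = {v: 0.0 for v in nodes}
        nxt[seed] += alpha  # teleport back to the seed with probability alpha
        for u in nodes:
            out = adjacency[u]
            if out:
                # otherwise follow an out-edge chosen uniformly at random
                share = (1 - alpha) * p[u] / len(out)
                for v in out:
                    nxt[v] += share
            else:
                # dead end: return the walker to the seed (one common convention)
                nxt[seed] += (1 - alpha) * p[u]
        change = sum(abs(nxt[v] - p[v]) for v in nodes)
        p = nxt
        if change < tol:
            break
    return p


# Hypothetical toy "following" graph: an edge u -> v means u follows v.
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a", "b"], "d": ["a"]}
ppr = personalized_pagerank(graph, seed="a")
```

Nodes that the seed cannot reach, such as `d` here, receive a score of zero, which is why the scores can be used to rank the neighborhood around a seed. Exact power iteration needs the whole graph up front, so it is not an option when the graph can only be queried one node at a time through an API.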

design of statistical software

I am particularly interested in the design of statistical software and have contributed to the rOpenSci statistical software reviewing guidelines, as well as early versions of the tidymodels implementation principles. I have some long-form explorations of modeling software design on my blog.

I review for the Journal of Open Source Software and the R Journal.

#rstats

I have been involved in a number of open source projects in the tidyverse and tidymodels orbits. I previously maintained the broom package, and am responsible for the 0.5.0 release and a portion of the 0.7.0 release. For these contributions I was generously given authorship on the tidyverse paper. I intermittently participate in the Stan and rOpenSci communities.

I also wrote the distributions3 package, which provides an S3 interface to distribution functions, with an emphasis on good documentation and beginner-friendly design. The vignettes in particular are designed to walk students in intro stat courses through a litany of classic hypothesis tests. I do not actively maintain distributions3, but there is a small community of invested contributors.

Last updated 2022-11-06.

References

Chen, Fan, Yini Zhang, and Karl Rohe. 2020. “Targeted Sampling from Massive Block Model Graphs with Personalized PageRank.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 82 (1): 99–126. https://doi.org/10.1111/rssb.12349.
Cho, Juhee, Donggyu Kim, and Karl Rohe. 2018. “Asymptotic Theory for Estimating the Singular Vectors and Values of a Partially-Observed Low Rank Matrix with Noise.” Statistica Sinica. https://doi.org/10.5705/ss.202016.0205.
———. 2019. “Intelligent Initialization and Adaptive Thresholding for Iterative Matrix Completion: Some Statistical and Algorithmic Theory for Adaptive-Impute.” Journal of Computational and Graphical Statistics 28 (2): 323–33. https://doi.org/10.1080/10618600.2018.1518238.
Rohe, Karl, Jun Tao, Xintian Han, and Norbert Binkiewicz. 2018. “A Note on Quickly Sampling a Sparse Matrix with Low Rank Expectation.” Journal of Machine Learning Research 19: 1–13.
Rohe, Karl, and Muzhe Zeng. 2022+. “Vintage Factor Analysis with Varimax Performs Statistical Inference.” arXiv:2004.05387 [Math, Stat], 2022+. https://arxiv.org/abs/2004.05387.