code
My research code, as well as miscellaneous personal projects in various stages of completion, lives on my GitHub. I primarily use R, and most of my work as a developer is on methods packages. I am also a proficient Python user, and have passing exposure to SQL, Julia, and C++.
research software
aPPR approximates Personalized PageRanks in large graphs, including those that can only be queried via an API, such as the Twitter following graph. aPPR additionally performs degree correction and regularization, allowing users to recover blocks from stochastic blockmodels (see Chen, Zhang, and Rohe 2020). You can combine aPPR with the neocache backend to sample large portions of the Twitter following graph with high Personalized PageRanks around seed nodes (joint work with Nathan Kolbow). I strongly believe that researchers are not analyzing the Twitter following graph enough, and I am happy to help you use these packages to collect and analyze that data. Sometimes the code goes stale due to changes in the Twitter API; let me know when this happens and I'll push a bugfix as fast as I can.
vsp performs semi-parametric estimation of latent factors in random dot product graphs by computing varimax rotations of the spectral embeddings of graphs. The resulting factors are sparse and interpretable. The theory was developed by Rohe and Zeng (2022+); I ended up using varimax rotation a lot in my own data analysis and wrapped some of the infrastructure I developed into this package. I am committed to maintenance of this package and will respond quickly to feature requests or questions about how you might use it in your own research.
fastRG samples large, sparse random dot product graphs very efficiently and is especially useful when running simulation studies for spectral network estimators. I am committed to maintenance of this package and will respond quickly to feature requests or questions about how you might use it in your own research. The fastRG sampling algorithm is described in Rohe et al. (2018).
fastadi is a proof-of-concept implementation of AdaptiveImpute, a self-tuning matrix completion method with adaptive thresholding that is closely related to softImpute (Cho, Kim, and Rohe 2019, 2018). I extended AdaptiveImpute to the computationally challenging case where the entire upper triangle is observed as part of my work with Karl Rohe on citation networks. This is research code rather than code intended for broad consumption. I make no commitments to maintaining or improving this code unless something about it is blocking an ongoing research project.
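For readers who want a feel for what approximating Personalized PageRank on a graph that can only be queried node by node might involve, here is a minimal Python sketch of a push-style approximation in the spirit of Andersen, Chung, and Lang's local algorithm. It is an illustration under stated assumptions, not aPPR's implementation: the function name approximate_ppr, the neighbors callable, and the default parameter values are all hypothetical, and the degree correction and regularization mentioned above are not included.

```python
from collections import deque


def approximate_ppr(neighbors, seed, alpha=0.15, epsilon=1e-6):
    """Push-style approximation of the Personalized PageRank vector of `seed`.

    `neighbors` is a hypothetical callable mapping a node to an iterable of
    adjacent nodes in an undirected graph (for example, a wrapper around an
    API). Returns (p, r): approximate PageRank mass and leftover residual.
    """
    adjacency = {}  # cache so that each node is queried at most once

    def adj(u):
        if u not in adjacency:
            adjacency[u] = list(neighbors(u))
        return adjacency[u]

    p = {}           # approximate Personalized PageRank
    r = {seed: 1.0}  # residual mass that has not been pushed yet
    queue = deque([seed])
    queued = {seed}

    while queue:
        u = queue.popleft()
        queued.discard(u)
        ru = r.get(u, 0.0)
        nbrs = adj(u)  # the degree is needed to decide whether to push
        deg = len(nbrs)

        if deg == 0:
            # Isolated node: nowhere to spread mass, so keep all of it here.
            p[u] = p.get(u, 0.0) + ru
            r[u] = 0.0
            continue
        if ru < epsilon * deg:
            continue  # residual already small relative to degree

        # Push: keep a fraction alpha, hold half of the rest at u (lazy walk),
        # and spread the other half evenly over the neighbors.
        p[u] = p.get(u, 0.0) + alpha * ru
        r[u] = (1 - alpha) * ru / 2
        share = (1 - alpha) * ru / (2 * deg)
        for v in nbrs:
            r[v] = r.get(v, 0.0) + share
            if v not in queued:
                queue.append(v)
                queued.add(v)
        if u not in queued:
            queue.append(u)  # u may still be above its threshold
            queued.add(u)

    return p, r


# Toy usage: a 4-cycle stored in a dict stands in for a remote graph.
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
p, r = approximate_ppr(cycle.__getitem__, seed=0, epsilon=1e-5)
```

The appeal of this style of algorithm for API-only graphs is that it only asks for the neighbors of nodes that actually receive residual mass, so the work stays local to the seed. The total of p and r stays at 1 throughout, and on termination every residual is below epsilon times the node's degree.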
design of statistical software
I am particularly interested in the design of statistical software and have contributed to the ROpenSci statistical software reviewing guidelines, as well as early versions of the tidymodels implementation principles. I have some long-form explorations of modeling software design on my blog.
I review for the Journal of Open Source Software and the R Journal.
#rstats
I have been involved in a number of open source projects in the tidyverse and tidymodels orbits. I previously maintained the broom package, and am responsible for the 0.5.0 release and a portion of the 0.7.0 release. For these contributions I was generously given authorship on the tidyverse paper. I intermittently participate in the Stan and ROpenSci communities.
I also wrote the distributions3 package, which provides an S3 interface to distribution functions, with an emphasis on good documentation and beginner-friendly design. The vignettes in particular are designed to walk students in intro stat courses through a litany of classic hypothesis tests. I do not actively maintain distributions3, but there is a small community of invested contributors.
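For readers who do not use R, the idea of generic functions that dispatch on a distribution object can be loosely sketched in Python with functools.singledispatch and the standard library's statistics.NormalDist. This is only an analogy under stated assumptions: the Python functions cdf, quantile, and z_test_two_sided below are hypothetical names for illustration and are not the distributions3 API.

```python
from functools import singledispatch
from math import sqrt
from statistics import NormalDist, mean


@singledispatch
def cdf(dist, x):
    """Generic: P(X <= x) for a distribution object."""
    raise NotImplementedError(f"no cdf method for {type(dist).__name__}")


@cdf.register
def _(dist: NormalDist, x):
    return dist.cdf(x)


@singledispatch
def quantile(dist, prob):
    """Generic: the value q such that P(X <= q) = prob."""
    raise NotImplementedError(f"no quantile method for {type(dist).__name__}")


@quantile.register
def _(dist: NormalDist, prob):
    return dist.inv_cdf(prob)


def z_test_two_sided(sample, mu0, sigma):
    """One-sample z-test with known sigma; returns (z statistic, p-value)."""
    z = (mean(sample) - mu0) / (sigma / sqrt(len(sample)))
    standard_normal = NormalDist(0, 1)
    p_value = 2 * (1 - cdf(standard_normal, abs(z)))
    return z, p_value


# Toy usage: the sample mean equals the null value, so z is 0.
z, p_value = z_test_two_sided([4.0, 5.0, 6.0], mu0=5.0, sigma=1.0)
```

The design point is that the test is written against the generics rather than against one distribution's methods, so supporting another distribution only requires registering new methods.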
Last updated 2022-11-06.