Resume

Statistician | research software engineer | data scientist

This resume is available as a pdf.

Statistician with 6 years of experience analyzing network data, performing causal inference and building tools for data scientists. Track record of strategic thinking, clear communication and successful remote work.

Work Experience

University of Wisconsin, Madison
PhD Student, Department of Statistics

August 2018 - Present

  • Developed statistical methods based on principal components analysis to cluster networks with missing data, to perform regression on networks, and to construct, interpret, and regularize network embeddings.
  • Developed causal inference methods to estimate mediation and spillover effects in social networks, and to determine when product changes have harmful side-effects on behaviors that are difficult to measure. Used causal machine learning to improve precision of estimates while reducing computational requirements by a factor of 5000.
  • Implemented research methods in user-friendly software. Released nine open source R packages to CRAN (notable: fastRG, vsp, distributions3, gdim, aPPR, fastadi).
  • Resolved computational bottlenecks in matrix completion algorithms by designing and implementing sparse matrix methods in R and C++. Scaled methods by three orders of magnitude to handle networks with millions of nodes.
  • Designed an approach to find localized clusters of Twitter users via Personalized PageRank. Managed unreliable Twitter API behavior by caching data in a Neo4J database running in Docker.
  • Collaborated with ROpenSci to design software development standards for statistical software. Reviewed scientific software for ROpenSci, the R Journal, and the Journal of Open Source Software.
Facebook
Research Intern, Core Data Science

Summer 2020 & Summer 2021

  • Prototyped a pipeline to automatically suggest relationships between hashtags, for a team using manual labeling. Prototype embedded a hashtag co-occurrence network and was implemented with Python, PyTorch and SQL.
  • Conducted experiments on hyperbolic embeddings for knowledge graphs and determined non-viability of hyperbolic methods. Advised against additional R&D investment, potentially saving $200k+ in compute costs.
  • Designed a metric, based on calibration of machine learning models, to help product teams understand reliability of prevalence estimates. Metric reported daily on multiple dashboards. Implemented with sklearn, Numpy, pandas.
RStudio
Intern, tidymodels team

Summer 2018

  • Re-factored thousands of lines of R code and developed a new test suite for the broom package (600k+ downloads/month, part of the tidyverse), improving behavioral consistency and reducing maintenance burden.
  • Shipped a major new release of the package (broom 0.5.0). Resolved 80+ open issues and coordinated 40+ pull requests from open source contributors.

Education

University of Wisconsin-Madison
Ph.D. Statistics

Expected spring 2024

Rice University
B.A. Statistics

2018

Skills

  • Network analysis, embeddings, clustering, causal machine learning, interference, mediation
  • Data analysis, visualization, modeling, regression, generalized linear models, hypothesis testing
  • Proficient in R, Python, tidyverse, bash/unix, git; familiar with SQL, C++, Docker, AWS, Julia, Stan