aleatoric
https://www.alexpghayes.com/
Recent content on aleatoric
Hugo -- gohugo.io
en-US
Fri, 07 Jun 2019 00:00:00 +0000
testing statistical software
https://www.alexpghayes.com/blog/testing-statistical-software/
Fri, 07 Jun 2019 00:00:00 +0000
Motivation
Recently I’ve been implementing and attempting to extend some computationally intense methods. These methods are from papers published in the last several years, and haven’t made their way into mainstream software libraries yet. So I’ve been spending a lot of time reading research code, and I’d like to share what I’ve learned.
In this post, I describe how I evaluate the trustworthiness of a modeling package, and in particular what I want from the test suite.
type stable estimation
https://www.alexpghayes.com/blog/type-stable-estimation/
Tue, 21 May 2019 00:00:00 +0000
Abstract
This post discusses how the mathematical objects we use in formal data modeling are represented in statistical software. First I introduce these objects, then I argue that each object should be represented by a distinct type. Next I present three principles to ensure the type system is statistically meaningful. These principles suggest that existing modeling software has an overly crude type system. I believe a finer type system in statistical packages would result in more intuitive interfaces while increasing extensibility and reducing possibilities for methodological errors.
implementing the super learner with tidymodels
https://www.alexpghayes.com/blog/implementing-the-super-learner-with-tidymodels/
Sat, 13 Apr 2019 00:00:00 +0000
Summary
In this post I demonstrate how to implement the Super Learner using tidymodels infrastructure. The Super Learner is an ensembling strategy that relies on cross-validation to determine how to combine predictions from many models. tidymodels provides low-level predictive modeling infrastructure that makes the implementation rather slick. The goal of this post is to show how you can use this infrastructure to build new methods with consistent, tidy behavior. You’ll get the most out of this post if you’ve used rsample, recipes and parsnip before and are comfortable working with list-columns.
overlapping confidence intervals: correcting bad intuition
https://www.alexpghayes.com/blog/overlapping-confidence-intervals-correcting-bad-intuition/
Thu, 31 Jan 2019 00:00:00 +0000
Summary
In this post I work through a recent homework exercise that illustrates why you shouldn’t compare means by checking for confidence interval overlap. I calculate the type I error rate of this procedure for a simple case. This reveals where our intuition goes wrong: namely, we can recover the confidence interval heuristic by confusing standard deviations and variances.
Checking confidence intervals for overlap
Sometimes you may want to check if two (or more) means are statistically distinguishable.
some things i've learned about stan
https://www.alexpghayes.com/blog/some-things-ive-learned-about-stan/
Mon, 24 Dec 2018 00:00:00 +0000
Motivation
Yesterday, for the first time ever, I coded up a model in Stan and it actually did what I wanted. My current knowledge of Stan is, at best, nascent, but I’ll show you the process I went through to write my first Stan program, pointing out what I wish I’d known along the way.
My goal is to provide a quick and dirty introduction to Stan, hopefully enough to get you started without having to dig into the manual yourself.
consent in the presence of correlation
https://www.alexpghayes.com/blog/consent-in-the-presence-of-correlation/
Tue, 11 Dec 2018 00:00:00 +0000
Motivation
This post explores some ideas for a normative ethics of personal data.
To begin, I view ethical actions as those that empower individuals to have agency over their own lives. This line of thinking leads to broadly agreed-upon standards of behavior. To ensure that people maintain control over what is theirs, we must obtain consent before engaging in emotional or physical relationships with them. For this consent to be meaningful, it must be active, ongoing and informed.
understanding multinomial regression with partial dependence plots
https://www.alexpghayes.com/blog/understanding-multinomial-regression-with-partial-dependence-plots/
Tue, 23 Oct 2018 00:00:00 +0000
Motivation
This post assumes you are familiar with logistic regression and that you just fit your first or second multinomial logistic regression model. While there is an interpretation for the coefficients in a multinomial regression, that interpretation is relative to a base class, which may not be the most useful. Partial dependence plots are an alternative way to understand multinomial regression, and in fact can be used to understand any predictive model.
ockham's razor isn't about model selection
https://www.alexpghayes.com/blog/ockhams-razor-isnt-about-model-selection/
Mon, 03 Sep 2018 00:00:00 +0000
Summary
Ockham’s Razor is about what to believe when we have no evidence, not how to pick between theories supported by equal amounts of evidence.
In slightly longer form
I’m in the middle of The Science of Conjecture and I just realized that I’ve been misinterpreting Ockham’s Razor for the last several years. Ockham’s Razor says:
Entities are not to be multiplied without necessity.
For a long time, I’d taken this to mean:
swans, uncertainty and randomness
https://www.alexpghayes.com/blog/swans-uncertainty-and-randomness/
Tue, 14 Aug 2018 00:00:00 +0000
Motivation
Why is probability an appropriate way to represent uncertainty?
Statisticians typically emphasize the need to estimate uncertainty in inference and prediction. Despite making heavy use of randomness in statistics, we rarely explain why randomness is an appropriate tool to use to model the world. If we would like others to use statistics, I believe we should provide an explanation of the importance of probability. This post contains one explanation I find personally satisfying.
a summer with rstudio
https://www.alexpghayes.com/blog/a-summer-with-rstudio/
Fri, 10 Aug 2018 00:00:00 +0000
Today is the last day of my summer internship with RStudio. This is the first year that RStudio has had an official internship program, and I couldn’t be happier to have been a part of it.
My mandate for the summer has been to make broom better. My project was advised by both Dave Robinson (DataCamp) and Max Kuhn (RStudio). Dave originally wrote the broom package and acted as my primary mentor.
speeding up GPX ingest: profiling, Rcpp and furrr
https://www.alexpghayes.com/blog/speeding-up-gpx-ingest-profiling-rcpp-and-furrr/
Fri, 15 Jun 2018 00:00:00 +0000
This post is a casual case study in speeding up R code. I work through several iterations of a function to read and process GPS running data from Strava stored in the GPX format. Along the way I describe how to visualize code bottlenecks with profvis and briefly touch on fast compiled code with Rcpp and parallelization with furrr.
The problem: tidying trajectories in GPX files
I record my runs on my phone using Strava.
reflections on SAMSI's 2018 undergraduate modelling workshop
https://www.alexpghayes.com/blog/reflections-on-samsis-2018-undergraduate-modelling-workshop/
Fri, 01 Jun 2018 00:00:00 +0000
I spent the last week at the Statistical and Mathematical Sciences Institute’s (SAMSI) undergraduate modelling workshop. This year the workshop was hosted at North Carolina State University in Raleigh.
Rundown of the workshop
About thirty students attended the workshop. To get in, there’s a mellow application process. SAMSI covered travel, rooming and food for the participants. We were expected to bring laptops with R and RStudio installed. The purpose of the workshop was to give undergrads experience modelling real world data.
comparing runs with riegel's formula and GAMs
https://www.alexpghayes.com/blog/comparing-runs-with-riegels-formula-and-gams/
Wed, 16 May 2018 00:00:00 +0000
Runners often vary the distance and intensity of their workouts. In this post I demonstrate how to compare runs of different lengths using Riegel’s formula. The formula accurately describes the tradeoff between run distance and average speed for aerobic runs up to about a half-marathon in length. Using my Strava data, I demonstrate how to use Riegel’s formula to measure the difficulty of runs on a standardized scale and briefly investigate how my fitness has changed over time with GAMs.
predictive performance via bootstrap variants
https://www.alexpghayes.com/blog/predictive-performance-via-bootstrap-variants/
Thu, 03 May 2018 00:00:00 +0000
When we build a predictive model, we are interested in how the model will perform on data it hasn’t seen before. If we have lots of data, we can split it into training and test sets to assess model performance. If we don’t have lots of data, it’s better to fit a model using all of the available data and to assess its predictive performance using resampling techniques. The bootstrap is one such resampling technique.
dear students: take course evals seriously
https://www.alexpghayes.com/blog/dear-students-take-course-evals-seriously/
Fri, 08 Dec 2017 00:00:00 +0000
As the semester ends, I would like to remind students of the value of a well-written course evaluation. Course evaluations allow students to share wisdom with the next generation and to provide feedback to instructors and the university. Despite this, few students fill out narrative reviews. I propose we up our game.
In my ideal world, course evaluations are written for students by students. They contain any advice you would go back and give to yourself before the class.
numerical gradient checks
https://www.alexpghayes.com/blog/numerical-gradient-checks/
Wed, 18 Oct 2017 00:00:00 +0000
Motivation
Suppose you have some loss function \(\mathcal{L}(\beta) : \mathbb{R}^n \to \mathbb{R}\) you want to minimize with respect to some model parameters \(\beta\). You understand how gradient descent works and you have a correct implementation of \(\mathcal{L}\) but aren’t sure if you took the gradient correctly or implemented it correctly in code.
Solution
We can compare our implementation of the gradient of \(\mathcal{L}\) to a finite difference approximation of the gradient.
gentle tidy eval with examples
https://www.alexpghayes.com/blog/gentle-tidy-eval-with-examples/
Mon, 07 Aug 2017 00:00:00 +0000
I’ve been using the tidy eval framework introduced with dplyr 0.7 for about two months now, and it’s time for an update to my original post on tidy eval. My goal is not to explain tidy eval to you, but rather to show you some simple examples that you can easily generalize from.
library(tidyverse)
starwars
## # A tibble: 87 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
## 1 Luke~    172    77 blond      fair       blue              19 male
## 2 C-3PO    167    75 <NA>       gold       yellow           112 <NA>
## 3 R2-D2     96    32 <NA>       white, bl~ red               33 <NA>
## 4 Dart~    202   136 none       white      yellow           41.
about
https://www.alexpghayes.com/about/
Mon, 01 Jan 0001 00:00:00 +0000
I just finished my first year as a PhD student at the University of Wisconsin-Madison statistics program. Currently I’m working on some network analysis projects with Karl Rohe.
Before grad school, I got a degree in statistics at Rice University. Last summer I interned at RStudio. Previously I’ve done biostats research at Fred Hutch. Before that I led canoe trips for YMCA Camp Menogyn.
I’m interested in building statistical tools, and how statistics can help people make better decisions.
news
https://www.alexpghayes.com/news/
Mon, 01 Jan 0001 00:00:00 +0000
May 2019: I received an Outstanding Teaching Assistant 2018-2019 award from the UW-Madison stats department for my work as a TA!
March 2019: I co-organized the Chicago R Unconference with Angela Li and Emily Riederer! That’s a lie: Angela organized everything, but it was a blast and I spent the weekend helping people make their first open source contributions to broom.
January 2019: rstudio::conf(2019) was an absolute blast! It was a pleasure to spend two days teaching the tidymodels approach to machine learning in R with Max Kuhn and Davis Vaughn (workshop materials).