Recent content on aleatoric
Mon, 03 Sep 2018 00:00:00 +0000

ockham's razor isn't about model selection

Summary Ockham's Razor is about what to believe when we have no evidence, not how to pick between theories supported by equal amounts of evidence.
In slightly longer form I’m in the middle of The Science of Conjecture and I just realized that I’ve been misinterpreting Ockham’s Razor for the last several years. Ockham’s Razor says:
Entities are not to be multiplied without necessity.

For a long time, I'd taken this to mean:
Tue, 14 Aug 2018 00:00:00 +0000
http://www.alexpghayes.com/blog/swans-uncertainty-and-randomness/

Motivation Why is probability an appropriate way to represent uncertainty?
Statisticians typically emphasize the need to estimate uncertainty in inference and prediction. Despite making heavy use of randomness in statistics, we rarely explain why randomness is an appropriate tool to use to model the world. If we would like others to use statistics, I believe we should provide an explanation of the importance of probability. This post contains one explanation I find personally satisfying.
Fri, 10 Aug 2018 00:00:00 +0000
http://www.alexpghayes.com/blog/a-summer-with-rstudio/

a summer with rstudio

Today is the last day of my summer internship with RStudio. This is the first year that RStudio has had an official internship program, and I couldn’t be happier to have been a part of it.
My mandate for the summer has been to make broom better. My project was advised by both Dave Robinson (DataCamp) and Max Kuhn (RStudio). Dave originally wrote the broom package and acted as my primary mentor.
Fri, 15 Jun 2018 00:00:00 +0000

speeding up GPX ingest: profiling, Rcpp and furrr

This post is a casual case study in speeding up R code. I work through several iterations of a function to read and process GPS running data from Strava stored in the GPX format. Along the way I describe how to visualize code bottlenecks with profvis and briefly touch on fast compiled code with Rcpp and parallelization with furrr.

The problem: tidying trajectories in GPX files I record my runs on my phone using Strava.
Fri, 01 Jun 2018 00:00:00 +0000
http://www.alexpghayes.com/blog/reflections-on-samsis-2018-undergraduate-modelling-workshop/

reflections on SAMSI's 2018 undergraduate modelling workshop

I spent the last week at the Statistical and Applied Mathematical Sciences Institute’s (SAMSI) undergraduate modelling workshop. This year the workshop was hosted at North Carolina State University in Raleigh.
Rundown of the workshop About thirty students attended the workshop. To get in, there’s a mellow application process. SAMSI covered travel, rooming and food for the participants. We were expected to bring laptops with R and RStudio installed. The purpose of the workshop was to give undergrads experience modelling real-world data.
Wed, 16 May 2018 00:00:00 +0000

comparing runs with riegel's formula and GAMs

Runners often vary the distance and intensity of their workouts. In this post I demonstrate how to compare runs of different lengths using Riegel's formula. The formula accurately describes the tradeoff between run distance and average speed for aerobic runs up to about a half-marathon in length. Using my Strava data, I demonstrate how to use Riegel's formula to measure the difficulty of runs on a standardized scale and briefly investigate how my fitness has changed over time with GAMs.
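
As a rough illustration of the idea in this excerpt, Riegel's formula predicts the time `t2` over a distance `d2` from a known time `t1` over a distance `d1` as `t2 = t1 * (d2 / d1)^1.06`. The sketch below is a minimal base R version; the function name `riegel_time` and the example run are hypothetical and are not taken from the post.

```r
# Riegel's formula: predicted time t2 over distance d2,
# given an observed time t1 over distance d1.
# The default exponent 1.06 is Riegel's commonly cited value.
riegel_time <- function(t1, d1, d2, exponent = 1.06) {
  t1 * (d2 / d1)^exponent
}

# Hypothetical run: 5 km in 25 minutes, converted to a 10 km equivalent.
equivalent_10k <- riegel_time(t1 = 25, d1 = 5, d2 = 10)
equivalent_10k
```

Converting every run to the same reference distance in this way puts runs of different lengths on one scale, which is one plausible reading of the "standardized scale" described above.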
Thu, 03 May 2018 00:00:00 +0000

predictive performance via bootstrap variants

When we build a predictive model, we are interested in how the model will perform on data it hasn't seen before. If we have lots of data, we can split it into training and test sets to assess model performance. If we don't have lots of data, it's better to fit a model using all of the available data and to assess its predictive performance using resampling techniques. The bootstrap is one such resampling technique.
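
One simple way to picture resampling-based assessment is the out-of-bag bootstrap: fit the model on a bootstrap resample and score it on the observations that resample left out. The sketch below assumes simulated data and a linear model purely for illustration; it is not necessarily one of the variants the post compares.

```r
set.seed(27)

# Simulated data, assumed for illustration only.
n <- 40
dat <- data.frame(x = rnorm(n))
dat$y <- 2 * dat$x + rnorm(n)

# For each bootstrap replicate: resample rows with replacement, fit on the
# resample, and compute mean squared error on the rows not drawn.
oob_mse <- replicate(200, {
  idx <- sample(n, replace = TRUE)
  fit <- lm(y ~ x, data = dat[idx, ])
  held_out <- dat[setdiff(seq_len(n), idx), ]
  mean((held_out$y - predict(fit, newdata = held_out))^2)
})

# Average out-of-bag error across replicates.
mean_oob_mse <- mean(oob_mse)
mean_oob_mse
```

Each resample leaves out roughly a third of the rows on average, so every replicate has some unseen data to score against even though no separate test set was held back.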
Fri, 08 Dec 2017 00:00:00 +0000

dear students: take course evals seriously

As the semester ends, I would like to remind students of the value of a well-written course evaluation. Course evaluations allow students to share wisdom with the next generation and to provide feedback to instructors and the university. Despite this, few students fill out narrative reviews. I propose we up our game.

In my ideal world, course evaluations are written for students by students. They contain any advice you would go back and give to yourself before the class.
Wed, 18 Oct 2017 00:00:00 +0000

numerical gradient checks

Motivation Suppose you have some loss function \(\mathcal{L}(\beta) : \mathbb{R}^n \to \mathbb{R}\) you want to minimize with respect to some model parameters \(\beta\). You understand how gradient descent works and you have a correct implementation of \(\mathcal{L}\) but aren't sure if you took the gradient correctly or implemented it correctly in code.

Solution We can compare our implementation of the gradient of \(\mathcal{L}\) to a finite difference approximation of the gradient.
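
A minimal sketch of such a check in base R, using a central difference in each coordinate. The helper `numerical_gradient`, the least squares loss, and the simulated data are assumptions made for illustration and are not the post's own code.

```r
# Central finite difference approximation to the gradient of `loss` at `beta`:
# perturb one coordinate at a time by +/- eps and take the slope.
numerical_gradient <- function(loss, beta, eps = 1e-6) {
  vapply(seq_along(beta), function(i) {
    step <- replace(numeric(length(beta)), i, eps)
    (loss(beta + step) - loss(beta - step)) / (2 * eps)
  }, numeric(1))
}

# Hypothetical example: least squares loss with a hand-derived gradient.
set.seed(27)
X <- matrix(rnorm(50 * 3), nrow = 50)
y <- rnorm(50)

loss <- function(beta) sum((y - X %*% beta)^2) / 2
gradient <- function(beta) as.vector(-t(X) %*% (y - X %*% beta))

# Compare the analytic and numerical gradients at a random point.
beta <- rnorm(3)
max_abs_diff <- max(abs(gradient(beta) - numerical_gradient(loss, beta)))
max_abs_diff
```

If the analytic gradient is right, the largest absolute difference should be tiny relative to the size of the gradient; a large gap points to a mistake in either the derivation or the code.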
Mon, 07 Aug 2017 00:00:00 +0000
http://www.alexpghayes.com/blog/gentle-tidy-eval-with-examples/

gentle tidy eval with examples

I’ve been using the tidy eval framework introduced with dplyr 0.7 for about two months now, and it’s time for an update to my original post on tidy eval. My goal is not to explain tidy eval to you, but rather to show you some simple examples that you can easily generalize from.
library(tidyverse)
starwars
## # A tibble: 87 x 13
##   name  height  mass hair_color skin_color eye_color birth_year gender
##   <chr>  <int> <dbl> <chr>      <chr>      <chr>          <dbl> <chr>
## 1 Luke…    172    77 blond      fair       blue              19 male
## 2 C-3PO    167    75 <NA>       gold       yellow           112 <NA>
## 3 R2-D2     96    32 <NA>       white, bl… red               33 <NA>
## 4 Dart…    202   136 none       white      yellow            41.
Mon, 01 Jan 0001 00:00:00 +0000

about

I'm a first-year PhD student in the University of Wisconsin-Madison statistics program. I just graduated from Rice University with a degree in statistics. At Rice, I spent most of my time getting Rice DataSci, the fledgling data science club, off the ground.
I spent my summer interning at RStudio. Previously I’ve done biostats research at Fred Hutch. Before that I led canoe trips for YMCA Camp Menogyn.
I’m interested in how statistics can help people make better decisions.
Mon, 01 Jan 0001 00:00:00 +0000

news

August 2018: I'm pleased to announce that I'll be co-teaching a workshop on machine learning with Max Kuhn at rstudio::conf 2019.
August 2018: Finished my fantastic summer with rstudio and moved to Madison, Wisconsin.
May 2018: Spent a week learning about climate modelling at SAMSI’s undergraduate workshop. I highly recommend undergrads in statistics check it out!
May 2018: Graduated from Rice with a B.A. in Statistics and Distinction in Research and Creative Work.