library(tidyverse)

n <- 20
X <- matrix(rnorm(n * 2), ncol = 2)
y <- rnorm(n)
ggplot(NULL, aes(x = X[, 1], y = y)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "One dimension of the simulated data", x = expression(X[1])) +
  theme_classic()
Motivation
Suppose you have some loss function $L(\beta)$ along with an analytical implementation of its gradient $\nabla L(\beta)$. Hand-derived gradients are easy to get wrong, so before relying on the implementation we would like a way to check that it is correct.
Solution
We can compare our implementation of the gradient of $L$ to a numerical approximation based on finite differences. If we take
$$\nabla L(\beta)^\top d \approx \frac{L(\beta + \varepsilon d) - L(\beta - \varepsilon d)}{2 \varepsilon}$$
for a small $\varepsilon > 0$ and a direction $d$, we get an approximation to the directional derivative of $L$ at $\beta$ along $d$. Taking $d$ to be each of the standard unit vectors in turn recovers the full gradient, which we can then compare to the analytical implementation.
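As a tiny illustration of the formula (this example is my addition, not from the original post), in one dimension the central difference for $f(x) = x^2$ at $x = 3$ should land very close to the true derivative $f'(3) = 6$:

# One-dimensional illustration of the central-difference idea (my addition)
f <- function(x) x^2
(f(3 + 1e-8) - f(3 - 1e-8)) / (2 * 1e-8)  # approximately 6, the true derivative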
Example: Checking the gradient of linear regression
Suppose that we have a design matrix $X \in \mathbb{R}^{n \times p}$ and a response vector $y \in \mathbb{R}^n$. We want to find an estimate $\hat\beta$ that minimizes the squared error loss
$$L(\beta) = \frac{1}{2n} \sum_{i=1}^n \left( y_i - x_i^\top \beta \right)^2 = \frac{1}{2n} \, (y - X \beta)^\top (y - X \beta).$$
In the final step above we recognize that the sum of squared residuals can be written as a dot product. Next we'd like to take the gradient of this dot product. There's a beautiful explanation of how to take the gradient of a quadratic form here. The gradient (in matrix notation) is
$$\nabla L(\beta) = -\frac{1}{n} X^\top (y - X \beta).$$
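If the linked explanation is unavailable, here is a compressed version of the calculation (my summary, using standard matrix calculus identities): expanding the dot product gives
$$L(\beta) = \frac{1}{2n} \left( y^\top y - 2\, \beta^\top X^\top y + \beta^\top X^\top X \beta \right),$$
and differentiating term by term with respect to $\beta$ yields $\frac{1}{n} \left( X^\top X \beta - X^\top y \right)$, which is the expression above.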
We can now implement an analytical version of $\nabla L(\beta)$.
Next we implement our loss and gradient functions. We assume the loss
function is implemented correctly but want to check the analytical_grad
implementation.
loss <- function(beta) {
  resid <- y - X %*% beta
  sum(resid^2) / (2 * n)
}

analytical_grad <- function(beta) {
  grad <- -t(y - X %*% beta) %*% X / n
  as.vector(grad)
}
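As an extra sanity check (this check is my addition, not part of the original post), the gradient should be essentially zero at the least squares solution, which we can compute in closed form:

# Extra sanity check (my addition): at the least squares solution the
# gradient of the loss should be numerically zero.
beta_hat <- solve(crossprod(X), crossprod(X, y))
analytical_grad(beta_hat)  # both components should be ~0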
To perform this check, we first need to approximate the gradient in a single direction (a directional derivative):
#' @param f function that takes a single vector argument x
#' @param x point at which to evaluate derivative of f (vector)
#' @param d direction in which to take derivative of f (vector)
#' @param eps epsilon to use in the gradient approximation
numerical_directional_grad <- function(f, x, d, eps = 1e-8) {
  (f(x + eps * d) - f(x - eps * d)) / (2 * eps)
}
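As a quick example of how this helper behaves (the example is mine, not from the original post), the numerical directional derivative of the loss should agree with the dot product of the analytical gradient and the chosen direction:

# Example use (my addition): the directional derivative should match the
# dot product of the analytical gradient with the direction.
b0 <- c(1, -1)    # an arbitrary point in parameter space
d <- c(0.6, 0.8)  # an arbitrary direction
numerical_directional_grad(loss, b0, d)
sum(analytical_grad(b0) * d)  # should be nearly identical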
And then to approximate the entire gradient, we need to combine directional derivatives in each of the unit directions:
zeros_like <- function(x) {
  rep(0, length(x))
}

numerical_grad <- function(f, x, eps = 1e-8) {
  grad <- zeros_like(x)
  for (dim in seq_along(x)) {
    unit <- zeros_like(x)
    unit[dim] <- 1
    grad[dim] <- numerical_directional_grad(f, x, unit, eps)
  }
  grad
}
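Before pointing this at the regression loss, it is worth trying it on a function whose gradient we know exactly (this check is my addition): for $f(x) = \sum_i x_i^2$ the gradient is $2x$.

# Quick check on a function with a known gradient (my addition)
numerical_grad(function(x) sum(x^2), c(1, 2, 3))  # should be close to c(2, 4, 6)

To quantify how well the numerical and analytical gradients agree, we'll also use a small relative error helper: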
relative_error <- function(want, got) {
  (want - got) / want  # assumes want is not zero
}
Now we can check the relative error between our analytical implementation of the gradient and the numerical approximation.
b <- c(2, 3)  # point in parameter space to check the gradient at

num_grad <- numerical_grad(loss, b)
ana_grad <- analytical_grad(b)
num_grad
[1] 2.112107 3.286946
ana_grad
[1] 2.112107 3.286946
relative_error(num_grad, ana_grad)
[1] -2.810374e-08 1.553777e-08
The relative error is small, and we can feel confident that our implementation of the gradient is correct.
This post is based on Tim Vieira's fantastic post on how to use numerical gradient checks in practice, but with R code. See also the numDeriv package.
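For comparison, numDeriv computes the same kind of numerical gradient with a more careful finite-difference scheme; a minimal sketch, assuming the package is installed:

# Cross-check with numDeriv (my addition); grad() uses Richardson
# extrapolation by default and should closely match ana_grad.
numDeriv::grad(loss, b)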