Introduction
When we exercise, our hearts speed up like they’re late for a date with a treadmill. This acceleration is reflected in the shrinking time between successive heartbeats, what the cool kids call the R‑R interval (RRi). If you’ve ever squinted at an electrocardiogram (ECG or EKG) trace, those dramatic vertical spikes are the R‑waves of each cardiac cycle. The time between two R‑waves gives us the RRi, usually measured in milliseconds. It’s deceptively simple, like a one‑ingredient recipe, yet it packs a punch as a real‑time indicator of both your heart rate and the tug‑of‑war between your sympathetic and parasympathetic nervous systems.
So, let’s connect the dots: during exercise, your sympathetic “fight‑or‑flight” side takes the wheel and the RRi shrinks (i.e., heart rate accelerates); during rest or recovery, the parasympathetic “chill‑ax” team rejoins, and the RRi lengthens (i.e., heart rate slows). This seesaw between autonomic branches crafts dynamic RRi patterns that reveal your physiological status and your cardio‑adaptability, like a Netflix algorithm that knows whether you’re into rom‑coms or true crime.
The RRi Dance: From Rest to Recovery
Picture it: at rest, your RRi is high, like a sloth binge‑watching its favorite show. Then you hit the gym, your RRi plummets faster than Bitcoin during a market crash. Once you stop, there’s a gradual recovery, reminiscent of that cliffhanger in a Marvel movie, slow build-up, dramatic pause, and finally resolution.
But calling it just “fast-to-slow” is like describing Game of Thrones as “a show with dragons”. The RRi curve is packed with hidden Easter eggs: baroreflex sensitivity, vagal rebound, sympathetic withdrawal, and more. To mine these gems, we need a model that actually understands the plot twist.
Enter the Castillo‑Aguilar et al. (2025) nonlinear RRi‑vs‑Time model. This beauty captures the composite shape of RRi dynamics across exercise and recovery, embedding asymmetric decay and rebound kinetics, plateau phases, and the “everyone’s heart is different” variability in curve smoothness.
Why Modeling RRi Dynamics Matters
Cardiovascular regulation is about as linear as a Christopher Nolan plot, it loops back on itself, throws in surprises, and keeps you guessing. A solid model of RRi behavior lets researchers and clinicians extract juicy parameters from data: the time constant of vagal reactivation (your parasympathetic chill‑out speed), the latency of sympathetic offset (how quickly you hit the brakes), or the asymptotic RRi level post‑exercise (your heart’s version of “and they lived happily ever after”). These metrics can be biomarkers of autonomic flexibility, cardiorespiratory fitness, or even frailty, because sometimes your heart is a delicate snowflake.
But modeling is only half the battle. Fitting a nonlinear model to real‑world RRi data is like reverse‑engineering Thanos’s gauntlet: given the snap‑tastic curve, which combination of infinity stones (parameters) did it take? The RRi response is notoriously idiosyncratic, highly subject‑specific, noisy, and often non‑stationary. You can fit the model to two different folks and get parameter sets as different as Luke and Darth Vader. Slight data noise and, boom, your fit is more unstable than a Jenga tower in an earthquake.
That’s when uncertainty quantification becomes your trusty sidekick.
A Matter of Guessing (But Doing It Right)
Think of modeling RRi dynamics as dial‑tuning a vintage synthesizer at a rave: decay rates, asymptotes, inflection points, all those knobs control the final beat drop (or curve). Sure, you might stumble on a combo that sounds (or fits) right, but are you sure it’s unique? Could a tweak in the opposite direction produce the same vibe? Which knobs actually matter, and which are just decorations?
Enter Bayesian parameter estimation, which doesn’t just pick the “best” knob settings; it surveys the entire festival grounds of plausible settings given the data. In Bayesian land, each combination is a point in a high‑dimensional rave, our goal is to map out the posterior distribution across that landscape.
Some regions are crowded (high probability), others are deserted wastelands (low probability). Characterizing these areas yields not only central estimates but also credibility intervals, parameter correlations, and maybe even multimodal party clusters. But exploring this terrain is like traversing Mordor, classical Markov Chain Monte Carlo (MCMC) methods such as Metropolis‑Hastings can take eons to converge, especially when dimensions spike or parameters gang up in strong correlations.
Cue the dramatic entry of Hamiltonian Monte Carlo (HMC).
Hamiltonian Monte Carlo: Beyond Random Walks
HMC is like giving your sampler a jetpack and a map, you don’t just wander aimlessly; you use gradient info (the slope of the log‑posterior, AKA “how steep is this hill”) to guide each leap. Instead of stumbling around blind (random‑walk MCMC), HMC surfs the posterior landscape, riding down slopes and coasting over plateaus.
Picture this: you’re Neo in The Matrix, seeing the curves of the probability surface, riding a particle through space with grace and style. The position of the particle is your set of model parameters ($q$), and we bolt on an auxiliary momentum vector ($p$) that tells it how fast, and in which direction, it is currently moving.
Simulating this combo follows Hamiltonian mechanics, where the total energy (potential + kinetic) is conserved, think of it as your personal “always‑on” energy bar.
From Posterior to Physics: Hamilton’s Equations
The Hamiltonian encapsulates the system’s energy:

$$H(q, p) = U(q) + K(p)$$

Here, $U(q)$ is the potential energy, in our case the negative log posterior (so higher probability = lower potential), and $K(p)$ is the kinetic energy, usually a quadratic form in momentum, like $K(p) = \tfrac{1}{2}\, p^\top M^{-1} p$ (with $M$ a mass matrix, often just the identity).
The magic happens when you integrate Hamilton’s equations:

$$\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q}$$
These reversible, volume‑preserving dynamics let the particle glide through parameter space, conserving total energy, no crazy backtracking or random wanderings. HMC thus can make long, informed jumps without tanking acceptance rates, unlike our hapless Metropolis‑friends.
But here’s the kicker: you need the gradient of the potential energy, i.e., the gradient of the negative log‑posterior (log‑likelihood plus log‑prior). Since we’re assuming flat priors (because today’s focus is the likelihood), the prior contribution drops out, flat function means zero slope:

$$\nabla_q \log p(q) = 0$$

So for our tutorial, we set

$$U(q) = -\log \mathcal{L}(q), \qquad \nabla U(q) = -\nabla \log \mathcal{L}(q)$$
The Gradient Grind: What Powers HMC
Probabilistic programming languages (Stan, PyMC, Turing.jl) automate gradient computation via automatic differentiation. It’s like having Iron Man’s suit calculate your every move. But manually deriving gradients is like opening up the arc reactor: you see which wires do what, you learn where the sparks fly, and you gain true respect for the machinery.
Our RRi‑vs‑Time model is a cocktail of sigmoids and exponentials. Each parameter ($\alpha$, $\beta$, $c$, $\lambda$, $\phi$, $\tau$, $\delta$) leaves its own fingerprint on the gradient, so every partial derivative has to be worked out by hand.
Yes, it’s math‑heavy and notation‑dense, like deciphering ancient Sith scripts. But:
It’s worth it.
A Journey Ahead
In a previous post, we showed how to implement the RRi-vs-Time model in R using HMC. We defined priors, specified the likelihood, and let the sampler do its job. But we didn’t dwell on how the gradient was computed, nor on the intricacies of the target function with respect to each parameter.
That’s what this post is about.
We will derive the full gradient of the RRi-vs-Time likelihood function with respect to its parameters, manually. We will examine each component, apply calculus step-by-step, and connect the math back to the model’s physiological interpretation. It will be dense. But it will also be enlightening. This isn’t just a math exercise, it’s a way to sharpen your modeling intuition, improve your Bayesian inference skills, and gain deeper insight into cardiovascular dynamics.
Because once you’ve hand‑crafted those derivatives, you’ll know why HMC outruns vanilla MCMC, where your model is too fragile, and which parameters truly pull the strings. You’ll transform from a passive sampler‑passenger into the Hamiltonian driver of your own probabilistic saga.
So dust off your calculus cape and let’s dive in, one partial derivative, one leapfrog step at a time. May the gradients be ever in your favor.
Okay, let’s ditch the stuffy academic speak and inject some much-needed personality into this whole data-generating circus. We’re not just dealing with numbers here; we’re uncovering the hidden secrets of the universe… or at least, how this particular set of numbers decided to show up.
The Grand, Totally Spontaneous (Not Really) Data Generation Process
So, imagine our observed RRi signal ($y_i$, one value for each time point $t_i$). Each observation is a slightly unreliable witness: noisy, a bit scattered, but centered on the underlying truth, with Gaussian noise of standard deviation $\sigma$ doing the scattering.

But here’s the kicker, the plot twist worthy of an M. Night Shyamalan film: the true signal, the one our witness data is supposed to be centered around ($M(t)$), isn’t some tidy straight line. It’s the nonlinear RRi‑vs‑Time model itself.

So, let

$$y_i \sim \mathcal{N}\big(M(t_i), \sigma^2\big), \qquad M(t) = \alpha + \frac{\beta}{1 + e^{\lambda (t - \tau)}} - \frac{c\,\beta}{1 + e^{\phi (t - \tau - \delta)}}$$

Yeah, looks complicated, right? Like a boss battle in a video game. But somebody’s gotta figure out what those Greek letters ($\alpha$, $\beta$, $c$, $\lambda$, $\phi$, $\tau$, $\delta$) are actually doing, and that somebody is us.
Arriving at the Log Likelihood: Or, How We Stop Panicking About Tiny Numbers
Okay, remember that whole “Gaussian distribution” thing? Good, because that’s our starting point for figuring out just how likely our observed data is, given our model and its parameters. The Gaussian probability function is basically its ID card, its fingerprint. It looks like this, in all its slightly intimidating glory:

$$f\big(y_i \mid M(t_i), \sigma\big) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\big(y_i - M(t_i)\big)^2}{2\sigma^2}\right)$$
Now, the likelihood for a set of data points ($y_1, y_2, \dots, y_N$) is, assuming the observations are independent, just the product of each point’s individual probability:

$$\mathcal{L} = \prod_{i=1}^{N} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{\big(y_i - M(t_i)\big)^2}{2\sigma^2}\right)$$
BUT (and it’s a big “but”, like Sir Mix-a-Lot approved), multiplying a zillion tiny probabilities together gives you an even TINIER number. We’re talking smaller than your chances of winning the lottery while being struck by lightning on a Tuesday. This makes computers sad and prone to errors. So, like any good hacker, we use a trick: logarithms! Logarithms turn those scary multiplications into friendly additions, and they keep our numbers from disappearing into the computational abyss. It’s basically applying a cheat code for numerical stability. Hence, the logarithm of our likelihood looks like this after a bit of log-magic:

$$\log \mathcal{L} = \sum_{i=1}^{N} \left[ \log\!\left(\frac{1}{\sqrt{2\pi\sigma^2}}\right) + \log \exp\!\left(-\frac{\big(y_i - M(t_i)\big)^2}{2\sigma^2}\right) \right]$$
Now, let’s simplify this beast. The first term is like trying to divide by taking logs (quotient rule!), so $\log\!\big(1/\sqrt{2\pi\sigma^2}\big)$ becomes $-\log\!\big(\sqrt{2\pi\sigma^2}\big)$. And the second term? Well, the logarithm and the exponential function ($\log$ and $e^x$) are mortal enemies; they annihilate each other and leave only the exponent behind.

Even better, that square root is just a power of one half, so $-\log\!\big(\sqrt{2\pi\sigma^2}\big) = -\tfrac{1}{2}\log\!\big(2\pi\sigma^2\big)$.

Finally, the first term in the summation doesn’t even depend on $i$, so we can pull it out of the sum and simply multiply it by $N$:

$$\log \mathcal{L} = -\frac{N}{2}\log\!\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\big(y_i - M(t_i)\big)^2$$

And there you have it! We’ve wrestled the likelihood function, tamed it with logarithms, and forced it into this neat, much more manageable form. Now we can actually use this for things like, you know, estimating those sneaky parameters ($\alpha$ through $\delta$, plus $\sigma$) hiding inside it.
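As a quick sanity check (my own aside, not part of the original derivation), we can confirm in R that this simplified expression matches the sum of Gaussian log-densities that dnorm() computes; the toy values below are made up purely for illustration:

Code
# Quick numerical check: simplified log-likelihood vs. sum of Gaussian log-densities
set.seed(1)
y_toy  <- rnorm(5, mean = 800, sd = 30)   # made-up "RRi observations"
mu_toy <- rep(800, 5)                     # model means (constant for this toy check)
sigma_toy <- 30
n_toy <- length(y_toy)

by_hand  <- -n_toy/2 * log(2 * pi * sigma_toy^2) - sum((y_toy - mu_toy)^2) / (2 * sigma_toy^2)
built_in <- sum(dnorm(y_toy, mean = mu_toy, sd = sigma_toy, log = TRUE))

all.equal(by_hand, built_in)   # should print TRUE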
Alright, buckle up, buttercups! We’ve got our fancy log-likelihood function, which is basically our map to finding the “best” parameters. But how do we actually climb this parameter mountain? We need the gradient, the mathematical equivalent of a Sherpa pointing you towards the steepest uphill path. Let’s figure out how this beast changes when we poke it with respect to its various components.
Estimating the Gradient of the Function: Or, Poking the Math Beast to See What Happens
With Respect to $\mu$: The Mean, The Myth, The Model

First up, let’s see how our log-likelihood changes when we mess with $\mu$, the mean of our Gaussian, which in our model is really $\mu_i = M(t_i)$. Taking the partial derivative:

$$\frac{\partial \log \mathcal{L}}{\partial \mu} = \frac{\partial}{\partial \mu}\left[-\frac{N}{2}\log\!\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu_i)^2\right]$$

Look closely! That first term? It doesn’t contain $\mu$ anywhere, so its derivative is zero. Poof, gone.

Now, let’s tidy up. The $\tfrac{1}{2\sigma^2}$ is just a constant as far as $\mu$ is concerned, so it can wait outside while we differentiate the sum:

$$\frac{\partial \log \mathcal{L}}{\partial \mu} = -\frac{1}{2\sigma^2}\sum_{i=1}^{N}\frac{\partial}{\partial \mu}(y_i - \mu_i)^2$$

Okay, time for the Chain Rule! This is like assembling a transformer, derive the outer layer (the exponent), then dive inside (the term with $\mu_i$ in it):

$$\frac{\partial}{\partial \mu}(y_i - \mu_i)^2 = 2\,(y_i - \mu_i)\cdot(-1)$$

Notice that the derivative of $(y_i - \mu_i)$ with respect to $\mu_i$ is just $-1$, which is where that lonely minus sign comes from.

Let’s clean this up. The ‘2’ from the exponent comes out and cancels with the ‘2’ in the denominator ($2\sigma^2$), and the two negative signs cancel each other out.

So, after the dust settles, we are left with this beauty, the gradient with respect to $\mu$:

$$\frac{\partial \log \mathcal{L}}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(y_i - \mu_i)$$

Bam! That tells us how much the log-likelihood score changes if we nudge the mean up or down a tiny bit.
With Respect to $\sigma$: The Chaos Coordinator

Now for the spicy one: $\sigma$, the standard deviation, keeper of the noise. Let’s differentiate the log-likelihood with respect to it:

$$\frac{\partial \log \mathcal{L}}{\partial \sigma} = \frac{\partial}{\partial \sigma}\left[-\frac{N}{2}\log\!\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}(y_i - \mu_i)^2\right]$$

Let’s attack the first term like it owes us money. Using log rules, $\log\!\big(2\pi\sigma^2\big) = \log(2\pi) + 2\log\sigma$, so the first term becomes $-\tfrac{N}{2}\log(2\pi) - N\log\sigma$.

See that first term now? $\log(2\pi)$ is a constant; it couldn’t care less about $\sigma$, so its derivative is zero.

Now, let’s use the power rule on $\log\sigma$: the derivative of $-N\log\sigma$ with respect to $\sigma$ is $-N/\sigma$.

For the second term, let’s get it ready for its close-up. We can pull out constants and rewrite $\frac{1}{2\sigma^2}$ as $\tfrac{1}{2}\,\sigma^{-2}$, so the term reads $-\tfrac{1}{2}\,\sigma^{-2}\sum_i (y_i - \mu_i)^2$.

Alright, Showtime! Let’s derive. The derivative of $\sigma^{-2}$ is $-2\sigma^{-3}$ (power rule again).

Putting it all together, we get:

$$\frac{\partial \log \mathcal{L}}{\partial \sigma} = -\frac{N}{\sigma} + \frac{2}{2}\,\sigma^{-3}\sum_{i=1}^{N}(y_i - \mu_i)^2$$

This is because there was a negative sign in front of the summation term originally ($-\tfrac{1}{2}\,\sigma^{-2}\sum_i (y_i - \mu_i)^2$); differentiating $\sigma^{-2}$ brings down a second negative, and the two of them cancel into a plus.

Okay, now let’s cancel the 2s and rewrite $\sigma^{-3}$ back in fraction form. And there you have it! The gradient with respect to $\sigma$:

$$\frac{\partial \log \mathcal{L}}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(y_i - \mu_i)^2$$
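If you want reassurance before moving on, here is a small numerical spot-check (my own sketch, using made-up data and a single shared mean) comparing both analytic gradients against central finite differences:

Code
# Numerical spot-check of the analytic gradients (toy example)
set.seed(42)
y_toy <- rnorm(100, mean = 800, sd = 25)   # made-up observations
mu <- 795; sigma <- 25; eps <- 1e-5
n <- length(y_toy)

logL <- function(mu, sigma) {
  -n/2 * log(2 * pi * sigma^2) - sum((y_toy - mu)^2) / (2 * sigma^2)
}

# Analytic gradients we just derived (with a common mean mu)
grad_mu_analytic    <- sum(y_toy - mu) / sigma^2
grad_sigma_analytic <- -n / sigma + sum((y_toy - mu)^2) / sigma^3

# Central finite differences
grad_mu_numeric    <- (logL(mu + eps, sigma) - logL(mu - eps, sigma)) / (2 * eps)
grad_sigma_numeric <- (logL(mu, sigma + eps) - logL(mu, sigma - eps)) / (2 * eps)

c(mu = grad_mu_analytic - grad_mu_numeric,
  sigma = grad_sigma_analytic - grad_sigma_numeric)   # both differences should be ~0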
These gradient functions? They’re our guides! We can plug in current parameter values, calculate the gradient, and it points us in the direction that increases our log-likelihood the fastest (or decreases the negative log-likelihood, same diff). This is crucial for fancy optimization tricks like Gradient Descent or the even cooler Hamiltonian Monte Carlo, which is like exploring the parameter space by pretending it’s a physical system (totally normal, right?).
Next up? We gotta get really specific and calculate the partial derivatives of that complex $M(t)$ function, the RRi-vs-Time model itself, with respect to each of its seven parameters.
Estimation of RRi-vs-Time Gradient
Alright, you magnificent data wranglers! We’ve conquered the log-likelihood mountain, figured out how it reacts to overall mean and variance changes. Now, it’s time to tackle the real boss battle: our specific model function, $M(t)$.

Let’s gaze upon its intricate glory once more:

$$M(t) = \alpha + \frac{\beta}{1 + e^{\lambda (t - \tau)}} - \frac{c\,\beta}{1 + e^{\phi (t - \tau - \delta)}}$$

This isn’t just a function; it’s a carefully choreographed dance between seven mysterious parameters: $\alpha$, $\beta$, $c$, $\lambda$, $\phi$, $\tau$, and $\delta$. To feed the chain rule, we need the partial derivative of $M(t)$ with respect to every single one of them.
To assemble this Voltron of derivatives, we gotta derive each piece separately. As they say in all the motivational posters, “Every journey begins with a single step”. Or in this case, “Every gradient vector begins with a single, possibly terrifying, partial derivative”. Let’s dive in!
Deriving the RRi-vs-Time Equations: One Parameter at a Time
Deriving the alpha ($\alpha$) Partial Derivative

Ah, $\alpha$, the baseline RRi, the gentle giant of the model. It enters as a plain additive term, so its partial derivative is as easy as it gets:

$$\frac{\partial M}{\partial \alpha} = 1$$

Seriously, that’s it. Don’t overthink it. Enjoy this moment, it won’t last.
Deriving the beta ($\beta$) Partial Derivative

$\beta$ multiplies both logistic terms linearly, so differentiating with respect to it just strips $\beta$ away and leaves the logistic envelopes behind:

$$\frac{\partial M}{\partial \beta} = \frac{1}{1 + e^{\lambda (t - \tau)}} - \frac{c}{1 + e^{\phi (t - \tau - \delta)}}$$

Essentially, we just peeled $\beta$ off both fractions and kept whatever it was multiplying.
Deriving the $c$ Partial Derivative

The parameter $c$ (the recovery proportion) appears only in the second term, multiplying $-\beta$, so only that term survives the derivative:

$$\frac{\partial M}{\partial c} = -\frac{\beta}{1 + e^{\phi (t - \tau - \delta)}}$$

See? Only the recovery term cares about $c$, so that’s the only piece left standing.
Deriving the lambda ($\lambda$) Partial Derivative

Alright, the fun begins. $\lambda$ lives inside an exponent inside a denominator, so we need the chain rule teaming up with the quotient (power) rule. For a generic logistic-style term $\dfrac{k}{1 + e^{u}}$, differentiating with respect to a parameter $\theta$ gives:

$$\frac{\partial}{\partial \theta}\left(\frac{k}{1 + e^{u}}\right) = -\frac{k\, e^{u}}{\left(1 + e^{u}\right)^2}\,\frac{\partial u}{\partial \theta}$$

Let’s apply the magic formula to the first term (the only one that contains $\lambda$), with $k = \beta$ and $u = \lambda (t - \tau)$.

The derivative of the exponent with respect to $\lambda$ is simply $(t - \tau)$, so everything clicks into place.

And rearranging it slightly for maximum aesthetic appeal (or maybe just convention):

$$\frac{\partial M}{\partial \lambda} = -\frac{\beta\,(t - \tau)\, e^{\lambda (t - \tau)}}{\left(1 + e^{\lambda (t - \tau)}\right)^2}$$

That looks intimidating, but it’s just chain rule and quotient rule doing a dance.
Deriving the phi ($\phi$) Partial Derivative

Applying the same derivative shortcut to the second term (the only one containing $\phi$), this time with $k = -c\beta$ and $u = \phi (t - \tau - \delta)$.

The negative signs cancel (yay!), and the derivative of the exponent with respect to $\phi$ is $(t - \tau - \delta)$.

And arranged nicely:

$$\frac{\partial M}{\partial \phi} = \frac{c\,\beta\,(t - \tau - \delta)\, e^{\phi (t - \tau - \delta)}}{\left(1 + e^{\phi (t - \tau - \delta)}\right)^2}$$

Looking good! Two rate parameters down.
Deriving the tau (
Okay,
Applying the derivative rule to
The negative signs cancel, leaving:
Now applying it to the second term (where the constant is
Two negative signs cancel from the
Adding the results from both terms gives us the derivative with respect to
This one is a bit of a mouthful, reflecting
Deriving the delta (
Finally,
Applying the derivative rule to the second term, with constant
Again, two negative signs cancel (
And there you have it! The derivative with respect to
We did it! We calculated every single partial derivative. Now we can assemble our magnificent gradient vector, the ultimate cheat sheet for navigating the parameter space of our model:

$$\nabla_\theta M(t) = \left( \frac{\partial M}{\partial \alpha},\; \frac{\partial M}{\partial \beta},\; \frac{\partial M}{\partial c},\; \frac{\partial M}{\partial \lambda},\; \frac{\partial M}{\partial \phi},\; \frac{\partial M}{\partial \tau},\; \frac{\partial M}{\partial \delta} \right)^{\!\top}$$

This magnificent… thing… is the full gradient of our RRi-vs-time model. Each component tells us how sensitive the model’s output is to a tiny change in its corresponding parameter at any given time $t$.
Expansion of the Gradient of the Log-Likelihood
Okay, team! We’ve got the individual pieces of the puzzle: how the log-likelihood changes when we mess with the mean ($\mu$) and the noise ($\sigma$), and how the model $M(t)$ itself reacts when we poke each of its seven parameters. Time to snap them together with the chain rule.
This gradient is the secret sauce, the flux capacitor, the whole reason we went through all that calculus pain. Why? Because it’s what fuels fancy optimization algorithms, particularly Hamiltonian Monte Carlo (HMC). In HMC, we pretend our parameters are tiny particles (with position $q$ and momentum $p$) rolling around a landscape whose height is set by the negative log-likelihood, and the gradient is the force pushing them around.
To sound fancy and appease the ghost of Hamilton (the mathematician, not the musical!), we’ll call the gradient of the log-likelihood $\nabla \log \mathcal{L}(q)$.

Okay, quick reality check before the physics gets too wild. In HMC lore, the potential energy is $U(q) = -\log \mathcal{L}(q)$ (flat priors, remember), so the gradient our sampler actually needs is $\nabla U(q) = -\nabla \log \mathcal{L}(q)$: same vector, opposite sign.
Let’s revisit the gradients we previously derived, the ones that told us how the log-likelihood changes with respect to the mean ($\mu$) and the standard deviation ($\sigma$).

The gradient with respect to $\mu$:

$$\frac{\partial \log \mathcal{L}}{\partial \mu} = \frac{1}{\sigma^2}\sum_{i=1}^{N}(y_i - \mu_i)$$

And the gradient with respect to $\sigma$:

$$\frac{\partial \log \mathcal{L}}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}(y_i - \mu_i)^2$$

So, if we were just estimating a plain mean and standard deviation, we’d already be done.
BUT WAIT! There’s a catch! The parameters we actually want to estimate are the seven knobs of the RRi-vs-Time model hiding inside $\mu_i = M(t_i)$, not $\mu$ itself. The chain rule to the rescue: the gradient with respect to each model parameter is the $\mu$-gradient multiplied by that parameter’s own partial derivative of $M$.

Remember that giant vector of partial derivatives, $\nabla_\theta M(t)$, we just assembled? This is exactly where it earns its keep.

[Just a quick note for those keeping score: yes, $\sigma$ is a parameter we estimate too, so it tags along in $q$, which makes our parameter space eight-dimensional in total.]
Incorporating all those glorious partial derivatives of $M(t)$ into the chain rule, the full gradient of the log-likelihood becomes:

$$\frac{\partial \log \mathcal{L}}{\partial \theta_j} = \frac{1}{\sigma^2}\sum_{i=1}^{N}\big(y_i - M(t_i)\big)\,\frac{\partial M(t_i)}{\partial \theta_j}, \qquad \theta_j \in \{\alpha, \beta, c, \lambda, \phi, \tau, \delta\}$$

$$\frac{\partial \log \mathcal{L}}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}\big(y_i - M(t_i)\big)^2$$

This vector, my friends, is the crown jewel. It tells us, for any given set of parameter values (and our observed data), which direction in that eight-dimensional space increases the log-likelihood the fastest, and flipping its sign hands us $\nabla U(q)$, the force for our HMC particle.
The calculus part is officially done! We’ve wrestled the derivatives, expanded the gradients, and emerged… well, maybe not unscathed, but victorious. The final boss of math is defeated. Now for the real challenge: translating these beautiful, hand-derived equations into functional R code that won’t just crash and burn. Wish us luck. We’re gonna need it.
Translating Math Into Code
Alright, we survived the calculus-induced fever dream! We’ve got equations that look like they were scribbled by a caffeinated wizard. But as any seasoned adventurer knows, having a map isn’t enough; you need to, you know, actually walk the path. And in the digital realm, “walking the path” means translating this beautiful, terrifying math into cold, hard code. Time to pull these fancy derivatives out of their ivory tower and force them to do something in R.
This “something” is giving our imaginary particle (the one that represents our parameters) directions. We’ll use the negative gradient of the log-likelihood (our $\nabla U(q)$) as the force that tells the particle where to go next.
RRi-vs-Time Model: The Core Engine
First things first, we need the actual function that generates the “true” signal ($M(t)$, our model’s mean) for any time point, given a set of parameters.

Let’s remind ourselves of the star of the show:

$$M(t) = \alpha + \frac{\beta}{1 + e^{\lambda (t - \tau)}} - \frac{c\,\beta}{1 + e^{\phi (t - \tau - \delta)}}$$

Where t will be our time vector (the input data) and params will be a named list containing our seven model parameters.
Code
# ==============================================
# 1. RRi Model (Castillo-Aguilar Equation)
# ==============================================
rr_model <- function(t, params) {
  alpha  <- params$alpha
  beta   <- params$beta
  c      <- params$c
  lambda <- params$lambda
  phi    <- params$phi
  tau    <- params$tau
  delta  <- params$delta

  term2 <- beta / (1 + exp(lambda * (t - tau)))
  term3 <- (-c * beta) / (1 + exp(phi * (t - tau - delta)))
  alpha + term2 + term3
}
Voila! A direct translation. Now, does it actually work? Let’s put it to the test by simulating some data that looks vaguely like the chaos seen during exercise and recovery, and then visualize it. This is where we see if our code is a masterpiece or a hot mess.
Code
# Libraries used for plotting and data wrangling throughout
library(ggplot2)
library(data.table)

# Time vector
t <- seq(0.01, 12, 0.01)

# List with RRi parameters
params <- list(alpha = 800, beta = -300, c = 0.9,
               lambda = -3, phi = -2,
               tau = 5, delta = 2)

# List with parameters for SD dynamics of RRi over time
params_sd <- list(alpha = 30, beta = -30, c = 0.7,
                  lambda = -1, phi = -0.8,
                  tau = 5, delta = 3)

# Simulate the "true" (smooth) RRi curve
y_true <- rr_model(t, params)
# And the fluctuations for RRi SD over time
y_sd <- rr_model(t, params_sd)

# And simulate the observed RRi signal
y <- y_true + rnorm(n = length(t), sd = y_sd)

ggplot() +
  geom_line(aes(t, y), linewidth = 1/4, color = "purple") +
  geom_line(aes(t, y_true), linewidth = 1, color = "purple") +
  geom_line(aes(t, y_true + y_sd), linewidth = 1/2, linetype = 2, color = "purple") +
  geom_line(aes(t, y_true - y_sd), linewidth = 1/2, linetype = 2, color = "purple") +
  geom_vline(xintercept = c(5, 7), linetype = 2, color = "gray50") +
  annotate("text", label = "Rest", x = 3.5, y = 650, color = "gray50") +
  annotate("text", label = "Exercise", x = 6, y = 750, color = "gray50") +
  annotate("text", label = "Recovery", x = 8.5, y = 650, color = "gray50") +
  labs(y = "RRi (ms)", x = "Time (min)",
       title = "Exercise-Induced RRi Dynamics",
       subtitle = "Cardiac Autonomic Modulation")
Look at that! Our model can indeed simulate those dramatic shifts in RRi during rest, exercise, and recovery. Even better, we can use a second instance of the same model (with different parameters) to simulate the changing standard deviation over time. See how the noisy purple line (the “observed” data) tightens up during the exercise phase and then spreads out again once recovery kicks in? That’s the time-varying standard deviation we baked in with params_sd doing its job.
Log Likelihood Function: Scoring Our Parameter Guesses
Next up, the log-likelihood function. This is our scoring system. Given a set of parameter values, it tells us (on a log scale, because math) how probable our observed data is. Higher log-likelihood? Good parameters! Lower? Probably not the best fit. This function is key to knowing if our imaginary particle is climbing the right hill.
Let’s recall the equation we sweated over:

$$\log \mathcal{L} = -\frac{N}{2}\log\!\big(2\pi\sigma^2\big) - \frac{1}{2\sigma^2}\sum_{i=1}^{N}\big(y_i - M(t_i)\big)^2$$
Now, let’s give this “bad boy” a proper R implementation:
Code
# ==============================================
# 2. Gaussian Log-Likelihood
# ==============================================
log_likelihood <- function(y, t, params, sigma) {
  M <- rr_model(t, params)
  residuals <- y - M
  N <- length(residuals)
  sigma_squared <- sigma^2

  -N/2 * log(2 * pi * sigma_squared) - sum(residuals^2) / (2 * sigma_squared)
}
Simple enough! It takes the observed data (y, t), the model parameters (params), and the crucial sigma value, and gives us back a single number: the log-likelihood score.
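As a quick illustration (reusing the data we simulated above, with a roughly plausible sigma of 30; the “bad” alpha value below is made up), the score should drop when we hand it a deliberately wrong parameter set:

Code
# Score at the parameters used to simulate the data
log_likelihood(y, t, params, sigma = 30)

# Same thing, but with a badly misjudged baseline RRi (alpha)
bad_params <- params
bad_params$alpha <- 900
log_likelihood(y, t, bad_params, sigma = 30)   # noticeably lower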
To really understand what this function does, let’s try computing the log-likelihood for a range of two parameters (alpha and c) while keeping the others fixed. This lets us visualize a small slice of the parameter space as a 3D surface.
Code
# Same parameters as before
# Observed signal
y <- y_true + rnorm(n = length(t), sd = 30)

# Input parameters
param_data <- expand.grid(
  alpha = seq(710, 890, length.out = 100),
  c = seq(0.3, 1.5, length.out = 100)
)

# Compute log likelihood
param_data$log_likelihood <-
  apply(param_data, 1, function(x) {
    new_params <- within(params, {
      alpha <- x[["alpha"]]
      c <- x[["c"]]
    })
    log_likelihood(y, t, new_params, sigma = 20)
  })

param_data <- as.data.table(param_data)

# Get unique alpha and c values for axes
z_vals <- as.matrix(dcast(param_data, alpha ~ c, value.var = "log_likelihood")[, -1])
alpha_vals <- unique(param_data$alpha)
c_vals <- unique(param_data$c)

# Create the 3D surface plot
library(plotly)
plot_ly() |>
  add_trace(
    x = c_vals,
    y = alpha_vals,
    z = z_vals,
    type = "surface",
    colorscale = "Viridis",
    showscale = TRUE
  ) |>
  layout(
    scene = list(
      xaxis = list(title = "C (Recovery proportion)"),
      yaxis = list(title = "Alpha parameter"),
      zaxis = list(title = "Log-Likelihood"),
      camera = list(eye = list(x = -1, y = -2.5, z = 0.5))
    ),
    title = "Log-Likelihood Landscape for alpha and c Parameters"
  )
Behold! A wild log-likelihood landscape appears! You can see hills and valleys. Optimization algorithms basically try to find the peak of the highest hill on this surface. This visualization is great for 2 parameters, but imagine doing this for all 8 parameters simultaneously. The number of points you’d need on a grid explodes faster than a poorly mixed science experiment.
Exactly. Grid approximation for high-dimensional spaces? Not gonna happen unless you have hardware from the future. This is why gradient-based methods are essential, they use the slope (the gradient!) to figure out which way is up without having to check every single possible location.
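To put a number on that explosion, here’s a quick back-of-the-envelope sketch (assuming a modest 100 grid points per parameter, as in the surface above):

Code
# Naive grid search over all 8 parameters at 100 points each
points_per_param <- 100
n_params <- 8
points_per_param^n_params   # 1e16 log-likelihood evaluations; not going to happen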
Gradient of the RRi-vs-time model: The Model’s Sensitivity Report
Before we can calculate the full gradient of the log-likelihood, we need the gradients of the model itself with respect to each of its parameters. This tells us how sensitive the model’s output is to tiny changes in $\alpha$, $\beta$, $c$, $\lambda$, $\phi$, $\tau$, and $\delta$.
Let’s recall that glorious (and slightly terrifying) vector of partial derivatives:

$$\nabla_\theta M(t) =
\begin{pmatrix}
\partial M/\partial \alpha \\
\partial M/\partial \beta \\
\partial M/\partial c \\
\partial M/\partial \lambda \\
\partial M/\partial \phi \\
\partial M/\partial \tau \\
\partial M/\partial \delta
\end{pmatrix}
=
\begin{pmatrix}
1 \\[4pt]
\dfrac{1}{1 + e^{\lambda (t - \tau)}} - \dfrac{c}{1 + e^{\phi (t - \tau - \delta)}} \\[10pt]
-\dfrac{\beta}{1 + e^{\phi (t - \tau - \delta)}} \\[10pt]
-\dfrac{\beta\,(t - \tau)\, e^{\lambda (t - \tau)}}{\big(1 + e^{\lambda (t - \tau)}\big)^2} \\[10pt]
\dfrac{c\,\beta\,(t - \tau - \delta)\, e^{\phi (t - \tau - \delta)}}{\big(1 + e^{\phi (t - \tau - \delta)}\big)^2} \\[10pt]
\dfrac{\beta\,\lambda\, e^{\lambda (t - \tau)}}{\big(1 + e^{\lambda (t - \tau)}\big)^2} - \dfrac{c\,\beta\,\phi\, e^{\phi (t - \tau - \delta)}}{\big(1 + e^{\phi (t - \tau - \delta)}\big)^2} \\[10pt]
-\dfrac{c\,\beta\,\phi\, e^{\phi (t - \tau - \delta)}}{\big(1 + e^{\phi (t - \tau - \delta)}\big)^2}
\end{pmatrix}$$
Yeah, it still looks like a lot. But trust me, implementing it in R is much less painful than deriving it by hand. We just translate each line of the vector into code.
Code
# ==============================================
# 3. Partial Derivatives of the RRi Model
# ==============================================
compute_partials <- function(t, params) {
  alpha  <- params$alpha
  beta   <- params$beta
  c      <- params$c
  lambda <- params$lambda
  phi    <- params$phi
  tau    <- params$tau
  delta  <- params$delta

  # Precompute common terms
  t_tau       <- t - tau
  t_tau_delta <- t - tau - delta

  exp_lambda <- exp(lambda * t_tau)
  exp_phi    <- exp(phi * t_tau_delta)
  denom1     <- (1 + exp_lambda)^2
  denom2     <- (1 + exp_phi)^2

  list(
    d_alpha = rep(1, length(t)),  # ∂M/∂α is constant, so we repeat 1's
    d_beta = 1/(1 + exp_lambda) - c/(1 + exp_phi),  # ∂M/∂β
    d_c = -beta / (1 + exp_phi),  # ∂M/∂c
    d_lambda = -beta * t_tau * exp_lambda / denom1,  # ∂M/∂λ
    d_phi = c * beta * t_tau_delta * exp_phi / denom2,  # ∂M/∂φ
    d_tau = beta * lambda * exp_lambda / denom1 - c * beta * phi * exp_phi / denom2,  # ∂M/∂τ
    d_delta = -c * beta * phi * exp_phi / denom2  # ∂M/∂δ
  )
}
This function returns a list, where each element is a vector showing how the RRi changes over time for a unit change in that specific parameter. It’s like getting a separate sensitivity report for each control knob on our model.
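If you’re the trust-but-verify type, here’s a small cross-check (my own sketch, not part of the original pipeline) comparing the analytic partial for lambda against a central finite difference of rr_model:

Code
# Cross-check one analytic partial (lambda) against a finite difference
numeric_partial <- function(t, params, name, eps = 1e-6) {
  p_hi <- params; p_hi[[name]] <- p_hi[[name]] + eps
  p_lo <- params; p_lo[[name]] <- p_lo[[name]] - eps
  (rr_model(t, p_hi) - rr_model(t, p_lo)) / (2 * eps)
}

analytic  <- compute_partials(t, params)$d_lambda
numerical <- numeric_partial(t, params, "lambda")
max(abs(analytic - numerical))   # should be very small (round-off level)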
Let’s visualize these partial derivatives to see how a tiny nudge to each parameter affects the RRi curve across time:
Code
partials <- compute_partials(t, params) |>
  as.data.table()

names(partials) <-
  c("alpha", "beta", "c",
    "lambda", "phi", "tau", "delta")

data <- cbind(time = t, partials) |>
  melt.data.table(id.vars = "time")

ggplot(data, aes(time, value, col = variable)) +
  facet_grid(rows = vars(variable), scales = "free_y",
             labeller = label_parsed) +
  geom_line(show.legend = FALSE, linewidth = 1) +
  geom_vline(xintercept = c(5, 7), linetype = 2, color = "gray50") +
  labs(x = "Time (min)", y = "Effect on RRi (ms)",
       title = "Effect of parameter change on RRi",
       subtitle = "Change in RRi for a unit change for each parameter")
This plot is super informative! It shows that each parameter exerts its influence at a different moment: a nudge to $\alpha$ shifts the whole curve uniformly, while the rate and timing parameters ($\lambda$, $\phi$, $\tau$, $\delta$) mostly matter around the exercise onset and the recovery transition, exactly where the sigmoids are doing their switching.
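One way to summarize that plot with a single number per parameter (my own quick addition) is to rank them by the average absolute effect of a unit change across the whole recording:

Code
# Average absolute sensitivity of RRi to a unit change in each parameter
sapply(compute_partials(t, params), function(d) mean(abs(d)))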
Gradient of the Log Likelihood Function: The Master Gradient
Okay, the moment of truth! We combine the log-likelihood formula with the model’s partial derivatives to get the full gradient vector, and with a sign flip, the $\nabla U(q)$ that HMC actually consumes.
Remember this monster?

$$\frac{\partial \log \mathcal{L}}{\partial \theta_j} = \frac{1}{\sigma^2}\sum_{i=1}^{N}\big(y_i - M(t_i)\big)\,\frac{\partial M(t_i)}{\partial \theta_j}, \qquad \frac{\partial \log \mathcal{L}}{\partial \sigma} = -\frac{N}{\sigma} + \frac{1}{\sigma^3}\sum_{i=1}^{N}\big(y_i - M(t_i)\big)^2$$
Implementing this in R means calling our rr_model to get the fitted signal (and from it the residuals), and our compute_partials function to get all the $\partial M / \partial \theta$ terms, then combining everything with the right powers of sigma.
Code
# ==============================================
# 4. Gradient of Log-Likelihood (∇U(q))
# ==============================================
grad_U <- function(y, t, params, sigma) {
  M <- rr_model(t, params)
  residuals <- y - M
  partials <- compute_partials(t, params)
  N <- length(residuals)

  # Precompute inverses
  sigma_sq_inv <- 1 / sigma^2
  sigma_cu_inv <- 1 / sigma^3

  # Gradients for model parameters
  grad_params <- list(
    alpha = sum(residuals * partials$d_alpha) * sigma_sq_inv,
    beta = sum(residuals * partials$d_beta) * sigma_sq_inv,
    c = sum(residuals * partials$d_c) * sigma_sq_inv,
    lambda = sum(residuals * partials$d_lambda) * sigma_sq_inv,
    phi = sum(residuals * partials$d_phi) * sigma_sq_inv,
    tau = sum(residuals * partials$d_tau) * sigma_sq_inv,
    delta = sum(residuals * partials$d_delta) * sigma_sq_inv
  )

  # Gradient for sigma
  grad_sigma <- (-N / sigma) + sum(residuals^2) * sigma_cu_inv

  # Return -∇logL(q)
  gradLL <- c(grad_params, list(sigma = grad_sigma))

  lapply(gradLL, function(x) -x)
}
This function, grad_U, is the engine’s computer. It takes the current parameter values (our particle’s position $q$) plus the data, and returns $\nabla U(q) = -\nabla \log \mathcal{L}(q)$: the force acting on the particle in each of the eight dimensions.
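Once more, trust but verify: a finite-difference check (my own sketch) confirms that what grad_U returns really is the gradient of the negative log-likelihood. The sigma = 30 below simply matches the noise level we simulated earlier.

Code
# grad_U should match a finite-difference gradient of -log_likelihood
num_grad_U <- function(y, t, params, sigma, name, eps = 1e-5) {
  if (name == "sigma") {
    (log_likelihood(y, t, params, sigma - eps) -
       log_likelihood(y, t, params, sigma + eps)) / (2 * eps)
  } else {
    p_hi <- params; p_hi[[name]] <- p_hi[[name]] + eps
    p_lo <- params; p_lo[[name]] <- p_lo[[name]] - eps
    (log_likelihood(y, t, p_lo, sigma) -
       log_likelihood(y, t, p_hi, sigma)) / (2 * eps)
  }
}

analytic <- grad_U(y, t, params, sigma = 30)
analytic$tau   - num_grad_U(y, t, params, sigma = 30, name = "tau")
analytic$sigma - num_grad_U(y, t, params, sigma = 30, name = "sigma")   # both ~0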
Leapfrog Integrator: The Particle Simulator
Now that we have the force function (grad_U), we need an algorithm to simulate the particle’s movement through the parameter space using that force. This is where symplectic integrators like the leapfrog integrator come in. They are specifically designed to preserve the properties of Hamiltonian systems (like ours!) over time, making the simulation stable and efficient.
I’ve already bored you with the leapfrog integrator in previous blog posts, but a quick reminder never hurt anyone (much). It’s a three-step dance:
1. Give the particle a random kick: initialize the momentum $p$, typically from a standard normal, one draw for each parameter in $q$. (Make sure your random momentum vector is the same size as your parameter vector $q$, one $p_j$ for every $q_j$!)

2. Update the momentum halfway based on the current gradient (the force), where $\epsilon$ is the step size (the dt in the code below):

$$p \leftarrow p - \frac{\epsilon}{2}\,\nabla U(q)$$

3. Update the position using this new, half-updated momentum:

$$q \leftarrow q + \epsilon\, p$$

4. Update the momentum the other half of the way using the gradient at the new position:

$$p \leftarrow p - \frac{\epsilon}{2}\,\nabla U(q)$$
And then you just repeat steps 2-4 for a set number of “steps”. The complicated part wasn’t the algorithm; it was getting that $\nabla U(q)$ gradient in the first place, which, thankfully, we’ve already done.
Here’s the leapfrog algorithm compacted into a neat R function:
Code
leapfrog <- function(q, p, dt, y, t) {
  # Extract sigma and model parameters
  sigma <- q$sigma
  params <- q[names(q) != "sigma"]

  # Compute gradient
  grad <- grad_U(y, t, params, sigma)

  # Update momentum (half step)
  p <- p - 0.5 * dt * unlist(grad)

  # Update position (full step)
  q_new <- as.list(x = unlist(q) + dt * p)

  # Recompute gradient at new position
  new_sigma <- q_new$sigma
  new_params <- q_new[names(q_new) != "sigma"]
  grad_new <- grad_U(y, t, new_params, new_sigma)

  # Final momentum update (second half step)
  p <- p - 0.5 * dt * unlist(grad_new)

  list(q = q_new, p = p)
}
This function takes the current position (q), momentum (p), a small step size (dt), and the data (y, t), and returns the particle’s new position and momentum after one leapfrog step.
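Before unleashing thousands of steps, it helps to poke the function once. A single step (with an arbitrary small dt, and sigma = 30 to match the noise we simulated earlier) looks like this:

Code
# One leapfrog step, just to see the moving parts
q0 <- c(params, list(sigma = 30))
p0 <- rnorm(8)

one_step <- leapfrog(q = q0, p = p0, dt = 1e-4, y = y, t = t)
str(one_step$q)   # parameter values, each nudged a tiny bit
one_step$p        # momentum after the full step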
Now, let’s set up a simulation and watch our imaginary particle bounce around the parameter space!
Code
# Initial parameters ----
## Castillo-Aguilar's model
params <- list(
  alpha = 800, beta = -300, c = 0.8,
  lambda = -2, phi = -1, tau = 6, delta = 3
)
## Parameter for the standard deviation
sigma <- 10

# Data ----
## Time
t <- seq(0, 15, length.out = 1000)
## Observed RRi
y <- rr_model(t, params) + rnorm(1000, sd = sigma)

# Parameters for simulation ----
## Number of steps
steps <- 10000
## Stepsize at each step
stepsize <- 0.00025
## Random seed
set.seed(1234)
## Initiate random momentum vector
p <- rnorm(n = 8, mean = 0, sd = 1)

## Parameter vector (position q) and storage for the trajectory
q <- c(params, list(sigma = sigma))
out <- vector("list", length = steps)

## Leapfrog integrator
for (step in 1:steps) {
  out[[step]] <- leapfrog(q = q, p = p, dt = stepsize, y = y, t = t)
  q <- out[[step]]$q
  p <- out[[step]]$p
}
We set up initial parameter values (q), give it a random initial kick (p), choose a tiny step size (stepsize), and let the leapfrog function run for many steps. Each step calculates the gradient, updates momentum, updates position, calculates the new gradient, and updates momentum again. It’s like a tiny, energetic dance!
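One extra diagnostic worth running (my own suggestion, not part of the original recipe) is to track the total energy H = U(q) + K(p) along the trajectory; for a well-tuned leapfrog run it should hover around a constant, and large drift is a hint that the step size or trajectory length needs rethinking:

Code
# Track total energy H = U(q) + K(p) across the trajectory
energy <- sapply(out, function(s) {
  q_s <- s$q
  U <- -log_likelihood(y, t, q_s[names(q_s) != "sigma"], q_s$sigma)
  K <- sum(s$p^2) / 2
  U + K
})

plot(energy, type = "l", xlab = "Leapfrog step", ylab = "H = U(q) + K(p)",
     main = "Energy drift along the trajectory")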
To see if the simulation is behaving properly, we plot the phase space: momentum ($p$) against position ($q$) for each parameter across the whole trajectory.
Code
position <- lapply(out, function(x) {
  as.data.table(x$q)
}) |> rbindlist()

momentum <- lapply(out, function(x) {
  as.list(x$p)
}) |> rbindlist()

position$step <- 1:steps
momentum$step <- 1:steps

# Melt to long format: position values go into column "q", momentum values into "p"
position <- melt.data.table(position, id.vars = "step", value.name = "q")
momentum <- melt.data.table(momentum, id.vars = "step", value.name = "p")

sim_data <- position[momentum, on = c("step", "variable")]
sim_data[, `:=`(q = scale(q), p = scale(p)), list(variable)]

ggplot(sim_data, aes(p, q)) +
  facet_grid(rows = vars(variable), labeller = label_parsed) +
  geom_path(aes(group = 1, alpha = step, col = variable), show.legend = FALSE) +
  labs(x = "Momentum (p)", y = "Position (q)",
       title = "Phase Space of RRi Model",
       subtitle = "Particle Moving on the Parameter Space",
       caption = "Values are Standardized For Each Parameter")
Look at that beautiful chaos! For most parameters, you see that expected cyclic behavior, like a bunch of pendulums swinging. This means the particle is doing a decent job exploring those dimensions of the parameter space. However, notice the beta parameter? It’s just drifting off into the abyss! It didn’t settle into that nice, contained, cyclic pattern. This is a clear sign that our simulation isn’t perfectly tuned.
Why does this happen? Could be anything! The stepsize might be too big (causing the particle to overshoot optimal regions), the number of steps (steps) might be too few, or the initial conditions might be bad. Debugging HMC simulations and tuning these “hyperparameters” (stepsize, trajectory length) is a whole art form in itself, a topic for yet another blog post! But for now, we’ve successfully translated our hairy math into functional R code that simulates a particle exploring a parameter space driven by the gradient of the log-likelihood. We’re basically playing god with tiny mathematical particles. And that’s pretty cool.
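A cheap first diagnostic (my own suggestion, not from the original post) is simply to compare the size of the gradient components at the starting values; wildly different magnitudes across parameters usually mean a single shared stepsize cannot serve all of them well, which is exactly the kind of trouble our drifting beta hints at:

Code
# Compare gradient magnitudes across parameters at the starting position
round(unlist(grad_U(y, t, params, sigma)), 3)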
How I Learned to Stop Worrying and Love the Gradient
Alright, peel me off the floor. We made it! After a rollercoaster ride through the dark forests of calculus, the shimmering plains of probability, and the slightly-too-exciting physics playground of Hamiltonian Monte Carlo, we’ve arrived. This wasn’t just a stroll in the park; this was scaling Mount Doom with a spork. But hey, look at the view! (The view is a bunch of complex R functions, but trust me, it’s breathtaking for a stats nerd).
Looking back, there were moments, weren’t there? Like when we first saw that monster equation for $M(t)$, with its nested exponentials daring us to differentiate them.
But through that pain, clarity emerged. We ripped the black box wide open, Neo-style, dodging algebraic bullets to see the gears turning. Every line of derived math, every function in R, is a piece of the puzzle, showing how the elegant (lol) theory connects to the messy reality of heartbeats.
The real mind-bender? Hamiltonian Monte Carlo. It’s not just an algorithm; it’s a philosophy. It treats our parameters like a particle in a physics simulation, exploring the probability space not by stumbling around blindly like a random-walk zombie, but by using momentum to glide. Imagine Han Solo navigating an asteroid field, expertly using the force (or, in our case, the negative gradient of the log probability) to steer clear of boring low-probability zones and slingshot towards the juicy, high-probability ones. That gradient, $\nabla U(q)$, is what turns blind wandering into purposeful flight.
And those parameters in the Castillo-Aguilar model? They stopped being just Greek letters and started looking like character traits.
Make no mistake, this path was littered with the digital remains of failed derivative attempts. Manually deriving partials for that nested logistic function? It was like trying to defuse a bomb blindfolded. One sign error, one missed chain rule, and your whole simulation blows up. This is why tools with automatic differentiation are the superheroes of modern probabilistic programming. They have the precision of a brain surgeon’s laser scalpel compared to my rusty butter knife. But doing it by hand? That gives you the battle scars and the deep appreciation for why that laser scalpel is so necessary. You learn how changing one parameter ripples through every term it touches, and exactly where a stray minus sign can wreck the whole thing.
And let’s not forget the leapfrog integrator, the trusty engine of our simulation. Getting its parameters right is the ultimate Goldilocks problem: the stepsize can’t be too big (overshoots the peak like a drunk driver), and not too small (takes forever, your computer will stage a protest). You need it just right for that smooth, energy-preserving trajectory. Our drifting beta parameter in the phase-space plot is living proof that we haven’t quite found “just right” yet.
Choosing the right model feels like deciding whether a bazooka or a Swiss Army knife is better for a zombie apocalypse. The Castillo-Aguilar model hits a sweet spot: complex enough to look like real RRi data, but simple enough that HMC doesn’t require a supercomputer and a prayer. Add more complexity (like trying to model every single heartbeat nuance), and your sampler starts begging for mercy. Bayesian model comparison is the judge in this eternal battle between “more detail!” and “please, can we go home now?”.
And priors! Oh, the priors. In this exercise, with flat priors, they were just background noise. But in the real world, priors from domain experts (like a cardiologist saying, “Yeah, a resting RRi somewhere around 800 ms sounds about right”) act as guardrails, keeping the sampler away from physiologically absurd corners of parameter space.
Zooming out, this whole pixelated journey underscores the power of modern probabilistic programming frameworks. Stan, PyMC, TFP – they automate the grunt work of differentiation so your brain can focus on the fun stuff: building cool models and interpreting results. But understanding what’s under the hood? Priceless. When your sampler throws a cryptic error, knowing your gradient equations turns you from a clueless bystander into a detective who can actually solve the mystery.
The most beautiful part, the true visual poetry, is in those phase-space plots. They are the Star Wars dogfights of statistics, showing the complex ballet between position and momentum. Interpreting them requires merging your statistical knowledge with a physicist’s intuition. Why is one parameter drifting off while the others trace tidy orbits? Answering that means thinking about the geometry of the likelihood and the physics of the simulation at the same time.
Of course, bringing HMC models into the real world adds another layer of boss battles. 10,000 leapfrog steps? Easy peasy for research, a nightmare for real-time patient monitoring. This pushes us towards faster methods, trading off some Bayesian purity for speed. It’s a Faustian bargain, but sometimes necessary.
But the biggest payoff? The transformation in us. Rebuilding HMC from scratch is like learning to ride a unicycle on a tightrope made of spaghetti. It’s terrifying, you fall a lot, but when you finally get it, you understand balance and momentum in a way you never could just by watching. For anyone diving into this stuff, it turns HMC from a magical incantation into an open-book exam. You know why half-steps matter, why volume preservation is key, and why energy conservation keeps things sane.
Looking ahead, the possibilities are endless. Our simple RRi model could spawn sequels: hierarchical models for populations, drug interaction spin-offs, or even hybrid models blending our physiological structure with the pattern-recognition power of neural networks. But with great power comes great responsibility; using a misestimated parameter to inform a real clinical decision carries consequences far beyond an ugly trace plot.
In the end, our HMC odyssey through the Castillo-Aguilar model is a microcosm of the scientific adventure itself. We’ve taken equations and turned them into a narrative of heartbeats, battling gradients and leapfrog steps like plot twists. And in the simple rhythm of RR intervals, we’ve found not just numbers, but echoes of biological truth, waiting for us to understand them, one Hamiltonian leap, one carefully coded function, one hard-won derivative at a time.
Citation
@misc{castillo-aguilar2025,
author = {Castillo-Aguilar, Matías},
title = {HMC by {Hand:} {Gradients,} {Heartbeats,} {PDEs} and {Other}
{Tales}},
date = {2025-04-19},
url = {https://bayesically-speaking.com/posts/2025-04-19 heartbeats-pdes-and-other-tales/},
doi = {10.59350/450sp-tdq85},
langid = {en}
}