Welcome to Bayesically Speaking

Matías Castillo-Aguilar

doi:10.59350/35tc8-qyj10

Hello stranger

Alright folks, welcome to ‘Bayesically Speaking’! Yes, I know, I’m a word nerd. Sue me. But seriously, I’ve been wanting to share my love for all things Bayesian for ages. It’s like, the internet is down right now, so I figured, ‘What the heck, let’s just do it!’ You know how everyone tells you to ‘follow your dreams’? Well, this is my dream: to spread the gospel of Bayes to the masses (or at least to anyone who’s still reading).

For years, I’ve been obsessed with stats, especially how they can help us unravel the mysteries of health. It’s like, you have these puzzle pieces, and you’re trying to figure out how they fit together. But sometimes, you look at one piece in isolation, and it seems pretty boring. But then you put two pieces together, and BOOM! Suddenly, you see a hidden pattern, a secret message! That’s the magic of statistics for me.

Code

sim_data <- simulate_simpson(n = 100, 
                             difference = 2, 
                             groups = 4, 
                             r = .7) |> 
  as.data.table()

sim_data[, Group := factor(Group, 
                           levels = c("G_1","G_2","G_3","G_4"),
                           labels = c("Placebo", "Low dose", "Medium dose", "High dose"))]

ggplot(sim_data, aes(V1, V2, col = Group)) +
  geom_point() +
  geom_smooth(method = "lm") +
  geom_smooth(method = "lm", aes(group = 1, col = NULL)) +
  scale_color_brewer(type = "qual", palette = 2) +
  labs(x = "Time exposure", y = expression(Delta*"TNF-"*alpha)) +
  theme(legend.position = "top")

In this example, the whole sample correlation is about -0.71 (meaning a negative relationship), but the within group correlation is 0.7 (the exact opposite effect). This phenomenom is better known as Simpson’s Paradox.

The statistics toolbox

Now, let’s talk about the classic stats toolbox. You know, the t-tests, ANOVAs, and regressions (the old reliables). They’re like those trusty old hammers, simple, easy to use, and they get the job done most of the time. But let’s be honest, they’re not exactly Swiss Army knives. They start to struggle when things get messy. Asymmetrical data? Forget about it. Non-linear relationships? Good luck! And don’t even get me started on unbalanced groups or those pesky outliers.

Enter the non-parametric heroes! These guys are more flexible, like those fancy adjustable wrenches. But they can be a bit… mysterious. It’s hard to predict how they’ll behave in new situations, which can be a bit nerve-wracking.

Then there are the ‘black box’ models (neural networks, random forests, and their cousins). These are like those super-powered AI robots. They can handle any data you throw at them, no matter how messy. But sometimes, you have no idea what’s going on inside. It’s like magic, but not in a good way. You just hope it doesn’t turn against you.

Just between us, I only put this picture because it looked cool. Photo from Dan Cristian Pădureț at Unsplash.

So, we have these fancy new tools, but something’s missing. We need to bring our own brains into the equation! After all, we don’t just blindly stare at data. We bring our own experiences, our own biases (hey, we’re human!), and our own gut feelings to the table.

That’s where Bayes comes in, the master of incorporating ‘prior knowledge.’ It’s like saying, ‘Hey, I know a little something about this already, so let’s not start from scratch.’ And instead of just giving us a dry ‘yes’ or ‘no’ answer, Bayes gives us a whole range of possibilities, complete with confidence levels. It’s like, ‘I’m 90% sure this is true, but there’s still a 10% chance I’m completely wrong.’ Much more honest, wouldn’t you say?

Alright, let’s play a little mind game. Imagine you’re convinced this coin is rigged. You’ve flipped it 15 times and seen heads a whopping 10 times! You’re thinking, “This thing is clearly biased towards heads.” You calculate your odds: 10 heads out of 15 flips, that’s a 66% chance of heads! You feel pretty confident, right?

Now, for the twist. You decide to put this coin to the test yourself. You flip it 15 times, and guess what? You get 13 tails! Only 2 measly heads. Talk about a reality check! Suddenly, your confidence is shaken. Based on your own experiment, the odds of getting heads plummet to a measly 13%.

So, what do you do? Throw your hands up in the air and declare your initial belief wrong? Not so fast! We need to find a way to reconcile these conflicting results. This is where Bayesian thinking comes to the rescue.

Let’s Get Bayesian

To figure out the true odds of getting heads, we need to use a bit of Bayesian magic. We’ll call the probability of getting heads “P(H)”.

Now, remember our initial belief? The one based on those first 15 flips? We can represent that belief using something called a Beta distribution.

$P (H) \sim B e t a (10, 5)$

This symbol “

\sim

” means distributed as

Here, the Beta distribution parameters are (10, 5) since we had 10 heads and 5 tails in the prior experiment.

Now, a new experiment with the same 15 tosses gives us 2 heads. To update our prior belief, we can use this information to calculate the posterior probability which can be expressed as follow:

This symbol “ $\propto$ ” means proportional to

$P (H | D a t a) \propto P (D a t a | H) \times P (H)$

Which is equivalent as saying:

$P o s t e r i o r \propto L i k e l i h o o d \times P r i o r$

To find our updated belief about the probability of heads (the “posterior probability”), we need to combine our prior belief with the new evidence. This involves some fancy math – we multiply the likelihood (the probability of getting our observed results) by our prior belief.

Now, there’s a little trick here. Since our prior belief follows a Beta distribution, and our data fits a binomial distribution, the math works out beautifully. The posterior distribution will also be a Beta distribution! This makes our lives much easier.

A Quick Note on Normalization

When we multiply our prior and likelihood, we get a distribution that looks like the final answer, but it’s not quite right. It’s like having a delicious cake batter, but forgetting to bake it! It needs to be “normalized” to make sure it adds up to 1. Usually, this involves some complex math, but luckily, we don’t need to worry about that in this case.

$P (H | D a t a) \sim B e t a (10 + 2, 5 + 13)$

After incorporating the data from the new experiment, the parameters of the Beta distribution become (12, 18) since we had 2 heads and 13 tails in the new experiment, meaning 12 heads and 18 tails in total.

About conjugacy

This is where things get really interesting. Remember how I mentioned that the posterior distribution is also a Beta distribution? That’s not a coincidence! It’s a special property called “conjugacy”.

Think of it like this: Imagine you have a set of building blocks. Your prior belief is one type of block, and the new data you collect is another type. When you combine them, you get a block of the exact same type!

This conjugacy property is a huge time-saver. Instead of doing a bunch of complicated calculations, we can use a much more simple formula to find our updated belief. It’s like having a shortcut through a maze, much faster and easier!

To calculate the posterior probability of getting heads, we can consider the mode (maximum) of the Beta distribution, which is $(a - 1) / (a + b - 2)$ :

$\begin{aligned} P (H | D a t a) & = (12 - 1) / (12 + 18 - 2) \\ = 11 / 28 \\ \approx 0.39 \end{aligned}$

Therefore, the posterior probability of getting heads is approximately 39% when we consider all the available evidence.

Code

# Prior and Likelihood functions
data = function(x, to_log = FALSE) dbeta(x, 2, 13, log = to_log)
prior = function(x, to_log = FALSE) dbeta(x, 10, 5, log = to_log)

# Posterior
posterior = function(x) {
  p_fun = function(i) {
    # Operation is on log-scale merely for computing performance
    # and minimize rounding errors giving the small nature of
    # probability density values at each interval.
    i_log = data(i, to_log = TRUE) + prior(i, to_log = TRUE)
    # Then transformed back to get probabilities again
    exp(i_log)
  }
  
  # Then we integrate using base function `integrate`
  const = integrate(f = p_fun, 
                    lower = 0L,  upper = 1L, 
                    subdivisions = 1e4L,
                    rel.tol = .Machine$double.eps)$value
  p_fun(x) / const
}

## Plotting phase

### Color palette
col_pal <- c(Prior = "#DEEBF7", 
             Data = "#3182BD", 
             Posterior = "#9ECAE1")
### Main plotting code
ggplot() +
  #### Main probability density functions
  stat_function(aes(fill = "Data"), fun = data, geom = "density", alpha = 1/2) +
  stat_function(aes(fill = "Prior"), fun = prior, geom = "density", alpha = 1/2) +
  stat_function(aes(fill = "Posterior"), fun = posterior, geom = "density", alpha = 1/2) +
  #### Minor aesthetics tweaks
  labs(fill = "", y = "Density", x = "Probability of getting heads") +
  scale_fill_manual(values = col_pal, aesthetics = "fill") +
  scale_x_continuous(labels = scales::label_percent(), 
                     limits = c(0,1)) +
  scale_y_continuous(expand = c(0,0), limits = c(0, 6.5)) +
  theme(legend.position = "top",
        legend.spacing.x = unit(3, "mm")) +
  #### Arrows
  geom_curve(aes(x = .81, y = 4.1, xend = .69232, yend = 3.425), curvature = .4,
               arrow = arrow(length = unit(1/3, "cm"), angle = 20)) +
  geom_text(aes(x = .9, y = 4.1, label = "Beta(10,5)")) +
  geom_curve(aes(x = .2, y = 5.9, xend = .07693, yend = 5.45), curvature = .4,
               arrow = arrow(length = unit(1/3, "cm"), angle = 20)) +
  geom_text(aes(x = .29, y = 5.85, label = "Beta(2,13)")) +
  geom_curve(aes(x = .5, y = 5, xend = .3847, yend = 4.4), curvature = .4,
               arrow = arrow(length = unit(1/3, "cm"), angle = 20)) +
  geom_text(aes(x = .55, y = 5, label = "≈ 39%"))

Graphical representation of the posterior probability as the combination of both the data and the prior evidence

Practical implications

This example is a prime example of how Bayes works its magic. Imagine your beliefs as a stubborn old dog, set in its ways. Then comes along some shiny new data, like a juicy steak. Bayes, the wise dog whisperer, gently nudges the old dog towards the steak, showing it that the world is bigger than its initial assumptions. Pretty cool, huh?

And get this, Bayes isn’t just some fancy math trick. It’s basically how our brains are wired! We’re all walking, talking Bayesian machines, constantly updating our internal models based on new experiences. Remember that time you thought your friend was a total grump? But then they surprised you with a hilarious joke? BAM! Your brain, the Bayesian overlord, adjusted your opinion.

As that fancypants statistician, Gelman and Shalizi (2013), once said (probably while sipping a fine whiskey), “Bayesians think they’ve discovered the secret to life, the universe, and everything”. Okay, maybe that’s a slight exaggeration, but you get the idea.

Actually, he said “A substantial school in the philosophy of science identifies Bayesian inference with inductive inference and even rationality as such, and seems to be strengthened by the rise and practical success of Bayesian statistics”, but hey! It’s close enough.

But let’s not get carried away. These powerful tools are like those fancy kitchen gadgets you see on TV infomercials. They promise to revolutionize your life, but sometimes they just end up collecting dust in the cupboard. We need to know when to use them and when to stick to simpler methods. Otherwise, we might end up overcomplicating things and driving ourselves crazy

From past to future

Remember the Dark Ages? I mean, not THAT Dark Ages, but the time before Bayesian stats were cool? Back then, those poor Bayesians were stuck with slide rules and abacuses, trying to wrangle those complex models. Talk about a statistical struggle! But fear not, my friends, for the computer gods have smiled upon us! Now, we mere mortals can unleash the power of Bayes with a few clicks. It’s like having a magic wand that whispers the secrets of the universe.

We, the self-proclaimed ‘Statisticians-in-Training’ (or maybe ‘Statisticians-in-Training-While-Sipping-Coffee’), are on a mission to spread the good word. Imagine, folks, the possibilities! We can finally tell our friends, ‘Oh, you think correlation equals causation? Please, let me introduce you to my Bayesian overlord!’ We can predict the next winning lottery number (well, maybe not, but we can at least pretend). And we can finally understand why our cat keeps knocking over that vase, it’s not just mischievous, it’s statistically significant!

So, grab your favorite caffeinated beverage and let’s dive headfirst into this statistical wonderland. We might not conquer the world (yet), but we’ll definitely have a lot of fun trying. And hey, who knows, maybe we’ll even discover the secret to a perfectly brewed cup of coffee using Bayesian inference. Now that’s a worthy pursuit, wouldn’t you say?

References

Gelman, Andrew, and Cosma Rohilla Shalizi. 2013. “Philosophy and the Practice of Bayesian Statistics.” British Journal of Mathematical and Statistical Psychology 66 (1): 8–38.

Citation

BibTeX citation:

@misc{castillo-aguilar2023,
  author = {Castillo-Aguilar, Matías},
  title = {Welcome to {Bayesically} {Speaking}},
  date = {2023-06-10},
  url = {https://bayesically-speaking.com/posts/2023-05-30 welcome/},
  doi = {10.59350/35tc8-qyj10},
  langid = {en}
}

For attribution, please cite this work as:

Castillo-Aguilar, Matías. 2023. “Welcome to Bayesically Speaking.” June 10, 2023. https://doi.org/10.59350/35tc8-qyj10.