Bayesian Coin Tossing

Aug 12, 2025

I just correctly guessed 18 consecutive coin tosses; what are the odds of that?

ChatGPT said:

Let’s calculate that step by step.

Each coin toss has two possible outcomes (Heads or Tails). If you’re guessing before each toss, the probability of being correct is 1/2 for each toss.

For 18 consecutive correct guesses:

$$P = \left(\frac{1}{2}\right)^{18}$$

Step-by-step:

$2^{10} = 1{,}024$
$2^{8} = 256$
$2^{18} = 2^{10} \times 2^{8} = 1{,}024 \times 256 = 262{,}144$

So:

$$P = \frac{1}{262{,}144}$$

That’s about 0.000381% — a little less than 4 chances in a million.
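As a quick check on the arithmetic above, here is a minimal Python sketch using exact fractions. The function name `streak_probability` is just an illustrative choice, and the code assumes independent tosses with the same chance of a correct guess each time.

```python
from fractions import Fraction


def streak_probability(n_tosses: int, p_correct: Fraction = Fraction(1, 2)) -> Fraction:
    """Exact probability of guessing n_tosses independent tosses in a row."""
    return p_correct ** n_tosses


p18 = streak_probability(18)
print(p18)                          # exact fraction
print(f"{float(p18):.3e}")          # as a decimal probability
print(f"{float(p18) * 100:.6f}%")   # as a percentage
```

Working with `Fraction` keeps the result exact, so the only rounding happens when the value is printed as a decimal.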

If you pulled that off, you either hit an extraordinary statistical fluke or there’s something very non-random about your coin.

Do you want me to also calculate how unlikely it would be to happen at least once in your lifetime?

You said:

What about a Bayesian approach?

ChatGPT said:

Nice—Bayes makes this way more interesting.

#1. Model your “skill”
Let $p$ be your chance of guessing any single flip correctly.

Prior: take a simple uninformative prior $p \sim \text{Beta}(1,1)$ (uniform on $[0,1]$).
Data: 18 guesses, all correct → 18 successes, 0 failures.

Posterior:

$$p \mid \text{data} \sim \text{Beta}(1+18,\,1+0)=\text{Beta}(19,1).$$

Posterior mean (your estimated skill): $\mathbb{E}[p\mid\text{data}] = \frac{19}{20}=0.95$.
95% credible interval for $p$: $\approx [0.824,\;0.999]$.
Probability you’re no better than fair ($p\le 0.5$) given the data: $\Pr(p\le 0.5\mid\text{data}) \approx 1.9\times 10^{-6}$.

Posterior predictive: probability your next guess is correct is the posterior mean, $\approx 0.95$.
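The posterior summaries above can be reproduced without any statistics library, because a Beta(a, 1) distribution has the closed-form CDF F(p) = p^a. The sketch below relies on that fact; the helper names are illustrative, and the interval is the equal-tailed one (2.5% in each tail), which is an assumption about how the interval was computed.

```python
# Closed-form summaries of a Beta(a, 1) posterior, whose CDF is F(p) = p**a.
a = 1 + 18  # uniform Beta(1, 1) prior plus 18 successes and 0 failures


def beta_a1_cdf(p: float, a: int) -> float:
    """P(skill <= p) under Beta(a, 1)."""
    return p ** a


def beta_a1_quantile(q: float, a: int) -> float:
    """Inverse of the CDF above: the p with F(p) = q."""
    return q ** (1.0 / a)


posterior_mean = a / (a + 1)
ci_low = beta_a1_quantile(0.025, a)    # equal-tailed 95% interval (assumed)
ci_high = beta_a1_quantile(0.975, a)
prob_at_most_fair = beta_a1_cdf(0.5, a)

print(f"mean = {posterior_mean:.2f}")
print(f"95% interval = [{ci_low:.4f}, {ci_high:.4f}]")
print(f"P(p <= 0.5 | data) = {prob_at_most_fair:.3e}")
```

This shortcut only works because the second Beta parameter is 1. A general Beta posterior would need an incomplete beta function instead.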

#2. Evidence vs “fair guessing”
Compare:

$H_0$: you’re just guessing, $p=0.5$ (point null).
$H_1$: $p$ unknown with the Beta(1,1) prior.

Marginal likelihood under $H_1$:
$$\int_0^1 p^{18}\,dp = 1/19 \approx 0.05263.$$

Likelihood under $H_0$:
$$(0.5)^{18} \approx 3.81\times10^{-6}.$$

Bayes factor $BF_{10}$:

$$BF_{10}=\frac{0.05263}{3.81\times10^{-6}}\approx 1.38\times 10^{4}.$$

That’s ~14,000:1 evidence against “fair guessing” and for the uniform-prior alternative, whose posterior mass sits almost entirely above $p = 0.5$ (given the simple prior).
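To see where the Bayes factor comes from without rounding, here is a small sketch in exact arithmetic. It assumes the same setup as above: a point null at one half, and a uniform prior on the skill under the alternative, so the marginal likelihood of 18 straight successes is 1/(18+1).

```python
from fractions import Fraction

n = 18  # planned guesses, all correct

marginal_h1 = Fraction(1, n + 1)      # integral of p**n over [0, 1]
likelihood_h0 = Fraction(1, 2) ** n   # fair guessing: (1/2)**n
bf10 = marginal_h1 / likelihood_h0    # simplifies to 2**n / (n + 1)

print(bf10)                  # exact ratio
print(round(float(bf10)))    # rounded for reading
```

The exact ratio makes it clear that the figure of about 14,000 is just 2^18 divided by 19.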

#3. A different simple comparison (extreme alternative)
If you instead compared $H_0: p=0.5$ to $H_1: p=1$ (a perfect predictor), the likelihood ratio is

$$\frac{1^{18}}{(0.5)^{18}}=262{,}144:1.$$

Posterior odds would be prior odds $\times$ 262,144. So unless your prior belief in “perfect predictor” is astronomically tiny (< $1/262{,}144$), the data pushes hard toward $H_1$. (Most people do assign that hypothesis an extremely tiny prior, which is why the continuous-$p$ model above is a nicer default.)
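Here is a short sketch of how the prior changes the conclusion in this two-hypothesis comparison. The prior probabilities in the loop are purely illustrative assumptions, not values from the conversation; the function simply applies posterior odds = prior odds × likelihood ratio.

```python
from fractions import Fraction

# Likelihood ratio of "perfect predictor" (p = 1) vs "fair guessing" (p = 1/2)
likelihood_ratio = Fraction(1) ** 18 / Fraction(1, 2) ** 18


def posterior_probability(prior_prob_h1: Fraction) -> Fraction:
    """Posterior probability of the perfect-predictor hypothesis."""
    prior_odds = prior_prob_h1 / (1 - prior_prob_h1)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1 + posterior_odds)


# Hypothetical priors, chosen only to show the break-even point.
for prior in (Fraction(1, 1000), Fraction(1, 262145), Fraction(1, 10 ** 9)):
    print(f"prior {float(prior):.2e} -> posterior {float(posterior_probability(prior)):.6f}")
```

With prior odds of exactly 1 to 262,144 the posterior lands at 50/50. A more skeptical prior, such as one in a billion, still leaves the perfect-predictor hypothesis unlikely after the streak.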

#4. About stopping rules
The above assumes you planned to make exactly 18 guesses. If instead you tried many times and only reported once you hit a long streak, you’d model that process explicitly; the evidence is still strong, but less extreme than treating it as a single 18-for-18 experiment.
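One possible way to make the stopping-rule point concrete is sketched below. It is an assumption-laden illustration, not the only way to model selective reporting: it supposes someone makes a fixed number of independent 18-guess runs (the `attempts` parameter is hypothetical) and reports only that at least one run was perfect. The Bayes factor then compares the chance of that report under a uniform prior on skill with its chance under fair guessing, using a simple midpoint-rule integral.

```python
import math


def prob_some_perfect_run(p: float, attempts: int) -> float:
    """P(at least one 18/18 run in `attempts` independent runs) at skill p."""
    run = p ** 18
    if run >= 1.0:
        return 1.0
    # 1 - (1 - run)**attempts, computed stably for very small `run`
    return -math.expm1(attempts * math.log1p(-run))


def bayes_factor(attempts: int, steps: int = 100_000) -> float:
    """BF of uniform-prior skill vs fair guessing, given 'some run was perfect'."""
    h = 1.0 / steps
    # Midpoint rule for the integral over p in [0, 1]
    marginal_h1 = h * sum(
        prob_some_perfect_run((i + 0.5) * h, attempts) for i in range(steps)
    )
    return marginal_h1 / prob_some_perfect_run(0.5, attempts)


for attempts in (1, 10, 100):
    print(attempts, round(bayes_factor(attempts)))
```

With a single planned run this reduces to the Bayes factor computed earlier. As the number of unreported attempts grows, the factor shrinks but stays well above 1, which matches the "still strong, but less extreme" description.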

If you want, tell me how the streak came about (one planned 18-guess run vs. “reported after a streak”), and I’ll tailor the Bayesian model to that setup.

You said:

It was planned.

ChatGPT said:

Great—since it was a fixed, planned 18-trial experiment, here’s a clean Bayesian read:

Posterior for your “skill” $p$

Assume each guess is correct with probability $p$.

Prior: $p \sim \mathrm{Beta}(1,1)$ (uniform).
Data: 18/18 correct ⇒ posterior $p \mid \text{data} \sim \mathrm{Beta}(19,1)$.

Key numbers:

Posterior mean $E[p\mid\text{data}] = \tfrac{19}{20} = 0.95$.
95% credible interval: $[0.8235,\;0.9987]$.
$\Pr(p \le 0.5 \mid \text{data}) = 0.5^{19} \approx 1.91\times10^{-6}$.

Posterior predictive:

Next single guess correct: $E[p\mid\text{data}] \approx 0.95$.
Next 18 guesses all correct: $\prod_{i=0}^{17} \frac{19+i}{20+i} = \frac{19}{37} \approx 0.5135$.
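The posterior predictive product telescopes, which a few lines of exact arithmetic make visible. This is a minimal sketch assuming a Beta(a, b) posterior and the standard sequential-update rule, where each further success raises the first parameter by one; the function name is illustrative.

```python
from fractions import Fraction


def prob_next_k_correct(a: int, b: int, k: int) -> Fraction:
    """P(next k guesses all correct) under a Beta(a, b) posterior on skill."""
    prob = Fraction(1)
    for i in range(k):
        # After i further successes the posterior is Beta(a + i, b),
        # whose mean is (a + i) / (a + b + i).
        prob *= Fraction(a + i, a + b + i)
    return prob


next_one = prob_next_k_correct(19, 1, 1)
next_eighteen = prob_next_k_correct(19, 1, 18)
print(next_one, float(next_one))
print(next_eighteen, float(next_eighteen))
```

Every numerator cancels against the following denominator, so the 18-term product collapses to the first numerator over the last denominator.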

Evidence vs. “fair guesser” $H_0: p=0.5$

With the uniform prior alternative $H_1: p\sim\mathrm{Beta}(1,1)$:

Marginal likelihood under $H_1$: $1/(18+1)=1/19$.
Likelihood under $H_0$: $2^{-18}$.
Bayes factor $BF_{10}=\dfrac{1/19}{2^{-18}}=\dfrac{2^{18}}{19}\approx 13{,}797$.

So the data are ~13.8k:1 in favor of “you’re better than chance” vs. “just 50–50 guessing,” before even picking prior odds.

Sensitivity to a different prior (Jeffreys)

Using Jeffreys prior $\mathrm{Beta}(1/2,1/2)$:

Posterior: $\mathrm{Beta}(18.5, 0.5)$.
$E[p\mid\text{data}]=\frac{18.5}{19}\approx 0.9737$.
95% CI $[0.8714,\;0.99997]$.
$BF_{10}\approx 34{,}619$ (even stronger).
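The prior-sensitivity comparison can be sketched with only the standard library, using `math.lgamma` to evaluate Beta functions on the log scale. The marginal likelihood of the observed sequence under a Beta(a, b) prior is B(a + successes, b + failures) / B(a, b). The function names are illustrative, and the credible intervals are left out because they would need an incomplete beta function.

```python
import math


def log_beta(a: float, b: float) -> float:
    """Natural log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def bayes_factor_vs_fair(successes: int, failures: int, a: float, b: float) -> float:
    """BF of 'skill ~ Beta(a, b)' against the point null p = 0.5."""
    log_marginal_h1 = log_beta(a + successes, b + failures) - log_beta(a, b)
    log_likelihood_h0 = (successes + failures) * math.log(0.5)
    return math.exp(log_marginal_h1 - log_likelihood_h0)


bf_uniform = bayes_factor_vs_fair(18, 0, 1.0, 1.0)    # Beta(1, 1) prior
bf_jeffreys = bayes_factor_vs_fair(18, 0, 0.5, 0.5)   # Jeffreys prior
print(round(bf_uniform), round(bf_jeffreys))
```

The Jeffreys prior puts more weight near 0 and 1 than the uniform prior does, so a perfect run is less surprising under it and the Bayes factor comes out larger.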

Bottom line: with a fixed 18-trial plan, 18/18 is overwhelming evidence that your guessing accuracy $p$ is well above 0.5 under very standard priors. If you want, we can plug in a more skeptical prior (e.g., heavily concentrated near 0.5) and see how strong the evidence remains.

You said:

In English please!

ChatGPT said:

Alright, plain English version:

You planned in advance to guess exactly 18 coin flips, and you got every single one right.

If you were just an ordinary guesser with no special skill, the odds of that happening are about 1 in 262,144 (≈ four chances in a million).
Using a Bayesian approach — which updates what we believe about your skill after seeing the result — we’d say:
- Before seeing the result, we assumed you could be anywhere from terrible to perfect, equally likely.
- After seeing you go 18-for-18, the math says there’s only about a 0.0002% chance your true hit rate is 50/50 or worse.
- It’s overwhelmingly more likely you’re much better than chance.
- In fact, the evidence in favor of “you have skill” over “you’re just guessing” is about 14,000 to 1.

In other words: if this wasn’t luck, you’ve just made an insanely strong case that you can predict coin tosses better than random chance.

Do you want me to also explain what your new “estimated skill” would be after seeing this streak? That’s where Bayesian really shines.

You said:

How many average guessers would need to try in order to get 18 consecutive correct guesses?

ChatGPT said:

Short answer (planned 18 guesses each, fair coin, random guessing):

Chance one person succeeds: $2^{-18} = 1/262{,}144$.
Expected number of average guessers needed for one success: 262,144 (that’s the mean of a geometric distribution, $1/p$).

If you want at least one success with high probability among many average guessers each trying once:

50% chance: 181,705 guessers
90% chance: 603,608
95% chance: 785,312
99% chance: 1,207,216
99.9% chance: 1,810,824

(Those come from solving $1-(1-2^{-18})^n \ge q$ for the smallest whole number $n$.)
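The head counts above can be reproduced with a few lines of Python. This sketch solves the inequality for the smallest whole number of guessers, assuming each person makes one independent 18-guess run; `guessers_needed` is an illustrative name.

```python
import math

p_single = 2.0 ** -18  # chance that one fair guesser goes 18-for-18


def guessers_needed(q: float, p: float = p_single) -> int:
    """Smallest n with 1 - (1 - p)**n >= q."""
    # log1p keeps precision when p is tiny
    return math.ceil(math.log1p(-q) / math.log1p(-p))


for q in (0.5, 0.9, 0.95, 0.99, 0.999):
    print(f"{q:.1%} chance: {guessers_needed(q):,} guessers")

print(f"expected guessers until first success: {1 / p_single:,.0f}")
```

Note that the 50% figure is noticeably smaller than the expected count of 262,144, because the number of tries until the first success has a long right tail.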

You said:

So nearly 2 million people?

ChatGPT said:

Yep — if you wanted to be 99.9% sure that at least one ordinary guesser would hit 18-for-18, you’d need just over 1.8 million people each trying their own 18-flip run.

That’s why your streak is such a statistical unicorn — in a big stadium full of people, you could still be the only one to pull it off.

And if you were planning those 18 guesses in advance (not cherry-picking after many tries), it’s even more jaw-dropping.

You said:

Now let's say that the more successive correct guesses an average guesser gets correct, the more emotional stress because of the pressure to keep the 'winning streak' going with a 5% decrease in guessing correctly with each successive guess after the first correct guess (so, 50%, 45%, 40%, etc).

ChatGPT said:

Ah, now we’re modeling “stress fatigue” — each successive correct guess reduces your chance of getting the next one right. Let’s carefully step through it.

Assumptions

First guess: 50% chance (fair coin).
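Each subsequent guess: the probability drops by 5 percentage points (an absolute drop of 0.05, not 5% of the previous probability), so:

| Guess # | Probability correct |
| --- | --- |
| 1 | 0.50 |
| 2 | 0.45 |
| 3 | 0.40 |
| 4 | 0.35 |
| 5 | 0.30 |
| 6 | 0.25 |
| 7 | 0.20 |
| 8 | 0.15 |
| 9 | 0.10 |
| 10+ | 0.05 |

The schedule would otherwise reach zero at guess 11, so I’ll hold the probability at a floor of 0.05 from guess 10 onward (stress makes it extremely hard, but not impossible).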

Probability of 18 consecutive correct guesses

Multiply all the individual probabilities for each step:

$$P = 0.50 \times 0.45 \times 0.40 \times 0.35 \times 0.30 \times 0.25 \times 0.20 \times 0.15 \times 0.10 \times 0.05 \times 0.05 \times 0.05 \times 0.05 \times 0.05 \times 0.05 \times 0.05 \times 0.05 \times 0.05$$

Step by step:

First 10 terms:

$$
\begin{aligned}
0.5 \cdot 0.45 &= 0.225 \\
0.225 \cdot 0.4 &= 0.09 \\
0.09 \cdot 0.35 &= 0.0315 \\
0.0315 \cdot 0.3 &= 0.00945 \\
0.00945 \cdot 0.25 &= 0.0023625 \\
0.0023625 \cdot 0.2 &= 0.0004725 \\
0.0004725 \cdot 0.15 &= 0.000070875 \\
0.000070875 \cdot 0.1 &= 0.0000070875 \\
0.0000070875 \cdot 0.05 &= 3.54375 \times 10^{-7}
\end{aligned}
$$

Remaining 8 terms all 0.05:

$$3.54375 \times 10^{-7} \cdot 0.05^{8} = 3.54375 \times 10^{-7} \cdot 3.90625 \times 10^{-11} \approx 1.384 \times 10^{-17}$$
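The whole stress-fatigue calculation fits in a short script. This is a sketch under the assumptions stated in this answer: a start at 50%, an absolute drop of 5 percentage points per successive guess, and a floor of 5%. The function name and its integer-percent representation are illustrative choices made so the arithmetic stays exact.

```python
from fractions import Fraction


def stressed_probability(guess_number: int) -> Fraction:
    """Chance of a correct guess on the given (1-based) guess of an unbroken streak."""
    percent = max(50 - 5 * (guess_number - 1), 5)  # floor at 5%
    return Fraction(percent, 100)


def streak_probability(length: int) -> Fraction:
    """Probability of `length` correct guesses in a row under the stress model."""
    prob = Fraction(1)
    for guess_number in range(1, length + 1):
        prob *= stressed_probability(guess_number)
    return prob


first_ten = streak_probability(10)
all_eighteen = streak_probability(18)
print(f"{float(first_ten):.5e}")
print(f"{float(all_eighteen):.3e}")
print(f"about 1 in {1 / float(all_eighteen):.3e}")
```

Changing the floor or the step size in `stressed_probability` shows how sensitive the final figure is to those modeling choices.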

Interpretation

That’s about 1 in 72 quadrillion — astronomically small.
Stress makes this streak basically impossible for an ordinary guesser.

If you want, I can also compute how many people would need to try under this stress-fatigue model to expect at least one person to pull off 18 correct guesses. It’s a crazy number.

Do you want me to do that?

You said:

Yes! Do it!

ChatGPT said:

Alright — buckle up, because this gets mind-boggling.

We already calculated the probability of one “stressed” average guesser getting 18 consecutive correct guesses:

$$P_{18} \approx 1.384 \times 10^{-17}.$$

Step 1: Expected number of people needed for one success

Formula: $N = 1 / P$

$$N \approx \frac{1}{1.384 \times 10^{-17}} \approx 7.23 \times 10^{16}.$$

That’s 72 quadrillion people — far more than the population of Earth (~8 billion).

Step 2: Number of people for a high probability (e.g., 99%) of at least one success

Use the standard formula for repeated independent trials:

$$1 - (1-P)^N = q$$

Where:

$P$ = probability per person
$N$ = number of people trying
$q$ = desired probability (0.99)

Rewriting:

$$(1-P)^N = 1 - q \implies N = \frac{\ln(1-q)}{\ln(1-P)}$$

For very small $P$, $\ln(1-P) \approx -P$, so:

$$N \approx \frac{\ln(1-0.99)}{-1.384 \times 10^{-17}} = \frac{\ln(0.01)}{-1.384 \times 10^{-17}} \approx \frac{-4.60517}{-1.384 \times 10^{-17}} \approx 3.33 \times 10^{17}$$
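Here is a small sketch of both head counts, taking the rounded per-person probability quoted in this answer as given. It uses `math.log1p` for the exact formula because, in double precision, `1 - P` rounds to exactly 1.0 for a value this small, so a plain `math.log(1 - P)` would return zero and the division would fail.

```python
import math

P = 1.384e-17   # per-person probability under the stress model (rounded value from above)
q = 0.99        # desired chance that at least one person succeeds

expected_people = 1 / P

# Exact solution of 1 - (1 - P)**N = q, using log1p for numerical safety
n_exact = math.log1p(-q) / math.log1p(-P)

# Small-P approximation ln(1 - P) ~ -P
n_approx = -math.log(1 - q) / P

print(f"expected people for one success: {expected_people:.3e}")
print(f"people for a 99% chance (exact): {n_exact:.3e}")
print(f"people for a 99% chance (approx): {n_approx:.3e}")
```

At this scale the exact and approximate answers agree to many digits, so the approximation used in the derivation costs nothing in practice.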

✅ Interpretation

About 333 quadrillion people would need to try for a 99% chance that at least one succeeds.
Stress makes 18 perfect guesses effectively impossible for real humans. Even the whole human population of Earth is many orders of magnitude too small.

If you like, I can also show a chart of how the probability of 18-in-a-row drops as stress increases — it’s a shocking exponential collapse.

Do you want me to make that chart?

Timothy’s Newsletter
