Given My Results, What Should My Confidence Level Be About My True Winrate?

A long time ago, this chart started floating around on 2+2:

Heads Up SNG Confidence Chart - Flawed

The question it attempts to answer is the following: Given my winrate over a certain sample of games, what range of winrates should I expect going forward, if I want to be 95% confident I will be somewhere in that range? For example, the table says that if you win 60 of your first 100 games, you should expect a true winrate going forward of somewhere between 50.4% and 69.6% (60% plus or minus 9.6%). The table claims that you should expect this to be the case 95% of the time, absent additional information.
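For the curious, the table's ±9.6% figure appears to come from the standard normal-approximation ("Wald") 95% confidence interval; here is a quick sketch of that calculation, assuming that is indeed the formula behind the chart:

```python
from math import sqrt

# Reproduce the table's interval for 60 wins in 100 games using the
# standard normal-approximation ("Wald") 95% confidence interval.
p_hat, n = 0.60, 100
margin = 1.96 * sqrt(p_hat * (1 - p_hat) / n)
print(f"{p_hat:.0%} +/- {margin:.1%}")  # margin comes out near 9.6%
```

The margin of roughly 9.6% matches the table's 50.4% to 69.6% range, which is exactly why the objection below matters: the formula is mechanical and knows nothing about what winrates are actually plausible.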

The problem? That's not even close to true. Consider the example of flipping coins. You're not sure if the coin is fair or not – it's just a random one you picked up off the street, but it looks normal. You test it, flipping the coin 100 times and coming up with 60 heads and 40 tails. If you're a statistics nerd, you'll quickly notice that this is a pretty rare event for a fair coin – a result at least that extreme on either side will only occur about 5.6% of the time. If we put it in terms of the table, let's call heads a win, tails a loss, and each flip a new game. This points to the exact same part of the table as referenced in the example of the last paragraph.
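You can check that figure yourself with a few lines of Python, using exact binomial tails rather than any approximation:

```python
from math import comb

def tail_at_least(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

one_sided = tail_at_least(100, 60, 0.5)  # 60 or more heads from a fair coin
two_sided = 2 * one_sided                # by symmetry, also covers 40 or fewer
print(f"one-sided: {one_sided:.4f}, two-sided: {two_sided:.4f}")
```

The one-sided probability lands near 2.8% and the two-sided probability right around the quoted figure.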

So, that coin has a true percentage of heads somewhere between 50.4% and 69.6%, right? We should be 95% confident about that now, 95% confident that the coin is rigged? You'd risk $2000 against your buddy's $100 that this coin you found on the street is not fair?

If that reasoning sounds suspect, it is. The problem is that prior to going into this, we knew something about coins. We know that almost all of them are extremely close to fair, and that the vast majority of coins lying on the street are not trick coins. Thus, you'd be a fool to bet that this one was, giving 20-1 odds.
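To see how strongly a sane prior dominates here, consider a toy Bayesian update. The numbers (9,999 fair coins for every one trick coin, and a trick coin that lands heads 60% of the time) are pure assumptions for illustration, not data:

```python
from math import comb

def pmf(p, heads=60, n=100):
    """Binomial(n, p) probability of exactly `heads` heads."""
    return comb(n, heads) * p**heads * (1 - p)**(n - heads)

# Assumed prior, purely for illustration: 9,999 of every 10,000 street
# coins are fair; 1 is a trick coin that lands heads 60% of the time.
prior_fair, prior_trick = 0.9999, 0.0001
post_fair = prior_fair * pmf(0.5)
post_trick = prior_trick * pmf(0.6)
p_fair = post_fair / (post_fair + post_trick)
print(f"posterior probability the coin is fair: {p_fair:.4f}")
```

Even after the suspicious 60-heads result, the posterior probability that the coin is fair stays above 99% under these assumptions: the evidence favors the trick coin by a factor of only about 7-to-1, nowhere near enough to overcome a 10,000-to-1 prior.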

The same thing applies to HUSNGs. We know something about the distribution of poker winrates. We know that when someone wins 60 of their first 100 superturbos, their expected winrate over the next 5000 games is actually never going to be better than 60%. The entire analysis is flawed; to do it correctly in this way, you'd need massive amounts of data about everybody else's experienced winrates over different sample sizes, to get a sense for how to shade the results using good Bayesian thinking.

OK, so that sucks. But there is another way to compare your experienced winrate with your true winrate. It's not as alluring as the other framing, but it has the benefit of actually being correct. What you can do is ask the question, “Let's look at my experienced winrate over my sample size. If my true winrate were different, how likely would it be to get this kind of result over my sample?”

Here are some tables that do just that.

Heads Up SNG Confidence Chart 50% Winrate


Heads Up SNG Confidence Chart 52% Winrate

Heads Up SNG Confidence Chart 54% Winrate

How to read the tables: Let's say you're a 56% winner over 250 games, but you're worried you might just be getting lucky, and have a true winrate of 50% going forward. You want to know how often you'd win 56% of your first 250 games, if you really will only win half of HUSNGs going forward. Go to the 50% table – that's what you're supposing is your true winrate – look at the 250 sample row, and over to the 56% experienced column. There, it tells you that this result (or winning even more than 56% of games) will happen to a true 50% winner just 3.3% of the time. Thus, your results happen to 50% winners sometimes, but it's a pretty unusual event.
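If you want to check that 3.3% entry, an exact binomial tail sum reproduces it. This is a sketch of the calculation, not the chart author's actual code:

```python
from math import comb

def tail_at_least(n, k, p):
    """Exact P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

wins = round(0.56 * 250)             # 140 wins out of 250 games
p = tail_at_least(250, wins, 0.50)   # chance a true 50% winner does this well
print(f"P(>= 56% over 250 games | true winrate 50%) = {p:.3f}")
```

The result comes out close to the table's 3.3%, confirming how the entries are computed.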

Unfortunately, this does NOT mean that you have a 3.3% chance of being a 50% winner. Think back to the example of the coin. We said that 60 heads or more is a 2.8% chance for a fair coin out of 100 tosses. That does NOT mean the coin has a 2.8% chance of being fair – since there are way, way more fair coins than unfair ones, the true probability that the coin is fair is probably something like 99.9%, even after getting those initial results.

Still, it's a useful measure of how unlikely it is to get your results if you really had a different winrate. Knowing that those initial results would happen to a 50% winner just 3.3% of the time tells you that you're very likely to be a long-term winner in HUSNGs given your first 250 games.

The math is not extremely complicated. You can play around with more numbers using the Heads Up SNG Binomial Calculator. This just asks you to plug in your winrate, your sample size, and how many games you want to win.

Practical Caveats: There are still assumptions made by calculating the odds in this way that you should be aware of. For one, this method assumes that you have a constant chance of winning each individual game you play. That implies that your opponents are always of a similar strength, and that you're always playing at a constant skill level. Some people may be surprised at how unlikely some events seem on these tables: For example, they suggest a player with a true winrate of 54% should go through a stretch of winning only 52% of games over a 2000 SNG sample just 3.8% of the time, with a stretch of 2000 SNGs at 50% or worse at less than a 1-in-1000 chance of happening. This doesn't really jibe with what you see on some Sharkscope graphs, with many players with high true winrates going through multi-thousand game stretches with mediocre winrates. One thing to take from these charts is that a lot of variance comes from bad game selection or bad play; it's not usually the best idea to dismiss a long bad stretch as just sick variance.
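Those two figures are easy to verify. Below is a sketch that sums the binomial terms in log space, since the direct `comb(n, k) * p**k` formula overflows Python floats at n = 2000:

```python
from math import exp, lgamma, log

def log_pmf(n, k, p):
    """Log of the Binomial(n, p) pmf at k, via log-gamma to avoid overflow."""
    return (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
            + k * log(p) + (n - k) * log(1 - p))

def tail_at_most(n, k, p):
    """P(X <= k) for X ~ Binomial(n, p), summed term by term in log space."""
    return sum(exp(log_pmf(n, i, p)) for i in range(k + 1))

# A true 54% winner over a 2000-game sample:
p_52 = tail_at_most(2000, round(0.52 * 2000), 0.54)  # runs at 52% or worse
p_50 = tail_at_most(2000, round(0.50 * 2000), 0.54)  # breaks even or worse
print(f"52% or worse: {p_52:.4f}; 50% or worse: {p_50:.6f}")
```

The first probability lands near 3.8% and the second well under 1-in-1000, matching the claims above.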


Comments

PHMERC: you really are a wizard....

NoMeansYes: I've never understood and always wondered about this chart. Thanks!

Gmoney: This is why it's important to stay on top of the game and keep on improving.

Barewire: All this work and you didn't even estimate the distribution of winrates across poker players so that we could infer the answer to the original question? Preposterous.

(jk, this was much needed <3)

leuge1970: You write quite interesting material. It's a pity the blog still gets so few visitors. I would cover topics like these much more widely.
School47: Well, I think the fewer people who visit husng.com, the better for the ordinary regs who do )

u cnat spel: The practical caveat paragraph sums up everything I was thinking. So basically, these graphs are useless. If you're breaking even for very long periods of time, review your game selection. Makes sense, thank you for the article.

arh7rf:

If you were a true statistics nerd, realizing that the probability of this happening was 5.6%, you would not go ahead and say this is for sure a rigged coin. You would say that p > .05, therefore we have failed to reject the null hypothesis that there is no difference between this coin and any other coin. Therefore, you would not be ready to risk your money on it. However, when p < .05 but marginally, you would either opt for a larger sample size to bring it down a little further or just go ahead and say, this bitch is rigged and I reject the null hypothesis, I'm throwing $1900 against $100 (gotta make it profitable). You should take a 1-1 payout on the bet all day, every day.

Can I get an "lol pwned" on the logic behind the statistics nerd assumption?

Side note from Wikipedia

The outcome of coin flipping has been studied by Persi Diaconis and his collaborators. They have demonstrated that a mechanical coin flipper which imparts the same initial conditions for every toss has a highly predictable outcome — the phase space is fairly regular. Further, in actual flipping, people exhibit slight bias – "coin tossing is fair to two decimals but not to three. That is, typical flips show biases such as .495 or .503."[6]

Or in layman's terms, 49.5% or 50.3%... not quite fair.

 

For sjovt: Comment from a statistician

Hi,

Nice article, and I like your conclusion that game selection etc. has a big impact on your winrate. Sorry to say, though, that your statistical analysis is completely wrong.

What you don't understand is that there is no logical contradiction between the first approach (estimating winrates), which you reject, and the second approach (your Bayes graphs), which you argue for. To put it in an understandable way, they are simply two different (yet sober) ways of looking at the world.

There is nothing wrong with doing statistics on your winrate. One issue, as you describe, is that you need a fairly big sample to get accuracy. Another issue is that conditions outside your "stats lab" change, i.e. the skill level of opponents, your tilt level, etc. However, the same applies to all experiments you do in the real world, be it medicine, finance, etc. The only thing you can do is try to minimize those factors. Hypothetically, estimate your winrate at different buy-in levels, on weekends or on weekdays, etc. You will never overcome those issues completely, but that doesn't mean statistics can't describe, in a clear way, a lot of information that you might otherwise see less clearly, which is exactly what statistics attempts to do.

Lastly, please note that you do not gain any evidence on your actual winrate by doing Bayes instead. It just allows you to see how unlucky you have been to get a 50% rate when you know that a poker god like yourself has a "true" 53% winrate. But if you move away from your gut feeling about your skill level, how would you objectively assert that? You do standard statistics as described first, by estimating your winrate. Now, sure, you can get "idiotic" confidence bands by using a small sample, but how do you know that they are idiotic? Because you have experience from a much bigger sample; provide that sample to the estimation technique and it will give you a much better idea about confidence bands than your gut feeling can.

Wonkish: what you describe is actually not far from the real difference between standard (frequentist) statistics and Bayesian statistics. In frequentist statistics we assume that there is an underlying distribution that generates all the outcomes we observe in the world. We take a few of these observations and estimate that distribution. Bayes, instead, does conditional estimation on a given prior distribution. I.e., Bayes assumes what the world looks like and then sees how well the observations fit our world view (i.e. our winrate).