For simplicity, let's assume matchpoints and ignore extra undertricks for extreme overbidding, so what you want is: IF we have X total HCP between us, THEN we have a 50% chance of making 3NT. Find X.
I'm using the double-dummy library provided by Prof. Matt Ginsberg (of GIB fame), found at http://www.cirl.uore...ibresearch.html. I took every deal, computed the total HCP for N/S and for E/W, then recorded the double-dummy tricks available at NT for each of the 4 declarers. So an entry has the form (total HCP of pair, tricks possible at NT). Whether the deal should be played in NT, whether they have slam, or whether they should even be declaring (read: we have 3HCP together) is ignored. If the tricks differ depending on right-siding, the result with the stronger hand declaring is weighted 2/3 and the weaker one 1/3 (if equal, 1/2 each). All entries are then collected and tallied for the statistics.
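To make the right-siding rule concrete, here is a minimal sketch of the weighting as described above. The function names and arguments are made up for illustration, and how the weighted entries end up as the integer counts in the data is not stated, so this only covers a single pair on a single deal.

```python
from fractions import Fraction


def declarer_weights(hcp_a, hcp_b):
    """Weights for (hand A declaring, hand B declaring).

    The stronger hand gets 2/3 and the weaker 1/3; equal hands get 1/2 each.
    """
    if hcp_a > hcp_b:
        return Fraction(2, 3), Fraction(1, 3)
    if hcp_a < hcp_b:
        return Fraction(1, 3), Fraction(2, 3)
    return Fraction(1, 2), Fraction(1, 2)


def weighted_game_share(hcp_a, hcp_b, tricks_a, tricks_b, target=9):
    """Weighted share of this pair's entry that counts as 'has 3NT'.

    tricks_a / tricks_b are the double-dummy NT tricks with A / B declaring.
    """
    w_a, w_b = declarer_weights(hcp_a, hcp_b)
    return w_a * (tricks_a >= target) + w_b * (tricks_b >= target)
```

For example, a 15+10 pair that takes 9 tricks only when the 15-count declares contributes 2/3 to "has 3NT" and 1/3 to "no 3NT" at 25 HCP.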
Fundamentally, this is a binary classification problem. You have a hidden variable (whether we have 9+ tricks at NT) and an observed variable (total HCP), and you want to predict the hidden using the observed. Unless you have a better idea, we predict all hands with X+ total HCP as making 3NT and any less as not making 3NT. This gives rise to a confusion matrix:
True negative (number of hands where we have LESS than X HCP and LESS than 9 tricks at NT)
False negative (number of hands where we have LESS than X HCP and AT LEAST 9 tricks at NT)
False positive (number of hands where we have AT LEAST X HCP and LESS than 9 tricks at NT)
True positive (number of hands where we have AT LEAST X HCP and AT LEAST 9 tricks at NT)
Then we can calculate a boatload of different statistics from the matrix. The most important one, corresponding to the question asked above, is the positive predictive value, PPV = TP/(FP+TP): of all the hands with at least X HCP, the fraction that actually have 9+ tricks at NT. We want the minimum X such that PPV is at least 50%. And here's where the wheels come off: the magic number is not 25, it's 23. If you restrict it to hands where double-dummy shows 7-11 NT tricks... it drops to 22.
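If you want to check the threshold yourself, here is a rough sketch that builds the confusion matrix for each cutoff from the per-HCP counts in the data at the end of this post and looks for the smallest X with PPV of at least 50%. The function names are my own; the two lists are just the two count columns, indexed by total HCP.

```python
# Counts copied from the data at the end of the post, indexed by total HCP (0..40).
NO_3NT = [0, 42, 204, 498, 1416, 3726, 8292, 16254, 30240, 51648,
          84996, 131166, 185604, 258696, 341816, 430580, 517026, 599910,
          661065, 700184, 709570, 668130, 584836, 464634, 322037, 196346,
          105431, 50292, 22352, 9610, 3762, 1310, 409, 74, 2,
          0, 0, 0, 0, 0, 0]
HV_3NT = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
          0, 0, 0, 0, 10, 46, 126, 576,
          2007, 6088, 16598, 38142, 78236, 135852, 195115, 234280,
          236395, 208404, 163252, 121556, 81234, 50338, 29831, 16180, 8290,
          3726, 1416, 498, 204, 42, 0]


def confusion(x):
    """Return (TN, FN, FP, TP) for the rule 'predict 3NT iff total HCP >= x'."""
    tn = sum(NO_3NT[:x])   # below the cutoff, no 3NT
    fn = sum(HV_3NT[:x])   # below the cutoff, but 3NT is there
    fp = sum(NO_3NT[x:])   # at or above the cutoff, no 3NT
    tp = sum(HV_3NT[x:])   # at or above the cutoff, 3NT is there
    return tn, fn, fp, tp


def ppv(x):
    """Positive predictive value TP/(FP+TP); NaN if nothing is predicted positive."""
    _, _, fp, tp = confusion(x)
    return tp / (fp + tp) if fp + tp else float("nan")


def min_threshold(target=0.5):
    """Smallest X whose PPV reaches the target, or None if no cutoff does."""
    for x in range(len(NO_3NT)):
        if ppv(x) >= target:
            return x
    return None


if __name__ == "__main__":
    x = min_threshold()
    print(x, round(ppv(x), 4))
```

On the counts as posted, `min_threshold()` returns 23, which is the number quoted above. The 7-11 trick restriction can't be reproduced this way, since the posted data only records 3NT or no 3NT, not the trick count.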
After a couple hours of not believing and hence fruitlessly debugging the program, I gave up. Simply put, when people state that 25HCP is required for NT game, they seem to be answering the question: IF we have 3NT, THEN we have X+ HCP 50% of the time, where X is indeed 25. But that is backwards, and in fact is calibrating the sensitivity, TP/(FN+TP), which measures how many games have 25+HCP.
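Written out as conditional probabilities, the two statistics condition in opposite directions. A short restatement, where H is the pair's total HCP and T the double-dummy NT tricks (both symbols are introduced here just for illustration):

```latex
\begin{align*}
\mathrm{PPV}(X) &= \frac{TP}{FP + TP} = P(T \ge 9 \mid H \ge X), \\
\mathrm{Sensitivity}(X) &= \frac{TP}{FN + TP} = P(H \ge X \mid T \ge 9).
\end{align*}
```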
Now, if you haven't written me off as a complete lunatic yet, is that why bridge is getting more and more aggressive? The people who bid closer to 23HCP games do better in the long run? Is the benefit of double dummy (remember, defenders also get to play perfectly) really a full trick on average?
PS: Data.
HCP   #no 3NT   #hv 3NT
  0         0         0
  1        42         0
  2       204         0
  3       498         0
  4      1416         0
  5      3726         0
  6      8292         0
  7     16254         0
  8     30240         0
  9     51648         0
 10     84996         0
 11    131166         0
 12    185604         0
 13    258696         0
 14    341816        10
 15    430580        46
 16    517026       126
 17    599910       576
 18    661065      2007
 19    700184      6088
 20    709570     16598
 21    668130     38142
 22    584836     78236
 23    464634    135852
 24    322037    195115
 25    196346    234280
 26    105431    236395
 27     50292    208404
 28     22352    163252
 29      9610    121556
 30      3762     81234
 31      1310     50338
 32       409     29831
 33        74     16180
 34         2      8290
 35         0      3726
 36         0      1416
 37         0       498
 38         0       204
 39         0        42
 40         0         0