BBO Discussion Forums: System performance metrics - BBO Discussion Forums


System performance metrics

#41 helene_t

  • The Abbess
  • Group: Advanced Members
  • Posts: 17,288
  • Joined: 2004-April-22
  • Gender:Female
  • Location:Copenhagen, Denmark
  • Interests:History, languages

Posted 2025-March-09, 02:23

What about limiting the study to
1-(p)-1-(p)
?
where opener has shown 16+ unbal or 17+ bal, and responder has shown 0-7, and then focus on how often we miss a good game, or get to the wrong partscore? And see how well the metrics (various entropies, frequencies and promises of opener's first rebid) predict outcomes. Without worrying about COG and slams.

This is something that is of interest because it applies to a lot of different systems, and there are a lot of different schemes that can be plugged into any system. It also has the advantage that it is not confounded by the bids' impact on opps' decisions as the systems only fork after opps have apparently decided not to bid. Finally it is not too much of a stress to base the analysis on shape and HCP only, at least we don't have to worry about 1NT bids that may or may not promise a stopper in the unbid suit.
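A minimal sketch of how the proposed scope could be expressed as a filter on shape and HCP only. The function names, the shape encoding, and the choice to count 4333, 4432 and 5332 as balanced are assumptions made for illustration; they are not taken from any particular system definition.

```python
from typing import Tuple

# Suit lengths as (spades, hearts, diamonds, clubs); hypothetical encoding.
Shape = Tuple[int, int, int, int]

# Assumption: "balanced" means 4333, 4432 or 5332 (any five-card suit allowed).
BALANCED_PATTERNS = {(4, 3, 3, 3), (4, 4, 3, 2), (5, 3, 3, 2)}


def is_balanced(shape: Shape) -> bool:
    """True if the sorted suit lengths match one of the balanced patterns."""
    return tuple(sorted(shape, reverse=True)) in BALANCED_PATTERNS


def opener_in_scope(hcp: int, shape: Shape) -> bool:
    """Opener has shown 16+ unbalanced or 17+ balanced."""
    if sum(shape) != 13:
        raise ValueError("a bridge hand has exactly 13 cards")
    return hcp >= 17 if is_balanced(shape) else hcp >= 16


def responder_in_scope(hcp: int) -> bool:
    """Responder has shown 0-7 HCP."""
    return 0 <= hcp <= 7


def deal_in_scope(opener_hcp: int, opener_shape: Shape, responder_hcp: int) -> bool:
    """Both hands fit the restricted study described above."""
    return opener_in_scope(opener_hcp, opener_shape) and responder_in_scope(responder_hcp)
```

Deals passing `deal_in_scope` would then be the population on which missed games and wrong partscores are counted.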
The world would be such a happy place, if only everyone played Acol :) --- TramTicket

#42 DavidKok

  • Group: Advanced Members
  • Posts: 2,717
  • Joined: 2020-March-30
  • Gender:Male
  • Location:Netherlands

Posted 2025-March-09, 02:30

helene_t, on 2025-March-09, 00:07, said:

Maybe, to make the simple metrics more relevant, I could calculate crossimps for a manageable subproblem and then see if there is some combination of metrics that predicts performance. But this will be huge work and probably just lead to the conclusion that the best system is Moscito because it is simple enough for me to implement without bugs. And a system like Acol, which works heavily on texture, will not do well in a testing environment that only classifies hands by points and shape.
Going in the opposite direction still seems better to me: a multitude of relatively straightforward metrics, which can then be evaluated together. I was shocked to read that Adam went from 'my vague recollection of college physics is that entropy is a rather complex thermodynamic principle [..]. I’m not really sure how to interpret results expressed in that form' to 'This is not particularly useful. [..] [Entropy is] some information theoretic quantity that I can’t translate about “how much more information I need.”' in a little under three hours. That's a lot of studying up on information theory, compression algorithms, probability theory, statistics and game theory, material that would take many people years to grasp.

Helene, to me your current approach seems productive and useful. Answers to questions naturally split the residual uncertainty on a logarithmic scale: each yes/no bidding question, such as "do you have 3-card support for my hearts?", cuts the remaining uncertainty about the hand distribution into separate chunks, and "what is your answer to my 2 GF asking bid" does the same but with multiple possible answers. Entropy is therefore the natural scale of information density, even if we don't ask binary yes/no questions but have more complicated dialogue bidding. I interpret the entropy numbers as a standardised scale on which we measure "to what extent has this question been answered already?". I'd love to know more about why Adam thinks this is fruitless.
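The point about questions splitting the residual uncertainty can be illustrated with a small sketch. It assumes a toy set of eight equally likely hand types; the helper names `entropy_bits` and `residual_entropy` are hypothetical and not taken from anyone's actual analysis code.

```python
import math
from collections import defaultdict


def entropy_bits(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def residual_entropy(hand_probs, answer_of):
    """Expected entropy left after partner answers a question.

    hand_probs: dict mapping hand type -> probability (summing to 1)
    answer_of:  function mapping hand type -> the answer partner would give
    """
    groups = defaultdict(list)
    for hand, p in hand_probs.items():
        groups[answer_of(hand)].append(p)
    remaining = 0.0
    for ps in groups.values():
        total = sum(ps)  # probability of hearing this answer
        remaining += total * entropy_bits([p / total for p in ps])
    return remaining


# Toy example: eight equally likely hand types, labelled 0..7.
hands = {h: 1 / 8 for h in range(8)}
prior = entropy_bits(hands.values())                       # 3 bits
after_yes_no = residual_entropy(hands, lambda h: h < 4)    # 2 bits left
after_four_way = residual_entropy(hands, lambda h: h // 2) # 1 bit left
```

In this toy case a yes/no question that splits the hands evenly removes one bit, and a question with four equally likely answers removes two, which is the sense in which answers sit on a logarithmic scale.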

helene_t, on 2025-March-09, 02:23, said:

What about limiting the study to
1-(p)-1-(p)
?
where opener has shown 16+ unbal or 17+ bal, and responder has shown 0-7, and then focus on how often we miss a good game, or get to the wrong partscore? And see how well the metrics (various entropies, frequencies and promises of opener's first rebid) predict outcomes. Without worrying about COG and slams.

This is something that is of interest because it applies to a lot of different systems, and there are a lot of different schemes that can be plugged into any system. It also has the advantage that it is not confounded by the bids' impact on opps' decisions as the systems only fork after opps have apparently decided not to bid. Finally it is not too much of a stress to base the analysis on shape and HCP only, at least we don't have to worry about 1NT bids that may or may not promise a stopper in the unbid suit.
Limiting the scenarios to investigate the metrics sounds good to me!

#43 awm

  • Group: Advanced Members
  • Posts: 8,469
  • Joined: 2005-February-09
  • Gender:Male
  • Location:Zurich, Switzerland

Posted 2025-March-09, 04:08

I'd just prefer a metric that can be expressed as a percentage or (even better) expected IMPs. Something like "on X% of hands partner cannot be sure if we have a major suit fit" or "on hands where opponents preempt to three of a major, we expect to lose X IMPs when guessing to pass or compete." These metrics are more interpretable and lead to better decisions about tradeoffs than a "ranking metric" that isn't easily interpretable in terms of bridge.

It's not that I don't understand information theory at all (I do have a PhD in Computer Science); it's that I don't have very good intuition for the meaning of a numerical entropy quantity or how it translates into more familiar numbers (probabilities, expectations). I'd like to be able to get an estimate of how often I'm on a complete guess, or how many IMPs I expect to lose on a specific sort of auction, rather than this mystery number.
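One possible way to turn an entropy figure into more familiar numbers, offered as a sketch rather than as the method used in this thread: an entropy of H bits corresponds to 2^H "effective" equally likely alternatives, and the chance of guessing the single most likely alternative is at least 2^(-H). The function names and the example distribution below are assumptions for illustration.

```python
import math


def entropy_bits(probs):
    """Shannon entropy in bits of a probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def effective_alternatives(h_bits):
    """Number of equally likely alternatives with the same entropy (2**H)."""
    return 2.0 ** h_bits


def best_guess_lower_bound(h_bits):
    """Lower bound on the probability of the most likely alternative.

    Follows from H >= -log2(p_max), i.e. p_max >= 2**(-H).
    """
    return 2.0 ** (-h_bits)


# Hypothetical distribution over four hand types partner might hold.
probs = [0.5, 0.25, 0.125, 0.125]
h = entropy_bits(probs)              # 1.75 bits
n_eff = effective_alternatives(h)    # between 3 and 4 effective alternatives
bound = best_guess_lower_bound(h)    # the true best guess (0.5) is at least this
```

The bound is loose when the distribution is lopsided, so it is a floor on how often a guess succeeds rather than an estimate of IMPs lost.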

Anyway, since the results seem to generally favour more aggressive systems, I think it might be worth thinking about where these systems lose and having at least one metric that reflects that. Besides the "might be too high already" metric that Helene already looked at, I think one of the biggest problems that comes up is the "wrong strain" problem. What I mean is, suppose that our best contract is in a strain where opener didn't show any length (and responder doesn't have extreme length); in other words we need to find a fit somewhere. In these situations it's usually bad for our side when the auction is very high. So for example, if partner opens an EHAA 2 and it turns out that we can make game but only in hearts we will have some issues. Most likely I will need to guess whether to pass (missing the game) or bid (possibly getting too high when partner doesn't have a heart fit). This is less problematic if partner opened 1 (or even pass) because there is more space to find the heart fit.
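The "wrong strain" idea could be counted along the following lines. This is a sketch under stated assumptions: the `DealRecord` fields, the threshold of six cards for "extreme length", and the grouping by opening level are all hypothetical choices, and the records would have to come from a simulation that is not shown here.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class DealRecord:
    best_strain: str                     # 'S', 'H', 'D' or 'C' (suit contracts only)
    strains_shown_by_opener: FrozenSet[str]
    responder_length_in_best: int        # responder's cards in the best strain
    opening_level: int                   # 0 for pass, 1 for a one-level opening, ...


def is_wrong_strain(rec: DealRecord, extreme_length: int = 6) -> bool:
    """Best strain is one opener did not show and responder lacks extreme length."""
    return (rec.best_strain not in rec.strains_shown_by_opener
            and rec.responder_length_in_best < extreme_length)


def wrong_strain_rate_by_level(records: List[DealRecord]) -> Dict[int, float]:
    """Fraction of deals with the wrong-strain problem, per opening level."""
    totals: Dict[int, int] = {}
    hits: Dict[int, int] = {}
    for rec in records:
        totals[rec.opening_level] = totals.get(rec.opening_level, 0) + 1
        if is_wrong_strain(rec):
            hits[rec.opening_level] = hits.get(rec.opening_level, 0) + 1
    return {level: hits.get(level, 0) / totals[level] for level in totals}
```

Comparing the rate at level 2 against level 1 would give a percentage of the kind asked for above, showing how often a system has to find an unshown fit with little space left.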
Adam W. Meyerson
a.k.a. Appeal Without Merit
