What about limiting the study to
1♣-(p)-1♦-(p)
?
where opener has shown 16+ unbal or 17+ bal, and responder has shown 0-7, and then focus on how often we miss a good game, or get to the wrong partscore? And see how well the metrics (various entropies, frequencies and promises of opener's first rebid) predict outcomes. Without worrying about COG and slams.
This is of interest because it applies to a lot of different systems, and there are a lot of different schemes that can be plugged into any system. It also has the advantage that it is not confounded by the bids' impact on opps' decisions, as the systems only fork after opps have apparently decided not to bid. Finally, it is not too much of a stress to base the analysis on shape and HCP only; at least we don't have to worry about 1NT bids that may or may not promise a stopper in the unbid suit.
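As a rough illustration of the simplest metric mentioned above, the entropy of opener's first rebid, here is a minimal sketch. The two rebid schemes and their frequencies are made-up placeholders, not results from any real system, and `shannon_entropy` is just a helper name chosen for this example.

```python
import math

def shannon_entropy(freqs):
    """Shannon entropy in bits of a table {bid: frequency}.

    Frequencies are normalised first, so raw counts work too.
    """
    total = sum(freqs.values())
    return -sum((f / total) * math.log2(f / total)
                for f in freqs.values() if f > 0)

# Hypothetical frequencies of opener's rebid after 1C-(p)-1D-(p).
# These numbers are invented purely for illustration.
scheme_a = {"1H": 0.25, "1S": 0.25, "1NT": 0.25, "2C": 0.25}
scheme_b = {"1H": 0.70, "1S": 0.10, "1NT": 0.10, "2C": 0.10}

print(shannon_entropy(scheme_a))  # 2.0 bits: four equally likely rebids
print(shannon_entropy(scheme_b))  # lower: one rebid dominates
```

A scheme that spreads hands evenly over its rebids scores higher here, but whether that predicts fewer missed games is exactly what the proposed study would have to test.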
System performance metrics
#41
Posted 2025-March-09, 02:23
The world would be such a happy place, if only everyone played Acol :) --- TramTicket
#42
Posted 2025-March-09, 02:30
helene_t, on 2025-March-09, 00:07, said:
Maybe, to make the simple metrics more relevant, I could calculate crossimps for a manageable subproblem and then see if there is some combination of metrics that predicts performance. But this will be huge work and will probably just lead to the conclusion that the best system is Moscito because it is simple enough for me to implement without bugs. And a system like Acol, which relies heavily on texture, will not do well in a testing environment that only classifies hands by points and shape.
helene_t, on 2025-March-09, 02:23, said:
What about limiting the study to
1♣-(p)-1♦-(p)
?
where opener has shown 16+ unbal or 17+ bal, and responder has shown 0-7, and then focus on how often we miss a good game, or get to the wrong partscore? And see how well the metrics (various entropies, frequencies and promises of opener's first rebid) predict outcomes. Without worrying about COG and slams.
This is of interest because it applies to a lot of different systems, and there are a lot of different schemes that can be plugged into any system. It also has the advantage that it is not confounded by the bids' impact on opps' decisions, as the systems only fork after opps have apparently decided not to bid. Finally, it is not too much of a stress to base the analysis on shape and HCP only; at least we don't have to worry about 1NT bids that may or may not promise a stopper in the unbid suit.
#43
Posted 2025-March-09, 04:08
I'd just prefer a metric that can be expressed as a percentage or (even better) expected IMPs. Something like "on X% of hands partner cannot be sure if we have a major suit fit" or "on hands where opponents preempt to three of a major, we expect to lose X IMPs when guessing to pass or compete." These metrics are more interpretable and lead to better decisions about tradeoffs than a "ranking metric" that isn't easily interpretable in terms of bridge.
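To make the "expected IMPs" idea concrete, here is a small sketch that converts score differences to IMPs on the standard IMP scale and averages over a set of outcomes. The outcome probabilities and scores in the example are invented for illustration, and `imps` and `expected_imps` are hypothetical helper names.

```python
import bisect

# Upper bound of each score-difference band on the standard IMP scale.
# A difference falling in band i is worth i IMPs; 4000 or more is 24.
_IMP_UPPER = [10, 40, 80, 120, 160, 210, 260, 310, 360, 420, 490, 590,
              740, 890, 1090, 1290, 1490, 1740, 1990, 2240, 2490, 2990,
              3490, 3990]

def imps(diff):
    """Convert a signed score difference to signed IMPs."""
    sign = 1 if diff >= 0 else -1
    return sign * bisect.bisect_left(_IMP_UPPER, abs(diff))

def expected_imps(outcomes):
    """outcomes: iterable of (probability, our_score, comparison_score)."""
    return sum(p * imps(ours - theirs) for p, ours, theirs in outcomes)

# Made-up example: half the time we play a partscore making 10 tricks
# (+170) while the comparison table bids game (+420); otherwise a push.
guess = [(0.5, 170, 420), (0.5, 140, 140)]
print(expected_imps(guess))  # -3.0
```

The comparison score could be another system's result on the same deal or a double-dummy par; which one to use is a design choice this sketch leaves open.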
It's not that I don't understand information theory at all (I do have a PhD in Computer Science); it's that I don't have very good intuition for the meaning of a numerical entropy quantity or how it translates into more familiar numbers (probabilities, expectations). I'd like to be able to get an estimate of how often I'm on a complete guess, or how many IMPs I expect to lose on a specific sort of auction, rather than this mystery number.
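One possible way to turn an entropy figure into more familiar numbers is sketched below: report it alongside the "effective number of equally likely options" (2 to the power of the entropy) and the chance that the single most likely guess is wrong. This is only an illustration of the conversion, not a metric anyone in this thread has computed, and `interpret` is a hypothetical helper name.

```python
import math

def interpret(dist):
    """Summarise a distribution {outcome: weight} in three ways."""
    total = sum(dist.values())
    probs = [v / total for v in dist.values() if v > 0]
    entropy = -sum(p * math.log2(p) for p in probs)
    return {
        "entropy_bits": entropy,
        # Number of equally likely options with the same entropy.
        "effective_options": 2 ** entropy,
        # How often the most likely outcome is NOT what happens.
        "best_guess_wrong": 1 - max(probs),
    }

# Invented example: does partner have a fit for our second suit?
coin_flip = interpret({"fit": 0.5, "no fit": 0.5})
print(coin_flip)  # 1 bit, 2 effective options, wrong half the time
```

With 1 bit of remaining uncertainty over two outcomes, the best guess is wrong half the time, which is the "complete guess" situation described above.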
Anyway, since the results seem to generally favour more aggressive systems, I think it might be worth thinking about where these systems lose and having at least one metric that reflects that. Besides the "might be too high already" metric that Helene already looked at, I think one of the biggest problems that comes up is the "wrong strain" problem. What I mean is, suppose that our best contract is in a strain where opener didn't show any length (and responder doesn't have extreme length); in other words we need to find a fit somewhere. In these situations it's usually bad for our side when the auction is very high. So for example, if partner opens an EHAA 2♠ and it turns out that we can make game but only in hearts we will have some issues. Most likely I will need to guess whether to pass (missing the game) or bid (possibly getting too high when partner doesn't have a heart fit). This is less problematic if partner opened 1♠ (or even pass) because there is more space to find the heart fit.
Adam W. Meyerson
a.k.a. Appeal Without Merit