Removing outliers from DATUM
#1
Posted 2010-March-01, 14:43
We're going to hold a regional IMP pairs tournament and we're debating how to determine the DATUM score for each hand. The options are:
1. DATUM = mean of all scores
2. Calculate the mean M and standard deviation S of the scores for each hand. Then DATUM = mean of all scores that lie no more than 1.07 x S away from M.
In other words, in option 2 we do away with outliers, i.e. scores that are more than 1.07 standard deviations away from the mean. (Note: the score distribution usually does not follow a Gaussian curve, so don't try to make sense of the 1.07 factor; it's just a guess by the original author. In fact, the score distribution tends to be unimodal, bimodal or trimodal, depending on the hand.)
Simulations show that option 1 tends to benefit playing with the field, whereas option 2 produces IMP scores for each hand that are closer to what we see in a regular teams game.
We are leaning towards option 2, but I'd like to hear some opinions.
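For concreteness, here is a minimal Python sketch of the two options as described. It assumes the population standard deviation (the post doesn't say which S is meant), and the board of NS scores is hypothetical; the function names are made up for illustration.

```python
import statistics

def datum_mean(scores):
    """Option 1: the plain mean of all NS scores on the board."""
    return statistics.mean(scores)

def datum_trimmed(scores, k=1.07):
    """Option 2: mean of the scores lying within k standard deviations of the mean."""
    m = statistics.mean(scores)
    s = statistics.pstdev(scores)  # assumption: population SD
    kept = [x for x in scores if abs(x - m) <= k * s]
    # For k > 1 at least one score always survives: if every score were more
    # than s away from m, the mean squared deviation would exceed s**2.
    return statistics.mean(kept)

board = [420, 420, 450, 420, -50]  # hypothetical NS scores at five tables
print(datum_mean(board))     # 332
print(datum_trimmed(board))  # 427.5 (the -50 is dropped as an outlier)
```

On this board the single -50 pulls the plain mean about 100 points below the cluster of game scores, while the trimmed version ignores it.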
#2
Posted 2010-March-01, 15:10
It's a hard problem at the best of times and is intractable at many others. (From what I can tell, "the best of times" means that I can use a random forest.)
Here are a couple of simple pieces of advice:
1. If you can't generate some kind of model that describes how your board results should be distributed, then trying to develop any kind of defensible outlier detection scheme is a waste of time.
You have directly stated that you can't tell whether scores should be modeled as a unimodal, bimodal, or trimodal distribution. Fair enough; however, if you can't describe a "normal" result, how can you hope to describe an abnormal one?
2. If the author you are citing is just guessing that anything more than 1.07 * sigma from the mean should be treated as an outlier, then I don't have much faith in his analysis. There must be something more to this...
I tend to have a conservative bias on these sorts of things.
If you aren't in a damn good position to explain precisely what you want to accomplish and how your changes will achieve it, then it's probably better to do nothing at all. The following quote touches on some of these issues:
>Simulations show that option 1 tends to benefit playing with the
>field, whereas option 2 comes close to imp scores for each hand
>that are closer to what we see in a regular teams game.
However, you fail to explain why we should care about
"Playing with the field" or
"IMP scores in team games"
#3
Posted 2010-March-01, 16:08
The three most obvious ones that come to my mind are double-dummy par, the score which minimizes squared IMP differences (what a mean does in total-points scoring, but this would have some very odd behaviours in a few cases), and the median (it has a vaguely matchpoint-like feel to it, but it's easy to calculate and takes care of outliers).
I would want to hear a REALLY good reason for your rather drastically trimmed mean proposal before I'd consider it as anything other than a bizarre outlier of a method.
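Two of the alternatives above are easy to sketch in Python. This is an illustration under stated assumptions, not a standard algorithm: the least-squares datum is searched only on a grid of multiples of 10 between the lowest and highest score (a hypothetical choice), the IMP table is the standard scale with in-between differences assigned to the lower band, and the board is made up. Because squared IMPs form a step function, ties and the grid choice can shift the result, which is one source of the "very odd behaviours".

```python
import statistics
from bisect import bisect_right

# Lower bound of each band on the standard IMP scale (1 IMP from 20, ..., 24 IMPs from 4000).
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]

def imps(diff):
    """Signed IMPs for a point difference; in-between values fall to the lower band."""
    n = bisect_right(IMP_THRESHOLDS, abs(diff))
    return n if diff >= 0 else -n

def datum_median(scores):
    return statistics.median(scores)

def sq_imp_cost(scores, d):
    """Sum of squared IMP differences between every score and a candidate datum d."""
    return sum(imps(s - d) ** 2 for s in scores)

def datum_least_squares_imps(scores, step=10):
    """Hypothetical grid search for the datum minimising squared IMP differences."""
    candidates = range(min(scores), max(scores) + 1, step)
    # min() keeps the first of several tied candidates, i.e. the lowest datum.
    return min(candidates, key=lambda d: sq_imp_cost(scores, d))

board = [420, 420, 450, 420, -50]  # hypothetical NS scores
print(datum_median(board))  # 420
print(datum_least_squares_imps(board))
```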
#4
Posted 2010-March-01, 16:55
That being said, it's your three methods that I find really strange!!
I've never, ever seen anyone use any of those, nor could I ever convince people those are good methods. Bridge players aren't statisticians. They want simple ways to understand the DATUM, and "mean" or "mean, minus weird results" are simple enough. "IMPs least-squares" just doesn't cut it.
Still, the cross-imp proposal seems good and in line with what I'd want. I might try that later. We'd still need to convert it to victory points, though.
#5
Posted 2010-March-01, 17:09
The basic question somebody needs to ask is "what is a datum supposed to represent?", from which the correct type of datum to use usually will follow directly. (To my mind, as soon as you decide your event isn't going to be scored by total points, methods based on mean total-point score are automatically off the list of candidates.)
I did think that - back before the internet introduced cross-imps to everybody - median was fairly widely accepted to be better than mean, but it's never been a popular format in my part of the world, so I can only judge by internet forum traffic.
You have seen bluejak's old article on the subject?
#7
Posted 2010-March-01, 20:30
Cross IMPs are generally recognised as fairer/more accurate, if a little harder for inexperienced players to understand where their score came from. However, again for a two-winner movement, in practice it makes no difference for most sessions whether it is scored à la Butler or by cross IMPs.
Nick
#8
Posted 2010-March-02, 03:37
In the old days, when scores were calculated by hand, it was impossible to calculate the cross IMPs. That was a good excuse. Now we have computers everywhere and the excuse is gone.
Why are cross IMPs better than scoring against a datum? The main reason is that in datum scoring all the data that you have available are reduced to an average of some sort. All other data gets lost in the process. And an average is not a very good description of a data set, particularly when you are going to do complicated things with it, such as scoring IMPs. To repeat the old example: if you fire one round in front of a hare and one behind it, on average the hare is dead. All hunters know how wrong averages are.
In cross IMPs all data are used. It takes more work for the computer (2 microseconds instead of 1 microsecond), but it is a much better method.
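A minimal Python sketch of cross-IMP scoring for one board, assuming the standard IMP scale (in-between differences go to the lower band) and raw totals rather than totals averaged over the number of comparisons, which some events use instead. The board is hypothetical. Every NS result is IMPed against every other NS result, so no information is collapsed into an average.

```python
from bisect import bisect_right

# Lower bound of each band on the standard IMP scale.
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]

def imps(diff):
    """Signed IMPs for a point difference; in-between values fall to the lower band."""
    n = bisect_right(IMP_THRESHOLDS, abs(diff))
    return n if diff >= 0 else -n

def cross_imps(scores):
    """NS cross-IMP total for each table: its result IMPed against every other table."""
    return [sum(imps(s - t) for j, t in enumerate(scores) if j != i)
            for i, s in enumerate(scores)]

board = [420, 420, 450, 420, -50]  # hypothetical NS scores at five tables
result = cross_imps(board)
print(result)       # [9, 9, 14, 9, -41]
print(sum(result))  # 0
```

Each pairwise comparison appears once with each sign, which is why the totals always cancel.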
Some of the odd things in Butler (datum scoring) that are the result of using an average:
o The sum of all EW scores is not equal to the sum of all NS scores. (Barring penalties by the TD, in MP pairs this sum is always the number of pairs x 50%, in cross IMPs this sum is always 0.)
o In theory, if you scaled your IMP pairs down to 2 tables, you should get the same score as in a team match; with datum scoring you don't.
o Since your own score is also counted in the datum, you are also playing against yourself (?!?).
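The first point can be checked with a small Python sketch, assuming a plain-mean datum with no trimming or rounding, the standard IMP scale (in-between differences go to the lower band) and a hypothetical four-table board. Because the IMP scale is not linear, the NS Butler IMPs need not sum to zero, so the NS and EW totals differ, while the cross-IMP totals cancel.

```python
from bisect import bisect_right

# Lower bound of each band on the standard IMP scale.
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]

def imps(diff):
    """Signed IMPs for a point difference; in-between values fall to the lower band."""
    n = bisect_right(IMP_THRESHOLDS, abs(diff))
    return n if diff >= 0 else -n

def butler_ns_imps(scores):
    """NS IMPs against a plain-mean datum (no trimming, no rounding)."""
    datum = sum(scores) / len(scores)
    return [imps(s - datum) for s in scores]

def cross_imps(scores):
    """NS cross-IMP totals: each result IMPed against every other result."""
    return [sum(imps(s - t) for j, t in enumerate(scores) if j != i)
            for i, s in enumerate(scores)]

board = [420, 420, 420, -50]  # hypothetical NS scores; datum = 302.5
ns = butler_ns_imps(board)
ew = [-x for x in ns]         # EW at each table score the opposite of NS
xi = cross_imps(board)
print(ns, sum(ns))  # [3, 3, 3, -8] 1
print(sum(ew))      # -1
print(xi, sum(xi))  # [10, 10, 10, -30] 0
```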
In short: If you are living in the 21st century use cross-IMPs. Do not go anywhere near datum scoring.
Rik
#9
Posted 2010-March-02, 05:27
The results differ because the IMP scale is not linear.
It would be nice to have a par score, but many boards don't have one.
(I remember a board where I could make 6♠ while our opponents could make 6♣.)
With Butler scoring it's easier to handle outliers: you just don't use them when calculating the mean. But this does not help much if there is no par score.
With cross-IMPs you want to have as many scores as possible, so that outliers at both ends even out.
#10
Posted 2010-March-02, 05:53
#11
Posted 2010-March-02, 06:14
Throwing away the top and bottom scores will do fine and is much easier to understand.
#12
Posted 2010-March-02, 09:15
hotShot, on Mar 2 2010, 06:27 AM, said:
(I remember a board where I could make 6♠ while our opponents could make 6♣.)
All boards have a par. On the one that you referenced, the par would be 7♣x down one.
#13
Posted 2010-March-02, 11:18
hanp, on Mar 2 2010, 12:14 PM, said:
Throwing away the top and bottom scores will do fine and is much easier to understand.
Yes, well, I too hear the "use cross-IMPs, it's the 21st century" argument, and I also see some sense in trying to think of different ways of improving the datum for the Butler method.
I am in a situation where my club is thinking of introducing IMP-scored pairs once a month (essentially a lot of the members don't like teams, but some of the better players want more opportunity to practise IMP strategy for when they play teams outside the club, hence the IMP-scored pairs compromise). As the muggins who does much of the scoring, and as someone who has to make sure the other scorers know what they are doing, I have to make up my mind which is best. At the moment I am struggling to see a better option than "standard" Butler, i.e. computing the datum after throwing out the top and bottom scores.
Recently I've regularly rescored our MP sessions by IMPs, both cross-IMP and Butler, and though using any sort of IMP scale often makes a difference to the placings versus MP, there is usually no difference at all between Butler and cross-IMP, at least not for the two-winner movements that we normally use.
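"Standard" Butler as described here can be sketched in Python as follows. Assumptions: exactly one top and one bottom score are discarded, the datum is not rounded (some scorers may round it), the standard IMP scale is used with in-between differences going to the lower band, and the board is hypothetical.

```python
from bisect import bisect_right

# Lower bound of each band on the standard IMP scale.
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]

def imps(diff):
    """Signed IMPs for a point difference; in-between values fall to the lower band."""
    n = bisect_right(IMP_THRESHOLDS, abs(diff))
    return n if diff >= 0 else -n

def datum_drop_top_bottom(scores):
    """Mean of the NS scores after discarding the single highest and lowest."""
    if len(scores) < 3:
        raise ValueError("need at least three scores to drop the top and bottom")
    kept = sorted(scores)[1:-1]
    return sum(kept) / len(kept)

def butler_imps(scores):
    """Return the datum and each table's NS IMPs against it (table order preserved)."""
    datum = datum_drop_top_bottom(scores)
    return datum, [imps(s - datum) for s in scores]

board = [420, 420, 450, 420, -50]  # hypothetical NS scores
datum, ns = butler_imps(board)
print(datum)  # 420.0
print(ns)     # [0, 0, 1, 0, -10]
```

The discarded scores are still IMPed against the datum; they just don't influence it.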
Nick
Later edit: I have tried comparing Butler and XIMP with one-winner movements and, as I suspected, which one you use makes a difference a good deal more often.
#14
Posted 2010-March-02, 11:32
Partially agree with Han. I think people can understand Butler in the sense that they understand its mechanics, but I don't think many people understand its implications beyond "you should follow roughly the same strategy as with IMPs", which is of course reasonably accurate. Then again, the same argument holds for the other methods.
Anyway, the solution is well-known. Play XIMPs. XIMPs is more accurate in determining the best pair than Butler, and it's easier to understand the implications since it's the same strategy as IMP Teams.
#15
Posted 2010-March-02, 13:27
I came up with a good reason why XIMP is "correct" a few days ago. In the limiting case of just two tables, the XIMP score is the same as the score in a team game. So if IMP pairs strategy is intended to be similar to team strategy, this scoring method corresponds to that.
#16
Posted 2010-March-02, 13:37
#17
Posted 2010-March-02, 13:45
I didn't think of cross-IMPs because, as I said, we want to convert scores into VP. But you can probably do that by dividing the total cross-IMPs for each board by the number of tables and converting the result to VP.
#18
Posted 2010-March-02, 13:54
XIMP and teams: the scores are +10 and -10.
Butler: the datum is 185, and the scores are +5 and -5.
In this example, XIMP becomes equivalent to Butler if you divide by the number of tables, but that doesn't apply more generally. For instance, try it with a making and failing slam: XIMP is 14 IMPs, Butler is 10.
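The slam case can be checked in Python. The board is an assumption chosen to match the quoted figures: a non-vulnerable 6♣ made (+920) at one table and one down (-50) at the other. With two tables XIMP is just the team swing, but dividing it by the number of tables gives 7, not Butler's 10. The IMP table is the standard scale with in-between differences assigned to the lower band.

```python
from bisect import bisect_right

# Lower bound of each band on the standard IMP scale.
IMP_THRESHOLDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
                  750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
                  3500, 4000]

def imps(diff):
    """Signed IMPs for a point difference; in-between values fall to the lower band."""
    n = bisect_right(IMP_THRESHOLDS, abs(diff))
    return n if diff >= 0 else -n

def ximp_two_tables(a, b):
    """With two tables, the cross-IMP score equals the team-match swing."""
    return imps(a - b)

def butler_two_tables(a, b):
    """Butler with a plain-mean datum over the two results."""
    datum = (a + b) / 2
    return imps(a - datum)

# Hypothetical board: NV 6♣ made (+920) at one table, one down (-50) at the other.
making, failing = 920, -50
x = ximp_two_tables(making, failing)
b = butler_two_tables(making, failing)
print(x)      # 14
print(b)      # 10
print(x / 2)  # 7.0, so dividing XIMP by the number of tables does not give Butler here
```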
#19
Posted 2010-March-02, 14:18
#20
Posted 2010-March-02, 14:26