BBO Discussion Forums: Removing outliers from DATUM - BBO Discussion Forums


Removing outliers from DATUM

#1 User is offline   whereagles 

  • Group: Advanced Members
  • Posts: 14,900
  • Joined: 2004-May-11
  • Gender:Male
  • Location:Portugal
  • Interests:Everything!

Posted 2010-March-01, 14:43

Hi all,

We're going to hold a regional imp pairs tournament and we're debating how to determine DATUM scores for each hand. Options are:

1. DATUM = mean of all scores

2. Calculate mean M and standard deviation S of scores for each hand. Then DATUM = mean of all scores that are no more than (1.07 x S) away from the mean.

In other words, in option 2 we do away with outliers, i.e. scores that are more than 1.07 standard deviations away from the mean. (Note: the score distribution usually does not follow a gaussian curve, so don't try to make sense out of the 1.07 factor - it's just a guess by the original author. In fact, the score distribution tends to be uni, bi or trimodal, depending on the hand.)
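To make option 2 concrete, here is a minimal sketch of that datum in Python. The function name `trimmed_datum` is made up for illustration, and the post does not say whether S is the population or the sample standard deviation; the population version (`pstdev`) is assumed here.

```python
from statistics import mean, pstdev

def trimmed_datum(scores, k=1.07):
    """Mean of the scores lying within k standard deviations of the raw mean.

    Assumption: S is the population standard deviation (not stated in the post).
    """
    m = mean(scores)
    s = pstdev(scores)
    # Keep only the scores no more than k * S away from the mean.
    kept = [x for x in scores if abs(x - m) <= k * s]
    # Fall back to the raw mean if nothing survives the cut.
    return mean(kept) if kept else m

# Six tables make a non-vulnerable game, one table goes for 800.
board = [420, 420, 420, 420, 420, 420, -800]
print(mean(board))           # option 1: plain mean
print(trimmed_datum(board))  # option 2: the -800 is discarded
```

On this board the plain mean is about 246, while the trimmed datum is 420 because the single -800 lies well over 1.07 S from the mean.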

Simulations show that option 1 tends to benefit playing with the field, whereas option 2 gives IMP scores for each hand that are closer to what we see in a regular teams game.

We are leaning towards option 2, but I'd like to hear some opinions.

#2 User is offline   hrothgar 

  • Group: Advanced Members
  • Posts: 15,495
  • Joined: 2003-February-13
  • Gender:Male
  • Location:Natick, MA
  • Interests:Travel
    Cooking
    Brewing
    Hiking

Posted 2010-March-01, 15:10

These days, I am spending an awful lot of time working on automatic outlier detection.

It's a hard problem at the best of times and is intractable at many others. (From what I can tell, "the best of times" means that I can use a random forest.)

Here's a couple simple pieces of advice:

1. If you can't generate some kind of model that describes how your board results should be distributed, then trying to develop any kind of defensible outlier detection scheme is a waste of time.

You have directly stated that you can't tell whether scores should be modeled as a unimodal, bimodal, or trimodal distribution. Fair enough, however, if you can't describe a "normal" result, how can you hope to describe an abnormal result?

2. If the author that you are citing is just making a random guess that anything greater than 1.07 * sigma should be treated as an outlier, then I don't have much faith in his analysis. There must be something more to this...

I tend to have a conservative bias on these sorts of things.

If you aren't in a damn good position to explain precisely what you want to accomplish and describe how your changes will achieve this end, then it's probably better to do nothing at all. The following quote touches on some of these issues:

>Simulations show that option 1 tends to benefit playing with the
>field, whereas option 2 comes close to imp scores for each hand
>that are closer to what we see in a regular teams game.

However, you fail to explain why we should care about

"Playing with the field" or
"IMP scores in team games"
Alderaan delenda est

#3 User is offline   Siegmund 

  • Alchemist
  • Group: Advanced Members
  • Posts: 1,764
  • Joined: 2004-June-15
  • Gender:Male
  • Location:Beside a little lake in northwestern Montana
  • Interests:Creator of the 'grbbridge' LaTeX typesetting package.

Posted 2010-March-01, 16:08

I find it A) mildly strange that you want to IMP against a datum rather than use cross-imps, and B) considerably stranger to use either of these two methods to find a datum.

The three most obvious ones that come to my mind are double-dummy par, the score which minimizes squared imp differences (what a mean does in total-points scoring - but this would have some very odd behaviours in a few cases), and the median (has a vaguely matchpoint-like feel to it, but it's easy to calculate and takes care of outliers.)

I would want to hear a REALLY good reason for your rather drastically trimmed mean proposal before I'd consider it as anything other than a bizarre outlier of a method :)

#4 User is offline   whereagles 

  • Group: Advanced Members
  • Posts: 14,900
  • Joined: 2004-May-11
  • Gender:Male
  • Location:Portugal
  • Interests:Everything!

Posted 2010-March-01, 16:55

Sieg: method 2 isn't an invention of mine. It was shown by a friend who read it somewhere when fiddling with stuff on imp pairs.

That being said, it's your three methods that I find really strange!!

I've never, ever seen anyone using any of those, nor would I ever convince people those are good methods. Bridge players aren't statisticians. They want simple ways to understand the DATUM and "mean" or "mean, minus weird results" are simple enough. "Imps least-squares" just doesn't cut it :)

Still, the cross-imp proposal seems good and in line with what I'd want. I might try that later. We'd still need to convert it to victory points, though.

#5 User is offline   Siegmund 

  • Alchemist
  • Group: Advanced Members
  • Posts: 1,764
  • Joined: 2004-June-15
  • Gender:Male
  • Location:Beside a little lake in northwestern Montana
  • Interests:Creator of the 'grbbridge' LaTeX typesetting package.

Posted 2010-March-01, 17:09

I'd be interested in hearing where, if you or he happens to recall it.

The basic question somebody needs to ask is "what is a datum supposed to represent?", from which the correct type of datum to use usually will follow directly. (To my mind, as soon as you decide your event isn't going to be scored by total points, methods based on mean total-point score are automatically off the list of candidates.)

I did think that - back before the internet introduced cross-imps to everybody - median was fairly widely accepted to be better than mean, but it's never been a popular format in my part of the world, so I can only judge by internet forum traffic.

You have seen bluejak's old article on the subject?

#6 User is offline   Mbodell 

  • Group: Advanced Members
  • Posts: 2,871
  • Joined: 2007-April-22
  • Location:Santa Clara, CA

Posted 2010-March-01, 17:55

Why not cross imps? That seems the most straightforward.

#7 User is offline   NickRW 

  • Group: Advanced Members
  • Posts: 1,951
  • Joined: 2008-April-30
  • Gender:Male
  • Location:Sussex, England

Posted 2010-March-01, 20:30

If you use a 2 winner movement, I don't think it makes that much difference which outliers, if any, you remove. It is maybe a little more likely to make a difference to a one winner movement - not sure really - just a gut feel - don't have so much experience scoring one winner movements.

Cross imps is generally recognised as being fairer/more accurate, if a little harder for inexperienced people to understand where their score came from. However, again for a 2 winner movement, in practice it makes no difference for most sessions whether it is scored a la Butler or cross imps.

Nick
"Pass is your friend" - my brother in law - who likes to bid a lot.

#8 User is offline   Trinidad 

  • Group: Advanced Members
  • Posts: 4,531
  • Joined: 2005-October-09
  • Location:Netherlands

Posted 2010-March-02, 03:37

There is basically no excuse for scoring IMPs against a datum (Butler method) when you can use cross IMPs.

In the old days, when scores were calculated by hand, it was impossible to calculate the cross IMPs. That was a good excuse. Now we have computers everywhere and the excuse is gone.

Why are cross IMPs better than scoring against a datum? The main reason is that in datum scoring all the data that you have available are reduced to an average of some sort. All other data gets lost in the process. And an average is not a very good description of a data set, particularly when you are going to do complicated things with it, such as scoring IMPs. To repeat the old example: If you fire a round in front of a hare and one behind it, on average the hare is dead. All hunters know how wrong averages are.

In cross IMPs all data are used. It takes more work for the computer (2 microseconds instead of 1 microsecond), but it is a much better method.

Some of the odd things in Butler (datum scoring) that are the result of using an average:
o The sum of all EW scores is not equal to the sum of all NS scores. (Barring penalties by the TD, in MP pairs this sum is always the number of pairs x 50%, in cross IMPs this sum is always 0.)
o In theory, if you extrapolate your IMP pairs down to 2 tables, you should get the same score as in a team match. With Butler you don't.
o Since your own score is also counted in the datum, you are also playing against yourself (?!?).

In short: If you are living in the 21st century use cross-IMPs. Do not go anywhere near datum scoring.

Rik
I want my opponents to leave my table with a smile on their face and without matchpoints on their score card - in that order.
The most exciting phrase to hear in science, the one that heralds the new discoveries, is not “Eureka!” (I found it!), but “That’s funny…” – Isaac Asimov
The only reason God did not put "Thou shalt mind thine own business" in the Ten Commandments was that He thought that it was too obvious to need stating. - Kenberg

#9 User is offline   hotShot 

  • Axxx Axx Axx Axx
  • Group: Advanced Members
  • Posts: 2,976
  • Joined: 2003-August-31
  • Gender:Male

Posted 2010-March-02, 05:27

Basically at Butler (using a mean) you are averaging the score, while at cross-imps you are averaging the IMPs.
The results differ, because the IMP-scale is not linear.
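A rough sketch of that difference, in Python. The band boundaries below are the standard IMP scale as far as I know it; the function names are invented for this example, and the datum is used unrounded (many scoring programs round it to the nearest 10, which is not done here).

```python
from bisect import bisect_right

# Lower bound of each IMP band: 20-40 points is 1 IMP, 50-80 is 2, and so on.
IMP_BOUNDS = [20, 50, 90, 130, 170, 220, 270, 320, 370, 430, 500, 600,
              750, 900, 1100, 1300, 1500, 1750, 2000, 2250, 2500, 3000,
              3500, 4000]

def imps(diff):
    """Signed IMPs for a point difference."""
    n = bisect_right(IMP_BOUNDS, abs(diff))
    return n if diff >= 0 else -n

def butler(scores):
    """IMP each score against the plain mean (assumption: no trimming, no rounding)."""
    datum = sum(scores) / len(scores)
    return [imps(s - datum) for s in scores]

def cross_imps(scores):
    """Total IMPs of each score against every other table's score."""
    return [sum(imps(s - t) for j, t in enumerate(scores) if j != i)
            for i, s in enumerate(scores)]

def cross_imps_avg(scores):
    """Cross-IMPs divided by the number of comparisons."""
    n = len(scores) - 1
    return [x / n for x in cross_imps(scores)]

ns = [420, 420, 420, -50]   # NS scores at four tables
print(butler(ns))           # [3, 3, 3, -8]
print(cross_imps(ns))       # [10, 10, 10, -30]
print(cross_imps_avg(ns))
```

On this board the Butler IMPs add up to +1 rather than 0, while the cross-IMP totals always cancel out, because every comparison is counted once with each sign.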

It would be nice to have a par score, but many boards don't have one.
(I remember a board where I could make 6 while our opponents could make 6.)

At Butler scoring it's easier to handle outliers, you just don't use them when calculating the mean. But this does not help much, if there is no par score.

Using cross-imps you want to have as many scores as possible, so that outliers from both ends even out.

#10 User is offline   gwnn 

  • Csaba the Hutt
  • Group: Advanced Members
  • Posts: 13,027
  • Joined: 2006-June-16
  • Gender:Male
  • Interests:bye

Posted 2010-March-02, 05:53

IMP least square sounds like a really cool method!!! Of course coolness is not quite the proper criterion but maybe it should be taken into account...:)
... and I can prove it with my usual, flawless logic.
      George Carlin

#11 User is offline   hanp 

  • Group: Advanced Members
  • Posts: 2,987
  • Joined: 2009-February-15

Posted 2010-March-02, 06:14

I think that it is a serious flaw if many people cannot understand the scoring method. The IMP table is hard to remember but not hard to understand. Evaluation method (2) will be incomprehensible for many competitors.

Throwing away the top and bottom scores will do fine and is much easier to understand.
and the result can be plotted on a graph.

#12 User is offline   ArtK78 

  • Group: Advanced Members
  • Posts: 7,786
  • Joined: 2004-September-05
  • Gender:Male
  • Location:Galloway NJ USA
  • Interests:Bridge, Poker, participatory and spectator sports.
    Occupation - Tax Attorney in Atlantic City, NJ.

Posted 2010-March-02, 09:15

hotShot, on Mar 2 2010, 06:27 AM, said:

It would be nice to have a par score, but many boards don't have one.
(I remember a board where I could make 6 while our opponents could make 6.)

All boards have a par. On the one that you referenced, the par would be 7x down one.

#13 User is offline   NickRW 

  • Group: Advanced Members
  • Posts: 1,951
  • Joined: 2008-April-30
  • Gender:Male
  • Location:Sussex, England

Posted 2010-March-02, 11:18

hanp, on Mar 2 2010, 12:14 PM, said:

I think that it is a serious flaw if many people cannot understand the scoring method. The IMP table is hard to remember but not hard to understand. Evaluation method (2) will be incomprehensible for many competitors.

Throwing away the top and bottom scores will do fine and is much easier to understand.

Yes, well, I too hear the "use cross imp - it's 21st century" argument - and I also see some sense in trying to think of different methods of improving the datum for the Butler method.

I am in a situation where my club is thinking of introducing IMP scored pairs once a month (essentially a lot of the members don't like teams - but some of the better players want more opportunity to practice IMP strategy for when they do play teams outside the club - hence the IMP scored pairs compromise). As the muggins who does much of the scoring and as someone who has to make sure the other scorers know what they are doing, I have to make up my mind which is best. At the moment I am struggling to see a better option than "standard" Butler, i.e. compute the datum after throwing the top and bottom score.
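For anyone scoring by hand or in a spreadsheet, the "standard" Butler datum described above could be sketched like this in Python. The function name and the `drop` parameter are just illustrative, and the rounding to the nearest 10 that some scoring programs apply is left out.

```python
def butler_datum(scores, drop=1):
    """Mean of the scores after discarding the `drop` highest and lowest.

    Assumption: if there are too few scores to trim, the plain mean is used.
    """
    ordered = sorted(scores)
    if len(ordered) > 2 * drop:
        ordered = ordered[drop:len(ordered) - drop]
    return sum(ordered) / len(ordered)

# Five tables: the -800 and the 450 are thrown away before averaging.
print(butler_datum([-800, 420, 420, 420, 450]))  # 420.0
```

Each pair's result is then IMPed against this datum in the usual way.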

Recently I've regularly rescored our MP sessions by IMPs - both cross imp and butler - and though using any sort of IMP scale (often) makes a difference to the placings versus MP, there is usually no difference at all for Butler vs cross imp - not for the 2 winner movements that we normally use anyway.

Nick

Later edit - have tried comparing butler vs ximp with one winner movements - as I suspected it makes a difference which you use a good deal more often
"Pass is your friend" - my brother in law - who likes to bid a lot.

#14 User is online   helene_t 

  • The Abbess
  • Group: Advanced Members
  • Posts: 17,201
  • Joined: 2004-April-22
  • Gender:Female
  • Location:Copenhagen, Denmark
  • Interests:History, languages

Posted 2010-March-02, 11:32

Given the choice I prefer least squares to random forests <_<

Partially agree with Han. I think people can understand Butler in the sense that they understand the mechanics of it, but I don't think many people understand the implications of it, beyond that "you should follow roughly the same strategy as with IMPs" which is of course reasonably accurate. Then again, the same argument holds for the other methods.

Anyway, the solution is well-known. Play XIMPs. XIMPs is more accurate in determining the best pair than Butler, and it's easier to understand the implications since it's the same strategy as IMP Teams.
The world would be such a happy place, if only everyone played Acol :) --- TramTicket

#15 User is offline   barmar 

  • Group: Admin
  • Posts: 21,600
  • Joined: 2004-August-21
  • Gender:Male

Posted 2010-March-02, 13:27

Didn't we just have a XIMP vs Butler debate a couple of months ago?

I came up with a good reason why XIMP is "correct" a few days ago. In the limiting case of just two tables, the XIMP score is the same as the score in a team game. So if IMP pairs strategy is intended to be similar to team strategy, this scoring method corresponds to that.

#16 User is offline   gwnn 

  • Csaba the Hutt
  • Group: Advanced Members
  • Posts: 13,027
  • Joined: 2006-June-16
  • Gender:Male
  • Interests:bye

Posted 2010-March-02, 13:37

huh? Butler is also equivalent to Teams if there are only 2 tables.
... and I can prove it with my usual, flawless logic.
      George Carlin

#17 User is offline   whereagles 

  • Group: Advanced Members
  • Posts: 14,900
  • Joined: 2004-May-11
  • Gender:Male
  • Location:Portugal
  • Interests:Everything!

Posted 2010-March-02, 13:45

I'm a bit short on time now, but I'll reread the thread and links later tonight.

I didn't think of cross-imps because, as I said, we want to convert scores into VP. But you can probably do that by dividing the total cross-imps for each board by the number of tables and converting the outcome to VP.

#18 User is offline   barmar 

  • Group: Admin
  • Posts: 21,600
  • Joined: 2004-August-21
  • Gender:Male

Posted 2010-March-02, 13:54

Let's say you make a non-vul game at one table, and go down 1 at the other table.

XIMP and teams: the scores are +10 and -10.

Butler: the datum is 185, and the scores are +6 and -6 (a 235-point difference is 6 IMPs).

Dividing the XIMP score by the number of tables gives 5, which is close to Butler in this example, but the two don't match in general. For instance, try it with a making and failing slam: XIMP is 14 IMPs, Butler is 10.

#19 User is offline   whereagles 

  • Group: Advanced Members
  • Posts: 14,900
  • Joined: 2004-May-11
  • Gender:Male
  • Location:Portugal
  • Interests:Everything!

Posted 2010-March-02, 14:18

All right, I'm more or less convinced cross-imps is the deal. We'll probably divide by the number of comparisons to get a more normal-looking result (even if in fractional imps lol) that we can later insert into a VP scale and come up with a VP result.

#20 User is offline   gwnn 

  • Csaba the Hutt
  • Group: Advanced Members
  • Posts: 13,027
  • Joined: 2006-June-16
  • Gender:Male
  • Interests:bye

Posted 2010-March-02, 14:26

you're right barmar, momentary lack of reason from my side, sorry
... and I can prove it with my usual, flawless logic.
      George Carlin