Rescaling of scores

This is certainly not a complaint, but rather a "food for thought" / "note to future organizers" kind of comment. At this point it is probably safest to stick to the established rules, and I am not sure what my actual position on the question is. Still, I wanted to bring up the matter of GP scores being rescaled to a 0-100 scale by dividing by the #1 top score, as something to be assessed at the end of this year's GP and possibly re-discussed for future GPs. This rescaling will probably make the effective point values of the rounds very uneven -- at least that is what the first round suggests.

As it is, everyone's scores will be scaled down by an amount determined entirely by Tiit's very impressive performance (congratulations, Tiit!) rather than by a broader gauge of the round's difficulty -- one less sensitive to how the very best contestant happens to perform on that particular day.
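To make the mechanics concrete, here is a minimal sketch of the current rescaling rule as I understand it, using entirely made-up scores (not actual GP results):

```python
# Hypothetical raw scores for one round (made-up numbers).
raw_scores = [812, 640, 610, 595, 580, 410]

# Current GP rule: rescale every score to a 0-100 scale by
# dividing by the single best (#1) score of the round.
top = max(raw_scores)
rescaled = [100 * s / top for s in raw_scores]

# The top solver always lands exactly on 100; everyone else's
# effective points depend entirely on how that one solver did.
```

The key property is that the whole round's point scale hinges on a single data point: the winner's score.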

Zoltan Nemeth came up with a very impressive and insightful analysis of this phenomenon in the context of scoring the "crowdsourced" round of the 2013 WPC (see http://wpc-2013.blogspot.com/2013/08/around-world-in-80-puzzles-scoring....), where he makes a compelling case, based on actual WPC score data, that scaling by a more predictable gauge such as the 5th best score is a lot less "random". (Of course, the context in which he had to design a scoring system is quite different from the present one, and the overall objectives might not be the same.)
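For contrast, here is a sketch of the kind of anchor Zoltan suggests -- normalizing by the 5th best score rather than the 1st -- again with made-up numbers:

```python
# Hypothetical raw scores for one round (made-up numbers).
raw_scores = [812, 640, 610, 595, 580, 410]

# Anchor the scale to the 5th best score instead of the 1st,
# so one exceptional performance cannot compress everyone
# else's effective points.
anchor = sorted(raw_scores, reverse=True)[4]  # 5th best score
rescaled = [100 * s / anchor for s in raw_scores]

# Scores above 100 are now possible for the top few solvers,
# but the scale rests on a more stable point in the field.
```

The trade-off is cosmetic (scores above 100) versus statistical (less sensitivity to a single outlier performance).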

On the flip side, another way to think of the current system is that Tiit almost neutralized the scores of the other top competitors by scaling them all down, which after all might be viewed as an appropriate reward for his performance and a reasonable mechanism for making a clear winner emerge?

In any case, for fairly consistent players who participate in all rounds, the main questionable side effect that I can see is that the round that gets discarded most likely won't be the round in which they actually did worst, but rather the one in which the top player did best. Probably not a big issue, though -- I can certainly live with it.
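This side effect shows up already in a tiny made-up example: a solver with identical raw scores in two rounds "loses" the round where the winner was strongest, not a round where they themselves solved worse:

```python
# Made-up numbers: the same solver scores 600 raw points in
# each of two rounds, but the round winners score differently.
your_raw = [600, 600]
round_tops = [700, 900]  # round 2 had an exceptional winner

rescaled = [100 * s / t for s, t in zip(your_raw, round_tops)]

# The discarded (lowest) rescaled result comes from round 2 --
# determined by the winner's performance, not the solver's own.
worst_round = rescaled.index(min(rescaled))
```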

Denis

Thanks for your thoughts, Denis. I too have looked into scoring systems over the years, on my blog as a competitor (for instance during the Tapa Variations Competition and as part of Crocopuzzle) and, on the organizing side, as part of the group behind Around the World in 80 Puzzles, where I followed Zoltan's analysis closely.

We thought about scaling a lot, and decided this year to go with a simple, transparent system instead of a more complicated one that may or may not suit the particular parameters of the Grand Prix competition. There are some key differences here from Zoltan's recommendations that are important to note. Largest in my mind is that the competitor make-up will vary a lot across these 7 events, certainly more than at a WPC, where everyone you expect to compete is at the event site. Renormalization built around global median scores may be unstable when there are sometimes 500 competitors and sometimes 300. Also, last year's GP, hosted across numerous sites, showed lots of variation in test difficulty and competitor results (number of finishers, and score spread). So, since we were already implementing several changes in format to make the GP series more uniform, it seemed best to take at least one year to collect more consistent data before engineering a more involved scoring scheme.

Finally, since top solvers will count almost all of the contests here (only one of seven is dropped), the effect of a performance like Tiit's at the end of the series will probably be smaller than you anticipate, unlike contest series where a larger number of contests get dropped. If this were 4 out of 7, I too would be very worried, as in that kind of situation some scoring paradoxes can result.

We will conduct a review and gather comments from solvers after this year to see whether any changes should be made for next year. But I hope this gives you some background on how we thought about the choices here. Thanks again for your comments, and I look forward to hearing more after July.