[math-fun] Vague question about modeling human skill at specific activities
I'd welcome pointers to statistical models of the following vaguely specified situation.

I play tennis. There are probably plenty of people who also play tennis whom I could defeat, if they were chosen at random amongst all players who know how to play and have a racket. However, I could easily point to a tennis player X whom I would have little chance (essentially zero) of defeating. (S)he in turn could point to a player Y with the same property, and Y could point to a player Z. I'm sure that for a middling tennis player such as me, there must be at least five levels, and perhaps many more, of players with this transitive "I'd have little chance of defeating that person" property. Eventually we'd reach Novak Djokovic at the top of the world tennis rankings. There are probably only ten players who have any reasonable chance of beating him in a match today.

What I'm looking for is a statistical model of such a situation, which I view as fairly common in competitive sports. The closest thing I can think of is the Elo chess rating system. I'm interested in answering questions like this: say I have two randomly chosen worldwide tennis players X and Y, and X defeats Y twice in two matches. What is the chance X will defeat Y in a third match, if no other information is provided?

I'm also interested in an (again vaguely defined) "churn" parameter, i.e. the mixing of player skill levels over time. For example, in chess it seems to be possible for a player to play at a specific level for many years, while in tennis most players begin to fade by their thirties, if not sooner.

-- 
Thane Plambeck
tplambeck@gmail.com
http://counterwave.com/
Say I have two randomly chosen worldwide tennis players X and Y. Let's say X defeats Y twice in two matches. What is the chance X will defeat Y in a third match, if no other information is provided?
From a statistical "maximum likelihood estimation" point of view, the best guess is that Y's chance is P = 0, but this obviously underestimates the true chance. I would estimate that if you picked two tennis players at random from the population of people who call themselves tennis players, the true P is closer to 5%.

Erich, a tennis player
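A Bayesian sketch makes this precise. Put a Beta prior on X's per-match win probability; the posterior predictive after two wins is then a one-line formula (a minimal Python sketch; the function name is just for illustration, and the choice of prior parameters is the assumption that drives the answer):

```python
def next_win_prob(wins, losses, a=1.0, b=1.0):
    """Posterior predictive chance that the player with `wins` victories
    so far also wins the next match, under a Beta(a, b) prior on that
    player's per-match win probability."""
    return (wins + a) / (wins + losses + a + b)

# Uniform prior (Laplace's rule of succession): after X beats Y twice,
# P(X wins the third match) = 3/4, so Y's chance is 1/4.
print(next_win_prob(2, 0))  # 0.75

# A prior concentrated near 0 and 1 -- the belief that two random players
# are usually badly mismatched -- pushes Y's chance down toward the 5%
# estimate above: with a Beta(0.1, 0.1) prior, Y's chance is about 4.5%.
print(1 - next_win_prob(2, 0, 0.1, 0.1))
```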
I'll add one more vaguely specified question to Thane's list. He estimated that the spread of tennis skill is at least six steps of 'near certain victory', and maybe much higher. What's the story for other games?

I think chess has a smaller spread: IIRC, 400 Elo rating points is one Plambeck step, and the Elo spread is 1200-2900, only 4.25 Pbk. I'll guess that Go has a bigger spread, while soliciting more information from real players.

Rich

----------
Quoting Thane Plambeck <tplambeck@gmail.com>:
_______________________________________________ math-fun mailing list math-fun@mailman.xmission.com http://mailman.xmission.com/cgi-bin/mailman/listinfo/math-fun
I think the number of "levels" depends largely on how much variance we expect in performance from game to game, and how much the game counteracts that variance through "repeated measurement".

In Go, for instance, there are a lot of moves, and the poorer player thus has many more opportunities to make the mistakes that the stronger player can exploit. (I believe Go has many more levels than almost any other board game.) In the 50-meter dash there is probably very little variance, and thus one would also expect many levels. In heads-up Texas Hold'em, on the other hand, the cards themselves introduce a fair amount of variance, so there are probably fewer levels. Tennis I know very little about; how hard is it for a random punter to steal a game from a much superior player?

Things get fun when the "is-better-than" relation is not transitive; here, linear rating systems start to fail.

On Fri, Feb 10, 2012 at 10:19 AM, <rcs@xmission.com> wrote:
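The "repeated measurement" effect is easy to quantify under a simple model: fix the stronger side's chance of winning a single game and ask how often it wins a best-of-n series. A minimal Python sketch (the 60% per-game edge is just an illustrative assumption):

```python
from math import comb

def best_of(n, q):
    """Chance that the stronger side wins a best-of-n series (n odd),
    assuming independent games each won with probability q."""
    need = n // 2 + 1
    return sum(comb(n, k) * q**k * (1 - q)**(n - k)
               for k in range(need, n + 1))

# A modest 60/40 per-game edge becomes overwhelming as the series grows,
# which is why long games (many moves, many points) support more levels:
# n=1 gives 0.6, n=5 already gives about 0.683, and it keeps climbing.
for n in (1, 5, 21, 101):
    print(n, round(best_of(n, 0.6), 3))
```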
-- -- http://cube20.org/ -- http://golly.sf.net/ --
I will say this is one thing that has always fascinated me about team sports. Compare soccer, which often ends in 0-0 or 1-1 ties (after regulation); hockey is pretty close to this too. For these games, one would think luck is a huge part. The other extreme is basketball, which may have close to 200 shots on each side, with scores of 100 not uncommon. Football and baseball fall between these two extremes.

I find it interesting that soccer, which probably has the greatest amount of "luck" (i.e., a single goal, a fraction of an inch separating a save from a goal, is quite frequently the margin of victory), is also the sport associated with some of the most extreme celebrations and riots over the results.

On Fri, Feb 10, 2012 at 11:01 AM, Tom Rokicki <rokicki@gmail.com> wrote:
I think the number of "levels" depends largely on how much variance we expect in performance from game to game, and how much the game counteracts that variance through "repeated measurement".
As someone who has played competitive poker, backgammon, and bridge, I've thought about this a lot, mostly in the context of games that have a definite luck component, since people are always asking questions like "which game has more skill?" or "is poker mostly a game of skill or of luck?", and I've tried to figure out whether there is even a precise way to frame such questions, let alone answer them.

In the context of games with a chance component, you can't just talk about an "essentially zero" chance to win, since anyone can sometimes beat anyone. So you have to choose some arbitrary threshold, say 95%, and say that I'm "exactly one level better than you" if I beat you 95% of the time. As long as we choose this threshold the same way for all the games we're considering, we can still hope to make meaningful intergame comparisons on things like the number of levels. However, as Tom points out,

On Fri, Feb 10, 2012 at 2:01 PM, Tom Rokicki <rokicki@gmail.com> wrote:
I think the number of "levels" depends largely on how much variance we expect in performance from game to game, and how much the game counteracts that variance through "repeated measurement".
the length of the contest creates another obstacle to comparison. You have to be a lot more skillful to be ahead after an hour of poker, or to win a 7-point match in backgammon, 95% of the time than you do to be ahead after 10 hours of poker, or to win a 21-point match in backgammon, 95% of the time. So unless you want to reach conclusions like "soccer has less opportunity for skillful play than 'best 2 out of 3 soccer games', as can be seen by the fact that it has fewer levels", you want to normalize by making the measure of a level something like "I am exactly one level better than you if I win a 5-hour contest 90% of the time".

A third obstacle to answering the question of how many levels of play a game has is how to define the top and bottom of the scale. The top is easy, since there are only a few reasonable choices: you can set the top level as the level of the best human player in the world, or the best human or computer player in the world, or at perfect play (though the last will only give you speculation, not data, unless the game has been solved). But defining the bottom level of a game is much less clear. You mention the cutoff of "all players who know how to play and have a racket", but the number of levels of play is going to depend a lot on whether you count players like me, who could barely play back in high school and haven't picked up a racket since.

And you get strange cultural influences on your measurement. I suspect that there are more levels in poker than in backgammon because almost everyone owns a deck of cards and knows the rules of poker, while those who own a backgammon board and know the rules are likely to be people with some interest in and aptitude for games. The worst Go player in Japan is probably considerably worse than the worst Go player in the US, because people who are bad at Go are much more likely to have a Go board and know the rules if they live in Japan than if they live in the US.
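Under the standard logistic Elo model, the choice of threshold translates directly into rating points per "level", which makes the arbitrariness concrete. A small Python sketch of that conversion:

```python
from math import log10

def elo_gap(p):
    """Rating gap at which the stronger player's expected score is p,
    under the logistic Elo model E = 1 / (1 + 10**(-gap/400))."""
    return 400 * log10(p / (1 - p))

# The size of one "level" swings by a factor of four with the threshold:
print(round(elo_gap(2/3)))   # 120 points for a 2-in-3 edge
print(round(elo_gap(0.90)))  # 382
print(round(elo_gap(0.95)))  # 512 points for one 95% "level"
```

By the same formula, Rich's 400-point Plambeck step corresponds to an expected score of 10/11, about 91%, in this model.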
A related question concerns the ability to make rating systems with good predictive value. Even in games where the ability to beat someone is transitive, the correct rating system depends on "how transitive" the game is. Suppose we want a rating difference of 100 points to mean winning two games out of three. Then we structure the ratings so that if you and I play a long series of games and I win 2/3 of them, our ratings will converge to a difference of 100. But now, in designing a rating system, you have to choose a value of p such that if I play a long series of games against a player and win a fraction p of them, our rating difference will converge to 200.

So it seems to me that to design a rating system, you have to answer the question "If A beats B 2/3 of the time, and B beats C 2/3 of the time, what fraction of the time will A beat C?" I don't see any a priori reason this number shouldn't vary from game to game, and if it doesn't match the number chosen by the designer of the rating system, then either playing a player 100 points lower, or 200 points lower, will tend to change your rating, and any stable equilibrium will depend not only on the relative skill of the players, but on how often they play opponents at various skill disparities.

Andy
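For what it's worth, the Bradley-Terry model that Elo ratings are built on does fix an answer to the A/B/C question: winning odds compose multiplicatively along the chain. A game whose empirical numbers deviate from this is exactly the mismatch described above. A sketch:

```python
def chain_win_prob(p_ab, p_bc):
    """Bradley-Terry prediction for P(A beats C), given P(A beats B)
    and P(B beats C): odds multiply along the chain."""
    odds = (p_ab / (1 - p_ab)) * (p_bc / (1 - p_bc))
    return odds / (1 + odds)

# Two 2-in-3 edges compose to a 4-in-5 edge: 2:1 odds times 2:1 odds
# gives 4:1 odds, i.e. A beats C 80% of the time in this model.
print(chain_win_prob(2/3, 2/3))  # ~0.8
```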
I've always thought that the "best" games are ones that accommodate a huge range of skill levels. In this regard, Go and football ("soccer" in the U.S.) are my favorites, although I do not play either. Both also have rules that are about as simple as it gets, another reason I admire them.

In football there is an Elo-based rating system, but I can't figure out what they're saying at [1a]. Obviously the team nature of the game, and the fact that individual players get swapped from one team to another, makes mathematical modeling more difficult, but I think the original questions (Thane Plambeck's and rcs's) are equally relevant and the answers equally useful.

The English football league system [1b] has a pyramid of about 20 levels, the top 14 of which have an exponential distribution: each level has about 1.5 times as many teams as the level above it. The churn rate is about 25 percent per year: out of 22 clubs, typically 5 or 6 leave each year (either to move up or to move down). I expect that "200 Elo points" for English football would be more than one level of this pyramid (because a 200-point spread means the higher team will win 75% of the time, and hearing the football coverage on the BBC it seems winning is more random than that). I imagine it's about 100 Elo points per pyramid level. That makes the pyramid about 2000 Elo points high (ignoring the very few leagues at the bottom). This does not include semi-professional and amateur teams, many of which also have leagues but can't ever make it to the Premier League.

In Go, the scale is about 3000 points high [2a], and the size of "200 Elo points" varies from about 100 points at the top end to about 300 at the bottom [2b]. I suppose this comes from the way the game changes as you get really good at it, but it also looks suspiciously like a grade-inflation problem.
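These figures line up with the standard Elo expectancy curve (draws ignored); a quick check in Python:

```python
def expected_score(gap):
    """Expected score for the stronger side at a given rating gap,
    under the standard logistic Elo model."""
    return 1 / (1 + 10 ** (-gap / 400))

# A 200-point gap gives roughly the 75% figure quoted for football,
# and a 100-point pyramid level corresponds to about a 64% expectancy.
print(round(expected_score(200), 3))  # 0.76
print(round(expected_score(100), 3))  # 0.64
```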
- Robert

[1a] http://en.wikipedia.org/wiki/World_Football_Elo_Ratings
[1b] http://en.wikipedia.org/wiki/English_football_league_system#The_system
[2a] http://en.wikipedia.org/wiki/Go_ranks_and_ratings#Elo_Ratings_as_used_in_Go
[2b] http://en.wikipedia.org/wiki/File:Estimated_Win_Probabilities_under_EGF_Rati...

On Fri, Feb 10, 2012 at 13:19, <rcs@xmission.com> wrote:
-- Robert Munafo -- mrob.com Follow me at: gplus.to/mrob - fb.com/mrob27 - twitter.com/mrob_27 - mrob27.wordpress.com - youtube.com/user/mrob143 - rilybot.blogspot.com
participants (6):
- Andy Latto
- Erich Friedman
- rcs@xmission.com
- Robert Munafo
- Thane Plambeck
- Tom Rokicki