This section is about the problem to estimate the strengths of the players with help of some given data. Here the strength of a player is looked upon as an unknown constant, which has to be found. A development in time is not subject of this section, but of the section about dynamic rankings.

The following data is needed:

n | total number of players |

a_ij | This entry of a quadratic
n×n-matrix a gives the number of frags, player
i made against player j (i.e. how often player
i killed j). The entry a_ii is the number of
suicides of player i. |

This data may be collected over one or more games or maps. The more data,
the better the ranking will be. If the frag numbers `a_ij` are too
small, there is a serious problem with statistical relevance. I will discuss
this in the section about the statistical
relevance.

Now we try to assign to every player `i` his strength
`g_i`. Those strengths should be choosen such that one can see from
`g_i` and `g_j`, how the players `i` and `j` do
in direct comparison:

(g_i-g_j) = (a_ij-a_ji)/(a_ij+a_ji)

Obviously the `g_i` should be between 0 and 1. Of course the above
equation generally cannot be valid for all `i,j`. (For example it may
be, that player 1 is better than 2, 2 better than 3, and 3 better than
1). That's why we try to minimize the sum over all pairs `(i,j)
(i!=j)` of the squares of the differences. The solution is:

g_i = 1/n * \sum_{j!=i} (a_ij+a_ji)/(a_ij+a_ji)

If there is a `(i,j)` with `a_ij+a_ji=0`, the problem is not
well defined. Unfortunately this will occur in practice quite frequently,
because not every two players meet. We suggest to skip such summands and to
decrease the number of players `n` accordingly.

The above formula doesn't take into account the number of suicides of a
player, and there is no canonical way to do this. One could for example
subtract the number `a_ii/B_i`, where `B_i` is the total number
of frags player `i` was involved into. Alternatively one could
distribute his suicides proportionately under his oppenents, before
calculating the strength according to this formula.

Nevertheless those possiblities to take the suicides into account, are very arbitrary, and their consequences are not so clear. In practice none of these solutions was really satisfying.

The first tests with this formula alwas resulted in players on the first ranks, which nearly didn't play. And worse: The longer a player played, the smaller became the chances, that he came onto one of the first places. The problem was, that scores of 1:0 or 2:0 beween two single players clearly dominated the ranking.

Our attempts to devalue or to weigh less those quotients in the formula, seemed to be theoretically proper, but all failed in practice. Finally the only practical solution was to make the list of players more sparse, so that only a certain "core" of player rested, of which everyone did enough frags against every other to found a solid statistics.

Another serious disadvantage of this method of ranking is, that the size
of the data `a_ij` to hold in the memory increases quadratically with
the number of players. Normally most of the `a_ij` are zero, but if
one tries not to save those zeros, the programming expense and the error
proneness increases dramatically.

This is a problem on principle of the method, that can't be avoided by clever programming.

This section is about a ranking method, which tries to calculate the development of the strengths of the players in time. It is similar to the one used for the world ranking lists in chess and table tennis (the ELO ladder). Given constant strength, the calculated strengths converge against the correct values after some time.

First we look at the case very similar to the world ranking lists mentioned
above, where the players play against each other in pairs (i.e. duel). To
every player `i` his strength `g_i` is assigned. The
probablity, that player `i` wins against player `j` is *by
convention*

w_ij = 1 / (1+10^((g_i-g_j)/400))

The number 400 only determines, how high the values in the rankings will
be. Now let the players `i` and `j` play against each
other. How are the `g_i` and `g_j` to be adapted? If
`a_i` is the number of frags of player `i`, and `a_j`
the one of player `j`, the number

e_ij = a_i / (a_i+a_j)

is between `0` and `1` and says, how strong player
`i` was (in comparison with player `j`). If the duel ends 0:0,
we take `e_ij=0.5`.

The new strengths of the players `i` and `j` can now be
calculated as follows:

g_new_i = g_i + K * (e_ij-w_ij) g_new_j = g_j + K * (e_ji-w_ji)

If one of the two new values gets lesser than zero, it is set to zero. The
number `K` determines, how dynamically the ranking will be. For big
`K` a few wins are enough to jump up the ladder. On the other hand
some losses, and one falls down again. If `K` is small, it is
necessary to play rather well over a longer time to go up in the ranking, and
it is possible to loose some duels.

As one can see, it is possible, that on goes down in the ranking, even if one has won a duel. This is the case, of one didn't win high enough according to the strenght of the oppenent.

The number of suicides may simply be substracted from the number of frags
of the player. A problem then occurs, if one or both of the `a_i` and
`a_j` become negative. In this case the formula can't be used.

For that reason, if one of the values is negative, it can be substracted from both values. Two examples:

6 : -2 becomes +8 : 0 -3 : -5 becomes +5 : +3

Players, which don't play any more, or which play too seldom, should fall down in the ranking. Otherwise players, which reached a high score, could stop playing.

So from each rank should some points be substracted in regular intervals. If a player is the ranked less than zero, he is reset to zero.

Perhaps the reader has remarked, that the sum of the `g_i` in the
formula above is constant: `\sum g_i = \sum g_new_i`. The only
exception is, when a `g_i` becomes negative and so is reset to
zero. Ultimately all points of the ranking come from this and were
distributed from below to above.

For the problem of a dynamic ranking, if the players don't duel, but if
many player are in one map, there is a rather nice solution: Instead of the
matrix `a_ij` we process directly one frag after the other, as they
appear in the log files of the game servers. We simply count every frag of
player `i` against player `j` as a duel with result 1:0.

Suicides may be looked upon as -1:0 results against an imaginary player of the same strength.

In practice this method works rather well and is even suitable for vast amounts of data, since for every player only a real number (the strength) has to be rememberd in memory or in a database. The log files of the servers may be read as data streams, processed line by line and then thrown away.

As simple as rankings for multiplayer deathmatches are, other modifications may be ranked. As an example I will tell a bit about CTF (Capture the Flag) rankings.

At first we have to identify the important events. Let's say in CTF games these are captures and frags. (There are other important events, which I don't enumerate here and which partially depend on the exact game.)

For every event we give a certain number of points `x`, for example
7 points for a capture, and 1 point for a frag. Every event is now counted as
`x:0` result for the player, who is concerned, and as `0:x`
result for his oppenent, if a well defined oppenent exists at all. For
events, for which no well defined opponent exists, for example for captures,
we take as strength of the (imaginary) opponent in the ranking formula the
average strength of all players of the oppsosing team.

Team-kills could be processed as `-1:0` results for the player,
which did the kill. The killed player should in this case not be ranked
down.

In dynamic rankings every player starts with zero. Then his calcualted strength will increase, first quickly, then slower and slower, and finally converge against a value, which corresponds to his real strength.

In dynamic rankings of multiplayer games the rate of convergence does not
only depend on the constant `K`, but also in how many frag events the
player is involved per time unit. So if a player plays very hectically, his
rank will converge rather quickly. Another player with the same real
strength, but who is more careful, will be ranked lower in the beginning,
because his rank doesn't increase so quickly.

If there is a decline in time, the latter player has a real disadvantage: The calculated ranks are equilibria between the real strengths and the decline. So the calculated rank of the slower player will be lower than the calculated rank of the more hectic player of the same real strength.

Yes. Christoph Loewe aka AEon has written a program parsing the log files of game servers and writing html pages with many statistics: AEstats.

Which of the methods above is implemented, I don't know. I only did the mathematical background.

by
Michael Becker, 8/2001. Translated: 4/2002, Last modification: 4/2002.