Adding a ranking column to a data.frame of observations

Here is how to add a column that ranks each observation's category against the other categories. For example, if our observations include a country-of-birth variable, we want to be able to say that a given observation belongs to the country with the highest proportion, the country with the second-highest proportion, and so on.

In summary: first we tabulate the (weighted) counts by the variable with xtabs(), then we assign each row in the tabulation a rank with rank(), and finally we distribute these ranks back into the original data.frame using merge(). The code below follows these three steps.

 
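# Example data: a category x and a weight for each observation
x <- data.frame(x = factor(c('a','a','b','b','c','a','c','d')))
x$weight <- c(1,1,2,1,1,3,1,1)
# Weighted count per category (columns: x, Freq)
counts <- as.data.frame(xtabs(weight ~ x, x))
# Rank the categories by total weight; ties are broken at random
counts$rank <- rank(counts$Freq, ties.method = 'random')
# Copy each category's Freq and rank onto every matching observation
x <- merge(x, counts, by = 'x')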
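
A quick sanity check on the merged result can catch the kind of slip where rows of the same category end up with different ranks. The sketch below is not part of the original script: it rebuilds the same example data so it runs on its own, and the names merged and ranks_per_category are made up for illustration.

```r
# Rebuild the example data so this check runs on its own.
x <- data.frame(x = factor(c('a','a','b','b','c','a','c','d')),
                weight = c(1,1,2,1,1,3,1,1))
counts <- as.data.frame(xtabs(weight ~ x, x))
counts$rank <- rank(counts$Freq, ties.method = 'random')
merged <- merge(x, counts, by = 'x')   # 'merged' is an illustrative name

# Every row of the original data should survive the merge.
stopifnot(nrow(merged) == nrow(x))

# Each category should carry exactly one distinct rank.
ranks_per_category <- tapply(merged$rank, merged$x,
                             function(r) length(unique(r)))
stopifnot(all(ranks_per_category == 1))

# Freq should equal the summed weight of the category.
stopifnot(all(merged$Freq == ave(merged$weight, merged$x, FUN = sum)))
```

If any of the stopifnot() calls fails, the script stops with an error naming the condition that did not hold.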

So now you can select only the observations in the most populous category by picking the highest value of “rank”. Note that rank() numbers values in ascending order, so rank 1 goes to the least populous category rather than the most populous.
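
One way to get rank 1 for the most populous category is to rank the negated totals. The sketch below assumes that is the ordering you want; it also swaps ties.method to 'min' (an assumption, not what the script above uses) so that tied categories share a rank instead of being split at random. The names ranked and top are made up for illustration.

```r
# Rebuild the example data so this sketch runs on its own.
x <- data.frame(x = factor(c('a','a','b','b','c','a','c','d')),
                weight = c(1,1,2,1,1,3,1,1))
counts <- as.data.frame(xtabs(weight ~ x, x))

# rank() works in ascending order; negating the totals flips the order
# so the largest category gets rank 1. ties.method = 'min' is an
# assumption: tied categories share the lower rank.
counts$rank <- rank(-counts$Freq, ties.method = 'min')
ranked <- merge(x, counts, by = 'x')

# Keep only the observations in the top-ranked category.
top <- ranked[ranked$rank == 1, ]
```

With this ordering, selecting the second most populous category is just ranked$rank == 2, and so on.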

Note — Jim pointed out some extraneous weirdness in my script which I have since edited. The above *now* creates a nice x with rankings… (Wondering what other code mistakes I have left in my nascent blog…. Perhaps now is a good time to start practicing aggressive agile testing approaches….)

This entry was posted on Saturday, July 25th, 2009 at 11:17 pm.

One Response to Adding a ranking column to a data.frame of observations

jim holtman says:

2009-07-27 at 7:02 pm

The result of the script above is:

  x weight rank
1 a      1    4
2 a      1    4
3 b      2    4
4 b      1    3
5 c      1    3
6 a      3    2
7 c      1    2
8 d      1    1

I would have thought that ‘a’ would have a rank of 4 for all three of its occurrences. If you just take the result of the merge, you get:

> merge(x, counts, by.x='x', by.y='x')
  x weight Freq rank
1 a      1    5    4
2 a      1    5    4
3 a      3    5    4
4 b      2    3    3
5 b      1    3    3
6 c      1    2    2
7 c      1    2    2
8 d      1    1    1

This seems to have the correct ranks assigned to the rows.

