
How can I rank observations in-group faster?

I have a really simple problem, but I'm probably not thinking vector-y enough to solve it efficiently. I tried two different approaches and they've been looping on two different computers for a long time now. I wish I could say the competition made it more exciting, but ... bleh.

I have long data (many rows per person, one row per person-observation) and I basically want a variable that tells me how often the person has been observed already.

I have the first two columns and want the third one:

    person  wave  obs
    pers1   1999  1
    pers1   2000  2
    pers1   2003  3
    pers2   1998  1
    pers2   2001  2

Now I'm using two loop-approaches. Both are excruciatingly slow (150k rows). I'm sure I'm missing something, but my search queries didn't really help me yet (hard to phrase the problem).

Thanks for any pointers!

    # ordered dataset by persnr and year of observation
    person.obs <- person.obs[order(person.obs$PERSNR,person.obs$wave) , ]
    person.obs$n.obs = 0

    # first approach: loop through people and assign range
    unp = unique(person.obs$PERSNR)
    unplength = length(unp)
    for(i in 1:unplength) {
        print(unp[i])
        person.obs[which(person.obs$PERSNR==unp[i]),]$n.obs = 1:length(person.obs[which(person.obs$PERSNR==unp[i]),]$n.obs)
        i=i+1
        gc()
    }

    # second approach: loop through rows and reset counter at new person
    pnr = 0
    for(i in 1:length(person.obs[,2])) {
        if(pnr!=person.obs[i,]$PERSNR) {
            pnr = person.obs[i,]$PERSNR
            e = 0
        }
        e=e+1
        person.obs[i,]$n.obs = e
        i=i+1
        gc()
    }

Answer:

The answer from Marek in http://stackoverflow.com/questions/6150968/adding-an-repeated-index-for-factors/6151333#6151333 has proven very useful in the past. I wrote it down and use it almost daily since it was fast and efficient. We'll use ave() and seq_along().

    foo <- data.frame(person = c(rep("pers1", 3), rep("pers2", 2)),
                      year = c(1999, 2000, 2003, 1998, 2011))
    foo <- transform(foo, obs = ave(rep(NA, nrow(foo)), foo$person, FUN = seq_along))
    foo

      person year obs
    1  pers1 1999   1
    2  pers1 2000   2
    3  pers1 2003   3
    4  pers2 1998   1
    5  pers2 2011   2

Another option using plyr:

    library(plyr)
    ddply(foo, "person", transform, obs2 = seq_along(person))

      person year obs obs2
    1  pers1 1999   1    1
    2  pers1 2000   2    2
    3  pers1 2003   3    3
    4  pers2 1998   1    1
    5  pers2 2011   2    2
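For what it's worth, here is a minimal sketch of how the same ave() idea might be applied to the column names from the question (PERSNR, wave, n.obs). The small data frame is a made-up stand-in for the real 150k-row data, so treat it as an illustration under that assumption rather than a tested drop-in. The rows are sorted first so that the within-person counter follows the wave order.

```r
# Hypothetical toy data standing in for the asker's person.obs;
# only the column names PERSNR and wave are taken from the question.
person.obs <- data.frame(
  PERSNR = c("pers2", "pers1", "pers1", "pers2", "pers1"),
  wave   = c(2001, 2003, 1999, 1998, 2000),
  stringsAsFactors = FALSE
)

# Sort by person, then by year of observation, as in the question.
person.obs <- person.obs[order(person.obs$PERSNR, person.obs$wave), ]

# ave() splits the first argument by PERSNR, applies seq_along() to each
# piece, and puts the results back in the original row positions.
person.obs$n.obs <- ave(seq_len(nrow(person.obs)),
                        person.obs$PERSNR,
                        FUN = seq_along)

print(person.obs)
```

Because ave() handles every person in a single call, there is no explicit loop over people or rows and no repeated gc(), which is likely where much of the time goes in the loop versions.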

Ruben at Stack Overflow
