r - Complex data.table subset and vectorised manipulation
OK, I have a complex function built using data.frames, and in trying to speed it up I've turned to data.table. I'm totally new to it, so I'm quite befuddled. Anyhow, I've made a much simpler toy example of what I want to do, but I cannot work out how to translate it to data.table format. Here is the example in data.frame form:
    library(data.table)

    rows <- 10
    data1 <- data.frame(
      id = 1:rows,
      a = seq(0.2, 0.55, length.out = rows),
      b = seq(0.35, 0.7, length.out = rows),
      c = seq(0.4, 0.83, length.out = rows),
      d = seq(0.6, 0.87, length.out = rows),
      e = seq(0.7, 0.99, length.out = rows),
      f = seq(0.52, 0.90, length.out = rows)
    )
    dt1 <- data.table(data1)  # for later

    data2 <- data.frame(
      id = 3:1,
      a = rep(3, 3),
      d = rep(2, 3),
      f = rep(1, 3)
    )

    m.names <- c("a", "d", "f")
    # rows of data1 that correspond to each row of data2, in data2's order
    idx <- match(data2$id, data1$id)
    data1[idx, m.names] <- data1[idx, m.names] + data2[, m.names]
So note that in the last step I want to perform an addition between the pre-existing figures and the new data, vectorised across several columns.
In data.table format I've gotten this far:
    dt1[id %in% data2$id, m.names, with=FALSE]
This selects the values I want to add to, but after that I'm lost. I'd appreciate any help!
Edit:
OK, I've figured out part of it. I can use the last line of code above to achieve the vectorised addition, using data2 to store the added values, as follows:
    data2[, m.names] <- data2[, m.names] + data.frame(dt1[id %in% data2$id, m.names, with=FALSE])
Even with 2.5 million rows in dt1, 10,000 rows in data2 and 6 matching columns, this takes only 0.004 sec. But I still need to assign the new data2 values back to the appropriate, dynamically assigned columns in data1.
Here's one way, using the development version of data.table, v1.9.5:
    require(data.table)  ## v1.9.5+
    setDT(data1)         ## data1 is now a data.table
    cols1 = c("a", "d", "f")
    cols2 = paste0("i.", cols1)
    setkey(data1, id)    ## set the key and prepare for the join
    data1[data2, (cols1) := mapply(`+`, mget(cols1), mget(cols2), SIMPLIFY=FALSE)]
    #     id         a         b         c    d         e         f
    #  1:  1 3.2000000 0.3500000 0.4000000 2.60 0.7000000 1.5200000
    #  2:  2 3.2388889 0.3888889 0.4477778 2.63 0.7322222 1.5622222
    #  3:  3 3.2777778 0.4277778 0.4955556 2.66 0.7644444 1.6044444
    #  4:  4 0.3166667 0.4666667 0.5433333 0.69 0.7966667 0.6466667
    #  5:  5 0.3555556 0.5055556 0.5911111 0.72 0.8288889 0.6888889
    #  6:  6 0.3944444 0.5444444 0.6388889 0.75 0.8611111 0.7311111
    #  7:  7 0.4333333 0.5833333 0.6866667 0.78 0.8933333 0.7733333
    #  8:  8 0.4722222 0.6222222 0.7344444 0.81 0.9255556 0.8155556
    #  9:  9 0.5111111 0.6611111 0.7822222 0.84 0.9577778 0.8577778
    # 10: 10 0.5500000 0.7000000 0.8300000 0.87 0.9900000 0.9000000
The join is of the form x[i] and is performed on the key column id. For each row of data2's id column, the corresponding matching rows in data1 are found. For example, for id = 2 in data2, the matching row is the 2nd row in data1.
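To make the row matching concrete, here is a minimal sketch on two small tables. The names x, i and matched and all of the values are made up for illustration; they are not the tables from the question. It uses the which = TRUE argument of a data.table join to return the matched row numbers instead of the joined rows.

```r
library(data.table)

# Toy tables with hypothetical values, smaller than the ones in the question
x <- data.table(id = 1:5, a = c(0.1, 0.2, 0.3, 0.4, 0.5))
i <- data.table(id = 3:1, a = c(3, 3, 3))

setkey(x, id)  # sort x by id and mark id as its key

# which = TRUE returns, for each row of i, the matching row number in x
matched <- x[i, which = TRUE]
print(matched)

# The plain join returns those rows of x in i's order
print(x[i])
```

Because the rows come back in the order of i, any update done in j is applied to exactly the rows of x that each row of i points at.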
Once we have the matching rows, we evaluate the expression in j, which updates the data1 columns listed in cols1 by adding the values of mget(cols1) and mget(cols2).
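The mapply step can be tried on its own in base R. In this sketch the two lists stand in for what mget(cols1) and mget(cols2) would return; the names old_vals, new_vals and summed and the numbers are invented for illustration.

```r
# Two lists of columns with hypothetical values, standing in for
# mget(cols1) and mget(cols2)
old_vals <- list(a = c(0.2, 0.3), d = c(0.6, 0.7))
new_vals <- list(i.a = c(3, 3), i.d = c(2, 2))

# mapply pairs the lists element by element: a with i.a, d with i.d.
# SIMPLIFY = FALSE keeps the result as a list of columns rather than
# collapsing it into a matrix.
summed <- mapply(`+`, old_vals, new_vals, SIMPLIFY = FALSE)
str(summed)
```

The result is a list with one summed vector per column, which is the shape that `:=` expects when several columns are assigned at once.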
cols2 is generated with the i. prefix, which fetches the values from the data.table in i (here, data2).
HTH