r - complex data.table subset and vectorised manipulation


OK, I have a complex function built using data.frames, and in trying to speed it up I've turned to data.table. I'm totally new to it and I'm quite befuddled. Anyhow, I've made a much simpler toy example of what I want to do, but I cannot work out how to translate it into data.table format. Here is the example in data.frame form:

    library(data.table)

    rows <- 10
    data1 <- data.frame(id = 1:rows,
                        a  = seq(0.2,  0.55, length.out = rows),
                        b  = seq(0.35, 0.7,  length.out = rows),
                        c  = seq(0.4,  0.83, length.out = rows),
                        d  = seq(0.6,  0.87, length.out = rows),
                        e  = seq(0.7,  0.99, length.out = rows),
                        f  = seq(0.52, 0.90, length.out = rows))
    dt1 <- data.table(data1)  # for later

    data2 <- data.frame(id = 3:1,
                        a  = rep(3, 3),
                        d  = rep(2, 3),
                        f  = rep(1, 3))
    m.names <- c("a", "d", "f")

    # match() picks data1's rows in data2's row order, so data2 is used as-is
    data1[match(data2$id, data1$id), m.names] <-
        data1[match(data2$id, data1$id), m.names] + data2[, m.names]

So note that in the last step I want to perform an addition between the pre-existing figures and the new data, vectorised across several columns.

In data.table format I've gotten this far:

    dt1[id %in% data2$id, m.names, with = FALSE]

This selects the values I want to add to, but after that I'm lost. Any help appreciated!
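A small caveat with that selection, as far as I can tell: `%in%` builds a logical mask, so the matched rows come back in `dt1`'s own order rather than in `data2`'s order, whereas `match()` keeps `data2`'s order. A minimal sketch, assuming only the `id` columns matter here (the other columns are left out for brevity):

```r
library(data.table)

# Cut-down stand-ins for dt1 and data2 (only the id columns matter here)
dt1   <- data.table(id = 1:10)
data2 <- data.frame(id = 3:1)

# %in% builds a logical mask, so rows come back in dt1's order
ids_in <- dt1[id %in% data2$id, id]

# match() returns dt1 row numbers in data2's order
idx     <- match(data2$id, dt1$id)
ids_mat <- dt1[idx, id]

print(ids_in)   # [1] 1 2 3
print(ids_mat)  # [1] 3 2 1
```

With the toy data this difference is invisible because every row of data2 holds the same values, but with real data the `%in%` version would add data2's rows to the wrong rows of dt1.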

Edit:

OK, I've figured out part of it: I can use the last line of code above to achieve the vectorised addition part, using data2 to store the added values, as follows:

    data2[, m.names] <- data2[, m.names] + data.frame(dt1[id %in% data2$id, m.names, with = FALSE])

Even with 2.5 million rows (in dt1), 10,000 rows in data2 and 6 matching columns this takes only 0.004 sec, but I still need to assign the new data2 values back to the appropriate, dynamically chosen columns in data1.
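A minimal sketch of one way to finish this, assuming the same toy data trimmed to the columns being updated: take the dt1 row numbers with `match()` (so they follow data2's row order), add, and write the result back by reference with `:=`. The names `idx`, `old` and `new` are purely illustrative.

```r
library(data.table)

rows  <- 10
dt1   <- data.table(id = 1:rows,
                    a  = seq(0.2,  0.55, length.out = rows),
                    d  = seq(0.6,  0.87, length.out = rows),
                    f  = seq(0.52, 0.90, length.out = rows))
data2   <- data.frame(id = 3:1, a = rep(3, 3), d = rep(2, 3), f = rep(1, 3))
m.names <- c("a", "d", "f")

# Row numbers of dt1 in data2's row order, so the rows line up
idx <- match(data2$id, dt1$id)

# Current values of those rows, as a plain data.frame
old <- as.data.frame(dt1[idx, m.names, with = FALSE])

# Add column-wise, then write back by reference into the same rows
new <- old + data2[, m.names]
dt1[idx, (m.names) := as.list(new)]
```

Wrapping `m.names` in parentheses on the left of `:=` makes data.table treat it as a character vector of column names rather than a single column literally called `m.names`.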

Here's one way, using the devel version of data.table, v1.9.5:

    require(data.table)  ## v1.9.5+
    setDT(data1)         ## data1 is now a data.table
    cols1 = c("a", "d", "f")
    cols2 = paste0("i.", cols1)
    setkey(data1, id)    ## set the key, to prepare for the join
    data1[data2, (cols1) := mapply(`+`, mget(cols1), mget(cols2), SIMPLIFY=FALSE)]
    data1                ## print the updated table
    #     id         a         b         c    d         e         f
    #  1:  1 3.2000000 0.3500000 0.4000000 2.60 0.7000000 1.5200000
    #  2:  2 3.2388889 0.3888889 0.4477778 2.63 0.7322222 1.5622222
    #  3:  3 3.2777778 0.4277778 0.4955556 2.66 0.7644444 1.6044444
    #  4:  4 0.3166667 0.4666667 0.5433333 0.69 0.7966667 0.6466667
    #  5:  5 0.3555556 0.5055556 0.5911111 0.72 0.8288889 0.6888889
    #  6:  6 0.3944444 0.5444444 0.6388889 0.75 0.8611111 0.7311111
    #  7:  7 0.4333333 0.5833333 0.6866667 0.78 0.8933333 0.7733333
    #  8:  8 0.4722222 0.6222222 0.7344444 0.81 0.9255556 0.8155556
    #  9:  9 0.5111111 0.6611111 0.7822222 0.84 0.9577778 0.8577778
    # 10: 10 0.5500000 0.7000000 0.8300000 0.87 0.9900000 0.9000000

The join of the form x[i] is performed on the key column id. For each row of data2's id column, the corresponding matching rows in data1 are found. For example, for id = 2 in data2, the matching row is the 2nd row in data1.
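To see which rows the join picks up, a keyed join can return the matched row numbers directly via `which = TRUE`. A small sketch, assuming cut-down versions of data1 and data2 with just `id` and `a`:

```r
library(data.table)

data1 <- data.table(id = 1:10, a = seq(0.2, 0.55, length.out = 10))
data2 <- data.table(id = 3:1, a = rep(3, 3))
setkey(data1, id)

# For each row of data2 (in data2's order), the matching row number in data1
matched <- data1[data2, which = TRUE]
print(matched)  # [1] 3 2 1
```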

Once we've got the matching rows, we evaluate the expression in j, which updates the data1 columns provided in cols1 by adding the values of mget(cols1) and mget(cols2).
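The `SIMPLIFY=FALSE` part matters because `:=` with several columns on the left expects a list on the right, one element per column. A standalone sketch with made-up vectors standing in for `mget(cols1)` and `mget(cols2)`:

```r
# Illustrative stand-ins for mget(cols1) and mget(cols2)
current  <- list(a = c(0.2, 0.25), d = c(0.6, 0.63))
incoming <- list(i.a = c(3, 3),    i.d = c(2, 2))

# SIMPLIFY = FALSE keeps one list element per column, named after `current`
as_list <- mapply(`+`, current, incoming, SIMPLIFY = FALSE)
str(as_list)

# With the default SIMPLIFY = TRUE the same call collapses into a matrix
as_matrix <- mapply(`+`, current, incoming)
dim(as_matrix)  # [1] 2 2
```

`Map()` is a wrapper around `mapply()` with `SIMPLIFY = FALSE`, so it could be used in the j expression as well.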

cols2 is generated with the i. prefix, which fetches the values from the data.table given in i (here, data2).
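To see where the `i.` names come from, one can run the same join without any j and look at the column names of the result. A minimal sketch, again with cut-down hypothetical versions of the two tables:

```r
library(data.table)

data1 <- data.table(id = 1:10, a = seq(0.2, 0.55, length.out = 10), b = 0)
data2 <- data.table(id = 3:1, a = rep(3, 3))
setkey(data1, id)

# Plain join: columns of data2 that clash with data1's get the i. prefix
res <- data1[data2]
names(res)  # [1] "id"  "a"   "b"   "i.a"
```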

hth

