R - dplyr summarize() with a function of a data frame
I'm having trouble carrying out a routine using the dplyr package. In short, I have a function that takes a data frame as input and returns a single (numeric) value; I'd like to be able to apply this function to several subsets of a data frame. It feels like I should be able to use group_by() to specify the subsets of the data frame, then pipe along to the summarize() function, but I'm not sure how to pass the (subsetted) data frame along to the function I'd like to apply.
As a simplified example, let's say I'm using the iris dataset, and I've got a simple function that I'd like to apply to several subsets of the data:
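    data(iris)
    lm.func = function(.data){
      # Regress petal width on petal length (iris column names are capitalized)
      lm.fit = lm(Petal.Width ~ Petal.Length, data = .data)
      # Row 2, column 1 of the coefficient table is the slope estimate
      out = summary(lm.fit)$coefficients[2,1]
      return(out)
    }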
Now, I'd like to be able to apply this function to subsets of iris based on some other variable, like Species. I'm able to manually filter the data and then pipe along to my function, for example:
    iris %>% filter(Species == "setosa") %>% lm.func(.)
But I'd like to be able to apply lm.func to each subset of the data, based on Species. My first thought was to try something like the following:
    iris %>% group_by(Species) %>% summarize(coef.val = lm.func(.))
Even though I know this doesn't work, the idea is to try to pass each subset of iris to the lm.func function.
To clarify, I'd like to end up with a data frame with two columns: the first with each level of the grouping variable, and the second with the output of lm.func when the data is restricted to the subset specified by the grouping variable.
Is it possible to use summarize() in this way?
You can try do():
    iris %>%
      group_by(Species) %>%
      do(data.frame(coef.val = lm.func(.)))
    #      Species  coef.val
    # 1     setosa 0.2012451
    # 2 versicolor 0.3310536
    # 3  virginica 0.1602970
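As a possible alternative, here is a sketch that is not part of the original answer: do() has been superseded since dplyr 1.0.0, and group_modify() handles the same "one function call per group" pattern. It assumes a dplyr version that provides group_modify() (0.8.1 or later, as far as I know). The names res and base.res are made up for illustration. The base R split() + sapply() line is only there as a cross-check and does not need dplyr at all.

```r
library(dplyr)

# Same function as in the question, with the iris column names capitalized.
lm.func <- function(.data) {
  lm.fit <- lm(Petal.Width ~ Petal.Length, data = .data)
  # Row 2, column 1 of the coefficient table is the slope estimate.
  summary(lm.fit)$coefficients[2, 1]
}

# group_modify() calls the function once per group; .x holds that group's
# rows (without the grouping column) and must be mapped to a data frame.
res <- iris %>%
  group_by(Species) %>%
  group_modify(~ data.frame(coef.val = lm.func(.x))) %>%
  ungroup()

# Base R cross-check: split the data frame by Species, then apply lm.func
# to each piece. The result is a named numeric vector, one slope per level.
base.res <- sapply(split(iris, iris$Species), lm.func)
```

Both res$coef.val and base.res should match the slopes shown in the do() output above. If you specifically want to stay inside summarize(), recent dplyr releases can hand the current group's rows to a function via pick(everything()), but I have not checked which minimum version that needs on your setup.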