Functional Programming in R with purrr
I have been writing about the family of packages in the tidyverse. The tidyverse is a collection of R packages for data science that covers data manipulation, exploration, and visualization, all sharing a common design philosophy. Today, I will talk about purrr.
As I enter the second-to-last semester of my Master of Science in Data Science program, I have become accustomed to writing the same function calls over and over for various analyses. But typing out many near-identical calls invites mistakes that throw errors. Take the following code, typos and all, for example:
aov_mpg <- aov(mpg ~ factor(cyl), data = mtcars)
summary(aov_mpg)
aov_disp <- aov(disp ~ factor(cyll), data = mtcars)
summary(aov_disp)
aov_hp <- aov(hp ~ factor(cyl), data = mrcars)
summry(aov_hpp)
aov_wt <- aov(wt ~ factor(cyl), datas = mtcars)
summary(aov_wt)
In the code chunk above, if you wanted to run the ANOVAs for number of gears instead of number of cylinders, you would have to go back and change the factor(cyl) call to factor(gear) four times! This is not very efficient, and you are likely to end up with mistakes because you have to type everything multiple times. It gets worse if you have to repeat the same call for hundreds of variables.
This is where purrr comes in. purrr reduces that repetition by letting you apply the same function to every element of a list or every column of a data frame. Here we use purrr to run the same one-way ANOVAs for several dependent variables against a single independent variable. purrr requires less code, and if we want to change a variable, we only have to do it once. That’s the beauty of purrr.
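library(tidyverse)   # dplyr verbs, purrr's map functions, and the pipe
library(broom)       # tidy()
library(knitr)       # kable()
library(kableExtra)  # kable_styling()

mtcars %>%
  mutate(cyl = factor(cyl)) %>%
  select(mpg, disp, hp) %>%                    # the dependent variables
  # Caution: data = mtcars is the original data frame, so cyl is still numeric here (1 df)
  map(~ aov(.x ~ cyl, data = mtcars)) %>%      # one model per selected column
  map_dfr(~ tidy(.), .id = 'source') %>%       # stack the tidied results, labelled by column
  mutate(p.value = round(p.value, 5)) %>%
  kable() %>%
  kable_styling()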
source | term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|---|
mpg | cyl | 1 | 817.7130 | 817.71295 | 79.56103 | 0 |
mpg | Residuals | 30 | 308.3342 | 10.27781 | NA | NA |
disp | cyl | 1 | 387454.0926 | 387454.09261 | 130.99888 | 0 |
disp | Residuals | 30 | 88730.7021 | 2957.69007 | NA | NA |
hp | cyl | 1 | 100984.1721 | 100984.17209 | 67.70993 | 0 |
hp | Residuals | 30 | 44742.7029 | 1491.42343 | NA | NA |
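One thing worth noticing in the table above is that cyl has only 1 degree of freedom, which suggests it was treated as a number rather than as a grouping factor. The following is a minimal sketch, not the original analysis, of one way to keep the factor conversion in effect and to name the grouping variable in a single place. The names group_var, responses, cars, and fits are hypothetical and exist only for this illustration.

```r
library(purrr)

# Hypothetical names used only in this sketch
group_var <- "cyl"                          # change to "gear" here, once
responses <- c("mpg", "disp", "hp", "wt")   # dependent variables

# Convert the grouping column to a factor in the data that aov() actually sees
cars <- mtcars
cars[[group_var]] <- factor(cars[[group_var]])

# Fit one ANOVA per response; reformulate() builds formulas such as mpg ~ cyl
fits <- map(
  set_names(responses),
  ~ aov(reformulate(group_var, response = .x), data = cars)
)

# Degrees of freedom of the grouping term in each model
map_dbl(fits, ~ summary(.x)[[1]][["Df"]][1])
```

Because cyl takes three distinct values in mtcars, the grouping term in each of these fits should carry 2 degrees of freedom instead of 1. The resulting list can be passed to map_dfr() and tidy() in the same way as before.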