Two potential issues encounted by working with summarize() after group_by() in dplyr

In both cases, summarize() does not display expected results no matter how you group_by().

(1) Loading both ‘dplyr’ and ‘plyr’.

(2) Changing the data layout after group_by().

The data below is from an insect feeding experiment divided into six groups in which insects were introduced on feeding diets containing different chemicals that were extracted by solvent from different plants. The number of saliva spots, classified as m- , n- or b-type, were counted from three randomly-selected areas (1-3) on each diet. The purpose here was simply to summarise mean value and SEM of saliva spots (represented as the total of m, n and b types) for all the areas for each treatment (‘group’ × ‘ extract’ × ‘chemical’).

Load the data ‘feed’ :

feed <- read.csv("http://fuminwang.net/wp-content/uploads/2020/01/feed.csv") 

Order factor levels:

feed$group <- factor( feed$group, levels = unique(feed$group) )
feed$extract <- factor( feed$extract, levels = unique(feed$extract) )
feed$chemical <- factor( feed$chemical, levels = unique(feed$chemical) )

Suppose that you have the following packages loaded:

library(plyr)
library(dplyr)
library(tidyr)

Running these:

na.omit(feed)%>% 
mutate( s1 = (m1+n1+b1), s2 = (m2+n2+b2), s3 = (m3+n3+b3)  ) %>%
group_by(group, extract, chemical) %>%
gather('s1','s2','s3', key = 's_id', value = 's') %>%    ## in 'tidyr' 
summarise(s_mean = mean(s), n = n(), s_se = sd(s)/sqrt(n())  )

You will either get error, or get only one global summary like this :

s_mean   n      s_se
28.14013 471   0.3267366

The solution is to unload the ‘plyr’ (see also here), or simply call the summarise() from dplyr (plyr also support a summarise(), so let R know which one you would like to use).

detach("package:plyr", unload=TRUE)  

# or 

na.omit(feed)%>% 
mutate( s1 = (m1+n1+b1), s2 = (m2+n2+b2), s3 = (m3+n3+b3)  ) %>%
group_by(group, extract, chemical) %>%
gather('s1','s2','s3', key = 's_id', value = 's') %>%    ## in 'tidyr' 
dplyr::summarise(s_mean = mean(s), n = n(), s_se = sd(s)/sqrt(n())  )

then to make sure that group_by() follows gather() since the function in tidyr changed the data layout. Otherwise you will get these:

# A tibble: 546 x 6
# Groups:   group, extract [78]
   group extract chemical s_mean     n  s_se
   <fct> <fct>   <fct>     <dbl> <int> <dbl>
 1 1     -       -          30.4    27  1.34
 2 1     -       phe       NaN       0 NA   
 3 1     -       epz       NaN       0 NA   
 4 1     -       rel       NaN       0 NA   
 5 1     -       del       NaN       0 NA   
 6 1     -       asa       NaN       0 NA   
 7 1     -       bca       NaN       0 NA   
 8 1     solvent -          30.2    24  1.14
 9 1     solvent phe       NaN       0 NA   
10 1     solvent epz       NaN       0 NA   
# ... with 536 more rows

Finally, by running these:

na.omit(feed)%>% 
mutate( s1 = (m1+n1+b1), s2 = (m2+n2+b2), s3 = (m3+n3+b3)  ) %>%
gather('s1','s2','s3', key = 's_id', value = 's') %>%    ## in 'tidyr'
group_by(group, extract, chemical) %>% 
dplyr::summarise(s_mean = mean(s), n = n(), s_se = sd(s)/sqrt(n())  )

You get expected summary :

# A tibble: 18 x 6
# Groups:   group, extract [18]
   group extract   chemical s_mean     n  s_se
   <fct> <fct>     <fct>     <dbl> <int> <dbl>
 1 1     -         -          30.4    27 1.34 
 2 1     solvent   -          30.2    24 1.14 
 3 1     ctl_plant phe        31.3    27 1.36 
 4 2     ctl_plant phe        30.7    27 1.13 
 5 2     plant_A   epz        29.1    27 1.18 
 6 2     plant_B   epz        28.3    27 0.924
 7 3     ctl_plant phe        31.9    27 1.31 
 8 3     plant_C   rel        31.5    27 0.917
 9 3     plant_D   rel        31.0    27 1.02 
10 4     ctl_plant phe        30.3    27 1.24 
11 4     plant_E   del        30.4    27 1.35 
12 4     plant_F   del        29.1    27 0.847
13 5     ctl_plant phe        30.3    27 1.07 
14 5     plant_G   asa        19.8    24 0.801
15 5     plant_H   asa        18.7    24 0.844
16 6     ctl_plant phe        29.9    27 0.961
17 6     plant_I   bca        19.0    24 1.01 
18 6     plant_J   bca        21.2    24 1.02 
CategoriesR