In both cases, summarize() does not display expected results no matter how you group_by().
(1) Loading both ‘dplyr’ and ‘plyr’.
(2) Changing the data layout after group_by().
The data below is from an insect feeding experiment divided into six groups in which insects were introduced on feeding diets containing different chemicals that were extracted by solvent from different plants. The number of saliva spots, classified as m- , n- or b-type, were counted from three randomly-selected areas (1-3) on each diet. The purpose here was simply to summarise mean value and SEM of saliva spots (represented as the total of m, n and b types) for all the areas for each treatment (‘group’ × ‘ extract’ × ‘chemical’).
Load the data ‘feed’ :
feed <- read.csv("http://fuminwang.net/wp-content/uploads/2020/01/feed.csv")
Order factor levels:
feed$group <- factor( feed$group, levels = unique(feed$group) )
feed$extract <- factor( feed$extract, levels = unique(feed$extract) )
feed$chemical <- factor( feed$chemical, levels = unique(feed$chemical) )
Suppose that you have the following packages loaded:
library(plyr)
library(dplyr)
library(tidyr)
Running these:
na.omit(feed)%>%
mutate( s1 = (m1+n1+b1), s2 = (m2+n2+b2), s3 = (m3+n3+b3) ) %>%
group_by(group, extract, chemical) %>%
gather('s1','s2','s3', key = 's_id', value = 's') %>% ## in 'tidyr'
summarise(s_mean = mean(s), n = n(), s_se = sd(s)/sqrt(n()) )
You will either get error, or get only one global summary like this :
s_mean n s_se
28.14013 471 0.3267366
The solution is to unload the ‘plyr’ (see also here), or simply call the summarise() from dplyr (plyr also support a summarise(), so let R know which one you would like to use).
detach("package:plyr", unload=TRUE)
# or
na.omit(feed)%>%
mutate( s1 = (m1+n1+b1), s2 = (m2+n2+b2), s3 = (m3+n3+b3) ) %>%
group_by(group, extract, chemical) %>%
gather('s1','s2','s3', key = 's_id', value = 's') %>% ## in 'tidyr'
dplyr::summarise(s_mean = mean(s), n = n(), s_se = sd(s)/sqrt(n()) )
then to make sure that group_by() follows gather() since the function in tidyr changed the data layout. Otherwise you will get these:
# A tibble: 546 x 6
# Groups: group, extract [78]
group extract chemical s_mean n s_se
<fct> <fct> <fct> <dbl> <int> <dbl>
1 1 - - 30.4 27 1.34
2 1 - phe NaN 0 NA
3 1 - epz NaN 0 NA
4 1 - rel NaN 0 NA
5 1 - del NaN 0 NA
6 1 - asa NaN 0 NA
7 1 - bca NaN 0 NA
8 1 solvent - 30.2 24 1.14
9 1 solvent phe NaN 0 NA
10 1 solvent epz NaN 0 NA
# ... with 536 more rows
Finally, by running these:
na.omit(feed)%>%
mutate( s1 = (m1+n1+b1), s2 = (m2+n2+b2), s3 = (m3+n3+b3) ) %>%
gather('s1','s2','s3', key = 's_id', value = 's') %>% ## in 'tidyr'
group_by(group, extract, chemical) %>%
dplyr::summarise(s_mean = mean(s), n = n(), s_se = sd(s)/sqrt(n()) )
You get expected summary :
# A tibble: 18 x 6
# Groups: group, extract [18]
group extract chemical s_mean n s_se
<fct> <fct> <fct> <dbl> <int> <dbl>
1 1 - - 30.4 27 1.34
2 1 solvent - 30.2 24 1.14
3 1 ctl_plant phe 31.3 27 1.36
4 2 ctl_plant phe 30.7 27 1.13
5 2 plant_A epz 29.1 27 1.18
6 2 plant_B epz 28.3 27 0.924
7 3 ctl_plant phe 31.9 27 1.31
8 3 plant_C rel 31.5 27 0.917
9 3 plant_D rel 31.0 27 1.02
10 4 ctl_plant phe 30.3 27 1.24
11 4 plant_E del 30.4 27 1.35
12 4 plant_F del 29.1 27 0.847
13 5 ctl_plant phe 30.3 27 1.07
14 5 plant_G asa 19.8 24 0.801
15 5 plant_H asa 18.7 24 0.844
16 6 ctl_plant phe 29.9 27 0.961
17 6 plant_I bca 19.0 24 1.01
18 6 plant_J bca 21.2 24 1.02