R-bloggers

R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Some Basic Concepts about Design of Experiments and How to Perform their Analysis in R

Posted on January 15, 2022 by R in the Lab in R bloggers | 0 Comments

[This article was first published on R in the Lab, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Basic design of experiments in R for one factor and two factors designs.

You can find all the code, data and results in the GitHub repository for this post: Basic design of experiments.

There is no signal without noise

It never hurts to go back to basics before tackling more complex things. The purpose of this post is to give a brief overview of the basics of design of experiments, their analysis and how to present results using R and packages like ggplot2 and agricolae. Included are one- and two-factor experiments.

What is the design of experiments?

The design of experiments (DOE) deals with the planning and performance of tests with the objective of generating data. Statistical analysis of these data will provide objective evidence that will allow the researcher to resolve questions about a given situation, process or phenomenon.

DOE can be applied to scientific research and in industry. Its goal is to generate knowledge and learning in an efficient manner. Ideally, this process can be considered as a cycle where initial hypotheses are refined as more information is obtained. Thus, we start with an initial hypothesis that we test through experimentation. If the data (our evidence) does not agree with the hypothesis, a new hypothesis is sought to explain the observed discrepancy.

Hypothesis?

Hypotheses are tentative explanations of the research phenomenon that are stated as propositions or assertions. There are different types of hypotheses, such as:

Data and measurements

A datum is a number that is the product of a measurement. A measurement is a process that links abstract concepts with empirical indicators. Of course, measurements could not be made without a measurement instrument, which can be defined as the resource used by the researcher to record information or data.

Other basic definitions in the design of experiments

Experiment

It is a change in the operating conditions of a system or process, which is made with the objective of measuring the effect of the change on one or more properties of the product or result.

Experimental unit

Piece(s) or sample(s) used to generate a value that is representative of the test result.

Response variable (Depedent variable)

Through these variable(s) we know the effect or results of each experimental test. They are the product of our measurements once we make changes in the system.

Controllable factors (Independent variable)

They are process variables that can be modified and fixed at a given level.

Non-controllable factors

These are variables that cannot be controlled during the experiment or the normal operation of the process.

Factors studied

These are the variables that are investigated in the experiment to observe how they affect or influence the response variables.

Levels and treatments

The different values assigned to each factor studied in an experimental design are called levels. A combination of levels of all the factors studied is called a treatment or design point. In the case of experimenting with a single factor, each level is a treatment.

Design matrix

It is the arrangement formed by the treatments that will be carried out, including the repetitions. In this same array are usually included the results of each response variable for each treatment and repetition. For me this is a concept of great importance, since from this matrix it is possible to perform directly the different statistical tests in R.

Random error

It is the observed variability that cannot be explained by the factors studied. It is the result of the effect of factors not studied and experimental error.

Experimental error

A component of random error that reflects the experimenter’s errors in the planning and execution of the experiment.

Basic principles of experimental design

Some concepts of statistical inference

Some hypothesis testing concepts

Experiments with a single factor

When the objective is to study the effect of a single factor on the mean of a response variable taking into account more than two values of the factor under consideration, it is advisable to perform a completely randomized experiment and analyze the results by analysis of variance (ANOVA). The reason for comparing means by ANOVA and not by Student’s t-tests is that the latter test increases false positives as the number of means to be compared increases, i.e., we will find differences between means where there are none.

The analysis of variance consists of separating the total variation observed in each of the sources that contribute to it.

Design and results

Obtaining a single factor design is very simple using R commands. Let’s say we want to compare the effect of three baking times (35, 45 and 60 min) in the average diameter of cookie batches. For each time we baked ten cookie batches so we have ten replicates. Note that each time and replicate have to be run in aleatory order:

set.seed(6) b_time 

Subsequently, the design can be exported in some other format such as CSV:

write.csv(one_fct_dgn, "data/one_fct_dgn.csv", row.names = FALSE)

The results can be integrated into the design matrix directly, let’s simulate the average diameter with a small function:

av_diam set.seed(3) diameter 

Or, once the results have been recorded externally, they can be imported from a CSV file:

exp_one_factor 

Analysis

The aov function is used for the analysis of factorial designs. Previously, it is recommended to convert the factor values into the factor class and then perform the analysis. Here I going to analyze our first design ( one_fct_dgn ):

one_fct_dgn$b_time 

Note the aov requires a formula that explicitly specifies the relationship between the response and the factor(s) in the design. Once this has been done, it is possible to obtain an ANOVA table using the “anova” function:

one_fct_anova F) ## b_time 2 367.51 183.755 44.902 2.587e-09 *** ## Residuals 27 110.49 4.092 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

And export it in the format of your choice:

write.csv(one_fct_anova, "results/anova_one_fct.csv", na = "", row.names = FALSE)

Multiple range comparisons or tests

Once significant differences are established between at least two of the means, the next step is to perform a multiple comparison test, which will allow us to define which means are significantly different. There are several methods that can be used as the LSD method, Tukey’s method, or Duncan’s method. Nice functions for these methods can be found in agricolae package.

LSD

library(agricolae) one_fct_lsd 

Tukey HSD

one_fct_hsd 

Duncan

one_fct_duncan 

The results are consistent among the three methods. The section groups indicates which means are different, different letters mean significant differences. You can export the results of this kind of methods, as a TXT file, with the function capture.output . For example, to export the results of the Tukey method:

capture.output(one_fct_hsd, file = "results/hsd_one_fct.txt")

Verification of model assumptions

Something that is not very widespread in the scientific literature is the verification of the normality of the residuals, the equality between variances for the means of each treatment and the independence between each data recorded. Verification of these assumptions helps to validate the differences established as significant.

Shapiro-Wilks test to verify normality

This test is easy to perform:

one_fct_shapiro 

Here, if p-value > 0.05, we can say that data come, approximately, from a normal distribution.

The results of this analysis can be exported with the capture.output function:

capture.output(one_fct_shapiro, file = "results/shapiro_one_fct.txt")

Bartlett’s test for equality of variances

To perform this test it is possible to use the bartlett.test function:

one_fct_bt 

A p-value > 0.05 mean that treatment means have equal variances. We can export the results in the same way:

capture.output(one_fct_bt, file = "results/bartlett_one_fct.txt")

Independence

To verify independence between observations, residuals are plotted as a function of run order. If the residuals follow a defined pattern it will be a clear indication of lack of independence. For this plot I used the ggplot2 package:

library(ggplot2) one_fct_dgn$residuals 

Tests such as the Durbin-Watson test are also available to verify this assumption. Use the function durbinWatsonTest in the car package:

library(car) one_fct_dw 

A p-value > 0.05 means that data observations are not auto correlated. You can also save the results in the usual way:

capture.output(one_fct_dw, file = "results/durbin_watson_one_fct.txt")

Display of results

Bar graph

To visualize the results of this type of experiments, bar graphs are usually used, whose height represents the magnitude of each mean, together with error bars representing the standard deviation and letters showing the significant differences established by the multiple comparisons test:

library(dplyr) # Means, standard deviations and groups from Tukey HSD one_fct_means % group_by(b_time) %>% summarise( av_diam = mean(diameter), sd_diam = sd(diameter) ) %>% ungroup() %>% arrange(desc(av_diam)) %>% mutate(group = unlist(one_fct_hsd$groups["groups"])) # Bar plot ggplot(one_fct_means, aes(b_time, av_diam)) + geom_col(width = 0.5) + geom_errorbar( aes(ymin = av_diam - sd_diam, ymax = av_diam + sd_diam), width = 0.1 ) + geom_text(aes(label = group, y = av_diam + sd_diam), vjust = -0.5) + xlab("Baking time (min)") + ylab("Average diameter") + theme_classic()

Alternative

Although bar graphs are widely used in the scientific literature of various types, their use hides the true dispersion of the data for each treatment. One thing I have also noticed is that people tend to judge differences between means by the overlap or separation of standard deviations, which seems to me to be more of a bias than an objective criterion.

As an alternative to the bar chart, it is possible to make a graph showing each observation, the mean, and the letters obtained in the multiple comparisons test:

ggplot(one_fct_dgn, aes(b_time, diameter)) + geom_point() + stat_summary(fun = mean, geom = "crossbar", width = 0.2, color = "blue") + geom_text( data = one_fct_means, aes(label = group, y = av_diam), hjust = -3 ) + xlab("Baking time (min)") + ylab("Diameter") + theme_classic()

Two-factor designs

In this type of design the objective is to verify the individual and interaction effect of two factors. Before showing how to obtain and analyze them, it is necessary to recall some concepts:

  • Qualitative factor. Its levels take discrete or nominal values.
  • Quantitative factor. Its levels can take any value within a certain interval. The scale is continuous.
  • Effect of a factor. It is the change observed in the response variable due to a change in the level of the factor.
  • Interaction effect. Two factors interact significantly on the response variable when the effect of one depends on the level of the other.

Design and results

To obtain this kind of designs we can use the function fac.design from DoE.base package. For this we need specify the number of factors and the number of levels of each factor:

library(DoE.base) two_fct_dgn 

For more details about this function, just type ?fac.design in your console. Also note that there is an extra column with the name “Blocks”, this is not big deal since there is just one and we won’t taking it into account further in the analysis. In future posts I will address block designs.

This design can also be exported in the desired format:

write.csv(two_fct_dgn, file = "data/two_fct_dgn.csv", row.names = FALSE)

The results can also be integrated directly or imported once recorded in the design matrix. Here, I’m going to import and work with data previously recorded:

exp_two_factor 

Analysis

To analyze this type of design, the “aov” function is used, specifying each main effect and the interaction effect in the formula:

exp_two_factor$factor1 

You can also use the short form Y ~ A*B , which will be displayed in full once you get the ANOVA table:

two_fct_aov F) ## factor1 3 2125.11 708.37 24.6628 1.652e-07 *** ## factor2 2 3160.50 1580.25 55.0184 1.086e-09 *** ## factor1:factor2 6 557.06 92.84 3.2324 0.01797 * ## Residuals 24 689.33 28.72 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 write.csv(two_fct_anova, file = "results/anova_two_fct.csv", na = "")

Note that previously I also converted the values of each factor in the factor class.

Multiple range comparisons tests

Once a significant interaction effect is established, the next step is to establish which means differ. For this, a Tukey test can be used, for which the interaction effect must first be explicitly specified in a separated aov model:

two_fct_aov_2 

The results of Tukey’s test can be exported directly:

capture.output(two_fct_hsd, file = "results/hsd_two_fct.txt")

Verification of model assumptions

For the verification of the assumptions it is possible to proceed in a manner similar to that of the single-factor designs.

Shapiro-Wilks test to verify normality

two_fct_shapiro 

Bartlett’s test for equality of variances

For Bartlett test I previously made an extra column specifying the combination between factors (treatments). This will indicate the groups to bartlett.test function:

exp_two_factor$treatment 

Independence

exp_two_factor$residuals 

two_fct_dw 

Display of results

Bar graph

The realization of this type of graphs requires keeping in mind the individual means of each treatment and the interaction effects, if there is some. To calculate them together with the standard deviations we used the “group_by” function and then “summarise” from the “dplyr” package:

# Means, standard deviations and groups from Tukey HSD two_fct_means % group_by(factor1, factor2) %>% summarise( av_res = mean(res), sd_res = sd(res) ) %>% ungroup() %>% arrange(desc(av_res)) %>% mutate(group = unlist(two_fct_hsd$groups["groups"])) # Bar plot ggplot(two_fct_means, aes(factor1, av_res, fill = factor2)) + geom_col(width = 0.5, position = position_dodge()) + geom_errorbar( aes(ymin = av_res - sd_res, ymax = av_res + sd_res), width = 0.1, position = position_dodge(width = 0.5) ) + geom_text( aes(label = group, y = av_res + sd_res), vjust = -0.5, position = position_dodge(width = 0.5) ) + labs(fill = "Factor 2") + xlab("Factor 1") + ylab("Average response") + theme_classic()

Alternative

To make the graph showing each observation, the mean, a confidence interval and the letters obtained in the multiple comparisons test, it is necessary to use the data.frame with the observations next to the one with the means:

ggplot(exp_two_factor, aes(factor1, res, group = factor2)) + geom_point(aes(color = factor2), position = position_dodge(width = 0.9)) + stat_summary( fun = mean, geom = "crossbar", width = 0.2, color = "blue", position = position_dodge(width = 0.9) ) + geom_text( data = two_fct_means, aes(label = group, y = av_res), vjust = -4, size = 3.5, position = position_dodge(width = 0.9) ) + ylim(c(55, 130)) + xlab("Factor 1") + ylab("Response") + labs(color = "Factor 2") + theme_classic()

## Interaction plot

A good way to visualize the interaction effect is through interaction.plot function:

Factor1 

That’s it! Thank you very much for visiting this site, I hope you find the content of this post useful. See you soon!

Juan Pablo Carreón Hidalgo 🤓