R-bloggers

R news and tutorials contributed by hundreds of R bloggers

Some Basic Concepts about Design of Experiments and How to Perform their Analysis in R

Posted on January 15, 2022 by R in the Lab in R bloggers | 0 Comments

[This article was first published on R in the Lab, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here) Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

Basic design of experiments in R for one factor and two factors designs.

You can find all the code, data and results in the GitHub repository for this post: Basic design of experiments.

There is no signal without noise

It never hurts to go back to basics before tackling more complex things. The purpose of this post is to give a brief overview of the basics of design of experiments, their analysis and how to present results using R and packages like ggplot2 and agricolae. Included are one- and two-factor experiments.

What is the design of experiments?

The design of experiments (DOE) deals with the planning and performance of tests with the objective of generating data. Statistical analysis of these data will provide objective evidence that will allow the researcher to resolve questions about a given situation, process or phenomenon.

DOE can be applied to scientific research and in industry. Its goal is to generate knowledge and learning in an efficient manner. Ideally, this process can be considered as a cycle where initial hypotheses are refined as more information is obtained. Thus, we start with an initial hypothesis that we test through experimentation. If the data (our evidence) does not agree with the hypothesis, a new hypothesis is sought to explain the observed discrepancy.

Hypothesis?

Hypotheses are tentative explanations of the research phenomenon that are stated as propositions or assertions. There are different types of hypotheses, such as:

Research hypotheses. Tentative propositions about possible relationships between two or more variables.
Null hypothesis. Propositions that deny or refute the relationship between variables.
Alternative hypotheses. These are different or “alternate” possibilities to the research and null hypotheses.

Data and measurements

A datum is a number that is the product of a measurement. A measurement is a process that links abstract concepts with empirical indicators. Of course, measurements could not be made without a measurement instrument, which can be defined as the resource used by the researcher to record information or data.

Other basic definitions in the design of experiments

Experiment

It is a change in the operating conditions of a system or process, which is made with the objective of measuring the effect of the change on one or more properties of the product or result.

Experimental unit

Piece(s) or sample(s) used to generate a value that is representative of the test result.

Response variable (Depedent variable)

Through these variable(s) we know the effect or results of each experimental test. They are the product of our measurements once we make changes in the system.

Controllable factors (Independent variable)

They are process variables that can be modified and fixed at a given level.

Non-controllable factors

These are variables that cannot be controlled during the experiment or the normal operation of the process.

Factors studied

These are the variables that are investigated in the experiment to observe how they affect or influence the response variables.

Levels and treatments

The different values assigned to each factor studied in an experimental design are called levels. A combination of levels of all the factors studied is called a treatment or design point. In the case of experimenting with a single factor, each level is a treatment.

Design matrix

It is the arrangement formed by the treatments that will be carried out, including the repetitions. In this same array are usually included the results of each response variable for each treatment and repetition. For me this is a concept of great importance, since from this matrix it is possible to perform directly the different statistical tests in R.

Random error

It is the observed variability that cannot be explained by the factors studied. It is the result of the effect of factors not studied and experimental error.

Experimental error

A component of random error that reflects the experimenter’s errors in the planning and execution of the experiment.

Basic principles of experimental design

Randomization. Performing the experimental runs (treatments) in a randomized manner is essential to ensure, as far as possible, the assumption of independence of errors.
Repetition. It is running a treatment or combination of factors more than once. This makes it possible to estimate random error and in turn calculate realistic statistics in data analysis.
Blocking. This consists of nullifying the effect of factors in which we are not really interested and which may affect the response variables. Not taking into account their effect can lead us to erroneous conclusions about the factors in which we are interested.

Some concepts of statistical inference

Population. In a very general way, it can be defined as the totality of possible individuals, specimens, objects or measurements of interest on which a study is made. Populations can be finite or infinite.
Parameters. Characteristics that, by means of their numerical value, describe the entire population.
Representative sample. An appropriately selected part of a population that retains key aspects of a population.
Statistic. A quantity obtained from the data of a sample that summarizes the characteristics of the sample.
Statistical inference. These are statistical statements about the population or process based on the information contained in the sample.
Probability distribution. It relates the set of values of a characteristic in a given population to the probability associated with each of these values.

Some hypothesis testing concepts

Statistical hypothesis. An assertion about the values of the parameters of a population or process, which can be tested from the information in a sample.
Test statistic. Formula with which a number is calculated from the data, the magnitude of which makes it possible to determine whether or not the null hypothesis is rejected.
Acceptance region. These are the possible values of the test statistic where the null hypothesis is not rejected.
Rejection region. It is the set of possible values of the test statistic that lead to rejecting the null hypothesis.
Type I error. It is when a null hypothesis that is true is rejected. It can also be called false positive.
Type II error. This is when a null hypothesis that is false is accepted. It is basically a false negative.
Power of the test. It is the probability of rejecting the null hypothesis when it is false.
Predefined significance. It is the maximum risk that the experimenter is willing to take with respect to the type I error. It is denoted by α.
Observed significance. It is the area under the reference distribution beyond the value of the test statistic. It is called p-value.

Experiments with a single factor

When the objective is to study the effect of a single factor on the mean of a response variable taking into account more than two values of the factor under consideration, it is advisable to perform a completely randomized experiment and analyze the results by analysis of variance (ANOVA). The reason for comparing means by ANOVA and not by Student’s t-tests is that the latter test increases false positives as the number of means to be compared increases, i.e., we will find differences between means where there are none.

The analysis of variance consists of separating the total variation observed in each of the sources that contribute to it.

Design and results

Obtaining a single factor design is very simple using R commands. Let’s say we want to compare the effect of three baking times (35, 45 and 60 min) in the average diameter of cookie batches. For each time we baked ten cookie batches so we have ten replicates. Note that each time and replicate have to be run in aleatory order:

set.seed(6) b_time

Subsequently, the design can be exported in some other format such as CSV:

write.csv(one_fct_dgn, "data/one_fct_dgn.csv", row.names = FALSE)

The results can be integrated into the design matrix directly, let’s simulate the average diameter with a small function:

av_diam set.seed(3) diameter

Or, once the results have been recorded externally, they can be imported from a CSV file:

exp_one_factor

Analysis

The aov function is used for the analysis of factorial designs. Previously, it is recommended to convert the factor values into the factor class and then perform the analysis. Here I going to analyze our first design ( one_fct_dgn ):

one_fct_dgn$b_time 
Note the aov requires a formula that explicitly specifies the relationship between the response and the factor(s) in the design. Once this has been done, it is possible to obtain an ANOVA table using the “anova” function:
one_fct_anova F) ## b_time 2 367.51 183.755 44.902 2.587e-09 *** ## Residuals 27 110.49 4.092 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
And export it in the format of your choice:
write.csv(one_fct_anova, "results/anova_one_fct.csv", na = "", row.names = FALSE)
Multiple range comparisons or tests
Once significant differences are established between at least two of the means, the next step is to perform a multiple comparison test, which will allow us to define which means are significantly different. There are several methods that can be used as the LSD method, Tukey’s method, or Duncan’s method. Nice functions for these methods can be found in agricolae package.
LSD
library(agricolae) one_fct_lsd 
Tukey HSD
one_fct_hsd 
Duncan
one_fct_duncan 
The results are consistent among the three methods. The section groups indicates which means are different, different letters mean significant differences. You can export the results of this kind of methods, as a TXT file, with the function capture.output . For example, to export the results of the Tukey method:
capture.output(one_fct_hsd, file = "results/hsd_one_fct.txt")
Verification of model assumptions
Something that is not very widespread in the scientific literature is the verification of the normality of the residuals, the equality between variances for the means of each treatment and the independence between each data recorded. Verification of these assumptions helps to validate the differences established as significant.
Shapiro-Wilks test to verify normality
This test is easy to perform:
one_fct_shapiro 
Here, if p-value > 0.05, we can say that data come, approximately, from a normal distribution.
The results of this analysis can be exported with the capture.output function:
capture.output(one_fct_shapiro, file = "results/shapiro_one_fct.txt")
Bartlett’s test for equality of variances
To perform this test it is possible to use the bartlett.test function:
one_fct_bt 
A p-value > 0.05 mean that treatment means have equal variances. We can export the results in the same way:
capture.output(one_fct_bt, file = "results/bartlett_one_fct.txt")
Independence
To verify independence between observations, residuals are plotted as a function of run order. If the residuals follow a defined pattern it will be a clear indication of lack of independence. For this plot I used the ggplot2 package:
library(ggplot2) one_fct_dgn$residuals 

Tests such as the Durbin-Watson test are also available to verify this assumption. Use the function durbinWatsonTest in the car package:
library(car) one_fct_dw 
A p-value > 0.05 means that data observations are not auto correlated. You can also save the results in the usual way:
capture.output(one_fct_dw, file = "results/durbin_watson_one_fct.txt")
Display of results
Bar graph
To visualize the results of this type of experiments, bar graphs are usually used, whose height represents the magnitude of each mean, together with error bars representing the standard deviation and letters showing the significant differences established by the multiple comparisons test:
library(dplyr) # Means, standard deviations and groups from Tukey HSD one_fct_means % group_by(b_time) %>% summarise( av_diam = mean(diameter), sd_diam = sd(diameter) ) %>% ungroup() %>% arrange(desc(av_diam)) %>% mutate(group = unlist(one_fct_hsd$groups["groups"])) # Bar plot ggplot(one_fct_means, aes(b_time, av_diam)) + geom_col(width = 0.5) + geom_errorbar( aes(ymin = av_diam - sd_diam, ymax = av_diam + sd_diam), width = 0.1 ) + geom_text(aes(label = group, y = av_diam + sd_diam), vjust = -0.5) + xlab("Baking time (min)") + ylab("Average diameter") + theme_classic()

Alternative
Although bar graphs are widely used in the scientific literature of various types, their use hides the true dispersion of the data for each treatment. One thing I have also noticed is that people tend to judge differences between means by the overlap or separation of standard deviations, which seems to me to be more of a bias than an objective criterion.
As an alternative to the bar chart, it is possible to make a graph showing each observation, the mean, and the letters obtained in the multiple comparisons test:
ggplot(one_fct_dgn, aes(b_time, diameter)) + geom_point() + stat_summary(fun = mean, geom = "crossbar", width = 0.2, color = "blue") + geom_text( data = one_fct_means, aes(label = group, y = av_diam), hjust = -3 ) + xlab("Baking time (min)") + ylab("Diameter") + theme_classic()

Two-factor designs
In this type of design the objective is to verify the individual and interaction effect of two factors. Before showing how to obtain and analyze them, it is necessary to recall some concepts:

  Qualitative factor. Its levels take discrete or nominal values.
  Quantitative factor. Its levels can take any value within a certain interval. The scale is continuous.
  Effect of a factor. It is the change observed in the response variable due to a change in the level of the factor.
  Interaction effect. Two factors interact significantly on the response variable when the effect of one depends on the level of the other.

Design and results
To obtain this kind of designs we can use the function fac.design from DoE.base package. For this we need specify the number of factors and the number of levels of each factor:
library(DoE.base) two_fct_dgn 
For more details about this function, just type ?fac.design in your console. Also note that there is an extra column with the name “Blocks”, this is not big deal since there is just one and we won’t taking it into account further in the analysis. In future posts I will address block designs.
This design can also be exported in the desired format:
write.csv(two_fct_dgn, file = "data/two_fct_dgn.csv", row.names = FALSE)
The results can also be integrated directly or imported once recorded in the design matrix. Here, I’m going to import and work with data previously recorded:
exp_two_factor 
Analysis
To analyze this type of design, the “aov” function is used, specifying each main effect and the interaction effect in the formula:
exp_two_factor$factor1 
You can also use the short form Y ~ A*B , which will be displayed in full once you get the ANOVA table:
two_fct_aov F) ## factor1 3 2125.11 708.37 24.6628 1.652e-07 *** ## factor2 2 3160.50 1580.25 55.0184 1.086e-09 *** ## factor1:factor2 6 557.06 92.84 3.2324 0.01797 * ## Residuals 24 689.33 28.72 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 write.csv(two_fct_anova, file = "results/anova_two_fct.csv", na = "")
Note that previously I also converted the values of each factor in the factor class.
Multiple range comparisons tests
Once a significant interaction effect is established, the next step is to establish which means differ. For this, a Tukey test can be used, for which the interaction effect must first be explicitly specified in a separated aov model:
two_fct_aov_2 
The results of Tukey’s test can be exported directly:
capture.output(two_fct_hsd, file = "results/hsd_two_fct.txt")
Verification of model assumptions
For the verification of the assumptions it is possible to proceed in a manner similar to that of the single-factor designs.
Shapiro-Wilks test to verify normality
two_fct_shapiro 
Bartlett’s test for equality of variances
For Bartlett test I previously made an extra column specifying the combination between factors (treatments). This will indicate the groups to bartlett.test function:
exp_two_factor$treatment 
Independence
exp_two_factor$residuals 

two_fct_dw 
Display of results
Bar graph
The realization of this type of graphs requires keeping in mind the individual means of each treatment and the interaction effects, if there is some. To calculate them together with the standard deviations we used the “group_by” function and then “summarise” from the “dplyr” package:
# Means, standard deviations and groups from Tukey HSD two_fct_means % group_by(factor1, factor2) %>% summarise( av_res = mean(res), sd_res = sd(res) ) %>% ungroup() %>% arrange(desc(av_res)) %>% mutate(group = unlist(two_fct_hsd$groups["groups"])) # Bar plot ggplot(two_fct_means, aes(factor1, av_res, fill = factor2)) + geom_col(width = 0.5, position = position_dodge()) + geom_errorbar( aes(ymin = av_res - sd_res, ymax = av_res + sd_res), width = 0.1, position = position_dodge(width = 0.5) ) + geom_text( aes(label = group, y = av_res + sd_res), vjust = -0.5, position = position_dodge(width = 0.5) ) + labs(fill = "Factor 2") + xlab("Factor 1") + ylab("Average response") + theme_classic()

Alternative
To make the graph showing each observation, the mean, a confidence interval and the letters obtained in the multiple comparisons test, it is necessary to use the data.frame with the observations next to the one with the means:
ggplot(exp_two_factor, aes(factor1, res, group = factor2)) + geom_point(aes(color = factor2), position = position_dodge(width = 0.9)) + stat_summary( fun = mean, geom = "crossbar", width = 0.2, color = "blue", position = position_dodge(width = 0.9) ) + geom_text( data = two_fct_means, aes(label = group, y = av_res), vjust = -4, size = 3.5, position = position_dodge(width = 0.9) ) + ylim(c(55, 130)) + xlab("Factor 1") + ylab("Response") + labs(color = "Factor 2") + theme_classic()

 ## Interaction plot
A good way to visualize the interaction effect is through interaction.plot function:
Factor1 
 
That’s it! Thank you very much for visiting this site, I hope you find the content of this post useful. See you soon!
Juan Pablo Carreón Hidalgo 🤓