


How To Add All Two Way Interactions In R

When doing linear modeling or ANOVA it's useful to examine whether the effect of one variable depends on the level of one or more other variables. If it does then we have what is called an "interaction". This means variables combine or interact to affect the response. The simplest type of interaction is the interaction between two two-level categorical variables. Let's say we have gender (male and female), treatment (yes or no), and a continuous response measure. If the response to treatment depends on gender, then we have an interaction.

Using R, we can simulate data such as this. The following code first generates a vector of gender labels, 20 each of "male" and "female". Then it generates treatment labels, 10 each of "yes" and "no", alternating twice so we have 10 treated and 10 untreated for each gender. Next we generate the response by randomly sampling from two different normal distributions, one with mean 15 and the other with mean 10. Notice we create an interaction by sampling from the distributions in a different order for each gender. Finally we combine our vectors into a data frame.

gender <- gl(n = 2, k = 20, labels = c("male","female"))
trt <- rep(rep(c("yes","no"), each = 10), 2)
set.seed(1)
resp <- c(
  rnorm(n = 20, mean = rep(c(15,10), each = 10)),
  rnorm(n = 20, mean = rep(c(10,15), each = 10))
  )
dat <- data.frame(gender, trt, resp)

Now that we have our data, let's see how the mean response changes based on the two "main" effects:

aggregate(resp ~ gender, data = dat, mean)

##   gender     resp
## 1   male 12.69052
## 2 female 12.49353

aggregate(resp ~ trt, data = dat, mean)

##   trt     resp
## 1  no 12.68479
## 2 yes 12.49926

Neither appears to have any effect on the mean response value. But what about their interaction? We can see this by looking at the mean response by both gender and trt using tapply:

with(dat, tapply(resp, list(gender, trt), mean))

##              no       yes
## male   10.24884 15.132203
## female 15.12073  9.866327

Now we see something happening. The effect of trt depends on gender. If you're male, trt causes the mean response to increase by about 5. If you're female, trt causes the mean response to decrease by about 5. The two variables interact.
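Those shifts can be computed directly from the table of means. A minimal sketch, assuming the dat data frame simulated above:

```r
# treatment effect (yes minus no) within each gender
m <- with(dat, tapply(resp, list(gender, trt), mean))
m[, "yes"] - m[, "no"]
## male about 4.88, female about -5.25
```

The two differences being roughly equal and opposite is what makes the lines in the interaction plot below cross.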

A helpful function for visualizing interactions is interaction.plot. It basically plots the means we just examined and connects them with lines. The first argument, x.factor, is the variable you want on the x-axis. The second argument, trace.factor, is how you want to group the lines it draws. The third argument, response, is your response variable.

interaction.plot(ten.factor = dat$trt,                   trace.factor = dat$gender,                  response = dat$resp)          

[Figure int01: interaction plot of resp by trt, grouped by gender]
The resulting plot shows an interaction. The lines cross. At the ends of each line are the means we previously examined. A plot such as this can be useful in visualizing an interaction and providing some sense of how strong it is. This is a very strong interaction as the lines are nearly perpendicular. An interaction where the lines cross is sometimes called an "interference" or "antagonistic" interaction effect.

Boxplots can also be useful in detecting and visualizing interactions. Below we use the formula notation to specify that "resp" be plotted by the interaction of gender and trt. That's what the asterisk means in formula notation.

boxplot(resp ~ gender * trt, data = dat)

[Figure int02: boxplots of resp by the interaction of gender and trt]

By interacting two two-level variables we basically get a new four-level variable. We see once again that the effect of trt flips depending on gender.

A common method for analyzing the effect of categorical variables on a continuous response variable is the Analysis of Variance, or ANOVA. In R we can do this with the aov function. Once again we use the formula notation to specify the model. Below it says "model response as a function of gender, treatment, and the interaction of gender and treatment."

aov1 <- aov(resp ~ trt * gender, data = dat)
summary(aov1)

##             Df Sum Sq Mean Sq F value Pr(>F)    
## trt          1   0.34    0.34   0.415  0.524    
## gender       1   0.39    0.39   0.468  0.499    
## trt:gender   1 256.94  256.94 309.548 <2e-16 ***
## Residuals   36  29.88    0.83                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The main effects by themselves are not significant but the interaction is. This makes sense given our aggregated means above. We saw that the mean response was virtually no different based on gender or trt alone, but did vary substantially when both variables were combined. We can extract the same information from our aov1 object using the model.tables function, which reports the grand mean, the means by main effects, and the means by the interaction:

model.tables(aov1, type = "means")

## Tables of means
## Grand mean
##          
## 12.59203 
## 
##  trt 
## trt
##     no    yes 
## 12.685 12.499 
## 
##  gender 
## gender
##   male female 
## 12.691 12.494 
## 
##  trt:gender 
##      gender
## trt   male   female
##   no  10.249 15.121
##   yes 15.132  9.866

We can also fit a linear model to these data using the lm function:

lm1 <- lm(resp ~ trt * gender, data = dat)
summary(lm1)

## 
## Call:
## lm(formula = resp ~ trt * gender, data = dat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.4635 -0.4570  0.1093  0.6152  1.4631 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          10.2488     0.2881   35.57  < 2e-16 ***
## trtyes                4.8834     0.4074   11.98 3.98e-14 ***
## genderfemale          4.8719     0.4074   11.96 4.26e-14 ***
## trtyes:genderfemale -10.1378     0.5762  -17.59  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9111 on 36 degrees of freedom
## Multiple R-squared:  0.8961, Adjusted R-squared:  0.8874 
## F-statistic: 103.5 on 3 and 36 DF,  p-value: < 2.2e-16

This returns a table of coefficients. (Incidentally we can get these same coefficients from the aov1 object by using coef(aov1).) Notice everything is "significant". This just means the coefficients are significantly different from 0. It does not contradict the ANOVA results. Nor does it mean the main effects are significant. If we want a test for the significance of main effects, we can use anova(lm1), which outputs the same ANOVA table that aov created.

The intercept in the linear model output is simply the mean response for gender="male" and trt="no". (Compare it to the model.tables output above.) The coefficient for "genderfemale" is what you add to the intercept to get the mean response for gender="female" when trt="no". Likewise, the coefficient for "trtyes" is what you add to the intercept to get the mean response for trt="yes" when gender="male".

The remaining combination to estimate is gender="female" and trt="yes". For those settings, we add all the coefficients together to get the mean response for gender="female" when trt="yes". Because of this it's difficult to interpret the coefficient for the interaction. What does -10 mean exactly? In some sense, at least in this example, it basically offsets the main effects of gender and trt. If we look at the interaction plot again, we see that trt="yes" and gender="female" has virtually the same mean response as trt="no" and gender="male".
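To make the coefficient arithmetic concrete, here is a small sketch that rebuilds all four cell means from the coefficients (assuming the lm1 object fit above):

```r
cs <- coef(lm1)
cs[[1]]                                # male,   trt=no  (intercept)
cs[[1]] + cs[[2]]                      # male,   trt=yes (+ trtyes)
cs[[1]] + cs[[3]]                      # female, trt=no  (+ genderfemale)
cs[[1]] + cs[[2]] + cs[[3]] + cs[[4]]  # female, trt=yes (all four added)
```

Each sum reproduces the corresponding cell in the model.tables output, which is why the interaction coefficient of about -10 is best read as an offset to the other coefficients rather than as an effect on its own.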

lm and aov both give the same results but show different summaries. In fact, aov is just a wrapper for lm. The only reason to use aov is to create an aov object for use with functions such as model.tables.

Using the effects package we can create a formal interaction plot with standard error bars to indicate the uncertainty in our estimates. Notice you can use it with either aov or lm objects.

library(effects)
plot(allEffects(aov1), multiline = TRUE, ci.style = "bars")
# this does the same:
plot(allEffects(lm1), multiline = TRUE, ci.style = "bars")

[Figure int03: effects-package interaction plot with error bars]
Another type of interaction is one in which the variables combine to amplify an effect. Let's simulate some data to demonstrate. When simulating the response we establish a treatment effect for the first 20 observations by sampling 10 each from N(10,1) and N(13,1) distributions, respectively. We then amplify that effect by gender for the next 20 observations by sampling from N(25,1) and N(17,1) distributions, respectively.

set.seed(12)
resp <- c(
  rnorm(n = 20, mean = rep(c(10,13), each = 10)),
  rnorm(n = 20, mean = rep(c(25,17), each = 10))
)

dat2 <- data.frame(gender, trt, resp)
interaction.plot(x.factor = dat2$trt, 
                 trace.factor = dat2$gender,
                 response = dat2$resp)

[Figure int04: interaction plot for dat2 — the lines diverge]

In this interaction the lines diverge. An interaction effect like this is sometimes called a "reinforcement" or "synergistic" interaction effect. We see there's a difference between genders when trt="no", but that difference is reinforced when trt="yes" for each gender.

Running an ANOVA on these data reveals a significant interaction, as we expect, but notice the main effects are significant as well.

aov2 <- aov(resp ~ trt * gender, data = dat2)
summary(aov2)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## trt          1   61.3    61.3   77.36 1.71e-10 ***
## gender       1  985.8   985.8 1244.88  < 2e-16 ***
## trt:gender   1  330.3   330.3  417.09  < 2e-16 ***
## Residuals   36   28.5     0.8                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

That means the effects of gender and trt individually explain a fair amount of variability in the data. We can get a feel for this by looking at the mean response for each of these variables in addition to the mean response by the interaction.

model.tables(aov2, type = "means")

## Tables of means
## Grand mean
##          
## 16.13323 
## 
##  trt 
## trt
##     no    yes 
## 14.896 17.371 
## 
##  gender 
## gender
##   male female 
## 11.169 21.098 
## 
##  trt:gender 
##      gender
## trt   male   female
##   no  12.805 16.987
##   yes  9.533 25.209

Fitting a linear model provides a table of coefficients, but once again it's hard to interpret the interaction coefficient. As before, the intercept is the mean response for males with trt="no" while the other coefficients are what we add to the intercept to get the other three mean responses. And of course we can make a formal interaction plot with error bars.

lm2 <- lm(resp ~ trt * gender, data = dat2)
summary(lm2)

## 
## Call:
## lm(formula = resp ~ trt * gender, data = dat2)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.53043 -0.50900 -0.08787  0.40449  2.08545 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)          12.8048     0.2814  45.503  < 2e-16 ***
## trtyes               -3.2720     0.3980  -8.222 8.81e-10 ***
## genderfemale          4.1818     0.3980  10.508 1.63e-12 ***
## trtyes:genderfemale  11.4942     0.5628  20.423  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8899 on 36 degrees of freedom
## Multiple R-squared:  0.9797, Adjusted R-squared:  0.978 
## F-statistic: 579.8 on 3 and 36 DF,  p-value: < 2.2e-16

plot(allEffects(aov2), multiline = TRUE, ci.style = "bars")

[Figure int05: effects-package interaction plot for dat2 with error bars]
What about data with no interaction? How does that look? Let's first simulate it. Notice how we generated the response. The means of the distributions change for each gender, but the difference between them does not.

set.seed(12)
resp <- c(
  rnorm(n = 20, mean = rep(c(10,15), each = 10)),
  rnorm(n = 20, mean = rep(c(12,17), each = 10))
)

dat3 <- data.frame(gender, trt, resp)
interaction.plot(x.factor = dat3$trt, 
                 trace.factor = dat3$gender,
                 response = dat3$resp)

[Figure int06: interaction plot for dat3 — the lines are parallel]
The lines are basically parallel, indicating the absence of an interaction effect. The effect of trt does not depend on gender. If we do an ANOVA, we see the interaction is not significant.

summary(aov(resp ~ trt * gender, data = dat3))

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## trt          1 252.50  252.50 318.850  < 2e-16 ***
## gender       1  58.99   58.99  74.497 2.72e-10 ***
## trt:gender   1   0.61    0.61   0.771    0.386    
## Residuals   36  28.51    0.79                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Of course statistical "significance" is only one of several things to check. If your data set is large enough, even the smallest interaction will appear significant. That's where an interaction plot can help you determine whether a statistically significant interaction is also meaningfully significant.
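As a quick illustration of this point, the following sketch (hypothetical effect sizes, not the data above) simulates an interaction of only 0.1 units with 50,000 observations per cell. The interaction term will typically test as highly significant despite being practically negligible:

```r
set.seed(99)
n  <- 50000                                   # observations per cell
g  <- gl(2, 2*n, labels = c("a","b"))
tr <- rep(rep(c("no","yes"), each = n), 2)
mu <- ifelse(g == "b" & tr == "yes", 0.1, 0)  # tiny interaction: 0.1 units
y  <- rnorm(4*n, mean = mu, sd = 1)
summary(aov(y ~ g * tr))                      # g:tr tests as "significant"
```

An interaction plot of these data would show two nearly parallel lines, a reminder to judge the size of the effect and not just its p-value.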

Interactions can also happen between a continuous and a categorical variable. Let's see what this looks like by simulating some data. This time we generate our response by using a linear model with some random noise from a Normal distribution, and then we plot the data using ggplot. Notice how we map the color of the dots to gender.

set.seed(2)
gender <- sample(0:1, size = 40, replace = TRUE)
x1 <- sort(runif(n = 40, min = 0, max = 4))
y <- 20 + 7*x1 + 10*gender - 5*gender*x1 + rnorm(40, sd = 0.7)
dat4 <- data.frame(gender = factor(gender, labels = c("male","female")), x1, y)

library(ggplot2)
ggplot(dat4, aes(x = x1, y = y, color = gender)) + geom_point()

[Figure int07: scatterplot of y versus x1, colored by gender]

This looks a lot like our first interaction plot, except we have scattered dots replacing lines. As the x1 variable increases, the response increases for both genders, but it increases much more dramatically for males. To analyze these data we use the Analysis of Covariance, or ANCOVA. In R this simply means we use lm to fit the model. Because the two scatters of points intersect, we want to include an interaction.

lm3 <- lm(y ~ x1 * gender, data = dat4)
summary(lm3)

## 
## Call:
## lm(formula = y ~ x1 * gender, data = dat4)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.29713 -0.54586 -0.04462  0.50494  1.84173 
## 
## Coefficients:
##                 Estimate Std. Error t value Pr(>|t|)    
## (Intercept)      20.7300     0.3539   58.58   <2e-16 ***
## x1                6.7247     0.1467   45.85   <2e-16 ***
## genderfemale      8.7090     0.4746   18.35   <2e-16 ***
## x1:genderfemale  -4.5517     0.1973  -23.07   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.7681 on 36 degrees of freedom
## Multiple R-squared:  0.9852, Adjusted R-squared:  0.984 
## F-statistic:   800 on 3 and 36 DF,  p-value: < 2.2e-16

Unlike the previous linear models with two categorical predictors, the coefficients in this model have ready interpretations. If we think of gender taking the value 0 for males and 1 for females, we see that the coefficients for the Intercept and x1 are the intercept and slope for the best fitting line through the "male" scatterplot. We can plot that as follows:

# save the coefficients into a vector
cs <- coef(lm3)
ggplot(dat4, aes(x = x1, y = y, color = gender)) + geom_point() + 
  geom_abline(intercept = cs[1], slope = cs[2])

[Figure int08: scatterplot with the fitted line for males]
The genderfemale coefficient is what we add to the intercept when gender = 1 (i.e., for females). Likewise, the interaction coefficient is what we add to the x1 coefficient when gender = 1. Let's plot that line as well.

ggplot(dat4, aes(x = x1, y = y, color = gender)) + geom_point() + 
  geom_abline(intercept = cs[1], slope = cs[2]) + 
  geom_abline(intercept = cs[1] + cs[3], slope = cs[2] + cs[4])

[Figure int09: scatterplot with fitted lines for both genders]

The gender coefficient is the difference in intercepts, while the interaction coefficient is the difference in slopes. The former may not be of much interest, but the latter is certainly important. It tells us that the slope for females is about 4.5 lower than the slope for males. ggplot will actually plot these lines for us with the geom_smooth function and method = "lm":

ggplot(dat4, aes(x = x1, y = y, color = gender)) + geom_point() + 
  geom_smooth(method = "lm")

[Figure int10: geom_smooth regression lines with confidence bands]
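One way to verify the difference-in-slopes reading is to fit a separate regression for each gender; because the interaction model fully stratifies by gender, the per-group slopes should match the coefficient sums exactly. A sketch assuming the dat4 data frame and lm3 fit above:

```r
cs <- coef(lm3)
fit_m <- lm(y ~ x1, data = subset(dat4, gender == "male"))
fit_f <- lm(y ~ x1, data = subset(dat4, gender == "female"))
coef(fit_m)[["x1"]]  # matches cs[["x1"]], about 6.7
coef(fit_f)[["x1"]]  # matches cs[["x1"]] + cs[["x1:genderfemale"]], about 2.2
```

The only thing the separate fits lose is the pooled residual variance, which is why the interaction model also gives a direct standard error and test for the slope difference.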

Or we can use the effects package again.

library(effects)
plot(allEffects(lm3), multiline = TRUE, ci.style = "bands")

[Figure int11: effects-package plot with confidence bands]

It looks like an interaction plot! The difference here is how uncertainty is expressed. With categorical variables the uncertainty is expressed as bars at the ends of the lines. With a continuous variable, the uncertainty is expressed as bands around the lines.

Interactions can get still more complicated. Two continuous variables can interact. Three variables can interact. You can have multiple two-way interactions. And so on. Even though software makes it easy to fit lots of interactions, Kutner, et al. (2005) suggest keeping two things in mind when fitting models with interactions:

1. Adding interaction terms to a regression model can result in high multicollinearities. A partial remedy is to center the predictor variables.

2. When you have a large number of predictor variables, the potential number of interactions is large. Therefore it's desirable, if possible, to identify those interactions that most likely influence the response. One thing you can try is plotting the residuals of a main-effects-only model against different interaction terms to see which ones appear to be influential in affecting the response.

As an example of #1, run the following R code to see how centering the predictor variables reduces the variance inflation factors (VIF). A VIF in excess of 10 is usually taken as an indication that multicollinearity is influencing the model. Before centering, the VIF is about 60 for the main effects and 200 for the interaction. But after centering they fall well below 10.

# stackloss: sample data that come with R
fit1 <- lm(stack.loss ~ Air.Flow * Water.Temp, data = stackloss)
library(faraway) # for vif function
vif(fit1)
# center variables and re-check VIF
fit2 <- lm(stack.loss ~ I(Air.Flow - mean(Air.Flow)) * 
             I(Water.Temp - mean(Water.Temp)), data = stackloss)
vif(fit2)

As an example of #2, the following R code fits a main-effects-only model and then plots the residuals against interaction terms. You'll notice that none appear to influence the response. There is no pattern in the plots. Hence we may decide not to model interactions.

fit3 <- lm(stack.loss ~ Air.Flow + Water.Temp + Acid.Conc., data = stackloss)
plot(stackloss$Air.Flow*stackloss$Water.Temp, residuals(fit3))
plot(stackloss$Air.Flow*stackloss$Acid.Conc., residuals(fit3))
plot(stackloss$Water.Temp*stackloss$Acid.Conc., residuals(fit3))

Reference: Kutner, et al. (2005). Applied Linear Statistical Models. McGraw-Hill. (Ch. 8)

For questions or clarifications regarding this article, contact the UVA Library StatLab: statlab@virginia.edu

View the entire collection of UVA Library StatLab articles.

Clay Ford
Statistical Research Consultant
University of Virginia Library
March 25, 2016


Source: https://data.library.virginia.edu/understanding-2-way-interactions/

