Session 8
MATH 80667A: Experimental Design and Statistical Methods
HEC Montréal
Covariate: an explanatory variable measured before the experiment.
Typically, it cannot be acted upon.
Examples
socioeconomic variables
environmental conditions
All ANOVA models covered so far are linear regression models.
The latter say that
$$\underbrace{\mathsf{E}(Y_i)}_{\text{average response}} = \underbrace{\beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi}}_{\text{linear (i.e., additive) combination of explanatories}}$$
In an ANOVA, the model matrix X simply includes columns with −1, 0 and 1 for group indicators that enforce sum-to-zero constraints.
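As a quick illustration (with a made-up three-level factor, not the study data), the sum-to-zero coding can be inspected directly with model.matrix():

```r
# Sum-to-zero (contr.sum) coding: the last level gets -1 in every column
options(contrasts = c("contr.sum", "contr.poly"))
group <- factor(rep(c("A", "B", "C"), each = 2))
model.matrix(~ group)
# The columns group1 and group2 contain only 1, 0 and -1 entries,
# and each column sums to zero over the three levels
```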
In experimental designs, the explanatories are assigned at random to participants. Random assignment implies no systematic difference between groups.
Identify external sources of variation to use as controls.
These steps should in principle increase power if the variables used as controls are correlated with the response.
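A minimal simulated sketch (hypothetical data, not the study) of why this works: a covariate that is correlated with the response soaks up residual variance, which shrinks the standard error of the treatment effect.

```r
# Simulated illustration: compare the precision of the treatment effect
# with and without a correlated covariate
options(contrasts = c("contr.treatment", "contr.poly"))
set.seed(80667)
n <- 300
group <- factor(rep(c("control", "treatment"), each = n / 2))
covariate <- rnorm(n)
# Response depends on the treatment and strongly on the covariate
response <- 2 * (group == "treatment") + 3 * covariate + rnorm(n)
fit_ancova <- lm(response ~ group + covariate)
fit_anova  <- lm(response ~ group)
# The standard error of the treatment effect is much smaller
# when the covariate is included
coef(summary(fit_ancova))["grouptreatment", "Std. Error"]
coef(summary(fit_anova))["grouptreatment", "Std. Error"]
```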
Abstract of van Stekelenburg et al. (2021)
In three experiments with more than 1,500 U.S. adults who held false beliefs, participants first learned the value of scientific consensus and how to identify it. Subsequently, they read a news article with information about a scientific consensus opposing their beliefs. We found strong evidence that in the domain of genetically engineered food, this two-step communication strategy was more successful in correcting misperceptions than merely communicating scientific consensus.
Aart van Stekelenburg, Gabi Schaap, Harm Veling and Moniek Buijzen (2021), Boosting Understanding and Identification of Scientific Consensus Can Help to Correct False Beliefs, Psychological Science https://doi.org/10.1177/09567976211007788
We focus on a single experiment; preregistered exclusion criteria led to n=442 total sample size (unbalanced design).
Three experimental conditions:
Boost
BoostPlus
Consensus only (consensus)
Use post as the response variable and prior beliefs as a control variable in the analysis of covariance.
Their response was measured on a visual analogue scale ranging from –100 (I am 100% certain this is false) to 100 (I am 100% certain this is true), with 0 (I don’t know) in the middle.
The average for the $r$th replication of the $i$th experimental group is
$$\mathsf{E}(\texttt{post}_{ir}) = \mu + \underbrace{\alpha_i}_{\texttt{condition}_i} + \beta\,\texttt{prior}_{ir}, \qquad \mathsf{Va}(\texttt{post}_{ir}) = \sigma^2.$$
We assume that there is no interaction between condition and prior: the slopes for prior are the same for each condition group.
Contrasts of interest:
average of the boosts (Boost and BoostPlus) versus the control (consensus)
Boost and BoostPlus (pairwise)
Inclusion of the prior score leads to increased precision for the mean (it reduces variability).
In the emmeans package, the average of the covariate is used as value.
The differences between conditions are the same for any value of prior (parallel lines), but the uncertainty changes.
Multiple testing adjustments: Holm–Bonferroni.
library(emmeans)
options(contrasts = c("contr.sum", "contr.poly"))
data(SSVB21_S2, package = "hecedsm")
# Check balance
with(SSVB21_S2, table(condition))
## condition
##     Boost BoostPlus consensus
##       149       147       146
library(ggplot2)
ggplot(data = SSVB21_S2, aes(x = prior, y = post)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE)
Strong correlation; note the responses that attain the maximum of the scale.
# Check that the data are well randomized
car::Anova(lm(prior ~ condition, data = SSVB21_S2), type = 2)
# Fit linear model with continuous covariate
model1 <- lm(post ~ condition + prior, data = SSVB21_S2)
# Fit model without, for comparison
model2 <- lm(post ~ condition, data = SSVB21_S2)
# Global tests for differences
car::Anova(model1)
car::Anova(model2)
term | sum of squares | df | statistic | p-value
---|---|---|---|---
condition | 14107 | 2 | 3.0 | 0.05
prior | 385385 | 1 | 166.1 | 0.00
Residuals | 1016461 | 438 | |
term | sum of squares | df | statistic | p-value
---|---|---|---|---
condition | 11680 | 2 | 1.83 | 0.162
Residuals | 1401846 | 439 | |
emm1 <- emmeans(model1, specs = "condition")
# Note order: Boost, BoostPlus, consensus
emm2 <- emmeans(model2, specs = "condition")
# Not comparable: one is detrended and the other isn't
contrast_list <- list(
  "boost vs control" = c(0.5, 0.5, -1), # av. of boosts vs consensus
  "Boost vs BoostPlus" = c(1, -1, 0))
contrast(emm1, method = contrast_list, adjust = "holm")
contrast | estimate | se | df | t stat | p-value
---|---|---|---|---|---
boost vs control | -8.37 | 4.88 | 438 | -1.72 | 0.09
Boost vs BoostPlus | 9.95 | 5.60 | 438 | 1.78 | 0.08

Contrasts for the model with prior (Holm–Bonferroni adjustment with k = 2 tests).
contrast | estimate | se | df | t stat | p-value
---|---|---|---|---|---
boost vs control | -5.71 | 5.71 | 439 | -1.00 | 0.32
Boost vs BoostPlus | 10.74 | 6.57 | 439 | 1.63 | 0.10

Contrasts for the model without prior.
# Test equality of variances
levene <- car::leveneTest(
  resid(model1) ~ condition,
  data = SSVB21_S2,
  center = 'mean')
# Equality of slopes (interaction)
car::Anova(lm(post ~ condition * prior, data = SSVB21_S2), type = 2)
Levene's test of equality of variances: F(2, 439) = 2.04, with a p-value of 0.131.
term | sum of squares | df | statistic | p-value
---|---|---|---|---
condition | 14107 | 2 | 3.0 | 0.05
prior | 385385 | 1 | 166.1 | 0.00
condition:prior | 3257 | 2 | 0.7 | 0.50
Residuals | 1016461 | 438 | |
Model with interaction condition*prior: the slopes do not differ between conditions.
Should we control for more stuff? No! ANCOVA is a design strategy to reduce the error variance.
Compare two nested models; use anova() to compare the models in R.
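A generic sketch of such a nested-model comparison (using R's built-in cars data, not the study data): anova() on two nested fits returns the F test of the extra terms.

```r
# Nested model comparison: does the extra quadratic term improve the fit?
fit_reduced <- lm(dist ~ speed, data = cars)
fit_full    <- lm(dist ~ speed + I(speed^2), data = cars)
anova(fit_reduced, fit_full)  # F test of the additional parameter
```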
A moderator W modifies the direction or strength of the effect of an explanatory variable X on a response Y (interaction term).
Interactions are not limited to experimental factors: we can also have interactions with confounders, explanatories, mediators, etc.
In a regression model, we simply include an interaction term between W and X in the model.
For example, if X is categorical with K levels and W is binary or continuous, imposing sum-to-zero constraints for $\alpha_1, \ldots, \alpha_K$ and $\beta_1, \ldots, \beta_K$ gives
$$\underbrace{\mathsf{E}(Y \mid X = k, W = w)}_{\text{average response of group } k \text{ at } w} = \underbrace{\alpha_0 + \alpha_k}_{\text{intercept of group } k} + \underbrace{(\beta_0 + \beta_k)}_{\text{slope of group } k}\, w$$
Test jointly whether the coefficients associated with the interaction XW are zero, i.e., $\beta_1 = \cdots = \beta_K = 0$.
The moderator W can be continuous or categorical with L ≥ 2 levels.
The degrees of freedom (additional parameters for the interaction) in the F test are K − 1 if W is binary or continuous, and (K − 1)(L − 1) if W is categorical with L levels.
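A simulated check of these degrees-of-freedom counts (hypothetical factors, pure noise): with K = 3 groups, the interaction with a continuous moderator costs K − 1 = 2 parameters, and with a two-level categorical moderator it costs (K − 1)(L − 1) = 2.

```r
# Degrees of freedom of the interaction rows in the ANOVA table
set.seed(1936)
X <- factor(rep(c("a", "b", "c"), length.out = 120))  # K = 3 levels
W_cont <- rnorm(120)                         # continuous moderator
W_cat  <- factor(rep(c("l1", "l2"), 60))     # categorical moderator, L = 2
Y <- rnorm(120)
anova(lm(Y ~ X * W_cont))  # X:W_cont row has K - 1 = 2 df
anova(lm(Y ~ X * W_cat))   # X:W_cat row has (K - 1)(L - 1) = 2 df
```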
We consider data from Garcia et al. (2010), a study on gender discrimination. Participants were given a fictional file in which a woman was turned down for promotion in favour of a male colleague, despite being clearly more experienced and qualified.
The authors manipulated the woman's reaction to the decision, with choices: no protest, individual protest, or collective protest. The moderator is sexism, which assesses the pervasiveness of gender discrimination. We fit the linear model with the interaction.
data(GSBE10, package = "hecedsm")
lin_moder <- lm(respeval ~ protest*sexism, data = GSBE10)
summary(lin_moder)              # coefficients
car::Anova(lin_moder, type = 2) # tests
term | sum of squares | df | stat | p-value
---|---|---|---|---
sexism | 0.27 | 1 | 0.21 | .648
protest:sexism | 12.49 | 2 | 4.82 | .010
Residuals | 159.22 | 123 | |
Results won't necessarily be reliable outside of the range of observed values of sexism.
Simple effects and comparisons must be done for a fixed value of sexism (since the slopes are not parallel).
The default value in emmeans is the mean value of sexism, but we could query averages at different values of sexism (below, at the empirical quartiles).
quart <- quantile(GSBE10$sexism, probs = c(0.25, 0.5, 0.75))
emmeans(lin_moder, specs = "protest",
        by = "sexism", at = list("sexism" = quart))
With moderating factors, give each sub-mean a weight corresponding to the frequency of the moderator, rather than equal weight to each category (weights = "prop").
The Johnson and Neyman (1936) method looks at the range of values of the moderator W for which the difference between treatments (binary X) is not statistically significant.
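The idea can be sketched by hand (simulated data, not the study): in the model with a binary treatment, a moderator and their interaction, the treatment difference at moderator value w is the interaction-adjusted coefficient, and we flag the values of w for which its t ratio falls below the critical value.

```r
# Hand-rolled sketch of the Johnson-Neyman logic on simulated data
set.seed(2021)
n <- 200
X <- rep(0:1, each = n / 2)           # binary treatment indicator
W <- runif(n, -2, 2)                  # continuous moderator
Y <- 0.5 * X + 0.8 * X * W + rnorm(n)
fit <- lm(Y ~ X * W)
V <- vcov(fit)
wgrid <- seq(-2, 2, length.out = 101)
# Treatment difference at W = w and its standard error
diff_w <- coef(fit)["X"] + coef(fit)["X:W"] * wgrid
se_w <- sqrt(V["X", "X"] + 2 * wgrid * V["X", "X:W"] +
             wgrid^2 * V["X:W", "X:W"])
crit <- qt(0.975, df = fit$df.residual)
# Range of moderator values where the difference is NOT significant
range(wgrid[abs(diff_w / se_w) < crit])
```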
lin_moder2 <- lm(
  respeval ~ protest*sexism,
  data = GSBE10 |>
    # We dichotomize the manipulation, pooling protests together
    dplyr::mutate(protest = as.integer(protest != "no protest")))
# Test for equality of slopes/intercepts for the two protest groups
anova(lin_moder, lin_moder2)
# p-value of 0.18: fail to reject individual = collective.
jn <- interactions::johnson_neyman(
  model = lin_moder2,  # linear model
  pred = protest,      # binary experimental factor
  modx = sexism,       # moderator
  control.fdr = TRUE,  # control for false discovery rate
  mod.range = range(GSBE10$sexism))  # range of values for sexism
jn$plot
More generally, moderation refers to any explanatory variable (whether continuous or categorical) which interacts with the experimental manipulation.