Introduction to experimental design

Session 1

MATH 80667A: Experimental Design and Statistical Methods
HEC Montréal

Outline

Class details

Motivation

Review

Key concepts in experimental designs

Class details

Course content

Content

Basics of experimental design
Statistical inference
Completely randomized designs
Analysis of variance
Blocked designs
Analysis of covariance
Intro to mixed models
Intro to causal inference
Linear mediation analysis

Cross-disciplinary skills

Scientific workflow
Peer-review
Reporting
Statistical fallacies
Reproducibility

We will focus on simple designs as they lead to simple analysis. For more complicated schemes, consult an expert or find a collaborator.

A single introductory course in statistics does not make one an expert. The purpose is numerical literacy more than expert knowledge.

However, the assessment, activities and interdisciplinary skills are targeted at PhD students.

We will spend a lot of time on ANOVA (one-way, two-way, multivariate, repeated measures, etc.).

The reproducibility crisis has changed the publishing landscape, so the requirements for publication are more stringent. I intend to cover these in detail.

Statistical fallacies are common mistakes that discredit the validity of your work. Learning to recognize them in the wild is crucial (also as a reviewer).

Prerequisites

Math skills

Basic algebra

Computer science

None

Statistics

At the level of OpenIntro Statistics (Chapter 1)

More specifically for statistics:

variable types (continuous, discrete, etc.)
basic graphs (histograms, scatterplots)
hypothesis testing
differences in mean (e.g., t-test)
simple linear regression

The OpenIntro Statistics book can be freely downloaded from https://www.openintro.org/book/os/

Motivation

History

Experiments on agricultural trials in Rothamsted ongoing since 1843

https://www.rothamsted.ac.uk/environmental-change-network

http://www.era.rothamsted.ac.uk/images/metadata/rbk1/2012-AJ-12-10.jpg

R.A. Fisher worked for 14 years at Rothamsted from 1919 and developed much of the theory underlying experimental design. See this recollection by Yates on his contribution: https://doi.org/10.2307/2528399

Fisher was a eugenicist and his views are largely decried nowadays.

Modern experiments: A/B testing

Screenshot of Twitter page following changes made to interface in 2021 (the follow button is black for higher contrast).

This illustrates a change to the design (the Follow button is now high-contrast black). It is an example of the A/B testing that is common in web design.

Evidence-based policy

RAND health insurance study

Student Teacher Achievement Ratio (STAR)

RAND: In a large-scale, multiyear experiment, participants who paid for a share of their health care used fewer health services than a comparison group given free care. It concluded that cost sharing reduced "inappropriate or unnecessary" medical care (overutilization), but also reduced "appropriate or needed" medical care. https://www.rand.org/health-care/projects/hie.html

Tennessee's STAR project: smaller class sizes lead to better outcomes. "Over 7,000 students in 79 schools were randomly assigned into one of 3 interventions: small class (13 to 17 students per teacher), regular class (22 to 25 students per teacher), and regular-with-aide class (22 to 25 students with a full-time teacher's aide). Classroom teachers were also randomly assigned to the classes they would teach. The interventions were initiated as the students entered school in kindergarten and continued through third grade." https://dss.princeton.edu/catalog/resource1589

Nobel memorial prize

Composite photo by Andrew Heiss: excerpt from the Washington Post following the announcement of the Nobel prize in economics for an experimental approach to solving poverty, Abdul Latif Jameel Poverty Action Lab (J-PAL) video, and a Twitter feed with a photo of Abhijit Banerjee and Esther Duflo upon the announcement of the Nobel memorial prize.

Review

Population and sampling

Defining a target population

Sampling frame

Where to draw sample from

Sampling procedure

Randomness

Convenience samples and non-response bias

Illustration of flawed survey due to differential nonresponse.

Sampling scheme

Simple random sampling

Stratified sampling

Cluster sampling

Multi-stage sampling

Stratified sampling: select the same fraction within each stratum (gender, ethnicity, etc.). Useful for oversampling rare categories.

Cluster sampling: villages, housing blocks, classrooms. Lower quality than stratified sampling, but cheaper.

Multi-stage sampling: combination of different sampling schemes.
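
The sampling schemes listed above can be contrasted with a small sketch. It is only an illustration under stated assumptions: the population of 1000 people spread over 20 villages, the 10% rare group, the seed and all function names are hypothetical and do not come from the course material.

```python
import random
from collections import Counter


def simple_random_sample(population, n, rng):
    """Draw n units without replacement; every subset of size n is equally likely."""
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, fraction, rng):
    """Draw the same fraction of units, at random, within every stratum."""
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(fraction * len(members))))
    return sample


def cluster_sample(population, cluster_of, n_clusters, rng):
    """Pick whole clusters at random and keep every unit inside them."""
    clusters = {}
    for unit in population:
        clusters.setdefault(cluster_of(unit), []).append(unit)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for c in chosen for unit in clusters[c]]


# Hypothetical population: 1000 people, 20 villages, one person in ten is "rare"
rng = random.Random(80667)  # arbitrary seed, for reproducibility only
population = [
    {"id": i, "village": i % 20, "group": "rare" if i % 10 == 0 else "common"}
    for i in range(1000)
]

srs = simple_random_sample(population, 100, rng)
strat = stratified_sample(population, lambda u: u["group"], 0.1, rng)
clus = cluster_sample(population, lambda u: u["village"], 2, rng)

srs_counts = Counter(u["group"] for u in srs)      # varies from draw to draw
strat_counts = Counter(u["group"] for u in strat)  # fixed by design
```

With stratification, the number of rare units in the sample is fixed by design, whereas it fluctuates under simple random sampling. The cluster sample only visits two villages, which is cheaper but less informative if villages differ from one another.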

Judging the quality of a sample

Summary statistics

Raw data

Pre-testing

Summary statistics: Reported to check representativeness of the sample relative to population.

Raw data: Used for reproducibility and to assess whether data is fraudulent.

Pre-testing: Check whether sampling allocation is sufficiently random.

U. Simonsohn, L. Nelson and J. Simmons. Evidence of Fraud in an Influential Field Experiment About Dishonesty, 2021, https://datacolada.org/98.

Experiments as gold-standard

Screenshot of a paper in BJOG by Hariton and Locascio

Randomised controlled trials (RCTs) are the reference standard for studying causal relationships between interventions and outcomes as randomisation eliminates much of the bias inherent with other study designs.

Paper available at https://doi.org/10.1111/1471-0528.15199

Study type versus sampling

Random versus non-random assignment and sampling

Experimental versus observational

Figure from a New York Times article on an RCT studying the effectiveness of wellness programs.

Figure from the Upshot https://www.nytimes.com/2018/08/06/upshot/employer-wellness-programs-randomized-trials.html

based on the paper

Damon Jones, David Molitor, Julian Reif, What do Workplace Wellness Programs do? Evidence from the Illinois Workplace Wellness Study, The Quarterly Journal of Economics, Volume 134, Issue 4, November 2019, Pages 1747–1791, https://doi.org/10.1093/qje/qjz023

Key concepts in experimental design

Technical vocabulary

Experimental unit

Measurement unit

Factor

Impact of encouragement on teaching

From Davison (2008), Example 9.2

In an investigation on the teaching of arithmetic, 45 pupils were divided at random into five groups of nine. Groups A and B were taught in separate classes by the usual method. Groups C, D, and E were taught together for a number of days. On each day C were praised publicly for their work, D were publicly reproved and E were ignored. At the end of the period all pupils took a standard test.
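
The random division into groups described above can be sketched as follows. This is a minimal illustration, not the procedure used in the original study: the pupil identifiers 1 to 45, the seed and the function name `randomize` are assumptions made for the example.

```python
import random


def randomize(units, labels, rng):
    """Completely randomized allocation: shuffle the units, then deal equal-sized groups."""
    if len(units) % len(labels) != 0:
        raise ValueError("units cannot be split into equal-sized groups")
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(units) // len(labels)
    return {
        label: sorted(shuffled[i * size:(i + 1) * size])
        for i, label in enumerate(labels)
    }


rng = random.Random(2008)    # arbitrary seed, so the allocation can be reproduced
pupils = list(range(1, 46))  # hypothetical identifiers for the 45 pupils
groups = randomize(pupils, ["A", "B", "C", "D", "E"], rng)

for label, members in groups.items():
    print(label, members)
```

Fixing the seed makes the allocation reproducible and auditable, while the shuffle ensures that no characteristic of the pupils can systematically determine the group they end up in.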

Exercise

In pairs, identify

the experimental and measurement units
the factor levels
the response variable

Comparing groups (factor levels)

Without any intervention, there is variability in output from one observation to the next.
Differences between groups are comparatively stable.

Choices in experimental designs

factor levels being compared
observations to be made (number of repetitions, etc.)
experimental units

Requirements for good experiments

Absence of systematic error
Precision
Range of validity
Simplicity of the design

Absence of systematic error

Achieved via randomization
Controlling the environment

If we allocate observations non-randomly (for example, timing of a course vs instructor), we cannot necessarily distinguish between effects.

Controlled environment: reduce variability as much as possible (room temperature, experimental apparatus, instructions, etc.)
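
The timing versus instructor example can be made concrete with a small simulation. This is a sketch under invented assumptions: the effect sizes (2 points for the afternoon slot, 3 points for instructor B), the baseline score of 70, the noise level and all names are hypothetical.

```python
import random
import statistics

SLOT_EFFECT = {"morning": 0.0, "afternoon": 2.0}  # hypothetical effect sizes
INSTRUCTOR_EFFECT = {"A": 0.0, "B": 3.0}          # hypothetical effect sizes


def score(slot, instructor, rng):
    """Simulated test score: baseline + slot effect + instructor effect + noise."""
    noise = rng.gauss(0.0, 1.0)
    return 70.0 + SLOT_EFFECT[slot] + INSTRUCTOR_EFFECT[instructor] + noise


def instructor_contrast(design, rng):
    """Mean score with instructor B minus mean score with instructor A."""
    scores = {"A": [], "B": []}
    for slot, instructor in design:
        scores[instructor].append(score(slot, instructor, rng))
    return statistics.fmean(scores["B"]) - statistics.fmean(scores["A"])


rng = random.Random(26)  # arbitrary seed
n = 2000                 # students per instructor

# Non-random allocation: A always teaches in the morning, B in the afternoon
confounded = [("morning", "A")] * n + [("afternoon", "B")] * n

# Random allocation: the time slot is drawn independently of the instructor
randomized = [
    (rng.choice(["morning", "afternoon"]), instructor)
    for instructor in ["A"] * n + ["B"] * n
]

biased = instructor_contrast(confounded, rng)    # close to 3 + 2: effects are mixed
unbiased = instructor_contrast(randomized, rng)  # close to 3: instructor effect only
print(round(biased, 2), round(unbiased, 2))
```

In the non-random allocation the contrast estimates the sum of the two effects, and no amount of data can separate them. Under randomization the time slot averages out across instructors.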

Precision

depends on the intrinsic variability
function of
- accuracy of experimental work
- number of experimental units / repetitions per unit
- design and methods of analysis
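
To illustrate how precision depends on the number of experimental units, the sketch below compares the simulated standard deviation of a difference between two group means with the textbook value sigma * sqrt(2 / n). The assumptions are independent normal observations with equal variance in both groups; the value of sigma, the sample sizes, the seed and the function name are hypothetical.

```python
import random
import statistics


def sd_of_mean_difference(n_per_group, sigma, n_sim, rng):
    """Empirical standard deviation of the difference between two group means."""
    diffs = []
    for _ in range(n_sim):
        a = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        b = [rng.gauss(0.0, sigma) for _ in range(n_per_group)]
        diffs.append(statistics.fmean(a) - statistics.fmean(b))
    return statistics.stdev(diffs)


rng = random.Random(1)  # arbitrary seed
sigma = 2.0             # hypothetical intrinsic variability
results = {}
for n in (10, 40, 160):
    empirical = sd_of_mean_difference(n, sigma, 2000, rng)
    theory = sigma * (2 / n) ** 0.5  # standard error of the difference in means
    results[n] = (empirical, theory)
    print(n, round(empirical, 3), round(theory, 3))
```

Multiplying the number of units per group by four only halves the standard error, which is why reducing the intrinsic variability or choosing a better design can be more effective than collecting more observations.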


Range of validity

What is the population?
Identify restrictions
Extrapolation
- if proper random sampling scheme
- range of validity

Simplicity of the design

Simple designs lead to simple statistical analyses

Important: understand the reasons for the differences.

Do not limit yourself to experimental knowledge about the differences.
