Data, Insights and Decisions
COMM1190 Data, Insights and Decisions
Week 8: Research Design & Experimentation I
General housekeeping for online lecture:
• Please turn on your camera for better engagement
• Please switch your microphone to mute to avoid disruption to the class
• Use your microphone or the chat channel to ask questions or make a comment
• Wait for your lecturer to start
[Figure: course timeline, weeks 1 to 10, including Flexibility Week. Topics: Intro to Analytics; Data Exploration and Visualisation I & II; Predictive Analytics I & II; Research Design & Experimentation I & II; Data Ethics; Data Communication]
This week (& next)
Research design & its importance for prescriptive analytics
Organisations need to answer what-if & evaluation type questions, which involve causal questions
Experiments using randomised control trials (RCTs) are one way to obtain causal effects
Big data world associated with many opportunities to run experiments (A/B online experiments)
Role for non-experimental or observational data
Data availability provides opportunities to exploit natural experiments (quasi experiments)
Week 8 references
Accessible if interested
Fiebig, D.G. (2017), “Big data: Will it improve Patient-Centered Care?”, The Patient: Patient-Centered Outcomes Research, 10, 133-139.
Haynes, L., Service, O., Goldacre, B. and Torgerson, D. (2012), “Test, Learn, Adapt: Developing public policy with Randomised Controlled Trials”, Cabinet Office, London.
Varian, H.R. (2016), “Causal inference in economics and marketing”, PNAS, 113 (27), 7310-7315.
Referenced for completeness
Chattopadhyay, R. and Duflo, E. (2004), “Women as policy makers: Evidence from a randomized policy experiment in India”, Econometrica, 72, 1409-1443.
Research Design & Experimentation: Introduction
Motivation
Correlation does not imply causation
Correlation coefficients measure the strength of an association between two variables
Recall spurious correlation examples in week 4 – obvious that these do not imply cause & effect
Other examples may be less obvious to dismiss – there is a correlation between advertising & sales but is it causal?
How do analysts generate evidence that one action will lead to a particular effect on some outcome of interest?
How are causal effects reliably estimated?
Introduction
Organisations require answers to questions & input into decision-making
Research design: Strategy for addressing these questions by integrating all parts of the analysis so that it can deliver answers
Constituent parts of data analysis
Subject matter theory to provide context
Appropriate data
Modelling approach that is appropriate for the data & has the potential to deliver answers – this design question is our focus
Introduction…
What-if & evaluation type questions are crucial for prescriptive analysis
Involves causal questions requiring estimates of causal effects
What if a change is made, how will that affect future outcomes?
What impact did an intervention have? Was a policy change that was implemented effective?
Design particularly important for prescriptive analysis
Research design: Can the causal question being asked be answered by the available data & planned modelling approach?
Introduction…
Case study: Customer churning/retention problem
Descriptive: Is there a problem with customer churn?
Predictive: Which customers are at most risk of churning?
Prescriptive: Which customers are most likely to be retained if offered incentives to stay? Once implemented, was the incentive cost effective?
Other examples of such questions
Q(a) Will it be profitable if on-line advertising is increased?
Q(b) Will a back-to-work intervention help people get a job?
Q(c) How much should homeowners living near to a chemical plant be compensated for a chemical spill?
Introduction…
Experiments are one way to obtain causal effects
Data deluge in part due to greater opportunities for organizations to collect data & run experiments
An online A/B experiment could address Q(a)
Q(b) would require a field experiment
Some causal questions not amenable to an experiment
Q(c) is such an example but may be able to get reliable answers from available observational data
Important design issues related to natural experiments
Introduction…
Experimental mindset
Partial equilibrium approach - complex questions broken up into tractable components
Power of randomisation to control for confounders
Test, learn, adapt cycle (evidence-based decision-making, as discussed in COMM1110)
Regression: Recall week 4
Regression as an analytical tool
Linear regression model
$y = \beta_0 + \beta_1 x + u$
Useful descriptive device to capture bivariate relationships
Consider sales ($y$) & TV advertising ($x$) (advertising.xlsx)
Q8.1 What key features of the data are revealed by the scatter plot?
[Figure: Sales versus TV advertising. Scatter plot of sales (in thousands) against advertising budget (in thousands of dollars), with fitted values]
Regression as an analytical tool...
Have specified a model
Sales in a market is a linear function of TV advertising
OLS provides best fit conditional on this model
Fine as a descriptive device providing stylized facts
Provides evidence of positive correlation (linear association) between sales & advertising ($\hat{\beta}_1 = 0.048$)
Prediction of sales for out-of-sample market?
$\hat{y} = 7.033 + 0.048x$
A market where $x = 100$ → $\hat{y} = 7.033 + 0.048 \times 100 = 11.833$
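The fit-then-predict steps above can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function names `ols_fit` and `predict` are hypothetical, and the only numbers taken from the slide are the estimates 7.033 and 0.048.

```python
def ols_fit(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope is the sample covariance of x and y divided by the sample variance of x
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def predict(b0, b1, x_new):
    """Predicted outcome on the fitted line for a new value of x."""
    return b0 + b1 * x_new


# Prediction for a market with x = 100, using the estimates quoted on the slide
y_hat = predict(7.033, 0.048, 100)
print(round(y_hat, 3))  # 11.833
```

Given the advertising data, `ols_fit(advertising, sales)` would return the two estimates. The prediction is only as good as the linear model it conditions on.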
Regression as an analytical tool...
But may want models to do even more – causality & “what-if” counterfactuals
What happens to sales in a particular market if TV advertising were increased?
Doesn’t our regression model reliably answer this question?
At least 2 threats to interpreting $\hat{\beta}_1$ as causal
Confounding variables leading to omitted variable bias
Is estimated advertising effect biased?
Maybe prices are varying across markets & these are correlated with advertising & hence effects of prices & advertising are confounded
Reverse causality
What if markets with low sales increase advertising?
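The confounding threat can be made concrete with the standard omitted-variable-bias result. As an assumption for illustration, suppose sales truly depend on both advertising $x$ and price $p$, with an error $u$ unrelated to both, but price is left out of the fitted model:

```latex
\begin{aligned}
\text{True model:}\quad & y = \beta_0 + \beta_1 x + \beta_2 p + u \\
\text{Regression of } y \text{ on } x \text{ only:}\quad
  & \hat{\beta}_1 \;\longrightarrow\; \beta_1 + \beta_2\,\frac{\operatorname{Cov}(x, p)}{\operatorname{Var}(x)}
  \quad \text{(in large samples)}
\end{aligned}
```

The second term is the bias. It vanishes only if price has no effect on sales ($\beta_2 = 0$) or price is uncorrelated with advertising ($\operatorname{Cov}(x, p) = 0$).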
Regression as an analytical tool...
Prediction models aim to minimize prediction inaccuracy
Interest is focussed on $\hat{y}$ not $\hat{\beta}_1$
A regression tree could be used to provide accurate predictions, but not to address questions of causality
Questions about causality tend to be more difficult to answer than prediction/forecasting problems
Need a conceptual basis to guide our approach
Design issues become very important
Causality & Experimentation
Causality & notion of ceteris paribus
Causality as defined by philosopher David Lewis:
“Causation is something that makes a difference, and the difference it makes must be a difference from what would have happened without it”
In evaluating an intervention (or policy change) think of
counterfactual outcomes & what-if questions
Sales with & without the increase in advertising
In regression context want to define causal effect of $x$ on $y$
How does variable $y$ change if $x$ is changed but all other relevant factors are held constant
Requires (at a minimum) $x$ & the error term $u$ to be unrelated
Causality & notion of ceteris paribus...
Multiple regression provides one approach to better estimate the causal effect of interest (say $\beta_1$)
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k + u$
Now more likely $x_1$ & $u$ are uncorrelated
Estimate of $\beta_1$ controls for other variables & better represents pure impact of $x_1$ (have purged any indirect effect because of correlation with other variables)
An even better approach is to conduct an experiment
Randomised control trials (RCTs) suggested as gold standard
Important to define causal effect of interest & describe how an experiment would be designed to infer causal effect in question
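The claim that controlling for other variables gives a better estimate of the advertising effect can be checked on simulated data. The sketch below is illustrative only: all numbers are assumptions, including the true effect of 0.05, the price effect and the link between price and advertising, and none come from advertising.xlsx. To stay in plain Python, the multiple-regression coefficient is obtained by partialling price out of advertising first (the Frisch-Waugh-Lovell result) rather than with a regression library.

```python
import random


def slope(x, y):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals from regressing y on x (with an intercept)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    b1 = slope(x, y)
    b0 = y_bar - b1 * x_bar
    return [b - (b0 + b1 * a) for a, b in zip(x, y)]


rng = random.Random(1190)  # fixed seed so the sketch is reproducible
n = 5000
TRUE_EFFECT = 0.05  # assumed causal effect of advertising on sales

price = [rng.gauss(0, 1) for _ in range(n)]
# Assumption: markets with higher prices also advertise more
adv = [150 + 30 * p + rng.gauss(0, 20) for p in price]
# Assumption: higher prices reduce sales directly
sales = [7 + TRUE_EFFECT * a - 2 * p + rng.gauss(0, 1)
         for a, p in zip(adv, price)]

# Simple regression: price is omitted, so its effect is confounded with advertising
b_simple = slope(adv, sales)

# Multiple regression coefficient on advertising, controlling for price:
# regress advertising on price, then regress sales on what is left over
adv_resid = residuals(price, adv)
b_multiple = slope(adv_resid, sales)

print(f"simple: {b_simple:.4f}  multiple: {b_multiple:.4f}  true: {TRUE_EFFECT}")
```

Under these assumptions the simple regression badly understates the effect, while the regression that controls for price recovers something close to 0.05. The catch is that this only works for confounders that are observed and included, which is why an experiment is preferred where one is feasible.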
Experiment 1
Impact of back-to-work program on employment
“If a person chosen from population of those looking for work is given access to a back-to-work program, will that increase their chance of employment?”
Implicit assumption: all other factors influencing employment (experience, ability, local employment prospects,...) are held fixed
Experiment:
Choose a group of workers looking for work
Randomly assign them to access the program or not
Compare employment outcomes in next period
Experiment works because characteristics of people are unrelated to whether they receive program or not
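Why random assignment matters can be seen in a small simulation. This is a sketch under stated assumptions, not data from the Haynes et al. study: an unobserved characteristic (labelled `ability` here) raises the chance of employment, the program adds an assumed 0.10 to that chance, and the function names are hypothetical. The simulation compares random assignment with a self-selected scheme in which more able people are more likely to join.

```python
import random

TRUE_EFFECT = 0.10  # assumed increase in employment probability from the program


def mean(values):
    values = list(values)
    return sum(values) / len(values)


def simulate(assign, n=20000, seed=8):
    """Simulate (ability, treated, employed) rows under an assignment rule."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        ability = rng.random()            # unobserved characteristic in [0, 1)
        treated = assign(ability, rng)    # who gets access to the program
        p_employed = 0.2 + 0.4 * ability + (TRUE_EFFECT if treated else 0.0)
        rows.append((ability, treated, rng.random() < p_employed))
    return rows


def compare(rows):
    """Return (gap in mean ability, gap in employment rate), treated minus control."""
    treat = [r for r in rows if r[1]]
    control = [r for r in rows if not r[1]]
    ability_gap = mean(r[0] for r in treat) - mean(r[0] for r in control)
    effect_hat = mean(r[2] for r in treat) - mean(r[2] for r in control)
    return ability_gap, effect_hat


# Random assignment: a coin flip that ignores ability
randomised = compare(simulate(lambda ability, rng: rng.random() < 0.5))

# Self-selection: higher-ability people are more likely to take up the program
self_selected = compare(simulate(lambda ability, rng: rng.random() < ability))

print("randomised    (ability gap, estimated effect):", randomised)
print("self-selected (ability gap, estimated effect):", self_selected)
```

With random assignment the two groups have nearly the same average ability, so the difference in employment rates is close to the assumed effect. With self-selection the treated group is more able to begin with, and the simple comparison overstates the program's impact.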
RCT evaluating back-to-work program
From Haynes et al. (2012)
Experiment 2
A/B testing of a website landing page
“If a business rearranges its current website, by how much will this change the conversion rate (new customers)?”
Implicit assumption: all other factors that influence who visits the website are held fixed
Experiment 2…
Experiment:
Design the new webpage
Randomly assign different users to old (A) & new (B) website
Compare conversion rates i.e. new customers
Experiment works because characteristics of users are unrelated to which website is seen
In online environments relatively easy to conduct
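The final step, comparing conversion rates, could look like the sketch below. It uses a two-proportion z-test, which is one common way to judge whether a difference is larger than chance would produce. The slides do not specify a test, so treat the choice as an assumption, along with the function name `ab_test` and the visitor counts, which are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def ab_test(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test for the difference in conversion rates (B minus A).

    Returns (difference, z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that A and B convert equally
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_b - p_a, z, p_value


# Hypothetical counts: 10,000 users randomly shown each version
diff, z, p_value = ab_test(conv_a=200, n_a=10_000, conv_b=260, n_b=10_000)
print(f"difference = {diff:.4f}, z = {z:.2f}, p-value = {p_value:.4f}")
```

Because users were randomly assigned, the difference can be read as the effect of the new page. The p-value only indicates how surprising that difference would be if the two pages converted equally well.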
Experiment 3
Measuring returns to education
“If a person chosen from the population is given another year of education, by how much will his or her wage increase?”
Implicit assumption: all other factors that influence wages such as experience, family background, intelligence etc. are held fixed
Experiment:
Choose a group of people
Randomly assign different amounts of education to them!!!
Compare wage outcomes
Random assignment is infeasible in this case
Experiments are not always possible or ethical
Conducting RCTs
Decide on form of intervention (new program/new website versus status quo)
Flexibility in designing intervention as new program/website may involve several new features (week 8 workshop, Q1)
Determine outcome of interest (employment/conversion rates)
Decide on randomisation unit (workers/customers)
Determine sample size & randomly assign units to treatment (new program/new website) & control (no program/old website) groups
Care required in this step (week 8 workshop, Q2)
Conducting RCTs…
Compare outcomes to determine treatment effect
Differences in outcomes can reasonably be attributed to the treatment, as other aspects of the data are controlled by the researcher
Decide on whether to adapt (implement program/use new website) or not on basis of findings
See asynchronous lecture (Experiments in a Big Data World) for Case Study 2 & week 8 workshop, Q3
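The sample-size and random-assignment steps above can be sketched as follows. This is an illustration under assumptions rather than the course's procedure: the sample-size function uses the usual normal-approximation formula for comparing two proportions, the defaults of 5% significance and 80% power are conventional choices, and the function names and example rates are hypothetical.

```python
import random
from math import ceil
from statistics import NormalDist


def sample_size_per_group(p_control, p_treatment, alpha=0.05, power=0.80):
    """Approximate units needed per group to detect p_treatment vs p_control."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treatment * (1 - p_treatment)
    return ceil((z_alpha + z_beta) ** 2 * variance
                / (p_treatment - p_control) ** 2)


def randomly_assign(unit_ids, seed=None):
    """Shuffle the units and split them evenly into (treatment, control)."""
    ids = list(unit_ids)
    random.Random(seed).shuffle(ids)
    half = len(ids) // 2
    return ids[:half], ids[half:]


# Hypothetical example: detect a rise in conversion rate from 2.0% to 2.6%
n_per_group = sample_size_per_group(0.020, 0.026)
treatment, control = randomly_assign(range(2 * n_per_group), seed=2024)
print(n_per_group, len(treatment), len(control))
```

Small expected effects need large samples, which is one reason online settings with many users suit experimentation. Fixing the seed makes the assignment reproducible and auditable.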
Experimental evidence