ECON3389 Case Study and Course Project
Case Study: Tips & Tricks
A case study is NOT simply a summary of an article/blog post. Instead, it should be
a review of the chosen case based on the agenda of our course, i.e. the use of ML in
Economics.
Think of your case as a presentation given in front of you, and try to come up
with every question you might ask during such a presentation.
- You will likely not find answers to those questions in the original article/blog post, but you
  can try answering those questions yourself.
Of course, if your case involves some ML models/techniques that we have not yet
covered in our class, make sure to very briefly explain what those are.
ECON3389 ML in Economics | Fall’20 Lecture 09: Case Study and Course Project
Course Project: Agenda
The course project is a full-scale research project and requires a substantial amount of
work to complete.
Unlike the case study, the project requires you to do all the work: find a dataset,
formulate a research agenda, apply your knowledge of ML methods to build a reliable
inference/prediction model, and so on.
You will be tasked with basic data analysis (summary statistics, plots), statistical
modeling (estimating a model using R), writing a research paper, and creating a
video presentation of your results.
Important: you should start working on the project as soon as possible, and keep
working on it on a regular basis. Rushing everything in the last couple of days will
likely produce inferior results.
Course Project: Data
Each group can choose any dataset for their project. It could be one of the four
datasets I suggest or any other dataset. Unlike the case study, there is no restriction
on how many groups use the same dataset.
The four datasets available through Canvas are:
- Iowa liquor sales.
- US baseball and basketball salaries.
- Personal income and socio-demographic attributes.
You are free to use any other dataset, as long as it has at least 1000 observations
across at least 10 variables, but you do need to confirm the chosen dataset with me
first.
- If using Kaggle, make sure not to fall into the trap of repeating someone’s steps from one of
  Kaggle’s challenges.
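The size requirement can be checked before asking for approval. Below is a minimal sketch in Python (the course itself works in R, where `dim()` reports the same two numbers). The function name `meets_size_requirement` and the in-memory sample are hypothetical, and treating every non-empty CSV row after the header as one observation is an assumption.

```python
import csv
import io

MIN_OBSERVATIONS = 1000  # thresholds stated in the project requirements
MIN_VARIABLES = 10


def meets_size_requirement(csv_file):
    """Return (n_obs, n_vars, ok) for an open CSV file with a header row.

    Assumption: each non-empty row after the header is one observation
    and each header column is one variable.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    n_vars = len(header)
    n_obs = sum(1 for row in reader if row)  # skip blank lines
    ok = n_obs >= MIN_OBSERVATIONS and n_vars >= MIN_VARIABLES
    return n_obs, n_vars, ok


# Hypothetical in-memory dataset: 12 columns, 1500 rows.
sample = io.StringIO()
writer = csv.writer(sample)
writer.writerow([f"var{i}" for i in range(12)])
for r in range(1500):
    writer.writerow([r] * 12)
sample.seek(0)

n_obs, n_vars, ok = meets_size_requirement(sample)
print(n_obs, n_vars, ok)
```

For a file on disk, the same function can be called on `open(path, newline="")`, where `path` is whatever file you downloaded.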
Course Project: Research Question
The specifics of the research question depend on the nature of the data, but in general
you are required to do two things: build an inference/causal analysis model and build a
pure prediction model.
For both models you will need to choose the same outcome variable and use the rest
of the variables as your predictors (explanatory variables).
The inference model will likely not be too complicated: linear regression with a few
non-linear terms and/or interactions with factor variables.
The prediction model, however, can be as complex as you like: polynomial regression,
random forests, neural nets, etc.
Course Project: Building a Model
Even for relatively simple linear models you will need to make educated decisions
about which variables and in which form to include in the model’s equation.
- If you have factor variables, then the most flexible approach will require interacting all factor
  variables with all non-factor ones, which may lead to hundreds of regressors.
- On the other hand, for the inference model there may not be any meaningful interpretation for
  the inclusion of all possible combinations, and thus you will have to balance additional
  regressors against the zero conditional mean (ZCM) assumption and interpretability.
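To see how quickly full interaction reaches hundreds of regressors, here is a small counting sketch in Python. The function `count_regressors` and the example dataset are hypothetical. It assumes each factor with k levels enters as k - 1 dummies, factors are not interacted with each other, and the intercept is not counted.

```python
def count_regressors(factor_levels, n_numeric):
    """Count slope coefficients when every factor is interacted with every
    non-factor variable.

    Assumptions: each factor with k levels enters as k - 1 dummies (one
    baseline level dropped), factors are not interacted with each other,
    and the intercept is not counted.
    """
    n_dummies = sum(k - 1 for k in factor_levels)
    main_effects = n_numeric + n_dummies
    interactions = n_numeric * n_dummies
    return main_effects + interactions


# Hypothetical dataset: three factors with 5, 10 and 4 levels, 8 numeric variables.
print(count_regressors([5, 10, 4], 8))  # 8 + 16 + 8 * 16 = 152
```

Even this modest setup gives 152 regressors, which is a large number to interpret one by one in an inference model.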
Generally speaking, whenever you choose between competing models, you should use a
train/test split of your data.
- This is especially important for the pure prediction model, where overfitting could be a
  major issue.
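The idea of choosing between competing models on held-out data can be sketched as follows. This is an illustration in plain Python rather than the R workflow used in the course. The data are synthetic, the helper names (`train_test_split`, `fit_mean`, `fit_line`, `mse`) are hypothetical, and the two candidate models are deliberately simple stand-ins for whatever models you actually compare.

```python
import random


def train_test_split(n, test_share=0.2, seed=0):
    """Shuffle row indices 0..n-1 and split them into train and test parts."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n * test_share))
    return indices[n_test:], indices[:n_test]


def fit_mean(x, y):
    """Baseline model: predict the training mean of y for every x."""
    mean_y = sum(y) / len(y)
    return lambda value: mean_y


def fit_line(x, y):
    """Simple OLS with one regressor: y = a + b * x."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda value: intercept + slope * value


def mse(model, x, y):
    """Mean squared prediction error of a fitted model on (x, y)."""
    return sum((model(xi) - yi) ** 2 for xi, yi in zip(x, y)) / len(y)


# Synthetic data (an assumption for illustration): y = 2 + 3x + noise.
rng = random.Random(42)
x = [rng.uniform(0, 10) for _ in range(200)]
y = [2 + 3 * xi + rng.gauss(0, 1) for xi in x]

train_idx, test_idx = train_test_split(len(x), test_share=0.2, seed=1)
x_train, y_train = [x[i] for i in train_idx], [y[i] for i in train_idx]
x_test, y_test = [x[i] for i in test_idx], [y[i] for i in test_idx]

# Fit on the training rows only, then compare errors on the held-out rows.
test_errors = {
    "mean only": mse(fit_mean(x_train, y_train), x_test, y_test),
    "linear": mse(fit_line(x_train, y_train), x_test, y_test),
}
best = min(test_errors, key=test_errors.get)
print(test_errors, best)
```

The key point is that both models are fitted on the training rows only and ranked by their error on rows they never saw. A more flexible model that merely memorizes the training data gains nothing on the test rows.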
Course Project: Tips & Tricks
Start working on the course project ASAP. Anywhere between 50% and 80% of all the
work will be data management and analysis in R, where it is easy to get stuck on a
single issue for hours, if not days.
The earlier you start working, the more opportunities you will have to ask me
questions and get feedback.
Spread the workload across all group members: one person can do the general data
summary (tables, charts), another can work on the best inference model, and yet
another on the best prediction model.
Your video presentation should contain the bulk of your findings, but it is also
something that I and your classmates will comment on, giving you a chance to fix any
spotted issues before submitting the final paper.