FINAL PROJECT
PAPER REPLICATION
Learning objective
The final project gives you experience applying the methods and programming skills
acquired during this course. After completing the project you should be able to apply SAS and/or Stata
to develop an analytic file, including identifying the data files and variables required, transforming
variables into the measures of interest, and assessing the accuracy of your work; run and present the
results of your analysis; describe the methods used to develop and analyze the data; and assess how
clearly peer-reviewed papers describe their methods, noting potential problems, if any, with the
original analysis.
Assignment description
For the final project, you will execute your analytic plan as described in your Midterm and summarize
your methods and results. You may use SAS and/or Stata to develop the data and run the analysis for
the project. The final project submission will include:
- A document with methods, results, and discussion sections (2-4 pages):
o Methods: describe what you did, including how it differed from the paper and/or what
parts of the methods were not clear from the original paper’s methods and how you
handled them. You can refer to the paper for details, rather than spelling them all out
here, but do describe the components and any modifications you made to the
methods.
o Results: these should reflect the paper’s tables, but the numbers should be taken from
the results you produced. They may not be the same as those in the paper—if you’ve
checked your work, this is OK.
o Discussion: describe what the results mean. Evaluate how your results compare to the
paper, including possible reasons why the results are different and how you would
proceed to resolve substantial differences. If your results are very similar to the
paper’s, describe the limitations of the analysis and how they might be
addressed.
o Please also include the citation for the paper.
- The location of your programs and logs/listings showing the steps you executed from raw data
to final results. Please include the program names and the order in which they were run.
- You should upload the written portion and provide the location and order of your programs,
logs, and listings.
Steps
1. Ensure that you understand the author’s description of the methods.
2. Break down the methods into the measures that will be needed to produce the tables and
figures you will reproduce.
3. Identify the data files and variables you will need. Find the files on the servers.
4. Process the source data to create an analytic file with the measures you identified before. It may
be useful to focus on one measure or group of measures at a time as you pull variables from the
original files and process them into the measures you will analyze.
5. Check your work – run tables to verify that you’ve recoded correctly, look at particular cases if
they don’t make sense, and compare your N’s to the paper. The N’s may not (probably won’t)
match exactly but they should be fairly close. Also check percentages and means that you can
compare to the paper’s results.
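The checks in step 5 can be sketched in Stata roughly as follows. This is a minimal illustration, not a required approach. The variable names are hypothetical: smoke_raw stands for a source variable assumed to be coded 1 = yes and 2 = no, and smoker is the 0/1 measure derived from it.

```stata
* Hypothetical names: smoke_raw is the source variable, smoker the derived measure.
* Cross-tabulate source against derived values to confirm the recode,
* showing missing values so nothing is silently dropped.
tabulate smoke_raw smoker, missing

* Get the N and the mean (a proportion for a 0/1 measure) to compare
* with the paper's descriptive table.
count if !missing(smoker)
summarize smoker

* Look at cases that do not make sense: the source holds a valid code
* but the derived measure is missing.
list smoke_raw smoker if missing(smoker) & inlist(smoke_raw, 1, 2)
```

If the last command lists any observations, the recode missed a valid source value and should be revisited before moving on.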
6. For survey data sources:
a. Don’t forget to recode missing values to be missing.
b. For binary measures, e.g., yes/no, use 0 for no and 1 for yes.
c. Set a 0/1 flag to indicate who is in your sample (=1) and who is not (=0). Include all
observations, whether in your sample or not, and use the weights properly when
running descriptive statistics or models. See the slide deck on Weighting from Week 11,
and read the background information for the survey if the link is available.
i. In SAS you will include a DOMAIN statement with your sample flag, e.g.,
DOMAIN insamp; as well as the CLUSTER, STRATA, and WEIGHT statements with
the appropriate variables from the survey. If you need to run separate analyses
by other groups, you can use additional variables on the DOMAIN statement,
e.g., if running models by sex, DOMAIN insamp*sex; or DOMAIN insamp
insamp*sex; to run the analysis for the whole sample and by sex.
ii. In Stata you will use svyset to tell Stata what the cluster (or PSU) and strata
variables are, and use the subpop option when you run svy: prefixed descriptive
statistics or models, e.g., svy, subpop(insamp): [stata command]. If you need to
run separate analyses by other groups, you can use the over() option, which
allows you to list multiple variables. All combinations of values of multiple
variables will be run separately. See subpopulation estimation for examples of
subpop and over, alone and in combination.
iii. You’ll only look at the results for those in your sample, e.g., where insamp=1.
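For Stata users, the points in step 6 might fit together as in the sketch below. All variable names are placeholders: psu_var, strata_var, and wt_var stand for the design variables named in the survey documentation, and smoke_raw, age, and sex are illustrative. The sample definition (adults with a non-missing outcome) and the missing-value codes are assumptions for the example only.

```stata
* Hypothetical variable names throughout; substitute the survey's own.

* a/b. Build a 0/1 measure (0 = no, 1 = yes). Source codes other than
*      1 and 2 (e.g., refused or don't know) are left as missing.
generate byte smoker = .
replace smoker = 1 if smoke_raw == 1
replace smoker = 0 if smoke_raw == 2

* c. Flag the analytic sample but keep every observation in the file.
*    Stata treats missing as larger than any number, so guard against
*    missing age explicitly.
generate byte insamp = (age >= 18 & !missing(age) & !missing(smoker))

* Declare the design once, then restrict estimation with subpop()
* rather than dropping observations.
svyset psu_var [pweight = wt_var], strata(strata_var)
svy, subpop(insamp): proportion smoker
svy, subpop(insamp): proportion smoker, over(sex)
svy, subpop(insamp): logit smoker i.sex age
```

Using subpop() instead of an if condition or a drop keeps the out-of-sample observations available for variance estimation, which is the reason for keeping all observations in the file.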
7. For OPTUM based papers:
a. If possible, identify your sample first, so when you create your measures you can reduce
the size of your files by only looking at claims, labs, or confinements from people in the
sample.
b. Extract and develop measures from different claims data sources separately, e.g.,
measures needed from medical claims, from drug claims, from labs, from confinements.
Then merge results together by patid. You can do this in separate programs to reduce
the run times of each.
c. Don’t forget about enrollment. If a patid is not enrolled for a period of time, their
information is missing, not zero. You may find it useful to run the continuous
enrollment program we ran to prep those data for the CCW conditions package,
particularly if you need to consider enrollment before and after. The pre_ and post_
variables in the yearly files produced provide the number of continuously enrolled
months before and after each month of the year.
d. Take advantage of the CCW conditions package if the paper controls for individual
conditions. Don’t forget to include prior years of data to cover the reference periods.
e. If analyzing costs, you will need to take into account differing reference years for
STD_COST. There is an Excel file in the Optum data folder that contains factors to shift
the reference year to 2019. You can also use the factors to change the reference year to
any other year. If you are unsure of how to do this, please ask.
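One way the Optum steps in 7b, 7c, and 7e could be organized in Stata is sketched below. Only patid and STD_COST (std_cost here) come from the assignment. Every file name and every other variable is hypothetical. The sketch assumes the factor file has already been imported as a Stata dataset with one row per cost year and a multiplicative factor. Check the Excel file itself for the actual layout and for whether the factor multiplies or divides.

```stata
* Hypothetical file and variable names; verify the factor file's layout first.

* Program 1: medical claims for sampled patids only.
use med_claims_sample, clear
* Assumed: cost_factors has one row per cost_year with cost_factor.
merge m:1 cost_year using cost_factors, keep(master match) nogenerate
generate double std_cost_adj = std_cost * cost_factor
* Reduce to one row per patid before merging with other sources.
collapse (sum) total_cost = std_cost_adj (count) n_claims = std_cost_adj, by(patid)
save med_measures, replace

* Program 2: start from the sample file (one row per patid, with a
* hypothetical 0/1 enrolled flag) and attach each set of measures.
use sample_ids, clear
merge 1:1 patid using med_measures, keep(master match) nogenerate
merge 1:1 patid using rx_measures, keep(master match) nogenerate

* An enrolled patid with no claims has a true zero.
replace n_claims = 0 if missing(n_claims) & enrolled == 1
replace total_cost = 0 if missing(total_cost) & enrolled == 1
* A patid who was not enrolled has unknown utilization: missing, not zero.
replace n_claims = . if enrolled == 0
replace total_cost = . if enrolled == 0
```

Collapsing each claims source to one row per patid before merging keeps the merges 1:1 and the intermediate files small.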
8. Plan your work so that you complete a core analysis, then go back and add or tweak variables or
steps that you may have skipped.
9. In the written portion, use your own words to describe what you did. Organize the document to
have sections on methods, results, and discussion. The methods section doesn’t need to include
all the detail laid out in the paper; it can refer to the paper’s methods section, while describing
where you had to modify or infer what the authors did, and how you addressed any issues. The
results section should show the figures and tables that you replicated. In the discussion section,
note differences and similarities to the paper’s results. Look for differences in sign and relative
magnitude of coefficients, odds ratios, percentages, or significance. For example, if you have
confidence intervals, do they overlap with those in the paper?
10. If you are stuck, ask questions at office hours or through the Final Project Discussion Board.
11. Submit your project through Blackboard.
Grading
Your instructor will use the following rubric to grade this assignment.
Programs: 25 points
- 15 for accuracy: Are the paper methods reflected in the programs, or, if not, is there an
explanation for the change? Do the programs include steps that allow you to check that your
variable derivations worked?
- 5 for efficiency: Are there unnecessary reads and writes of files, and are variables and
observations kept appropriately (especially for Optum)?
- 5 for organization: Is the program easy to follow with comments for clarity when needed, and
are variables and values labelled?
Written summary: 15 points (5 points each for methods, results, and discussion)
- Methods: Are the methods described accurately and in your own words, that is, is it clear you
understand the methods? Is the citation for the replicated paper included?
- Results: Are your results presented so they can easily be compared to the paper? Are they
complete according to your analytic plan, with an explanation for parts not done?
- Discussion: Have you described the “take-aways” of the results? Do you address the differences
between your results and those in the paper? Are there suggestions for further study or analysis?
Copied and pasted text, largely verbatim: subtract up to 5 points from the appropriate written section
Overall: 40 points