FINAL PROJECT
PAPER REPLICATION

Learning objective

The final project gives you experience implementing the methods and programming skills acquired during this course. After completing the project you should be able to:
- apply SAS and/or Stata to develop an analytic file, including identifying the data files and variables required, transforming variables into the measures of interest, and assessing the accuracy of your work;
- run and present the results of your analysis;
- describe the methods used to develop and analyze the data; and
- assess how clearly peer-reviewed papers describe their methods, and note potential problems, if any, with the original analysis.

Assignment description

For the final project, you will execute your analytic plan as described in your Midterm and summarize your methods and results. You may use SAS and/or Stata to develop the data and run the analysis for the project. The final project submission will include:
- A document with methods, results, and discussion sections (2-4 pages):
  o Methods: describe what you did, including how it differed from the paper and/or which parts of the original paper’s methods were not clear and how you handled them. You can refer to the paper for details rather than spelling them all out here, but do describe the components and any modifications you made to the methods.
  o Results: these should mirror the paper’s tables, but the numbers should come from the results you produced. They may not be the same as those in the paper; if you’ve checked your work, this is OK.
  o Discussion: describe what the results mean. Evaluate how your results compare to the paper’s, including possible reasons why the results differ and how you would proceed to resolve substantial differences. If your results are very similar to the paper’s, what are the limitations of the analysis and how might they be addressed?
  o Please also include the citation for the paper.
- The location of your programs and logs/listings showing the steps you executed from raw data to final results. Please include the program names and the order in which they were run.
- You should upload the written portion and provide the location and order of your programs, logs, and listings.

Steps

1. Ensure that you understand the authors’ description of the methods.
2. Break down the methods into the measures that will be needed to produce the tables and figures you will reproduce.
3. Identify the data files and variables you will need. Find the files on the servers.
4. Process the source data to create an analytic file with the measures you identified before. It may be useful to focus on one measure or group of measures at a time as you pull variables from the original files and process them into the measures you will analyze.
5. Check your work: run tables to verify that you’ve recoded correctly, look at particular cases if they don’t make sense, and compare your N’s to the paper. The N’s may not (probably won’t) match exactly, but they should be fairly close. Also check percentages and means that you can compare to the paper’s results.
6. For survey data sources:
   a. Don’t forget to recode missing-value codes to missing.
   b. For binary measures, e.g., yes/no, use 0 for no and 1 for yes.
   c. Set a 0/1 flag to indicate who is in your sample (=1) and who is not (=0). Include all observations, whether in your sample or not, and use the weights properly when running descriptive statistics or models. See the slide deck on Weighting from Week 11, and read the background information for the survey if the link is available.
      i. In SAS you will include a DOMAIN statement with your sample flag, e.g., DOMAIN insamp; as well as the CLUSTER, STRATA, and WEIGHT statements with the appropriate variables from the survey. If you need to run separate analyses by other groups, you can use additional variables on the DOMAIN statement, e.g., if running models by sex, DOMAIN insamp*sex; or DOMAIN insamp insamp*sex; to run the analysis for the whole sample and by sex.
      ii. In Stata you will use svyset to tell Stata what the cluster (or PSU) and strata variables are, and use the subpop option when you run svy: prefixed descriptive statistics or models, e.g., svy, subpop(insamp): [stata command]. If you need to run separate analyses by other groups, you can use the over() option, which allows you to list multiple variables. All combinations of values of the listed variables will be run separately. See subpopulation estimation for examples of subpop and over, alone and in combination.
      iii. You’ll only look at the results for those in your sample, e.g., where insamp=1.
7. For OPTUM based papers:
   a. If possible, identify your sample first, so that when you create your measures you can reduce the size of your files by only looking at claims, labs, or confinements from people in the sample.
   b. Extract and develop measures from different claims data sources separately, e.g., measures needed from medical claims, from drug claims, from labs, from confinements. Then merge the results together by patid. You can do this in separate programs to reduce the run time of each.
   c. Don’t forget about enrollment. If a patid is not enrolled for a period of time, their information is missing, not zero. You may find it useful to run the continuous enrollment program we ran to prep those data for the CCW conditions package, particularly if you need to consider enrollment before and after. The pre_ and post_ variables in the yearly files produced provide the number of continuously enrolled months before and after each month of the year.
   d. Take advantage of the CCW conditions package if the paper controls for individual conditions. Don’t forget to include prior years of data to cover the reference periods.
   e. If analyzing costs, you will need to take into account differing reference years for STD_COST. There is an Excel file in the Optum data folder that contains factors to shift the reference year to 2019. You can also use the factors to change the reference year to any other year. If you are unsure how to do this, please ask.
8. Plan your work so that you complete a core analysis, then go back and add or tweak variables or steps that you may have skipped.
9. In the written portion, use your own words to describe what you did. Organize the document to have sections on methods, results, and discussion. The methods section doesn’t need to include all the detail laid out in the paper; it can refer to the paper’s methods section, while describing where you had to modify or infer what the authors did, and how you addressed any issues. The results section should show the figures and tables that you replicated. In the discussion section, note differences and similarities to the paper’s results. Look for differences in sign and relative magnitude of coefficients, odds ratios, percentages, or significance. For example, if you have confidence intervals, do they overlap with those in the paper?
10. If you are stuck, ask questions at office hours or through the Final Project Discussion Board.
11. Submit your project through Blackboard.

Grading

Your instructor will use the following rubric to grade this assignment.

Programs 25 points
- 15 for accuracy: Are the paper’s methods reflected in the programs, or, if not, is there an explanation for the change? Do the programs include steps that allow you to check that your variable derivations worked?
- 5 for efficiency: Are there unnecessary reads and writes of files, and are variables and observations kept appropriately (especially for Optum)?
- 5 for organization: Is the program easy to follow, with comments for clarity when needed, and are variables and values labelled?

Written summary 15 points (5 points each for methods, results, and discussion)
- Methods: Are the methods described accurately and in your own words, that is, is it clear you understand the methods? Is the citation for the replicated paper included?
- Results: Are your results presented so they can easily be compared to the paper? Are they complete according to your analytic plan, with an explanation for parts not done?
- Discussion: Have you described the “take-aways” of the results? Do you address the differences between your results and those in the paper? Are there suggestions for further study or analysis?

Copied and pasted text, largely verbatim: subtract up to 5 points from the appropriate written section.

Overall 40 points
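
Illustration: comparing a replicated estimate to the published one

Step 9 and the Discussion rubric ask you to compare the sign, relative magnitude, significance, and confidence intervals of your estimates with the paper’s. The sketch below shows one way to organize that comparison. It is only an illustration: the project itself is done in SAS and/or Stata, and Python is used here just as a compact, tool-neutral way to show the logic. The names Estimate and compare_estimates are hypothetical, and the odds ratios are invented for the example. Overlapping confidence intervals are a rough guide, not a formal test that two estimates are equal.

```python
from dataclasses import dataclass


@dataclass
class Estimate:
    """A point estimate with its confidence interval bounds."""
    value: float
    ci_low: float
    ci_high: float


def compare_estimates(mine: Estimate, paper: Estimate, null_value: float = 0.0) -> dict:
    """Compare a replicated estimate with the published one.

    null_value is the "no effect" value: 0 for coefficients, 1 for odds ratios.
    """
    # Same direction of effect relative to the null value?
    same_sign = (mine.value - null_value) * (paper.value - null_value) > 0
    # Two intervals overlap when each one starts before the other ends.
    ci_overlap = mine.ci_low <= paper.ci_high and paper.ci_low <= mine.ci_high
    # "Significant" here means the interval excludes the null value.
    mine_sig = not (mine.ci_low <= null_value <= mine.ci_high)
    paper_sig = not (paper.ci_low <= null_value <= paper.ci_high)
    return {
        "same_sign": same_sign,
        "ci_overlap": ci_overlap,
        "same_significance": mine_sig == paper_sig,
        # Relative magnitude; undefined when the paper's estimate is zero.
        "ratio_to_paper": mine.value / paper.value if paper.value != 0 else None,
    }


# Hypothetical odds ratios, invented purely for illustration.
result = compare_estimates(
    mine=Estimate(1.42, 1.10, 1.83),
    paper=Estimate(1.35, 1.08, 1.69),
    null_value=1.0,
)
print(result)
```

Running the same comparison for each row of a replicated table gives a quick list of where your results agree with the paper and where they need an explanation in the discussion section.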