Computational Methods in Statistics
STA 4373 – Computational Methods in Statistics
STA 4373 Assignment 2
Instructions.
In this assignment you’ll analyze a COVID dataset and create a PDF of your results using the same Quarto
template I posted to Canvas. As before, when you turn in the file, the filename of the turn-in should be
last names separated by dashes and terminated with -2.pdf. For example, if Joe Shmo, Jane Doe, and Mickey
Mouse worked together, they would turn in shmo-doe-mouse-2.pdf.
Again, you may use your text and work in groups of size up to three. Only one delegate of your team
will submit the resulting PDF on Canvas. The PDF should have the names of each of the collaborators on
top. The main advantage to working in a group is that you can bounce ideas off one another, and hopefully
uncover more interesting features of the data.
You may use the internet to access the text’s webpage, other websites directly linked in this document, and
answers to general-purpose questions about data science in R. However, you may not read or use any analyses of this
or related datasets you find online. Failure to follow this rule may be considered a violation of this course’s
academic integrity policy. If you have any questions about this, please contact me.
Please put a new page break before each question so each question starts on its own page (this will
facilitate grading) and never provide output that runs over more than one page if you can help it. Be sure
to echo all your code!
The COVID-19 pandemic in Texas.
The Texas Department of State Health Services (DSHS) is the primary state agency that
tracks the spread of the COVID-19 pandemic and makes information available to the public. To that end it
has two dashboards, one that monitors case counts, available here, and another that focuses on testing and
hospitalization, available here; these were set up in the early days and weeks of the pandemic shutdown in
March and April 2020.
Questions.
1. Read in the data "Cases over Time by County" ("TexasCOVID-19NewCasesOverTimebyCounty.xlsx")
into a variable called new_cases, but don’t clean it yet (that will come in the next steps). Then run
the code below to show you’ve succeeded.
Note: You may look at the file you have downloaded in another application, but do not edit it; all
manipulations of the file must be done in R.
Hint: Be sure to look at the whole dataset before reading it in. I encourage you to use readxl::cell_limits()
with the ul and lr arguments to get the reading right.
new_cases |> select(1:5) |> glimpse()
# Rows: 254
# Columns: 5
# $ County                 <chr> "Anderson", "Andrews", "Angelina", "Aransas", "~
# $ `New Cases 03-04-2020` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
# $ `New Cases 03-05-2020` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
# $ `New Cases 03-06-2020` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
# $ `New Cases 03-07-2020` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,~
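To illustrate the hint about readxl::cell_limits(), here is a minimal sketch that uses the example workbook deaths.xlsx bundled with readxl rather than the assignment file. That workbook has notes above and below the actual table, so only a rectangle of cells should be read. The corner coordinates below apply to that example workbook only; the right values for the DSHS file are something you must find by inspecting it yourself.

```r
library(readxl)

# Example workbook shipped with readxl (not the assignment data);
# it has explanatory notes above and below the real table.
path <- readxl_example("deaths.xlsx")

# ul = c(row, column) of the upper-left cell to read,
# lr = c(row, column) of the lower-right cell to read.
deaths <- read_excel(
  path,
  range = cell_limits(ul = c(5, 1), lr = c(15, 6))
)

# Check the shape of what was read.
dim(deaths)
```

The first row inside the rectangle is treated as the header row, so ul should point at the cell holding the first column name, not at the first data value.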
2. Clean the names to match the naming conventions listed below. Run the code below to show you’ve
succeeded.
new_cases |> select(1:5) |> glimpse()
# Rows: 254
# Columns: 5
# $ county       <chr> "Anderson", "Andrews", "Angelina", "Aransas", "Archer", "~
# $ `03_04_2020` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
# $ `03_05_2020` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
# $ `03_06_2020` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
# $ `03_07_2020` <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
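One possible way to rename many columns at once is dplyr::rename_with() combined with stringr helpers. The sketch below works on a small made-up tibble (toy) whose column names only mimic the pattern shown above; the helper clean_name() is a hypothetical name introduced for illustration, and other approaches are equally valid.

```r
library(dplyr)
library(stringr)

# Made-up data that mimics the column naming pattern (not the real file).
toy <- tibble(
  County = c("Anderson", "Andrews"),
  `New Cases 03-04-2020` = c(0, 1),
  `New Cases 03-05-2020` = c(2, 0)
)

# Hypothetical helper: drop the "New Cases " prefix, swap dashes for
# underscores, and lower-case whatever remains.
clean_name <- function(nm) {
  nm |>
    str_remove("^New Cases ") |>
    str_replace_all("-", "_") |>
    str_to_lower()
}

# rename_with() applies the function to every column name by default.
toy_clean <- toy |> rename_with(clean_name)

names(toy_clean)
```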
3. Change all count columns to integers (instead of doubles). Run the code below to show you’ve
succeeded.
new_cases |> select(1:5) |> glimpse()
# Rows: 254
# Columns: 5
# $ county       <chr> "Anderson", "Andrews", "Angelina", "Aransas", "Archer", "~
# $ `03_04_2020` <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
# $ `03_05_2020` <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
# $ `03_06_2020` <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
# $ `03_07_2020` <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, ~
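A type conversion across many columns can be sketched with dplyr::across() inside mutate(). The example below again uses a small made-up tibble (toy), and it assumes every column other than county holds counts; that assumption should be checked against the real data before relying on it.

```r
library(dplyr)

# Made-up data with double-valued count columns (not the real file).
toy <- tibble(
  county = c("Anderson", "Andrews"),
  `03_04_2020` = c(0, 1),
  `03_05_2020` = c(2, 0)
)

# Convert every column except county to integer.
toy_int <- toy |>
  mutate(across(-county, as.integer))

glimpse(toy_int)
```

Selecting with where(is.double) instead of -county is an alternative that converts only the columns currently stored as doubles.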
4. Reshape new_cases to have columns county, date, and new_cases, and convert the dates into date objects.
Run the code below to show you’ve succeeded.
new_cases
# # A tibble: 158,496 x 3
#    county   date       new_cases
#    <chr>    <date>         <int>
#  1 Anderson 2020-03-04         0
#  2 Anderson 2020-03-05         0
#  3 Anderson 2020-03-06         0
#  4 Anderson 2020-03-07         0
#  5 Anderson 2020-03-08         0
#  6 Anderson 2020-03-09         0
#  7 Anderson 2020-03-10         0
#  8 Anderson 2020-03-11         0
#  9 Anderson 2020-03-12         0
# 10 Anderson 2020-03-13         0
# # ... with 158,486 more rows
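Going from one column per date to one row per county and date is a wide-to-long reshape, which tidyr::pivot_longer() can express. The sketch below uses a small made-up tibble (toy) and parses the date strings with base R's as.Date() and an explicit format; lubridate would be another reasonable choice. It is an illustration of the technique under those assumptions, not the required solution.

```r
library(dplyr)
library(tidyr)

# Made-up wide data: one column per date (not the real file).
toy <- tibble(
  county = c("Anderson", "Andrews"),
  `03_04_2020` = c(0L, 1L),
  `03_05_2020` = c(2L, 0L)
)

toy_long <- toy |>
  # Stack every date column into name/value pairs, one row per county-date.
  pivot_longer(-county, names_to = "date", values_to = "new_cases") |>
  # The column names were month_day_year strings; parse them as Dates.
  mutate(date = as.Date(date, format = "%m_%d_%Y"))

toy_long
```

With 2 counties and 2 date columns the toy result has 4 rows, in the same county-then-date order as the output shown above.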