STATS 3DA3 Homework Assignment
Assignment Standards
Your assignment must conform to the Assignment Standards listed below.
• Write your name and student number on the title page. We will not grade assignments
without the title page.
• You may discuss homework problems with other students, but you have to prepare the written
assignments yourself.
• LaTeX is strongly recommended but not strictly required.
• Eleven-point font (Times or similar) must be used with 1.5 line spacing and margins of at
least 1 inch all around.
• Use \newpage so that the solution to each question (1, 2, 3) starts on a new page.
• No screenshots are accepted for any reason.
• The writing and referencing should be appropriate to the undergraduate level.
• Various tools, including publicly available internet tools, may be used by the instructor to
check the originality of submitted work.
• Assignment policy on the use of generative AI:
This includes work created by generative AI tools. Also stated in the policy is the
following: “Contract Cheating is the act of ‘outsourcing of student work to third parties’
(Lancaster & Clarke, 2016, p. 639) with or without payment.” Using generative AI tools
is a form of contract cheating. Charges of academic dishonesty will be brought forward
to the Office of Academic Integrity.
Question 1
Download the paper Data Science at the Singularity by David Donoho (2024) at paper. Follow
the steps below to find the most frequently used words and create a word cloud.
• (1) Reference where you obtained the original PDF document.
• (2) Read all PDF document pages and separate each line by \n.
• (3) Split the lines by \n.
• (4) Remove the lines before Abstract. ...... You can print the first few lines and find
the number of lines to remove.
• (5) Create a data frame with lines.
• (6) Tokenize each line and convert each word to a row.
• (7) Convert each word to lowercase.
• (8) Remove stopwords.
• (9) Remove any other words that are not suitable for the word cloud, for example
single-letter words, symbols such as [ . , ), abbreviations, etc.
• (10) Create a term-frequency data frame.
• (11) Produce a word cloud. You can decide how many of the most frequently used words to
include in the word cloud; for example, a word cloud of the ten most frequently used words.
• (12) Write a summary paragraph (at least two statements) about your word cloud. The
summary should be cast in the context of your chosen text document.
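Steps (3) to (10) can be sketched in Python roughly as follows. This is only a minimal illustration under stated assumptions, not a prescribed solution: the text in SAMPLE_TEXT is made up and stands in for the text extracted from the PDF, the STOPWORDS and EXTRA_REMOVALS sets are small hypothetical lists (a real stopword list would be much longer), and the column names line, word, and count are arbitrary choices.

```python
import pandas as pd

# Made-up text standing in for the text extracted from the PDF in step (2).
SAMPLE_TEXT = (
    "Running header\n"
    "Author name\n"
    "Abstract\n"
    "Data sharing and code sharing speed up data science.\n"
    "Open data, open code (e.g., challenges) help data science."
)

# Hypothetical, deliberately tiny stopword list for illustration only.
STOPWORDS = {"a", "and", "in", "is", "of", "the", "to", "up"}
# Hypothetical extra words judged unsuitable for the word cloud, step (9).
EXTRA_REMOVALS = {"abstract"}

# Step (3): split the text into lines.
lines = SAMPLE_TEXT.split("\n")

# Step (4): drop every line before the one that starts with "Abstract".
start = next(i for i, line in enumerate(lines)
             if line.strip().lower().startswith("abstract"))
lines = lines[start:]

# Step (5): one row per line.
df = pd.DataFrame({"line": lines})

# Steps (6)-(7): lowercase, tokenize on runs of letters, one row per word.
df["word"] = df["line"].str.lower().str.findall(r"[a-z]+")
words = df.explode("word").dropna(subset=["word"]).reset_index(drop=True)

# Steps (8)-(9): remove stopwords, unsuitable words, and single letters.
keep = ~words["word"].isin(STOPWORDS | EXTRA_REMOVALS) & (words["word"].str.len() > 1)
words = words[keep]

# Step (10): term-frequency data frame, most frequent word first.
tf = (words["word"].value_counts()
      .rename_axis("word")
      .reset_index(name="count"))

top_words = tf.head(10)  # e.g. the ten most frequent words for step (11)
print(top_words)
```

The letter-only regular expression already discards digits, brackets, and punctuation, which covers part of step (9). For step (11), one option is to pass `dict(zip(top_words["word"], top_words["count"]))` to a word cloud library; the choice of library is left open here.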
Question 2
Question 2 uses Johns Hopkins GitHub data on COVID-19 vaccines administered globally to
develop a Shiny app.
Visit the website
https://github.com/govex/COVID-19/tree/master/data_tables/vaccine_data/global_data
and read the description (readme.md).
This question will lead to developing a Shiny app so that users can choose the date range to
investigate the COVID-19 vaccine doses administered and the number of people for whom at
least one dose has been administered.
• (1) Read the CSV file at
https://raw.githubusercontent.com/govex/COVID-19/master/data_tables/vaccine_data/global_data/time_series_covid19_vaccine_global.csv
into Python. Read the data dictionary at
https://github.com/govex/COVID-19/blob/master/data_tables/vaccine_data/global_data/data_dictionary.csv.
• (2) Each row is uniquely defined by country and date in the data frame. What is the
dimension of the data?
• (3) Look at the data dictionary. Describe the Doses_admin and People at least one
dose administered variables.
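Steps (1) and (2), together with the date-range selection the app will need, might be sketched as below. This is a hedged illustration only: SAMPLE_CSV holds made-up rows rather than real vaccine counts, the column names Country_Region, Date, and People_at_least_one_dose are assumptions to be checked against the data dictionary, and load_vaccine_data is a hypothetical helper name. Passing the raw CSV URL from step (1) instead of the in-memory sample should work the same way, since pandas.read_csv accepts URLs.

```python
import io
import pandas as pd

# Made-up rows that only mimic the assumed column layout; not real data.
SAMPLE_CSV = """Country_Region,Date,Doses_admin,People_at_least_one_dose
Canada,2021-01-01,100,90
Canada,2021-01-02,150,120
Japan,2021-01-01,80,70
Japan,2021-01-02,95,85
"""


def load_vaccine_data(source):
    """Read the CSV (file path, URL, or file-like object) and parse dates."""
    return pd.read_csv(source, parse_dates=["Date"])


df = load_vaccine_data(io.StringIO(SAMPLE_CSV))

# Step (2): the dimension is (number of rows, number of columns).
n_rows, n_cols = df.shape
print(f"{n_rows} rows x {n_cols} columns")

# Check the claim that country and date together identify each row.
is_unique = not df.duplicated(subset=["Country_Region", "Date"]).any()
print("country/date pairs unique:", is_unique)

# Date-range selection of the kind the app would apply to user input.
start_date = pd.Timestamp("2021-01-02")
end_date = pd.Timestamp("2021-01-02")
in_range = df["Date"].between(start_date, end_date)  # inclusive on both ends
subset = df[in_range]
print(subset)
```

Parsing Date as a datetime column at load time keeps the range filter a simple comparison, so the same mask can later be driven by the app's date inputs.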