CSE 111 Project 1
Plotting and reasoning
Description:
In this mini project, we will use Python and Jupyter Notebook to explore public datasets.
Here is our standard data analysis pipeline:
A standard data analysis contains 5 steps. Because time is limited, step 2 “Get data” and step 4
“Build a model” count only as conditional extra points in our project.
You have two options:
1. Work on the preset topic and use the dataset provided. (No bonus points)
2. Work on a custom topic and get your own dataset. (Bonus points if you do extra work beyond
just downloading the dataset)
Instructions:
Step 0:
Install Jupyter Notebook and create an empty .ipynb file.
Step 1:
Think first about what you would like to know. Ask an interesting question based on the topic.
You need to write a brief summary of why the answer to the question is important.
For example, an interesting question could be “How does the number of tests per day affect the
curve?” This question is important because we want to know the tradeoff between the cost of
tests and the accuracy of the curve, so that we can make decisions that potentially reduce the
cost of tests.
You have to pick your own question. However, if your question is too trivial, you will get a poor
score.
Step 2:
Perform data acquisition. Think about what data is required to answer the question. If you go
with our COVID-19 topic, you can just download the CSV; however, you will still need to perform
data cleaning where necessary. Based on the data, you might need to refine your question from
the previous step.
Save the cleaned dataframe to a new .csv file.
(Extra points) If you have your own topic, you have to show how you acquired the data, for
instance by web scraping, RSS, an API, etc.
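As a rough illustration of the cleaning and saving part of this step, here is a minimal sketch
using pandas. The inline data, the column names (date, region, new_cases), the cleaning rules,
and the output file name cleaned_dataset.csv are all assumptions made for the example; they are
not the layout of the provided dataset, and your own data will likely need different rules.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a downloaded CSV file.
# The column names and values are made up for this example.
RAW_CSV = """date,region,new_cases
2020-03-01,A,5
2020-03-01,A,5
2020-03-02,A,
2020-03-03,A,-2
2020-03-02,B,7
not a date,B,3
"""


def clean(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw dataframe."""
    df = raw.copy()
    # Parse dates; values that cannot be parsed become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce")
    # Force counts to numeric; blanks and junk become NaN.
    df["new_cases"] = pd.to_numeric(df["new_cases"], errors="coerce")
    # Drop rows with a missing date or count.
    df = df.dropna(subset=["date", "new_cases"])
    # Assumption for this example: negative daily counts are errors.
    df = df[df["new_cases"] >= 0]
    # Remove exact duplicate rows.
    df = df.drop_duplicates()
    # Counts are whole numbers once the missing values are gone.
    df = df.assign(new_cases=df["new_cases"].astype(int))
    return df.sort_values(["region", "date"]).reset_index(drop=True)


raw_df = pd.read_csv(io.StringIO(RAW_CSV))
cleaned_df = clean(raw_df)

# Save the cleaned dataframe to a new .csv file.
cleaned_df.to_csv("cleaned_dataset.csv", index=False)
```

In your notebook you would replace the inline text with pd.read_csv on your raw file, and explain
in a short note why each cleaning rule is justified for your data.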
Step 3:
Perform Exploratory Data Analysis. You will need to explore the data, and your exploration
should answer your question. You may need to extract information from the dataframe, perform
data aggregation, draw multiple graphs, etc.
Step 4: (Extra points)
Train, validate, and test a model to represent the data. If your question is about predictions,
make predictions based on the model. The model can be very simple.
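One possible shape for a train/validate/test workflow with a very simple model is sketched
below. It uses synthetic data and a polynomial fit from NumPy purely as an assumption for the
example; the 60/20/20 split, the candidate degrees, and all variable names are illustrative
choices, not requirements of the project.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for this example): y = 3x + 2 plus noise.
x = np.linspace(0, 10, 100)
y = 3.0 * x + 2.0 + rng.normal(0.0, 1.0, size=x.size)

# Shuffle the row indices and split them 60/20/20.
idx = rng.permutation(x.size)
train_idx, val_idx, test_idx = idx[:60], idx[60:80], idx[80:]


def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))


# Train: fit each candidate model on the training set only.
candidates = {
    degree: np.polyfit(x[train_idx], y[train_idx], degree)
    for degree in (1, 2, 3)
}

# Validate: pick the degree with the lowest error on the validation set.
val_errors = {
    degree: mse(coeffs, x[val_idx], y[val_idx])
    for degree, coeffs in candidates.items()
}
best_degree = min(val_errors, key=val_errors.get)

# Test: report the error of the chosen model on data it has never seen.
test_error = mse(candidates[best_degree], x[test_idx], y[test_idx])

# Predict: evaluate the chosen model at a new input.
prediction_at_12 = float(np.polyval(candidates[best_degree], 12.0))
```

A random shuffle suits data whose rows are independent. For time-ordered data such as daily
case counts, a chronological split (train on earlier dates, test on later ones) is usually the
more honest choice.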
Step 5:
Summarize your findings.
Submit a zip file that contains the .ipynb file, the raw dataset, and the cleaned dataset.