SIT742 Modern Data Science
You are required to develop a data exploration report by completing the provided
Jupyter notebook, applying both exploratory data analysis and visualization skills.
Detailed requirements can be found in the provided notebook. You need to follow
those requirements to complete the coding and include the results in the report
SIT742T1Report.pdf.
1.2.1 Data Exploration
For a data scientist, the first and most crucial task after obtaining a dataset is to
gain a good understanding of the data at hand. This includes examining the data
attributes (or equivalently, data fields), seeing what they look like, identifying the
data type of each field, and, from this information, determining suitable numerical
and visual descriptions.
In this part of the assessment task, you need to complete the coding parts of the
provided notebook and carry out the required analysis of attributes such as
‘education’, ‘salary’ and related demographic information (70%).
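The exploration steps described above (inspect the fields, check their types, then choose a fitting numerical summary) might look roughly like the sketch below. It assumes pandas is available in the notebook environment. The toy records and the exact column names `education` and `salary` are hypothetical stand-ins; the real fields and values come from the dataset supplied with the notebook.

```python
import pandas as pd

# Hypothetical toy records standing in for the provided dataset.
df = pd.DataFrame({
    "education": ["Bachelor", "Master", "PhD", "Master", "Bachelor", "Master"],
    "salary": [65000, 82000, 98000, 79000, 61000, 88000],
})

# 1. Inspect the field names and the data type pandas inferred for each.
dtypes = df.dtypes

# 2. Categorical field: a frequency table is a suitable description.
education_counts = df["education"].value_counts()

# 3. Numeric field: summary statistics (count, mean, quartiles, ...).
salary_summary = df["salary"].describe()

# 4. Relate the two fields: median salary per education level.
median_by_edu = df.groupby("education")["salary"].median()

print(dtypes, education_counts, salary_summary, median_by_edu, sep="\n\n")
```

A frequency table suits a categorical field such as education level, while quartiles and the mean suit a numeric field such as salary. The same split usually guides the choice of plot (bar chart versus histogram or box plot).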
1.2.2 Text Analysis
For the job advertisement data JobPostings.csv, you are required to write Python
code to remove stop words and to extract the high-frequency words used in the job
advertisements.
After that, you can carry out one self-defined text analysis task to gain insight into
the advertisement information (30%).
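As a minimal sketch of the stop-word removal and word-frequency step, the example below uses only the Python standard library. The `STOP_WORDS` set, the `top_words` helper and the three sample advertisements are all hypothetical. In the actual task the text would be read from JobPostings.csv (its column names are not given here), and a fuller stop-word list, for example one from an NLP library, would normally be used.

```python
import re
from collections import Counter

# Assumption: a tiny illustrative stop-word list, not a complete one.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "for",
              "with", "is", "are", "we"}

def top_words(texts, n=3):
    """Return the n most frequent non-stop-words across all texts."""
    counts = Counter()
    for text in texts:
        # Lower-case and keep alphabetic tokens only.
        tokens = re.findall(r"[a-z]+", text.lower())
        counts.update(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(n)

# Hypothetical advertisement texts standing in for the CSV content.
ads = [
    "We are hiring a data scientist with Python skills",
    "Data engineer needed for the analytics team",
    "Python and SQL skills required for data analyst",
]

top = top_words(ads, n=3)
print(top)
```

Lower-casing before tokenising ensures that "Data" and "data" are counted as the same word. Whether to also apply stemming or to keep multi-word phrases is a design choice left to the self-defined analysis.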
1.3 What to Submit?
Please familiarise yourself with the General Requirements (see Section 0.2) on Assignment
Submission. By the due date, you are required to submit the following files to the
corresponding Assignment (Dropbox) in CloudDeakin:
SIT742Task1.ipynb Your Jupyter notebook solution source file for the data exploration
of the data-scientist-related data. You can fill in your name and Deakin ID
in the relevant place in the first markdown cell.