Project Details
You are free to select the period of time to analyse, as well as the type of licensed taxi you wish to focus on; however, you must work with a large-scale dataset (n ≥ 100,000). You are also free to select the attributes you want to study, but you are required to analyse at least FIVE attributes (before feature selection) for this assignment. These attributes are to be used as candidate features for model selection and/or parameter tuning. Your report should explain and justify your selection decisions. The first stage of the project is to assess and report on the target data via descriptive statistics for a group of selected attributes, in order to characterise the data and formulate a clear research goal. Following that, you should build at least ONE appropriate statistical model to explain the relationship between your attributes. You are expected to refine your model (e.g. feature selection for supervised learning models, or a suitable criterion for the optimal number of clusters) and to evaluate its performance (e.g. classification error, MSE, SSE). You are also expected to highlight key findings based on your results and to note findings that you believe are important or unanticipated.
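As one illustration of the model-and-evaluate stage described above, the sketch below fits a simple linear regression of one response on one predictor by least squares and reports SSE and MSE. It is a minimal sketch, assuming a continuous response and a single predictor; the attribute names (trip distance, fare amount), the toy values, and the helpers `fit_simple_ols` and `sse_and_mse` are hypothetical and not part of the assignment. It uses only the Python standard library so it runs as-is; on a real dataset of n ≥ 100,000 trips a library such as pandas or scikit-learn would be the more usual choice.

```python
from statistics import fmean


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("predictor has zero variance")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx              # slope = S_xy / S_xx
    b0 = y_bar - b1 * x_bar     # fitted line passes through (x_bar, y_bar)
    return b0, b1


def sse_and_mse(x, y, b0, b1):
    """SSE and MSE of the fitted line (MSE here divides by n, not n - 2)."""
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in residuals)
    return sse, sse / len(residuals)


# Hypothetical toy values standing in for two taxi attributes.
distance = [1.0, 2.0, 3.0, 4.0, 5.0]   # predictor, e.g. trip distance
fare = [5.0, 7.5, 8.5, 11.0, 13.0]     # response, e.g. fare amount

b0, b1 = fit_simple_ols(distance, fare)
sse, mse = sse_and_mse(distance, fare, b0, b1)
print(f"fare = {b0:.2f} + {b1:.2f} * distance; SSE = {sse:.3f}, MSE = {mse:.3f}")
```

Dividing SSE by n gives the average squared prediction error; the residual-variance estimate used in regression inference divides by n - 2 instead, so it is worth stating which one a report uses.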
Report
Your report should be a maximum of 15 pages and cover at least the following items:
• Identify the research problem and attributes you want to study.
• Choose appropriate data and describe the procedures for processing and analysing
the data.
• Interpret the results: describe trends, compare groups, or examine relationships among your chosen attributes.
• Identify the most important attributes based on a suitable criterion and your
chosen response variable.
• Evaluate the performance of your model with an appropriate procedure.
• Make recommendations or predictions based on your results, or suggest actions to be
taken in practice to further improve performance.
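For the item on evaluating model performance with an appropriate procedure, k-fold cross-validation is one common choice: the data are shuffled, split into k folds, and the model is refit k times, each time being scored on the fold it did not see. The sketch below assumes the same kind of simple linear regression as a stand-in model; `k_fold_mse`, the fold count, the seed and the toy data are illustrative assumptions rather than requirements of the assignment.

```python
import random
from statistics import fmean


def fit_simple_ols(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1)."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def k_fold_mse(x, y, k=5, seed=0):
    """Average held-out MSE over k folds, after a seeded shuffle."""
    n = len(x)
    if len(y) != n or not 2 <= k <= n:
        raise ValueError("need equal-length data and 2 <= k <= n")
    idx = list(range(n))
    random.Random(seed).shuffle(idx)          # reproducible random order
    folds = [idx[i::k] for i in range(k)]     # each index lands in exactly one fold
    fold_mses = []
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        b0, b1 = fit_simple_ols([x[i] for i in train], [y[i] for i in train])
        errors = [(y[i] - (b0 + b1 * x[i])) ** 2 for i in test]
        fold_mses.append(fmean(errors))       # MSE on data the model never saw
    return fmean(fold_mses)


# Hypothetical toy data: fare exactly linear in distance.
distance = [float(d) for d in range(1, 11)]
fare = [3.0 + 2.0 * d for d in distance]
print(f"5-fold CV MSE: {k_fold_mse(distance, fare, k=5):.6f}")
```

Because the toy fares are exactly linear in distance, the cross-validated MSE is essentially zero; on real trip data a larger value is expected, and comparing it across candidate feature sets is one way to support feature selection.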
Citation style
You are free to use any citation style, such as APA or Harvard. Please ensure that
the author name(s), year and title of each publication are clearly stated on the reference page.
Assessment
Your report will be assessed according to the following checklist:
Research problem, quality and clarity of report (4 marks)
Lists appropriate research goals succinctly (1m)
Quality writing, spell-checked, correct grammar, and comprehensible sentence structures (1m)
Identifies potential stakeholders and explains how the research is relevant to them (1m)
Conclusion: provides recommendations for potential stakeholders based on the analysis of findings (1m)
Data and Attribute Selection (2 marks)
Clearly states and justifies the data period (1m)
Clearly states and justifies the choice of five (or more) attributes to be analysed (1m)
Use of an appropriate external dataset (Bonus: 2m)
Pre-processing and Cleansing (3 marks)
Clearly states pre-processing and/or feature engineering steps (1m)
Clearly states data cleansing steps (1m)
Appropriate justification for pre-processing steps, as well as steps for handling missing data (1m)
Descriptive analysis (3 marks)
Appropriate choice of summary statistics and a suitable graphical tool for presenting each attribute (1m)
Investigates pairwise relationships between attributes (1m)
Clear description of each attribute based on summary statistics and appropriate plots (1m)
Modelling (6 marks)
No marks are possible without statistical modelling
Clearly specifies the statistical model, with appropriate use of equations (1m)
States and checks all model assumptions (1m)