HAM503 Principles of Data Analytics
Individual Assignment
Introduction
Students will be given a real-life problem and must develop a proposal (about 2,500 words) that
applies a range of data analytics tools and skills. In the proposal, students must evaluate the
candidate data models and data analytics tools for their proposed solutions, drawing on the
knowledge, principles, and concepts of data models, data analytics, data extraction, and data
visualisation.
Data
You may use the data provided (1. Insurance dataset, 2. Douban Movie) or data you have collected
yourself.
Insurance Dataset
Dataset 1 comes from an insurance company and concerns vehicle insurance: each customer pays
the company a premium of a certain amount every year, and in return the company pays the
customer compensation (called the 'sum assured') if an accident happens.
The dataset contains information on demographics (gender, age, region code), vehicles (vehicle
age, past damage), and policies (premium, sourcing channel), among other attributes.
Variable              Definition
id                    Unique ID for the customer
Gender                Gender of the customer
Age                   Age of the customer
Driving_License       0: customer does not have a driving licence (DL); 1: customer already has a DL
Region_Code           Unique code for the customer's region
Previously_Insured    1: customer already has vehicle insurance; 0: customer does not have vehicle insurance
Vehicle_Age           Age of the vehicle
Vehicle_Damage        1: customer's vehicle was damaged in the past; 0: customer's vehicle was not damaged in the past
Annual_Premium        Amount the customer must pay as premium for the year
PolicySalesChannel    Anonymized code for the channel used to reach the customer, e.g. different agents, mail, phone, in person, etc.
Vintage               Number of days the customer has been associated with the company
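As a starting point for exploring Dataset 1, the sketch below computes two simple descriptive statistics: the share of customers whose vehicle was damaged in the past, and the mean Annual_Premium for customers with and without previous insurance. It is a minimal illustration only, assuming the data is stored as CSV with a header row that uses the column names in the table above, and that Vehicle_Damage and Previously_Insured are coded 0/1 as the table states. The function name summarise_insurance and the three sample rows are hypothetical; they are not part of the provided dataset.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def summarise_insurance(rows):
    """Summarise insurance records given as dicts keyed by the column names above.

    Assumes Vehicle_Damage and Previously_Insured are coded 0/1 (as in the
    variable table) and that Annual_Premium is numeric.
    """
    premiums_by_insured = defaultdict(list)  # Previously_Insured value -> premiums
    damaged = 0
    total = 0
    for row in rows:
        total += 1
        damaged += int(row["Vehicle_Damage"])  # 1 = vehicle damaged in the past
        premiums_by_insured[int(row["Previously_Insured"])].append(
            float(row["Annual_Premium"])
        )
    return {
        "n_customers": total,
        "share_damaged": damaged / total if total else 0.0,
        "mean_premium_by_previously_insured": {
            key: mean(values) for key, values in sorted(premiums_by_insured.items())
        },
    }


# Hypothetical toy rows for illustration only; they are NOT taken from Dataset 1.
SAMPLE_CSV = """id,Gender,Age,Driving_License,Region_Code,Previously_Insured,Vehicle_Age,Vehicle_Damage,Annual_Premium,PolicySalesChannel,Vintage
1,Male,44,1,28,0,2,1,40000,26,217
2,Female,23,1,3,1,1,0,30000,152,183
3,Male,35,1,28,0,1,1,36000,26,27
"""

if __name__ == "__main__":
    # csv.DictReader turns each CSV line into a dict keyed by the header row.
    summary = summarise_insurance(csv.DictReader(io.StringIO(SAMPLE_CSV)))
    print(summary)
```

To run it on the real file, replace the StringIO wrapper with an opened file, e.g. `open("insurance.csv", newline="")`, where the file name is an assumption. Plain dictionaries and the standard library keep the sketch dependency-free; a dataframe library would be a natural next step for larger analyses.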
Douban Movie
Dataset 2 is a sample of consumer reviews from Douban Movie. The reviews are divided into three
categories according to the number of stars each reviewer gave when writing the review.
Variable              Definition
id                    ID of the review
Movie_name            Name of the movie
Score                 Overall score of the movie
Review_people         Number of people who reviewed this movie
Star_distribution     Overall star distribution
Username              Name of the reviewer
Date                  Date of the review
Star                  Rating the reviewer gave the movie, from 1 to 5 stars
Comment               Content of the comment
Comment_Distribution  Distribution of all comments across positive (4 and 5 stars), middle (3 stars), and negative (1 and 2 stars)
Like                  Number of likes the comment received from other users
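The three-way split of reviews (positive: 4 and 5 stars; middle: 3 stars; negative: 1 and 2 stars) can be expressed as a small mapping function, which is useful for recomputing a distribution like Comment_Distribution from the Star column. The sketch below is a minimal illustration, assuming Star holds whole numbers from 1 to 5; whether the provided Comment_Distribution column was derived exactly this way is an assumption. The names star_category and comment_distribution are hypothetical.

```python
from collections import Counter

CATEGORIES = ("positive", "middle", "negative")


def star_category(star):
    """Map a 1-5 star rating to one of the three review categories."""
    if star not in (1, 2, 3, 4, 5):
        raise ValueError(f"star must be between 1 and 5, got {star!r}")
    if star >= 4:
        return "positive"  # 4 and 5 stars
    if star == 3:
        return "middle"    # 3 stars
    return "negative"      # 1 and 2 stars


def comment_distribution(stars):
    """Count reviews per category and report each category's share of the total."""
    counts = Counter(star_category(s) for s in stars)
    total = sum(counts.values())
    return {
        c: {"count": counts[c], "share": counts[c] / total if total else 0.0}
        for c in CATEGORIES
    }


if __name__ == "__main__":
    # Hypothetical star ratings for illustration; not drawn from Dataset 2.
    print(comment_distribution([5, 4, 3, 1, 2, 5]))
```

Rejecting out-of-range ratings early makes data-quality problems (for example, missing or mis-parsed Star values) visible before they distort the distribution.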