MKTG6010 Assignment
Data Case Analysis (1,000 words)
With the proliferation of Web 2.0 technologies, product ‘experiences’ can be
conveniently shared on the internet, which can reduce information uncertainty in
decision making. Online comments reflect others’ product experiences and therefore
help in making a purchase decision. Information gathered via word-of-mouth
(WOM) significantly influences product evaluations and purchase decisions.
eWOM (online WOM) has shifted power to the consumer. A staggering 92
percent of consumers around the world say they trust earned media, such as
recommendations from friends and family, above all other forms of advertising.
Therefore, it sometimes does not matter how effective your campaign is, because a
bad review on the internet can destroy it quickly. Hence, it is important to
understand consumers’ experiences, both negative and positive, in order to
capture the complete picture, adopt the right marketing strategy, and
improve their experience in the future.
A fundamental objective of the motion picture industry has been to understand the
overall experience of moviegoers/spectators/audience and consequently derive
better financial remuneration from theatrical exhibition. Classical film theorists
conceived the spectator as a passive participant receiving the film as a mediated
message. However, there has been a substantial transformation in how moviegoers’
satisfaction is understood, from mere spectators watching films to “experiencing” the
film. Because a film’s ability to provide a memorable experience
colored with emotions, affects and fantasies depends on the pleasures the
movie offers, the desires it elicits and, most importantly, the viewer’s motivations
for watching it, the key to understanding this experience is to
understand the nature of the spectator’s response to the film. Consumers,
including moviegoers, want experiences that provide a novel and creative escape
from everyday life. The film provides such an opportunity through the vicarious
experiences it offers, thus making an indelible impression on their memories and
intersecting with their lives in significant ways.
Suppose you are a movie producer and want to learn about what consumers have
been sharing online and how it will influence box office revenues. A data set from
IMDB has been collected for this purpose. You are only required to analyse and
interpret the data provided. This assessment will test your knowledge and ability in
analysing quantitative data by using a variety of methods learned in class.
You will be required to use various machine learning techniques to address the
following questions by analysing the data provided.
Q1: Use topic modelling to figure out what users are talking about on IMDB. Are the
topics for action the same as the ones for comedy? (40 marks)
Q2: Use sentiment analysis to estimate ratings of the top 2 topics for action and
comedy movies and interpret the results. (30 marks)
(Hint: You can apply sentiment analysis to the reviews highly relevant to each of the
topics separately. For example, the top 30% of the reviews which are most relevant
to a topic.)
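The selection step in the hint can be sketched as follows. This is a minimal
illustration only: the word lists POSITIVE and NEGATIVE are a hypothetical
mini-lexicon standing in for a proper sentiment tool, and the reviews and
topic weights are invented. In practice the weights would come from your
topic model.

```python
# Hypothetical mini-lexicon (assumption); a real analysis would use an
# established sentiment tool instead of these hand-picked words.
POSITIVE = {"great", "funny", "thrilling", "loved", "brilliant"}
NEGATIVE = {"boring", "dull", "awful", "hated", "weak"}


def lexicon_sentiment(text):
    """Score in [-1, 1]: (positive - negative) / matched words, 0.0 if none."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0


def top_share_indices(weights, share=0.30):
    """Indices of the `share` of reviews with the highest topic weight."""
    k = max(1, round(share * len(weights)))
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return ranked[:k]


def topic_sentiment(reviews, weights, share=0.30):
    """Mean sentiment of the reviews most relevant to one topic."""
    idx = top_share_indices(weights, share)
    scores = [lexicon_sentiment(reviews[i]) for i in idx]
    return sum(scores) / len(scores)


# Invented reviews and topic weights for a single topic (assumption).
reviews = [
    "Thrilling chase, loved it!",
    "The plot was boring and dull.",
    "Great stunts but a weak ending.",
    "Awful pacing.",
    "Brilliant fight scenes.",
    "It was fine.",
    "Funny sidekick.",
    "Hated the villain.",
    "Loved the soundtrack.",
    "Dull dialogue.",
]
weights = [0.95, 0.10, 0.90, 0.20, 0.85, 0.30, 0.40, 0.15, 0.50, 0.05]

print(top_share_indices(weights))         # the 3 most relevant reviews
print(topic_sentiment(reviews, weights))  # their mean sentiment score
```

With 10 reviews, the top 30% is the 3 reviews with the largest weights. The
resulting score can then be compared with the average `rating` of the same
reviews when interpreting the results.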
Q3: Use regression analysis to figure out how the sentiment scores (you obtained
from Q2) for the top 2 topics for action and comedy movies influence the box office
revenue. What are the differences in results for action and comedy movies? What
are the managerial implications from the results? (30 marks)
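Hint: A regression of this kind might be set up as in the sketch below. It
assumes that sentiment scores have already been averaged to one value per
movie and per topic, and that log box office revenue is the response; both
are modelling assumptions, not requirements. The data are simulated from
known coefficients purely to show the mechanics, so the printed numbers say
nothing about the real dataset.

```python
import numpy as np


def ols(X, y):
    """Ordinary least squares with an intercept.

    Returns (coefficients, r_squared); coefficients[0] is the intercept.
    """
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ beta
    r_squared = 1 - (residuals ** 2).sum() / ((y - y.mean()) ** 2).sum()
    return beta, r_squared


# Simulated movie-level data (NOT real results): sentiment scores in [-1, 1]
# for the top 2 topics, and log revenue generated from chosen coefficients.
rng = np.random.default_rng(0)
n_movies = 50
sentiment = rng.uniform(-1, 1, size=(n_movies, 2))
log_revenue = (
    17.0
    + 0.8 * sentiment[:, 0]
    + 0.2 * sentiment[:, 1]
    + rng.normal(0, 0.05, n_movies)
)

beta, r2 = ols(sentiment, log_revenue)
print("intercept, topic 1, topic 2:", beta.round(2))
print("R squared:", round(r2, 3))
```

Running the same function once on action movies and once on comedy movies
gives two sets of coefficients to compare. A statistics package such as
statsmodels would additionally report standard errors and p-values, which
are needed before drawing managerial conclusions.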
You need to familiarise yourself with the corresponding data. Please go through
the data description carefully. What information has been collected in the data?
What does each variable (column) represent? You may need to clean the data
before the actual analysis.
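As an illustration of what cleaning might involve, the sketch below uses
pandas on a tiny invented sample. The cleaning rules (dropping duplicates and
empty reviews, keeping ratings between 1 and 10, normalising genre labels) and
the assumption that revenue may be stored as text with "$" and commas are
hypothetical; check what your own file actually needs.

```python
import io

import pandas as pd

# Tiny made-up sample standing in for Exam_data.csv (assumption); only a few
# of the described columns are included.
SAMPLE_CSV = """movie,review,rating,box_office_revenue,genre
Movie A,Great action scenes!,9,"$1,000,000",Action
Movie A,Great action scenes!,9,"$1,000,000",Action
Movie B,,7,"$500,000",Comedy
Movie C,Funny but too long,11,"$750,000",Comedy
Movie D,Solid fight choreography,6,"$2,000,000",action
"""


def clean_reviews(df):
    """Return a cleaned copy of the review data."""
    out = df.drop_duplicates()
    out = out.dropna(subset=["review"]).copy()
    out["review"] = out["review"].str.strip()
    out = out[out["review"] != ""].copy()
    # Normalise genre labels such as "action" vs "Action".
    out["genre"] = out["genre"].str.strip().str.title()
    # Keep only ratings on the documented 1 to 10 scale.
    out["rating"] = pd.to_numeric(out["rating"], errors="coerce")
    out = out[out["rating"].between(1, 10)].copy()
    # Strip currency formatting before converting revenue to a number.
    revenue_text = out["box_office_revenue"].astype(str)
    revenue_text = revenue_text.str.replace(r"[$,]", "", regex=True)
    out["box_office_revenue"] = pd.to_numeric(revenue_text, errors="coerce")
    return out.reset_index(drop=True)


raw = pd.read_csv(io.StringIO(SAMPLE_CSV))
clean = clean_reviews(raw)
print(clean)
```

For the real file, `pd.read_csv("Exam_data.csv")` would replace the in-memory
sample.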
You’re free to use other techniques learned in or outside this course (e.g.,
descriptive statistics, tabulation). For each question, you will need to specify which
test(s) you used and which information from the data (e.g., variable(s)) you
used.
Data Description
The data file is Exam_data.csv. It contains:
movie: movie name
imdbid: unique IMDB ID
review_post_date: the date on which a review was posted on IMDB
review: review on IMDB
rating: rating on IMDB (10-point scale; max: 10, min: 1)
user_name: the name of the user who posted this review
num_helpful: the number of yes votes for helpfulness on IMDB
num_helpful: the total number of votes for helpfulness on IMDB (i.e., yes + no
votes)
box_office_revenue: the total sales ($) of the movie
movie_distributor: movie distributor
budget: movie budget ($)
release_date: movie release date
close_date: movie close date
mpaa: movie ratings by Motion Picture Association (i.e., G: General audiences – All
ages admitted; PG: Parental guidance suggested – Some material may not be
suitable for children; PG-13: Parents strongly cautioned – Some material may be
inappropriate for children under 13; R: Restricted – Under 17 requires
accompanying parent or adult guardian.)
genre: the movie genre (i.e., Action, Comedy, Drama, Fantasy, and Horror)
max_screens: the maximum number of screens on which the movie was shown
Submission Instructions:
1. Your answers are to be typed, with appropriate outputs shown, in a Word
document.
2. Word limit: 1,000 words (excluding appendices). This is a soft limit,
meaning you can go somewhat beyond it without a grading penalty.
But remember: the more you write, the more contribution and insight the
grader will expect to see.
3. You need to submit your answers in a Word document (or PDF) as well as the
Python code and output (.ipynb file).
Marking Criteria:
Marks awarded will be based on the following:
1. Good understanding of the issues (questions) you want to address.
2. The appropriate application of ML techniques using the “right”
measures/variables.
3. Provision of appropriate analysis citing relevant Python outputs as evidence
for your findings.
4. Ability to apply and communicate results in context of the research
questions/issues highlighted.
5. Overall professional presentation of written work, e.g., layout, grammar,
integration of results and findings, clarity of recommendations, etc.
Note that you will be provided with a different dataset depending on your
SID. Please check the last digit of your SID, then download and use
the corresponding dataset. Do not collaborate with other students, as
this is an individual assignment involving different datasets.