COMM1190: DATA, INSIGHTS AND DECISIONS
FINAL EXAMINATION
1. Writing Time: 3 Hours.
2. Reading and Submission Allowance Time: 1 Hour.
3. This is an Online Open-Book Exam. Your responses must be your own original
work. You must attempt this Final Exam by yourself, without any help from
others. By submitting, you declare that you have NOT worked, collaborated or
colluded with any other person in formulating your responses, and that the
work you are submitting for your Final Exam is your OWN work.
4. Release date/time (via Moodle): Saturday, 3rd December 1:00pm (Australian
Eastern Time Zone)
5. Submission date/time (via Turnitin): Saturday, 3rd December 5:00pm
(Australian Eastern Time Zone)
6. Failure to upload the exam by the submission time will result in a penalty of
15% of the available marks per hour of lateness.
7. This Examination Paper has 9 pages, including the cover page.
8. Total number of Questions: 3 Questions.
9. Answer all 3 Questions.
10. Total marks available: 100 marks. This examination is worth 50% of the total
marks for the course.
11. Questions are not of equal value. Marks available for question sub-parts are
shown on this examination paper.
12. Answers to questions are to be written in the exam answer sheet template
provided. Please ensure that you provide all details required on the cover
sheet of your Final Exam answer sheet.
13. Failure to submit exam answers with the correct exam answer sheet will result
in a 10% penalty on your overall exam marks.
This Final Exam is an open-book/open-web exam, and further information is
available “Here”.
• You are permitted to refer to your course notes, any materials provided by
the course convenor or lecturer, books, journal articles, or tutorial
materials.
• It is sufficient to use in-text citations that include the following information:
the name of the author or authors; the year of publication; and the page
number (where the information/idea can be located on a particular page
when directly quoted). For example, (McConville, 2011, p.188).
• You are required to cite your sources and attribute direct quotes
appropriately when using external sources (other than your course
materials).
• When citing Internet sources, please use the following format:
website/page title and date.
• If you provide in-text citations, you MUST provide a Reference List. The
Reference List will NOT be counted towards your word limit.
16. Students are advised to read the Final Exam paper thoroughly before
commencing.
17. The Lecturer-in-Charge (LiC) / Exam Referee will be available online (via
Moodle) after the Open-book Exam paper is released for a period of one hour.
QUESTION 1 40 MARKS
This question consists of three parts – Part A, B and C. Answer all parts.
The following dataset relates quality of wine (scores between 0 and 10) to the
following input variables:
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
PART A. (16 Marks)
You are provided with the following output:
Please answer the following questions:
1. Which model has been fitted? Write down the mathematical equation for this model.
(2 marks)
2. Which variables are significant at the 5% significance level in explaining the
dependent variable? Which test is used to test the significance of variables?
Formulate the null hypothesis, the alternative hypothesis and the test statistic, and
explain how the conclusion to reject or not to reject is drawn. (4 marks)
3. Provide interpretation of the coefficients associated with variables citric acid and
alcohol. (2 marks)
4. Comment on the overall model fit referring to 2 statistics in the output and explain
the meaning of each statistic. (4 marks)
5. Explain why we need adjusted R-squared and cannot just rely on R-squared.
(2 marks)
6. If you decide to make a prediction for the quality of wine, would you keep the
proposed model or adjust it in some way? Explain your proposal. If a new model
should be used, write down the regression equation and provide an explanation
behind the choice of this model. (2 marks)
PART B. (10 Marks)
To simplify wine classification, we assign the following scores to poor, okay, and good
wines: Poor wines are those with scores of 3 or 4; okay wines are those with scores of
5 or 6; good wines are those with scores of 7 or 8.
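The score-to-category mapping described above can be sketched in Python as follows. This is only an illustration: the function name `wine_category` is hypothetical, and raising an error for scores outside 3 to 8 is an assumption, since the text assigns no category to those scores.

```python
def wine_category(score: int) -> str:
    """Map a wine quality score to the three categories defined in Part B."""
    if score in (3, 4):
        return "poor"
    if score in (5, 6):
        return "okay"
    if score in (7, 8):
        return "good"
    # Assumption: the text defines no category for scores outside 3..8,
    # so this sketch rejects them rather than guessing.
    raise ValueError(f"no category defined for score {score}")


if __name__ == "__main__":
    for s in range(3, 9):
        print(s, wine_category(s))
```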
Two models were fitted to the data:
Output 1:
Output 2:
Please answer the following questions:
1. Write down mathematical expressions for both models and briefly explain the
right-hand side and left-hand side of each equation. (5 marks)
2. Comment on the significance of variables in Model 1 and Model 2. (2 marks)
3. Which model performs better and why? Which criteria have you used to determine
the quality of fit? Explain. (3 marks)
PART C. (14 Marks)
Two different classification trees were fitted to the data, where we only aim to identify
good wine (i.e. those wines scoring 7 or 8) and the following output was produced:
Tree 1:
Tree 2:
1. What do the decimal numbers and percentage numbers in each node represent?
Provide a brief explanation (you can use the first split of a tree branch for your
explanation). (4 marks)
2. What are the most influential variables characterising the quality of wine?
(1 mark)
3. Consider the following confusion matrix:
Comment on the quality of the prediction and compare the results from the trees.
Please calculate all possible accuracy rates to facilitate your explanation. (7 marks)
4. Provide two facts to demonstrate the consistency of results from the classification
tree above with the findings reported in Parts A and B. (2 marks)
QUESTION 2 25 MARKS
Consider the following mini-case:
SOSText (fictionalised name) is a non-profit that offers support via text messages for
people who are going through mental health crises. The support is primarily driven by
a network of volunteers servicing communities in major cities. For years, the non-profit
had been collecting a database of messages exchanged and used the data to triage
the incoming calls for help and to create training modules to help train its volunteers to
better manage difficult conversations with people in great distress. In a 2020 report,
the non-profit (which first launched in 2013) stated that “by implementing data science
tools and machine learning from day one, we have created the largest mental health
dataset in the world.” A report section titled “Data Philosophy” added that they share
data to support more innovative research, policy, and community organising. Unlike
other large-scale datasets on mental health and crisis, the organisation claimed its
data had incredible volume, velocity, and variety. However, the positive impact of
SOSText remained constrained as financial reasons limited their ability to scale their
offerings to more people in different contexts. To overcome this issue, one of the
founders of SOSText decided to launch a for-profit spin-off called SmartAssist.ai. This
new service planned to use SOSText data (which it said was anonymised) to gain
insights that would be incorporated into building new training modules for people
recruited by the new spin-off. SOSText and SmartAssist.ai created a data-sharing
agreement that allowed controlled access to the data solely to build models for
training, with the aim of improving mental health more broadly and reaching a broader
and more diverse population of people in distress. SmartAssist.ai would also share a portion of the profits
from that software with SOSText to help sustain the operations of the volunteer-driven
organisation. SmartAssist.ai also wants to build predictive tools to help assistors
identify people who might be at risk of harming themselves, so that they can engage in
proactive intervention.
Based on the case above, answer the following questions:
a) Identify and discuss TWO (2) ethical concerns presented in the case. Analyse
the concerns through the lens of the ethical theories covered in the course. What
aspects of the ethical landscape do they highlight? (15 marks)
b) Identify and discuss at least TWO (2) data ethics dimensions discussed in the
course relevant to the above case. How might the risks associated with these
dimensions be mitigated? (10 marks)
QUESTION 3 35 MARKS
Consider the following:
You are part of a policy team in the Australian government tasked with examining how
to reduce the gender pay gap between men and women. One policy that has been
suggested is to improve the representation of women on the boards of Australian public
companies.
The minister is particularly interested in overseas experience, and the team discovers
that in 2003 Norway passed a law mandating 40 percent representation of each gender
on the boards of public limited liability companies.
Your team is tasked with figuring out whether a board quota is an effective policy for
reducing the gender pay gap. Answer all of the following:
a) One member of your team suggests looking at the gender pay gap in Norway
in 2002 and 2004 to determine the causal effect of the policy. Is this a good
strategy? Why or why not? (10 marks)
b) Another team member suggests comparing teenage driving accidents in
Norway and the United States (which did not enact such a policy) in 2008. Is
this a good strategy? Why or why not? (10 marks)
c) A third team member suggests calculating changes in the gender pay gap in
Norway between 2002 and 2004 and then comparing those to the changes in
the gender pay gap in another country between 2002 and 2004. Is this a good
strategy? Why or why not? (15 marks)