COMP3425 Data Mining
Assignment 1
Maximum marks 100
Minimum to pass hurdle 30
Length Maximum of 8 pages excluding cover sheet, bibliography and appendices.
Layout A4. At least 11 point type size. Use of typeface, margins and headings consistent with a professional style.
Submission deadline 9 am, Monday 14 March
Submission mode Electronic, PDF via Wattle, file-name includes u-number
Estimated time 15 hours
Penalty for lateness 100% after the deadline has passed
First posted: 21 Feb, 9am
Last modified: 21 Feb, 9am
Questions to: Wattle Discussion Forum
This assignment specification may be updated to reflect clarifications and modifications
after it is first issued.
In this assignment, you are required to submit a single report comprising your answers to
set questions in the form of a single PDF file with a file-name that includes your University
u-number ID. The first page must have a clearly identified title and author, identified by
both name and university u-number. You may also attach supporting information
(appendices) in the same PDF file. Appendices will not be marked but may be treated as
supporting information to your answers.
This is a single-person assignment and should be completed on your own. Make certain you
carefully reference all the material that you use. Any material that you wish to quote must
have the source clearly referenced. It is unacceptable to present any portion of another
author's work as your own. Anyone found doing so will be penalised in marks. In addition,
CECS procedures for plagiarism will apply.
It is strongly suggested that you start working on the assignment right away. You can submit
as many times as you wish. Only the most recent submission at the due date will be
assessed.
Task
The Australian Computer Society Code of Professional Conduct 2014 is expected to be
applied by all Computing Professionals in Australia. It sets out six values but stresses the
primacy of the public interest as the overriding value. In 2017, the US Branch of the
Association for Computing Machinery (ACM), recognizing the ubiquity and far-reaching
impact of algorithms in daily lives, issued a Statement on Algorithmic Transparency and
Accountability incorporating seven Principles designed to address potential harmful social
discrimination due to bias. In 2018, the Australian Government Office of the Australian
Information Commissioner released the Guide to Data Analytics and the Australian Privacy
Principles (APP). These three documents are provided with this assignment specification.
You must also read the paper, Clarke R. (2018), “Guidelines for the Responsible Application
of Data Analytics” Computer Law & Security Review 34, 3 (Jul-Aug 2018), that is provided
with this assignment specification and hereafter referred to as the Guidelines. You must also
read the paper, Du, Liu and Hu (2020), “Techniques for Interpretable Machine Learning”,
Communications of the ACM 63(1) that is also provided with the assignment.
You are to consider the application of the ACS code of conduct, the 7 US ACM Principles,
Clarke’s Guidelines and Du et al.’s Techniques to the following fictitious ad targeting
scenario. You may also use the APP guide, where it is helpful.
Ad Targeting Scenario (from Clarke R. (2016) “Big Data, Big Risks”, Information Systems
Journal 26, 1 (January 2016) 77-90, PrePrint at http://www.rogerclarke.com/EC/BDBR.html)
A social media service-provider accumulates a vast amount of social transaction data, and some
economic transaction data, through activity on its own sites and those of strategic partners. It applies
complex data analytics techniques to this data to infer attributes of individual digital personae. It
projects third-party ads and its own promotional materials based on the inferred attributes of online
identities and the characteristics of the material being projected.
The 'brute force' nature of the data consolidation and analysis means that no account is taken of the
incidence of partial identities, conflated identities, obfuscated identities, and imaginary, fanciful,
falsified and fraudulent profiles. This results in mis-placement of a significant proportion of ads, to
the detriment mostly of advertisers, but to some extent also of individual consumers. It is challenging
to conduct audits of ad-targeting effectiveness, and hence advertisers remain unaware of the low
quality of the data and of the inferences. This approach to business is undermined by inappropriate
content appearing on children's screens, and gambling and alcohol ads seen by partners in the
browser-windows of nominally reformed gamblers and drinkers.
You must answer the following questions, clearly indicating which question you are
answering within your submission. The page lengths suggested for each question here are
for guidance only; the given page length limit for the overall assignment is mandatory.
Question 1. (1 page) Consider the ACS code of conduct. For each of the six values, taking
account of any relevant sub-parts, discuss whether the value was demonstrated in the
scenario and to what extent. If you assess any value as largely irrelevant to the scenario,
then a very brief reason for this assessment is sufficient.
Question 2. (1/2 page) Consider the 7 US ACM Principles. Looking closely at Principle 1,
Awareness, discuss how this principle is applied (or not) in the scenario and identify any
“potential harm” that might have ensued.
Question 3. (2 pages) Consider the numbered guidelines in Table 2 of Clarke’s Guidelines
for the responsible application of data analytics. From every segment (1 General, 2 Data
Acquisition, 3 Data Analysis, and 4 Use of the Inferences) choose one guideline that you
consider would have been applied in the scenario. Its application may not be explicit in the
scenario description, but it should be relevant and important to the scenario and you can
argue that it was applied properly and therefore did not contribute to the negative
consequences of the scenario. Explain its role in the scenario including how it would have
contributed to positive outcomes. Justify why it is more relevant than every one of the other
guidelines that you consider would have been applied in the same segment. Argue how it is
more or less relevant than any guidelines in the same segment that you consider may have
been disregarded in the scenario. Be careful to consider the intention of the guidelines
rather than an overly literal interpretation; you may rephrase the chosen guideline for the
scenario context where beneficial. For further explanation of this point, see Section 3 in
Clarke’s Guidelines.
Question 4. (1 page) (a) Choose one numbered guideline (e.g. guideline 3.3) in Table 2 of
the Guidelines that you consider to have been disregarded in the scenario. You may choose
any guideline that you did not choose for Question 3. Discuss how the failure to consider
the guideline could have contributed to the negative outcome of the scenario. (b) In
addition, identify any other potential consequences that could have occurred due to the
failure to consider that same guideline. For this purpose, the consequences you identify are
not necessarily explicit within the scenario description. You might find it helpful to think of
this activity as contributing to a risk assessment process prior to your hypothetical
involvement in the analysis work of the scenario.
Question 5. (1 page) Consider the paper by Du et al., Techniques for Interpretable Machine
Learning. Discuss whether and how intrinsic and post-hoc interpretability techniques could
be applied to the scenario and what benefits could ensue.
General Comments
An abstract or executive summary is not required. A cover sheet is optional and does not
contribute to the page count. No particular layout is specified, but you should follow a
professional style and use no smaller than 11 point typeface and stay within the maximum
specified page count. Page margins, heading sizes, paragraph breaks and so forth are not
specified but a professional style must be maintained. Text beyond the page limit or word
count limit will be treated as non-existent. Appendices may be used and do not contribute
to the page count, but appendices might be only quickly scanned or used for reference and
will not be specifically marked.
You must properly attribute the source documents provided for your assignment (but not
this assignment specification itself) and any other reference materials you choose to use.
You are not required to use additional materials. No particular referencing style is
required. However, you are expected to reference conventionally, conveniently, and
consistently. Your references should be sufficient to unambiguously identify the source, to
describe the nature of the source, and also to retrieve the source in online and (if possible)
traditional publisher formats.
An assessment rubric is provided. The rubric will be used to mark your assignment. You are
advised to use it to supplement your understanding of what is expected for the
assignment and to direct your effort towards the most rewarding parts of the work.
Your assignment submission will be treated confidentially, but it will be available to ANU
staff involved in the course for the purposes of marking.
Assessment Rubric
This rubric will be used to mark your assignment. You are advised to use it to supplement your understanding of what is expected for the assignment and to
direct your effort towards the most rewarding parts of the work. Your assignment will be marked out of 100, and marks will be scaled back to contribute to
the defined weighting for assessment of the course.