Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
Database Systems INFO20003
Assignment 2: SQL
Due: 6:00pm Friday, 28 April 2023
Weighting: 10% of your total assessment
Research Papers Database
Description
You and a group of fellow undergrads have created a start-up like Google Scholar called ‘newScholar’.
newScholar is a free accessible web search engine that provides a broad range of information on
scholarly publications. It also provides information on researchers and relations among scholarly
publications and researchers.
For each researcher, newScholar records the researcher’s details such as first name, last name, and
one email address. Each researcher can also be associated with a few keywords representing their
‘research area’, such as Databases, Machine learning, Psychology, Medicine, etc. For each researcher,
newScholar maintains their list of publications. The researchers who author a publication together are
called ‘co-authors’.
For each publication, newScholar stores its title, date of publication, start page number (e.g., 475), end
page number (e.g., 500), and a list of authors (there can be multiple authors of a publication, where
each author is a researcher). Each publication can also be associated with a few keywords representing
its ‘research area’, such as Databases, Machine learning, Psychology, Medicine, etc. NewScholar
database will not store the actual publications, but rather a link to the document objects. Each publication
is linked to one document object. For each document object, newScholar stores the URL link and the
document size in KB. Each publication has a list of references (i.e., it “cites” other publications), where
each reference is another publication. If publication A is in the reference list of another publication B,
then A is “cited by” B. Figure 1 shows an example publication with its basic information and its list of
references.
NewScholar manually curates a list of top 10 publications every fortnight depending on the number of
citations of the publications. A publication can make it to top 10 more than once over time. For each
paper that is in the top 10 for a particular fortnight at a particular position, newScholar keeps a record of
the start date when a publication reached that position (beginning of the fortnight) and end date (end of
the fortnight). If a publication is again in the next fortnight’s top 10 (whether in the same position, or
different), a new row is recorded, with the start/end dates being the start/end dates of this new fortnight.
For example, the publication entitled “Learning to index” can be number 1 for the fortnight from January
1st - January 14th, 2022 but the rank drops to number 3 from January 15th - January 28th, 2022. This
would result in 2x rows, one with position 1 and one with position 3. Note that we do not need to know
the rationale why this is the current ranking.
Database Systems INFO20003 A2 S1 2023 Page 2 of 7
Fig 1: An example of publication with title, authors, and the list of references
The Data Model
Fig 2: The physical ER model of newScholar startup database.
Database Systems INFO20003 A2 S1 2023 Page 3 of 7
Assignment 2 Setup
A dataset is provided which you can use when developing your solutions. To set up the dataset,
download the file newScholar_2023.sql from the Assignment link on Canvas and run it in Workbench.
This script creates the database tables and populates them with data. Note that this dataset is provided
for you to experiment with: but it is not the same dataset as what your queries will be tested against (the
schema will stay the same, but the data itself may be different). This means when designing your queries
you must consider edge cases even if they are not represented in this particular data set.
The script is designed to run against your account on the Engineering IT server
(info20003db.eng.unimelb.edu.au). If you want to install the schema on your own MySQL Server
installation, uncomment the lines at the beginning of the script.
Note: Do NOT disable only_full_group_by mode when completing this assignment. This mode is
the default, and is turned on in all default installs of MySQL workbench. You can check whether it is
turned on using the command “SELECT @@sql_mode;”. The command should return a string
containing “ONLY_FULL_GROUP_BY” or “ANSI”. When testing, our test server WILL have this mode
turned on, and if your query fails due to this, you will lose marks.
The SQL tasks
In this section are listed 10 questions for you to answer. Write one (single) SQL statement per question.
Subqueries and nesting are allowed within a single SQL statement – however, you may be penalized
for writing overly complicated SQL statements.
DO NOT USE VIEWS (or ‘WITH’ statements/common table expressions) to answer questions.
Some clarifications:
1. If publication A has 5 citations, that means publication A has been cited by 5 other publications
(i.e., it does not mean that publication A cited 5 other publications).
2. A researcher’s total number of citations refers to the count of citations of all the publications
they’ve authored. If a publication has multiple citations, all the citations count towards the total
count. For instance, if a researcher has 3 publications; the first one is cited 2 times, the second
one 3 times, and the third one 5 times, we say that the researcher’s total number of citations is
10.
Database Systems INFO20003 A2 S1 2023 Page 4 of 7
Questions:
1. List all publications with no references (i.e., find publications that do not cite any other
publications). Your query should return results of the form (publicationID, title). (1 mark)
2. Find the most recent publication. Assume there are no ties (only one publication is the most
recent). Your query should return results of the form (publicationID, title, dateOfPublication). (1
mark)