Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
INFO20003
Assignment 1: ER Modelling
Google scholar
Google scholar is a free accessible web search engine that provides a broad range of information on scholarly
publications. It also provides information on researchers, relations among scholarly publications and researchers,
and different metrics to measure impact of publications.
Your team has been asked to design a MySQL database to store information about the researchers, the details of
the publications, and relations among researchers and publications. With your design, Google scholar hopes to
improve access and searching of the data. The following specifications have been provided to your team to assist
in your design.
Researchers
For each researcher profile, Google scholar records the researcher’s details such as first name, last name, a unique
email address tied to the researcher, and the institution(s) the researcher is affiliated with. A researcher can be
affiliated with one or multiple institutions. For each institution, the name of the department, if exists (e.g.,
Computing and Information Systems), and the name of the institution (e.g., The University of Melbourne) are stored.
Each researcher can also be associated with a few keywords representing their ‘research area’, such as Databases,
Machine learning, Psychology, Medicine, etc.
Each researcher is also associated with multiple metrics providing insight on their research quality. A researcher
has a total citation count number (e.g., 203) that denotes the total number of times their publications have been
cited by other publications (see ‘Publications’ for the details on citations), an h-index number (e.g., 15) and an i10-
index number (e.g., 12). An h-index value of 15 means that this researcher has 15 publications that have each been
cited at least 15 times (similarly an h-index of 14 implies that the researcher has 14 publications that have each
been cited at least 14 times). An i10-index value is the number of publications with at least 10 citations.
Publications
Google scholar maintains three types of publications: conference papers, journals, and book chapters. For each
publication, Google scholar stores its title, year of publication, publication venue name, a list of authors (there can
be multiple authors of a publication, where each author is a researcher), start page number (e.g., 475), end page
number (e.g., 500), a short description (which is denoted as its abstract), and publisher company (assume only one
publisher company per publication). Google scholar may also store a few keywords such as Databases, Machine
learning, Psychology, etc for the ‘research area’ of each publication. Each publication has a list of references (i.e.,
it “cites” other publications), where each reference is another publication. If publication A is in the reference list of
another publication B, then B is “cited by” A. For each publication, Google scholar also shows the total number of
its citations, which denotes the number of times that the publication was cited by other publications. The following
figure shows an example publication with its basic information and its list of references.
INFO20003 S1 2022 A1 Page 2 of 5
Fig 1: An example of publication with authors, abstract and the list of references
Google scholar database will not store the actual publications, but rather a link to the document objects. Each
publication is linked to one document object. For each document object, Google scholar stores the URL link and
the document size in KB.
For each journal, the volume number and the issue number of the journal publication are also stored. Each book
chapter has an edition number. Each conference paper is associated with a conference event where one of the
authors (researchers) of the paper needs to make an oral presentation of the paper. Multiple conference papers
can be presented in the same conference event, but a conference paper can be presented in only one conference
event. Google scholar stores the following information for each conference event: the location (e.g., Melbourne,
Australia) of the conference, the start date, the end date of the conference, and the URL to the conference event
website. For each conference paper, the author who presents that paper in the conference, is also stored.
Research activities
For each researcher, Google scholar stores their list of publications. The researchers who author a publication
together are called ‘co-authors’. A researcher can create a list of publications as ‘bookmark’ that they find
important. A publication can be added or removed from the ‘bookmarked’ list of a researcher, but Google scholar
does not record the history of bookmarks. Only the publications that are currently in the ‘bookmarked’ list for a
researcher, are stored.
Featured publications
Google scholar manually curates a list of top 10 publications every fortnight depending on the number of citations
of the publications. A publication can make it to top 10 more than once over time. We must be able to determine
INFO20003 S1 2022 A1 Page 3 of 5
how many times a publication has been in the top 10 (but we do not need to know the rationale why this is the
current ranking, i.e. we do not need to track publication citations throughout time). A publication can stay in the
top 10 at the same or different rank. For example, the publication entitled “Learning to index” can be number 1
from January 1st – January 14th, 2022 but the rank drops to number 3 from January 15th-January 31st, 2022. For
each such rank, we must keep a record of start date when a publication reached that position and end date when
it was removed from the position (and potentially from the list altogether).
Business Requirements
Your database design needs to be able to meet the business' needs to answer questions as:
1. How many researchers have at least 10 publications?
2. How many publications of a researcher “Farhana Choudhury” have at least 5 citations?
3. Which researcher has the highest number of publications in the entire Google scholar?
4. How many researchers are associated with the institute “The University of Melbourne”?
5. When did the researcher “Farhana Choudhury” and “Renata Borovica-Gajic” have their first co-authored
publication?
6. Who are the researchers who published in any journal in 2021, where the journal has the research area
“Databases”?
7. Which researcher has the highest h-index at “The University of Melbourne”?
8. Which publication has referenced (cited) the highest number of other publications?
9. Are there any publications whose document size exceeds 500KB?
10. Which keyword is the most popular (meaning that the highest number of researchers have stated it as
one of their research areas)?