PHY328 Final Project
For your final project we will put together everything we have learned so far.
From the motion of stars near the centre of our Galaxy, we know that there is a supermassive
black hole at its centre, weighing nearly 4.1 million times the mass of our Sun. For their role in
discovering this black hole, Andrea Ghez and Reinhard Genzel were awarded the 2020 Nobel Prize
in Physics.
Supermassive black holes also solve the mystery of quasars. A quasar, short for Quasi-Stellar
Object (QSO), is an extremely bright type of galaxy, with a central region that outshines
the rest of the galaxy. We believe that quasars are caused by the accretion of gas from the galaxy onto
a supermassive black hole at its centre.
Indeed, we now think that all galaxies host a supermassive black hole (King et al 2003), but that
the difference between quasars and ordinary galaxies is whether the black hole is accreting or not.
In your project, you will use data derived from pictures of the night sky to classify galaxies into
quasars and ordinary galaxies. If you want to know more about the background, read the popular
science background to the 2020 Nobel Prize.
1.1 The Data
The Sloan Digital Sky Survey (SDSS) is a project designed to survey as much of the sky as possible.
It obtains both images and spectra.
When imaging the night sky, SDSS takes pictures through a series of coloured filters, measuring
the brightness of every object in each filter. The filters are shown below:
The dataset provided (data/SDSS_galaxies.csv) contains the brightness of galaxies in each filter,
as measured by the SDSS. It also contains the classification of each galaxy as either a normal
galaxy, or a QSO. Your task is to use the colours of objects in the SDSS to train an ML algorithm
to recognise QSOs.
Aside: astronomical fluxes and colours
Astronomers use weird units. The brightness measurements in the dataset are in units called
magnitudes. A magnitude $m_f$ in a given filter $f$ is related to the flux in that filter by:
$$m_f = -2.5 \log_{10}\left(\frac{F_f}{3631}\right),$$
where $F_f$ is the flux in a filter $f$, measured in units of milli-Jansky, instead of something sensible
like W m$^{-2}$ Hz$^{-1}$. In astronomy, a colour refers to the difference between two magnitudes, e.g.:
$$g - r \equiv m_g - m_r = -2.5 \log_{10}\left(\frac{F_g}{F_r}\right).$$
You should use colours, as defined above, as features to classify your galaxies.
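As a rough illustration of turning magnitudes into colour features, the sketch below takes the differences between neighbouring filters. It assumes the five SDSS filters u, g, r, i, z in that order; the names `FILTERS` and `adjacent_colours` and the example magnitudes are hypothetical, and the actual column names in data/SDSS_galaxies.csv may differ.

```python
import numpy as np

# Assumed filter order; check the column names in the real dataset.
FILTERS = ["u", "g", "r", "i", "z"]


def adjacent_colours(mags):
    """Return colours formed from neighbouring filters.

    mags: array of shape (n_objects, n_filters), ordered as FILTERS.
    Returns an array of shape (n_objects, n_filters - 1) holding
    u-g, g-r, r-i and i-z for each object.
    """
    mags = np.asarray(mags, dtype=float)
    return mags[:, :-1] - mags[:, 1:]


# Two made-up objects, for illustration only (not taken from the dataset).
example_mags = np.array([
    [19.5, 18.0, 17.2, 16.8, 16.5],
    [18.9, 18.7, 18.6, 18.5, 18.5],
])
colours = adjacent_colours(example_mags)
colour_names = [f"{a}-{b}" for a, b in zip(FILTERS[:-1], FILTERS[1:])]
```

Neighbouring-filter colours are only one possible choice; any pair of magnitudes defines a valid colour, and which set works best is something to test.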
Data issues: The dataset contains some of the issues you might find in real-world datasets, for
example the different colours may have very different means and variances, and there are very
different numbers of galaxies and quasars (i.e. it is an unbalanced dataset).
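One possible way to handle both issues is sketched below: features are standardised using statistics from the training set only, and each class is weighted inversely to its frequency. The function names and the synthetic data are hypothetical, and other remedies (for example resampling) may suit your design better.

```python
import numpy as np


def standardise(train, other):
    """Scale features to zero mean and unit variance.

    The mean and standard deviation come from the training set only,
    so no information leaks from the validation or test data.
    """
    mean = train.mean(axis=0)
    std = train.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # guard against constant features
    return (train - mean) / std, (other - mean) / std


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    weights = labels.size / (classes.size * counts)
    return dict(zip(classes.tolist(), weights.tolist()))


# Synthetic stand-in data: two features with very different scales,
# and a 9:1 class imbalance (label 1 is assumed to mean QSO).
rng = np.random.default_rng(0)
X_train = rng.normal(loc=[0.5, 3.0], scale=[0.1, 2.0], size=(200, 2))
X_test = rng.normal(loc=[0.5, 3.0], scale=[0.1, 2.0], size=(50, 2))
y_train = np.array([0] * 180 + [1] * 20)

X_train_s, X_test_s = standardise(X_train, X_test)
weights = balanced_class_weights(y_train)  # rare class gets the larger weight
```

The weighting formula is the same one sklearn applies when a classifier is given `class_weight="balanced"`; in a hand-written network the weights could multiply each sample's contribution to the loss.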
2 The Assignment
Perform any necessary feature engineering required to prepare your dataset for classification.
Code your own Neural Network and apply it, with an appropriate design, to classify objects into
galaxies and QSOs based upon their colour. Compare the performance of your network to a random
forest classifier. You may optionally compare the performance against the MLPClassifier neural
network classifier built into sklearn as well. I would like to stress that this is optional!
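To make the idea of a hand-coded network concrete, here is a minimal sketch of a binary classifier with one hidden ReLU layer and a sigmoid output, trained by full-batch gradient descent on the binary cross-entropy. The layer size, learning rate, initialisation and toy data are all assumptions chosen for illustration; it is not a recommended design for the project, which should justify its own architecture and training scheme.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def init_params(n_in, n_hidden, rng):
    """Small random weights and zero biases for a single hidden layer."""
    return {
        "W1": rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_hidden)),
        "b1": np.zeros(n_hidden),
        "W2": rng.normal(0.0, np.sqrt(1.0 / n_hidden), (n_hidden, 1)),
        "b2": np.zeros(1),
    }


def forward(params, X):
    """Return hidden activations and the predicted probability of class 1."""
    hidden = np.maximum(0.0, X @ params["W1"] + params["b1"])  # ReLU
    prob = sigmoid(hidden @ params["W2"] + params["b2"]).ravel()
    return hidden, prob


def train_step(params, X, y, lr):
    """One gradient-descent step; returns the loss before the update."""
    hidden, prob = forward(params, X)
    n = X.shape[0]
    # Gradient of the mean cross-entropy w.r.t. the output pre-activation.
    d_out = ((prob - y) / n)[:, None]
    # Back-propagate through the output weights and the ReLU.
    d_hidden = (d_out @ params["W2"].T) * (hidden > 0)
    grads = {
        "W2": hidden.T @ d_out,
        "b2": d_out.sum(axis=0),
        "W1": X.T @ d_hidden,
        "b1": d_hidden.sum(axis=0),
    }
    for name in params:
        params[name] -= lr * grads[name]
    eps = 1e-12  # avoids log(0)
    return -np.mean(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps))


# Toy data: two well-separated blobs standing in for two classes.
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(-1.0, 0.5, (100, 2)), rng.normal(1.0, 0.5, (100, 2))])
y = np.array([0.0] * 100 + [1.0] * 100)

params = init_params(n_in=2, n_hidden=8, rng=rng)
losses = [train_step(params, X, y, lr=0.5) for _ in range(300)]
accuracy = np.mean((forward(params, X)[1] > 0.5) == (y == 1.0))
```

A real solution would add at least a held-out validation set, mini-batches or early stopping, and some treatment of the class imbalance.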
Use any appropriate metrics to decide upon the appropriate hyper-parameters for your classifiers
and to compare the performance of the two approaches.
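Because the classes are unbalanced, accuracy alone can be misleading. The sketch below computes precision, recall and F1 from the confusion-matrix counts and shows a classifier that labels everything as a galaxy. The function name `binary_metrics`, the convention that label 1 means QSO, and the labels themselves are assumptions made for illustration.

```python
import numpy as np


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for the positive class (label 1)."""
    y_true = np.asarray(y_true).astype(bool)
    y_pred = np.asarray(y_pred).astype(bool)
    tp = int(np.sum(y_true & y_pred))    # QSOs correctly identified
    fp = int(np.sum(~y_true & y_pred))   # galaxies mislabelled as QSOs
    fn = int(np.sum(y_true & ~y_pred))   # QSOs missed
    tn = int(np.sum(~y_true & ~y_pred))  # galaxies correctly identified
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "accuracy": (tp + tn) / y_true.size,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels: 90 galaxies (0) and 10 QSOs (1).
y_true = np.array([0] * 90 + [1] * 10)
always_galaxy = np.zeros(100, dtype=int)
metrics = binary_metrics(y_true, always_galaxy)
```

Here the trivial classifier reaches 90% accuracy while finding no QSOs at all, which is why recall, precision or F1 are often more informative on data like this.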
Your report should take the form of a Jupyter notebook that explains your methodology, your
results and discusses the performance of the two approaches.
The notebook should be clean and readable. Supporting Python code and tests should be contained
within separate module (.py) files where appropriate.
3 Assessment
Your project will be assessed on the quality of your code, both in the notebook and accompanying
module files. It will also be assessed on the quality of your results, the choices you make in training
and the discussion of your results. See the rubric for full details.