Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
852L1 Data Processing, Coding & Visualisation Workshop 2
Workshop 2
This week’s workshop focuses on how to compute measures of centrality and spread in Excel.
The purpose of this workshop is to compute measures of centrality and spread analytically—that is, by
using the definitions and manually constructing the formulas. This will give you a better
understanding of the meaning of each measure. We will then compare the results with the Excel built-
in functions.
In order to do the exercises below, you need to download the datasets first. Go to the Canvas page of
this module (Week 2) and download the Excel workbook: “Workshop2-Data.xlsx”.
Save the workbook in a folder that is convenient for you. Now open the Excel workbook and get ready
to do the exercises below.
Exercise 1: Time to Do Business
1. Go to the “TimeToDoBusiness” worksheet in the Excel workbook.
2. Compute the mean analytically, that is by using its formal definition.
(Reminder: have a look at the formal definition on the lecture slides)
(Hint: use the built-in Excel functions to sum and count observations)
3. Compute the mean using the built-in Excel function. Compare the value obtained with the one
you computed manually.
(Hint: in Excel the mean is given by the AVERAGE function)
4. Compute the median analytically, that is by using its formal definition.
(Reminder: have a look at the formal definition and examples on the lecture slides)
(Hint: you need sort observations first)
5. Compute the median using the built-in Excel function. Compare the value obtained with the one
you computed manually.
(Hint: in Excel the median is given by the MEDIAN function)
6. Compute the first and third quartiles analytically, that is by using their formal definitions.
(Hint 1: the first quartile is the ‘middle value’ of the lower half of the observations)
(Hint 2: the third quartile is the ‘middle value’ of the upper half of the observations)
7. Compute the first and third quartiles using the built-in Excel functions. Compare the values
obtained with the ones you computed manually.
(Hint: in Excel the quartiles are given by the QUARTILE function)
8. Find the smallest and largest observations in the dataset.
(Hint: minimum and maximum)
9. Compute the interquartile range (IQR).
10. Create a table containing the Five-number summary.
852L1 Data Processing, Coding & Visualisation Workshop 2
11. Compute the variance analytically, that is by using its formal definition.
(Reminder: have a look at the slides for the definition and examples on how to manually construct it)
(Hint: you might need to create additional columns to store the deviations from the mean)
12. Compute the variance using the built-in Excel function. Compare the value obtained with the one
you computed manually.
(Hint: in Excel the variance is given by the VAR.S function)
13. Compute the standard deviation analytically, that is by using its formal definition.
14. Compute the standard deviation using the built-in Excel function. Compare the value obtained
with the one you computed manually.
(Hint: in Excel the variance is given by the STDEV.S function)
15. Comment on the shape of the distribution without plotting a histogram. Do you expect it to be
symmetric or skewed?
Exercise 2: Box-plot
The aim of this exercise is to create a Box-plot using the dataset from Exercise 1. The idea is to create
a “Stacked Bar” plot and modify it accordingly.
In order to do so, you need to perform the following steps:
1. Create a table with the following values:
a. The difference between the first quartile and the minimum, i.e. ! –
b. The first quartile, i.e. !
c. The difference between the median and the first quartile, i.e. – !
d. The difference between the third quartile and the median, i.e. " –
e. The difference between the maximum and the third quartile, i.e. Max –"
2. Select the values (b) to (d) in the table above and go to the “Insert” tab. Within the “Charts”
section, select the “Column” chart and then “2-D bar, Stacked Bar”.
3. If you get three different bars, you have to select “Switch Row/Column” under “Chart Design”.
We have now created a Stacked Bar plot, but this is not quite what we want. We need to edit the chart
further.
4. Right-click on the most-left bar, and then “Format Data Series …”. It should have opened a
menu on the left of your screen. Within that menu, search for “Fill” and select “No fill”.
5. To add the left whisker, keep the bottom box selected and follow this path:
“Chart Design” tab → “Add Chart Element” → “Error Bars” → “More Error Bars Options …”.
Tick the “Minus” direction and click on “Custom” below “Error Amount”.
Delete the “Negative Error Value” and select the cell where you have calculated item (a) in the
852L1 Data Processing, Coding & Visualisation Workshop 2
table above (i.e ! – ). You should now have the left wisher.
6. To add the right whisker, follow the same steps as above by clicking the most-right bar instead.
When prompted, select “Plus” and “Positive Error Value” instead.
7. Remove the y-axis by clicking on it and delete it.
8. Select the same colour for the middle boxes.