MSCI521: Statistics and Descriptive Analytics
MSCI521: Statistics and Descriptive Analytics | Department of Management Science
Coursework
Guidelines
Answer both questions. Equal marks are available for both questions. The % marks indicated
for sections of questions are approximate.
You can include any graphs and outputs that you find relevant to the problem in the report,
but make sure that they are properly referenced and interpreted in the text. Remember that it
is up to you to interpret what a figure or table shows, not for the marker to infer.
It is also important to explain how you reached each conclusion. For example, you
should explain why you think that there is or is not an effect in the model. Simply stating
“there is” or “there is not” is not acceptable.
Do not include any appendices (they will be discarded during the marking).
You must submit an electronic version of your report through Moodle along with a
coursework declaration form.
Do not include your name anywhere in the work, but please do include your library ID!
Your report must be between 2000 and 3000 words and should not exceed 20 pages.
Question 1 – 50 marks
An analyst has collected a sample of 534 respondents, measuring the wages of people in the UK and
other information about them:
wage – wage in GBP per hour,
education – number of years of education, starting from elementary school (across all
programmes),
experience – number of years of work experience (which is calculated as
),
age – age in years,
ethnicity – categorical variable, indicating whether the respondent is Caucasian or of another
ethnicity,
region – categorical variable, showing whether the respondent is from the South or not,
gender – gender of the respondent,
occupation – categorical variable, indicating the occupation of the person, which can be:
o worker – tradesperson or assembly line worker,
o technical – technical or professional worker,
o services – service worker,
o office – office and clerical worker,
o sales – sales worker,
o management – management and administration;
sector – the respondent's sector of work, which can be manufacturing, construction or other,
union – nominal variable, indicating whether there is a union related to the job.
In order to study the impact of variables on wage, you need to build a regression model. Do the
following steps in order to fulfil the task:
1. Analyse the data and explain what you observe, and discuss its possible causes. [25% mark]
2. Based on (1) and your understanding of the problem, propose an appropriate regression
formulation for the problem. Explain all the transformations that you propose (if any), and
which variables should be included and why. [30% mark]
3. Do regression diagnostics of the model from (2) and fix any problems you find. Explain what
you do and why [20% mark]:
a. Are there any apparent issues in the residuals (any patterns)?
b. Does the variance of the error appear to be constant?
c. Do the errors appear to be normally distributed?
d. Are there outliers?
4. Assuming that the standard regression assumptions hold, use your model from (3) to answer
the following questions: [25% mark]
a. How do the years of experience impact the wage?
b. What is the meaning of the intercept in your model?
c. What is the average effect of the presence of a union on the wage?
d. What is the interpretation of the 99% confidence interval of the parameter for
education?
e. What is the impact of age on wage?
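As an illustration of the mechanics behind steps 2 to 4, the following is a minimal Python sketch, not a model answer. It assumes synthetic data (the coefficients 1.0 and 0.08 and the noise level are hypothetical), a single explanatory variable, and a log transformation of wage chosen purely for illustration. The helper name fit_simple_ols is also an assumption, and the confidence interval uses a normal approximation to the t distribution, which is reasonable only because the sample is large.

```python
import math
import random
from statistics import NormalDist, mean, pstdev


def fit_simple_ols(x, y):
    """Fit y = b0 + b1 * x by least squares (hypothetical helper)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    # Residual standard error with n - 2 degrees of freedom.
    sigma = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))
    se_b1 = sigma / math.sqrt(sxx)
    return b0, b1, residuals, sigma, se_b1


# Synthetic stand-in for the survey data (assumed values, not the real sample).
rng = random.Random(42)
n = 534
education = [rng.randint(8, 18) for _ in range(n)]
log_wage = [1.0 + 0.08 * e + rng.gauss(0, 0.4) for e in education]

b0, b1, residuals, sigma, se_b1 = fit_simple_ols(education, log_wage)

# 99% confidence interval for the slope (normal approximation to the t quantile).
z = NormalDist().inv_cdf(0.995)
ci_low, ci_high = b1 - z * se_b1, b1 + z * se_b1

# In a log-linear model the slope is read as an approximate percentage change
# in wage per additional year of education.
pct_effect = 100 * (math.exp(b1) - 1)

# Crude diagnostics: scaled residuals beyond 3 flag potential outliers, and
# comparing residual spread across two groups hints at non-constant variance.
outliers = [i for i, e in enumerate(residuals) if abs(e / sigma) > 3]
spread_low = pstdev([e for e, x in zip(residuals, education) if x < 13])
spread_high = pstdev([e for e, x in zip(residuals, education) if x >= 13])

print(f"slope = {b1:.4f}, 99% CI = ({ci_low:.4f}, {ci_high:.4f})")
print(f"approx. effect per year of education: {pct_effect:.1f}%")
print(f"potential outliers: {len(outliers)}")
print(f"residual spread (low vs high education): {spread_low:.3f} vs {spread_high:.3f}")
```

The numbers printed here describe only the simulated data. In the report, each quantity still needs to be interpreted in words, and the diagnostics are normally supported by plots (residuals against fitted values, a Q-Q plot) rather than by summary numbers alone.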
Question 2 – 50 marks
In order to determine the main factors influencing the price of cars, a company collected a sample of
82 cars with 27 variables, measuring different characteristics:
Manufacturer;
Model;
Type – categorical variable with levels "Small", "Sporty", "Compact", "Midsize", "Large" and
"Van";
Price – price of a standard version of a car (in $1,000);
Min.Price – price for a basic version (in $1,000);
Max.Price – price for “a premium version” (in $1,000);
MPG.city – fuel consumption in the city, miles per US gallon;
MPG.highway – fuel consumption on a highway;
AirBags – Air Bags standard: none, driver only, or driver & passenger;
DriveTrain – either rear wheel, or front wheel, or 4WD;
Cylinders – number of cylinders;
EngineSize – engine size in litres;
Horsepower – maximum horsepower;
RPM – revs per minute at maximum horsepower;
Rev.per.mile – engine revolutions per mile (in highest gear);
Man.trans.avail – is a manual transmission version of the car available?
Fuel.tank.capacity – fuel tank capacity in US gallons;
Passengers – passenger capacity in persons;
Length – length in inches;
Wheelbase – wheelbase in inches;
Width – width in inches;
Turn.circle – U-turn space in feet;
Rear.seat.room – rear seat room in inches;
Luggage.room – luggage capacity in cubic feet;
Weight – weight in pounds;
Origin – whether the company is of non-USA or USA origin;
Make – combination of Manufacturer and Model.