Assignment 2 Solutions
Total marks: 45
Assessment value: 17%
Background
This assignment uses the same dataset as Assignment 1. The population of interest is round cut diamonds for sale in New Zealand.
Data was collected from a random sample of 229 round cut diamonds from a single New Zealand based retailer of diamond jewellery.
The data consist of the price of the diamond (in $NZ), the colour grade (a six-point scale from D to I, where D is colourless and I is nearly colourless) and the lab that certified the colour grade (Lab A or Lab B). The data are in the Excel file Diamonds_2024_S1.xlsx.
Import this data into RStudio and call it diamonds.
Use the data to answer the following questions in the spaces provided. You can re-size the answer spaces.
Use RStudio and incorporate the code you used and the output into your answers.
When you have finished your assignment, save it as a PDF and upload it to the Stream Assignment 2 Dropbox.
Part A: Comparing means: Analysis of diamond price according to lab of certification [25 marks]
We are interested in whether the average price of round cut diamonds differs depending on the lab that did the certification.
A1: Use RStudio to draw a side-by-side boxplot of price versus lab for the diamonds in the sample. Include your code as well as your plot. [3 marks]
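As a minimal sketch of the kind of code A1 asks for, the example below draws a side-by-side boxplot with base R. The small data frame is made up purely so the code runs on its own, and the column names `price` and `lab` are assumptions; the real names in Diamonds_2024_S1.xlsx may differ.

```r
# Hypothetical stand-in for the imported data; values are invented and
# the column names (price, lab) are assumed, not taken from the file.
diamonds <- data.frame(
  price = c(1200, 1850, 2400, 3100, 4500, 900, 1500, 2100, 2600, 5200),
  lab   = factor(rep(c("A", "B"), each = 5))
)

# Side-by-side boxplot of price by certifying lab.
# boxplot() also returns the plotted statistics invisibly; bp keeps them.
bp <- boxplot(price ~ lab, data = diamonds,
              xlab = "Certifying lab", ylab = "Price ($NZ)",
              main = "Diamond price by lab")
```

With the real data, only the `boxplot()` call would be needed, applied to the imported `diamonds` data frame.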
A2: Use RStudio to calculate numerical summaries for price for each lab. Fill in the table with the values rounded appropriately. Include your code below the table. [4 marks]
| | Lab A | Lab B |
| Minimum | | |
| Lower Quartile | | |
| Median | | |
| Mean | | |
| Upper Quartile | | |
| Maximum | | |
| Sample size | | |
R code used:
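One possible way to obtain the summaries for the table is sketched here using base R only. The data frame is invented for illustration and the column names `price` and `lab` are assumptions, so the printed numbers are not the assignment's answers.

```r
# Hypothetical stand-in for the imported data (invented values,
# assumed column names).
diamonds <- data.frame(
  price = c(1200, 1850, 2400, 3100, 4500, 900, 1500, 2100, 2600, 5200),
  lab   = factor(rep(c("A", "B"), each = 5))
)

# Minimum, quartiles, median, mean and maximum of price for each lab.
# The result is a matrix with one column per lab, matching the table layout.
price_summary <- sapply(split(diamonds$price, diamonds$lab), summary)
price_summary

# Sample size for each lab.
lab_n <- table(diamonds$lab)
lab_n
```

Splitting the prices by lab first and then applying `summary()` keeps the output in the same rows-by-labs shape as the answer table, which makes it easier to copy the rounded values across.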
A3: What do the plots and numerical summaries tell you about the prices of diamonds that were classified by the different labs in the sample? Hint: consider comparisons of centre, spread, shape and outliers. [4 marks]
A4: Do a two-sample t-test to determine if there is any evidence that there is a difference in mean prices for diamonds certified by Labs A and B in the population.
a. Step 1: Write the hypotheses. [2 marks]
The null:
The alternative:
b. Use RStudio to do the two-sample t-test. Include your code and output. [1 mark]
c. Step 2: State the value of the test statistic. [1 mark]
d. Step 3: State the statistical decision with reason. [1 mark]
e. Step 4: Write your conclusion. [2 marks]
f. Step 5: Check the conditions are met. [1 mark]
The issue of lack of representativeness was discussed in Assignment 1. Discuss whether the normality condition is met.
g. Write a sentence to interpret the confidence interval. Explain how it adds to your conclusion. [4 marks]
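The steps in A4 could be carried out along the following lines. This is a sketch on invented data with assumed column names (`price`, `lab`), so its output says nothing about the real diamonds. Note that `t.test()` runs the Welch (unequal variances) version by default; whether the course expects that or the pooled version (`var.equal = TRUE`) is an assumption to check against the course notes.

```r
# Hypothetical stand-in for the imported data (invented values,
# assumed column names).
diamonds <- data.frame(
  price = c(1200, 1850, 2400, 3100, 4500, 900, 1500, 2100, 2600, 5200),
  lab   = factor(rep(c("A", "B"), each = 5))
)

# Two-sample t-test of price by lab (Welch version by default).
res <- t.test(price ~ lab, data = diamonds)
res

# Pieces of the output used in Steps 2 to 4 and in part g.
res$statistic   # test statistic t
res$parameter   # degrees of freedom
res$p.value     # p-value for the two-sided test
res$conf.int    # 95% confidence interval for mean(Lab A) - mean(Lab B)
```

The confidence interval is for the mean of the first factor level minus the mean of the second, so here it is Lab A minus Lab B. For the normality condition in Step 5, a histogram or normal quantile plot of price for each lab (for example with `hist()` or `qqnorm()`) is one common check.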
A5: Explain why a two-sample t-test is better than a t-test of differences for this context. [1 mark]
A6: An employee noted that Lab A tends to certify diamonds that are more expensive. She picks two diamonds from the store’s catalogue and looks up which lab certified each diamond. However, she sees that the more expensive diamond was certified by Lab B. Does this contradict the analysis in A4? Explain. [1 mark]
Part B: Exploratory Data Analysis of Lab and Colour Grade. [9 marks]
We are interested in investigating if the distribution of the colour grade depends on which lab certified the diamond.
B1: Use RStudio to produce a table of counts for the colour grade of the diamonds by the Lab that certified them. Put lab as the rows and colour as the columns. Include your code and table. [2 marks]
B2: Use RStudio to produce a table of diamond colour grade as a proportion of the total number of diamonds certified by each lab. Round the values to 2 decimal places. Include your code and table. Note: the rows of your table should sum to 1. [2 marks]
B3: Use RStudio to produce a side-by-side bar plot of the distribution of diamond colour grade as a proportion of the diamonds certified by each lab. Include your code and plot. [2 marks]
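The tables and plot asked for in B1 to B3 could be produced roughly as follows. This is a sketch using an invented data frame, and the column names `lab` and `colour` are assumptions about the imported data.

```r
# Hypothetical stand-in for the imported data (invented values,
# assumed column names).
diamonds <- data.frame(
  lab    = factor(c("A", "A", "A", "A", "A", "B", "B", "B", "B", "B")),
  colour = factor(c("D", "E", "E", "F", "G", "D", "F", "G", "H", "I"),
                  levels = c("D", "E", "F", "G", "H", "I"))
)

# B1: counts with lab as rows and colour as columns.
counts <- with(diamonds, table(lab, colour))
counts

# B2: proportions within each lab (margin = 1 means by row),
# rounded to 2 decimal places.
props <- round(prop.table(counts, margin = 1), 2)
props

# B3: side-by-side bars, one bar per lab within each colour grade.
barplot(props, beside = TRUE, legend.text = rownames(props),
        xlab = "Colour grade", ylab = "Proportion of lab's diamonds")
```

Using `margin = 1` is what makes each row sum to 1, as the note in B2 requires; `margin = 2` would instead give proportions within each colour grade.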
B4: What do the tables and plot tell you about the distribution of colour grade for the two labs? [3 marks]
Part C: Inferential Analysis of lab and colour. [11 marks]
C1: Do a Chi-squared test of lab and colour.
a. Step 1: Write the hypotheses. [2 marks]
The null:
The alternative:
b. Use RStudio to do the Chi-squared test. Include your code and output. [1 mark]
c. Step 2: State the value of the test statistic. [1 mark]
d. Step 3: State the statistical decision with reason. [1 mark]
e. Step 4: Write your conclusion. [1 mark]
f. Step 5: Check the conditions are met. [2 marks]
The issue of lack of representativeness was discussed in Assignment 1. What is the other condition? Discuss whether this condition is met. Include any code you use and its output.
g. Use RStudio to calculate the residuals. Include your code and output. [1 mark]
h. Do the residuals add to your conclusion? Explain. [2 marks]
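The computing parts of C1 (the test, the expected counts for the conditions check, and the residuals) might look like the sketch below. The table of counts is invented for illustration only. With the real data it would instead come from something like `table(diamonds$lab, diamonds$colour)`, where the column names are assumptions. The "expected counts of at least 5" rule in the comments is a common guideline and is assumed here; use the wording of the condition given in the course.

```r
# Hypothetical table of counts (invented numbers, not the real data).
counts <- matrix(c(20, 25, 30, 15, 10, 10,
                   10, 15, 25, 25, 20, 20),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(lab = c("A", "B"),
                                 colour = c("D", "E", "F", "G", "H", "I")))

# Chi-squared test of lab and colour.
chi <- chisq.test(counts)
chi

chi$statistic   # test statistic
chi$parameter   # degrees of freedom: (rows - 1) * (columns - 1)
chi$p.value

# Expected counts under the null hypothesis, for the conditions check
# (a common guideline is that all are at least 5).
chi$expected

# Pearson residuals: (observed - expected) / sqrt(expected).
chi$residuals
```

Residuals that are large in absolute value point to the lab and colour combinations that contribute most to the test statistic, which is the kind of detail part h asks about.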