ALY6010
Two-sample Confidence Intervals & Hypothesis Testing
Module 4 Project
ALY6010-21661
Overview and Rationale
This assignment is designed to provide you with hands-on experiences in estimating and hypothesizing with
two samples of interest. The data set is provided in an Excel workbook and contains a wide range of data types
that you will need to work with.
Remember that your references should be strictly academic sources, such as books and scientific journals.
Course Outcomes
This assignment is directly linked to the following key learning outcomes from the course syllabus:
CO1: Explore the use of statistical software in data analysis through hands-on applications.
CO4: Perform estimations of population parameters using confidence intervals based on one sample and
perform estimations of the difference between two population parameters of the same kind based on two
samples.
CO6: Perform various hypothesis tests, including those for a population parameter (single sample), and the
difference between two population parameters of the same kind (two samples), and perform analysis of
variance (ANOVA).
CO7: Interpret meaningful relationships and patterns in the data in relation to a given business question.
Prepare the following assignment using R Markdown.
Submit (1) your HTML Report and (2) your original Rmd file.
Remember that your report will be reviewed using Turnitin. Review your Turnitin score, and if it is higher than
20%, fix your report, submit again, and repeat until your score is lower than 20%.
Do not use t.test() in your report. Always compute the test with the formulas we discussed in class, which are
also presented in the book. The same applies to the other tests.
Title.
Present a title.
Introduction:
Prepare a well-informed introduction, supported by academic references. Demonstrate your understanding of
the following topics:
1. Hypothesis testing and its application in an industry of your interest.
2. The different applications of z test, t test and F test for two sample comparisons.
3. Importance of proper referencing in Academic writing.
4. Briefly describe all data sets used in this report and their purpose.
Use at least 2 academic references per topic, besides our course book.
Analysis section:
Include in this section all the tasks described below. For every task, present a title and an explanation.
Remember that this is a professional report and your readers need to know what each task is about (a short
title and a short explanation are enough). In addition, writing your own title and your own explanation for each
task will decrease your Turnitin score.
Conclusions:
Make an overall observation about the whole project, explain what the results you obtained mean for the
direction of the data or project, and describe any new analytical and R programming skills you gained. Also,
imagine you are preparing this report for a company or research institution. You must therefore make
meaningful contributions, so think about what recommendations you can provide.
Bibliography:
Use APA format.
References must be cited in the main body of your report. If you do not cite any reference in the main text,
it is as if you did not use any, even if you add a list at the end.
Cite each reference in the main body of your report at the place where you use it as an information source.
Use one of two styles: either the first author's last name and year, e.g., (Bluman, 2017), with the bibliography
listed in alphabetical order, or a number assigned in order of appearance, with the bibliography listed in that
numerical order.
Appendix:
Mention the attached Rmd file.
Note: Before you begin this assignment, you must install and load the library "MASS" in RStudio so
that you can use the sample data available within this package.
Use the following code: install.packages("MASS") then library(MASS)
Do not present the install.packages() code in your report; run it directly in RStudio's Console tab.
Remember: Do not present long raw data sets in your report, ONLY the results of your data analyses.
Task 1 (for each task, add your own title in your report).
Present some descriptive statistics of the public data set cats.
Review the documentation for the data set cats by running ?cats in your Console.
Be organized and make sure to:
1. Select the appropriate descriptive statistics to present.
2. Select the appropriate visualizations to present the data (tables and/or graphs).
This is essentially an open question in which you must show your analytical, organizational, and data
presentation skills.
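One possible starting point for this task is sketched below. It is only an illustration, not a required layout: the helper name describe and the choice of statistics and plot are assumptions made for this example, and the column names Sex and Bwt are those documented in ?cats.

```r
library(MASS)  # provides the cats data set; install it once from the Console

# Hypothetical helper: a compact set of descriptive statistics for one vector
describe <- function(x) {
  c(n = length(x), mean = mean(x), sd = sd(x),
    median = median(x), min = min(x), max = max(x))
}

# Body weight (Bwt, in kg) summarized separately for each sex.
# split() gives one vector per level of Sex; sapply() applies describe() to each,
# and t() turns the result into one row per sex.
bwt_summary <- t(sapply(split(cats$Bwt, cats$Sex), describe))
print(round(bwt_summary, 2))

# A boxplot shows the centre and spread of both groups side by side
boxplot(Bwt ~ Sex, data = cats,
        main = "Body weight by sex",
        xlab = "Sex", ylab = "Body weight (kg)")
```

The same helper can be applied to the heart weight column (Hwt) if you decide it is relevant to your report.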
Task 2
Assuming that the samples are independent of each other and that the variances of both populations are unknown,
answer the following research question: Do male and female cats have the same body weight (in kilograms)?
(Bluman, chapter 9-2).
Ho: µ1 = µ2
Ha: µ1 ≠ µ2
Hint: one way to separate the male and female observations is to use the subset function as follows:
male = subset(cats, subset=(cats$Sex=="M"))
and similarly for females. This returns a data frame; the body weight values are then in the vector male$Bwt.
Present your hypothesis.
Use α = 0.01 for your hypothesis testing procedure.
* Present the critical value and compare it to your test value.
* Explain the result of your test.
* All your code for this task should be contained in one single R chunk. Prepare a table or use inline R code
to present your answers.
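The procedure above can be sketched in one chunk without t.test(), as shown below. This is a minimal sketch under two assumptions you should check against your class notes: it uses the unequal-variance test value t = (x̄1 − x̄2) / sqrt(s1²/n1 + s2²/n2), and it takes the degrees of freedom as the smaller of n1 − 1 and n2 − 1. All variable names are illustrative.

```r
library(MASS)

alpha <- 0.01

# Body weight vectors for each group
male_bwt   <- cats$Bwt[cats$Sex == "M"]
female_bwt <- cats$Bwt[cats$Sex == "F"]

# Sample sizes, means, and variances
n1 <- length(male_bwt);  n2 <- length(female_bwt)
x1 <- mean(male_bwt);    x2 <- mean(female_bwt)
v1 <- var(male_bwt);     v2 <- var(female_bwt)

# Test value computed from the formula (no t.test())
se      <- sqrt(v1 / n1 + v2 / n2)
t_value <- (x1 - x2) / se

# Assumption: conservative degrees of freedom, the smaller of n1 - 1 and n2 - 1
df     <- min(n1 - 1, n2 - 1)
t_crit <- qt(1 - alpha / 2, df)   # two-tailed critical value

# Decision rule: reject H0 when |t| exceeds the critical value
reject_h0 <- abs(t_value) > t_crit

# Confidence interval for the difference in means at the same alpha
ci <- (x1 - x2) + c(-1, 1) * t_crit * se

# Results collected in one table for the report
results <- data.frame(
  test_value     = t_value,
  critical_value = t_crit,
  df             = df,
  ci_lower       = ci[1],
  ci_upper       = ci[2],
  reject_h0      = reject_h0
)
print(results)
```

Individual values can also be reported in the text with inline R code, for example by referring to round(t_value, 3) inside a sentence.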
Task 3
In Task 2 you tested the hypothesis about the difference in mean body weight between female and male cats.
In Task 3, test for the difference in their variances (Bluman, chapter 9-5).
Use α = 0.01.
* Present the critical value and compare it to your test value.
* Explain the result of your test.
* All your code for this task should be contained in one single R chunk. Prepare a table or use inline R code
to present your answers.
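A comparable sketch for the variance comparison follows. It assumes the usual textbook convention for the two-tailed F test: the larger sample variance goes in the numerator, and the right-tail critical value is taken at α/2. Confirm both points against the chapter before relying on it; the variable names are illustrative.

```r
library(MASS)

alpha <- 0.01

# Sample variance and sample size of body weight for each sex
vars <- tapply(cats$Bwt, cats$Sex, var)
ns   <- tapply(cats$Bwt, cats$Sex, length)

# Assumption: the group with the larger variance goes in the numerator
larger  <- names(which.max(vars))
smaller <- names(which.min(vars))

# Test value computed from the formula F = s1^2 / s2^2 (no var.test())
f_value <- vars[[larger]] / vars[[smaller]]
df_num  <- ns[[larger]] - 1    # numerator degrees of freedom
df_den  <- ns[[smaller]] - 1   # denominator degrees of freedom

# Two-tailed test: right-tail critical value at alpha / 2
f_crit <- qf(1 - alpha / 2, df_num, df_den)

# Decision rule: reject H0 (equal variances) when F exceeds the critical value
reject_h0 <- f_value > f_crit

# Results collected in one table for the report
results <- data.frame(
  numerator_group = larger,
  test_value      = f_value,
  critical_value  = f_crit,
  df_num          = df_num,
  df_den          = df_den,
  reject_h0       = reject_h0
)
print(results)
```

Because the larger variance is always placed on top, the test value is at least 1 and only the right-tail critical value is needed.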