ALY6010 Discrete probability and normal distributions
ALY6010 Module 1 Project
Discrete probability and normal distributions
Overview and Rationale
This assignment is designed to provide you with hands-on experience in applying descriptive statistical
methods to a data set. The data set is provided in an Excel workbook and contains a wide range of data types
that you will need to work with.
Assignment Summary
Using the data provided in the attached Excel workbook, apply the methods of graphical and numerical
descriptive statistics.
Follow the instructions in the project document to analyze the data presented in the Excel workbook. Then
complete a report summarizing your data analyses.
Important note on the report: for this project, your report will be an HTML file produced using R Markdown.
Important note 2: I understand that some students are still learning R and R Markdown. If you are in this
group, this week's deadline is flexible, and you can submit your report up to five days after the deadline.
Files to submit: remember that for this project you must submit two files:
• Your R Markdown File.
• Your HTML report.
Tasks to complete before starting your project.
1. Install the latest versions of R and R Studio on your computer.
(Read file: 01a R Install, create folder and project.ppt)
2. Create a folder on your computer named “ALY6010 R Project” and a subfolder named “DataSets”.
3. From R Studio, create an R Project for this class using the “ALY6010 R Project” folder you created
above. (Read file: 01a R Install, create folder and project.ppt)
4. Learn how to import data sets into R using the strategy requested by your instructor.
(Read file: 03 R Import data sets.ppt)
5. Learn how to use R Markdown. We will use only basic code to produce the HTML output reports.
(Read file: R Markdown Introduction.ppt)
6. Save the file “M1data_carsales.xlsx” inside your DataSets folder.
7. Create an R Markdown file inside your ALY6010 R Project, name this file: Project1_myname.Rmd.
(Read file: R Markdown Introduction.ppt)
8. Import the data set into R Studio using the strategy you learned above and present the code in an
initial R chunk.
9. Do not include install.packages() calls in your report. If you need to install a new package, do it
directly in the R Studio console.
Create an initial R chunk to load your libraries and import your data set. Use the following header on this R
chunk: {r message=FALSE, warning=FALSE}
Some libraries to include in your libraries R chunk. If you do not have them, install the packages in the
console.
library(readxl)
library(tidyverse)
library(dplyr)
library(DT)
library(RColorBrewer)
library(rio)
library(dbplyr)
library(psych)
library(FSA)
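The import step that follows the library calls in the initial chunk could look like the sketch below. It assumes the folder layout from steps 2 and 6 and uses readxl, but your instructor's import strategy may differ. The object name carsales is a hypothetical choice, not something the project requires.

```r
# Sketch of the import step (assumption: readxl is the chosen import strategy).
# `carsales` is a hypothetical object name; the path assumes the R Project was
# created in the "ALY6010 R Project" folder with a "DataSets" subfolder.
data_path <- file.path("DataSets", "M1data_carsales.xlsx")

if (file.exists(data_path)) {
  carsales <- readxl::read_excel(data_path)  # reads the first sheet by default
  str(carsales)                              # check column names and types
} else {
  message("File not found: ", data_path,
          " (check that the R Project is open and the file is in DataSets)")
}
```

Using a path relative to the project folder keeps the R Markdown file portable, since it knits the same way on any computer that has the same folder structure.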
Report starts here
Title.
Give your report a title that includes the report’s name (Project 1 Report), the name and CRN of the class, your name,
your instructor’s name, and the date you submit the report. Here is an example:
Introduction.
Create a title for your Introduction section. Here is a code example:
(A) Write a few sentences presenting general information about the car sales market, globally and in India.
Here are some websites you can read; these are examples, so find others if you prefer:
(B) Write a paragraph describing and explaining the importance of discrete and continuous probability
distributions.
(C) Write a sentence describing the data set you are about to use.
Analysis section.
Task 1
If you don’t know the dplyr::select() and psych::describe() functions, this will be a good opportunity to learn them.
• Create an R chunk.
• Start with the name of the data set, then, using the pipe %>%, apply dplyr::select() to select only the
variables Efficiency, Power_bhp, Seats, Km, and Price.
• Using a second pipe, apply psych::describe(), with nothing inside the parentheses.
• Run the code. Two things should catch your attention: the descriptive statistics are in the columns, not in the
rows, and there are too many decimals. Correct these issues as follows.
• Using another pipe, enter code t() to transpose the values. Run the code and observe.
• Using another pipe, enter code round(2) to reduce the decimals to only 2.
• Using another pipe, enter code knitr::kable() to improve table presentation.
Present the table on your Report.
Write some observations about the code strategy you just learnt.
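The Task 1 pipeline can be sketched as follows. The data frame here is a small made-up stand-in (its values are invented for illustration only), so replace carsales with the object you imported from the Excel workbook.

```r
library(dplyr)   # provides select() and the %>% pipe

# Hypothetical toy data standing in for the car sales data set
carsales <- data.frame(
  Efficiency = c(18.2, 21.5, 15.0, 23.1, 17.4, 19.8),
  Power_bhp  = c(88, 74, 140, 67, 118, 82),
  Seats      = c(5, 5, 7, 5, 4, 5),
  Km         = c(42000, 61000, 35000, 87000, 15000, 52000),
  Price      = c(5.5, 3.2, 12.8, 2.9, 9.4, 4.6),
  Location   = c("Mumbai", "Delhi", "Mumbai", "Pune", "Delhi", "Mumbai")
)

stats_tbl <- carsales %>%
  dplyr::select(Efficiency, Power_bhp, Seats, Km, Price) %>%
  psych::describe() %>%   # one row per variable, one column per statistic
  t() %>%                 # transpose: statistics in rows, variables in columns
  round(2)                # keep two decimals

knitr::kable(stats_tbl)   # formatted table for the knitted report
```

Storing the result in stats_tbl before calling knitr::kable() is optional. It simply makes it easy to inspect the transposed matrix in the console while you build the pipeline one step at a time.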
Task 2.
• Prepare and present a bar plot to show the frequencies of variable location.
• Prepare and present a bar plot to show the frequencies of variable fuel type.
• Prepare and present a bar plot to show the frequencies of variable transmission.
• Prepare and present a bar plot to show the frequencies of variable owner.
Important: Use code par(mfrow=c(2,2)) to arrange your four bar plots in a 2x2 matrix.
• Improve the presentation of your graphs with clear x- and y-axis labels and colors.
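A minimal base R sketch of the 2x2 bar plot layout is shown here. The toy data and the column names Location, Fuel_Type, Transmission, and Owner are assumptions, so check the actual names in your data set with names().

```r
# Hypothetical toy data; column names are assumed, not taken from the workbook
carsales <- data.frame(
  Location     = c("Mumbai", "Delhi", "Mumbai", "Pune", "Delhi", "Mumbai"),
  Fuel_Type    = c("Petrol", "Diesel", "Diesel", "Petrol", "Petrol", "CNG"),
  Transmission = c("Manual", "Manual", "Automatic", "Manual", "Automatic", "Manual"),
  Owner        = c("First", "Second", "First", "Third", "First", "Second")
)

old_par <- par(mfrow = c(2, 2))   # 2 x 2 grid; previous settings are saved

location_counts <- table(carsales$Location)   # frequency of each category
barplot(location_counts, col = "steelblue",
        xlab = "Location", ylab = "Frequency")
barplot(table(carsales$Fuel_Type), col = "darkorange",
        xlab = "Fuel type", ylab = "Frequency")
barplot(table(carsales$Transmission), col = "seagreen",
        xlab = "Transmission", ylab = "Frequency")
barplot(table(carsales$Owner), col = "orchid",
        xlab = "Owner", ylab = "Frequency")

par(old_par)                      # restore the original layout
```

Restoring the saved settings at the end keeps the 2x2 layout from carrying over to later plots in the same session.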
Task 3
Create a table with the variable location on the rows, and present their corresponding frequencies, cumulative
frequencies, percentages, and cumulative percentages.
If you have decimals, always reduce them to 2 or 3 only.
Follow these steps:
• Create a table to present the locations and their frequencies.
• Convert table using as.data.frame()
• Rename columns: Var1 to Location and Freq to Frequency.
• Use code mutate() to create three new columns (these are new calculated fields):
• The cumulative frequencies, name column: CumFrequency.
• The percentages, name column: Percentage
• The cumulative percentage, name column: CumPercentage
Present the table using a table library of your choice: library(DT) or library(knitr).
Optional: if you use kable, practice the following code to present your table. You will need to install the
package kableExtra.
knitr::kable(digits = 2, caption = "Task 3 Table") %>%
kable_classic(full_width = FALSE, font_size = 12)
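The Task 3 steps can be sketched as below, using a small invented Location column in place of the real data. The percentages are computed from the unrounded values and rounded only at the end, so the last cumulative percentage is exactly 100.

```r
library(dplyr)   # provides rename(), mutate() and the %>% pipe

# Hypothetical toy data standing in for the car sales data set
carsales <- data.frame(
  Location = c("Mumbai", "Delhi", "Mumbai", "Pune", "Delhi", "Mumbai")
)

location_table <- table(carsales$Location) %>%    # frequency of each location
  as.data.frame() %>%                             # columns are named Var1 and Freq
  rename(Location = Var1, Frequency = Freq) %>%
  mutate(
    CumFrequency  = cumsum(Frequency),
    Percentage    = round(100 * Frequency / sum(Frequency), 2),
    CumPercentage = round(cumsum(100 * Frequency / sum(Frequency)), 2)
  )

knitr::kable(location_table, digits = 2, caption = "Task 3 Table")
```

The same pipeline should work for Task 4 by swapping in the owner variable and renaming Var1 accordingly.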
Task 4.
Repeat the code used for Task 3, this time presenting frequencies, cumulative frequencies, percentages, and
cumulative percentages for variable owner.
Task 5
• Prepare one horizontal box plot and one histogram to display the distribution of the numerical variable kilometers.
• Use the code par(mfrow=c(2,1), mai=c(1,1,1,1)) at the beginning of the R chunk.
• mfrow will present the two figures one on top of the other as a group; in this case, c(2,1) indicates 2 rows and
1 column. mai sets the margins of your figures in inches, in the order bottom, left, top, right. Play with the mai
numbers to observe the changes.
• Remove the title of your graphs by using main = NA.
Remember to always make observations after each task.
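One possible layout for Task 5 is sketched here with invented kilometer values. The column name Km matches the variable list in Task 1, while the margin values are just one choice to experiment with (a smaller top margin is used because the titles are removed).

```r
# Hypothetical toy data standing in for the car sales data set
carsales <- data.frame(
  Km = c(42000, 61000, 35000, 87000, 15000, 52000, 48000, 73000, 29000, 56000)
)

# 2 rows x 1 column; margins in inches: bottom, left, top, right
old_par <- par(mfrow = c(2, 1), mai = c(1, 1, 0.2, 0.5))

boxplot(carsales$Km, horizontal = TRUE, main = NA,
        xlab = "Kilometers", col = "lightblue")

km_hist <- hist(carsales$Km, main = NA,
                xlab = "Kilometers", ylab = "Frequency", col = "lightblue")

par(old_par)   # restore the original layout and margins
```

Because both plots share the same variable, stacking them makes it easy to compare the box plot's quartiles with the shape of the histogram. Task 6 can reuse the same chunk with the price variable.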
Task 6.
Similar to task 5, this time present the box plot and histogram for variable price.
Task 7
Prepare and present a box plot to display the price distribution per location.
Your figure must contain several boxes inside.
Provide your figure with a good presentation format.
Remember to always make observations after each task.
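A grouped box plot for Task 7 can be drawn with the formula interface of boxplot(), as in this sketch. The data values and the column names Price and Location are assumptions for illustration.

```r
# Hypothetical toy data standing in for the car sales data set
carsales <- data.frame(
  Price    = c(5.5, 3.2, 12.8, 2.9, 9.4, 4.6, 7.1, 6.3),
  Location = c("Mumbai", "Delhi", "Mumbai", "Pune",
               "Delhi", "Mumbai", "Pune", "Delhi")
)

# Price ~ Location draws one box per location
price_bp <- boxplot(Price ~ Location, data = carsales,
                    col = c("steelblue", "darkorange", "seagreen"),
                    xlab = "Location", ylab = "Price",
                    main = "Price distribution per location")
```

The formula y ~ group tells boxplot() to split the numeric variable by the grouping variable, so Task 8 should only need the kilometers and owner variables substituted in.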
Task 8
Similar to task 7, prepare and present a box plot to display kilometers distribution per owner.
Task 9
Apply and present the outcomes of code boxplot.stats() for variable kilometers.
Explain the information obtained with the application of this code.
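To illustrate what boxplot.stats() returns, here is a small sketch on six invented kilometer values, one of which is deliberately extreme.

```r
# Hypothetical kilometer values; the last one is an intentional outlier
km <- c(10000, 20000, 30000, 40000, 50000, 200000)

km_stats <- boxplot.stats(km)

km_stats$stats  # lower whisker, lower hinge, median, upper hinge, upper whisker
km_stats$n      # number of non-missing observations
km_stats$conf   # limits of the notch around the median
km_stats$out    # points beyond the whiskers (potential outliers)
```

In this toy example the hinges are 20000 and 50000, so any value more than 1.5 times their difference (45000) above the upper hinge is flagged. That is why 200000 appears in $out and the upper whisker stops at 50000.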
Task 10
With the information obtained in Task 9, prepare and present a dotchart() to display the quartile values
($stats) for variable kilometers.
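A minimal sketch of Task 10 follows, again on invented kilometer values. The label text is a hypothetical choice; it names the five values in the order boxplot.stats() returns them.

```r
# Hypothetical kilometer values standing in for the real variable
km <- c(10000, 20000, 30000, 40000, 50000, 200000)

km_stats <- boxplot.stats(km)

# Labels follow the order of the $stats vector
stat_labels <- c("Lower whisker", "Lower hinge (Q1)", "Median",
                 "Upper hinge (Q3)", "Upper whisker")

dotchart(km_stats$stats, labels = stat_labels,
         xlab = "Kilometers", pch = 19, col = "steelblue",
         main = "Box plot statistics for kilometers")
```

Note that boxplot.stats() reports hinges, which can differ slightly from the quartiles returned by quantile() when the sample is small.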