MGT 205 R Project - Output
Overview
In this project, you will use a data set of property values from
2007 to 2019. The data contain sale prices for houses and units
with 1, 2, 3, 4, and 5 bedrooms.
The variables are: date of sale; price in dollars; property type (unit or house);
and number of bedrooms.
START HERE
Run the code block below.
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──
## ✓ ggplot2 3.3.5 ✓ purrr 0.3.4
## ✓ tibble 3.1.3 ✓ dplyr 1.0.7
## ✓ tidyr 1.1.3 ✓ stringr 1.4.0
## ✓ readr 2.0.0 ✓ forcats 0.5.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
Load the property.csv file into a data frame.
prop <- read_csv("property.csv")
## Rows: 347 Columns: 6
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (2): saledate, type
## dbl (2): price, bedrooms
## lgl (2): V1, V2
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
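The message above names two ways to handle the column specification: silence it, or state the types yourself. The sketch below illustrates both on a small in-memory CSV rather than on property.csv itself. The two sample rows are copied from the head() output in question 3, `csv_text`, `demo`, and `demo_typed` are hypothetical names, and `col_integer()` for bedrooms is an assumption (readr guessed double).

```r
library(readr)

# Two sample rows standing in for property.csv (illustrative only)
csv_text <- I("saledate,price,type,bedrooms\n30/06/2007,421291,house,3\n30/06/2007,368817,unit,2\n")

# Option 1: keep the guessed types but suppress the message
demo <- read_csv(csv_text, show_col_types = FALSE)

# Option 2: state the types explicitly so nothing is guessed
demo_typed <- read_csv(
  csv_text,
  col_types = cols(
    saledate = col_character(),
    price    = col_double(),
    type     = col_character(),
    bedrooms = col_integer()
  )
)
```

Stating the types makes the import fail loudly if a column ever changes shape, which guessing would hide.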
1. How many rows and columns are there in the data set?
dim(prop)
## [1] 347 6
2. Calculate the summary statistics for the “price” column.
summary(prop$price)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.    NA's
##  316751  427681  507596  547434  626106 1017752       1
3. Remove the unwanted variables V1 and V2 from the data set.
Show the head() of your data set to confirm that the unwanted
variables are gone.
# Drop the columns by name so the result does not depend on their position
prop <- prop %>% select(-V1, -V2)
head(prop)
## # A tibble: 6 × 4
##   saledate    price type  bedrooms
##   <chr>       <dbl> <chr>    <dbl>
## 1 30/06/2007 421291 house        3
## 2 30/06/2007 548969 house        4
## 3 30/06/2007 368817 unit         2
## 4 30/06/2008 441854 house        2
## 5 30/06/2008 419628 house        3
## 6 30/06/2008 559580 house        4
4a. Are there any NA values in the data set?
any(is.na(prop))
## [1] TRUE
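`any(is.na(prop))` says only that a missing value exists somewhere. To see which columns are affected, one option is to count NA values per column. A minimal sketch on a toy data frame (the values and the name `toy` are hypothetical, not taken from property.csv):

```r
# Toy data frame (hypothetical values) with one missing price and one missing type
toy <- data.frame(
  price    = c(421291, NA, 368817),
  type     = c("house", "house", NA),
  bedrooms = c(3, 4, 2),
  stringsAsFactors = FALSE
)

# is.na() returns a logical matrix; colSums() counts the TRUE values per column
na_counts <- colSums(is.na(toy))
print(na_counts)
```

Running `colSums(is.na(prop))` on the real data in the same way would show whether the missing values sit in "price" or in another column.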
4b. If any NA values are in the “price” column, replace them with the
mean price.
prop$price[which(is.na(prop$price))] <- mean(prop$price, na.rm = TRUE)
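The same replacement can be traced by hand on a short vector. The sketch below uses hypothetical prices and shows two points: logical indexing with `is.na()` works without `which()`, and filling with the mean of the observed values leaves the overall mean unchanged.

```r
# Hypothetical prices with one missing value
price <- c(400000, NA, 600000)

# Mean of the observed values only
observed_mean <- mean(price, na.rm = TRUE)

# Logical indexing replaces every NA; which() is not required
price[is.na(price)] <- observed_mean

# Filling with the mean leaves the overall mean unchanged
stopifnot(mean(price) == observed_mean)
```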
4c. If any NA values are in any of the other columns, remove the
entire row from the data set.
prop<-na.omit(prop)
4d. Show that there are no more NA values in the data set.
any(is.na(prop))
## [1] FALSE
5. Rename the “saledate” column to “date”. Show the head() of
your data set to confirm the change.
colnames(prop)[1]<-"date"
head(prop)
## # A tibble: 6 × 4
##   date        price type  bedrooms
##   <chr>       <dbl> <chr>    <dbl>
## 1 30/06/2007 421291 house        3
## 2 30/06/2007 548969 house        4
## 3 30/06/2007 368817 unit         2
## 4 30/06/2008 441854 house        2
## 5 30/06/2008 419628 house        3
## 6 30/06/2008 559580 house        4
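Renaming by position works here because "saledate" is the first column. An alternative that does not depend on column order is `dplyr::rename()`. The sketch below uses two rows copied from the head() output above; the name `toy` is hypothetical, and the date-parsing step is an optional extra that the project does not ask for.

```r
library(dplyr)

# Two rows copied from the head() output above
toy <- data.frame(
  saledate = c("30/06/2007", "30/06/2008"),
  price    = c(421291, 441854),
  stringsAsFactors = FALSE
)

# rename() takes new_name = old_name, so column position does not matter
toy <- toy %>% rename(date = saledate)

# Optional extra step (an assumption, not part of the question):
# parse the day/month/year text into a Date
toy <- toy %>% mutate(date = as.Date(date, format = "%d/%m/%Y"))
```

With a real Date column, the rows could later be sorted or grouped by year.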
6. Run the code below to convert the variable “bedrooms” into a
factor variable using the as.factor() function. Replace “prop” in
the code with whatever you called your data set. You do not need
to do anything else in this question after you edit and run the
code.
prop$bedrooms<-as.factor(prop$bedrooms)
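To see what the conversion does, it may help to try `as.factor()` on a short vector first. A minimal sketch with hypothetical bedroom counts (`bedrooms`, `bedrooms_f`, `codes`, and `values` are illustrative names):

```r
# Hypothetical bedroom counts
bedrooms <- c(3, 4, 2, 2, 3)

bedrooms_f <- as.factor(bedrooms)

levels(bedrooms_f)   # distinct values, stored as text labels
table(bedrooms_f)    # number of rows at each level

# Caution: as.numeric() on a factor returns the level codes, not the labels
codes  <- as.numeric(bedrooms_f)
values <- as.numeric(as.character(bedrooms_f))
```

Treating bedrooms as a factor tells R to handle it as a set of categories rather than as a quantity, which matters for grouping and plotting.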
7a. Calculate and show the mean price of the house properties
and unit properties.
prop %>% group_by(type) %>% summarise(mean_price = mean(price))
## # A tibble: 2 × 2
##   type  mean_price
##   <chr>      <dbl>
## 1 house    626587.
## 2 unit     439743.
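As a cross-check on the grouped summary, base R's `tapply()` computes the same kind of per-group mean. A small sketch on hypothetical prices (the data frame `toy` is illustrative, not taken from property.csv):

```r
# Hypothetical prices: two houses and two units
toy <- data.frame(
  type  = c("house", "house", "unit", "unit"),
  price = c(400000, 600000, 300000, 500000),
  stringsAsFactors = FALSE
)

# Apply mean() to price within each level of type
group_means <- tapply(toy$price, toy$type, mean)
print(group_means)
```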
7b. Display the two values from 7a as a bar chart. Be sure to
include useful information to help understand your chart.
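One possible way to draw such a chart is sketched below with ggplot2. It assumes the two means printed in 7a, typed in by hand and rounded to the nearest dollar; the data frame `means`, the plot object `p`, and the title and axis labels are illustrative choices, not a required answer.

```r
library(ggplot2)

# Mean prices as printed in the 7a output (rounded to the nearest dollar)
means <- data.frame(
  type       = c("house", "unit"),
  mean_price = c(626587, 439743)
)

# geom_col() draws bar heights taken directly from the data
p <- ggplot(means, aes(x = type, y = mean_price, fill = type)) +
  geom_col(show.legend = FALSE) +
  labs(
    title = "Mean sale price by property type, 2007 to 2019",
    x     = "Property type",
    y     = "Mean price (dollars)"
  )

print(p)
```

In the project itself, the summarised tibble from 7a could be piped into `ggplot()` directly instead of retyping the values.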