MGSC 7000 Assignment 1
This is a Group Assignment. Work as a team and submit one assignment per team. Please answer the following questions using R. The deliverable for this assignment is an R file with the code for each question.

Before answering each question, clearly state which question you are answering, then provide the code in the subsequent lines. After the code, use # comments to explain what the code is doing so that it can be easily read by another person.

Some questions need your interpretation. Use # comments to provide the interpretation at the end of each question. Points will be deducted for failure to provide interpretation. For example, for Question 2 (a), you should clearly state that the average selling price of laptops is $508.1, the median price is $500.1, and so on.

Question 1:

The file Housing.csv contains data collected by the US Census on housing. This data is obtained from the StatLib archive. The dataset has 506 cases (i.e., different areas within Boston). The data was originally published by Harrison, D. and Rubinfeld, D.L., 'Hedonic prices and the demand for clean air', J. Environ. Economics & Management, vol. 5, 81-102, 1978. There are 14 variables in the dataset.

Data Dictionary:

CRIM      per capita crime rate by town
ZN        proportion of residential land zoned for lots over 25,000 sq. ft.
INDUS     proportion of non-retail business acres per town
CHAS      Charles River dummy variable (1 if tract bounds river; 0 otherwise)
NOX       nitric oxides concentration (parts per 10 million)
RM        average number of rooms per dwelling
AGE       proportion of owner-occupied units built prior to 1940
DIS       weighted distances to five Boston employment centers
RAD       index of accessibility to radial highways
TAX       full-value property-tax rate per $10,000
PTRATIO   pupil-teacher ratio by town
LSTAT     % lower status of the population
MEDV      median value of owner-occupied homes in $1000s

a. Create a table that tells us the mean of the median value of owner-occupied homes (i.e., the average of MEDV) for areas that are near the Charles River vs. those that are not (see the CHAS variable). [5 points]
b. Create a table that tells us the mean proportion of residential land zoned for lots over 25,000 sq. ft. (ZN variable) for the different levels of RAD (index of accessibility to radial highways). For example, the table should tell us the mean ZN when RAD=2, and so on. [5 points]
c. Add a new variable called CAT_MEAN to the dataset. This should be a binary variable with CAT_MEAN=1 if the median value of owner-occupied homes is above $30,000 (i.e., MEDV is more than 30), else 0. [10 points]
d. Create a table that shows how many observations in the dataset are in tracts that bound the river and how many are not. [5 points]
e. Create a correlation table for all the variables in the data. Provide a detailed analysis based on the correlation table. [10 points]
f. Create a table with only the CRIM, INDUS, CHAS, and MEDV variables and with only the observations where MEDV is more than 30. [5 points]

Question 2:

The file LaptopSales.csv contains data collected on laptops sold during 2008. This data was obtained from ENBIS, the European Network for Business and Industrial Statistics. Below is the data dictionary.

LaptopSales

Date                    purchase date
Configuration           a numerical code representing a combination of screen size, battery life, RAM, etc.; each code corresponds to a particular combination
Customer Postcode       postcode in London of the customer
Store Postcode          postcode in London of the store
Retail Price            price of laptop in GBP
Screen Size             screen size of laptop (inches)
Battery Life            battery life of laptop (hours)
RAM                     RAM size of laptop (GB)
Processor Speeds        processor speed of laptop (GHz)
Integrated Wireless?    whether the laptop has integrated wireless or not
HD Size                 HD size of laptop (GB)
Bundled Applications?   whether the laptop comes with bundled applications or not
customer X              X geo coordinate of the customer location
customer Y              Y geo coordinate of the customer location
store X                 X geo coordinate of the store location
store Y                 Y geo coordinate of the store location

Imagine you are a Data Scientist for a company called ABC Laptops. Using this data, you need to help your company with its business planning decisions. Answer the following questions. These are open-ended questions. As a scientist, your answers need to provide as much information as possible to the company.

a) At what prices are the laptops actually selling? That is, create a table with the summary statistics for the Retail Price column. You can report min, max, mean, median, sd, etc. [5 points]
b) Create columns called "Month" and "Week." These columns should tell us the month and week when the laptop was sold. [Hint: You will recognize that the Date column is in character format and needs to be converted to Date format.] [10 points]
c) Are the prices consistent across retail outlets? How do the prices vary across retail outlets? (Hint: every retail store has a unique Store Postcode. You need to first calculate the mean selling price for each store and report it.) [5 points]
d) How does price change with configuration, RAM, and screen size? That is, what are the mean and median prices for different configurations, RAM sizes, and screen sizes? [10 points]
e) Which stores are selling the most? [10 points]
f) What is the revenue from each store? What percent of the revenue from each store comes from selling 15-inch monitors vs. 17-inch monitors? [10 points]
g) Create a new data table with the revenue for each store, the number of laptops sold by each store, the percent of total revenue each store contributes, and the percent of sales volume each store contributes. [10 points]
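
The sketch below illustrates the submission layout described at the top of this assignment (question label, code, # comments, interpretation) together with the hint in Question 2 (b). It runs on toy data, not on LaptopSales.csv. The data frame name sales, its column names, and the date format string are assumptions made for illustration only; the real file may use different column names and a different date format, so inspect it before adapting any of this.

```r
# Question 2 (b): Month and Week columns (layout sketch on TOY data)

# Toy stand-in for the real file. Column names and values are hypothetical.
sales <- data.frame(
  Date = c("2008-01-15", "2008-03-02", "2008-12-31"),
  Retail.Price = c(455, 520, 610),
  stringsAsFactors = FALSE
)

# The Date column starts as character, so convert it to Date class.
# ASSUMPTION: the "%Y-%m-%d" format matches the toy data only; the
# real file may need a different format string.
sales$Date <- as.Date(sales$Date, format = "%Y-%m-%d")

# Extract the month (1-12) and the week of the year from the Date column.
# "%U" counts weeks with Sunday as the first day of the week.
sales$Month <- as.integer(format(sales$Date, "%m"))
sales$Week  <- as.integer(format(sales$Date, "%U"))

# Show the result so the new columns can be checked by eye.
print(sales)

# Interpretation: each sale now carries the month and week in which it
# occurred, so later questions can group prices or revenue by period.
```

If weeks should start on Monday instead, "%W" can be used in place of "%U". Calling is.na() on the converted Date column is a quick way to spot rows where the assumed format did not match.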