Foundations of Data Analysis for Business
Hello, dear friend, you can consult us at any time if you have any questions, add WeChat: THEend8_
MISM 6202
Foundations of Data Analysis for Business
PROBLEM SET 2
Use the file ‘commutervan.csv’ for all questions. Refer to ‘commutervan_data_dictionary.csv’ for
descriptions of the variables included in this dataset.
Commuter Van Express, Inc. (CVE) is a commuter shuttle service with operations in a large US
city where public transportation is frequently delayed and overcrowded. The company operates
a fleet of 14-passenger vans, equipped with WiFi and comfortable seats, to provide shuttle
service along fixed routes between residential neighborhoods and the city center during
commuting hours (running toward the city center in the morning and towards the residential
areas in the evening). CVE’s customers use a website or mobile app to book a seat in one of
the shuttles, thereby guaranteeing that space will be available. Vans run on a regular schedule
along each route, and the app includes location tracking to provide users with real-time arrival
information.
CVE uses analytics platforms to collect and organize data on their ongoing operations. One set
of metrics includes ride volume (number of rides actually taken each day) and revenues, which
can vary per ride due to volume discounts on package purchases (e.g. 12 rides for the price of
10), a monthly subscription option, and various short-term coupons and promotions. Their
platforms also track user activity in the mobile app, including actions like starting a new session,
tapping on a stop, booking a ride, etc.
It is April 1, 2016 and CVE has hired you as a consultant to help them understand their recent
performance and develop a method to forecast future rides and revenues. To assist in your
analysis, the company has provided you with daily data from its analytics platforms for the first
quarter of 2016. The dataset has been reviewed by CVE’s analytics team and confirmed to be
clean and free of errors.
Use RStudio to answer the following questions. Provide your written answers, along with any
relevant tables and charts, in a single PDF file. Any charts included in your report should be
properly labeled and formatted for an audience of company executives. Do not include R code
in your PDF report. RMarkdown is not required or suggested for this assignment. You should
also submit a single .R script file with your code for the analysis.
Regression Analysis.
1. Because customers value flexibility in their commuting plans, CVE allows customers to
cancel a booking without penalty up until the van they booked arrives at their chosen
stop. As a result, not all ride bookings result in a ride actually taking place. Estimate a
simple linear regression model to understand the relationship between daily bookings
and daily completed rides. Report the estimated regression equation and R2 value and
interpret them in words.