Project Statement
This project accounts for 30% of the total marks for this course.
The deliverable is a PowerPoint file with video narration and speaker notes, and an appendix file.
Bike sharing has become increasingly popular across the globe. Today, such programs operate in more
than 1,000 cities, with more than half a million bicycles in use. The principle of bike sharing is simple:
individuals use bicycles on an as-needed basis without the costs and responsibilities of bike ownership.
It is short-term bicycle access, which provides its users with an environmentally friendly form of public
transportation. This flexible scheme targets daily mobility and allows users to access public bicycles at
unattended bike stations; bicycle reservations, pickup, and drop-off are all self-service. Commonly
concentrated in urban settings, bike sharing programs also provide multiple bike station locations that
enable users to pick up and return bicycles to different stations.
This project is about Capital Bikeshare (CaBi) in the metropolitan area of Washington DC (DC), which
covers not only the DC area, but also some parts of two nearby states, Maryland (MD), and Virginia (VA).
You are a business consultant working for the bike-sharing program.
Bike-sharing data
Your manager has asked you to download historical bike-sharing data. First visit
https://ride.capitalbikeshare.com/system-data, then click “downloadable files”. This directs
you to https://s3.amazonaws.com/capitalbikeshare-data/index.html, which contains
data on millions of bike trips from July 2010 to September 2022. Since the data come from the US, please
be aware of the difference in date formats between the US (mm/dd/yyyy) and Australia (dd/mm/yyyy).
CaBi has also changed the format of its data files recently. Part of this project is
to decide how to consolidate tables that come from different sources and/or have different
formats.
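As a rough illustration of the consolidation step, the sketch below maps two file layouts onto one set of column names and parses US-style dates explicitly, so that 10/3/2011 is read as October 3 rather than March 10. The column names in both mappings and the timestamp layouts are assumptions made for illustration; check them against the headers and values of the files you actually download. The project itself uses SAS, so treat this Python version only as a way to reason about the logic.

```python
from datetime import datetime

# ASSUMED column names for an older and a newer file layout.
# Verify against the real CSV headers before relying on them.
OLD_TO_COMMON = {
    "Start date": "started_at",
    "End date": "ended_at",
    "Start station": "start_station_name",
    "End station": "end_station_name",
    "Member type": "member_type",
}
NEW_TO_COMMON = {
    "started_at": "started_at",
    "ended_at": "ended_at",
    "start_station_name": "start_station_name",
    "end_station_name": "end_station_name",
    "member_casual": "member_type",
}

# ASSUMED timestamp layouts: ISO-style, then US month/day/year.
FORMATS = ("%Y-%m-%d %H:%M:%S", "%m/%d/%Y %H:%M", "%m/%d/%Y")


def parse_timestamp(text):
    """Try each known layout in turn; fail loudly on anything unexpected."""
    for fmt in FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt)
        except ValueError:
            continue
    raise ValueError(f"Unrecognised timestamp: {text!r}")


def harmonize(row):
    """Return one trip record using the common column names."""
    mapping = OLD_TO_COMMON if "Start date" in row else NEW_TO_COMMON
    out = {common: row[src] for src, common in mapping.items()}
    out["started_at"] = parse_timestamp(out["started_at"])
    out["ended_at"] = parse_timestamp(out["ended_at"])
    out["member_type"] = out["member_type"].strip().lower()
    return out
```

Rows read with `csv.DictReader` can be passed straight to `harmonize`. Raising an error on an unknown date layout is deliberate: silently swapping day and month would distort every monthly pattern found later.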
The online bikeshare data are stored in comma-separated values (CSV) files. Before tackling any
analysis, your first task is to place these datasets in your SAS folder (e.g., OrionDB on your H
drive) and convert them to the required SAS format. To do so, use SAS Enterprise Guide (EG) to
open each dataset via File > Import Data, which imports the CSV file. You will then be asked to
specify the data; please try to understand what each step means. In most cases, you can just click Next, OK, or
Finish. After a dataset is loaded into SAS EG, use File > Export to save it as a SAS dataset in
your SAS folder so that your queries can retrieve it.
Regional factors
As noted above, CaBi serves not only DC but also some cities in MD and VA. Even within DC, the district is divided
into four quadrants of unequal area: Northwest (NW), Northeast (NE), Southeast (SE), and Southwest
(SW). Each city and DC quadrant has distinct characteristics (e.g., some are culturally rich, some are
more populated, and some have higher crime rates). Therefore, different regions may reveal different
bike-sharing use patterns. You may download detailed information on all CaBi bike stations from
https://opendata.dc.gov/datasets/capital-bike-share-locations/, in which the last column (attribute),
REGION_NAME, shows whether a station is in DC, VA, or MD. If a station is within DC, the attribute
NAME reveals the quadrant in which it is located.
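One possible way to read the quadrant from the NAME attribute is sketched below. It assumes that the quadrant appears in the station name as a standalone token (NW, NE, SE, or SW), which should be verified against the downloaded file; the station names in the usage note are illustrative only.

```python
import re

# ASSUMPTION: the quadrant is written as a separate word in the station name.
QUADRANT = re.compile(r"\b(NW|NE|SE|SW)\b")


def dc_quadrant(station_name):
    """Return the last quadrant token found in the name, or None if absent."""
    matches = QUADRANT.findall(station_name.upper())
    return matches[-1] if matches else None
```

For instance, `dc_quadrant("14th & V St NW")` returns `"NW"`, while a name with no quadrant token returns `None`. Apply this only to stations whose REGION_NAME indicates DC.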
The station locations file above gives the location of each bike station in the GPS coordinate
system. For example, the coordinate of a station is (x, y), where x is the longitude and y is the
latitude. The following link explains the GPS coordinate system in more detail:
https://www.ubergizmo.com/how-to/read-gps-coordinates/. You can also locate a place on Google
Maps by its latitude and longitude. For details, see
https://support.google.com/maps/answer/18539.
If you are interested in estimating the distance traveled for a ride, assuming that a bike rental starts
from (x1, y1) and ends at (x2, y2), it is recommended that you estimate it using the so-called taxicab
distance, which is |x1 - x2| + |y1 - y2|. See the following figure for interpretation. For more
information, please see https://study.com/academy/lesson/taxicab-geometry-history-formula.html.
Note that whether the distance is expressed in degrees (without any conversion from longitude-latitude
coordinates), miles, or kilometers, your findings, interpretations, and insights should not change. It is
recommended that you just quote the distances in degrees.
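The taxicab distance described above can be written as a short function. This is a minimal sketch that takes each point as a (longitude, latitude) pair in degrees and applies no unit conversion; the function name is illustrative.

```python
def taxicab_distance(start, end):
    """Taxicab distance between two (longitude, latitude) points, in degrees."""
    x1, y1 = start
    x2, y2 = end
    # Sum of the absolute differences along each axis.
    return abs(x1 - x2) + abs(y1 - y2)
```

A ride from (1, 1) to (2, 2) gives `taxicab_distance((1, 1), (2, 2)) == 2`: one degree east plus one degree north. A ride that starts and ends at the same station gives 0, so round trips need separate treatment if distance matters to your analysis.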
Weather data
Weather plays an important role when people decide whether or not to use bike-sharing. You are
required to explore the relationship between weather (e.g., temperature, wind speed and humidity) and
the bike-sharing rentals in this project. There are two known ways to download free historical weather
data. The first way is to manually capture weather data month by month from Weather Underground
(wunderground.com).
First visit https://www.wunderground.com/ and search for the weather conditions in DC.
(You may also try other locations in MD and VA where there are many bike
stations.)
You will be led to the site of a nearby weather station, which may be different from time to
time.
Click the History tab on the page, and then choose to view Monthly weather data. Once you
choose a month, click View. For example, the following link shows the weather data for Oct 2011
measured at the Ronald Reagan Washington National Airport station (in Arlington, VA, adjacent to DC):
https://www.wunderground.com/history/monthly/us/va/arlington/KDCA/date/2011-10
Scroll down the page, and you will see the table of Daily Observations. Use your mouse to copy
the table and paste it to an Excel spreadsheet.
Another way, as suggested by a former student, is to download weather data from NOAA. Try
https://www.ncdc.noaa.gov/cdo-web/search. Search for "Daily Summaries" at relevant weather stations
for a time period, then click "Add to Cart". Note that this is a free service, but you will have to enter an email
address so that you can receive the data download link once the request has been processed.
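Once daily weather has been collected, exploring its relationship with rentals comes down to matching both series by date and comparing them. The sketch below joins daily ride counts to one daily weather measure and computes a Pearson correlation using only the standard library. All function names are illustrative, and correlation is just one possible starting point, not a method prescribed by this project.

```python
from math import sqrt


def join_on_date(rides_per_day, weather_per_day):
    """Keep only dates present in both dictionaries, in date order."""
    shared = sorted(set(rides_per_day) & set(weather_per_day))
    return [(d, rides_per_day[d], weather_per_day[d]) for d in shared]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def rides_vs_weather(rides_per_day, weather_per_day):
    """Correlation between daily ride counts and a daily weather measure."""
    joined = join_on_date(rides_per_day, weather_per_day)
    rides = [r for _, r, _ in joined]
    weather = [w for _, _, w in joined]
    return pearson(rides, weather)
```

Both inputs are dictionaries keyed by `datetime.date`. Joining on the intersection of dates means that days missing from either source are dropped rather than filled in, which keeps gaps in the weather data visible instead of hiding them.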
Holiday data
Another factor that influences bike-sharing rentals is holidays. You can easily look up the dates of
US federal holidays and/or MD and VA state holidays for each year.
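A simple way to use holiday dates is to label each day as a holiday, a weekend day, or a workday, and then compare rentals across the three groups. The sketch below shows one such labelling. The holiday set is deliberately partial (two 2011 federal holidays, for illustration only) and must be replaced with the full list you compile; the function and label names are assumptions.

```python
from datetime import date

# PARTIAL, illustrative list: Independence Day and Thanksgiving Day 2011.
HOLIDAYS = {date(2011, 7, 4), date(2011, 11, 24)}


def day_type(d, holidays=HOLIDAYS):
    """Classify a date as 'holiday', 'weekend', or 'workday'."""
    if d in holidays:
        return "holiday"
    # weekday() returns 5 for Saturday and 6 for Sunday.
    if d.weekday() >= 5:
        return "weekend"
    return "workday"
```

Holidays are checked first, so a holiday that falls on a weekend is still labelled as a holiday. Whether that is the right choice depends on the question you are asking of the data.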
The Task
Your manager has asked you to collect and analyze the data and “let the data speak.” You understand that
the company wants to grow the market further and attract more users. Before doing so, it wants to
gain some insights from the data.
In this project, you are expected to manage and clean the data you collect; some of it may contain
missing values, inconsistent formatting, and incomplete information. The goal is to overcome such obstacles,
which are commonly encountered in practice, and to derive business insights from the datasets that can be used to
promote CaBi’s bike-sharing business.