GTFS Data Analysis
ENGG1001 Assignment 2
1 Getting started
To start, download gtfs.zip from Blackboard and extract the contents. The gtfs.zip archive
contains all the files needed to start this assignment. You are required to implement
your assignment in a file called a2.py. This is the only file you should upload.
Note: The functionality of your assignment will be marked by automated tests. This
means that the output of each function, as well as the output of your overall program, must
exactly match the specifications outlined below in the Implementation section.
2 Data description
The General Transit Feed Specification (GTFS) defines a common format for public
transportation schedules and associated geographic information. GTFS feeds let public transport
agencies publish their transit data and developers write applications that consume that data
in an interoperable way.
This assignment requires the following files from the GTFS feed, together with the
attributes listed for each:
• stops.txt: Stops where vehicles pick up or drop off riders. Also defines stations and
station entrances.
– stop_id: Identifies a stop.
– stop_name: Name of the stop.
– stop_lat: Latitude of the stop location.
– stop_lon: Longitude of the stop location.
– parent_station: Contains the name of the parent station. For example, UQ
Lakes has several platforms (A to E), each identified as a separate
stop; the parent station enclosing them all is 'place UQLAKE'.
• stop_times.txt: Times that a vehicle arrives at and departs from stops for each trip.
– trip_id: Identifies a trip.
– arrival_time: Arrival time at a specific stop for a specific trip on a route. Often,
arrival and departure times at a stop will be the same. For times occurring after
midnight on the service day, the values are greater than 24:00:00 in HH:MM:SS
local time for the day on which the trip schedule begins.
– departure_time: Departure time from a specific stop for a specific trip on a route.
– stop_id: Identifies a stop.
• trips.txt: Trips for each route. A trip is a sequence of two or more stops that occur
during a specific time period.
– service_id: Uniquely identifies a set of dates when service is available for one or
more routes. References calendar.service_id.
– trip_id: Identifies a trip.
• calendar.txt: Service dates specified using a weekly schedule with start and end
dates.
– service_id: Uniquely identifies a set of dates when service is available for one or
more routes.
– monday: Indicates whether the service operates on all Mondays in the date range
specified by the start_date and end_date fields. 1: service is available on all
Mondays in the date range; 0: service is not available on Mondays in the date
range.
– tuesday to sunday: Work in the same way as monday, but apply to the other
days of the week.
– start_date: Start service day for the service interval.
– end_date: End service day for the service interval.
3 Implementation
Within this section, you will develop the following function:
• arriving_buses(stop_name, start_time, end_time, date, interval_length):
This function returns a list containing the number of buses arriving at stop_name
between start_time and end_time on date, aggregated into intervals of interval_length
minutes. Let t_a denote the arrival time of a bus at stop_name. The first value in the
returned list should be the number of buses satisfying start_time ≤ t_a <
start_time + interval_length, the second value should be the number of
buses satisfying start_time + interval_length ≤ t_a < start_time + 2*interval_length,
and so on.
– stop_name: str
– start_time: str in the HH:MM format
– end_time: str in the HH:MM format
– interval_length: int in minutes
– date: str date in the format 'YYYYmmdd'. You can assume the provided
date exists in the available date range in calendar.txt. If not, the result should
be an empty list.
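The half-open intervals above imply that each arrival time belongs to exactly one position in the returned list. The following is a minimal sketch of that arithmetic only, not the required implementation; the helper name interval_index is hypothetical and it assumes times are given as 'HH:MM' strings.

```python
from datetime import datetime


def interval_index(arrival: str, start_time: str, interval_length: int) -> int:
    """Return k such that start + k*L <= arrival < start + (k+1)*L.

    Hypothetical helper for illustration; all times are 'HH:MM' strings
    and interval_length (L) is in minutes.
    """
    fmt = "%H:%M"
    elapsed = datetime.strptime(arrival, fmt) - datetime.strptime(start_time, fmt)
    minutes = elapsed.total_seconds() // 60
    return int(minutes // interval_length)


print(interval_index("06:29", "06:00", 30))  # 0
print(interval_index("06:30", "06:00", 30))  # 1
```

An arrival exactly on a boundary falls into the later interval, which is what the strict upper bound requires. Arrivals before start_time give a negative index and would need to be discarded by the caller.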
You must write the following functions as part of your implementation. You are encouraged
to add your own additional functions if they are beneficial to your solution.
• read_data() -> df, df, df, df: This function reads the CSV files and returns four
data frames named trips, stops, stop_times, calendar in accordance with the
file names. Date and time columns in the relevant data sets should be converted to
datetime objects. Note that arrival_time and departure_time in stop_times can
be greater than 24:00:00 (night-time trips within a service day can take values between
24:00:00 and 29:59:59); rows containing such entries should be removed.
Example:
>>> trips, stops, stop_times, calendar = read_data()
>>> trips.size, stops.size, stop_times.size, calendar.size
(694288, 140250, 18963581, 1550)
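One way to handle the after-midnight times is to filter on the hour field before converting the column. This is a rough sketch on made-up rows, not the real stop_times.txt; it treats any hour of 24 or more as out of range, which is an assumption about how the boundary value 24:00:00 should be handled.

```python
import io

import pandas as pd

# Toy stand-in for stop_times.txt; the rows below are invented for illustration.
raw = io.StringIO(
    "trip_id,arrival_time,departure_time,stop_id\n"
    "t1,23:55:00,23:55:00,1\n"
    "t1,24:05:00,24:05:00,2\n"
    "t2,06:10:00,06:11:00,1\n"
)
stop_times = pd.read_csv(raw, dtype=str)

# Build a mask that is True only where both time columns have an hour below 24.
keep = pd.Series(True, index=stop_times.index)
for column in ("arrival_time", "departure_time"):
    hours = stop_times[column].str.split(":").str[0].astype(int)
    keep = keep & (hours < 24)
stop_times = stop_times[keep].copy()

# With the out-of-range rows gone, the strings parse as ordinary clock times.
# pandas attaches the placeholder date 1900-01-01 when only a time is given.
for column in ("arrival_time", "departure_time"):
    stop_times[column] = pd.to_datetime(stop_times[column], format="%H:%M:%S")

print(len(stop_times))  # 2
```

Filtering on the raw strings first avoids a parsing error, since an hour of 24 or more does not match the %H directive.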
• find_service(calendar, date) -> list: This function returns a list containing
the service_ids operating on the date. The returned service_ids should satisfy
two conditions: (i) date must be between start_date and end_date, and (ii) the service
must operate on the day of the week of the given date.
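The two conditions can be expressed as boolean masks over the calendar data frame. The sketch below uses an invented three-row calendar and a hypothetical function name, find_service_sketch; it assumes start_date and end_date are already datetime columns and that the date range is inclusive at both ends.

```python
import pandas as pd

# Toy stand-in for calendar.txt; service_ids and dates are invented.
calendar = pd.DataFrame(
    {
        "service_id": ["weekday", "weekend", "expired"],
        "monday": [1, 0, 1],
        "tuesday": [1, 0, 1],
        "wednesday": [1, 0, 1],
        "thursday": [1, 0, 1],
        "friday": [1, 0, 1],
        "saturday": [0, 1, 0],
        "sunday": [0, 1, 0],
        "start_date": pd.to_datetime(["20201101", "20201101", "20200101"], format="%Y%m%d"),
        "end_date": pd.to_datetime(["20201231", "20201231", "20200630"], format="%Y%m%d"),
    }
)


def find_service_sketch(calendar: pd.DataFrame, date: str) -> list:
    """Hypothetical illustration of the two conditions; not the marked function."""
    day = pd.to_datetime(date, format="%Y%m%d")
    weekday = day.day_name().lower()  # e.g. 'monday', matching the column names
    # Condition (i): the date lies within the service interval (assumed inclusive).
    in_range = (calendar["start_date"] <= day) & (day <= calendar["end_date"])
    # Condition (ii): the service runs on that day of the week.
    operates = calendar[weekday] == 1
    return calendar.loc[in_range & operates, "service_id"].tolist()


print(find_service_sketch(calendar, "20201116"))  # ['weekday']
```

A date outside every service interval yields an empty list, which lines up with the empty-list requirement stated for arriving_buses.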
• create_subsets(stops, stop_times, trips, calendar, stop_name, date) -> df,
df: As a first step, this function (i) finds the stop_id(s) corresponding to stop_name
in the data frame stops, (ii) creates a subset, named stoptime_subset, of the data
frame stop_times consisting of the rows which contain the previously chosen stop_id(s),
and (iii) creates a subset, named trip_subset, of the data frame trips consisting
of the rows which contain the same trip_ids as stoptime_subset. At this stage,
stoptime_subset and trip_subset include all the observations related to stop_name
without taking into account the operation date.
Second, this function should identify the service_id(s) operating on the date through
the function find_service described above, and create further subsets of trip_subset
and stoptime_subset (with the same names) accounting for the available service_id(s)
on the date and the corresponding trip_id(s), respectively. Note that if stop_name
does not exist in stops['stop_name'], stops['parent_station'] must be searched.
Hint: the empty attribute in pandas can be useful to determine whether the resulting
series or data frame is empty.
Example:
>>> stoptime_subset, trip_subset = create_subsets(stops, stop_times, trips,
... calendar, 'place UQLAKE', '20201116')
>>> stoptime_subset.size
4893
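The two filtering stages can be illustrated with isin on small data frames. This is a sketch under stated assumptions: all ids and names below are invented, and active_services stands in for whatever find_service would return for the chosen date.

```python
import pandas as pd

# Invented toy data; the real frames come from read_data().
stops = pd.DataFrame(
    {
        "stop_id": ["1", "2", "3"],
        "stop_name": ["Platform A", "Platform B", "Elsewhere"],
        "parent_station": ["HUB", "HUB", ""],
    }
)
stop_times = pd.DataFrame({"trip_id": ["t1", "t2", "t3"], "stop_id": ["1", "2", "3"]})
trips = pd.DataFrame(
    {"trip_id": ["t1", "t2", "t3"], "service_id": ["weekday", "weekend", "weekday"]}
)
active_services = ["weekday"]  # stand-in for the result of find_service

stop_name = "HUB"
# Look the name up in stop_name first; fall back to parent_station if nothing matches.
matches = stops[stops["stop_name"] == stop_name]
if matches.empty:
    matches = stops[stops["parent_station"] == stop_name]
stop_ids = matches["stop_id"]

# Step 1: every stop-time row at those stops, and the trips they belong to.
stoptime_subset = stop_times[stop_times["stop_id"].isin(stop_ids)]
trip_subset = trips[trips["trip_id"].isin(stoptime_subset["trip_id"])]

# Step 2: narrow both subsets to trips whose service runs on the chosen date.
trip_subset = trip_subset[trip_subset["service_id"].isin(active_services)]
stoptime_subset = stoptime_subset[stoptime_subset["trip_id"].isin(trip_subset["trip_id"])]

print(stoptime_subset["trip_id"].tolist())  # ['t1']
```

Here 'HUB' matches no stop_name, so the lookup falls back to parent_station and picks up both platforms; only trip t1 survives the service filter.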
• calculate_counts(start_time, end_time, interval, stoptime_subset) -> list:
This function considers the data frame stoptime_subset, and returns a list containing
the number of vehicles arriving between start_time and end_time, aggregated into
intervals of interval minutes. Let t_a denote the arrival time of a bus at stop_name.
The first value in the returned list should be the number of buses satisfying
start_time ≤ t_a < start_time + interval, the second value
should be the number of buses satisfying start_time + interval ≤ t_a <
start_time + 2*interval, and so on.
Example:
>>> calculate_counts('06:00', '09:00', 15, stoptime_subset)
[3, 3, 4, 7, 8, 12, 12, 13, 13, 16, 13, 13]
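The counting rule can be sketched by stepping through the half-open intervals and counting the arrivals that fall in each. This is an illustrative outline on invented arrival times, not the marked function: calculate_counts_sketch is a hypothetical name, and it takes a Series of datetime arrivals rather than the full stoptime_subset data frame.

```python
import pandas as pd


def calculate_counts_sketch(start_time, end_time, interval, arrivals):
    """Count arrivals in half-open bins [start + k*interval, start + (k+1)*interval)."""
    start = pd.to_datetime(start_time, format="%H:%M")
    end = pd.to_datetime(end_time, format="%H:%M")
    step = pd.Timedelta(minutes=interval)
    counts = []
    lower = start
    while lower < end:
        upper = lower + step
        # Lower bound is inclusive, upper bound is exclusive.
        counts.append(int(((arrivals >= lower) & (arrivals < upper)).sum()))
        lower = upper
    return counts


# Invented arrival times, parsed with the same placeholder date as the bin edges.
arrivals = pd.to_datetime(
    pd.Series(["06:00:00", "06:14:59", "06:15:00", "06:40:00", "07:00:00"]),
    format="%H:%M:%S",
)
print(calculate_counts_sketch("06:00", "07:00", 15, arrivals))  # [2, 1, 1, 0]
```

The arrival at 07:00:00 is not counted because the final interval excludes its upper bound. The comparison only works if the arrivals and the bin edges share the same date component, which holds here because both are parsed from time-only strings.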
4 Example Output
>>> start_time = '06:00'
>>> end_time = '09:00'
>>> stop_name = 'place UQLAKE'
>>> day = '20201116'
>>> interval = 30
>>> counts = arriving_buses(stop_name, start_time, end_time, day, interval)
>>> counts
[6, 11, 20, 25, 29, 26]
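As a sanity check, the 30-minute counts here should equal pairwise sums of the 15-minute counts from the calculate_counts example, assuming that example used the same stop ('place UQLAKE') and date ('20201116'), as the create_subsets example suggests. A short sketch of that check:

```python
counts_15 = [3, 3, 4, 7, 8, 12, 12, 13, 13, 16, 13, 13]  # 15-minute example
counts_30 = [6, 11, 20, 25, 29, 26]  # 30-minute example

# Merge each consecutive pair of 15-minute bins into one 30-minute bin.
merged = [counts_15[i] + counts_15[i + 1] for i in range(0, len(counts_15), 2)]

print(merged == counts_30)  # True
```

The same idea gives a quick self-test for your own implementation: coarser intervals that are whole multiples of finer ones must produce counts that add up.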
5 Marking Criteria
5.1 Functionality Assessment
The functionality will be marked out of 12. Your assignment will be put through a series
of tests and your functionality mark will be proportional to the number of tests you pass.
If, say, there are 25 functionality tests and you pass 20 of them, then your functionality
mark will be 20/25 * 12. You will be given the functionality tests before the due date
for the assignment so that you can gain a good idea of the correctness of your assignment
yourself before submitting. You should, however, make sure that your program meets all the
specifications given in the assignment. That will ensure that your code passes all the tests.
Note: Functionality tests are automated and so string outputs need to exactly match what
is expected.
5.2 Code Style Assessment
The style of your assignment will be assessed by one of the tutors, and you will be marked
according to the style rubric provided with the assignment. The style mark will be out of 3.
6 Assignment Submission
You must submit your completed assignment electronically through Blackboard. The only
file you submit should be a single Python file called a2.py (use this name – all lower case).
This should be uploaded to Blackboard>Assessment>Assignment 2. You may submit your
assignment multiple times before the deadline – only the last submission will be marked.
Late submission of the assignment will not be accepted. In the event of exceptional
personal or medical circumstances that prevent you from handing in the assignment on
time, you may submit a request for an extension. See the course profile for details of how to
apply for an extension.