GTFS Data Analysis
ENGG1001 Assignment 2
1 Getting started

To start, download gtfs.zip from Blackboard and extract the contents. The gtfs.zip folder contains all the necessary files to start this assignment. You will be required to implement your assignment in a file called a2.py. This is the only file you should upload.

Note: The functionality of your assignment will be marked by automated tests. This means that the output of each function, as well as the output of your overall program, must be exact to the specifications outlined below in the Implementation section.

2 Data description

The General Transit Feed Specification (GTFS) defines a common format for public transportation schedules and associated geographic information. GTFS feeds let public transport agencies publish their transit data and developers write applications that consume that data in an interoperable way. This assignment will require the following files and the corresponding specific attributes from the GTFS:

• stops.txt: Stops where vehicles pick up or drop off riders. Also defines stations and station entrances.
  – stop_id: Identifies a stop.
  – stop_name: Name of the stop.
  – stop_lat: Latitude of the stop location.
  – stop_lon: Longitude of the stop location.
  – parent_station: It contains the name of the parent station. For example, UQ Lakes has several platforms (from A to E) where each is identified as a specific stop; the parent station enclosing all is 'place_UQLAKE'.
• stop_times.txt: Times that a vehicle arrives at and departs from stops for each trip.
  – trip_id: Identifies a trip.
  – arrival_time: Arrival time at a specific stop for a specific trip on a route. Often, arrival and departure times at a stop will be the same. For times occurring after midnight on the service day, the values are greater than 24:00:00 in HH:MM:SS local time for the day on which the trip schedule begins.
  – departure_time: Departure time from a specific stop for a specific trip on a route.
  – stop_id: Identifies a stop.
• trips.txt: Trips for each route. A trip is a sequence of two or more stops that occur during a specific time period.
  – service_id: Uniquely identifies a set of dates when service is available for one or more routes. ID referencing calendar.service_id.
  – trip_id: Identifies a trip.
• calendar.txt: Service dates specified using a weekly schedule with start and end dates.
  – service_id: Uniquely identifies a set of dates when service is available for one or more routes.
  – monday: Indicates whether the service operates on all Mondays in the date range specified by the start_date and end_date fields. 1 - Service is available for all Mondays in the date range, 0 - Service is not available for Mondays in the date range.
  – tuesday to sunday: Functions in the same way as monday except applies to other days from tuesday to sunday.
  – start_date: Start service day for the service interval.
  – end_date: End service day for the service interval.

3 Implementation

Within this section, the following function will be developed:

• arriving_buses(stop_name, start_time, end_time, date, interval_length): This function returns a list containing the number of buses arriving at stop_name between start_time and end_time on date with aggregation intervals of interval_length. Denote t_a the arrival time of a bus at stop_name. The first value in the returned list should be the number of buses satisfying start_time ≤ t_a < start_time + interval_length, the second value should be the number of buses satisfying start_time + interval_length ≤ t_a < start_time + 2*interval_length, etc.
  – stop_name: str
  – start_time: str in the HH:MM format
  – end_time: str in the HH:MM format
  – interval_length: int in minutes
  – date: str date in the format 'YYYYmmdd'. You can assume the provided date exists in the available date range in calendar.txt. If not, the result should be an empty list.

You must write the following functions as part of your implementation.
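
As an illustration of the interval definition given for arriving_buses, the following is a minimal sketch of half-open interval counting in plain Python. The helper name count_arrivals and the sample arrival times are hypothetical and are not part of the required a2.py interface or of the GTFS data. The sketch also assumes that the last interval is not clipped to end_time when interval_length does not divide the time range evenly, which the specification does not address.

```python
from datetime import datetime, timedelta


def count_arrivals(arrivals, start_time, end_time, interval_length):
    """Count arrival times per half-open interval [t, t + interval_length).

    arrivals: list of 'HH:MM:SS' strings (hypothetical sample input).
    start_time, end_time: 'HH:MM' strings.
    interval_length: interval size in minutes.
    """
    start = datetime.strptime(start_time, "%H:%M")
    end = datetime.strptime(end_time, "%H:%M")
    step = timedelta(minutes=interval_length)
    times = [datetime.strptime(a, "%H:%M:%S") for a in arrivals]

    counts = []
    lower = start
    while lower < end:
        upper = lower + step
        # Lower bound is inclusive, upper bound is exclusive.
        counts.append(sum(lower <= t < upper for t in times))
        lower = upper
    return counts


# Made-up arrival times, not taken from the GTFS feed.
sample = ["06:00:00", "06:14:59", "06:15:00", "06:40:00", "07:00:00"]
print(count_arrivals(sample, "06:00", "07:00", 15))  # prints [2, 1, 1, 0]
```

In this sample, the arrival at 06:15:00 falls in the second interval rather than the first, and the arrival at 07:00:00 is not counted at all, because every upper bound is exclusive.
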
You are encouraged to add your own additional functions if they are beneficial to your solution.

• read_data() -> df, df, df, df: This function reads the csv files and returns four data frames named trips, stops, stop_times, calendar in accordance with the file names. Date and time columns in the relevant data sets should be converted to datetime objects. Note that arrival_time and departure_time in stop_times can be greater than 24:00:00 (night time within a service day can take values between 24:00:00 and 29:59:59); rows with this type of entry should be removed.
  Example:
  >>> trips, stops, stop_times, calendar = read_data()
  >>> trips.size, stops.size, stop_times.size, calendar.size
  (694288, 140250, 18963581, 1550)
• find_service(calendar, date) -> list: This function returns a list containing the service ids operating on the date. The returned service ids should satisfy two conditions: (i) date must be between start_date and end_date, (ii) the service must operate on the day of the week for the given date.
• create_subsets(stops, stop_times, trips, calendar, stop_name, date) -> df, df: As a first step, this function (i) finds the stop id(s) corresponding to the stop_name in the data frame stops, (ii) creates a subset, named stoptime_subset, of the data frame stop_times consisting of the rows which contain the previously chosen stop id(s), and (iii) creates a subset, named trip_subset, of the data frame trips consisting of the rows which contain the same trip ids as stoptime_subset. At this stage, stoptime_subset and trip_subset include all the observations related to stop_name without taking into account the operation date. Second, this function should identify the service id(s) operating on the date through the function find_service described above, and create further subsets of trip_subset and stoptime_subset (with the same names) accounting for the available service id(s) on the date and the corresponding trip id(s), respectively.
  Note that if stop_name does not exist in stops['stop_name'], stops['parent_station'] must be searched. Hint: the empty attribute from pandas can be useful to determine whether the resulting series or data frame is empty.
  Example:
  >>> stoptime_subset, trip_subset = create_subsets(stops, stop_times, trips, calendar, 'place_UQLAKE', '20201116')
  >>> stoptime_subset.size
  4893
• calculate_counts(start_time, end_time, interval, stoptime_subset) -> list: This function considers the data frame stoptime_subset, and returns a list containing the number of vehicles between start_time and end_time with aggregation intervals of interval. Mathematically speaking, denote t_a the arrival time of a bus at stop_name. The first value in the returned list should be the number of buses satisfying start_time ≤ t_a < start_time + interval, the second value should be the number of buses satisfying start_time + interval ≤ t_a < start_time + 2*interval, etc.
  Example:
  >>> calculate_counts('06:00', '09:00', 15, stoptime_subset)
  [3, 3, 4, 7, 8, 12, 12, 13, 13, 16, 13, 13]

4 Example Output

>>> start_time = '06:00'
>>> end_time = '09:00'
>>> stop_name = 'place_UQLAKE'
>>> day = '20201116'
>>> interval = 30
>>> counts = arriving_buses(stop_name, start_time, end_time, day, interval)
>>> counts
[6, 11, 20, 25, 29, 26]

5 Marking Criteria

5.1 Functionality Assessment

The functionality will be marked out of 12. Your assignment will be put through a series of tests and your functionality mark will be proportional to the number of tests you pass. If, say, there are 25 functionality tests and you pass 20 of them, then your functionality mark will be 20/25 * 12. You will be given the functionality tests before the due date for the assignment so that you can gain a good idea of the correctness of your assignment yourself before submitting. You should, however, make sure that your program meets all the specifications given in the assignment.
That will ensure that your code passes all the tests.

Note: Functionality tests are automated and so string outputs need to exactly match what is expected.

5.2 Code Style Assessment

The style of your assignment will be assessed by one of the tutors, and you will be marked according to the style rubric provided with the assignment. The style mark will be out of 3.

6 Assignment Submission

You must submit your completed assignment electronically through Blackboard. The only file you submit should be a single Python file called a2.py (use this name, all lower case). This should be uploaded to Blackboard > Assessment > Assignment 2. You may submit your assignment multiple times before the deadline; only the last submission will be marked.

Late submission of the assignment will not be accepted. In the event of exceptional personal or medical circumstances that prevent you from handing in the assignment on time, you may submit a request for an extension. See the course profile for details of how to apply for an extension.