CSCI 141 Analyzing Immunization Data
UNICEF maintains a database of data sets on health, development, and other topics related to maternal
and child health. For this project, we will use UNICEF's immunization data, which records the yearly
vaccinations administered to children around the world.
The data cover vaccines for the following infectious diseases and agents (vaccine abbreviations shown
in all caps): tuberculosis, BCG; diphtheria, pertussis, and tetanus, DTP1 and DTP3; measles, MCV1 and
MCV2; hepatitis B, HEPBB and HEPB3; Haemophilus influenzae type b, HIB1; polio, IPV3 and POL3;
pneumococcal disease, PCV3; rubella, RCV1; rotavirus, ROTAC; and yellow fever, YFV. Values are reported
as the percentage of children vaccinated and are provided both globally and regionally (e.g., East Asia
and Pacific, Middle East and North Africa).
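Because the values are percentages broken down by region and vaccine, most questions reduce to selecting rows and reshaping them. The sketch below assumes a hypothetical long-format layout with columns `region`, `vaccine`, `year`, and `coverage`; the real column names in vaccine_data.csv may differ, and the numbers are made-up placeholders rather than UNICEF figures.

```python
import pandas as pd

# Hypothetical long-format layout: one row per (region, vaccine, year).
# Column names and values are placeholders, not real UNICEF data.
df = pd.DataFrame({
    "region": ["Global", "Global", "East Asia and Pacific", "East Asia and Pacific"],
    "vaccine": ["BCG", "DTP3", "BCG", "DTP3"],
    "year": [2020, 2020, 2020, 2020],
    "coverage": [88.0, 83.0, 95.0, 92.0],
})

# A boolean mask selects matching rows without an explicit loop.
dtp3 = df[df["vaccine"] == "DTP3"]

# pivot_table reshapes to one row per region and one column per vaccine.
wide = df.pivot_table(index="region", columns="vaccine", values="coverage")

print(dtp3)
print(wide)
```

The same mask-then-reshape pattern applies whatever the actual column names turn out to be.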
You will write functions to process the data and a main program that performs data quality control (QC)
and calls your functions. You have been given three files:
• vaccine_data.csv is a comma-delimited text file which contains all of the data
• Project_4.py is a skeleton file in which you will write your functions; the import lines have been
provided for you, but you must write the def lines according to the specifications below
• Project_4_Main.py is a skeleton file which will contain your main program
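The main program is expected to perform data QC. One minimal sketch of what such checks might look like, using only pandas methods, counts missing values, duplicated rows, and percentages outside 0 to 100. The function name `qc_report`, the `coverage` column, and the sample rows are all hypothetical; the checks the specifications actually require may differ.

```python
import numpy as np
import pandas as pd


def qc_report(df, value_col="coverage"):
    """Return simple quality-control counts for a coverage table.

    `qc_report` and `value_col` are hypothetical names for illustration.
    """
    values = df[value_col]
    return {
        # Missing entries per column.
        "missing": df.isna().sum().to_dict(),
        # Rows that exactly duplicate an earlier row.
        "duplicates": int(df.duplicated().sum()),
        # Non-missing percentages outside the valid 0-100 range.
        "out_of_range": int(((~values.between(0, 100)) & values.notna()).sum()),
    }


# Placeholder rows, deliberately containing one duplicate, one NaN,
# and one impossible percentage.
df = pd.DataFrame({
    "region": ["Global", "Global", "Global", "South Asia"],
    "vaccine": ["BCG", "BCG", "MCV1", "MCV1"],
    "coverage": [88.0, 88.0, np.nan, 104.0],
})
report = qc_report(df)
print(report)
```

`between` returns False for NaN, which is why the out-of-range count is restricted to non-missing values so that missing entries are not counted twice.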
To complete this assignment, you must have working installations of the following packages: pandas and
numpy.
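A quick way to confirm that both packages import correctly is a short check like this one; it only prints the installed versions and assumes nothing about which versions the course requires.

```python
import numpy as np
import pandas as pd

# If either import above fails, that package is missing or broken.
print("pandas", pd.__version__)
print("numpy", np.__version__)
```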
BE AWARE: Your project submission (Project_4.py and Project_4_Main.py) will be graded on style,
specifically on using pandas methods where appropriate and writing compact code as needed and as
specified in the instructions. To receive full credit, you must use pandas objects and the pandas/numpy
libraries to edit data whenever possible. This doesn’t mean you can’t use multiple lines or include
conditionals; rather, if something can be done with a pandas function or method, you shouldn’t write
loops that iterate over data frames and series, even if they produce the expected output.
Implementations that write code in place of pandas functions/methods and/or that use other imported
objects may not receive credit. Manipulations to data performed without corresponding code (e.g.,
opening the data in Excel and editing it) will receive no credit.
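To illustrate the style rule above, this sketch computes a per-year mean coverage twice: once with a manual loop of the kind the grading discourages, and once with `groupby`. The column names and values are hypothetical placeholders.

```python
import pandas as pd

# Hypothetical long-format table; values are made-up placeholders.
df = pd.DataFrame({
    "year": [2019, 2019, 2020, 2020],
    "coverage": [80.0, 90.0, 70.0, 90.0],
})

# Discouraged: manually re-implementing a grouped mean with a Python loop.
totals, counts = {}, {}
for year, value in zip(df["year"], df["coverage"]):
    totals[year] = totals.get(year, 0.0) + value
    counts[year] = counts.get(year, 0) + 1
loop_means = {year: totals[year] / counts[year] for year in totals}

# Preferred: a single pandas expression that performs the same grouping.
vectorized_means = df.groupby("year")["coverage"].mean()

print(loop_means)
print(vectorized_means)
```

Both produce the same numbers, but only the `groupby` version reflects the style the grading asks for.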