IE582 Fall Assignment-2
Due Date: 9/24/2021

1) You can complete this assignment in a group of two.
2) Whether you are working alone or in a group, please assign yourself a group number under “IE582F2021_Assignment_2”.
3) One submission is required per group.
4) Students in the PhD program are encouraged to work individually.
5) The required code snippets and expected data visualizations are in the last section, Codes and Visualizations. You can choose different colors and alpha values.
6) You have been provided with the keywords for the visualizations in the Codes and Visualizations section.

Problem Descriptions

Information about the dataset: Assignment questions 1-4 are based on the uspop dataset, and metapop is the corresponding data dictionary. Question 5 is based on the perceptron learning algorithm.

Question Set 1: Understanding the structure of the dataset (you could read basic ideas on databases online)

Background: A “Primary Key” is a unique representation of each record (row) in the dataset. In other words, for a given primary key value there is exactly one record (row) in the dataset.

Based on the uspop dataset and the metapop data dictionary, please answer the following questions. Support your reasoning with the help of R programming.

1. For the given dataset, no single column alone is sufficient to serve as a primary key. Provide support. (Hint: You need to show this only for the columns that are not part of the data dictionary.)
2. A combination of two columns completely satisfies the requirements of a primary key. Find those two columns and provide support for your answer. (Hint: Same as for the previous question.)
3. Between the FIPS and CTYNAME columns, which is the correct choice to uniquely identify the records corresponding to a US county?
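The primary-key definition in Question Set 1 can be checked mechanically: a set of columns qualifies only if no combination of its values appears in more than one row. The following is a minimal sketch in R on a made-up data frame. The columns `code`, `name`, and `group` and the helper `is_primary_key` are hypothetical illustrations and are not taken from uspop or metapop.

```r
# Toy data only: these column names are hypothetical, not those of uspop.
toy <- data.frame(
  code  = c("01", "01", "02", "02", "03", "03"),
  name  = c("Adams", "Adams", "Adams", "Adams", "Brown", "Brown"),
  group = c(1, 2, 1, 2, 1, 2),
  stringsAsFactors = FALSE
)

# A column set is a candidate primary key when no combination of its
# values occurs more than once (hypothetical helper).
is_primary_key <- function(df, cols) {
  !any(duplicated(df[, cols, drop = FALSE]))
}

# Test every single column on its own.
single_ok <- sapply(names(toy), function(col) is_primary_key(toy, col))

# Test every pair of columns.
pairs <- combn(names(toy), 2, simplify = FALSE)
pair_ok <- sapply(pairs, function(p) is_primary_key(toy, p))
names(pair_ok) <- sapply(pairs, paste, collapse = " + ")

print(single_ok)
print(pair_ok)
```

In this toy table no single column is unique, and only `code + group` identifies each row. Note also that the same `name` appears under two different `code` values, which is the kind of situation to look for when comparing a coded identifier with a free-text name.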

Question Set 2: Identify the top states and counties with the largest older population

For this exercise, the older population is defined as the population aged 65 years or older.

1. Identify and list the top 10 states and the top 10 counties with the largest older population.
2. Draw bar graphs to display your results, with the X axis being the state or county name and the Y axis being its older population. Please refer to the Codes and Visualizations section for the expected visualization (two bar graphs: one for states and one for counties).

Question Set 3: Comparison between Male and Female Population

1. Identify and list all the states whose total female population is greater than their total male population.
2. Identify the top 5 states with the highest number of counties in which the total female population is greater than the total male population.
3. Identify the top 5 states with the highest percentage of counties in which the total female population is greater than the total male population.

Question Set 4: Data Standardization Exercise

The USA has a very large geographical area. Hence, a high level of heterogeneity is observed at the county level in multiple respects. It is therefore very important to standardize the data before performing comparisons at the county level. For this exercise, we will use a technique very similar to the “Crude Rate” definition. We will standardize race- and ethnicity-related total population numbers at the county level for the state of Pennsylvania. We will implement this technique in two directions.

- Horizontal Direction (Within Comparison) - In this approach, the county-level race- and ethnicity-related total population numbers are standardized with respect to the total population of the county concerned. For example: For the Center County, assume that the Asian Alone and total population numbers are 518 and 20000 respectively. Then the standardized Asian Alone population number would be 259, that is, (518 / 20000) x 10,000, a rate per 10,000 people.
- Vertical Direction (Across Comparison) - In this approach, the county-level race- and ethnicity-related total population numbers are standardized with respect to the corresponding state-level total population. For example: For the Center County, assume that the White Alone population number is 2347, and for the state of Pennsylvania the White Alone population number is 5000. Then the standardized White Alone population number for the Center County would be 4694, that is, (2347 / 5000) x 10,000.

1. For the state of “Pennsylvania”, obtain standardized values of the White Alone Male population in the horizontal direction for each of the counties.
2. For the state of “Arkansas”, obtain standardized values of the Black Alone Female population in the vertical direction for each of the counties.
3. Using the code from questions 1 and 2, write a function named “data_normalization” that takes three arguments: the state name, the standardization direction, and a Male or Female value. It returns a dataframe/tibble containing the appropriate standardized values of the total Male/Female population for each of the counties in the state.

Bonus Question

Generalize subquestion 3 to all variants of Male or Female, instead of only Total Male or Total Female.

Codes and Visualizations

Initial Code Block: Initialization of the libraries and import of the input data for the assignment

ipak <- function(pkg){
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg))