CS2035
Assignment

a) The population of the US in 1990 was estimated at 249.62 million. Verify that the counties present in this data set cover more than 99% of the US population.

b) The state names are abbreviated with only two letters in the data set. With the help of the file US-states-abbrev.txt, add a variable that displays the full name of the state.

c) Generate a table that displays, for every state, the median value of the percentage of votes at the county level, for each party. The table should look like this (use the full name for states):

state      democrat   republican   Perot
Alabama    42.9       46.3         10.5
Arizona    …          …            …
…          …          …            …
Wyoming    …          …            …

d) Create a function plot_linreg(T, party, var) that takes three inputs: the table T of the imported data set, a string variable party that names the variable holding the percentage of votes received by a party, and another string variable var that names a demographic variable of the data set. This function creates a scatter plot of the two variables party and var together with the linear regression line. Indicate in the plot title the values of the intercept, slope and unadjusted R². For example, the call plot_linreg(T,"democrats","black") should produce a figure similar to Figure 1.

e) Using plot_linreg(), create a figure with 15 subplots. The subplots will be organized in three rows, one for each political party, and five columns. The columns will represent the following variables: crime, income, college, white and black.

f) Explore the data set and describe (preferably with a figure) something you find interesting that has not been covered in this exercise (it could be trends, unexpected distributions, etc.).

University of Western Ontario CS2035

Exercise 2 – Basketball Player Doping?

A basketball league is worried that one of its players is taking athletic performance-enhancing drugs (“doping”). The league suspects that the player may have started doping around the 40th game last season.
The league is providing you with the player’s points-per-game (PPG) for every game of last season in order to establish whether the apparent increase in this player’s PPG can be explained by chance alone. See the file named basketball-ppg.csv.

a) Plot the time series of the player’s PPG as a function of the game number (game 1 is the first of the season, game 2 the second, and so on). Mark with a dashed vertical line the time t* when the player is suspected to have started doping.

b) Check visually that the PPGs before and after t* are approximately normally distributed.

c) Check normality with the more formal Kolmogorov-Smirnov test.

d) Based on your findings about normality of the PPGs, explain why a Z-test can be used to test whether the distribution of PPGs after t* differs from the distribution before t*. State the null hypothesis.

e) Determine, at the 1% significance level, whether the change in the player’s PPGs before/after t* is due to chance alone. Does your analysis support the league’s suspicion?

Exercise 3 – Sales Analysis

A large company would like to have a brief analysis regarding the sales of its new division, Great Products Inc., that manufactures and sells electronic components. The company has extracted from its main databases the sales records for Great Products Inc. and has sent you the following files:

• db cust country.csv: The global list of their customers’ unique identification numbers (not only customers of Great Products) and their country of origin.
• db cust orders.csv: Sales orders fulfilled by Great Products Inc., showing the order ID and the customer’s ID.
• db order ref.csv: The information that links a sales order ID with the reference ID of the item sold as well as the quantity shipped.
• db ref price.csv: The unit price, in dollars, of an item given its reference ID.
a) Merge the information from all four files such that you end up with a table that contains only the customers of Great Products Inc., with their customer ID, the ID of the order (the transaction), the country of origin of the customer, the reference ID of the item they purchased, the quantity purchased, and the unit price of that item.

b) Display a breakdown by country of the total revenues generated by Great Products Inc.
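As a hint for parts a) and b) of Exercise 3, the merge can be thought of as a chain of inner joins: order lines are linked to orders, orders to customers, customers to countries, and items to unit prices. The sketch below is one possible way to illustrate that idea using only the Python standard library; it is not a required solution, and the course may expect a different language or tool. The column names (cust_id, order_id, ref_id, quantity, unit_price, country) and all sample rows are assumptions made up for the illustration, since the real file layouts are not described here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for the four CSV files. The real column names and
# values are not given in the assignment; these are invented for illustration.
CUST_COUNTRY = "cust_id,country\n1,Canada\n2,France\n3,Japan\n"
CUST_ORDERS = "order_id,cust_id\n100,1\n101,2\n102,1\n"
ORDER_REF = "order_id,ref_id,quantity\n100,A,2\n101,B,1\n102,B,3\n"
REF_PRICE = "ref_id,unit_price\nA,10.0\nB,4.5\n"


def read_rows(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def merge_sales(countries, orders, order_refs, prices):
    """Inner-join the four tables; rows with a missing key are dropped."""
    country_of = {r["cust_id"]: r["country"] for r in countries}
    cust_of_order = {r["order_id"]: r["cust_id"] for r in orders}
    price_of = {r["ref_id"]: float(r["unit_price"]) for r in prices}

    merged = []
    for line in order_refs:
        cust_id = cust_of_order.get(line["order_id"])
        country = country_of.get(cust_id)
        price = price_of.get(line["ref_id"])
        if cust_id is None or country is None or price is None:
            continue  # no match in one of the tables: skip (inner join)
        merged.append({
            "cust_id": cust_id,
            "order_id": line["order_id"],
            "country": country,
            "ref_id": line["ref_id"],
            "quantity": int(line["quantity"]),
            "unit_price": price,
        })
    return merged


def revenue_by_country(rows):
    """Sum quantity * unit_price for each country."""
    totals = defaultdict(float)
    for r in rows:
        totals[r["country"]] += r["quantity"] * r["unit_price"]
    return dict(totals)


merged = merge_sales(
    read_rows(CUST_COUNTRY),
    read_rows(CUST_ORDERS),
    read_rows(ORDER_REF),
    read_rows(REF_PRICE),
)
revenue = revenue_by_country(merged)
for country, total in sorted(revenue.items()):
    print(f"{country}: {total:.2f}")
```

In this made-up data, customer 3 appears in the global customer list but never in the orders, so it does not show up in the merged table. That is the behaviour part a) asks for when it restricts the result to customers of Great Products Inc.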