Background
In order to eliminate poverty, it is imperative to be able to identify households suffering from poverty and target them with assistance. However, the identification of households in poverty relies on data from consumption surveys that is difficult, expensive, and time-consuming to collect.
Therefore, recent efforts have focused on “rapid surveys” that rely on a limited number of poverty identifiers, which serve as effective proxies for calculating a household’s poverty status.
Objective
The World Bank has asked you to identify the most important variables that determine a household’s poverty status to help them reduce the cost associated with compiling data to predict poverty.
Data
The data provided for analysis consist of household responses to a World Bank consumption survey. Each observation has a unique household id that ties the survey responses to one distinct household. Each household is also labeled as in or out of poverty through the Poor indicator variable. A sample of the data is as follows:
Notice that all of the variables are encoded as random character strings, but each reflects an actual survey question. Categorical variables may reflect questions such as whether your household has items like bar soap, cooking oil, matches, and salt. Numeric variables often reflect questions such as “How many working cell phones in total does your household own?” or “How many separate rooms do the members of your household occupy?” The project does not ask you to determine the real meaning of the variables you select; rather, you should identify the variables that, in their encoded state, best predict poverty.
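As a rough illustration of what "identifying the best variables in their encoded state" could look like, the sketch below ranks categorical variables by information gain with respect to the poverty label. It is only one possible approach, not a prescribed method. The toy rows, the encoded column names (wBXbHZmp, SlDKnCuu, KAJOWiiw), and the column names "id" and "Poor" as spelled here are hypothetical placeholders, not taken from the actual data file. Numeric variables would need to be binned before this measure applies.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Reduction in label entropy after splitting on a categorical variable."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_variables(rows, target="Poor", id_col="id"):
    """Return (variable, information gain) pairs, most informative first."""
    labels = [row[target] for row in rows]
    names = [name for name in rows[0] if name not in (target, id_col)]
    scores = {
        name: information_gain([row[name] for row in rows], labels)
        for name in names
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: column names and values are made up for illustration.
rows = [
    {"id": 1, "wBXbHZmp": "no",  "SlDKnCuu": "a", "KAJOWiiw": "x", "Poor": True},
    {"id": 2, "wBXbHZmp": "no",  "SlDKnCuu": "b", "KAJOWiiw": "x", "Poor": True},
    {"id": 3, "wBXbHZmp": "no",  "SlDKnCuu": "a", "KAJOWiiw": "x", "Poor": True},
    {"id": 4, "wBXbHZmp": "no",  "SlDKnCuu": "b", "KAJOWiiw": "y", "Poor": True},
    {"id": 5, "wBXbHZmp": "yes", "SlDKnCuu": "a", "KAJOWiiw": "y", "Poor": False},
    {"id": 6, "wBXbHZmp": "yes", "SlDKnCuu": "b", "KAJOWiiw": "y", "Poor": False},
    {"id": 7, "wBXbHZmp": "yes", "SlDKnCuu": "a", "KAJOWiiw": "y", "Poor": False},
    {"id": 8, "wBXbHZmp": "yes", "SlDKnCuu": "b", "KAJOWiiw": "x", "Poor": False},
]

ranking = rank_variables(rows)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data, the first variable separates the two groups perfectly and the second carries no information about the label, so they land at the top and bottom of the ranking. On the real survey data, a ranking like this would only be a starting point; correlated variables and overfitting would still need to be checked with a held-out sample.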