MG4F7 Project Assignment
The Empirical Analysis of the Wealth of Nations
Why are some countries rich and others poor? We are going to study several possible drivers of economic development: countries’ human capital; countries’ efforts to develop new technologies; countries’ business environments; and countries’ political institutions. For all 217 countries, and for the year 2010, I have extracted available data on the following variables:
• GDP per capita, PPP (constant 2011 international $); this is a measure of national income produced in a country, per person, in a given year
• Life expectancy at birth, total (years); this is a measure of human capital as health and well-being
• Research and development expenditure (% of GDP); this is a measure of engagement in technological progress
• Cost of business start-up procedures (% of GNI per capita); this is a measure of the business-friendliness of a country (something like a measure of “red tape”)
• CPIA transparency, accountability, and corruption in the public sector rating (1=low to 6=high); this is a measure of the quality of government institutions
1. Examine the outcome variable (GDP per capita).
   a. What is the median?
   b. What is the mean?
   c. Does the difference between median and mean suggest the presence of a skew in the distribution and, if so, in which direction? Make a histogram plotting national income (save the graph and include it in your responses; please do likewise any time you are asked for a figure or table). Does it look as you’d expect?
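To illustrate the mean-versus-median comparison in question 1, here is a minimal sketch in Python (standard library only, rather than Stata). The income values and the bin width are made up for illustration and are not drawn from the assignment data. Treat the comparison as a rule of thumb for the direction of skew, not a guarantee.

```python
from statistics import mean, median

# Hypothetical GDP-per-capita values (illustrative only, not the assignment data).
gdp_per_capita = [900, 1500, 2400, 4100, 7800, 12500, 21000, 38000, 95000]

avg = mean(gdp_per_capita)
mid = median(gdp_per_capita)

# Rule of thumb: a mean above the median points to a longer right tail,
# a mean below the median to a longer left tail.
if avg > mid:
    skew_direction = "right"
elif avg < mid:
    skew_direction = "left"
else:
    skew_direction = "none"


def bin_counts(values, bin_width):
    """Count values per equal-width bin, keyed by each bin's lower edge."""
    counts = {}
    for v in values:
        lower = (v // bin_width) * bin_width
        counts[lower] = counts.get(lower, 0) + 1
    return dict(sorted(counts.items()))


BIN_WIDTH = 20000  # assumed bin width, chosen only for this toy data
bins = bin_counts(gdp_per_capita, BIN_WIDTH)

# Crude text histogram: one '#' per country in each bin.
for lower, count in bins.items():
    print(f"{lower:>6}-{lower + BIN_WIDTH - 1:<6} {'#' * count}")
print(f"mean={avg:.0f} median={mid} skew={skew_direction}")
```

In this toy sample a few very high incomes pull the mean well above the median, and the text histogram shows most countries in the lowest bin.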
2. Examine the explanatory variables.
   a. Make a correlation table that includes the outcome variable (y) and all of the explanatory variables (x1, x2, x3, …). The Stata command is “corr y x1 x2 x3 …”. What is the correlation between R&D expenditure and income? Does it surprise you?
   b. Now examine the univariate correlation between income and R&D expenditure (corr y x1). What is the correlation you see now? Any idea what might be happening?
   c. Create a variable that indicates that all variables are non-missing. Examine the univariate correlation between income and R&D expenditure (corr y x1) if the non-missing indicator is equal to 1. What is the correlation you see? Does this clarify the findings in 2(a) and 2(b)? What does it suggest about how you should run your multivariate regressions?
   d. Examine and report each pairwise correlation between explanatory variables (corr x1 x2, corr x1 x3, corr x1 x4, corr x2 x3, corr x2 x4, corr x3 x4). Where do you see the greatest potential collinearity problem?
3. We next examine the simple relationships between each explanatory variable and the outcome variable (GDP per capita).
   a. Please produce scatter plots in which income per capita is plotted against each of the explanatory variables of interest. What key limitation can you see in the graph in which the quality of government institutions is the explanatory variable? What key concern do you see in the graph in which the cost of business start-up is the explanatory variable? How can you address this concern?
   b. Estimate the simple regressions predicting income per capita with each of the explanatory variables of interest. Please report the slope and intercept for each regression (or just incorporate the regression output into your project submission).
   c. What is the y-intercept in the simple regression of income per capita on the cost of business start-up procedures? What does it mean in practice? Is it a “realistic” y-intercept in the sense of describing a potential reality?
   d. What is the slope in the simple regression of income per capita on the cost of business start-up procedures? How is income predicted to change if a country were to see a decline in the cost of opening a business from 60% of national income per capita to 10%?
   e. Based on the simple regression of income per capita on the cost of business start-up procedures, what is the predicted level of income per capita in a country with start-up costs equal to 100% of national income per capita? What is the approximate 95% prediction interval for income in a country with start-up costs equal to 100% of national income per capita? Is this level of start-up costs an outlier in the data?
   f. What is the y-intercept in the simple regression of income per capita on R&D expenditures? What does it mean in practice? Is it a “realistic” y-intercept in the sense of describing a potential reality?
   g. What is the slope in the simple regression of income per capita on R&D expenditures? Suppose a government minister proposes an ambitious policy increasing R&D expenditures by 0.5% of national income (GDP). The minister argues that this will increase income per capita by $10,000. Do you think this is likely? Explain.
   h. What is the y-intercept in the simple regression of income per capita on life expectancy? What does it mean in practice? Is it a “realistic” y-intercept in the sense of describing a potential reality?
   i. What is the R-squared in the simple regression of income per capita on life expectancy? How does it compare to the R-squared in the other simple regressions?
   j. Based on part 3(i), do you think that life expectancy has an important role in causing higher incomes? Propose a mechanism that would produce such a causal relationship.
   k. Suppose you were skeptical that the observed relationship between income per capita and life expectancy is causal. Propose one reverse causality mechanism and one omitted variables mechanism that would produce the positive relationship observed.
4. Let’s see what we would observe if we happened to draw particular subsamples for our estimates.
Make sure your data are sorted by country name. Generate a country code that is increasing as you go down the dataset (so Afghanistan is 1, Albania 2, etc., down to Zimbabwe at 217).
   a. Estimate the regression line predicting income using R&D expenditure for country codes 1-50, 51-100, 101-150, and 151-217. Report the estimated slopes. Why do the slopes differ from one regression to the next?
   b. If you were trying to infer the relationship between R&D spending and income per capita for the entire set of 217 countries from just one of these subsamples, how would you do it (hint: you can think of this as having a few different “sample” signals, and you are trying to estimate where a “population” parameter is likely to be)? Would each of the subsamples allow you to produce a reasonable inference about the relationship present in all 217 countries?
5. We’ll next look at multivariate regressions. Estimate a model predicting income per capita using life expectancy, business start-up costs and R&D expenditure. (Hint: make sure you have addressed the issue with the start-up cost data.)
   a. Present evidence of a collinearity problem arising from the inclusion of both life expectancy and business start-up costs.
   b. Can you make an argument for including life expectancy and dropping business start-up costs from the empirical model?
   c. Can you make an argument for including business start-up costs and dropping life expectancy from the empirical model?
   d. Produce a path diagram indicating the relationships among business start-up costs, life expectancy, and income per capita. If you do not have a clear idea about the direction of causality, just draw in a line with arrows in both directions, but make clear the signs (positive or negative) of the relationships.
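
As one possible way to think about "evidence of a collinearity problem" between two explanatory variables, here is a minimal sketch in Python (standard library only, rather than Stata). The numbers are made up and are not the assignment data. The variable names, and the use of a logged start-up cost, are assumptions for illustration. The variance inflation factor formula used here, 1 / (1 - r^2), holds only in the simplified case of exactly two regressors; with three regressors, r^2 would be replaced by the R-squared from regressing one explanatory variable on the others.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def vif_two_regressors(r):
    """Variance inflation factor when the model has exactly two regressors."""
    return 1.0 / (1.0 - r ** 2)


# Hypothetical values for eight countries (illustrative only).
life_expectancy = [52, 58, 63, 67, 71, 74, 78, 81]
log_startup_cost = [5.1, 4.6, 4.0, 3.1, 2.9, 2.0, 1.2, 0.4]  # assumed log scale

r = pearson_r(life_expectancy, log_startup_cost)
vif = vif_two_regressors(r)

print(f"correlation between regressors: {r:.3f}")
print(f"variance inflation factor:      {vif:.1f}")
```

A correlation close to -1 or +1 between two explanatory variables, or a large variance inflation factor (a common rule of thumb flags values above about 10), suggests the regression will struggle to separate their effects. Unstable coefficients and inflated standard errors when both variables are included are further signs to look for.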