Part 2 REGRESSION ANALYSIS

(a) Run a regression to determine the impact of the 2013 unemployment rate (UnempRate2013) on the per
capita income (PerCapitaInc) in a county. What is the estimated slope? Explain what this number
means in words in terms of the unemployment rate and in terms of per capita income. Also indicate
if the relationship is statistically significant at the 10%, 5%, and 1% levels. For this first pass, use
homoskedastic standard errors.

(b) Re-run the regression from part (a) but this time use heteroskedastic standard errors. Are your
coefficients the same as in part (a)? Why? Are your standard errors (of your betas) the same as in part
(a)? Why?
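
To see the mechanics behind part (b), here is a minimal sketch in plain Python rather than Stata. It fits a one-regressor OLS and computes both the homoskedasticity-only standard error and the HC1 heteroskedasticity-robust standard error (the small-sample correction I understand Stata's `vce(robust)` option to apply). The data are made up for illustration and are not the county data; the function name `simple_ols` is hypothetical.

```python
import math
import random

def simple_ols(x, y):
    """OLS of y on a constant and x.

    Returns (slope, homoskedasticity-only SE, HC1 robust SE) for the slope.
    """
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    dx = [xi - xbar for xi in x]
    sxx = sum(d * d for d in dx)

    # The point estimates depend only on x and y, not on the SE formula.
    slope = sum(d * (yi - ybar) for d, yi in zip(dx, y)) / sxx
    intercept = ybar - slope * xbar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]

    # Homoskedasticity-only SE: a single error variance s^2 for every observation.
    s2 = sum(e * e for e in resid) / (n - 2)
    se_homo = math.sqrt(s2 / sxx)

    # HC1 robust SE: each squared residual is weighted by its own (x - xbar)^2,
    # then scaled by n / (n - 2) as a degrees-of-freedom correction.
    meat = sum((d * e) ** 2 for d, e in zip(dx, resid))
    se_hc1 = math.sqrt(n / (n - 2) * meat / sxx ** 2)

    return slope, se_homo, se_hc1

# Made-up data (NOT the assignment's data): the error spread grows with x,
# so the errors are heteroskedastic by construction.
random.seed(0)
x = [random.uniform(2, 15) for _ in range(500)]
y = [30000 - 800 * xi + random.gauss(0, 300 * xi) for xi in x]

slope, se_homo, se_hc1 = simple_ols(x, y)
print(f"slope = {slope:.2f}")
print(f"homoskedasticity-only SE = {se_homo:.2f}")
print(f"HC1 robust SE            = {se_hc1:.2f}")
```

The slope is computed once and is shared by both versions; only the formula for its standard error changes.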

(c) Run the same regression as in part (b) but now also include the following additional regressors: percentage
of the population that is college-educated (Ed5CollegePlusPct), percentage of the population that is
black (BlackNonHispanicPct2010), and percentage of the population that is Hispanic (HispanicPct2010).
Now, what is the estimated impact of unemployment rate in 2013 on per capita income? Also indicate
if the relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are
using heteroskedastic standard errors.

(d) Provide economic/econometric intuition as to why the estimated impact of the unemployment rate on
per capita income changed between parts (b) and (c). Note that I am asking you to think about the
context (and hence the "story" behind these data).

(e) Construct a 95% confidence interval for the slope coefficient on UnempRate2013 found in Part 2(c). Write
out your calculations. Clearly indicate how this confidence interval relates to whether UnempRate2013
is statistically significant or not in this context by relating your answer to your constructed confidence
interval.
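
The calculation in part (e) can be sketched as follows, assuming the large-sample normal critical value of 1.96 (Stata reports intervals based on the t distribution, which is nearly identical with a few thousand counties). The coefficient and standard error below are placeholders, not results from the data, and both function names are hypothetical.

```python
def conf_int_95(beta_hat, se, crit=1.96):
    """Large-sample 95% confidence interval: beta_hat +/- crit * SE."""
    margin = crit * se
    return beta_hat - margin, beta_hat + margin

def significant_at_5pct(beta_hat, se, crit=1.96):
    """A coefficient is significant at the 5% level exactly when
    zero lies outside its 95% confidence interval."""
    lo, hi = conf_int_95(beta_hat, se, crit)
    return not (lo <= 0.0 <= hi)

# Placeholder values (NOT estimates from the assignment's data).
beta_hat, se = -500.0, 100.0
lo, hi = conf_int_95(beta_hat, se)
print(f"95% CI: [{lo:.1f}, {hi:.1f}]")
print("Significant at the 5% level:", significant_at_5pct(beta_hat, se))
```

With these placeholder values the interval is -500 ± 196, which excludes zero, so the two-sided test would reject a zero slope at the 5% level.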

(f) You recall from Part 1 that both the means of per capita income and of unemployment rate in 2013 are
quite different across metro and nonmetro areas. You therefore want to explore this in more detail. Run
the regression from Part 2(c) using only metro areas in 2013 (i.e., Metro2013==1). [Hint: You need to
restrict the data based on a criterion before running the regression.] Now, what is the estimated effect
of the 2013 unemployment rate on per capita income? Also indicate if the relationship is statistically
significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic standard
errors.

(g) Now, run the regression from Part 2(c) using only non-metro areas in 2013 (Metro2013==0). [Hint:
You need to restrict the data based on a criterion before running the regression.] Now, what is the
estimated effect of the 2013 unemployment rate on per capita income? Also indicate if the relationship
is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using heteroskedastic
standard errors.

(h) What did you learn from the comparison between results in parts (f) and (g)? Explain your answer.
Note that I again am asking you to think about the context (and hence the "story" behind these data).

(i) Return to the full sample. Now, run a regression to determine the impact of changing the percentage of
the population which is college educated (Ed5CollegePlusPct) on the per capita income (PerCapitaInc)
in a county. Include controls for the unemployment rate in 2013 (UnempRate2013), percentage of the
population that is black (BlackNonHispanicPct2010), percentage of the population that is Hispanic
(HispanicPct2010) and now also include a dummy variable for metro status (Metro2013). Now, what is
the estimated impact of percentage with a college education on per capita income? Also indicate if the
relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using
heteroskedastic standard errors.

(j) It is quite common in econometrics to model income variables nonlinearly. Construct a new variable
and call it "loginc" or whatever you prefer, where loginc = ln(PerCapitaInc). Provide summary statistics
for this new variable. (Hint: Think back to how you constructed summary statistics in Part 1.)
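
As a rough illustration of part (j), the sketch below builds a natural-log income variable and summarizes it in plain Python. The five income values are made up, and the list name `per_capita_inc` is only a stand-in for the dataset's income column. In Stata the analogous steps would use `generate` with `ln()` followed by `summarize`.

```python
import math
import statistics

# Made-up per capita incomes in dollars (NOT the assignment's data).
per_capita_inc = [18500.0, 22300.0, 25100.0, 31000.0, 47800.0]

# Natural log of each income; ln() is only defined for positive values.
loginc = [math.log(v) for v in per_capita_inc]

summary = {
    "n": len(loginc),
    "mean": statistics.mean(loginc),
    "sd": statistics.stdev(loginc),  # sample standard deviation (n - 1)
    "min": min(loginc),
    "max": max(loginc),
}

for name, value in summary.items():
    print(f"{name:>4}: {value:.4f}")
```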

(k) Now run a regression model with loginc as the dependent variable (and we are also going to start
controlling for metro status in addition to the other controls). In other words, include the
unemployment rate in 2013 (UnempRate2013) as the main regressor, along with the other
regressors: percentage college educated (Ed5CollegePlusPct), percentage non-Hispanic black in 2010
(BlackNonHispanicPct2010), percentage Hispanic in 2010 (HispanicPct2010), and metro status in 2013
(Metro2013). Now, what is the estimated effect of UnempRate2013 in words? Also indicate if the
relationship is statistically significant at the 10%, 5%, and 1% levels. Make sure that you are using
heteroskedastic standard errors. [Careful not to leave out any variables in your regression specification
in STATA.]
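
For the "in words" interpretation in part (k): in a log-linear model, a one-unit increase in a regressor is associated with roughly a 100 × beta percent change in the dependent variable, and the exact figure is 100 × (exp(beta) - 1). A minimal sketch, using a placeholder coefficient rather than an estimate from the data (function names are hypothetical):

```python
import math

def pct_change_approx(beta):
    """Approximate % change in y per one-unit change in x when ln(y) is the outcome."""
    return 100.0 * beta

def pct_change_exact(beta):
    """Exact % change in y per one-unit change in x when ln(y) is the outcome."""
    return 100.0 * (math.exp(beta) - 1.0)

# Placeholder coefficient (NOT an estimate from the assignment's data).
beta = -0.02
print(f"approximate: {pct_change_approx(beta):.3f}%")
print(f"exact:       {pct_change_exact(beta):.3f}%")
```

The two figures stay close when the coefficient is small in absolute value, which is why the approximation is commonly used when reporting results.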

(l) What is the null hypothesis corresponding to the F-statistic as reported in the output for the regression
in part (k)? What is the conclusion of the reported F-test? Explain (i.e., do you reject or fail to reject
the stated null hypothesis above, and how do you know this?).

(m) Construct a 95% confidence interval for the slope coefficient on UnempRate2013 in Part 2(k). As
usual, write out your calculations. Clearly indicate how this confidence interval relates to whether
UnempRate2013 is statistically significant or not in this context by relating your answer to your
constructed confidence interval.

(n) Discuss what the standard error of the regression (SER), R-squared and adjusted R-squared in part (k)
are telling you in terms of the numbers that you have found. Using what you know about the difference
between the two formulas, explain specifically why the R² and adjusted R² statistics are so similar for this case.
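
The relationship between the two statistics in part (n) can be checked numerically. The sketch below uses the standard formula, adjusted R² = 1 - [(n - 1)/(n - k - 1)] × (1 - R²), where k counts the regressors excluding the intercept. The R² value and sample sizes are placeholders, not results from the data.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k regressors (intercept excluded)."""
    return 1.0 - (n - 1) / (n - k - 1) * (1.0 - r2)

# Placeholder inputs (NOT results from the assignment's data).
r2 = 0.50
for n in (20, 3000):
    adj = adjusted_r2(r2, n=n, k=5)
    print(f"n = {n:>4}, k = 5: R2 = {r2:.4f}, adjusted R2 = {adj:.4f}")
```

The penalty factor (n - 1)/(n - k - 1) approaches 1 as n grows relative to k, so the gap between the two statistics shrinks.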

(o) Use an F-test to test the joint significance of the additional regressors: Ed5CollegePlusPct,
BlackNonHispanicPct2010, HispanicPct2010, and Metro2013. Find this test statistic and clearly indicate the
conclusions of the test.

(p) If you had more time to study this question and/or more or different data, what would you suggest
doing next? Propose additional variables to add and/or different specifications to try and give specific
reasons why you are suggesting these. Answers will vary for this part of the problem.