
A Spreadsheet Tool for Learning the Multiple Regression F-test, T-tests, and Multicollinearity

Abstract

This note presents a spreadsheet tool that gives teachers the opportunity to guide students towards answering, on their own, questions related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes, so teachers can select the context that is most appropriate for their particular needs. The spreadsheet tool is linked to this article, and materials are provided in the appendices for teachers to use as handouts, homework questions, and answer keys.

1. Introduction

This note was inspired by a question that commonly arises when teaching multiple regression analysis: “How does multicollinearity differ from the case of the two independent variables jointly influencing the dependent variable?” The purpose of this note is to present a spreadsheet tool that gives teachers the opportunity to guide students towards answering this question on their own, as well as others related to the multiple regression F-test, the t-tests, and multicollinearity. The note demonstrates approaches for using the spreadsheet that might be appropriate for three different levels of statistics classes, so teachers can select the context that is most appropriate for their particular needs.

The materials for Statistics Level 1 describe the basic F-test and the t-test confidence intervals for multiple regression with two independent variables and how multicollinearity affects those test results. The materials for Statistics Level 2 include those elements and add problems that ask the students to calculate the areas of those confidence regions so that students can develop a better appreciation for how multicollinearity affects those tests differently. Statistics Level 3 augments those materials with Bonferroni-type corrections of significance levels.

The teaching context assumed here is that of a lab session directed by a teaching assistant that follows a lecture on the relevant material by the course instructor. However, the materials are easily adaptable for a variety of circumstances ranging from in-class discussion to homework problems worked by the students independently. The material in the text of the note might best be viewed as the lecture material that precedes the lab session. The appendices contain a lab handout, including homework questions and a chart for students to complete, as well as suggested answers for that chart and for those homework questions.

The linked spreadsheet contains eight worksheets. The worksheet Basic can be adapted easily for any teaching goal involving the F-test and the t-tests in this regression context; it was used to develop all of the materials discussed here. The other worksheets within that spreadsheet are those mentioned in the appendices for student use.

2. Materials for Statistics Level 1 (and common to all three levels)

Consider the case in which two independent variables (x1i and x2i) are assumed to linearly affect the dependent variable (yi) as given by the population regression model below, with observations i=1…n and error term εi.

yi = β0 + β1x1i + β2x2i + εi (1)

The sample predicted value of the dependent variable is given in Equation (2), where the bj are least squares estimators of the coefficients above:

ŷi = b0 + b1x1i + b2x2i (2)

Since the coefficient of determination is often presented before the students see the tests of the slope coefficient significance but is related to the overall F-test, a teacher can use it to foreshadow how the slope coefficients and multicollinearity can affect the evaluation of a sample regression model. The coefficient of determination can be expressed in the form given in Equation (3), where sx1², sx2², and sy² are the variances of the three variables and r is the correlation coefficient between the two independent variables.

R² = (b1²sx1² + b2²sx2² + 2b1b2·r·sx1sx2) / sy² (3)

First, by having the students assume that the correlation between the two independent variables equals zero the teacher will help the students see that the model fits the data better as one or both of the slope coefficients increase in absolute value away from zero. Second, by assuming that the two slope coefficients are small in absolute value the teacher can point out that the sample model also can fit well if the correlation coefficient is large enough. The important emphasis for this discussion is that these are possibly two different outcomes; a good model might result from having large slope coefficients or from a large correlation coefficient.
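For teachers who want to verify the Equation (3) identity outside the spreadsheet, a minimal sketch follows; it assumes NumPy is available, and the simulated data, seed, and coefficient values are illustrative rather than taken from the linked spreadsheet.

```python
# Numerical check that R-squared computed from fitted values matches the
# variance/correlation form of Equation (3). Data here are illustrative.
import numpy as np

rng = np.random.default_rng(seed=1)
n = 100
x1 = rng.normal(size=n)
x2 = 0.4 * x1 + rng.normal(size=n)      # induces correlation between x1 and x2
y = 1.0 + 0.5 * x1 - 0.3 * x2 + rng.normal(size=n)

# Least squares fit with an intercept
X = np.column_stack([np.ones(n), x1, x2])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
b1, b2 = beta[1], beta[2]

# R-squared computed directly from the fitted values
yhat = X @ beta
r2_direct = 1 - np.sum((y - yhat) ** 2) / np.sum((y - y.mean()) ** 2)

# R-squared computed from Equation (3): variances and the correlation r
s2x1, s2x2, s2y = np.var(x1, ddof=1), np.var(x2, ddof=1), np.var(y, ddof=1)
r = np.corrcoef(x1, x2)[0, 1]
r2_eq3 = (b1**2 * s2x1 + b2**2 * s2x2
          + 2 * b1 * b2 * r * np.sqrt(s2x1 * s2x2)) / s2y

print(abs(r2_direct - r2_eq3) < 1e-10)   # the two calculations agree
```

The agreement is exact (up to rounding) because the numerator of Equation (3) is just the variance of the fitted values.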

The hypotheses for the overall F-test, the level of significance, and the F-test statistic are given below (se² is the sample error term variance).

H0: β1 = β2 = 0 versus Ha: β1 ≠ 0 and/or β2 ≠ 0, tested at significance level α (4)

F = SSR / (2se²), where SSR is the regression sum of squares (5)

Using Equations (4) and (5) followed by the same substitution used in the formula for the coefficient of determination, the rarely taught confidence region for the F-test is:

(n−1)(b1²sx1² + b2²sx2² + 2b1b2·r·sx1sx2) / (2se²) ≤ F(α; 2, n−3) (6)

Equation (6) determines an ellipse in (b1, b2) space centered at the value of zero for both b1 and b2. The interior of this “F-test ellipse” is the region of (b1, b2) values for which the analyst concludes that the two slope coefficients are jointly insignificant. The important points about this confidence region are that it allows the students to see how different regression results can be significant or not and that it will allow the students to learn how multicollinearity affects the test results. Figure 1 demonstrates this elliptical confidence region in (b1, b2) space (see Notes 1 and 2).
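A membership check for this confidence region can be sketched in a few lines. The sketch below assumes SciPy and uses the Footnote 2 setup (n = 100, unit standard deviations and regression standard error, 10% significance); the (n − 1) scaling of the quadratic form is a reconstruction of the variance-based formula, not the article's spreadsheet cells.

```python
# Classify a (b1, b2) pair as inside (fail to reject) or outside (reject)
# the F-test ellipse of Equation (6), as reconstructed in this note.
from scipy.stats import f

def inside_f_ellipse(b1, b2, n=100, s1=1.0, s2=1.0, se=1.0, r=0.0, alpha=0.10):
    """True when (b1, b2) lies inside the F-test ellipse (fail to reject)."""
    quad = (n - 1) * (s1**2 * b1**2 + s2**2 * b2**2 + 2 * r * s1 * s2 * b1 * b2)
    return quad / (2 * se**2) <= f.ppf(1 - alpha, 2, n - 3)

print(inside_f_ellipse(0.05, 0.05))          # near the origin: inside
print(inside_f_ellipse(0.40, 0.40))          # far from the origin: outside
# The tilt: with r = 0.6, a same-sign pair falls outside while the
# mirror-image different-sign pair stays inside.
print(inside_f_ellipse(0.20, 0.20, r=0.6))   # outside
print(inside_f_ellipse(0.20, -0.20, r=0.6))  # inside
```

The last two calls preview the tilt behavior discussed later in this section: with positive correlation, same-sign coefficient pairs are rejected more readily than different-sign pairs of the same magnitude.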

Figure 1 F-Test Joint Confidence Interval

The lab materials associated with this student level are in Appendix A. Early in the lab session, the students will have the opportunity to manipulate the standard deviations of x1 and x2 and the regression standard error to see how the F-test ellipse changes. The first homework question asks the students to examine how changing the level of significance changes the size of the ellipse. And later, the students will have the opportunity to change the correlation coefficient to see how multicollinearity changes the size and tilt of the F-test ellipse (as demonstrated in Figures 3 through 6, which are discussed below).

The hypothesis tests and confidence intervals for the two individual slope coefficient t-tests are presented in Equations (7) and (8). The notation for the significance level allows for the possibility that the analyst might use one level of significance for the F-test and lower levels of significance for the t-test.

H0: β1 = 0 versus Ha: β1 ≠ 0, tested at significance level α* (7a)

H0: β2 = 0 versus Ha: β2 ≠ 0, tested at significance level α* (7b)

−t(α*/2; n−3)·se(b1) ≤ b1 ≤ t(α*/2; n−3)·se(b1), where se(b1) = se / (sx1·√((n−1)(1−r²))) (8a)

−t(α*/2; n−3)·se(b2) ≤ b2 ≤ t(α*/2; n−3)·se(b2), where se(b2) = se / (sx2·√((n−1)(1−r²))) (8b)

The two t-test confidence intervals form a rectangle; the “t-test rectangle” and its interior is the region of (b1, b2) values for which the analyst must conclude that both slope coefficients are individually insignificant. Figure 2 overlays the F-test ellipse and the t-test rectangle. The labels (a)-(e) are explained below.
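A companion sketch computes the rectangle's half-widths directly. It again assumes SciPy and the Footnote 2 setup; the standard-error formula se(bj) = se / (sj·√((n−1)(1−r²))) is the usual two-regressor result and is a reconstruction where the article's cells are not shown.

```python
# Half-widths of the t-test rectangle along the b1 and b2 axes, and a check
# that the rectangle widens as |r| grows.
import math
from scipy.stats import t

def t_rectangle(n=100, s1=1.0, s2=1.0, se=1.0, r=0.0, alpha_star=0.10):
    """Return the rectangle's half-widths (along b1, along b2)."""
    tcrit = t.ppf(1 - alpha_star / 2, n - 3)
    se_b1 = se / (s1 * math.sqrt((n - 1) * (1 - r**2)))
    se_b2 = se / (s2 * math.sqrt((n - 1) * (1 - r**2)))
    return tcrit * se_b1, tcrit * se_b2

w1, w2 = t_rectangle()          # r = 0
w1r, w2r = t_rectangle(r=0.9)   # strong multicollinearity
print(w1r > w1 and w2r > w2)    # the rectangle widens with |r|
```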

Figure 2 F-Test and t-test Confidence Intervals

As with the F-test ellipse, early in the lab materials presented in Appendix A the students will have the opportunity to manipulate the standard deviations of x1 and x2 and the regression standard error to see how the t-test rectangle changes. The first homework question asks the students to examine how changing the level of significance changes the size of the rectangle.

Subsequent homework questions ask the students to change the correlation coefficient between the two independent variables so they can learn how multicollinearity affects the F-test results and the t-test results. Figures 3 through 6 preview how the spreadsheet presents these impacts. Students often have a good sense about how the t-test rectangle changes with multicollinearity and can readily understand that the F-test ellipse increases in size, although the exact calculations of those areas are deferred until the Statistics Level 2 material.

Figure 3 Correlation Equal to −0.30

Figure 4 Correlation Equal to 0.60

Figure 5 Correlation Equal to −0.90

Figure 6 Correlation Equal to 0.95

Instructors do need to explain the tilt in the F-test ellipse because the direction of the tilt is the opposite of what students tend to expect: it is the opposite of the sign of the correlation coefficient. To help the students, the teacher can remind the students of two concepts and then bring those concepts together to explain the tilt. First, the students should remember that the regions outside the F-test ellipse represent regions where the two slope coefficients are jointly significant. Second, if the two independent variables are positively (negatively) correlated, then they likely affect the dependent variable similarly (differently) and so the slope coefficients would have the same sign (different signs). So, by tilting the F-test ellipse negatively (positively), positive (negative) multicollinearity makes it more likely that the two slope coefficients are jointly significant when they have the same sign (different signs). Note that in Figures 4 and 6 (the two cases in which the correlation coefficient is positive), the areas outside of the F-test ellipse are mainly in the northeast and southwest quadrants, where the slope coefficients have the same sign. Similarly, in Figures 3 and 5 (the two cases in which the correlation coefficient is negative), the areas outside the F-test ellipse are mainly in the northwest and southeast quadrants, where the slope coefficients have different signs.

The final concept covered in the Statistics Level 1 material relates to the results of the F-test and the two t-tests. Geary and Leser (1968) (followed shortly by Duchan (1969)) noted that there are six possible combinations of outcomes for the three tests. The labels (a)-(e) in Figure 2 correspond to the first five outcomes, while the sixth arises only with multicollinearity and is discussed below.

  (a) The F-test allows the analyst to conclude that the slope coefficients are jointly significant and the t-tests allow the analyst to conclude that both slope coefficients are individually significant.

  (b) The F-test forces the analyst to conclude that the slope coefficients are jointly insignificant and the t-tests force the analyst to conclude that both slope coefficients are individually insignificant.

  (c) The F-test allows the analyst to conclude that the slope coefficients are jointly significant and the t-tests allow the analyst to conclude that one of the slope coefficients is individually significant.

  (d) The F-test forces the analyst to conclude that the slope coefficients are jointly insignificant and, yet, the t-tests allow the analyst to conclude that one of the slope coefficients is individually significant.

  (e) The F-test allows the analyst to conclude that the slope coefficients are jointly significant and the t-tests force the analyst to conclude that both slope coefficients are individually insignificant.

  (f) The F-test forces the analyst to conclude that the slope coefficients are jointly insignificant and, yet, the t-tests allow the analyst to conclude that both slope coefficients are individually significant.

The appropriate areas for outcome (e) are small (outside the F-test ellipse but inside the t-test rectangle), so the arrow in Figure 2 points to one such region. This result is exactly the case that motivated this note; it is one in which the two variables work together to have a statistically significant joint influence on the dependent variable even if neither variable's marginal influence is significant. It is important to emphasize at this point that Figure 2 is drawn assuming that the correlation coefficient between the two independent variables equals zero, so this outcome is not a result of multicollinearity. This outcome can occur with multicollinearity, but unlike outcome (f), multicollinearity is not a prerequisite.
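A concrete numerical instance of outcome (e) can be constructed from summary statistics alone under the Footnote 2 setup (n = 100, unit standard deviations and regression standard error, 10% level, r = 0). The test formulas below are reconstructions of Equations (5) and (7), not the article's spreadsheet cells, and the slope values are illustrative.

```python
# Outcome (e): jointly significant by the F-test, yet neither slope is
# individually significant by its t-test. Values chosen for illustration.
import math
from scipy.stats import f, t

n, se, s1, s2, alpha = 100, 1.0, 1.0, 1.0, 0.10
b1 = b2 = 0.16   # illustrative slope estimates

# F statistic with r = 0: regression sum of squares over 2 * se**2
F_stat = (n - 1) * (s1**2 * b1**2 + s2**2 * b2**2) / (2 * se**2)
# t statistic for either slope (identical by symmetry here)
t_stat = b1 * s1 * math.sqrt(n - 1) / se

jointly_sig = F_stat > f.ppf(1 - alpha, 2, n - 3)
individually_sig = abs(t_stat) > t.ppf(1 - alpha / 2, n - 3)
print(jointly_sig, individually_sig)   # True False: outcome (e)
```

Since r = 0 here, the example also reinforces the point that outcome (e) does not require multicollinearity.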

Further, we can also see in Figures 3 through 6 that the areas corresponding to Geary and Leser's outcome (e), the region in which we conclude that the slope coefficients are jointly significant but individually insignificant, increase in size as the correlation coefficient increases in absolute value away from zero. So, the initial student question that motivated this paper does make sense. It can be difficult to distinguish between two independent variables jointly influencing the dependent variable and two independent variables being multicollinear. Clearly, the teacher needs to emphasize that the context of the analysis will be the driving force in distinguishing between the two possibilities in any specific case.

The change in the shape of the F-test ellipse due to multicollinearity allows for the possibility that the two slope coefficients can be insignificant in the F-test but significant in both of the two individual t-tests (Geary and Leser's outcome (f)). This possibility occurs at the extreme ends of the F-test ellipse (as the stretching of the F-test ellipse is more obvious at higher correlation coefficient values, this possibility is denoted by the arrows only in Figures 5 and 6). However, obtaining these types of results is unlikely as, in the context of Figure 6, that outcome would require the two slope coefficients to have different signs when the variables are positively correlated (and the same signs when the variables are negatively correlated as in Figure 5).

The general concept that the area of the F-test ellipse increases with multicollinearity is easy to motivate, but many students are not familiar with calculating the areas of ellipses. So, if an instructor wants to skip that topic (as well as the Bonferroni-type corrections), the materials for Statistics Level 1 are sufficient. However, if teachers want their students to take the next step of using the spreadsheet to calculate the areas of the F-test ellipse and the t-test rectangle, then the materials for Statistics Level 2 are the appropriate materials to use.

3. Additional Materials for Statistics Level 2

The materials for Statistics Level 2 repeat everything in Statistics Level 1 and augment them with spreadsheet calculations of the areas of the F-test ellipse and the t-test rectangle. The spreadsheet has the area calculations discussed below already programmed into the appropriate cells, so teachers don't need to worry about their students' programming accuracy.

Given a general ellipse

Au² + Buv + Cv² = 1

the area of that ellipse is:

Area = 2π / √(4AC − B²)

Thus, the area of the F-test ellipse in Equation (6) is:

Area = 2π·se²·F(α; 2, n−3) / ((n−1)·sx1·sx2·√(1−r²)) (9)

Clearly, this area increases as the correlation coefficient increases in absolute value.

Reworking Equations (8a) and (8b) allows the area of the t-test rectangle to be defined as in Equation (10) below.

Area = 4·[t(α*/2; n−3)]²·se² / ((n−1)·sx1·sx2·(1−r²)) (10)

Students readily understand that the area of the t-test rectangle increases as the correlation coefficient increases in absolute value.

The homework materials ask the students to record the sizes of both the F-test ellipse and the t-test rectangle. The students will learn, as demonstrated in Table 1, that multicollinearity affects the t-test rectangle more than it does the F-test ellipse.
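The comparison can be sketched numerically with the Equation (9) and (10) areas as reconstructed in this note (the (n − 1) scaling is an assumption); SciPy supplies the critical values, and the Footnote 2 setup is assumed.

```python
# Ellipse and rectangle areas as functions of the correlation r, showing
# that the rectangle's area grows faster with |r| than the ellipse's.
import math
from scipy.stats import f, t

def areas(r, n=100, s1=1.0, s2=1.0, se=1.0, alpha=0.10):
    """Return (ellipse area, rectangle area) for a given correlation r."""
    Fc = f.ppf(1 - alpha, 2, n - 3)
    tc = t.ppf(1 - alpha / 2, n - 3)
    ellipse = 2 * math.pi * se**2 * Fc / ((n - 1) * s1 * s2 * math.sqrt(1 - r**2))
    rect = 4 * tc**2 * se**2 / ((n - 1) * s1 * s2 * (1 - r**2))
    return ellipse, rect

for r in (0.0, 0.6, 0.9):
    e, rc = areas(r)
    print(f"r = {r}: ellipse {e:.4f}, rectangle {rc:.4f}, ratio {rc / e:.3f}")
```

The rectangle-to-ellipse ratio scales with 1/√(1−r²), which is the sense in which multicollinearity affects the t-test rectangle more.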

Table 1: Comparing the F-test and t-tests

4. Supplemental Materials for Statistics Level 3

The final concept to consider is the possibility of using Bonferroni-type corrections. The context here assumes that the analyst has chosen a significance level for the F-test (α in this note) and then will reduce the significance level of the t-tests to a more appropriate level (α* in this note). One ad-hoc version of the Bonferroni Correction to the significance level of the t-tests is developed by beginning with the definition of the union of rejecting the two t-test null hypotheses, as noted below.

P(reject H0: β1 = 0 ∪ reject H0: β2 = 0) = P(reject H0: β1 = 0) + P(reject H0: β2 = 0) − P(reject both)

Next, if we assume that each t-test is conducted at significance level α*, then we see:

P(reject H0: β1 = 0 ∪ reject H0: β2 = 0) ≤ α* + α* = 2α*

That inequality, with its upper bound set equal to the F-test significance level α, leads to the definition of one ad-hoc Bonferroni Correction for this context as given in Equation (11).

α* = α/2 (11)

The instructional materials in Appendix C begin with a pattern very similar to the pattern of the materials presented for Statistics Level 1 and Statistics Level 2. The specific questions are a bit different because this assignment has the students work with a 10% significance level for the F-test and a 5% significance level for the two t-tests rather than working with only a 10% significance level for all three tests.

The last bit of these instructional materials takes a different tack on the Bonferroni-type corrections, using a question that students naturally raise. A teacher might ask it as, “What level of significance for the t-tests would eliminate Geary and Leser's problematic outcomes (d) and (f)?” A student might ask it as, “Is there a level of significance that moves the t-test rectangle to be tangent to the F-test ellipse?” The follow-up question is naturally, does that level of significance change with the correlation between the two independent variables?

The answer, as demonstrated in the Answer Key in Appendix C, is that the appropriate level of significance is derived from the relationship:

t(α*/2; n−3) = √(2·F(α; 2, n−3)) (12)

So, multicollinearity is irrelevant here. The exercises first ask the students to solve Equation (12) for the implied level of significance and then incorporate those results into their analysis.
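The tangency condition of Equation (12), as reconstructed here, can be solved for the implied t-test level in a few lines; the sketch assumes SciPy and the Footnote 2 sample size. Note that r never enters the calculation, which is precisely the sense in which multicollinearity is irrelevant.

```python
# Solve t(alpha*/2; n-3) = sqrt(2 * F(alpha; 2, n-3)) for alpha*.
import math
from scipy.stats import f, t

n, alpha = 100, 0.10
Fc = f.ppf(1 - alpha, 2, n - 3)
alpha_star = 2 * t.sf(math.sqrt(2 * Fc), n - 3)
print(round(alpha_star, 4))   # the implied t-test significance level
```

The implied α* is smaller than the F-test's α, consistent with the Bonferroni-style idea of tightening the individual tests.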

5. Concluding Comments

In the introductory applied statistics course I teach, the first course in a two-course sequence, I am always rushing through the multiple regression material at the end of the semester. I found that in teaching through the Statistics Level 2 material I was trying to cover too much material at a busy time in the semester. That is why I created the rather arbitrary separation between the Statistics Level 1 materials and the Statistics Level 2 materials. At the time of this writing, I plan to use the Statistics Level 1 materials in the statistics course when I teach it next. The Statistics Level 2 materials were appropriate for the second course in the sequence, basic econometrics, as multiple regression analysis is that course's prime focus.

The textbook I use for the basic econometrics course does not discuss Bonferroni-type corrections, which is one of the two reasons why I will not use the Statistics Level 3 materials for it. The other reason follows from the point that the course introduces students to reading applied economics journal articles, and the articles that are readable for students at that level rarely mention Bonferroni-type corrections. So, the topic doesn't seem very important for this level of student. However, it was a couple of students in my advanced econometrics course who asked the question about shifting the t-test rectangle to match the F-test ellipse and thereby motivated the presentation of that topic in the Statistics Level 3 materials. So, I will use the Statistics Level 3 materials for the advanced econometrics class.

Acknowledgements

This paper benefitted substantially from comments given by Nancy Haskell, anonymous reviewers, and the Editor of JSE, Dr. William Notz. Dr. Notz also suggested incorporation of the material related to Bonferroni-type corrections. The usual disclaimer applies.

Notes

1 The spreadsheet tends to draw an incomplete ellipse as the spreadsheet is designed to accommodate a very wide range of values for the inputs. The ellipse can be completed easily in any specific example by adjusting cells ‘Basic’!B20:C21. Similar adjustments can be made to the other worksheets if necessary.

2 The spreadsheet uses standard deviations of x1 and x2 and regression standard error (se) equal to one, a sample size of one hundred, and a ten percent significance level for all three tests; as will be noted in the student exercises below, those values can be changed. Figure 1 is also drawn assuming that the correlation between the two independent variables equals zero, another value that the students will be able to change easily.


References

  • Duchan, Alan I. June 1969. A Relationship Between the F and t Statistics and the Simple Correlation Coefficients in Classical Least Squares Regression. The American Statistician. 23(3): 27–28.
  • Geary, R.C., and C.E.V. Leser. February 1968. Significance Tests in Multiple Regression. The American Statistician. 22(1): 20–21.

Appendix A

Materials for a Statistics Level 1 Class

Lab Materials

Begin with the worksheet Statistics 1 Handout (which follows and is included in the linked spreadsheet). Your TA will begin by describing various components of the spreadsheet.

  • The information in columns A-C presents abbreviated versions of the results from Excel's:

    Descriptive statistics,

    Correlation analysis, and

    Regression analysis.

  • The information in columns A-C is used in the “Input for Data Analysis” section.

  • The analyst selects the significance level. For this case, the F-test significance level (cell ‘Statistics 1’!G12) simply uses the t-test significance level (cell ‘Statistics 1’!G11) although that equality could be changed.

  • The graph presents the F-test ellipse (b1 and b2 combinations outside the ellipse allow the analyst to reject the null hypothesis for the F-test), the critical values for the t-test of β1 (b1 values to the left and right of them allow the analyst to reject the null hypothesis), and the critical values for the t-test of β2 (b2 values above and below them allow the analyst to reject the null hypothesis).

As your TA increases the standard deviation of each independent variable and the sample size, in each case making the statistical point that we have more information about the relationship between the variables, notice how the F-test ellipse and the t-test rectangle are affected. You should be able to predict the effect of each change in advance; if not, please be certain to ask why the change led to that effect. Similarly, as your TA increases the standard error of the estimate, which says that our regression model is less accurate, you should be able to predict how the t-test rectangle and the F-test ellipse change.

Return the elements back to their original values. Next your TA will change the values of the pseudo regression results in columns P, Q, and R to match those on the worksheet on the next page. Enter the values into your spreadsheet at the same time; your figure should match the TA's exactly.

Remember that there are six possible combinations of F-test and t-test results.

  1. F-statistic is significant and both slope coefficient t-statistics are significant.

  2. F-statistic is insignificant and both slope coefficient t-statistics are insignificant.

  3. F-statistic is significant and only one of the two slope coefficient t-statistics are significant.

  4. F-statistic is insignificant and only one of the two slope coefficient t-statistics are significant.

  5. F-statistic is significant and both slope coefficient t-statistics are insignificant.

  6. F-statistic is insignificant and both slope coefficient t-statistics are significant.

The fifth possibility is the case in which the two independent variables work together to help explain the dependent variable even though neither variable has a significant marginal impact. The sixth possibility occurs only when the two independent variables are correlated with each other, and so you will not see that possibility until you get to one of the multicollinearity situations.

Your TA will work through the worksheet with the case of the significance level equaling 5% and the correlation coefficient for the two independent variables equaling 0.0. For this particular case, you should see that the first possibility occurs twice and the fourth possibility occurs twice.

Homework assignment

Change the significance to 10%.

  1. Complete the second column in the worksheet.

  2. Use the first two columns (both with r=0) to explain how the worksheet demonstrates the impact of increasing the level of significance.

  3. In the second column (α=10% and r=0), which estimation results demonstrate the concept of variables working together even though each individually has an insignificant marginal influence? Is this a result of multicollinearity?

  4. What is unexpected about the test results for Data 2 (α=10% and r=0) and for Data 4 (α=10% and r=0)? Do these outcomes result from multicollinearity?

  5. Complete the remainder of the worksheet.

  6. Importantly, multicollinearity increases (in absolute value) the critical value of the slope coefficient (the value of b1 or b2 that allows you to reject the null hypothesis).

    1. Demonstrate that point mathematically.

    2. Demonstrate that point using the estimation results above.

  7. Multicollinearity also increases the size of the F-test ellipse and tilts it. Which estimation results demonstrate the impacts of those effects?

  8. In the third, fourth, fifth, and sixth columns (|r|>0), which estimation results demonstrate the concept of variables working together even though each individually has an insignificant marginal influence? Do these outcomes result from multicollinearity?

  9. What is unexpected about the test results for Data 11 (α=10% and r=+.6)?

Statistics 1 Handout

Write an "R" when the null hypothesis is rejected.

Answer Key

Write an "R" when the null hypothesis is rejected.

For the TA:

  • As the standard deviation of either variable increases, which would signify that the analyst has more information about the linear relationship:

    The F-test ellipse compresses along that variable's slope coefficient axis. Such compression makes it easier to reject the null hypothesis for the F-test.

    The t-test rectangle also compresses along that variable's slope coefficient axis. Such compression makes it easier to reject the null hypothesis for the t-test.

  • As the sample size increases:

    The F-test ellipse compresses in both dimensions, reflecting the increased information available to the analyst.

    The t-test rectangle similarly compresses in both dimensions.

  • As the standard error of the regression increases:

    The F-test ellipse expands in both directions as the standard error increases, corresponding to the increased inaccuracy of the regression fit.

    The t-test rectangle similarly expands in both dimensions.

Change the significance to 10%.

2. Use the first two columns (both with r=0) to explain how the worksheet demonstrates the impact of increasing the level of significance.

Data 2 … with the higher level of significance the t-statistic for becomes significant.

Data 3 … with the higher level of significance the F-statistic becomes significant.

Data 4 … with the higher level of significance the t-statistic for becomes significant.

Data 5 … with the higher level of significance the F-statistic becomes significant.

Data 7 … with the higher level of significance the F-statistic becomes significant.

Data 10 … with the higher level of significance the F-statistic becomes significant.

3. In the second column (α=10% and r=0), which estimation results demonstrate the concept of variables working together even though each individually has an insignificant marginal influence? Is this a result of multicollinearity?

Data 7

Data 10

No, this is not a result of multicollinearity as r=0.

4. What is unexpected about the test results for Data 2 (α=10% and r=0) and for Data 4 (α=10% and r=0)? Do these outcomes result from multicollinearity?

  • In both cases, one t-statistic is significant (so one independent variable has a significant marginal influence) but the F-statistic is insignificant.

  • This is not a result of multicollinearity as r=0.

6. Importantly, multicollinearity increases (in absolute value) the critical value of the slope coefficient (the value of b1 or b2 that allows you to reject the null hypothesis).

  1. Demonstrate that point mathematically.

As an example:

    se(b1) = se / (sx1·√((n−1)(1−r²)))

    So, as |r| increases from zero toward one, the standard error of the slope coefficient increases and the critical value increases.

  2. Demonstrate that point using the estimation results above.

Data 2 … the t-statistic for becomes insignificant as r increases in absolute value.

Data 3 … the t-statistic for becomes insignificant as r increases in absolute value.

Data 4 … the t-statistic for becomes insignificant as r increases in absolute value.

Data 5 … the t-statistic for becomes insignificant as r increases in absolute value.

Data 8 … the t-statistics for both slope coefficients become insignificant as r increases in absolute value.

Data 11 … the t-statistics for both slope coefficients become insignificant as r increases in absolute value.

7. Multicollinearity also increases the size of the F-test ellipse and tilts it. Which estimation results demonstrate changes in the F-test ellipse? What are those changes and what are the impacts of those changes on the F-statistic?

Data 4 … The F-test ellipse tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

Data 5 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But, the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

Data 7 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But, the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

Data 8 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But, the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

Data 10 … The F-test ellipse tilts positively with negative r so the F-statistic stays significant with negative r values. Because the F-test ellipse tilts negatively with a positive r, the F-statistic becomes insignificant with positive r values.

Data 11 … The F-test ellipse tilts positively with negative r so the F-statistic stays significant with negative r values. Because the F-test ellipse tilts negatively with a positive r, the F-statistic becomes insignificant with positive r values.

8. In the third, fourth, fifth, and sixth columns (|r|>0), which estimation results demonstrate the concept of variables working together even though each individually has an insignificant marginal influence? Do these outcomes result from multicollinearity?

With r=0 (so multicollinearity is not a factor) as well as at higher r-values:

Data 7 and Data 10

At high correlation coefficients, so multicollinearity is a factor:

Data 3, Data 4, Data 5, Data 8, Data 11

9. What is unexpected about the test results for Data 11 (α=10% and r=+.6)?

The t-statistics for both slope coefficients are significant, indicating that both variables have significant individual marginal influences, but the F-statistic is insignificant. So, jointly, the two variables offset each other. This result is because of the multicollinearity.

Appendix B

Materials for a Statistics Level 2 Class

Lab Materials

Begin with the worksheet Statistics 2 Handout (which follows and is included in the linked spreadsheet). Your TA will begin by describing various components of the spreadsheet.

  • The information in columns A-C presents abbreviated versions of the results from Excel's:

    Descriptive statistics,

    Correlation analysis, and

    Regression analysis.

  • The information in columns A-C is used in the “Input for Data Analysis” section.

  • The analyst selects the significance level. For this case, the F-test significance level (cell ‘Statistics 2’!G12) simply uses the t-test significance level (cell ‘Statistics 2’!G11) although that equality could be changed.

  • The graph presents the F-test ellipse (b1 and b2 combinations outside the ellipse allow the analyst to reject the null hypothesis for the F-test), the critical values for the t-test of β1 (b1 values to the left and right of them allow the analyst to reject the null hypothesis), and the critical values for the t-test of β2 (b2 values above and below them allow the analyst to reject the null hypothesis).

As your TA increases the standard deviation of each independent variable and the sample size, in each case making the statistical point that we have more information about the relationship between the variables, notice how the F-test ellipse and the t-test rectangle are affected. You should be able to predict the effect of each change in advance; if not, be certain to ask why the change led to that effect. Similarly, as your TA increases the standard error of the estimate, which indicates that our regression model is less accurate, you should be able to predict how the t-test rectangle and the F-test ellipse change.

Return the elements to their original values. Next your TA will change the values of the pseudo regression results in columns P, Q, and R to match those on the worksheet on the next page. Enter the values into your spreadsheet at the same time; your figure should match the TA's exactly.

Remember that there are six possible combinations of F-test and t-test results.

  1. F-statistic is significant and both slope coefficient t-statistics are significant.

  2. F-statistic is insignificant and both slope coefficient t-statistics are insignificant.

  3. F-statistic is significant and only one of the two slope coefficient t-statistics is significant.

  4. F-statistic is insignificant and only one of the two slope coefficient t-statistics is significant.

  5. F-statistic is significant and both slope coefficient t-statistics are insignificant.

  6. F-statistic is insignificant and both slope coefficient t-statistics are significant.

The fifth possibility is the case in which the two independent variables work together to help explain the dependent variable even though neither variable has a significant marginal impact. The sixth possibility occurs only when the two independent variables are correlated with each other, and so you will not see that possibility until you get to one of the multicollinearity situations.

Your TA will work through the worksheet with the case of the significance level equaling 5% and the correlation coefficient for the two independent variables equaling 0.0. For this particular case, you should see that the first possibility occurs twice and the fourth possibility occurs twice.

Also, notice that the spreadsheet calculates the area of the F-test ellipse and the t-test rectangle. The general formulas for those areas are:
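A sketch of the two areas under the standard OLS setup for two regressors (here t_c and F_c denote the t- and F-test critical values, se(β̂₁) and se(β̂₂) the slope standard errors, and r the correlation between the independent variables; the ellipse constant follows from the joint confidence region):

```latex
\begin{align*}
A_{\text{rectangle}} &= \bigl(2 t_c\, \mathrm{se}(\hat\beta_1)\bigr)\bigl(2 t_c\, \mathrm{se}(\hat\beta_2)\bigr)
                      = 4\, t_c^{2}\, \mathrm{se}(\hat\beta_1)\, \mathrm{se}(\hat\beta_2), \\
A_{\text{ellipse}}   &= 2\pi\, F_c\, \mathrm{se}(\hat\beta_1)\, \mathrm{se}(\hat\beta_2)\, \sqrt{1 - r^{2}}.
\end{align*}
```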

In this particular case, the areas equal:

Given the formulas for the two areas, it should be clear that both increase as the correlation between the two independent variables increases in absolute value. The question you will answer in this lab is, which area increases faster?

Homework assignment

Change the significance level to 10%.

  1. Complete the second column in the worksheet.

  2. Use the first two columns (both with r=0) to explain how the worksheet demonstrates the impact of increasing the level of significance.

  3. In the second column (α=10% and r=0), which estimation results demonstrate the concept of variables working together although individually they have insignificant marginal influences? Is this a result of multicollinearity?

  4. What is unexpected about the test results for Data 2 (α=10% and r=0) and for Data 4 (α=10% and r=0)? Do these outcomes result from multicollinearity?

  5. Complete the remainder of the worksheet.

  6. Importantly, multicollinearity increases (in absolute value) the critical value of the slope coefficient (the value of the estimated slope coefficient that allows you to reject the null hypothesis).

    1. Demonstrate that point mathematically.

    2. Demonstrate that point using the estimation results above.

  7. Multicollinearity also increases the size of the F-test ellipse and tilts it. Which estimation results demonstrate the impacts of those effects?

  8. In the third, fourth, fifth, and sixth columns (|r|>0), which estimation results demonstrate the concept of variables working together although individually they have insignificant marginal influences? Do these outcomes result from multicollinearity?

  9. As the correlation coefficient between the two independent variables increases in absolute value, which area increases faster: the F-test ellipse or the t-test rectangle?

  10. What is unexpected about the test results for Data 11 (α=10% and r=+.6)?

Statistics 2 Handout

Answer Key

For the TA:

  • As the standard deviation of either variable increases, which would signify that the analyst has more information about the linear relationship:

    The F-test ellipse compresses along that variable's slope coefficient axis. Such compression makes it easier to reject the null hypothesis for the F-test.

    The t-test rectangle also compresses along that variable's slope coefficient axis. Such compression makes it easier to reject the null hypothesis for the t-test.

  • As the sample size increases:

    The F-test ellipse compresses in both dimensions, reflecting the increased information available to the analyst.

    The t-test rectangle similarly compresses in both dimensions.

  • As the standard error of the regression increases:

    The F-test ellipse expands in both directions as the standard error increases, corresponding to the increased inaccuracy of the regression fit.

    The t-test rectangle similarly expands in both dimensions.
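These comparative statics can be checked numerically with the standard two-regressor formula for a slope standard error; the sketch below is illustrative (the values of s, s_x, and n are made up, not taken from the spreadsheet):

```python
import math

def slope_se(s, s_x, n, r):
    """Slope standard error in a two-regressor OLS model:
    se = s / (s_x * sqrt((n - 1) * (1 - r**2)))."""
    return s / (s_x * math.sqrt((n - 1) * (1 - r**2)))

base = slope_se(s=2.0, s_x=1.0, n=30, r=0.0)

# More spread in the regressor -> smaller se -> rectangle and ellipse compress
assert slope_se(2.0, 2.0, 30, 0.0) < base
# Larger sample size -> smaller se -> both regions compress
assert slope_se(2.0, 1.0, 120, 0.0) < base
# Larger standard error of the estimate -> larger se -> both regions expand
assert slope_se(4.0, 1.0, 30, 0.0) > base
```

The same function also previews the multicollinearity discussion below: slope_se grows without bound as |r| approaches one.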

Change the significance level to 10%.

2. Use the first two columns (both with r=0) to explain how the worksheet demonstrates the impact of increasing the level of significance.

The areas of the F-test ellipse and the t-test rectangle decrease, reflecting the smaller critical values that are consistent with the larger significance level.

Data 2 … with the higher level of significance the t-statistic for one of the slope coefficients becomes significant.

Data 3 … with the higher level of significance the F-statistic becomes significant.

Data 4 … with the higher level of significance the t-statistic for one of the slope coefficients becomes significant.

Data 5 … with the higher level of significance the F-statistic becomes significant.

Data 7 … with the higher level of significance the F-statistic becomes significant.

Data 10 … with the higher level of significance the F-statistic becomes significant.

3. In the second column (α=10% and r=0), which estimation results demonstrate the concept of variables working together although individually they have insignificant marginal influences? Is this a result of multicollinearity?

Data 7

Data 10

No, this is not a result of multicollinearity as r=0.

4. What is unexpected about the test results for Data 2 (α=10% and r=0) and for Data 4 (α=10% and r=0)? Do these outcomes result from multicollinearity?

  • In both cases, one t-statistic is significant (so one independent variable has a significant marginal influence) but the F-statistic is insignificant.

  • This is not a result of multicollinearity as r=0.

6. Importantly, multicollinearity increases (in absolute value) the critical value of the slope coefficient (the value of the estimated slope coefficient that allows you to reject the null hypothesis).

  1. Demonstrate that point mathematically.

    As an example, the standard error of the first slope coefficient can be written as se(β₁) = s / (s₁√((n−1)(1−r²))), where s is the standard error of the estimate and s₁ is the standard deviation of the first independent variable.

    So, as |r| increases from zero to one, the standard error of the slope coefficient increases and the critical value increases.

  2. Demonstrate that point using the estimation results above.

Data 2 … the previously significant t-statistic becomes insignificant as r increases in absolute value.

Data 3 … the previously significant t-statistic becomes insignificant as r increases in absolute value.

Data 4 … the previously significant t-statistic becomes insignificant as r increases in absolute value.

Data 5 … the previously significant t-statistic becomes insignificant as r increases in absolute value.

Data 8 … the t-statistics for both β₁ and β₂ become insignificant as r increases in absolute value.

Data 11 … the t-statistics for both β₁ and β₂ become insignificant as r increases in absolute value.

7. Multicollinearity also increases the size of the F-test ellipse and tilts it. Which estimation results demonstrate changes in the F-test ellipse? What are those changes and what are the impacts of those changes on the F-statistic?

Data 4 … The F-test ellipse tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

Data 5 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

Data 7 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

Data 8 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

Data 10 … The F-test ellipse tilts positively with a negative r, so the F-statistic stays significant at negative r values. Because the F-test ellipse tilts negatively with a positive r, the F-statistic becomes insignificant at positive r values.

Data 11 … The F-test ellipse tilts positively with a negative r, so the F-statistic stays significant at negative r values. Because the F-test ellipse tilts negatively with a positive r, the F-statistic becomes insignificant at positive r values.

8. In the third, fourth, fifth, and sixth columns (|r|>0), which estimation results demonstrate the concept of variables working together although individually they have insignificant marginal influences? Do these outcomes result from multicollinearity?

With r=0 (so multicollinearity is not a factor) as well as at higher r-values:

Data 7 and Data 10

At high correlation coefficients, where multicollinearity is a factor:

Data 3, Data 4, Data 5, Data 8, Data 11

9. As the correlation coefficient between the two independent variables increases in absolute value, which area increases faster: the F-test ellipse or the t-test rectangle?

The area of the t-test rectangle increases faster than the area of the F-test ellipse.
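This answer can be verified numerically. Assuming the standard results that each slope standard error is inflated by 1/√(1−r²) and that the two areas are 4·t_c²·se₁·se₂ (rectangle) and 2π·F_c·se₁·se₂·√(1−r²) (ellipse), with illustrative critical values and baseline standard errors:

```python
import math

T_CRIT, F_CRIT = 2.05, 3.35   # illustrative critical values (fixed as r varies)
SE1_0, SE2_0 = 0.50, 0.80     # illustrative slope standard errors at r = 0

def areas(r):
    # Multicollinearity inflates each slope standard error by 1/sqrt(1 - r^2)
    se1 = SE1_0 / math.sqrt(1 - r**2)
    se2 = SE2_0 / math.sqrt(1 - r**2)
    rectangle = 4 * T_CRIT**2 * se1 * se2
    ellipse = 2 * math.pi * F_CRIT * se1 * se2 * math.sqrt(1 - r**2)
    return rectangle, ellipse

rect0, ell0 = areas(0.0)
rect6, ell6 = areas(0.6)

# Both areas grow with |r| ...
assert rect6 > rect0 and ell6 > ell0
# ... but the rectangle grows faster: 1/(1 - r^2) versus 1/sqrt(1 - r^2)
assert rect6 / rect0 > ell6 / ell0
```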

10. What is unexpected about the test results for Data 11 (α=10% and r=+.6)?

Both the t-statistic for β₁ and the t-statistic for β₂ are significant, indicating that both variables have significant individual marginal influences, but the F-statistic is insignificant. So, jointly, the two variables offset each other. This result occurs because of the multicollinearity.

Appendix C

Materials for a Statistics Level 3 Class

Lab Materials

Consider the regression model with two independent variables:

y = β₀ + β₁x₁ + β₂x₂ + ε.

Further, the F-test examines the joint null hypothesis H₀: β₁ = β₂ = 0, and the t-tests examine the individual null hypotheses H₀: β₁ = 0 and H₀: β₂ = 0.

As noted in class, one ad-hoc Bonferroni correction to the significance level of the t-tests is to use half the significance level of the F-test: α_t = α_F/2.

Also as noted in class, Geary and Leser [3] (followed shortly by Duchan [4]) compared the outcomes of the overall F-test and the slope coefficient t-tests for the case of a regression model with two independent variables. Assuming that the analyst used the same significance level for all three tests, the six possible outcomes were:

  1. F-statistic is significant and both slope coefficient t-statistics are significant.

  2. F-statistic is insignificant and both slope coefficient t-statistics are insignificant.

  3. F-statistic is significant and only one of the two slope coefficient t-statistics is significant.

  4. F-statistic is insignificant and only one of the two slope coefficient t-statistics is significant.

  5. F-statistic is significant and both slope coefficient t-statistics are insignificant.

  6. F-statistic is insignificant and both slope coefficient t-statistics are significant.

Instead of doing a Bonferroni-type of correction, the analyst might ask the question, “What level of significance for the t-tests would eliminate the problematic 4th and 6th outcomes?” In other words, is there a level of significance that moves the t-test rectangle to be tangent to the F-test ellipse? Further, does that level of significance change with the correlation between the two independent variables?

Homework assignment

  1. In the worksheet Statistics 3 Handout (which follows and is included in the linked spreadsheet), set the significance level for the F-test to 10% and, as an ad-hoc Bonferroni correction, set the significance level for the t-test to 5%.

    1. Complete the first two columns in the handout.

    2. Use the first two columns (both with r=0) to explain how the worksheet demonstrates the impact of the ad-hoc Bonferroni correction.

    3. In the first column, which estimation results demonstrate the concept of variables working together although individually they have insignificant marginal influences? Is this a result of multicollinearity?

    4. In the first column, what is unexpected about the test results for Data 2 and for Data 4? Do these outcomes result from multicollinearity?

    5. Complete the remainder of the handout.

    6. Importantly, multicollinearity increases (in absolute value) the critical value of the slope coefficient (the value of the estimated slope coefficient that allows you to reject the null hypothesis).

      1. Demonstrate that point mathematically.

      2. Demonstrate that point using the estimation results above.

    7. Multicollinearity also increases the size of the F-test ellipse and tilts it. Which estimation results demonstrate the impacts of those effects?

    8. In the third, fourth, fifth, and sixth columns (|r|>0), which estimation results demonstrate the concept of variables working together although individually they have insignificant marginal influences? Do these outcomes result from multicollinearity?

    9. As the correlation coefficient between the two independent variables increases in absolute value, which area increases faster: the F-test ellipse or the t-test rectangle?

    10. This question suggests one advantage to using some type of Bonferroni correction. Change the significance level for the t-tests to 10% (which equals the significance level for the F-test) and change the correlation coefficient between the independent variables to +0.6. In this case, what is unexpected about the test results for Data 11 and how does the Bonferroni correction overcome this problem?

  2. Solve for the significance level that moves the t-test rectangle to be tangent to the F-test ellipse.

    11. Show your derivation of the appropriate t-statistic.

    12. Does that t-statistic change with the correlation between the two independent variables?

    13. In the worksheet Statistics 4, insert your formula into cell ‘Statistics 4’!G13. The cell ‘Statistics 4’!G14 calculates the significance level associated with that t-statistic. What is it when the correlation coefficient equals zero?

    14. Change the correlation coefficient value from 0.0 to −0.3 to +0.6 to −0.9 to +0.95 (which mirrors your previous work). Observe how the t-test rectangle, the F-test ellipse, and the t-test significance level change. What is the t-test significance level in each case?

Statistics 3 Handout

Answer Key

  1. In the worksheet Statistics 3, set the significance level for the F-test to 10% and, as an ad-hoc Bonferroni correction, set the significance level for the t-test to 5%.

    2. Use the first two columns (both with r=0) to explain how the worksheet demonstrates the impact of the ad-hoc Bonferroni correction.

    The area of the t-test rectangle increases, reflecting the larger critical values that are consistent with the smaller significance level.

    Data 2 … with the lower level of significance the previously significant t-statistic becomes insignificant.

    Data 4 … with the lower level of significance the previously significant t-statistic becomes insignificant.

    3. In the first column, which estimation results demonstrate the concept of variables working together although individually they have insignificant marginal influences? Is this a result of multicollinearity?

    Data 7

    Data 10

    No, this is not a result of multicollinearity as r=0.

    4. In the first column, what is unexpected about the test results for Data 2 and for Data 4? Do these outcomes result from multicollinearity?

    • In both cases, one t-statistic is significant (so one independent variable has a significant marginal influence) but the F-statistic is insignificant.

    • This is not a result of multicollinearity as r=0.

    6. Importantly, multicollinearity increases (in absolute value) the critical value of the slope coefficient (the value of the estimated slope coefficient that allows you to reject the null hypothesis).

    1. Demonstrate that point mathematically.

      As an example, the standard error of the first slope coefficient can be written as se(β₁) = s / (s₁√((n−1)(1−r²))), where s is the standard error of the estimate and s₁ is the standard deviation of the first independent variable.

      So, as |r| increases from zero to one, the standard error of the slope coefficient increases and the critical value increases.

    2. Demonstrate that point using the estimation results above.

      Data 3 … the previously significant t-statistic becomes insignificant as r increases in absolute value.

      Data 5 … the previously significant t-statistic becomes insignificant as r increases in absolute value.

      Data 8 … the t-statistics for both β₁ and β₂ become insignificant as r increases in absolute value.

      Data 11 … the t-statistics for both β₁ and β₂ become insignificant as r increases in absolute value.

      7. Multicollinearity also increases the size of the F-test ellipse and tilts it. Which estimation results demonstrate changes in the F-test ellipse? What are those changes and what are the impacts of those changes on the F-statistic?

      Data 4 … The F-test ellipse tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

      Data 5 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

      Data 7 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

      Data 8 … The F-test ellipse becomes larger as |r| increases, so the F-statistic changes from significant to insignificant. But the F-test ellipse also tilts negatively with a positive r, so the F-statistic becomes significant at large positive r values.

      Data 10 … The F-test ellipse tilts positively with a negative r, so the F-statistic stays significant at negative r values. Because the F-test ellipse tilts negatively with a positive r, the F-statistic becomes insignificant at positive r values.

      Data 11 … The F-test ellipse tilts positively with a negative r, so the F-statistic stays significant at negative r values. Because the F-test ellipse tilts negatively with a positive r, the F-statistic becomes insignificant at positive r values.

      8. In the third, fourth, fifth, and sixth columns (|r|>0), which estimation results demonstrate the concept of variables working together although individually they have insignificant marginal influences? Do these outcomes result from multicollinearity?

      With r=0 (so multicollinearity is not a factor) as well as at higher r-values:

      Data 7 and Data 10

      At high correlation coefficients, where multicollinearity is a factor:

      Data 3, Data 4, Data 5, Data 8, Data 11

      9. As the correlation coefficient between the two independent variables increases in absolute value, which area increases faster: the F-test ellipse or the t-test rectangle?

      The area of the t-test rectangle increases faster than the area of the F-test ellipse.

      10. This question suggests one advantage to using some type of Bonferroni correction. Change the significance level for the t-tests to 10% (which equals the significance level for the F-test) and change the correlation coefficient between the independent variables to +0.6. In this case, what is unexpected about the test results for Data 11 and how does the ad-hoc Bonferroni correction overcome this problem?

      With all of the significance levels equal to 10%, (a) both the t-statistic for β₁ and the t-statistic for β₂ are significant, indicating that both variables have significant individual marginal influences, while (b) the F-statistic is insignificant. This result is a bit problematic unless you conclude that the two variables offset each other; it happens because of the multicollinearity. The ad-hoc Bonferroni correction overcomes this problem by making it more difficult to conclude that the two slope coefficient estimates are individually significant.

  2. Solve for the significance level that moves the t-test rectangle to be tangent to the F-test ellipse.

    11. Show your derivation of the appropriate t-statistic.

    The critical values for the F-test and the t-test are F_c = F(α_F; 2, n−3) and t_c = t(α_t/2; n−3).

    To find the point where the F-test ellipse is tangent to the t-test rectangle, we maximize one slope coefficient on the boundary of the F-test ellipse with respect to the other. At the tangency point, equating the rectangle's edge to the ellipse gives the required t-statistic: t_c = √(2F_c).

    12. Does that t-statistic change with the correlation between the two independent variables?

    No, there is no correlation coefficient in the formula for the t-statistic.

    13. In the worksheet Statistics 4, insert your formula into cell ‘Statistics 4’!G13. The cell ‘Statistics 4’!G14 calculates the significance level associated with that t-statistic. What is it when the correlation coefficient equals zero?

    14. Change the correlation coefficient value from 0.0 to −0.3 to +0.6 to −0.9 to +0.95 (which mirrors your previous work). Observe how the t-test rectangle, the F-test ellipse, and the t-test significance level change. What is the t-test significance level in each case?

    • Obviously both the t-test rectangle and the F-test ellipse increase in size as the correlation coefficient increases in absolute value, and the F-test ellipse tilts as it did before. The difference is that now the t-test rectangle is larger than it was when the significance level was 5% for the t-tests.

    • The t-test significance level does not change with the correlation coefficient, so it stays at 3.23%.
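As a numerical check of this answer (the degrees of freedom below are assumed for illustration, since the sample size behind the 3.23% figure is not restated in this excerpt):

```python
from scipy import stats

ALPHA_F, DF = 0.10, 100                     # F-test significance level; assumed error df

f_crit = stats.f.isf(ALPHA_F, 2, DF)        # F critical value with 2 numerator df
t_star = (2 * f_crit) ** 0.5                # tangency condition: t* = sqrt(2 * F_crit)
alpha_t = 2 * stats.t.sf(t_star, DF)        # implied two-sided t-test significance level

# t* contains no correlation coefficient, so alpha_t is unchanged as r varies;
# only the F-test significance level and the degrees of freedom matter.
assert abs(t_star**2 - 2 * f_crit) < 1e-9
assert 0.02 < alpha_t < 0.04                # roughly 3% for moderate df
```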
