Abstract
This study aims to improve the conceptual understanding of the interrelationships among individual-level and school-level factors of academic performance by presenting a context-based conceptual framework of academic performance and articulating relationships among the factors. In addition, this study intends to advance the statistical methodology of local regression analysis through a case study analyzing predictor variables of American College Test (ACT) scores for 447 public high schools in Missouri. A school-level statistical model of ACT scores with nine predictor variables relevant to student, teacher, and school characteristics is tested. Ordinary least squares (OLS) global regression analysis derives a model of five predictor variables, showing that schools with higher parent income and education levels, more double-parent family background, larger class size, and more experienced teachers tend to have higher ACT scores. Geographically weighted regression (GWR) local regression analysis is conducted using the five globally verified predictor variables to minimize violations of regression assumptions, particularly multicollinearity, in local models. Geographic distributions of local regression coefficients are examined at a series of local regression neighborhoods to draw integrated conclusions about variable effects for local areas. Analyses show that using globally verified predictor variables in GWR effectively avoids multicollinearity that would otherwise appear. The results highlight critical local regression neighborhoods at which certain local areas start to show local variable effects opposite to the global variable effects.
Acknowledgments
Portions of this research were supported by a Missouri State University Summer Faculty Fellowship awarded to Xiaomin Qiu. The authors would like to thank the editor, Mei-Po Kwan, and the anonymous reviewers, whose constructive comments improved this article from its original form.
Notes
1. Common GWR software provides two default methods for determining an optimized neighborhood distance: minimizing a cross-validation (CV) function or minimizing the corrected Akaike Information Criterion (AICc; Fotheringham, Charlton, and Brunsdon 2002).
2. The F value of a model is used in an ANOVA to test whether the model improves prediction significantly beyond what would be expected by chance, by comparing the variation due to regression with the variation due to error while accounting for the respective degrees of freedom. Here we do not calculate the statistical significance level from an F value but use the F value to indicate model reliability for the comparison of GWR modeling at different bandwidths. Although F values at different bandwidths are not directly comparable due to the different degrees of freedom involved, in our case the F values are substantially different and, therefore, the comparison is still valid. It is worth noting that the F value for GWR modeling is based on the effective number of parameters and the corresponding effective number of degrees of freedom in local regression (Fotheringham, Charlton, and Brunsdon 2002).
3. An observation is influential if its deletion causes substantial changes in the fitted model. Cook's distance, which measures the difference between the fitted values obtained from the full data and the fitted values obtained by deleting a target observation, is used here to determine whether an observation is influential.
4. The t values, rather than raw coefficient estimates, are of interest because the local t value, calculated by dividing the local coefficient estimate by the corresponding local standard error of the estimate, accounts for uncertainty in the local estimate. The t value is, therefore, a normalized statistic that is appropriate for the comparison of different locations and variables.
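The quantities described in Notes 2 through 4 can be illustrated with a minimal sketch for a single global OLS fit. This is not the authors' implementation and does not perform GWR: in GWR the same statistics would be computed from locally weighted fits, with the effective number of parameters replacing the parameter count used here. The function name `ols_diagnostics` and the synthetic data are assumptions introduced only for illustration.

```python
import numpy as np

def ols_diagnostics(X, y):
    """Fit OLS with an intercept; return coefficients, t values,
    the model F value, and Cook's distance for each observation.
    (Hypothetical helper for illustration, not the study's code.)"""
    n = X.shape[0]
    A = np.column_stack([np.ones(n), X])    # design matrix with intercept
    p = A.shape[1]                          # number of estimated parameters
    XtX_inv = np.linalg.inv(A.T @ A)
    beta = XtX_inv @ A.T @ y                # coefficient estimates
    fitted = A @ beta
    resid = y - fitted
    mse = resid @ resid / (n - p)           # error variation / its df

    # Note 4: t value = coefficient estimate / its standard error
    se = np.sqrt(mse * np.diag(XtX_inv))
    t_values = beta / se

    # Note 2: F = (regression variation / its df) / (error variation / its df)
    ss_reg = np.sum((fitted - y.mean()) ** 2)
    f_value = (ss_reg / (p - 1)) / mse

    # Note 3: Cook's distance via leverage (diagonal of the hat matrix),
    # equivalent to refitting with each observation deleted in turn
    leverage = np.einsum("ij,jk,ik->i", A, XtX_inv, A)
    cooks_d = resid ** 2 * leverage / (p * mse * (1 - leverage) ** 2)

    return beta, t_values, f_value, cooks_d

# Synthetic example data (not the Missouri school data)
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 2))
y_demo = 1.0 + 2.0 * X_demo[:, 0] - 0.5 * X_demo[:, 1] + rng.normal(scale=0.3, size=30)
beta, t_values, f_value, cooks_d = ols_diagnostics(X_demo, y_demo)
print("t values:", np.round(t_values, 2))
print("F value:", round(float(f_value), 2))
print("largest Cook's distance:", round(float(cooks_d.max()), 3))
```

The leverage-based formula for Cook's distance avoids refitting the model once per deleted observation while giving the same value as the deletion-based definition in Note 3.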