Estimation of Panel Data Models with Random Interactive Effects and Multiple Structural Breaks when T is Fixed


Abstract

In this article, we propose a new estimator of panel data models with random interactive effects and multiple structural breaks that is suitable when the number of time periods, T, is fixed and only the number of cross-sectional units, N, is large. This is done by viewing the determination of the breaks as a shrinkage problem, and by estimating both the regression coefficients and the number and location of the breaks using a version of the Lasso approach. We show that with probability approaching one the approach correctly determines the number of breaks and their dates, and that the estimator of the regime-specific regression coefficients is consistent and asymptotically normal. We also provide Monte Carlo results suggesting that the approach performs very well in small samples, and empirical results suggesting that while the coefficients of the controls are breaking, the coefficients of the main deterrence regressors in a model of crime are not.

1 Introduction

Dealing with structural breaks is an important step in most, if not all, empirical economic research. This is particularly true in panel data comprising many cross-sectional units, such as individuals, firms or countries, which are all affected by major economic events. The worry is that if left unattended, existing breaks will manifest themselves as omitted variables, leading to inconsistent estimates of the slope coefficients of the model. It is therefore important to know if and when structural breaks have occurred. Of course, such knowledge is rarely available in practice, which means that it has to be inferred from the data. We need to be able to check if there are any breaks present and, if there are, to infer both the break dates and the regime-specific slope coefficients. This should be possible even if the number of time periods, T, is fixed and only the number of cross-sectional units, N, is large, as many economic datasets have this "short" form. The procedure should also be easy to implement, it should not require the data to be stationary, and it should be robust to unobserved heterogeneity. This last requirement is potentially very important because unattended heterogeneity can be mistaken for structural breaks. The current article contributes by developing a procedure that meets the above list of demands.

While the literature concerned with structural breaks in time series is huge, the literature concerned with such breaks in panel data is much smaller (see Boldea, Drepper, and Gan Citation2020, for a recent overview). Yet, panel data are particularly susceptible to structural change. One reason for this is that the sampling frequency is usually much lower than in pure time series data. Panel datasets therefore tend to have long time spans, which means that the assumption of constant coefficients is likely to be violated because of major economic events. Another reason is that while T is usually quite small, N is potentially very large, because the number of cross-sectional units for which time series data are readily available is ever-increasing. This is important because the larger N is, the higher the risk that at least some of the cross-sectional units are subject to structural change. A related issue is how many breaks there are. If the literature on structural breaks in panel data is sparse, the part of the literature that deals with an unknown number of breaks is almost nonexistent.Footnote1 The only exceptions known to us are Boldea, Drepper, and Gan (Citation2020), Li, Qian, and Su (Citation2016), and Qian and Su (Citation2016), where the last two studies assume that both N and T are large, which is again something that we would like to avoid in the present article.Footnote2

Another prominent feature of the type of disaggregated "micro" panel data that we have in mind, where typically the regressors explain only a small fraction of the variation in the dependent variable, is the presence of unobserved heterogeneity. Studies such as Ahn, Lee, and Schmidt (Citation2013), Bai (Citation2009), Moon and Weidner (Citation2015), Pesaran (Citation2006), Robertson and Sarafidis (Citation2015), and Westerlund, Petrova, and Norkute (Citation2019) allow for unobserved heterogeneity in the form of interactive effects that are handled by using either some kind of "de-factoring" or the generalized method of moments (GMM); however, they do not allow for breaks and many assume that T is large. Li, Qian, and Su (Citation2016) allow for both multiple breaks and interactive effects, but then again in their article T is large. Boldea, Drepper, and Gan (Citation2020) allow for interactive effects without requiring any correction for them. This makes their approach very simple, although at a cost in terms of additional restrictive conditions. In particular, it is assumed that the omitted variables bias caused by the omitted interactive effects is time-invariant up to the breakpoints, which limits the type of effects and regressors that can be permitted.

The proposed methodology builds upon the so-called "adaptive group fused" Lasso approach of Li, Qian, and Su (Citation2016), and Qian and Su (Citation2016), which is suitable when the variation in the slopes has a natural ordering, as when observations are time-stamped like in the current article. However, because in our setup T is fixed, we cannot use principal components as a means to purge the interactive effects as in Li, Qian, and Su (Citation2016). In fact, a major complication when T is fixed is that we cannot easily separate the breaks from the effects. Qian and Su (Citation2016) transform their data by taking first differences before applying the Lasso, which is expected to work also when T is fixed. However, differencing can only handle time-invariant effects. Moreover, while differencing solves the separation problem, it does so in an awkward way, since the (time-varying) slope coefficients in the model for the data in differences are not the same as for the data in levels.

The approach used in the present article can be seen as a reaction to the discussion of the last paragraph. The idea is to apply the Lasso to cross-sectionally demeaned data. The demeaning does not affect the slopes and it makes the resulting estimator, henceforth referred to as the "post-demeaned Lasso least squares (LS)" estimator, "PDL2S" for short, robust to interactive effects, provided that they satisfy a certain random coefficient condition; hence, the term "random interactive effects."Footnote3 Another advantage of the new procedure is that it puts almost no assumptions on the structure of the breaks. In fact, there can be no breaks at all, and if there are breaks present the procedure does not make any assumptions about their number. The procedure is therefore valid even if some, or indeed all, regimes have a single observation, which is very useful when wanting to detect a break as quickly as possible. Yet another advantage is that the procedure does not place any conditions on the serial correlation properties of the data. Hence, the data can be stationary, as required in the bulk of the previous literature (see Baltagi, Kao, and Liu Citation2017, for a discussion), but it does not have to be.

The rest of the article is organized as follows. Section 2 describes the model and the PDL2S approach that we will use to estimate it. Section 3 reports our main asymptotic results, whose accuracy in small samples is evaluated by means of Monte Carlo simulation in Section 4. Section 5 presents the results of a small empirical illustration using as an example the economics of crime. Section 6 concludes. All proofs and theoretical results of secondary nature are provided in the supplementary materials.

2 Model and Estimator

Consider a scalar panel data variable yi,t, observable across t = 1, …, T time periods and i = 1, …, N cross-section units. The data generating process of this variable is given by

(2.1)  $y_{i,t} = x_{i,t}'\beta_t + u_{i,t}$,

(2.2)  $u_{i,t} = \lambda_i' f_t + \varepsilon_{i,t}$,

where xi,t is a p × 1 vector of known regressors with βt being a conformable vector of unknown slope coefficients that we allow to change over time, and ui,t is a composite error term that can be both serially and cross-sectionally correlated in a very general fashion. The assumption we make is that ui,t admits a common factor structure in which ft and λi are r × 1 vectors of unobserved factors and loadings, respectively, and εi,t is a mean zero error term.Footnote4 The interactive effects are here given by λi′ft. Any cross-sectional dependence in ui,t is assumed to be captured by these effects, so that the remainder, εi,t, is completely idiosyncratic. As usual, r and p are assumed to be fixed numbers.

We will assume that β1, …, βT takes on m + 1 distinct vectors α1, …, αm+1, such that

(2.3)  $\beta_t = \alpha_j$ for $t = T_{j-1}, \ldots, T_j - 1$, $j = 1, \ldots, m+1$,

where m ∈ [0, T − 1], T0 = 1 and Tm+1 = T + 1. Hence, in this model, βt has m + 1 distinct regimes, or m breaks, that occur at times T1, …, Tm. At the one end of the scale, we have m = 0, in which case there is only one regime and β1 = ⋯ = βT = α1, whereas, at the other end, m = T − 1, which means that there are as many regimes as time periods, and hence βt = αt for all t = 1, …, T. It is useful to stack α1, …, αm+1 and β1, …, βT into the (m + 1)p × 1 and Tp × 1 vectors $A_m = [\alpha_1', \ldots, \alpha_{m+1}']'$ and $B_T = [\beta_1', \ldots, \beta_T']'$, respectively, and to denote by $\mathcal{T}_m = \{T_1, \ldots, T_m\}$ the set of breakpoints when m > 0. If m = 0, then we define $\mathcal{T}_m = \mathcal{T}_0 = \emptyset$ (the empty set). It is also useful to note that if m + 1 = T, so that each regime contains only one observation, then the set of breakpoints is given by $\mathcal{T}_{T-1} = \{1, \ldots, T\}$. In what follows, we will therefore use $\mathcal{T}_{T-1}$ to denote the full set of time series observations.
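To make the regime notation concrete, the following sketch maps regime coefficients α1, …, αm+1 and breakpoints T1 < ⋯ < Tm to the per-period path β1, …, βT. This is an illustration only; the function name and the example values are ours, not from the article.

```python
import numpy as np

def expand_regimes(alphas, breakpoints, T):
    """Map regime coefficients alpha_1..alpha_{m+1} and 1-indexed breakpoints
    T_1 < ... < T_m (regime j covers t = T_{j-1}, ..., T_j - 1, with T_0 = 1
    and T_{m+1} = T + 1) to the per-period path beta_1, ..., beta_T."""
    edges = [1] + list(breakpoints) + [T + 1]
    beta = np.empty((T, alphas.shape[1]))
    for j in range(len(alphas)):
        beta[edges[j] - 1:edges[j + 1] - 1] = alphas[j]  # 0-indexed rows
    return beta

# One break at T_1 = 4 in a T = 6 panel: regimes {1, 2, 3} and {4, 5, 6}.
alphas = np.array([[0.0, 0.0], [1.0, 1.0]])
beta = expand_regimes(alphas, [4], T=6)
```

With m = 0 the breakpoint list is empty and the whole sample forms a single regime, matching the convention in the text.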

Remark 1.

The model in (2.1) supposes that βt is cross-section invariant, which can be restrictive (see e.g., Baltagi, Griffin, and Xiong Citation2000). Cross-section invariance is, however, not strictly necessary for our approach to work. Of course, with T fixed, the cross-sectional variation cannot be completely unrestricted. But then this is also not necessary, as in empirical work in economics (and elsewhere) we are typically only interested in the average marginal effect. In the supplementary materials, we therefore consider a random coefficient extension of (2.1) in which the effect of xi,t, denoted βi,t, is allowed to vary over both i and t but where the mean effect is given by E(βi,t) = βt. The proposed PDL2S procedure works well even in this case.

Remark 2.

The fact that the number of time periods within each regime is completely unrestricted is noteworthy because in the existing literature it is standard to assume that the break regimes are expanding with T (see e.g., Baltagi, Feng, and Kao Citation2016). There is also no need to truncate the sample endpoints, and in this way restrict the breakpoints to the middle of the sample, which is again standard in the literature. This means that breaks can be detected very quickly.

The goal of this article is to infer Am and $\mathcal{T}_m$. Let us therefore denote by $A^0_{m_0} = [\alpha_1^{0\prime}, \ldots, \alpha_{m_0+1}^{0\prime}]'$ the true value of Am, where m0 is the true value of m. The set of true breakpoints is henceforth denoted $\mathcal{T}^0_{m_0} = \{T_1^0, \ldots, T_{m_0}^0\}$. It is also useful to introduce $B_T^0 = [\beta_1^{0\prime}, \ldots, \beta_T^{0\prime}]'$ as the true value of BT.

Denote by $\bar a_t = N^{-1}\sum_{i=1}^N a_{i,t}$ the cross-sectional average of any variable ai,t, and let $\tilde a_{i,t} = a_{i,t} - \bar a_t$ be the cross-sectionally demeaned version of ai,t. In this notation, (2.1) can be written as

(2.4)  $\tilde y_{i,t} = \tilde x_{i,t}'\beta_t + \tilde u_{i,t}$.
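The cross-sectional demeaning itself is a one-line transformation: subtract the period-specific mean over i from every observation. A minimal numpy sketch, with arbitrary illustrative dimensions of our own choosing:

```python
import numpy as np

# y has shape (N, T): rows are cross-section units, columns are periods.
# The same transform is applied to each regressor.
rng = np.random.default_rng(0)
N, T = 5, 3
y = rng.normal(size=(N, T))
y_tilde = y - y.mean(axis=0, keepdims=True)  # subtract bar{y}_t per period
```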

Cross-sectional demeaning is tantamount to demeaning with respect to common time effects. It is important to point out, however, that while we do allow for common time effects, our model does not necessarily include such effects. Hence, unless λi = λ for all i, so that λi′ft = λ′ft, a time effects-only specification is misspecified. Demeaning is still key, though, as it enables us to eliminate the mean of λi from the regression error in (2.4), which is enough to ensure consistency and asymptotic (mixed) normality as long as the remaining part is uncorrelated with x̃i,t.

To estimate BT0, we propose minimizing the following objective function:

(2.5)  $l_\gamma(B_T) = \frac{1}{N}\sum_{i=1}^N\sum_{t=1}^T \left(\tilde y_{i,t} - \tilde x_{i,t}'\beta_t\right)^2 + \gamma \cdot \sum_{t=2}^T w_t \|\beta_t - \beta_{t-1}\|$,

where γ = γ(N) > 0 is a tuning parameter, wt is a data-driven weight defined by $w_t = \|\dot\beta_t - \dot\beta_{t-1}\|^{-\kappa}$, κ > 0 is a user-specified constant, and β̇t is a preliminary estimator of βt, which is obtained by minimizing the first term in lγ(BT). That is, β̇t is simply the period-by-period LS estimator:

(2.6)  $\dot\beta_t = \left(\sum_{i=1}^N \tilde x_{i,t}\tilde x_{i,t}'\right)^{-1}\sum_{i=1}^N \tilde x_{i,t}\tilde y_{i,t}$.
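The preliminary period-by-period LS estimator in (2.6) and the adaptive weights wt can be sketched as follows. This is illustrative code under our own naming; the default kappa = 2 is the value the article adopts later in the simulation section.

```python
import numpy as np

def period_ls(x, y):
    """Period-by-period LS in (2.6) on demeaned data:
    x has shape (N, T, p), y has shape (N, T); returns the (T, p) path."""
    N, T, p = x.shape
    beta_dot = np.empty((T, p))
    for t in range(T):
        xt = x[:, t, :]                                     # N x p
        beta_dot[t] = np.linalg.solve(xt.T @ xt, xt.T @ y[:, t])
    return beta_dot

def adaptive_weights(beta_dot, kappa=2.0):
    """w_t = ||beta_dot_t - beta_dot_{t-1}||^(-kappa) for t = 2, ..., T."""
    diffs = np.linalg.norm(np.diff(beta_dot, axis=0), axis=1)
    return diffs ** (-kappa)
```

Periods with little preliminary evidence of a break get a large weight, so the penalty shrinks those coefficient differences all the way to zero.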

Simple as it may be, it is useful to be able to write this estimator in a more general notation. Let us therefore introduce

(2.7)  $Q_N(\mathcal{T}_m) = \mathrm{diag}\left(\frac{1}{N}\sum_{t=T_0}^{T_1-1}\sum_{i=1}^N \tilde x_{i,t}\tilde x_{i,t}',\; \ldots,\; \frac{1}{N}\sum_{t=T_m}^{T_{m+1}-1}\sum_{i=1}^N \tilde x_{i,t}\tilde x_{i,t}'\right)$,

(2.8)  $R_N(\mathcal{T}_m) = \left[\left(\frac{1}{N}\sum_{t=T_0}^{T_1-1}\sum_{i=1}^N \tilde x_{i,t}\tilde y_{i,t}\right)',\; \ldots,\; \left(\frac{1}{N}\sum_{t=T_m}^{T_{m+1}-1}\sum_{i=1}^N \tilde x_{i,t}\tilde y_{i,t}\right)'\right]'$,

whose dimensions are given by (m + 1)p × (m + 1)p and (m + 1)p × 1, respectively. These quantities are well defined not only when m > 0 but also when m = 0, in which case Tm+1 = T1 = T + 1, and hence $Q_N(\mathcal{T}_0) = N^{-1}\sum_{t=1}^T\sum_{i=1}^N \tilde x_{i,t}\tilde x_{i,t}'$ and $R_N(\mathcal{T}_0) = N^{-1}\sum_{t=1}^T\sum_{i=1}^N \tilde x_{i,t}\tilde y_{i,t}$. We also note how $Q_N(\mathcal{T}_{T-1}) = \mathrm{diag}(N^{-1}\sum_{i=1}^N \tilde x_{i,1}\tilde x_{i,1}', \ldots, N^{-1}\sum_{i=1}^N \tilde x_{i,T}\tilde x_{i,T}')$ and $R_N(\mathcal{T}_{T-1}) = [(N^{-1}\sum_{i=1}^N \tilde x_{i,1}\tilde y_{i,1})', \ldots, (N^{-1}\sum_{i=1}^N \tilde x_{i,T}\tilde y_{i,T})']'$. In this notation,

(2.9)  $\dot B_T = [\dot\beta_1', \ldots, \dot\beta_T']' = [\dot\beta_1(\mathcal{T}_{T-1})', \ldots, \dot\beta_T(\mathcal{T}_{T-1})']' = Q_N(\mathcal{T}_{T-1})^{-1}R_N(\mathcal{T}_{T-1})$.

The proposed PDL2S estimator of BT0 is given by

(2.10)  $\hat B_T = [\hat\beta_1', \ldots, \hat\beta_T']' = \arg\min_{B_T} l_\gamma(B_T)$,

where the dependence on γ here is suppressed for notational simplicity. For a given B̂T, the set of estimated breaks is given by $\hat{\mathcal{T}}_{\hat m} = \{\hat T_1, \ldots, \hat T_{\hat m}\}$, where $\hat T_1 < \cdots < \hat T_{\hat m}$ for m̂ > 0 are such that $\|\hat\beta_t - \hat\beta_{t-1}\| \neq 0$ for $t = \hat T_1, \ldots, \hat T_{\hat m}$. If $\|\hat\beta_t - \hat\beta_{t-1}\| = 0$ for all t = 1, …, T, then m̂ = 0 and $\hat{\mathcal{T}}_{\hat m} = \hat{\mathcal{T}}_0 = \emptyset$. We also define $\hat T_0 = 1$ and $\hat T_{\hat m+1} = T + 1$. The set $\hat{\mathcal{T}}_{\hat m}$ divides the sample into m̂ + 1 regimes such that the parameter estimates remain constant within each regime. The proposed estimator $\hat A_{\hat m}$ of $A^0_{m_0}$ is obtained by PDL2S, which is regime-by-regime LS conditional on $\hat{\mathcal{T}}_{\hat m}$.Footnote5 In terms of the notation introduced earlier,

(2.11)  $\hat A_{\hat m} = [\hat\alpha_1', \ldots, \hat\alpha_{\hat m+1}']' = [\hat\alpha_1(\hat{\mathcal{T}}_{\hat m})', \ldots, \hat\alpha_{\hat m+1}(\hat{\mathcal{T}}_{\hat m})']' = Q_N(\hat{\mathcal{T}}_{\hat m})^{-1}R_N(\hat{\mathcal{T}}_{\hat m})$,

where the dependence on $\hat{\mathcal{T}}_{\hat m}$ and γ is again suppressed.
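Given a fitted path β̂1, …, β̂T, the extraction of the break set and the post-Lasso regime-by-regime LS step in (2.11) can be sketched as follows. This is our own illustrative code; the Lasso optimization that produces the path is omitted here, and a small numerical tolerance stands in for exact zeros.

```python
import numpy as np

def breaks_from_path(beta_hat, tol=1e-8):
    """Estimated break dates: t (1-indexed) is a break when
    ||beta_hat_t - beta_hat_{t-1}|| exceeds tol, for t = 2, ..., T."""
    diffs = np.linalg.norm(np.diff(beta_hat, axis=0), axis=1)
    return [int(k) + 2 for k in np.flatnonzero(diffs > tol)]

def post_ls(x, y, breakpoints):
    """Regime-by-regime LS on demeaned data given the estimated breaks,
    i.e., the post-Lasso step in (2.11)."""
    N, T, p = x.shape
    edges = [1] + list(breakpoints) + [T + 1]
    alphas = []
    for j in range(len(edges) - 1):
        sl = slice(edges[j] - 1, edges[j + 1] - 1)       # 0-indexed regime
        xj = x[:, sl, :].reshape(-1, p)
        yj = y[:, sl].reshape(-1)
        alphas.append(np.linalg.solve(xj.T @ xj, xj.T @ yj))
    return np.vstack(alphas)
```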

Remark 3.

Tibshirani et al. (Citation2005) propose the fused Lasso, which penalizes the l1 norm of both the individual slope coefficients themselves and their differences. Our objective function, which is similar to the adaptive group fused Lasso objective function in Li, Qian, and Su (Citation2016) and Qian and Su (Citation2016), differs from the one used in the fused Lasso. The main differences are: (i) the penalization is done by using the Frobenius norm, which enables us to identify breaks in the entire vector βt; (ii) only the coefficient differences are penalized, which is natural because in the present context there is no reason to shrink the coefficients themselves to zero; and (iii) the coefficient differences are weighted, which is necessary to achieve consistency.

3 Assumptions and Asymptotic Results

3.1 Assumptions

The conditions that we will be working under are given in Assumptions EPS, LAM, Q, MOM, and J. Before we state these assumptions, however, we introduce some notation. Specifically, if A is a matrix, λmin(A) and λmax(A) signify its smallest and largest eigenvalues, respectively, tr A signifies its trace, and $\|A\| = \sqrt{\mathrm{tr}(A'A)}$ signifies its Frobenius norm. If B is also a matrix, then diag(A, B) denotes the block-diagonal matrix that takes A (B) as the upper left (lower right) block. The symbols $\to_d$, $\to_p$ and MN(·,·) signify convergence in distribution, convergence in probability and a mixed normal distribution, respectively. We use w.p.1 (w.p.a.1) to denote with probability (approaching) one. $\mathcal{C}$ denotes the sigma-field generated by (f1, …, fT).

Assumption EPS.

  1. εi,t is conditionally independent across i given C with E(εi,t|C)=0 w.p.1;

  2. εi,t is independent of xj,s for all i, j, t, and s.

Assumption LAM.

  1. λi=λ+νi, where νi is conditionally independent across i given C with E(νi|C)=0r×1 w.p.1;

  2. νi is independent of (xj,t,εj,t) for all i, j, and t.

Assumption Q.

  1. $\inf_{\mathcal{T}_m} \lambda_{\min}[Q_N(\mathcal{T}_m)] > 0$ w.p.1;

  2. $Q_N(\mathcal{T}_m) \to_p Q_0(\mathcal{T}_m) = \lim_{N\to\infty} E[Q_N(\mathcal{T}_m)\,|\,\mathcal{C}]$ as $N \to \infty$, where $0 < \inf_{\mathcal{T}_m} \lambda_{\min}[Q_0(\mathcal{T}_m)] < \infty$ w.p.1.

Assumption MOM. $E\|\tilde x_{i,t}\|^4 < \infty$, $\|f_t\| < \infty$ w.p.1, $E\|f_t\|^4 < \infty$, $E\|\nu_i\|^4 < \infty$ and $E\varepsilon_{i,t}^4 < \infty$ for all i and t.

Assumption J.

  1. $J_{\max} = \max_{1\le j\le m_0} \|\alpha_{j+1}^0 - \alpha_j^0\| = O(1)$;

  2. $\sqrt{N}\,\gamma J_{\min}^{-\kappa} \to c_1 \in [0, \infty)$ and $\sqrt{N}\,J_{\min} \to c_2 \in (0, \infty]$ as $N \to \infty$, where $J_{\min} = \min_{1\le j\le m_0} \|\alpha_{j+1}^0 - \alpha_j^0\|$;

  3. $N^{(\kappa+1)/2}\gamma \to \infty$ as $N \to \infty$.

Some comments are in order. Consider Assumption EPS. Many articles in the literature assume that εi,t is (conditionally) independent over i (see e.g., Pesaran Citation2006; Ahn, Lee, and Schmidt Citation2013; Moon and Weidner Citation2015; Robertson and Sarafidis Citation2015; Westerlund, Petrova, and Norkute Citation2019), and so do we. Independence is not necessary, though, and can be relaxed to allow for weak cross-sectional dependence at the expense of additional high-level moment conditions (as in Bai Citation2009). This is demonstrated in Section 4, where we use Monte Carlo simulations to investigate the effect of error cross-section dependence. The heteroscedasticity and serial correlation properties of εi,t are not restricted in any way. As we demonstrate in the empirical illustration of Section 5, the assumption that εi,t is independent of xi,t, which is the same as in, for example, Bai (Citation2009), Moon and Weidner (Citation2015), Pesaran (Citation2006), and Westerlund, Petrova, and Norkute (Citation2019), is also not necessary but can be relaxed provided that suitable instruments are available. Note also that independence between εi,t and xi,t does not rule out endogeneity, as xi,t can still be correlated with ft. Lagged dependent variables can be permitted, but then εi,t cannot be serially correlated and we also require λi = λ for all i, such that λi′ft = λ′ft reduces to a common time effect that is eliminated completely by the cross-sectional demeaning. In the supplementary materials, we elaborate on this point.

Assumption LAM is a random coefficient condition that can be seen as the "price" for allowing β1, …, βT to vary unrestrictedly when T is fixed. It demands that λi is randomly distributed with constant mean, and that it is independent of xi,t and εi,t. The need for this condition can be explained as follows. As mentioned in Section 2, because of the demeaning, the PDL2S estimator is exactly invariant with respect to λi′ft when λi = λ for all i. Assumption LAM ensures that the PDL2S estimator is consistent and asymptotically mixed normal even in cases when λ1, …, λN are not all equal. Hence, unless λi = λ for all i, Assumption LAM binds. This might seem restrictive but the evidence that exists actually suggests that it is not (see e.g., Kapetanios, Serlenga, and Shin Citation2019; Petrova and Westerlund Citation2020). The condition is also not very controversial, and has been used extensively in the common correlated effects (CCE) strand of the literature (see Westerlund, Petrova, and Norkute Citation2019, for an overview). Arguably the study that is closest to ours is that of Boldea, Drepper, and Gan (Citation2020). They assume that $N^{-1}\sum_{i=1}^N x_{i,t}\lambda_i' f_t \to_p a_j$ as $N \to \infty$ for all t = Tj−1, …, Tj − 1 and j = 1, …, m + 1, so that asymptotically the sample cross-moment of the regressors and the interactive effects is constant within break regimes.Footnote6 Assumption LAM implies $N^{-1}\sum_{i=1}^N x_{i,t}\lambda_i' f_t \to_p \lim_{N\to\infty} N^{-1}\sum_{i=1}^N E(x_{i,t}\lambda' f_t)$, which may vary freely over t, and is therefore more general. This being said, we want our approach to be as widely applicable as possible. In the supplementary materials, we therefore discuss ways in which Assumption LAM can be relaxed. One possibility here that we also make use of in the empirical illustration of Section 5 involves transforming all observations on xi,t and yi,t into deviations from their initial values, which is tantamount to allowing for cross-section fixed effects in addition to the random interactive effects allowed under Assumption LAM.

Assumption Q is a noncollinearity condition that rules out cross-section-invariant regressors in xi,t. This is the same as the usual time fixed effects-only condition. The simplicity and transparency of this condition is an advantage when compared to studies such as Bai (Citation2009), and Moon and Weidner (Citation2015), where the factors are estimated and the regressors are de-factored, as opposed to just demeaned. As a result, general "low-rank" regressors have to be ruled out in order to ensure that the de-factored regressors have enough variation.Footnote7 The problem is that the ruled-out low-rank regressors depend on λi and ft, which are unknown to the researcher. There is therefore a risk that the de-factoring exhausts too much variation, causing the signal matrix to become (near) singular. This is particularly true in the type of small-T (microeconomic) panels that we have in mind where many regressors have low variation.

Assumption MOM supposes that x˜i,t, ft , νi , and εi,t have a certain number of finite moments. Four finite moments are required for x˜i,t, which is a standard condition. This condition together with the noncollinearity condition in Assumption Q, and the independence of εi,t and νi in Assumptions EPS and LAM are the only conditions placed on the regressors. This is different from the CCE strand of the literature where it is standard to assume that xi,t has a common factor structure that loads on the same factors as ui,t (see Pesaran Citation2006; Westerlund, Petrova, and Norkute Citation2019), which is restrictive in itself but also because it rules out models involving, for example, powers or products of the regressors. Boldea, Drepper, and Gan (Citation2020) and Qian and Su (Citation2016) assume that the regressors are independent across the cross-section, which is even more restrictive.Footnote8 In the present article, xi,t does not have a factor structure, nor does it have to be independent. In fact, xi,t does not even have to be stochastic, but can also contain deterministic terms such as dummy variables. And those regressors that are stochastic can be arbitrarily correlated across both time and cross-section. The same is true for ft , which is almost completely without restriction. Note in particular how the number of factors, r, is completely unrestricted, provided that it is fixed. This is different from most CCE studies where r is bounded from above by the number of observables, p + 1 (see e.g., Westerlund, Petrova, and Norkute Citation2019). Moreover, unlike in most GMM- and principal components-based studies, the proposed PDL2S estimator does not depend on the availability of a consistent estimator of r (see e.g., Ahn, Lee, and Schmidt Citation2013; Bai Citation2009; Robertson and Sarafidis Citation2015).

Assumption J imposes some conditions on the tuning parameter γ and the size of the breaks, and is easy to justify. For example, if we assume that all the breaks are bounded away from zero and infinity, then Assumption J requires that $\gamma = O(N^{-(1+\delta)/2})$ with δ ∈ [0, κ). One way to satisfy Assumption J is therefore to set γ proportional to .. (as in, e.g., Belloni et al. Citation2016; Hansen and Liao Citation2019). We also note that the breaks do not have to be bounded away from zero and hence that some, or indeed all, breaks may be shrinking to zero. The breaks therefore do not have to be "large" for our procedure to be able to detect them, which is reassuring.

3.2 Asymptotic Results

Our first main result characterizes the limit of β̂t.

Theorem 1.

Suppose that Assumptions EPS, LAM, Q, MOM, and J hold. Then, uniformly in $t \in \mathcal{T}_{T-1}$, $\|\hat\beta_t - \beta_t^0\| = O_p(N^{-1/2})$.

Theorem 1 establishes that the PDL2S estimator is consistent and that the rate of convergence is given by $\sqrt{N}$, which is the highest possible rate for the type of parametric fixed-T panel data models that we consider. In Lemma A.1 of the supplementary materials, we show that the preliminary period-by-period LS estimator, β̇t, is consistent at the same rate, which is just as expected because T is fixed. Hence, from a rate of convergence point of view, nothing is gained by using PDL2S. However, the preliminary estimator does not account for the fact that the slopes are constant within break regimes. It is therefore not as efficient as PDL2S. It is also completely uninformative regarding the number of breaks and their location. This brings us to our second main result.

Theorem 2.

Suppose that Assumptions EPS, LAM, Q, MOM, and J hold. Then, as $N \to \infty$, $P(\|\hat\beta_t - \hat\beta_{t-1}\| = 0$ for all $t \in \mathcal{T}_{m_0}^{0c} = \mathcal{T}_{T-1} \setminus \mathcal{T}_{m_0}^0) \to 1$.

The set $\mathcal{T}_{m_0}^{0c}$ is the complement of $\mathcal{T}_{m_0}^0$. Hence, since the true coefficients are constant within break regimes, we have that $\beta_t^0 = \beta_{t-1}^0$ for all $t \in \mathcal{T}_{m_0}^{0c}$. Theorem 2 states that $\hat\beta_t - \hat\beta_{t-1}$ is strongly consistent for $\beta_t^0 - \beta_{t-1}^0$ when $t \in \mathcal{T}_{m_0}^{0c}$, which is a reflection of the usual sparsity result in the variable selection literature (see e.g., Fan and Li Citation2006). But from Theorem 1, we know that $\hat\beta_t - \hat\beta_{t-1}$ is consistent for all t, including $t \in \mathcal{T}_{m_0}^0$. This means that PDL2S is able to identify the true model in (2.1) with the correct number of breaks and break dates. The following corollary to Theorems 1 and 2 formalizes this.

Corollary 1.

Suppose that Assumptions EPS, LAM, Q, MOM, and J hold. Then, as N,

(a) $P(\hat m = m_0) \to 1$;

(b) $P(\hat{\mathcal{T}}_{\hat m} = \mathcal{T}_{m_0}^0 \,|\, \hat m = m_0) \to 1$.

Theorem 3 reports the asymptotic distribution of the PDL2S estimator, and it does so conditional on the high probability event that m̂ = m0.

Theorem 3.

Suppose that Assumptions EPS, LAM, Q, MOM, and J hold, and that m̂ = m0. Then, as $N \to \infty$,

$\sqrt{N}(\hat A_{\hat m} - A^0_{m_0}) \to_d MN\left(0_{(m_0+1)p\times 1},\; Q_0^{-1}\Omega_0 Q_0^{-1}\right)$,

where $Q_0 = Q_0(\mathcal{T}^0_{m_0})$, $\Omega_0 = \lim_{N\to\infty} N^{-1}\sum_{i=1}^N E(u_i u_i' \,|\, \mathcal{C})$, and

$u_i = u_i(\mathcal{T}^0_{m_0}) = \left[\left(\sum_{t=T_0^0}^{T_1^0-1} \tilde x_{i,t}\tilde u_{i,t}\right)',\; \ldots,\; \left(\sum_{t=T_{m_0}^0}^{T_{m_0+1}^0-1} \tilde x_{i,t}\tilde u_{i,t}\right)'\right]'$.

The definitions of Q0 and Ω0 in Theorem 3 reveal that the PDL2S estimator is asymptotically equivalent to the infeasible LS estimator of (2.4) that takes all the breaks as known. In this sense, PDL2S is "oracle efficient." That being said, Ω0 does depend on ũi,t, which is a function of ν̃i′ft. Hence, while oracle efficient in the sense that it is asymptotically equivalent to the known-break LS estimator, the PDL2S estimator is not asymptotically equivalent to the LS estimator that takes both the breaks and the factors as known. As pointed out in Section 2, the demeaning removes the mean of λi, and this is enough to ensure $\sqrt{N}$-consistency and asymptotic (mixed) normality as long as εi,t and νi are uncorrelated with x̃i,t. However, this does not mean that the PDL2S estimator is asymptotically invariant with respect to λi′ft, and Theorem 3 confirms this. In Section 4, we use Monte Carlo simulations as a means to investigate how the variance of the PDL2S estimator is affected by the interactive effects.

The asymptotic distribution of $\sqrt{N}(\hat A_{\hat m} - A^0_{m_0})$ is normal conditional on $\mathcal{C}$, which means that unconditionally it is mixed normal (see Andrews Citation2005, for a discussion). The asymptotic distribution therefore supports standard normal and chi-squared inference. Of course, for such standard inference to be possible, we need a consistent estimator of $Q_0^{-1}\Omega_0 Q_0^{-1}$. Let us therefore define $\hat u_{i,t} = \tilde y_{i,t} - \tilde x_{i,t}'\hat\alpha_j$, where $t = \hat T_{j-1}, \ldots, \hat T_j - 1$ with $j = 1, \ldots, \hat m + 1$. A natural estimator of $Q_0^{-1}\Omega_0 Q_0^{-1}$ is given by

(3.1)  $Q_N(\hat{\mathcal{T}}_{\hat m})^{-1}\hat\Omega\, Q_N(\hat{\mathcal{T}}_{\hat m})^{-1}$,

where $Q_N(\hat{\mathcal{T}}_{\hat m})$ is as before and the $(\hat m+1)p \times (\hat m+1)p$ matrix $\hat\Omega$ is given by

(3.2)  $\hat\Omega = \frac{1}{N}\sum_{i=1}^N \hat u_i \hat u_i'$,

where

(3.3)  $\hat u_i = \hat u_i(\hat{\mathcal{T}}_{\hat m}) = \left[\left(\sum_{t=\hat T_0}^{\hat T_1 - 1} \tilde x_{i,t}\hat u_{i,t}\right)',\; \ldots,\; \left(\sum_{t=\hat T_{\hat m}}^{\hat T_{\hat m+1} - 1} \tilde x_{i,t}\hat u_{i,t}\right)'\right]'$.

The consistency of this estimator is a direct consequence of the consistency of Âm̂,m̂, and T̂m̂.
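A sketch of the covariance estimator in (3.1)-(3.3), with our own function and variable names, treating the estimated breaks and regime coefficients as given:

```python
import numpy as np

def sandwich_cov(x, y, alphas, breakpoints):
    """Estimate Q^{-1} Omega Q^{-1} from demeaned data x (N, T, p), y (N, T),
    regime coefficients alphas (list of p-vectors) and 1-indexed breaks."""
    N, T, p = x.shape
    edges = [1] + list(breakpoints) + [T + 1]
    m1 = len(edges) - 1                                  # number of regimes
    Q = np.zeros((m1 * p, m1 * p))
    U = np.zeros((N, m1 * p))                            # rows are u_i in (3.3)
    for j in range(m1):
        sl = slice(edges[j] - 1, edges[j + 1] - 1)
        xj = x[:, sl, :]                                 # N x T_j x p
        uj = y[:, sl] - xj @ alphas[j]                   # regime residuals
        Q[j*p:(j+1)*p, j*p:(j+1)*p] = np.einsum('ntp,ntq->pq', xj, xj) / N
        U[:, j*p:(j+1)*p] = np.einsum('ntp,nt->np', xj, uj)
    Omega = U.T @ U / N
    Qinv = np.linalg.inv(Q)
    return Qinv @ Omega @ Qinv
```

Note that no HAC smoothing is needed: with T fixed, the outer product over i alone delivers a consistent estimate, which is the point of Remark 4 below.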

Corollary 2.

Suppose that Assumptions EPS, LAM, Q, MOM, and J hold. Then, as $N \to \infty$, $Q_N(\hat{\mathcal{T}}_{\hat m})^{-1}\hat\Omega\, Q_N(\hat{\mathcal{T}}_{\hat m})^{-1} \to_p Q_0^{-1}\Omega_0 Q_0^{-1}$.

Remark 4.

A major point about Corollary 2 is that the asymptotic covariance matrix of the PDL2S estimator is very easily estimable. This stands in sharp contrast to the large-T framework that typically involves some kind of heteroscedasticity and autocorrelation consistent (HAC) correction (see e.g., Bai Citation2009; Pesaran Citation2006), which is not only difficult to implement but also known to lead to poor small-sample properties.

Consider testing the null hypothesis $H_0: RA^0_{m_0} = r$, where R is a $q \times (m_0+1)p$ matrix of rank $q \le (m_0+1)p$ and r is a q × 1 vector. Again, conditional on the high probability event that m̂ = m0, the relevant Wald test statistic is given by

(3.4)  $W = N(R\hat A_{\hat m} - r)'\left[R\,Q_N(\hat{\mathcal{T}}_{\hat m})^{-1}\hat\Omega\, Q_N(\hat{\mathcal{T}}_{\hat m})^{-1}R'\right]^{-1}(R\hat A_{\hat m} - r)$.

Suppose that H0 is true. Then, because of Theorem 3 and Corollary 2,

(3.5)  $W = \sqrt{N}(R\hat A_{\hat m} - r)'\left(R\,Q_0^{-1}\Omega_0 Q_0^{-1}R'\right)^{-1}\sqrt{N}(R\hat A_{\hat m} - r) + o_p(1) \to_d \chi^2(q)$

as $N \to \infty$. Similarly, if q = 1, then the t-statistic

(3.6)  $t = \dfrac{\sqrt{N}(R\hat A_{\hat m} - r)}{\sqrt{R\,Q_N(\hat{\mathcal{T}}_{\hat m})^{-1}\hat\Omega\, Q_N(\hat{\mathcal{T}}_{\hat m})^{-1}R'}}$

has a limiting N(0, 1) distribution under H0.
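The Wald statistic in (3.4) is straightforward to compute once the covariance estimate is in hand. A minimal sketch with our own helper name; under H0 the resulting value is compared with χ²(q) critical values.

```python
import numpy as np

def wald_stat(N, A_hat, cov_hat, R, r):
    """Wald statistic in (3.4) for H0: R A = r, where cov_hat estimates
    the sandwich Q^{-1} Omega Q^{-1} of the limiting distribution."""
    d = R @ A_hat - r
    M = R @ cov_hat @ R.T
    return float(N * d @ np.linalg.solve(M, d))

# Toy numbers (ours): q = 2 restrictions, identity covariance.
W = wald_stat(10, np.array([1.0, 2.0]), np.eye(2), np.eye(2), np.zeros(2))
# W = 10 * (1^2 + 2^2) = 50
```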

All the results reported so far are conditional on the tuning parameter γ. While in theory any choice satisfying Assumption J will do, as with most other tuning parameters, in practice the results can be sensitive to different specifications of γ. It might therefore be preferable to set this parameter in a data-driven fashion. In this article, we follow Li, Qian, and Su (Citation2016), and Qian and Su (Citation2016), and set γ by minimizing an information criterion:

(3.7)  $\hat\gamma = \arg\min_\gamma IC(\gamma)$,

with

(3.8)  $IC(\gamma) = \hat\sigma^2\big(\hat{\mathcal{T}}_{\hat m(\gamma)}\big) + \phi \cdot p\,[\hat m(\gamma) + 1]$,

where $\hat{\mathcal{T}}_{\hat m(\gamma)}$ and $\hat m(\gamma)$ are $\hat{\mathcal{T}}_{\hat m}$ and m̂, respectively, when treated as functions of γ, ϕ = ϕ(N) > 0 is a penalty, and

(3.9)  $\hat\sigma^2(\mathcal{T}_m) = \frac{1}{NT}\sum_{i=1}^N\sum_{j=1}^{m+1}\sum_{t=T_{j-1}}^{T_j-1}\left(\tilde y_{i,t} - \tilde x_{i,t}'\hat\alpha_j\right)^2$.
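The criterion in (3.8) is simple to evaluate. A sketch with our own function name, using by default the penalty ϕ = (ln N)/N that the article adopts below:

```python
import numpy as np

def info_criterion(sigma2_hat, m_hat, p, N, phi=None):
    """IC(gamma) in (3.8): fit term sigma_hat^2 plus penalty phi * p * (m_hat + 1).
    The default phi = ln(N)/N is the BIC-like choice used in the article."""
    if phi is None:
        phi = np.log(N) / N
    return sigma2_hat + phi * p * (m_hat + 1)
```

More breaks reduce the fit term but raise the penalty, so minimizing IC over the γ grid trades fit against parsimony.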

Theorem 4.

Suppose that Assumptions EPS, LAM, Q, MOM, and J hold, that $\phi \to 0$ and that $N\phi \to \infty$. Then, as $N \to \infty$, $P[\hat m(\hat\gamma) = m_0] \to 1$.

As usual, the penalty ϕ is not unique and has to be set by the researcher. Analogous to Qian and Su (Citation2016), in this article we set ϕ=(lnN)/N, which makes IC(γ) similar to the conventional Schwarz Bayesian information criterion (BIC).

4 Monte Carlo Simulations

4.1 Setup

In this section, we use Monte Carlo simulations as a means to evaluate the finite sample properties of the proposed PDL2S approach. The data generating process used for this purpose is given by a restricted version of (2.1) and (2.2) that sets p = 4 and r = 5. Similarly to Qian and Su (Citation2016), we consider m0 ∈ {0, 1, 2}, with $\beta_t = 0_{p\times1}$ when m0 = 0, $\beta_t = 1_{p\times1}\cdot 1(\lfloor T/2\rfloor \le t \le T)$ when m0 = 1, and $\beta_t = 1_{p\times1}\cdot\left[1(\lfloor T/3\rfloor \le t < \lfloor 2T/3\rfloor) + 2\cdot 1(\lfloor 2T/3\rfloor \le t \le T)\right]$ when m0 = 2, where 1(·) and ⌊·⌋ are the indicator and integer part functions, respectively, and $0_{p\times1}$ ($1_{p\times1}$) is a p × 1 vector of zeros (ones).Footnote9

If ui,t and xi,t are independent, the interactive effects in (2.1) can be ignored without risking the consistency of the regular post-Lasso LS estimator based on raw (nondemeaned) data. Hence, in order to make ui,t and xi,t dependent, we allow xi,t to load on the same set of factors as ui,t, which as pointed out in Section 3.1 is a requirement in CCE. Specifically, xi,t is generated according to the following factor model:

(4.1)  $x_{i,t} = \Gamma_i f_t + \nu_{i,t}$,

where

(4.2)  $f_t = (1 - \varphi) + \varphi f_{t-1} + \eta_t$,

with $f_0 = 0_{r\times1}$, φ ∈ {0.8, 1} and $\eta_t \sim N(0_{r\times1}, I_r)$. Hence, while stationary (although highly persistent) when φ = 0.8, when φ = 1, ft is unit root nonstationary. The data generating process considered for νi,t is also very general and is the same as in, for example, Petrova and Westerlund (Citation2020). It is given by

(4.3)  $\nu_{i,t} = \pi\nu_{i,t-1} + e_{i,t} + \sum_{j=1}^K \pi\left(e_{i-j,t} + e_{i+j,t}\right)$,

where $\nu_{1,0} = \cdots = \nu_{N,0} = 0_{p\times1}$, π ∈ {0.4, 0.8}, K = 10 and $e_{i,t} \sim N(0_{p\times1}, I_p)$. This means that νi,t is weakly correlated over time as well as with 2K of its neighboring cross-sectional units.Footnote10 If π = 0.4, we say that the error dependence is "low," whereas if π = 0.8 the error dependence is said to be "high." A similar process is used for generating εi,t:

(4.4)  $\varepsilon_{i,t} = \pi\varepsilon_{i,t-1} + \xi_{i,t} + \sum_{j=1}^K \pi\left(\xi_{i-j,t} + \xi_{i+j,t}\right)$,

where $\varepsilon_{1,0} = \cdots = \varepsilon_{N,0} = 0$ and $\xi_{i,t} \sim N(0, \sigma_i^2)$ with $\sigma_i^2 \sim U(0.5, 1)$. Hence, εi,t is not only weakly serially and cross-sectionally correlated but also heteroscedastic. Finally, to ensure that Assumption LAM is met, the loadings in Γi and λi are drawn independently from N(2, 1).
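As an illustration of the factor process in (4.2), a sketch with our own function name; with φ = 1 the intercept (1 − φ) vanishes and the process reduces to a driftless random walk.

```python
import numpy as np

def gen_factors(T, r, phi, rng):
    """Factors in (4.2): f_t = (1 - phi) + phi * f_{t-1} + eta_t,
    with f_0 = 0_{r x 1} and eta_t ~ N(0, I_r)."""
    eta = rng.standard_normal((T, r))
    f = np.zeros((T + 1, r))
    for t in range(1, T + 1):
        f[t] = (1 - phi) + phi * f[t - 1] + eta[t - 1]
    return f[1:]                       # drop the f_0 initialization row
```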

As for the sample size, we consider all combinations of N{25,50,100,200,600} and T{5,10,20}, where the values considered for T are intentionally smaller than those considered for N.

We report four performance measures: the frequency of false detection of the estimated number of breaks, the frequency of false detection of the estimated breakpoints given that the number of breaks is selected correctly, the average number of estimated breaks, and the mean squared error (MSE) of the PDL2S estimator (times 100), computed as the average of $\|\hat A_{\hat m} - A^0_{m_0}\|/[(\hat m + 1)p]$ across the Monte Carlo replications, whose number is here set to 1000.

The estimation code was written in Python, which is one of the most common programming languages in applications of the Lasso. Following the previous literature on the adaptive Lasso (see Qian and Su Citation2016), we set κ = 2. For a given value of γ, we optimize (2.5) using the convex optimization package CVXPY. We then determine the most appropriate value of γ by minimizing the information criterion in (3.8). To accomplish this, we need to choose a suitable grid containing values of γ that yield the true breaks. One way to do so is to first select an interval [γ_max, γ_min], where γ_min (γ_max) is chosen so that the number of estimated breaks is zero (“many”) (see Qian and Su Citation2016). We then slice [γ_max, γ_min] into a grid of 50 values that are evenly spaced on a log scale, optimize (2.5) at each value, and select as γ̂ the value that minimizes the information criterion in (3.8).
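The grid search over γ can be sketched as follows. Here `estimate_breaks` and `info_criterion` stand in for the CVXPY optimization of (2.5) and the information criterion in (3.8), respectively; both are purely illustrative placeholders.

```python
import numpy as np

# Sketch of the tuning-parameter search described in the text.
# estimate_breaks(gamma) and info_criterion(fit) are placeholders for the
# Lasso optimization of (2.5) and the information criterion in (3.8).
def select_gamma(estimate_breaks, info_criterion, gamma_max, gamma_min, n_grid=50):
    # n_grid values evenly spaced on a log scale between gamma_max and gamma_min
    grid = np.logspace(np.log10(gamma_max), np.log10(gamma_min), n_grid)
    fits = [estimate_breaks(g) for g in grid]
    ics = [info_criterion(fit) for fit in fits]
    best = int(np.argmin(ics))                 # gamma-hat minimizes the criterion
    return grid[best], fits[best]
```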

The simulations are too time consuming for a personal computer. We used the UPPMAX (Uppsala Multidisciplinary Center for Advanced Computational Science) cluster Rackham, which is accessible via the SNIC (Swedish National Infrastructure for Computing). Rackham consists of 486 nodes, each containing two 10-core Intel Xeon V4 central processing units.

4.2 Results

Tables 1–4 contain the results, which are reported for different constellations of φ and π. We consider four cases: (i) stationary factors (φ = 0.8) and low error dependence (π = 0.4), (ii) nonstationary factors (φ = 1) and low error dependence (π = 0.4), (iii) stationary factors (φ = 0.8) and high error dependence (π = 0.8), and (iv) nonstationary factors (φ = 1) and high error dependence (π = 0.8).

Table 1 Simulation results for the case with stationary factors (ϕ=0.8) and low error dependence (π=0.4).

Table 2 Simulation results for the case with nonstationary factors (ϕ=1) and low error dependence (π=0.4).

Table 3 Simulation results for the case with stationary factors (ϕ=0.8) and high error dependence (π=0.8).

Table 4 Simulation results for the case with nonstationary factors (ϕ=1) and high error dependence (π=0.8).

We begin by considering the results reported in Table 1 for the case with stationary factors and low error dependence. The first thing to note is that PDL2S does very well even when N and T are as small as N = 25 and T = 5. In fact, in this case the breaks are estimated almost perfectly. The only exception is when m₀ = 2, in which case the number of breaks is not always estimated correctly. However, this is only a small-sample effect that goes away with increasing values of N and T. The breakpoints are always estimated correctly. Hence, as expected given Corollary 1, the PDL2S approach is robust to the number of breaks, and works very well even if there are no breaks at all. The MSE decreases with increasing values of N and T. The effect of N is anticipated and is a reflection of the consistency of the PDL2S estimator (Theorem 1). The improvement that comes from increasing T cannot be explained by our theoretical results, which are silent about the effect of T. It suggests that T does not have to be “small” but that the estimator works well also when T is relatively large.

As expected given the unrestricted specification of the factors, increasing their persistence from φ = 0.8 to φ = 1 has no effect on the results. This is clear from comparing the results reported in Table 1 with those reported in Table 2. In the literature it is common to assume that the factors are stationary (see e.g., Bai Citation2009; Pesaran Citation2006), which rules out factors that are, for example, breaking or trending. This may be justified in some applications, but certainly not in general. The fact that the performance of the PDL2S estimator is unaffected by the specification of the factors is therefore a great advantage.

High error dependence generally leads to worse performance than if the dependence is low, as is evident by comparing the results reported in Tables 1 and 2 with those reported in Tables 3 and 4. Note in particular how the gain in performance that comes from increasing N is relatively slow when the dependence is high, which is partly expected given the high level of error cross-section correlation in this case. The effect is not detrimental, though, and so performance is still acceptable. Hence, as discussed in Section 3, error cross-section independence is not necessary. This is true not only when the factors are stationary, but also when they are nonstationary, as they are in Table 4.

All in all, we find that the PDL2S estimator performs very well in the type of small-T panels considered, and that it does so under a wide range of empirically relevant scenarios. It should therefore be an attractive alternative to the already existing menu of estimators of panel regression models with possible interactive effects and breaks.

5 Empirical Illustration

5.1 Motivation

There is a large and growing empirical literature concerned with the socioeconomic determinants of crime. Usual suspects include deterrence variables capturing the probabilities of apprehension and punishment, and variables that control for the relative rate of return of legal opportunities. One of the main conclusions from this literature is that aggregate data do not provide much support of the deterrence idea that policy can reduce crime by raising expected costs (see e.g., Dills, Miron, and Summers Citation2010).

One of the most widely held explanations for this lack of empirical support is the presence of unobserved heterogeneity, which, unless appropriately accounted for, may well render the LS estimator biased and inconsistent. Cornwell and Trumbull (Citation1994) were among the first to make this point. According to them, the issue of unobserved heterogeneity cannot be ignored, and there is by now plenty of research that confirms this (see e.g., Bushway, Brame, and Paternoster Citation1999; Cherry and List Citation2002; Worrall and Pratt Citation2004). Another explanation is that while most theories of crime are about the behavior of individuals, many studies use aggregated data, usually at the state or country level, even though there is by now plenty of evidence to suggest that individual (crime) behavior is not well preserved under aggregation (see Lott and Mustard Citation1997). Yet another explanation for the lack of support of the deterrence idea is the presence of structural breaks. According to McDowall and Loftin (Citation2005, p. 359), “[c]onventional explanations of crime rate trends assume that changes in the rates follow a process that is linear and constant […] Questioning the conventional assumptions, an emerging class of historical contingency theories stresses variation in the crime-generating mechanism. According to contingency explanations, the process underlying the rates […] has a structure that shifts over time.” The concern is that failure to control for such shifts is likely to result in inconsistent estimates of the model parameters.

Although the presence of unobserved heterogeneity and structural breaks have been more or less ignored in most studies, some attempts have been made to obtain at least a partial solution. Cornwell and Trumbull (Citation1994) use data on 90 counties in North Carolina between 1981 and 1987. The fact that their dataset has a panel structure makes it possible to control for certain types of unobserved heterogeneity while at the same time maintaining a relatively low level of aggregation. By contrast, studies such as Batton and Jensen (Citation2002) and Carlson and Michalowski (Citation1997) employ aggregate time series data that they split into subperiods based on major events in order to account for structural change.

Of course, while potentially quite useful by themselves, these solutions are bound to be inadequate in any application that is characterized by both unobserved heterogeneity and structural breaks. One possibility is to use panel data to account for unobserved heterogeneity and to slice up the sample period to account for breaks. But this means that the breaks are treated as known, which is risky, as misplaced breaks are just as problematic as omitted breaks. This is important, as there is usually great uncertainty over both the number of breaks and their location. As an example, Batton and Jensen (Citation2002) used the Chow test to test for the presence of breaks at given dates. They urged caution in interpreting their test results, since almost every potential breakpoint was found to be significant.

The discussion in the previous paragraph suggests that there is a need for an approach that is general enough to accommodate not only unobserved heterogeneity but also structural breaks. The PDL2S estimator fits this bill and we will therefore use it in this empirical illustration.

5.2 Main Results

The data that we will use are the same as in Cornwell and Trumbull (Citation1994) (see also Baltagi Citation2006; Baltagi and Liu Citation2009).Footnote11 Hence, in this illustration, N = 90 and T = 7, which means that it is important to use techniques that do not require T to be large. This is another reason for considering PDL2S.

The included deterrence variables, which are standard in the literature, are the probability of arrest (PRBARR), the probability of conviction given arrest (PRBCONV), the probability of a prison sentence given a conviction (PRBPRIS), the average prison sentence in days (AVGSEN), and the number of police per capita (POLPC). In addition to the deterrence variables, the dataset contains a number of controls. There are wage variables that are intended to capture opportunities in the legal sector. These are the average weekly wages in construction (WCON), transportation, utilities and communication (WTUC), wholesale and retail trade (WTRD), finance, insurance and real estate (WFIR), services (WSER), manufacturing (WMFG), federal government (WFED), state government (WSTA) and local government (WLOC). Population density (DENSITY) and percent young male (PCTYMLE) are also included, as crime tends to depend on these. All in all there are p = 16 regressors, which are again similar to those considered previously in the literature (see e.g., Ghasemi Citation2017, and the references provided therein). All regressors are transformed into logs, as is the crime rate (CRMRTE).Footnote12

The PDL2S estimator is implemented as described in Section 4, except that we now take deviations from initial values, as mentioned in Section 3.1, to control for county fixed effects. This means that the first time observation, 1981, is lost and hence that the effective number of time observations is given by T − 1 = 6. We estimate m̂(γ̂) = 2 breaks, in 1985 and 1986, which means that according to our break detection procedure there are three regimes: 1982–1984, 1985, and 1986–1987. The resulting PDL2S results are reported in the top panel of Table 5, which also contains PDL2S and two-way fixed effects LS results for a model without breaks.

Table 5 Empirical estimation results.

We begin by considering the results for the deterrence variables. The first thing to note is that while there is some variation in the results, the estimated effects of the deterrence variables appear to be quite stable over time. We also see that the signs of the estimated deterrence effects are largely as expected based on economic theory. PRBARR and PRBCONV are two of the variables that have attracted most interest in the previous literature (see e.g., Cherry and List Citation2002; Cornwell and Trumbull Citation1994; Ghasemi Citation2017). Their effects are estimated to be significantly negative, which is evidence in support of the deterrence idea. The estimated effects of PRBPRIS are also negative but they are generally smaller (in absolute value) than those of PRBARR and PRBCONV, and they are not always significant. The same is true for AVGSEN. Together with the significantly negative effects of PRBARR and PRBCONV, the insignificance of PRBPRIS and AVGSEN suggests that imprisoning more criminals, or imprisoning them for longer, is not as effective as increasing the risk of apprehension or conviction once arrested. This is consistent with the idea that the consequences of being arrested and found guilty of a crime do not stop with the punishment of the criminal justice system, but that they also include other “costs” to the individual, such as the social stigmatization that comes with a conviction (see Bun et al. Citation2020). POLPC enters significantly but with an unexpected positive sign, a finding that is consistent with the results of Cornwell and Trumbull (Citation1994), Baltagi (Citation2006), and Baltagi and Liu (Citation2009).

Consider next the results for the controls. The first thing to note is that most variables are not significant, as expected given the results reported in the previous literature (see e.g., Cornwell and Trumbull Citation1994; Baltagi Citation2006; Baltagi and Liu Citation2009). We also note that unlike for the deterrence variables, for the controls there are some marked jumps in the results over time, which is consistent with the many major events of the period being considered (see e.g., Carlson and Michalowski Citation1997).Footnote13 To take an extreme example, consider WTUC. The estimated effect of this variable goes from 0.026 and insignificant in the first regime to 0.778 and highly significant in the second, which represents an increase of almost 3000%, only to go down to 0.084 and insignificant again in the third regime. This last observation is important because it suggests that the detected breaks might not be due to the deterrence variables but rather to the other, economically and statistically relatively unimportant, control variables. The controls might therefore do more harm than good here.

In order to be able to test for the presence of breaks in the deterrence variables only, instead of including the controls in the estimated equation we project them out prior to applying PDL2S. Specifically, a two-stage procedure is applied, in which each deterrence variable and CRMRTE are first regressed onto all the controls. This is done point-wise in time, which means that we allow the coefficients of the controls to break every year. In the second stage, we then take the residuals of the first-stage CRMRTE and deterrence variable regressions, and feed them to the PDL2S procedure. According to the results reported in the bottom panel of Table 5, there are no breaks in the second-stage regression, which supports the idea that conditional on the controls there are no breaks. We also see that the estimated deterrence effects are almost identical to the ones obtained when the controls are included but no breaks are allowed, suggesting again that the controls are unimportant. This gives credence to studies such as Ghasemi (Citation2017) where only deterrence variables are used.
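A sketch of this two-stage projection, assuming the data are arranged as T × N (× p) arrays, is given below; the function name and shapes are illustrative only, not the authors' code.

```python
import numpy as np

# Period by period, the crime rate and each deterrence variable are regressed
# on an intercept and the controls, and the residuals are kept for the
# second-stage PDL2S fit.
def project_out_controls(y, X_det, X_ctl):
    """y: (T, N); X_det: (T, N, p_det); X_ctl: (T, N, p_ctl)."""
    T, N = y.shape
    y_res = np.empty_like(y)
    X_res = np.empty_like(X_det)
    for t in range(T):  # point-wise in time: controls' coefficients may break every year
        C = np.column_stack([np.ones(N), X_ctl[t]])
        P = C @ np.linalg.pinv(C)            # projection onto the controls' column space
        y_res[t] = y[t] - P @ y[t]           # first-stage residuals of CRMRTE ...
        X_res[t] = X_det[t] - P @ X_det[t]   # ... and of each deterrence variable
    return y_res, X_res
```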

As pointed out in Section 5.1, while the literature generally agrees that the presence of breaks is a concern, the role of these breaks in the crime generating process is still an unsettled issue. The following quotation from McDowall and Loftin (Citation2005, p. 361) captures the sentiment in the literature: “contingency theories do not disagree with the conventional approach about the variables that produce the rate changes. Instead, they add a new layer of complexity to allow for the context within which the rate-generating process operates. If these theories are correct, they could significantly improve knowledge about how crime rates change over time. If they are incorrect, they might needlessly complicate attempts to refine the standard approach.” The results reported here suggest that while the crime process is breaking, the effect of the deterrence variables has been stable over time.

5.3 Robustness

In order to get a feeling for the validity of the random interactive effects assumption (extended to allow for county fixed effects), we computed the average correlation coefficient of the PDL2S residuals for all pairs of counties, and the CD test of Pesaran (Citation2021), which tests the null hypothesis of no cross-sectional correlation. If the random interactive effects assumption is correct, the regression errors should be cross-county uncorrelated, whereas if the assumption is incorrect there should be some remaining cross-county correlation. Hence, only if the residuals are cross-county uncorrelated can we conclude in favor of the random interactive effects assumption. The average correlation coefficient is –0.01 and the CD statistic is –1.56, which is insignificant even at the liberal 10% level.Footnote14 We take these results to suggest that there are no major violations of the random interactive effects assumption.Footnote15
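For reference, the two diagnostics can be sketched as follows. `cd_test` is an illustrative implementation of the average pairwise residual correlation and Pesaran's CD statistic, CD = √(2T/(N(N − 1))) ∑_{i<j} ρ̂_{ij}, which is asymptotically standard normal under the null of no cross-sectional correlation.

```python
import numpy as np

# Average pairwise correlation of the residuals and Pesaran's CD statistic.
def cd_test(resid):
    """resid: (T, N) array of residuals, with units in columns."""
    T, N = resid.shape
    rho = np.corrcoef(resid, rowvar=False)   # N x N correlation matrix
    iu = np.triu_indices(N, k=1)             # pairs with i < j
    avg_corr = rho[iu].mean()
    cd = np.sqrt(2.0 * T / (N * (N - 1))) * rho[iu].sum()
    return avg_corr, cd
```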

Cornwell and Trumbull (Citation1994) argue that PRBARR and POLPC may be endogenous. As pointed out in Section 2, we do not require strict exogeneity but only exogeneity conditional on the factors. In order to assess the validity of this condition, we employed a post-demeaned version of the Lasso GMM of Qian and Su (Citation2016). The instruments used are the same as in Cornwell and Trumbull (Citation1994). They are the fraction of crimes that involve face-to-face contact, and per capita tax revenue. While the first instrument is likely to be correlated with PRBARR, as face-to-face contact makes it possible for the victim to identify the offender, the second is likely to correlate with POLPC, as counties with preferences for law enforcement will vote for higher taxes to fund a larger police force.Footnote16 The results, available upon request, are very similar to those reported in Table 5. The main difference is that the standard errors are much larger in the GMM specification, which is consistent with the results of Baltagi (Citation2006), Baltagi and Liu (Citation2009), Bun et al. (Citation2020), and Cornwell and Trumbull (Citation1994). We interpret these results as providing evidence in favor of the PDL2S results reported in Table 5. The logic goes as follows: while LS and GMM are both consistent under exogeneity (conditional on the factors), the GMM instruments are not as informative as the regressors that they replace, leading to variance inflation. This is consistent with Bun et al. (Citation2020), and Cornwell and Trumbull (Citation1994), who on efficiency grounds prefer LS over GMM.

6 Conclusion

The present article considers what we believe to be an empirically very relevant scenario, namely, a researcher faced with the task of estimating a panel data model with unobserved heterogeneity and slope coefficients that may be subject to multiple structural breaks. The researcher wants to be able to estimate not only the slope coefficients within each regime, but also the unknown breakpoints and their number. Moreover, because the panel dataset is short, estimation must be possible even if the number of time periods, T, is fixed and only the number of cross-sectional units, N, is large. The current article contributes by developing a Lasso-based approach that meets this list of demands.

Our asymptotic results show that with probability approaching one the new approach correctly determines the number of breaks and their locations, and that the estimator of the regime-specific regression coefficients is consistent and asymptotically normal. Simulation results are also provided to suggest that the asymptotic predictions are borne out well in small samples.


Acknowledgments

The authors would like to thank Ivan Canay (Coeditor), an Associate Editor and two anonymous referees for many valuable comments and suggestions.

Supplementary Materials

The supplement provides (i) proofs of the asymptotic results provided in Section 3.2 of the main article, (ii) details of the extensions mentioned in Sections 2 and 3.1 of the same article, and (iii) some Monte Carlo results pertaining to these extensions.

Additional information

Funding

Westerlund would like to thank the Knut and Alice Wallenberg Foundation for financial support through a Wallenberg Academy Fellowship.

Notes

1 An incomplete list of studies dealing with a single structural break in panel data includes Antoch et al. (Citation2019), Baltagi, Feng, and Kao (Citation2016), Baltagi, Kao, and Liu (Citation2017), Hidalgo and Schafgans (Citation2017), Karavias, Narayan, and Westerlund (Citation2022), and Zhu, Sarafidis, and Silvapulle (Citation2020).

2 Qian and Su (Citation2016) recognize the importance of allowing T to be finite and discuss likely implications for theory, but they do not provide any formal results for the fixed-T case. Similarly, while in Baltagi, Feng, and Kao (Citation2016) there is a discussion of how to proceed in the presence of multiple breaks, their theory supposes that there is just one break.

3 This condition can be restrictive but it is needed for the proofs; see Section 3.1 for a discussion.

4 As we explain later in Section 3, the type of factors that can be permitted under our assumptions is very broad. This suggests that there is no need to discriminate between known and unknown factors, but that one can just as well treat them all as unknown. This is the main rationale for writing (2.2) in terms of (the unknown) ft only.

5 One can also use the regular Lasso estimator of A⁰_{m₀}, as given by [β̂_{T̂₀}, …, β̂_{T̂_{m̂}}]. However, as is well known in the literature, post-Lasso typically outperforms regular Lasso, and our (unreported) Monte Carlo results confirm this. In this article, we therefore focus on post-Lasso LS.

6 The need for this condition is partly expected given the discussion in Section 1 on the difficulty of separating the breaks from the interactive effects. Boldea, Drepper, and Gan (Citation2020) do not do anything to control for the interactive effects but apply LS as if there were no effects present at all. This means that they have to put enough structure on the effects so as to ensure that they do not interfere with their break estimation procedure. One of the terms in the resulting omitted interactive effects bias of the LS estimator is given by N⁻¹ ∑_{i=1}^{N} x_{i,t} λ_i′ f_t. If this term is not constant within break regimes, the interactive effects will be mistaken for structural breaks.

7 Certain low-rank regressors can be permitted but they then require special treatment (see Bai Citation2009).

8 The condition that the regressors are identically distributed can be relaxed (see Boldea, Drepper, and Gan Citation2020). However, it is still necessary that the sample second moment matrix of the regressors is asymptotically time-invariant (within break regimes).

9 We experimented with differently spaced breaks. The results, available upon request, suggest that the conclusions are unaffected by the spacing of the breaks and that the PDL2S approach works well even if there are regimes that contain only one time period, which corroborates our asymptotic results.

10 The cross-sectional sum in ν_{i,t} is truncated at the beginning and end when not enough cross-sections are available. For example, when generating ν_{1,t}, the sum only includes e_{2,t}, …, e_{11,t}.

11 The data can be downloaded online from the Journal of Applied Econometrics data archive, available at http://qed.econ.queensu.ca/jae/.

12 We refer to Cornwell and Trumbull (Citation1994) for a more detailed description of the data.

13 Most important, there was (i) the election of Ronald Reagan in 1980 and the political-economic reorganization that followed, (ii) a displacement of nonwhite inner-city males from the regular labor force to the criminogenic informal drug economy, and (iii) a steep increase in juvenile violent crime.

14 Juodis and Reese (Citation2022) argue that the CD test can be misleading when applied to cross-sectionally demeaned data in that it will tend to reject too often. We are unable to reject, suggesting that this tendency to over-reject is not an issue.

15 Unreported results confirm that the regressors are correlated across counties. This means that misspecification of the regression function, such as when breaks are omitted or misplaced, should manifest itself as cross-correlated residuals, which is not what we find.

16 Hence, there are two instruments, one for each of the two endogenous regressors. This means that the model is just identified. We experimented with using the one-year lagged values of PRBARR and POLPC as additional instruments. Because the resulting model is overidentified, we can apply the overidentifying restrictions J-statistic to assess the validity of the instruments. The instruments passed the test. The problem is that the lags do not appear to be very relevant, in that PRBARR and POLPC are basically serially uncorrelated, which casts doubt on the results based on the larger instrument set. For this reason, we follow the previous literature and focus on the just-identified model specification. All other regressors are treated as exogenous and are therefore included in the set of instruments.

References

  • Ahn, S. C., Lee, Y. H., and Schmidt, P. (2013), “Panel Data Models with Multiple Time-Varying Individual Effects,” Journal of Econometrics, 174, 1–14. DOI: 10.1016/j.jeconom.2012.12.002.
  • Antoch, J., Hanousek, J., Horváth, L., Hušková, M., and Wang, S. (2019), “Structural Breaks in Panel Data: Large Number of Panels and Short Length Time Series,” Econometric Reviews, 38, 828–855. DOI: 10.1080/07474938.2018.1454378.
  • Andrews, D. W. K. (2005), “Cross-Section Regression with Common Shocks,” Econometrica, 73, 1551–1585. DOI: 10.1111/j.1468-0262.2005.00629.x.
  • Bai, J. (2009), “Panel Data Models with Interactive Fixed Effects,” Econometrica, 77, 1229–1279.
  • Baltagi, B. H. (2006), “Estimating an Economic Model of Crime using Panel Data from North Carolina,” Journal of Applied Econometrics, 21, 543–547. DOI: 10.1002/jae.861.
  • Baltagi, B. H., and Liu, L. (2009), “A Note on the Application of EC2SLS and EC3SLS Estimators in Panel Data Models,” Statistics and Probability Letters, 79, 2189–2192. DOI: 10.1016/j.spl.2009.07.014.
  • Baltagi, B. H., Feng, Q., and Kao, C. (2016), “Estimation of Heterogeneous Panels with Structural Breaks,” Journal of Econometrics, 191, 176–195. DOI: 10.1016/j.jeconom.2015.03.048.
  • Baltagi, B. H., Kao, C., and Liu, L. (2017), “Estimation and Identification of Change Points in Panel Models with Nonstationary or Stationary Regressors and Error Term,” Econometric Reviews, 36, 85–102. DOI: 10.1080/07474938.2015.1114262.
  • Baltagi, B. H., Griffin, J. M., and Xiong, W. (2000), “To Pool or Not to Pool: Homogeneous Versus Heterogenous Estimators Applied to Cigarette Demand,” Review of Economics and Statistics, 82, 117–126. DOI: 10.1162/003465300558551.
  • Batton, C., and Jensen, G. (2002), “Decommodification and Homicide Rates in the 20th-Century United States,” Homicide Studies, 6, 6–38. DOI: 10.1177/1088767902006001002.
  • Belloni, A., Chernozhukov, V., Hansen, C., and Kozbur, D. (2016), “Inference in High-Dimensional Panel Models with an Application to Gun Control,” Journal of Business & Economic Statistics, 34, 590–605.
  • Boldea, O., Drepper, B., and Gan, Z. (2020), “Change Point Estimation in Panel Data with Time-Varying Individual Effects,” Journal of Applied Econometrics, 35, 712–727. DOI: 10.1002/jae.2769.
  • Bun, M. J. G., Kelaher, R., Sarafidis, V., and Weatherburn, D. (2020), “Crime, Deterrence and Punishment Revisited,” Empirical Economics, 59, 2303–2333. DOI: 10.1007/s00181-019-01758-6.
  • Bushway, S., Brame, R., and Paternoster, R. (1999), “Assessing Stability and Change in Criminal Offending: A Comparison of Random Effects, Semiparametric, and Fixed Effects Modeling Strategies,” Journal of Quantitative Criminology, 15, 23–61.
  • Carlson, S. M., and Michalowski, R. J. (1997), “Crime, Unemployment, and Social Structures of Accumulation: An Inquiry Into Historical Contingency,” Justice Quarterly, 14, 209–241. DOI: 10.1080/07418829700093311.
  • Cherry, T., and List, J. (2002), “Aggregation Bias in the Economic Model of Crime,” Economics Letters, 75, 81–86. DOI: 10.1016/S0165-1765(01)00597-3.
  • Cornwell, C., and Trumbull, W. N. (1994), “Estimating the Economic Model of Crime with Panel Data,” Review of Economics and Statistics, 76, 360–366. DOI: 10.2307/2109893.
  • Dills, A. K., Miron, J. A., and Summers, G. (2010), “What Do Economists Know about Crime?” in The Economics of Crime: Lessons for and from Latin America, eds. R. Di Tella, S. Edwards, and E. Schargrodsky, pp. 269–302, Chicago, IL: National Bureau of Economic Research, University of Chicago Press.
  • Fan, J., and Li, R. (2006), “Statistical Challenges with High Dimensionality: Feature Selection in Knowledge Discovery,” in Proceedings of the International Congress of Mathematicians, eds. M. Sanz-Sole, J. Soria, J.L. Varona, and J. Verdera, Vol. III, European Mathematical Society, Zurich, pp. 595–622.
  • Ghasemi, M. (2017), “Crime and Punishment: Evidence from Dynamic Panel Data Model for North Carolina (2003–2012),” Empirical Economics, 52, 723–730. DOI: 10.1007/s00181-016-1093-5.
  • Hansen, C., and Liao, Y. (2019), “The Factor-Lasso and K-Step Bootstrap Approach for Inference in High-Dimensional Economic Applications,” Econometric Theory, 35, 465–509. DOI: 10.1017/S0266466618000245.
  • Hidalgo, J., and Schafgans, M. (2017), “Inference and Testing Breaks in Large Dynamic Panels with Strong Cross-Sectional Dependence,” Journal of Econometrics, 196, 259–274. DOI: 10.1016/j.jeconom.2016.09.008.
  • Juodis, A., and Reese, S. (2022), “The Incidental Parameters Problem in Testing for Remaining Cross-section Correlation,” Journal of Business & Economic Statistics, Forthcoming. DOI: 10.1080/07350015.2021.1906687.
  • Kapetanios, G., Serlenga, L., and Shin, Y. (2019), “Testing for Correlated Factor Loadings in Cross Sectionally Dependent Panels,” SERIES working papers N. 02/2019.
  • Karavias, Y., Narayan, P., and Westerlund, J. (2022), “Structural Breaks in Interactive Effects Panels and the Stock Market Reaction to COVID–19,” Journal of Business & Economic Statistics, Forthcoming. DOI: 10.1080/07350015.2022.2053690.
  • Li, D., Qian, J., and Su, L. (2016), “Panel Data Models With Interactive Fixed Effects and Multiple Structural Breaks,” Journal of the American Statistical Association, 111, 1804–1819. DOI: 10.1080/01621459.2015.1119696.
  • Lott, J. R. Jr., and Mustard, D. B. (1997), “The Right-to-Carry Concealed Guns and the Importance of Deterrence,” Journal of Legal Studies, 26, 1–64. DOI: 10.1086/467988.
  • McDowall, D., and Loftin, C. (2005), “Are U.S. Crime Rate Trends Historically Contingent?,” Journal of Research in Crime and Delinquency, 42, 359–383. DOI: 10.1177/0022427804270050.
  • Moon, H. R., and Weidner, M. (2015), “Linear Regression for Panel with Unknown Number of Factors as Interactive Fixed Effects,” Econometrica, 83, 1543–1579. DOI: 10.3982/ECTA9382.
  • Pesaran, M. H. (2006), “Estimation and Inference in Large Heterogeneous Panels with a Multifactor Error Structure,” Econometrica, 74, 967–1012. DOI: 10.1111/j.1468-0262.2006.00692.x.
  • Pesaran, M. H. (2021), “General Diagnostic Tests for Cross-Sectional Dependence in Panels,” Empirical Economics, 60, 13–50. DOI: 10.1007/s00181-020-01875-7.
  • Petrova, Y., and Westerlund, J. (2020), “Fixed Effects Demeaning in the Presence of Interactive Effects in Treatment Effects Regressions and Elsewhere,” Journal of Applied Econometrics, 35, 960–964. DOI: 10.1002/jae.2790.
  • Qian, J., and Su, L. (2016), “Shrinkage Estimation of Common Breaks in Panel Data Models via Adaptive Group Fused Lasso,” Journal of Econometrics, 191, 86–109. DOI: 10.1016/j.jeconom.2015.09.004.
  • Robertson, D., and Sarafidis, V. (2015), “IV Estimation of Panels with Factor Residuals,” Journal of Econometrics, 185, 526–541. DOI: 10.1016/j.jeconom.2014.12.001.
  • Tibshirani, R., Saunders, M., Rosset, S., Zhu, J., and Knight, K. (2005), “Sparsity and Smoothness via the Fused Lasso,” Journal of the Royal Statistical Society, Series B, 67, 91–108. DOI: 10.1111/j.1467-9868.2005.00490.x.
  • Westerlund, J., Petrova, Y., and Norkute, M. (2019), “CCE in Fixed-T Panels,” Journal of Applied Econometrics, 34, 746–761. DOI: 10.1002/jae.2707.
  • Worrall, J. L., and Pratt, T. C. (2004), “On the Consequences of Ignoring Unobserved Heterogeneity when Estimating Macro-Level Models of Crime,” Social Science Research, 33, 79–105. DOI: 10.1016/S0049-089X(03)00040-1.
  • Zhu, H., Sarafidis, V., and Silvapulle, M. J. (2020), “A New Structural Break Test for Panels with Common Factors,” Econometrics Journal, 23, 137–155. DOI: 10.1093/ectj/utz018.