Machine Learning

Linear Aggregation in Tree-Based Estimators

Pages 917-934 | Received 22 Jan 2021, Accepted 14 Dec 2021, Published online: 18 Feb 2022

Figures & data

Fig. 1 Comparison of classical CART and LRT. The top of the figure shows the fits of LRT and classical CART; the lower part shows the density of the training set.
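To make the contrast concrete, the sketch below fits CART to a smooth response and then refits an OLS line inside each leaf. This is an illustration only: it uses rpart as the CART implementation and leaf-wise lm() refits as a crude stand-in for linear aggregation, not the paper's LRT, which chooses splits with the ridge criterion of Section 2.

    ## Illustration only: rpart CART vs. a leaf-wise OLS refit as a crude
    ## stand-in for linear aggregation (not the paper's LRT split criterion).
    library(rpart)

    set.seed(1)
    n <- 500
    x <- sort(runif(n, -2, 2))
    y <- 2 * x + rnorm(n, sd = 0.5)            # smooth, linear signal
    dat <- data.frame(x = x, y = y)

    cart <- rpart(y ~ x, data = dat)
    dat$leaf <- cart$where                      # leaf membership per observation

    dat$cart_fit <- predict(cart, dat)          # piecewise-constant fit
    dat$lin_fit <- ave(seq_len(n), dat$leaf, FUN = function(i)
      fitted(lm(y ~ x, data = dat[i, ])))       # one OLS line per leaf

    plot(x, y, col = "grey", pch = 16, cex = 0.5)
    lines(x, dat$cart_fit, col = "red", lwd = 2)
    lines(x, dat$lin_fit, col = "blue", lwd = 2)

On a smooth response like this one, the leaf-wise lines track the signal with far fewer leaves than the step-wise CART fit needs, which is the point Figure 1 makes.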

Table D1 Software packages and tuned hyperparameters used by the caret package for each estimator in Section 3.

Fig. 2 Different levels of smoothness. In Experiment 1 the response surface is a linear function; in Experiment 2 it is a step function; and in Experiment 3 it is partly a step function and partly a linear function.
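The three regimes can be sketched as follows; the slopes, step location, and noise level below are hypothetical stand-ins, not the simulation settings of Section 3.

    ## Hypothetical versions of the three response surfaces in Figure 2.
    set.seed(2)
    n <- 1000
    x <- runif(n)

    f1 <- function(x) 3 * x                       # Experiment 1: linear
    f2 <- function(x) 2 * (x > 0.5)               # Experiment 2: step
    f3 <- function(x) ifelse(x < 0.5, 1, 3 * x)   # Experiment 3: step, then linear

    y1 <- f1(x) + rnorm(n, sd = 0.3)
    y2 <- f2(x) + rnorm(n, sd = 0.3)
    y3 <- f3(x) + rnorm(n, sd = 0.3)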

Table 1 Summary of the real-world datasets.

Table 2 RMSE of each estimator across the real-world datasets.

Fig. 3 The mean ridge coefficients from simulated data generated according to Equation (10), Y = ς(X1)ς(X2) + ς(X3)ς(X4) + ϵ, repeated over 100 Monte Carlo replications. The true outcome depends on only the first four covariates, and their coefficients have nonzero slope, which is picked up by the LRF. The horizontal line corresponds to 1.96 times the sample standard deviation of the coefficients over the Monte Carlo replications.
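A rough version of this experiment is sketched below, with two loud assumptions: ς is stood in by a logistic sigmoid (the paper defines its own transformation), and the coefficients come from a single global ridge regression via glmnet rather than the leaf-wise ridge fits of the LRF.

    ## Monte Carlo sketch in the spirit of Figure 3 (see assumptions above).
    library(glmnet)

    sigm <- function(z) 1 / (1 + exp(-z))        # stand-in for ς
    n <- 500; p <- 10; reps <- 100

    coefs <- replicate(reps, {
      X <- matrix(rnorm(n * p), n, p)
      y <- sigm(X[, 1]) * sigm(X[, 2]) + sigm(X[, 3]) * sigm(X[, 4]) +
        rnorm(n, sd = 0.1)
      as.numeric(coef(glmnet(X, y, alpha = 0, lambda = 0.1)))[-1]  # drop intercept
    })

    rowMeans(coefs)                  # mean coefficient per covariate
    1.96 * apply(coefs, 1, sd)       # reference band from the caption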

Fig. 4 The first tree of the S-Learner as described in Section 4.2. The first row in each leaf contains the number of observations in the averaging set that fall into the leaf. The second part of each leaf displays the regression coefficients. Baseline denotes the untreated base turnout of the leaf and can be interpreted as the proportion of units in that leaf who voted in the 2004 general election. Each remaining coefficient corresponds to the ATE of the corresponding mailer within the leaf. The color intensity is proportional to the treatment effect of the Neighbors treatment.
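For readers unfamiliar with the S-Learner, the sketch below shows the core idea on synthetic data: fit one model on covariates plus treatment indicators and read per-treatment effects off the treatment coefficients. The paper does this inside each leaf of a linear regression forest; plain lm() and all variable names here are hypothetical stand-ins.

    ## Minimal S-Learner sketch on synthetic data (see caveats above).
    set.seed(4)
    n <- 2000
    dat <- data.frame(
      age = rnorm(n, 50, 12),
      voted2002 = rbinom(n, 1, 0.6),
      W = factor(sample(c("control", "civic", "neighbors"), n, replace = TRUE),
                 levels = c("control", "civic", "neighbors"))
    )
    # Hypothetical turnout model: the Neighbors mailer has the largest effect
    p_vote <- with(dat, plogis(-2 + 0.03 * age + 0.4 * voted2002 +
                               0.1 * (W == "civic") + 0.4 * (W == "neighbors")))
    dat$y <- rbinom(n, 1, p_vote)

    s_fit <- lm(y ~ age + voted2002 + W, data = dat)
    coef(s_fit)[c("Wcivic", "Wneighbors")]   # estimated mailer effects vs. control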

Fig. A1 This example has a piecewise linear response with alternating slopes in the first covariate. We plot X1 against the outcome y and overlay the fitted values returned by both the Local Linear Forest and the Linear Regression Tree.
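A hypothetical reconstruction of this design is a continuous triangle-wave response in X1, whose slope alternates sign between segments; the breakpoints, slopes, and noise level below are assumptions.

    ## Piecewise-linear response with alternating slopes in X1 (illustrative).
    set.seed(5)
    n <- 1000
    x1 <- runif(n, 0, 4)
    y <- 2 * abs((x1 %% 2) - 1) + rnorm(n, sd = 0.2)   # slopes alternate +/-2
    plot(x1, y, col = "grey", pch = 16, cex = 0.5)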

Fig. H1 The next three trees of the S-Learner as described in Section 4.2. The first row in each leaf contains the number of observations in the averaging set that fall into the leaf. The second part of each leaf displays the regression coefficients. Baseline denotes the untreated base turnout of the leaf and can be interpreted as the proportion of units in that leaf who voted in the 2004 general election. Each remaining coefficient corresponds to the ATE of the corresponding mailer within the leaf. The color intensity is proportional to the treatment effect of the Neighbors treatment.

Fig. I1 The left panel shows the mean normalized variable importances from the LRF on simulated data, and the right panel shows those from the RF, each averaged over 100 Monte Carlo replications. The true outcome depends on only the first four covariates.
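The RF half of this comparison can be approximated with permutation importance from the randomForest package, as sketched below; the LRF importances in the left panel come from Rforestry and are not reproduced here, and the logistic ς stand-in is the same assumption as in the Figure 3 sketch.

    ## Permutation variable importance for an RF on data where only the
    ## first four covariates matter (stand-in for the right panel of Fig. I1).
    library(randomForest)

    sigm <- function(z) 1 / (1 + exp(-z))
    set.seed(6)
    n <- 500; p <- 10
    X <- matrix(rnorm(n * p), n, p)
    y <- sigm(X[, 1]) * sigm(X[, 2]) + sigm(X[, 3]) * sigm(X[, 4]) +
      rnorm(n, sd = 0.1)

    rf <- randomForest(X, y, importance = TRUE)
    imp <- importance(rf, type = 1)    # permutation (%IncMSE) importance
    imp / max(imp)                     # normalized importances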

Fig. J1 We run both the naive linear regression tree algorithm and the fast LRF algorithm from Section 2 for 100 Monte Carlo replications on the same data, generated according to Equation (25), Y = 0.4X1 + 2X2 − 0.9X3 + 0.25X4 + ϵ. The left panel shows the mean timing of both algorithms as the dimension is held fixed and the sample size varies; the right panel shows the mean timing as the sample size is held fixed and the dimension varies.
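A timing harness in the spirit of this experiment is sketched below. The data follow Equation (25); fit_naive() and fit_fast() are placeholders for the naive linear regression tree and the fast LRF implementation (neither is reproduced here), so the harness itself is the only concrete part.

    ## Timing-harness sketch for Figure J1 (fit_naive/fit_fast are placeholders).
    gen_data <- function(n, p = 10) {
      X <- matrix(rnorm(n * p), n, p)
      y <- 0.4 * X[, 1] + 2 * X[, 2] - 0.9 * X[, 3] + 0.25 * X[, 4] + rnorm(n)
      list(X = X, y = y)
    }

    time_one <- function(fit, n, reps = 100) {
      mean(replicate(reps, {
        d <- gen_data(n)
        system.time(fit(d$X, d$y))[["elapsed"]]
      }))
    }
    ## e.g. sapply(c(500, 1000, 2000), function(n) time_one(fit_naive, n))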
Supplemental material

Six supplemental zip archives are available for download (4.9 KB, 94 KB, 617 B, 2.2 MB, 20.6 MB, and 487.5 KB).

Data Availability Statement

Code for replication of results can be found at https://github.com/forestry-labs/RidgePaperReplication.

The Rforestry package can be found at https://github.com/forestry-labs/Rforestry.