
Understanding Bias and Variance of Learning-to-Rank Algorithms: An Empirical Framework

Article: 2009164 | Received 15 Mar 2021, Accepted 15 Nov 2021, Published online: 11 Dec 2021

References

  • Al-Maskari, A., M. Sanderson, and P. Clough (2007). The relationship between IR effectiveness measures and user satisfaction. In Proceedings of the 30th annual international ACM SIGIR Conference on Research and Development in Information Retrieval, Amsterdam, Netherlands, pp. 773–74. ACM.
  • Bernard, S., L. Heutte, and S. Adam. 2009. Towards a better understanding of random forests through the study of strength and correlation. In Emerging intelligent computing technology and applications. With aspects of artificial intelligence, 536–45. Berlin, Heidelberg: Springer.
  • Brain, D., and G. I. Webb. 2002. The need for low bias algorithms in classification learning from large data sets. In Principles of data mining and knowledge discovery, 62–73. Berlin, Heidelberg: Springer.
  • Breiman, L. 1996. Bagging predictors. Machine Learning 24 (2):123–40. doi:10.1007/BF00058655.
  • Breiman, L. 2001. Random forests. Machine Learning 45 (1):5–32.
  • Carterette, B. (2009). On rank correlation and the distance between rankings. In Proceedings of the 32nd international ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 436–43. Boston MA, USA: ACM.
  • Chapelle, O., and Y. Chang. 2011. Yahoo! learning to rank challenge overview. Journal of Machine Learning Research-Proceedings Track 14:1–24.
  • Cossock, D., and T. Zhang (2006). Subset ranking using regression. In Proceedings of the 19th Annual Conference on Learning Theory, Pittsburgh PA, USA, pp. 605–19. Springer.
  • Diaconis, P., and R. L. Graham. 1977. Spearman’s footrule as a measure of disarray. Journal of the Royal Statistical Society: Series B (Methodological) 39 (2):262–68.
  • Domingos, P. (2000). A unified bias-variance decomposition. In Proceedings of the 17th International Conference on Machine Learning (ICML), pp. 231–38. San Francisco CA: Morgan Kaufmann.
  • Dwork, C., R. Kumar, M. Naor, and D. Sivakumar (2001). Rank aggregation methods for the web. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, pp. 613–22. ACM.
  • Friedman, J. H., and P. Hall. 2007. On bagging and nonlinear estimation. Journal of Statistical Planning and Inference 137 (3):669–83.
  • Friedman, J. H. 1997. On bias, variance, 0/1—loss, and the curse-of-dimensionality. Data Mining and Knowledge Discovery 1 (1):55–77.
  • Friedman, J. H. 2001. Greedy function approximation: A gradient boosting machine. Annals of Statistics 29 (5):1189–232.
  • Ganjisaffar, Y., R. Caruana, and C. V. Lopes (2011). Bagging gradient-boosted trees for high precision, low variance ranking models. In Proceedings of the 34th international ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, pp. 85–94. ACM.
  • Ganjisaffar, Y., T. Debeauvais, S. Javanmardi, R. Caruana, and C. V. Lopes (2011). Distributed tuning of machine learning algorithms using mapreduce clusters. In Proceedings of the 3rd Workshop on Large Scale Data Mining: Theory and Applications, San Diego CA, USA, pp. 2. ACM.
  • Geman, S., E. Bienenstock, and R. Doursat. 1992. Neural networks and the bias/variance dilemma. Neural Computation 4 (1):1–58.
  • Geurts, P., and G. Louppe (2011). Learning to rank with extremely randomized trees. In JMLR: Workshop and Conference Proceedings, Boston MA, USA, Volume 14.
  • Geurts, P. 2005. Bias vs variance decomposition for regression and classification. In Data mining and knowledge discovery handbook, 749–63. Boston MA, USA: Springer.
  • Ghosal, I., and G. Hooker. 2020. Boosting random forests to reduce bias; one-step boosted forest and its variance estimate. Journal of Computational and Graphical Statistics 30 (2):493–502.
  • Hastie, T., R. Tibshirani, and J. Friedman. 2009. The elements of statistical learning. New York: Springer.
  • Hofmann, K., S. Whiteson, and M. de Rijke. 2013. Balancing exploration and exploitation in listwise and pairwise online learning to rank for information retrieval. Information Retrieval 16 (1):63–90.
  • Horváth, M. Z., M. N. Müller, M. Fischer, and M. Vechev. 2021. Boosting randomized smoothing with variance reduced classifiers. arXiv preprint arXiv:2106.06946.
  • Ibrahim, M., and M. Carman. 2016. Comparing pointwise and listwise objective functions for random-forest-based learning-to-rank. ACM Transactions on Information Systems (TOIS) 34 (4):20.
  • Ibrahim, M., and M. Murshed. 2016. From tf-idf to learning-to-rank: An overview. In Handbook of research on innovations in information retrieval, analysis, and management, ed. J. T. Martins and A. Molnar, 62–109. USA: IGI Global.
  • Ibrahim, M. 2019. Reducing correlation of random forest–based learning-to-rank algorithms using subsample size. Computational Intelligence 35 (4):774–98.
  • Ibrahim, M. 2020. An empirical comparison of random forest-based and other learning-to-rank algorithms. Pattern Analysis and Applications 23 (3):1133–55.
  • James, G. M. 2003. Variance and bias for general loss functions. Machine Learning 51 (2):115–35.
  • Järvelin, K., and J. Kekäläinen (2000). IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd annual international ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, pp. 41–48. ACM.
  • Jones, T., P. Thomas, F. Scholer, and M. Sanderson (2015). Features of disagreement between retrieval effectiveness measures. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, pp. 847–50. ACM.
  • Kohavi, R., and D. H. Wolpert (1996). Bias plus variance decomposition for zero-one loss functions. In Proceedings of the 13th International Conference on Machine Learning (ICML), Italy, pp. 275–83. Morgan Kaufmann.
  • Kong, E. B., and T. G. Dietterich. 1995. Error-correcting output coding corrects bias and variance. In Proceedings of the International Conference on Machine Learning (ICML), USA, pp. 313–21. Morgan Kaufmann.
  • Lan, Y., T.-Y. Liu, T. Qin, Z. Ma, and H. Li (2008). Query-level stability and generalization in learning to rank. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, pp. 512–19. ACM.
  • Lan, Y., T.-Y. Liu, Z. Ma, and H. Li (2009). Generalization analysis of listwise learning-to-rank algorithms. In Proceedings of the 26th Annual International Conference on Machine Learning, Montreal, Canada, pp. 577–84. ACM.
  • Lapata, M. 2006. Automatic evaluation of information ordering: Kendall’s tau. Computational Linguistics 32 (4):471–84.
  • Lin, Y., and Y. Jeon. 2006. Random forests and adaptive nearest neighbors. Journal of the American Statistical Association 101 (474):578–90.
  • Liu, T.-Y. 2011. Learning to rank for information retrieval. Berlin, Heidelberg: Springer-Verlag.
  • Manning, C. D., P. Raghavan, and H. Schütze. 2008. Introduction to information retrieval, vol. 1. Cambridge, UK: Cambridge University Press.
  • Moffat, A. 2013. Seven numeric properties of effectiveness metrics. In Information retrieval technology, 1–12. Berlin, Heidelberg: Springer.
  • Mohan, A., Z. Chen, and K. Q. Weinberger. 2011. Web-search ranking with initialized gradient boosted regression trees. Journal of Machine Learning Research-Proceedings Track 14:77–89.
  • Qin, T., T.-Y. Liu, J. Xu, and H. Li. 2010. Letor: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval 13 (4):346–74.
  • Quoc, C., and V. Le. 2007. Learning to rank with nonsmooth cost functions. Proceedings of the Advances in Neural Information Processing Systems 19:193–200.
  • Sanderson, M., M. L. Paramita, P. Clough, and E. Kanoulas (2010). Do user preferences and evaluation measures line up? In Proceedings of the 33rd international ACM SIGIR Conference on Research and Development in Information Retrieval, Geneva, Switzerland, pp. 555–62. ACM.
  • Sculley, D. (2007). Rank aggregation for similar items. In SIAM International Conference on Data Mining (SDM), Philadelphia PA, USA, pp. 587–592. SIAM.
  • Segal, M. R. (2004). Machine learning benchmarks and random forest regression.
  • Sexton, J., and P. Laake. 2009. Standard errors for bagged and random forest estimators. Computational Statistics & Data Analysis 53 (3):801–11.
  • Voorhees, E. M. 2000. Variations in relevance judgments and the measurement of retrieval effectiveness. Information Processing & Management 36 (5):697–716.
  • Wager, S., T. Hastie, and B. Efron. 2014. Confidence intervals for random forests: The jackknife and the infinitesimal jackknife. The Journal of Machine Learning Research 15 (1):1625–51.
  • Wu, Q., C. J. Burges, K. M. Svore, and J. Gao. 2010. Adapting boosting for information retrieval measures. Information Retrieval 13 (3):254–70.
  • Xia, F., T.-Y. Liu, and H. Li. 2009. Top-k consistency of learning to rank methods. Advances in Neural Information Processing Systems 22:2098–106.
  • Xia, F., T.-Y. Liu, J. Wang, W. Zhang, and H. Li (2008). Listwise approach to learning to rank: Theory and algorithm. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, pp. 1192–99. ACM.
  • Yilmaz, E., J. A. Aslam, and S. Robertson (2008). A new rank correlation coefficient for information retrieval. In Proceedings of the 31st annual international ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, pp. 587–94. ACM.
  • Zhang, G., and Y. Lu. 2012. Bias-corrected random forests in regression. Journal of Applied Statistics 39 (1):151–60.