
Combining phenotypic and genomic data to improve prediction of binary traits

Pages 1497-1523 | Received 11 Sep 2022, Accepted 22 Apr 2023, Published online: 16 May 2023
 

Abstract

Plant breeders want to develop cultivars that outperform existing genotypes. Some characteristics (here ‘main traits’) of these cultivars are categorical and difficult to measure directly, so it is important to predict the main trait of newly developed genotypes accurately. In addition to marker data, breeding programs often have information on secondary traits (or ‘phenotypes’) that are easy to measure. Our goal is to combine the two data types with variable selection techniques to improve prediction of main traits while keeping the fitted relations interpretable. However, the numerous genomic variables can overwhelm the small set of secondary traits, so a standard technique may fail to select any phenotypic variables. We develop a new statistical technique that ensures appropriate representation of both the secondary traits and the genotypic variables for optimal prediction. When both data types (markers and secondary traits) are available, we improve prediction of a binary trait in two steps designed to ensure that a significant intrinsic effect of a phenotype is incorporated in the relation before extra effects of the genotypes are accounted for. First, we sparsely regress the secondary traits on the markers and replace the secondary traits by their residuals, which capture the effects of the phenotypic variables after adjustment for the genotypic variables. Second, we build a sparse logistic classifier from the markers and the residuals so that the adjusted phenotypes can be selected first rather than being overwhelmed by the numerical advantage of the genotypic variables. This classifier uses forward selection aided by a penalty term and can be computed efficiently by a technique called the one-pass method. It compares favorably with other classifiers on simulated and real data.
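To make the two-step idea concrete, the sketch below implements a simplified analogue in Python with scikit-learn; it is not the authors' one-pass method. Step 1 residualizes each secondary trait against the markers with a lasso regression, and Step 2 fits an L1-penalized logistic regression on the markers together with the phenotype residuals as a stand-in for the paper's penalized forward-selection classifier. All data, variable names, and tuning constants (X_markers, P_pheno, alpha, C) are illustrative assumptions, not taken from the paper.

# Minimal sketch of the two-step idea from the abstract (assumptions noted above).
import numpy as np
from sklearn.linear_model import Lasso, LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Toy data: n genotypes, p markers (wide), q easy-to-measure secondary traits.
n, p, q = 200, 1000, 10
X_markers = rng.normal(size=(n, p))
P_pheno = X_markers[:, :5] @ rng.normal(size=(5, q)) + rng.normal(size=(n, q))
logits = X_markers[:, 0] + 2.0 * P_pheno[:, 0]
y_binary = (logits + rng.logistic(size=n) > 0).astype(int)

# Step 1: sparsely regress each secondary trait on the markers and keep the
# residuals, i.e. the part of each phenotype not explained by the markers.
residuals = np.empty_like(P_pheno)
for j in range(q):
    lasso = Lasso(alpha=0.1)  # sparsity level is an assumed choice
    lasso.fit(X_markers, P_pheno[:, j])
    residuals[:, j] = P_pheno[:, j] - lasso.predict(X_markers)

# Step 2: fit a sparse logistic classifier on the adjusted phenotypes together
# with the markers. An L1 penalty is used here only as a simple sparse stand-in
# for the paper's penalized forward selection ("one-pass method").
Z = np.hstack([residuals, X_markers])  # phenotype residuals listed first
Z_tr, Z_te, y_tr, y_te = train_test_split(Z, y_binary, test_size=0.3, random_state=0)
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5)
clf.fit(Z_tr, y_tr)

selected = np.flatnonzero(clf.coef_[0])
print("test accuracy:", clf.score(Z_te, y_te))
print("selected phenotype-residual columns:", selected[selected < q])
print("number of selected markers:", np.sum(selected >= q))

Note that the L1 penalty in Step 2 only mimics the sparsity of the paper's classifier, not its ordering, which is designed to let the adjusted phenotypes enter before the markers.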


Acknowledgments

The authors thank the Holland Computing Center for providing computing resources and Zhikai Liang and James Schnable for making their data and insights accessible to us.

Availability of data and materials

All code used for the computations presented here is available at https://github.com/royarkaprava/OPM.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1 We also tested RVMs with an RBF kernel and SVMs with a Laplace kernel. The available code for the former often failed to run, and the latter gave very poor results. This amounted to testing an extra eight methods.
