Original Articles

Agreement between Youth Self-Report and Biospecimen-Confirmed Substance Use: A Systematic Review


Abstract

Context

Biospecimen analysis may enhance confidence in the accuracy of self-reported substance use among adolescents and transitional age youth (TAY). Associations between biospecimen types and self-reported use, however, are poorly characterized in the existing literature. Objective: We performed a systematic review of associations between biospecimen-confirmed and self-reported substance use. Data sources: PubMed, Embase, and Web of Science. Study selection: We included studies documenting associations between self-reported and biospecimen-confirmed substance use among adolescents (12–18 years) and TAY (19–26 years) published 1990–2020. Data extraction: Three authors extracted relevant data using a template and assessed bias risk using a modified JBI Critical Appraisal Tool. Results: We screened 1525 titles and abstracts, evaluated 75 full texts for eligibility, and included 28 studies. Most studies examined urine (71.4%) and hair (32.1%) samples. Self-report retrospective recall period varied from past 24 h to lifetime use. Agreement between self-report and biospecimen results was low to moderate and was higher with rapidly metabolized substances (e.g., amphetamines) and when shorter retrospective recall periods were applied. Frequently encountered sources of potential bias included use of non-validated self-report measures and failure to account for confounding factors in the association between self-reported and biospecimen-confirmed use. Limitations: Study heterogeneity prevented a quantitative meta-analysis. Studies varied in retrospective recall periods, biospecimen processing, and use of validated self-report measures. Conclusions: Associations between self-reported and biospecimen-confirmed substance use are low to moderate and are higher for shorter recall periods and for substances with rapid metabolism. Future studies should employ validated self-report measures and include demographically diverse samples.

Introduction

Substance use is often initiated during adolescence; 38.3% and 44.0% of US students in grades 8, 10, and 12 report lifetime use of any illicit drug (including inhalants) and alcohol, respectively (Johnston et al., 2021). Although experimentation with substances is typical, neurobiological changes make adolescents particularly vulnerable to progression to problematic use (Gray & Squeglia, 2018). The potential adverse effects of substance use during adolescence are well documented, including, but not limited to, increased risk for sexually transmitted infections, unintended pregnancy, involvement in the juvenile legal system, school truancy, exacerbation of psychiatric symptoms, and physical health problems (Gray & Squeglia, 2018; Kulak & Griswold, 2019). Additionally, substance use often co-occurs with psychiatric disorders in adolescence (Brownlie et al., 2019; Welsh et al., 2017) and is associated with impaired social functioning (Marti et al., 2010).

Given the strong bidirectional associations between substance use and adverse health and social outcomes, health researchers often seek to quantify adolescent substance use (Winters & Kaminer, 2008). Such research would ideally include external verification (e.g., biospecimen) of substance use to mitigate potential self-report biases (e.g., under-reporting). However, collecting self-report is logistically more straightforward than obtaining biospecimens (e.g., urine, hair, saliva, breath), and biological assays often test only for recent use (days to weeks), whereas many studies examine lifetime or months-long usage patterns (Aarons et al., 2001). Nonetheless, for recent substance use, it is important to understand the validity of adolescent self-report of substance use compared to analysis of biospecimens (Williams & Nowatzki, 2005). The current systematic review therefore examines the correspondence between self-reported and biospecimen-confirmed substance use among adolescents and transitional age youth (TAY).

Methods

Overall study design

The current systematic review followed the guidelines described in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) (Moher et al., 2009).

Search strategy

A search was carried out in March 2021 by an information specialist using three electronic databases (PubMed, Embase, and Web of Science). A combination of medical subject headings and free-text terms was used to identify publications pertaining to adolescent substance use and biospecimen testing. To increase precision, searches were limited to publications in English with publication dates from January 1, 1990 to December 31, 2020. The reference lists of the articles selected for analysis were searched for additional pertinent publications. Full details of the search strategy are presented in the Supporting Information.

Inclusion and exclusion criteria

Eligible studies were peer-reviewed empirical articles (e.g., controlled trial, cohort study) published between 1990 and 2020 in English that included participants ages 12–26 years and assessed both youth self-report of substance use (e.g., tobacco, alcohol, cannabis, cocaine, methamphetamine, ecstasy, and other illicit substances) and biospecimen confirmation of substance use within the study sample. We excluded non-empirical reports (e.g., review, meta-analysis, commentary, opinion piece), publications written in any language other than English, and studies involving participants outside the ages of 12–26 years when data were not reported separately for our target age group.

Study selection

Study screening progress was documented in a PRISMA flow chart (Figure 1). After removing duplicates, our search strategy yielded 1525 publications. Three reviewers (JF, MH, QM) independently screened a random sample of 100 titles and abstracts and collaboratively reviewed decisions to ensure inter-rater reliability. Publications were then divided and screened by the three reviewers to determine if they met criteria for full-text review; 1450 were eliminated because of irrelevance to the topic. Full-text screening of 75 articles was completed independently by three reviewers (JF, MH, RK) to determine eligibility for inclusion. Of the 75 full texts reviewed, 47 did not meet inclusion criteria because they (reasons not mutually exclusive) were the wrong study type (e.g., systematic review; n = 10), did not present youth data separately from adult data (n = 23), or reported no measure of association between self-reported and biospecimen-confirmed substance use (n = 21). A team of three reviewers (JF, MH, RK) assessed and summarized findings from the final 28 articles. To establish inter-rater reliability, the three reviewers extracted data from three articles (11% of articles included) independently and met to compare results and resolve discrepancies through discussion. Any disagreements in the full-text review were resolved collectively through consultation and detailed examination of the study. The remaining articles were coded independently by a single reviewer. Reviewers met regularly to discuss any questions regarding the articles they coded independently to ensure consistency in decision making.

Figure 1. PRISMA flow diagram. Abbreviation: PRISMA: preferred reporting items for systematic reviews and meta-analyses.
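As a quick arithmetic check on the study selection counts, the flow can be reconciled in a few lines of Python. This is an illustrative sketch only and not part of the review's methods; the function name `reconcile_flow` and the dictionary keys are assumptions introduced here, while the counts are those reported in the study selection paragraph.

```python
def reconcile_flow(screened, excluded_at_screening, excluded_at_full_text, reasons):
    """Recompute PRISMA flow counts from the reported numbers (hypothetical helper)."""
    full_text_reviewed = screened - excluded_at_screening
    included = full_text_reviewed - excluded_at_full_text
    reasons_total = sum(reasons.values())
    # Exclusion reasons were not mutually exclusive, so their sum may exceed,
    # but should never fall below, the number of excluded full texts.
    if reasons_total < excluded_at_full_text:
        raise ValueError("exclusion reasons do not cover all excluded full texts")
    return {
        "full_text_reviewed": full_text_reviewed,
        "included": included,
        "reasons_total": reasons_total,
    }


# Counts as reported in the text.
flow = reconcile_flow(
    screened=1525,
    excluded_at_screening=1450,
    excluded_at_full_text=47,
    reasons={
        "wrong study type": 10,
        "youth data not reported separately": 23,
        "no measure of association": 21,
    },
)
print(flow)
```

With the reported counts, 75 full texts were reviewed and 28 studies were included; the exclusion reasons sum to 54 because a single article could be excluded for more than one reason.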


Data extraction

Data were extracted by three authors (JF, MH, RK) into a standardized electronic form, including information about study design, participants, self-report and biospecimen assessments, and congruence between self-report and biospecimen findings.

Quality assessment

The JBI Critical Appraisal Checklist for Analytical Cross-Sectional Studies (JBI, 2021) was used to assess the methodological quality and risk of bias of included studies. The checklist was modified to address specific areas relevant to the current review, and each study was rated on eight domains (see Table 1). Three members of the review team (JF, MH, RK) independently assessed risk of bias for each included study and resolved disagreements through discussion. Bias scores could range from 0 to 8, with higher scores reflecting a greater degree of bias and lower methodological quality. The quality assessment had no impact on study inclusion or exclusion.

Table 1. Bias risk assessment ratings.

Synthesis of results

We performed a narrative synthesis because self-report methods and biospecimen assays for substance use were heterogeneous and no single summary measure was applicable across all studies. Additionally, we tabulated the relevant kappa values (a measure of reproducibility or agreement between two measures) by biospecimen type, substance tested, and retrospective self-report recall period among studies that calculated this statistic (Table 2). The following classification was used to interpret kappa values: ≤0, no agreement; 0.01–0.20, none to slight; 0.21–0.40, fair; 0.41–0.60, moderate; 0.61–0.80, substantial; 0.81–1.00, almost perfect agreement (McHugh, 2012).
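To make the kappa statistic and its interpretation bands concrete, the following minimal Python sketch computes Cohen's kappa from a 2 × 2 table of self-report against biospecimen results and maps the value onto the categories listed above. The function names and the example counts are hypothetical and introduced only for illustration; the included studies may have computed kappa differently (e.g., with weighting or more than two categories).

```python
def cohen_kappa_2x2(both_pos, self_only, bio_only, both_neg):
    """Cohen's kappa for binary self-report vs. binary biospecimen results."""
    n = both_pos + self_only + bio_only + both_neg
    if n == 0:
        raise ValueError("table is empty")
    observed = (both_pos + both_neg) / n       # observed agreement
    self_pos = (both_pos + self_only) / n      # proportion self-reporting use
    bio_pos = (both_pos + bio_only) / n        # proportion testing positive
    # Agreement expected by chance from the marginal proportions.
    expected = self_pos * bio_pos + (1 - self_pos) * (1 - bio_pos)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def classify_kappa(kappa):
    """Map kappa onto the interpretation bands used in this review."""
    if kappa <= 0:
        return "no agreement"
    # Values between 0 and 0.01 are not covered by the published bands;
    # this sketch assumes they belong with "none to slight".
    if kappa <= 0.20:
        return "none to slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    if kappa <= 0.80:
        return "substantial"
    return "almost perfect"


# Hypothetical counts: 40 concordant positives, 10 self-report only,
# 20 biospecimen only, 30 concordant negatives.
kappa = cohen_kappa_2x2(40, 10, 20, 30)
print(round(kappa, 2), classify_kappa(kappa))
```

In this hypothetical table, observed agreement is 0.70 and chance agreement is 0.50, giving a kappa of about 0.40, which falls in the "fair" band.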

Table 2. Kappa statistics by biospecimen, substance, and self-report period among studies that reported kappa values.

Results

Characteristics of included studies

Twenty-eight articles met criteria for inclusion in the final analysis. Study designs were 80.8% (n = 21) cross-sectional and 19.2% (n = 5) longitudinal. Study sample sizes ranged from 31 to 33,313. Studies predominantly focused on adolescents only (n = 17; adolescence defined as ages 12–18 years) or adolescents and TAY (TAY defined as 19–26 years; n = 9); two studies included only TAY. Participating youth were on average 54.9% white and 66.2% male; 35.7% (n = 10) of studies included youth with involvement in the legal system. Most studies (71.4%, n = 20) used urine samples, and 32.1% (n = 9) used hair, 10.7% (n = 3) blood, 7.1% (n = 2) saliva, and 7.1% (n = 2) breathalyzer; of these studies, 25.0% (n = 7) included more than one biospecimen for comparison with self-report (included in specimen totals). Studies primarily examined congruence between biospecimens and self-reported results for cannabis (67.9%; n = 19) and cocaine (50.0%; n = 14), with smaller subsets focused on amphetamines/methamphetamines (28.6%; n = 8), opioids (28.6%; n = 8), alcohol (25.0%; n = 7), hallucinogens (17.9%; n = 5), PCP (14.3%, n = 4), nicotine (14.3%; n = 4), barbiturates (7.1%; n = 2), and other substances (10.7%; n = 3).

Methodological quality of included studies

Results of the JBI risk of bias assessment are displayed in Table 1. Risk of bias across the studies was low, on average (M = 2.0, SD = 1.2, range = 0.5–4.5). Three studies (Feucht et al., 1994; Komro et al., 1993; Oesterle et al., 2015) had ratings at or above the midpoint (sum scores ≥ 4), reflecting a moderate amount of bias. The most common sources of bias across studies were not identifying and addressing possible confounders (e.g., self-report period, site) and not adequately describing inclusion/exclusion criteria. For the purpose of this review, no articles were excluded due to risk of bias.

Concordance between self-report and biospecimens

Urine

Twenty studies examined congruence between self-report and urinalysis. Sample size ranged from 31 to 33,313 and youth were on average 72.9% male and 56.4% white; six studies included adolescents and TAY, 13 included adolescents only, and one included only TAY. Study designs were 30.0% longitudinal (n = 6). Substances examined were primarily cannabis (n = 18) and cocaine (n = 12), with fewer studies on opioids (n = 8), amphetamines (n = 8), alcohol (n = 5), benzodiazepines (n = 4), PCP (n = 4), barbiturates (n = 2), and methaqualone (n = 1).

Kappas for cannabis were reported in nine studies (Table 2), ranging from 0.11 (Yacoubian, 2001) to 0.79 (Wilcox et al., 2013). Percent discrepancy between self-report and urinalysis was reported in three studies, at 13% (Akinci et al., 2001) and 35% (Mieczkowski et al., 1998), and at 3% for those who self-reported no use, 47% for those who reported use, and 21% for those who reported abuse or dependence (Gignac et al., 2005). Intraclass correlations were reported in one study (Donohue et al., 2007) across six months, ranging from 0.47 to 0.58 based on contemporaneous self-report and 0.39 to 0.62 based on timeline follow-back report.

Regarding other substances, kappas for cocaine were reported in five studies, ranging from 0.18 (Yacoubian et al., 2003) to 0.75 (Wilcox et al., 2013) (Table 2); percent discrepancy was reported in two studies at 7% (Mieczkowski et al., 1998) and 100% (Feucht et al., 1994). Kappas ranged from 0.05 (Fendrich & Xu, 1994) to 0.72 (Wilcox et al., 2013) for opioids (reported in five studies; Table 2); 0.14 (Fendrich & Xu, 1994) to 0.83 (Solbergsdottir et al., 2004) for amphetamines (reported in three studies); 0.55 (Wilcox et al., 2013) to 0.86 (Solbergsdottir et al., 2004) for benzodiazepines (reported in two studies); 0.13 (Williams & Nowatzki, 2005) to 0.19 (Solbergsdottir et al., 2004) for alcohol (reported in two studies); and 0.18 (Fendrich & Xu, 1994) to 0.46 (Yacoubian et al., 2003) for PCP (reported in two studies; Table 2). The remaining studies combined results across biospecimens and could not be disaggregated.

Hair

Nine studies examined congruence between self-report and hair specimen analysis. Sample size ranged from 48 to 1000 and youth were on average 62.7% male and 46.1% white; four studies included adolescents only, three included adolescents and TAY, and two included TAY only. All studies were cross-sectional designs. Substances examined were cocaine (n = 5), cannabis (n = 3), alcohol (n = 2), amphetamines (n = 1), hallucinogens (n = 1), nicotine (n = 1), and other (n = 2; bath salts, caffeine). Only three studies reported kappas (Table 2), although one did not clearly define the self-report period.

For cocaine, kappas were 0.45 among methadone patients and 0.00 among criminal justice-involved individuals (Magura & Kang, 1997). Regarding percent detection, in a sample with no reported cocaine use, 1.7% tested positive on hair analysis (Bessa et al., 2010); in a sample where 0.9%, 3.2%, and 13.2% reported cocaine use in the last three days, 30 days, and ever, 22% tested positive for cocaine (Mieczkowski et al., 1998). Another study found that of those who tested positive for cocaine, 93% self-reported no use in the past three months, whereas of those who tested negative, 8% self-reported use (Dembo et al., 1999). Among six participants who reported ever using cocaine, five were negative by hair assay, and of the three who reported use in the last month, only one was positive by hair assay (Feucht et al., 1994).

Regarding cannabis, in a sample with no reported cannabis use, 4% tested positive on hair analysis (Bessa et al., 2010); in a sample where 25.3%, 50.6%, and 85.6% reported cannabis use in the last three days, 30 days, and ever, 38.5% tested positive for cannabis (Mieczkowski et al., 1998). An additional study found that of those who tested positive for cannabis, 19% self-reported no use in the past three months, whereas of those who tested negative, 35% self-reported use (Dembo et al., 1999).

For amphetamine use, percent detection ranged from 34.3% (≤90 days) to 46.4% (≤1 day) (Junkuy et al., 2014). Among participants who reported no lifetime use of “bath salts,” stimulant NPS, or unknown pills or powders, 41.2% (n = 14) tested positive for an NPS (i.e., had discordant reports) (Palamar et al., 2016). For alcohol, kappas ranged from 0.31 to 0.72 depending upon the method of analysis (Bertol et al., 2017); among heavy drinkers and high alcohol consumers, kappas were −0.05 and −0.03, respectively (Comasco et al., 2009). Kappas for caffeine and tobacco were 0.57 and 0.40, respectively (Bertol et al., 2017).

Blood

Three studies examined congruence between self-report and blood specimens. Sample size ranged from 200 to 2107 and youth were on average 42.7% male and 63.2% white; two studies included adolescents only and one included adolescents and TAY. All three were cross-sectional designs. One focused solely on alcohol (Comasco et al., 2009), finding no agreement between phosphatidylethanol analysis (venous blood sample) and a semistructured interview that classified participants as low or high alcohol consumers. One focused on nicotine (Caraballo et al., 2004), finding a sensitivity of 78.9% and a specificity of 97.3% when self-report was used as the gold standard; when serum cotinine concentration (cutoff of 11.40 ng/mL) was used as the gold standard, sensitivity was 81.3% and specificity was 96.9%. Adolescents who self-reported smoking less than one cigarette daily, on average, were 34 times more likely to have cotinine levels discrepant with their self-report than those who smoked five or more cigarettes daily. One study combined results across biospecimens and could not be disaggregated (Oesterle et al., 2015).
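Because sensitivity and specificity depend on which measure is treated as the gold standard, the same 2 × 2 table yields two different pairs of values. The Python sketch below illustrates this under stated assumptions: the function names and the counts are hypothetical and are not taken from Caraballo et al. (2004) or any other included study.

```python
def sensitivity_specificity(true_pos, false_neg, false_pos, true_neg):
    """Return (sensitivity, specificity) for a test against a reference."""
    sensitivity = true_pos / (true_pos + false_neg)
    specificity = true_neg / (true_neg + false_pos)
    return sensitivity, specificity


def accuracy_both_ways(both_pos, self_only, bio_only, both_neg):
    """Evaluate one 2x2 table with each measure as the reference in turn."""
    # Biospecimen as reference: a biospecimen-positive participant who
    # denies use is a false negative of self-report.
    self_vs_bio = sensitivity_specificity(both_pos, bio_only, self_only, both_neg)
    # Self-report as reference: a participant who reports use but tests
    # negative is a false negative of the biospecimen.
    bio_vs_self = sensitivity_specificity(both_pos, self_only, bio_only, both_neg)
    return {
        "self_report_vs_biospecimen": self_vs_bio,
        "biospecimen_vs_self_report": bio_vs_self,
    }


# Hypothetical counts: 80 concordant positives, 20 self-report only,
# 10 biospecimen only, 890 concordant negatives.
for label, (sens, spec) in accuracy_both_ways(80, 20, 10, 890).items():
    print(f"{label}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

In this hypothetical table, self-report has a sensitivity of about 0.889 against the biospecimen, whereas the biospecimen has a sensitivity of 0.800 against self-report, even though the underlying counts are identical.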

Saliva

Two studies examined congruence between self-report of nicotine use and saliva specimens in cross-sectional, adolescent-only samples (Dolcini et al., 2003; Komro et al., 1993). Sample sizes ranged from 959 to 1881; adolescents were approximately half female in both studies and 69% white (race was reported only in Dolcini et al., 2003). One study (Dolcini et al., 2003) found unadjusted sensitivity of cotinine ranging from 48.6% (self-reported smoking in past 9 h) to 88.9% (self-reported smoking today), and unadjusted specificity from 92.5% (self-reported smoking today) to 93.8% (self-reported smoking in past three days); for thiocyanate, unadjusted sensitivity ranged from 31.4% (self-reported smoking in past 9 h) to 48.9% (self-reported smoking today) and unadjusted specificity was 81.2% or 81.3% across all time periods (self-reported smoking in past 9 h through past three days). Another study examining thiocyanate found that students in the reference group claimed to be nonsmokers despite elevated thiocyanate levels (i.e., false-negative report) more frequently (10.04%) than students in the treatment group (5.96%) (Komro et al., 1993).

Breathalyzer

Two studies examined congruence between self-report and breathalyzers in cross-sectional, adolescent-only samples (Dolcini et al., 2003; Oesterle et al., 2015). Sample sizes ranged from 645 to 1881 and adolescents were approximately 43.3% male and 76.2% white. One study examined carbon monoxide, finding unadjusted sensitivity ranging from 33.3% (self-reported smoking in past 9 h) to 68.9% (self-reported smoking today) and unadjusted specificity of 98% across all time periods (self-reported smoking in past 9 h through past three days). The one study of ethanol combined results across biospecimens and could not be disaggregated (Oesterle et al., 2015).

Discussion

Summary of evidence

This systematic review reveals substantial variability in the concordance between self-reported and biospecimen-confirmed substance use among adolescents and TAY. The agreement between self-report and biospecimen-confirmed substance use was, at best, moderate, and agreement generally decreased as the length of the self-report recall period increased. Risk of bias among the included studies was low to moderate; however, the wide variability in methods precluded quantitative aggregation of findings. The most consistent source of bias across studies was not using a validated self-report measure of substance use.

The largest number of studies compared self-reported use with urinalysis. Different substances, however, have different periods of detection and sensitivity by urinalysis, particularly for infrequent use, and as such the concordance between self-report and urinalysis differs by substance. For instance, daily cannabis use may be detected in urine for up to a month following last use, whereas more rapidly metabolized drugs such as cocaine may be undetectable in urine within days of last use. Therefore, a negative biospecimen result in an adolescent who reports use may indicate that the biospecimen test is not sensitive enough. Of note, agreement for very recent recall (past 2 days) of benzodiazepine and amphetamine use was excellent; however, this was documented in a single study conducted among 100% white and predominantly (68%) male treatment-seeking patients in a single treatment center (Solbergsdottir et al., 2004). Other included studies had greater diversity in terms of sex and race/ethnicity. In addition, a substantial number of studies included youth involved in the legal system, for whom accurate substance use assessment has substantial legal implications.

Studies also used different methods of analyzing biospecimens, which may contribute to variability in concordance between self-report and biospecimen analysis. For urinalysis, as an example, studies used techniques including, but not limited to, analyte immunoassay (Dillon et al., 2005), fluorescence polarization immunoassay and a paper chromatography screen (Williams & Nowatzki, 2005), and the Syva EMIT enzyme immunoassay technique (Donohue et al., 2007); some confirmed positive tests with gas chromatography/mass spectrometry (e.g., Buchan et al., 2002; Yacoubian, 2001). Thresholds for detection were not consistently reported, although this detail is crucial for reproducibility and for understanding differences in concordance across studies.

Strengths

This study is the first systematic review providing a rigorous evaluation of the evidence related to the association between self-reported and biospecimen-confirmed substance use among adolescents and TAY. With the assistance of a medical librarian we searched multiple databases for relevant studies over the past three decades. In addition, we applied a structured risk-of-bias evaluation of each included study. We applied PRISMA guidelines in our presentation of results. Our summary of available kappa values illustrates pertinent patterns in associations between self-reported and biospecimen-confirmed substance use. Participants in the studies were recruited from diverse settings, including substance use treatment facilities, the legal system, and educational environments; thus, our findings may be applicable to a wide range of youth cohorts.

Limitations

Our findings suggest the need for more systematic use of objective measures of youth substance use, as overall the concordance was low for most specimen types, particularly beyond a seven-day self-report period. The diverse nature of the studies (e.g., self-report measure used, mode of administration for self-report and biospecimen collection method and threshold, comparison statistics presented) and the limited number of studies per biospecimen and substance type limited our ability to conduct a meta-analysis to examine the relative impact of study and sample characteristics on concordance. Our search strategy of published sources was comprehensive, although it is possible some studies were missed as a result of excluding non-English-language publications. In addition, we limited our study to include only published studies, which may be subject to publication bias. It is possible inclusion of non-published data might yield different results, particularly given that published studies are often more likely to have found statistical significance.

Recommendations for future research

Whenever possible, studies examining the associations between self-reported and biospecimen-confirmed substance use among adolescents and TAY should use validated self-report measures. Multiple brief self-report tools are available, many of them free, and can be integrated into routine clinical practice and research (Gray & Squeglia, 2018). Of note, the Screening to Brief Intervention (S2BI) (Levy et al., 2014) queries frequency of use of eight drugs during the past year, yielding high sensitivity and specificity for identifying use and substance use disorders. The Brief Screener for Tobacco, Alcohol, and Other Drugs (BSTAD) (Kelly et al., 2014) assesses frequency of use during the past year and provides optimal cutoff points for identifying substance use disorders.

Furthermore, studies should explore differences in agreement between self-report and biospecimen results for various self-report recall durations and biospecimen types, ideally within the same cohort. These data would help clinicians and researchers identify optimal testing parameters for a substance given the specific clinical or research question. More data are needed to inform real-world applications; to that end, studies should examine associations in clinical and non-clinical settings. Acceptance of biospecimen testing in voluntary settings (e.g., when not court mandated) is dependent on patient perceptions; future studies should integrate patient preference for testing modalities (Gassman et al., 2016).

Conclusions

This systematic review summarizes three decades of research examining the associations between self-reported and biospecimen-confirmed substance use among adolescents and TAY. Included studies varied widely in their methods and presentation of results, thereby impeding straightforward synthesis across studies. Overall, associations between self-reported and biospecimen-confirmed use were in the low-to-moderate range and tended to be higher when shorter self-report recall periods were used and among substances with rapid metabolism. Future research should assess self-reported substance use with validated measures and should evaluate biospecimen analysis both in combination with self-report and on its own, to clarify how each technique performs in estimating rates of substance use.

Clinical trials registration

N/A

PRISMA = preferred reporting items for systematic reviews and meta-analyses

TAY = transitional age youth

Disclosure of interest

The authors have indicated they have no financial relationships relevant to this article to disclose.

Additional information

Funding

This publication was supported by the National Institute on Drug Abuse (K23DA050798, PI: Folk) and the National Center for Advancing Translational Sciences, National Institutes of Health, through UCSF-CTSI (UL1 TR001872). Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the NIH.

References