Evidence Base Update

Evidence Base Update on the Assessment of Irritability, Anger, and Aggression in Youth


ABSTRACT

Objective

Irritability, anger, and aggression have garnered significant attention from youth mental health researchers and clinicians; however, fundamental challenges of conceptualization and measurement persist. This article reviews the evidence base for assessing these transdiagnostic constructs in children and adolescents.

Method

We conducted a preregistered systematic review of the evidence behind instruments used to measure irritability, anger, aggression, and related problems in youth. Searches were conducted in PsycINFO and PubMed, identifying 4,664 unique articles. Eligibility criteria focused on self- and proxy-report measures with peer-reviewed psychometric evidence from studies in English with youths ages 3–18. Additional measures were found through ancillary search strategies (e.g. book chapters, review articles, test publishers). Measures were screened and coded by multiple raters with acceptable reliability.

Results

Overall, 68 instruments met criteria for inclusion, with scales covering irritability (n = 15), anger (n = 19), aggression (n = 45), and/or general overt externalizing problems (n = 27). Regarding overall psychometric support, 6 measures (8.8%) were classified as Excellent, 46 (67.6%) were Good, and 16 (23.5%) were Adequate. Descriptive information (e.g. informants, scales, availability, translations) and psychometric properties (e.g. reliability, validity, norms) are summarized.

Conclusions

Numerous instruments for youth irritability, anger, and aggression exist with varying degrees of empirical support for specific applications. Although some measures were especially strong, none had uniformly excellent properties across all dimensions, signaling the need for further research in particular areas. Findings promote conceptual clarity while also producing a well-characterized toolkit for researchers and clinicians addressing transdiagnostic problems affecting youth.

Consider a child who yells at their teacher, explodes in rage at their parent, and gets in fights with their peers. When such behaviors occur, it usually does not take long for the adults in the child’s life to recognize that there is a problem. These behaviors are disruptive, stressful, even scary, and, if left unaddressed, can have significant long-term consequences. Importantly, the child and others in their life are likely to experience distress, impairment, and unhappiness in relation to these behaviors, underscoring the need for clinical and research attention.

Problems like these—temper loss, aggression, irritability, arguing, anger, defiance, etc.—have been mapped empirically as overt externalizing behaviors (Frick et al., Citation1993). Usually beginning in childhood, overt externalizing behaviors represent some of the most common youth mental health concerns, both in the population (Ghandour et al., Citation2019; Polanczyk et al., Citation2015) and in clinical settings (Evans et al., Citation2022; Freeman et al., Citation2016; Olfson et al., Citation2014). These problems can predict the development of numerous poor developmental outcomes, including school suspension and expulsion, delinquency, substance use problems, depression, anxiety, severe antisocial behaviors, impaired occupational and relational functioning, and adverse physical health outcomes (Burke et al., Citation2005, Citation2014; Loth et al., Citation2014; Rowe et al., Citation2010; Stringaris & Goodman, Citation2009; Odgers et al., Citation2007, Citation2008). Over time, overt externalizing problems have enormous ramifications for the developmental trajectories of individual youths, with large costs to families, society, health care, and justice systems (Goulter et al., Citation2023; Hawes et al., Citation2023). Thus, early identification of overt externalizing problems is critical.

Although the observable nature of overt externalizing behaviors may make it obvious that there is a problem, what is often less clear is what kind of problem it is, and what to do about it. How should professionals understand, measure, and communicate about the challenges that affect youths with externalizing problems? Considering the heterogeneity of these behaviors, even when the manifest behaviors look similar, important distinctions can be made based on factors ranging from the context in which they occur (e.g., home vs. school), to whether they are accompanied by physical aggression.

Further, it is critical to consider the affective experiences underlying overt behaviors. Disruptive behaviors are typically accompanied by emotional, social, and cognitive patterns that may represent targets for intervention and prevention. For example, irritability can precipitate aggressive behaviors, but this is not the case for most instances of feeling irritable, nor for most chronically irritable youths (Brotman et al., Citation2017). Nevertheless, chronic irritability is a predictor of anxiety, depression, oppositional defiant disorder (ODD), suicidality, and various forms of impairment (Brotman et al., Citation2006; Rowe et al., Citation2010; Stringaris & Goodman, Citation2009; see Vidal-Ribas et al., Citation2016, for a meta-analysis). Problems with irritability and anger are important affective dimensions of the overt externalizing spectrum. These difficulties are often impairing, under-identified, and warrant attention both within and beyond the context of behavioral dimensions that are more easily observed.

The issues outlined above are fundamental questions of conceptualization and measurement. To help address these issues, both conceptually and practically, we conducted a systematic review of instruments for assessing problems on the overt externalizing spectrum, including its affective and behavioral dimensions. It is important to note that this paper adopts a transdiagnostic framework and is therefore not about assessing for particular disorders, such as ODD or Conduct Disorder (CD).Footnote1 In research and practice, assessment and diagnosis are closely linked. But diagnostic classification systems—including the American Psychiatric Association’s (Citation2013; Citation2022) Diagnostic and Statistical Manual (DSM) and the World Health Organization’s (Citation2022) International Classification of Diseases (ICD)—have a way of driving and reifying prevailing models of psychopathology (Hyman, Citation2010). Many have argued that DSM and ICD fail to provide a nosological home for youth experiencing problems with irritability and related problems (e.g., Lochman, Evans, et al., Citation2015; Mikita & Stringaris, Citation2013; Stepanova et al., Citation2022). The DSM and ICD dedicate entire blocks to disorders of anxiety, disorders of low mood, and so on; but there is no section for disorders of irritability, anger, or aggression.Footnote2 Instead, these problems are associated with many different categories across the landscape of psychopathology (e.g., for irritability, ODD, PTSD, ADHD, intermittent explosive disorder, autism spectrum disorder, anxiety and depressive disorders); at the same time, they are not necessary or sufficient for any particular diagnosis (Evans et al., Citation2017; Stepanova et al., Citation2022).
This creates a significant problem in research and clinical contexts when the focus is on irritability, anger, and/or aggression, as these are not adequately captured by any one diagnosis and instead are scattered in different ways across many categories (Evans et al., Citation2023).

Compared with the wealth of research on longstanding diagnostic categories, relatively little attention has been given to measurement issues independent of existing nosologies. In light of these gaps, the present systematic review adopted a transdiagnostic conceptualization. Transdiagnostic approaches to psychopathology have gained traction in recent years, influencing both research and practice. These models provide a framework for understanding mental health, emphasizing problems, principles, and approaches that tend to cut across diagnostic categories (Dalgleish et al., Citation2020; Sauer-Zavala et al., Citation2017). By better addressing clinical heterogeneity, these approaches could lead to greater efficiency and lower costs in mental health care (Dalgleish et al., Citation2020). For youth, transdiagnostic approaches to assessment, intervention, and clinical research have been evolving to better address high rates of comorbidity, complex contextual factors, and shifting symptom profiles across development (Ehrenreich-May & Chu, Citation2014; Marchette & Weisz, Citation2017). Irritability, anger, and aggression are among the most prominent transdiagnostic features in youth mental health, as they can relate specifically to traumatic stress, mood disturbance, anxiety, sleep problems, personality, and externalizing disorders, to name a few (Evans et al., Citation2023). To clearly separate these transdiagnostic features from diagnostic categories, we focus on the evidence base for measures of youth emotional and behavioral problems explicitly apart from the criterial definitions given by DSM. By doing so, we aimed to contribute a less biased and more comprehensive review of instruments that were truly designed and tested for measuring these constructs while also acknowledging their interrelations.

Relations Among Irritability, Anger, and Aggression

The last two decades have seen an explosion in youth irritability research. This growth has been spurred on by converging bodies of work on the boundaries of pediatric bipolar disorder (Leibenluft, Citation2011), irritability in ODD (Evans et al., Citation2017), and closely related phenomena of tantrums (Wakschlag et al., Citation2012), dysregulation (Althoff et al., Citation2010), and “rages” or outbursts (Carlson et al., Citation2009). Much has been gained and learned. Youth irritability has even left its mark on the American Psychiatric Association’s (Citation2013) DSM-5 via the addition of Disruptive Mood Dysregulation Disorder (DMDD; Roy et al., Citation2014) and on the World Health Organization’s (Citation2022) ICD-11 via a new subtype for chronic irritability in ODD (Evans et al., Citation2017).

As with any burgeoning new area of research, questions regarding definition and measurement of irritability have been a sticking point. Given the paucity of irritability measures initially, some of the transformative early work (e.g., Brotman et al., Citation2006; Stringaris & Goodman, Citation2009) required researchers to piece together a handful of available items to measure youth irritability on a conceptual and post hoc basis. Since then, several new measures have been developed and evaluated specifically for irritability. A synthesis of the evidence is now needed.

There now seems to be reasonably strong agreement in the field coalescing around a definition of irritability as an increased proneness to anger, which can lead to aggression but often does not (Barata et al., Citation2016; Brotman et al., Citation2017; Toohey & DiGiuseppe, Citation2017; Vidal-Ribas et al., Citation2016).Footnote3 This definition requires two more definitions. First, anger has been defined as an uncomfortable, transient, negatively valenced emotional state, ranging from mild irritation to intense rage, with physiological, cognitive, phenomenological, and behavioral aspects such as arousal, hostility, and predisposition to aggression (Eckhardt et al., Citation2004; Sukhodolsky et al., Citation2016). Second, aggression is an intentional overt behavior that can result in harm to self or others, including subtypes characterized by irritable/angry affect—i.e., reactive, impulsive, hostile, “hot-blooded” aggression—as differentiated from “cold-blooded” types (i.e., proactive, instrumental; Fite et al., Citation2018; Sukhodolsky et al., Citation2016). Clinically, these problems cluster together in the same multidimensional space—overt externalizing problems—that comprises parts of diagnostic criteria for ODD, CD (Frick et al., Citation1993), and many other disorders (Evans et al., Citation2017, Citation2023).

These definitions of irritability, anger, and aggression reveal a fundamental problem: It is difficult to characterize any one of these constructs without invoking the others. Empirically, rating scale measures of youth irritability, anger, and aggression (especially reactive aggression) correlate with each other with medium to large effect sizes, with most correlations falling between 0.3 and 0.8 when rated by the same informant (Evans et al., Citation2016, Citation2020, Citation2021; Ezpeleta et al., Citation2020; Zik et al., Citation2022).Footnote4 Such patterns make it unclear whether a particular measure is really capturing the intended construct as opposed to a closely related construct or facets of a larger phenomenon (e.g., whether irritability scales actually measure irritability vs. anger or emotion dysregulation). Indeed, a close, item-level inspection often reveals similar if not overlapping item content across measures of purportedly different constructs (e.g., measures of all three constructs tend to include items related to getting angry).

The picture becomes even murkier when we consider the role of informant variation. Cross-informant correlations in this space tend to be small to medium (Evans et al., Citation2021; Ezpeleta et al., Citation2020; Zik et al., Citation2022), comparable to cross-informant correlations in youth psychopathology in general (De Los Reyes et al., Citation2015). It is likely that irritable, angry, and/or aggressive behaviors manifest in different ways in different contexts, or to different degrees (e.g., aggressive outbursts sometimes occur at home but not at school), and that each informant offers a distinct perspective that is more than just measurement error. The available evidence suggests that more variation in these variables can be accounted for by informant perspective than by differences between the constructs, but this evidence is relatively limited and new, as are the tools behind it. Clearly further work is needed to better define, measure, and model these phenomena as being related to one another, while also acknowledging their differences, within and across informants. Such work could bring researchers closer to a consistent approach, a shared understanding, language, and toolkit for meaningfully advancing research on the disruptive emotions and behaviors in youth (Althoff & Ametti, Citation2021). Accordingly, we hope to provide a comprehensive and practical review of instruments that researchers and professionals can use to assess problems across the irritability-anger-aggression spectrum and from different informant perspectives.

The Current Evidence Base and the Present Review

Many tools are available for assessing overt externalizing emotional and behavioral problems in youth. At a broadband level, there are comprehensive, normed, multi-informant instruments such as the Child Behavior Checklist (CBCL; Achenbach & Rescorla, Citation2001) and the Behavior Assessment System for Children (BASC; Reynolds & Kamphaus, Citation2015) that provide a picture of youths’ externalizing problems broken into subscales such as aggression, rule-breaking behaviors, and anger control (as well as internalizing, total, and other problems). Measures like the Eyberg Child Behavior Inventory (ECBI; Eyberg & Ross, Citation1978) have long been used to assess disruptive behaviors at baseline and throughout treatment. More recently, the field has seen more rating scales targeted for different types of aggression (e.g., Raine et al., Citation2006), anger (e.g., Brunner & Spielberger, Citation2009), and irritability (e.g., Stringaris et al., Citation2012), including developmentally typical and atypical problems from early childhood (e.g., Wakschlag et al., Citation2012) through adolescence and adulthood (e.g., Butcher et al., Citation1992). However, these measures draw from different literatures and disciplines (e.g., psychiatry, pediatrics, education, and clinical, counseling, developmental, social, and personality psychology). It is easy to overlook relevant work from outside one’s own discipline, justifying this comprehensive review.

Several existing literature reviews are worth noting. A few papers have recently summarized the evidence base on measures pertaining to irritability, outbursts, or emotion dysregulation (Adrian et al., Citation2011; Althoff & Ametti, Citation2021; Carlson, Singh, et al., Citation2022; Freitag et al., Citation2023; Mazefsky et al., Citation2021). Others have focused on more traditional conceptualizations of disruptive and externalizing problems (Becker-Haimes et al., Citation2020; Burke et al., Citation2023; Collett et al., Citation2003; Evans et al., Citation2024; McMahon & Frick, Citation2005; Steiner & Remsing, Citation2007; Walker et al., Citation2020). Still others have put forth methodological reviews on anger and aggression in youth (Blake & Hamrin, Citation2007; Feindler & Engel, Citation2011; Hubbard et al., Citation2010; Kerr & Schneider, Citation2008). We consulted sources like these (see Table S2 for full list) when conducting the present review, to identify leads and trends in the literature. However, existing reviews have had different or narrower foci than the present review and have identified key limitations in the available evidence. For example, one evidence base update on measures of emotion regulation concluded that most measures had “unsatisfactory” psychometric properties (Mazefsky et al., Citation2021). Others have concluded that new directions in assessment are needed, including different psychological domains, social contexts, and informant perspectives on irritability (Stringaris et al., Citation2018). A synthesis of irritability, anger, and aggression measures is needed to advance measurement in clinical and research settings.

Given these gaps and the assessment challenges facing researchers and clinicians, the goal of this article was to provide a comprehensive review of the evidence base behind measures for assessing overt externalizing affective and behavioral problems. Specifically, we sought to (a) identify all measures relevant to assessing irritability, anger, and/or aggression in youth; (b) summarize the characteristics of these measures; and (c) evaluate their psychometric properties according to the available evidence. To our knowledge, this is the first review of its kind in terms of its transdiagnostic conceptualization of the topic, practical orientation for assessment, and its integrative, interdisciplinary, and comprehensive scope. Our hope is that it may serve as a practical resource and toolkit for researchers and clinicians.

Methods

Procedures were developed following the guidelines put forth by De Los Reyes and Langer (Citation2018) for evidence base updates (EBU) on assessment—in particular, for broad reviews concerning transdiagnostic constructs (p. 362). Broadly, the review involved locating measures that met our criteria and empirical articles examining those measures, then extracting descriptive information and evaluating the psychometric evidence according to established rubrics (Hunsley & Mash, Citation2008; Youngstrom et al., Citation2017). Our approach was informed by recent EBU reviews on assessing parenting stress (Holly et al., Citation2019), youth sleep (Van Meter & Anderson, Citation2020), emotion regulation (Mazefsky et al., Citation2021), anxiety (Etkin, Lebowitz, et al., Citation2021; Etkin, Shimshoni, et al., Citation2021), and brief, free, and accessible youth mental health measures (Becker-Haimes et al., Citation2020). This project was pre-registered with PROSPERO (ID: CRD42021283556). Title/abstract and full-text screenings were carried out using the Covidence online review software (https://www.covidence.org/).

Search Methods

Electronic Database Searches

Primary searches for peer-reviewed articles were conducted in PsycINFO and PubMed in October 2021. Search queries were designed to be comprehensive for the constructs of interest, with exact parameters tailored by database. Broadly, the target was peer-reviewed empirical articles that touched on at least one relevant conceptual term (e.g., irritab*, aggress*) AND a methodological term (e.g., assess*, valid*) AND an age term (e.g., child*, adolesc*). The full search strings for both databases are reported in the Supplement (Table S1). These searches yielded 5,412 results. After removing 748 duplicates that overlapped between the two databases, 4,664 unique articles remained and were subjected to screening (described below), ultimately leading to 55 measures identified via this strategy. See Figure 1 for the full PRISMA flow diagram.

Figure 1. PRISMA flow diagram for present systematic review of measures of youth irritability, anger, and aggression.

Note: Adapted from PRISMA flow diagram template (Page et al., Citation2021; https://prisma-statement.org/)

Ancillary Search Strategies

While the primary searches described above focused on articles about relevant measures (Table S1), we also employed a variety of strategies to identify measures directly (see Table S2). First, we examined recent review papers and chapters summarizing lists of instruments relevant for assessing at least one of the problems of interest (e.g., Althoff & Ametti, Citation2021; Blake & Hamrin, Citation2007; Kerr & Schneider, Citation2008; Walker et al., Citation2020). Second, we reviewed several online listings of assessment measures organized by professional, scientific, and health organizations (e.g., American Academy of Child and Adolescent Psychiatry [AACAP], Citation2022; American Psychiatry Association [APA], Citation2022b; Wikiversity, Citation2022). Third, we browsed and searched the sites of major commercial test publishers (e.g., Pearson, Psychological Assessment Resources, Multi-Health Systems). Fourth, we searched the Health and Psychosocial Instruments (HaPI) database for additional measures to evaluate for eligibility (search terms in Table S1). Finally, a few relevant measures were identified serendipitously during searching and coding. When identified through any of these ancillary strategies, new candidate measures were subjected to the same evaluation for eligibility as all other measures. Taken together, these strategies yielded an additional 13 measures that met criteria, for 68 in total (see Figure 1).

Follow-Up Searches

Once a measure was identified as potentially relevant, additional targeted literature searches were conducted specific to that measure. This practice ensured that there was sufficient evidence to carry the measure forward to the next stage for further evaluation and coding. For a measure to be included, we required that there be at least 3 peer-reviewed articles reporting original empirical results on that measure’s psychometric properties, following other recent EBU papers (Holly et al., Citation2019; Van Meter & Anderson, Citation2020). When measures fell just shy of this 3-paper threshold, additional searches were conducted to confirm no relevant papers had been missed prior to exclusion. These targeted searches were conducted with an aim toward fully characterizing the evidence base for included measures. Following procedures similar to those used by Holly et al. (Citation2019, p. 687), once measures were identified as relevant and having at least 3 articles, we then sought to locate an initial batch of ~10 articles (or as many as possible if 10 could not be found) that were the most appropriate sources reporting on the psychometric resultsFootnote5 for each measure. At this stage, we assigned preliminary ratings and assessed whether this initial batch of articles was sufficiently exhaustive or representative for the measure. Additional targeted searches were conducted as needed to capture a more complete and representative sampling of articles in order to allow us to rate all psychometric properties.

To keep the review manageable and consistent, we imposed an upper limit such that no more than 20 unique and relevant sources (i.e., all meeting our criteria) would be reviewed to inform the ratings for any given measure. Ultimately, we reviewed a mean of 9.9 total sources (8.8 peer-reviewed) for each measure, with individual measures ranging from a minimum of 3 sources (Rage Attacks Questionnaire [RAQ]; Budman et al., Citation2003) to a maximum of 19 sources (Adjustment Scales for Children and Adolescents/Preschool Intervention [ASCA/ASPI]; Lutz et al., Citation2002; McDermott, Citation1993). Examples of measures that were screened out due to insufficient evidence and other reasons are provided in Table S3.

During these searches, we sometimes had to decide which articles were most relevant from among the many that reported results on the same measure. To accomplish this, we prioritized our eligibility criteria (e.g., youth samples; see below) and the following considerations (roughly in order of priority): (a) psychometric papers focusing directly on the measure of interest; (b) studies with large, diverse, and varied samples; (c) collectively selecting a variety of study types and designs to inform evaluations ranging from validity and reliability to treatment sensitivity and clinical utility; and miscellaneous other factors including (d) diversity and independence of research teams, to avoid solely relying on papers from the measure’s developer; (e) inclusion of more recent studies alongside original measure validation studies; and (f) emphasis on papers that were more highly cited or appeared in more impactful journals. Ultimately, these considerations would only affect the more widely used and studied measures, as we tried to include all available relevant papers until we reached approximately 10–20 papers for a given measure. Occasionally, test reviews and methodological reviews were found to be highly relevant to the focal measures, and these were included as well. Finally, we conducted additional searches for websites, test manuals, and review papers, and follow-up searches in Google Scholar as needed (see Table S1). Prior to finalizing ratings, we reviewed all sources for information that might have provided evidence for a higher rating in any psychometric category. Table S5 lists all sources reviewed by measure.

In sum, for each measure included in this review, we sought to find a robust and varied selection of peer-reviewed evidence that would speak to its psychometric properties to the fullest extent possible. Thus, our evaluations of each measure can be construed as the highest level of psychometric performance that the instrument appeared to reach—beyond which either we were unable to locate additional relevant articles, or including more articles would be unlikely to change any of its psychometric ratings.

Measure Eligibility

A Priori Inclusion and Exclusion Criteria

To be included, measures had to (a) assess irritability, anger, aggressive behavior, or a closely related transdiagnostic construct (e.g., hostility, rage, undifferentiated overt externalizing problems); (b) use self-report and/or proxy-report (e.g., teacher, parent, clinician) collected via rating scale, checklist, questionnaire, or interview; (c) have been used in youth samples with a mean age between 3.0 and 18.9 years, not including undergraduate samples; (d) have been written, administered, and evaluated in English; and (e) have been examined in at least three peer-reviewed empirical articles reporting psychometric information, consistent with all of the above criteria (i.e., focus of measurement, informant, methods, sample age, English language).

Measures were excluded if they did not meet all the above criteria, and specifically if they (a) were directly tied to DSM diagnostic criteria for a mental health disorder including ODD, CD, DMDD (because the focus was on transdiagnostic constructs); (b) used vignettes, hypotheticals, projective tasks, etc., or focused on prediction of risk (because the focus was on actual patterns of behaviors and emotions, not indirect inferences about them); (c) relied on peer nominations or behavioral observations at school or home (because the focus was on tools reasonably implemented at the individual level, not requiring social environment); (d) focused on other areas, such as parenting, delinquency, or mood disorders (because these were not the constructs of interest, although related); or (e) did not have enough evaluation in peer-reviewed articles (i.e., dissertations, chapters, and presentations were excluded).

Ad Hoc Eligibility Criteria

Some situations arose that required clarifications and additions to our a priori eligibility rules. First, single-item measures were excluded because they do not permit robust measurement or psychometric evaluation (e.g., no estimation of internal consistency or measurement error). Second, measures were excluded if they used items drawn from an existing instrument to measure a relevant construct, rather than being designed and developed to measure the relevant construct originally. Third, we identified some measures that appeared to fall out of common use several decades ago. To promote the review’s relevance, measures were excluded if all their articles were published >25 years ago. Fourth, many measures evolved over the years, and again, we aimed to provide a current review. We decided to prioritize the current version of each measure to determine its relevance for inclusion (however, once a measure was included, we examined its entire evidence base cumulatively). Finally, clarifications were made to differentiate constructs of interest from closely related constructs. For example, we prioritized aggression measures and excluded bullying measures by attending to terminology in the measures and evaluating them against the definition of bullying.Footnote6 Overall, these criteria led to excluding some measures that may be of interest. See Table S3 for all exclusionary criteria, operational definitions, and examples of measures that were excluded.

Measure Content

Decisions about eligibility were made based on measures’ titles, descriptions of their intended purpose, and direct inspection of the item content. Scales and subscales that were named and described using terms like irritability, anger, and aggression were typically included, whereas measures defined by terms that were more ambiguous or broad (e.g., externalizing, conduct, disruptive behavior) were borderline cases that required closer evaluation to decide about inclusion or exclusion. To do this, scales were coded conceptually by two authors (SCE, ARK) according to whether each item was an indicator of irritability, anger, or aggression specifically; or relevant to overt externalizing problems broadly; or tapping other constructs and therefore not relevant. Scales were included if ≥50% of their items were broadly relevant and ≥25% of their items specifically mapped to irritability, anger, or aggression. On this basis, broad externalizing scales like the Strengths and Difficulties Questionnaire (SDQ; Goodman, Citation1997) Conduct Problems scale and the Pediatric Symptom Checklist (PSC; Jellinek & Murphy, Citation1988) Externalizing Problems scale were included, whereas other externalizing measures (see Table S3) were excluded as they were coded as having minimal relevance to irritability, anger, and/or aggression.
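The content-coding rule described above can be expressed as a simple check. This is an illustrative sketch only: the function name and coding labels are our own, and it assumes that items coded as specific indicators also count toward the broader relevance threshold.

```python
def scale_meets_content_criteria(item_codes):
    """Decide scale inclusion from per-item conceptual codes.

    item_codes: one label per item, each coded as
      'specific' (indicator of irritability, anger, or aggression),
      'broad'    (relevant to overt externalizing problems generally), or
      'other'    (tapping other constructs; not relevant).
    Labels and function name are illustrative, not from the review itself.
    """
    n = len(item_codes)
    specific = sum(1 for c in item_codes if c == "specific")
    # Assumption: specifically coded items also count as broadly relevant.
    broad = specific + sum(1 for c in item_codes if c == "broad")
    # Inclusion rule: >=50% of items broadly relevant AND >=25% of items
    # specifically mapping to irritability, anger, or aggression.
    return broad / n >= 0.50 and specific / n >= 0.25
```

For example, a 10-item scale with 3 specific, 3 broad, and 4 irrelevant items would be included (60% broadly relevant, 30% specific), whereas one with 2 specific, 2 broad, and 6 irrelevant items would not (40% broadly relevant).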

Article Screening and Measure Evaluation Procedures

The project proceeded from title/abstract screening, to full-text screening, to coding (see Figure 1). These steps were carried out by the authors, all of whom have bachelor’s-level or graduate-level experience in psychology research, led by a doctoral-level clinical child psychologist (SCE). Weekly team meetings were held to allow for discussion, troubleshooting, and to promote reliability throughout screening and coding. All titles were reviewed by the lead author prior to exclusion or inclusion.

Title/Abstract Screening

The 4,664 unique article results were screened for eligibility based on broad features discernible from titles and abstracts (e.g., empirical paper, youth sample, relevant variables). During initial practice rounds, five screeners achieved perfect agreement on 10 of 20 training articles, with fair-to-moderate overall reliability between pairs of individual screeners (M pairwise agreement = 75%, range: 60–85%; M κ = 0.55, range: 0.28–0.72). Discrepancies were resolved by consensus, and procedures were clarified and refined. A second training batch was then screened, with perfect agreement on 14/20 articles and moderate-to-substantial pairwise agreement (M = 85%, range: 80–90%; M κ = 0.62, range: 0.50–0.84). Title/abstract screening was then performed independently, leading to the exclusion of 3,852 articles and leaving 812 for full-text review.
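As a reference point for the κ values reported above, Cohen’s κ corrects raw percent agreement for the agreement expected by chance given each rater’s marginal rates. A minimal illustration follows (standard formula; this is not the screening team’s actual code, and the example decisions are invented):

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters over the same items:
    kappa = (p_o - p_e) / (1 - p_e), where p_o is observed agreement
    and p_e is chance agreement from the raters' marginal frequencies.
    (Undefined when p_e == 1, i.e., both raters use a single category.)"""
    assert len(rater_a) == len(rater_b) and rater_a
    n = len(rater_a)
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[k] * cb[k] for k in ca) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Two hypothetical screeners' include (I) / exclude (E) calls on 10 abstracts:
a = list("IIIEEEEEEE")
b = list("IIEIEEEEEE")
print(round(cohens_kappa(a, b), 2))  # 0.52
```

Note that 80% raw agreement here shrinks to κ ≈ 0.52 once chance agreement (dominated by the frequent "exclude" decisions) is removed, which is why κ was reported alongside percent agreement.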

Full-Text Review

All 812 articles were reviewed by two screeners, with discrepancies resolved by the first or second author. During an initial training phase, a kappa of at least 0.61 (“substantial”) with the lead author’s coding was set as the benchmark for reliability. The team approached but did not uniformly meet this target. All articles were therefore subjected to full-text review by two screeners: agreement across all screener pairs was moderate (79%, κ = 0.56), and each pair included at least one screener who had met the reliability benchmark, among whom agreement was substantial (82%, κ = 0.63). This process excluded 494 studies, leaving 318 studies considered relevant for inclusion.

Measure Review Processes

Measure Identification

The 318 included articles were reviewed to extract the names and citations of the relevant measure(s) they focused on. This list of measures was then reviewed collectively and iteratively, including additional searches to ascertain the full evidence base for measures under consideration. From this process, 55 measures emerged as meeting criteria for inclusion. These were later joined by 13 additional measures identified through other sources. Thus, a total of 68 measures were included in the review.

Psychometric Evaluation and Data Extraction

As described above, included measures were subjected to additional searches to ensure the evidence base was adequately represented. This resulted in a representative set of sources to review for each measure. Articles were reviewed following the rubrics developed for evaluating assessment instruments (De Los Reyes & Langer, 2018; Hunsley & Mash, 2008; Youngstrom et al., 2017). Criteria could be rated as “Excellent,” “Good,” “Adequate,” or “Insufficient Evidence to Rate.” Of the original 12 EBU criteria, 8 were included in our review, and all achieved strong interrater reliability: (a) norms, κ = 0.69; (b) internal consistency, κ = 0.87; (c) test-retest reliability, κ = 0.73; (d) content validity, κ = 0.70; (e) construct validity, κ = 0.93; (f) validity generalization, κ = 0.68; (g) treatment sensitivity, κ = 0.80; (h) clinical utility, κ = 0.71. The other 4 criteria were not applicable, rarely reported, or irrelevant (Footnote 7) to our focus. All criteria were evaluated by at least two coders, with discrepancies resolved by the first or second author. Minor elaborations were made to operationalize these criteria (e.g., for clinical utility to be rated Adequate, we required that the measure be intended for clinician use and have at least some evidence from clinical samples).

Based on these ratings, we formed several composite evaluations of each measure. First, similar to some previous EBU reviews (e.g., Becker-Haimes et al., 2020; Mazefsky et al., 2021), we classified each measure as Adequate, Good, or Excellent overall. We did this by treating each criterion rating as evidence within one of four major categories: external validity (norms, validity generalization), reliability (internal consistency, test-retest), measurement validity (content and construct validity), and utility (treatment sensitivity, clinical utility). To be considered Excellent overall, a measure had to be Excellent in at least 3 of the 4 major categories and at least Good in the fourth. To be considered Good overall, a measure had to be at least Good in 3 of the 4 categories and at least Adequate in the fourth. To be considered Adequate overall, a measure had to reach Adequate in all 4 categories. In addition, an overall score was computed as the sum of all ratings, where Excellent = 3, Good = 2, Adequate = 1, and Insufficient Evidence = 0 (possible range = 0–24, higher scores being more favorable). Finally, similar to Etkin, Lebowitz, et al. (2021) and Etkin, Shimshoni, et al. (2021), we tallied the number of data points reaching each level (Excellent, Good, etc.) by row (measure) and by column (criterion) for a snapshot of the extent to which different types of psychometric evidence had accumulated. Beyond the psychometric ratings, we also extracted additional descriptive information about each measure, obtained from the measures themselves or their websites, manuals, key papers, and the literature reviewed.
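The composite rules above can be expressed algorithmically. In this hypothetical sketch (names are ours), we additionally assume that a category reaches a given level only if both of its constituent criteria reach that level; the text does not spell out this aggregation step.

```python
# Hypothetical sketch of the composite evaluations described above.
# Rating labels and category groupings follow the text; the code is ours.
POINTS = {"Excellent": 3, "Good": 2, "Adequate": 1, "Insufficient Evidence": 0}

CATEGORIES = {
    "external_validity": ["norms", "validity_generalization"],
    "reliability": ["internal_consistency", "test_retest"],
    "measurement_validity": ["content_validity", "construct_validity"],
    "utility": ["treatment_sensitivity", "clinical_utility"],
}

def numeric_score(ratings):
    """Sum of all 8 criterion ratings (possible range 0-24)."""
    return sum(POINTS[r] for r in ratings.values())

def category_level(ratings, criteria):
    """Assumption: a category's level is the lowest level its criteria reach."""
    return min(POINTS[ratings[c]] for c in criteria)

def overall_classification(ratings):
    levels = [category_level(ratings, cs) for cs in CATEGORIES.values()]
    if sum(l == 3 for l in levels) >= 3 and min(levels) >= 2:
        return "Excellent"   # >=3 categories Excellent, fourth at least Good
    if sum(l >= 2 for l in levels) >= 3 and min(levels) >= 1:
        return "Good"        # >=3 categories Good+, fourth at least Adequate
    if min(levels) >= 1:
        return "Adequate"    # all 4 categories at least Adequate
    return "Below Adequate"

# Invented example profile for one measure:
ratings = {"norms": "Good", "validity_generalization": "Good",
           "internal_consistency": "Excellent", "test_retest": "Adequate",
           "content_validity": "Good", "construct_validity": "Excellent",
           "treatment_sensitivity": "Adequate", "clinical_utility": "Adequate"}
print(numeric_score(ratings), overall_classification(ratings))  # 15 Adequate
```

The example shows how the two composites can diverge: a measure can earn a numeric score above the sample median (13) yet still classify only as Adequate overall because one weak criterion caps its category.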

Results

Sixty-eight measures met criteria for inclusion. Table 2 lists the abbreviated and full instrument names, key citations, item counts, informant(s), age range, and focus/foci. Focus was coded in terms of whether the measure included scales targeting specific constructs of interest—irritability, anger, aggression, or general overt externalizing problems—and which of these constructs was considered the primary focus. Psychometric ratings and relevant scales are summarized by focus in Table 3 (measures covering irritability or anger; n = 30, 44.1%) and Table 4 (measures focusing on aggression or overt externalizing problems; n = 38, 55.9%). The supplement provides additional details for each measure, including response scale, years, translations, notes, samples, norm types (e.g., t-scores, means/SDs), whether it is freely accessible, and how to access it (Table S4), as well as the literature reviewed for each measure (Table S5). Most measures (n = 46, 67.6%) had been studied in 2 or more different types of samples, with school/community samples (n = 57, 83.8%) and clinical samples (n = 53, 77.9%) being the most common, followed by juvenile justice samples (n = 12, 17.6%).

Table 2. Summary of measures included in review (N = 68).

Table 3. Measures covering irritability and anger (N = 30).

Table 4. Measures focused on aggression and general overt externalizing problems (N = 38).

In terms of overall psychometric evaluations, only 6 measures (8.8%) were classified as Excellent, while 46 (67.6%) were considered Good, and the remaining 16 (23.5%) were Adequate. The numeric scores (i.e., the sum of psychometric ratings; possible range = 0–24) were roughly normally distributed, with a median of 13 (M = 12.91, SD = 3.14; observed range = 6–19). Thus, measures with scores ≥ 16 (75th percentile) in Tables 3 and 4 can be viewed as having relatively strong psychometric evidence, whereas those with scores ≤ 11 (25th percentile) might benefit from further empirical examination. Ratings did not differ between anger/irritability and aggression/externalizing measures, t(66) = 0.747, p = .46. However, measures accompanied by manuals (n = 28), which typically had to be purchased from the publisher, had better psychometric properties (M = 15.07, SD = 2.65) than instruments without manuals (M = 11.43, SD = 2.56), t(66) = 5.695, p < .001, d = 1.40.

We also examined the overall state of the evidence on each criterion, pooling across all measures (see the bottom of Tables 3 and 4). The evidence reached a median rating of Good for norms, internal consistency, content validity, construct validity, and validity generalization. However, the evidence was lower (i.e., Adequate) for test-retest reliability, treatment sensitivity, and clinical utility. Notably, no measures reached Good or Excellent on clinical utility. Despite such differences in the distributions of ratings, all eight criteria seemed to provide useful information about individual measures.

As shown in Table 2, only 7 instruments (10.3%) were classified as primarily measuring irritability, 13 (19.1%) anger, 25 (36.8%) aggression, and 23 (33.8%) general overt externalizing problems. However, 30 (44.1%) measures contained multiple scales that collectively covered 2 or more of these constructs, with 10 (14.7%) covering 3 or more. This result supports our broad search and coding strategies, which yielded relevant and overlapping scales that might otherwise have been overlooked.

Measures Focusing on Irritability

Overall, 15 instruments were identified that measured irritability, including 7 that we classified as having irritability as their primary focus. These measures are listed in Table 2, with psychometric ratings in Table 3 (Sections A, C, and D). Of these measures, only 2—the ARI and the Aberrant Behavior Checklist (ABC)—emerged as having Excellent overall psychometric properties (9 were Good, 4 were Adequate). Both the ARI and the ABC have been widely used, with at least good evidence for internal consistency, norms, content and construct validity, generalization, and treatment sensitivity. The ABC was developed for use with children, adolescents, and adults with developmental delays, across various settings and proxy raters (Aman & Singh, 2017). It is commercially available and a relatively comprehensive measure (58 items) of different behavior problems common in youth with autism spectrum disorder (e.g., irritability, hyperactivity/noncompliance), and there is evidence specifically for its use in addressing irritability/aggression (Frazier et al., 2010; Stoddard et al., 2020). For a free, brief, and multi-informant measure of irritability, the ARI has emerged as a leading standard over the last decade, with psychometric evidence from diverse clinical and community samples (Dougherty et al., 2021; Evans et al., 2021; Mulraney et al., 2014; Stringaris et al., 2012; Tseng et al., 2017; Wilson et al., 2022). Recently, a clinician version, the CL-ARI, has been developed (Haller et al., 2020); the ARI has also been adapted into a short version, translated into 15+ languages, with initial psychometric evidence for teacher-report in Spanish (Ezpeleta et al., 2020).

The other 13 measures covering irritability and related constructs were rated as Good or Adequate. These include instruments rated by single or multiple informants, including parent, teacher, self, and clinician. Notably, these measures cover specific aspects of irritability, including emotional or aggressive outbursts (ROARS, RAQ, EMO-I), general externalizing problems (NCBRF), and personality or temperament (MMPI, PAI-A, Rothbart measures), as well as broader measures for specific age ranges and purposes from early childhood (MAP-DB) to childhood and adolescence (CHIA, AdolBC, MAYSI, CHI/HI, HCSBS/SSBS). More information about the characteristics, strengths, weaknesses, and evidence base for these measures can be found in Tables 3, S4, and S5.

Measures Focusing on Anger

We identified 19 measures covering anger (13 as a primary focus). Of these, only one measure—the BASC—met criteria for Excellent and, with a score of 19, was one of the two highest-rated instruments overall (see Table 3, Sections B, C, and D). The BASC is a comprehensive multi-informant system for measuring emotional and behavioral functioning in youth (Reynolds & Kamphaus, 2015). As a system, the BASC also includes tools for screening and progress-monitoring (Kamphaus & Reynolds, 2015). The BASC is included here due to its Anger Control scale, which is new to the third edition, although it also includes Aggression, Conduct Problems, Externalizing Problems, and other broadband and narrowband scales for emotional and adaptive functioning (Reynolds & Kamphaus, 2015). The BASC is commonly used in clinical as well as school settings, with nationally representative norms and evidence accruing over 3 editions and more than 3 decades (e.g., Canivez et al., 2021; Flanagan et al., 1996; Lochman, Dishion, et al., 2015).

Of the remaining measures covering anger, 13 were classified as Good and 5 as Adequate. Of note, many of these instruments tap multiple dimensions of anger-related constructs, such as anger expression (in and out), anger control, coping, dysregulation, inhibition, hostility, cognitive, arousal, and behavioral domains, and provocations (e.g., AARS, AESC, AXS, CAMS, ChIA, NAS-PI, PAES, STAXI; see Table 3). Other measures offer only a single or total scale related to anger (e.g., BYI, PROMIS), although these can offer other benefits (e.g., a more comprehensive measurement suite, parent- and youth-report progress-monitoring). It is apparent from these anger measures that—unlike irritability and aggression—anger has historically been measured primarily by self-report: all have a youth self-report version, and only two (PROMIS, CAMS) have a corresponding parent-report version. We found no teacher-report measures of anger. It may be that views of “anger” have focused more on one’s subjective experience of that emotion, whereas irritability and aggression measures are more likely to draw upon ratings from others (e.g., parents, teachers). We also observed a pattern generally unique to anger measures: these instruments were often developed initially with a focus on adults (e.g., personality, hostility), with several measures being downward extensions (AQ, AXS, MMPI, CHI/HI, NAS-PI, PAI). Nonetheless, the evidence reviewed all came from youth studies (Footnote 8).

Measures Focusing on Aggression

Overall, 45 measures (66.2%) were identified for aggression (25 as the primary focus)—by far the largest focus of measurement included in this review. Ratings are listed in Table 3 (Sections A and C) and Table 4 (Sections C and D). Of these, only a few were classified as having Excellent psychometrics overall: the BASC-3, the Achenbach System of Empirically Based Assessment (ASEBA; Achenbach & Rescorla, 2001), and the Revised Behavior Problem Checklist (RBPC; Quay & Peterson, 1983). Comparable to the BASC (already discussed above), ASEBA is a suite of multi-informant measures of emotional and behavioral functioning; it earned the other of the two highest overall psychometric scores in the sample. Comprehensive (CBCL, YSR, TRF; Achenbach & Rescorla, 2001) and progress-monitoring (BPM; Achenbach et al., 2017) instruments are available. The ASEBA dates back several decades, a variety of translations and norms are available, and the evidence base for validity, reliability, and treatment sensitivity is extensive (Ebesutani et al., 2011; Ivanova et al., 2019; Weisz et al., 2020; Youngstrom et al., 2003). Of note, Aggressive Behavior is one of the original empirically derived syndrome scales on the CBCL/YSR/TRF instruments, with relations to the externalizing composite, DSM Oppositional Problems, and various other scales. The RBPC, with non-diagnostic scales for Conduct Disorder and Socialized Aggression, has a similarly rich history going back several decades (Hinshaw et al., 1987; Lahey & Piacentini, 1985; Quay, 1983; Santisteban et al., 2017); however, this measure has been used infrequently in recent years, and our search suggests it may have gone out of print.

Among the remaining aggression measures classified as Good (n = 33) or Adequate (n = 8), considerable variety in subtypes and specialized foci can be seen. For example, some measures focus on cyber-aggression (CPEQ; Landoll et al., 2015) or aggression during play (PIPPS; Fantuzzo et al., 1995). In particular, several attend to the proactive vs. reactive functions of aggression, including the PRA (Dodge & Coie, 1987), the revised teacher PRA (Brown et al., 1996), and the RPQ (Raine et al., 2006), or to different forms of aggression (e.g., direct, indirect; overt, relational), including the RPEQ (Prinstein et al., 2001), CSBS (Crick, 1996), DIAS (Björkqvist et al., 1992), and PBFS (Farrell et al., 2016). Notably, a few measures cover both the forms and the functions of aggression, including the PCS (Marsee & Frick, 2007) and the measure by Little et al. (2003). A handful of measures approach aggression as a single monothetic scale or conceptualization (e.g., ASCA/ASPI, IAB, interRAI, Jesness, MHBQ, PKBS, SBS, SSIS/SSRS, TOCA, TODS, WAI, MAVRIC, ICS, CBS), while others cover many different types of aggression (e.g., AQ, AS, CAS, C-SHARP, R-MOAS).

It is important to point out that many of the measures discussed above are relevant to aggression, even if not classified as such. For example, the ChIA, CHI/HI, EMO-I, RAQ, ROARS, and MAP-DB cover outbursts and tantrums—forms of aggressive behavior. Similarly, the MMPI-A, HCSBS/SSBS, and PAI-A all cover aggression in the context of personality or general behavior. Among this tier, rather than some measures being clearly better than others, what is most evident is varying foci as described above, with different strengths and weaknesses for different purposes (e.g., free vs. commercial; lengthy vs. brief; school vs. clinical). The tables above and Table S4 offer useful insights for deciding among measures in this way.

Measures of General Overt Externalizing Problems

Finally, 27 measures (39.7%) included scales covering general overt externalizing problems, 23 (33.8%) with this as the primary focus. Ratings of these measures are listed in Table 3 (Section D) and Table 4 (Sections B and C). In this category, 3 of the 4 measures rated as Excellent (ABC, ASEBA, BASC) have already been discussed above. The other was the ECBI/SESBI (Eyberg & Pincus, 1999). The ECBI is widely used for measuring disruptive behavior problems in children, for screening, treatment monitoring, and outcomes evaluation in a variety of populations (Boggs et al., 1990; Eyberg & Ross, 1978; Jeter et al., 2017; Nixon et al., 2004). The SESBI adapts the ECBI to school populations, with evidence from preschool to high school age (Bagner et al., 2010; Floyd et al., 2004; Querido & Eyberg, 2003). Several other measures in this category (e.g., ASCA/ASPI, PSC, Rutter, SCL-90/BSI, SDQ, SBS, SEBS, SSIS/SSRS) are widely used brief or broad measures that cover overt externalizing problems in some form. Most measures in this category (n = 19 of 27) were rated as Good, while 4 were rated as Adequate. For brevity we do not summarize these ratings here; the findings can be reviewed in the tables (primarily Table 4). Of note, these overt externalizing measures may not specifically assess irritability, anger, or aggression. However, our search and evaluation methods led us to classify them as having scales or items that were mostly related to these constructs, broadly construed.

Discussion

We sought to provide a comprehensive review of the evidence base for assessing irritability, anger, aggression, and related problems in children and adolescents. Sixty-eight measures were identified and summarized in terms of their psychometric properties and descriptive characteristics, intended to provide a resource that may be useful to researchers and clinicians. Although these results are rich with detailed findings, a fine-grained methodological analysis of individual measures is beyond our scope (interested readers are referred to the tables and the supplement). Instead, results paint a general but reliable picture of the state of the evidence base for measuring these constructs, which we have conceptualized as a heterogeneous spectrum of overt externalizing affective and behavioral problems.

Evidence Base Update and Evaluation of Psychometric Properties

Several methodological themes emerged from the measures and evidence reviewed. Broadly, findings reflect favorably on the availability of research-supported tools for assessing irritability, anger, and aggression in youth. In aggregate, the evidence reached at least “Good” for most of the measures (52 of 68; 76.5%) and most of the psychometric dimensions (5 of 8; 62.5%) used to evaluate the measures: norms, internal consistency, content validity, construct validity, and validity generalization. These conclusions should be qualified by constraints that limited what measures were included, what evidence was reviewed, and other factors affecting psychometric ratings (see Limitations). Regarding specific psychometric dimensions, findings underscore the need for more targeted research on clinical utility, treatment sensitivity, and test-retest reliability—i.e., the three domains with the lowest overall ratings (“Adequate”). In addition, more work is needed on inter-rater reliability, repeatability, discriminative validity, and prescriptive validity—the original EBU dimensions (Youngstrom et al., 2017) that could not be included in this review. A few of these criteria warrant specific comment.

First, test-retest reliability and sensitivity to change both require two or more measurement occasions. In this regard, it is not surprising that these dimensions lag behind others, given how common cross-sectional designs are. We recommend that measure developers and psychometric researchers strive to collect longitudinal data when possible, as these data are needed to evaluate test-retest reliability, sensitivity to change, and other key questions such as longitudinal invariance (Little, 2013). However, important tensions can arise between these two goals: measures should be both stable over time (as evidence of test-retest reliability) and capable of registering change over time (as evidence of sensitivity to change). Different measures may weight these goals differently. For example, some measures of personality (e.g., MMPI-A) and temperament (Rothbart measures) provide “Adequate” or better long-term stability and test-retest reliability, but they are not necessarily sensitive to change (both rated as Insufficient Evidence for treatment sensitivity). Conversely, measures like the NCBRF and the CAS may function as “Excellent” measures of treatment sensitivity precisely because sensitivity to change is prioritized over longitudinal stability (rated as Insufficient Evidence and Adequate, respectively, for test-retest reliability). The standard for “Adequate” test-retest reliability (i.e., r = 0.7 over a week) may be sufficient, but stretching this period longer, over months and years, begins to reflect trait-like longitudinal stability of the construct, not just the reliability of the instrument. As researchers increasingly study within-person changes and variability on shorter time scales (e.g., over days, hours, and minutes), alternative indicators of reliability will be needed. Such measures were excluded from this review, largely due to their use of single items and lack of proper names and standardized formats, but they represent an important future direction for research in this area (Evans et al., 2023).

Clinical utility is gaining momentum in many areas of assessment (De Los Reyes & Langer, 2018; Youngstrom et al., 2017), from diagnosis and classification (Keeley et al., 2016) to measurement-based care (McLeod et al., 2022). Whether something is clinically useful is an empirical question, and usually there is vanishingly little evidence to answer it. Of note, most measures are likely to offer some degree of clinically useful information—hence, 62 of 68 (91.2%) measures reviewed here were classified as Adequate for clinical utility, while those that fell short of this threshold were not intended for clinical use. But none reached “Good” or higher, as this requires “published evidence that using the assessment data confers clinical benefit (e.g., better outcome, lower attrition, greater satisfaction), in areas important to stakeholders.” Such evidence might require, for example, randomized trials showing that better outcomes were achieved when clinicians used a certain measurement tool or method, relative to a control group that did not. Generating this evidence will take time and possibly a paradigm shift toward implementation science and measurement-based care, especially testing empirical questions about whether and how different assessment practices could actually affect clinical outcomes (McLeod et al., 2022).

Other psychometric categories do not require much elaboration (norms, internal consistency, content validity, construct validity, and validity generalization); however, it is worth noting what was not captured. For instance, the norms criterion focuses on the representativeness of the samples in which a measure has been used, but does not capture whether the measure comes with t-scores or an empirically derived cutoff for clinical significance—features that would greatly enhance the utility of normative data for clinicians and researchers (see Table S4 for this information). Additional characteristics may also be relevant to a measure’s utility, such as cost, length, and available translations; these were likewise not captured in the rated categories but are provided as additional information in the current review. Research on repeatability and prescriptive validity is so new that it was not useful to include in the review, and inter-rater reliability rarely applies beyond correlations between different informants (e.g., teacher with parent), who are arguably better treated as distinct informants expected to show low agreement, around r = 0.28 (De Los Reyes et al., 2015). Lastly, discriminative validity, as defined by Youngstrom et al. (2017), is primarily applicable to discriminating a particular diagnosis from non-diagnosis within a given population—not to transdiagnostic approaches that are intended to cut across, rather than identify, specific diagnostic categories.

Although not EBU dimensions themselves, the concepts of incremental validity and incremental utility are particularly important here. If irritability, anger, and aggression are indeed related to functioning, perhaps distinctly or in combination with one another, it follows that future research should focus on the validation and “value added” of measuring each of these domains. For example, research is needed to examine whether irritability, anger, and aggression distinctly and incrementally predict clinically relevant validity criteria such as psychopathology, treatment response, performance on domain-relevant tasks, and associated features of the domains themselves (e.g., parenting, peer relations, social skills, sleep functioning). Testing whether a measure confers benefit “above and beyond” the measurement of other variables is an important approach but was rarely used in the articles we reviewed. Such work calls for multiple regression models, machine learning methods, and longitudinal designs to help tease apart which measures are the most robust, incremental predictors of relevant outcomes.

We also call for more research on cross-informant associations—not necessarily as evidence of “inter-rater reliability,” but because findings regarding cross-informant patterns are informative in and of themselves. In particular, there is a need for research quantifying the level of agreement in assessing each of these constructs (irritability, anger, and aggression) according to each of several informants (parent, youth, teacher), within- vs. cross-informant, and same- vs. different-construct. Populating this correlation matrix would help elucidate the distinctions among these constructs and the role informants play, and could prove useful in understanding how context affects displays of the behaviors used to estimate their presence. The present review found that researchers have relied almost exclusively on youth self-report for measuring anger, with few studies/measures using parent-report and none using teacher-report. This scenario is subject to mono-informant bias, making it difficult for researchers to understand different perspectives as well as different constructs. Informant discrepancies speak to issues surrounding measurement validity best understood from a multi-informant perspective (De Los Reyes et al., 2023). We call specifically for research on the role of informant perspectives and context in affecting ratings from specific informants and discrepancies among them. Multi-informant measures, particularly those with parallel item content, should be prioritized, as they permit more robust tests of these questions.

Implications for Research on Youth Irritability, Anger, and Aggression

Several interesting findings were revealed by virtue of the types and properties of measures that were included. First, the overlap among the scales supports the view that our targeted constructs can reasonably represent a coherent spectrum of overt externalizing problems, including both affective and behavioral manifestations. Many, though not all, measures construed as measuring “externalizing behavior” also include items, if not entire scales, that capture the prevailing emotions (irritability, anger, frustration) associated with externalizing problems. Although some measures and scales reflect the empirical associations among these problems, they also reveal that the problems are empirically distinct (e.g., the MAP-DB Temper Loss and Aggression scales).

Second, our searches, screening, and coding revealed that the measures of interest here cut across a number of contexts, including outpatient, inpatient, school-based, preschool, research, developmental, intervention, experimental, social, personality, and other types of work. Because these boundaries have led to some confusion and lack of clarity in the literature, we sought to be as inclusive as possible in order to align measures from different contexts side by side in the same synthesis. Terminological differences emerged from this process, warranting careful consideration. For example, aggression and bullying measures can capture the same behaviors, but contemporary definitions of bullying have focused these instruments in ways that go beyond aggression and should be considered separately. At the same time, aggression itself has several different subtypes, which are well captured in this review (proactive vs. reactive, e.g., Dodge & Coie, 1987; overt vs. relational, e.g., Crick, 1996), and researchers and clinicians interested in better understanding aggressive behavior are advised to consider whether they are best served by a unidimensional instrument or by measures of more focused subtypes of aggression.

Third, readers may note that specific measures appear to have been left out, notably including measures relevant to diagnostic assessment such as the DBDRS or the KSADS. As noted in the introduction, our transdiagnostic approach precluded us from including measures that were explicitly diagnostic. Yet these excluded measures are often relevant to include in a comprehensive assessment of externalizing problems. For more comprehensive coverage of this topic, we refer readers to relevant chapters/books (e.g., Burke et al., 2023; Evans et al., 2024; Lochman & Matthys, 2018; Matthys & Lochman, 2017; Walker et al., 2020) and articles (McKinney & Morse, 2012; McMahon & Frick, 2005; Steiner & Remsing, 2007; Stringaris et al., 2018). However, we would also emphasize the importance of a transdiagnostic perspective, as these problems not only cut across a wide range of diagnostic categories but are also not well captured by any single category (e.g., Stepanova et al., 2022). These challenges have led the field to “reverse-engineer” provisional solutions within our diagnostic systems, such as a specifier for outbursts in DSM-5-TR (Carlson, Singh, et al., 2022), a subtype of ODD in DSM-5 (Evans et al., 2017), and syndromic targets for research programs (Leibenluft, 2011; Stepanova et al., 2022). Clearly this is both an active area of research and an area in which much work remains to be done. The present review underscores the promise of a transdiagnostic and transdisciplinary approach to research on irritability, anger, and aggression.

Strengths, Limitations, and Future Directions

The present review has several strengths offering particular value for those interested in understanding or measuring irritability, anger, or aggression in youth. First, this is one of the most comprehensive reviews available, covering 4,664 unique articles during screening and considering hundreds more sources through measure database queries, ancillary strategies, and follow-up searches (see Supplement). This ultimately led us to identify an average of 9.9 relevant sources (8.8 peer-reviewed) per measure upon which to base our psychometric ratings. The stringent inclusion and exclusion criteria allowed us to reliably evaluate each measure and accurately judge whether it should be included for review. One of these criteria was that a measure must have at least three published, peer-reviewed articles on its use; dissertations and theses that had not been peer-reviewed were also excluded. The downside of this approach is that it omits newer measures and measures that have been used less often in research contexts. Further, our screening workflow was carried out in such a way that we are not able to provide a comprehensive list of all the measures that were potentially relevant but excluded for a particular reason (but see Table S3 for exemplar excluded measures and the reasons why). The strength of this approach, however, is that it focused the review on measures that had been implemented and studied in research, with sufficient peer-reviewed evidence for evaluation.

Another key criterion was the exclusion of any measure directly tied to DSM diagnostic criteria sets (e.g., for ODD, CD, or DMDD), as this review aimed to investigate transdiagnostic measures of irritability, anger, and aggression. This transdiagnostic approach makes the included measures applicable across diagnoses because they are not designed to yield a diagnosis. Further, the approach is cross-disciplinary, relevant to psychology, psychiatry, education, and other fields seeking an applicable measure for youth in their care. Our systematic review also includes detailed information about the 68 included measures, such as the number of items, age range, constructs of interest, and available translations. This information allows the review to also function as a tool for comparing measures and determining which would be most applicable to the setting in question. Thus, this review provides a comprehensive look at actionable measures of irritability, anger, and aggression in youth, addressing an apparent gap in the field. We encourage researchers specializing in different substantive areas (e.g., trauma, ADHD, disruptive behavior problems, mood and anxiety disorders) to approach these measurement questions in the manner most appropriate for their population and problem of interest. At the same time, maintaining a dialogue with irritability research and researchers from other areas will likely help advance assessment practices more effectively for the most people.

Some further limitations should also be noted. First, although necessary for a transdiagnostic approach, our exclusion of measures tied to the DSM meant omitting measures that might otherwise be clinically relevant, while other exclusion criteria omitted newer measures and measures that have not been widely used in research (see Table S3 for details and examples). Second, the focus on English-speaking populations led to the omission of psychometric research and instruments from other languages and countries, an important direction for future research. It would be useful for future work to examine the cross-cultural applicability of English and non-English versions of the measures evaluated in this study. Similarly, this study is not representative of measurement in adults, infants, or toddlers; measures developed for these age groups may be omitted here or may have stronger evidence for their properties in those ages than reviewed here. Third, the psychometric evaluation process used in this study (Hunsley & Mash, 2008; Youngstrom et al., 2017) was vital, offering a way to tier the extent of evidence supporting each measure and providing a rubric from which to evaluate existing evidence. However, there are few psychometric studies with a specific focus on categories such as treatment sensitivity and clinical utility (Holly et al., 2019). Therefore, certain measures have lower psychometric strength overall despite having high ratings for properties like test-retest reliability and content validity. Further, the rubric leaves limited room for nuance in evaluating how strong a particular piece of evidence is (e.g., large randomized trials vs. small correlational studies), instead taking any evidence as sufficient to pass a given threshold. Future EBU projects should revisit how psychometric criteria are defined and operationalized to promote greater reliability and validity in these ratings within and across review papers.

Interesting differences emerged between free and commercially available measures, such that the latter had better properties on average. This pattern is understandable given that commercial measures have large, funded efforts behind them, with resources to draw norms and data from large, nationally representative samples. Free measures, on the other hand, are commendable for their accessibility, and in some cases may be more extensively used in research, providing a more complete and transparent picture of their psychometric properties. Thus, commercial and noncommercial test developers can learn from one another how to improve the tools they produce.

We have attempted to be as comprehensive as possible and to define clear criteria explaining what was omitted and why. Nonetheless, measurement of irritability, anger, and aggression is an active and robust area of research, and it is possible that relevant measures were omitted. In particular, we carried out our searches and coding in 2021–2023; more research has been published since then and will continue to appear in the years ahead. Thus, this EBU review can be viewed as a snapshot of measurement methods and evidence at a given time, underscoring the need for continued methodological development and research synthesis moving forward.

Summary and Conclusions

Despite its limitations, the present paper offers a useful review and toolkit for youth mental health assessment. Research on overt externalizing problems has come a long way over the past few decades, converging on a set of measures with empirical support for assessing irritability, anger, and/or aggressive behavior as transdiagnostic constructs. We identified, reviewed, and evaluated 68 of these measures. The availability of measures with Good to Excellent psychometric ratings is encouraging. At the same time, areas rated Adequate or Insufficient Evidence highlight directions for future research, particularly clinical utility, incremental validity, and sensitivity to longitudinal stability vs. change. Other priority areas for future research include advancing the use and understanding of multi-informant approaches; disentangling overlapping constructs toward a clearer theoretical conceptualization; clarifying associations with ecologically or clinically meaningful variables; and understanding the cross-cultural variability and applicability of measures. Although there is a place for developing new measures, the more pressing needs are continued evaluation of existing measures and their careful review, consolidation, and elaboration, including harmonizing measures and bridging to other informants and populations, as a foundation for future research on youth irritability, anger, and aggression.

Findings from this review are consistent with the impression that progress in this area may have been limited by diagnostic, disciplinary, methodological, and terminological boundaries. Moving forward, researchers are encouraged to be aware of and actively work to dismantle these boundaries, as this could help strengthen assessment methods and the empirical evidence base. When selecting measures, researchers and clinicians are advised to review multiple measures and consider what the evidence says about their strengths and weaknesses, for what purposes, and in what populations and settings. The present paper was written to help inform such decisions. In the tables and supplement, we have summarized key characteristics of these instruments, including number of items, age ranges, sample types, norms, instructions to access the measures, and whether alternative translations are available. There is no shortage of tools for assessing irritability, anger, or aggression in youth. Future research can continue to grow this toolkit—and evaluate the tools within it—to better meet the needs of irritable, angry, and aggressive youths in clinical and research settings.

Editor’s Note

This article is part of a special issue, “The Affective Side of Disruptive Behavior: Toward Better Understanding, Assessment, and Treatment,” published in the Journal of Clinical Child and Adolescent Psychology in 2024. Spencer C. Evans and Jeffrey D. Burke served as editors of the special issue; Andres De Los Reyes served as conflict-of-interest editor.

Acknowledgments

The authors thank Nour Abduljawad, Patricia Gao, Sophia Ross, and Kate Simmons for their assistance in carrying out this project.

Disclosure Statement

No potential conflict of interest was reported by the author(s).

Data Availability Statement

This is a review paper that has no data associated with it. The supplement provides detailed information (references, links) on all the sources consulted to inform this review.

Supplementary Data

Supplemental material for this article can be accessed online at https://doi.org/10.1080/15374416.2023.2292041.

Additional information

Funding

This project was not specifically associated with any source of funding. SCE received partial support from the AIM Clinical Science Fellowship, NIH Loan Repayment Program (L30MH120708), and University of Miami faculty start-up funds during the preparation of this work.

Notes

1 For relevant diagnostic guidance on assessment, readers are referred to the following sources (Burke et al., 2023; Evans et al., 2024; Lochman & Matthys, 2018; Matthys & Lochman, 2017; McKinney & Morse, 2012; McMahon & Frick, 2005; Steiner & Remsing, 2007; Stringaris et al., 2018; Walker et al., 2020).

2 For the sake of illustration, nearly all specific problems related to anxiety (e.g., separation anxiety disorder, specific phobia, social anxiety disorder, panic disorder, agoraphobia, generalized anxiety disorder) are classified as Anxiety Disorders in DSM-5; and most specific problems related to low mood (e.g., major depressive disorder, major depressive episodes, persistent depressive disorder) are classified as Depressive Disorders in DSM-5. There is no such section for disorders of irritability, anger, or aggression. We lump together “irritability, anger, and aggression” here for reasons outlined in the next section.

3 Some iterations of this definition specify that irritability is an increased proneness to anger compared to peers, a between-person conceptualization. More recently, there has been increasing recognition that irritability can also be viewed as an increased proneness to anger compared to oneself or one’s own baseline (i.e., a within-person, contextual, or longitudinal conceptualization). We adopt a definition that accommodates both views.

4 Multi-informant data are limited, and this is not a comprehensive review, but these studies suggest that within-informant correlations vary somewhat across informants (roughly 0.4–0.8 for parent-report, 0.6–0.9 for teacher-report, and 0.2–0.7 for youth-report) and drop considerably when the same construct is rated by different informants (most cross-informant correlations around 0.1–0.5), with no clear construct-specific patterns.

5 We define “psychometric results” broadly based on the rubric’s domains (see Table 1). Importantly, this definition did not limit our focus exclusively to assessment papers. For example, any relevant empirical study reporting Cronbach’s alpha could be considered for rating internal consistency, or means/SDs for rating norms. At the same time, however, we did not simply review all empirical studies that used a measure and reported statistics from it. In practice, this approach meant that we leaned toward including all relevant empirical papers for measures that were newer or less frequently used. In contrast, for widely used measures with dozens or hundreds of studies, we were more selective to identify a robust corpus of relevant evidence to inform our evaluations. More on this below.

Table 1. Rubric for evaluating norms, validity, and utility (Hunsley & Mash, 2008; extended by Youngstrom et al., 2017).

6 Aggression refers to any behavior enacted with an intent to harm another person who does not want to be harmed (Baron & Richardson, 1994). Bullying is defined by three criteria: (a) the intentionality criterion from aggression, plus (b) repetitiveness of the aggressive behaviors over time, and (c) a power imbalance favoring the perpetrator over victim (Olweus, 2013). Given the conceptual overlap, we included “bullying” in our search strategies—to capture tools that measured youth aggression broadly—but we explicitly excluded scales that were defined by the full definition of bullying, as that was beyond the focus of the review.

7 The following criteria were omitted: (a) interrater reliability, because only multi-informant data could be construed as such; this was rarely reported, and we view different informants as different perspectives, not meant to show high levels of agreement (De Los Reyes et al., 2015, 2023); (b) repeatability, because evaluation of this essentially required Bland-Altman plots and/or a coefficient of repeatability, which was virtually never reported; (c) discriminative validity, because this usually focused on discriminating among those with vs. without a diagnosis, but our transdiagnostic approach excludes diagnoses, leaving little basis for discriminative validity; and similarly (d) prescriptive validity, because it involves identifying a diagnosis with a well-specified matching intervention or a significant treatment moderator, which seemed rare and not an appropriate fit for our transdiagnostic approach.

8 This was the case for all measures and studies included in this review. Thus, any “adult/youth measures” were evaluated solely for their performance among children and adolescents, defined as non-undergraduate samples with a mean age between 3.0 years and 18.9 years. Evidence from adult samples was not considered.

References

  • Achenbach, T. M., McConaughy, S. H., Ivanova, M. Y., & Rescorla, L. A. (2017). Manual for the ASEBA brief problem monitor for ages 6–18. University of Vermont Research Center for Children, Youth, and Families.
  • Achenbach, T. M., & Rescorla, L. A. (2001). Manual for the ASEBA school-age forms & profiles. University of Vermont Research Center for Children, Youth, and Families.
  • Adams, C. D., Kelley, M. L., & McCarthy, M. (1997). The adolescent behavior checklist: Development and initial psychometric properties of a self-report measure for adolescents with ADHD. Journal of Clinical Child Psychology, 26(1), 77–86. https://doi.org/10.1207/s15374424jccp2601_8
  • Adrian, M., Zeman, J., & Veits, G. (2011). Methodological implications of the affect revolution: A 35-year review of emotion regulation assessment in children. Journal of Experimental Child Psychology, 110(2), 171–197. https://doi.org/10.1016/j.jecp.2011.03.009
  • Althoff, R. R., & Ametti, M. (2021). Measurement of dysregulation in children and adolescents. Child and Adolescent Psychiatric Clinics of North America, 30(2), 321–333. https://doi.org/10.1016/j.chc.2020.10.004
  • Althoff, R. R., Verhulst, F. C., Rettew, D. C., Hudziak, J. J., & van der Ende, J. (2010). Adult outcomes of childhood dysregulation: A 14-year follow-up study. Journal of the American Academy of Child & Adolescent Psychiatry, 49(11), 1105–1116. https://doi.org/10.1016/j.jaac.2010.08.006
  • Aman, M. G., & Singh, N. N. (2017). Aberrant behavior checklist (2nd ed.). Slosson Educational Publications.
  • Aman, M. G., Tassé, M. J., Rojahn, J., & Hammer, D. (1996). The Nisonger CBRF: A child behavior rating form for children with developmental disabilities. Research in Developmental Disabilities, 17(1), 41–57. https://doi.org/10.1016/0891-4222(95)00039-9
  • American Academy of Child and Adolescent Psychiatry. (2022). Resource centers. Emotion dysregulation: Resources for clinicians. https://www.aacap.org/AACAP/Families_and_Youth/Resource_Centers/Emotional_Dysregulation/Resources_for_Clinicians.aspx
  • American Psychiatric Association. (2013). Diagnostic and statistical manual of mental disorders (5th ed.). American Psychiatric Association Publishing.
  • American Psychiatric Association. (2022a). Diagnostic and statistical manual of mental disorders (5th ed., text revision). American Psychiatric Association Publishing.
  • American Psychiatric Association. (2022b). DSM-5-TR online assessment measures. https://psychiatry.org/psychiatrists/practice/dsm/educational-resources/assessment-measures
  • Armstrong, J. M., Goldstein, L. H., & The MacArthur Working Group on Outcome Assessment. (2003). The MacArthur Health and Behavior Questionnaire (HBQ 1.0) [PDF]. https://macarthurhbq.wordpress.com/materials/
  • Bagner, D. M., Boggs, S. R., & Eyberg, S. M. (2010). Evidence-based school behavior assessment of externalizing behavior in young children. Education & Treatment of Children, 33(1), 65–83. https://doi.org/10.1353/etc.0.0084
  • Barata, P. C., Holtzman, S., Cunningham, S., O’Connor, B. P., & Stewart, D. E. (2016). Building a definition of irritability from academic definitions and lay descriptions. Emotion Review, 8(2), 164–172. https://doi.org/10.1177/1754073915576228
  • Baron, R. A., & Richardson, D. R. (1994). Human aggression. Springer Science & Business Media.
  • Beck, J. S., Beck, A. T., Jolly, J. B., & Steer, R. A. (2005). BECK youth inventories (2nd ed.). NCS Pearson.
  • Becker-Haimes, E. M., Tabachnick, A. R., Last, B. S., Stewart, R. E., Hasan-Granier, A., & Beidas, R. S. (2020). Evidence base update for brief, free, and accessible youth mental health measures. Journal of Clinical Child & Adolescent Psychology, 49(1), 1–17. https://doi.org/10.1080/15374416.2019.1689824
  • Behar, L., & Stringfield, S. (1974). A behavior rating scale for the preschool child. Developmental Psychology, 10(5), 601. https://doi.org/10.1037/h0037058
  • Ben-Porath, Y. S., & Tellegen, A. (2020). Minnesota Multiphasic Personality Inventory-3 (MMPI-3): Manual for administration, scoring, and interpretation. University of Minnesota Press.
  • Björkqvist, K., Lagerspetz, K. M., & Kaukiainen, A. (1992). Do girls manipulate and boys fight? Developmental trends in regard to direct and indirect aggression. Aggressive Behavior, 18(2), 117–127. https://doi.org/10.1002/1098-2337(1992)18:2<117:AID-AB2480180205>3.0.CO;2-3
  • Blake, C. S., & Hamrin, V. (2007). Current approaches to the assessment and management of anger and aggression in youth: A review. Journal of Child and Adolescent Psychiatric Nursing, 20(4), 209–221. https://doi.org/10.1111/j.1744-6171.2007.00102.x
  • Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. Lancet, 1(8476), 307–310. https://doi.org/10.1016/S0140-6736(86)90837-8
  • Boggs, S. R., Eyberg, S., & Reynolds, L. A. (1990). Concurrent validity of the Eyberg child behavior inventory. Journal of Clinical Child Psychology, 19(1), 75–78. https://doi.org/10.1207/s15374424jccp1901_9
  • Brotman, M. A., Kircanski, K., & Leibenluft, E. (2017). Irritability in children and adolescents. Annual Review of Clinical Psychology, 13(1), 317–341. https://doi.org/10.1146/annurev-clinpsy-032816-044941
  • Brotman, M. A., Schmajuk, M., Rich, B. A., Dickstein, D. P., Guyer, A. E., Costello, E. J., Egger, H. L., Angold, A., Pine, D. S., & Leibenluft, E. (2006). Prevalence, clinical correlates, and longitudinal course of severe mood dysregulation in children. Biological Psychiatry, 60(9), 991–997. https://doi.org/10.1016/j.biopsych.2006.08.042
  • Brown, K., Atkins, M. S., Osborne, M. L., & Milnamow, M. (1996). A revised teacher rating scale for reactive and proactive aggression. Journal of Abnormal Child Psychology, 24(4), 473–480. https://doi.org/10.1007/BF01441569
  • Brunner, T. M., & Spielberger, C. D. (2009). State-trait anger expression inventory-2: Child and adolescent. Psychological Assessment Resources.
  • Budman, C. L., Rockmore, L., Stokes, J., & Sossin, M. (2003). Clinical phenomenology of episodic rage in children with Tourette syndrome. Journal of Psychosomatic Research, 55(1), 59–65. https://doi.org/10.1016/S0022-3999(02)00584-6
  • Burke, J. D., Butler, E. J., Shaughnessy, S., Karlovich, A. R., & Evans, S. C. (2023). Evidence-based assessment of DSM-5 disruptive, impulse control, and conduct disorders. Assessment, 10731911231188739. https://doi.org/10.1177/10731911231188739
  • Burke, J. D., Loeber, R., Lahey, B. B., & Rathouz, P. J. (2005). Developmental transitions among affective and behavioral disorders in adolescent boys. Journal of Child Psychology and Psychiatry, 46(11), 1200–1210. https://doi.org/10.1111/j.1469-7610.2005.00422.x
  • Burke, J. D., Rowe, R., & Boylan, K. (2014). Functional outcomes of child and adolescent oppositional defiant disorder symptoms in young adult men. Journal of Child Psychology and Psychiatry, 55(3), 264–272. https://doi.org/10.1111/jcpp.12150
  • Burks, H. F., & Gruber, C. P. (2006). Burks behavior rating scales (2nd ed.). Western Psychological Services.
  • Burney, D. M. (2001). Adolescent anger rating scale. Psychological Assessment Resources.
  • Buss, A. H., & Durkee, A. (1957). An inventory for assessing different kinds of hostility. Journal of Consulting Psychology, 21(4), 343–349. https://doi.org/10.1037/h0046900
  • Buss, A. H., & Warren, W. L. (2000). Aggression Questionnaire. Western Psychological Services.
  • Butcher, J. N., Williams, C. L., Graham, J. R., Archer, R. P., Tellegen, A., Ben-Porath, Y. S., & Kaemmer, B. (1992). Minnesota multiphasic personality inventory – adolescent. University of Minnesota Press.
  • Cairns, R. B., Leung, M. C., Gest, S. D., & Cairns, B. D. (1995). A brief method for assessing social development: Structure, reliability, stability, and developmental validity of the interpersonal competence scale. Behaviour Research and Therapy, 33(6), 725–736.
  • Canivez, G. L., von der Embse, N. P., & McGill, R. J. (2021). Construct validity of the BASC-3 teacher rating scales: Independent hierarchical exploratory factor analyses with the normative sample. School Psychology, 36(4), 235–254. https://doi.org/10.1037/spq0000444
  • Capaldi, D. M., & Rothbart, M. K. (1992). Development and validation of an early adolescent temperament measure. The Journal of Early Adolescence, 12(2), 153–173. https://doi.org/10.1177/0272431692012002002
  • Carlson, G. A., Potegal, M., Margulies, D., Gutkovich, Z., & Basile, J. (2009). Rages–what are they and who has them? Journal of Child and Adolescent Psychopharmacology, 19(3), 281–288. https://doi.org/10.1089/cap.2008.0108
  • Carlson, G. A., Silver, J., & Klein, D. N. (2022). Psychometric properties of the Emotional Outburst Inventory (EMO-I): Rating what children do when they are irritable. The Journal of Clinical Psychiatry, 83(2), e1–e8. Advance online publication. https://doi.org/10.4088/JCP.21m14015
  • Carlson, G. A., Singh, M. K., Amaya-Jackson, L., Benton, T. D., Althoff, R. R., Bellonci, C., Bostic, J. Q., Chua, J. D., Findling, R. L., Galanter, C. A., & McClellan, J. (2022). Narrative review: Impairing emotional outbursts: What they are and what we should do about them. Journal of the American Academy of Child & Adolescent Psychiatry. Advance online publication. https://doi.org/10.1016/j.jaac.2022.03.014
  • Coccaro, E. F. (2020). The overt aggression scale modified (OAS-M) for clinical trials targeting impulsive aggression and intermittent explosive disorder: Validity, reliability, and correlates. Journal of Psychiatric Research, 124, 50–57. https://doi.org/10.1016/j.jpsychires.2020.01.007
  • Collett, B. R., Ohan, J. L., & Myers, K. M. (2003). Ten-year review of rating scales. VI: Scales assessing externalizing behaviors. Journal of the American Academy of Child & Adolescent Psychiatry, 42(10), 1143–1170. https://doi.org/10.1097/00004583-200310000-00006
  • Cook, C. R. (2012). Student Externalizing Behavior Screener. School Psychology Quarterly, 30(2), 166–183. https://doi.org/10.1037/spq0000102
  • Crick, N. R. (1996). The role of overt aggression, relational aggression, and prosocial behavior in the prediction of children’s future social adjustment. Child Development, 67(5), 2317–2327. https://doi.org/10.2307/1131625
  • Crick, N. R., & Grotpeter, J. K. (1995). Relational aggression, gender, and social-psychological adjustment. Child Development, 66(3), 710–722. https://doi.org/10.2307/1131945
  • Dalgleish, T., Black, M., Johnston, D., & Bevan, A. (2020). Transdiagnostic approaches to mental health problems: Current status and future directions. Journal of Consulting and Clinical Psychology, 88(3), 179–195. https://doi.org/10.1037/ccp0000482
  • De Los Reyes, A., Augenstein, T. M., Wang, M., Thomas, S. A., Drabick, D. A., Burgers, D. E., & Rabinowitz, J. (2015). The validity of the multi-informant approach to assessing child and adolescent mental health. Psychological Bulletin, 141(4), 858–900. https://doi.org/10.1037/a0038498
  • De Los Reyes, A., & Langer, D. A. (2018). Assessment and the journal of clinical child and adolescent psychology’s evidence base updates series: Evaluating the tools for gathering evidence. Journal of Clinical Child & Adolescent Psychology, 47(3), 357–365. https://doi.org/10.1080/15374416.2018.1458314
  • De Los Reyes, A., Wang, M., Lerner, M. D., Makol, B. A., Fitzpatrick, O., & Weisz, J. R. (2023). The operations triad model and youth mental health assessments: Catalyzing a paradigm shift in measurement validation. Journal of Clinical Child and Adolescent Psychology, 52(1), 19–54. https://doi.org/10.1080/15374416.2022.2111684
  • Derogatis, L. R. (1993). Brief Symptom Inventory (BSI): Administration, scoring, and procedures manual (3rd ed.). NCS Pearson.
  • Derogatis, L. R. (1994). Symptom Checklist-90-R (SCL-90-R): Administration, scoring, and procedures manual (3rd ed.). NCS Pearson.
  • Dodge, K. A., & Coie, J. D. (1987). Social-information-processing factors in reactive and proactive aggression in children’s peer groups. Journal of Personality & Social Psychology, 53(6), 1146–1158. https://doi.org/10.1037/0022-3514.53.6.1146
  • Dougherty, L. R., Galano, M. M., Chad Friedman, E., Olino, T. M., Bufferd, S. J., & Klein, D. N. (2021). Using item response theory to compare irritability measures in early adolescent and childhood samples. Assessment, 28(3), 918–927. https://doi.org/10.1177/1073191120936363
  • Ebesutani, C., Bernstein, A., Martinez, J. I., Chorpita, B. F., & Weisz, J. R. (2011). The youth self report: Applicability and validity across younger and older youths. Journal of Clinical Child & Adolescent Psychology, 40(2), 338–346. https://doi.org/10.1080/15374416.2011.546041
  • Eckhardt, C., Norlander, B., & Deffenbacher, J. (2004). The assessment of anger and hostility: A critical review. Aggression and Violent Behavior, 9(1), 17–43. https://doi.org/10.1016/S1359-1789(02)00116-7
  • Ehrenreich-May, J., & Chu, B. C. (2014). Overview of transdiagnostic mechanisms and treatments for youth psychopathology. In J. Ehrenreich-May & B. Chu (Eds.), Transdiagnostic treatments for children and adolescents: Principles and practice (pp. 3–14). Guilford Press.
  • Essex, M. J., Boyce, W. T., Goldstein, L. H., Armstrong, J. M., Kraemer, H. C., Kupfer, D. J., & MacArthur Assessment Battery Working Group. (2002). The confluence of mental, physical, social, and academic difficulties in middle childhood. II: Developing the MacArthur health and behavior questionnaire. Journal of the American Academy of Child & Adolescent Psychiatry, 41(5), 588–603. https://doi.org/10.1097/00004583-200205000-00017
  • Etkin, R. G., Lebowitz, E. R., & Silverman, W. K. (2021). Using evaluative criteria to review youth anxiety measures, part II: Parent-report. Journal of Clinical Child & Adolescent Psychology, 50(2), 155–176. https://doi.org/10.1080/15374416.2021.1878898
  • Etkin, R. G., Shimshoni, Y., Lebowitz, E. R., & Silverman, W. K. (2021). Using evaluative criteria to review youth anxiety measures, part I: Self-report. Journal of Clinical Child & Adolescent Psychology, 50(1), 58–76. https://doi.org/10.1080/15374416.2020.1802736
  • Evans, S. C., Abel, M. R., Doyle, R. L., Skov, H., & Harmon, S. L. (2021). Measurement and correlates of irritability in clinically referred youth: Further examination of the affective reactivity index. Journal of Affective Disorders, 283, 420–429. https://doi.org/10.1016/j.jad.2020.11.002
  • Evans, S. C., Burke, J. D., Roberts, M. C., Fite, P. J., Lochman, J. E., Francisco, R., & Reed, G. M. (2017). Irritability in child and adolescent psychopathology: An integrative review for ICD-11. Clinical Psychology Review, 53, 29–45. https://doi.org/10.1016/j.cpr.2017.01.004
  • Evans, S. C., Corteselli, K. A., Edelman, A., Scott, H., & Weisz, J. R. (2022). Is irritability a top problem in youth mental health care? A multi-informant, multi-method investigation. Child Psychiatry & Human Development, 54(4), 1027–1041. Advance online. https://doi.org/10.1007/s10578-021-01301-8
  • Evans, S. C., de la Peña, F. R., Matthys, W., & Lochman, J. E. (2024). Disruptive behaviour and dissocial disorders and Attention-Deficit/Hyperactivity disorder. In G. M. Reed, P.-L.-J. Ritchie, & A. Maercker (Eds.), A psychological approach to diagnosis using the ICD-11 as a framework. American Psychological Association and International Union of Psychological Science. https://www.apa.org/pubs/books/psychological-approach-diagnosis-icd-11
  • Evans, S. C., Pederson, C. A., Fite, P. J., Blossom, J. B., & Cooley, J. L. (2016). Teacher-reported irritable and defiant dimensions of oppositional defiant disorder: Social, behavioral, and academic correlates. School Mental Health, 8(2), 292–304. https://doi.org/10.1007/s12310-015-9163-y
  • Evans, S. C., Shaughnessy, S., & Karlovich, A. R. (2023). Future directions in youth irritability research. Journal of Clinical Child and Adolescent Psychology, 52(5), 716–734. Advance online publication. https://doi.org/10.1080/15374416.2023.2209180
  • Eyberg, S., & Pincus, D. (1999). Eyberg child behavior inventory and sutter-eyberg student behavior inventory-revised. Psychological Assessment Resources.
  • Eyberg, S. M., & Ross, A. W. (1978). Assessment of child behavior problems: The validation of a new inventory. Journal of Clinical Child & Adolescent Psychology, 7(2), 113–116. https://doi.org/10.1080/15374417809532835
  • Ezpeleta, L., Penelo, E., de la Osa, N., Navarro, J. B., & Trepat, E. (2020). How the Affective Reactivity Index (ARI) works for teachers as informants. Journal of Affective Disorders, 261, 40–48. https://doi.org/10.1016/j.jad.2019.09.080
  • Fantuzzo, J., Sutton-Smith, B., Coolahan, K. C., Manz, P. H., Canning, S., & Debnam, D. (1995). Assessment of preschool play interaction behaviors in young low-income children: Penn interactive peer play scale. Early Childhood Research Quarterly, 10(1), 105–120. https://doi.org/10.1016/0885-2006(95)90028-4
  • Farmer, C. A., & Aman, M. G. (2009). Development of the children’s scale of hostility and aggression: Reactive/proactive (C-SHARP). Research in Developmental Disabilities, 30(6), 1155–1167. https://doi.org/10.1016/j.ridd.2009.03.001
  • Farrell, A. D., Sullivan, T. N., Goncy, E. A., & Le, A. T. H. (2016). Assessment of adolescents’ victimization, aggression, and problem behaviors: Evaluation of the problem behavior frequency scale. Psychological Assessment, 28(6), 702. https://doi.org/10.1037/pas0000225
  • Feindler, E. L., & Engel, E. C. (2011). Assessment and intervention for adolescents with anger and aggression difficulties in school settings. Psychology in the Schools, 48(3), 243–253. https://doi.org/10.1002/pits.20550
  • Fite, P. J., Craig, J., Colder, C. R., Lochman, J. E., & Wells, K. C. (2018). Proactive and reactive aggression. In R. J. R. Levesque (Ed.), Encyclopedia of adolescence. Springer. https://doi.org/10.1007/978-3-319-33228-4_211
  • Flanagan, D. P., Alfonso, V. C., Primavera, L. H., Povall, L., & Higgins, D. (1996). Convergent validity of the BASC and SSRS: Implications for social skills assessment. Psychology in the Schools, 33(1), 13–23. https://doi.org/10.1002/(SICI)1520-6807(199601)33:1<13:AID-PITS2>3.0.CO;2-X
  • Floyd, E. M., Rayfield, A., Eyberg, S. M., & Riley, J. L. (2004). Psychometric properties of the Sutter-Eyberg student behavior inventory with rural middle school and high school children. Assessment, 11(1), 64–72. https://doi.org/10.1177/1073191103260945
  • Frazier, T. W., Youngstrom, E. A., Haycook, T., Sinoff, A., Dimitriou, F., Knapp, J., & Sinclair, L. (2010). Effectiveness of medication combined with intensive behavioral intervention for reducing aggression in youth with autism spectrum disorder. Journal of Child and Adolescent Psychopharmacology, 20(3), 167–177. https://doi.org/10.1089/cap.2009.0048
  • Freeman, A. J., Youngstrom, E. A., Youngstrom, J. K., & Findling, R. L. (2016). Disruptive mood dysregulation disorder in a community mental health clinic: Prevalence, comorbidity and correlates. Journal of Child and Adolescent Psychopharmacology, 26(2), 123–130. https://doi.org/10.1089/cap.2015.0061
  • Freitag, G. F., Grassie, H. L., Jeong, A., Mallidi, A., Comer, J. S., Ehrenreich-May, J., & Brotman, M. A. (2023). Systematic review: Questionnaire-based measurement of emotion dysregulation in children and adolescents. Journal of the American Academy of Child & Adolescent Psychiatry, 62(7), 728–763. https://doi.org/10.1016/j.jaac.2022.07.866
  • Frick, P. J., Lahey, B. B., Loeber, R., Tannenbaum, L., Van Horn, Y., Christ, M. A. G., Hart, E. A., & Hanson, K. (1993). Oppositional defiant disorder and conduct disorder: A meta-analytic review of factor analyses and cross-validation in a clinic sample. Clinical Psychology Review, 13(4), 319–340. https://doi.org/10.1016/0272-7358(93)90016-F
  • Ghandour, R. M., Sherman, L. J., Vladutiu, C. J., Ali, M. M., Lynch, S. E., Bitsko, R. H., & Blumberg, S. J. (2019). Prevalence and treatment of depression, anxiety, and conduct problems in US children. The Journal of Pediatrics, 206, 256–267. https://doi.org/10.1016/j.jpeds.2018.09.021
  • Goodman, R. (1997). The strengths and difficulties questionnaire: A research note. Journal of Child Psychology and Psychiatry, 38(5), 581–586. https://doi.org/10.1111/j.1469-7610.1997.tb01545.x
  • Goodman, G., Bass, J. N., Geenens, D. L., & Popper, C. W. (2006). The MAVRIC–C and MAVRIC–P: A preliminary reliability and validity study. Journal of Personality Assessment, 86(3), 273–290. https://doi.org/10.1207/s15327752jpa8603_04
  • Goulter, N., Hur, Y. S., Jones, D. E., Godwin, J., McMahon, R. J., Dodge, K. A., Lansford, J. E., Lochman, J. E., Bates, J. E., Pettit, G. S., & Crowley, D. M. (2023). Kindergarten conduct problems are associated with monetized outcomes in adolescence and adulthood. Journal of Child Psychology and Psychiatry. Advance online publication. https://doi.org/10.1111/jcpp.13837
  • Gresham, F. M., & Elliott, S. N. (2008). Social skills improvement system. NCS Pearson.
  • Gresham, F. M., & Elliott, S. N. (2017). Social skills improvement system: Social-emotional learning edition. NCS Pearson.
  • Grisso, T., & Barnum, R. (2006). Massachusetts youth screening instrument–version 2 (MAYSI-2). Professional Resource Exchange.
  • Haller, S. P., Kircanski, K., Stringaris, A., Clayton, M., Bui, H., Agorsor, C., Cardenas, S. I., Towbin, K. E., Pine, D. S., Leibenluft, E., & Brotman, M. A. (2020). The clinician affective reactivity index: Validity and reliability of a clinician-rated assessment of irritability. Behavior Therapy, 51(2), 283–293. https://doi.org/10.1016/j.beth.2019.10.005
  • Halperin, J. M., & McKay, K. E. (2008). Children’s aggression scale. Psychological Assessment Resources.
  • Hawes, D. J., Gardner, F., Dadds, M. R., Frick, P. J., Kimonis, E. R., Burke, J. D., & Fairchild, G. (2023). Oppositional defiant disorder. Nature Reviews Disease Primers, 9(1), 31. https://doi.org/10.1038/s41572-023-00441-6
  • Hinshaw, S. P., Morrison, D. C., Carte, E. T., & Cornsweet, C. (1987). Factorial dimensions of the revised behavior problem checklist: Replication and validation within a kindergarten sample. Journal of Abnormal Child Psychology, 15(2), 309–327. https://doi.org/10.1007/BF00916357
  • Holly, L. E., Fenley, A. R., Kritikos, T. K., Merson, R. A., Abidin, R. R., & Langer, D. A. (2019). Evidence-base update for parenting stress measures in clinical samples. Journal of Clinical Child & Adolescent Psychology, 48(5), 685–705. https://doi.org/10.1080/15374416.2019.1639515
  • Hubbard, J. A., McAuliffe, M. D., Morrow, M. T., & Romano, L. J. (2010). Reactive and proactive aggression in childhood and adolescence: Precursors, outcomes, processes, experiences, and measurement. Journal of Personality, 78(1), 95–118. https://doi.org/10.1111/j.1467-6494.2009.00610.x
  • Hunsley, J., & Mash, E. J. (2008). Developing criteria for evidence-based assessment: An introduction to assessments that work. In J. Hunsley & E. J. Mash (Eds.), A guide to assessments that work (pp. 3–14). Oxford University Press.
  • Hyman, S. E. (2010). The diagnosis of mental disorders: The problem of reification. Annual Review of Clinical Psychology, 6, 155–179.
  • Irwin, D. E., Stucky, B. D., Langer, M. M., Thissen, D., DeWitt, E. M., Lai, J. S., Yeatts, K. B., Varni, J. W., & DeWalt, D. A. (2012). PROMIS pediatric anger scale: An item response theory analysis. Quality of Life Research, 21(4), 697–706. https://doi.org/10.1007/s11136-011-9969-5
  • Ivanova, M. Y., Achenbach, T. M., Rescorla, L. A., Guo, J., Althoff, R. R., Kan, K., Almqvist, F., Begovac, I., Broberg, A. G., Chahed, M., Monzani da Rocha, M., Dobrean, A., Döpfner, M., Erol, N., Fombonne, E., Castro Fonseca, A., Forns, M., Frigerio, A., Grietens, H., ... Verhulst, F. C. (2019). Testing syndromes of psychopathology in parent and youth ratings across societies. Journal of Clinical Child & Adolescent Psychology, 48(4), 596–609. https://doi.org/10.1080/15374416.2017.1405352
  • Jacobs, G. A., Phelps, M., & Rohrs, B. (1989). Assessment of anger expression in children: The pediatric anger expression scale. Personality and Individual Differences, 10(1), 59–65. https://doi.org/10.1016/0191-8869(89)90178-5
  • Jellinek, M. S., & Murphy, J. M. (1988). Screening for psychosocial disorders in pediatric practice. American Journal of Diseases of Children, 142(11), 1153–1157. https://doi.org/10.1001/archpedi.1988.02150110031013
  • Jesness, C. F. (2003). Jesness inventory – revised. Multi-Health Systems.
  • Jeter, K., Zlomke, K., Shawler, P., & Sullivan, M. (2017). Comprehensive psychometric analysis of the eyberg child behavior inventory in children with autism spectrum disorder. Journal of Autism and Developmental Disorders, 47(5), 1354–1368. https://doi.org/10.1007/s10803-017-3048-x
  • Kamphaus, R. W., & Reynolds, C. R. (2015). Behavioral and emotional screening system. NCS Pearson.
  • Kay, S. R., Wolkenfeld, F., & Murrill, L. M. (1988). Profiles of aggression among psychiatric patients: I. Nature and prevalence. The Journal of Nervous and Mental Disease, 176(9), 539–546. https://doi.org/10.1097/00005053-198809000-00007
  • Kazdin, A. E., & Esveldt-Dawson, K. (1986). The interview for antisocial behavior: Psychometric characteristics and concurrent validity with child psychiatric inpatients. Journal of Psychopathology and Behavioral Assessment, 8(4), 289–303. https://doi.org/10.1007/BF00960727
  • Kazdin, A. E., Rodgers, A., Colbus, D., & Siegel, T. (1987). Children’s hostility inventory: Measurement of aggression and hostility in psychiatric inpatient children. Journal of Clinical Child Psychology, 16(4), 320–328. https://doi.org/10.1207/s15374424jccp1604_5
  • Keeley, J. W., Reed, G. M., Roberts, M. C., Evans, S. C., Medina-Mora, M. E., Robles, R., Rebello, T., Sharan, P., Gureje, O., First, M. B., Andrews, H. F., Ayuso-Mateos, J. L., Gaebel, W., Zielasek, J., & Saxena, S. (2016). Developing a science of clinical utility in diagnostic classification systems: Field study strategies for ICD-11 mental and behavioral disorders. American Psychologist, 71(1), 3–16. https://doi.org/10.1037/a0039972
  • Kerr, M. A., & Schneider, B. H. (2008). Anger expression in children and adolescents: A review of the empirical literature. Clinical Psychology Review, 28(4), 559–577. https://doi.org/10.1016/j.cpr.2007.08.001
  • Koth, C. W., Bradshaw, C. P., & Leaf, P. J. (2009). Teacher observation of classroom adaptation–checklist: Development and factor structure. Measurement and Evaluation in Counseling and Development, 42(1), 15–30. https://doi.org/10.1177/0748175609333560
  • Kronenberger, W. G. (2022). Pediatric inpatient behavior scale: Manuals for the 47-item and 25-item versions [PDFs]. https://drk.sitehost.iu.edu/PIBSInfo.html
  • Kronenberger, W. G., Carter, B. D., & Thomas, D. (1997). Assessment of behavior problems in pediatric inpatient settings: Development of the pediatric inpatient behavior scale. Children’s Health Care, 26(4), 211–232. https://doi.org/10.1207/s15326888chc2604_1
  • Lachar, D., Wingenfeld, S. A., Kline, R. B., & Gruber, C. P. (2000). Student behavior survey. Western Psychological Services.
  • Ladd, G. W., & Profilet, S. M. (1996). The child behavior scale: A teacher-report measure of young children’s aggressive, withdrawn, and prosocial behaviors. Developmental Psychology, 32(6), 1008–1024. https://doi.org/10.1037/0012-1649.32.6.1008
  • LaFreniere, P. J., & Dumas, J. E. (1995). Social competence and behavior evaluation: Preschool edition (SCBE). Western Psychological Services.
  • LaFreniere, P. J., & Dumas, J. E. (1996). Social competence and behavior evaluation in children aged three to six: The short form (SCBE-30). Psychological Assessment, 8(4), 369–377. https://doi.org/10.1037/1040-3590.8.4.369
  • Lahey, B. B., & Piacentini, J. C. (1985). An evaluation of the Quay-Peterson revised behavior problem checklist. Journal of School Psychology, 23(3), 285–289. https://doi.org/10.1016/0022-4405(85)90021-4
  • Landoll, R. R., La Greca, A. M., Lai, B. S., Chan, S. F., & Herge, W. M. (2015). Cyber victimization by peers: Prospective associations with adolescent social anxiety and depressive symptoms. Journal of Adolescence, 42(1), 77–86. https://doi.org/10.1016/j.adolescence.2015.04.002
  • Leibenluft, E. (2011). Severe mood dysregulation, irritability, and the diagnostic boundaries of bipolar disorder in youths. American Journal of Psychiatry, 168(2), 129–142. https://doi.org/10.1176/appi.ajp.2010.10050766
  • Little, T. D. (2013). Longitudinal structural equation modeling. Guilford.
  • Little, T. D., Henrich, C. C., Jones, S. M., & Hawley, P. H. (2003). Disentangling the “whys” from the “whats” of aggressive behaviour. International Journal of Behavioral Development, 27(2), 122–133. https://doi.org/10.1080/01650250244000128
  • Lochman, J. E., Dishion, T. J., Powell, N. P., Boxmeyer, C. L., Qu, L., & Sallee, M. (2015). Evidence-based preventive intervention for preadolescent aggressive children: One-year outcomes following randomization to group versus individual delivery. Journal of Consulting and Clinical Psychology, 83(4), 728–735. https://doi.org/10.1037/ccp0000030
  • Lochman, J. E., Evans, S. C., Burke, J. D., Roberts, M. C., Fite, P. J., Reed, G. M., De la Peña, F. R., Matthys, W., Ezpeleta, L., Siddiqui, S., & Garralda, M. E. (2015). An empirically based alternative to DSM-5's disruptive mood dysregulation disorder for ICD-11. World Psychiatry, 14(1), 30–33. https://doi.org/10.1002/wps.20176
  • Lochman, J. E., & Matthys, W. (2018). The Wiley handbook of disruptive and impulse-control disorders. Wiley.
  • Loth, A. K., Drabick, D. A., Leibenluft, E., & Hulvershorn, L. A. (2014). Do childhood externalizing disorders predict adult depression? A meta-analysis. Journal of Abnormal Child Psychology, 42(7), 1103–1113. https://doi.org/10.1007/s10802-014-9867-8
  • Lutz, M. N., Fantuzzo, J., & McDermott, P. (2002). Multidimensional assessment of emotional and behavioral adjustment problems of low-income preschool children: Development and initial validation. Early Childhood Research Quarterly, 17(3), 338–355. https://doi.org/10.1016/S0885-2006(02)00168-0
  • Marchette, L. K., & Weisz, J. R. (2017). Practitioner review: Empirical evolution of youth psychotherapy toward transdiagnostic approaches. Journal of Child Psychology and Psychiatry, 58(9), 970–984. https://doi.org/10.1111/jcpp.12747
  • Marsee, M. A., Barry, C. T., Childs, K. K., Frick, P. J., Kimonis, E. R., Muñoz, L. C., Aucoin, K. J., Fassnacht, G. M., Kunimatsu, M. M., & Lau, K. S. L. (2011). Assessing the forms and functions of aggression using self-report: Factor structure and invariance of the peer conflict scale in youths. Psychological Assessment, 23(3), 792–804. https://doi.org/10.1037/a0023369
  • Marsee, M. A. & Frick, P. J. (2007). Exploring the cognitive and emotional correlates to proactive and reactive aggression in a sample of detained girls. Journal of Abnormal Child Psychology, 35, 969–981.
  • Matthys, W., & Lochman, J. E. (2017). Oppositional defiant disorder and conduct disorder in childhood. Wiley.
  • Mazefsky, C. A., Conner, C. M., Breitenfeldt, K., Leezenbaum, N., Chen, Q., Bylsma, L. M., & Pilkonis, P. (2021). Evidence base update for questionnaires of emotion regulation and reactivity for children and adolescents. Journal of Clinical Child & Adolescent Psychology, 50(6), 683–707. https://doi.org/10.1080/15374416.2021.1955372
  • McDermott, P. A. (1993). National standardization of uniform multisituational measures of child and adolescent behavior pathology. Psychological Assessment, 5(4), 413–424. https://doi.org/10.1037/1040-3590.5.4.413
  • McKinney, C., & Morse, M. (2012). Assessment of disruptive behavior disorders: Tools and recommendations. Professional Psychology: Research and Practice, 43(6), 641. https://doi.org/10.1037/a0027324
  • McLeod, B. D., Jensen-Doss, A., Lyon, A. R., Douglas, S., & Beidas, R. S. (2022). To utility and beyond! Specifying and advancing the utility of measurement-based care for youth. Journal of Clinical Child & Adolescent Psychology, 1–14. https://doi.org/10.1080/15374416.2022.2124515
  • McMahon, R. J., & Frick, P. J. (2005). Evidence-based assessment of conduct problems in children and adolescents. Journal of Clinical Child and Adolescent Psychology, 34(3), 477–505. https://doi.org/10.1207/s15374424jccp3403_6
  • Merrell, K. W. (2002a). Preschool and kindergarten behavior scales (2nd ed.). PRO-ED.
  • Merrell, K. W. (2002b). School social behavior scales (2nd ed.). Paul H. Brookes Publishing Co.
  • Merrell, K. W., & Caldarella, P. (2002). Home & community social behavior scales. Paul H. Brookes Publishing Co.
  • Mikita, N., & Stringaris, A. (2013). Mood dysregulation. European Child & Adolescent Psychiatry, 22(1), 11–16. https://doi.org/10.1007/s00787-012-0355-9
  • Morey, L. C. (2007). Personality assessment inventory – adolescent. Psychological Assessment Resources.
  • Mulraney, M. A., Melvin, G. A., & Tonge, B. J. (2014). Psychometric properties of the affective reactivity index in Australian adults and adolescents. Psychological Assessment, 26(1), 148–155. https://doi.org/10.1037/a0034891
  • Nadeau, J. M., McBride, N. M., Dane, B. F., Collier, A. B., Keene, A. C., Hacker, L. E., Cavitt, M. A., Alvaro, J. L., & Storch, E. A. (2016). Psychometric evaluation of the rage outbursts and anger rating scale in an outpatient psychiatric sample. Journal of Child and Family Studies, 25(4), 1229–1234. https://doi.org/10.1007/s10826-015-0303-7
  • Nelson, W. M., & Finch, A. J. (2000). Children’s inventory of anger. Western Psychological Services.
  • Nixon, R. D., Sweeney, L., Erickson, D. B., & Touyz, S. W. (2004). Parent–child interaction therapy: One-and two-year follow-up of standard and abbreviated treatments for oppositional preschoolers. Journal of Abnormal Child Psychology, 32(3), 263–271. https://doi.org/10.1023/B:JACP.0000026140.60558.05
  • Novaco, R. W. (2003). The novaco anger scale and provocation inventory. Western Psychological Services.
  • Odgers, C. L., Caspi, A., Broadbent, J. M., Dickson, N., Hancox, R. J., Harrington, H., Poulton, R., Sears, M. R., Thomson, W. M., & Moffitt, T. E. (2007). Prediction of differential adult health burden by conduct problem subtypes in males. Archives of General Psychiatry, 64(4), 476–484. https://doi.org/10.1001/archpsyc.64.4.476
  • Odgers, C. L., Moffitt, T. E., Broadbent, J. M., Dickson, N., Hancox, R. J., Harrington, H., Poulton, R., Sears, M. R., Thomson, W. M., & Caspi, A. (2008). Female and male antisocial trajectories: From childhood origins to adult outcomes. Development and Psychopathology, 20(2), 673–716. https://doi.org/10.1017/S0954579408000333
  • Olfson, M., Blanco, C., Wang, S., Laje, G., & Correll, C. U. (2014). National trends in the mental health care of children, adolescents, and adults by office-based physicians. JAMA Psychiatry, 71(1), 81–90. https://doi.org/10.1001/jamapsychiatry.2013.3074
  • Olweus, D. (2013). School bullying: Development and some important challenges. Annual Review of Clinical Psychology, 9(1), 751–780. https://doi.org/10.1146/annurev-clinpsy-050212-185516
  • Orpinas, P., & Frankowski, R. (2001). The aggression scale: A self-report measure of aggressive behavior for young adolescents. The Journal of Early Adolescence, 21(1), 50–67. https://doi.org/10.1177/0272431601021001003
  • Page, M. J., Moher, D., Bossuyt, P. M., Boutron, I., Hoffmann, T. C., Mulrow, C. D., Shamseer, L., Tetzlaff, J. M., Akl, E. A., Brennan, S. E., Chou, R., ... McKenzie, J. E. (2021). PRISMA 2020 explanation and elaboration: Updated guidance and exemplars for reporting systematic reviews. BMJ, 372, n160. https://doi.org/10.1136/bmj.n160
  • Polanczyk, G. V., Salum, G. A., Sugaya, L. S., Caye, A., & Rohde, L. A. (2015). Annual research review: A meta-analysis of the worldwide prevalence of mental disorders in children and adolescents. Journal of Child Psychology and Psychiatry, 56(3), 345–365. https://doi.org/10.1111/jcpp.12381
  • Prinstein, M. J., Boergers, J., & Vernberg, E. M. (2001). Overt and relational aggression in adolescents: Social-psychological adjustment of aggressors and victims. Journal of Clinical Child & Adolescent Psychology, 30(4), 479–491. https://doi.org/10.1207/S15374424JCCP3004_05
  • PROMIS: Patient-Reported Outcomes Measurement Information System. (2016). Anger: A brief guide to PROMIS anger instruments. https://staging.healthmeasures.net/images/PROMIS/manuals/PROMIS_Anger_Scoring_Manual_10072016.pdf
  • Quay, H. C. (1983). A dimensional approach to behavior disorder: The revised behavior problem checklist. School Psychology Review, 12(3), 244–249. https://doi.org/10.1080/02796015.1983.12085039
  • Quay, H. C., & Peterson, D. R. (1983). Manual for the revised behavior problem checklist. University of Miami, Authors.
  • Querido, J. G., & Eyberg, S. M. (2003). Psychometric properties of the sutter-eyberg student behavior inventory-revised with preschool children. Behavior Therapy, 34(1), 1–15. https://doi.org/10.1016/S0005-7894(03)80018-7
  • Raine, A., Dodge, K., Loeber, R., Gatzke-Kopp, L., Lynam, D., Reynolds, C., Stouthamer-Loeber, M., & Liu, J. (2006). The reactive–proactive aggression questionnaire: Differential correlates of reactive and proactive aggression in adolescent boys. Aggressive Behavior, 32(2), 159–171. https://doi.org/10.1002/ab.20115
  • Reynolds, C. R., & Kamphaus, R. W. (2015). Behavior assessment system for children (3rd ed.). NCS Pearson.
  • Rothbart, M. K., Ahadi, S. A., Hershey, K. L., & Fisher, P. (2001). Investigations of temperament at three to seven years: The children’s behavior questionnaire. Child Development, 72(5), 1394–1408. https://doi.org/10.1111/1467-8624.00355
  • Rowe, R., Costello, E. J., Angold, A., Copeland, W. E., & Maughan, B. (2010). Developmental pathways in oppositional defiant disorder and conduct disorder. Journal of Abnormal Psychology, 119(4), 726–738. https://doi.org/10.1037/a0020798
  • Roy, A. K., Lopes, V., & Klein, R. G. (2014). Disruptive mood dysregulation disorder: A new diagnostic approach to chronic irritability in youth. American Journal of Psychiatry, 171(9), 918–924. https://doi.org/10.1176/appi.ajp.2014.13101301
  • Rutter, M. (1967). A children’s behaviour questionnaire for completion by teachers: Preliminary findings. Journal of Child Psychology and Psychiatry, 8(1), 1–11. https://doi.org/10.1111/j.1469-7610.1967.tb02175.x
  • Santisteban, D. A., Czaja, S. J., Nair, S. N., Mena, M. P., & Tulloch, A. R. (2017). Computer informed and flexible family-based treatment for adolescents: A randomized clinical trial for at-risk racial/ethnic minority adolescents. Behavior Therapy, 48(4), 474–489. https://doi.org/10.1016/j.beth.2016.11.001
  • Sauer-Zavala, S., Gutner, C. A., Farchione, T. J., Boettcher, H. T., Bullis, J. R., & Barlow, D. H. (2017). Current definitions of “transdiagnostic” in treatment development: A search for consensus. Behavior Therapy, 48(1), 128–138. https://doi.org/10.1016/j.beth.2016.09.004
  • Shytle, R. D., Silver, A. A., Sheehan, K. H., Wilkinson, B. J., Newman, M., Sanberg, P. R., & Sheehan, D. (2003). The Tourette’s Disorder Scale (TODS) development, reliability, and validity. Assessment, 10(3), 273–287. https://doi.org/10.1177/1073191103255497
  • Simonds, J., & Rothbart, M. K. (2004, October). The Temperament in Middle Childhood Questionnaire (TMCQ): A computerized self-report measure of temperament for ages 7-10 [Poster presentation]. Occasional Temperament Conference, Athens, GA, United States.
  • Smith, D. C., Furlong, M., Bates, M., & Laughlin, J. D. (1998). Development of the multidimensional school anger inventory for males. Psychology in the Schools, 35(1), 1–15. https://doi.org/10.1002/(SICI)1520-6807(199801)35:1<1:AID-PITS1>3.0.CO;2-U
  • Sorgi, P., Ratey, J., Knoedler, D. W., Markert, R. J., & Reichman, M. (1991). Rating aggression in the clinical setting: A retrospective adaptation of the overt aggression scale: Preliminary results. Journal of Neuropsychiatry, 3(2), 552–556.
  • Spielberger, C. D. (1985). The experience and expression of anger: Construction and validation of an anger expression scale. In M. A. Chesney & R. H. Rosenman (Eds.), Anger and hostility in cardiovascular and behavioral disorders (pp. 5–30). Hemisphere/McGraw-Hill.
  • Spielberger, C. D. (1999). STAXI-2: State-trait anger expression inventory-2. Psychological Assessment Resources.
  • Steele, R. G., Legerski, J. P., Nelson, T. D., & Phipps, S. (2009). The anger expression scale for children: Initial validation among healthy children and children with cancer. Journal of Pediatric Psychology, 34(1), 51–62. https://doi.org/10.1093/jpepsy/jsn054
  • Steiner, H., & Remsing, L. (2007). Practice parameter for the assessment and treatment of children and adolescents with oppositional defiant disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 46(1), 126–141. https://doi.org/10.1097/01.chi.0000246060.62706.af
  • Stepanova, E., Youngstrom, E. A., Langfus, J. A., Evans, S. C., Stoddard, J., Young, A. S., Van Eck, K., & Findling, R. L. (2022). Finding a needed diagnostic home for children with impulsive aggression. Clinical Child & Family Psychology Review, 26(1), 259–271. https://doi.org/10.1007/s10567-022-00422-3
  • Stewart, S. L., Hirdes, J. P., Curtin-Telegdi, N., Perlman, C. M., McKnight, M., MacLeod, K., Ninan, A., Currie, M., & Carson, S. (2015). interRAI Child and Youth Mental Health (ChYMH) assessment form and user’s manual (9.3 ed.). interRAI.
  • Stoddard, J., Zik, J., Mazefsky, C. A., DeChant, B., & Gabriels, R. (2020). The internal structure of the aberrant behavior checklist irritability subscale: Implications for studies of irritability in treatment-seeking youth with autism spectrum disorders. Behavior Therapy, 51(2), 310–319. https://doi.org/10.1016/j.beth.2019.09.006
  • Stringaris, A., & Goodman, R. (2009). Longitudinal outcome of youth oppositionality: Irritable, headstrong, and hurtful behaviors have distinctive predictions. Journal of the American Academy of Child & Adolescent Psychiatry, 48(4), 404–412. https://doi.org/10.1097/CHI.0b013e3181984f30
  • Stringaris, A., Goodman, R., Ferdinando, S., Razdan, V., Muhrer, E., Leibenluft, E., & Brotman, M. A. (2012). The affective reactivity index: A concise irritability scale for clinical and research settings. Journal of Child Psychology and Psychiatry, 53(11), 1109–1117. https://doi.org/10.1111/j.1469-7610.2012.02561.x
  • Stringaris, A., Vidal-Ribas, P., Brotman, M. A., & Leibenluft, E. (2018). Practitioner review: Definition, recognition, and treatment challenges of irritability in young people. Journal of Child Psychology and Psychiatry, 59(7), 721–739. https://doi.org/10.1111/jcpp.12823
  • Sukhodolsky, D. G., Smith, S. D., McCauley, S. A., Ibrahim, K., & Piasecka, J. B. (2016). Behavioral interventions for anger, irritability, and aggression in children and adolescents. Journal of Child and Adolescent Psychopharmacology, 26(1), 58–64. https://doi.org/10.1089/cap.2015.0120
  • Toohey, M. J., & DiGiuseppe, R. (2017). Defining and measuring irritability: Construct clarification and differentiation. Clinical Psychology Review, 53, 93–108. https://doi.org/10.1016/j.cpr.2017.01.009
  • Tseng, W. L., Moroney, E., Machlin, L., Roberson-Nay, R., Hettema, J. M., Carney, D., Stoddard, J., Towbin, K. A., Pine, D. S., Leibenluft, E., & Brotman, M. A. (2017). Test-retest reliability and validity of a frustration paradigm and irritability measures. Journal of Affective Disorders, 212, 38–45. https://doi.org/10.1016/j.jad.2017.01.024
  • Van Meter, A. R., & Anderson, E. A. (2020). Evidence base update on assessing sleep in youth. Journal of Clinical Child & Adolescent Psychology, 49(6), 701–736. https://doi.org/10.1080/15374416.2020.1802735
  • Vaz, S., Falkmer, T., Passmore, A. E., Parsons, R., Andreou, P., & Hempel, S. (2013). The case for using the repeatability coefficient when calculating test–retest reliability. PLoS ONE, 8(9), e73990. https://doi.org/10.1371/journal.pone.0073990
  • Vernberg, E. M., Jacobs, A. K., & Hershberger, S. L. (1999). Peer victimization and attitudes about violence during early adolescence. Journal of Clinical Child Psychology, 28(3), 386–395. https://doi.org/10.1207/S15374424jccp280311
  • Vidal-Ribas, P., Brotman, M. A., Valdivieso, I., Leibenluft, E., & Stringaris, A. (2016). The status of irritability in psychiatry: A conceptual and quantitative review. Journal of the American Academy of Child & Adolescent Psychiatry, 55(7), 556–570. https://doi.org/10.1016/j.jaac.2016.04.014
  • Wakschlag, L. S., Choi, S. W., Carter, A. S., Hullsiek, H., Burns, J., McCarthy, K., Leibenluft, E., & Briggs-Gowan, M. J. (2012). Defining the developmental parameters of temper loss in early childhood: Implications for developmental psychopathology. Journal of Child Psychology and Psychiatry, 53(11), 1099–1108. https://doi.org/10.1111/j.1469-7610.2012.02595.x
  • Walker, T. M., Frick, P. J., & McMahon, R. J. (2020). Conduct and oppositional disorders. In E. A. Youngstrom, M. J. Prinstein, E. J. Mash, & R. A. Barkley (Eds.), Assessment of disorders in childhood and adolescence (pp. 132–158). Guilford.
  • Weinberger, D. A. (1994). Overview of the Weinberger Adjustment Inventory (WAI) [Unpublished manuscript]. Case Western Reserve University.
  • Weinberger, D. A. (1997). Distress and self-restraint as measures of adjustment across the life span: Confirmatory factor analyses in clinical and nonclinical samples. Psychological Assessment, 9(2), 132. https://doi.org/10.1037/1040-3590.9.2.132
  • Weisz, J. R., Thomassin, K., Hersh, J., Santucci, L. C., MacPherson, H. A., Rodriguez, G. M., Bearman, S. K., Lang, J. M., Vanderploeg, J. J., Marshall, T. M., Lu, J. J., Jensen-Doss, A., & Evans, S. C. (2020). Clinician training, then what? Randomized clinical trial of child STEPs psychotherapy using lower-cost implementation supports with versus without expert consultation. Journal of Consulting and Clinical Psychology, 88(12), 1065–1078. https://doi.org/10.1037/ccp0000536
  • Wikiversity. (2022). Evidence-based assessment: Assessment Center/Clinician resources. https://en.wikiversity.org/wiki/Evidence-based_assessment/Assessment_Center/Clinician_resources
  • Wilson, M. K., Cornacchio, D., Brotman, M. A., & Comer, J. S. (2022). Measuring irritability in early childhood: A psychometric evaluation of the affective reactivity index in a clinical sample of 3- to 8-year-old children. Assessment, 29(7), 1473–1481. https://doi.org/10.1177/10731911211020078
  • Woods-Groves, S. (2015). The human behavior rating scale–brief: A tool to measure 21st century skills of K–12 learners. Psychological Reports, 116(3), 769–796. https://doi.org/10.2466/03.11.PR0.116k29w0
  • World Health Organization. (2022). International classification of diseases (eleventh edition; ICD-11): Mental, behavioural, and neurodevelopmental disorders. https://icd.who.int/en
  • Youngstrom, E. A., Findling, R. L., & Calabrese, J. R. (2003). Who are the comorbid adolescents? Agreement between psychiatric diagnosis, youth, parent, and teacher report. Journal of Abnormal Child Psychology, 31(3), 231–245. https://doi.org/10.1023/A:1023244512119
  • Youngstrom, E. A., Van Meter, A., Frazier, T. W., Hunsley, J., Prinstein, M. J., Ong, M. L., & Youngstrom, J. K. (2017). Evidence-based assessment as an integrative model for applying psychological science to guide the voyage of treatment. Clinical Psychology: Science and Practice, 24(4), 331–363.
  • Yudofsky, S. C., Silver, J. M., Jackson, W., Endicott, J., & Williams, D. (1986). The overt aggression scale for the objective rating of verbal and physical aggression. The American Journal of Psychiatry, 143(1), 35–39.
  • Zeman, J. L., Cassano, M., Suveg, C., & Shipman, K. (2010). Initial validation of the children’s worry management scale. Journal of Child and Family Studies, 19(4), 381–392. https://doi.org/10.1007/s10826-009-9308-4
  • Zeman, J., Shipman, K., & Suveg, C. (2002). Anger and sadness regulation: Predictions to internalizing and externalizing symptoms in children. Journal of Clinical Child and Adolescent Psychology, 31(3), 393–398. https://doi.org/10.1207/S15374424JCCP3103_11
  • Zik, J., Deveney, C. M., Ellingson, J. M., Haller, S. P., Kircanski, K., Cardinale, E. M.,... Stoddard, J. (2022). Understanding irritability in relation to anger, aggression, and informant in a pediatric clinical population. The Journal of the American Academy of Child & Adolescent Psychiatry, 61(5), 711–720.
