Abstract

Objective

Suicide and self-harm are widespread yet underreported. Risk assessment is key to effective self-harm and suicide prevention and management. There is contradictory evidence regarding the effectiveness of risk assessment tools in predicting self-harm and suicide risk. This systematic review examines the effect of risk assessment strategies on predicting suicide and self-harm outcomes among adult healthcare service users.

Method

Electronic and gray literature databases were searched for prospective research. Studies were screened and selected by independent reviewers. Quality and level of evidence assessments were conducted. Due to study heterogeneity, we present a narrative synthesis under three categories: (1) suicide- and self-harm-related outcomes; (2) clinician assessment of suicide and self-harm risk; and (3) healthcare utilization due to self-harm or suicide.

Results

Twenty-one studies were included in this review. The SAD PERSONS Scale was the most used tool. It outperformed the Beck Scale for Suicide Ideation in predicting hospital admissions and stay following suicide and self-harm, yet it failed to predict repeat suicide and self-harm and was not recommended for routine use. There were mixed findings relating to clinician risk assessment, with some studies recommending clinician assessment over structured tools, whilst others found that clinician assessment failed to predict future attempts and deaths.

Conclusions

There is insufficient evidence to support the use of any one tool, inclusive of clinician assessment of risk, for self-harm and suicidality. The discourse around risk assessment needs to move toward a broader discussion on the safety of patients who are at risk for self-harm and/or suicide.

    HIGHLIGHTS

  • There is insufficient evidence to support using standalone risk assessment tools.

  • There are mixed findings relating to clinician assessment of risk.

  • Structured professional judgment is widely accepted for risk assessment.

INTRODUCTION

Suicide and self-harm tend to be under-reported and underappreciated, and they affect every country and society worldwide (Oyesanya, Lopez-Morinigo, & Dutta, 2015; Pritchard & Hansen, 2015). It is estimated that 800,000 individuals die by suicide each year and many more utilize healthcare services for self-harm (World Health Organization, 2019). These figures may be underestimated due to legal, societal, and cultural taboos surrounding suicide and self-harm (Centers for Disease Control and Prevention, 2010). For instance, in the United States of America (USA), self-harm data are not collated centrally; however, the Centers for Disease Control and Prevention collect survey data, as well as hospital data on non-fatal injuries from self-harm. In 2015, the most recent year for which data are available, approximately 575,000 people attended a hospital for injuries due to self-harm (American Foundation for Suicide Prevention, 2020; Centers for Disease Control and Prevention, 2010). In risk assessment, utilizing near-miss information is key in preventing seminal or serious adverse events (Jeffs, Berta, Lingard, & Baker, 2012).

Risk screening and risk assessment have been identified as important components of effective self-harm and suicide management (Boudreaux et al., 2016; Jobes, 2012). Risk screening refers to the use of standardized instruments to identify at-risk individuals, whereas risk assessment refers to a more comprehensive evaluation to confirm suspected suicide and self-harm risk, estimate the immediate danger to the individual, and decide on risk management strategies (Suicide Prevention Resource Center, 2014). One study found that greater risk screening in emergency departments was associated with a significant increase in the detection of suicidal ideation (Boudreaux et al., 2016). Studies indicate that people who die by suicide have had contact with primary care services, emergency services and, to a lesser extent, mental health services in the month prior to their death (King, Horwitz, Czyz, & Lindsay, 2017; Luoma, Martin, & Pearson, 2002; Vasiliadis, Ngamini-Ngui, & Lesage, 2015). Therefore, universal self-harm and suicide risk screening and assessment have been recommended across various healthcare settings, including primary care, specialty medical care, and emergency departments (King et al., 2017). Notwithstanding, there is no gold standard for suicide and self-harm risk assessment, and practices tend to vary globally (Vasiliadis et al., 2015). For instance, in the USA, the US Preventive Services Task Force (2014) concluded that “current evidence is insufficient to assess the balance of benefits and harms of screening for suicide risk in adolescents, adults, and older adults in primary care.” However, tools like the Suicide Risk Screen, the Patient Health Questionnaire (PHQ), the SAFE-T tool, and the Columbia-Suicide Severity Rating Scale (C-SSRS) remain widely used in various healthcare settings in the USA (O’Rourke, Jamil, & Siddiqui, 2021).

In the international literature, a number of risk assessment tools have been used to measure self-harm and suicide risk, such as the SAD PERSONS Scale (SPS) and modified SPS (Chang & Tan, 2015); the Beck Suicide Intent Scale (SIS) (Jordan & McNiel, 2018); the Beck Scale for Suicide Ideation (SSI) (de Beurs et al., 2015); the Manchester Self-Harm Rule (Quinlivan et al., 2017); and the ReACT Self-Harm Rule (Quinlivan et al., 2017), among others. Previous literature reviews concluded that the available assessment tools did not reliably predict future risk of suicide (Runeson et al., 2017), repeat self-harm (Quinlivan et al., 2017), or suicide following self-harm (Chan et al., 2016). Tools often performed well in terms of sensitivity or specificity but seldom both (Quinlivan et al., 2017; Runeson et al., 2017). To put this in context, an assessment tool with a sensitivity of 85% will detect 85 out of every 100 individuals with the outcome, whereas 15 will be missed (i.e., false negatives). Similarly, an assessment tool with a specificity of 70% indicates that for every 100 individuals without the outcome, 30 will be wrongly categorized as having a risk for the outcome (i.e., false positives) (Bossuyt et al., 2008).
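
The arithmetic behind these two measures can be sketched in a few lines of Python. This is a minimal illustration only: the function name `sensitivity_specificity` and the counts are hypothetical, chosen to match the 85% and 70% figures used in the example above, and are not drawn from any reviewed study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    tp: people with the outcome who were flagged (true positives)
    fn: people with the outcome who were missed (false negatives)
    tn: people without the outcome who were not flagged (true negatives)
    fp: people without the outcome who were flagged (false positives)
    """
    sensitivity = tp / (tp + fn)  # share of true cases that the tool detects
    specificity = tn / (tn + fp)  # share of non-cases that the tool clears
    return sensitivity, specificity


# Hypothetical counts: 100 people with the outcome and 100 without it.
sens, spec = sensitivity_specificity(tp=85, fn=15, tn=70, fp=30)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```

A tool can score well on one measure and poorly on the other, which is why both are usually reported together.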

Risk assessment tools could potentially incorrectly identify people as having high risk, impacting resource usage, or conversely, may fail to identify individuals who are at high risk, compromising patient safety (Chan et al., 2016; Quinlivan et al., 2017; Runeson et al., 2017). The previous generation approach to risk assessment, including unstructured clinician risk assessment, has been recently evaluated in terms of predicting future risk of self-harm and was also found to be potentially inaccurate for clinical use (Woodford et al., 2019).

In the past, six systematic reviews (Chan et al., 2016; O’Shea & Dickens, 2014; Quinlivan et al., 2016; Runeson et al., 2017; Warden, Spiwak, Sareen, & Bolton, 2014; Woodford et al., 2019) and one narrative review (Thom, Hogan, & Hazen, 2020) evaluated how well multiple risk assessment tools predicted future suicide or self-harm in clinical practice. These reviews concluded that no single risk assessment tool was found to have enough evidence to support its routine use in clinical practice.

Some of the past reviews were limited by their focus on single instruments such as SPS (Warden et al., 2014), Short Term Assessment of Risk and Treatability (START) (O’Shea & Dickens, 2014), and unstructured clinician risk assessments (Woodford et al., 2019). Previous reviews also focused either on self-harm alone (Chan et al., 2016; Quinlivan et al., 2016; Woodford et al., 2019) or on suicide alone (Runeson et al., 2017; Warden et al., 2014; Thom et al., 2020), but seldom on both (O’Shea & Dickens, 2014). From a methodological perspective, a number of past literature reviews did not provide a structured approach to searching the gray literature (Chan et al., 2016; Runeson et al., 2017; Thom et al., 2020; Warden et al., 2014), included studies published up until early 2014 (Chan et al., 2016; O’Shea & Dickens, 2014; Warden et al., 2014), and failed to address the methodological quality, level of evidence, and/or risk of bias within the reviewed studies (Thom et al., 2020).

For all the above reasons, a more up-to-date review of the empirical and gray literature would provide information on effective methods of suicide as well as self-harm risk assessment to identify those at risk of suicide and self-harm and ultimately offer appropriate support. Therefore, the aim of this systematic review was to examine the effect of risk assessment strategies on predicting suicide and self-harm outcomes among adult healthcare service users, with a focus on (i) suicide and self-harm related outcomes; (ii) clinician assessment of risk outcomes; and (iii) healthcare utilization outcomes.

METHODS

This systematic review was guided by the principles of conducting systematic reviews (Centre for Reviews and Dissemination, 2009), and reported according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) checklist (Moher, Liberati, Tetzlaff, Altman, & The PRISMA Group, 2009).

Eligibility Criteria

Review eligibility criteria were pre-determined using the PEO (Population, Exposure[s], and Outcome[s]) framework (Moola et al., 2015). Studies eligible for inclusion met the following criteria: Population: included adult (≥18 years of age) patients or service users who have a history of suicide or self-harm within any healthcare setting, including those with a history of any psychiatric and/or physical disorders which put them at risk for suicide and/or self-harm or repeat suicide and/or self-harm. Of note, in the context of the current review, repeat self-harm refers to individuals who have self-harmed in the past and present again with another episode of self-harm (Quinlivan et al., 2017); Exposure: involved the use of one or more instrument(s) to assess the risk of suicide or self-harm; and Outcome: followed service users up for varying lengths of time in order to evaluate the ability of risk assessment instruments to predict suicidal or self-harming ideations, suicide or self-harm attempts/behaviors, and death by suicide or self-harm. Notably, suicide and self-harm are related but not synonymous. Self-harm, also referred to as self-injury, is defined as direct and deliberate harm to one’s body, often without intent to die. On the other hand, suicidal attempts and behaviors are often linked to an intention to cause death (Cipriano, Cella, & Cotrufo, 2017). Suicidality and self-harm have different prevalence rates, functions, clinical correlates, and outcomes, yet they are often measured using the same instruments (Klonsky, May, & Saffer, 2016). Therefore, this review explores and captures risk assessment for both suicide and self-harm intentions and behaviors.

Studies conducted among pediatric patients (<18 years of age), in non-healthcare settings, and focusing on interventions for self-harm or suicide prevention or management were excluded. Literature reviews, surveys, qualitative studies, policy documents, dissertations, conference proceedings, commentaries, and editorials were also excluded.

Information Sources and Search

The following electronic databases were searched: CINAHL; MEDLINE; APA PsycINFO; APA PsycARTICLES; Psychology and Behavioral Science Collection; ERIC; SocINDEX; and The Cochrane Library. Subject headings were used where appropriate and combined using Boolean operators “OR” and “AND,” the proximity indicator for EBSCO “N,” and truncation “*.” The search was conducted on title or abstract as follows: Self-harm* OR “self harm*” OR self-poison* OR “self poison*” OR self-injur* OR “self injur*” OR self-mutilat* OR “self mutilat*” OR parasuicid* OR suicid* OR “suicid* idea*” OR DSH AND (risk N5 assess*) OR (risk N5 manag*).
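
As an illustration of how such a search string is assembled, the sketch below joins the reported terms with Boolean operators in Python. The term lists are copied from the string above, but the helper `build_query` and, importantly, the explicit parentheses around the outcome block and the risk block are assumptions about the intended grouping; the reported string does not show them, and database interfaces may apply their own operator precedence.

```python
# Outcome terms and risk terms as they appear in the reported search string.
OUTCOME_TERMS = [
    'Self-harm*', '"self harm*"', 'self-poison*', '"self poison*"',
    'self-injur*', '"self injur*"', 'self-mutilat*', '"self mutilat*"',
    'parasuicid*', 'suicid*', '"suicid* idea*"', 'DSH',
]
RISK_TERMS = ['(risk N5 assess*)', '(risk N5 manag*)']


def build_query(outcome_terms, risk_terms):
    """Join each term list with OR, then combine the two blocks with AND.

    The outer parentheses are an assumed grouping (hypothetical helper),
    added so that AND applies to the whole outcome block.
    """
    outcome_block = "(" + " OR ".join(outcome_terms) + ")"
    risk_block = "(" + " OR ".join(risk_terms) + ")"
    return f"{outcome_block} AND {risk_block}"


query = build_query(OUTCOME_TERMS, RISK_TERMS)
print(query)
```

Writing the grouping out explicitly makes it easier to reproduce a search across interfaces that evaluate "AND" and "OR" in different orders.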

A focused gray literature search was carried out and included customized Google and targeted website searches. This search was designed to source records from Australia, Canada, Ireland, New Zealand, the United Kingdom (UK), and the USA. These countries were selected since they have similar health systems and infrastructure (Hegarty et al., 2020; United Nations Development Programme, 2019). Six separate Google searches were conducted within these countries using the terms “suicide,” “self-harm,” and “risk” and the domains of the selected countries. The first ten pages, or 100 hits, were reviewed to capture the most relevant hits (Godin, Stapleton, Kirkpatrick, Hanning, & Leatherdale, 2015). Targeted websites included ministries of health and national organizations involved in suicide prevention in each of the selected countries (see Table S1 in supplemental file for the full list of websites). Electronic database and gray literature searches were last conducted in August 2019. All the searches were limited to records published in English between January 2014 and August 2019.

Study Selection

Records from electronic database and gray literature searches were exported to reference management software (EndNote 7) and duplicates were deleted. Records were then transferred to Covidence, an online software package recommended by Cochrane to produce systematic reviews (Cochrane Community, 2020). Records were initially screened on title and abstract for relevance. The full texts of potentially eligible records were subsequently obtained and reviewed. Title, abstract, and full-text screenings were conducted independently by members of the review team. Screening conflicts were resolved by a third independent reviewer.

Data Extraction and Synthesis

The following were extracted for each study using a standardized data extraction table: Reference; country; design; sample; setting; instrument; follow-up; outcome; and findings (see Table S2 in supplemental file for the full data extraction table). Two reviewers conducted data extraction and each extracted study was cross-checked by a third reviewer for accuracy. Studies were synthesized to address the review aims and outcomes. A meta-analysis was not completed due to the use of several tools in single studies; adapted/shortened versions of tools; different cutoff scores to predict the risk of suicide/self-harm across different studies/groups; and various methodological approaches to measuring risk.

Quality and Level of Evidence Assessment

The Joanna Briggs Institute’s (2017) critical appraisal checklist for cohort studies was used to determine whether individual studies addressed potential biases in design, conduct, and analysis. The Scottish Intercollegiate Guidelines Network (SIGN) grading system was then used to assess the level of evidence for each of the included studies based on its design and quality (Healthcare Improvement Scotland, 2019). The eight levels of evidence are 1++, 1+, 1−, 2++, 2+, 2−, 3, and 4. A score of 1++ corresponds to high quality meta-analyses, systematic reviews of randomized controlled trials, or randomized controlled trials with a very low risk of bias, whereas a score of 4 is assigned to expert opinions. Studies were included regardless of quality and level of evidence to minimize the risk of reporting bias.

TABLE 2. Quality appraisal and level of evidence assessment (n = 21).

RESULTS

Study Selection

Electronic database searching yielded 1,939 records. Following deletion of duplicates, 1,932 records were screened based on title and abstract and 1,642 irrelevant records were excluded. The full texts of 290 records were reviewed and 270 records were excluded, resulting in 20 studies that were included from electronic databases. A total of 1,912 records were identified from the gray literature search. Titles and abstracts of 1,902 records were screened and 1,814 irrelevant records were excluded. Of the full texts screened (n = 88), only one study was eligible for inclusion. Therefore, a total of 21 studies were included in this review. See Figure 1 for the study identification, screening, and selection process.
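
The record counts at each stage can be cross-checked with simple arithmetic. The sketch below is only a consistency check on the numbers reported in this paragraph; the helper name `remaining` and the variable names are hypothetical.

```python
def remaining(screened, excluded):
    """Records left after an exclusion step."""
    return screened - excluded


# Electronic databases: counts as reported in the text.
db_duplicates = 1939 - 1932                                 # duplicates removed
db_full_text = remaining(screened=1932, excluded=1642)      # full texts reviewed
db_included = remaining(screened=db_full_text, excluded=270)

# Gray literature: counts as reported in the text.
gray_full_text = remaining(screened=1902, excluded=1814)    # full texts screened
gray_included = 1                                           # reported directly

total_included = db_included + gray_included
print(db_duplicates, db_full_text, db_included, gray_full_text, total_included)
```

Each derived figure matches the corresponding number in the text (290 and 88 full texts, 20 plus 1 included studies).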

FIGURE 1. Study identification, screening, and selection process.

Study Characteristics

Study characteristics are summarized in Table 1. Most of the studies were conducted in the USA (n = 9) and the UK (n = 6) using a prospective cohort design (n = 17). Sample sizes ranged between 50 (Chang & Tan, 2015) and 5,462 (Katz et al., 2017) participants. More than half of the reviewed studies were conducted in emergency departments (n = 7) and acute care settings (n = 4). Several instruments were used to assess suicide, with 13 studies using more than one instrument. The most frequently used instruments included SPS and modified SPS (n = 6), the Beck SSI (n = 4), and the Beck Hopelessness Scale (BHS) (n = 3). Follow-up times varied between 2 weeks (Chang & Tan, 2015) and 20 years (Green et al., 2015; Stefansson, Nordström, Runeson, Åsberg, & Jokinen, 2015), with almost half of the studies (n = 10) reporting a 6-month follow-up.

TABLE 1. Study characteristics (n = 21).

Quality and Level of Evidence Assessment

All 21 studies used valid exposure measures and reliable outcome measures, but 10 failed to adequately identify or address potential confounders. All the studies were observational and all but one were rated as level 2+ on the SIGN level of evidence criteria, indicating well-conducted cohort studies with a low risk of confounding or bias and a moderate probability that the relationship is causal. Quality and level of evidence assessments are outlined in Table 2.

Synthesis of Results

Most studies used estimates of sensitivity and specificity or areas under the curve (AUC) to indicate the predictive validity of the risk assessment tools. Outcomes measured in the 21 studies are divided into three categories: (i) suicide and self-harm-related outcomes; (ii) clinician assessment of suicide and self-harm risk; and (iii) outcomes related to the number or frequency of episodes of healthcare utilization due to self-harm or suicide. A summary of findings from individual studies is presented in Table 3.
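
For readers less familiar with the AUC, it can be read as the probability that a randomly chosen person who had the outcome received a higher score than a randomly chosen person who did not, so 0.5 corresponds to chance and 1.0 to perfect discrimination. The sketch below computes it by direct pairwise comparison. It is a minimal illustration of the concept, not the method used by any reviewed study; the function name `auc` and the scores are hypothetical.

```python
def auc(scores_with_outcome, scores_without_outcome):
    """AUC by pairwise comparison: the share of (case, non-case) pairs in
    which the case has the higher score, with ties counted as half."""
    wins = 0.0
    for pos in scores_with_outcome:
        for neg in scores_without_outcome:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(scores_with_outcome) * len(scores_without_outcome))


# Hypothetical tool scores, for illustration only.
with_outcome = [7, 5, 6, 3]
without_outcome = [2, 4, 5, 1, 3]
print(round(auc(with_outcome, without_outcome), 2))
```

Unlike sensitivity and specificity, this measure does not depend on choosing a single cutoff score, which is one reason it is often reported when tools use different thresholds.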

TABLE 3. Summary of findings from individual studies (n = 21).

Suicide and Self-Harm Related Outcomes

Across the six studies that evaluated SPS or modified SPS, sensitivity for repeat self-harm ranged widely from 1% (Quinlivan et al., 2017) to 65% (Wu et al., 2014), while specificity for repeat self-harm ranged from 7% (Saunders, Brand, Lascelles, & Hawton, 2014) to 58% (Wu et al., 2014). Wu et al. (2014) found the Chinese SPS useful in identifying high-risk individuals. However, five other studies did not support the use of SPS to predict suicide or repeat self-harm and recommended against using SPS to screen patients presenting to hospitals with self-harm (de Beurs et al., 2015; Katz et al., 2017; King et al., 2017; Saunders et al., 2014; Wang et al., 2016).

While de Beurs et al. (2015) found that most items on the Beck SSI were significant predictors of a repeat suicide attempt within 15 months (p < 0.05), Wu et al. (2014), using AUCs, found that the Beck SSI performed significantly worse than the Chinese SPS in predicting repeat self-harm within 6 months (Chinese SPS: AUC = 0.66, p = 0.02; Beck SSI: AUC = 0.59, p = 0.18). Green et al. (2015) reported that the Beck Depression Inventory (BDI) performed better than the Beck SSI at predicting suicide and repeat attempts (sensitivity of 81% vs 53%, respectively); however, in this study, the Beck SSI was found to be better than the BDI in correctly identifying true negatives (specificity of 83% vs 54%, respectively). Given that the BDI suicide item was associated with the risk of repeat suicide attempts and death by suicide, this tool was recommended for use in routine clinical care, coupled with comprehensive clinician suicide risk assessment for a positive screen (Green et al., 2015).

The Beck SIS was used in two studies, either alone (Jordan & McNiel, 2018) or with the Karolinska Interpersonal Violence Scale (Stefansson et al., 2015). When used alone, the Beck SIS predicted subsequent suicide attempts with 61.54% sensitivity, 56.91% specificity, 37.65% positive predictive value, and 77.78% negative predictive value (AUC = 0.43, 95%CI 0.34–0.58) (Jordan & McNiel, 2018). Another study found that the Beck SIS alone had 52% specificity and 17% positive predictive value; however, when it was used together with the Karolinska Interpersonal Violence Scale, sensitivity increased to 83%, specificity to 80%, and positive predictive value to 26% (Stefansson et al., 2015).
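
Positive and negative predictive values depend not only on sensitivity and specificity but also on how common the outcome is in the sample. The sketch below applies Bayes' rule to the Beck SIS sensitivity and specificity quoted above. Both prevalence figures are assumptions: the 30% value is back-calculated to be roughly consistent with the reported predictive values and is not stated in the text, and the 5% value is purely hypothetical. The function name `predictive_values` is also hypothetical.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a given sensitivity, specificity, and prevalence."""
    tp = sensitivity * prevalence                # true positives (as a proportion)
    fn = (1 - sensitivity) * prevalence          # false negatives
    tn = specificity * (1 - prevalence)          # true negatives
    fp = (1 - specificity) * (1 - prevalence)    # false positives
    ppv = tp / (tp + fp)  # probability of the outcome given a positive result
    npv = tn / (tn + fn)  # probability of no outcome given a negative result
    return ppv, npv


# Sensitivity and specificity as reported for the Beck SIS used alone.
SENS, SPEC = 0.6154, 0.5691

# Assumed prevalence of about 30% (back-calculated, not stated in the text).
ppv_30, npv_30 = predictive_values(SENS, SPEC, prevalence=0.30)

# Hypothetical lower-prevalence setting of 5%, for comparison only.
ppv_05, npv_05 = predictive_values(SENS, SPEC, prevalence=0.05)

print(f"prevalence 30%: PPV = {ppv_30:.2f}, NPV = {npv_30:.2f}")
print(f"prevalence  5%: PPV = {ppv_05:.2f}, NPV = {npv_05:.2f}")
```

Under these assumptions, the same tool yields a much lower positive predictive value when the outcome is rarer, which is relevant when comparing predictive values across studies with different populations.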

The Historical, Clinical and Risk Management (HCR-20) scale was evaluated in two studies (Campbell & Beech, 2018; O’Shea, Picchioni, Mason, Sugarman, & Dickens, 2014). It was found that higher mean total scores on HCR-20 were associated with more frequent self-harm (p < 0.001) (Campbell & Beech, 2018); however, effect sizes were not large enough (0.345–0.749) to support the use of HCR-20 in practice (O’Shea et al., 2014).

Madan et al. (2016) reported findings that provide some support for the reliability and validity of the C-SSRS related to its potential to correctly predict suicide-related behavior (p < 0.01). The authors recommended using the total C-SSRS score and the summary score from the ideation/behavior factor together in order to find the best balance between sensitivity (69%) and specificity (65–67%) (Madan et al., 2016). The Columbia Classification Algorithm for Suicide Assessment scale (C-CASA) was found in one study to be moderately accurate at predicting suicide attempts (AUC = 0.666) and deaths from suicide (AUC = 0.678) (Randall, Sareen, Chateau, & Bolton, 2019).

In two studies, the self-report versions of Concise Health Risk Tracking (CHRT) showed good internal consistency and were strongly correlated with subsequent suicide risk (Reilly-Harrington et al., 2016; Villegas et al., 2018). In one study, the likelihood of a suicide-related event increased by 76% for every 10-point increase in baseline self-report CHRT scores (Reilly-Harrington et al., 2016). CHRT scores were also shown to be highly correlated with clinician ratings of depression, anxiety, and overall functioning. Therefore, the CHRT was recommended as a quick and robust self-report tool for assessing suicide risk. Similarly, Hawes, Yaseen, Briggs, and Galynker (2017) found a significant correlation between the Modular Assessment of Risk for Imminent Suicide (a clinician- and self-report tool) score and lifetime suicide attempts (rho = 0.30, p = 0.005), depression (rho = 0.46, p < 0.001), lifetime suicidal ideations (rho = 0.25, p = 0.023), and suicidal ideations in the past month (rho = 0.35, p = 0.001). In addition, those who attempted suicide were found to have higher scores than those who did not (Mean difference = 15.69, 2.96; Cohen's d = 1.54, 0.77; U = 33, 119.5; p = 0.001, 0.036, respectively).

Using the five Suicidal/Death Ideation (SUI) items of the Minnesota Multiphasic Personality Inventory–2–Restructured Form, it was found that the SUI scale demonstrated statistically significant associations (p < 0.05) with interview-reported history of suicide attempts (r = 0.35) and the total number of suicidal behaviors within one year of testing (r = 0.28) (Glassmire, Tarescavage, Burchett, Martinez, & Gomez, 2016). Moreover, Glassmire et al. (2016) found that endorsing SUI items was significantly associated with greater risk for suicide. This supports the use of SUI-item endorsement and interview-reported risk information as predictors for future suicide.

In terms of survival rate post-self-harm among different risk groups, the use of START yielded survival rates that differed significantly between groups rated as low- and moderate-risk (p < 0.001), and between low- and high-risk groups (p < 0.001), but not between moderate- and high-risk groups (p = 0.207) (Dickens & O’Shea, 2015).

Clinician Assessment of Risk

There were mixed findings relating to clinician assessment of risk. Quinlivan et al. (2017) evaluated the performance of multiple tools (Manchester Self-Harm Rule, ReACT Self-Harm Rule, SPS, modified SPS, and Barratt Impulsiveness Scale) in comparison to clinician estimates of risk following self-harm. AUCs ranged from 0.55 (95%CI 0.50–0.61) for SPS to 0.74 (95%CI 0.69–0.79) for the clinician global estimation of risk scale, indicating that this scale performed better than the SPS in estimating risk for repeat self-harm. The remaining scales performed significantly worse in comparison to clinician estimates. Similarly, Wang et al. (2016) found that clinicians were able to predict future attempts with significantly greater accuracy in comparison to SPS (p < 0.001).

In contrast, Harrison, Stritzke, Fay, and Hudaib (2018) reported that clinician prediction did not significantly predict future attempts at three- and six-month follow-up (p = 0.16 and p = 0.30, respectively), despite significantly predicting suicidal ideations at both timepoints (p = 0.049 and p = 0.011, respectively). Another study found that, while clinician assessment of risk was moderately accurate at predicting future suicide attempts (AUC = 0.728, 95%CI 0.66–0.79), it was not effective at predicting deaths from suicide (AUC = 0.546, 95%CI 0.36–0.73) (Randall et al., 2019). Moreover, clinician assessment was not significantly better at assessing the risk of suicide in comparison to the C-CASA classification system. Likewise, the Convergent Functional Information for Suicidality tool had the best diagnostic accuracy (AUC = 0.81, 95%CI 0.76–0.87) in comparison to clinician prediction of risk, which had modest diagnostic accuracy (Randall et al., 2019).

Only one study conducted analyses by level of clinician training (Wang et al., 2016). It was found that clinicians’ ability to predict future suicide attempts with greater accuracy than traditional risk assessment instruments was linked to their level of seniority, with senior psychiatric residents and staff psychiatrists demonstrating greater accuracy than junior psychiatric residents (AUC = 0.78 vs 0.76, respectively; p < 0.001).

Healthcare Utilization Outcomes

Chang and Tan (2015) investigated the ability of C-SSRS, Beck SSI, SPS, and the Patient Health Questionnaire 9 (PHQ-9) to predict adverse events in the emergency department, following a presentation for suicidal ideation. They found that SPS was significantly better at predicting hospital admission (p = 0.009) and stay (p = 0.006) but not near-term adverse events in the emergency department. These included the “need for unscheduled psychiatric or sedating medications, physical restraints, or security staff intervention” (Chang & Tan, 2015, p. 1681). The remaining instruments demonstrated poor predictive value for adverse events in the emergency department and psychiatric admissions. Likewise, Saunders et al. (2014) found that SPS failed to identify most patients who presented to the emergency department following self-harm and went on to require psychiatric hospital admission or community psychiatric aftercare. Both studies concluded that currently available suicide risk assessment tools should not be routinely used in the emergency department to identify those at greatest risk (Chang & Tan, 2015; Saunders et al., 2014).

DISCUSSION

This systematic review examined the effect of suicide and self-harm risk assessment tools on predicting suicide and self-harm outcomes among adult healthcare service users. Overall, limited evidence was found to support the use of standalone risk assessment tools in healthcare settings. Of the 21 included studies, six evaluated SPS or modified SPS. All studies, except for one (Wu et al., 2014), advised against the use of SPS to screen patients presenting to hospitals with self-harm. Various other scales were evaluated, including the Beck SSI, the Beck SIS, BDI, HCR-20, C-SSRS, C-CASA, and CHRT scales, with promising yet limited and weak evidence relating to their sensitivity and specificity. It was also found that combining two or more risk assessment tools was more effective than using a single tool (Glassmire et al., 2016; Reilly-Harrington et al., 2016; Stefansson et al., 2015), and that self-report measures can be potentially effective in predicting future suicide and self-harm (Glassmire et al., 2016; Reilly-Harrington et al., 2016; Villegas et al., 2018). Furthermore, studies measuring healthcare utilization outcomes advised against using suicide risk assessment tools such as SPS, the Beck SSI, PHQ-9, and C-SSRS routinely in emergency departments (Chang & Tan, 2015; Saunders et al., 2014).

Findings from the seven reviews discussed in the introduction support findings from our current review (Chan et al., 2016; O’Shea & Dickens, 2014; Quinlivan et al., 2016; Runeson et al., 2017; Thom et al., 2020; Warden et al., 2014; Woodford et al., 2019). Overall, there was insufficient evidence to support the use of SPS and START in assessing or predicting suicidal behavior (O’Shea & Dickens, 2014; Warden et al., 2014). In fact, SPS and modified SPS repeatedly failed to identify patients requiring psychiatric admission or community psychiatric aftercare, predict repetition of self-harm, and accurately predict future suicide attempts (Bolton, Spiwak, & Sareen, 2012; Stefansson et al., 2015). Therefore, SPS was judged as not being of clinical value and should not be used alone to assess for self-harm risk in acute care. It was also found that unstructured clinician risk assessment was too inaccurate to be clinically useful, and that after-care should be allocated based on need rather than risk assessment (Woodford et al., 2019).

Structured professional judgment is a widely accepted approach to clinical risk assessment and management (Fagan et al., 2009). It is considered the third generation of risk assessment, combining unstructured clinical judgment (first generation) and actuarial assessment (second generation) (Higgins, Morrissey, Doyle, Bailey, & Gill, 2015). Structured clinical judgment frameworks can assist practitioners in moving beyond the use of intuition and risk assessment tools; however, such frameworks are not elaborated upon in enough detail to provide sound clinical guidance for practitioners (Higgins et al., 2015).

A recent review by Hanratty, Kilicaslan, Wilding, and Castle (2019) found limited evidence regarding the effectiveness of Collaborative Assessment and Management of Suicidality in reducing suicide risk and deliberate self-harm in adults. However, evidence from the present review was divided between studies favoring clinician assessment of risk (Quinlivan et al., 2017; Wang et al., 2016), and others where clinician assessment of risk did not significantly predict suicide attempts (Brucker et al., 2019; Harrison et al., 2018), or death from suicide (Randall et al., 2019).

Implications and Recommendations

Data supporting the utilization of risk assessment tools and their impact on predicting suicide and self-harm are sparse; therefore, the limitations of using risk assessment tools in isolation as predictors need to be recognized. Indeed, no one scale was found to have sufficient evidence to support its use in clinical practice. It is argued that contemporary discourse in the patient safety literature on risk assessment tools needs to shift to reflect this lack of empirical evidence. The focus on risk assessment tools may be deterring the development of sound clinical judgment frameworks. Furthermore, risk assessment without the development and implementation of clinical judgment frameworks is an arbitrary practice, and a shift in paradigm across all healthcare sectors is needed. Kapur and Goldney (2019) argue that clinicians urgently need to recognize the “fallacy” of risk assessment, noting that assessment tools are more likely to serve the organization than the patient.

While not meeting the criteria for inclusion in this systematic review, a number of best practice and policy guidelines for the assessment of risk were sourced from the gray literature search. Overall, it was clear that for any recommendations relating to the assessment of suicide and self-harm risk to be implemented, a whole-system, multi-agency, and collaborative approach is needed (Department of Health & Human Services, 2016; Health Service Executive, 2017; Queensland Mental Health Commission, 2015; Ridani et al., 2016; SANE Australia, 2014; Welsh Government, 2015). However, while these recommendations were made in policy and guidance documents internationally, there was a clear lack of specificity as to how to implement them in practice. In addition, no single model of risk assessment was discussed in more than one document, which supports findings from our review.

Research needs to move beyond trying to determine the efficacy of risk assessment tools as predictors of self-harm and suicide. As corroborated by this review, there is insufficient evidence to support the use of risk assessment tools as a standalone assessment method.

Strengths and Limitations

To the best of the authors’ knowledge, this is the most up-to-date systematic review to evaluate and compare various suicide and self-harm risk assessment tools, inclusive of clinician assessment of risk. Rigor was sought in the conduct and reporting of this review and studies were sourced from various electronic databases and the gray literature. Moreover, record screening, data extraction, and quality appraisal were cross-checked by independent reviewers to ensure accuracy and minimize the risk of reporting bias.

Given that this review is limited to prospective studies, some publications may have been missed. Most of the included studies were well-conducted cohort studies with a low risk of confounding or bias. However, quality appraisal of the included studies determined that, while all studies used valid exposure and reliable outcome measures, almost half of the studies inadequately identified or addressed potential confounders. Appropriate confounding control is particularly challenging in such studies, as exposure is established by a complex interaction between various patient, physician, and healthcare system factors and information (Brookhart, Stürmer, Glynn, Rassen, & Schneeweiss, 2010). While outcomes in some of the reviewed studies were described as self-harm, it was unclear how this term was operationalized and whether any distinctions were made between suicidal and non-suicidal self-harm.

CONCLUSION

Findings from this systematic review indicate that there is insufficient evidence to support the use of any one clinical risk assessment tool, inclusive of clinician assessment of risk, for self-harm and suicidality in clinical settings. This review also found limited evidence pertaining to the effect of risk assessment on healthcare utilization due to self-harm or suicide. As such, it is timely that the discourse in relation to risk assessment moves toward a broader discussion on the safety of patients who have suicidal ideation and those who attempt self-harm or suicide. Findings from this review underscore the need to develop and evaluate clinical judgment frameworks that are evidence-based, and responsive to individual patient needs.

AUTHOR NOTES

Mohamad M. Saab, Margaret Murphy, and Elaine Meehan, Catherine McAuley School of Nursing and Midwifery, University College Cork, Cork, Ireland. Christina B. Dillon, Environmental Research Institute/School of Public Health, University College Cork, Cork, Ireland. Selena O’Connell, Josephine Hegarty, and Sinead Heffernan, Catherine McAuley School of Nursing and Midwifery, University College Cork, Cork, Ireland. Sonya Greaney, Southern Area Health Service Executive, Cork, Ireland. Caroline Kilty, John Goodwin, Irene Hartigan, and Maidy O’Brien, Catherine McAuley School of Nursing and Midwifery, University College Cork, Cork, Ireland. Derek Chambers and Una Twomey, Southern Area Health Service Executive, Cork, Ireland. Aine O’Donovan, Catherine McAuley School of Nursing and Midwifery, University College Cork, Cork, Ireland.

ACKNOWLEDGEMENTS

The authors would like to thank members of the Connecting for Life project working group steering group in Ireland.

DISCLOSURE STATEMENT

There are no relevant financial or non-financial competing interests to report.

DATA AVAILABILITY STATEMENT

The authors confirm that the data supporting the findings of this review are available within the article and its supplementary materials.

Funding

This work was supported by Ireland’s Health Service Executive (HSE)—Mental Health Section.

REFERENCES