Research Article

Balancing practicality and validity of elder abuse identification measures: using data from adult protective services investigations

ABSTRACT

Background and Objectives. In testing a comprehensive decision support system for Adult Protective Services (APS), this study addressed two problems common in APS research and practice: the psychometric quality of the measures and measurement burden. Research Design and Methods. Data were generated on 1,472 APS cases over six months in two California counties using the Identification, Services and Outcomes (ISO) Matrix, a comprehensive decision support system for APS. The ISO Matrix uses Short-Forms developed from the Elder Abuse Decision Support System (EADSS). Mini-Forms were developed from the Short-Forms and tested in order to reduce measurement burden. Mini-Forms were developed on each measure using sensitivity and specificity of the items in predicting the criterion of substantiation (yes/no). Psychometric quality was addressed by estimating predictive validity and Cronbach’s alpha of Short-Forms. Predictive validity and reliability were also estimated on the Mini-Forms as was their correlation with the Short-Forms. Results. On Short-Forms, good predictive validity was found for all measures except those that were very rare. Results for even shorter Mini-Forms were mixed, and some will require further research on their reliability and validity. Discussion and Implications. Short-Forms had good psychometric properties and some Mini-Forms did as well. Ongoing adoption by several California counties and Montana demonstrates the viability and sustainability of using the ISO Matrix for research and practice.

Introduction

Until recently, reliable, validated measures have been lacking for Adult Protective Services (APS) to identify the multiple problems of elder abuse. With few exceptions, programs have relied on the observations and decisions of caseworkers with unknown standards of validity for evaluating those decisions. For example, a study by Mosqueda et al. (Citation2016) found great variability among APS sites in California regarding substantiation decisions. They concluded that this was likely due, at least in part, to a lack of standardization in training and poor use of available data. Yet, standardized forms can be burdensome for both caseworkers and potential clients (National Adult Protective Services Association and National Committee for the Prevention of Elder Abuse, Citation2013) and can result in poor quality or incomplete data. However, having standardized, empirically validated elder abuse measures that are convenient to use in APS investigations would be ideal given that such measures can enable setting benchmarks and observing improvement or decline after case interventions.

In addition to benefitting practice, standardization can promote valid research (Cronbach, Citation1982; Shadish et al., Citation2002). In APS, a recent Cochrane review concluded that “There is inadequate trustworthy evidence to assess the effects of elder abuse interventions on occurrence or recurrence of abuse” (Baker et al., Citation2016). This conclusion was corroborated by a review by Ploeg et al. (Citation2009), and a later study by Day et al. (Citation2017). Like Ploeg et al. (Citation2009), Day et al. (Citation2017) concluded that there is currently insufficient evidence to guide the implementation of interventions that ameliorate elder abuse. In all, six of eight studies reviewed by Ploeg et al. (Citation2009) and two of eight subsequent studies reviewed by Day et al. (Citation2017) focused on client or perpetrator outcomes, e.g., “recurrence of abuse.” The remaining eight (50%) focused on outcomes of interventions for professional caregivers or caseworkers, e.g., a presentation about elder abuse to nursing assistants and a training for dental hygienists to improve their awareness of elder abuse. Therefore, only half of the 16 studies focused on what happened to APS clients and perpetrators. Even though all studies met the criterion of including a comparison group, they were still riddled with other significant threats to validity such as small samples, low completion rates, and poor or unspecified measurement properties. Due to problems such as these, APS agencies and researchers have searched for the most valid and least burdensome methods of measuring inputs and outcomes of their work (National Adult Protective Services Association and National Committee for the Prevention of Elder Abuse, Citation2013). Therefore, this study addressed two problems germane to APS research and practice: the psychometric quality of the measures and measurement burden.

Attempts to address the problems

We describe three distinctly different attempts to address the problems. The first introduced “structured decision making” (SDM) into APS practice. It was noted that identifying risk and directing resources to those most in need is the core of this model (Park et al., Citation2010). The desired outcome of SDM is to increase consistency and accuracy in assessing self-neglect and elder mistreatment. The system is in use in a number of jurisdictions in counties and states, such as Texas, Virginia, New Hampshire, California, and Minnesota. To avoid an extensive review that is beyond the scope of this article, we note that inter-rater agreement assessment after training of 24 caseworkers was excellent for self-neglect at 93% and fair for other types of abuse at 65%. The SDM was found to be a useful predictor of the risk of subsequent harm for both self-neglect and abuse by others (Johnson et al., Citation2012).

Another approach to systematic assessment is the Tool for Risk, Interventions, and Outcomes (TRIO). The TRIO is designed to facilitate consistent APS practice and collect data related to multiple dimensions of typical interactions with APS clients, including the investigation and assessment of risks, the provision of APS interventions, and associated health and safety outcomes (Sommerfeld et al., Citation2014). The authors reported high field utility, demonstrated by social workers’ “relevance and buy-in.” In addition, interrater reliability was high based on 12 caseworkers in Ventura County, California. Concurrent validity was demonstrated and predictive validity was examined by the prediction of the risk of actual APS recurrence. While these indicators are suggestive of validity, the evaluation did not report several important reliability indicators, including Cronbach’s alpha reliability (Cronbach, Citation1951; Nunnally & Bernstein, Citation1994) or item response theory reliability, since these were not applicable. The TRIO did use receiver operator characteristic (ROC) curve analysis for predictive validity, which resulted in a fair area under the curve of 0.69 using recurrence of elder abuse as the validity criterion.

Standardized, validated measures of elder abuse were developed and tested in the Elder Abuse Decision Support System (EADSS) study (Conrad & Iris, Citation2015) in order to support and standardize decision-making to substantiate abuse. The EADSS has used traditional Cronbach’s alpha and modern item response theory Rasch (Wright & Stone, Citation1979) measurement models, e.g., (Conrad & Conrad, Citation2019; Conrad et al., Citation2011, Citation2010), and the measures developed have yielded hundreds of citations in peer-reviewed journals. In the EADSS study, the problem of measurement burden manifested itself in discussions with caseworkers and supervisors as a potential improvement area in investigations. These discussions included formal focus groups where the caseworkers complained principally about the number of items which they regarded as excessive. The researchers responded by creating Short-Forms for four of the measures (Beach et al., Citation2017). The alphas for three measures of financial exploitation, emotional abuse, and physical abuse were very good at 0.89, 0.88, and 0.86 and acceptable for neglect at 0.66. Areas under the curve estimating predictive validity using substantiation decision as the criterion were excellent with all four ranging from 0.95 to 0.97. Despite the development of the Short-Forms, the measurement burden problem was also observed in the current study of the Identification, Services, and Outcomes (ISO) Matrix using the Short-Forms (Liu et al., Citation2017) which is the focus of the current research.

Ironically, individual items and very short scales tend to be less reliable (Nunnally & Bernstein, Citation1994) while longer, more reliable measures are inherently more burdensome. Nevertheless, standardized empirical data are needed to obtain large numbers of cases for statistical conclusion validity (Shadish et al., Citation2002) and construct validity (Cronbach, Citation1982) in program evaluations and research studies. Additionally, both research and practice are based on the ability to take accurate, valid measures of the characteristics of interest (Nunnally & Bernstein, Citation1994). In practice, specifically, valid data are important in managing programs and improving them (Deming, Citation2000). Therefore, there is a balancing point or trade-off between practicality and validity that must be evaluated for optimal use.

Data source

This study used the Identification, Services, and Outcomes (ISO) Matrix (Liu et al., Citation2017), a comprehensive decision support system for APS developed principally from the EADSS (Conrad & Iris, Citation2015). The goal of the ISO Matrix is to improve Adult Protective Services’ (APS) ability to reduce clients’ risk of abuse and neglect and maintain clients’ independence in the community. As such, it is designed to identify alleged abuse by type to support substantiation decisions and guide the design of the service plan. Outcome assessment determines the success or failure of the identification and service plan implementation for each case. This paper focuses only on identification measures.

Under a two-year grant from the Administration for Community Living (Liu et al., Citation2017), the ISO Matrix was implemented in San Francisco and Napa Counties, California for a data collection period of 6 months. In addition to replicating four measures from the EADSS (Beach et al., Citation2017), the ISO Matrix provided psychometric assessments of four additional elder abuse constructs which the EADSS study lacked. All data were entered into San Francisco and Napa counties’ case management system called LEAPS and were transferred from the case management system by JUMP Technology Services to the research team for data cleaning and coding before analyses using SPSS.

Research objectives

To address the issues of validity and practicality, the objectives of this study were to:

  1. Evaluate the quality of the ISO Matrix Short-Forms, which included four replications from the Beach et al. (Citation2017) study and four new scales, in terms of predictive validity and internal consistency reliability.

  2. Develop and test Mini-Form versions for the Short-Form elder abuse measures in order to further reduce the measurement burden.

Design and methods

Participant characteristics

Analysis was conducted on 1,472 APS cases of older adults (>65 years). These were all cases that included a home visit and reached the point of making a substantiation decision for elder abuse (yes/no). Table 1 displays summary descriptive statistics for the client and alleged abuser. For those items with a majority of cases having a valid response, the average victim age was 78 years, 57% were female, 45% were White, 24% Asian, 15% Black, and 12% Hispanic. English was the primary language for 69% of clients, 19% spoke primarily an Asian language, 7% Spanish, and 5% another language. Twenty-four percent of clients did not speak English at all. Only 27% of clients were receiving an in-home service or support at the time of case opening. In terms of living arrangements, 45% lived alone, 30% with others, 11% lived with the alleged abuser, and 15% had other arrangements. Around three-quarters of the clients in cases had no employment information recorded; of those whose records included that information, only 4% were either actively employed or seeking employment. Twenty-seven percent were military veterans. For those non-self-neglect cases with a valid response, 69% of the alleged abusers were family members, 14% were known to the client (non-family members), 8% were the client’s caregiver, and 9% fell into other categories.

Table 1. Participants

Research design

To assess psychometric quality, a one-group, longitudinal design was used to assess predictive validity using receiver operator characteristic (ROC) curve analysis (Nunnally & Bernstein, Citation1994). Predictive validity was important since the ISO Matrix screeners for elder abuse problems are used to design service plans and referrals to treatment. The independent variables were the Short-Forms and Mini-Forms while the dependent criterion variable was the substantiation decision (yes/no) for elder abuse types arrived at by the caseworker and supervisor after an investigation usually ranging from about 30 to 90 days. Cronbach’s alpha was estimated to provide an estimate of internal consistency reliability (Cronbach, Citation1951).

Analysis methods

Objective 1. Evaluate the quality of four previously tested Short-Forms, i.e., replication study, and conduct initial testing of four new Short-Forms in terms of predictive validity and internal consistency reliability.

This study only addressed the quality of the ISO Matrix measures at pretest, i.e., the identification phase since this enabled prediction of the criterion, i.e., substantiation of abuse.

Short-Forms and Mini-Forms from the ISO Matrix measures

The ISO Matrix is based on the EADSS, which is a theory-based system developed through extensive literature review, concept mapping, and testing in the field (Conrad & Iris, Citation2015; Conrad et al., Citation2017, Citation2011, Citation2010). It includes comprehensive, structured interview guides to assess levels of abuse and perpetrator/victim characteristics, and has detailed measures of the four previously developed Short-Forms of elder mistreatment based on the EADSS (Beach et al., Citation2017): (1) the Older Adult Financial Exploitation Measure (OAFEM), (2) the Older Adult Emotional Abuse Measure (OAEAM), (3) the Older Adult Neglect Measure (OANM), and (4) the Older Adult Physical Abuse Measure (OAPAM). The OAPAM was modified/shortened for this study, and this is the first time this version has been tested empirically. The (5) Self-Neglect Short-Form was adapted from Iris et al. (Citation2014) and tested in Liu et al. (Citation2020). The remaining abuse measures and Short-Forms have not been tested previously: The (6) Isolation Form and the (7) Sexual Abuse Form were adapted from the EADSS. The (8) Abandonment and (9) Abduction Forms each had two items using language taken from the California APS abandonment and abduction statutes. The (10) Abuser Risk Short-Form (Conrad & Conrad, Citation2019) was also adapted from the EADSS, and it measures characteristics of potential or suspected abusers that would put them at risk for committing abuse. Finally, (11) the Client Risk Form, originally developed by Riverside County APS in California, measures characteristics of clients that could make them more vulnerable to being abused. The eleven ISO Matrix measures are summarized below with previously available psychometric information (see Appendix A for abbreviated item descriptions and Appendix B for the full measures with construct definitions and with Mini-Forms indicated).

As noted below, several measures were of very rare phenomena and lacked data, i.e., Isolation, Sexual Abuse, Abandonment, and Abduction, but are listed to indicate the need for further research. Predictive validity is indicated by the area under the curve (AUC) described in the analysis section and reliability is indicated by Cronbach’s alpha (Cronbach, Citation1951).

Financial Exploitation Short-Form. The 11-item Short-Form was adapted from the EADSS, AUC = 0.94, alpha = 0.89 (Beach et al., Citation2017), replicated here, and the Mini-Form was developed and tested here.

Emotional Abuse Short-Form. The 11-item Short-Form was adapted from the EADSS, AUC = 0.97, alpha = 0.88 (Beach et al., Citation2017), replicated here, and the Mini-Form was developed and tested here.

Physical Abuse Short-Form. The 3-item Short-Form was adapted from the EADSS (Beach et al., Citation2017) and had not previously been analyzed; no additional Mini-Form was developed here. A 6-item form was tested by Beach et al. (Citation2017), AUC = 0.96, alpha = 0.86.

Neglect Short-Form. The 7-item Short-Form was adapted from the EADSS, AUC = 0.95, alpha = 0.66 (Beach et al., Citation2017), replicated here, and the Mini-Form was developed and tested here.

Self-Neglect Short-Form. The new 6-item Short-Form was adapted from Iris et al. (Citation2014), whose 25-item version had alpha = 0.87. The 6-item Short-Form had AUC = 0.80, alpha = 0.74 (Liu et al., Citation2020), replicated here, and the Mini-Form was developed and tested here.

Isolation Short-Form. The 4-item Short-Form was adapted from the EADSS (Conrad & Iris, Citation2015). Although isolation is a rare event, data were sufficient to obtain estimates, but only for the Short-Form.

Sexual Abuse Short-Form. The 6-item Short-Form, adapted from the EADSS (Conrad & Iris, Citation2015), had insufficient data for estimation.

Abandonment Form. The 2-item form from the California APS statute had insufficient data.

Abduction Form. The 2-item form from the California APS statute had insufficient data.

Abuser Risk Short-Form. The 11-item Short-Form was adapted from the EADSS, AUC = 0.84, alpha = 0.84 (Conrad & Conrad, Citation2019), replicated here, and the Mini-Form was developed and tested here.

Client Risk Form. The 9-item form was adapted from the 21-item risk assessment used by Riverside, CA APS, with the Short-Form and Mini-Form developed and tested here.

Scoring. Because the forms are straightforward with simple scoring procedures, caseworkers and researchers can easily learn to adopt the forms (see Appendix B for scale definitions and complete Short-Forms with items noted to be dropped to make Mini-Forms). Simply add up the item responses to get a total score for each section. Automated scoring is available on the case management system used by both counties. Items are scored “Yes” = 2; “Some Indication” = 1; “No” = 0; and “Refused/Don’t Know/NA” = 0 (or missing). Since the prior ROC analyses indicated that a total score of 1 was most sensitive and specific as a cutoff (Conrad et al., Citation2017), any score (i.e., a score of 1 or higher or “some indication” on at least one item) indicates a high probability of the presence of that type of abuse.
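
The scoring rule described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the automated scoring used in the counties' case management system: the names RESPONSE_POINTS, CUTOFF, total_score, and screens_positive are hypothetical, and the sketch assumes that "Refused/Don't Know/NA" is scored 0 rather than treated as missing.

```python
# Hypothetical mapping of response labels to points, following the text:
# "Yes" = 2, "Some Indication" = 1, "No" = 0.
RESPONSE_POINTS = {
    "Yes": 2,
    "Some Indication": 1,
    "No": 0,
    "Refused/Don't Know/NA": 0,  # assumption: scored 0 instead of missing
}

CUTOFF = 1  # a total of 1 or higher indicates probable abuse of that type


def total_score(responses):
    """Add up the item points for one section of a form."""
    return sum(RESPONSE_POINTS[response] for response in responses)


def screens_positive(responses, cutoff=CUTOFF):
    """Return True when the section total reaches the cutoff."""
    return total_score(responses) >= cutoff


# Made-up responses for a four-item section.
example = ["No", "Some Indication", "No", "Refused/Don't Know/NA"]
print(total_score(example), screens_positive(example))
```

Because the cutoff is 1, a single "Some Indication" response is enough to flag the section, which matches the "any score" rule stated in the text.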

Objective 2. Develop and test Mini-Form versions for a suite of previously validated Short-Form elder abuse measures to reduce measurement burden.

Selection of Mini-Form items

The Mini-Form items were selected from the Short-Forms. Four criteria were considered in choosing items to delete: sensitivity, specificity, construct validity, and caseworker input. For an example of caseworker input, in the emotional abuse section, due to caseworker preference “threatening gestures” were kept over “unkind names or put downs” though the latter had slightly higher sensitivity. Sensitivity is a type of criterion/construct validity that summarizes how well an item or measure detects a specific phenomenon, in this case, type of abuse. It is the probability (here expressed as a percentage) that a person who was ultimately substantiated as abused would be identified by the item as an abuse victim, i.e., score positive. Specificity measures how well the item identifies those who were not substantiated for abuse. This is the probability that the non-abused person scores negative, i.e., zero, on the item. We used sensitivity and specificity to decide which items were the best predictors of the abuse substantiation decision (construct validity again). In this study, the sensitivity was much more variable (ranging from 4% to 74%). The specificity statistics were all over 90% except for one Abuser Risk item and most Client Risk items. Therefore, except for the latter two types of items, this lack of variability in specificity did not provide useful information for decisions about the items. Instead, the sensitivity had greater variability and was more informative. In general, items were deleted from the Short-Form in order to create the Mini-Form if they had a sensitivity lower than 0.20. A few exceptions were made when it was clear that the caseworkers expressed a strong preference for a particular item with compelling logic or the item was required for construct validity. For Abuser and Client Risk items, both sensitivity (<0.20) and specificity (<0.20) were considered.
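
As a minimal sketch of the item statistics described above, the following Python computes an item's sensitivity and specificity against the 0/1 substantiation decision and flags items whose sensitivity falls below 0.20. The function names and the toy data are hypothetical, and the sketch covers only the numeric rule; the caseworker-input and construct-validity exceptions described in the text are judgment calls that it does not model.

```python
DROP_BELOW = 0.20  # sensitivity threshold named in the text


def item_sensitivity_specificity(item_scores, substantiated):
    """Compare one item with the substantiation decision.

    item_scores: item points per case (0, 1, or 2); any score above 0
                 counts as a positive item response.
    substantiated: 1 if abuse was substantiated for the case, else 0.
    Assumes at least one substantiated and one unsubstantiated case.
    """
    tp = fn = tn = fp = 0
    for score, outcome in zip(item_scores, substantiated):
        positive = score > 0
        if outcome == 1 and positive:
            tp += 1
        elif outcome == 1:
            fn += 1
        elif positive:
            fp += 1
        else:
            tn += 1
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def flag_for_drop(sensitivity, threshold=DROP_BELOW):
    """Numeric part of the deletion rule: low-sensitivity items are candidates."""
    return sensitivity < threshold


# Toy data: 10 substantiated cases followed by 10 unsubstantiated cases.
substantiated = [1] * 10 + [0] * 10
rare_item = [1] + [0] * 9 + [2] + [0] * 9      # endorsed once in each group
common_item = [2] * 6 + [0] * 4 + [0] * 10     # endorsed in 6 substantiated cases

for name, item in [("rare_item", rare_item), ("common_item", common_item)]:
    sens, spec = item_sensitivity_specificity(item, substantiated)
    print(name, sens, spec, flag_for_drop(sens))
```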

Statistical analyses

Criterion validity

Methods to test criterion validity were employed to examine the quality of the measures. Criterion validity refers to the ability of a measure or an item to predict a criterion of interest (Nunnally & Bernstein, Citation1994). In this study, the best criterion for elder abuse was the substantiation decision – the final judgment of the caseworker and the supervisor after an investigation, as to the presence or absence of a type or types of elder abuse. A positive substantiation decision, i.e., including both “confirmed” and “inconclusive” for abuse (coded as 1), formed the basis for moving ahead as an agent of the county in the steps needed to ameliorate the situation including removing and/or prosecuting a perpetrator. A decision of “unfounded”, coded as 0, resulted in an end to the investigation and no services for elder abuse. Using this 0/1 criterion, the predictive validity of all abuse and risk measures was estimated with receiver operator characteristic (ROC) curve analysis.
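
The 0/1 coding of the criterion can be illustrated as follows. This is a hypothetical sketch: the finding labels are taken from the paragraph above, but the function name and the handling of unexpected values are assumptions, not part of the study's data pipeline.

```python
# Findings treated as a positive substantiation decision in the text.
POSITIVE_FINDINGS = {"confirmed", "inconclusive"}


def code_criterion(finding):
    """Map a substantiation finding to the 0/1 criterion used in the analyses."""
    label = finding.strip().lower()
    if label in POSITIVE_FINDINGS:
        return 1
    if label == "unfounded":
        return 0
    # Assumption: anything else is treated as a data error, not as missing.
    raise ValueError(f"unrecognized finding: {finding!r}")


print([code_criterion(f) for f in ["Confirmed", "Inconclusive", "Unfounded"]])
```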

ROC curve analysis of predictive validity

ROC curve analysis (McNeil & Hanley, Citation1984) was used to test the predictive validity of the measures, i.e., what was the probability (as a percentage) of predicting the substantiation decision correctly? Predictive validity is a type of construct validity (Messick, Citation1989). This was estimated using logistic regression with the substantiation decision as the response and the relevant individual items as predictors, for each abuse type. As a sensitivity analysis, we repeated the analysis using summed scores. The ROC analysis charts the sensitivity (true positive rate) against 1 minus the specificity (false positive rate) to derive a curve that estimates the relationship of the measure to the criterion of interest. The area under the ROC curve (AUC) indicates the test’s predictive validity (Hosmer & Lemeshow, Citation2000) where 1.0 is perfect sensitivity and specificity, above 0.90 is excellent, above 0.80 is good, above 0.70 is fair, and 0.50 is random chance. The validity of each form, e.g., Financial, Emotional, Physical, was judged against the criterion of the final yes/no substantiation decision of abuse to derive the probability of a correct decision expressed as a percentage.
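
To make the AUC concrete, the sketch below computes it for summed scores, the variant used in the sensitivity analysis. It does not reproduce the logistic regression on individual items that the study ran in SPSS. Instead it relies on a standard equivalence: the AUC equals the probability that a randomly chosen substantiated case has a higher score than a randomly chosen unsubstantiated case, with ties counted as one half. The function names, the label for values at or below 0.70, and the toy data are all hypothetical.

```python
def auc_from_scores(scores, outcomes):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    scores: summed form score per case.
    outcomes: 1 if abuse was substantiated, else 0.
    Ties between a positive and a negative case count as half a win.
    """
    positives = [s for s, y in zip(scores, outcomes) if y == 1]
    negatives = [s for s, y in zip(scores, outcomes) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one case in each outcome group")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def label_auc(auc):
    """Verbal labels following the thresholds stated in the text."""
    if auc > 0.90:
        return "excellent"
    if auc > 0.80:
        return "good"
    if auc > 0.70:
        return "fair"
    return "below fair"  # assumption: the text gives no label for this range


# Made-up summed scores: four substantiated cases, then four unsubstantiated.
scores = [4, 2, 1, 0, 0, 1, 0, 0]
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
auc = auc_from_scores(scores, outcomes)
print(auc, label_auc(auc))
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a rank-based calculation would be preferable for large samples.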

Cronbach’s alpha reliability

Cronbach’s alpha (Cronbach, Citation1951) is a measure of internal consistency reliability or how well the items in a measure are related. Although predictive validity was the principal criterion for the Short-Forms, the measures were expected to demonstrate reasonable internal consistency. Kline (Citation2000) suggests ≥ .9 as “excellent,” .7–.9 as “good,” and .6–.7 as “acceptable.”
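
For readers who want to see the calculation, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function names and the toy data are hypothetical, the label for values below .6 is an assumption, and the study itself computed alpha in SPSS.

```python
from statistics import variance


def cronbach_alpha(item_columns):
    """Cronbach's alpha from a list of item columns (one list of scores per item).

    All columns must cover the same cases in the same order.
    """
    k = len(item_columns)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(case) for case in zip(*item_columns)]
    item_variance_sum = sum(variance(column) for column in item_columns)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def label_alpha(alpha):
    """Verbal labels following the Kline thresholds quoted in the text."""
    if alpha >= 0.9:
        return "excellent"
    if alpha >= 0.7:
        return "good"
    if alpha >= 0.6:
        return "acceptable"
    return "below acceptable"  # assumption: not labeled in the text


# Made-up scores for three items across four cases.
items = [
    [0, 1, 2, 2],
    [0, 2, 1, 2],
    [1, 0, 2, 2],
]
alpha = cronbach_alpha(items)
print(round(alpha, 3), label_alpha(alpha))
```

The formula also shows why a three-item scale such as Physical Abuse can have a low alpha despite strong prediction: with few items, the item variances make up a large share of the total-score variance unless the items are very highly correlated.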

Correlations of Short-Forms with Mini-Forms

To examine how well the Mini-Forms represented the Short-Forms, the scores from both were correlated. A high correlation was expected since the items of the Mini-Form overlap with the Short-Form items in the same data set. Additionally, dropping items that had low endorsement, had low value for predicting abuse substantiation, and were not particularly valued by field staff is precisely what we wanted to accomplish. Therefore, this is simply a check that there is relatively low loss of information using the Mini-Forms.
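
The check described above amounts to correlating each case's Short-Form total with the total of the retained items only. The sketch below assumes a Pearson correlation, which the text does not specify, and uses hypothetical function names, item positions, and toy data.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def form_totals(cases, keep_items=None):
    """Total score per case, optionally restricted to the retained item positions."""
    if keep_items is None:
        return [sum(case) for case in cases]
    return [sum(case[i] for i in keep_items) for case in cases]


# Made-up item scores for five cases on a three-item Short-Form.
cases = [
    [2, 0, 1],
    [0, 0, 0],
    [1, 1, 0],
    [2, 2, 2],
    [0, 1, 0],
]
short_totals = form_totals(cases)
mini_totals = form_totals(cases, keep_items=[0, 2])  # hypothetical retained items
print(round(pearson_r(short_totals, mini_totals), 3))
```

Because the Mini-Form items are a subset of the Short-Form items, this is a part-whole correlation and is inflated by construction, which is why the text treats it as a check on information loss and not as independent validity evidence.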

Results

The study succeeded in accruing a large enough sample, with 1,472 cases with usable data collected over 6 months, to conduct psychometric analyses on all measures except those of very rare phenomena.

Objective 1. Evaluate the quality of four previously tested Short-Forms, i.e., replication study, and conduct first-testing of four new Short-Forms in terms of predictive validity and internal consistency reliability.

Short-Form results

For the Short-Forms, there were adequate data to compute scale statistics for all scales except Sexual Abuse, Abduction, and Abandonment. These are rare phenomena, and the latter two are obvious so as not to require dimensional scales (these three are not included in Table 2). However, the data were adequate to obtain estimates for the other scales.

Table 2. Measurement properties for short- and mini-forms for each scale of abuse

Predictive validity

In Table 2, four Short-Form scales had good predictive validity (AUC>0.8), i.e., Emotional, Physical, Financial, and Abuser Risk. Three had fair predictive validity (AUC>0.7), i.e., Self-Neglect, Neglect, and Client Risk. Only Isolation was below 0.7. The sensitivity analysis using summed scores yielded similar results (typically within ± 0.01 when using items as predictors).

Internal consistency reliability

The Cronbach’s alphas for the Short-Forms were >0.6 for all that could be computed except for Isolation and Physical Abuse. Isolation had a marginal AUC and a low alpha. Physical Abuse had strong AUC results but, as expected for a scale with only three items, a low alpha of 0.46.

Objective 2. Develop and test Mini-Form versions for a suite of previously validated Short-Form elder abuse measures to reduce measurement burden.

Mini-Form item statistics

The item statistics results used to develop the Mini-Forms are displayed in Appendix A. The items that were dropped from the Short-Forms to create the Mini-Forms are noted in Appendices A and B with the word DROP.

The item endorsement percentages and the item sample sizes show how much data were available for the item analyses. The Linacre (Citation2002) criterion of 10 cases per item response was not met for any of the Isolation, Sexual Abuse, Abduction, and Abandonment items. These are rare events, so this was not unexpected. The scarcity of data for these measures of rare events can be seen in Appendix A, and only one, Isolation, is not dropped from Table 2. Isolation was judged to have enough data for estimation and inspection, but not for inference, so it was included to support discussion and future research.

Mini-Form scale results

Predictive validity. In Table 2, for the six new Mini-Forms, five Mini-Form scales had fair predictive validity (AUC>0.7), and Client Risk was borderline at 0.68. Not-applicable (NA) indicates Short-Form scales with three or fewer items that could not be shortened further.

Internal consistency reliability. Where alpha computation of a “mini” form was applicable, four Mini-Forms were >0.6. Self-Neglect was 0.61. Financial Exploitation was 0.81. Abuser Risk was 0.64. Client Risk was 0.84.

Correlations of Short-Forms with Mini-Forms. Six Short-Forms comprising 50 items were revised into six Mini-Forms comprising 29 items, a reduction of 21 items or 42%. The correlations of the Short-Forms with the Mini-Forms were high at >0.87 for all available analyses having data for both forms (Table 2). This was clear evidence that the Mini-Forms were strongly representative of the Short-Forms while reducing the number of items substantially.

Discussion and implications

This study provided large enough samples to conduct psychometric analyses for an elder abuse investigative assessment tool. There were not enough data to validly analyze extremely rare phenomena such as abduction, abandonment, and sexual abuse. While there were enough data to enable estimates for the rare phenomenon of isolation, the numbers for these estimates were still quite small and will require further study with larger samples. The data are also being used in practice to improve the quality of implementation (Liu et al., Citation2020) and to improve the ISO Matrix to meet client and caseworker needs. The fact that the ISO Matrix has been adopted by two California counties, one urban and one suburban/rural, and the State of Montana, and that it is being considered by more states and California counties attests to the fact that increasingly large samples of data, both urban and rural, are being collected to inform issues concerning practice and research. Indeed, having standardized data will not neglect smaller programs. It will enable aggregation of data over smaller programs and more rural areas such as Montana so that they may be properly represented. The item endorsement percentages, the item sample sizes, the total score sample sizes, and the resulting estimates illustrate the importance and usefulness of having large samples. The ongoing collection of data by San Francisco and Napa Counties and the State of Montana makes them a national resource for APS research that can contribute to improving our understanding and service to abused clients.

Testing and improving quality of measures

Short-Forms

Cronbach’s alpha for reliability and AUC estimates for predictive validity were obtained for all measures except Sexual Abuse, Abandonment, and Abduction. The predictive validities for all Short-Forms were in the fair-to-good range (AUC>0.7) except for Isolation. This predictive validity is evidence of the usefulness of these seven measures in both practice and research. This is further evidence to support the strong Short-Form estimates derived from the EADSS data (Beach et al., Citation2017) for Neglect, Emotional Abuse, Physical Abuse, and Financial Exploitation. This replication with a new data set is strong evidence of the validity of these measures.

Regarding Cronbach’s alphas, Self-Neglect, Neglect, Emotional Abuse, Financial Exploitation, Abuser Risk, and Client Risk had good alphas. The only scale not replicated from Beach et al. (Citation2017) was Physical Abuse since only a 3-item Short-Form was used here. The 3-item Physical Abuse scale also had a low alpha at 0.46. However, it had a strong predictive validity at AUC = 0.86. This is likely a multi-dimensional index whose strong predictive power attests to its usefulness. A slightly longer, 6-item version with alpha = 0.82 and AUC = 0.97 may be found in (Beach et al., Citation2017). Neglect was adequate at 0.65. However, a 25-item Neglect version useful for research with alpha = 0.87 may be found in (Iris et al., Citation2014).

Since Isolation had a rather low predictive validity and a low alpha reliability at 0.52, we concluded conservatively based on the data from this analysis that this measure should be studied and developed further to promote improvement in subsequent data collections. Again, Sexual Abuse, Abduction, and Abandonment had too little data for valid interpretation. While abduction and abandonment may not require dimensional measurement since only one or two items are required, sexual abuse does, so that ongoing data collection will be important to the field and should provide needed validity estimates in the near future.

Mini-Forms

Mini-Form reliabilities were acceptable or better for Self-Neglect, Financial Exploitation, Abuser Risk, and Client Risk, but were not acceptable for Neglect and Emotional Abuse. As noted above, the predictive validity estimates for the Mini-Forms were surprisingly strong with all being in the fair range (Client Risk was borderline at 0.68). The caveat, of course, is that more data are needed in the future for clearer inference. The correlations of Short-Forms with Mini-Forms were very strong as well with all surpassing the 0.8 criterion. There was noticeable decrease in the Cronbach’s alphas for the Mini-Forms in most cases (except Financial Exploitation and Client Risk), and the alphas for two of the Mini-Forms, i.e., Neglect and Emotional Abuse, were below 0.6 which is low. The alpha for Self-Neglect was borderline at 0.61 for the Mini-form. The AUCs were strong for Self-Neglect so it is useful as a predictor, but should be studied further to improve the reliability if possible. Another explanation is that the construct of Self-Neglect is not unidimensional. Since these items are comprised of disparate things ranging from personal to environmental, it is likely that this is more of an index than a scale, and, as such, is still a useful predictor. While some decrease in alpha was expected in the Mini-Forms, the four forms with strong estimates are still useful especially in busy practice settings since the AUCs were good. However, in research contexts and whenever slightly higher numbers of items are possible, the Short-Forms are recommended over the Mini-Forms. Another notable advantage of using the Short-Forms is that their good reliability enables caseworkers to use the scores on them as severity indicators where higher scores indicate greater severity.

Limitations

Though these estimates were relatively strong, they were based on data from only two California counties, one urban and one suburban/rural. Therefore, the generalizability of these results will need to be tested in other locations. Additionally, using the substantiation decision as the gold standard for predictive validity is a limitation because this criterion has been called into question as highly variable (Mosqueda et al., Citation2016). The purpose of standardizing the questions and procedures is to reduce this variability. While this study employed standardized methods, it remains for future work to examine whether the goal of reduced variability, i.e., better agreement among caseworkers in substantiating elder abuse, has been achieved.

To address these limitations, use of the ISO Matrix is being continued and expanded. Regarding measurement burden, it would be ideal to compare completion rates for Short-Forms versus Mini-Forms, but that was not possible in the six-month study. Doing so would also require more counties (ideally with random assignment), since many factors may play a role in improving completion rates. Therefore, this study can only conclude that it took promising steps toward the goals of improving completion rates by reducing the number of items and of improving the reliability of decision-making by standardizing methods.

A limitation to note in using the Mini-Forms is the likely reduction in validity due to dropped items. Therefore, the Short-Forms are recommended as being more valid for clinical judgments and research decisions. This enhanced validity is especially crucial in assessing pre/post change, a topic to be explored in future research. For in-depth studies of individual types of abuse, the original long forms are recommended.

ISO Matrix advantages

Below is a brief list of potential advantages of the ISO Matrix: (1) As caseworkers become more proficient at using the standardized system, they work more efficiently and can provide input for ongoing improvement; (2) Caseworkers, supervisors, and decision-makers within and across counties and states have clear, empirically developed standards with quantitative results on which to base judgments, and these can be improved with ongoing evidence-based practice; (3) Questions are focused on obtaining client input directly (or from appropriate collaterals when needed) to ensure client self-determination and engagement; (4) Standardization and computerization bring many efficiencies, such as improved training, learning, and implementation of best practice, communication at the case level to improve service referrals and outcome tracking, and data analysis to provide evaluation for program accountability and improvement (Conrad & Iris, Citation2015; Conrad et al., Citation2017); (5) Empirically developed measures enable future research that will support valid studies and will professionalize the field to enable certification and education that can form the basis for obtaining jobs and advancement; (6) Science-based decision-making supports observations of program improvement and guides funding decisions to improve client outcomes.

Although this project breaks new ground in developing and testing measures that are viable for ongoing use in actual APS casework, future research is needed, especially to develop standards for collecting data on services and outcomes using the measures described here and those to be developed in future work. Ongoing data collection is in progress.

Conclusion

The problems of poor quality of data usage and measurement and the issue of measurement burden were addressed. This study employed the ISO Matrix to collect a reasonably large sample over 6 months of APS casework. The result is a practical set of measures with empirically established validity that is especially strong in all measures that were replicated from Beach et al. (Citation2017), i.e., four out of four with predictive validity and three out of three with Cronbach’s alpha. Findings from three of the new scales were also good with only the Isolation Scale indicating the need for further development.

This study replicated results for four Short-Forms, provided psychometrics on new Short-Forms, and addressed caseworkers’ complaints that the ISO Matrix was too long. It showed that, with more study, the Mini-Forms may be able to contribute to improved completion rates. Regarding the Short-Forms, the results supported their use in practice to provide ongoing program feedback (Liu et al., Citation2020) and also in research to provide larger samples of valid data in ongoing and future studies. With good validity for research, the measures were still convenient enough for use in APS practice beyond the study.

As proof of the concept of sustainable research quality data collection, the ISO Matrix has been adopted by the two California counties that participated in this research for use in their decision support systems going forward. The State of Montana and additional California counties are also doing so. Data collection and the ability to use the data in program evaluation and research are ongoing. Therefore, this study is unusual in APS since it merged research with practice to improve the empirically tested, standardized decision support system to the point of ongoing adoption.

Disclosure of interest statement

We have no conflict of interest to declare.

Human subjects review

The Institutional Review Board of Purdue University (IRB Protocol # 1812021397) deferred approval to the University of California, San Francisco (IRB # 17–23904) to provide annual oversight.

Acknowledgments

We would like to express our gratitude to San Francisco and Napa Adult Protective Services, including caseworkers, supervisors, analysts, managers, directors, and deputy directors. Sara Stratton was instrumental in facilitating data collection, and Andrew Butler contributed to preparing the data file for analysis.

Data availability statement

The data that support the findings of this study are available from the corresponding author (KJC) upon reasonable request. Restrictions may apply to the availability of these data based on the data usage agreement between Purdue University and San Francisco Adult Protective Services.

Supplementary material

Supplemental data for this article can be accessed on the publisher’s website.

Additional information

Funding

This work was supported by the Administration for Community Living, U.S. Department of Health and Human Services (DHHS) under Grant 90EJIG0010-01-01, Liu, P.I. Grantees carrying out projects under government sponsorship are encouraged to express freely their findings and conclusions. Therefore, points of view or opinions do not necessarily represent official Administration for Community Living or DHHS policy.

References