Abstract
mHealth—the use of mobile technologies for health—is a growing element of health system activity globally, but evaluation of those activities remains quite scant, and remains an important knowledge gap for advancing mHealth activities. In 2010, the World Health Organization and Columbia University implemented a small-scale survey to generate preliminary data on evaluation activities used by mHealth initiatives. The authors describe self-reported data from 69 projects in 29 countries. The majority (74%) reported some sort of evaluation activity, primarily nonexperimental in design (62%). The authors developed a 6-point scale of evaluation rigor comprising information on use of comparison groups, sample size calculation, data collection timing, and randomization. The mean score was low (2.4); half (47%) were conducting evaluations with a minimum threshold (4 + ) of rigor, indicating use of a comparison group, while less than 20% had randomized the mHealth intervention. The authors were unable to assess whether the rigor score was appropriate for the type of mHealth activity being evaluated. What was clear was that although most data came from mHealth projects pilots aimed for scale-up, few had designed evaluations that would support crucial decisions on whether to scale up and how. Whether the mHealth activity is a strategy to improve health or a tool for achieving intermediate outcomes that should lead to better health, mHealth evaluations must be improved to generate robust evidence for cost-effectiveness assessment and to allow for accurate identification of the contribution of mHealth initiatives to health systems strengthening and the impact on actual health outcomes.
Notes
1Three of the seven listed elements are mutually exclusive in terms of the type of comparison group used. A study could use a comparison group, and that group could be matched or randomized—therefore they would receive one point for using a comparison group, matched or not, and an additional point for randomizing. It was not always possible to determine whether a comparison group was not matched, so we did not distinguish between these. Independent evaluation was considered to indicate higher rigor for the purposes of this analysis, although use of independent evaluators is not always appropriate.