Semi-automated text mining strategies for identifying rare causes of injuries from emergency room triage data

References

  • Aggarwal, C. C., and Zhai, C. (2012) An introduction to text mining. Pp. 1–10 in Mining Text Data, C. C. Aggarwal and C. Zhai (Eds.). Retrieved from http://link.springer.com/chapter/10.1007/978-1-4614-3223-4_1
  • Batista, G. E. A. P. A., Prati, R. C., and Monard, M. C. (2004) A study of the behavior of several methods for balancing machine learning training data. ACM SIGKDD Explorations Newsletter, 6(1), 20. doi:10.1145/1007730.1007735
  • Bertke, S. J., Meyers, A. R., Wurzelbacher, S. J., Bell, J., Lampl, M. L., and Robins, D. (2012) Development and evaluation of a Naïve Bayesian model for coding causation of workers’ compensation claims. Journal of Safety Research, 43(5–6), 327–332. doi:10.1016/j.jsr.2012.10.012
  • Bertke, S. J., Meyers, A. R., Wurzelbacher, S. J., Measure, A., Lampl, M. P., and Robins, D. (2016) Comparison of methods for auto-coding causation of injury narratives. Accident Analysis & Prevention, 88, 117–123. doi:10.1016/j.aap.2015.12.006
  • Chawla, N. V., Japkowicz, N., and Kotcz, A. (2004) Editorial: Special issue on learning from imbalanced data sets. ACM SIGKDD Explorations Newsletter, 6(1), 1–6. Retrieved from http://dl.acm.org/citation.cfm?id=1007733
  • Chen, L., Vallmuur, K., and Nayak, R. (2015) Injury narrative text classification using factorization model. BMC Medical Informatics and Decision Making, 15(Suppl 1), S5. doi:10.1186/1472-6947-15-S1-S5
  • Corns, H. L., Marucci, H. R., and Lehto, M. R. (2007) Development of an approach for optimizing the accuracy of classifying claims narratives using a machine learning tool (TEXTMINER[4]). Pp. 411–416 in Human Interface and the Management of Information. Methods, Techniques and Tools in Information Design. Human Interface 2007. Lecture Notes in Computer Science, M. J. Smith and G. Salvendy (Eds.). Springer. Retrieved from http://link.springer.com/chapter/10.1007/978-3-540-73345-4_47
  • Hosmer, D. W., Jr. (2004) Applied Logistic Regression. John Wiley & Sons. Retrieved from http://books.google.com/books?id=Po0RLQ7USIMC
  • Fan, R., Chang, K., Hsieh, C., Wang, X., and Lin, C. (2008) LIBLINEAR: A library for large linear classification. Retrieved from http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.140.9959
  • Frank, E., Hall, M., Holmes, G., Kirkby, R., Pfahringer, B., Witten, I. H., and Trigg, L. (2005) Weka. Pp. 1305–1314 in Data Mining and Knowledge Discovery Handbook, O. Maimon and L. Rokach (Eds.). Retrieved from http://link.springer.com/chapter/10.1007/0-387-25465-X_62
  • Hall, M., Frank, E., Holmes, G., Pfahringer, B., Reutemann, P., and Witten, I. H. (2009) The WEKA data mining software: An update. ACM SIGKDD Explorations Newsletter, 11(1), 10–18. doi:10.1145/1656274.1656278
  • Japkowicz, N. (2000) The class imbalance problem: Significance and strategies. In Proceedings of the International Conference on Artificial Intelligence. Retrieved from http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.35.1693&rep=rep1&type=pdf
  • Lauría, E. J. M., and March, A. D. (2011) Combining Bayesian text classification and shrinkage to automate healthcare coding: A data quality analysis. J. Data and Information Quality, 2(3), 13:1–13:22. doi:10.1145/2063504.2063506
  • Lehto, M., Marucci-Wellman, H., and Corns, H. (2009) Bayesian methods: A useful tool for classifying injury narratives into cause groups. Injury Prevention: Journal of the International Society for Child and Adolescent Injury Prevention, 15(4), 259–265. doi:10.1136/ip.2008.021337
  • Manning, C. D., Raghavan, P., and Schütze, H. (2008) Introduction to Information Retrieval. Cambridge University Press. Retrieved from http://books.google.com/books?id=t1PoSh4uwVcC
  • Marucci-Wellman, H. R., Corns, H. L., and Lehto, M. R. (2017) Classifying injury narratives of large administrative databases for surveillance—A practical approach combining machine learning ensembles and human review. Accident Analysis & Prevention, 98, 359–371. doi:10.1016/j.aap.2016.10.014
  • Marucci-Wellman, H. R., Lehto, M. R., and Corns, H. L. (2015) A practical tool for public health surveillance: Semi-automated coding of short injury narratives from large administrative databases using Naïve Bayes algorithms. Accident Analysis & Prevention, 84, 165–176. doi:10.1016/j.aap.2015.06.014
  • McKenzie, K., Scott, D. A., Campbell, M. A., and McClure, R. J. (2010) The use of narrative text for injury surveillance research: A systematic review. Accident Analysis & Prevention, 42(2), 354–363. doi:10.1016/j.aap.2009.09.020
  • Measure, A. C. (2014) Automated Coding of Worker Injury Narratives. Boston, MA: JSM 2014, Government Statistics Section. Retrieved from http://www.bls.gov/osmr/pdf/st140040.pdf
  • Nanda, G., Grattan, K. M., Chu, M. T., Davis, L. K., and Lehto, M. R. (2016) Bayesian decision support for coding occupational injury data. Journal of Safety Research, 57, 71–82. doi:10.1016/j.jsr.2016.03.001
  • Nanda, G., Vallmuur, K., and Lehto, M. R. (2018) Improving autocoding performance of rare categories in injury classification: Is more training data or filtering the solution? Accident Analysis & Prevention, 110, 115–127. doi:10.1016/j.aap.2017.10.020
  • Ng, A. Y., and Jordan, M. I. (2002) On discriminative vs. generative classifiers: A comparison of Logistic Regression and Naive Bayes. Pp. 841–848 in Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic (NIPS'01), T. G. Dietterich, S. Becker, and Z. Ghahramani (Eds.). Cambridge, MA: MIT Press. Retrieved from http://papers.nips.cc/paper/2020-on-discriminative-vs-generative-classifiers-a-comparison-of-logistic-regression-and-naive-bayes.pdf
  • Phua, C., Alahakoon, D., and Lee, V. (2004) Minority report in fraud detection. ACM SIGKDD Explorations Newsletter, 6(1), 50. doi:10.1145/1007730.1007738
  • Prati, R. C., Batista, G. E. A. P. A., and Silva, D. F. (2015) Class imbalance revisited: A new experimental setup to assess the performance of treatment methods. Knowledge and Information Systems, 45(1), 247–270. doi:10.1007/s10115-014-0794-3
  • Provost, F. (2000) Machine learning from imbalanced data sets 101. Retrieved from http://www.aaai.org/Papers/Workshops/2000/WS-00-05/WS00-05-001.pdf
  • QISU Guide to Collecting an Accurate Text Description of an Injury Event. (2011) South Brisbane, QLD: Queensland Injury Surveillance Unit.
  • Queensland Injury Surveillance Unit. (n.d.) Retrieved from http://www.qisu.org.au/ModCoreFrontEnd/index.asp?pageid=109
  • Rizzo, S. G., Montesi, D., Fabbri, A., and Marchesini, G. (2015) ICD code retrieval: Novel approach for assisted disease classification. Pp. 147–161 in Data Integration in the Life Sciences. DILS 2015. Lecture Notes in Computer Science, N. Ashish and J.-L. Ambite (Eds.). Springer International Publishing. Retrieved from http://link.springer.com/chapter/10.1007/978-3-319-21843-4_12
  • Smith, G. S., Timmons, R. A., Lombardi, D. A., Mamidi, D. K., Matz, S., Courtney, T. K., and Perry, M. J. (2006) Work-related ladder fall fractures: Identification and diagnosis validation using narrative text. Accident Analysis & Prevention, 38(5), 973–980. doi:10.1016/j.aap.2006.04.008
  • Sun, A., Lim, E.-P., and Liu, Y. (2009) On strategies for imbalanced text classification using SVM: A comparative study. Decision Support Systems, 48(1), 191–201. doi:10.1016/j.dss.2009.07.011
  • Vallmuur, K. (2015) Machine learning approaches to analysing textual injury surveillance data: A systematic review. Accident Analysis & Prevention, 79, 41–49. doi:10.1016/j.aap.2015.03.018
  • Vallmuur, K., Marucci-Wellman, H. R., Taylor, J. A., Lehto, M., Corns, H. L., and Smith, G. S. (2016) Harnessing information from injury narratives in the “big data” era: Understanding and applying machine learning for injury surveillance. Injury Prevention, 22(Suppl 1), i34–i42. doi:10.1136/injuryprev-2015-041813
  • Van Hulse, J., Khoshgoftaar, T. M., and Napolitano, A. (2007) Experimental perspectives on learning from imbalanced data. Pp. 935–942. New York, NY: ACM. doi:10.1145/1273496.1273614
  • Wellman, H. M., Lehto, M. R., Sorock, G. S., and Smith, G. S. (2004) Computerized coding of injury narrative data from the National Health Interview Survey. Accident Analysis & Prevention, 36(2), 165–171. doi:10.1016/S0001-4575(02)00146-X
  • Zhu, X. (2007) Advanced NLP: Text categorization with Logistic Regression. Retrieved from http://pages.cs.wisc.edu/~jerryzhu/cs838/LR.pdf