Research Article

Improving Imbalanced Machine Learning with Neighborhood-Informed Synthetic Sample Placement
