Journal of Intelligent Transportation Systems
Technology, Planning, and Operations
Volume 24, 2020 - Issue 5
Original Articles

Multi-view crowd congestion monitoring system based on an ensemble of convolutional neural network classifiers

Pages 437-448 | Received 25 Mar 2019, Accepted 21 Mar 2020, Published online: 13 Apr 2020

References

  • Atrey, P. K., Hossain, M. A., El Saddik, A., & Kankanhalli, M. S. (2010). Multimodal fusion for multimedia analysis: A survey. Multimedia Systems, 16(6), 345–379. doi:10.1007/s00530-010-0182-0
  • Avidan, S. (2007). Ensemble tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(2), 261–271. doi:10.1109/TPAMI.2007.35
  • Azadbakht, M., Fraser, C. S., & Khoshelham, K. (2018). Synergy of sampling techniques and ensemble classifiers for classification of urban environments using full-waveform lidar data. International Journal of Applied Earth Observation and Geoinformation, 73, 277–291. doi:10.1016/j.jag.2018.06.009
  • Breiman, L. (1996). Bagging predictors. Machine Learning, 24(2), 123–140. doi:10.1007/BF00058655
  • Chan, A. B., Liang, Z.-S. J., & Vasconcelos, N. (2008). Privacy preserving crowd monitoring: Counting people without people models or tracking. In 2008 IEEE Conference on Computer Vision and Pattern Recognition (pp. 1–7). Anchorage, Alaska: IEEE.
  • Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297. doi:10.1007/BF00994018
  • Dittrich, F., Oliveira, L. E. d., Britto, A. S., Jr., & Koerich, A. L. (2017). People counting in crowded and outdoor scenes using a hybrid multi-camera approach. arXiv preprint arXiv:1704.00326
  • Ferryman, J., & Shahrokni, A. (2009). PETS2009: Dataset and challenge. In 2009 Twelfth IEEE International Workshop on Performance Evaluation of Tracking and Surveillance (pp. 1–6). Snowbird, Utah: IEEE.
  • Freund, Y., Schapire, R., & Abe, N. (1999). A short introduction to boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771–780.
  • Gardziński, P., Kowalak, K., Kamiński, Ł., & Maćkowiak, S. (2015). Crowd density estimation based on voxel model in multi-view surveillance systems. In 2015 International Conference on Systems, Signals and Image Processing (IWSSIP) (pp. 216–219). London, United Kingdom: IEEE. doi:10.1109/IWSSIP.2015.7314215
  • Haghani, M., & Sarvi, M. (2016). Pedestrian crowd tactical-level decision making during emergency evacuations. Journal of Advanced Transportation, 50(8), 1870–1895.
  • Hall, D. L., & Llinas, J. (1997). An introduction to multisensor data fusion. Proceedings of the IEEE, 85(1), 6–23. doi:10.1109/5.554205
  • He, K., Gkioxari, G., Dollár, P., & Girshick, R. (2017). Mask R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (pp. 2961–2969). Venice, Italy: IEEE.
  • He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 770–778). Las Vegas, Nevada: IEEE.
  • Huang, G., Liu, Z., van der Maaten, L., & Weinberger, K. Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 4700–4708). Honolulu, Hawaii: IEEE.
  • Idrees, H., Saleemi, I., Seibert, C., & Shah, M. (2013). Multi-source multi-scale counting in extremely dense crowd images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 2547–2554). Portland, Oregon: IEEE.
  • Jacobs, R. A., Jordan, M. I., Nowlan, S. J., & Hinton, G. E. (1991). Adaptive mixtures of local experts. Neural Computation, 3(1), 79–87. doi:10.1162/neco.1991.3.1.79
  • Johansson, A., Helbing, D., Al-Abideen, H. Z., & Al-Bosta, S. (2008). From crowd dynamics to crowd safety: A video-based analysis. Advances in Complex Systems, 11(04), 497–527. doi:10.1142/S0219525908001854
  • Junior, J. C. S. J., Musse, S. R., & Jung, C. R. (2010). Crowd analysis using computer vision techniques. IEEE Signal Processing Magazine, 27(5), 66–77. doi:10.1109/MSP.2010.937394
  • Krizhevsky, A., Sutskever, I., & Hinton, G. E. (2012). ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems (pp. 1097–1105).
  • Li, B., Yao, Q., & Wang, K. (2012). A review on vision-based pedestrian detection in intelligent transportation systems. In Proceedings of 2012 9th IEEE International Conference on Networking, Sensing and Control (pp. 393–398). Beijing, China: IEEE.
  • Li, J., Huang, L., & Liu, C. (2012). People counting across multiple cameras for intelligent video surveillance. In 2012 IEEE Ninth International Conference on Advanced Video and Signal-Based Surveillance (pp. 178–183). Beijing, China: IEEE.
  • Li, Y., Chen, Y., Rajabifard, A., Khoshelham, K., & Aleksandrov, M. (2018). Estimating building age from Google Street View images using deep learning (short paper). In 10th International Conference on Geographic Information Science (GIScience 2018). Melbourne, Australia: Schloss Dagstuhl-Leibniz-Zentrum fuer Informatik.
  • Li, Y., Khoshelham, K., Sarvi, M., & Haghani, M. (2019). Direct generation of level of service maps from images using convolutional and long short-term memory networks. Journal of Intelligent Transportation Systems, 23(3), 300–309. doi:10.1080/15472450.2018.1563865
  • Li, Y., Sarvi, M., Khoshelham, K., & Haghani, M. (2018). Real-Time Level-of-Service Maps Generation from CCTV Videos (Tech. Rep.). Washington, DC: Transportation Research Board 97th Annual Meeting.
  • Li, Y., Sarvi, M., Khoshelham, K., Haghani, M., & Tian, Y. (2019). Multi-view crowd congestion map generation based on ensemble learning (Tech. Rep.). Washington, DC: Transportation Research Board 98th Annual Meeting.
  • Liu, J., Collins, R. T., & Liu, Y. (2011). Surveillance camera autocalibration based on pedestrian height distributions. In British Machine Vision Conference (BMVC) (Vol. 2). Dundee, UK: BMVC Press.
  • Lo, B. P. L., & Velastin, S. (2001). Automatic congestion detection system for underground platforms. In Proceedings of 2001 International Symposium on Intelligent Multimedia, Video and Speech Processing (pp. 158–161). Hong Kong: IEEE.
  • Ma, H., Zeng, C., & Ling, C. X. (2012). A reliable people counting system via multiple cameras. ACM Transactions on Intelligent Systems and Technology, 3(2), 1–22. doi:10.1145/2089094.2089107
  • Maddalena, L., Petrosino, A., & Russo, F. (2014). People counting by learning their appearance in a multi-view camera environment. Pattern Recognition Letters, 36, 125–134. doi:10.1016/j.patrec.2013.10.006
  • Mousse, M. A., Motamed, C., & Ezin, E. C. (2017). People counting via multiple views using a fast information fusion approach. Multimedia Tools and Applications, 76(5), 6801–6819. doi:10.1007/s11042-016-3352-z
  • Opitz, D., & Maclin, R. (1999). Popular ensemble methods: An empirical study. Journal of Artificial Intelligence Research, 11, 169–198. doi:10.1613/jair.614
  • Paisitkriangkrai, S., Shen, C., & van den Hengel, A. (2016). Pedestrian detection with spatially pooled features and structured ensemble learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(6), 1243–1257. doi:10.1109/TPAMI.2015.2474388
  • Polikar, R. (2006). Ensemble based systems in decision making. IEEE Circuits and Systems Magazine, 6(3), 21–45. doi:10.1109/MCAS.2006.1688199
  • Rokach, L. (2010). Ensemble-based classifiers. Artificial Intelligence Review, 33(1-2), 1–39. doi:10.1007/s10462-009-9124-7
  • Sagar, S. A., & Holambe, A. (2017). A noval system architecture for multi object tracking using multiple overlapping and non-overlapping cameras. International Journal of Biotechnology and Biochemistry, 13(3), 275–283.
  • Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61, 85–117. doi:10.1016/j.neunet.2014.09.003
  • Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556
  • Snoek, C. G., Worring, M., & Smeulders, A. W. (2005). Early versus late fusion in semantic video analysis. In Proceedings of the 13th Annual ACM International Conference on Multimedia (pp. 399–402). Singapore: ACM. doi:10.1145/1101149.1101236
  • Tang, N. C., Lin, Y.-Y., Weng, M.-F., & Liao, H.-Y. M. (2015). Cross-camera knowledge transfer for multiview people counting. IEEE Transactions on Image Processing, 24(1), 80–93. doi:10.1109/TIP.2014.2363445
  • Tian, Y., Luo, P., Wang, X., & Tang, X. (2015). Deep learning strong parts for pedestrian detection. In Proceedings of the IEEE International Conference on Computer Vision (pp. 1904–1912). Santiago, Chile: IEEE.
  • Wang, X. (2013). Intelligent multi-camera video surveillance: A review. Pattern Recognition Letters, 34(1), 3–19. doi:10.1016/j.patrec.2012.07.005
  • Wolpert, D. H. (1992). Stacked generalization. Neural Networks, 5(2), 241–259. doi:10.1016/S0893-6080(05)80023-1
  • Zhan, B., Monekosso, D. N., Remagnino, P., Velastin, S. A., & Xu, L.-Q. (2008). Crowd analysis: A survey. Machine Vision and Applications, 19(5-6), 345–357. doi:10.1007/s00138-008-0132-4
  • Zhang, C., & Ma, Y. (2012). Ensemble machine learning: Methods and applications. Springer.
