Abstract
This paper addresses the problem of estimating the number of people visible in a camera image. The features used are the ratios of foreground pixels in each cell of a rectangular grid. With these features and data mining techniques, accuracy reached up to 85% for exact match and up to 95% for plus-or-minus-one estimates in an indoor surveillance environment. Applying median filters to the sequence of estimation results raised exact-match accuracy to as much as 91%. An architecture for a real-time people-counting estimator is proposed. The results of the analysis of experimental data are presented and discussed.
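As a rough illustration of the two ideas named above, the following minimal sketch computes per-cell foreground ratios from a binary foreground mask and applies a sliding median filter to a sequence of per-frame count estimates. The function names, the grid size, the window length, and the truncated-window handling at the sequence ends are all assumptions made for illustration; they are not taken from the paper's implementation.

```python
from statistics import median_low


def grid_foreground_ratios(mask, rows, cols):
    """Fraction of foreground pixels in each cell of a rows x cols grid.

    mask is a list of equal-length rows of 0/1 values (1 = foreground).
    Assumes rows <= image height and cols <= image width, so no cell is empty.
    Cells are returned row by row.
    """
    height, width = len(mask), len(mask[0])
    features = []
    for r in range(rows):
        y0, y1 = r * height // rows, (r + 1) * height // rows
        for c in range(cols):
            x0, x1 = c * width // cols, (c + 1) * width // cols
            area = (y1 - y0) * (x1 - x0)
            foreground = sum(sum(row[x0:x1]) for row in mask[y0:y1])
            features.append(foreground / area)
    return features


def median_smooth(counts, window=3):
    """Sliding median over per-frame count estimates (window should be odd).

    The window is truncated at both ends of the sequence (an assumption);
    median_low keeps the output an integer count even for truncated windows.
    """
    half = window // 2
    return [
        median_low(counts[max(0, i - half): i + half + 1])
        for i in range(len(counts))
    ]
```

The feature vector from `grid_foreground_ratios` would be the input to a classifier, and `median_smooth` would be applied to the classifier's output over consecutive frames, so that an isolated outlier estimate is replaced by the value of its neighbours.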
Notes
§ As will become clear in Section 6, training the classifiers on shuffled data provides a baseline accuracy that leaves room for improvement, e.g., if the classifier can take advantage of the chronological order of the images.