283
Views
2
CrossRef citations to date
0
Altmetric
Original Articles

DISCRETE OPTIMIZATION ALGORITHMS IN REAL-TIME VISUAL TRACKING

, , , &
Pages 805-827 | Published online: 16 Oct 2009

Abstract

In this work we introduce a novel formulation of the association problem in visual tracking systems as a discrete optimization problem. The full data association problem is formulated as a search for the best tracking configuration to match hypothesis. We have implemented three local search algorithms: Hill Climbing, Simulated Annealing, and Tabu Search algorithms. These algorithms are guided by heuristic evaluation function which takes into account structural and specific information such as distance and shape. We have also introduced a novel technique in order to achieve incrementality in discrete optimization algorithms searching on an indirect space. We will show how this incrementality produces efficient and fast results, especially so when real-time is a hard constraint. The results obtained with the three discrete optimization algorithms are compared with other well-known and powerful computer vision tracking algorithms. We will prove the effectiveness and robustness of the discrete optimization algorithms in five different real-life scenarios.

Visual tracking of moving objects in image sequences is an important issue for research and application (Patricio, Garcia, Berlanga, and Molina, Citation2007). For instance, in multimedia content-based retrieval applications, video data has been considered as the most complex form of multimedia data (Shyu, Chen, Sun, and Yu, Citation2007). To efficiently manage large visual information is challenging because it requires the analysis and retrieval of high-level semantic descriptions like the moving object trajectories. Another area of interest are video surveillance systems (Castanedo, Patricio, Garcia, and Molina, Citation2006). A surveillance system covers many different tasks such as tracking, identifying faces or objects, or analyzing activities and situations. Tracking is a critical task for higher level applications like trajectory reasoning or motion analysis (Jin, Qian, and Rajko, Citation2006; Pérez et al., Citation2007). Tracking multiple people in video sequences is also important for more applications such as creating intelligent environments (Krumm et al., Citation2000), or for security reasons.

Therefore, a robust extraction and real-time spatio-temporal tracking process of visual cues is indeed one of the keys to success in a video understanding task. A generic tracking process tracks all the targets moving within a field of view along a video sequence. The coherent information along the time for a particular moving object is called track. Fuentes and Velastin (Citation2006), Patricio et al. (Citation2007), and Yilmaz, Javed, and Shah (Citation2006) usually divide the tracking process into two subtasks. The first subtask is the Foreground Detection, where the aim is to extract the pixels belonging to moving object for every frame. A connected set of detected pixels are designated as a blob. The algorithms of this subtask are normally based on background subtraction (Pérez, Patricio, García, and Molina, Citation2006), although some approaches combine this method with a temporal difference. These methods are based on extracting motion information by thresholding the differences between the current image and a reference image (background) or the previous image, respectively. The second subtask is carried out by those typically called Tracking Algorithms which establish a correspondence between the image structures extracted from a series of consecutive frames to estimate the attributes describing each object behavior. Typically, the tracking process involves a matching (or data association) of image features for nonrigid objects such as people, or correspondence models used widely with rigid objects like cars.

In this work, we introduce a novel family of tracking algorithms based on discrete optimization techniques. Tracking algorithms are formulated as an assignment problem, where for every frame, detected blobs are assigned to active tracks. The full data association problem in these tracking algorithms is formulated as a search for the best tracking configuration to match a hypothesis. Our aim is to develop a system that can detect, analyze, and act upon the recognized tracks in real-time so that a hard constraint in the available computation time is set to tracking algorithms.

The set of chosen algorithms to explore data association as a discrete optimization consists of Hill Climbing (HC), Simulated Annealing (SA), and Tabu Search. Hill Climbing (Russell and Norvig, Citation2003) is one of the simplest local search methods and we implemented it for the sake of comparison with the other two more complex (and efficient) methods. Simulated Annealing (Kirkpatrick, Cd, and Mp, Citation1983) is a popular metaheuristic based on the Metropolis Heuristic (Metropolis et al., Citation1953). It is known to have a property that ensures convergence to the optimum, although this does not guarantee that the optimum will be reached in an acceptable amount of time. The last algorithm is Tabu Search (Glover and Laguna, Citation1993) and it has proved very efficient in several different optimization problems. A key issue to implement in these discrete optimization techniques is that of incrementality. They performs a local move that is the best possible one at each iteration; thus, it is of utmost importance that the change in the fitness function yielded by every move can be evaluated in constant time. Therefore, one of the main contributions of this work is to have developed a method to achieve incrementality when the search space is not the same as the space in which we have to calculate the fitness function.

RELATED WORK

As previously indicated, the success of tracking algorithms depends on two strictly coupled subtasks: the data association method used for assigning observations to tracks and the model selected to estimate the movement of an object.

Data association can be formulated as the correspondence of detected objects represented by tracks across frames. Only in the case that a single target appears, with no false alarms, there is no association problem. However, the tracking system must handle complex motions and interactions between objects such as passing, occlusions, and stop-and-go motion. Each real object may generate multiple observations and the problem of searching for an optimal or near-optimal hypothesis requires a deeper analysis.

Methods applicable to this problem can be divided into heuristic and statistical methods. Probabilistic methods consider the object measurement and the uncertainties to establish their correspondence with a maximum likelihood, while heuristic methods use qualitative motion heuristics to constrain the problem and minimize a cost function that qualifies every alternative solution.

Probabilistic Methods

The Kalman filter (Bar-Shalom, Citation1995) is the more extended probabilistic approach within the vision community for video-tracking processes. It states that measurements obtained from video cameras invariably contain noise. Moreover, the object motions can undergo random perturbations, for instance, maneuvering vehicles. The Kalman filter performs the tracking taking into account the measurement and model uncertainties during the object state estimation. It uses the state space approach to model several mobile object properties such as position, velocity, and acceleration. In Broida and Chellappa (Citation1986), authors use the Kalman filter to track a point in noisy images. Beymer and Konolige (Citation1999) use a Kalman filter for predicting the object's position and speed in a 3D space. One limitation of the Kalman filter is the assumption that the state variables are normally distributed and state dynamics follows a direct linear model. Suboptimal Bayesian filters such as interacting multiple models (IMM) (Yeddanapudi, Bar-Shalom, and Pattipati, Citation1997) for realistic maneuvering situations have been proposed, and particle filters (PF) (Arulampalam, Maskell, Gordon, and Clapp, Citation2002; Djuric et al., Citation2003; Kitagawa, Citation1987) in non-Gaussian conditions.

The extension of Kalman filters to the multi-object situation requires from data association. A number of probabilistic data association techniques have been developed for this purpose, following a Bayesian approach of maximizing the likelihood of associated data in consecutive frames. The problem can also be extended across multiple scans (frames) to search the best hypothesis for the associations, using a tree of open hypotheses to delay assignment decisions until evidence is enough to take a decision. Multi-scan, multiple-target algorithms commonly used include multi-hypothesis trackers (MHT) and joint probabilistic data association (JPDA) (Blackman and Popoli, Citation1999). The probabilistic multiple hypotheses tracking algorithm (Ruan and Willett, Citation2004), and also artificial neural networks (Shams, Citation1996) have been developed in order to reduce the computational complexity of JPDA and MHT.

These probabilistic methods have received wide attention by the computer vision community. For instance, the JPDA filter has been applied to vision 3-D reconstruction (Chang and Aggarwal, Citation1991). Cox and Hingorani (Citation1996) proposes an implementation of an MHT algorithm with visual sensors (although objects are simplified to points), not considering the problem of data complexity.

It is generally necessary to build an extended function to evaluate the likelihood of assignment hypotheses (relation between estimated attributes and associated sequences), considering structural information such as size, shape, and orientation as well as position (centroid). The required computational burden may be too heavy with large numbers of targets and observations.

Application of Heuristic Search Algorithms

On the other hand, heuristic algorithms define a cost of associating each object in frame t − 1 to a single object in frame t using a set of motion constraints. Minimization of the correspondence cost is formulated as a combinatorial optimization problem. Veenman, Reinders, and Backer (Citation2001) uses the common motion constraint for correspondence of detected tracks across frames. The algorithm implements a cost function that is minimized in two consecutive frames. This method assumes that the number of objects remains the same throughout the sequence, and thus, there are no new or deleted objects.

Genetic algorithms (GAs) have been previously applied to the data association problem in the radar context by Angus et al. (Citation1993) and Carrier, Litva, Leung, and Lo (Citation1996) (on a single scan scenario), and by Hillis (Citation1997) to deal with the multi-scan data association problem. In Patricio et al. (Citation2007), in order to achieve a fast video process, the performance of a family of very efficient evolutionary computation algorithms is analyzed. This novel approach is called estimation of distribution algorithms (EDAs) Larrañaga and Lozano (Citation2001).

Data association problems can be formulated in terms of discrete optimization methods. Turkmen, Guney, and Karaboga (Citation2004) applied the Tabu Search algorithm to the assignment of a sequence of radar measurements to a track. However, from the point of view of the information process, video analysis attempt to deal with more complex and accurate information than radar analysis. In this work, authors present a novel method to achieve incremental search when it is impracticable to evaluate each assignment under real-time constraints.

FORMULATION OF THE ASSOCIATION PROCESS AS A SEARCH PROBLEM

We can consider the association problem as a sequential decision process that takes into account, for each frame, current detections and the result of previous assignments. Let us define the assignment matrix A i, j [k] at frame k:

In our implementation, we codify the assignment matrix A[k] in a N × (M + 1) matrix (see Figure ), where N is the number of extracted blobs and M is the number of tracks in the scene. We have added an extra column to account for the special case where blobs are assigned to the “null track” at the current time. These blobs are considered to be residual noise after the detection process or be the presence of new targets dynamically appearing in the scene.

FIGURE 1 Assignment matrix representation. Note that its dimension is N × (M + 1). First column is added, for special case where a blob is not assigned to none track (“null track”).

FIGURE 1 Assignment matrix representation. Note that its dimension is N × (M + 1). First column is added, for special case where a blob is not assigned to none track (“null track”).

Figure illustrates an optimal assignment in a hypothetical situation with three tracks and 13 blobs. In this hypothetical situation, there are three moving objects. Objects labeled as Track 1 and Track 2 overlap and, therefore, share the blobs labeled as b7, b8, and b9.

FIGURE 2 Optimal assignment in a hypothetical situation.

FIGURE 2 Optimal assignment in a hypothetical situation.

Therefore, a state is codified by an assignment matrix A[k] and the problem is to search the best association between blobs and tracks. The number of states, n, is determined by

Assignment Problem

Let us now formalize the specific problem we will be dealing with. Given a set of blobs with information about their width, length, and centroid (measured in a 2-D grid of pixels) and a set of tracks with the same predicted information, assign blobs to tracks in such a way that the distance to the prediction for each track is minimized. Additionally, we can access a hypothesis matrix that discards some blob-to-track combinations, thus reducing the search space.

The association problem is thus defined as:

A set of vars X where we have a variable x[b, t] for each blob and track. Each of these vars are an element of the assignment matrix A i, j [k] of Equation (Equation1).

The domain D x for all the variables is Boolean. It corresponds to whether a blob is assigned to belong to a track or not. This domain is further restricted by the hypothesis matrix which fixes the domain of several x[b, t] variables to false or 0 (blobs that cannot belong to certain tracks).

A cost function f that represents the distance to a prediction. In the next section, we will describe more depth in the components of this cost function.

It is worth mentioning that this method is independent of the prediction technique; the most usual prediction algorithm is the Kalman filter. We address the readers to Patricio et al. (Citation2007) for further details on this issue.

Heuristic to Evaluate Assignments

An extended distance is used as an evaluation or cost function for groups of detected blobs assigned to tracks according to matrix A (A represents each hypothesis to be evaluated). The heuristic is aimed at providing a measure of probability density of assigned observations to tracks. This likelihood function considers several types of terms and their probabilistic characterization: the separation between tracks and centroids of groups of blobs, the “similarity” between track-smoothed target attributes and those extracted from blob groups, and the events related with erasing existing tracks and creating new ones. The final objective is to achieve a good trade-off between capability to reconnect image regions, keeping a single track per target, and at the same time avoid the missassignment of blobs coming from different objects or from extraneous sources.

The extended distance allows the evaluation of a certain hypothesis for grouping blobs in sets and assigning them to tracks. The term considering centroid residual, typically used in other approaches, is enriched with terms for attributes to take into account the available structural characteristics of targets which can be extracted from data. There are also terms considering that hypotheses may label some blobs as false alarms or may leave confirmed tracks with no updating blobs. Once we obtain an hypothesis of assignment, we need to evaluate it.

Let Gb(j) be the group of blobs assigned to the track j, corresponding to nonzero cells in the jth column of A i, j [k]. The cost function f for a frame k is defined as:

d Centroid(j): it is the normalized residual between jth track prediction and centroid of the assigned group of blobs Gb(j).

d Attributes(j): it is the normalized residual between track attributes and those extracted from the group Gb(j). Its value is given, assuming Gaussian distribution and attribute independence.

d PD(j): assesses the cost of not updating a confirmed track for those hypotheses in which no blob is assigned to jth track. It considers the probability of updating each track.

d PFA(i, j): assesses the cost of labeling a blob as a false alarm, also assuming a certain probability of false alarm (PFA).

All the discrete algorithms presented here perform a local search for the best possible move at each iteration (it can usually take thousands of iterations). This is a nonviable task under real-time constraints as visual tracking applications. As we will show in the next section we introduce the concept of incrementality in this domain. The cost function is not calculated directly, but it is the result of modifying the previous value with the effect of possible moves. We first need to assign blobs to tracks, then we have to calculate the shape of every track, and finally, we must find the distance of the produced tracks to their prediction. A local move such as changing x[b, t] would not require the evaluation of the function cost, but the increment/decrement of the previous value before this change. The cost function is an aggregation of the distances of all the tracks to their predictions.

DISCRETE OPTIMIZATION ALGORITHMS APPLIED TO THE ASSIGNMENT PROBLEM

The algorithms chosen to solve the association problem are HC, SA, and Tabu Search. In this section, we first describe common issues related to the implementation of these algorithms, namely, modeling, incrementality issues, the initialization phase, and the neighborhood. Then we present the sketch of each algorithm, respectively.

The Modeling

Modeling is a key feature of any algorithm but specially so in the local search framework. Different representations of the same problem define different search spaces as well. In this article we use a modeling that associates a variable x[b, t] for each pair of blob b and track t. The value of x[b, t] in a given solution, denoted as σ(x[b, t]), will be 1 if blob b is considered to belong to track t and 0 in the contrary.

This is to allow the possibility of a blob to belong to two different tracks. At this level of information, when two tracks collide, we are unable to decide whether the blob belongs to one track or another. Also, forcing a blob to belong to one single track in these situations, could result in the disappearance of one of the objects in the collision. This is obviously not desirable.

In our specific problem there are no constraints, all combinations are allowed, and thus, we only need to consider the optimization function. As previously stated, the optimization criteria is to minimize the distance to a prediction. However, the prediction is defined in terms of track characteristics. In a straightforward approach we would need to:

  1. Determine which blobs belong to which tracks from the set of x[b, t] variables,

  2. Calculate the characteristics of each potential track enclosing the assigned blobs,

  3. Calculate the distance of each potential track to its prediction,

  4. Add all the distances to compute total distance to prediction.

However, in order to implement a local search algorithm for this problem we need a procedure to calculate the cost of a solution (distance to prediction) in constant time, i.e., achieve incrementality.

Achieving Incrementality

An assignment σ to all the variables has an associated cost f(σ), which corresponds to the distance to its prediction. The problem consists of finding the solution σ that minimizes this distance. This is computed in terms of shape and position. We have just seen, this is a many-steps process which is not suitable for a local search algorithm. However, we can surpass this obstacle by introducing several additional structures. It is not in the scope of this article to fully describe these auxiliary data structures and their updating mechanisms. We will only remark that this issue relies on the fact that each blob adds a certain area to the track it belongs to, and thus, we can characterize this information by maintaining a list of all these areas. We define this added space of each blob to a given track as four different coordinates: north, south, east, and west.

North: This represents how many pixels blob b adds to track t from the centroid toward a virtual north coordinate.

South: This represents how many pixels blob b adds to track t from the centroid toward a virtual south coordinate.

East: This represents how many pixels blob b adds to track t from the centroid toward a virtual east coordinate.

West: This represents how many pixels blob b adds to track t from the centroid toward a virtual west coordinate.

This information corresponds to how many pixels would the track lose if the blob was missing in each coordinate's direction, respectively. Figure depicts this information for a simple situation.

FIGURE 3 Information necessary for each blob in order to achieve incrementality.

FIGURE 3 Information necessary for each blob in order to achieve incrementality.

The Initialization Phase

The initialization phase is known to have a significant impact on the efficiency of a local search algorithm (see Dotu and Hentenryck, Citation2005; and Harvey and Winterer, Citation2005). In this case, we implemented two different initialization methods:

Randomly assign 0 or 1 to the x[b, t] variables. This is equivalent to randomly assigning blobs to tracks. Whenever a blob is not allowed to belong to a certain track (given by the hypothesis matrix previously mentioned), the corresponding x[b, t] variable is set to 0.

Copy the hypothesis matrix, i.e., set to 1 all the allowed blob-to-track combinations.

The former method is used when the problem presents a low blob-track ratio, i.e., when there will not be many blobs belonging to each track. On the other hand, the second method will be more efficient in scenarios where there is a high blob-track ratio, i.e., when each track is composed of many different blobs.

The Neighborhood

The neighborhood implemented consists of flipping the value of a x[b, t] variable. Notice that we can always flip a variable when it is set to 1. However, we can only flip a variable set to 0 if the corresponding blob-to-track combination is allowed by the hypothesis matrix.

The set of swaps is thus defined as

where we consider h[t] to be the set of blobs allowed to belong to track t.

Hill Climbing

Hill Climbing is one of the simplest local search algorithms and it is implemented here for comparison purposes. Our implementation of the Hill Climbing algorithm is depicted in Figure . Lines 2 and 3 perform the initializations. In particular, the initial solution is generated randomly (or copied from the hypothesis) in line 2, while line 3 initializes the iteration counter k.

FIGURE 4 Hill Climbing for the blobs-to-tracks assignment problem.

FIGURE 4 Hill Climbing for the blobs-to-tracks assignment problem.

The core of the algorithm is given in lines 4–8. They repeatedly select a local move (at random) to perform until a maximum number of iterations is reached. The key idea is to randomly select a move in the neighborhood  and carry it out only if it improves the current solution. Observe that the expression f(σ[move]) represents the value of the optimization function obtained after the move. Line 9 simply updates the iteration counter.

Simulated Annealing

This algorithm is inspired by the annealing in metallurgy, a technique involving heating and controlled cooling of a material to increase the size of its crystals and reduce their defects. By analogy with this physical process, each step of the algorithm attempts to perform a random move; if the new solution is better, it is chosen, whereas if it is worse, it can still be chosen with a probability that depends on the difference between the corresponding function values and on a global parameter called temperature, which is gradually decreased during the process. The dependency is such that the current solution changes almost randomly when the temperature is large, but increasingly “downhill” as the temperature goes to zero. The allowance for “uphill” moves saves the method from becoming stuck at local minima.

Figure represents the sketch of a Simulated Annealing algorithm with a restarting component. Notice that in lines 9 and 10 the move is performed if the fitness function improves the current solution. Otherwise it is also performed (lines 13 and 14) if a random number r is smaller than a probability that depends on the fitness differences and the temperature (which is represented by τ). After a max Stable amount of iterations, the temperature is updated and if it is below a reheating threshold φ the current solution is restarted and the temperature is reset (lines 16 to 19).

FIGURE 5 Simulated annealing for blobs-to-tracks assignment problem.

FIGURE 5 Simulated annealing for blobs-to-tracks assignment problem.

The Tabu Search Algorithm

One of the main differences in this algorithm with respect to other local search procedures is a list of tabu (forbidden) moves. Let us first introduce this issue here.

The Tabu Component

The tabu component consists of an array tabu that maintains a triplet 〈b, t, i〉, where b is the blob, t is the track and i represents the first iteration where the assignment of blob b to track t can be flipped again.

The tabu tenure, i.e., the time a pair of blob and track (b, t) stays in the list, is dynamic: It is randomly generated in the interval [4, 100]. In other words, each time a blob in a track (b, t) is flipped, a random value τ is drawn uniformly from the interval [4, 100] and the pair (b, t) is tabu for the next τ iterations. At iteration k, flipping the value of a pair (b, t) is tabu, which is denoted by

if the Boolean expression
holds. As a consequence, for the complete assignment σ and iteration k, the neighborhood consists of the set of moves  t (σ, k) defined as

Tabu Search

We are now ready to present our tabu search algorithm. The algorithm, depicted in Figure , is a Tabu Search with a restarting component. Lines 2–5 perform the initializations. In particular, the initial solution is generated randomly (or copied from the hypothesis) in line 2, while lines 4 and 5 initialize the iteration counter k, and the stability counter s. The best solution found so far σ∗ is initialized to σ.

FIGURE 6 Tabu Search for blobs-to-tracks assignment problem.

FIGURE 6 Tabu Search for blobs-to-tracks assignment problem.

The core of the algorithm is given in lines 6–19. They repeatedly select a local move to perform until a maximum number of iterations is reached. The key idea is to select the best move in the neighborhood , i.e., the one that minimizes the optimization function. If there are several moves minimizing the optimization function one is randomly chosen. Observe that the expression f(σ[move]) represents the value of the optimization function obtained after the move. The corresponding tabu list is updated in line 9, and the new solution is computed in line 10, where we consider σ[move] to be the effect of performing the move in the current solution. Lines 11–13 update the best solution, while lines 14–16 specify the restarting component.

The restarting component simply reinitializes the search from a random (or copied) configuration whenever the best solution found so far has not been improved upon for maxStable iterations. Note that the stability counter s is incremented in line 18 and reset to zero in line 13 (when a new best solution is found) and in line 16 (when the search is restarted).

Notice that choosing a copied initialization will not necessarily yield the same local moves after each restart. This is due to the random component included both in the tabu list and in the selection of the best move.

ANALYZING REAL-WORLD SCENES

The system described here has been implemented in C+ + under Microsoft Visual Studio, using the “visual surveillance” algorithms incorporated in the Open Source Computer Vision Library (OpenCV, Citation). The system was tested on two different computers. The first of them is an AMD Athlon 64 Processor 3200 + with 2.01 GHz and 1 Gb of RAM. The second is a high performance computer DELL PE1950 Quad-Core Xeon E5310 1.6 GHz/2x4MB 1066FSB. In order to evaluate the different computational load of the algorithms, we have run the SQUASH, BOAT, and HANDBALL scenarios on the first environment, and the PETS2002 and CORRIDOR scenarios on the high performance environment.

The OpenCV “visual surveillance” algorithms use the pipeline structure depicted in Figure . The input data for the pipeline is the image of current frame and the output data is the information about track position and size. The “FG/BG Detection” module performs foreground/background segmentation for each pixel; the “Blob Entering Detection” module uses the result of “FG/BG Detection” module to detect new blob objects which entered in the scene on each frame; and the “Blob Tracking” module is initialized by the “Blob Entering Detection” results and it tracks each new entered blob. This pipeline structure allows us to exchange different algorithms for “Blob Tracking” module, and to maintain the same execution conditions, i.e., using the same “FG/BG Detection” and “Blob Entering Detection” modules.

FIGURE 7 The OpenCV visual surveillance algorithms.

FIGURE 7 The OpenCV visual surveillance algorithms.

In Table we show the five “Blob Tracking” modules implemented in this work. For the purpose of evaluating these new approaches with other well-known algorithms, we also present the results of the particle filtering – mean shift algorithm (MSPF) and a simple connected component tracking (CC) from the OpenCV library. We have fixed the same “FG/BG Detection” module for every test that we have carried out. The selected module was the OpenCV implementation of the adaptive background mixture models for real-time tracking (Stauffer and Grimson, Citation1999).

TABLE 1 Tracking Algorithms Used in the Evaluation Section

The evaluation of our algorithm is tested against different scenes (see Figure ). These datasets are from different sources: the publicly available CVBASE and PETS2002 datasets (Machine Vision Group, Citation2001), a digital-video camcorder, and an analog camera. The datasets are quite diverse in their technical aspects, and the quality of the image sequences differs radically from poor to excellent along with their pixel resolutions.

FIGURE 8 Scenes of the datasets used in the experimentation: (a) SQUASH; (b) BOAT; (c) HANDBALL; (d) CORRIDOR; and (e) PETS2002. The original sequence is depicted in the first row and their respective foreground images are in the second row.

FIGURE 8 Scenes of the datasets used in the experimentation: (a) SQUASH; (b) BOAT; (c) HANDBALL; (d) CORRIDOR; and (e) PETS2002. The original sequence is depicted in the first row and their respective foreground images are in the second row.

Datasets

Maritime Scenes (BOAT)

The videos were recorded in an outdoor scenario using a DV video recorder. The videos have a high quality, with a resolution of 720 × 480 pixels with 15 fps. The videos feature several boats in an outdoor environment lit by the sun. The videos are very interesting due to the complex segmentation of maritime scenes. The sea has continuous movement, which contributes to the creation of a great amount of noisy blobs.

Squash Tournament (SQUASH)

These videos are from the CVBASE dataset and were filmed during a tournament of recreative players. The videos were recorded in S-VHS video recorder, using a birds eye view with a wide angle lens. The videos were digitized to digital video format with 25 fps, with a resolution of 384 × 576 and M-JPEG compression. The selected video is a zenithal record of two players playing squash. They are in close proximity to each other, they wear a similar dress, they carry slightly faster movements, and there are constant crossings between players, which make for a challenging sequence.

Handball Match (HANDBALL)

This video is also from the CVBASE dataset and it has the same characteristics than the squash tournament sequences. Players do not leave the court during the match, there are constant crossings among players with occlusions and disocclusions, and the number of objects (players) per track is quite high. These conditions also make for a challenging sequence.

PETS2002

The performance Evaluation of Tracking and Surveillance (PETS) event is an annual open workshop, where a surveillance problem is suggested to be resolved. Every year the organization makes public a dataset related to different scenarios. The task for the PETS2002 workshop was to track the motion of pedestrians in indoor video sequences of a shopping mall. PETS2002 proposes a challenging problem since the people are close to the camera within a wide area and are moving beyond a glass window which partially reflects objects behind the camera.

CORRIDOR

CORRIDOR videos are the dataset recorded in our installations. They were filmed using a SONY SSC-DC598P camera and a Matrox Morphis frame grabber board. These videos were recorded with a resolution of 768 × 576 pixels with 25 fps. There is no noticeable change in the ambient lighting during the indoor scenes; however the lighting source frequency (conventional fluorescent lamps) produces illumination changes between adjacent frames. This fact along with the fact that people being observed are walking on a reflective surface, makes for a challenging scene.

Evaluation Metrics

Tracking methods can be evaluated on the basis of whether they generate correct mobile object trajectories. A qualitative comparison of tracking algorithms can be based on the ability to maintain the number of targets during the sequence video and to provide an optimal solution to the cost function minimization problem used for establishing correspondence (Yilmaz et al., Citation2006). Therefore, the metrics that allow us to provide formal comparisons among the algorithms tested are as follows:

  • Tracks per frame (TPF). It is used to compare the behavior of the tracking algorithms in terms of continuity of the tracks. An optimal tracker would have to obtain the value referred to as “ideal.” When the obtained value is below to this “ideal” value, it means that the tracker lost the continuity of the tracks (merge effect), and conversely, when it is over “ideal” value, the tracker had an excess of tracks (split effect). The standard deviation of TPF allows for discriminating between the behaviors with very similar averages but worse quality (greater deviation).

  • Frames per second (FPS). It is the rate of processed images by the tracking algorithms; high values imply a greater capacity of processing.

  • Lost track probability (LTP). It determines the probability of losing a track on a given frame. Note that this measure also has been used in other frameworks (Kan and Krogmeier, Citation1996).

Results

In Tables through 6 we present the quality measurements of the HC, SA, Tabu Search (TABU), MSPF, and CC applied to the SQUASH (Figure ), BOAT (Figure ), HANDBALL (Figure ), CORRIDOR (Figure ) and PETS2002 (Figure ) sequences. Hill Climbing, SA, and TABU have been executed up to a limit of iterations (5000 iterations limit for the three algorithms) and also with the same threshold of similarity. This means that the three algorithms will be executed until they have either performed 5000 iterations or they have yielded a solution that is similar to the hypothesis within a given threshold.

TABLE 2 Measures of Quality of the Algorithms Applied to SQUASH

The tables with the tracking metrics are ordered by a decreasing number of FPS. This metric is very important when all approaches perform similar results in terms of accuracy.

We can divide the five scenarios by the complexity of that their assignment problem represents. Obviously, when there are more objects to track, the number of assignment hypothesis grow up and the assignment problem is more complex. HANDBALL is the most complex scenario, with a ideal number of moving objects of 14.

Tracks per frame shows how many tracks appear on every frame which should be close the actual number of tracks. However, this does not mean that these tracks are the real tracks we would like to target (noise instead of objects). Thus, this measure should be taken into account along with the LTP measure. This measure tells us how likely it is that we lose a track using a given algorithm. In these scenarios, we notice that the MSPF algorithm entails more computation load, and therefore, the capacity of processing decreases (FPS).

All the algorithms obtain comparable results in terms of accuracy in the simplest scenarios (all excepting HANDBALL). The three local search algorithms perform clearly as they are supposed to: HC is the poorest in terms of accuracy and it is as fast as SA since they both select a neighbor at random at every iteration, while TABU is necessarily slower since it performs a complete evaluation of the neighborhood before choosing a certain move. Since all three algorithms were executed with the same limit of iterations, it is not surprising to see HC and SA yielding a similar amount of FPS with TABU a little behind them. However, TABU is faster than HC and SA, under conditions of a restricted number of iterations, due to the fact that TABU may reach the similarity threshold faster. This points out the potential of TABU for this type of problem.

In the HANDBALL scenario, we can observe that TABU and SA algorithms perform better than the remaining algorithms in terms of accuracy. The probability of losing a track (PLT) is smaller than the remaining algorithms. Simulated annealing outperforms the TABU search algorithm in terms of FPS while it comparable to the best algorithms in terms of accuracy. However, this is still not a very challenging problem for any of these two algorithms except for the fact that they have a limited amount of iterations in order to process the tracks in real-time. This is very important because it means that now, with Tabu and SA, we are able to perform video tracking for complicated scenes (namely, more than 10 tracks and around 70 blobs) in real-time while not compromising accuracy. Note that MSPF algorithm is able to process less than 1 frame per second while tabu is capable of processing more than 7. On the other hand, we can observe how the simplest algorithms (CC and HC algorithms) are able to perform 12 and 10 fps; however, the accuracy of their tracking results are poor and unstable. For instance, the CC algorithm has a standard deviation of TPF of 2.32.

Although SA seems to be the best in some evaluations we should not underestimate TABU. Tabu search is capable outperforming SA and HC in terms of FPS for simple problems due to its faster convergence (even with its more complex exploration of the search space). It can also provide better solutions in very few iterations which could become a key issue when dealing with more complex scenarios (think of a soccer match with 23 or 24 objects and hundreds of blobs).

CONCLUSIONS

In this work we have introduced a new formulation of the association problem in visual tracking as a discrete optimization problem. We have shown that local search techniques can be very useful for solving real-life problems, specially so when real-time is a hard constraint. The accuracy results that we have obtained with the three local search algorithms (HC, SA, and TABU) that we have implemented are similar to advanced computer vision algorithms like the particle filtering based on the MSPF and outperform the CC. In this sense, we have shown that the TABU algorithm is the most powerful algorithm and is able to solve the association problem in video tracking more accurately and much faster than previous approaches.

We have also presented a novel technique to achieve incrementality when the search space is not the same as the solution space, i.e., the space needed to calculate the fitness of a candidate solution. This is very important because it allows us to use Tabu Search and SA on this problem which provided excellent results.

On the other hand, we are interested in doing further research on this problem. Now that the incrementality issue allowed us to use Tabu search (and SA), yielding for the first time a real-time algorithm for the association problem, we would like to improve the fitness function by adding other components such as color, texture information, etc. We would also like to develop a self-tuning version of the local search algorithms so it can choose the right parameters for every instance of a problem, for example, increase the maximum amount of iterations depending on the amount of blobs and tracks or switch to a different initialization depending on the blobs-to-tracks ratio.

TABLE 3 Measures of Quality of the Algorithms Applied to BOAT

TABLE 4 Measures of Quality of the Algorithms Applied to HANDBALL

TABLE 5 Measures of Quality of the Algorithms Applied to PETS2002

TABLE 6 Measures of Quality of the Algorithms Applied to CORRIDOR

This work was supported in part by projects CICYT TIN2008-06742-C02-02/TSI,CICYT TEC2008-06732-C02-02/TEC, SINPROB, CAM MADRINET S-0505/TIC/0255 and DPS2008-07029-C02-02.

REFERENCES

  • Angus , J. , H. Zhou , C. Bea , L. Becket-Lemus , J. Klose , and S. Tubbs . 1993 . Genetic algorithms in passive tracking. Technical Report, Claremont Graduate School , Math Clinic Report , Claremont , CA .
  • Arulampalam , M. S. , S. Maskell , N. Gordon , and T. Clapp . 2002 . A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking . Signal Processing, IEEE Transactions 50 ( 2 ): 174 – 188 .
  • Bar-Shalom , Y. and X.-R. Li . 1995 . Multitarget-Multisensor Tracking: Principles and Techniques. Storrs , CT : YBS Publishing .
  • Beymer , D. and K. Konolige . 1999. Real-time tracking of multiple people using continous detection. In: Proceedings of IEEE International Conference on Computer Vision (ICCV'99). Corfu , Greece : Frame-Rate Workshop.
  • Blackman , S. S. and R. Popoli . 1999 . Design and Analysis of Modern Tracking Systems. Boston , MA : Artech House, Inc .
  • Broida , T. J. and R. Chellappa . 1986 . Estimation of object motion parameters from noisy images . IEEE Trans. Pattern Anal. Mach. Intell. 8 ( 1 ): 90 – 99 .
  • Carrier , J.-Y. , J. Litva , H. Leung , and T. Lo . 1996 . Genetic algorithm for multiple target tracking data association . In: Acquisition, Tracking, and Pointing X , eds. M. K. Masten and L. A. Stockum , pp. 180 – 190 . Orlando , FL : Pergamon Press, Inc .
  • Castanedo , F. , M. A. Patricio , J. Garcia , and J. M. Molina . 2006 . Extending surveillance systems capabilities using BDI cooperative sensor agents . In: VSSN 06: Proceedings of the 4th ACM International Workshop on Video Surveillance and Sensor Networks , pp. 131 – 138 . New York : ACM Press .
  • Chang , Y. and J. Aggarwal . 1991 . Neural network optimization for multi-target multi-sensor passive tracking . In: Proc. IEEE Workshop on Visual Motion , pp. 268 – 273 . Charleston , SC .
  • Cox , I. J. and S. L. Hingorani . 1996 . An efficient implementation of Reid's multiple hypothesis tracking algorithm and its evaluation for the purpose of visual tracking . IEEE Trans. Pattern Anal. Mach. Intell. 18 ( 2 ): 138 – 150 .
  • Djuric , P. , J. Kotecha , J. Zhang , Y. Huang , T. Ghirmai , M. Bugallo , and J. Miguez . 2003 . Particle filtering . IEEE Signal Processing Magazine 19 – 38 .
  • Dotu , I. and P. V. Hentenryck . 2005 . Scheduling social golfers locally . In: International Conference on Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems (CP-AIOR-05) . Prague , Czech Republic .
  • Fuentes , L. M. and S. A. Velastin . 2006 . People tracking in surveillance applications . Image Vision Comput. 24 ( 11 ): 1165 – 1171 .
  • Glover , F. and M. Laguna , 1993 . Tabu search . In: Modern Heuristic Techniques for Combinatorial Problems , pp. 70 – 150 . ed. C. Reeves , New York : John Wiley & Sons, Inc .
  • Harvey , W. and T. Winterer . 2005 . Solving the molr and social golfers problems . In: Proceedings of the Eleventh International Conference on Principles and Practice of Constraint Programming (CP 2005) , pp. 286 – 300 . Barcelona , Spain : Springer-Verlag .
  • Hillis , D. B. 1997 . Using a genetic algorithm for multi-hypothesis tracking . In: ICTAI '97: Proceedings of the 9th International Conference on Tools with Artificial Intelligence , p. 112 . Washington : IEEE Computer Society .
  • Jin , H. , G. Qian , and S. Rajko . 2006 . Real-time multi-view 3d object tracking in cluttered scenes . In: ISVC (2) , eds. G. Bebis , R. Boyle , B. Parvin , D. Koracin , P. Remagnino , A. V. Nefian , M. Gopi , V. Pascucci , J. Zara , J. Molineros , H. Theisel , and T. Malzbender , Lecture Notes in Computer Science , Vol. 4292 , pp. 647 – 656 . Springer .
  • Kan , W. and J. Krogmeier . 1996 . A generalization of the pda target tracking algorithm using hypothesis clustering . Signals, Systems and Computers 2 : 878 – 882 .
  • Kirkpatrick , S. , G. Cd , and V. Mp . 1983 . Optimization by simulated annealing . Science 220 ( 4598 ): 671 – 680 .
  • Kitagawa , G. 1987 . Non-Gaussian state-space modeling of nonstationary time series . J. Amer. Statist. Assoc. 82 : 1032 – 1063 .
  • Krumm , J. , S. Harris , B. Meyers , B. Brumitt , M. Hale , and S. Shafer . 2000 . Multi-camera multi-person tracking for easyliving . In: VS '00: Proceedings of the Third IEEE International Workshop on Visual Surveillance (VS'2000). Washington , DC : IEEE Computer Society .
  • Larrañaga , P. and J. A. Lozano . 2001 . Estimation of Distribution Algorithms: A New Tool for Evolutionary Computation. Norwell , MA : Kluwer Academic Publishers .
  • Machine Vision Group, U.o. L . 2001 . Cvbase '06 Workshop on Computer Vision Based Analysis in Sport Environments . http://vision.fe.uni-lj.si/cvbase06/. Last accessed September 29, 2009 .
  • Metropolis , N. , A. Rosenbluth , M. Rosenbluth , A. Teller , and E. Teller . 1953 . Equations of state calculations by fast computing machines . Journal of Chemical Physics 21 ( 6 ): 1087 – 1092 .
  • OpenCV. http://opencvlibrary.sourceforge.net. Last accesssed September 31, 2009 .
  • Patricio , M. A. , J. Garcia , A. Berlanga , and J. M. Molina . 2007 . Video tracking association problem using estimation of distribution algorithms in complex scenes . In: IWINAC (2) , eds. J. Mira and J. R. Alvarez , Lecture Notes in Computer Science , Vol. 4528 , pp. 261 – 270 . Springer .
  • Pérez , O. , M. A. Patricio , J. García , and J. M. Molina . 2006. Improving the segmentation stage of a pedestrian tracking video-based system by means of evolution strategies. In: 8th European Workshop on Evolutionary Computation in Image Analysis and Signal Processing. EvoIASP 2006 , Budapest , Hungary.
  • Pérez , O. , M. Piccardi , J. García , M. A. Patricio , and J. M. Molina . 2007 . Comparison between genetic algorithms and baum-welch algorithm for the adjustment of HMM to classify human activities . In: Applications of Evolutionary Computing, EvoWorkshops 2007 , Valencia , España .
  • Ruan , Y. and P. Willett . 2004 . Multiple model pmht and its application to the benchmark radar tracking problem . IEEE Transactions on Aerospace and Electronic Systems 40 ( 4 ): 1337 – 1350 .
  • Russell , S. J. and P. Norvig . 2003 . Artificial Intelligence: A Modern Approach . Pearson Education .
  • Shams , S. 1996 . Neural network optimization for multi-target multi-sensor passive tracking . In: Proceedings of the IEEE , Vol. 84 , pp. 1442 – 1457 .
  • Shyu , M.-L. , S.-C. Chen , Q. Sun , and H. Yu . 2007 . Overview and future trends of multimedia research for content access and distribution . International Journal of Semantic Computing 1 ( 1 ): 29 – 66 .
  • Stauffer , C. and W. Grimson . 1999 . Adaptive background mixture models for real-time tracking . In: Computer Vision and Pattern Recognition, 1999. IEEE Computer Society Conference , Vol. 2 . Los Alamitos , CA : IEEE Computer Society .
  • Turkmen , I. , K. Guney , and D. Karaboga . 2004 . Tabu search tracker for multiple target tracking . Journal of Electromagnetic Waves and Applications 18 ( 12 ): 1573 – 1589 .
  • Veenman , C. J. , M. J. T. Reinders , and E. Backer . 2001 . Resolving motion correspondence for densely moving points . IEEE Trans. Pattern Anal. Mach. Intell. 23 ( 1 ): 54 – 72 .
  • Yeddanapudi , M. , Y. Bar-Shalom , and K. Pattipati . 1997 . Imm estimation for multitarget-multisensor air traffic surveillance . In: Proceedings of the IEEE , Vol. 85 , pp. 80 – 96 .
  • Yilmaz , A. , O. Javed , and M. Shah . 2006 . Object tracking: A survey . ACM Comput. Surv. 38 ( 4 ): 1 – 45 .

Reprints and Corporate Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

To request a reprint or corporate permissions for this article, please click on the relevant link below:

Academic Permissions

Please note: Selecting permissions does not provide access to the full text of the article, please see our help page How do I view content?

Obtain permissions instantly via Rightslink by clicking on the button below:

If you are unable to obtain permissions via Rightslink, please complete and submit this Permissions form. For more information, please visit our Permissions help page.