Figures & data
Figure 1. The training and validation error of the CNN: (a) our network without the STN; (b) our network with the STN.
![Figure 1. The training and validation error of the CNN, (a) our network without the STN; (b) our network with the STN.](/cms/asset/1b8c4fde-9b51-4f8f-8598-dfebeb98b833/icsu_a_1560097_f0001_c.jpg)
Figure 3. (a) Learning spatial context at the tth frame; (b) detecting an object’s location at the (t+1)th frame and tracking.
![Figure 3. (a) Learning spatial context at the tth frame; (b) detecting an object’s location at the (t+1)th frame and tracking.](/cms/asset/12ce0167-7c29-40d3-bfca-1ad4436494b1/icsu_a_1560097_f0003_c.jpg)
Figure 4. Test video images, from top to bottom, with rows showing the performance of our method, the MIL method, ATF method, DDVT method, and CF method on eight datasets, respectively. Every column represents one kind of dataset; the dataset names appear below each column.
![Figure 4. Test video images, from top to bottom, with rows showing the performance of our method, the MIL method, ATF method, DDVT method, and CF method on eight datasets, respectively. Every column represents one kind of dataset; the dataset names appear below each column.](/cms/asset/6ed04e3c-7477-4a13-8702-e90f738ec8b1/icsu_a_1560097_f0004_c.jpg)
Figure 5. Each graph represents one kind of dataset and each colour line represents each method’s error; the dataset names appear below each graph. Red represents the error of our method, green that of the CF method, yellow that of the MIL method, black that of the ATF method, and blue that of the DDVT method.
![Figure 5. Each graph represents one kind of dataset and each colour line represents each method’s error; the dataset names appear below each graph. Red represents the error of our method, green that of the CF method, yellow that of the MIL method, black that of the ATF method, and blue that of the DDVT method.](/cms/asset/1ca95d65-2c00-402e-994c-86d19057a58a/icsu_a_1560097_f0005_c.jpg)
Table 1. Mean tracking errors (pixels) of every method on every dataset.
Table 2. Standard deviations (pixels) of every method’s tracking errors on every dataset.
Table 3. The trackers’ accuracy on every dataset; the accuracy was calculated from the distance between the ground truth and the tracked position. If, in a given frame, this distance is smaller than half the side length of the tracking window centred on the ground truth, we consider the tracker to have successfully tracked the object in that frame.
Table 4. Every method’s frames per second (FPS) on every dataset.