The fusion tracking of RGB and thermal infrared (RGBT) images has attracted wide attention because the two modalities offer complementary advantages. Currently, most algorithms obtain modality weights through attention mechanisms to integrate multi-modal information. However, they do not fully exploit multi-scale information and ignore the rich contextual information among features, which limits tracking performance to some extent. To solve this problem, this work proposes a new multi-scale feature interactive fusion network (MSIFNet) for RGBT tracking. Specifically, we use different convolution branches for multi-scale feature extraction and aggregate them adaptively through a feature selection module. At the same time, a Transformer interactive fusion module is proposed to build long-distance dependencies and further enhance semantic representation. Finally, a global feature fusion module is designed to adjust global information adaptively. Extensive experiments on the publicly available GTOT, RGBT234, and LasHeR datasets show that our algorithm outperforms current mainstream tracking algorithms.

To obtain more accurate and robust target attribute information and to compensate for the information uncertainty of a single modality in target tracking, it is necessary to fuse the data collected by multiple sensors. Existing fusion tracking methods can be divided into pixel-level, feature-level, and decision-level methods. In pixel-level fusion tracking, the heterogeneous images are fused first, and tracking is then carried out on the fused, information-rich images; although rich information is preserved, this incurs a greater computational cost during tracking. In feature-level fusion tracking, the features of the RGB and infrared images are first extracted and then fused according to designed fusion rules, and the fused features are finally used to perform tracking. In decision-level fusion tracking, tracking is first performed in the individual modalities to obtain tracking results or response maps, which are then fused to obtain the final tracking result; its computational cost is relatively low compared to pixel-level and feature-level fusion. Although RGBT tracking has made significant progress, the mainstream methods can also be roughly divided into two categories, namely combined fusion tracking methods and discriminative fusion tracking methods.
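The adaptive aggregation of multi-scale convolution branches can be illustrated with a minimal sketch. This is not MSIFNet's actual module: it is a simplified, SK-style feature selection in NumPy, where per-branch weights come from globally pooled statistics and a softmax across branches; all names and shapes here are illustrative assumptions.

```python
import numpy as np

def select_features(branch_feats):
    """Adaptively aggregate multi-scale branch features.

    A simplified sketch of a 'feature selection module': each branch's
    feature map is globally average-pooled to a channel descriptor, a
    softmax across branches turns the descriptors into per-channel
    selection weights, and the branches are summed under those weights.
    branch_feats: list of arrays of shape (C, H, W), one per conv branch.
    """
    stacked = np.stack(branch_feats)                  # (B, C, H, W)
    pooled = stacked.mean(axis=(2, 3))                # (B, C) channel stats
    # Softmax over the branch axis: weights per (branch, channel) sum to 1.
    weights = np.exp(pooled) / np.exp(pooled).sum(axis=0, keepdims=True)
    return (weights[:, :, None, None] * stacked).sum(axis=0)  # (C, H, W)

# Two toy "branches" (e.g. 3x3 and 5x5 conv outputs) over a 4x4 map.
rng = np.random.default_rng(0)
f3 = rng.standard_normal((8, 4, 4))
f5 = rng.standard_normal((8, 4, 4))
out = select_features([f3, f5])
print(out.shape)  # (8, 4, 4)
```

Because the weights are a convex combination per channel, feeding the same feature map through every branch returns that map unchanged, which is a quick sanity check for this kind of selection module.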
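The decision-level scheme described above can be sketched in a few lines: each modality produces its own response map, and only the maps are combined. The fixed weights and the argmax read-out below are illustrative assumptions; real trackers typically adapt the modality weights online.

```python
import numpy as np

def fuse_response_maps(resp_rgb, resp_tir, w_rgb=0.5, w_tir=0.5):
    """Decision-level fusion: weighted sum of per-modality response maps,
    then the target location is read off as the peak of the fused map."""
    fused = w_rgb * resp_rgb + w_tir * resp_tir
    idx = np.unravel_index(np.argmax(fused), fused.shape)
    return tuple(int(i) for i in idx)

# Toy 5x5 response maps: each modality votes for a target position.
rgb = np.zeros((5, 5)); rgb[2, 3] = 0.9   # RGB peak at (2, 3)
tir = np.zeros((5, 5)); tir[2, 3] = 0.6   # TIR agrees at (2, 3)
print(fuse_response_maps(rgb, tir))        # -> (2, 3)
```

The low cost noted in the text is visible here: fusion touches only two small maps per frame, not the raw images or intermediate feature tensors.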