Discover the cutting-edge advancements in visual object tracking (VOT) with this comprehensive resource, designed to revolutionize how researchers and professionals approach tracking systems. This book presents deep learning techniques and multimodal fusion strategies, offering state-of-the-art solutions for robust and accurate object tracking in dynamic environments.
With applications ranging from autonomous vehicles to intelligent surveillance, VOT has become a cornerstone of modern computer vision. By addressing challenges like scalability, real-time performance, and robustness, this book equips readers with the tools to navigate the rapidly evolving landscape of tracking systems. It’s the first of its kind to seamlessly integrate single-modal and multimodal approaches, bridging the gap between foundational methods and emerging technologies.
Explore key topics including Siamese networks, transformer-based models, RGB-LiDAR and RGB-thermal fusion, and spatio-temporal modeling. Gain insights into benchmark datasets, evaluation protocols, and future trends like large model transfer and cross-domain learning. Each chapter builds on the previous one, ensuring a structured progression from theoretical principles to practical applications.
Whether you’re a researcher, practitioner, or student in computer vision, artificial intelligence, or machine learning, this book is an indispensable guide to mastering VOT. A basic understanding of computer science and deep learning concepts is recommended to fully benefit from the material.
List of contents
Chapter 1. Introduction.- Chapter 2. Fundamental Knowledge and Frameworks.- Chapter 3. Single-modal Object Tracking.- Chapter 4. Multi-modal Object Tracking.- Chapter 5. Datasets and Evaluation Protocols.- Chapter 6. Future Prospects.
About the author
Mengmeng Wang
is an associate professor in the College of Computer Science and Technology, Zhejiang University of Technology. She earned her B.Sc., Master's, and Ph.D. degrees in control science and engineering from Zhejiang University in 2015, 2018, and 2024, respectively. Her research focuses on image/video understanding, text-to-video/image-to-video generation, computer vision, robotics, and intelligent transportation systems. Dr. Wang has published more than 50 papers in top journals and conferences, e.g., TPAMI, TIP, ICLR, NeurIPS, ICCV, AAAI, CVPR, ICRA, and IROS. She has served as an area chair for leading conferences in computer vision, such as ICCV.
Xiangjie Kong
is a professor at the College of Computer Science and Technology, Zhejiang University of Technology, China. He received his B.Sc. and Ph.D. degrees from Zhejiang University in 2004 and 2009, respectively. Before joining Zhejiang University of Technology, he was an associate professor at the School of Software, Dalian University of Technology. Dr. Kong’s research interests lie in social computing, mobile computing, and data science. He has authored over 200 scientific papers in international journals and conferences, with more than 180 indexed by ISI SCIE. Dr. Kong is a Senior Member of the IEEE, a Distinguished Member of CCF, and a member of ACM.
Guojiang Shen
is a professor at the College of Computer Science and Technology, Zhejiang University of Technology. He received his B.Sc. degree in control theory and control engineering and his Ph.D. degree in control science and engineering from Zhejiang University in 1999 and 2004, respectively. His research expertise spans artificial intelligence, big data analytics, and intelligent transportation systems.