We are unable to provide the specific YouTube videos used for training. However, searching YouTube with the keywords "walk in" or "walk through" will surface relevant videos.
Abstract: Video Large Language Models (Vid-LLMs) have made remarkable advancements in comprehending video content for QA dialogue. However, they struggle to extend this visual understanding to tasks ...