Given CLIP’s training on the extensive LAION dataset, identifying datasets unknown to CLIP not only facilitates the application of transfer learning for downstream tasks but also serves as a means to evaluate CLIP’s ability to detect out-of-distribution or novel instances. This is particularly relevant in the context of addressing the hallucination issues prevalent in large models. To advance research in this area, we introduce a dataset of TV series released post-2021, named TV100, to explore CLIP’s performance further.
Below we describe the data collection process and give detailed information about TV100, including the country distribution, the class distribution, and an empirical evaluation of zero-shot and fine-tuned performance.
Here we elaborate on the data collection process. We manually search IMDB for TV series and keep those released after 2021. We then download related images from Google by searching the keyword “[NAME] TV Series,” where [NAME] is the name of the TV series. The downloaded images are filtered manually to remove duplicates and meaningless ones. This yields a large dataset of around 800 classes. However, some of these classes may already be known to CLIP, e.g., “The Snoopy Show” (Snoopy is a famous cartoon character). We therefore use a pre-trained CLIP to rank the difficulty of the classes by measuring the zero-shot accuracy when each image is matched against the text “a photo of the TV series [CLASS].” We select the 100 hardest classes according to this zero-shot accuracy, i.e., those CLIP recognizes least well, to construct the TV100 dataset.
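The class-selection step can be sketched as follows. This is a minimal illustration, not the authors' released code: it assumes the image and prompt embeddings have already been computed with a pre-trained CLIP and L2-normalized, that an image counts as correct when its own class prompt has the highest similarity among all candidate prompts, and that difficulty is the per-class mean of that correctness. The function names `per_class_zero_shot_accuracy` and `hardest_classes` are hypothetical.

```python
from collections import defaultdict


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def per_class_zero_shot_accuracy(image_embs, labels, text_embs):
    """Return {class_index: zero-shot accuracy}.

    image_embs: list of image embedding vectors (assumed L2-normalized).
    labels:     list of ground-truth class indices, one per image.
    text_embs:  list of prompt embeddings, one per class, e.g. for
                "a photo of the TV series [CLASS]." (assumed L2-normalized).
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for emb, label in zip(image_embs, labels):
        # Similarity of this image to every class prompt.
        scores = [dot(emb, t) for t in text_embs]
        # Zero-shot prediction: the class whose prompt is most similar.
        pred = max(range(len(scores)), key=scores.__getitem__)
        total[label] += 1
        correct[label] += int(pred == label)
    return {c: correct[c] / total[c] for c in total}


def hardest_classes(accuracy, k):
    """Return the k class indices with the lowest zero-shot accuracy.

    Ties are broken by class index so the result is deterministic
    (an assumption made for this sketch).
    """
    return sorted(accuracy, key=lambda c: (accuracy[c], c))[:k]
```

With roughly 800 candidate classes, calling `hardest_classes(accuracy, 100)` would give the kind of subset described above, under the stated assumptions.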
The dataset is accessible at: TV100.
@article{zhou2024tv,
  title={TV100: A TV Series Dataset that Pre-Trained CLIP Has Not Seen},
  author={Zhou, Da-Wei and Qi, Zhi-Hong and Ye, Han-Jia and Zhan, De-Chuan},
  journal={Frontiers of Computer Science},
  year={2024},
  volume={18},
  number={5},
  pages={185349}
}