CRAFT Your Dataset: Task-Specific Synthetic Dataset Generation Through Corpus Retrieval and Augmentation Paper • 2409.02098 • Published Sep 3, 2024 • 2
CommonForms: A Large, Diverse Dataset for Form Field Detection Paper • 2509.16506 • Published Sep 20 • 19
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Paper • 2012.14740 • Published Dec 29, 2020 • 2
Structured 3D Latents for Scalable and Versatile 3D Generation Paper • 2412.01506 • Published Dec 2, 2024 • 84
Guiding Vision-Language Model Selection for Visual Question-Answering Across Tasks, Domains, and Knowledge Types Paper • 2409.09269 • Published Sep 14, 2024 • 9
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13, 2024 • 53
CIVICS: Building a Dataset for Examining Culturally-Informed Values in Large Language Models Paper • 2405.13974 • Published May 22, 2024 • 10
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22, 2024 • 133
Article Multimodal Augmentation for Documents: Recovering “Comprehension” in “Reading and Comprehension” task May 16, 2024 • 17