MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation Paper • 2511.09611 • Published 19 days ago • 67
PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model Paper • 2510.14528 • Published Oct 16 • 98
Optimized Table Tokenization for Table Structure Recognition Paper • 2305.03393 • Published May 5, 2023 • 1
PubTables-1M: Towards comprehensive table extraction from unstructured documents Paper • 2110.00061 • Published Sep 30, 2021 • 3
Prismatic Synthesis: Gradient-based Data Diversification Boosts Generalization in LLM Reasoning Paper • 2505.20161 • Published May 26 • 1
Image-GS: Content-Adaptive Image Representation via 2D Gaussians Paper • 2407.01866 • Published Jul 2, 2024 • 1
2D Gaussian Splatting with Semantic Alignment for Image Inpainting Paper • 2509.01964 • Published Sep 2 • 6
Mixture of Global and Local Experts with Diffusion Transformer for Controllable Face Generation Paper • 2509.00428 • Published Aug 30 • 17
Representing Speech Through Autoregressive Prediction of Cochlear Tokens Paper • 2508.11598 • Published Aug 15 • 17
StyleMM: Stylized 3D Morphable Face Model via Text-Driven Aligned Image Translation Paper • 2508.11203 • Published Aug 15 • 10
FantasyPortrait: Enhancing Multi-Character Portrait Animation with Expression-Augmented Diffusion Transformers Paper • 2507.12956 • Published Jul 17 • 24
FLEXITOKENS: Flexible Tokenization for Evolving Language Models Paper • 2507.12720 • Published Jul 17 • 9
SpeakerVid-5M: A Large-Scale High-Quality Dataset for Audio-Visual Dyadic Interactive Human Generation Paper • 2507.09862 • Published Jul 14 • 49
Radial Attention: O(n log n) Sparse Attention with Energy Decay for Long Video Generation Paper • 2506.19852 • Published Jun 24 • 41
Peccavi: Visual Paraphrase Attack Safe and Distortion Free Image Watermarking Technique for AI-Generated Images Paper • 2506.22960 • Published Jun 28 • 6