All HF Hub posts

DmitryRyumin posted an update 2 days ago
🚀🤖🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🤖🚀
📄 Title: Variance-Based Pruning for Accelerating and Compressing Trained Networks 🔍

📝 Description: This one-shot pruning method efficiently compresses networks, reducing computation and memory usage while retaining almost full performance and requiring minimal fine-tuning.

👥 Authors: Uranik Berisha, Jens Mehnert, and Alexandru Paul Condurache

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Variance-Based Pruning for Accelerating and Compressing Trained Networks (2507.12988)

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #VarianceBasedPruning #NetworkCompression #ModelAcceleration #EfficientDeepLearning #VisionTransformers #AI #ICCV2025 #ResearchHighlight
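To make the idea concrete, here is a minimal, illustrative PyTorch sketch of a variance-based pruning criterion (a toy example, not the authors' algorithm): it ranks the output channels of a linear layer by the variance of their activations over a calibration batch and keeps only the most variable ones. The layer, keep ratio, and calibration data are all assumptions for illustration.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def variance_prune_linear(layer: nn.Linear, calib: torch.Tensor, keep_ratio: float = 0.5) -> nn.Linear:
    """Toy one-shot pruning: keep the output channels whose activations
    vary the most over a calibration batch (a variance-based criterion)."""
    acts = layer(calib)                              # (batch, out_features)
    var = acts.var(dim=0)                            # per-channel activation variance
    k = max(1, int(keep_ratio * layer.out_features))
    keep = torch.topk(var, k).indices.sort().values  # channels to keep, in original order
    pruned = nn.Linear(layer.in_features, k, bias=layer.bias is not None)
    pruned.weight.copy_(layer.weight[keep])
    if layer.bias is not None:
        pruned.bias.copy_(layer.bias[keep])
    return pruned

# Toy usage with random calibration data.
layer = nn.Linear(128, 64)
pruned = variance_prune_linear(layer, torch.randn(256, 128))
print(pruned)  # Linear(in_features=128, out_features=32, bias=True)
```

A real pipeline would also shrink the input dimension of the following layer and then apply the minimal fine-tuning the post mentions; this sketch only shows the selection step.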
onekq posted an update 1 day ago
Context rot is such a catchy phrase, but the problem was identified more than two years ago under the name attention decay.
Lost in the Middle: How Language Models Use Long Contexts (2307.03172)

I spotted the same problem in coding tasks and documented it in my book (https://www.amazon.com/dp/9999331130).

Why did this problem become hot again? Because many of us thought it had been solved by long-context models, which is not true.

Here we were misled by benchmarks. Most long-context benchmarks are built around the QA scenario, i.e. "finding the needle in the haystack". But in agentic scenarios, the model needs to find EVERYTHING in the haystack, and it simply cannot afford enough attention for that challenge.
gokaygokay posted an update 3 days ago
FlashPack: Lightning-Fast Model Loading for PyTorch

https://github.com/fal-ai/flashpack

FlashPack is a new, high-throughput file format and loading mechanism for PyTorch that makes model checkpoint I/O blazingly fast, even on systems without access to GPU Direct Storage (GDS).

With FlashPack, loading any model can be 3–6× faster than with the current state-of-the-art methods like accelerate or the standard load_state_dict() and to() flow, all wrapped in a lightweight, pure-Python package that works anywhere.
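For reference, the "standard flow" that the speedup claim is measured against looks roughly like this (a generic PyTorch sketch with a made-up checkpoint path and model, not FlashPack's own API):

```python
import torch
import torch.nn as nn

# Hypothetical model and checkpoint path, for illustration only.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))
torch.save(model.state_dict(), "checkpoint.pt")

# The baseline loading flow: 1) read the checkpoint from disk,
# 2) copy tensors into the module, 3) move the module to the target device.
state_dict = torch.load("checkpoint.pt", map_location="cpu")
model.load_state_dict(state_dict)
model.to("cuda" if torch.cuda.is_available() else "cpu")
```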

mrfakename posted an update about 12 hours ago
Trained an emotion-controllable TTS model based on MiMo Audio, using LAION's dataset.

It's still very early in the training run and the model does have an issue with hallucinating, but the results seem pretty good so far.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)
onekq posted an update 2 days ago
I am on the model layer and focus on atomic tasks, so I don't get involved in product discussions. But this provocative article stirred the community up quite a bit. A case in point is Claude Code, which happens to be my biggest productivity revolution since ChatGPT.

RAG predates TUIs and agents, so to be fair it is quite an achievement that it has survived the AI evolution. But I feel it is being overshadowed by context engineering in the agent era. How does everyone feel about this?

https://www.nicolasbustamante.com/p/the-rag-obituary-killed-by-agents
jbilcke-hf posted an update about 22 hours ago
I made a code-sniping agent that detects when new AI papers with code (and weights) are released, and then automatically creates a Gradio demo on Hugging Face 🧙

Here are some examples generated 100% automatically:
https://huggingface.co/collections/jbilcke-hf/sniped

I call this agent CheatCode (https://github.com/jbilcke-hf/CheatCode) because it skips so many steps that it kinda feels like breaking the rules of the AI tech release game 😅

As with any experimental technology, there is still room for improvement 👩🏻‍🔬:

- Currently the demos are all generated in one go and are not built or tested by the agent itself. A more robust version would loop over the deployed app to fix build and runtime issues.
- There is still a bit of human curation done to avoid making demos for things that can't really be demonstrated on ZeroGPU (e.g., tasks taking several minutes); see the sketch after this list.
- Some papers can actually be showcased in a variety of ways, which isn't really supported yet (see Demo 2).
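For illustration, the skeleton of a generated demo might look something like this: a single inference function gated by the ZeroGPU decorator and wrapped in a gr.Interface. The model id and task are made up, and this is not actual CheatCode output, just a sketch of the shape of app the agent targets.

```python
import gradio as gr
import spaces  # ZeroGPU helper available on Hugging Face Spaces
from transformers import pipeline

# Hypothetical model from a freshly released paper; the agent fills this in.
pipe = pipeline("image-classification", model="some-org/some-paper-model")

@spaces.GPU(duration=60)  # ZeroGPU lends a GPU per call, so minutes-long tasks don't fit well
def predict(image):
    return {p["label"]: p["score"] for p in pipe(image)}

demo = gr.Interface(
    fn=predict,
    inputs=gr.Image(type="pil"),
    outputs=gr.Label(num_top_classes=5),
    title="Auto-generated paper demo (sketch)",
)

if __name__ == "__main__":
    demo.launch()
```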


DmitryRyumin posted an update about 22 hours ago
🚀🏷️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟🧩🚀
📄 Title: Heavy Labels Out! Dataset Distillation with Label Space Lightening 🔍

📝 Description: The HeLlO framework is a new dataset distillation method that removes the need for storing large soft labels. Instead, it uses a lightweight, online image-to-label projector built on CLIP, adapted with LoRA-style parameter-efficient tuning and initialized from text embeddings.

👥 Authors: @roseannelexie, @Huage001, Zigeng Chen, Jingwen Ye, and Xinchao Wang

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Heavy Labels Out! Dataset Distillation with Label Space Lightening (2408.08201)

📺 Video: https://www.youtube.com/watch?v=kAyK_3wskgA

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Efficient Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/efficient-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #DatasetDistillation #LabelCompression #CLIP #LoRA #EfficientAI #FoundationModels #AI #ICCV2025 #ResearchHighlight
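Reading only the description above, the projector can be pictured roughly as follows: a frozen matrix of CLIP text embeddings maps image features to soft labels, and a small LoRA-style low-rank correction is the only trained part. This is a speculative sketch of that idea, not the paper's code; the class names, rank, and the openai/clip-vit-base-patch32 checkpoint are assumptions.

```python
import torch
import torch.nn as nn
from transformers import CLIPModel, CLIPProcessor

clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
class_names = ["cat", "dog", "car"]  # placeholder label space

class LabelProjector(nn.Module):
    """Image-to-label projector: frozen text embeddings as the base mapping,
    plus a LoRA-style low-rank correction as the only trainable parameters."""
    def __init__(self, text_emb: torch.Tensor, rank: int = 4):
        super().__init__()
        self.register_buffer("base", text_emb)                 # (num_classes, dim), frozen
        num_classes, dim = text_emb.shape
        self.lora_down = nn.Parameter(torch.zeros(rank, dim))  # zero init -> no change at start
        self.lora_up = nn.Parameter(torch.randn(num_classes, rank) * 0.01)

    def forward(self, image_features: torch.Tensor) -> torch.Tensor:
        weight = self.base + self.lora_up @ self.lora_down     # low-rank corrected projector
        return (image_features @ weight.t()).softmax(dim=-1)   # soft labels generated online

with torch.no_grad():
    text_inputs = processor(text=class_names, return_tensors="pt", padding=True)
    text_emb = clip.get_text_features(**text_inputs)
    text_emb = text_emb / text_emb.norm(dim=-1, keepdim=True)

projector = LabelProjector(text_emb)
soft_labels = projector(torch.randn(8, text_emb.shape[1]))     # stand-in for CLIP image features
print(soft_labels.shape)  # torch.Size([8, 3])
```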
DmitryRyumin posted an update 3 days ago
🚀👁️🌟 New Research Alert - ICCV 2025 (Oral)! 🌟👁️🚀
📄 Title: Token Activation Map to Visually Explain Multimodal LLMs 🔍

📝 Description: The Token Activation Map (TAM) is an advanced explainability method for multimodal LLMs. Using causal inference and a Rank Gaussian Filter, TAM reveals token-level interactions and eliminates redundant activations. The result is clearer, high-quality visualizations that enhance understanding of object localization, reasoning, and multimodal alignment across models.

👥 Authors: Yi Li, Hualiang Wang, Xinpeng Ding, Haonan Wang, and Xiaomeng Li

📅 Conference: ICCV, 19–23 Oct 2025 | Honolulu, Hawai'i, USA 🇺🇸

📄 Paper: Token Activation Map to Visually Explain Multimodal LLMs (2506.23270)

📁 Repository: https://github.com/xmed-lab/TAM

🚀 ICCV-2023-25-Papers: https://github.com/DmitryRyumin/ICCV-2023-25-Papers

🚀 Added to the Multi-Modal Learning Section: https://github.com/DmitryRyumin/ICCV-2023-25-Papers/blob/main/sections/2025/main/multi-modal-learning.md

📚 More Papers: more cutting-edge research presented at other conferences is available in the DmitryRyumin/NewEraAI-Papers collection, curated by @DmitryRyumin

🔍 Keywords: #TokenActivationMap #TAM #CausalInference #VisualReasoning #Multimodal #Explainability #VisionLanguage #LLM #XAI #AI #ICCV2025 #ResearchHighlight
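Without going into the paper's actual formulation, the flavor of a rank-based Gaussian filter on a per-token activation map can be sketched like this. This is purely illustrative: the map, grid size, and sigma are assumptions, and the paper defines its own Rank Gaussian Filter and causal-inference step (see the repository above for the real implementation).

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.stats import rankdata

def rank_gaussian_filter(token_map: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Rank-transform a token's 2D activation map, then smooth it with a Gaussian.
    Illustrative only; not the TAM paper's definition."""
    ranks = rankdata(token_map, method="average").reshape(token_map.shape)
    ranks = (ranks - ranks.min()) / (ranks.max() - ranks.min() + 1e-8)  # normalize to [0, 1]
    return gaussian_filter(ranks, sigma=sigma)

# Toy example: a noisy 24x24 map of image-patch activations for one generated token.
token_map = np.random.rand(24, 24)
smoothed = rank_gaussian_filter(token_map, sigma=1.5)
print(smoothed.shape)  # (24, 24)
```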
prithivMLmods posted an update about 23 hours ago
Here is the official demo for the Transformers-converted Florence-2 vision models: florence-community/Florence-2-large, florence-community/Florence-2-large-ft, florence-community/Florence-2-base, and florence-community/Florence-2-base-ft. These models support tasks such as object detection, captioning, detailed captioning, more detailed captioning, dense region captioning, region proposal, OCR, and OCR with region. Try it at the links below:

> Space: prithivMLmods/florence2-vision-models
> Collection: prithivMLmods/multimodal-implementations-67c9982ea04b39f0608badb0

> To learn more, visit the app page or the respective model pages!
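If you want to try the models outside the Space, here is a minimal sketch following the usage pattern from the original Florence-2 model cards. The Transformers-converted checkpoints may not need trust_remote_code and may expose slightly different class names, so check each model card before relying on this.

```python
import requests
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "florence-community/Florence-2-base"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Usage pattern taken from the original Florence-2 model cards; signatures may
# differ slightly for the converted checkpoints.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True).to(device)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

task = "<OD>"  # other task prompts: <CAPTION>, <DETAILED_CAPTION>, <OCR>, <OCR_WITH_REGION>, ...
inputs = processor(text=task, images=image, return_tensors="pt").to(device)
generated_ids = model.generate(
    input_ids=inputs["input_ids"],
    pixel_values=inputs["pixel_values"],
    max_new_tokens=1024,
    num_beams=3,
)
text = processor.batch_decode(generated_ids, skip_special_tokens=False)[0]
print(processor.post_process_generation(text, task=task, image_size=(image.width, image.height)))
```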
AdinaY posted an update about 23 hours ago
Ming-flash-omni Preview 🚀 Multimodal foundation model from AntGroup

inclusionAI/Ming-flash-omni-Preview

✨ Built on Ling-Flash-2.0: 10B total / 6B active
✨ Generative segmentation-as-editing
✨ SOTA contextual & dialect ASR
✨ High-fidelity image generation