akashicmarga (Akash Singh)

liked 2 models 8 days ago

browser-use/bu-30b-a3b-preview

Image-Text-to-Text • 31B • Updated 7 days ago • 5.4k • 225

facebook/sam-audio-large

Updated about 4 hours ago • 12.5k • 307

liked a model about 1 month ago

ai4bharat/indic-conformer-600m-multilingual

Updated Sep 27 • 16.2k • 39

reacted to sergiopaniego's post with 🔥 3 months ago

Post

2938

A few days ago, Thinking Machines Lab released “LoRA Without Regret”, showing that LoRA can match full fine-tuning performance when configured right.

Naturally, we decided to reproduce the results with TRL and release a guide!

https://huggingface.co/docs/trl/main/en/lora_without_regret
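As background on why LoRA is cheaper than full fine-tuning: it trains a rank-r pair of matrices in place of a full d_out × d_in weight update, so a layer has r · (d_in + d_out) trainable parameters instead of d_in · d_out. The sketch below only does that arithmetic. The layer size and rank are illustrative assumptions, not values from the post or the TRL guide, and the function names are hypothetical.

```python
def full_param_count(d_in: int, d_out: int) -> int:
    """Trainable parameters when the whole d_out x d_in weight is updated."""
    return d_in * d_out


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters for a LoRA adapter: A is rank x d_in, B is d_out x rank."""
    return rank * (d_in + d_out)


# Illustrative numbers only (assumed, not taken from the post).
d_in, d_out, rank = 4096, 4096, 16
full = full_param_count(d_in, d_out)
lora = lora_param_count(d_in, d_out, rank)
print(f"full: {full}, lora: {lora}, ratio: {lora / full:.4%}")
```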

liked a model 7 months ago

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated 20 days ago • 274k • 1.55k

upvoted a collection 10 months ago

🔊 Audio model 2025

Collection

14 items • Updated 16 days ago • 5

upvoted a collection 11 months ago

Hibiki fr-en

Collection

Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki. • 7 items • Updated 7 days ago • 55

upvoted a collection 12 months ago

Cosmos

Collection

⚠️ This collection is archived. 👉 https://huggingface.co/collections/nvidia/cosmos-predict25 • 31 items • Updated 7 days ago • 299

liked a Space about 1 year ago

Scaling test-time compute

📈

588

Implement test-time compute scaling for math problems

liked a model about 1 year ago

nvidia/Hymba-1.5B-Instruct

Text Generation • 2B • Updated Jan 2 • 240 • 242

reacted to clem's post with 🚀 about 1 year ago

Post

2005

I've been in Brazil for 10 days now 🇧🇷🇧🇷🇧🇷

I've been surprised by the gap between the massive number of people interested in AI (ChatGPT adoption is crazy here) and the relatively low number of real AI builders - aka people and companies building their own AI models, datasets and apps.

Lots of effort is needed across the world for everyone to participate in, control and benefit from this foundational technology, starting with open-source & multilingual AI, more access to GPUs & AI builder training for all!

reacted to maxiw's post with 👍 about 1 year ago

Post

2905

You can now try out computer use models from the hub to automate your local machine with https://github.com/askui/vision-agent. 💻

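import time
from askui import VisionAgent

with VisionAgent() as agent:
    # Open the browser, then pause briefly so the page can render
    agent.tools.webbrowser.open_new("http://www.google.com")
    time.sleep(0.5)
    # Each click describes the target in plain language; model_name picks the hub model that locates it
    agent.click("search field in the center of the screen", model_name="Qwen/Qwen2-VL-7B-Instruct")
    agent.type("cats")
    agent.keyboard("enter")
    time.sleep(0.5)
    # Switch to the image results, then click the second cat image
    agent.click("text 'Images'", model_name="AskUI/PTA-1")
    time.sleep(0.5)
    agent.click("second cat image", model_name="OS-Copilot/OS-Atlas-Base-7B")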

Currently these models are integrated through the Gradio Spaces API. Also planning to add local inference soon!

Currently supported:
- Qwen/Qwen2-VL-7B-Instruct
- Qwen/Qwen2-VL-2B-Instruct
- AskUI/PTA-1
- OS-Copilot/OS-Atlas-Base-7B
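
Since only the four models listed above are named as supported, a script could check a model name before passing it as `model_name`. This is a minimal sketch: `SUPPORTED_MODELS` and `check_model_name` are hypothetical helpers written for illustration and are not part of the askui package. The set simply mirrors the list in the post, which may have changed since.

```python
# Hypothetical helper, not part of askui: this set mirrors the list in the post.
SUPPORTED_MODELS = {
    "Qwen/Qwen2-VL-7B-Instruct",
    "Qwen/Qwen2-VL-2B-Instruct",
    "AskUI/PTA-1",
    "OS-Copilot/OS-Atlas-Base-7B",
}


def check_model_name(model_name: str) -> str:
    """Return model_name unchanged if it is in the post's list, else raise ValueError."""
    if model_name not in SUPPORTED_MODELS:
        supported = ", ".join(sorted(SUPPORTED_MODELS))
        raise ValueError(f"Unsupported model {model_name!r}; expected one of: {supported}")
    return model_name


# Example: validate the name first, then hand it to a call such as agent.click(...).
print(check_model_name("AskUI/PTA-1"))
```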


reacted to Jaward's post with 👍 about 1 year ago

Post

1256

This is supercool!!
Explores o1-like multimodal reasoning.
Multi-agents with DPO is a nice touch 👍
Paper: https://arxiv.org/pdf/2411.14432
Code: https://github.com/dongyh20/Insight-V

liked a model about 1 year ago

Etched/oasis-500m

Updated Nov 4, 2024 • 115 • 485

upvoted 2 collections about 1 year ago

🪐 SmolLM

Collection

A series of smol LLMs: 135M, 360M and 1.7B. We release base and Instruct models as well as the training corpus and some WebGPU demos • 12 items • Updated May 5 • 241

MobileLLM

Collection

Optimizing Sub-billion Parameter Language Models for On-Device Use Cases (ICML 2024) https://arxiv.org/abs/2402.14905 • 50 items • Updated 19 days ago • 136

upvoted a collection over 1 year ago

Moshi v0.1 Release

Collection

MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 16 items • Updated 7 days ago • 242

liked a dataset over 1 year ago

argilla/FinePersonas-v0.1

Viewer • Updated Dec 11, 2024 • 42.1M • 10.2k • 408

liked a Space over 1 year ago

MusicGen Web

🎵

245

In-browser text-to-music w/ Transformers.js!

reacted to akhaliq's post with 👍 over 1 year ago

Post

21392

Chameleon

Mixed-Modal Early-Fusion Foundation Models

Chameleon: Mixed-Modal Early-Fusion Foundation Models (2405.09818)

We present Chameleon, a family of early-fusion token-based mixed-modal models capable of understanding and generating images and text in any arbitrary sequence. We outline a stable training approach from inception, an alignment recipe, and an architectural parameterization tailored for the early-fusion, token-based, mixed-modal setting. The models are evaluated on a comprehensive range of tasks, including visual question answering, image captioning, text generation, image generation, and long-form mixed modal generation. Chameleon demonstrates broad and general capabilities, including state-of-the-art performance in image captioning tasks, outperforms Llama-2 in text-only tasks while being competitive with models such as Mixtral 8x7B and Gemini-Pro, and performs non-trivial image generation, all in a single model. It also matches or exceeds the performance of much larger models, including Gemini Pro and GPT-4V, according to human judgments on a new long-form mixed-modal generation evaluation, where either the prompt or outputs contain mixed sequences of both images and text. Chameleon marks a significant step forward in a unified modeling of full multimodal documents.

