sagar007 (Sagar pallai)

replied to their post about 2 months ago

I appreciate the clarification! My goal with this project was purely educational to understand the mechanics of how vision-language connectors work. While the base weights belong to Google and OpenAI, the implementation of the projection layers and the fine-tuning process was my contribution. I'm still learning the ropes, so I appreciate the call-out on the terminology!

replied to their post about 2 months ago

Thanks

reacted to unmodeled-tyler's post with 👍 about 2 months ago

Post

3698

NEW MODEL: vanta-research/mox-tiny-1

Excited to share the first model from the new Mox-1 model family from VANTA Research!

Mox-Tiny-1 is finetuned from Llama's 3.1 8B architecture specifically for thinking partnership and high-fidelity AI interaction.

Training domains were carefully selected with this goal in mind, and all datasets used were synthetically generated for the Mox-1 models specifically.

Domains include but not limited to:
- Direct opinions (clear stances on various topics)
- Constructive disagreement (Pushing back on flawed premises)
- Epistemic confidence (calibrated certainty and uncertainty)
- Warmth & presence (Empathetic, human responses)
- Wonder & puzzlement (Intellectual curiosity)

Interaction Examples:

Direct opinions:

User: What's the best programming language?
Mox:  That question doesn't have a context-free answer, and anyone who gives you one is either selling something or hasn't worked on enough different problems. But I'll give you  my actual take across contexts: Python for rapid prototyping and data work, Rust for systems programming where correctness matters, TypeScript for web development...

Warmth & Humor:

User: I've been debugging this issue for six hours. 
Mox:  At hour six, you've entered the part of debugging where the bug isn't in your code anymore - it's in your understanding of reality itself. The semicolon starts looking suspicious. Show me what you're dealing with. Fresh eyes help, and mine haven't been staring at it since lunch.

Mox-Tiny-1 is our first release in this new model family, with Mox-Small-1 (~32B) to follow soon!

5 replies

·

replied to their post about 2 months ago

i removed word "scratch", scratch mean building the full MLOps pipeline myself, including training, configuration with Hydra, data/versioning with DVC, and experiment tracking with MLflow.

replied to their post about 2 months ago

By from scratch, I mean building the full MLOps pipeline myself, including training, configuration with Hydra, data/versioning with DVC, and experiment tracking with MLflow. Since it combines both training and pipeline development, I referred to it as from scratch. Similarly, papers like LLaVA claim training based on benchmarks, but in practice, they stitch together existing models.

posted an update about 2 months ago

Post

4172

🚀 I built a Multimodal Vision-Language Model from using Gemma-270M + CLIP!

Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🔧 What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

📊 Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

📈 sagar007/multigemma
Benchmark Results:
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

🔗 **Try it yourself:**
- 🤗 Model: sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!

Would love to hear your feedback! 🙏

#multimodal #gemma #clip #llava #vision-language #pytorch

9 replies

·

reacted to clem's post with 🔥 about 1 year ago

Post

7546

AI is not a zero-sum game. Open-source AI is the tide that lifts all boats!

reacted to their post with 👍🔥 about 1 year ago

Post

3749

🚀 Just built a Perplexity-inspired AI search assistant using Gradio, DeepSeek, and DuckDuckGo!
Ask it anything, and it’ll:

Scour the web for answers 📚

Cite sources like a pro 🔗

Even talk back with TTS (thanks, Kokoro!) 🎙️

Ask it anything, and it’ll:

Scour the web for answers 📚

Check it out → sagar007/DeepSeekR1_Search

posted an update about 1 year ago

Post

3749

🚀 Just built a Perplexity-inspired AI search assistant using Gradio, DeepSeek, and DuckDuckGo!
Ask it anything, and it’ll:

Scour the web for answers 📚

Cite sources like a pro 🔗

Even talk back with TTS (thanks, Kokoro!) 🎙️

Ask it anything, and it’ll:

Scour the web for answers 📚

Check it out → sagar007/DeepSeekR1_Search

posted an update over 1 year ago

Post

804

Fine-tuned Phi-3.5 Chatbot

This project presents a fine-tuned version of Microsoft's Phi-3.5 model, optimized for enhanced conversational abilities and general knowledge tasks.

Model Details
- Base model: microsoft/Phi-3.5-mini-instruct
- Fine-tuning method: PEFT (Parameter-Efficient Fine-Tuning)
- Training data: [Brief description of your dataset]

Features
- Improved response generation for a wide range of topics
- Enhanced context understanding and coherence
- Optimized for deployment on Hugging Face Spaces

Usage
This model can be used for various natural language processing tasks, including:
- General conversation
- Question answering
- Task instructions
- Creative writing

Try out the model here :
sagar007/phi3.5_mini_instruct_finetune

Limitations
While this fine-tuned model shows improved performance, users should be aware of potential biases and limitations inherent in language models. Always critically evaluate the model's outputs.

Feedback
I welcome any feedback, suggestions, or questions about this project. Feel free to open an issue or contribute to further improvements!

#Phi35 #FineTuning #NLP #MachineLearning #HuggingFace

1 reply

·

reacted to their post with 👍 over 1 year ago

Post

1729

Excited to share my new Gradio app featuring the impressive Llama-3.1-Storm-8B model!
This app demonstrates the capabilities of Llama-3.1-Storm-8B, an 8B parameter language model created by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh,@akjindal53244
Key highlights of Llama-3.1-Storm-8B:

Outperforms Llama-3.1-8B-Instruct on multiple benchmarks:

Instruction Following (IFEval): +3.93%
Knowledge-driven QA (GPQA): +7.21%
Reduced Hallucinations (TruthfulQA): +9%
Function Calling (BFCL): +7.92%

Achieves impressive results with only 8B parameters
Uses innovative techniques like self-curation and model merging

Try out the model yourself: sagar007/lama_storm_8b

Kudos to the creators for pushing the boundaries of smaller language models! This work makes advanced AI more accessible and efficient.
#AI #NLP #MachineLearning #GradioApp #Llama3

reacted to merve's post with 👍 over 1 year ago

Post

2419

NVIDIA just dropped NVEagle 🦅

Super impressive vision language model that comes in 7B, 13B and 13B fine-tuned on chat 💬
Model repositories: merve/nveagle-66d0705108582d73bb235c26
Try it: NVEagle/Eagle-X5-13B-Chat 💬 (works very well! 🤯)

This model essentially explores having different experts (MoE) for image encoder part of vision language model.
How? 🧐
The authors concatenate the vision encoder output tokens together, and they apply "pre-alignment" essentially fine-tune experts with frozen text encoder.

Then they freeze both experts and the decoder and just train the projection layer, and finally, they unfreeze everything for supervised fine-tuning ✨

In the paper, they explore different fusion strategies and vision encoders, extending basic CLIP encoder, and figure out simply concatenating visual tokens works well.
Rest of the architecture is quite similar to LLaVA. (see below the architecture)

posted an update over 1 year ago

Post

1729

Excited to share my new Gradio app featuring the impressive Llama-3.1-Storm-8B model!
This app demonstrates the capabilities of Llama-3.1-Storm-8B, an 8B parameter language model created by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, Ankur Parikh,@akjindal53244
Key highlights of Llama-3.1-Storm-8B:

Outperforms Llama-3.1-8B-Instruct on multiple benchmarks:

Instruction Following (IFEval): +3.93%
Knowledge-driven QA (GPQA): +7.21%
Reduced Hallucinations (TruthfulQA): +9%
Function Calling (BFCL): +7.92%

Achieves impressive results with only 8B parameters
Uses innovative techniques like self-curation and model merging

Try out the model yourself: sagar007/lama_storm_8b

Kudos to the creators for pushing the boundaries of smaller language models! This work makes advanced AI more accessible and efficient.
#AI #NLP #MachineLearning #GradioApp #Llama3

replied to their post over 1 year ago

This comment has been hidden

posted an update over 1 year ago

Post

648

📣 New Project Alert: Phi 3.5 Multimodal AI Demo 🎉
Excited to share my latest project that combines the power of Phi 3.5 text and vision models with text-to-speech capabilities!
🔑 Key Features:
1️⃣ Phi 3.5 Text Model for dynamic conversations
2️⃣ Phi 3.5 Vision Model for advanced image analysis
3️⃣ Text-to-Speech integration for an audio dimension
🛠️ Tech Stack:

Transformers
Gradio
PyTorch
Flash Attention 2
Parler TTS

This project demonstrates the potential of integrating multiple AI models to create a more comprehensive and interactive user experience. It's a step towards more natural and versatile AI assistants.
👉 Check out the demo and let me know your thoughts! How would you extend this project?
🔗 Demo Link: sagar007/Multimodal_App
#MultimodalAI #PhiModel #MachineLearning #AIDemo

1 reply

·

posted an update over 1 year ago

Post

984

📐 AI Math Equation Solver: Your Step-by-Step Solution Companion

Hello Hugging Face community! 👋 I'm excited to share my latest Space: the AI Math Equation Solver!

🔍 What does it do?

This Space uses the power of AI to solve math problems from images. Simply upload a picture of a math equation or problem, and the AI will provide a detailed, step-by-step solution. It's perfect for students, teachers, or anyone looking to understand complex mathematical concepts better.

🧠 How does it work?

- Backend: Utilizes the microsoft/Phi-3.5-vision-instruct model for image understanding and mathematical reasoning.
- Frontend: Built with Gradio for a clean, user-friendly interface.
- Features:
- Image upload for math problems
- Detailed step-by-step solutions
- Example problems to try instantly

🚀 Try it out! sagar007/phi-vision-math-assistant

Visit the Space here: [Insert your Hugging Face Space URL]

💡 Use cases:

- Students: Check your work or get help with homework
- Teachers: Create detailed solution guides quickly
- Tutors: Explain complex problems more effectively
- Self-learners: Understand new mathematical concepts

🛠️ Technical Details:

- Model: microsoft/Phi-3.5-vision-instruct
- Libraries: transformers, Gradio, PyTorch
- Optimizations: Uses Flash Attention for improved performance

🤝 Contribute:

This is an open project, and I welcome contributions! Whether it's improving the model, enhancing the UI, or adding new features, feel free to fork the project and submit your pull requests.

📣 Feedback:

I'd love to hear your thoughts! How are you using this Space? Any suggestions for improvements? Let me know in the comments below.

Happy problem-solving! 🎉

#MachineLearning #AI #Mathematics #Education #HuggingFace

posted an update over 1 year ago

Post

765

🌟 Enchanted Tales Generator: A GPT-2 Inspired Story Weaver 🌟

Hello Hugging Face community! I'm excited to share my latest project, the Enchanted Tales Generator, inspired by Andrej Karpathy's enlightening YouTube videos on GPT-2.

🧙‍♂️ What is the Enchanted Tales Generator?

The Enchanted Tales Generator is a magical text generation app that weaves whimsical stories based on your prompts. It's powered by a fine-tuned GPT model and brings the wonder of AI-generated storytelling to your fingertips.

🎥 Inspiration

This project was born from the inspiration I drew from Andrej Karpathy's incredible YouTube series on GPT-2. His clear explanations and deep insights into the workings of language models sparked my imagination and drove me to create something that could bring joy and creativity to others.

🔮 Features

- Story Incantation: Input your magical prompt to start your tale
- Scroll Length: Adjust the length of your generated story
- Magical Intensity: Control the creativity (temperature) of the generation
- Arcane Diversity: Fine-tune the variety of word choices (top-k sampling)

🚀 Try It Out: sagar007/GPT-2-with_gpu

You can experience the Enchanted Tales Generator right here on Hugging Face Spaces: [Insert your Spaces link here]

🛠️ Behind the Scenes

The app is built using:
- PyTorch for the underlying GPT model
- Gradio for the user interface
- Hugging Face Spaces for deployment

I've implemented techniques learned from Karpathy's videos, such as:
- Custom GPT architecture
- Top-k sampling for diverse outputs
- Temperature-controlled generation

🌈 Future Enchantments

I'm continuously working on improving the Enchanted Tales Generator. Some ideas for future updates include:
- Fine-tuning on specific genres of stories
- Adding options for character and setting generation
- Implementing more advanced sampling techniques

🤝 Join the Magic!

Sagar pallai

AI & ML interests

Recent Activity

Organizations

Sagar pallai

AI & ML interests

Recent Activity

Organizations

sagar007's activity