Lyte posted an update 10 days ago
Introducing Nanochat Moroccan

Nanochat Moroccan is the first language model family built specifically for Moroccan Darija.

This project brings together a small family of models and datasets centered on Darija, with the goal of building something genuinely useful for a language that is still underserved in AI.

1. Models

- KandirResearch/Nanochat-Moroccan-Base-0.7B
- KandirResearch/Nanochat-Moroccan-Instruct-0.7B-pt-raw
- KandirResearch/Nanochat-Moroccan-Instruct-0.7B

2. Data

- Lyte/darija-pretraining-corpus
- Lyte/darija-pretraining-corpus-nanochat
- Lyte/Moroccan-Darija-Instruct-573K
- GemMaroc/TULU-3-50k-darija-english

3. Collection

- https://huggingface.co/collections/KandirResearch/nanochat-the-first-moroccan-darija-language-model-family

Moroccan Darija is spoken by millions of people, yet it remains underrepresented in language technology. Nanochat Moroccan is a step toward building tools that take the language seriously.

You are welcome to try it and chat with it here:
Lyte/Nanochat-Moroccco-Instruct