Our preprint is out! We attempt to model human teaching behaviors in agents, yielding a unified framework that enables adaptive, personalized learning experiences. LectūraAgents addresses the prevailing limitations of current AI learning systems with three essential capabilities:
(1) A hierarchical multi-agent architecture modeled on academic standards. We observe that agents collaborating across hierarchies yield better personalized learning outcomes.
(2) An adaptive embodied teaching mechanism, in which the instructor agent executes visible, pedagogically motivated teaching actions (e.g., handwriting, highlighting, circling) on content in a teaching environment while speaking.
(3) A novel teaching action-speech alignment algorithm (TASA), which makes this possible by dynamically aligning speech with visual teaching actions. Specifically, TASA temporally splits speech segments into word-level tokens, performs heuristic salience analysis on the learning content (text, images, etc.), and then identifies the relevant regions in which to apply teaching actions that guide attention and augment understanding.
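To make the TASA description more concrete, here is a minimal sketch of the three steps it names: word-level tokenization of a speech segment, a salience pass over content regions, and the mapping of matched regions to timed teaching actions. It is not the algorithm from the preprint. Every name in it (`WordToken`, `Region`, `TeachingAction`, `tokenize_speech`, `salient_terms`, `align`) is hypothetical, the even split of a segment's duration across its words is an assumption, and plain keyword overlap stands in for whatever salience heuristics the authors actually use.

```python
from __future__ import annotations

from dataclasses import dataclass

# Assumption: a tiny stopword list; a real system would use something richer.
STOPWORDS = {"the", "a", "an", "of", "to", "is", "and", "in", "on"}


@dataclass
class WordToken:
    text: str
    start: float  # seconds from the start of the lecture audio
    end: float


@dataclass
class Region:
    region_id: str
    text: str  # text content (or a caption, for an image region)


@dataclass
class TeachingAction:
    kind: str  # e.g. "highlight" or "circle"
    region_id: str
    start: float
    end: float


def _normalize(word: str) -> str:
    return word.strip(".,;:!?").lower()


def tokenize_speech(segment: str, start: float, end: float) -> list[WordToken]:
    """Split a speech segment into word tokens, spreading its duration evenly.

    Assumption: even spacing. Real word timings would come from the TTS
    engine or a forced aligner.
    """
    words = segment.split()
    if not words:
        return []
    step = (end - start) / len(words)
    return [
        WordToken(_normalize(w), start + i * step, start + (i + 1) * step)
        for i, w in enumerate(words)
    ]


def salient_terms(region: Region) -> set[str]:
    """Stand-in salience heuristic: the region's non-stopword vocabulary."""
    terms = {_normalize(w) for w in region.text.split()}
    return {t for t in terms if t and t not in STOPWORDS}


def align(tokens: list[WordToken], regions: list[Region],
          kind: str = "highlight") -> list[TeachingAction]:
    """Emit one action per region that the speech mentions, timed to span
    the first through the last matching spoken word."""
    actions = []
    for region in regions:
        terms = salient_terms(region)
        hits = [t for t in tokens if t.text in terms]
        if hits:
            actions.append(
                TeachingAction(kind, region.region_id, hits[0].start, hits[-1].end)
            )
    return sorted(actions, key=lambda a: a.start)


tokens = tokenize_speech("The gradient points uphill", 0.0, 4.0)
regions = [Region("r1", "gradient vector"), Region("r2", "learning rate")]
for action in align(tokens, regions):
    print(action)
```

In this toy run only region `r1` is mentioned in the speech, so a single highlight is scheduled for the interval in which the word "gradient" is spoken.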
We conducted several experiments to assess these capabilities: a pedagogical evaluation of the various components under frontier models, a comparative analysis with existing frameworks, and an efficacy study with real students.
Results show consistent gains over baseline systems on standard instructional metrics (curated by expert educators) spanning lecture content quality, embodied teaching quality, assessment, and personalization, positioning LectūraAgents as a pedagogically grounded framework for personalized learning at scale.
I just released Inflect-Nano-v1, an ultra-small 4.63M-parameter text-to-speech model.
The main idea is simple: instead of only making the acoustic model tiny and relying on a larger external vocoder, Inflect-Nano-v1 keeps the complete text-to-waveform stack under 5M parameters.
Quick facts:
- 4.63M total inference parameters
- 3.46M acoustic model
- 1.17M vocoder
- 24 kHz audio
- English-only
- Single male voice
- Runs locally with a simple PyTorch inference script
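As a back-of-the-envelope check on those numbers, the sketch below adds up the two component counts and estimates how much memory the weights alone would occupy. The storage precisions are assumptions, since the post does not say which dtype the released checkpoint uses, and `weight_megabytes` is a helper made up for this illustration. The parameter counts are the rounded figures quoted above.

```python
# Rounded parameter counts quoted in the post.
ACOUSTIC_PARAMS = 3_460_000
VOCODER_PARAMS = 1_170_000

total_params = ACOUSTIC_PARAMS + VOCODER_PARAMS


def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Size of the raw weights in decimal megabytes (1 MB = 1,000,000 bytes).

    Ignores activations, buffers, and file-format overhead.
    """
    return n_params * bytes_per_param / 1_000_000


print(f"total parameters: {total_params:,}")

# Assumption: these precisions are illustrative, not a statement about
# how the released weights are actually stored.
for dtype_name, width in (("float32", 4), ("float16", 2), ("int8", 1)):
    print(f"{dtype_name}: {weight_megabytes(total_params, width):.2f} MB")
```

At 4 bytes per parameter the full text-to-waveform stack comes to roughly 18.5 MB of weights, which is what makes the edge and browser use cases plausible to explore.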
Why I made it: Most modern TTS models are much larger, and even many “small TTS” projects depend on a separate vocoder. I wanted to see how far a complete tiny TTS stack could be pushed while still producing usable speech.
It is not SOTA, and I am not trying to claim it competes with large TTS systems. The interesting part is the size-to-functionality ratio.
What works: It can generate arbitrary English speech locally, and the model is small enough to be interesting for:
- local voice assistants
- embedded/edge experiments
- browser or WASM-style TTS exploration
- efficient inference research
- tiny-model baselines
Limitations: The quality is still limited. It can sound robotic, stumble on difficult unseen text, and the vocoder is still a clear bottleneck. Long or unusual prompts are less reliable.
So I would frame this as a research/demo release, not a production TTS engine.
I’d love feedback from people interested in:
- tiny speech models
- vocoders
- local TTS
- efficient inference
- embedded speech synthesis
- improving small-model generalization
If people find it useful, I’m interested in putting more training budget into a stronger v2.