
fffiloni posted an update 1 day ago
✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- 🎙️ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- 🎥 Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction
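Since the Space exposes a Gradio API, a call to it can be sketched with the `gradio_client` library. This is a hypothetical sketch: the `api_name` and argument layout below are assumptions, so check the Space's "Use via API" panel for the real signature.

```python
def separate_speakers(audio_path, space="fffiloni/TIGER-audio-extraction"):
    """Send an audio file to the TIGER Space and return the separated tracks.

    The api_name below is an assumption; check the Space's "Use via API"
    panel for the actual endpoint name and argument order.
    """
    # Imported inside the function so the sketch can be read and tested
    # without gradio_client installed (pip install gradio_client).
    from gradio_client import Client, handle_file

    client = Client(space)
    return client.predict(handle_file(audio_path), api_name="/separate_speakers")
```

The same pattern would apply to the video and DnR endpoints, each with its own `api_name`.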
fffiloni posted an update 2 days ago
AniDoc is back 🎉

I've fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc
fffiloni posted an update 15 days ago
I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model — still weird, still fun 😄
fffiloni posted an update 21 days ago
A clearer demo for TADA (now multilingual) 🔊🌍

I improved the public demo for TADA — a generative framework for speech modeling via text–acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference
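The reference-preparation step above can be sketched as a tiny helper. Here `transcriber` stands in for a Whisper call; both the function and its interface are assumptions for illustration, not the demo's actual code.

```python
def prepare_reference(audio_path, transcript=None, transcriber=None):
    """Pair a reference voice with its transcript for conditioning.

    If no transcript is supplied, fall back to an auto-transcriber
    (e.g. a Whisper call, passed in so the sketch stays self-contained).
    """
    if transcript is None:
        if transcriber is None:
            raise ValueError("provide a transcript or a transcriber")
        transcript = transcriber(audio_path)
    return audio_path, transcript
```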

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper
TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)
fffiloni posted an update about 1 year ago
I was thinking I need to step up my game on training Flux LoRA models, time to have some fun! ☀️

Expect a new drop per week on aesthetics that caught my attention; here are 3 of them that worked really well!

fffiloni/cute-comic-800
fffiloni/carbo-800
fffiloni/oniric-750
fffiloni posted an update about 1 year ago
Explain like I'm 5 the latest take from @thomwolf on X about Dario's essay on DeepSeek:

—› Open-source AI is like a big cookbook that everyone can read and improve. Instead of a few chefs keeping their recipes secret, anyone can cook, test, and invent new things.

If only one company controls AI, everything stops if they have a problem — like when the internet goes down. With open source, many people can help, making sure it keeps running smoothly.

AI isn't just a race between two countries; it's a team effort around the world. By sharing, we move faster and create safer technology for everyone.
—
🤗
fffiloni posted an update over 1 year ago
Visionary Walter Murch (editor for Francis Ford Coppola), in 1999:

“So let's suppose a technical apotheosis some time in the middle of the 21st century, when it somehow becomes possible for one person to make an entire feature film, with virtual actors. Would this be a good thing?

If the history of oil painting is any guide, the broadest answer would be yes, with the obvious caution to keep a wary eye on the destabilizing effect of following too intently a hermetically personal vision. One need only look at the unraveling of painting or classical music in the 20th century to see the risks.

Let's go even further, and force the issue to its ultimate conclusion by supposing the diabolical invention of a black box that could directly convert a single person's thoughts into a viewable cinematic reality. You would attach a series of electrodes to various points on your skull and simply think the film into existence.

And since we are time-traveling, let us present this hypothetical invention as a Faustian bargain to the future filmmakers of the 21st century. If this box were offered by some mysterious cloaked figure in exchange for your eternal soul, would you take it?

The kind of filmmakers who would accept, even leap, at the offer are driven by the desire to see their own vision on screen in as pure a form as possible. They accept present levels of collaboration as the evil necessary to achieve this vision. Alfred Hitchcock, I imagine, would be one of them, judging from his description of the creative process: "The film is already made in my head before we start shooting."”
—
Read "A Digital Cinema of the Mind? Could Be" by Walter Murch: https://archive.nytimes.com/www.nytimes.com/library/film/050299future-film.html

fffiloni posted an update almost 2 years ago
🇫🇷
What is the impact of AI on the film, audiovisual, and video game industries?
A forward-looking study for professionals
— CNC & BearingPoint | 09/04/2024

While Artificial Intelligence (AI) has long been used in the film, audiovisual, and video game sectors, the new applications of generative AI are upending our vision of what a machine can do and carry an unprecedented potential for transformation. The quality of their output is impressive, and they consequently spark many debates, between expectations and apprehensions.

The CNC has therefore decided to launch a new AI Observatory in order to better understand the uses of AI and its real impact on the image industry. As part of this Observatory, the CNC wanted to draw up a first assessment by mapping the current or potential uses of AI at each stage of the creation and distribution of a work, identifying the associated opportunities and risks, particularly in terms of professions and employment. This CNC / BearingPoint study presented its main findings on March 6, during the CNC day "Creating, producing, distributing in the age of artificial intelligence".

The CNC is publishing the expanded version of its mapping of AI uses in the film, audiovisual, and video game industries.

Link to the full mapping: https://www.cnc.fr/documents/36995/2097582/Cartographie+des+usages+IA_rapport+complet.pdf/96532829-747e-b85e-c74b-af313072cab7?t=1712309387891
fffiloni updated a Space almost 2 years ago
fffiloni posted an update about 2 years ago
"The principle of explainability of AI and its application in organizations"
Louis Vuarin, Véronique Steyer
—› 📔 https://doi.org/10.3917/res.240.0179

ABSTRACT: The explainability of Artificial Intelligence (AI) is cited in the literature as a pillar of AI ethics, yet few studies explore its organizational reality. This study proposes to remedy this shortcoming, based on interviews with actors in charge of designing and implementing AI in 17 organizations. Our results highlight: the massive substitution of explainability by the emphasis on performance indicators; the substitution of the requirement of understanding by a requirement of accountability; and the ambiguous place of industry experts within design processes, where they are employed to validate the apparent coherence of 'black-box' algorithms rather than to open and understand them. In organizational practice, explainability thus appears sufficiently undefined to reconcile contradictory injunctions. Comparing prescriptions in the literature and practices in the field, we discuss the risk of crystallizing these organizational issues via the standardization of management tools used as part of (or instead of) AI explainability.

Vuarin, Louis, and Véronique Steyer. « Le principe d'explicabilité de l'IA et son application dans les organisations », Réseaux, vol. 240, no. 4, 2023, pp. 179-210.

#ArtificialIntelligence #AIEthics #Explainability #Accountability
fffiloni posted an update about 2 years ago
I'm happy to announce that ✨ Image to Music v2 ✨ is ready for you to try, and I hope you'll like it too! 😌

This new version has been crafted with transparency in mind, so you can understand the process of translating an image to a musical equivalent.

How does it work under the hood? 🤔

First, we get a very literal caption from microsoft/kosmos-2-patch14-224; this caption is then given to an LLM agent (currently HuggingFaceH4/zephyr-7b-beta), whose task is to translate the image caption into a musical, inspirational prompt for the next step.
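The caption-to-prompt step can be pictured as a small wrapper around the caption. The instruction wording below is purely illustrative, an assumption rather than the demo's actual prompt:

```python
def build_music_prompt(caption):
    """Wrap a literal image caption in an instruction for the LLM agent,
    asking it to produce a text-to-music prompt."""
    return (
        "You turn image descriptions into short, evocative prompts for a "
        "text-to-music model. Describe mood, instrumentation and tempo.\n"
        f"Image description: {caption}\n"
        "Musical prompt:"
    )
```

The string returned here would be sent to the LLM, and the LLM's completion then becomes the input of the chosen text-to-music model.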

Once we have a nice musical text from the LLM, we can send it to the text-to-music model of your choice:
MAGNet, MusicGen, AudioLDM-2, Riffusion or Mustango

Unlike the previous version of Image to Music, which used the Mubert API and could output curious and obscure combinations, this one only relies on open-source models available on the Hub, called via the Gradio API.

The music should also match the atmosphere of the input image more closely, thanks to the LLM agent step.

Pro tip: you can adjust the inspirational prompt to match your expectations, according to the chosen model and the specific behavior of each one 👌

Try it, explore different models and tell me which one is your favorite 🤗
—› fffiloni/image-to-music-v2
fffiloni posted an update about 2 years ago
InstantID-2V is out! ✨

It's like InstantID, but you get a video instead. Nothing crazy here, it's simply a shortcut between two demos.

Let's see how it works with the Gradio API:

1. We call InstantX/InstantID with a conditional pose from a cinematic camera shot (example provided in the demo)
2. Then we send the previously generated image to ali-vilab/i2vgen-xl
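The two steps above can be chained with `gradio_client`. A minimal sketch, where both `api_name` values and the argument layouts are assumptions (check each Space's "Use via API" panel before running):

```python
def instantid_to_video(face_image_path, pose_image_path):
    """Chain InstantID (still image) into i2vgen-xl (video) over the Gradio API.

    Endpoint names and argument order are assumptions; check each Space's
    "Use via API" panel for the real signatures.
    """
    # Imported inside the function so the sketch can be read and tested
    # without gradio_client installed (pip install gradio_client).
    from gradio_client import Client, handle_file

    # Step 1: identity-preserving image from InstantID, guided by the pose condition
    image = Client("InstantX/InstantID").predict(
        handle_file(face_image_path),
        handle_file(pose_image_path),
        api_name="/generate_image",
    )

    # Step 2: animate the generated still with i2vgen-xl
    return Client("ali-vilab/i2vgen-xl").predict(
        handle_file(image), api_name="/image_to_video"
    )
```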

Et voilà 🤗 Try it: fffiloni/InstantID-2V

—
Note that generation can be quite long, so take the opportunity to brew yourself some coffee 😌
If you want to skip the queue, you can of course reproduce this pipeline manually.
fffiloni posted an update about 2 years ago
Quick build of the day: LCM Supa Fast Image Variation
—
We take the opportunity to combine moondream1's vision abilities with LCM SDXL's speed to generate a variation on the subject of the input image.
All that thanks to Gradio APIs 🤗

Try the space: https://huggingface.co/spaces/fffiloni/lcm-img-variations
fffiloni posted an update about 2 years ago
Just published a quick community blog post, mainly aimed at Art and Design students, but also an attempt to nudge AI researchers who would like to better consider the benefits of collaboration with designers and artists 😉
Feel free to share your thoughts!

"Breaking Barriers: The Critical Role of Art and Design in Advancing AI Capabilities" 📄 https://huggingface.co/blog/fffiloni/the-critical-role-of-art-and-design-in-advancing-a

—
This short publication follows the results of two AI workshops that took place at École des Arts Décoratifs - Paris, led by Etienne Mineur, Vadim Bernard, Martin de Bie, Antoine Pintout & Sylvain Filoni.
fffiloni posted an update about 2 years ago
I just published a Gradio demo for Alibaba's DreamTalk 🤗

Try it now: fffiloni/dreamtalk
Paper: DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models (2312.09767)
—
DreamTalk is a diffusion-based audio-driven expressive talking head generation framework that can produce high-quality talking head videos across diverse speaking styles. DreamTalk exhibits robust performance with a diverse array of inputs, including songs, speech in multiple languages, noisy audio, and out-of-domain portraits.
fffiloni posted an update over 2 years ago
just setting up my new hf social posts account feature 🤗