AI & ML interests

The Fellowship is a network of exceptional people from different backgrounds who contribute to open-source machine learning 🧙‍♂️🦸‍♀️🦹🧝‍♂️

Recent Activity

prithivMLmods posted an update about 12 hours ago
HY-World-2.0, a Multi-Modal World Model for Reconstructing, Generating, and Simulating 3D Worlds, is now available on Spaces. It works both as native Gradio components and in Gradio server mode.

> HY-World-2.0-Demo: prithivMLmods/HY-World-2.0-Demo
> HY-World-2.0 [Server Mode]: prithivMLmods/HY-World-2.0-Demo
> Featuring 3D reconstruction and Gaussian splats with the Rerun viewer, along with camera poses, depth maps, and surface normals.
> In Server Mode, Gradio is served via FastAPI, with FastAPI remaining the top-level server.
> Model: tencent/HY-World-2.0
> GitHub: https://github.com/PRITHIVSAKTHIUR/HY-World-2.0-Demo

🤗 To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update 6 days ago
A new comparator on Spaces showcases Standard FLUX.2 Decoder vs. FLUX.2 Small Decoder. The Small Decoder is ~1.4Γ— faster, uses ~1.4Γ— less VRAM, and maintains near-identical image quality. It has ~28M parameters with narrower channels [96, 192, 384, 384] vs. [128, 256, 512, 512], and the demo supports sequence generation by running both decoders simultaneously and comparing the results side by side.

🤗 Comparator: prithivMLmods/Flux.2-4B-Decoder-Comparator
🔗 FLUX.2-small-decoder: black-forest-labs/FLUX.2-small-decoder
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/Flux.2-4B-Encoder-Comparator
🚁 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

🤗 App built on the Gradio SDK. To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update 7 days ago
A collection of various compression schemes for Gemma 4 dense models, including abliterated version 1, is now available on the Hub. Check it out via the links below. 👇

🔗 Gemma 4 Compression(s): https://huggingface.co/collections/prithivMLmods/gemma-4-compressions
🔗 Gemma 4 Uncensored [MAX] + Compression(s) [β]: https://huggingface.co/collections/prithivMLmods/gemma-4-uncensored-max-compressions
🔗 Gemma 4 Compression(s) - MoE: https://huggingface.co/collections/prithivMLmods/gemma-4-compressions-moe
🔗 Gemma-4 F32 GGUF: https://huggingface.co/collections/prithivMLmods/gemma-4-f32-gguf

🤗 To learn more, visit the app page or the respective model pages.
fffiloni posted an update 8 days ago
✨ PASD Magnify is back on Hugging Face Spaces

fffiloni/PASD

PASD isn’t recent, but it still delivers strong results; worth restoring rather than replacing.

Getting it to run again wasn’t a simple dependency fix: it relied on parts of diffusers that no longer exist, and moving to Gradio 6 forced a much newer HF stack, yet I couldn’t modify the original source directly.

Recreating the old environment wasn’t practical.
So I patched the downloaded code at runtime before import and made it compatible with today’s stack.

That ended up being the only approach that held without forking or freezing everything to outdated versions.
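
The patch-before-import trick can be sketched in a few lines. Everything below (the module name, the legacy import, the shim) is hypothetical for illustration, not the actual PASD code:

```python
# Sketch: rewrite downloaded source text, then import the patched file.
# Module name, legacy import, and shim are made-up placeholders.
import importlib.util
import pathlib
import sys
import tempfile

OLD = "from legacy_pkg import helper"   # symbol that no longer exists
NEW = "def helper(x):\n    return x"    # shim that restores it locally

# Simulate a "downloaded" file that still uses the dead import.
workdir = pathlib.Path(tempfile.mkdtemp())
module_path = workdir / "downloaded_model.py"
module_path.write_text(OLD + "\n\ndef run(x):\n    return helper(x)\n")

# 1) Patch the source text before Python ever imports it.
module_path.write_text(module_path.read_text().replace(OLD, NEW))

# 2) Import the patched file as a regular module.
spec = importlib.util.spec_from_file_location("downloaded_model", module_path)
mod = importlib.util.module_from_spec(spec)
sys.modules["downloaded_model"] = mod
spec.loader.exec_module(mod)

print(mod.run(42))  # the patched module now works: 42
```

The key property is that the original files on disk are only touched at runtime, so nothing upstream is forked or frozen.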

If you’ve used it before (or are curious), feel free to give it another try.
tomaarsen posted an update 9 days ago
🌐 I've just published Sentence Transformers v5.4 to make the project fully multimodal for embeddings and reranking. The release also includes a modular CrossEncoder, and automatic Flash Attention 2 input flattening. Details:

You can now use SentenceTransformer and CrossEncoder with text, images, audio, and video, with the same familiar API. That means you can compute embeddings for an image and a text query using model.encode(), compare them with model.similarity(), and it just works. Models like Qwen3-VL-Embedding-2B and jinaai/jina-reranker-m0 are supported out of the box.
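For comparisons like model.similarity(), Sentence Transformers defaults to cosine similarity over the embedding vectors. A minimal pure-Python sketch of that metric, with tiny toy vectors standing in for real model.encode() outputs:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: the default metric behind model.similarity()."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy stand-ins for embeddings returned by model.encode() on a text
# query and two images (real embeddings have hundreds of dimensions).
text_emb = [0.2, 0.9, 0.1]
matching_image_emb = [0.25, 0.85, 0.05]
unrelated_image_emb = [0.9, -0.1, 0.4]

# The matching text-image pair scores much higher than the unrelated one.
print(cosine_similarity(text_emb, matching_image_emb))
print(cosine_similarity(text_emb, unrelated_image_emb))
```

The multimodal point is that all modalities land in the same vector space, so one scalar metric ranks text against images, audio, or video alike.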

Beyond multimodal, I also fully modularized the CrossEncoder class. It's now a torch.nn.Sequential of composable modules, just like SentenceTransformer has been. This unlocked support for generative rerankers (CausalLM-based models like mxbai-rerank-v2 and the Qwen3 rerankers) via a new LogitScore module, which wasn't possible before without custom code.

Also, Flash Attention 2 now automatically skips padding for text-only inputs. If your batch has a mix of short and long texts, this gives you a nice speedup and lower VRAM usage for free.
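The idea behind skipping padding is to flatten the batch into one long token stream and track sequence boundaries explicitly (the cumulative-lengths layout variable-length attention kernels consume). A simplified pure-Python sketch, not the library's actual implementation:

```python
def flatten_batch(sequences):
    """Concatenate variable-length sequences and record cumulative
    boundaries (cu_seqlens), instead of padding to [batch, max_len]."""
    flat, cu_seqlens = [], [0]
    for seq in sequences:
        flat.extend(seq)
        cu_seqlens.append(cu_seqlens[-1] + len(seq))
    return flat, cu_seqlens

# A batch of 3 tokenized texts with lengths 2, 5, and 3.
batch = [[101, 102], [7, 8, 9, 10, 11], [55, 56, 57]]
flat, cu = flatten_batch(batch)
print(len(flat))  # 10 tokens instead of 3 * 5 = 15 padded slots
print(cu)         # [0, 2, 7, 10]
```

The more uneven the lengths in a batch, the more wasted pad positions this layout avoids, which is where the speedup and VRAM savings come from.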

I wrote a blog post walking through the multimodal features with practical examples. Check it out if you want to get started, or just point your Agent to the URL: https://huggingface.co/blog/multimodal-sentence-transformers

This release has set up the groundwork for more easily introducing late-interaction models (both text-only and multimodal) into Sentence Transformers in the next major release. I'm looking forward to it!
prithivMLmods posted an update 10 days ago
The demo for image detection based on SAM3 and Gemma-4 (*Filter) is now available on Spaces, using full-fledged Transformers inference with multimodal reasoning for processed images. It also supports video segmentation (mask), video segmentation (annotation), and image click segmentation.

🤗 Demo Space: prithivMLmods/SAM3-Gemma4-CUDA
🥽 SAM3: facebook/sam3
🔗 gemma-4-E2B-it: google/gemma-4-E2B-it

To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update 13 days ago
The demo for Image Detection (*Filter) based on SAM3 and Qwen-3.5 is now available on Hugging Face Spaces, using Transformers inference with multimodal reasoning for processed images. It also supports video segmentation (mask), video segmentation (annotation), and image click segmentation.

🤗 Demo Space: prithivMLmods/SAM3-Plus-Qwen3.5
🥽 SAM3: facebook/sam3
🔗 Qwen-3.5: Qwen/Qwen3.5-2B

To learn more, visit the app page or the respective model pages.
fffiloni posted an update 17 days ago
✅ Back up and running!

My TIGER app is now fully working again, with fixes and full compatibility with Gradio 6 🚀

It lets you:
- πŸŽ™οΈ Separate multiple speakers from an audio file
- 🎬 Extract each speaker directly from a video
- 🎧 Split audio into dialog, music, and sound effects (DnR)
- πŸŽ₯ Apply DnR separation directly on videos

All powered by lightweight TIGER models for fast and efficient speech separation.

Try it here 👉 fffiloni/TIGER-audio-extraction
fffiloni posted an update 18 days ago
AniDoc is back 🎉

I’ve fixed the Space and brought it back to life:
- ✅ Working again after being broken for a while
- ✅ Updated to Gradio 6
- ✅ Compatible with ZeroGPU
- ✅ Output videos now preserve original resolution and FPS

I also added advanced controls so you can experiment more (tracking, seed, motion, sketch).

Try it here: fffiloni/AniDoc
prithivMLmods posted an update 22 days ago
The Flux-Klein-KV-Edit-Consistency demo is now available on Spaces. It preserves character identity and delivers high-quality, realistic results after edits. No special prompts needed: just upload an image, type your prompt, and get the resulting image blazing fast.

🔥 Demo Space: prithivMLmods/flux-klein-kv-edit-consistency
🤗 Model: black-forest-labs/FLUX.2-klein-9b-kv
🤗 Collection: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection
🔗 Gradio Server Mode: https://www.gradio.app/main/guides/server-mode

➔ Built with Headless Gradio, an alternative to using gr.Blocks for creating the frontend and triggering events, powered by FastAPI + Gradio. You can now design the frontend however you want, with continued support for APIs, MCP, and ZeroGPU.

➔ Gradio Server Mode is now available from gradio@v6.10.0.

To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update 29 days ago
The Map-Anything v1 (Universal Feed-Forward Metric 3D Reconstruction) demo is now available on Hugging Face Spaces. Built with Gradio and integrated with Rerun, it performs multi-image and video-based 3D reconstruction, depth and normal-map estimation, and interactive measurements.

🤗 Demo: prithivMLmods/Map-Anything-v1
🤗 Model: facebook/map-anything-v1
🤗 HF Paper: MapAnything: Universal Feed-Forward Metric 3D Reconstruction (2509.13414)
fffiloni posted an update about 1 month ago
I brought DALL·E mini back to life 🤖🎨

You can try it here:
fffiloni/dalle-mini-reboot

And I also built a batch version using Hugging Face Jobs (up to 50 images per prompt):
fffiloni/dalle-mini-via-jobs

The goal was to stay close to the original JAX/Flax pipeline, while integrating it with modern tooling (Gradio + Jobs).

It ended up being a fun way to revisit this model: still weird, still fun 😄
prithivMLmods posted an update about 1 month ago
Introducing QIE-Bbox-Studio! 🔥🤗

The QIE-Bbox-Studio demo is now live: more precise and packed with more options. Users can remove objects, add designs, and even move objects from one place to another, all with fast 4-step inference.

🤗 Demo: prithivMLmods/QIE-Bbox-Studio
🔗 GitHub: https://github.com/PRITHIVSAKTHIUR/QIE-Bbox-Studio

🚀 Models [LoRA]:

● QIE-2511-Object-Mover-Bbox: prithivMLmods/QIE-2511-Object-Mover-Bbox
● QIE-2511-Object-Remover-Bbox-v3: prithivMLmods/QIE-2511-Object-Remover-Bbox-v3
● QIE-2511-Outfit-Design-Layout: prithivMLmods/QIE-2511-Outfit-Design-Layout
● QIE-2509-Object-Remover-Bbox-v3: prithivMLmods/QIE-2509-Object-Remover-Bbox-v3
● QIE-2509-Object-Mover-Bbox: prithivMLmods/QIE-2509-Object-Mover-Bbox

🚀 Collection:

● Qwen Image Edit [Layout Bbox]: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update about 1 month ago
QIE-2509-Object-Remover-Bbox-v3 is a more stable version of the Qwen Image Edit visual grounding-based object removal model. The app was previously featured in HF Spaces of the Week and is now updated with the latest Bbox-v3 LoRA adapter.

🤗 Demo: prithivMLmods/QIE-Object-Remover-Bbox
🤗 LoRA: prithivMLmods/QIE-2509-Object-Remover-Bbox-v3
🤗 Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.
fffiloni posted an update about 1 month ago
A clearer demo for TADA (now multilingual) 🔊🌐

I improved the public demo for TADA, a generative framework for speech modeling via text-acoustic dual alignment.

TADA models speech as a joint sequence of text tokens and acoustic tokens, using a transformer backbone to keep text and audio synchronized during generation.

The original demo already exposed these mechanisms, but the workflow made the pipeline hard to understand.

This updated demo makes the process clearer:

• load the model
• prepare a reference voice (optionally with transcript or Whisper auto-transcription)
• generate speech conditioned on that reference

It also adds multilingual support.

Presets are included for a few languages, but the model supports more:

English, French, Spanish, German, Arabic, Mandarin Chinese, Italian, Japanese, Polish, Portuguese

Feel free to try different voices, accents, or languages and see how the alignment behaves.

👉 fffiloni/tada-dual-alignment-tts-demo

Paper: TADA: A Generative Framework for Speech Modeling via Text-Acoustic Dual Alignment (2602.23068)
prithivMLmods posted an update about 1 month ago
The Qwen3.5 Multimodal Understanding Demo, powered by Qwen3.5-2B, is now available on HF Spaces! It is a lightweight model designed for fast image and video reasoning. Built with Gradio, the demo showcases Image QA, Video QA, object detection, and 2D point tracking, along with real-time token streaming.

🤗 Demo: prithivMLmods/Qwen-3.5-HF-Demo
✅ Collection: https://huggingface.co/collections/prithivMLmods/multimodal-implementations
🔗 Qwen3.5-2B: Qwen/Qwen3.5-2B

To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update about 2 months ago
The QIE-Object-Remover-Bbox demo removes objects and artifacts from selected regions using bounding-box grounding. Built on Qwen-Image-Edit-2509 with Rapid Diffusers acceleration, it delivers fast 4-step inference via the QIE-2509 adapter. 🤗🔥

🔗 Demo Space: prithivMLmods/QIE-Object-Remover-Bbox
🔗 Qwen-Image-Edit-Rapid-AIO: prithivMLmods/Qwen-Image-Edit-Rapid-AIO-V4
🔗 Adapter (LoRA): prithivMLmods/QIE-2509-Object-Remover-Bbox

🔗 Collection: https://huggingface.co/collections/prithivMLmods/qwen-image-edit-layout-bbox

To learn more, visit the app page or the respective model pages.
prithivMLmods posted an update about 2 months ago
The FireRed-Image-Edit-1.0 (Rapid) fast experimental demo is out! 🚀🤗

Demo: prithivMLmods/FireRed-Image-Edit-1.0-Fast

-> Paired the EditPlusPipeline with the Diffusers-compatible transformer weights of Rapid AIO from Qwen-Image-Edit. (experimental)
-> This fusion delivers more accurate instruction following, higher image quality, and consistent visual coherence at fast 4-step inference.
-> It better maintains text styles with high fidelity, along with high-quality old-photo restoration, enhancement, and best-in-class virtual try-on.

prithivMLmods posted an update about 2 months ago