Quang Huy
NothingLQH
AI & ML interests
None yet
Recent Activity
updated a collection 4 days ago: TextToSpeech
updated a collection 11 days ago: TextToSpeech
updated a collection 30 days ago: ConvertHTMLtoJSON
Organizations
None yet
Collections
ConvertHTMLtoJSON
Automation
TextToVideo
VLM
- FocusedAD: Character-centric Movie Audio Description
  Paper • 2504.12157 • Published • 8
- Pixel-SAIL: Single Transformer For Pixel-Grounded Understanding
  Paper • 2504.10465 • Published • 27
- PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding
  Paper • 2504.13180 • Published • 20
- OS-Copilot/OS-Atlas-Base-7B
  Image-Text-to-Text • 8B • Updated • 703 • 42
Speech
- facebook/wav2vec2-lv-60-espeak-cv-ft
  Automatic Speech Recognition • Updated • 108k • 67
- 🚀 Resemble Enhance
  Space • Running on T4 • 461
  Enhance and denoise your audio files
- pyannote/speaker-diarization-3.1
  Automatic Speech Recognition • Updated • 10.5M • 1.74k
- Atotti/miipher-2-HuBERT-HiFi-GAN-v0.1
  Updated • 2 • 14
ImageToVideo
- Pushing the Boundaries of State Space Models for Image and Video Generation
  Paper • 2502.00972 • Published
- IMAGINE-E: Image Generation Intelligence Evaluation of State-of-the-art Text-to-Image Models
  Paper • 2501.13920 • Published • 19
- tencent/HunyuanVideo-I2V
  Image-to-Video • Updated • 127 • 350
- IndexTeam/Index-anisora
  Updated • 6 • 223
TextToText
NLP
3D
LiveImage
DatasetLanguage
Image
LLM
- stepfun-ai/GOT-OCR2_0
  Image-Text-to-Text • Updated • 56.9k • 1.53k
- 🎼 Midi Music Generator
  Space • Running on Zero • Featured • 572
  Generate MIDI music from prompts
- OpenGVLab/InternVL2_5-78B-MPO
  Image-Text-to-Text • 78B • Updated • 29 • 54
- OpenGVLab/InternVL2_5-38B-MPO-AWQ
  Image-Text-to-Text • Updated • 28 • 6
DATA_PDF
MJ6
Translation
ControlVPS
ORC
- reducto/RolmOCR
  Image-Text-to-Text • 8B • Updated • 231k • 586
- moonshotai/Kimi-VL-A3B-Instruct
  Image-Text-to-Text • 16B • Updated • 280k • 258
- 5CD-AI/Vintern-1B-v3_5
  Image-Text-to-Text • 0.9B • Updated • 6.95k • 116
- nanonets/Nanonets-OCR-s
  Image-Text-to-Text • 4B • Updated • 24.9k • 1.59k
Prompt
Story
SpeechToText
- 💻 Vietnamese Streaming RNN-T
  Space • Sleeping • 1
  RNN-T with Whisper Encoder
- erax-ai/EraX-WoW-Turbo-V1.0
  Automatic Speech Recognition • 0.8B • Updated • 47 • 54
- openai/whisper-large-v3-turbo
  Automatic Speech Recognition • 0.8B • Updated • 6.39M • 2.92k
- nvidia/canary-1b
  Automatic Speech Recognition • Updated • 1.7k • 457
Anime
Video
IdeaMusic
Vistral-7B-Chat
TextToSpeech
News