AI & ML interests

Al & ML interests

wangbuer999 
posted an update 2 days ago
view post
Post
2350
Hands-on testing of HY-World 2.0 shows a significant improvement in end-to-end engineering maturity compared to version 1.5

The model supports direct multimodal input from text, single-frame images, and video. Inference can be launched without camera intrinsic/extrinsic calibration or additional preprocessing

After panorama generation, the built-in Spatial Agent automatically performs semantic navigation path planning. Combined with spatial consistency constraints from HY-WorldStereo, it ensures artifact-free multi-view generation and stable geometric alignment

Outputs include standard 3D asset formats such as Mesh, 3DGS, and point clouds, which can be directly imported into Unity/UE

It is suitable for engineering scenarios including game level prototyping, digital twins, and embodied simulation
wangbuer999 
posted an update 3 months ago
view post
Post
2650
HunyuanImage 3.0-Instruct just dropped

fresh -sourceImage 3.0model! Spent 20 mins testing it on a Messi + retro scrambler fusion case

Ran on diffusers v0.26.3 + CUDA 12.1 | 8B MoE params (1.3B activated) | zero VRAM issues

strength=0.9 Messi #10 kit/tattoo sharp, moto’s rusted metal texture blurred (classic open-source pain)
strength=0.7 Moto/cobblestone background crisp, Messi’s jersey details faded completely

strength=0.75 + prompt "Blend seamlessly, keep all original details": both subject & background sharp
No ControlNet, no manual masking the model’s chain-of-thought reasoning parses image+prompt first
Already outperforms Qwen-Image-Edit 2511 (GSB eval +25.7% on single-image edits) | 100% open-source

👉 Repo: https://hunyuan.tencent.com/chat/HunyuanDefault?from=modelSquare&modelId=Hunyuan-Image-3.0-Instruct

technical report:https://arxiv.org/abs/2509.23951

Anyone else struggled with strength tweaks for fusion? This fixed it for my Messi+moto case did it work as well for yours?
  • 6 replies
·
wangbuer999 
posted an update 3 months ago
view post
Post
3223
HY-MT1.5-1.8B Lightweight Translation Model Open-Source Game-Changer

Tencent raised the bar for lightweight translation!

Supports bidirectional translation across 36 languages total—33 mainstream languages + 5 ethnic/minority dialects

With only 1.8B parameters (less than 1/3 the size of HY-MT1.5-7B), it delivers performance on par with the 7B counterpart and outperforms most commercial translation APIs.

✅ Quantized versions (FP8/GPTQ-Int4) available for edge device deployment, perfect for real-time translation
✅ Full support for terminology intervention, context-aware translation, and formatted output
✅ Ready-to-use prompt templates + seamless integration with Hugging Face Transformers
✅ Recommended transformers ≥ 4.56.0 (FP8 model requires compressed-tensors 0.11.0)

10+ Hugging Face Spaces already integrated this model!

👉 Model Repo: tencent/HY-MT1.5-1.8B
👉 Technical Report: https://arxiv.org/abs/2512.24092
wangbuer999 
posted an update 3 months ago
view post
Post
3171
Qwen-Image-Edit LoRA 96 Camera Angles for 3D-Consistent Image Tweaks

fal/Qwen-Image-Edit-2511-Multiple-Angles-LoRA levels up perspective editing

96 poses (4 elevations × 8 azimuths × 3 distances) – close-ups, wide shots, all angles covered

Trained on 3000+ Gaussian Splatting renders – 3D consistency holds even for -30° low-angle shots

Works with Qwen/Qwen-Image-Edit-2511 base models (LoRA strength 0.8-1.0) + ComfyUI workflow included
Tested it – plug-and-play, no fussy setup.

fal/Qwen-Image-Edit-2511-Multiple-Angles-LoRA
wangbuer999 
published a Space 3 months ago