license: apache-2.0

Gemma-3-12B-Heretic-X (Sikaworld High-Fidelity Edition)

This is the ultra-dynamic, fully uncensored text encoder for LTX-2, based on the experimental Heretic-X fine-tune by LastRef. While the standard abliterated version merely removes the "refusal" mechanism, Heretic-X was actively steered with a custom dataset to be proactively descriptive and uninhibited. In LTX-2 video generation, this translates to significantly stronger motion vectors, helping to "unfreeze" static videos and generate more intense dynamics in complex scenes.

This edition applies the Sikaworld High-Fidelity Quantization method to tame the aggressive nature of Heretic-X, ensuring that the increased dynamics do not come at the cost of facial symmetry or anatomical coherence.

🚀 Key Features

- Aggressive Uncensoring (Heretic-X): Unlike standard abliteration (which only deletes the refusal direction), this model uses modified weights (attn.o_proj, mlp.down_proj) derived from x-rated dataset training. It delivers a "louder", more confident signal to the video transformer, which is often the cure for "frozen" I2V generations.
- High-Fidelity Layer Protection (The Stabilizer): Aggressive fine-tunes often lead to "melting" faces in video. This version uses a mixed-precision strategy: the critical input layers (0-1), the final output layers (44-47), and all LayerNorms and biases are kept in BF16. This acts as a safety rail, keeping facial features symmetric while allowing the body and background to move dynamically.
- True Standalone (.safetensors): Includes the embedded spiece_model tensor, so it works as a single-file plug-and-play solution in ComfyUI (LTX-2) without requiring external tokenizer.model files or complex folder structures.
- Surgical Extraction: Stripped of the 20GB+ vision-tower weights (which LTX-2 does not use) to save VRAM and loading time, while retaining the full 48-layer text intelligence of the 24GB BF16 source.
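The layer-protection rule described above can be sketched as a simple dtype policy over state-dict keys. This is an illustrative helper, not the actual conversion script; the key patterns follow common Gemma-style naming (`model.layers.N....`), which is an assumption:

```python
import re

# Per the card: layers 0-1 and 44-47, plus all LayerNorms and biases,
# stay in BF16; everything else is stored as FP8 (fp8_e4m3fn).
PROTECTED_LAYERS = {0, 1, 44, 45, 46, 47}

def target_dtype(key: str) -> str:
    """Return the storage dtype for a Gemma-style state-dict key (hypothetical policy helper)."""
    if key.endswith(".bias") or "norm" in key.lower():
        return "bfloat16"
    match = re.search(r"\.layers\.(\d+)\.", key)
    if match and int(match.group(1)) in PROTECTED_LAYERS:
        return "bfloat16"
    return "float8_e4m3fn"

print(target_dtype("model.layers.0.self_attn.o_proj.weight"))   # bfloat16 (protected input layer)
print(target_dtype("model.layers.23.mlp.down_proj.weight"))     # float8_e4m3fn (quantized mid-stack)
print(target_dtype("model.layers.45.mlp.down_proj.weight"))     # bfloat16 (protected output layer)
print(target_dtype("model.layers.10.input_layernorm.weight"))   # bfloat16 (LayerNorm)
```

The point of the policy is that the quantization error budget is spent where it matters least: the mid-stack bulk of the weights.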

🛠 Usage in ComfyUI

1. Place the .safetensors file in your ComfyUI/models/text_encoders/ folder.
2. In your LTX-2 workflow (DualCLIPLoader), select this model.
3. Recommended dtype: set weight_dtype to fp8_e4m3fn (the critical layers remain BF16 automatically).

Prompting tip: this model responds very well to action verbs placed at the very beginning of the prompt, and it requires a lower CFG scale than standard models to produce motion.
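If you want to confirm that a downloaded file is really the standalone build, you can list its tensor keys and look for the embedded tokenizer. A minimal sketch (the `spiece_model` key name comes from this card's description; the file path below is a placeholder):

```python
def has_embedded_tokenizer(keys) -> bool:
    # The card states the SentencePiece tokenizer is embedded as a
    # "spiece_model" tensor inside the .safetensors file.
    return any("spiece_model" in key for key in keys)

# Usage with the safetensors library, if installed (path is a placeholder):
# from safetensors import safe_open
# with safe_open("ComfyUI/models/text_encoders/model.safetensors", framework="pt") as f:
#     print(has_embedded_tokenizer(f.keys()))
```

If the check returns False, the file likely needs an external tokenizer.model alongside it.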

📊 Technical Background

Why Heretic-X for Video? LTX-2 (especially the Dev version) often suffers from "motion collapse" (frozen video) when the text embedding is too neutral. Heretic-X provides higher variance in its embeddings.

Why this Quantization? Standard FP8 conversions of Heretic models often produce "weird" artifacts because the aggressive weights clip during quantization. By protecting the last four layers (44-47) in BF16, the final instructions sent to the video transformer retain their high-precision spatial alignment, preventing the "uncanny valley" effect often seen in dynamic clips.
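The clipping problem can be illustrated numerically: FP8 E4M3 (the format behind fp8_e4m3fn) saturates at ±448, so outlier weights beyond that range lose information that BF16 would preserve. A deliberately simplified sketch (real FP8 conversion also rounds the mantissa, and weights are typically scaled before storage; the values here are exaggerated purely for illustration):

```python
FP8_E4M3_MAX = 448.0  # largest finite value representable in e4m3fn

def clip_to_fp8_range(x: float) -> float:
    """Saturate a value to the representable FP8 e4m3fn range (mantissa rounding omitted)."""
    return max(-FP8_E4M3_MAX, min(FP8_E4M3_MAX, x))

# An aggressive fine-tune can push outlier values past the FP8 range:
values = [0.02, -1.5, 600.0, -512.0]
print([clip_to_fp8_range(v) for v in values])  # [0.02, -1.5, 448.0, -448.0]
```

Keeping the protected layers in BF16 sidesteps this saturation entirely for the tensors that matter most.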

Credits

- Base Model: Google Gemma 3
- Heretic Fine-tune: LastRef (https://huggingface.co/LastRef/gemma-3-12b-it-heretic-x)
- Optimization & Architecture Fixes: Sikaworld
