---
license: apache-2.0
---
# Gemma-3-12B-Heretic-X (Sikaworld High-Fidelity Edition)

This is an ultra-dynamic, fully uncensored text encoder for LTX-2, based on the experimental Heretic-X fine-tune by LastRef. While the standard abliterated version merely removes the "refusal" mechanism, Heretic-X was actively steered with a custom dataset to be proactively descriptive and uninhibited. In LTX-2 video generation, this translates into significantly stronger motion vectors, helping to "unfreeze" static videos and generate more intense dynamics in complex scenes.

This edition applies the Sikaworld High-Fidelity Quantization method to tame the aggressive nature of Heretic-X, ensuring that the increased dynamics do not come at the cost of facial symmetry or anatomical coherence.
## Key Features

- **Aggressive Uncensoring (Heretic-X):** Unlike standard abliteration (which simply deletes the refusal direction), this model uses modified weights (`attn.o_proj`, `mlp.down_proj`) derived from x-rated dataset training. It delivers a "louder" and more confident signal to the video transformer, which is often the cure for "frozen" I2V generations.
- **High-Fidelity Layer Protection (The Stabilizer):** Aggressive fine-tunes often lead to "melting" faces in video. This version uses a mixed-precision strategy: the critical input layers (0-1) and the final output layers (44-47), as well as all LayerNorms and biases, are kept in BF16. This acts as a safety rail, keeping facial features symmetric while allowing the body and background to move dynamically.
- **True Standalone (`.safetensors`):** Includes the embedded `spiece_model` tensor, so it works as a single-file plug-and-play solution in ComfyUI (LTX-2) without external `tokenizer.model` files or complex folder structures.
- **Surgical Extraction:** Stripped of the 20 GB+ vision-tower weights (which LTX-2 does not use) to save VRAM and loading time, while retaining the full 48-layer text intelligence of the 24 GB BF16 source.
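The layer-protection rule above can be sketched as a per-tensor precision selector. This is a hypothetical helper, not the actual Sikaworld conversion script; it assumes tensor names follow the usual Hugging Face `model.layers.<i>.` convention:

```python
# Sketch of the mixed-precision rule: first 2 and last 4 of the 48 layers,
# plus all LayerNorms and biases, stay in BF16; everything else goes to FP8.
import re

PROTECTED_LAYERS = {0, 1, 44, 45, 46, 47}

def choose_dtype(tensor_name: str) -> str:
    """Return the target dtype for one tensor of the text encoder."""
    if tensor_name.endswith(".bias") or "norm" in tensor_name:
        return "bfloat16"                      # all LayerNorms and biases
    m = re.search(r"\.layers\.(\d+)\.", tensor_name)
    if m and int(m.group(1)) in PROTECTED_LAYERS:
        return "bfloat16"                      # input/output safety rail
    return "fp8_e4m3fn"                        # bulk of the weights

# Example:
# choose_dtype("model.layers.20.mlp.down_proj.weight")  -> "fp8_e4m3fn"
# choose_dtype("model.layers.46.mlp.down_proj.weight")  -> "bfloat16"
```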
## Usage in ComfyUI

1. Place the `.safetensors` file in your `ComfyUI/models/text_encoders/` folder.
2. In your LTX-2 workflow, select this model in the DualCLIPLoader node.
3. **Recommended dtype:** Set `weight_dtype` to `fp8_e4m3fn` (the critical layers remain BF16 automatically).
4. **Prompting tip:** This model reacts very well to action verbs at the very beginning of the prompt, and it requires a lower CFG scale than standard models to produce motion.
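If you want to confirm the embedded `spiece_model` tensor or check which tensors stayed BF16 before loading, the `.safetensors` header can be inspected with the standard library alone (the format is an 8-byte little-endian length followed by a JSON header). A minimal sketch; the file name below is a placeholder:

```python
# Read only the JSON metadata header of a .safetensors file, without
# loading any tensor data.
import json
import struct

def read_safetensors_header(path: str) -> dict:
    """Return the header mapping tensor names to dtype/shape/data_offsets."""
    with open(path, "rb") as f:
        header_len = struct.unpack("<Q", f.read(8))[0]  # little-endian u64
        return json.loads(f.read(header_len))

# Example:
# header = read_safetensors_header("gemma-3-12b-heretic-x.safetensors")
# print("spiece_model" in header)           # embedded tokenizer present?
# print(header["model.layers.0.self_attn.o_proj.weight"]["dtype"])
```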
## Technical Background
**Why Heretic-X for video?** LTX-2 (especially the Dev version) often suffers from "motion collapse" (frozen video) when the text embedding is too neutral. Heretic-X provides higher variance in its embeddings.

**Why this quantization?** Standard FP8 conversions of Heretic models often produce artifacts because the aggressive weights clip during quantization. By protecting the last four layers (44-47) in BF16, the final instructions sent to the video transformer retain their high-precision spatial alignment, preventing the "uncanny valley" effect often seen in dynamic clips.
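The clipping problem can be illustrated numerically: `fp8_e4m3fn` has a largest finite value of 448, so (ignoring per-tensor scaling for simplicity) outlier weights produced by aggressive fine-tuning saturate at the range limit. The weight values below are hypothetical:

```python
# Illustration of FP8 range clipping, not an actual quantizer.
E4M3_MAX = 448.0  # largest finite value representable in fp8_e4m3fn

def clamp_to_e4m3_range(w: float) -> float:
    """Saturate a weight to the fp8_e4m3fn representable range."""
    return max(-E4M3_MAX, min(E4M3_MAX, w))

weights = [0.02, -1.5, 620.0, -890.0]   # last two are "aggressive" outliers
clipped = [clamp_to_e4m3_range(w) for w in weights]
# the outliers saturate to +/-448.0, distorting the layer's output signal
```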
## Credits

- **Base model:** Google Gemma 3
- **Heretic fine-tune:** LastRef (https://huggingface.co/LastRef/gemma-3-12b-it-heretic-x)
- **Optimization & architecture fixes:** Sikaworld