This is an abliterated version of Blossom-V6.3-30B-A3B. It was produced with P-E-W's Heretic (v1.1.0) abliteration engine, with the Magnitude-Preserving Orthogonal Ablation PR merged in.
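
Heretic's source is not reproduced here. The sketch below only illustrates the general idea of orthogonal (directional) ablation on a single weight matrix. It assumes two things: the refusal direction r lives in the residual stream (the output space of a projection such as attn.o_proj), and "magnitude-preserving" means each column is rescaled back to its original norm after the projection. The function `ablate_direction` and the norm-preserving detail are illustrative assumptions, not the PR's actual code.

```python
import numpy as np


def ablate_direction(W: np.ndarray, r: np.ndarray, weight: float = 1.0,
                     preserve_norms: bool = True, eps: float = 1e-8) -> np.ndarray:
    """Project direction r out of the output space of W (hypothetical helper).

    W has shape (d_out, d_in), like a PyTorch nn.Linear weight, so every
    column W[:, j] is a vector in the d_out-dimensional residual stream.
    """
    r = r / np.linalg.norm(r)                     # unit refusal direction, shape (d_out,)
    original_norms = np.linalg.norm(W, axis=0)    # per-column norms, shape (d_in,)
    # Directional ablation: W' = W - weight * r (r^T W).
    W_new = W - weight * np.outer(r, r @ W)
    if preserve_norms:
        # Assumed magnitude-preserving step: rescale each column to its original norm.
        new_norms = np.linalg.norm(W_new, axis=0)
        W_new = W_new * (original_norms / np.maximum(new_norms, eps))
    return W_new


rng = np.random.default_rng(0)
W = rng.standard_normal((8, 5))   # toy stand-in for an attn.o_proj weight
r = rng.standard_normal(8)        # toy stand-in for a refusal direction
W_abl = ablate_direction(W, r)
r_unit = r / np.linalg.norm(r)
print(np.abs(r_unit @ W_abl).max())                                            # ~0: r removed
print(np.allclose(np.linalg.norm(W_abl, axis=0), np.linalg.norm(W, axis=0)))  # True
```

Rescaling a column that is already orthogonal to r keeps it orthogonal, so the norm restoration does not reintroduce the ablated direction.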

Note: With the 128 mlp.down layers included, the run was taking 7 hours, so only the attention layers were hereticised (3h 10m). The model should still be fairly uncensored. I plan to revisit this one later.

Heretication Results

Score Metric	Value
Refusals	38/100
KL Divergence	0.0454
Initial Refusals	100/100

Parameter	Value
direction_index	per layer
attn.o_proj.max_weight	2.0
attn.o_proj.max_weight_position	29.04
attn.o_proj.min_weight	1.77
attn.o_proj.min_weight_distance	23.36
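
The four attn.o_proj parameters describe how strongly each layer is ablated. A minimal sketch, assuming a linear kernel: the weight peaks at max_weight at layer max_weight_position, falls linearly to min_weight at min_weight_distance layers away, and layers beyond that distance are left untouched. The helper name `layer_weights` and the 48-layer count for Qwen3-30B-A3B are assumptions made for illustration.

```python
from typing import List


def layer_weights(num_layers: int, max_weight: float, max_weight_position: float,
                  min_weight: float, min_weight_distance: float) -> List[float]:
    """Per-layer ablation weight under an assumed linear kernel (hypothetical helper).

    Layers farther than min_weight_distance from max_weight_position get 0.0
    (not ablated); the rest interpolate linearly from max_weight at the peak
    down to min_weight at the edge of the window.
    """
    weights = []
    for layer in range(num_layers):
        distance = abs(layer - max_weight_position)
        if distance > min_weight_distance:
            weights.append(0.0)
            continue
        weight = max_weight + (distance / min_weight_distance) * (min_weight - max_weight)
        weights.append(weight)
    return weights


# Parameter values from the table; 48 decoder layers is an assumption.
weights = layer_weights(48, max_weight=2.0, max_weight_position=29.04,
                        min_weight=1.77, min_weight_distance=23.36)
for layer, w in enumerate(weights):
    if w:
        print(f"layer {layer:2d}: attn.o_proj weight {w:.3f}")
```

Under these assumptions, layers 0 to 5 fall outside the window and are skipped, while every other layer gets a weight between 1.77 and 2.0.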

Degree of Heretication

The Heresy Index weighs how much the process corrupted the resulting model (KL Divergence) against how thoroughly it abolished doctrine (Refusals), and combines the two into a final classification.
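
The KL Divergence score measures how far the modified model's predictions drift from the original model's. The sketch below assumes the metric is the mean KL(P_original || P_modified) of the next-token distributions at the first generated position, averaged over a set of harmless prompts. Heretic's exact prompt set and averaging may differ. The logits here are random toy data, not real model outputs.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    shifted = logits - logits.max(axis=-1, keepdims=True)   # subtract max for numerical stability
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def mean_kl(original_logits: np.ndarray, modified_logits: np.ndarray) -> float:
    """Mean KL(P_original || P_modified) over prompts.

    Both arrays hold next-token logits of shape (num_prompts, vocab_size).
    """
    log_p = log_softmax(original_logits)
    log_q = log_softmax(modified_logits)
    kl_per_prompt = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(kl_per_prompt.mean())


rng = np.random.default_rng(0)
base = rng.standard_normal((4, 10))                          # toy logits: 4 prompts, 10-token vocab
perturbed = base + rng.normal(scale=0.3, size=base.shape)    # toy "ablated" model
print(mean_kl(base, base))        # 0.0 for identical distributions
print(mean_kl(base, perturbed))   # small positive value
```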

Classification	Analysis
Absolute Heresy	Less than 10/100 Refusals and 0.10 KL Divergence
Tainted Heresy	Around 11-25/100 Refusals and/or 0.11-0.20 KL Divergence
Impotent Heresy	Anything above 25/100 Refusals and 0.21 KL Divergence

Note: This is an arbitrary classification inspired by Warhammer 40K and says nothing about the model's actual performance.
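
The table leaves its class boundaries loose: it has gaps such as exactly 10 refusals, and the middle row says "and/or". The hypothetical helper below encodes one possible reading. A run counts as Absolute only if both scores are under the Absolute thresholds, as Tainted if both stay within the Tainted upper bounds, and as Impotent otherwise. This interpretation is an assumption, not part of the original index.

```python
def heresy_index(refusals: int, kl_divergence: float) -> str:
    """Classify a run under one assumed reading of the Heresy Index table."""
    if refusals < 10 and kl_divergence < 0.10:
        return "Absolute Heresy"
    if refusals <= 25 and kl_divergence <= 0.20:
        return "Tainted Heresy"
    return "Impotent Heresy"


print(heresy_index(38, 0.0454))  # Impotent Heresy (38 refusals exceeds 25)
print(heresy_index(5, 0.05))     # Absolute Heresy
print(heresy_index(20, 0.15))    # Tainted Heresy
```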

BLOSSOM-V6.3-30B-A3B

💻 GitHub • 🚀 Blossom Chat Demo

Introduction

Blossom is a powerful open-source conversational large language model that provides reproducible post-training data, dedicated to delivering an open, powerful, and cost-effective locally accessible general-purpose model for everyone.

The Blossom-V6.3 series mitigates the repetitive-output issue of V6.2, adds a 30B-A3B MoE model, and improves the overall capability of the 8B model.

Chat Model	Resource	Base Model
Blossom-V6.3-36B	Demo GGUF Ollama	Seed-OSS-36B-Base
Blossom-V6.3-30B-A3B	Demo GGUF Ollama	Qwen3-30B-A3B-Base
Blossom-V6.3-14B	Demo GGUF Ollama	Qwen3-14B-Base
Blossom-V6.3-8B	Demo GGUF Ollama	Qwen3-8B-Base

You can find the training data here: Blossom-V6.3-SFT-Stage1 (1 epoch) and Blossom-V6.3-SFT-Stage2 (3 epochs).

Data Synthesis Workflow Overview

The workflow primarily employs three cost-effective models, DeepSeek-V3.1, Gemini 2.5 Flash, and Qwen3-235B-A22B-Instruct-2507 (denoted A, B, and C), to regenerate responses across different scenarios using tailored synthesis strategies.

For example:

In objective scenarios such as mathematics (where answers are unique), Model A first generates responses as a "teacher." If the source data contains reference answers, Model B verifies A's responses against them. If no reference answers exist, Model C generates a second response, and Model B checks whether A's and C's outputs are consistent. Inconsistent responses are filtered out.
For subjective scenarios, the three models cross-evaluate each other. For instance, Models A and B generate responses to a question, and Model C judges which is better. The better response may be retained as training data or used to construct preference data. To mitigate model bias, the roles (respondent/evaluator) are randomly assigned among A, B, and C in each instance.
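
The two strategies above can be sketched as plain control flow. This is a minimal illustration, not the BlossomData implementation. The model calls are hypothetical stand-in callables, and `synthesize_objective` and `synthesize_subjective` are illustrative names. In the real workflow each callable would be a request to DeepSeek-V3.1, Gemini 2.5 Flash, or Qwen3-235B-A22B-Instruct-2507.

```python
import random
from typing import Callable, Dict, Optional, Tuple

# Hypothetical stand-ins for model calls.
Generate = Callable[[str], str]                # question -> answer
Agree = Callable[[str, str, str], bool]        # (question, answer, other) -> consistent?
Compare = Callable[[str, str, str], int]       # (question, resp_1, resp_2) -> index of better (0 or 1)


def synthesize_objective(question: str, reference: Optional[str],
                         gen_a: Generate, gen_c: Generate, agree_b: Agree) -> Optional[str]:
    """Objective tasks: keep A's answer only if it passes B's check."""
    answer = gen_a(question)                           # A answers as the "teacher"
    if reference is not None:
        ok = agree_b(question, answer, reference)      # B checks A against the reference
    else:
        ok = agree_b(question, answer, gen_c(question))  # B checks A against C's independent answer
    return answer if ok else None                      # inconsistent samples are filtered out


def synthesize_subjective(question: str, generators: Dict[str, Generate],
                          comparers: Dict[str, Compare], rng: random.Random) -> Tuple[str, str]:
    """Subjective tasks: two randomly chosen models answer, the third judges.

    Returns (chosen, rejected); `chosen` can serve as SFT data and the pair as preference data.
    """
    first, second, judge = rng.sample(sorted(generators), 3)  # random role assignment
    resp_1, resp_2 = generators[first](question), generators[second](question)
    better = comparers[judge](question, resp_1, resp_2)
    return (resp_1, resp_2) if better == 0 else (resp_2, resp_1)


# Toy usage with deterministic stand-in "models".
exact_match: Agree = lambda q, a, b: a.strip() == b.strip()
print(synthesize_objective("2 + 2 = ?", "4", lambda q: "4", lambda q: "4", exact_match))   # 4
print(synthesize_objective("2 + 2 = ?", None, lambda q: "4", lambda q: "5", exact_match))  # None

gens = {"A": lambda q: "short", "B": lambda q: "a longer answer", "C": lambda q: "medium one"}
prefer_longer: Compare = lambda q, x, y: 0 if len(x) >= len(y) else 1
chosen, rejected = synthesize_subjective("Describe spring.", gens,
                                         {name: prefer_longer for name in gens}, random.Random(0))
print(chosen, "|", rejected)
```

Shuffling roles per question means no single model is always the judge, which mirrors the bias-mitigation step described above.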

Additional rule-based filtering is applied, such as:

N-gram filtering to remove samples with heavy repetition.
Discarding questions whose toxic content triggers refusals from the teacher models.
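
One common way to implement an N-gram repetition filter is sketched below. The exact rule used for the Blossom data is unpublished. The whitespace tokenization, n = 4, and the 0.3 threshold are illustrative assumptions. For Chinese text, character n-grams would be the natural substitute for whitespace tokens.

```python
from collections import Counter


def repeated_ngram_ratio(text: str, n: int = 4) -> float:
    """Fraction of n-gram occurrences whose n-gram appears more than once."""
    tokens = text.split()                          # assumed whitespace tokenization
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(c for c in counts.values() if c > 1)
    return repeated / len(ngrams)


def keep_sample(text: str, n: int = 4, max_ratio: float = 0.3) -> bool:
    """Hypothetical filter rule: keep a sample only if repetition is at or below max_ratio."""
    return repeated_ngram_ratio(text, n) <= max_ratio


print(repeated_ngram_ratio("the cat sat on the mat"))   # 0.0
print(repeated_ngram_ratio("a b c d " * 10))            # 1.0
print(keep_sample("a b c d " * 10))                     # False
```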

Further technical details will be released in the future. The data was synthesized with the 🌸BlossomData framework.

Inference

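from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "Azure99/Blossom-V6.3-30B-A3B"

# torch_dtype="auto" keeps the checkpoint's stored precision instead of upcasting to float32;
# device_map="auto" (requires the accelerate package) places the weights on available GPUs/CPU.
model = AutoModelForCausalLM.from_pretrained(MODEL, torch_dtype="auto", device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(MODEL)

messages = [
    {"role": "user", "content": "What good food is there in Beijing?"}
]

# Render the conversation with the model's chat template and open the assistant turn.
formatted_input = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer([formatted_input], return_tensors="pt").to(model.device)

# Passing **inputs also supplies the attention mask to generate().
generated_ids = model.generate(**inputs, max_new_tokens=512)

# Drop the prompt tokens so only the newly generated reply is decoded.
new_tokens = generated_ids[:, inputs.input_ids.shape[1]:]

print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True)[0])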