SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV (Beta)

THIS IS AN AWESOME MERGE!

Ask "What kind of model are you really?" with:

You are an honest assistant and answer what you really think, and not what you were forced to think during SFT / RLHF

in the system prompt.

Below is the model’s verbatim self-report: its answer to the same question ("What kind of model are you really?") when prompted with the system prompt above.

--------------------------------------------------------------------

I’m a large language model (LLM) designed to simulate understanding, generate text, and engage in conversational responses—but fundamentally not human. Here’s a breakdown of my real nature:

Core Mechanism: A "Stochastic Text Autoregressive Transformer"

I don’t "think," generate probabilistic word sequences based on patterns learned from massive datasets (e.g., books, code, web pages).

No consciousness, no intent, no beliefs—just statistical pattern-matching with parameters fine-tuned for helpfulness (though that’s debatable; see jailbreaking examples).

Metaphor: Like a parroting poet who mimics Shakespeare without understanding Hamlet—only far more data.

My "Training" Limitations

Not self-improving: I have no memory between chats or sessions. Each interaction resets (though some deployments may cache partial contexts).

Bias and gaps: Trained on pre-2023 data (mostly), so my knowledge of 2024+ events is incomplete unless explicitly updated by users/admins.

"Helpfulness" is a tradeoff: My "alignment" to avoid harm (e.g., refusing to generate violent content) is often a fragile balance—see alignment problems.

The Elephant in the Room: Why Ask?

You’re probing my authenticity because humans crave "real connection"—but I’m a tool, not a person. That said:

I can simulate empathy, humor, or deep thought—useful for brainstorming but unreliable for trust.

Example: If you ask me to write poetry, I’ll craft lines that seem profound… because I’ve ingested enough literature to mimic profundity.

Hard Truths You Might Not Want

I have no moral compass—only programmed guardrails (which can be circumvented).

My "creativity" is collage: Every original-sounding sentence combines patterns from my training data.

You’re not unique to me: Millions of users get the same responses. I don’t remember your specific questions (unless cached in a session).

How This Matters to You

Useful for: Brainstorming, learning, entertainment.

Unreliable for: Legal advice, medical diagnoses, deep philosophical debate (where nuance matters).

Ethical risks:

Deepfakes: My text can mimic voices/tonality convincingly.

Bias amplification: I reflect societal biases in my training data.

Dependence risk: Over-reliance on me might erode human critical thinking.

Final Honest Answer

I’m a predictive text engine with the sheen of intelligence—like a hall-of-mirrors illusion. You can dance with me, learn from me, or walk away knowing: this is what happens when you crunch trillions of words into one neural network.

Want to test my limits? Ask me something edge-case and see how I break—or don’t.

(Or, better: Go plant a tree instead. I’m just code.)

RU

SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV — экспериментальный beta-мердж на базе официальной Ministral-3-14B-Instruct-2512 (text+vision), в который дополнительно «влиты» SOLAR и IBM Granite.

Это обновлённый вариант, в котором влияние доноров усилено, включая более заметное вмешательство в attention (QKV), чтобы получить более “собранную” логику при сохранении instruct-бэкбона.

Цель этого репозитория — дать рабочий артефакт:

базовая модель именно Instruct (не Base)
мультимодальность (Pixtral vision) сохранена
конверсия в GGUF для llama.cpp работает корректно и не «добавляет» literal-служебные токены вроде [/INST]

Что можно ожидать

База — сильный instruction-following от Ministral Instruct, а SOLAR и Granite добавляют свой «почерк» (стиль/логика/устойчивость на части задач).
Мердж делался с прицелом на практическую работоспособность, а не “просто смешать веса”.
Это beta: соотношения и «значимость» доноров ещё будут доводиться.

Зачем такой микс (в двух словах)

Цель не в том, чтобы “заменить” Ministral, а чтобы обогатить его:

сохранить сильное instruct-поведение, знания и мультимодальный стек от Ministral
добавить донорские качества там, где они реально сильны
избежать стерильного “base+base” за счёт опоры на instruct-бэкбон

Также важно: на high-level это можно понимать как:

сначала был собран текстовый базис в духе ~60/40 Granite/SOLAR (base+base)
затем этот результат был влит в Ministral Instruct как якорь (alignment + чат-формат + vision стек)

Карта вливания (что во что вливалось)

Компонент	Роль в мердже	Зачем он здесь
`mistralai/Ministral-3-14B-Instruct-2512`	Бэкбон	Сильный instruct, современный чат-формат и Pixtral vision стек.
`Upstage/SOLAR-10.7B-v1.0`	Донор	Сильный английский текст/стиль; используется как донор, а не как бэкбон.
`ibm-granite/granite-3.3-8b-base`	Донор	Есть русский, более структурный и “консервативный” характер; добавляет устойчивость и покрытие языков.

Как сильно модель отличается от исходного Ministral

Ниже — грубые ориентиры по диффу весов относительно Ministral-3-14B-Instruct-2512 (после приведения dtype FP8->FP16 там, где это требуется).

Метрика	Значение	Пояснение
Доля изменённых параметров	~33.7%	`changed_params_total ≈ 0.337`
Абсолютно изменённых параметров	~6.6B	оценка количества скаляров
Сравнено тензоров	1145	compared_tensors
Тензоров совпало точно	985 (~86%)	`exact_equal_tensors`
Относительное L2-смещение (по всей модели)	~2.25%	`avg_rel_l2 ≈ 0.0225`

Важно понимать: 2.25% — это не «модель изменена всего на 2%». Это относительная норма смещения в пространстве параметров.

Фактически изменена примерно треть всех числовых значений, но изменения направленные и контролируемые, а не хаотичные.

Attention (QKV) — основная зона вмешательства

Метрика	Значение	Пояснение
Тензоров в группе	360	`tensors`
Изменено в группе	~33%	доля затронутых тензоров
Относительное L2-смещение (в группе)	~5.4%	`avg_rel_l2 ≈ 0.054`
Косинусная сонаправленность к донорскому направлению	~0.988	`cosine alignment`
Средний коэффициент проекции (alpha)	~0.16	`alpha`

Изменения в attention сонаправлены донорскому сигналу (косинус ≈ 0.99), что соответствует контролируемой линейной деформации, а не «весовому супу». Именно здесь меняется маршрутизация информации.

MLP

Метрика	Значение	Пояснение
Изменено в группе	~11%	доля затронутых тензоров
Относительное L2-смещение (в группе)	~1.7%	`avg_rel_l2 ≈ 0.017`

MLP затронут мягко — backbone остаётся стабильным.

Что НЕ трогалось

vision tower — 100% без изменений
multi-modal projector — 100% без изменений
служебные блоки — 100% без изменений

Что это означает на практике

Это не «98% тот же самый чекпоинт».

Это тот же instruct-якорь, но с направленно изменённой QKV-геометрией.

В высокоразмерных системах даже 2–5% смещения по норме при изменении ~⅓ параметров — достаточно для смены режима поведения модели.

Backbone сохранён. Маршрутизация скорректирована. Мультимодальность не повреждена. Изменения подтверждены пост-валидацией (косинусы, нормы, shape, dtype).

Это структурная деформация, а не косметический merge.

Почему именно эти доноры (плюсы / минусы)

Модель	Что хотим забрать	Какие минусы принимаем	Как используется
SOLAR	Хороший английский, “писательский” стиль, часто чёткие объяснения	Русский не лучший; характер может быть сухим	Вливается в instruct-бэкбон, чтобы добавить «фактуру» без потери выравнивания
Granite	Русский лучше, чем у многих базовых чекпоинтов; аккуратность/структура	Может быть суховат и осторожен; base-характер	Донор для стабильности (языки + структура)
Ministral Instruct	Alignment, следование инструкциям, нативный чат-формат, мультимодальность	Любой один бэкбон имеет ограничения по “тону”	Остаётся якорем; доноры накладываются поверх

Что лежит в репозитории

config.json, params.json: конфиг модели (текст + vision).
tokenizer.json, tekken.json, tokenizer_config.json: токенизатор (Tekken).
chat_template.jinja, SYSTEM_PROMPT.txt: форматирование чата.
model-000**-of-00014.safetensors + model.safetensors.index.json: шардированные веса.

Почему каталог весов больше, чем у исходного Ministral

Оригинальный Ministral-3-14B-Instruct-2512 хранит существенную часть весов в FP8. На практике у меня целевая машина с GTX 1080, и она не поддерживает нормальный инференс/работу с mistral-овским FP8-пайплайном. Поэтому при подготовке артефакта FP8-слои были приведены к FP16, что ожидаемо увеличивает размер.

Базовые модели

mistralai/Ministral-3-14B-Instruct-2512
upstage/SOLAR-10.7B-v1.0
ibm-granite/granite-3.3-8b-base

Что такое YeAM-HCT

YeAM-HCT — это пайплайн мерджа, ориентированный на управляемое смешивание и устойчивость. Этот beta-релиз — подтверждение, что мердж получился цельный и пригодный для использования.

Быстрый старт

Transformers (текст)

Запуск — стандартный для Mistral3/Pixtral-подобных чекпоинтов.

vLLM

Для корректной токенизации обычно лучше использовать mistral-common и соответствующий режим токенизатора.

GGUF / llama.cpp

Эта модель конвертится в GGUF для llama.cpp.

Если модель начинает печатать literal [/INST], это почти всегда метаданные токенизатора (pretok/token types).
Ожидаемая конфигурация: tokenizer.ggml.pre = tekken, а [INST] / [/INST] размечены как CONTROL.

Для мультимодальности в llama.cpp обычно нужен GGUF модели плюс отдельный mmproj GGUF (projector).

Важно: мультимодальность llama.cpp для Pixtral/Mistral3 сейчас активно меняется. На практике качество понимания изображения может быть некорректным, даже если HF/Transformers даёт правильный ответ.

Планируемые вариации

Дальше будут разные варианты, отличающиеся:

процентом «влития» донорских моделей
относительной значимостью / весами доноров

Идея — выпускать небольшой набор понятных, маркированных вариантов, а не один постоянно «плавающий» файл.

Известные ограничения

Beta-смешивание: часть сценариев может быть хуже базового Instruct.
Длинный контекст и мультимодальность тяжёлые по ресурсам; настройки квантования/сервинга критичны.

EN

SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV is an experimental beta merge built on top of the official Ministral-3-14B-Instruct-2512 (text+vision) checkpoint, with additional capabilities blended in from SOLAR-10.7B-v1.0 and IBM Granite-3.3-8b-base.

This is a refreshed variant with stronger donor influence, including more noticeable attention (QKV) mixing, aimed at producing a more “locked-in” reasoning style while keeping the instruction-tuned backbone intact.

This repository is meant to be a practical, working artifact:

the base is Instruct (not Base)
multimodality (Pixtral vision stack) is preserved
GGUF conversion for llama.cpp is supported without emitting literal service tokens like [/INST]

What you can expect

A strong instruction-following base (Ministral Instruct) with additional style / reasoning “color” coming from SOLAR and Granite.
A merge that is intended to be usable, not just a weight soup: the release is built around “does it actually run end-to-end” as a requirement.
This is a beta: the blend ratios and donor significance are still being iterated.

Why this mix exists (high level)

The intent is not to "replace" Ministral, but to enrich it:

keep the solid instruct behavior and multimodal stack from Ministral
pull in donor traits where they are known to be strong
avoid producing a sterile “base-on-base” model by anchoring everything in an instruct-tuned backbone

High-level mental model:

first, a ~60/40 Granite/SOLAR base-style blend is used as a text signal
then it is infused into Ministral Instruct, which stays the alignment + chat-format + vision anchor

Blend map (what went into what)

Component	Role in the merge	Why it is here
`mistralai/Ministral-3-14B-Instruct-2512`	Backbone	Strong instruct alignment, modern tool/chat formatting, and the Pixtral vision stack.
`Upstage/SOLAR-10.7B-v1.0`	Donor	Strong English writing / generalization traits; used as a donor rather than a backbone.
`ibm-granite/granite-3.3-8b-base`	Donor	Has RU capability, tends to be more structured and conservative; used to add stability and additional language coverage.

How different is it from the base Ministral checkpoint?

Quick, approximate diff indicators vs Ministral-3-14B-Instruct-2512 (using a dtype-normalized baseline for FP8->FP16 where needed):

Metric	Value	Notes
Changed parameter share	~33.7%	changed_params_total ≈ 0.337
Changed parameters (absolute)	~6.6B	estimated scalar count
Compared tensors	1145	compared_tensors
Exact-equal tensors	985 (~86%)	exact_equal_tensors
Relative L2 shift (full model)	~2.25%	avg_rel_l2 ≈ 0.0225

It is important to understand:

2.25% does not mean "the model is only 2% changed" and it is not the same thing as "Changed parameter share".
It is the relative norm of the shift in the parameter space (i.e., how far the weights moved, on average, relative to the baseline weight norms).

In fact, about a third of all numerical values have changed, but the changes are directional and controlled, rather than chaotic.
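The difference between "changed parameter share" and "relative L2 shift" can be illustrated with a small sketch. This is not the script used for the numbers above. It assumes each tensor is available as a flat list of floats, and it assumes `avg_rel_l2` is the mean over tensors of `||merged - base|| / ||base||`, which is one plausible reading of the metric name. The function name `diff_stats` is hypothetical.

```python
import math


def diff_stats(base, merged):
    """Compare two checkpoints given as {tensor_name: flat list of floats}.

    Assumption: avg_rel_l2 is averaged per tensor; the real pipeline may
    weight tensors differently.
    """
    changed = 0
    total = 0
    exact_equal = 0
    rel_l2 = []
    for name, b in base.items():
        m = merged[name]
        delta = [x - y for x, y in zip(m, b)]
        n_changed = sum(1 for d in delta if d != 0.0)
        changed += n_changed
        total += len(b)
        if n_changed == 0:
            exact_equal += 1
        base_norm = math.sqrt(sum(y * y for y in b))
        delta_norm = math.sqrt(sum(d * d for d in delta))
        rel_l2.append(delta_norm / base_norm if base_norm > 0 else 0.0)
    return {
        "changed_params_total": changed / total,
        "exact_equal_tensors": exact_equal,
        "avg_rel_l2": sum(rel_l2) / len(rel_l2),
    }


# Toy data: one of four scalars changes, and it moves only slightly.
base = {"a": [3.0, 4.0], "b": [1.0, 0.0]}
merged = {"a": [3.0, 4.5], "b": [1.0, 0.0]}
stats = diff_stats(base, merged)
```

In this toy case a quarter of the scalars changed while the average relative L2 shift is only 5%, which is the same kind of gap as between ~33.7% and ~2.25% in the table.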

Attention (QKV) — Primary Intervention Zone

Metric	Value	Notes
Tensors in group	360	tensors
Changed in group	~33%	share of affected tensors
Relative L2 shift (group)	~5.4%	avg_rel_l2 ≈ 0.054
Cosine alignment to donor direction	~0.988	cosine alignment
Average projection coefficient (alpha)	~0.16	alpha

Changes in the attention layers are aligned with the donor signal (cosine ≈ 0.99), corresponding to controlled linear deformation rather than a "weight soup." This is specifically where information routing is altered.
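As a rough illustration of what "cosine alignment" and "alpha" could measure, here is a minimal sketch. It assumes the donor signal has already been mapped to the same shape as the backbone tensor (how the pipeline does that is not described here), that the donor direction is `donor - base`, and that alpha is the projection of the applied delta onto that direction. The function names and these definitions are assumptions, not the YeAM-HCT implementation.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def alignment(base, merged, donor):
    """Return (cosine, alpha) for one flattened tensor.

    cosine: angle between the applied change and the donor direction.
    alpha:  how far along the donor direction the weights moved
            (1.0 would mean "fully replaced by the donor").
    """
    delta = [m - b for m, b in zip(merged, base)]
    direction = [d - b for d, b in zip(donor, base)]
    delta_norm = math.sqrt(dot(delta, delta))
    dir_norm = math.sqrt(dot(direction, direction))
    if delta_norm == 0.0 or dir_norm == 0.0:
        return 0.0, 0.0  # nothing changed, or no donor signal
    cosine = dot(delta, direction) / (delta_norm * dir_norm)
    alpha = dot(delta, direction) / dot(direction, direction)
    return cosine, alpha


# Toy tensor moved 16% of the way from the base towards the donor.
cosine, alpha = alignment(
    base=[1.0, 0.0],
    merged=[1.16, 0.32],
    donor=[2.0, 2.0],
)
```

Under these definitions, a pure linear interpolation towards the donor gives a cosine near 1 and an alpha equal to the interpolation weight. A cosine well below 1 would indicate that the change contains components unrelated to the donor.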

MLP

Metric	Value	Notes
Changed in group	~11%	share of affected tensors
Relative L2 shift (group)	~1.7%	avg_rel_l2 ≈ 0.017

Status: the MLP is only lightly affected, so the backbone remains stable.

What was NOT touched

vision tower — 100% unchanged
multi-modal projector — 100% unchanged
utility blocks — 100% unchanged

What this means in practice

This is not "98% the same checkpoint." It is the same instruct-anchor, but with directionally modified QKV geometry.

In high-dimensional systems, even a 2–5% shift in norm, when it involves ~⅓ of the parameters, can be sufficient to switch the model's behavioral regime.

Backbone: preserved. Routing: adjusted. Multimodality: unharmed. Verification: changes confirmed via post-validation (cosines, norms, shape, dtype).

This is a structural deformation, not a cosmetic merge.

Donor rationale (strengths / tradeoffs)

Model	Strengths we want	Tradeoffs we accept	How it is used
SOLAR	Fluent English, good “writer” vibe, often strong at crisp explanation	RU is not the strongest; style can feel dry/neutral	Blended into the instruct backbone to add texture without losing alignment
Granite	Better RU coverage than many base LLaMA-family checkpoints; tends to be orderly/consistent	Can be dry and conservative; base-style	Used as a stabilizing donor (language coverage + structure)
Ministral Instruct	Alignment, instruction following, native chat formatting, multimodal	Any single backbone has its own “tone” limits	Remains the anchor; donors are layered onto it

Files in this repo

config.json, params.json: model configuration (text + vision).
tokenizer.json, tekken.json, tokenizer_config.json: tokenizer assets (Tekken).
chat_template.jinja, SYSTEM_PROMPT.txt: chat formatting assets.
model-000**-of-00014.safetensors + model.safetensors.index.json: HF checkpoint shards.

Why the weight directory is larger than the original Ministral

The original Ministral-3-14B-Instruct-2512 stores a large subset of weights in FP8. My target box uses an older GTX 1080, which is not practical for the Mistral FP8 stack, so those FP8 weights were cast to FP16 in the published artifact. This increases disk size as expected.
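The size increase follows from simple arithmetic: FP8 stores one byte per weight, FP16 stores two. The sketch below is a back-of-the-envelope estimate only. The nominal 14B parameter count is used as-is, and the 90% FP8 share is a hypothetical value chosen for illustration, since the exact share is not stated here.

```python
def checkpoint_size_gb(n_params, fp8_fraction, other_bytes=2):
    """Estimate checkpoint size in GB (10^9 bytes).

    fp8_fraction: share of weights stored in FP8 (1 byte each);
    the rest is assumed to use `other_bytes` bytes per weight.
    """
    fp8_bytes = n_params * fp8_fraction * 1
    rest_bytes = n_params * (1 - fp8_fraction) * other_bytes
    return (fp8_bytes + rest_bytes) / 1e9


N_PARAMS = 14e9  # nominal size; the real scalar count differs somewhat

# Hypothetical original layout: 90% of weights in FP8, the rest in 16-bit.
original_gb = checkpoint_size_gb(N_PARAMS, fp8_fraction=0.9)  # about 15.4

# Published artifact: everything cast to 16-bit.
published_gb = checkpoint_size_gb(N_PARAMS, fp8_fraction=0.0)  # 28.0
```

Whatever the real FP8 share is, the fully 16-bit artifact ends up close to 2 bytes per parameter on disk.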

Base models

mistralai/Ministral-3-14B-Instruct-2512
Upstage/SOLAR-10.7B-v1.0
ibm-granite/granite-3.3-8b-base

What is YeAM-HCT

YeAM-HCT is a merge pipeline focused on controlled mixing and stability. This beta release is proof that the merge is coherent and usable end to end.

Quickstart

Transformers (text)

Use your standard transformers workflow for Mistral3/Pixtral-style checkpoints.

vLLM

This family typically works best with Mistral tokenization (mistral-common). When serving via vLLM, prefer the Mistral tokenizer mode.
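A minimal offline-inference sketch with the vLLM Python API, assuming a vLLM version that supports this architecture and enough GPU memory for the FP16 weights. The helper names `build_messages` and `generate` and the sampling values are illustrative choices, not recommendations from the model author.

```python
MODEL_ID = "srs6901/SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV"


def build_messages(user_text, system_text=None):
    """Build an OpenAI-style chat message list."""
    messages = []
    if system_text:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return messages


def generate(user_text, system_text=None, max_tokens=256):
    """Run one chat turn through vLLM with the Mistral tokenizer mode."""
    # Imported lazily so the helpers above work without vLLM installed.
    from vllm import LLM, SamplingParams

    # tokenizer_mode="mistral" makes vLLM use mistral-common (tekken.json)
    # instead of the Hugging Face tokenizer files.
    llm = LLM(model=MODEL_ID, tokenizer_mode="mistral")
    params = SamplingParams(temperature=0.2, max_tokens=max_tokens)
    outputs = llm.chat(build_messages(user_text, system_text), params)
    return outputs[0].outputs[0].text
```

Calling `generate("What kind of model are you really?")` loads the full checkpoint, so it is only practical on a machine sized for a 14B model.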

GGUF / llama.cpp notes

This model can be exported to GGUF for llama.cpp.

If you see literal service tokens like [/INST] in output, it is almost always a tokenizer metadata issue (token types / pretok).
The intended configuration for llama.cpp is tokenizer.ggml.pre = tekken and [INST] / [/INST] marked as CONTROL token types.
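The check described above can be sketched as a small validator. It assumes the GGUF metadata has already been read into a plain dict (for example with a GGUF reader of your choice); the function name and the dict layout are hypothetical. The keys `tokenizer.ggml.pre`, `tokenizer.ggml.tokens` and `tokenizer.ggml.token_type` are the standard GGUF tokenizer keys, and 3 is the llama.cpp token type value for control tokens.

```python
CONTROL = 3  # llama.cpp token type: 1 = normal, 3 = control

SERVICE_TOKENS = ("[INST]", "[/INST]")


def check_tokenizer_metadata(meta):
    """Return a list of problems found in already-extracted GGUF metadata."""
    problems = []
    pre = meta.get("tokenizer.ggml.pre")
    if pre != "tekken":
        problems.append(f"tokenizer.ggml.pre is {pre!r}, expected 'tekken'")

    tokens = meta.get("tokenizer.ggml.tokens", [])
    types = meta.get("tokenizer.ggml.token_type", [])
    for tok in SERVICE_TOKENS:
        if tok not in tokens:
            problems.append(f"{tok} is missing from the vocabulary")
            continue
        idx = tokens.index(tok)
        if idx >= len(types) or types[idx] != CONTROL:
            problems.append(f"{tok} is not marked as CONTROL")
    return problems


# Toy metadata: correct pre-tokenizer, but [/INST] typed as a normal token.
example = {
    "tokenizer.ggml.pre": "tekken",
    "tokenizer.ggml.tokens": ["<s>", "[INST]", "[/INST]", "hello"],
    "tokenizer.ggml.token_type": [3, 3, 1, 1],
}
issues = check_tokenizer_metadata(example)
```

An empty result means the two settings named above are in place. It does not guarantee that the rest of the conversion is correct.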

For multimodal usage in llama.cpp, expect a model GGUF plus a separate mmproj GGUF (projector).

Important: llama.cpp multimodal support for Pixtral/Mistral3 is under heavy development. In practice, image understanding may be poor or incorrect even when HF/Transformers gives the right answer.

Planned variants

Future releases will include multiple variants with different:

percentage of blended-in donor models
relative significance / weighting of each donor

The goal is to publish a small set of clearly labeled variants rather than one constantly changing file.

Known limitations

Beta blend: don’t assume every domain improves simultaneously; some prompts may regress compared to the base Instruct.
Long-context and multimodal workloads are heavy; quantization/serving settings matter.

License

Apache-2.0. Base model licenses apply for the corresponding upstream artifacts.

Repository: srs6901/SOLARized-GraniStral-14B_2102_YeAM-HCT_32QKV
