---
license: apache-2.0
base_model: Qwen/Qwen3.5-4B-Base
library_name: peft
pipeline_tag: text-generation
model_name: Graphite 1.0 4B
language:
- en
- ru
tags:
- qwen
- qwen3.5
- peft
- lora
- unsloth
- trl
- sft
- code
- reasoning
- bilingual
- obsidian
- graphite
---

# Graphite 1.0 4B

`Graphite 1.0 4B` is the first public LoRA adapter from the Graphite / Obsidian-Critic training stream. It is built on top of [`Qwen/Qwen3.5-4B-Base`](https://huggingface.co/Qwen/Qwen3.5-4B-Base) and tuned for strict, grounded, low-noise responses across:

- repo repair and debugging
- agent tool-use formatting
- technical writing and Markdown workflows
- code review and integration tasks
- logic and factual precision
- bilingual Russian / English instruction following

## What This Repository Contains

This repo contains a **LoRA adapter**, not merged base weights.

- Base model: `Qwen/Qwen3.5-4B-Base`
- Adapter type: `LoRA`
- Rank: `r=16`
- Alpha: `16`
- Dropout: `0.0`
- Target modules: `q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`
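
For reference, the settings above map onto a PEFT `LoraConfig` roughly like this. This is a sketch for re-creating a comparable adapter, not the exact object used in training; `bias="none"` is an assumption (it is the PEFT default):

```python
from peft import LoraConfig

# Sketch of a LoraConfig matching the adapter settings listed above.
# bias="none" is assumed, not confirmed by this repo's config listing.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.0,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)
```

The authoritative values live in `adapter_config.json` in this repo.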

Files of interest:

- `adapter_model.safetensors`: LoRA weights
- `adapter_config.json`: PEFT adapter config
- `tokenizer.json`, `tokenizer_config.json`, `chat_template.jinja`: tokenizer assets
- `run_summary.json`: public training run summary
- `length_stats.json`: length filtering summary
- `masking_sanity.json`: formatting sanity check

## Training Lineage

This adapter corresponds to the **first public Graphite 1.0 4B full fine-tune stream**.

- dataset family: **`obsidian-critic-broad-mix-20260321`**
- training stack: **Unsloth + TRL + torchrun DDP**
- base model: **`Qwen/Qwen3.5-4B-Base`**

Notebook lineage used for this stream:

- `obsidian_critic_qwen35_t4x2_unsloth_kaggle.ipynb`: smoke-test notebook for the broad mix
- `obsidian_critic_qwen35_t4x2_unsloth_kaggle_full.ipynb`: full fine-tune lineage used to produce the public LoRA run

## Dataset Provenance

The training data for this first public stream comes from the mixed dataset:

- dataset name: `obsidian-critic-broad-mix-20260321`
- examples in mixed dataset: `37,008`
- approximate token volume: `6,885,960`
- exact duplicate `(user, assistant)` pairs removed during mix build: `3,469`
- normalized near-duplicates removed from wave backfill rows: `201`
- dataset SHA-256: `5ba1924b46d08a8ab8ad7ed5e1f74b13cc3e847b3a04b714934953975fd9300a`
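
If you rebuild or redistribute the mix, the checksum above can be verified with a small streaming helper like this. This is a generic sketch; `mix.jsonl` is a placeholder filename, not a file shipped in this repo:

```python
import hashlib

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 in 1 MiB chunks so large dataset files never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Usage against a local copy of the mix (hypothetical path):
# assert sha256_of_file("mix.jsonl") == "5ba1924b46d08a8ab8ad7ed5e1f74b13cc3e847b3a04b714934953975fd9300a"
```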

The public training run then created a deterministic train / validation split and applied sequence-length filtering:

- train rows before filter: `36,638`
- validation rows before filter: `370`
- train rows after filter: `36,081`
- validation rows after filter: `363`
- rows removed by length filtering: `564`
- minimum kept sequence length: `48` tokens
- maximum kept sequence length: `2048` tokens
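
The filter itself is easy to reproduce: tokenize each rendered example and keep only rows whose token count falls in the closed range `[48, 2048]`. A minimal sketch, with a hypothetical `token_lengths` list standing in for real tokenizer output:

```python
MIN_LEN, MAX_LEN = 48, 2048  # bounds reported in length_stats.json, both inclusive

def length_filter(token_lengths: list[int]) -> list[int]:
    """Return the indices of rows whose token count lies within [MIN_LEN, MAX_LEN]."""
    return [i for i, n in enumerate(token_lengths) if MIN_LEN <= n <= MAX_LEN]

# Hypothetical rows of 12, 48, 900, 2048, and 4000 tokens:
kept = length_filter([12, 48, 900, 2048, 4000])
print(kept)  # -> [1, 2, 3]
```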

### Mix Roles

| Role | Examples | Approx. tokens |
| --- | ---: | ---: |
| `repair` | 5,353 | 983,890 |
| `tool_use` | 4,682 | 455,600 |
| `core_real` | 4,200 | 1,043,187 |
| `robustness` | 3,600 | 397,399 |
| `agent_core` | 3,200 | 645,426 |
| `logic` | 3,031 | 297,812 |
| `factual` | 2,960 | 142,787 |
| `obsidian_docs` | 2,740 | 490,850 |
| `reasoning` | 2,200 | 655,624 |
| `greenfield` | 1,488 | 563,331 |
| `integration` | 1,473 | 241,331 |
| `review` | 1,327 | 343,007 |
| `regularizer` | 500 | 55,840 |
| `wave_backfill` | 230 | 218,922 |
| `long_context` | 24 | 350,954 |
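
As a quick consistency check, the role-level example counts in the table sum back to the 37,008 examples reported for the mixed dataset:

```python
# Example counts per role, copied from the Mix Roles table above.
role_examples = {
    "repair": 5353, "tool_use": 4682, "core_real": 4200, "robustness": 3600,
    "agent_core": 3200, "logic": 3031, "factual": 2960, "obsidian_docs": 2740,
    "reasoning": 2200, "greenfield": 1488, "integration": 1473, "review": 1327,
    "regularizer": 500, "wave_backfill": 230, "long_context": 24,
}

total = sum(role_examples.values())
print(total)  # -> 37008, matching the mixed-dataset size above
```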

### Source Dataset Table

| Dataset | Role | Examples | Approx. tokens |
| --- | --- | ---: | ---: |
| `real-world-grounded-topup-sft-20260320` | `core_real` | 3,000 | 804,280 |
| `robustness-noise-traps-sft-20260320` | `robustness` | 3,200 | 363,139 |
| `factual-erudition-sft-20260319` | `factual` | 2,960 | 142,787 |
| `agent-gap-fixes-sft-20260320` | `agent_core` | 2,500 | 555,785 |
| `code-fix-critical-topup-sft-20260321` | `repair` | 2,500 | 440,355 |
| `code-agent-tooluse-sft-20260319` | `tool_use` | 2,400 | 208,335 |
| `docs-engineering-review-topup-sft-20260320` | `obsidian_docs` | 1,600 | 206,794 |
| `format-tool-discipline-sft-20260319` | `tool_use` | 1,582 | 179,009 |
| `multi-step-debug-sft-20260319` | `reasoning` | 1,200 | 345,084 |
| `real-world-seed-expansion-sft-20260321` | `core_real` | 1,200 | 238,907 |
| `runtime-debug-grounded-sft-20260319` | `repair` | 1,193 | 218,845 |
| `logic-core-sft-20260319` | `logic` | 1,131 | 139,188 |
| `code-architecture-sft-20260319` | `greenfield` | 1,100 | 308,380 |
| `tdd-test-first-sft-20260319` | `reasoning` | 1,000 | 310,540 |
| `logic-sanity-sft-20260319` | `logic` | 996 | 69,260 |
| `code-repair-patch-sft-20260319` | `repair` | 955 | 145,544 |
| `logic-precision-ru-sft-20260319` | `logic` | 904 | 89,364 |
| `security-repair-review-sft-20260319` | `review` | 893 | 159,086 |
| `db-and-migrations-sft-20260319` | `integration` | 867 | 119,740 |
| `multi-file-repo-repair-sft-20260319` | `repair` | 705 | 179,146 |
| `agent-gap-fixes-ru-topup-sft-20260320` | `agent_core` | 700 | 89,641 |
| `code-agent-tooluse-ru-topup-sft-20260320` | `tool_use` | 700 | 68,256 |
| `backend-frontend-ops-sft-20260319` | `integration` | 606 | 121,591 |
| `docs-topup-sft-20260320` | `obsidian_docs` | 600 | 125,125 |
| `anti-overthinking-pack-sft-20260321` | `regularizer` | 500 | 55,840 |
| `docs-markdown-sft-20260318-v3` | `obsidian_docs` | 440 | 149,852 |
| `ts-rust-code-review-sft-20260318-v3` | `review` | 434 | 183,921 |
| `robustness-noise-traps-ru-topup-sft-20260320` | `robustness` | 400 | 34,260 |
| `ts-rust-coding-sft-20260318-v3` | `greenfield` | 388 | 254,951 |
| `wave-03-growth-sft-20260320` | `wave_backfill` | 230 | 218,922 |
| `docs-topup-ru-sft-20260320` | `obsidian_docs` | 100 | 9,079 |
| `long-context-memory-topup-sft-20260321` | `long_context` | 24 | 350,954 |

## Representative Training Examples

These are short excerpts from the actual broad-mix and component datasets used in the first public stream.

### Broad mix: grounded repo repair

**System**
```text
Ground the answer in one active boundary, not in general advice.
```

**User**
```text
First fix failed. Repo: packages/notes-index
Patch 1 removed one fallback in `app/config.py`.
New evidence: `docker compose: service worker exited with code 1`.
Need the second minimal fix.
```

**Assistant**
```text
The first patch hit the wrong seam. The new signal points back to `app/config.py`.
```

### Broad mix: Russian ambiguity tightening

**System**
```text
Пиши как строгий русскоязычный логический редактор: коротко, ясно, без филологии ради филологии.
```

**User**
```text
Оцени формулировку из инструкции:
`Разрешён доступ аналитики и владельцы без резервной копией.`
```

**Assistant**
```text
Да. Неясно, относится `без резервной копией` к обеим группам или только ко второй.
Строже: `И аналитики, и владельцы получают доступ без резервной копией.`
```

### Tool-use source: strict JSON action selection

**System**
```text
Ты code-workspace агент. Возвращай только валидный JSON.
```

**User**
```text
Исправь этот сломанный tool call и верни только валидный JSON.
{"tool":"search_codebase","args":{"query":"normalize_note_path","path":"src","limit":"4"}}
```

**Assistant**
```json
{"tool":"search_codebase","arguments":{"query":"normalize_note_path","path":"src"}}
```
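
The discipline this example targets can be checked mechanically. Below is a minimal validator sketch: the top-level keys `tool` and `arguments` follow the corrected call above, while the allowed-argument set for `search_codebase` is an illustrative assumption, not a published schema:

```python
import json

ALLOWED_ARGS = {"query", "path"}  # illustrative schema for `search_codebase`

def is_valid_call(raw: str) -> bool:
    """Accept only JSON objects shaped {"tool": str, "arguments": {...}} with known argument keys."""
    try:
        call = json.loads(raw)
    except json.JSONDecodeError:
        return False
    if not isinstance(call, dict) or set(call) != {"tool", "arguments"}:
        return False
    args = call["arguments"]
    return isinstance(call["tool"], str) and isinstance(args, dict) and set(args) <= ALLOWED_ARGS

broken = '{"tool":"search_codebase","args":{"query":"normalize_note_path","path":"src","limit":"4"}}'
fixed = '{"tool":"search_codebase","arguments":{"query":"normalize_note_path","path":"src"}}'
print(is_valid_call(broken), is_valid_call(fixed))  # -> False True
```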

### Docs source: structured README generation

**System**
```text
Ты technical writer для engineering workflows. Только полезная структура.
```

**User**
```text
Напиши полноценный README.md для `md-audit`.
Контекст: scan Markdown repositories for broken links, empty headings, duplicate slugs, and metadata regressions.
```

### Logic source: contradiction checking

**System**
```text
Будь аккуратен с условиями и кванторами. Предпочитай точность эффектности.
```

**User**
```text
Проверь набор на непротиворечивость.
1. `Борис` гость -> не может публиковать.
2. `Борис` гость.
3. `Борис` может публиковать.
```

**Assistant**
```text
Нет. Противоречие между выводом из 1+2 и пунктом 3.
```

## Training Recipe

The public run used:

- distributed setup: **`torchrun` DDP**
- training framework: **Unsloth + TRL**
- base model loading: **4-bit**
- LoRA rank: **16**
- LoRA alpha: **16**
- LoRA dropout: **0.0**
- max sequence length: **2048**
- per-device train batch size: **1**
- gradient accumulation steps: **8**
- effective global batch size: **16** examples / optimization step
- epochs: **1**
- optimizer: **`adamw_8bit`**
- scheduler: **cosine**
- learning rate: **1e-4**
- warmup steps: **5**
- gradient checkpointing: **enabled**
- FP16: **forced**
- packing: **disabled**
- completion-only loss: **disabled**
- public run total steps: **2256**
- logging / eval / save cadence: **50 / 125 / 250** steps
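
Two of the numbers above follow from the others. With a per-device batch of 1, gradient accumulation of 8, and two GPUs under DDP (an assumption based on the `t4x2` notebook names), the effective global batch is 16, and one epoch over the 36,081 filtered train rows reproduces the reported step count:

```python
import math

per_device_batch = 1
grad_accum = 8
num_gpus = 2            # assumed: torchrun DDP on a T4 x2 node, per the notebook names
train_rows = 36_081     # train rows after length filtering

global_batch = per_device_batch * grad_accum * num_gpus
total_steps = math.ceil(train_rows / global_batch)
print(global_batch, total_steps)  # -> 16 2256
```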

## Prompt Style

This adapter was trained on a simple, explicit prompt layout:

```text
System:
<system prompt>

User:
<user prompt>

Assistant:
```
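
A small helper keeps this layout exact, including the trailing newline after `Assistant:`. The function name is ours, not part of the repo:

```python
def build_prompt(system: str, user: str) -> str:
    """Render the System/User/Assistant layout the adapter was trained on."""
    return f"System:\n{system}\n\nUser:\n{user}\n\nAssistant:\n"

prompt = build_prompt("Return the smallest useful answer.", "Summarize the failing test.")
print(prompt.endswith("Assistant:\n"))  # -> True
```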

For best results, keep prompts concise, grounded, and task-shaped. The adapter responds best to:

- repo repair tasks with concrete evidence
- exact wording / logic cleanup tasks
- tool-call selection with explicit schemas
- technical writing with clear requested sections
- review / integration prompts that specify files, symptoms, and expected outcomes

## Quick Start

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "Qwen/Qwen3.5-4B-Base"
adapter_id = "Starred09/obsidian-critic-qwen35-4b-base-lora"

tokenizer = AutoTokenizer.from_pretrained(adapter_id)
base_model = AutoModelForCausalLM.from_pretrained(
    base_id,
    torch_dtype="auto",
    device_map="auto",
)
model = PeftModel.from_pretrained(base_model, adapter_id)
model.eval()

system = "Return the smallest useful answer. Do not invent missing evidence."
user = "Repo: apps/desktop-shell. Build fails with ENOENT on dist/server.js. Point to the first file to inspect."
prompt = f"System:\n{system}\n\nUser:\n{user}\n\nAssistant:\n"

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=160)

print(tokenizer.decode(out[0], skip_special_tokens=True))
```

## Intended Use

Graphite 1.0 4B is intended for:

- coding assistants
- repo triage and patch-planning copilots
- Markdown / docs tooling assistants
- logic and wording critique
- bilingual technical task routing

It is especially useful when you want **short, grounded, non-theatrical outputs** instead of generic assistant prose.

## Limitations

- This is an **adapter**, not a standalone merged model.
- It is tuned for **structured technical work**, not general consumer chat.
- It inherits both strengths and weaknesses from `Qwen/Qwen3.5-4B-Base`.
- The broad mix is intentionally heavy on repair, tool-use, and reasoning, so purely creative behavior is not a target.

## License

This repository is released under the **Apache License 2.0**. See [`LICENSE`](./LICENSE).

Please also review the license and usage terms of the base model:

- [`Qwen/Qwen3.5-4B-Base`](https://huggingface.co/Qwen/Qwen3.5-4B-Base)

## Acknowledgements

- Alibaba Qwen team for the base model
- Unsloth for the efficient LoRA training stack
- TRL / Transformers / PEFT / PyTorch maintainers