- **2026-04-28** – KSA technical report is released on arXiv: [arXiv:2604.24432](https://arxiv.org/abs/2604.24432).
- **2026-04-28** – Code, training recipes, the block-sparse kernel, and the HuggingFace `trust_remote_code` template are open-sourced in this repository.
- **2026-05-08** – [KSA-4B-base](https://huggingface.co/OpenOneRec/KSA-4B-base) (CPT from Qwen3-4B, 128K context) weights are released on HuggingFace.

## ✨ Highlights

## 🤗 Model Zoo

Pretrained checkpoints are published on HuggingFace:

| Model       | Backbone | Parameters | Context | Training              | Link |
| :---------- | :------- | :--------- | :------ | :-------------------- | :--- |
| KSA-4B-base | Qwen3-4B | 4B         | 128K    | Continual pretraining | [🤗 OpenOneRec/KSA-4B-base](https://huggingface.co/OpenOneRec/KSA-4B-base) |

The 1.9B *from-scratch* configuration is provided as a reproducible recipe only; no 1.9B weights will be released.
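The released checkpoint loads through the standard HuggingFace `AutoModelForCausalLM` path with `trust_remote_code`. A minimal sketch (the model id comes from the table above; the prompt and generation settings are illustrative assumptions, not recommended defaults):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# KSA ships custom modeling code on the Hub, so trust_remote_code=True is required.
model_id = "OpenOneRec/KSA-4B-base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",   # use the dtype stored in the checkpoint
    device_map="auto",    # place layers on available GPUs/CPU
)

prompt = "Block-sparse attention enables long-context modeling because"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Requires `transformers` and `torch`; the first call downloads the weights from the Hub.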

We are actively working on:

- [x] Technical report on arXiv ([arXiv:2604.24432](https://arxiv.org/abs/2604.24432)).
- [x] Release the 4B continual-pretraining checkpoint ([KSA-4B-base](https://huggingface.co/OpenOneRec/KSA-4B-base)).
- [ ] Expanded evaluation scripts for RULER / NIAH / LongBench v2 reproduction.
- [ ] A reference serving stack with the ring-buffer KV cache.
- [ ] Additional ablations and tutorials.

- **HuggingFace Transformers** – for the model / tokenizer / generation abstractions that make `trust_remote_code` deployment painless.
- **PyTorch distributed training** – for FSDP, DCP, and the communication primitives that make large-scale pretraining tractable.

We sincerely thank these projects for their outstanding work.