Update README.md
README.md CHANGED

@@ -1,3 +1,12 @@
+---
+license: apache-2.0
+language:
+- zh
+- en
+base_model:
+- Qwen/Qwen3-4B-Base
+pipeline_tag: text-generation
+---
 <div align="center">
 <h1>Kwai Summary Attention (KSA)</h1>
 <p align="center">

@@ -283,4 +292,4 @@ KSA is built upon and inspired by the open-source ecosystem. We would like to th
 - **HuggingFace Transformers** — for the model / tokenizer / generation abstractions that make `trust_remote_code` deployment painless.
 - **PyTorch distributed training** — for FSDP, DCP, and the communication primitives that make large-scale pretraining tractable.
 
-We sincerely thank these projects for their outstanding work.
+We sincerely thank these projects for their outstanding work.