HiCI: Hierarchical Construction-Integration for Long-Context Attention
Paper • arXiv 2603.20843 • Published
This is a LoRA adapter for Qwen3-8B with the HiCI (Hierarchical Construction-Integration) memory architecture, trained for long-context understanding at up to 48K tokens.
Paper: HiCI (arXiv 2603.20843)
Base: LongLoRA (ICLR 2024 Oral)
Three-stage hierarchy per transformer layer:
Input (48K tokens) → 4 segments × 12K
Stage 1: 8 local slots per segment → L_i
Stage 2: multi-view stats → K=4 global slots G
Stage 3: Q=[chunk], KV=[G, L_i, chunk] → Flash Attention
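The three stages can be sketched with toy tensor shapes. This is a minimal illustration only: the dimensions are scaled down from the real 48K-token setting, and mean-pooling stands in for the learned Construction/Integration modules.

```python
import torch

# Toy sizes, scaled down from the real setting (48K tokens, 4 segments x 12K)
seq_len, seg_len, d = 64, 16, 8
num_local_slots, num_global_slots = 8, 4

x = torch.randn(seq_len, d)
segments = x.split(seg_len)                      # 4 segments of 16 "tokens"

# Stage 1: compress each segment into 8 local slots L_i
# (stand-in: strided mean-pooling; the real Local Construction module is learned)
local = [s.reshape(num_local_slots, -1, d).mean(dim=1) for s in segments]

# Stage 2: integrate all local slots into K=4 global slots G
# (stand-in: pooling over the stacked local slots)
all_local = torch.cat(local)                     # (32, d)
G = all_local.reshape(num_global_slots, -1, d).mean(dim=1)  # (4, d)

# Stage 3: per chunk, the attention keys/values are [G, L_i, chunk]
kv = torch.cat([G, local[0], segments[0]])       # (4 + 8 + 16, d)
print(kv.shape)  # torch.Size([28, 8])
```

The point of Stage 3 is that each chunk attends over a KV set whose size is bounded by `global_slots + local_slots + chunk_len`, so attention cost stays linear in the number of chunks rather than quadratic in sequence length.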
adapter_model.safetensors (27 MB)
└── LoRA adapters (r=8, alpha=16): q_proj, k_proj, v_proj, o_proj
trainable_params.bin (~4 GB)
├── global_memory.* → Local Construction modules (36 layers)
├── hierarchical_aggregator.* → Global Integration modules (36 layers)
├── self_attn.q_norm / k_norm → QK-Norm weights (Qwen3-specific, 36 layers)
├── input_layernorm / post_attention_layernorm → LayerNorm weights (36 layers)
├── model.embed_tokens.weight → Token embeddings
└── model.norm.weight → Final LayerNorm
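To sanity-check which of the components above a checkpoint actually covers, the state-dict keys can be grouped by these prefixes. A sketch with hypothetical key names; the real names come from `torch.load("trainable_params.bin").keys()`:

```python
from collections import Counter

# Hypothetical keys mirroring the trainable_params.bin layout above
keys = [
    "model.layers.0.self_attn.global_memory.proj.weight",
    "model.layers.0.self_attn.hierarchical_aggregator.proj.weight",
    "model.layers.0.self_attn.q_norm.weight",
    "model.layers.0.self_attn.k_norm.weight",
    "model.layers.0.input_layernorm.weight",
    "model.layers.0.post_attention_layernorm.weight",
    "model.embed_tokens.weight",
    "model.norm.weight",
]

COMPONENTS = (
    "global_memory", "hierarchical_aggregator",
    "q_norm", "k_norm",
    "input_layernorm", "post_attention_layernorm",
    "embed_tokens",
)

def component_of(key: str) -> str:
    """Map a parameter name to its HiCI component; 'other' flags unexpected keys."""
    for c in COMPONENTS:
        if c in key:
            return c
    return "final_norm" if key.startswith("model.norm") else "other"

counts = Counter(component_of(k) for k in keys)
print(dict(counts))
```

Any key landing in the `"other"` bucket would indicate a mismatch between the checkpoint and the expected HiCI layout.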
Requires qwen3_attn_hici.py from this repo.
import torch
import transformers
from peft import PeftModel
# Download qwen3_attn_hici.py from this repo first
import qwen3_attn_hici as hici_attn
# 1. Replace attention with HiCI BEFORE loading model
hici_attn.MIXED_GROUP_TRAINING = False
hici_attn.replace_qwen3_attn(
    use_flash_attn=True, use_full=False, use_hierarchical_forward=True
)
# 2. Load base model
base_model = transformers.AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-8B",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
# 3. Register HiCI modules (must match training config)
hici_attn.register_hici_to_model(
    base_model,
    num_memory_slots=8,
    global_slots=4,
    num_heads=8,
    bottleneck_dim=512,
)
# 4. Load LoRA adapter + trainable_params
model = PeftModel.from_pretrained(base_model, "ZengXiangyu/Qwen3-8b-HiCI-48k-500steps")
# Load HiCI params (embed, norm, global_memory, hierarchical_aggregator).
# Sketch: fetch trainable_params.bin from the Hub and load it manually;
# strict=False because the file holds only the HiCI-specific weights.
# (Skip this step if the repo's HiCI-aware load script already handles it.)
from huggingface_hub import hf_hub_download
trainable_params_path = hf_hub_download(
    repo_id="ZengXiangyu/Qwen3-8b-HiCI-48k-500steps",
    filename="trainable_params.bin",
)
hici_state = torch.load(trainable_params_path, map_location="cpu")
model.load_state_dict(hici_state, strict=False)
# 5. Tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained("ZengXiangyu/Qwen3-8b-HiCI-48k-500steps")
@article{zeng2026hici,
title={HiCI: Hierarchical Construction-Integration for Long-Context Attention},
author={Zeng, Xiangyu and Xu, Qi and Wang, Yunke and Xu, Chang},
journal={arXiv preprint arXiv:2603.20843},
year={2026}
}
Apache 2.0 (following the Qwen3 license)