AttnRes DevOps GPT

A 48M-parameter GPT trained on DevOps/GitOps data (Kubernetes, Helm, Terraform, ArgoCD, Flux) with a novel Attention Residual (AttnRes) mechanism.

What is AttnRes?

Instead of the standard fixed residual (x + sublayer(x)), AttnRes uses a learned attention aggregation over previous hidden states within the same block:

residual = x + sigmoid(alpha) * (attn_aggregate(prev_states) - x)
h = residual + sublayer(x)
  • alpha is initialized to -4 → sigmoid(-4) ≈ 0.018 (starts as a classic residual)
  • The model gradually opens the AttnRes channel as it learns
  • Layers are grouped in blocks of 4; each layer attends only to previous states in its block
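The gated residual above can be sketched in PyTorch. This is an illustrative reimplementation, not the model's actual code: the module name `AttnResGate` and the use of `nn.MultiheadAttention` as the aggregator are assumptions; only the gating formula and the alpha init of -4 come from the description above.

```python
import torch
import torch.nn as nn

class AttnResGate(nn.Module):
    """Sketch of the AttnRes residual: a learned gate blends the plain
    residual stream x with an attention aggregate over hidden states
    from earlier layers in the same block."""

    def __init__(self, d_model: int, n_heads: int = 8):
        super().__init__()
        # Aggregator over previous states (assumed to be standard MHA here)
        self.agg = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # alpha starts at -4, so sigmoid(alpha) ≈ 0.018: near-classic residual
        self.alpha = nn.Parameter(torch.tensor(-4.0))

    def forward(self, x: torch.Tensor, prev_states: list[torch.Tensor]) -> torch.Tensor:
        # prev_states: (B, T, d_model) tensors from earlier layers of the block
        if not prev_states:
            return x  # first layer of a block has nothing to attend to
        mem = torch.cat(prev_states, dim=1)   # (B, T * n_prev, d_model)
        agg, _ = self.agg(x, mem, mem)        # attention aggregate over prev states
        gate = torch.sigmoid(self.alpha)
        # residual = x + sigmoid(alpha) * (attn_aggregate(prev_states) - x)
        return x + gate * (agg - x)
```

At initialization the gate is nearly closed, so each layer behaves like a vanilla residual; training can then open the AttnRes channel gradually.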

Results

Trained on ~50M tokens of DevOps corpus (GitHub repos, official docs, Stack Overflow):

Variant    Best val ppl   Final val ppl   Params
Baseline   11.99          13.66           46.4M
AttnRes    11.74          13.91           48.0M

AttnRes reached a ~2% lower best validation perplexity and converged faster, though its final validation perplexity was slightly higher than the baseline's. Qualitatively, AttnRes generates more syntactically correct YAML/Terraform/kubectl output.

Architecture

GPTConfig(
    vocab_size = 16_384,   # BPE tokenizer trained on DevOps corpus
    max_seq    = 512,
    d_model    = 512,
    n_layers   = 12,
    n_heads    = 8,
    d_ff       = 2_048,
    dropout    = 0.1,
    variant    = "attnres",
    block_size = 4,
)

Usage

import torch
from huggingface_hub import hf_hub_download
from model import GPT, GPTConfig
from tokenizers import ByteLevelBPETokenizer

# Load model
ckpt = torch.load(hf_hub_download("roanbrasil/attnres-devops-gpt", "attnres_latest.pt"),
                  map_location="cpu", weights_only=False)
cfg   = GPTConfig(**ckpt["config"])
model = GPT(cfg)
state = {k.replace("_orig_mod.", ""): v for k, v in ckpt["model"].items()}
model.load_state_dict(state)
model.eval()

# Load tokenizer
vocab  = hf_hub_download("roanbrasil/attnres-devops-gpt", "tokenizer/vocab.json")
merges = hf_hub_download("roanbrasil/attnres-devops-gpt", "tokenizer/merges.txt")
tok = ByteLevelBPETokenizer(vocab, merges)

# Generate
prompt  = "apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: my-app"
ids     = tok.encode(prompt).ids
x       = torch.tensor([ids])
out     = model.generate(x, max_new=100, temperature=0.7, top_k=40)
print(tok.decode(out[0].tolist()))

Training

  • Corpus: ~50M tokens from GitHub (kubernetes/helm/terraform repos), official docs, Stack Overflow
  • Tokenizer: ByteLevelBPE, vocab_size=16384, trained on domain corpus
  • Steps: 50k, batch_size=16, grad_accum=4, seq_len=512
  • Optimizer: AdamW, lr=3e-4, cosine decay, warmup=2k steps
  • Gate params (alpha) use 100x higher LR to allow AttnRes to open during training
  • Hardware: NVIDIA RTX 3080 (~4.5h)
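The per-parameter learning-rate trick for the gate parameters can be expressed with AdamW parameter groups. A minimal sketch, assuming the gates are the parameters whose names contain "alpha" (the actual naming in the training code is not shown in this card):

```python
import torch

def build_optimizer(model: torch.nn.Module, base_lr: float = 3e-4) -> torch.optim.AdamW:
    """Give AttnRes gate parameters a 100x higher LR than the rest,
    so the gates can open during training."""
    gate_params, other_params = [], []
    for name, param in model.named_parameters():
        # Assumption: gate parameters are identified by "alpha" in their name
        (gate_params if "alpha" in name else other_params).append(param)
    return torch.optim.AdamW([
        {"params": other_params, "lr": base_lr},
        {"params": gate_params, "lr": base_lr * 100},  # 100x LR for the gates
    ])
```

The cosine decay with 2k warmup steps would then be applied on top of these group learning rates via a scheduler.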