# Qwen3-14B with LoRA -- Pre-compiled for AWS Trainium2

Pre-compiled artifacts for running Qwen/Qwen3-14B with LoRA adapters on AWS Trainium2 (trn2.3xlarge).

## Configuration

| Setting | Value |
|---|---|
| Instance type | trn2.3xlarge (4 NeuronCores at LNC=2) |
| Tensor parallel degree | 4 |
| Batch size | 1 |
| Max sequence length | 4096 |
| Data type | BF16 |
| ISA kernels | QKV+Attn ON, MLP OFF |
| LNC | 2 |
| Compile time | {compile_time:.0f}s |
| Neuron SDK | 2.28 (DLAMI 20260227) |
| NxD Inference | 0.8.x |
| vllm-neuron | 0.4.1 |

## Benchmark Results

| Config | Throughput (tok/s) | Latency (s) | Avg Tokens |
|---|---|---|---|
| Adapter A (nicoboss/Uncensored) | {results_a['mean_throughput_tok_s']:.1f} +/- {results_a['std_throughput_tok_s']:.1f} | {results_a['mean_latency_s']:.2f} | {results_a['mean_tokens']:.0f} |
| Adapter B (Wuhall/LoRA) | {results_b['mean_throughput_tok_s']:.1f} +/- {results_b['std_throughput_tok_s']:.1f} | {results_b['mean_latency_s']:.2f} | {results_b['mean_tokens']:.0f} |

## Included LoRA Adapters

| Adapter | Source | Rank | Alpha | Target Modules |
|---|---|---|---|---|
| adapter_a | nicoboss/Qwen3-14B-Uncensored-Lora | 32 | 16 | q/k/v/o/gate/up/down_proj |
| adapter_b | Wuhall/Qwen3-14B-LoRA | 32 | 32 | q/k/v/o/gate/up/down_proj |
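
The adapter checkpoints are wired up through the `adapters.json` file shipped with the artifacts (see the Important Notes below). A minimal sketch of the one field that needs editing, assuming the shipped schema; the path shown is illustrative, and any other fields in the shipped file should be left untouched:

```json
{
  "lora-ckpt-dir": "/home/ubuntu/qwen3-14b-artifacts/lora-adapters"
}
```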

## Important Notes

1. **Base model weights required:** Download Qwen/Qwen3-14B separately (~30 GB); it is not included in the artifacts.
2. **LoRA always required:** When `enable_lora=True`, every request MUST include a `lora_request`. Omitting it raises an `AttributeError`. Internal ticket filed.
3. **trn2.3xlarge required:** Tensor parallelism of 4 needs 4 NeuronCores (LNC=2 default).
4. **SDK-specific:** Artifacts work with Neuron SDK 2.28 (DLAMI 20260227) only; other SDK versions will trigger recompilation or fail to load.
5. **Update LoRA paths:** The `lora-ckpt-dir` in `adapters.json` must be an absolute path matching your local layout.
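
Note 2 in practice: a minimal offline-inference sketch using the standard vLLM Python API, assuming the Neuron plugin exposes the usual `LLM`/`LoRARequest` entry points. The model path, adapter path, and adapter ID are illustrative, not shipped defaults:

```python
from vllm import LLM, SamplingParams
from vllm.lora.request import LoRARequest

# Base model downloaded separately (Note 1); config matches the table above.
llm = LLM(
    model="/home/ubuntu/models/Qwen3-14B",  # illustrative path
    tensor_parallel_size=4,
    max_model_len=4096,
    dtype="bfloat16",
    enable_lora=True,
)

# Note 2: every generate() call must carry a lora_request --
# there is no "base model only" fallback in this build.
outputs = llm.generate(
    ["Explain tensor parallelism in one sentence."],
    SamplingParams(max_tokens=64),
    lora_request=LoRARequest(
        "adapter_a",                                   # adapter name
        1,                                             # unique integer ID
        "/home/ubuntu/lora-adapters/adapter_a",        # illustrative path
    ),
)
print(outputs[0].outputs[0].text)
```

Reusing the same integer ID for the same adapter across requests lets vLLM reuse the already-loaded adapter weights instead of reloading them.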