# Qwen3-14B with LoRA -- Pre-compiled for AWS Trainium2
Pre-compiled artifacts for running Qwen/Qwen3-14B with LoRA adapters on AWS Trainium2 (trn2.3xlarge).
## Configuration
| Setting | Value |
|---|---|
| Instance type | trn2.3xlarge (4 NeuronCores at LNC=2) |
| Tensor parallel | 4 |
| Batch size | 1 |
| Max sequence length | 4096 |
| Data type | BF16 |
| ISA Kernels | QKV+Attn ON, MLP OFF |
| LNC | 2 |
| Compile time | {compile_time:.0f}s |
| SDK | Neuron SDK 2.28 (DLAMI 20260227) |
| NxD Inference | 0.8.x |
| vLLM-neuron | 0.4.1 |
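A minimal launch sketch matching the configuration above, using flags from upstream vLLM's OpenAI-compatible server (`vllm serve`). The adapter paths are placeholders, and the `vllm-neuron 0.4.1` fork noted in the table may expose different or additional Neuron-specific flags, so treat this as a starting point rather than the exact command:

```shell
# Sketch only: serve the base model with both bundled adapters registered.
# Replace the /abs/path/... placeholders with your local adapter directories.
vllm serve Qwen/Qwen3-14B \
  --enable-lora \
  --lora-modules adapter_a=/abs/path/to/adapter_a adapter_b=/abs/path/to/adapter_b \
  --tensor-parallel-size 4 \
  --max-model-len 4096 \
  --dtype bfloat16
```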
## Benchmark Results
| Config | Throughput (tok/s) | Latency (s) | Avg Tokens |
|---|---|---|---|
| Adapter A (nicoboss/Uncensored) | {results_a['mean_throughput_tok_s']:.1f} +/- {results_a['std_throughput_tok_s']:.1f} | {results_a['mean_latency_s']:.2f} | {results_a['mean_tokens']:.0f} |
| Adapter B (Wuhall/LoRA) | {results_b['mean_throughput_tok_s']:.1f} +/- {results_b['std_throughput_tok_s']:.1f} | {results_b['mean_latency_s']:.2f} | {results_b['mean_tokens']:.0f} |
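For clarity on how the table's columns relate, here is a small stdlib-only sketch of the aggregation: per-run throughput is generated tokens divided by wall-clock latency, and the table reports the mean and sample standard deviation across runs. The run measurements below are hypothetical, not the actual benchmark data:

```python
from statistics import mean, stdev

# Hypothetical per-run measurements: (tokens generated, wall-clock seconds).
runs = [(512, 21.4), (498, 20.9), (505, 21.1)]

throughputs = [tok / sec for tok, sec in runs]  # tok/s, one value per run

summary = {
    "mean_throughput_tok_s": mean(throughputs),
    "std_throughput_tok_s": stdev(throughputs),   # sample std dev across runs
    "mean_latency_s": mean(sec for _, sec in runs),
    "mean_tokens": mean(tok for tok, _ in runs),
}
print(f"{summary['mean_throughput_tok_s']:.1f} +/- "
      f"{summary['std_throughput_tok_s']:.1f} tok/s")
```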
## Included LoRA Adapters
| Adapter | Source | Rank | Alpha | Target Modules |
|---|---|---|---|---|
| adapter_a | nicoboss/Qwen3-14B-Uncensored-Lora | 32 | 16 | q/k/v/o/gate/up/down_proj |
| adapter_b | Wuhall/Qwen3-14B-LoRA | 32 | 32 | q/k/v/o/gate/up/down_proj |
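The rank/alpha columns above determine each adapter's effective scaling: the standard LoRA update is W' = W + (alpha/r)·B·A, so with r=32 the two adapters apply scales of 16/32 = 0.5 and 32/32 = 1.0 respectively. A framework-free sketch of that arithmetic:

```python
def lora_scale(alpha, rank):
    """Effective LoRA scaling factor, alpha / r."""
    return alpha / rank

def apply_lora(w, a, b, alpha, rank):
    """Merge a LoRA update into a dense weight: W + (alpha/r) * B @ A.
    Plain nested lists stand in for real tensors here."""
    scale = lora_scale(alpha, rank)
    rows, cols, inner = len(b), len(a[0]), len(a)
    return [
        [
            w[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(inner))
            for j in range(cols)
        ]
        for i in range(rows)
    ]

print(lora_scale(16, 32), lora_scale(32, 32))  # adapter_a vs adapter_b
```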
## Important Notes
- **Base model weights required:** Download `Qwen/Qwen3-14B` separately (~30 GB).
- **LoRA always required:** When `enable_lora=True`, every request MUST include a `lora_request`. Omitting it causes an `AttributeError`. Internal ticket filed.
- **trn2.3xlarge required:** tp=4 needs 4 NeuronCores (LNC=2 default).
- **SDK-specific:** Artifacts work with Neuron SDK 2.28 (DLAMI 20260227) only.
- **Update LoRA paths:** The `lora-ckpt-dir` in `adapters.json` must be an absolute path matching your local layout.
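To ease the last point, here is a stdlib-only helper that rewrites relative `lora-ckpt-dir` entries to absolute paths. The `adapters.json` layout it assumes (a top-level dict of adapter entries, each carrying a `lora-ckpt-dir` key) is inferred from the note above, so adjust it to the actual schema shipped with the artifacts:

```python
import json
from pathlib import Path

def absolutize_ckpt_dirs(adapters_json, root):
    """Rewrite every relative `lora-ckpt-dir` in adapters.json to an
    absolute path under `root`. Assumed schema: {adapter_name: {...,
    "lora-ckpt-dir": <path>}, ...} -- adapt if yours differs."""
    path = Path(adapters_json)
    config = json.loads(path.read_text())
    for entry in config.values():
        ckpt = Path(entry["lora-ckpt-dir"])
        if not ckpt.is_absolute():
            entry["lora-ckpt-dir"] = str((Path(root) / ckpt).resolve())
    path.write_text(json.dumps(config, indent=2))
```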