# Qwen3-14B with LoRA -- Pre-compiled for AWS Trainium2
Pre-compiled artifacts for running Qwen/Qwen3-14B with LoRA adapters on AWS Trainium2 (trn2.3xlarge).
## Configuration
| Setting | Value |
|---|---|
| Instance type | trn2.3xlarge (4 NeuronCores at LNC=2) |
| Tensor parallel | 4 |
| Batch size | 1 |
| Max sequence length | 4096 |
| Data type | BF16 |
| ISA Kernels | QKV+Attn ON, MLP OFF |
| LNC | 2 |
| Compile time | {compile_time:.0f}s |
| SDK | Neuron SDK 2.28 (DLAMI 20260227) |
| NxD Inference | 0.8.x |
| vLLM-neuron | 0.4.1 |
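A minimal launch sketch matching the configuration above, using flags from upstream vLLM's OpenAI-compatible server (`vllm serve`). The adapter paths are placeholders, and the `vllm-neuron 0.4.1` fork noted in the table may expose different or additional Neuron-specific flags, so treat this as a starting point rather than the exact command:

```shell
# Sketch only: serve the base model with both bundled adapters registered.
# Replace the /abs/path/... placeholders with your local adapter directories.
vllm serve Qwen/Qwen3-14B \
  --enable-lora \
  --lora-modules adapter_a=/abs/path/to/adapter_a adapter_b=/abs/path/to/adapter_b \
  --tensor-parallel-size 4 \
  --max-model-len 4096 \
  --dtype bfloat16
```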
## Benchmark Results
| Config | Throughput (tok/s) | Latency (s) | Avg Tokens |
|---|---|---|---|
| Adapter A (nicoboss/Uncensored) | {results_a['mean_throughput_tok_s']:.1f} +/- {results_a['std_throughput_tok_s']:.1f} | {results_a['mean_latency_s']:.2f} | {results_a['mean_tokens']:.0f} |
| Adapter B (Wuhall/LoRA) | {results_b['mean_throughput_tok_s']:.1f} +/- {results_b['std_throughput_tok_s']:.1f} | {results_b['mean_latency_s']:.2f} | {results_b['mean_tokens']:.0f} |
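For clarity on how the table's columns relate, here is a small stdlib-only sketch of the aggregation: per-run throughput is generated tokens divided by wall-clock latency, and the table reports the mean and sample standard deviation across runs. The run measurements below are hypothetical, not the actual benchmark data:

```python
from statistics import mean, stdev

# Hypothetical per-run measurements: (tokens generated, wall-clock seconds).
runs = [(512, 21.4), (498, 20.9), (505, 21.1)]

throughputs = [tok / sec for tok, sec in runs]  # tok/s, one value per run

summary = {
    "mean_throughput_tok_s": mean(throughputs),
    "std_throughput_tok_s": stdev(throughputs),   # sample std dev across runs
    "mean_latency_s": mean(sec for _, sec in runs),
    "mean_tokens": mean(tok for tok, _ in runs),
}
print(f"{summary['mean_throughput_tok_s']:.1f} +/- "
      f"{summary['std_throughput_tok_s']:.1f} tok/s")
```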
## Included LoRA Adapters
| Adapter | Source | Rank | Alpha | Target Modules |
|---|---|---|---|---|
| adapter_a | nicoboss/Qwen3-14B-Uncensored-Lora | 32 | 16 | q/k/v/o/gate/up/down_proj |
| adapter_b | Wuhall/Qwen3-14B-LoRA | 32 | 32 | q/k/v/o/gate/up/down_proj |
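The rank/alpha columns above determine each adapter's effective scaling: the standard LoRA update is W' = W + (alpha/r)·B·A, so with r=32 the two adapters apply scales of 16/32 = 0.5 and 32/32 = 1.0 respectively. A framework-free sketch of that arithmetic:

```python
def lora_scale(alpha, rank):
    """Effective LoRA scaling factor, alpha / r."""
    return alpha / rank

def apply_lora(w, a, b, alpha, rank):
    """Merge a LoRA update into a dense weight: W + (alpha/r) * B @ A.
    Plain nested lists stand in for real tensors here."""
    scale = lora_scale(alpha, rank)
    rows, cols, inner = len(b), len(a[0]), len(a)
    return [
        [
            w[i][j] + scale * sum(b[i][k] * a[k][j] for k in range(inner))
            for j in range(cols)
        ]
        for i in range(rows)
    ]

print(lora_scale(16, 32), lora_scale(32, 32))  # adapter_a vs adapter_b
```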
## Important Notes
- **Base model weights required:** Download `Qwen/Qwen3-14B` separately (~30 GB).
- **LoRA always required:** When `enable_lora=True`, every request MUST include a `lora_request`. Omitting it causes an `AttributeError`. Internal ticket filed.
- **trn2.3xlarge required:** tp=4 needs 4 NeuronCores (LNC=2 default).
- **SDK-specific:** Artifacts work with Neuron SDK 2.28 (DLAMI 20260227) only.
- **Update LoRA paths:** The `lora-ckpt-dir` in `adapters.json` must be an absolute path matching your local layout.
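To ease the last point, here is a stdlib-only helper that rewrites relative `lora-ckpt-dir` entries to absolute paths. The `adapters.json` layout it assumes (a top-level dict of adapter entries, each carrying a `lora-ckpt-dir` key) is inferred from the note above, so adjust it to the actual schema shipped with the artifacts:

```python
import json
from pathlib import Path

def absolutize_ckpt_dirs(adapters_json, root):
    """Rewrite every relative `lora-ckpt-dir` in adapters.json to an
    absolute path under `root`. Assumed schema: {adapter_name: {...,
    "lora-ckpt-dir": <path>}, ...} -- adapt if yours differs."""
    path = Path(adapters_json)
    config = json.loads(path.read_text())
    for entry in config.values():
        ckpt = Path(entry["lora-ckpt-dir"])
        if not ckpt.is_absolute():
            entry["lora-ckpt-dir"] = str((Path(root) / ckpt).resolve())
    path.write_text(json.dumps(config, indent=2))
```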