File size: 1,791 Bytes
{
"losses": [
0.24705615827115252,
0.19863037412578705,
0.20961245244543533,
0.23088445312132536,
0.20231807196396404,
0.21607737393860588,
0.2541877782714437,
0.23250892728756298,
0.26798237174225503,
0.2450843573300517,
0.26578180299547965,
0.26796957241640484,
0.25621661408949875,
0.30977868895424765,
0.26767159106384497,
0.3011666447739117,
0.2829906645929441,
0.2978396859412896,
0.3276536007644609,
0.3177644655283075,
0.3392833017860539,
0.35701846808369736,
0.3581092601059936,
0.3342221752507612,
0.34096981105394664,
0.4076683118997607,
0.36950865692924706,
0.32229581456631423,
0.37231990886793936,
0.36889135412639007,
0.35545787583687344
],
"lrs": [
6.666666666666667e-05,
9.993008576227247e-05,
9.937194443381972e-05,
9.826190093588563e-05,
9.661236384224129e-05,
9.444177243274618e-05,
9.177439057064683e-05,
8.864003547001915e-05,
8.507374438531607e-05,
8.111538294891684e-05,
7.680919953486048e-05,
7.220333063028872e-05,
6.734926274378312e-05,
6.230125686563068e-05,
5.7115741913664264e-05,
5.185068394501791e-05,
4.6564938185035956e-05,
4.131759111665349e-05,
3.616729998467365e-05,
3.1171637098265064e-05,
2.638644626136587e-05,
2.1865218525109495e-05,
1.7658494240397126e-05,
1.3813298094746491e-05,
1.037261344883343e-05,
7.374901848832683e-06,
4.853673085668947e-06,
2.8371106072518195e-06,
1.3477564710088098e-06,
4.02259358460233e-07,
1.1188468644907079e-08
],
"wallclock_s": 947,
"n_examples": 5000,
"epochs": 1,
"mode": "distill",
"lora_rank": 32,
"total_opt_steps": 156,
"num_processes": 2
}
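
The log above can be read programmatically. The following is a minimal sketch, not part of the original repository: the helper names `load_log` and `summarize` are hypothetical, and the steps-per-entry figure assumes the log entries are evenly spaced over `total_opt_steps`, which the file itself does not state.

```python
import json


def load_log(path):
    """Read a training-log JSON file with the keys shown above."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def summarize(log):
    """Derive a few quantities from a training-log dict.

    Assumption (not stated in the log): entries in "losses" and "lrs"
    are recorded at evenly spaced optimizer steps.
    """
    losses, lrs = log["losses"], log["lrs"]
    if len(losses) != len(lrs):
        raise ValueError("expected one learning rate per logged loss")

    # Index of the largest logged learning rate (end of warmup, if any).
    peak = max(range(len(lrs)), key=lrs.__getitem__)

    return {
        "n_logged": len(losses),
        # Approximate optimizer steps between consecutive log entries.
        "steps_per_log": log["total_opt_steps"] / len(losses),
        # Throughput over the whole run, summed across all processes.
        "examples_per_s": log["n_examples"] * log["epochs"] / log["wallclock_s"],
        "peak_lr_index": peak,
        "peak_lr": lrs[peak],
        "first_loss": losses[0],
        "last_loss": losses[-1],
    }
```

Calling `summarize(load_log("train_log.json"))` (the file name is a placeholder) on the log above would report 31 logged entries, roughly 5 optimizer steps per entry, and the peak learning rate at index 1.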