Instructions for using Qwen/WebWorld-32B with libraries, inference providers, notebooks, and local apps. Follow the links below to get started.
- Libraries
- Transformers
How to use Qwen/WebWorld-32B with Transformers:
```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="Qwen/WebWorld-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```

```python
# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Qwen/WebWorld-32B")
model = AutoModelForCausalLM.from_pretrained("Qwen/WebWorld-32B")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))
```
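At 32B parameters, the checkpoint is too large to load in full precision on a single consumer GPU. A minimal sketch using standard Transformers memory-saving options (generic options, not Qwen-specific guidance; assumes the `accelerate` package is installed for device mapping):

```python
# Minimal sketch: load the 32B checkpoint in its native precision and
# shard it across available devices. Requires `pip install accelerate`.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("Qwen/WebWorld-32B")
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/WebWorld-32B",
    torch_dtype="auto",  # use the checkpoint's stored precision (e.g. bfloat16)
    device_map="auto",   # spread weights across available GPUs/CPU via accelerate
)
```
- Notebooks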
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use Qwen/WebWorld-32B with vLLM:
Install from pip and serve the model
```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "Qwen/WebWorld-32B"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/WebWorld-32B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
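Because the server speaks the OpenAI API, any OpenAI-compatible client works instead of curl. A minimal Python sketch, assuming the `openai` package is installed and the server above is running on localhost:8000 (the `api_key` value is a placeholder; vLLM ignores it unless you configure one):

```python
# Query the local vLLM server through its OpenAI-compatible API.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")
response = client.chat.completions.create(
    model="Qwen/WebWorld-32B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```
Use Docker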
```shell
docker model run hf.co/Qwen/WebWorld-32B
```
- SGLang
How to use Qwen/WebWorld-32B with SGLang:
Install from pip and serve the model
```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "Qwen/WebWorld-32B" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/WebWorld-32B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
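SGLang's server is also OpenAI-compatible, so the same client libraries apply; for long generations you will usually want streaming. A minimal streaming sketch, assuming the `openai` package is installed and the server above is running on localhost:30000 (the `api_key` placeholder is ignored by default):

```python
# Stream tokens from the local SGLang server as they are generated.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:30000/v1", api_key="EMPTY")
stream = client.chat.completions.create(
    model="Qwen/WebWorld-32B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    stream=True,
)
for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)
print()
```
Use Docker images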
```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "Qwen/WebWorld-32B" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "Qwen/WebWorld-32B",
    "messages": [
      {"role": "user", "content": "What is the capital of France?"}
    ]
  }'
```
- Docker Model Runner
How to use Qwen/WebWorld-32B with Docker Model Runner:
```shell
docker model run hf.co/Qwen/WebWorld-32B
```
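Docker Model Runner also exposes an OpenAI-compatible endpoint. A hedged sketch, assuming host-side TCP access is enabled on Model Runner's documented default port 12434; the exact base URL and enable command can vary by Docker version, so verify with `docker model status` and the Docker docs:

```python
# Query Docker Model Runner's OpenAI-compatible endpoint.
# ASSUMPTION: host TCP access is enabled on the default port 12434 and the
# /engines/v1 path matches your Docker version; no real API key is required.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:12434/engines/v1", api_key="docker")
response = client.chat.completions.create(
    model="hf.co/Qwen/WebWorld-32B",
    messages=[{"role": "user", "content": "What is the capital of France?"}],
)
print(response.choices[0].message.content)
```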
Thank you so much, Qwen team, for these tools!!!
Qwen/WebWorldData, Qwen/WebWorld, and Qwen/SAE-Res...
Thank you very much for releasing all of this to the community!!!
Thank you for sharing your developments with us and, essentially, for what has made the Qwen 3.5 and 3.6 models so smart!
These models are extremely valuable.
Please, if you have any more models you would like to share with the community, I'm very much looking forward to them.
PS (personally from me): Could you consider open-sourcing the weights of Qwen-3.5-Omni? The community already has a model with audio understanding and an encoder (XiaomiMiMo/MiMo-V2.5), though it's huge. On chat.qwen.ai you have a Flash version of Omni and a Plus version, which suggests there are a large Omni model and a small one (presumably 397B and 35B). Perhaps audio input will be possible in future versions of Qwen models, but that's just speculation for now.
yeah 👏
I have never been disappointed by any of your models when they are used for what they were designed for. Thank you. 🫵💪