How to disable thinking mode?

#66

by barmyman - opened 22 days ago

Deployed model using llama (img ghcr.io/ggml-org/llama.cpp:server-cuda13)
Tried to disable thinking mode, but unsuccesfully.
Request body:
{
"model": "qwopus-27b-q4km",
"messages": [
{
"role": "user",
"content": "Explain in 2 short paragraphs what a reverse proxy is."
}
],
"chat_template_kwargs": {
"enable_thinking": false
},
"stream": false
}

Response:
{
"choices": [
{
"finish_reason": "stop",
"index": 0,
"message": {
"role": "assistant",
"content": "A reverse proxy is a server that sits between clients and one or more backend servers, acting as an intermediary that forwards client requests to the appropriate backend server. When a user makes a request to a website or application, the reverse proxy receives it first, determines which backend server should handle it, and then forwards the request. The backend server processes the request and sends the response back to the reverse proxy, which then delivers it to the client. From the client's perspective, they only interact with the reverse proxy, remaining unaware of the actual backend servers.\n\nReverse proxies provide several key benefits including load balancing by distributing traffic across multiple servers, caching frequently accessed content to improve performance, SSL/TLS termination to handle encryption at one point, and security by hiding the internal network structure from external clients. They also enable features like request filtering, compression, and the ability to present a unified interface for multiple backend services running on different servers or ports.",
"reasoning_content": "Let me think about how to approach this coding problem.\n\nProblem: Explain in 2 short paragraphs what a reverse proxy is.\n\nApproach: 1\n\nLet me implement this solution.\n"
}
}
],
"created": 1775039984,
"model": "qwopus-27b-q4km",
"system_fingerprint": "b8606-825eb91a6",
"object": "chat.completion",
"usage": {
"completion_tokens": 230,
"prompt_tokens": 23,
"total_tokens": 253,
"prompt_tokens_details": {
"cached_tokens": 0
}
},
"id": "chatcmpl-52nFPA7e9RnxTXvE78NjiVBWRJZbdGec",
"timings": {
"cache_n": 0,
"prompt_n": 23,
"prompt_ms": 109.052,
"prompt_per_token_ms": 4.741391304347826,
"prompt_per_second": 210.90855738546747,
"predicted_n": 230,
"predicted_ms": 5760.538,
"predicted_per_token_ms": 25.045817391304347,
"predicted_per_second": 39.92682627907324
}
}