fix: error format wrapping now applies to /v1/chat/completions and generation stats
470e737
Dmitry Beresnev committed on
change timeouts
2c31416
Dmitry Beresnev committed on
add token generation speed to ui
e8080f5
Dmitry Beresnev committed on
fix request parsing
3634ca6
Dmitry Beresnev committed on
Log detailed error bodies for UI failures
7caa6ba
Dmitry Beresnev committed on
Fix 400 for llama.cpp web UI completion requests
677456b
Dmitry Beresnev committed on
Fix web UI chat by adding buffered SSE fallback
6379bd0
Dmitry Beresnev committed on
Fix Docker build for modular llm-manager
58d70b1
Dmitry Beresnev committed on
fix description
952d357
Dmitry Beresnev committed on
fix build bugs
acdc6c1
Dmitry Beresnev committed on
fix readme file
fe156b2
Dmitry Beresnev committed on
add new architectural diagram to readme file
a51a89f
Dmitry Beresnev committed on
Refactor the C++ LLM manager into modular components, move Python modules under python/, and keep the current control-plane behavior intact. The C++ server now has clearer separation for config, model lifecycle, runtime services, request parsing, HTTP helpers, and server routing, while Docker build/runtime paths were updated to compile multiple C++ files and load Python code from the new package folder.
332826f
Dmitry Beresnev committed on
add auth, token policy, queue scheduler, cancel flow, etc.
d9ce859
Dmitry Beresnev committed on
add new endpoint to cancel all processing prompts
8ef326a
Dmitry Beresnev committed on
add new build profile
a97386f
Dmitry Beresnev committed on
fix encoding
d211568
Dmitry Beresnev committed on
fix model config
057edf0
Dmitry Beresnev committed on
fix proxied response in llm manager
53e9f39
Dmitry Beresnev committed on
fix routing in llm manager
a4ee76d
Dmitry Beresnev committed on
add cpp server
fc0860f
Dmitry Beresnev committed on
change llm model
f41621b
Dmitry Beresnev committed on
change llm model
4f2dffc
Dmitry Beresnev committed on
change model to Qwen2.5-Math-7B-Instruct-GGUF
cca3c7b
Dmitry Beresnev committed on
change llm model to qwen2 math
fe7089d
Dmitry Beresnev committed on
change llm model to mistral
97d9520
Dmitry Beresnev committed on
fix dockerfile
c33410f
Dmitry Beresnev committed on
change compilation flags
0e913e4
Dmitry Beresnev committed on
change compilation flags
1a4efad
Dmitry Beresnev committed on
reduce context and batch
34775a7
Dmitry Beresnev committed on
fix repo name of model
dc883f9
Dmitry Beresnev committed on
fix repo of model
c7c8563
Dmitry Beresnev committed on
fix cmd in dockerfile
0fbce92
Dmitry Beresnev committed on
fix dockerfile
c261631
Dmitry Beresnev committed on
switch to qwen model via cpp server
9a590ac
Dmitry Beresnev committed on
Log elapsed time and token rate when the response arrives.
a8f6b6b
Dmitry Beresnev committed on
Add “slow request” logging: log when a request exceeds the budgeted prompt and gets compacted
130d9e3
Dmitry Beresnev committed on
add simple compacting
6381e7f
Dmitry Beresnev committed on
fix context window size
62a5a49
Dmitry Beresnev committed on
fix payload processing
e1e4b82
Dmitry Beresnev committed on
fix logger
e9b8569
Dmitry Beresnev committed on
fix app to handle exceptions
7d65cc9
Dmitry Beresnev committed on
add requests logging
90f1c82
Dmitry Beresnev committed on
fix dockerfile
950f41b
Dmitry Beresnev committed on
fix dockerfile
f64a284
Dmitry Beresnev committed on
fix gitignore, app and logger, etc
7763bf4
Dmitry Beresnev committed on
add readme
c384ef1
Dmitry Beresnev committed on
add readme
1ccd330
Dmitry Beresnev committed on
Add automatic API documentation and in-memory model caching