Commit History

fix exceptions
dc32982

Dmitry Beresnev committed on

fix error format: wrapping now applies to /v1/chat/completions and generation stats
470e737

Dmitry Beresnev committed on

change timeouts
2c31416

Dmitry Beresnev committed on

add token generation speed to ui
e8080f5

Dmitry Beresnev committed on

fix request parsing
3634ca6

Dmitry Beresnev committed on

Log detailed error bodies for UI failures
7caa6ba

Dmitry Beresnev committed on

Fix 400 for llama.cpp web UI completion requests
677456b

Dmitry Beresnev committed on

Fix web UI chat by adding buffered SSE fallback
6379bd0

Dmitry Beresnev committed on

Fix Docker build for modular llm-manager
58d70b1

Dmitry Beresnev committed on

fix description
952d357

Dmitry Beresnev committed on

fix build bugs
acdc6c1

Dmitry Beresnev committed on

fix readme file
fe156b2

Dmitry Beresnev committed on

add new architectural diagram to readme file
a51a89f

Dmitry Beresnev committed on

Refactor the C++ LLM manager into modular components, move Python modules under python/, and keep the current control-plane behavior intact. The C++ server now has clearer separation of config, model lifecycle, runtime services, request parsing, HTTP helpers, and server routing; the Docker build and runtime paths now compile multiple C++ files and load Python code from the new package folder.
332826f

Dmitry Beresnev committed on

add auth, token policy, queue scheduler, and cancel flow, etc
d9ce859

Dmitry Beresnev committed on

add new endpoint to cancel all processing prompts
8ef326a

Dmitry Beresnev committed on

add new build profile
a97386f

Dmitry Beresnev committed on

fix encoding
d211568

Dmitry Beresnev committed on

fix model config
057edf0

Dmitry Beresnev committed on

fix proxied response in llm manager
53e9f39

Dmitry Beresnev committed on

fix routing in llm manager
a4ee76d

Dmitry Beresnev committed on

add cpp server
fc0860f

Dmitry Beresnev committed on

change llm model
f41621b

Dmitry Beresnev committed on

change llm model
4f2dffc

Dmitry Beresnev committed on

change model to Qwen2.5-Math-7B-Instruct-GGUF
cca3c7b

Dmitry Beresnev committed on

change llm model to qwen2 math
fe7089d

Dmitry Beresnev committed on

change llm model to mistral
97d9520

Dmitry Beresnev committed on

fix dockerfile
c33410f

Dmitry Beresnev committed on

change compilation flags
0e913e4

Dmitry Beresnev committed on

change compilation flags
1a4efad

Dmitry Beresnev committed on

reduce context and batch
34775a7

Dmitry Beresnev committed on

fix repo name of model
dc883f9

Dmitry Beresnev committed on

fix repo of model
c7c8563

Dmitry Beresnev committed on

fix cmd in dockerfile
0fbce92

Dmitry Beresnev committed on

fix dockerfile
c261631

Dmitry Beresnev committed on

switch to qwen model via cpp server
9a590ac

Dmitry Beresnev committed on

Log elapsed time and token rate when the response arrives.
a8f6b6b

Dmitry Beresnev committed on

add “slow request” logging: log when a request exceeds the prompt budget and gets compacted
130d9e3

Dmitry Beresnev committed on

add simple compacting
6381e7f

Dmitry Beresnev committed on

fix context window size
62a5a49

Dmitry Beresnev committed on

fix payload processing
e1e4b82

Dmitry Beresnev committed on

fix logger
e9b8569

Dmitry Beresnev committed on

fix app to handle exceptions
7d65cc9

Dmitry Beresnev committed on

add requests logging
90f1c82

Dmitry Beresnev committed on

fix dockerfile
950f41b

Dmitry Beresnev committed on

fix dockerfile
f64a284

Dmitry Beresnev committed on

fix gitignore, app and logger, etc
7763bf4

Dmitry Beresnev committed on

add readme
c384ef1

Dmitry Beresnev committed on

add readme
1ccd330

Dmitry Beresnev committed on

Add automatic API documentation and in-memory model caching
2295174

Dmitry Beresnev committed on