OmniMesh-Qwen-v0.1 (Proof of Concept)
What is this?
This is a standard Qwen2.5-0.5B-Instruct GGUF file. On its own, it has a strict 4K context limit and no special abilities.
This model is intended to be run exclusively alongside the OmniMesh: Infinite Memory Engine, which can be found here: 👉 GitHub: OmniMesh-Infinite-Memory-Engine
The OmniMesh Architecture
OmniMesh patches the llama.cpp inference engine directly. By downloading the OmniMesh engine from GitHub and running this GGUF file through it, you gain:
- Infinite RAM-Resident Memory: Ingest datasets far beyond the model's 4K context window (e.g. 1.5 MB+ / 400K+ tokens), bounded only by available RAM.
- Sub-Millisecond Retrieval: Relevant context is injected natively via an Okapi BM25 index implemented in C++.
- Zero-Shot Context Injection: No fine-tuning or vector databases required.
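The engine's C++ BM25 index is not shown here, but the ranking idea behind it is standard Okapi BM25. Below is a minimal Python sketch of how BM25 scores ingested chunks against a query; the `k1` and `b` values are common textbook defaults, not the engine's actual parameters, and the tokenization is assumed to be simple word lists:

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.5, b=0.75):
    """Score each document (a list of terms) against the query with Okapi BM25."""
    N = len(docs)
    avgdl = sum(len(d) for d in docs) / N  # average document length
    # document frequency: in how many docs each term appears
    df = Counter()
    for d in docs:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs:
        tf = Counter(d)  # term frequency within this doc
        score = 0.0
        for t in query_terms:
            if t not in tf:
                continue
            idf = math.log(1 + (N - df[t] + 0.5) / (df[t] + 0.5))
            # BM25 term weight: tf saturation (k1) and length normalization (b)
            score += idf * tf[t] * (k1 + 1) / (
                tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            )
        scores.append(score)
    return scores
```

At query time, the highest-scoring chunks are what would be injected into the model's 4K prompt window, which is why no fine-tuning or vector database is needed.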
Note on Model Size
This is a tiny 496M parameter model used simply as a Proof-of-Concept for the OmniMesh Engine. It may hallucinate or struggle with complex reasoning. You can download much larger GGUF models (like 7B or 14B) and run them through OmniMesh to gain infinite memory with vastly superior reasoning skills.
⚠️ Before You Run (Required)
This model will not have infinite memory if you run it in standard Ollama, LM Studio, or vanilla llama.cpp. It will just act like a normal 4K context model.
To unlock the infinite RAM-resident memory, you must run it through the custom C++ engine:
- Go to the GitHub repo: OmniMesh-Infinite-Memory-Engine
- Download the `OmniMesh-V4-FinalEngine` release.
- Place this `.gguf` file in the same folder as `OmniMesh-CUDA.exe`.
Usage
Once you have the engine, open your terminal and run:
.\OmniMesh-CUDA.exe -m OmniMesh-Qwen-v0.1.gguf -c 4096 -ngl 99
Then use /ingest <file.txt> to chunk and load your data directly into RAM.
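The exact chunking performed by `/ingest` is internal to the engine, but the general technique is splitting a file into overlapping word windows so no fact is cut in half at a chunk boundary. A minimal sketch, assuming word-based chunks (the `chunk_size` and `overlap` values here are illustrative, not the engine's actual settings):

```python
def chunk_text(text, chunk_size=256, overlap=32):
    """Split text into overlapping word-based chunks.

    Hypothetical stand-in for the engine's /ingest step: each chunk
    shares `overlap` words with the previous one so that sentences
    spanning a boundary remain retrievable from at least one chunk.
    """
    words = text.split()
    chunks = []
    step = chunk_size - overlap  # advance less than a full chunk each time
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        if chunk:
            chunks.append(" ".join(chunk))
    return chunks
```

Each resulting chunk would then be indexed (e.g. by BM25) and held in RAM until retrieval time.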