Unofficial Zero-Cost Guide for Running N-ATLaS: Local vs Cloud Options
For most people who want to run N-ATLaS cost-free, either locally on a laptop or in the cloud, I recommend the following options:
🖥️ Local (using llama.cpp)
First, install llama.cpp and download the appropriate GGUF from tosinamuda/N-ATLaS-GGUF:
| RAM | Recommended GGUF | File | Size |
|---|---|---|---|
| 6-8GB | Q4_K_M | N-ATLaS-GGUF-Q4_K_M.gguf | 4.92 GB |
| 8-10GB | Q5_K_M | N-ATLaS-GGUF-Q5_K_M.gguf | 5.73 GB |
| 10-12GB | Q6_K | N-ATLaS-GGUF-Q6_K.gguf | 6.6 GB |
| 12-16GB | Q8_0 | N-ATLaS-GGUF-Q8_0.gguf | 8.54 GB |
| 24GB+ | F16 | N-ATLaS-GGUF-F16.gguf | 16.1 GB |
Higher bit width = better quality. Pick the largest file your system can handle comfortably.
For example, if you pick the 8-bit version (Q8_0), run:
```shell
llama-server -m N-ATLaS-GGUF-Q8_0.gguf --port 8080
# Basic web UI, accessible in a browser: http://localhost:8080
# Chat completion endpoint: http://localhost:8080/v1/chat/completions
```
This creates a local OpenAI-compatible API at http://localhost:8080 that you can use in your code.
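A minimal sketch of calling that local API from Python, assuming the default port 8080 from the command above (llama.cpp generally accepts any value in the `model` field, so the name here is just a placeholder):

```python
import json

BASE_URL = "http://localhost:8080/v1"  # assumption: server started with --port 8080

def build_chat_request(user_message, model="N-ATLaS"):
    """Return the endpoint URL and JSON payload for a chat completion."""
    url = f"{BASE_URL}/chat/completions"
    payload = {
        "model": model,  # llama.cpp serves one model; this field is mostly informational
        "messages": [{"role": "user", "content": user_message}],
    }
    return url, payload

url, payload = build_chat_request("Hello!")
print(url)  # the OpenAI-compatible chat endpoint

# To actually send the request (requires the server to be running):
# import urllib.request
# req = urllib.request.Request(url, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# print(urllib.request.urlopen(req).read().decode())
```

Because the API is OpenAI-compatible, any OpenAI client library should also work by pointing its base URL at `http://localhost:8080/v1`.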
☁️ Cloud (using Modal) (check here for a step-by-step guide)
If you don't have the hardware, I recommend Modal over other providers (HuggingFace, Runpod, Lambda, etc.) because:
- Free $30/month credit
- Pay per second (only when running)
- No credit card required to start
I'm using Modal with a cheaper GPU option and the FP8 quantized version (tosinamuda/N-ATLaS-FP8). Set a short idle timeout so the GPU shuts down when you're not using it.
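A minimal configuration sketch of what that could look like. Everything here is an assumption: the app name, the L4 GPU choice, and the idle-timeout parameter (Modal has renamed this option across versions, so check the current docs for the exact name):

```python
import modal

app = modal.App("n-atlas-fp8")  # hypothetical app name

@app.function(
    gpu="L4",                    # assumption: one of Modal's cheaper GPU tiers
    container_idle_timeout=120,  # shut the container down after 2 idle minutes;
                                 # newer Modal versions may call this scaledown_window
)
def serve():
    # ...start your inference server for tosinamuda/N-ATLaS-FP8 here,
    # following the linked deployment guide...
    pass
```

The short idle timeout is what keeps the pay-per-second billing cheap: you pay only while a request is keeping the container warm.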
📖 Full deployment guide: here
Good day,
I trust this message finds you well.
I tried deploying N-ATLaS-FP8 as explained here. It actually worked when I tested it in the VS Code terminal, but when I connected it to an existing web app through a chatbot plugin, it doesn't work properly: it either responds with "Network Error" or doesn't respond at all.
I connect the plugin to the Modal server using the server URL generated when I deployed it and the API token generated from Modal.
At this point, I don't know what to do, which is why I am reaching out.
Is there any way you can help?
Thanks in anticipation.
Hello Dr Idris,
Sorry for responding late. In case you still need help with this, can you share how you are integrating with Modal?
Also, Modal usually has a cold start, so the first request can take a while.
Do you have /v1 in your API URL?
No, I didn't include it in the backend plugin, but it's in my API code connected to Modal. I will add it now and test.
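For anyone hitting the same issue, the /v1 fix above can be sketched in Python; the base URL and token below are placeholders for the values Modal generates at deploy time:

```python
MODAL_BASE_URL = "https://your-workspace--your-app.modal.run"  # placeholder
MODAL_API_TOKEN = "YOUR_TOKEN"                                 # placeholder

def chat_completions_url(base_url):
    # The OpenAI-compatible server lives under /v1; pointing a chat plugin
    # at the bare base URL is a common cause of "Network Error" responses.
    return base_url.rstrip("/") + "/v1/chat/completions"

def build_headers(token):
    # The token goes in a standard Bearer Authorization header.
    return {
        "Content-Type": "application/json",
        "Authorization": f"Bearer {token}",
    }

print(chat_completions_url(MODAL_BASE_URL))
```

In most chatbot plugins this means setting the server URL to `<your Modal URL>/v1` (or the full `/v1/chat/completions` path, depending on what the plugin expects).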

