Unofficial Zero-Cost Guide for Running N-ATLaS: Local vs Cloud Options

#4
by tosinamuda - opened

For most people who want to run N-ATLaS cost-free, either locally on their laptop or in the cloud, I recommend the following options:

πŸ–₯️ Local (using llama.cpp)

First, install llama.cpp and download the appropriate GGUF from tosinamuda/N-ATLaS-GGUF:

| RAM | Quant | Recommended GGUF File | Size |
|---|---|---|---|
| 6–8 GB | Q4_K_M | N-ATLaS-GGUF-Q4_K_M.gguf | 4.92 GB |
| 8–10 GB | Q5_K_M | N-ATLaS-GGUF-Q5_K_M.gguf | 5.73 GB |
| 10–12 GB | Q6_K | N-ATLaS-GGUF-Q6_K.gguf | 6.6 GB |
| 12–16 GB | Q8_0 | N-ATLaS-GGUF-Q8_0.gguf | 8.54 GB |
| 24 GB+ | F16 | N-ATLaS-GGUF-F16.gguf | 16.1 GB |

Higher bit = better quality. Pick the largest your system can handle comfortably.

For example, if you pick the 8-bit version (Q8_0), run:

llama-server -m N-ATLaS-GGUF-Q8_0.gguf --port 8080

# Basic web UI can be accessed via browser: http://localhost:8080
# Chat completion endpoint: http://localhost:8080/v1/chat/completions

This creates a local OpenAI-compatible API at http://localhost:8080 that you can use in your code.
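For instance, you can call that endpoint from Python using only the standard library. This is a minimal sketch assuming the server is running on port 8080 as shown above; the `model` field is required by the OpenAI schema but llama-server only serves the one loaded model, so its value is mostly informational.

```python
import json
import urllib.request

def build_chat_request(prompt, base_url="http://localhost:8080"):
    """Build an OpenAI-style chat completion request for a local llama-server."""
    payload = {
        "model": "N-ATLaS",  # llama-server serves a single model; this field is informational
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = build_chat_request("Translate 'good morning' to Yoruba.")
print(req.full_url)  # http://localhost:8080/v1/chat/completions

# With the server running, send it like this:
# with urllib.request.urlopen(req) as resp:
#     reply = json.loads(resp.read())["choices"][0]["message"]["content"]
```

Because the endpoint is OpenAI-compatible, the official `openai` client library also works if you point its `base_url` at `http://localhost:8080/v1`.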

☁️ Cloud (using Modal) (check here for a step-by-step guide)

If you don't have the hardware, I recommend Modal over other providers (HuggingFace, Runpod, Lambda, etc.) because:

  • Free $30/month credit
  • Pay per second (only when running)
  • No credit card required to start

I'm using Modal with a cheaper GPU option and the FP8 quantized version (tosinamuda/N-ATLaS-FP8). Set a short idle timeout so the GPU shuts down when you're not using it.

πŸ‘‰ Full deployment guide: here


National Centre for Artificial Intelligence and Robotics org

Good day,
I trust this message finds you well.

I tried deploying N-ATLaS-FP8 as explained here. It worked when I tested it from the VS Code terminal, but when I connected it to an existing web app using a chatbot plugin, it doesn't work properly. It either responds with "Network Error" or doesn't respond at all.

I connect the plugin to the Modal server using the server URL generated when I deployed it, and the API token generated from Modal.

At this point, I don't know what to do, that's why I am reaching out.

Is there any way you can help?

Thanks in anticipation.

Hello Dr Idris,

Sorry for responding late. In case you still need help with this, can you share how you are integrating with Modal?

Also, Modal usually has cold starts: the first request after an idle period can take a while.
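One way to keep a client from failing during that warm-up window is to retry with backoff. This is a generic sketch, not Modal-specific; `call_with_retries` and the `flaky` example are illustrative names, not part of any library.

```python
import time

def call_with_retries(fn, attempts=4, base_delay=2.0):
    """Call fn(), retrying with exponential backoff to ride out cold starts."""
    for i in range(attempts):
        try:
            return fn()
        except Exception:
            if i == attempts - 1:
                raise  # out of retries; surface the real error
            time.sleep(base_delay * (2 ** i))  # waits 2s, 4s, 8s, ...

# Simulated example: succeeds on the third attempt, as a cold-starting endpoint might.
state = {"calls": 0}
def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("endpoint still warming up")
    return "ok"

print(call_with_retries(flaky, base_delay=0.01))  # ok
```

In practice you would wrap the HTTP request to the Modal URL in `fn`, and also raise the plugin's request timeout, since a cold start can exceed a default of a few seconds.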

National Centre for Artificial Intelligence and Robotics org

No problem at all about the response time.
Yes, I still need help.
I'm using a plugin to integrate the AI server (Modal) with my web-based app.
Attached are screenshots of the frontend and backend of the plugin.
The backend is where I input the Modal API key and URL.

(Screenshots attached: plugin frontend and backend.)

Do you have a /v1 in your api url?

National Centre for Artificial Intelligence and Robotics org
edited Jan 24

No, I didn't include it in the backend plugin, but it is in my API code connected to Modal. I will add it now and test.
