feat: add Ollama config files + complete repo file documentation

- template: explicit Go template for Ollama chat formatting
- params: sampling parameters (temp=1.0, top_p=0.95, top_k=64, stop tokens)
- README: full repo file table, MLX quick start, conversational tag
- Tags: added safetensors, mlx, conversational for HF discoverability

Explicit config > auto-detect for every supported ecosystem.

Co-Authored-By: Cladius Maximus <cladius@lethean.io>

Files changed (3)

README.md +28 -1
params +6 -0
template +7 -0

README.md CHANGED

@@ -12,11 +12,14 @@ tags:
 - gemma4
 - gemma
 - gguf
+- safetensors
+- mlx
 - lemma
 - lethean
 - lem
 - apple-silicon
 - on-device
+- conversational
 datasets:
 - lthn/LEM-research
 - lthn/livebench
@@ -41,6 +44,23 @@ The smallest member of the [Lemma model family](https://huggingface.co/collectio
 All variants tested and verified working with Ollama on Apple Silicon (M3 Ultra, 96GB).
+This repo also includes MLX Q4 safetensors for native Apple Silicon inference via `mlx-lm`. See [lemer-mlx-q8](https://huggingface.co/lthn/lemer-mlx-q8) and [lemer-mlx-bf16](https://huggingface.co/lthn/lemer-mlx-bf16) for other MLX quant levels.
+### Repo Files
+| File | Format | Purpose |
+|------|--------|---------|
+| `lemer-*.gguf` | GGUF | Ollama, llama.cpp, GPT4All, LM Studio |
+| `model.safetensors` | MLX safetensors | Native Apple Silicon via `mlx-lm` (Q4) |
+| `config.json` | JSON | Model architecture config |
+| `tokenizer.json` | JSON | Tokenizer vocabulary |
+| `tokenizer_config.json` | JSON | Tokenizer settings |
+| `chat_template.jinja` | Jinja2 | Chat template for transformers/mlx-lm |
+| `processor_config.json` | JSON | Image/audio processor config |
+| `generation_config.json` | JSON | Default generation parameters |
+| `template` | Go template | Ollama chat template override |
+| `params` | JSON | Ollama sampling parameters |
 ## Quick Start
 ### Ollama
@@ -52,7 +72,14 @@ ollama run hf.co/lthn/lemer:Q4_K_M
 ### llama.cpp
 ```bash
-llama-cli -m lemer-q4_k_m.gguf -p "Hello, how are you?" -n 200
+llama-cli -hf lthn/lemer:Q4_K_M
+```
+### MLX (Apple Silicon native)
+```bash
+pip install mlx-lm
+mlx_lm.generate --model lthn/lemer --prompt "Hello, how are you?" --max-tokens 200
 ```
 ## Model Details

params ADDED

@@ -0,0 +1,6 @@
+{
+    "temperature": 1.0,
+    "top_p": 0.95,
+    "top_k": 64,
+    "stop": ["<turn|>", "<eos>"]
+}
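
Review note: for anyone who wants the same sampling settings in a local Ollama Modelfile instead of the repo-level `params` file, the sketch below converts the JSON above into `PARAMETER` lines. It is an illustration only. The helper name `params_to_modelfile` is hypothetical, and the mapping assumes each JSON key matches a Modelfile parameter name one-to-one, with list values such as `stop` repeated once per entry.

```python
import json

# Contents of the `params` file added in this commit.
PARAMS_JSON = """
{
    "temperature": 1.0,
    "top_p": 0.95,
    "top_k": 64,
    "stop": ["<turn|>", "<eos>"]
}
"""


def params_to_modelfile(params: dict) -> list[str]:
    """Hypothetical helper: turn a params dict into Modelfile PARAMETER lines.

    Assumption: JSON keys map directly to Modelfile parameter names, and
    list values (e.g. "stop") become one PARAMETER line per element.
    """
    lines = []
    for key, value in params.items():
        values = value if isinstance(value, list) else [value]
        for item in values:
            # Quote strings (stop tokens); leave numbers as plain literals.
            rendered = json.dumps(item) if isinstance(item, str) else str(item)
            lines.append(f"PARAMETER {key} {rendered}")
    return lines


modelfile_lines = params_to_modelfile(json.loads(PARAMS_JSON))
print("\n".join(modelfile_lines))
```

The two stop tokens become two separate `PARAMETER stop` lines, which is how a Modelfile expresses multiple stop sequences.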

template ADDED

@@ -0,0 +1,7 @@
+{{- if .System }}<|turn>system
+{{ .System }}<turn|>
+{{ end -}}
+<|turn>user
+{{ .Prompt }}<turn|>
+<|turn>model
+{{ .Response }}
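
Review note: to make the whitespace behaviour of the template above easier to check, here is a hand-written Python approximation of the string it should expand to. It is not Ollama's renderer and does not parse Go templates. It mirrors the seven lines above under two assumptions: the `{{-` and `-}}` markers trim adjacent whitespace as documented for Go's `text/template`, and the file has no trailing newline after `{{ .Response }}`. The function name `render_prompt` is hypothetical.

```python
def render_prompt(prompt: str, system: str = "", response: str = "") -> str:
    """Approximate the expansion of the `template` file (hypothetical helper).

    Mirrors the Go template by hand: the system turn is emitted only when a
    non-empty system prompt is given, and `{{ end -}}` swallows the newline
    that follows it, so no blank line appears before the user turn.
    """
    parts = []
    if system:  # Go's `if .System` is false for an empty string
        parts.append(f"<|turn>system\n{system}<turn|>\n")
    parts.append(f"<|turn>user\n{prompt}<turn|>\n")
    # The model turn is left open so generation continues from here.
    parts.append(f"<|turn>model\n{response}")
    return "".join(parts)


with_system = render_prompt("Hello, how are you?", system="Be brief.")
without_system = render_prompt("Hello, how are you?")
print(with_system)
```

Note that the model turn is never closed with `<turn|>` in the template, which is why `<turn|>` appears as a stop token in `params`.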