Snider Cladius Maximus committed
Commit f526be0 · 1 Parent(s): dd31d86

feat: add Ollama config files + complete repo file documentation

- template: explicit Go template for Ollama chat formatting
- params: sampling parameters (temp=1.0, top_p=0.95, top_k=64, stop tokens)
- README: full repo file table, MLX quick start, conversational tag
- Tags: added safetensors, mlx, conversational for HF discoverability

Explicit config > auto-detect for every supported ecosystem.

Co-Authored-By: Cladius Maximus <cladius@lethean.io>

Files changed (3)
  1. README.md +28 -1
  2. params +6 -0
  3. template +7 -0
README.md CHANGED
````diff
@@ -12,11 +12,14 @@ tags:
 - gemma4
 - gemma
 - gguf
+- safetensors
+- mlx
 - lemma
 - lethean
 - lem
 - apple-silicon
 - on-device
+- conversational
 datasets:
 - lthn/LEM-research
 - lthn/livebench
@@ -41,6 +44,23 @@ The smallest member of the [Lemma model family](https://huggingface.co/collectio
 
 All variants tested and verified working with Ollama on Apple Silicon (M3 Ultra, 96GB).
 
+This repo also includes MLX Q4 safetensors for native Apple Silicon inference via `mlx-lm`. See [lemer-mlx-q8](https://huggingface.co/lthn/lemer-mlx-q8) and [lemer-mlx-bf16](https://huggingface.co/lthn/lemer-mlx-bf16) for other MLX quant levels.
+
+### Repo Files
+
+| File | Format | Purpose |
+|------|--------|---------|
+| `lemer-*.gguf` | GGUF | Ollama, llama.cpp, GPT4All, LM Studio |
+| `model.safetensors` | MLX safetensors | Native Apple Silicon via `mlx-lm` (Q4) |
+| `config.json` | JSON | Model architecture config |
+| `tokenizer.json` | JSON | Tokenizer vocabulary |
+| `tokenizer_config.json` | JSON | Tokenizer settings |
+| `chat_template.jinja` | Jinja2 | Chat template for transformers/mlx-lm |
+| `processor_config.json` | JSON | Image/audio processor config |
+| `generation_config.json` | JSON | Default generation parameters |
+| `template` | Go template | Ollama chat template override |
+| `params` | JSON | Ollama sampling parameters |
+
 ## Quick Start
 
 ### Ollama
@@ -52,7 +72,14 @@ ollama run hf.co/lthn/lemer:Q4_K_M
 ### llama.cpp
 
 ```bash
-llama-cli -m lemer-q4_k_m.gguf -p "Hello, how are you?" -n 200
+llama-cli -hf lthn/lemer:Q4_K_M
+```
+
+### MLX (Apple Silicon native)
+
+```bash
+pip install mlx-lm
+mlx_lm.generate --model lthn/lemer --prompt "Hello, how are you?" --max-tokens 200
 ```
 
 ## Model Details
````
params ADDED
```diff
@@ -0,0 +1,6 @@
+{
+  "temperature": 1.0,
+  "top_p": 0.95,
+  "top_k": 64,
+  "stop": ["<turn|>", "<eos>"]
+}
```
template ADDED
```diff
@@ -0,0 +1,7 @@
+{{- if .System }}<|turn>system
+{{ .System }}<turn|>
+{{ end -}}
+<|turn>user
+{{ .Prompt }}<turn|>
+<|turn>model
+{{ .Response }}
```