Instructions to use mlx-community/Ling-2.6-flash-mlx-4bit-gs32 with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MLX
How to use mlx-community/Ling-2.6-flash-mlx-4bit-gs32 with MLX:
# Make sure mlx-lm is installed # pip install --upgrade mlx-lm # if on a CUDA device, also pip install mlx[cuda] # Generate text with mlx-lm from mlx_lm import load, generate model, tokenizer = load("mlx-community/Ling-2.6-flash-mlx-4bit-gs32") prompt = "Once upon a time in" text = generate(model, tokenizer, prompt=prompt, verbose=True) - Notebooks
- Google Colab
- Kaggle
- Local Apps
- LM Studio
- MLX LM
How to use mlx-community/Ling-2.6-flash-mlx-4bit-gs32 with MLX LM:
Generate or start a chat session
# Install MLX LM uv tool install mlx-lm # Generate some text mlx_lm.generate --model "mlx-community/Ling-2.6-flash-mlx-4bit-gs32" --prompt "Once upon a time"
File size: 522 Bytes
6df6e49 | 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 | {
"backend": "tokenizers",
"bos_token": "<|startoftext|>",
"clean_up_tokenization_spaces": false,
"cls_token": "[CLS]",
"eos_token": "<|role_end|>",
"fast_tokenizer": true,
"gmask_token": "[gMASK]",
"is_local": true,
"local_files_only": false,
"merges_file": null,
"model_max_length": 1000000000000000019884624838656,
"model_specific_special_tokens": {
"gmask_token": "[gMASK]"
},
"pad_token": "<|endoftext|>",
"tokenizer_class": "TokenizersBackend",
"tool_parser_type": "json_tools"
}
|