Generated answer repeats the same sentence
from mlx_lm import load, generate

# Load the 4-bit quantized model from the Hugging Face hub
model, tokenizer = load("mlx-community/Nemotron-Cascade-2-30B-A3B-4bit")

prompt = "hello"

# Wrap the raw prompt in the model's chat template, if it has one
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
[WARNING] Generating with a model that requires 16951 MB which is close to the maximum recommended size of 18186 MB. This can be slow. See the documentation for possible work-arounds: https://github.com/ml-explore/mlx-lm/tree/main#large-models
We need to respond in a friendly tone. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a
Prompt: 34 tokens, 3.206 tokens-per-sec
Generation: 256 tokens, 93.527 tokens-per-sec
Peak memory: 17.882 GB
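The log above shows a classic repetition loop: after one sentence of analysis, the model emits the same phrase over and over until it exhausts the 256-token generation budget. One way to confirm this programmatically, rather than by eyeballing the output, is to measure how many word n-grams in the text are repeats of an earlier n-gram. This is a minimal stdlib sketch; the n-gram size and the 0.9 threshold are arbitrary illustration choices, not mlx_lm settings:

def repetition_ratio(text: str, n: int = 4) -> float:
    """Fraction of word n-grams that duplicate an earlier n-gram.
    Values near 1.0 indicate a degenerate repetition loop."""
    words = text.split()
    if len(words) < n:
        return 0.0
    ngrams = [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]
    return 1.0 - len(set(ngrams)) / len(ngrams)

# The looped output above scores close to 1.0; normal prose scores near 0.0.
looped = "We need to respond in a friendly tone. " + "Provide a short answer. " * 48
print(repetition_ratio(looped))   # high, close to 1.0
print(repetition_ratio("The quick brown fox jumps over the lazy dog."))  # 0.0

If the loop comes from the model's sampling rather than a broken chat template, recent versions of mlx_lm expose a repetition penalty through mlx_lm.sample_utils.make_logits_processors (passed to generate via logits_processors); raising max_tokens or adjusting the sampling temperature via make_sampler are other knobs to try. Treat these as things to experiment with, not a confirmed fix for this model.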