answer repeat

#2
by zzzzzxx - opened

```python
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Nemotron-Cascade-2-30B-A3B-4bit")

prompt = "hello"

if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_dict=False,
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```

```
[WARNING] Generating with a model that requires 16951 MB which is close to the maximum recommended size of 18186 MB. This can be slow. See the documentation for possible work-arounds: https://github.com/ml-explore/mlx-lm/tree/main#large-models
```
```
We need to respond in a friendly tone. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a short answer. Provide a

Prompt: 34 tokens, 3.206 tokens-per-sec
Generation: 256 tokens, 93.527 tokens-per-sec
Peak memory: 17.882 GB
```
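Loops like this are often damped with a repetition penalty, which down-weights tokens that have already been generated so the sampler is nudged toward something new. Below is a minimal sketch of the idea in plain Python; the function name is illustrative, not part of mlx_lm (mlx_lm exposes a similar option through its sampling utilities, though the exact parameter may vary by version):

```python
def apply_repetition_penalty(logits, generated_ids, penalty=1.3):
    """Down-weight tokens that already appeared in the output.

    Following the common scheme: positive logits are divided by the
    penalty, negative logits are multiplied by it, so previously seen
    tokens always become less likely.
    """
    out = list(logits)
    for tok in set(generated_ids):
        if out[tok] > 0:
            out[tok] /= penalty
        else:
            out[tok] *= penalty
    return out

# Token 2 had the highest logit but was already generated, so after the
# penalty a greedy pick falls back to token 1 instead of repeating.
logits = [1.0, 2.0, 2.5]
penalized = apply_repetition_penalty(logits, generated_ids=[2], penalty=1.5)
best = max(range(len(penalized)), key=penalized.__getitem__)
```

A penalty around 1.1–1.3 is a common starting point; values that are too high can degrade fluency.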
