watt-tool-8B β€” Abliterated

Abliterated version of watt-ai/watt-tool-8B, with weights upcast from FP8 to BF16.

Base: Llama 3.1 8B fine-tuned for parallel function calling with a unique plain-text output format.

Abliteration

Performed with heretic, which runs an Optuna-driven multi-objective optimization to minimize refusals while keeping KL divergence from the original model low.

  • Trials: 500 (50 Γ— 10 parallel GPUs)
  • Best trial: 0 refusals, KL divergence = 0.0015

Tool Calling Format

watt-tool uses a plain-text bracket format (not JSON):

[func_name1(param1=value1, param2=value2), func_name2(param=value)]

The model outputs only the function call(s) β€” no surrounding text.
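Because the output is plain text rather than JSON, it has to be parsed before the calls can be executed. A minimal sketch of such a parser (`parse_tool_calls` is a hypothetical helper, not shipped with the model):

```python
import re

def parse_tool_calls(text):
    """Parse the bracket format '[f(a=1), g(b=2)]' into (name, kwargs) pairs."""
    calls = []
    # Match each func_name(...) inside the outer brackets.
    for m in re.finditer(r"(\w+)\(([^)]*)\)", text.strip().strip("[]")):
        name, arg_str = m.group(1), m.group(2)
        kwargs = {}
        for pair in (p.strip() for p in arg_str.split(",") if p.strip()):
            key, _, value = pair.partition("=")
            # Values are unquoted in the model's output, but strip quotes defensively.
            kwargs[key.strip()] = value.strip().strip("'\"")
        calls.append((name, kwargs))
    return calls

print(parse_tool_calls("[get_weather(city=Tokyo), convert(amount=10, to=USD)]"))
# [('get_weather', {'city': 'Tokyo'}), ('convert', {'amount': '10', 'to': 'USD'})]
```

Note this naive regex splits on commas and parentheses, so it will mishandle nested calls or argument values that themselves contain commas or quotes; a production parser would need proper tokenization.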

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
    "nitrox/watt-tool-8B-heretic",
    device_map="auto",
    torch_dtype="bfloat16",
)
tokenizer = AutoTokenizer.from_pretrained("nitrox/watt-tool-8B-heretic")

system_prompt = (
    "You are an expert in composing functions. Given a question and a set of possible functions, "
    "make one or more function calls to achieve the purpose.\n"
    "Return ONLY the function call(s) in this format: [func_name(param=value, ...)]\n"
    "DO NOT include any other text.\n"
    "Available functions: (provide as JSON)"
)

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "What's the weather in Tokyo?"}
]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128, do_sample=False)  # greedy decoding
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][len(inputs[0]):], skip_special_tokens=True))
# Output: [get_weather(city=Tokyo)]
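To act on the model's output, the decoded string still has to be mapped to an actual function. A minimal sketch, assuming a single call and a registry of Python callables (`TOOLS`, `run_tool_call`, and `get_weather` below are illustrative names, not part of the model or any library):

```python
import re

# Hypothetical tool registry: tool name -> Python callable.
def get_weather(city):
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

def run_tool_call(output):
    """Execute a single bracket-format call like '[get_weather(city=Tokyo)]'."""
    m = re.fullmatch(r"\[(\w+)\((.*)\)\]", output.strip())
    if m is None:
        raise ValueError(f"not a tool call: {output!r}")
    name, arg_str = m.group(1), m.group(2)
    kwargs = {
        k.strip(): v.strip().strip("'\"")
        for k, _, v in (p.partition("=") for p in arg_str.split(",") if p.strip())
    }
    return TOOLS[name](**kwargs)

print(run_tool_call("[get_weather(city=Tokyo)]"))  # Sunny in Tokyo
```

Note that all parsed argument values arrive as strings, so each tool function (or a wrapper) has to coerce types itself.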

Disclaimer

Refusal mechanisms have been removed. Use responsibly and in accordance with applicable laws.
