YanoljaNEXT-EEVE-Rosetta-7B-2602-FP8

FP8 quantized variant of release/YanoljaNEXT-EEVE-Rosetta-7B-2602.

  • Model Name: release/YanoljaNEXT-EEVE-Rosetta-7B-2602-FP8
  • Base Model Lineage: ByteDance-Seed/Seed-X-PPO-7B
  • Architecture: MistralForCausalLM
  • Tokenizer: GemmaTokenizerFast (expanded vocabulary)
  • Vocab Size: 161696
  • Training Context Length (parent training): 8192
  • Max Position Embeddings: 32768 (architectural limit, not the training context length)

Quantization

This model was quantized with the transformers library's FineGrainedFP8Config, using:

  • quant_method: fp8
  • activation_scheme: dynamic
  • weight_block_size: [128, 128]
  • excluded modules: lm_head, model.embed_tokens
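For reference, the settings above correspond to a quantization_config block like the following in the model's config.json. This is a sketch reconstructed from the listed settings, not a verbatim copy of the shipped file; treat the exact key names as an assumption.

```python
# Sketch of the quantization_config section described above, following the
# field names the transformers FineGrainedFP8Config typically serializes.
quantization_config = {
    "quant_method": "fp8",
    "activation_scheme": "dynamic",   # activations quantized on the fly at inference
    "weight_block_size": [128, 128],  # one FP8 scale per 128x128 weight block
    "modules_to_not_convert": ["lm_head", "model.embed_tokens"],
}
```

Keeping lm_head and the embedding table out of FP8 is a common choice because those layers are small relative to the decoder stack but sensitive to quantization error.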

How to use

You can use this model with the transformers library as follows:

import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yanolja/YanoljaNEXT-EEVE-Rosetta-7B-2602-FP8"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

target_language = "Korean"
context = {
  "context": "Simple introduction about a tech company.",
  "tone": "Informative and helpful",
  "glossary": {
    "Yanolja NEXT": "μ•Όλ†€μžλ„₯슀트",
    "travel industry": "μ—¬ν–‰ μ‚°μ—…",
  }
}

system = [f"Translate the user's text to {target_language}."]
for key, value in context.items():
  key_pascal = key.capitalize()
  if isinstance(value, dict):
    system.append(f"{key_pascal}:")
    for f, t in value.items():
      system.append(f"- {f} -> {t}")
  else:
    system.append(f"{key_pascal}: {value}")

system.append("Output format: JSON")
system.append("Provide the final translation immediately without any other text.")

source = {
  "company_name": "Yanolja NEXT",
  "description": "Yanolja NEXT is a company that provides cutting-edge "
                 "technology for the global travel industry.",
}

messages = [
    {"role": "system", "content": "\n".join(system)},
    {"role": "user", "content": json.dumps(source, ensure_ascii=False)},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <bos><start_of_turn>instruction
# Translate the user's text to Korean.
# Context: Simple introduction about a tech company.
# Tone: Informative and helpful
# Glossary:
# - Yanolja NEXT -> μ•Όλ†€μžλ„₯슀트
# - travel industry -> μ—¬ν–‰ μ‚°μ—…
# Output format: JSON
# Provide the final translation immediately without any other text.<end_of_turn>
# <start_of_turn>source
# {"company_name": "Yanolja NEXT", "description": "Yanolja NEXT is a company that provides cutting-edge technology for the global travel industry."}<end_of_turn>
# <start_of_turn>translation

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_length = inputs["input_ids"].shape[1]

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
    )

generated_tokens = outputs[0][input_length:]
translation = tokenizer.decode(generated_tokens, skip_special_tokens=True)

print(json.dumps(json.loads(translation), indent=2, ensure_ascii=False))
# {
#   "company_name": "μ•Όλ†€μžλ„₯슀트",
#   "description": "μ•Όλ†€μžλ„₯μŠ€νŠΈλŠ” κΈ€λ‘œλ²Œ μ—¬ν–‰ 산업에 μ΅œμ²¨λ‹¨ κΈ°μˆ μ„ μ œκ³΅ν•˜λŠ” νšŒμ‚¬μž…λ‹ˆλ‹€."
# }

The model outputs the final translation in the same structured format as the input (JSON, YAML, XML) when appropriate, or plain text for simple translations.
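Because the output format can vary with the input, a small helper can normalize the decoded text before downstream use. This is an illustrative sketch only (the function name is ours, not part of the model's API): it returns parsed JSON when the model answered in JSON, and the raw string otherwise.

```python
import json


def parse_translation(text: str):
    """Parse a translation that may be JSON or plain text.

    Illustrative helper, not shipped with the model: tries JSON first,
    falling back to the stripped raw string for plain-text translations.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text.strip()
```

For example, `parse_translation('{"company_name": "야놀자넥스트"}')` yields a dict, while a plain-text translation passes through unchanged.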

Notes

  • The FP8 weights are intended to reduce memory use and speed up inference relative to the non-FP8 release.
  • Translation behavior and prompting format follow the parent EEVE-Rosetta-7B-2602 release.

License

The base model ByteDance-Seed/Seed-X-PPO-7B is distributed under OpenMDW-1.0, and this derivative follows those terms.

This release includes:

  • LICENSE
  • NOTICE
  • THIRD_PARTY_LICENSES.md

Citation

If you use this model, please consider citing:

@misc{yanolja2026yanoljanexteeverosetta7b,
  author = {Yanolja NEXT Co., Ltd.},
  title = {YanoljaNEXT-EEVE-Rosetta-7B-2602},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/yanolja/YanoljaNEXT-EEVE-Rosetta-7B-2602}}
}

References

This work utilizes several models and prior works. We would like to acknowledge the original authors for their valuable contributions to the field.

@misc{cheng2025seedxbuildingstrongmultilingual,
  title = {Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters},
  author = {Shanbo Cheng and Yu Bao and Qian Cao and Luyang Huang and Liyan Kang and Zhicheng Liu and Yu Lu and Wenhao Zhu and Jingwen Chen and Zhichao Huang and Tao Li and Yifu Li and Huiying Lin and Sitong Liu and Ningxin Peng and Shuaijie She and Lu Xu and Nuo Xu and Sen Yang and Runsheng Yu and Yiming Yu and Liehao Zou and Hang Li and Lu Lu and Yuxuan Wang and Yonghui Wu},
  year = {2025},
  eprint = {2507.13618},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2507.13618}
}

@misc{gemma3,
  author = {Google},
  title = {Gemma 3},
  year = {2024},
  publisher = {Google DeepMind},
  howpublished = {\url{https://deepmind.google/models/gemma/gemma-3/}}
}