YanoljaNEXT-EEVE-Rosetta-7B-2602-FP8

FP8 quantized variant of release/YanoljaNEXT-EEVE-Rosetta-7B-2602.

  • Model Name: release/YanoljaNEXT-EEVE-Rosetta-7B-2602-FP8
  • Base Model Lineage: ByteDance-Seed/Seed-X-PPO-7B
  • Architecture: MistralForCausalLM
  • Tokenizer: GemmaTokenizerFast (expanded vocabulary)
  • Vocab Size: 161696
  • Training Context Length (parent training): 8192
  • Max Position Embeddings: 32768 (architectural limit, not the training context length)

Quantization

This model was quantized with the transformers library's FineGrainedFP8Config, using:

  • quant_method: fp8
  • activation_scheme: dynamic
  • weight_block_size: [128, 128]
  • excluded modules: lm_head, model.embed_tokens
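For reference, the settings above correspond to a quantization_config block like the following in the model's config.json. This is a sketch reconstructed from the listed settings, not a verbatim copy of the shipped file; treat the exact key names as an assumption.

```python
# Sketch of the quantization_config section described above, following the
# field names the transformers FineGrainedFP8Config typically serializes.
quantization_config = {
    "quant_method": "fp8",
    "activation_scheme": "dynamic",   # activations quantized on the fly at inference
    "weight_block_size": [128, 128],  # one FP8 scale per 128x128 weight block
    "modules_to_not_convert": ["lm_head", "model.embed_tokens"],
}
```

Keeping lm_head and the embedding table out of FP8 is a common choice because those layers are small relative to the decoder stack but sensitive to quantization error.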

How to use

You can use this model with the transformers library as follows:

import json
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "yanolja/YanoljaNEXT-EEVE-Rosetta-7B-2602-FP8"
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    dtype=torch.bfloat16,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

target_language = "Korean"
context = {
  "context": "Simple introduction about a tech company.",
  "tone": "Informative and helpful",
  "glossary": {
    "Yanolja NEXT": "μ•Όλ†€μžλ„₯슀트",
    "travel industry": "μ—¬ν–‰ μ‚°μ—…",
  }
}

system = [f"Translate the user's text to {target_language}."]
for key, value in context.items():
  key_pascal = key.capitalize()
  if isinstance(value, dict):
    system.append(f"{key_pascal}:")
    for f, t in value.items():
      system.append(f"- {f} -> {t}")
  else:
    system.append(f"{key_pascal}: {value}")

system.append("Output format: JSON")
system.append("Provide the final translation immediately without any other text.")

source = {
  "company_name": "Yanolja NEXT",
  "description": "Yanolja NEXT is a company that provides cutting-edge "
                 "technology for the global travel industry.",
}

messages = [
    {"role": "system", "content": "\n".join(system)},
    {"role": "user", "content": json.dumps(source, ensure_ascii=False)},
]

prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
# <bos><start_of_turn>instruction
# Translate the user's text to Korean.
# Context: Simple introduction about a tech company.
# Tone: Informative and helpful
# Glossary:
# - Yanolja NEXT -> μ•Όλ†€μžλ„₯슀트
# - travel industry -> μ—¬ν–‰ μ‚°μ—…
# Output format: JSON
# Provide the final translation immediately without any other text.<end_of_turn>
# <start_of_turn>source
# {"company_name": "Yanolja NEXT", "description": "Yanolja NEXT is a company that provides cutting-edge technology for the global travel industry."}<end_of_turn>
# <start_of_turn>translation

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
input_length = inputs["input_ids"].shape[1]

with torch.inference_mode():
    outputs = model.generate(
        **inputs,
        max_new_tokens=64,
    )

generated_tokens = outputs[0][input_length:]
translation = tokenizer.decode(generated_tokens, skip_special_tokens=True)

print(json.dumps(json.loads(translation), indent=2, ensure_ascii=False))
# {
#   "company_name": "μ•Όλ†€μžλ„₯슀트",
#   "description": "μ•Όλ†€μžλ„₯μŠ€νŠΈλŠ” κΈ€λ‘œλ²Œ μ—¬ν–‰ 산업에 μ΅œμ²¨λ‹¨ κΈ°μˆ μ„ μ œκ³΅ν•˜λŠ” νšŒμ‚¬μž…λ‹ˆλ‹€."
# }

The model outputs the final translation in the same structured format as the input (JSON, YAML, XML) when appropriate, or plain text for simple translations.
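Because the output format can vary with the input, a small helper can normalize the decoded text before downstream use. This is an illustrative sketch only (the function name is ours, not part of the model's API): it returns parsed JSON when the model answered in JSON, and the raw string otherwise.

```python
import json


def parse_translation(text: str):
    """Parse a translation that may be JSON or plain text.

    Illustrative helper, not shipped with the model: tries JSON first,
    falling back to the stripped raw string for plain-text translations.
    """
    try:
        return json.loads(text)
    except json.JSONDecodeError:
        return text.strip()
```

For example, `parse_translation('{"company_name": "야놀자넥스트"}')` yields a dict, while a plain-text translation passes through unchanged.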

Notes

  • The FP8 weights are intended to reduce memory use and speed up inference relative to the non-FP8 release.
  • Translation behavior and prompting format follow the parent EEVE-Rosetta-7B-2602 release.

License

The base model ByteDance-Seed/Seed-X-PPO-7B is distributed under OpenMDW-1.0, and this derivative follows those terms.

This release includes:

  • LICENSE
  • NOTICE
  • THIRD_PARTY_LICENSES.md

Citation

If you use this model, please consider citing:

@misc{yanolja2026yanoljanexteeverosetta7b,
  author = {Yanolja NEXT Co., Ltd.},
  title = {YanoljaNEXT-EEVE-Rosetta-7B-2602},
  year = {2026},
  publisher = {Hugging Face},
  journal = {Hugging Face repository},
  howpublished = {\url{https://huggingface.co/yanolja/YanoljaNEXT-EEVE-Rosetta-7B-2602}}
}

References

This work utilizes several models and prior works. We would like to acknowledge the original authors for their valuable contributions to the field.

@misc{cheng2025seedxbuildingstrongmultilingual,
  title = {Seed-X: Building Strong Multilingual Translation LLM with 7B Parameters},
  author = {Shanbo Cheng and Yu Bao and Qian Cao and Luyang Huang and Liyan Kang and Zhicheng Liu and Yu Lu and Wenhao Zhu and Jingwen Chen and Zhichao Huang and Tao Li and Yifu Li and Huiying Lin and Sitong Liu and Ningxin Peng and Shuaijie She and Lu Xu and Nuo Xu and Sen Yang and Runsheng Yu and Yiming Yu and Liehao Zou and Hang Li and Lu Lu and Yuxuan Wang and Yonghui Wu},
  year = {2025},
  eprint = {2507.13618},
  archivePrefix = {arXiv},
  primaryClass = {cs.CL},
  url = {https://arxiv.org/abs/2507.13618}
}

@misc{gemma3,
  author = {Google},
  title = {Gemma 3},
  year = {2024},
  publisher = {Google DeepMind},
  howpublished = {\url{https://deepmind.google/models/gemma/gemma-3/}}
}