Aitana-2B-S-base-IP-1.0

Table of Contents

  • Model description
  • Intended uses and limitations
  • How to use
  • Training
  • Technical specifications
  • Additional information

Model description

Aitana-2B-S-base-IP-1.0 is a generative language model with a decoder-only architecture. This repository contains the base checkpoint, intended for causal language modeling and for further adaptation or task-specific fine-tuning.

Based on the files shipped in this repository, the checkpoint uses the Llama architecture and the Transformers ecosystem. The local configuration indicates:

  • architecture: LlamaForCausalLM
  • hidden size: 2048
  • layers: 24
  • attention heads: 16
  • vocabulary size: 256000
  • context length: 8192
  • tensor dtype in config: bfloat16
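The values above come from the checkpoint's config.json. As a small illustration of where each bullet lives in that file, the snippet below parses a sample dict that mirrors the listed values; the field names follow the standard Llama config layout in transformers, and with the real checkpoint you would load the downloaded config.json instead of this inline sample.

```python
import json

# Inline sample mirroring the fields listed above; replace with the
# contents of the repository's config.json for the real checkpoint.
config_text = """
{
  "architectures": ["LlamaForCausalLM"],
  "hidden_size": 2048,
  "num_hidden_layers": 24,
  "num_attention_heads": 16,
  "vocab_size": 256000,
  "max_position_embeddings": 8192,
  "torch_dtype": "bfloat16"
}
"""

config = json.loads(config_text)
print(config["architectures"][0])        # LlamaForCausalLM
print(config["hidden_size"])             # 2048
print(config["max_position_embeddings"]) # 8192
```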

Intended uses and limitations

Aitana-2B-S-base-IP-1.0 is a base model that can be used for causal language modeling and text generation. As with other base checkpoints, it is generally more useful as a starting point for instruction-tuning, domain adaptation, or downstream fine-tuning than as a final end-user assistant model.

Because this repository currently only exposes the model artifacts and not the full training report, claims about domain coverage, language balance, safety behavior, and benchmark performance should be added only once they are confirmed by the model authors.

How to use

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gplsi/Aitana-2B-S-base-IP-1.0"

# Load the tokenizer and the model in bfloat16, placing weights
# automatically across available devices.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Example prompt in Valencian/Catalan: "Write a brief summary about the
# importance of language."
prompt = "Escriu un breu resum sobre la importància de la llengua."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample up to 128 new tokens with nucleus sampling.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Training

Base model

TO-DO: document the original parent checkpoint or initialization source for Aitana-2B-S-base-IP-1.0.

Training data

TO-DO: document the training corpora, language distribution, preprocessing steps, deduplication policy, anonymization steps, and data filtering criteria.

Training hyperparameters

TO-DO: document the effective batch size, learning rate schedule, optimizer setup, number of epochs or tokens seen, sequence length used during training, and hardware.

Technical specifications

Model architecture and objective

  • architecture: decoder-only causal language model
  • implementation class: LlamaForCausalLM
  • hidden size: 2048
  • intermediate size: 5440
  • layers: 24
  • attention heads: 16
  • key/value heads: 16
  • maximum position embeddings: 8192
  • vocabulary size: 256000
  • BOS token id: 1
  • EOS token id: 2
  • PAD token id: 3
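The dimensions above determine a rough parameter count. The back-of-the-envelope sketch below ignores normalization layers, assumes the standard Llama gated MLP (gate, up, and down projections), and computes both cases for the LM head, since whether the output head shares weights with the input embedding is not confirmed by this repository:

```python
hidden, inter, layers, vocab = 2048, 5440, 24, 256000

embed = vocab * hidden                  # input embedding table
attn_per_layer = 4 * hidden * hidden    # q, k, v, o projections
mlp_per_layer = 3 * hidden * inter      # gate, up, down projections
block_params = layers * (attn_per_layer + mlp_per_layer)

tied = embed + block_params             # LM head shares the embedding
untied = tied + vocab * hidden          # separate LM head weights

print(f"tied:   ~{tied / 1e9:.2f}B parameters")
print(f"untied: ~{untied / 1e9:.2f}B parameters")
```

Either case rounds to the "2B" in the model name; the large 256k-entry vocabulary accounts for a substantial share of the total.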

Tokenizer

The tokenizer files in this repository define:

  • BOS token: <s>
  • EOS token: </s>
  • PAD token: <pad>
  • UNK token: <unk>
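Combined with the token ids listed under the technical specifications (BOS 1, EOS 2, PAD 3), a single sequence is framed roughly as in the schematic sketch below. This is illustrative only; in practice the transformers tokenizer applies this framing and builds the attention mask for you.

```python
BOS_ID, EOS_ID, PAD_ID = 1, 2, 3  # ids from this repository's config

def frame(token_ids, max_len):
    """Wrap a token-id list with BOS/EOS and right-pad to max_len."""
    ids = [BOS_ID] + token_ids + [EOS_ID]
    attention_mask = [1] * len(ids) + [0] * (max_len - len(ids))
    ids = ids + [PAD_ID] * (max_len - len(ids))
    return ids, attention_mask

ids, mask = frame([11, 22, 33], max_len=8)
print(ids)   # [1, 11, 22, 33, 2, 3, 3, 3]
print(mask)  # [1, 1, 1, 1, 1, 0, 0, 0]
```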

Hardware and software

The repository is packaged for the Hugging Face transformers library. Specific details of the training hardware and software stack should be documented by the model authors if they are intended to be part of the public model card.

Additional information

Author

TO-DO: confirm the author list and institutional attribution to be displayed in the public model card.

Contact

TO-DO: add a contact email or project contact point.

License

TO-DO: confirm the license for this checkpoint and add it both here and in config.json if desired.

Funding

TO-DO: add funding information if this checkpoint is part of a funded project.

Disclaimer

This repository contains a base language model checkpoint. Base models can reflect biases present in their training data and may generate inaccurate, misleading, or unsafe content. Anyone deploying this model, or systems built on top of it, is responsible for evaluating those risks and ensuring compliance with applicable legal, ethical, and operational requirements.
