Aitana-2B-S-base-IP-1.0
Table of Contents
- Model description
- Intended uses and limitations
- How to use
- Training
- Technical specifications
- Additional information
Model description
Aitana-2B-S-base-IP-1.0 is a generative language model with a decoder-only architecture. This repository contains the base checkpoint, intended for causal language modeling and for further adaptation or task-specific fine-tuning.
Based on the files shipped in this repository, the checkpoint uses the Llama architecture and is packaged for the Hugging Face Transformers library. The local configuration indicates:
- architecture: LlamaForCausalLM
- hidden size: 2048
- layers: 24
- attention heads: 16
- vocabulary size: 256000
- context length: 8192
- tensor dtype in config: bfloat16
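As a rough illustration of what these dimensions imply at inference time, the sketch below estimates the key/value cache size for one sequence. It assumes the key/value head count equals the attention head count (16, as listed under Technical specifications), a head dimension of hidden size divided by attention heads, and 2 bytes per bfloat16 value. The function name `kv_cache_bytes` is hypothetical, and real memory use also depends on the runtime and cache implementation.

```python
def kv_cache_bytes(seq_len, layers=24, kv_heads=16, hidden_size=2048,
                   attention_heads=16, bytes_per_value=2):
    """Estimate KV-cache bytes for a single sequence of `seq_len` tokens.

    Assumption: head_dim = hidden_size // attention_heads, and the cache
    stores one key and one value vector per layer, per KV head, per token.
    """
    head_dim = hidden_size // attention_heads
    # Factor 2 accounts for storing both keys and values.
    per_token = 2 * layers * kv_heads * head_dim * bytes_per_value
    return per_token * seq_len


if __name__ == "__main__":
    full_context = kv_cache_bytes(8192)
    print(f"{full_context / 2**30:.2f} GiB at the full 8192-token context")
```

Under these assumptions the cache grows linearly with sequence length, so halving the context halves the estimate.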
Intended uses and limitations
Aitana-2B-S-base-IP-1.0 is a base model that can be used for causal language modeling and text generation. As with other base checkpoints, it is generally more useful as a starting point for instruction-tuning, domain adaptation, or downstream fine-tuning than as a final end-user assistant model.
Because this repository currently only exposes the model artifacts and not the full training report, claims about domain coverage, language balance, safety behavior, and benchmark performance should be added only once they are confirmed by the model authors.
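Because a base checkpoint continues text rather than following instructions, one common way to steer it without fine-tuning is a few-shot prompt that ends where the model should start writing. The following is a minimal sketch of that idea; the helper name, the labels, and the example pairs are hypothetical and not taken from the model authors.

```python
def build_few_shot_prompt(examples, query, input_label="Input", output_label="Output"):
    """Format (input, output) pairs followed by an open-ended final item.

    The prompt ends with the output label so that a causal language model
    continues with the answer for `query`.
    """
    blocks = [f"{input_label}: {src}\n{output_label}: {tgt}" for src, tgt in examples]
    blocks.append(f"{input_label}: {query}\n{output_label}:")
    return "\n\n".join(blocks)


if __name__ == "__main__":
    demo = build_few_shot_prompt(
        [("bon dia", "good morning"), ("bona nit", "good night")],
        "moltes gràcies",
    )
    print(demo)
```

The resulting string can be passed to the tokenizer in place of a plain prompt; output quality with this pattern has not been evaluated here.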
How to use
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gplsi/Aitana-2B-S-base-IP-1.0"

# Load the tokenizer and the bfloat16 weights.
# device_map="auto" requires the accelerate package.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Catalan prompt: "Write a brief summary about the importance of language."
prompt = "Escriu un breu resum sobre la importància de la llengua."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample a continuation of up to 128 new tokens.
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.pad_token_id,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
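The prompt and the generated tokens together need to fit within the 8192-token context length listed in the configuration. The sketch below shows one way to trim an over-long prompt before generation; `fit_prompt` is a hypothetical helper, and keeping the most recent tokens is an assumption about what matters for the task.

```python
def fit_prompt(token_ids, max_new_tokens=128, context_length=8192):
    """Return the trailing slice of `token_ids` that leaves room for generation.

    Assumption: the most recent tokens are the most relevant, so the prompt
    is trimmed from the left. Note that this can drop a leading BOS token.
    """
    budget = context_length - max_new_tokens
    if budget <= 0:
        raise ValueError("max_new_tokens must be smaller than context_length")
    return list(token_ids[-budget:])


if __name__ == "__main__":
    long_prompt = list(range(10_000))  # stand-in for real token ids
    trimmed = fit_prompt(long_prompt)
    print(len(trimmed))
```

Prompts that already fit are returned unchanged.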
Training
Base model
TO-DO: document the original parent checkpoint or initialization source for Aitana-2B-S-base-IP-1.0.
Training data
TO-DO: document the training corpora, language distribution, preprocessing steps, deduplication policy, anonymization steps, and data filtering criteria.
Training hyperparameters
TO-DO: document the effective batch size, learning rate schedule, optimizer setup, number of epochs or tokens seen, sequence length used during training, and hardware.
Technical specifications
Model architecture and objective
- architecture: decoder-only causal language model
- implementation class: LlamaForCausalLM
- hidden size: 2048
- intermediate size: 5440
- layers: 24
- attention heads: 16
- key/value heads: 16
- maximum position embeddings: 8192
- vocabulary size: 256000
- BOS token id: 1
- EOS token id: 2
- PAD token id: 3
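The dimensions above are enough for a back-of-the-envelope parameter count. The sketch below assumes the standard Llama layout (four attention projections and three MLP projections per layer, two RMSNorm weight vectors per layer plus a final one, no bias terms). Whether the input and output embeddings are tied is not stated in this card, so both cases are computed. The function name is hypothetical and the result is an estimate, not a figure reported by the model authors.

```python
def llama_param_count(hidden=2048, intermediate=5440, layers=24, vocab=256000,
                      heads=16, kv_heads=16, tie_embeddings=False):
    """Estimate parameters for a Llama-style decoder (assumes no bias terms)."""
    head_dim = hidden // heads
    attn = hidden * heads * head_dim             # q_proj
    attn += 2 * hidden * kv_heads * head_dim     # k_proj and v_proj
    attn += heads * head_dim * hidden            # o_proj
    mlp = 3 * hidden * intermediate              # gate, up and down projections
    norms = 2 * hidden                           # two RMSNorm weights per layer
    total = layers * (attn + mlp + norms)
    total += vocab * hidden                      # input embedding matrix
    total += hidden                              # final RMSNorm weight
    if not tie_embeddings:
        total += vocab * hidden                  # separate output projection
    return total


if __name__ == "__main__":
    print(f"untied: {llama_param_count(tie_embeddings=False):,}")
    print(f"tied:   {llama_param_count(tie_embeddings=True):,}")
```

With a 256000-entry vocabulary, each embedding matrix alone accounts for roughly 0.52 billion parameters, so the tied and untied estimates differ substantially.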
Tokenizer
The tokenizer files in this repository define:
- BOS token: <s>
- EOS token: </s>
- PAD token: <pad>
- UNK token: <unk>
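To illustrate how the PAD token is typically used, the sketch below pads a batch of token-id lists to a common length and builds the matching attention mask. It uses PAD id 3 from the specification above. Left padding is assumed here because it is a common choice for batched generation with decoder-only models; this card does not state the tokenizer's configured padding side, and `left_pad` is a hypothetical helper, not part of the repository.

```python
PAD_ID = 3  # PAD token id listed in the model specification


def left_pad(batch, pad_id=PAD_ID):
    """Left-pad token-id lists to equal length and return (input_ids, attention_mask)."""
    width = max(len(seq) for seq in batch)
    input_ids = [[pad_id] * (width - len(seq)) + list(seq) for seq in batch]
    # Mask is 0 over padding positions and 1 over real tokens.
    attention_mask = [[0] * (width - len(seq)) + [1] * len(seq) for seq in batch]
    return input_ids, attention_mask


if __name__ == "__main__":
    ids, mask = left_pad([[1, 5, 6], [1, 7]])  # 1 is the BOS id; 5, 6, 7 are arbitrary
    print(ids)
    print(mask)
```

In practice the tokenizer can do this itself when called with `padding=True`; the manual version only makes the role of the PAD id explicit.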
Hardware and software
The repository is packaged for the Hugging Face transformers library.
Specific training hardware and training software details should be documented by the
model authors if they are intended to be part of the public model card.
Additional information
Author
TO-DO: confirm the author list and institutional attribution to be displayed in the public model card.
Contact
TO-DO: add a contact email or project contact point.
License
TO-DO: confirm the license for this checkpoint and add it both here and in
config.json if desired.
Funding
TO-DO: add funding information if this checkpoint is part of a funded project.
Disclaimer
This repository contains a base language model checkpoint. Base models can reflect biases present in their training data and may generate inaccurate, misleading, or unsafe content. Anyone deploying this model, or systems built on top of it, is responsible for evaluating those risks and ensuring compliance with applicable legal, ethical, and operational requirements.