# Charlie 1.5
Charlie 1.5 is a high-performance, 12B-parameter large language model (LLM) built on the Mistral architecture. It is designed for long-context reasoning, complex enterprise workflows, and structured decision-making.
With a 131,072-token (128k) context window, Charlie 1.5 natively processes large inputs such as financial filings, legal contracts, and technical reports, without requiring retrieval-augmented generation (RAG) pipelines or external chunking systems.
## Model Summary
| Attribute | Description |
|---|---|
| Architecture | Mistral-based decoder-only transformer |
| Parameters | ~12B |
| Layers | 40 |
| Hidden Size | 5,120 |
| Context Window | 131,072 tokens |
| Vocabulary Size | 131,072 |
| Precision | bfloat16 (BF16) |
| License | Apache License 2.0 |
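The table's dimensions can be cross-checked with a rough parameter count. The sketch below is a back-of-envelope estimate, not the official architecture: the head dimension (hidden size / 32 heads = 160), the SwiGLU-style three-matrix MLP, and an untied output head are all assumptions inferred from the stated specs.

```python
# Rough parameter-count estimate from the summary table.
# Assumptions (not confirmed by the model card): head_dim = hidden // heads,
# SwiGLU-style MLP (gate/up/down), untied lm_head.
vocab, hidden, inter, layers = 131_072, 5_120, 14_336, 40
heads, kv_heads = 32, 8
head_dim = hidden // heads  # 160 (assumed)

embed = vocab * hidden                        # token embedding matrix
attn = (hidden * heads * head_dim             # Q projection
        + 2 * hidden * kv_heads * head_dim    # K and V projections (GQA)
        + heads * head_dim * hidden)          # output projection
mlp = 3 * hidden * inter                      # gate, up, down projections
norms = 2 * hidden                            # two RMSNorm weights per layer

total = embed + layers * (attn + mlp + norms) + hidden + vocab * hidden
print(f"~{total / 1e9:.1f}B parameters")
```

Under these assumptions the estimate lands in the ~12B range quoted above; small deviations (e.g. a different head dimension or tied embeddings) would shift it by a few percent.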
## Model Highlights
- Extended Context: Native support for 131k-token sequences using RoPE (theta: 1,000,000)
- Efficient Attention: Grouped Query Attention (32 attention heads, 8 KV heads)
- Broad Coverage: Large vocabulary supporting multilingual, technical, and domain-specific text
- Deployment-Friendly: Optimized for mid-range GPUs such as NVIDIA A10G
- Long-Form Reasoning: Particularly effective on large-document and multi-step reasoning tasks
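The Grouped Query Attention configuration (32 query heads sharing 8 KV heads) directly shrinks the KV cache, which is what makes the long context practical. A quick back-of-envelope calculation, assuming a head dimension of 160 (hidden size / 32 heads, not stated explicitly in the card):

```python
# KV-cache size per token under GQA, in bf16 (2 bytes per value).
layers, q_heads, kv_heads = 40, 32, 8
head_dim = 5_120 // q_heads   # 160, an assumption derived from the specs
bytes_per_value = 2           # bf16

per_token = 2 * layers * kv_heads * head_dim * bytes_per_value  # K + V
print(f"GQA cache: {per_token / 1024:.0f} KiB per token")
print(f"Full 131,072-token context: {per_token * 131_072 / 2**30:.1f} GiB")
print(f"Standard MHA (32 KV heads) would need {q_heads // kv_heads}x more")
```

With full multi-head attention the cache would be four times larger, so GQA is what keeps very long sequences within reach of mid-range accelerators.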
## Performance & Benchmarks
| Benchmark | Score |
|---|---|
| MMLU | 68 |
| MMLU-Pro | 39 |
| ARC-Challenge | 60 |
### Inference Performance (NVIDIA A10G)
- Time to First Token (TTFT): ~80 ms
- Throughput: ~146 tokens/sec
- Precision: bfloat16 (BF16)
Benchmark results are indicative and may vary depending on hardware, prompt length, and configuration.
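For capacity planning, the two latency numbers above combine into a simple end-to-end estimate. This is plain arithmetic on the quoted figures, not a measured result:

```python
# End-to-end latency estimate for one completion, from the quoted figures.
ttft_s = 0.080        # time to first token, ~80 ms
tokens_per_s = 146    # steady-state decode throughput
new_tokens = 512      # e.g. the max_new_tokens used in the Usage example

total_s = ttft_s + new_tokens / tokens_per_s
print(f"~{total_s:.1f} s for {new_tokens} new tokens")
```

So a 512-token completion takes roughly 3.6 seconds on an A10G under these assumptions; longer prompts raise TTFT, and batching changes per-request throughput.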
## Intended Use & Scope
Charlie 1.5 is intended for:
- Long-context document analysis
- Enterprise decision-support systems
- Research and experimentation
- Commercial and non-commercial applications
- Fine-tuning and derivative model development
The model is provided as-is and should be independently evaluated before use in high-risk or safety-critical applications.
## Usage
Charlie 1.5 can be used with the Hugging Face `transformers` library:
```python
import torch
from transformers import pipeline

# Load the model in bf16 and let accelerate place it on available devices.
pipe = pipeline(
    "text-generation",
    model="your-username/charlie-1.5",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = """
Analyze the impact of a 15% tariff increase on lithium-ion components from the Asia-Pacific region.
1. Identify the top 3 Tier 2 suppliers most at risk based on current lead times.
2. Propose a diversification strategy for our European assembly plant.
3. Calculate the projected shift in COGS if we pivot 40% of sourcing to Mexico.
"""

messages = [
    {"role": "system", "content": ""},
    {"role": "user", "content": prompt},
]

outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,
    use_cache=True,
    return_full_text=False,
    num_return_sequences=1,
)

for output in outputs:
    print(output["generated_text"])
```
## Technical Specifications
- Hidden Size: 5,120
- Intermediate Size: 14,336
- Attention Heads: 32 (8 KV heads using Grouped Query Attention)
- Activation Function: SiLU
- Normalization: RMSNorm (epsilon: 1e-05)
- Max Position Embeddings: 131,072
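The RoPE base of 1,000,000 mentioned under Model Highlights is what stretches positional resolution to 131k tokens. A minimal sketch of the implied rotary frequencies, assuming a head dimension of 160 (hidden size / 32 heads, which the card does not state explicitly):

```python
# Inverse RoPE frequencies implied by theta = 1,000,000.
# head_dim = 160 is an assumption (5,120 hidden / 32 heads).
import math

theta = 1_000_000.0
head_dim = 160

# One frequency per rotated dimension pair.
inv_freq = [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

# The slowest-rotating pair sets the longest resolvable wavelength.
max_wavelength = 2 * math.pi / inv_freq[-1]
print(f"longest RoPE wavelength: ~{max_wavelength:,.0f} positions")
```

Under these assumptions the longest wavelength is several million positions, comfortably beyond the 131,072-token window, which is why the large theta supports long contexts without position aliasing.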
## License
The model is released under the Apache License 2.0. There are no restrictions on downstream use beyond the terms of that license.
## Citation & Attribution
If you use Charlie 1.5 in research or commercial applications, please attribute it to the original Gaudium AI development team.