Charlie 1.5

Charlie 1.5 is a high-performance, 12B-parameter large language model (LLM) built on the Mistral architecture. It is designed for long-context reasoning, complex enterprise workflows, and structured decision-making.

With a 131,072-token (128k) context window, Charlie 1.5 natively processes large inputs, such as financial filings, legal contracts, and technical reports, without requiring retrieval-augmented generation (RAG) pipelines or external chunking systems.


Model Summary

Attribute          Description
Architecture       Mistral-based decoder-only transformer
Parameters         ~12B
Layers             40
Hidden Size        5,120
Context Window     131,072 tokens
Vocabulary Size    131,072
Precision          bfloat16 (BF16)
License            Apache License 2.0
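
The ~12B figure can be sanity-checked from the architecture numbers in the table. The sketch below is a back-of-the-envelope count only; it assumes a head dimension of hidden_size / num_heads and untied input/output embeddings, neither of which is confirmed by this card:

```python
# Back-of-the-envelope parameter count from the architecture in the table above.
# Assumptions not stated on the card: head_dim = hidden // heads, untied embeddings.
hidden, inter, layers, vocab = 5_120, 14_336, 40, 131_072
heads, kv_heads = 32, 8
head_dim = hidden // heads  # assumed: 160

attn = (
    hidden * heads * head_dim * 2       # Q and O projections
    + hidden * kv_heads * head_dim * 2  # K and V projections (GQA)
)
mlp = hidden * inter * 3                # gate, up, and down projections
per_layer = attn + mlp

embeddings = vocab * hidden * 2         # input embeddings + LM head (assumed untied)
total = layers * per_layer + embeddings
print(f"~{total / 1e9:.1f}B parameters")  # ~12.8B under these assumptions
```

The estimate lands in the advertised ~12B range; the exact figure depends on the true head dimension and whether the embeddings are tied.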

Model Highlights

  • Extended Context: Native support for 131k-token sequences using RoPE (theta: 1,000,000)
  • Efficient Attention: Grouped Query Attention (32 attention heads, 8 KV heads)
  • Broad Coverage: Large vocabulary supporting multilingual, technical, and domain-specific text
  • Deployment-Friendly: Optimized for mid-range GPUs such as NVIDIA A10G
  • Long-Form Reasoning: Particularly effective on large-document and multi-step reasoning tasks
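
The practical payoff of grouped-query attention shows up in KV-cache size at long context. A rough estimate, assuming a head dimension of 5,120 / 32 = 160 and BF16 (2-byte) cache entries, both assumptions rather than card-stated facts:

```python
# KV-cache footprint at the full 131,072-token context, per sequence.
# Assumptions not stated on the card: head_dim = 5120 / 32 = 160, BF16 (2-byte) cache.
layers, heads, kv_heads, head_dim, dtype_bytes = 40, 32, 8, 160, 2
ctx = 131_072

bytes_per_token = 2 * kv_heads * head_dim * dtype_bytes * layers  # one K and one V per layer
gqa_cache_gib = bytes_per_token * ctx / 2**30
mha_cache_gib = gqa_cache_gib * heads / kv_heads  # what 32 full KV heads would cost

print(f"GQA KV cache: {gqa_cache_gib:.1f} GiB")  # 25.0 GiB
print(f"MHA KV cache: {mha_cache_gib:.1f} GiB")  # 100.0 GiB, 4x larger
```

Caching 8 KV heads instead of 32 cuts the cache to a quarter of the full multi-head cost, which is what makes 128k-token sequences tractable at all on mid-range hardware.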

Performance & Benchmarks

Benchmark          Score
MMLU               68
MMLU-Pro           39
ARC-Challenge      60

Inference Performance (NVIDIA A10G)

  • Time to First Token (TTFT): ~80 ms
  • Throughput: ~146 tokens/sec
  • Precision: bfloat16 (BF16)

Benchmark results are indicative and may vary depending on hardware, prompt length, and configuration.
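
Those two figures give a quick end-to-end latency budget: total time ≈ TTFT + generated tokens / throughput. A sketch using the A10G numbers above (an estimate only; real latency varies with batch size and prompt length):

```python
def latency_s(new_tokens, ttft_ms=80, tokens_per_s=146):
    # End-to-end estimate: time to first token plus steady-state decoding time,
    # using the indicative A10G figures from this card.
    return ttft_ms / 1000 + new_tokens / tokens_per_s

print(f"~{latency_s(512):.1f} s for a 512-token completion")
```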


Intended Use & Scope

Charlie 1.5 is intended for:

  • Long-context document analysis
  • Enterprise decision-support systems
  • Research and experimentation
  • Commercial and non-commercial applications
  • Fine-tuning and derivative model development

The model is provided as-is and should be independently evaluated before use in high-risk or safety-critical applications.


Usage

Charlie 1.5 can be used with the Hugging Face transformers library:

import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="your-username/charlie-1.5",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = """
Analyze the impact of a 15% tariff increase on lithium-ion components from the Asia-Pacific region.

1. Identify the top 3 Tier 2 suppliers most at risk based on current lead times.
2. Propose a diversification strategy for our European assembly plant.
3. Calculate the projected shift in COGS if we pivot 40% of sourcing to Mexico.
"""
messages = [
    {"role": "user", "content": prompt},
]

outputs = pipe(
    messages,
    max_new_tokens=512,
    do_sample=True,
    temperature=0.1,         # low temperature keeps analytical output near-deterministic
    use_cache=True,
    return_full_text=False,  # return only the newly generated text
    num_return_sequences=1,
)
for output in outputs:
    print(output["generated_text"])
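
For long documents it is worth checking the token count against the 131,072-token window before calling the model, leaving headroom for generation. A minimal helper along these lines, written against any `encode`-style callable that returns a list of token ids (e.g. `pipe.tokenizer.encode`; the helper name is illustrative, not part of any API):

```python
def fits_in_context(encode, text, context_window=131_072, reserve=512):
    # encode: any callable returning a list of token ids, e.g. pipe.tokenizer.encode.
    # reserve: tokens held back for the model's generated output.
    n_tokens = len(encode(text))
    return n_tokens + reserve <= context_window, n_tokens
```

Usage: `ok, n = fits_in_context(pipe.tokenizer.encode, big_document)`; if `ok` is False, the document plus the generation budget would overflow the context window.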

Technical Specifications

  • Hidden Size: 5,120
  • Intermediate Size: 14,336
  • Attention Heads: 32 (8 KV heads using Grouped Query Attention)
  • Activation Function: SiLU
  • Normalization: RMSNorm (epsilon: 1e-05)
  • Max Position Embeddings: 131,072
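
The normalization and activation listed above are simple to state. A pure-Python sketch of their reference semantics (for clarity only, not an efficient implementation):

```python
import math

def rms_norm(x, weight, eps=1e-5):
    # RMSNorm: scale by the reciprocal root-mean-square of the vector.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [w * v / rms for v, w in zip(x, weight)]

def silu(v):
    # SiLU (swish): x * sigmoid(x), the activation used in the MLP blocks.
    return v / (1.0 + math.exp(-v))
```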

License

The model is released under the Apache License 2.0, which permits commercial use, modification, and redistribution. Downstream use is subject only to the terms of that license.

Citation & Attribution

If you use Charlie 1.5 in research or commercial applications, please attribute it to the original Gaudium AI development team.
