Echo88-150M-Base

Echo88-150M-Base is a small English decoder-only causal language model trained from scratch on the Echo88 pretraining dataset.

Echo88 is designed as a retro language model. Its training text reflects the language, culture, and computing of the period up to the late 1980s, drawn from magazines, Usenet discussions, and older books.

This is a base model, not an instruction-tuned chatbot. It is trained for next-token prediction and should be fine-tuned before being used as a helpful assistant.

Model Details

  • Model name: Echo88-150M-Base
  • Model type: decoder-only causal language model
  • Architecture: LLaMA-style transformer
  • Training type: from scratch
  • Parameter count: 163,606,272 parameters
  • Language: English
  • Context length: 2048 tokens
  • Tokenizer: custom Echo88 byte-level BPE tokenizer
  • Vocabulary size: 32,768
  • Training objective: autoregressive next-token prediction

Training Data

Echo88-150M-Base was trained on the Echo88 pretraining dataset.

The packed dataset (train and eval splits) contains:

  • Train tokens: 1,470,629,888
  • Eval tokens: 1,454,080
  • Train blocks: 718,081 blocks
  • Eval blocks: 710 blocks
  • Block size: 2048 tokens
  • Packed dtype: uint16
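The figures above are internally consistent, which can be checked with a few lines of arithmetic. The sketch below multiplies the block counts by the block size and confirms that every token ID from the 32,768-entry vocabulary fits in a uint16. The size estimate at the end assumes the packed train split stores exactly 2 bytes per token with no header or padding, which this card does not state.

```python
# Values copied from the lists in this card.
block_size = 2048
train_blocks = 718_081
eval_blocks = 710
vocab_size = 32_768

# Each packed block holds exactly block_size tokens.
train_tokens = train_blocks * block_size
eval_tokens = eval_blocks * block_size
print(f"{train_tokens:,}")  # 1,470,629,888
print(f"{eval_tokens:,}")   # 1,454,080

# uint16 stores 0..65535, so the largest token ID (vocab_size - 1) fits.
fits_uint16 = (vocab_size - 1) <= 2**16 - 1
print(fits_uint16)  # True

# Assumption: 2 bytes per token and nothing else in the file.
train_gib = train_tokens * 2 / 2**30
print(f"approx. {train_gib:.2f} GiB for the packed train split")
```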

The dataset includes a mixture of:

  • public-domain book text
  • Gutenberg-style older books
  • UTZOO Usenet posts
  • BYTE Magazine text
  • PC Magazine text
  • TIME Magazine text
  • Internet Archive Magazine Rack OCR text
  • computer and technology magazine text
  • general historical magazine text

The dataset emphasizes the 1950s through the late 1980s, with a strong focus on early personal computing, printed magazines, Usenet, and older long-form writing.

Dataset used:

  • guus4324343/Echo88-Pretrain-1.17B

Intended Use

Echo88-150M-Base is intended for:

  • causal language modeling
  • retro / historical AI experiments
  • small language model research
  • continued pretraining
  • instruction tuning
  • 1980s-style assistant experiments
  • computer-history language modeling
  • training Echo88-150M-Instruct

Recommended flow:

Echo88-150M-Base
→ supervised fine-tuning on Echo88-Instruct-173K
→ Echo88-150M-Instruct

Not Instruction Tuned

This model is not instruction tuned.

It may not reliably follow commands, answer questions directly, or behave like a chat assistant. It is a base model that continues text.

Expected behavior:

  • continues prompts
  • completes paragraphs
  • imitates old magazine/book/Usenet style
  • may produce raw text instead of direct answers
  • may hallucinate
  • may repeat phrases
  • may generate OCR-like artifacts

For chat behavior, use an instruction-tuned version, or create one by fine-tuning on:

  • guus4324343/Echo88-Instruct-173K
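Because the base model continues text rather than answering, one common workaround is to phrase a question as the start of a document whose natural continuation is the answer. The helper below is a hypothetical illustration: the function name and the column heading are invented for this example, and there is no guarantee the model will produce a useful continuation.

```python
def frame_as_reader_mail(question: str) -> str:
    """Hypothetical helper: wrap a question in a magazine-style Q&A column.

    The prompt ends with "A:" so that the most natural continuation
    is an answer rather than more questions.
    """
    return (
        "READERS' QUESTIONS AND ANSWERS\n\n"
        f"Q: {question.strip()}\n"
        "A:"
    )


prompt = frame_as_reader_mail("How much memory does a typical home computer need?")
print(prompt)
```

The resulting string can be passed as `prompt` in the Example Usage code.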

Knowledge Boundary

Echo88 is designed around a historical data mixture ending around the late 1980s.

The model should not be expected to know modern topics such as:

  • Google
  • Wikipedia
  • iPhone
  • smartphones
  • modern social media
  • Windows 95 and later software
  • COVID-19
  • modern AI systems
  • 2000s, 2010s, or 2020s events

Because this is a base model, it may still hallucinate if prompted about modern events. The later instruction-tuned model should be trained to respond more carefully to post-1988 topics.

Example Usage

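import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "guus4324343/Echo88-150M-Base"

# device_map="auto" requires the accelerate package to be installed.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

# Base models continue text, so write the prompt as the opening of a passage.
prompt = "The personal computer revolution of the 1980s"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Fall back to the EOS token for padding if the tokenizer defines no pad token.
pad_id = tokenizer.pad_token_id
if pad_id is None:
    pad_id = tokenizer.eos_token_id

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=160,
        temperature=0.8,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.05,
        pad_token_id=pad_id,
        eos_token_id=tokenizer.eos_token_id,
    )

# output[0] contains the prompt tokens followed by the generated continuation.
print(tokenizer.decode(output[0], skip_special_tokens=True))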

Training Configuration

Echo88-150M-Base was trained as a LLaMA-style decoder-only causal LM.

Main configuration:

vocab_size: 32768
hidden_size: 768
intermediate_size: 2048
num_hidden_layers: 18
num_attention_heads: 12
num_key_value_heads: 4
max_position_embeddings: 2048
activation: SiLU / SwiGLU-style LLaMA MLP
normalization: RMSNorm
position encoding: RoPE
attention: grouped-query attention
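The stated parameter count of 163,606,272 can be reproduced from the configuration above. The sketch below assumes the standard LLaMA layout: no bias terms, three MLP projections (gate, up, down), two RMSNorm weight vectors per layer plus a final one, and an output head that is not tied to the input embeddings. The untied head is an inference from the arithmetic, not something the card states.

```python
# Values copied from the "Main configuration" list.
vocab_size = 32768
hidden_size = 768
intermediate_size = 2048
num_layers = 18
num_heads = 12
num_kv_heads = 4

head_dim = hidden_size // num_heads   # 64
kv_dim = num_kv_heads * head_dim      # 256: K/V heads are shared across query groups

# Attention: q_proj and o_proj are hidden x hidden; k_proj and v_proj are hidden x kv_dim.
attn = 2 * hidden_size * hidden_size + 2 * hidden_size * kv_dim
# SwiGLU-style MLP: gate, up, and down projections.
mlp = 3 * hidden_size * intermediate_size
# Two RMSNorm weight vectors per layer.
norms = 2 * hidden_size
per_layer = attn + mlp + norms

embeddings = vocab_size * hidden_size
lm_head = vocab_size * hidden_size    # assumption: not tied to the embeddings
final_norm = hidden_size

total = embeddings + num_layers * per_layer + final_norm + lm_head
print(f"{total:,}")  # 163,606,272
```

With tied embeddings the same configuration would come to 138,440,448 parameters, so the published figure is consistent with a separate output head.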

Training setup:

precision: bf16
sequence length: 2048
optimizer: AdamW
scheduler: cosine
weight decay: 0.1
gradient clipping: 1.0
max steps: 5610
training tokens: ~1.47B
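The card does not list a batch size, but one can be estimated from the step count and the size of the train split. The sketch below assumes a single pass over the 718,081 train blocks with a fixed number of blocks per optimizer step. Treat the result as a back-of-the-envelope inference, not a documented setting.

```python
# Values copied from this card.
train_blocks = 718_081
block_size = 2048
max_steps = 5610

# Assumption: one epoch, same number of blocks in every optimizer step.
blocks_per_step, leftover_blocks = divmod(train_blocks, max_steps)
tokens_per_step = blocks_per_step * block_size
tokens_seen = max_steps * tokens_per_step

print(blocks_per_step, leftover_blocks)  # 128 1
print(f"{tokens_per_step:,}")            # 262,144
print(f"{tokens_seen:,}")                # 1,470,627,840
```

Under these assumptions each step would process 128 sequences of 2048 tokens, which agrees with the stated total of roughly 1.47B training tokens.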

Limitations

Echo88-150M-Base is experimental and small.

Known limitations:

  • not instruction tuned
  • may hallucinate
  • may repeat text
  • may produce OCR-like artifacts
  • may reflect outdated historical language or views
  • may struggle with complex reasoning
  • may not reliably refuse post-1988 topics
  • may produce incomplete or strange continuations
  • may mix unrelated historical/computer facts

The model is intended for research, experimentation, and creative retro AI work. It is not intended for high-stakes use.

Bias and Historical Content

The training data includes historical books, magazines, and Usenet text. As a result, the model may reproduce outdated language, assumptions, stereotypes, or viewpoints present in older source material.

Users should review outputs carefully.

Model Family

Planned Echo88 model family:

Echo88-150M-Base
Echo88-150M-Instruct
Echo88-150M-Chat

License

The model weights are released under the Apache 2.0 license.

The training dataset is mixed-source and is released separately under its own terms (license tag: "other"). Users are responsible for checking dataset source rights, licensing, and suitability for their own use case.

