---
license: apache-2.0
language:
  - en
library_name: transformers
pipeline_tag: text-generation
pretty_name: Echo88 150M Base
tags:
  - text-generation
  - causal-lm
  - base-model
  - decoder-only
  - autoregressive
  - from-scratch
  - llama
  - retro
  - 1980s
  - usenet
  - magazines
  - books
  - computer-history
  - english
datasets:
  - guus4324343/Echo88-Pretrain-1.17B
---

# Echo88-150M-Base

Echo88-150M-Base is a small English decoder-only causal language model trained from scratch on the Echo88 pretraining dataset.

Echo88 is designed as a retro language model. Its training data reflects the language, culture, and computing of the period up to the late 1980s, drawn from magazines, Usenet discussions, and older book text.

This is a **base model**, not an instruction-tuned chatbot. It is trained for next-token prediction and should be fine-tuned before being used as a helpful assistant.

## Model Details

- **Model name:** Echo88-150M-Base
- **Model type:** decoder-only causal language model
- **Architecture:** LLaMA-style transformer
- **Training type:** from scratch
- **Parameter count:** 163,606,272 (about 164M)
- **Language:** English
- **Context length:** 2048 tokens
- **Tokenizer:** custom Echo88 byte-level BPE tokenizer
- **Vocabulary size:** 32,768
- **Training objective:** autoregressive next-token prediction

## Training Data

Echo88-150M-Base was trained on the Echo88 pretraining dataset.

The packed training set contains:

- **Train tokens:** 1,470,629,888
- **Eval tokens:** 1,454,080
- **Train blocks:** 718,081 blocks
- **Eval blocks:** 710 blocks
- **Block size:** 2048 tokens
- **Packed dtype:** uint16
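
The figures above are internally consistent: block count times block size gives the token counts exactly, and a 32,768-entry vocabulary fits in `uint16`. The sketch below checks that arithmetic and shows one way such a packed file could be read. The on-disk layout is an assumption (a flat binary file of native-endian `uint16` token ids with no header), and `iter_blocks` is a hypothetical helper, not part of the Echo88 tooling.

```python
from array import array

BLOCK_SIZE = 2048      # tokens per block, from the list above
VOCAB_SIZE = 32_768    # from Model Details
BYTES_PER_TOKEN = 2    # uint16

# Block counts times block size reproduce the listed token counts.
assert 718_081 * BLOCK_SIZE == 1_470_629_888   # train
assert 710 * BLOCK_SIZE == 1_454_080           # eval
# The largest token id (32,767) fits in an unsigned 16-bit integer.
assert VOCAB_SIZE - 1 <= 0xFFFF


def iter_blocks(path, block_size=BLOCK_SIZE):
    """Yield fixed-size blocks of token ids from a flat uint16 file.

    ASSUMPTION: the file is a headerless stream of native-endian uint16
    values. A trailing partial block, if any, is dropped.
    """
    bytes_per_block = block_size * BYTES_PER_TOKEN
    with open(path, "rb") as f:
        while True:
            chunk = f.read(bytes_per_block)
            if len(chunk) < bytes_per_block:
                break
            block = array("H")  # "H" is a 2-byte unsigned int on common platforms
            assert block.itemsize == BYTES_PER_TOKEN
            block.frombytes(chunk)
            yield block
```

Reading one block at a time keeps memory use flat. At 2 bytes per token, the full training split is roughly 2.9 GB on disk.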

The dataset includes a mixture of:

- public-domain book text
- Gutenberg-style older books
- UTZOO Usenet posts
- BYTE Magazine text
- PC Magazine text
- TIME Magazine text
- Internet Archive Magazine Rack OCR text
- computer and technology magazine text
- general historical magazine text

The dataset emphasizes the 1950s through the late 1980s, with a strong focus on early personal computing, printed magazines, Usenet, and older long-form writing.

Dataset used:

- `guus4324343/Echo88-Pretrain-1.17B`

## Intended Use

Echo88-150M-Base is intended for:

- causal language modeling
- retro / historical AI experiments
- small language model research
- continued pretraining
- instruction tuning
- 1980s-style assistant experiments
- computer-history language modeling
- training Echo88-150M-Instruct

Recommended flow:

```text
Echo88-150M-Base
→ supervised fine-tuning on Echo88-Instruct-173K
→ Echo88-150M-Instruct
```

## Not Instruction Tuned

This model is not instruction tuned.

It may not reliably follow commands, answer questions directly, or behave like a chat assistant. It is a base model that continues text.

Expected behavior:

* continues prompts
* completes paragraphs
* imitates old magazine/book/Usenet style
* may produce raw text instead of direct answers
* may hallucinate
* may repeat phrases
* may generate OCR-like artifacts

For chat behavior, use an instruction-tuned version, or create one by fine-tuning on:

* `guus4324343/Echo88-Instruct-173K`

## Knowledge Boundary

Echo88 is designed around a historical data mixture ending around the late 1980s.

The model should not be expected to know modern topics such as:

* Google
* Wikipedia
* iPhone
* smartphones
* modern social media
* Windows 95 and later software
* COVID-19
* modern AI systems
* 2000s, 2010s, or 2020s events

Because this is a base model, it may still hallucinate if prompted about modern events. The later instruction-tuned model should be trained to respond more carefully to post-1988 topics.

## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "guus4324343/Echo88-150M-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

prompt = "The personal computer revolution of the 1980s"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=160,
        temperature=0.8,
        top_p=0.95,
        do_sample=True,
        repetition_penalty=1.05,
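        # Fall back to EOS if the tokenizer defines no pad token.
        pad_token_id=(
            tokenizer.pad_token_id
            if tokenizer.pad_token_id is not None
            else tokenizer.eos_token_id
        ),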
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```
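
Because the context length is 2048 tokens, the prompt and the generated continuation together should stay within that window. A minimal sketch of a budget check follows; `clamp_new_tokens` is a hypothetical helper introduced here for illustration, not part of `transformers`.

```python
CONTEXT_LENGTH = 2048  # from Model Details


def clamp_new_tokens(prompt_tokens: int, requested: int,
                     context_length: int = CONTEXT_LENGTH) -> int:
    """Return how many new tokens fit in the context window after the prompt."""
    if prompt_tokens >= context_length:
        raise ValueError("prompt already fills the context window; truncate it first")
    return min(requested, context_length - prompt_tokens)
```

In the example above, this could be used as `max_new_tokens=clamp_new_tokens(inputs["input_ids"].shape[1], 160)`.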

## Training Configuration

Echo88-150M-Base was trained as a LLaMA-style decoder-only causal LM.

Main configuration:

```text
vocab_size: 32768
hidden_size: 768
intermediate_size: 2048
num_hidden_layers: 18
num_attention_heads: 12
num_key_value_heads: 4
max_position_embeddings: 2048
activation: SiLU / SwiGLU-style LLaMA MLP
normalization: RMSNorm
position encoding: RoPE
attention: grouped-query attention
```
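
As a sanity check, the parameter count can be recomputed from this configuration. The sketch below assumes the usual LLaMA layout with no bias terms and an output head that is not tied to the input embeddings. Neither detail is stated in this card, but under those assumptions the result matches the 163,606,272 figure in Model Details. The KV-cache estimate at the end additionally assumes bf16 cache entries.

```python
# Values copied from the configuration listed above.
vocab_size = 32_768
hidden_size = 768
intermediate_size = 2_048
num_layers = 18
num_heads = 12
num_kv_heads = 4

head_dim = hidden_size // num_heads   # 64
kv_dim = num_kv_heads * head_dim      # 256, smaller than hidden_size because of GQA

embed = vocab_size * hidden_size                    # input embeddings
attn = 2 * hidden_size * hidden_size                # q_proj and o_proj
attn += 2 * hidden_size * kv_dim                    # k_proj and v_proj
mlp = 3 * hidden_size * intermediate_size           # gate, up, and down projections
norms = 2 * hidden_size                             # two RMSNorm weights per layer
per_layer = attn + mlp + norms

# ASSUMPTION: untied lm_head (a second vocab_size x hidden_size matrix), no biases.
lm_head = vocab_size * hidden_size
final_norm = hidden_size
total = embed + num_layers * per_layer + final_norm + lm_head
print(f"{total:,}")  # 163,606,272

# KV cache for one full-length sequence: K and V, per layer, 2 bytes each in bf16.
kv_bytes = 2 * num_layers * kv_dim * 2 * 2_048
print(kv_bytes / 2**20, "MiB")  # 36.0 MiB
```

With 12 key/value heads instead of 4, the same cache would be three times larger, which is the main practical effect of grouped-query attention at this scale.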

Training setup:

```text
precision: bf16
sequence length: 2048
optimizer: AdamW
scheduler: cosine
weight decay: 0.1
gradient clipping: 1.0
max steps: 5610
training tokens: ~1.47B
```
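
The card does not state the batch size, but the step count and token count together suggest one. The arithmetic below is an inference, not a documented setting: it assumes a single pass over the packed training set, using the block counts from the Training Data section.

```python
train_tokens = 1_470_629_888   # from Training Data
train_blocks = 718_081         # from Training Data
seq_len = 2_048
max_steps = 5_610

tokens_per_step = train_tokens / max_steps   # about 262,144 tokens
seqs_per_step = train_blocks / max_steps     # about 128 sequences

# ASSUMPTION: one epoch with a fixed global batch of 128 sequences.
implied_batch = round(seqs_per_step)
blocks_seen = implied_batch * max_steps      # 718,080, one block short of the full set
print(implied_batch, implied_batch * seq_len, blocks_seen)
```

Under that assumption, each optimizer step covers 128 × 2048 = 262,144 tokens, and 5610 steps consume all but one of the 718,081 training blocks.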

## Limitations

Echo88-150M-Base is experimental and small.

Known limitations:

* not instruction tuned
* may hallucinate
* may repeat text
* may produce OCR-like artifacts
* may reflect outdated historical language or views
* may struggle with complex reasoning
* may not reliably refuse post-1988 topics
* may produce incomplete or strange continuations
* may mix unrelated historical/computer facts

The model is intended for research, experimentation, and creative retro AI work. It is not intended for high-stakes use.

## Bias and Historical Content

The training data includes historical books, magazines, and Usenet text. As a result, the model may reproduce outdated language, assumptions, stereotypes, or viewpoints present in older source material.

Users should review outputs carefully.

## Model Family

Planned Echo88 model family:

```text
Echo88-150M-Base
Echo88-150M-Instruct
Echo88-150M-Chat
```

## License

The model weights are released under the Apache 2.0 license.

The training dataset is mixed-source and is released separately, tagged with the `other` license. Users are responsible for checking dataset source rights, licensing, and suitability for their own use case.
