---
license: apache-2.0
language:
- en
library_name: transformers
pipeline_tag: text-generation
pretty_name: Echo88 150M Base
tags:
- text-generation
- causal-lm
- base-model
- decoder-only
- autoregressive
- from-scratch
- llama
- retro
- 1980s
- usenet
- magazines
- books
- computer-history
- english
datasets:
- guus4324343/Echo88-Pretrain-1.17B
---

# Echo88-150M-Base

Echo88-150M-Base is a small English decoder-only causal language model trained from scratch on the Echo88 pretraining dataset.

Echo88 is designed as a retro language model inspired by the language, culture, computing, magazines, Usenet discussions, and older book text available up to the late 1980s.

This is a **base model**, not an instruction-tuned chatbot. It is trained for next-token prediction and should be fine-tuned before being used as a helpful assistant.

## Model Details

- **Model name:** Echo88-150M-Base
- **Model type:** decoder-only causal language model
- **Architecture:** LLaMA-style transformer
- **Training type:** from scratch
- **Parameter count:** 163,606,272 parameters
- **Language:** English
- **Context length:** 2048 tokens
- **Tokenizer:** custom Echo88 byte-level BPE tokenizer
- **Vocabulary size:** 32,768
- **Training objective:** autoregressive next-token prediction

## Training Data

Echo88-150M-Base was trained on the Echo88 pretraining dataset.
The packed training set contains:

- **Train tokens:** 1,470,629,888
- **Eval tokens:** 1,454,080
- **Train blocks:** 718,081 blocks
- **Eval blocks:** 710 blocks
- **Block size:** 2048 tokens
- **Packed dtype:** uint16

The dataset includes a mixture of:

- public-domain book text
- Gutenberg-style older books
- UTZOO Usenet posts
- BYTE Magazine text
- PC Magazine text
- TIME Magazine text
- Internet Archive Magazine Rack OCR text
- computer and technology magazine text
- general historical magazine text

The dataset emphasizes the 1950s through the late 1980s, with a strong focus on early personal computing, printed magazines, Usenet, and older long-form writing.

Dataset used:

- `guus4324343/Echo88-Pretrain-1.17B`

## Intended Use

Echo88-150M-Base is intended for:

- causal language modeling
- retro / historical AI experiments
- small language model research
- continued pretraining
- instruction tuning
- 1980s-style assistant experiments
- computer-history language modeling
- training Echo88-150M-Instruct

Recommended flow:

```text
Echo88-150M-Base
→ supervised fine-tuning on Echo88-Instruct-173K
→ Echo88-150M-Instruct
```

## Not Instruction Tuned

This model is not instruction tuned. It may not reliably follow commands, answer questions directly, or behave like a chat assistant. It is a base model that continues text.

Expected behavior:

* continues prompts
* completes paragraphs
* imitates old magazine/book/Usenet style
* may produce raw text instead of direct answers
* may hallucinate
* may repeat phrases
* may generate OCR-like artifacts

For chat behavior, use or create an instruction-tuned version using:

* `guus4324343/Echo88-Instruct-173K`

## Knowledge Boundary

Echo88 is designed around a historical data mixture ending around the late 1980s.
The model should not be expected to know modern topics such as:

* Google
* Wikipedia
* iPhone
* smartphones
* modern social media
* Windows 95 and later software
* COVID-19
* modern AI systems
* 2000s, 2010s, or 2020s events

Because this is a base model, it may still hallucinate if prompted about modern events. The later instruction-tuned model should be trained to respond more carefully to post-1988 topics.

## Example Usage

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "guus4324343/Echo88-150M-Base"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# device_map="auto" requires the `accelerate` package to be installed.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Base model: phrase the prompt as the start of a text to be continued,
# not as a question or an instruction.
prompt = "The personal computer revolution of the 1980s"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Fall back to EOS for padding in case the tokenizer defines no pad token.
pad_token_id = tokenizer.pad_token_id
if pad_token_id is None:
    pad_token_id = tokenizer.eos_token_id

with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=160,
        do_sample=True,           # sample instead of greedy decoding
        temperature=0.8,
        top_p=0.95,
        repetition_penalty=1.05,  # mild penalty against repeated phrases
        pad_token_id=pad_token_id,
        eos_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output[0], skip_special_tokens=True))
```

## Training Configuration

Echo88-150M-Base was trained as a LLaMA-style decoder-only causal LM.

Main configuration:

```text
vocab_size: 32768
hidden_size: 768
intermediate_size: 2048
num_hidden_layers: 18
num_attention_heads: 12
num_key_value_heads: 4
max_position_embeddings: 2048
activation: SiLU / SwiGLU-style LLaMA MLP
normalization: RMSNorm
position encoding: RoPE
attention: grouped-query attention
```

Training setup:

```text
precision: bf16
sequence length: 2048
optimizer: AdamW
scheduler: cosine
weight decay: 0.1
gradient clipping: 1.0
max steps: 5610
training tokens: ~1.47B
```

## Limitations

Echo88-150M-Base is experimental and small.
Known limitations:

* not instruction tuned
* may hallucinate
* may repeat text
* may produce OCR-like artifacts
* may reflect outdated historical language or views
* may struggle with complex reasoning
* may not reliably refuse post-1988 topics
* may produce incomplete or strange continuations
* may mix unrelated historical/computer facts

The model is intended for research, experimentation, and creative retro AI work. It is not intended for high-stakes use.

## Bias and Historical Content

The training data includes historical books, magazines, and Usenet text. As a result, the model may reproduce outdated language, assumptions, stereotypes, or viewpoints present in older source material.

Users should review outputs carefully.

## Model Family

Planned Echo88 model family:

```text
Echo88-150M-Base
Echo88-150M-Instruct
Echo88-150M-Chat
```

## License

The model weights are released under the Apache 2.0 license.

The training dataset is mixed-source and released separately under `other`. Users are responsible for checking dataset source rights, licensing, and suitability for their own use case.
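
## Checking the Reported Numbers

The figures quoted under Model Details, Training Data, and Training Configuration can be cross-checked with plain arithmetic. The sketch below is not taken from the Echo88 training code. It assumes a standard LLaMA layout with no bias terms and an output head that is not tied to the input embedding, neither of which is stated in this card, and the helper name `llama_param_count` is hypothetical. It also derives a rough blocks-per-step figure by dividing the train blocks by the max steps, which only makes sense if training was a single pass over the packed data (also an assumption).

```python
# Cross-check the numbers quoted in this model card using only the
# values the card itself reports.
# ASSUMPTIONS (not stated in the card): linear layers have no bias terms,
# and the output head is NOT tied to the input embedding.


def llama_param_count(vocab_size, hidden_size, intermediate_size,
                      num_layers, num_heads, num_kv_heads,
                      tie_embeddings=False):
    """Count weights in a LLaMA-style decoder (hypothetical helper)."""
    head_dim = hidden_size // num_heads
    kv_dim = num_kv_heads * head_dim  # grouped-query attention: smaller K/V

    embed = vocab_size * hidden_size
    attn = (hidden_size * hidden_size      # q_proj
            + 2 * hidden_size * kv_dim     # k_proj and v_proj
            + hidden_size * hidden_size)   # o_proj
    mlp = 3 * hidden_size * intermediate_size  # gate, up, and down projections
    norms = 2 * hidden_size                    # two RMSNorm weights per layer

    total = embed + num_layers * (attn + mlp + norms)
    total += hidden_size                       # final RMSNorm
    if not tie_embeddings:
        total += vocab_size * hidden_size      # separate lm_head
    return total


params = llama_param_count(
    vocab_size=32768, hidden_size=768, intermediate_size=2048,
    num_layers=18, num_heads=12, num_kv_heads=4,
)

block_size = 2048
train_tokens = 718_081 * block_size
eval_tokens = 710 * block_size

# Every token id must fit in the packed dtype (uint16 holds 0..65535).
fits_uint16 = (32768 - 1) <= 2**16 - 1

# Derived figure, assuming one pass over the packed training set.
blocks_per_step = 718_081 / 5610

print(f"parameters:      {params:,}")
print(f"train tokens:    {train_tokens:,}")
print(f"eval tokens:     {eval_tokens:,}")
print(f"fits in uint16:  {fits_uint16}")
print(f"blocks per step: {blocks_per_step:.2f}")
```

Under these assumptions the script reproduces the reported 163,606,272 parameters, 1,470,629,888 train tokens, and 1,454,080 eval tokens, and gives roughly 128 blocks (about 262K tokens) per optimizer step. Passing `tie_embeddings=True` yields a smaller total that does not match the reported count, which is why the untied head is assumed here.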