This repository contains a causal language model trained using the lm-pretrain framework.
Source code: https://github.com/canbingol/lm-pretrain
If you cannot use flash_attn, set the `attn_type` parameter to `sdpa` or `eager` in `ModelConfig` in `model.py`.
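Since flash_attn does not build on every system, one way to pick a backend at runtime is a small helper like the sketch below. This is only an illustration: `pick_attn_type` is a hypothetical function, and the `"flash_attn"` value name is an assumption — only `sdpa` and `eager` are the fallbacks documented above.

```python
from importlib import util

def pick_attn_type() -> str:
    """Return an attention backend name to pass as ModelConfig's attn_type.

    Hypothetical helper; "flash_attn" is an assumed value name, while
    "sdpa" and "eager" are the documented fallbacks.
    """
    # Prefer flash attention only when the package is actually importable.
    if util.find_spec("flash_attn") is not None:
        return "flash_attn"
    # PyTorch's scaled_dot_product_attention needs no extra package.
    return "sdpa"

print(pick_attn_type())
```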
```python
from huggingface_hub import hf_hub_download

# Download the custom model definition (model.py) from the Hub
hf_hub_download(
    repo_id="canbingol/exp6_flash_attn_1epoch_lr1e4_500k_vngr_corpus_10layers",
    filename="model.py",
    repo_type="model",
    local_dir="./",
)
```
```bash
pip install flash_attn
```
```python
import torch
from transformers import AutoTokenizer
from model import DecoderCausalLM  # the model.py downloaded above

model_path = "canbingol/exp6_flash_attn_1epoch_lr1e4_500k_vngr_corpus_10layers"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Load the model weights in bfloat16 and the matching tokenizer
model = DecoderCausalLM.from_pretrained(model_path).to(device=device, dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_path)

# Encode a prompt, generate, and decode the result
input_ids = tokenizer.encode("selam ben", return_tensors="pt").to(device)
out_tokens = model.generate(input_ids)
generated_text = tokenizer.decode(out_tokens.flatten())
print(generated_text)
```
The `DecoderCausalLM` implementation is included in the model files (`model.py`).