---
language: en
license: apache-2.0
library_name: transformers
tags:
- text-generation
- pytorch
- gpt
- language-model
---

# tinyMind

tinyMind is a small GPT-style transformer language model trained from scratch, with roughly 17.7 million parameters (17,731,328 in total).

## Model Details

- **Architecture**: GPT-style transformer
- **Parameters**: ~17.7M
- **Layers**: 6
- **Attention Heads**: 8
- **Embedding Dimension**: 256
- **Max Sequence Length**: 512
- **Vocabulary Size**: 50,257 (GPT-2 tokenizer)
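
These hyperparameters correspond to a standard GPT-2-style configuration. As a hypothetical sketch only (the exact model class used for training is not stated on this card), the architecture could be instantiated with the Hugging Face `GPT2Config`; the parameter count it reports may differ slightly from the figure above depending on weight tying and bias terms.

```python
from transformers import GPT2Config, GPT2LMHeadModel

# Hypothetical configuration mirroring the Model Details list above
config = GPT2Config(
    vocab_size=50257,   # GPT-2 tokenizer vocabulary
    n_positions=512,    # maximum sequence length
    n_embd=256,         # embedding dimension
    n_layer=6,          # transformer blocks
    n_head=8,           # attention heads per block
)

model = GPT2LMHeadModel(config)
print(f"Parameters: {model.num_parameters():,}")  # roughly 17-18M with this config
```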

## Training Data

The model was trained on a diverse mixture of high-quality text data, including:
- OpenWebText
- Wikipedia articles
- BookCorpus
- Other curated text sources
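
The exact datasets, versions, and mixing proportions are not documented here. Purely as an illustrative sketch, a similar mixture could be assembled with the `datasets` library; the Hub dataset IDs and weights below are assumptions, not the actual training recipe.

```python
from datasets import load_dataset, interleave_datasets

# Hypothetical reconstruction of a similar data mixture (streaming mode)
openwebtext = load_dataset("Skylion007/openwebtext", split="train", streaming=True)
wikipedia = load_dataset("wikimedia/wikipedia", "20231101.en", split="train", streaming=True)
bookcorpus = load_dataset("bookcorpus/bookcorpus", split="train", streaming=True)

mixture = interleave_datasets(
    [openwebtext, wikipedia, bookcorpus],
    probabilities=[0.5, 0.3, 0.2],  # illustrative weights only
    seed=42,
)
```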

## Usage

```python
from transformers import GPT2TokenizerFast, AutoModelForCausalLM

tokenizer = GPT2TokenizerFast.from_pretrained("HenrySentinel/tinyMind")
model = AutoModelForCausalLM.from_pretrained("HenrySentinel/tinyMind")

# Encode a prompt and sample a continuation
input_text = "The key to artificial intelligence is"
input_ids = tokenizer.encode(input_text, return_tensors="pt")
output = model.generate(
    input_ids,
    max_length=100,
    temperature=0.8,
    do_sample=True,
    pad_token_id=tokenizer.eos_token_id,  # the GPT-2 tokenizer has no pad token
)
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(generated_text)
```
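
The model should also be usable through the high-level `pipeline` API, assuming the repository ships the standard config and tokenizer files:

```python
from transformers import pipeline

# One-liner text generation via the pipeline API
generator = pipeline("text-generation", model="HenrySentinel/tinyMind")
result = generator(
    "The key to artificial intelligence is",
    max_length=100,
    do_sample=True,
    temperature=0.8,
)
print(result[0]["generated_text"])
```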

## Training Details

- **Optimizer**: AdamW with cosine learning rate scheduling
- **Learning Rate**: 0.001
- **Batch Size**: 8
- **Sequence Length**: 512
- **Epochs**: 3
- **Gradient Clipping**: 1.0
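
The training script itself is not included in this repository. As a minimal sketch of how these hyperparameters fit together, assuming a standard PyTorch loop with the Transformers scheduler utilities (the `model` object is the one loaded in the Usage section; `train_dataloader`, the warmup step count, and gradient-norm clipping semantics are assumptions, not documented details):

```python
import torch
from torch.optim import AdamW
from transformers import get_cosine_schedule_with_warmup

optimizer = AdamW(model.parameters(), lr=1e-3)            # learning rate 0.001
num_epochs = 3
num_training_steps = num_epochs * len(train_dataloader)   # hypothetical DataLoader: batch size 8, 512-token sequences
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=100,                                  # warmup count is an assumption
    num_training_steps=num_training_steps,
)

for epoch in range(num_epochs):
    for batch in train_dataloader:
        outputs = model(input_ids=batch["input_ids"], labels=batch["input_ids"])
        outputs.loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # gradient clipping at 1.0
        optimizer.step()
        scheduler.step()
        optimizer.zero_grad()
```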

## Limitations

This is a small model designed for experimentation and learning. It may:
- Generate inconsistent or factually incorrect content
- Have limited knowledge compared to larger models
- Require careful prompt engineering for best results

## License

Apache 2.0