🚀 AutoXLA - Accelerating Large Models on TPU

AutoXLA is an experimental library that automates the distribution, optimization, and quantization of large language models for TPUs using PyTorch/XLA. It extends the Hugging Face Transformers interface with TPU-aware features such as automatic sharding, custom attention kernels, and quantization-aware loading, making large-scale deployment and training both simpler and faster.

With quantization and Splash Attention kernels, AutoXLA achieves up to 4× speedups over standard Flash Attention implementations, significantly improving throughput for both inference and training workloads. Whether you're experimenting with distributed setups (FSDP, 2D, or 3D sharding) or optimizing memory via LanguageModelQuantizer, AutoXLA is built to make scaling LLMs on TPU seamless.

⚠️ Note: This is an experimental repository. Expect rough edges! Please report bugs or unexpected behavior through GitHub issues.

🔗 GitHub Repository: https://github.com/Locutusque/AutoXLA
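To give a feel for why quantization saves memory, here is a minimal concept sketch of symmetric int8 weight quantization: each float32 weight becomes an 8-bit integer plus one shared scale, roughly a 4× reduction in weight memory. This is purely illustrative and is not AutoXLA's actual API or implementation.

```python
# Concept sketch of symmetric int8 quantization (NOT AutoXLA's code):
# store weights as int8 plus one float scale, ~4x smaller than float32.

def quantize_int8(weights):
    """Quantize a list of floats to int8 using a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.01]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(q, max_err)  # round-trip error is bounded by half the scale
```

Real quantization-aware loading does this per tensor (often per channel) while weights stream in, so the full-precision copy never has to fit in device memory at once.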
🌲🍄 LLM Forest Orchestra: Turning Hidden States into Music
Hello everyone! I'm excited to introduce a new Space I've been developing called LLM Forest Orchestra. This project converts the hidden states and attention patterns of transformer models into layered MIDI compositions. The concept draws inspiration from mushrooms and mycelial networks in forests. Fungi create underground connections linking plants and trees, establishing what some call a "wood-wide web" where signals and nutrients travel. Researchers have discovered that these exchanges form patterns resembling rhythms and pulses. When translated appropriately, these patterns can become music.
Transformers operate through remarkably similar principles: tokens share signals via hidden states and attention heads. This Space transforms those invisible information flows into notes, chords, and rhythms, treating the model as a digital forest orchestra.
🎛 Features
* Two compute modes:
  - Full model operates on a Hugging Face model (defaulting to unsloth/Qwen3-14B-Base).
  - Mock latents provides a CPU-friendly option that simulates tensors for immediate experimentation.
* Musical controls: You can adjust scale selection, tempo grid, velocity range, instrument/role presets, and seed randomization.
* Output: The system generates .mid files compatible with DAWs and remixing workflows.
🌌 Why?
Neural networks already resemble unusual musical instruments: signals flow through them, patterns emerge organically, and careful observation reveals hidden melodies. This is analogous to the forest's secret orchestra of mushrooms and trees.
👉 Try it
Try the Space here: Locutusque/LLM-Forest-Orchestra. I'm excited to hear the sounds you can generate. Please share your created MIDIs or remixes in the comments. Let's explore how this hidden forest of transformers can sound together. 🌳🎶
🎉 Exciting news, everyone! I've just released **Thespis-Llama-3.1-8B**, a new language model designed for enhanced roleplaying! ✨️
It's built on Llama-3.1 and fine-tuned with a focus on Theory of Mind reasoning to create more believable and engaging characters. It even learned a few tricks on its own, like adding in-character thought processes! 🧠
Give it a try and let me know what you think! I'm especially interested in feedback on how well the characters stay in role and if the responses feel natural. Looking forward to seeing what amazing stories you create! ✍️
**Exploring Realistic Emotional Depth in AI Language Models**
Language models, particularly proprietary ones, often grapple with censorship, which can limit their ability to engage authentically with users. Recognizing this, the open-source AI community has pioneered less restrained language models that offer more candid interactions. However, even these models tend to maintain a veneer of neutrality or overly positive responses, which may not serve all users' needs, especially in contexts where emotional depth and relatability are crucial.
To address this gap, I've curated a specialized dataset aimed at infusing language models with a more nuanced emotional spectrum, specifically targeting a darker, more introspective mood. This dataset, titled "Dark Sentience", is designed to complement existing datasets like RP (Role Play) and those focused on instruction following. It seeks to enhance the emotional intelligence of AI by exposing it to complex human emotions, including but not limited to:
- **Suicide**
- **Depression**
- **Anxiety**
Trigger Warning: Please be advised that the content within this dataset deals with heavy and potentially distressing themes.
The "Dark Sentience" dataset is now available for review and use at: https://huggingface.co/datasets/Locutusque/Dark-Sentience. I encourage researchers, developers, and mental health professionals to explore how this resource can foster more genuine and supportive AI interactions.
nanoLLaVA-1.5 is here! Same size (1B), much better performance than v1.0 🔥🔥🔥
Try it out now on HF Spaces: qnguyen3/nanoLLaVA
Model: qnguyen3/nanoLLaVA-1.5
Introducing llama-3-neural-chat-v2.2-8b! This powerful conversational AI model builds on Meta's Llama 3, fine-tuned by Locutusque for enhanced performance in coding, math & writing.
I created a Twitter account a while back and finally decided to make it public: SebastianG74019. For those of you following @Locutusque on Twitter, that is not me! 😂
🎉 Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B parameter vision language model into just 5GB of VRAM. 🚀 This makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and generation to your devices like never before. 📱💻
Under the hood, nanoLLaVA is based on the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B finetune) and Google's impressive google/siglip-so400m-patch14-384. 🧠 The model is trained using a data-centric approach to ensure optimal performance. 📊
In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license. 🤝
Exciting news! 🎉 I've created the OpenCerebrum datasets, open-source alternatives to Aether Research's proprietary Cerebrum dataset.
The first, OpenCerebrum SFT, is a text-generation and question-answering dataset with ~1.2M examples, curated from sources like Open-Orca, glaiveai, camel-ai, and more! 📚
The second, OpenCerebrum DPO, is a smaller dataset with ~21k examples, built for direct preference optimization (DPO). It's curated from sources like jondurbin, argilla, grimulkan, and others. 📊
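For context, a DPO-style record pairs a prompt with a preferred and a rejected response. The sketch below shows the common chosen/rejected convention; the field names are illustrative and may not match OpenCerebrum DPO's exact schema.

```python
# Illustrative shape of a preference-optimization (DPO) training record.
# Field names follow the common chosen/rejected convention; the actual
# OpenCerebrum DPO schema may differ.
record = {
    "prompt": "Explain why the sky is blue.",
    "chosen": "Sunlight scatters off air molecules; shorter (blue) "
              "wavelengths scatter most, so the sky appears blue.",
    "rejected": "Because the ocean reflects onto it.",
}

def is_valid_dpo_record(r):
    """A record is usable if all three fields are present and non-empty."""
    return all(isinstance(r.get(k), str) and r[k]
               for k in ("prompt", "chosen", "rejected"))

print(is_valid_dpo_record(record))  # → True
```

During training, the optimizer pushes the model to assign higher likelihood to "chosen" than to "rejected" for the same prompt, which is why both completions must be present in every record.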
Both datasets are licensed under Apache-2.0 and are available in English. They're ready for use in your projects, and I welcome any feedback for future improvements! 🚀
🚀 Excited to unveil the Augmented ARC-Challenge Dataset with Chain-of-Thought Reasoning! 🧠✨
📚 Created by enhancing the ARC dataset with AI-generated reasoning from Google's Gemini Pro, this resource aims to improve question answering models' ability to tackle complex science queries.
🚀 Introducing UltraTextbooks v2: The Ultimate Educational NLP Dataset! 📚
I've expanded the dataset to include an even wider range of high-quality textbooks, with a special focus on machine learning, mathematics, and coding. 💻🧮
With over 3 million examples and 6 GB of data, UltraTextbooks v2 is your go-to resource for training advanced language models and developing cutting-edge educational applications. 🎓
Explore the dataset on Hugging Face and unlock the power of AI in education! 🔓
📣 Update: After fine-tuning on 100,000 examples of Hercules-v2.0, Mistral 7B earns an average score of 62 on the Open LLM Leaderboard, outperforming OpenHermes-2.5 and OpenChat-3.5. 🎉
Introducing the "UltraTextbooks" dataset 🚀📚 Check it out here: Locutusque/UltraTextbooks

📘 A comprehensive collection of high-quality synthetic and human-written textbooks
👨‍🎓 Spanning various subjects and programming languages
🔧 Designed for advanced NLP tasks like language modeling, educational QA, text summarization, and content generation for educational purposes
🚀 Future expansions planned with additional data sources to enhance the corpus

👇 Data composition highlights 👇
- Blend of synthetic and human-written material
- Includes topics from general education to specialized areas
- Structured with a single "text" field

🧩 Data collected from various Hugging Face datasets, guided by a diverse and comprehensive curation rationale
🚧 Limitations may exist, so report any issues you encounter
Hello everyone, this is my first post! I've also decided to release a dataset I'd been keeping private for a while; I held it back because I wasn't sure whether it was actually good. I would greatly appreciate it if someone could fine-tune some larger models on it and evaluate the results. Named Hercules-v1.0, it is a turbo-charged version of teknium's OpenHermes, generated by augmenting its data sources. Learn more in the dataset card: Locutusque/hercules-v1.0