🚀 AutoXLA - Accelerating Large Models on TPU

AutoXLA is an experimental library that automates the distribution, optimization, and quantization of large language models for TPUs using PyTorch/XLA. It extends the Hugging Face Transformers interface with TPU-aware features such as automatic sharding, custom attention kernels, and quantization-aware loading, making large-scale deployment and training both simpler and faster.

With quantization and Splash Attention kernels, AutoXLA achieves up to 4× speedups over standard Flash Attention implementations, significantly improving throughput for both inference and training workloads. Whether you're experimenting with distributed setups (FSDP, 2D, or 3D sharding) or optimizing memory via LanguageModelQuantizer, AutoXLA is built to make scaling LLMs on TPU seamless.

⚠️ Note: This is an experimental repository. Expect rough edges! Please report bugs or unexpected behavior through GitHub issues.

🔗 GitHub Repository: https://github.com/Locutusque/AutoXLA
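To give a feel for why quantization saves memory, here is a minimal concept sketch of symmetric int8 weight quantization: each float32 weight becomes an 8-bit integer plus one shared scale, roughly a 4× reduction in weight memory. This is purely illustrative and is not AutoXLA's actual API or implementation.

```python
# Concept sketch of symmetric int8 quantization (NOT AutoXLA's code):
# store weights as int8 plus one float scale, ~4x smaller than float32.

def quantize_int8(weights):
    """Quantize a list of floats to int8 using a single symmetric scale."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate float weights from int8 values and the scale."""
    return [v * scale for v in q]

weights = [0.12, -0.5, 0.33, 0.01]
q, scale = quantize_int8(weights)
recovered = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, recovered))
print(q, max_err)  # round-trip error is bounded by half the scale
```

Real quantization-aware loading does this per tensor (often per channel) while weights stream in, so the full-precision copy never has to fit in device memory at once.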
🌲🍄 LLM Forest Orchestra: Turning Hidden States into Music
Hello everyone! I'm excited to introduce a new Space I've been developing called LLM Forest Orchestra. This project converts the hidden states and attention patterns of transformer models into layered MIDI compositions. The concept draws inspiration from mushrooms and mycelial networks in forests. Fungi create underground connections linking plants and trees, establishing what some call a "wood-wide web" where signals and nutrients travel. Researchers have discovered that these exchanges form patterns resembling rhythms and pulses. When translated appropriately, these patterns can become music.
Transformers operate through remarkably similar principles: tokens share signals via hidden states and attention heads. This Space transforms those invisible information flows into notes, chords, and rhythms, treating the model as a digital forest orchestra.
🎛 Features
* Two compute modes:
  - Full model operates on a Hugging Face model (defaulting to unsloth/Qwen3-14B-Base).
  - Mock latents provides a CPU-friendly option that simulates tensors for immediate experimentation.
* Musical controls: You can adjust scale selection, tempo grid, velocity range, instrument/role presets, and seed randomization.
* Output: The system generates .mid files compatible with DAWs and remixing workflows.
🌌 Why?
Neural networks already resemble unusual musical instruments: signals flow through them, patterns emerge organically, and careful observation reveals hidden melodies. This is analogous to the forest's secret orchestra of mushrooms and trees.
👉 Try it
Try the Space here: Locutusque/LLM-Forest-Orchestra. I'm excited to hear the sounds you can generate. Please share your created MIDIs or remixes in the comments. Let's explore how this hidden forest of transformers can sound together. 🌳🎶
🎉 Exciting news, everyone! I've just released **Thespis-Llama-3.1-8B**, a new language model designed for enhanced roleplaying! ✨️
It's built on Llama-3.1 and fine-tuned with a focus on Theory of Mind reasoning to create more believable and engaging characters. It even learned a few tricks on its own, like adding in-character thought processes! 🧠
Give it a try and let me know what you think! I'm especially interested in feedback on how well the characters stay in role and if the responses feel natural. Looking forward to seeing what amazing stories you create! ✍️
**Exploring Realistic Emotional Depth in AI Language Models**
Language models, particularly proprietary ones, often grapple with censorship, which can limit their ability to engage authentically with users. Recognizing this, the open-source AI community has pioneered less restrained language models that offer more candid interactions. However, even these models tend to maintain a veneer of neutrality or overly positive responses, which may not serve all users' needs, especially in contexts where emotional depth and relatability are crucial.
To address this gap, I've curated a specialized dataset aimed at infusing language models with a more nuanced emotional spectrum, specifically targeting a darker, more introspective mood. This dataset, titled "Dark Sentience", is designed to complement existing datasets like RP (Role Play) and those focused on instruction following. It seeks to enhance the emotional intelligence of AI by exposing it to complex human emotions, including but not limited to:
- **Suicide**
- **Depression**
- **Anxiety**
Trigger Warning: Please be advised that the content within this dataset deals with heavy and potentially distressing themes.
The "Dark Sentience" dataset is now available for review and use at: https://huggingface.co/datasets/Locutusque/Dark-Sentience. I encourage researchers, developers, and mental health professionals to explore how this resource can foster more genuine and supportive AI interactions.
nanoLLaVA-1.5 is here! Same size (1B), much better performance than v1.0 🔥🔥🔥
Try it out now on HF Spaces: qnguyen3/nanoLLaVA
Model: qnguyen3/nanoLLaVA-1.5
Introducing llama-3-neural-chat-v2.2-8b! This powerful conversational AI model builds on Meta's Llama 3, fine-tuned by Locutusque for enhanced performance in coding, math & writing.
I created a Twitter account a while back and finally decided to make it public: SebastianG74019. For those of you following @Locutusque on Twitter, that is not me! 😂
🎉 Introducing nanoLLaVA, a powerful multimodal AI model that packs the capabilities of a 1B parameter vision language model into just 5GB of VRAM. 🚀 This makes it an ideal choice for edge devices, bringing cutting-edge visual understanding and generation to your devices like never before. 📱💻
Under the hood, nanoLLaVA is based on the powerful vilm/Quyen-SE-v0.1 (my Qwen1.5-0.5B finetune) and Google's impressive google/siglip-so400m-patch14-384. 🧠 The model is trained using a data-centric approach to ensure optimal performance. 📊
In the spirit of transparency and collaboration, all code and model weights are open-sourced under the Apache 2.0 license. 🤝
Exciting news! 🎉 I've created the OpenCerebrum datasets, open-source alternatives to Aether Research's proprietary Cerebrum dataset.
The first, OpenCerebrum SFT, is a text-generation and question-answering dataset with ~1.2M examples, curated from sources like Open-Orca, glaiveai, camel-ai, and more! 📚
The second, OpenCerebrum DPO, is a smaller dataset with ~21k examples, built for direct preference optimization (DPO). It's curated from sources like jondurbin, argilla, grimulkan, and others. 📊
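For context, a DPO-style record pairs a prompt with a preferred and a rejected response. The sketch below shows the common chosen/rejected convention; the field names are illustrative and may not match OpenCerebrum DPO's exact schema.

```python
# Illustrative shape of a preference-optimization (DPO) training record.
# Field names follow the common chosen/rejected convention; the actual
# OpenCerebrum DPO schema may differ.
record = {
    "prompt": "Explain why the sky is blue.",
    "chosen": "Sunlight scatters off air molecules; shorter (blue) "
              "wavelengths scatter most, so the sky appears blue.",
    "rejected": "Because the ocean reflects onto it.",
}

def is_valid_dpo_record(r):
    """A record is usable if all three fields are present and non-empty."""
    return all(isinstance(r.get(k), str) and r[k]
               for k in ("prompt", "chosen", "rejected"))

print(is_valid_dpo_record(record))  # → True
```

During training, the optimizer pushes the model to assign higher likelihood to "chosen" than to "rejected" for the same prompt, which is why both completions must be present in every record.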
Both datasets are licensed under Apache-2.0 and are available in English. They're ready for use in your projects, and I welcome any feedback for future improvements! 🚀
🚀 Excited to unveil the Augmented ARC-Challenge Dataset with Chain-of-Thought Reasoning! 🧠✨
📚 Created by enhancing the ARC dataset with AI-generated reasoning from Google's Gemini Pro, this resource aims to improve question answering models' ability to tackle complex science queries.
🚀 Introducing UltraTextbooks v2: The Ultimate Educational NLP Dataset! 📚
I've expanded the dataset to include an even wider range of high-quality textbooks, with a special focus on machine learning, mathematics, and coding. 💻🧮
With over 3 million examples and 6 GB of data, UltraTextbooks v2 is your go-to resource for training advanced language models and developing cutting-edge educational applications. 🎓
Explore the dataset on Hugging Face and unlock the power of AI in education! 🔓
📣 Update: After fine-tuning on 100,000 examples of Hercules-v2.0, Mistral 7B earns an average score of 62 on the Open LLM Leaderboard, outperforming OpenHermes-2.5 and OpenChat-3.5. 🎉
Introducing the "UltraTextbooks" dataset 🚀📚 Check it out here: Locutusque/UltraTextbooks

📘 A comprehensive collection of high-quality synthetic and human-written textbooks
👨‍🎓 Spanning various subjects and programming languages
🔧 Designed for advanced NLP tasks like language modeling, educational QA, text summarization, and content generation for educational purposes
🚀 Future expansions planned with additional data sources to enhance the corpus

👇 Data composition highlights 👇
- Blend of synthetic and human-written material
- Includes topics from general education to specialized areas
- Structured with a single "text" field

🧩 Data collected from various Hugging Face datasets, guided by a diverse and comprehensive curation rationale
🚧 Limitations may exist, so report any issues you encounter
Hello everyone, this is my first post! I've also decided to release a dataset I'd been keeping private for a while; I held it back because I wasn't sure whether it was actually good. I would greatly appreciate it if someone could fine-tune some larger models on it and evaluate the results. Named Hercules-v1.0, it is a turbo-charged version of teknium's OpenHermes, generated by augmenting its data sources. Learn more in the dataset card: Locutusque/hercules-v1.0