---
library_name: pruna-engine
thumbnail: https://assets-global.website-files.com/646b351987a8d8ce158d1940/64ec9e96b4334c0e1ac41504_Logo%20with%20white%20text.svg
metrics:
  - memory_disk
  - memory_inference
  - inference_latency
  - inference_throughput
  - inference_CO2_emissions
  - inference_energy_consumption
---
PrunaAI
[![GitHub](https://img.shields.io/badge/GitHub-PrunaAI-9334E9?style=plastic&logo=github&logoColor=white)](https://github.com/PrunaAI/pruna)   [![Twitter/X](https://img.shields.io/badge/Twitter%2FX-@PrunaAI-9334E9?style=plastic&logo=x&logoColor=white)](https://twitter.com/PrunaAI)   [![LinkedIn](https://img.shields.io/badge/LinkedIn-PrunaAI-9334E9?style=plastic&logo=linkedin&logoColor=white)](https://www.linkedin.com/company/pruna-ai)   [![Discord](https://img.shields.io/badge/Discord-Join%20us-9334E9?style=plastic&logo=discord&logoColor=white)](https://discord.com/invite/JFQmtFKCjd)   [![P-Models](https://img.shields.io/badge/Performance%20Models-Try%20them%20now-9334E9?style=plastic&logo=data:image/svg%2bxml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHdpZHRoPSI0MTEiIGhlaWdodD0iNDc3IiBmaWxsPSJub25lIj48cGF0aCBmaWxsPSIjZmZmIiBkPSJNMjMuNzAzIDMwMC41MzJjLTEuODQyLS42NzQtNC4xNDItMS4wNTctNS43MDMtMi4wMjktMTIuNTU4LTcuODQ2LTguODM3LTI1LjE3MiA1LjI1NS0yOC4zNjFhLjY2LjY2IDAgMCAwIC4zNDgtLjIwMS42LjYgMCAwIDAgLjE1Ni0uMzZxNi40OS04MS45ODYgNjguNTc3LTEzNC42MjlhLjM0LjM0IDAgMCAwIC4wOTItLjEyNS4zMy4zMyAwIDAgMC0uMDI2LS4zLjMzLjMzIDAgMCAwLS4xMTItLjEwOGMtNC4xMTUtMi40OTctOC42NC00Ljc1LTEyLjM1My03LjQ0M2ExOTAgMTkwIDAgMCAxLTE4Ljc5NC0xNS41MjJjLTIuOTM2LTIuNzY4LTMuNzAzLTUuODkxLjU4LTcuNzdhLjUuNSAwIDAgMCAuMjgtLjYwOEM1MyA3NS45NSA0OC40MTcgNDguNTk5IDQ1LjUxOCAxOS42NXEtLjM5My0zLjg1Mi0uMjcxLTYuMzJhLjc1Ljc1IDAgMCAxIC4xMzQtLjQuNy43IDAgMCAxIC4zMjQtLjI1NXEyLjEwNC0uODQyIDUuODQ0LTEuNDMxYzU4LjExNC05LjEzNiAxMjcuMDA5IDguNDI1IDE0NC4zMDcgNzIuNjQ1cS4xMDMuMzY0LjE3OCAwIDUuMDk2LTI0LjA5NyA0LjYzOC00OC4xOTMtLjA0Ny0yLjc4NyAxLjg3OS00Ljc4OCAxMC40MDctMTAuOCAyNC43OTgtOC42ODdjNy40NzEgMS4wOTUgNy43NDIgNC42MzggNy4yMDkgMTEuMWEzMzQgMzM0IDAgMCAxLTExLjg4NCA2NC43MDYuMzg4LjM4OCAwIDAgMCAuMTgyLjQ0NC40LjQgMCAwIDAgLjE2NC4wNTFjOTIuNDIxIDguNDUzIDE1OC44NzUgNzkuNDg5IDE2NS4wODQgMTcxLjI3NGExIDEgMCAwIDAgLjEyOS40MjQuODMuODMgMCAwIDAgLjI5MS4yOTZjMy43NSAyLjE3IDYuMiAyLjIxNiA4Ljg1NSA2LjU5MiA1LjA2OCA4LjMzMiAxLjQ1IDIwLjQzMS04LjU5MyAyMi40N2ExLjcxID
EuNzEgMCAwIDAtMS4zNjUgMS40NjhxLTkuMTczIDc1LjM4My02NC43NjIgMTIzLjM3MWMtMjguNjIyIDI0LjcwNC02NS43NDQgMzkuNzQ5LTEwMy40NjQgNDIuMzMtMTAwLjM1OSA2Ljg2My0xODQuNDg2LTY1LjYyMi0xOTUuMDA1LTE2NS41N2EuOC44IDAgMCAwLS4xNTQtLjM5NS43NC43NCAwIDAgMC0uMzMzLS4yNU0xNTEuMzM4IDk3LjY0M2wxOS4wNjYgMTkuMzc1YS41LjUgMCAwIDAgLjE1LjEuNS41IDAgMCAwIC4xNzYuMDMxLjQzLjQzIDAgMCAwIC4zMTktLjE1cTEwLjg0Ny0xMi40NjUgNy40NDMtMjguMTU1Yy00LjUzNS0yMC44Ni0xOC41OTgtMzcuNjY0LTM3LjI3MS00OC4zODktMjEuMDY3LTEyLjY4OS00Ny44NjUtMTUuOTA1LTcyLjI4LTE0Ljg0cS0uNjE3LjAzLS4xNC40MjJsNjMuNDcyIDUzLjMyNmEuNC40IDAgMCAwIC4xNDQuMDcyLjM0LjM0IDAgMCAwIC4zMDEtLjA2OC4zLjMgMCAwIDAgLjA4OC0uMTI2IDQ3IDQ3IDAgMCAwIDMuMjgyLTEzLjA4MXEuMDIzLS4yMzMuMTI0LS40MzN0LjI2OC0uMzM0YzIuOTU1LTIuNDc4IDQuMTE1LjQ1OCA0LjY5NCAyLjg1MnEyLjMyIDkuNTM3LjE4NyAxOS4yNDRhMS40OCAxLjQ4IDAgMCAwIC40NCAxLjQwM3ptLTYuNTI2IDkuMjk1LTkuMDI0LTEwLjI3NmEuNy43IDAgMCAwLS4zNDMtLjIwNy44Ni44NiAwIDAgMC0uNDMzIDBxLTExLjY4IDMuMDIxLTIzLjExNC0uODIyYy01LjgxNi0zLjg1Mi0uNzU4LTUuNTE3IDIuNTktNS44MTZxNS45MTgtLjUzMyAxMS43MzUtMi41MDZhLjMuMyAwIDAgMCAuMTI3LS4wOC4zLjMgMCAwIDAgLjA3My0uMTMzLjM0LjM0IDAgMCAwIC4wMDEtLjE1NC4zNC4zNCAwIDAgMC0uMDctLjEzOGMtMTkuNDU5LTIxLjE1LTQwLjc2LTQwLjI0NS02Mi4yNzUtNTkuMjY0cS0uMzU2LS4zMTgtLjMxOC4xNWEzMDggMzA4IDAgMCAwIDguMzg3IDUxLjgwMmM0LjU5MiAxOC4zMDggMTMuNzU1IDM0LjE1NyAzMS44NDggNDEuODcycTI2LjI0NyAxMS4xODIgNTMuNzI5IDIuODMzYS41ODMuNTgzIDAgMCAwIC4yOTktLjg4OCAyMzUgMjM1IDAgMCAwLTEzLjIxMi0xNi4zNzNtNDUuMTI1IDQ5Ljc0NXEtMTIuNTk1IDEuMzE4LTIzLjQ0MS01LjU4M2MtMy4wNzctNC44OTkgMi4wNjYtOC41OTMgNS40NDItMTEuMTgzYS41ODQuNTg0IDAgMCAwIC4xODctLjcwMSA4LjQ2IDguNDYgMCAwIDAtMy40ODgtMy44NjIuODEuODEgMCAwIDAtLjgyMy4wMDljLTE2LjY5MSA5Ljk3Ny0zNC40ODUgMTAuNzgyLTUzLjE4NiA2Ljk2N3EtMS41Ny0uMzItMi44NTIuNjQ1LTYxLjQ1MSA0Ni4zNy02OS42NyAxMjMuOTEzLS42ODMgNi40MzMtMS44MDYgOC44MjdjLTIuOTQ1IDYuMzAyLTcuNzcgNy44NDUtMTQuMjIyIDkuNDcycS0uNDQ5LjExMyAwIC4yMzRjOS4yNzYgMi4zMSAxNC42OCA2LjE0MyAxNi4wNzQgMTYuNDAxIDguNDYyIDYyLjQ5IDQ4Ljk0IDExNy44NjQgMTEwLjM4MyAxMzguMzQxYTEwNCAxMDQgMCAwIDAgMTEuMDA2IDMuNTA3YzkyLjY0NSAyMy45MDkgMTgxLjIzMi0zMy42NjIgMj
AyLjgwNC0xMjUuMDU0cTEuNjA5LTYuNzkgMy40NS0xOS4wODVjMS4xMDQtNy4zOTYgNS42OTUtMTIuMTE4IDEzLjE0Ny0xMy41NjguNDEyLS4wODQgMS45MDgtLjc1Ny4yODEtLjk5MS04LjA1MS0xLjE0MS0xMi4yNC02LjM4Ni0xMi45ODgtMTQuMTI4cS0xLjA3NS0xMS4yNS0yLjUxNi0xOS4wMzhjLTExLjc3Mi02My44OTItNTYuMzgzLTExNC41NzItMTE5LjI4NS0xMzEuNDZxLTE1LjEyLTMuODgtMzAuOTg3LTUuMjQ1YS4zODYuMzg2IDAgMCAwLS40MDIuMjUyIDE2OC42IDE2OC42IDAgMCAxLTExLjM1MiAyNC44NTRjLTQuMTMzIDcuNDA2LTguMTYzIDE1LjQ5NC0xNS43NTYgMTYuNDc2Ii8+PGNpcmNsZSBjeD0iMTQwLjA2NSIgY3k9IjI3My4wMyIgcj0iNTAuNjE4IiBzdHJva2U9IiNmZmYiIHN0cm9rZS13aWR0aD0iOS4yMDMiLz48cGF0aCBmaWxsPSIjZmZmIiBkPSJNMTQ2Ljk2NyAyNDIuMDkyYzE1LjY3MiAwIDI4LjM3NiAxMy45NjMgMjguMzc2IDMxLjE4OHMtMTIuNzA0IDMxLjE4OS0yOC4zNzYgMzEuMTg5LTI4LjM3Ny0xMy45NjQtMjguMzc3LTMxLjE4OWMwLTIuNjI1LjI5Ny01LjE3NC44NTItNy42MSAxLjYwNyAzLjcyNyA1LjMxMyA2LjMzOCA5LjYyOSA2LjMzOCA1Ljc4OSAwIDEwLjQ4MS00LjY5MyAxMC40ODItMTAuNDgycy00LjY5My0xMC40ODEtMTAuNDgyLTEwLjQ4MWMtLjc2MSAwLTEuNTA0LjA4My0yLjIxOS4yMzcgNS4xMzktNS42NzYgMTIuMjUzLTkuMTkgMjAuMTE1LTkuMTkiLz48Y2lyY2xlIGN4PSIyNjkuOTMyIiBjeT0iMjczLjAzIiByPSI1MC42MTgiIHN0cm9rZT0iI2ZmZiIgc3Ryb2tlLXdpZHRoPSI5LjIwMyIvPjxwYXRoIGZpbGw9IiNmZmYiIGQ9Ik0yNzYuODM0IDI0Mi4wOTJjMTUuNjcyIDAgMjguMzc2IDEzLjk2MyAyOC4zNzYgMzEuMTg4cy0xMi43MDQgMzEuMTg5LTI4LjM3NiAzMS4xODktMjguMzc3LTEzLjk2NC0yOC4zNzctMzEuMTg5YzAtMi42MjUuMjk3LTUuMTc0Ljg1My03LjYxIDEuNjA2IDMuNzI3IDUuMzEyIDYuMzM4IDkuNjI4IDYuMzM4IDUuNzg5IDAgMTAuNDgyLTQuNjkzIDEwLjQ4Mi0xMC40ODJzLTQuNjkzLTEwLjQ4MS0xMC40ODItMTAuNDgxYy0uNzYxIDAtMS41MDQuMDgzLTIuMjE5LjIzNyA1LjEzOS01LjY3NiAxMi4yNTQtOS4xOSAyMC4xMTUtOS4xOSIvPjxwYXRoIHN0cm9rZT0iI2ZmZiIgc3Ryb2tlLXdpZHRoPSI5LjIwMyIgZD0iTTE3Ny4xMzUgMzQ1LjMyOGMuODk5LS4yNCAyLjgwMS0uMTk3IDYuMzA0LjQ2NCA2LjMxMSAxLjE5IDE2LjU4NSA0LjIyMSAyNi4xMjIgNC4yMjEgOS40NTIgMCAxNy4xMTgtMS45MjYgMjEuMzgzLTIuNjAzLjc3LS4xMjMgMS4zODUtLjE5OSAxLjg3OC0uMjM3LS4zNCAxMy44NDMtMTIuODkzIDI3LjcxLTI3Ljg2OCAyNy43MS0xNS4xNzIgMC0yNy44NzYtMTMuODM1LTI3Ljg3Ni0yOC44MTEgMC0uMzU3LjAyOS0uNTk2LjA1Ny0uNzQ0WiIvPjwvc3ZnPg==)](https://dashboard.pruna.ai/login?utm_source=huggingface&utm_medium=
org_card&utm_campaign=hf_traffic)

# Simply make AI models cheaper, smaller, faster, and greener!

- Give a thumbs up if you like this model!
- Contact us and tell us which model to compress next [here](https://www.pruna.ai/contact).
- Request access to easily compress your *own* AI models [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).
- Read the documentation to learn more [here](https://pruna-ai-pruna.readthedocs-hosted.com/en/latest/).
- Join the Pruna AI community on Discord [here](https://discord.com/invite/vb6SmA3hxu) to share feedback and suggestions or to get help.

**Frequently Asked Questions**
- ***How does the compression work?*** The model is compressed with bitsandbytes.
- ***How does the model quality change?*** The quality of the model output will slightly degrade.
- ***What is the model format?*** We use the standard safetensors format.
- ***How to compress my own models?*** You can request premium access to more compression methods and tech support for your specific use cases [here](https://z0halsaff74.typeform.com/pruna-access?typeform-source=www.pruna.ai).

## Usage

### Quickstart Guide

Getting started with DBRX models is easy with the `transformers` library. The model requires ~264 GB of RAM and the following packages:

```bash
pip install "torch==2.4.0" "transformers>=4.39.2" "tiktoken>=0.6.0" "bitsandbytes"
```

If you'd like to speed up download time, you can use the `hf_transfer` package as described by Hugging Face [here](https://huggingface.co/docs/huggingface_hub/en/guides/download#faster-downloads):

```bash
pip install hf_transfer
export HF_HUB_ENABLE_HF_TRANSFER=1
```

You will need to request access to this repository to download the model. Once access is granted, [obtain an access token](https://huggingface.co/docs/hub/en/security-tokens) with `read` permission and supply the token below.
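For context on the RAM figure: DBRX has roughly 132B total parameters, so the ~264 GB above corresponds to 16-bit (2-byte) weights, and the 4-bit bitsandbytes compression used in this repository cuts the weight footprint roughly fourfold. A back-of-the-envelope sketch (the helper function and parameter count are illustrative, and the estimate ignores activations, the KV cache, and any layers kept in higher precision):

```python
def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB: params x bits, converted to bytes."""
    return n_params * bits_per_param / 8 / 1e9

N_PARAMS = 132e9  # DBRX total parameter count (~132B)

for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: ~{weight_memory_gb(N_PARAMS, bits):.0f} GB")
# 16-bit weights: ~264 GB
#  8-bit weights: ~132 GB
#  4-bit weights: ~66 GB
```

This is only an estimate of weight storage; real peak memory during inference is higher.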
### Run the model on multiple GPUs

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tokenizer = AutoTokenizer.from_pretrained(
    "PrunaAI/dbrx-instruct-bnb-4bit",
    trust_remote_code=True,
    token="hf_YOUR_TOKEN",
)
model = AutoModelForCausalLM.from_pretrained(
    "PrunaAI/dbrx-instruct-bnb-4bit",
    device_map="auto",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    token="hf_YOUR_TOKEN",
)

input_text = "What does it take to build a great LLM?"
messages = [{"role": "user", "content": input_text}]
input_ids = tokenizer.apply_chat_template(
    messages,
    return_dict=True,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
).to("cuda")

outputs = model.generate(**input_ids, max_new_tokens=200)
print(tokenizer.decode(outputs[0]))
```

## Credits & License

The license of the smashed model follows the license of the original model. Please check the license of the original model, databricks/dbrx-instruct, which provided the base model, before using this model. The license of the `pruna-engine` is [here](https://pypi.org/project/pruna-engine/) on PyPI.

## Want to compress other models?

- Compress your own models with [Pruna](https://github.com/PrunaAI/pruna) and give us a ⭐️ to bring you many more algos!
- Read the documentation to learn more [here](https://docs.pruna.ai/).
- Stay up to date with the latest AI efficiency research on our [blog](https://www.pruna.ai/blog/), explore our [materials collection](https://github.com/PrunaAI/awesome-ai-efficiency), or dive into our [courses](https://github.com/PrunaAI/courses).

## ✨ Test our endpoints

Want to use our optimized models right away? Try them via our API for fast, easy access to Pruna-powered inference.