---
license: mit
language:
- en
library_name: transformers
tags:
- art
- medical
- biology
- code
- chemistry
metrics:
- code_eval
- chrf
- charcut_mt
- cer
- brier_score
- bleurt
- bertscore
- accuracy
pipeline_tag: image-text-to-text
---

# MULTI-MODAL-MODEL
## LeroyDyer/Mixtral_AI_Vision-Instruct_X

Currently in test mode.

# Vision/multimodal capabilities:

If you want to use the vision functionality:

* You must use the latest version of [Koboldcpp](https://github.com/LostRuins/koboldcpp).

To use the multimodal **vision** capabilities of this model, you need to load the specified **mmproj** file, which can be found inside this model repo ([LeroyDyer/Mixtral_AI_Vision-Instruct_X](https://huggingface.co/LeroyDyer/Mixtral_AI_Vision-Instruct_X)).

* You can load the **mmproj** file by using the corresponding section in the interface:

![image/png](https://cdn-uploads.huggingface.co/production/uploads/65d4cf2693a0a3744a27536c/UX6Ubss6EPJFSdyJbL-2v.png)

## Choosing an mmproj file:

* For 4-bit loading, use the 4-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0
* For 8-bit loading, use the 8-bit mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-Q8_0
* For full-precision (16-bit) loading, use the f16 mmproj file: mmproj-Mixtral_AI_Vision-Instruct_X-f16
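
These files should also work with other llama.cpp-based frontends. As a minimal sketch (untested against this repo; the main-model GGUF filename below is a placeholder for whichever quantization you downloaded), llama-cpp-python's LLaVA chat handler accepts the mmproj file as its clip model:

```python
from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava15ChatHandler

# Placeholder local paths -- substitute the files you actually downloaded.
chat_handler = Llava15ChatHandler(clip_model_path="mmproj-Mixtral_AI_Vision-Instruct_X-Q4_0.gguf")
llm = Llama(
    model_path="Mixtral_AI_Vision-Instruct_X-Q4_0.gguf",  # hypothetical main-model GGUF name
    chat_handler=chat_handler,
    n_ctx=4096,  # leave room for the image embeddings in the context window
)

response = llm.create_chat_completion(
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": "https://llava-vl.github.io/static/images/view.jpg"}},
                {"type": "text", "text": "Please describe this image."},
            ],
        }
    ]
)
print(response["choices"][0]["message"]["content"])
```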

## Extended capabilities:

* mistralai/Mistral-7B-Instruct-v0.1 - prime base
* ChaoticNeutrals/Eris-LelantaclesV2-7b - roleplay
* ChaoticNeutrals/Eris_PrimeV3-Vision-7B - vision
* rvv-karma/BASH-Coder-Mistral-7B - coding
* Locutusque/Hercules-3.1-Mistral-7B - unhinging
* KoboldAI/Mistral-7B-Erebus-v3 - NSFW
* Locutusque/Hyperion-2.1-Mistral-7B - chat
* Severian/Nexus-IKM-Mistral-7B-Pytorch - thinking
* NousResearch/Hermes-2-Pro-Mistral-7B - generalizing
* mistralai/Mistral-7B-Instruct-v0.2 - base
* Nitral-AI/ProdigyXBioMistral_7B - medical
* Nitral-AI/Infinite-Mika-7b - 128k context expansion enforcement
* NousResearch/Yarn-Mistral-7b-128k - 128k context expansion
* yanismiraoui/Yarn-Mistral-7b-128k-sharded
* ChaoticNeutrals/Eris_Prime-V2-7B - roleplay

# Image-text-to-text

## Using transformers

```python
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, BitsAndBytesConfig, LlavaForConditionalGeneration

# Load the model in 4-bit to reduce VRAM usage.
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16
)

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"

processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(model_id, quantization_config=quantization_config, device_map="auto")

# Fetch two example images (in a notebook, preview them with IPython's display()).
image1 = Image.open(requests.get("https://llava-vl.github.io/static/images/view.jpg", stream=True).raw)
image2 = Image.open(requests.get("http://images.cocodataset.org/val2017/000000039769.jpg", stream=True).raw)

# Each prompt is paired with one image via its <image> placeholder.
prompts = [
    "USER: <image>\nWhat are the things I should be cautious about when I visit this place? What should I bring with me?\nASSISTANT:",
    "USER: <image>\nPlease describe this image\nASSISTANT:",
]

inputs = processor(prompts, images=[image1, image2], padding=True, return_tensors="pt").to("cuda")
for k, v in inputs.items():
    print(k, v.shape)
```
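
From here, generation works like any other transformers model; a minimal sketch continuing from the `inputs` above:

```python
# Generate a reply for each prompt and decode the results back to text.
output = model.generate(**inputs, max_new_tokens=200)
for text in processor.batch_decode(output, skip_special_tokens=True):
    print(text)
```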

## Using pipeline

```python
import requests
from PIL import Image
from transformers import pipeline

model_id = "LeroyDyer/Mixtral_AI_Vision-Instruct_X"
pipe = pipeline("image-to-text", model=model_id)

url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/transformers/tasks/ai2d-demo.jpg"
image = Image.open(requests.get(url, stream=True).raw)

question = "What does the label 15 represent? (1) lava (2) core (3) tunnel (4) ash cloud"
prompt = (
    "A chat between a curious human and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the human's questions."
    f"###Human: <image>\n{question}###Assistant:"
)

outputs = pipe(image, prompt=prompt, generate_kwargs={"max_new_tokens": 200})
print(outputs)
```
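
The pipeline returns a list of dictionaries; the model's answer appears after the `###Assistant:` marker in each item's `generated_text` field.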

## Mistral Chat Templating

Instruction format:
In order to leverage instruction fine-tuning, your prompt should be surrounded by [INST] and [/INST] tokens. The very first instruction should begin with a begin-of-sentence (BOS) token id; subsequent instructions should not. The assistant's generation is terminated by the end-of-sentence (EOS) token id.
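
Concretely, a two-turn exchange is laid out roughly as follows, where `<s>` and `</s>` stand for the BOS and EOS token ids (exact whitespace may vary between tokenizer versions):

```
<s>[INST] First user instruction [/INST] Model answer</s>[INST] Follow-up instruction [/INST]
```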

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

chat = [
    {"role": "user", "content": "Hello, how are you?"},
    {"role": "assistant", "content": "I'm doing great. How can I help you today?"},
    {"role": "user", "content": "I'd like to show off how chat templating works!"},
]

# tokenize=False returns the formatted prompt string rather than token ids.
print(tokenizer.apply_chat_template(chat, tokenize=False))
```
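
For the conversation above, the printed string should look roughly like this (again, exact spacing depends on the template shipped with the tokenizer):

```
<s>[INST] Hello, how are you? [/INST]I'm doing great. How can I help you today?</s>[INST] I'd like to show off how chat templating works! [/INST]
```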

# Text-to-text

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda"  # the device to load the model onto

model = AutoModelForCausalLM.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")
tokenizer = AutoTokenizer.from_pretrained("LeroyDyer/Mixtral_AI_Vision-Instruct_X")

messages = [
    {"role": "user", "content": "What is your favourite condiment?"},
    {"role": "assistant", "content": "Well, I'm quite partial to a good squeeze of fresh lemon juice. It adds just the right amount of zesty flavour to whatever I'm cooking up in the kitchen!"},
    {"role": "user", "content": "Do you have mayonnaise recipes?"}
]

# Render the conversation with the chat template and tokenize it in one step.
encodeds = tokenizer.apply_chat_template(messages, return_tensors="pt")

model_inputs = encodeds.to(device)
model.to(device)

generated_ids = model.generate(model_inputs, max_new_tokens=1000, do_sample=True)
decoded = tokenizer.batch_decode(generated_ids)
print(decoded[0])
```
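
For interactive use you can stream tokens as they are generated instead of waiting for the full reply; a small sketch reusing the model, tokenizer, and `model_inputs` from above with transformers' `TextStreamer`:

```python
from transformers import TextStreamer

# Prints tokens to stdout as they are produced; skip_prompt hides the echoed input.
streamer = TextStreamer(tokenizer, skip_prompt=True)
model.generate(model_inputs, streamer=streamer, max_new_tokens=1000, do_sample=True)
```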