---
{}
---

This model card describes a fine-tuned version of Savianto/qlora-mistral, a language model trained with the QLoRA technique on conversational data for improved text generation, particularly in question-answering and conversational tasks.

## Model Details
|
|
### Model Description

This is a fine-tuned version of the Savianto/qlora-mistral model using the QLoRA technique. Fine-tuning was performed to improve the model's ability to generate coherent, context-aware responses in conversational and question-answering tasks. QLoRA enables efficient fine-tuning of large models with greatly reduced memory usage.
|
|
- **Developed by:** Yash Sawant
- **Model type:** Causal Language Model (AutoModelForCausalLM)
- **Language(s) (NLP):** English
- **License:** [Specify License Type Here]
- **Finetuned from model:** teknium/OpenHermes-2-Mistral-7B
|
|
|
|
## Uses

This model can be used directly for:

- Question answering
- Conversational agents (chatbots)
- Text generation tasks (summarization, text completion)
|
|
### Downstream Use

This model can be fine-tuned further for specific tasks such as:

- Domain-specific question answering
- Custom chatbot agents
- Document summarization
|
|
|
|
|
|
## Bias, Risks, and Limitations

Users (both direct and downstream) should be made aware of the risks, biases, and limitations of the model. More information is needed before further recommendations can be made.
|
|
## How to Get Started with the Model

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the fine-tuned model and tokenizer
model = AutoModelForCausalLM.from_pretrained("Savianto/qlora-mistral-finetuned")
tokenizer = AutoTokenizer.from_pretrained("Savianto/qlora-mistral-finetuned")

# Example prompt
prompt = "What is the capital of France?"

# Tokenize and generate output
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(input_ids, max_length=50)

# Decode the response
response = tokenizer.decode(output[0], skip_special_tokens=True)
print(response)
```
## Training Details

### Training Data

The model was fine-tuned on a conversational dataset of question-answer pairs and dialogue examples, improving its ability to generate contextually relevant and coherent responses.

### Training Procedure

- **Hardware:** NVIDIA A100 GPU (40 GB)
- **Epochs:** 5, with early stopping
- **Optimizer:** AdamW
- **Learning rate:** 2e-5
- **Batch size:** 16
- **Training regime:** Mixed precision (fp16)

#### Preprocessing

Input text was tokenized with padding and truncation to ensure consistent input lengths.

#### Speeds, Sizes, Times

- **Training time:** ~3 hours
- **Model size:** ~7B parameters (base model: Savianto/qlora-mistral)
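The exact fine-tuning script is not included in this card. A minimal configuration sketch consistent with the hyperparameters above, assuming the Hugging Face `peft` and `bitsandbytes` libraries, might look like the following; the LoRA rank, alpha, target modules, and output directory are illustrative assumptions, not values taken from this card.

```python
# Illustrative sketch only; LoRA hyperparameters and paths below are
# assumptions, not values from this card.
import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

# 4-bit quantization of the frozen base model (the "Q" in QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Low-rank adapters trained on top of the quantized weights
lora_config = LoraConfig(
    r=16,                                  # assumed adapter rank
    lora_alpha=32,                         # assumed scaling factor
    target_modules=["q_proj", "v_proj"],   # assumed attention projections
    task_type="CAUSAL_LM",
)

# Hyperparameters stated above: AdamW, lr 2e-5, batch size 16, fp16, 5 epochs
training_args = TrainingArguments(
    output_dir="qlora-mistral-finetuned",
    num_train_epochs=5,
    per_device_train_batch_size=16,
    learning_rate=2e-5,
    optim="adamw_torch",
    fp16=True,
)
```

The early stopping mentioned above would typically be added via `transformers.EarlyStoppingCallback` together with a matching evaluation strategy.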
## Evaluation

### Testing Data

The model was evaluated on a validation split of the fine-tuning dataset, consisting of question-answer pairs and conversational exchanges.

### Metrics

- **Perplexity:** Standard perplexity for text generation models.
- **Coherence:** Human-rated coherence of generated responses.
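Perplexity is the exponential of the mean negative log-likelihood per token. A minimal, self-contained illustration (the per-token log-probabilities are made-up example values):

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the mean negative log-likelihood per token."""
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)

# Made-up per-token log-probabilities (natural log) for a 4-token response
print(round(perplexity([-0.5, -1.2, -0.3, -2.0]), 2))  # → 2.72 (= e, since mean NLL is 1.0)
```

Lower values are better; a perplexity of 1.0 would mean the model assigned probability 1 to every reference token.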
### Results

The model achieved low perplexity on the validation set and performed well on conversational coherence during testing.

#### Summary

The model is well suited to question answering, conversational agents, and general text generation, but may require additional tuning for domain-specific applications.
## Model Examination

No further interpretability analysis was conducted on this model.

## Environmental Impact

Carbon emissions for this model can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) with the following parameters:

- **Hardware type:** NVIDIA A100
- **Hours used:** ~3
- **Cloud provider:** Google Cloud
- **Compute region:** US-Central
- **Carbon emitted:** 0.98 kg CO2eq (estimated)
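As a rough cross-check of the figure above (not the calculator's exact methodology), emissions scale as training hours × GPU power draw × grid carbon intensity; the power draw and carbon intensity below are assumed illustrative values, not measurements:

```python
# Back-of-the-envelope emissions estimate; the GPU power draw and grid
# carbon intensity are illustrative assumptions, not measured values.
hours = 3.0              # training time reported in this card
gpu_power_kw = 0.4       # ~400 W A100 board power (assumed)
kg_co2_per_kwh = 0.8     # assumed grid carbon intensity

energy_kwh = hours * gpu_power_kw
emissions_kg = energy_kwh * kg_co2_per_kwh
print(round(emissions_kg, 2))  # → 0.96, the same ballpark as the 0.98 kg reported
```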
## Technical Specifications

### Model Architecture and Objective

This model is based on the Mistral architecture, with the objective of generating coherent, contextually aware responses in conversational and question-answering tasks.

### Compute Infrastructure

#### Hardware

- NVIDIA A100 40GB GPU

#### Software

- Python 3.8
- Transformers (Hugging Face) v4.x
- PyTorch 1.10+
- Accelerate