Phi-3-Medium-128K-Instruct
Phi-3-Medium-128K-Instruct is a 14-billion-parameter instruction-tuned language model developed by Microsoft. It is designed to deliver strong reasoning, structured responses, and long-context comprehension across a wide range of tasks. With support for up to 128K tokens of context, this model is well suited for document analysis, extended conversations, and complex multi-step workflows.
This repository provides access to the instruction-optimized version of Phi-3 Medium with extended context capabilities for research and deployment.
Model Overview
- Model Name: Phi-3-Medium-128K-Instruct
- Base Model: microsoft/Phi-3-medium-128k-instruct
- Architecture: Decoder-only Transformer
- Parameter Count: 14 Billion
- Context Length: 128K tokens
- Modalities: Text
- Developer: Microsoft
- License: MIT License
Model Variants
Instruction-Tuned Version
- Optimized for conversational and task-oriented prompts
- Improved adherence to structured instructions
- Enhanced reasoning and step-by-step explanation capabilities
- Designed for safe and helpful outputs
Quantization Details
Q4_K_M
- Approximately 71% size reduction relative to FP16
- Very low memory footprint (~7.98 GB)
- Optimized for CPU inference and low-VRAM GPUs
- Faster token generation speeds
- Minor degradation in complex analytical or long-chain reasoning tasks
Q5_K_M
- Approximately 66% size reduction relative to FP16
- Better fidelity to the original FP16 model (9.38 GB)
- Improved coherence and reasoning consistency
- Recommended when slightly more memory is available
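The footprints above can be sanity-checked from the parameter count and bits per weight. A rough estimate, assuming llama.cpp's nominal ~4.85 and ~5.69 bits per weight for Q4_K_M and Q5_K_M (approximate figures; real files add metadata overhead and mix tensor types):

```python
# Rough GGUF size estimate: parameters * bits-per-weight / 8 bytes.
# Bits-per-weight values are approximate llama.cpp figures (assumption);
# actual files differ slightly due to metadata and mixed quantization.

PARAMS = 14e9  # 14 billion parameters

def approx_size_gb(bits_per_weight: float, params: float = PARAMS) -> float:
    """Approximate on-disk size in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9

q4_k_m = approx_size_gb(4.85)  # roughly 8.5 GB
q5_k_m = approx_size_gb(5.69)  # roughly 10 GB
fp16 = approx_size_gb(16.0)    # 28 GB

print(f"Q4_K_M ~ {q4_k_m:.1f} GB, Q5_K_M ~ {q5_k_m:.1f} GB, FP16 ~ {fp16:.1f} GB")
```

The implied reduction (1 − 8.5 / 28 ≈ 70%) is consistent with the ~71% figure above; the small gap versus the listed file sizes comes from format overhead and GB/GiB conventions.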
Training Background
Phi-3-Medium-128K-Instruct is built on the Phi-3 architecture and trained to balance efficiency and performance. It was pretrained on carefully curated data with an emphasis on reasoning, coding, and structured problem-solving, then instruction-tuned to improve prompt adherence, response clarity, and real-world usability.
Pretraining
- Trained on a curated mixture of high-quality text sources
- Focused on reasoning, coding, mathematics, and general knowledge
- Optimized using autoregressive next-token prediction
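The pretraining objective above is plain next-token prediction. A toy sketch of the autoregressive loop, with a stand-in "model" that is just a bigram table over a tiny corpus (purely illustrative; the real model replaces the lookup with a Transformer forward pass over the full context):

```python
from collections import Counter, defaultdict

# Toy autoregressive "model": a bigram frequency table.
corpus = "the cat sat on the mat the cat sat on".split()
bigrams: defaultdict[str, Counter] = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_token(context: list[str]) -> str:
    """Greedy pick: most frequent continuation of the last token."""
    counts = bigrams.get(context[-1])
    return counts.most_common(1)[0][0] if counts else "<eos>"

def generate(prompt: list[str], max_new: int = 4) -> list[str]:
    tokens = list(prompt)
    for _ in range(max_new):
        tok = next_token(tokens)  # predict one token...
        if tok == "<eos>":
            break
        tokens.append(tok)        # ...then feed it back as context
    return tokens

print(generate(["the"]))  # -> ['the', 'cat', 'sat', 'on', 'the']
```

The loop structure (predict, append, repeat) is exactly what next-token pretraining optimizes for, regardless of how the prediction itself is computed.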
Instruction Tuning
- Fine-tuned on instruction-following datasets
- Enhanced performance on question answering, summarization, coding, and dialogue
- Improved alignment for clarity, relevance, and structured outputs
Key Capabilities
Long-context understanding
Supports up to 128K tokens, enabling large document processing and persistent conversational memory.
Advanced reasoning
Performs multi-step logical reasoning, analytical problem solving, and structured explanations.
Code generation and analysis
Capable of generating, debugging, and explaining code across multiple programming languages.
Conversational AI
Maintains coherence across extended multi-turn interactions.
Document summarization and extraction
Handles long reports, contracts, and research papers effectively.
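As a rough pre-check before sending a long document, the common ~4 characters-per-token heuristic (an approximation; real counts come from the model's tokenizer and vary by language and content) can estimate whether text fits in the 128K window:

```python
CONTEXT_LIMIT = 128 * 1024  # 128K tokens

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Heuristic token estimate; the actual tokenizer gives exact counts."""
    return int(len(text) / chars_per_token)

def fits_in_context(text: str, reserve_for_output: int = 2048) -> bool:
    """Leave headroom for the generated response."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMIT

report = "word " * 100_000  # ~500K characters, ~125K estimated tokens
print(estimate_tokens(report), fits_in_context(report))
```

Documents that fail this check can be chunked or summarized hierarchically before being passed to the model.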
Usage Example
llama.cpp
./llama-cli \
  -m SandLogicTechnologies/Phi-3-medium-128k-instruct_Q4_K_M.gguf \
  -p "Explain transformers in simple terms."
Recommended Applications
Enterprise document processing: Analyze long-form documents, compliance materials, and technical manuals.
Research and experimentation: Evaluate reasoning performance and long-context capabilities.
Code assistance tools: Integrate into development environments for coding support.
Conversational AI systems: Deploy in chatbots requiring extended context memory.
Educational tools: Generate structured explanations and step-by-step solutions.
Deployment Considerations
- Requires sufficient GPU memory for optimal performance
- Mixed precision (FP16/BF16) recommended for efficiency
- Suitable for distributed or high-memory inference setups
- Ensure adherence to the MIT license terms when deploying commercially
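At long context, much of the memory goes to the KV cache rather than the weights. A back-of-the-envelope estimate, assuming 40 layers, 10 grouped-query KV heads of dimension 128, and FP16 cache entries (these architecture figures are assumptions about the Phi-3-medium config, not taken from this card):

```python
def kv_cache_bytes(seq_len: int,
                   n_layers: int = 40,      # assumed Phi-3-medium depth
                   n_kv_heads: int = 10,    # assumed GQA KV-head count
                   head_dim: int = 128,     # assumed per-head dimension
                   bytes_per_elem: int = 2  # FP16 cache
                   ) -> int:
    """2x for keys and values, per layer, per KV head, per position."""
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_elem

for ctx in (4096, 32 * 1024, 128 * 1024):
    gb = kv_cache_bytes(ctx) / 1e9
    print(f"{ctx:>7} tokens -> ~{gb:.1f} GB KV cache")
```

Under these assumptions a full 128K-token context costs roughly 27 GB of cache on top of the model weights, which is why quantized KV caches or shorter context windows are common on constrained hardware.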
Acknowledgments
These quantized models are based on the original work of the Microsoft development team.
Special thanks to:
The Microsoft team for developing and releasing the Phi-3-medium-128k-instruct model.
Georgi Gerganov and the entire llama.cpp open-source community for enabling efficient model quantization and inference via the GGUF format.
Contact
For any inquiries or support, please contact us at support@sandlogic.com or visit our website.