Phi-3-Medium-128K-Instruct

Phi-3-Medium-128K-Instruct is a 14-billion-parameter instruction-tuned language model developed by Microsoft. It is designed to deliver strong reasoning, structured responses, and long-context comprehension across a wide range of tasks. With support for up to 128K tokens of context, this model is well suited for document analysis, extended conversations, and complex multi-step workflows.

This repository provides GGUF-quantized builds of the instruction-optimized Phi-3 Medium model with extended context support, for research and deployment.


Model Overview

  • Model Name: Phi-3-Medium-128K-Instruct
  • Base Model: microsoft/Phi-3-medium-128k-instruct
  • Architecture: Decoder-only Transformer
  • Parameter Count: 14 Billion
  • Context Length: 128K tokens
  • Modalities: Text
  • Developer: Microsoft
  • License: MIT License

Model Variants

Instruction-Tuned Version

  • Optimized for conversational and task-oriented prompts
  • Improved adherence to structured instructions
  • Enhanced reasoning and step-by-step explanation capabilities
  • Designed for safe and helpful outputs

Quantization Details

Q4_K_M

  • Approx. 71% size reduction
  • Very low memory footprint (~7.98 GB)
  • Optimized for CPU inference and low-VRAM GPUs
  • Faster token generation speeds
  • Minor degradation in complex analytical or long-chain reasoning tasks

Q5_K_M

  • Approx. 66% size reduction
  • Better fidelity to the original FP16 model (~9.38 GB file size)
  • Improved coherence and reasoning consistency
  • Recommended when slightly more memory is available
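
The quantized files can be fetched directly from the Hugging Face Hub. Below is a minimal sketch using huggingface_hub.hf_hub_download; the exact GGUF filename inside the repository is an assumption, so check the repository's file list before running.

from huggingface_hub import hf_hub_download

# Fetch the Q4_K_M quantization; the filename is assumed and should be
# verified against the files actually listed in the repository.
model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/Phi-3-medium-128k-instruct-GGUF",
    filename="Phi-3-medium-128k-instruct_Q4_K_M.gguf",
)
print(model_path)  # local path to the downloaded GGUF file

Swap the filename for the Q5_K_M variant when the extra memory headroom is available.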

Training Background

Phi-3-Medium-128K-Instruct is built on the Phi-3 architecture and trained to balance efficiency and performance. It was pretrained on carefully curated data with an emphasis on reasoning, coding, and structured problem-solving, then instruction-tuned to improve prompt adherence, response clarity, and real-world usability.

Pretraining

  • Trained on a curated mixture of high-quality text sources
  • Focused on reasoning, coding, mathematics, and general knowledge
  • Optimized using autoregressive next-token prediction
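
For reference, the next-token prediction objective mentioned above is the standard autoregressive language-modeling loss, minimized over the pretraining corpus:

\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)

where x_{<t} denotes the tokens preceding position t in a training sequence of length T.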

Instruction Tuning

  • Fine-tuned on instruction-following datasets
  • Enhanced performance on question answering, summarization, coding, and dialogue
  • Improved alignment for clarity, relevance, and structured outputs

Key Capabilities

  • Long-context understanding
    Supports up to 128K tokens, enabling large document processing and persistent conversational memory (see the sketch after this list).

  • Advanced reasoning
    Performs multi-step logical reasoning, analytical problem solving, and structured explanations.

  • Code generation and analysis
    Capable of generating, debugging, and explaining code across multiple programming languages.

  • Conversational AI
    Maintains coherence across extended multi-turn interactions.

  • Document summarization and extraction
    Handles long reports, contracts, and research papers effectively.
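
To make the long-context capability concrete, the following is a minimal sketch using the llama-cpp-python bindings to summarize a long document with an enlarged context window. The model path and input file are placeholders; adjust n_ctx to the memory you have available (the model supports up to 128K tokens).

from llama_cpp import Llama

# Load the quantized model with a large context window.
# The path and n_ctx are illustrative; raise n_ctx toward 131072 if RAM allows.
llm = Llama(
    model_path="Phi-3-medium-128k-instruct_Q4_K_M.gguf",
    n_ctx=32768,
    n_gpu_layers=-1,   # offload all layers to GPU when one is available
)

with open("long_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = llm.create_chat_completion(
    messages=[
        {"role": "user",
         "content": "Summarize the key findings of this report:\n\n" + document},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])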


Usage Example

llama.cpp

./llama-cli \
  -m SandLogicTechnologies/Phi-3-medium-128k-instruct_Q4_K_M.gguf \
  -p "Explain transformers in simple terms."

Recommended Applications

  • Enterprise document processing: Analyze long-form documents, compliance materials, and technical manuals.

  • Research and experimentation: Evaluate reasoning performance and long-context capabilities.

  • Code assistance tools: Integrate into development environments for coding support.

  • Conversational AI systems: Deploy in chatbots requiring extended context memory.

  • Educational tools: Generate structured explanations and step-by-step solutions.


Deployment Considerations

  • Requires sufficient GPU memory for optimal performance
  • Mixed precision (FP16/BF16) recommended for efficiency (see the sketch after this list)
  • Suitable for distributed or high-memory inference setups
  • Ensure adherence to the MIT license terms when deploying commercially
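
For full-precision deployment outside llama.cpp, the base model can be loaded with Hugging Face transformers in BF16. The sketch below is illustrative; device placement and dtype choices should be adapted to your hardware, and device_map="auto" assumes the accelerate package is installed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-medium-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # mixed precision as recommended above
    device_map="auto",            # spread layers across available GPUs
    trust_remote_code=True,       # required by older transformers releases
)

messages = [{"role": "user", "content": "Explain transformers in simple terms."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))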

Acknowledgments

These quantized models are based on the original work by the Microsoft development team.

Special thanks to:

  • The Microsoft team for developing and releasing the Phi-3-medium-128k-instruct model.

  • Georgi Gerganov and the entire llama.cpp open-source community for enabling efficient model quantization and inference via the GGUF format.


Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our website.
