Phi-3-Medium-128K-Instruct

Phi-3-Medium-128K-Instruct is a 14-billion-parameter instruction-tuned language model developed by Microsoft. It is designed to deliver strong reasoning, structured responses, and long-context comprehension across a wide range of tasks. With support for up to 128K tokens of context, this model is well suited for document analysis, extended conversations, and complex multi-step workflows.

This repository provides GGUF-quantized builds of the instruction-optimized Phi-3 Medium model with extended context support, for research and deployment.


Model Overview

  • Model Name: Phi-3-Medium-128K-Instruct
  • Base Model: microsoft/Phi-3-medium-128k-instruct
  • Architecture: Decoder-only Transformer
  • Parameter Count: 14 Billion
  • Context Length: 128K tokens
  • Modalities: Text
  • Developer: Microsoft
  • License: MIT License

Model Variants

Instruction-Tuned Version

  • Optimized for conversational and task-oriented prompts
  • Improved adherence to structured instructions
  • Enhanced reasoning and step-by-step explanation capabilities
  • Designed for safe and helpful outputs

Quantization Details

Q4_K_M

  • Approx. 71% size reduction
  • Very low memory footprint (~7.98 GB)
  • Optimized for CPU inference and low-VRAM GPUs
  • Faster token generation speeds
  • Minor degradation in complex analytical or long-chain reasoning tasks

Q5_K_M

  • Approx. 66% size reduction
  • Better fidelity to the original FP16 model (~9.38 GB file size)
  • Improved coherence and reasoning consistency
  • Recommended when slightly more memory is available
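
The quantized files can be fetched directly from the Hugging Face Hub. Below is a minimal sketch using huggingface_hub.hf_hub_download; the exact GGUF filename inside the repository is an assumption, so check the repository's file list before running.

from huggingface_hub import hf_hub_download

# Fetch the Q4_K_M quantization; the filename is assumed and should be
# verified against the files actually listed in the repository.
model_path = hf_hub_download(
    repo_id="SandLogicTechnologies/Phi-3-medium-128k-instruct-GGUF",
    filename="Phi-3-medium-128k-instruct_Q4_K_M.gguf",
)
print(model_path)  # local path to the downloaded GGUF file

Swap the filename for the Q5_K_M variant when the extra memory headroom is available.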

Training Background

Phi-3-Medium-128K-Instruct is built on the Phi-3 architecture and trained to balance efficiency and performance. It was pretrained on carefully curated data with an emphasis on reasoning, coding, and structured problem-solving, then instruction-tuned to improve prompt adherence, response clarity, and real-world usability.

Pretraining

  • Trained on a curated mixture of high-quality text sources
  • Focused on reasoning, coding, mathematics, and general knowledge
  • Optimized using autoregressive next-token prediction
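
For reference, the next-token prediction objective mentioned above is the standard autoregressive language-modeling loss, minimized over the pretraining corpus:

\mathcal{L}(\theta) = -\sum_{t=1}^{T} \log p_\theta\left(x_t \mid x_{<t}\right)

where x_{<t} denotes the tokens preceding position t in a training sequence of length T.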

Instruction Tuning

  • Fine-tuned on instruction-following datasets
  • Enhanced performance on question answering, summarization, coding, and dialogue
  • Improved alignment for clarity, relevance, and structured outputs

Key Capabilities

  • Long-context understanding
    Supports up to 128K tokens, enabling large document processing and persistent conversational memory (see the sketch after this list).

  • Advanced reasoning
    Performs multi-step logical reasoning, analytical problem solving, and structured explanations.

  • Code generation and analysis
    Capable of generating, debugging, and explaining code across multiple programming languages.

  • Conversational AI
    Maintains coherence across extended multi-turn interactions.

  • Document summarization and extraction
    Handles long reports, contracts, and research papers effectively.
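
To make the long-context capability concrete, the following is a minimal sketch using the llama-cpp-python bindings to summarize a long document with an enlarged context window. The model path and input file are placeholders; adjust n_ctx to the memory you have available (the model supports up to 128K tokens).

from llama_cpp import Llama

# Load the quantized model with a large context window.
# The path and n_ctx are illustrative; raise n_ctx toward 131072 if RAM allows.
llm = Llama(
    model_path="Phi-3-medium-128k-instruct_Q4_K_M.gguf",
    n_ctx=32768,
    n_gpu_layers=-1,   # offload all layers to GPU when one is available
)

with open("long_report.txt", "r", encoding="utf-8") as f:
    document = f.read()

response = llm.create_chat_completion(
    messages=[
        {"role": "user",
         "content": "Summarize the key findings of this report:\n\n" + document},
    ],
    max_tokens=512,
)
print(response["choices"][0]["message"]["content"])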


Usage Example

llama.cpp

./llama-cli \
  -m SandLogicTechnologies/Phi-3-medium-128k-instruct_Q4_K_M.gguf \
  -p "Explain transformers in simple terms."

Recommended Applications

  • Enterprise document processing: Analyze long-form documents, compliance materials, and technical manuals.

  • Research and experimentation: Evaluate reasoning performance and long-context capabilities.

  • Code assistance tools: Integrate into development environments for coding support.

  • Conversational AI systems: Deploy in chatbots requiring extended context memory.

  • Educational tools: Generate structured explanations and step-by-step solutions.


Deployment Considerations

  • Requires sufficient GPU memory for optimal performance
  • Mixed precision (FP16/BF16) recommended for efficiency (see the sketch after this list)
  • Suitable for distributed or high-memory inference setups
  • Ensure adherence to the MIT license terms when deploying commercially
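
For full-precision deployment outside llama.cpp, the base model can be loaded with Hugging Face transformers in BF16. The sketch below is illustrative; device placement and dtype choices should be adapted to your hardware, and device_map="auto" assumes the accelerate package is installed.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-3-medium-128k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # mixed precision as recommended above
    device_map="auto",            # spread layers across available GPUs
    trust_remote_code=True,       # required by older transformers releases
)

messages = [{"role": "user", "content": "Explain transformers in simple terms."}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))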

Acknowledgments

These quantized models are based on the original work by the Microsoft development team.

Special thanks to:

  • The Microsoft team for developing and releasing the Phi-3-medium-128k-instruct model.

  • Georgi Gerganov and the entire llama.cpp open-source community for enabling efficient model quantization and inference via the GGUF format.


Contact

For any inquiries or support, please contact us at support@sandlogic.com or visit our website.
