AI Model Evaluation Report: MiroThinker-1.7-Mini (GGUF/Ollama)

#1
by phanthai12 - opened


Subject: Comprehensive Logic, Spatial Reasoning, and Chain-of-Thought (CoT) Analysis
Model: noctrex/MiroThinker-1.7-mini-MXFP4_MOE-GGUF
Platform: Ollama (Standard Configuration)
Date: March 12, 2026


1. Executive Summary

This report synthesizes the results of two benchmark phases conducted in both English and Vietnamese. The evaluation focused on the model's ability to handle multi-step deduction, lateral thinking, spatial geometry, and game theory.

While the model demonstrates high-tier analytical capabilities and rigorous self-verification, it exhibits a recurring technical instability characterized as "Recursive Loop Syndrome" when tackling high-complexity inductive problems.


2. Combined Performance Matrix

| Category | Complexity | Result | Technical Observation |
|----------|------------|--------|-----------------------|
| Basic Logic & Syllogism | Low | Pass | Excellent set-theory analysis and identification of self-referential logic. |
| Arithmetic & Lateral Thinking | Medium | Pass | Avoided common linguistic traps; strong substitution logic. |
| Spatial Reasoning | Medium/High | Pass | Demonstrated mathematical integrity by cross-checking surface areas and geometry. |
| Deductive Strategy | High | Partial | Successfully integrated physical properties (e.g., heat) but struggled with state partitioning. |
| Game Theory & Induction | Very High | Fail | Identified the correct logic but failed to terminate the reasoning process (infinite loop). |

3. Detailed Technical Analysis

A. Core Strengths

  • Mathematical Integrity: In spatial tasks (e.g., 3x3x3 cube problems), the model does not merely guess; it verifies the surface area (54 square units) to validate its conclusions.
  • Domain Synthesis: The model effectively bridges the gap between abstract logic and physical reality, such as incorporating thermodynamics into light bulb puzzles.
  • Structured CoT: The "Given -> Reasoning -> Verification" framework is consistently applied, providing high transparency for debugging and complex system design.
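The surface-area cross-check mentioned above is easy to reproduce. The sketch below (Python, not taken from the report itself) verifies that an n x n x n cube of unit cubes exposes 6·n² unit squares, i.e. 54 for n = 3, and cross-checks it against the classic painted-cube face count:

```python
def cube_surface_area(n: int) -> int:
    """Surface area (in unit squares) of an n x n x n cube:
    six faces, each exposing n * n unit squares."""
    return 6 * n * n

print(cube_surface_area(3))  # -> 54

# Cross-check via the painted-cube breakdown for n = 3:
# 8 corner cubes show 3 faces, 12 edge cubes show 2, 6 face cubes show 1.
assert 8 * 3 + 12 * 2 + 6 * 1 == cube_surface_area(3)
```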

B. Critical Weaknesses (Recursive Loop Syndrome)

The primary bottleneck for the MiroThinker-1.7-mini is its "Output Controller" during deep reasoning:

  • State Management Failure: In problems requiring case-by-case elimination (e.g., Three Hats or 100 Prisoners), the model often re-evaluates the same logic repeatedly without moving to the final inference.
  • Symmetric Attention Weights: When two competing facts are equally plausible, the model occasionally oscillates between them (e.g., Polar Bear vs. Brown Bear), leading to a failure to finalize the response despite solving the core spatial problem.
  • Verbose Redundancy: The model tends to over-explain simple concepts, which may increase latency in production environments.
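One lightweight client-side mitigation for this failure mode is to watch the generated stream for verbatim repetition and cut generation off once it trips. The detector below is a minimal illustrative sketch (the function, its name, and its thresholds are assumptions, not part of the report):

```python
def is_looping(text: str, chunk: int = 80, repeats: int = 3) -> bool:
    """Return True if the tail of the stream consists of the same
    `chunk`-character span repeated `repeats` times back-to-back --
    a crude signal that the model has entered a reasoning loop."""
    if len(text) < chunk * repeats:
        return False
    tail = text[-chunk * repeats:]
    return tail == tail[-chunk:] * repeats

# Example: a stream stuck re-deriving the same step trips the detector.
sentence = "Therefore the third prisoner must see two red hats. "
stuck = sentence * 10
print(is_looping(stuck, chunk=len(sentence), repeats=3))  # -> True
```

A production version would normalize whitespace and compare at the token level, but even this crude check catches the verbatim re-derivation loops described above.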

4. Hardware/Software Optimization Notes

The test utilized the MXFP4_MOE-GGUF quantization via Ollama.

  • Observation: The MoE (Mixture of Experts) architecture provides rapid initial responses, but the logic "looping" suggests a potential mismatch between the thinking depth and the stop-token/penalty parameters in the standard GGUF configuration.

5. Final Conclusion & Recommendations

MiroThinker-1.7-mini is a significant advancement for open-weight reasoning models, particularly suitable for software debugging, mathematical modeling, and structured system design. However, it is currently unstable for recursive "Theory of Mind" tasks.

Recommended Tuning:

  1. Repetition Penalty: Increase the repetition penalty (exposed as repeat_penalty in Ollama; suggested value 1.15 or higher) to break infinite reasoning cycles.
  2. Stop-Token Calibration: Adjust parameters to force termination once a logical equilibrium is reached.
  3. Task Scoping: Best deployed in environments where logic is deterministic rather than open-ended induction.
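Recommendations 1 and 2 can be applied directly via an Ollama Modelfile. The fragment below is an illustrative sketch, not a tested configuration: the stop string and the token cap are assumptions to be tuned against the model's actual output format.

```
# Pull the GGUF weights directly from Hugging Face (Ollama's hf.co syntax).
FROM hf.co/noctrex/MiroThinker-1.7-mini-MXFP4_MOE-GGUF

# Recommendation 1: raise the repetition penalty to break reasoning cycles.
PARAMETER repeat_penalty 1.15

# Backstop against infinite loops: hard cap on generated tokens (assumed value).
PARAMETER num_predict 2048

# Recommendation 2: assumed terminator phrase; replace with the model's
# actual end-of-answer marker once observed.
PARAMETER stop "Final Answer:"
```

Build it with `ollama create mirothinker-tuned -f Modelfile` and run as usual.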

Final Grade: B+ (Excellent logic engine, hindered by output stability in high-complexity scenarios).

Owner

Thanks for the analysis!
