Results

======================================================================
Calculating evaluation metrics...

=== Evaluating on 496 samples ===

--- Categorical and Accuracy-Based Evaluation ---
Weather              Accuracy: 0.00% (36 of 496 samples scored)
Time                 Accuracy: 0.00% (1 of 496 samples scored)
Illumination         Accuracy: 28.92% (408 of 496 samples scored)
Visibility           Accuracy: 7.69% (416 of 496 samples scored)
Road Surface         Accuracy: 31.19% (388 of 496 samples scored)
Traffic Lights       Accuracy: 83.33% (6 of 496 samples scored)
Road Type            Accuracy: 0.00% (3 of 496 samples scored)
Location             Accuracy: 0.00% (1 of 496 samples scored)
Risk                 Accuracy: 45.45% (11 of 496 samples scored)
Intention            Accuracy: 72.73% (11 of 496 samples scored)
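Each field is scored on a different subset of the 496 samples, and the figure in parentheses is the number of samples scored, not the number answered correctly (for example, 28.92% of 408 is 118 correct Illumination answers). A minimal sketch of such a per-field metric, assuming a field is skipped whenever its ground truth is missing or empty and that answers are compared case-insensitively; the function `field_accuracy` and the example data are hypothetical, not the evaluation script's code:

```python
from typing import Optional


def field_accuracy(truths: list[Optional[str]], preds: list[Optional[str]]) -> tuple[float, int]:
    """Accuracy (%) over samples whose ground truth for this field is present.

    Assumptions (hypothetical): a sample is skipped when its ground-truth
    value is None or empty, and values are compared after stripping
    whitespace and lowercasing. Returns (accuracy_percent, n_scored).
    """
    scored = [(t, p) for t, p in zip(truths, preds) if t not in (None, "")]
    if not scored:
        return 0.0, 0
    correct = sum(
        1 for t, p in scored
        if p is not None and t.strip().lower() == p.strip().lower()
    )
    return 100.0 * correct / len(scored), len(scored)


# Hypothetical example: only 6 of 8 samples carry a traffic-light label.
truths = ["Red", "Green", None, "Red", "", "Green", "Red", "Green"]
preds = ["red", "Green", "Red", "Green", "Red", "green", "Red", "Green"]
acc, n = field_accuracy(truths, preds)
print(f"Traffic Lights       Accuracy: {acc:.2f}% ({n} of {len(truths)} samples scored)")
```

With so few scored samples for fields like Time, Location, or Traffic Lights, a single answer moves the percentage by a large step, so those accuracies are much noisier than the ones scored on roughly 400 samples.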

Exact Match          Accuracy: 0.00%
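Exact match is the strictest score here: a sample counts only if every field is right. A minimal sketch, assuming each sample is a dict mapping field name to a string value, rows are aligned by index, and a field absent from the prediction counts as wrong; `exact_match` and the rows below are hypothetical:

```python
def exact_match(truth_rows: list[dict[str, str]], pred_rows: list[dict[str, str]]) -> float:
    """Percent of samples whose predicted fields all equal the ground truth.

    Assumptions (hypothetical): rows are aligned by index, values are
    compared after stripping whitespace and lowercasing, and a field
    missing from the prediction counts as a mismatch.
    """
    if not truth_rows:
        return 0.0
    hits = sum(
        1
        for truth, pred in zip(truth_rows, pred_rows)
        if all(pred.get(k, "").strip().lower() == v.strip().lower() for k, v in truth.items())
    )
    return 100.0 * hits / len(truth_rows)


truth_rows = [{"Weather": "Clear", "Risk": "No"}, {"Weather": "Rain", "Risk": "Yes"}]
pred_rows = [{"Weather": "clear", "Risk": "No"}, {"Weather": "Rain", "Risk": "No"}]
score = exact_match(truth_rows, pred_rows)
print(f"Exact Match          Accuracy: {score:.2f}%")
```

Because one wrong field sinks the whole sample, exact match is usually far below the per-field accuracies.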
--- 'Risk' Field Classification Report ---
              precision    recall  f1-score   support

       ** No       0.00      0.00      0.00         0
          <1       0.00      0.00      0.00         0
          No       0.40      0.67      0.50         3
         Yes       0.75      0.38      0.50         8

    accuracy                           0.45        11
   macro avg       0.29      0.26      0.25        11
weighted avg       0.65      0.45      0.50        11
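The report uses the usual per-label definitions: precision is correct predictions of a label divided by all predictions of that label, recall is correct predictions divided by true occurrences (support), and F1 is their harmonic mean. The macro average weights all four labels equally, including the two malformed predicted labels `** No` and `<1` that have zero support, which pulls it well below the support-weighted average. The dependency-free sketch below recomputes the table. The 11 label pairs are a reconstruction chosen to be consistent with every count in the report (5 correct out of 11), not the model's logged outputs; the function name is hypothetical.

```python
from collections import Counter


def classification_report(y_true: list[str], y_pred: list[str]) -> dict:
    """Per-label precision/recall/F1 plus macro and support-weighted averages.

    Labels are the union of true and predicted values, so malformed
    predictions such as "** No" get their own zero-support rows.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_pred = Counter(y_pred)
    n_true = Counter(y_true)
    report = {}
    for lab in labels:
        p = tp[lab] / n_pred[lab] if n_pred[lab] else 0.0
        r = tp[lab] / n_true[lab] if n_true[lab] else 0.0
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        report[lab] = {"precision": p, "recall": r, "f1-score": f1, "support": n_true[lab]}
    total = len(y_true)
    schemes = {
        "macro avg": {lab: 1 / len(labels) for lab in labels},       # every label equal
        "weighted avg": {lab: n_true[lab] / total for lab in labels},  # by support
    }
    for name, w in schemes.items():
        report[name] = {
            m: sum(w[lab] * report[lab][m] for lab in labels)
            for m in ("precision", "recall", "f1-score")
        }
    report["accuracy"] = sum(tp.values()) / total
    return report


# Reconstructed labels consistent with the table (hypothetical, not logged data).
y_true = ["No"] * 3 + ["Yes"] * 8
y_pred = ["No", "No", "Yes",                # true "No": 2 right, 1 predicted "Yes"
          "Yes", "Yes", "Yes",              # true "Yes": 3 right
          "No", "No", "No", "** No", "<1"]  # true "Yes": 5 wrong
report = classification_report(y_true, y_pred)
for lab in sorted(set(y_true) | set(y_pred)):
    row = report[lab]
    print(f"{lab:>12} {row['precision']:9.2f} {row['recall']:9.2f} "
          f"{row['f1-score']:9.2f} {row['support']:9d}")
print(f"accuracy: {report['accuracy']:.2f}")
```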

--- Performance Summary ---

Average Accuracy across all fields: 26.93%
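The 26.93% figure equals the unweighted mean of the ten per-field accuracies, so Traffic Lights (6 scored samples) counts as much as Visibility (416). The sketch below reproduces it from the logged per-field numbers and, for comparison, computes a sample-weighted mean; that weighted figure is derived here from the rounded percentages and is not something the evaluation script reports.

```python
# Per-field (accuracy %, scored samples), copied from the log above.
results = {
    "Weather": (0.00, 36),
    "Time": (0.00, 1),
    "Illumination": (28.92, 408),
    "Visibility": (7.69, 416),
    "Road Surface": (31.19, 388),
    "Traffic Lights": (83.33, 6),
    "Road Type": (0.00, 3),
    "Location": (0.00, 1),
    "Risk": (45.45, 11),
    "Intention": (72.73, 11),
}

# Unweighted mean: every field counts equally, however many samples it has.
macro = sum(acc for acc, _ in results.values()) / len(results)

# Sample-weighted mean: fields with more scored samples count proportionally more.
total_n = sum(n for _, n in results.values())
weighted = sum(acc * n for acc, n in results.values()) / total_n

print(f"Unweighted mean across fields: {macro:.2f}%")
print(f"Sample-weighted mean:          {weighted:.2f}%")
```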

--- Inference Time Analysis ---

Average inference time: 1893.4 ms
Median inference time: 1427.1 ms
Min/Max: 1328.3 / 3684.9 ms
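The mean (1893.4 ms) sits well above the median (1427.1 ms) while the minimum is close to the median, which suggests a minority of slow samples stretching toward the 3684.9 ms maximum; the median is therefore the better single figure for typical latency. A minimal timing sketch, assuming per-sample wall-clock timing with `time.perf_counter`; `time_inference`, `summarize`, and the example latencies are hypothetical:

```python
import statistics
import time
from typing import Callable


def time_inference(run_once: Callable[[], object], n_runs: int) -> list[float]:
    """Call run_once n_runs times; return each call's wall-clock latency in ms."""
    timings = []
    for _ in range(n_runs):
        start = time.perf_counter()
        run_once()  # placeholder for one inference call on a single sample
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarize(timings_ms: list[float]) -> dict[str, float]:
    """Mean, median, min and max latency, mirroring the fields in the log."""
    return {
        "mean": statistics.mean(timings_ms),
        "median": statistics.median(timings_ms),
        "min": min(timings_ms),
        "max": max(timings_ms),
    }


# Hypothetical latencies: three fast samples and one slow outlier.
stats = summarize([1330.0, 1400.0, 1430.0, 3680.0])
print(f"Average inference time: {stats['mean']:.1f} ms")
print(f"Median inference time: {stats['median']:.1f} ms")
print(f"Min/Max: {stats['min']:.1f} / {stats['max']:.1f} ms")
```

A single outlier is enough to pull the mean above the median here, the same pattern the logged timings show.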

=== Evaluation Complete ===

======================================================================
EVALUATION SUMMARY

Model: enpeizhao/internvl2-1b-odd-distilled-merged
Test Samples: 500
Results File: test_results_distilled_20260129_203526.csv

Field Accuracies:
  Traffic Lights       ████████████████░░░░ 83.3%
  Intention            ██████████████░░░░░░ 72.7%
  Risk                 █████████░░░░░░░░░░░ 45.5%
  Road Surface         ██████░░░░░░░░░░░░░░ 31.2%
  Illumination         █████░░░░░░░░░░░░░░░ 28.9%
  Visibility           █░░░░░░░░░░░░░░░░░░░ 7.7%
  Weather              ░░░░░░░░░░░░░░░░░░░░ 0.0%
  Time                 ░░░░░░░░░░░░░░░░░░░░ 0.0%
  Road Type            ░░░░░░░░░░░░░░░░░░░░ 0.0%
  Location             ░░░░░░░░░░░░░░░░░░░░ 0.0%

  AVERAGE              █████░░░░░░░░░░░░░░░ 26.9%
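Each bar is 20 cells wide, and the bars are consistent with truncating to whole cells rather than rounding (28.9% fills 5 cells, not 6). A small sketch that renders bars the same way; `bar` is a hypothetical helper, not the evaluation script's code:

```python
FULL, EMPTY = "\u2588", "\u2591"  # the block characters used in the summary bars


def bar(pct: float, width: int = 20) -> str:
    """Render a percentage as a fixed-width bar, truncating to whole cells."""
    filled = max(0, min(width, int(pct / 100 * width)))
    return FULL * filled + EMPTY * (width - filled)


accuracies = {"Traffic Lights": 83.33, "Intention": 72.73, "Illumination": 28.92, "Weather": 0.0}
for name, pct in sorted(accuracies.items(), key=lambda kv: kv[1], reverse=True):
    print(f"  {name:<20} {bar(pct)} {pct:.1f}%")
```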

======================================================================

Model size: 0.9B params (Safetensors)
Tensor type: BF16