Literacy thresholds summary (outliers removed via IQR with ordering guard)
Source: data/factual_testing/full_details_evaluation_0_80_qwen3-30B_v2.json
Metrics:
  - source_coverage (proxy for % information from source text)
  - completeness (proxy for % information from gold summary)
  - factual_attribution (factual support of claims)

Five-number summaries by label (min, q1, median, q3, max)

Low literacy
  factual_attribution: 0.0000, 0.3656, 0.5263, 0.6667, 1.0000
  completeness:        0.8500, 0.9600, 1.0000, 1.0000, 1.0000
  source_coverage:     0.0000, 0.1765, 0.2308, 0.3226, 0.5000

Intermediate literacy
  factual_attribution: 0.2500, 0.5000, 0.6111, 0.7692, 0.9412
  completeness:        0.8500, 0.9393, 1.0000, 1.0000, 1.0000
  source_coverage:     0.0000, 0.1818, 0.3036, 0.4091, 0.7419

Proficient literacy
  factual_attribution: 0.0000, 0.4901, 0.7207, 0.8213, 0.9600
  completeness:        0.6923, 0.9231, 1.0000, 1.0000, 1.0000
  source_coverage:     0.4286, 0.7725, 0.8758, 0.9347, 1.0000

Outliers removed (count) by label and metric
  Low:          factual_attribution=0, completeness=7,  source_coverage=3
  Intermediate: factual_attribution=7, completeness=9,  source_coverage=3
  Proficient:   factual_attribution=0, completeness=10, source_coverage=10

Suggested thresholds (based on cleaned quartiles/medians)
  factual_attribution:
    low_to_intermediate: 0.5687
    intermediate_to_proficient: 0.6659
  completeness:
    low_to_intermediate: 1.0000
    intermediate_to_proficient: 1.0000
  source_coverage:
    low_to_intermediate: 0.2672
    intermediate_to_proficient: 0.5908
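
Worked check of the thresholds above (a minimal sketch, not the original script). It applies the midpoint rule from the explanation section to the cleaned quartiles reported in the five-number summaries. The names SUMMARIES and threshold are hypothetical and chosen only for illustration.

```python
# Cleaned (q1, median, q3) per label, copied from the five-number summaries
# in this report. SUMMARIES and threshold() are illustrative names only.
SUMMARIES = {
    "factual_attribution": {
        "low": (0.3656, 0.5263, 0.6667),
        "intermediate": (0.5000, 0.6111, 0.7692),
        "proficient": (0.4901, 0.7207, 0.8213),
    },
    "completeness": {
        "low": (0.9600, 1.0000, 1.0000),
        "intermediate": (0.9393, 1.0000, 1.0000),
        "proficient": (0.9231, 1.0000, 1.0000),
    },
    "source_coverage": {
        "low": (0.1765, 0.2308, 0.3226),
        "intermediate": (0.1818, 0.3036, 0.4091),
        "proficient": (0.7725, 0.8758, 0.9347),
    },
}


def threshold(lower, upper):
    """Midpoint rule between two adjacent labels.

    If the lower label's Q3 is below the upper label's Q1 (a gap), use the
    midpoint of Q3 and Q1; otherwise use the midpoint of the two medians.
    """
    _, lower_median, lower_q3 = lower
    upper_q1, upper_median, _ = upper
    if lower_q3 < upper_q1:
        return (lower_q3 + upper_q1) / 2
    return (lower_median + upper_median) / 2


for metric, by_label in SUMMARIES.items():
    low_int = threshold(by_label["low"], by_label["intermediate"])
    int_pro = threshold(by_label["intermediate"], by_label["proficient"])
    print(f"{metric}: low_to_intermediate={low_int:.4f}, "
          f"intermediate_to_proficient={int_pro:.4f}")
```

Only source_coverage intermediate_to_proficient falls in the gap case (0.4091 < 0.7725); every other pair overlaps and uses the median midpoint.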

Interpretation for “% information needed”
(use medians, with IQR as uncertainty band)
  Low literacy:
    source text ≈ 23% (IQR 18%–32%)
    gold summary ≈ 100% (IQR 96%–100%)
  Intermediate literacy:
    source text ≈ 30% (IQR 18%–41%)
    gold summary ≈ 100% (IQR 94%–100%)
  Proficient literacy:
    source text ≈ 88% (IQR 77%–93%)
    gold summary ≈ 100% (IQR 92%–100%)
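
The percentages above are the cleaned medians and quartiles rounded to whole percents. A small sketch of that conversion follows; the helper name describe is hypothetical, and rounding to the nearest whole percent is an assumption inferred from the reported values.

```python
def describe(median, q1, q3):
    """Format a median and its IQR as whole percentages.

    Assumes rounding to the nearest whole percent.
    """
    return f"≈ {median:.0%} (IQR {q1:.0%}–{q3:.0%})"


# Low literacy, source_coverage: median 0.2308, Q1 0.1765, Q3 0.3226
print("source text", describe(0.2308, 0.1765, 0.3226))
# Proficient literacy, completeness: median 1.0000, Q1 0.9231, Q3 1.0000
print("gold summary", describe(1.0000, 0.9231, 1.0000))
```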

Explanation of how the ranges were selected (for reporting)
1) For each literacy label, we collected the metric scores across all examples.
2) We removed outliers using the IQR rule (values below Q1-1.5*IQR or above Q3+1.5*IQR).
3) For source_coverage and completeness, we enforced the median ordering
   proficient >= intermediate >= low (ties allowed): if outlier removal would leave
   cleaned medians that violate that order, we skipped the removal.
4) We summarized each cleaned distribution with the five-number summary.
5) We report the median as the “typical” needed information and the IQR (Q1–Q3)
   as a robust uncertainty band around that typical value.
6) For the threshold between two adjacent labels, we used a midpoint rule on the
   cleaned distributions:
   - If there is a gap between the IQRs (Q3 of lower label < Q1 of upper label),
     use the midpoint between the lower label's Q3 and the upper label's Q1,
     so the threshold falls inside the gap.
   - If the IQRs overlap, use the midpoint between the two medians as a
     conservative separating threshold.
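
Steps 2 to 4 can be sketched as follows. This is an illustration under stated assumptions, not the original script: the quartile interpolation method is not given in this report (the "inclusive" method is assumed here), the guard is assumed to skip outlier removal for every label of a metric when the cleaned medians break the order, and all function names are hypothetical.

```python
from statistics import median, quantiles


def quartiles(values):
    # Assumption: "inclusive" linear interpolation; the report does not say
    # which quartile method was used.
    return quantiles(values, n=4, method="inclusive")


def remove_iqr_outliers(values):
    """Step 2: keep values inside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if lower <= v <= upper]


def clean_with_ordering_guard(scores_by_label,
                              order=("low", "intermediate", "proficient")):
    """Step 3: fall back to the raw scores if cleaning breaks the median order.

    Assumption: the guard is all-or-nothing per metric, and ties are allowed.
    """
    cleaned = {label: remove_iqr_outliers(scores_by_label[label])
               for label in order}
    medians = [median(cleaned[label]) for label in order]
    if all(a <= b for a, b in zip(medians, medians[1:])):
        return cleaned
    return {label: list(scores_by_label[label]) for label in order}


def five_number_summary(values):
    """Step 4: (min, q1, median, q3, max) of a cleaned distribution."""
    q1, med, q3 = quartiles(values)
    return (min(values), q1, med, q3, max(values))


# Made-up scores for illustration only; not taken from the evaluation file.
example = {
    "low": [0.10, 0.20, 0.20, 0.30, 0.30, 0.40, 5.00],
    "intermediate": [0.30, 0.40, 0.50, 0.60, 0.70],
    "proficient": [0.70, 0.80, 0.85, 0.90, 0.95],
}
for label, values in clean_with_ordering_guard(example).items():
    print(label, five_number_summary(values))
```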