Gradience now has a stable use case

Community Article Published March 26, 2026

Gradience is now best understood as a mixed-task inventory preflight system.

That is a narrower claim than “merge optimizer,” and a more useful one.

The practical question it now answers well is:

Before I spend real evaluation budget on merge experiments, how do I shrink a mixed adapter pool into a smaller, more defensible set of candidates?

That is the center of the project now.

The workflow

The stable workflow is simple:

  1. run source QA
  2. run pair reports
  3. run inventory summary / neighborhoods
  4. inspect task-boundary advisories
  5. identify same-task safe zones and cross-task caution zones
  6. reduce the candidate set
  7. only then run expensive evaluation

The point is not to produce more reports. The point is to make the next decision better.
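
The workflow above can be sketched in a few lines of plain Python. This is a minimal illustration, not Gradience's actual API: the `Adapter` record, the `qa_score` field, the `min_qa` threshold, and the task names are all hypothetical, and the sketch assumes each adapter carries a known task label.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Adapter:
    name: str
    task: str        # task label, assumed to be known for every adapter
    qa_score: float  # hypothetical source-QA score in [0, 1]


def preflight(adapters, min_qa=0.5):
    """Split candidate pairs into a same-task safe zone and a cross-task caution zone."""
    # Step 1: source QA. Drop weak or low-value sources before pairing.
    kept = [a for a in adapters if a.qa_score >= min_qa]

    # Steps 2-5: enumerate pairs and split them on task identity.
    safe, caution = [], []
    for a, b in combinations(kept, 2):
        zone = safe if a.task == b.task else caution
        zone.append((a.name, b.name))
    return safe, caution


# Illustrative inventory; names, tasks, and scores are invented.
inventory = [
    Adapter("sst2-a", "sentiment", 0.90),
    Adapter("sst2-b", "sentiment", 0.80),
    Adapter("mnli-a", "nli", 0.85),
    Adapter("mnli-b", "nli", 0.30),  # fails the hypothetical QA threshold
]

safe, caution = preflight(inventory)
print("safe zone:", safe)
print("caution zone:", caution)
```

Steps 6 and 7 then amount to choosing which of these pairs to send to expensive evaluation, typically starting from the safe zone.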

What the evidence now supports

The recent work clarified where Gradience is actually strong.

Same-task pairs are broadly safe

On small encoder models, same-task pairs stayed broadly safe across multiple targeted studies, including:

  • training-style variation
  • domain shift inside a high-transfer task
  • source-strength asymmetry

That means same-task rescue logic is not the center of the project.

Task identity is the key boundary

The strongest and most stable distinction is same-task vs. cross-task.

That is now the main regime boundary in Gradience.

The task-relationship advisory is established infrastructure

The advisory is now part of the stable interpretive layer.

Across the current evidence base:

  • 132+ pairs
  • 2 backbones
  • 0 false positives

It behaves exactly as it should:

  • silent on same-task pairs
  • active on different-task pairs
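
One way to check this behavior on your own inventory is a small audit harness. The sketch below assumes that a false positive means the advisory firing on a same-task pair, which is an interpretation of the counts above rather than a definition taken from the repo. The names `audit_advisory` and `label_based_advisory` are hypothetical.

```python
from typing import Callable, Dict, Iterable, Tuple

# A pair is described by the task labels of its two adapters.
Pair = Tuple[str, str]


def audit_advisory(fires: Callable[[Pair], bool], pairs: Iterable[Pair]) -> Dict[str, int]:
    """Count how often an advisory misbehaves relative to ground-truth task labels."""
    counts = {"pairs": 0, "false_positives": 0, "false_negatives": 0}
    for pair in pairs:
        same_task = pair[0] == pair[1]
        fired = fires(pair)
        counts["pairs"] += 1
        if fired and same_task:
            # Advisory should be silent on same-task pairs.
            counts["false_positives"] += 1
        elif not fired and not same_task:
            # Advisory should be active on different-task pairs.
            counts["false_negatives"] += 1
    return counts


def label_based_advisory(pair: Pair) -> bool:
    """Toy stand-in that reads the labels directly, so it passes trivially."""
    return pair[0] != pair[1]


report = audit_advisory(
    label_based_advisory,
    [("nli", "nli"), ("nli", "sentiment"), ("sentiment", "sentiment")],
)
print(report)
```

The toy advisory is only there to exercise the harness. The useful part is swapping in whatever advisory signal you actually have and checking that both error counts stay at zero.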

Mixed-task inventory preflight is where the system earns its keep

The highest-value use case is now very clear: mixed-task inventories where pair-risk alone leaves too many plausible candidates alive.

That is where Gradience helps most.

Utility result

A recent preflight utility round asked a simple question:

How much does Gradience reduce wasted merge exploration before evaluation begins?

The reduction was substantial.

In mixed-task inventories, Gradience reduced the candidate space by 65–90%, with an average reduction of 81% in cases where the task-relationship advisory was the main discriminator.

That is the clearest practical result in the repo right now.

Gradience is not just tagging pairs with extra metadata. It is helping collapse a flat pool of possibilities into a much smaller evaluation subset.
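
To see why a task boundary alone can remove most of a pair matrix, here is a back-of-the-envelope calculation. It assumes the reduced set keeps only same-task pairs, and the task names and group sizes are invented for illustration. It does not reproduce the measurements above.

```python
from math import comb
from typing import Dict, Tuple


def pair_reduction(task_sizes: Dict[str, int]) -> Tuple[int, int, float]:
    """Return (all pairs, same-task pairs, fractional reduction) for an inventory."""
    total_adapters = sum(task_sizes.values())
    all_pairs = comb(total_adapters, 2)
    # Pairs that stay inside a single task group.
    same_task_pairs = sum(comb(n, 2) for n in task_sizes.values())
    if all_pairs == 0:
        # Fewer than two adapters: nothing to reduce.
        return 0, 0, 0.0
    return all_pairs, same_task_pairs, 1 - same_task_pairs / all_pairs


# Hypothetical inventory: 10 adapters across 3 tasks.
all_pairs, kept, reduction = pair_reduction({"sentiment": 4, "nli": 3, "paraphrase": 3})
print(f"{kept}/{all_pairs} pairs kept, {reduction:.0%} reduction")
# prints "12/45 pairs kept, 73% reduction"
```

The more evenly the adapters are spread across tasks, the larger the share of cross-task pairs and the bigger the reduction from this one boundary.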

What that means in practice

The current stable stack is strongest when it can do some combination of:

  • exclude weak or low-value sources
  • identify a same-task safe zone
  • identify a cross-task caution zone
  • use neighborhoods to make a larger pair matrix legible again
  • reduce the number of pairs worth actually evaluating

That is the value proposition now.

Not “predict every merge outcome.”
Not “solve cross-task severity.”
Just:

turn a messy mixed-task inventory into a smaller, more defensible plan before expensive evaluation begins.

What Gradience does not yet solve

One important problem is still open:

  • cross-task severity grading

Gradience can now reliably tell you when you have crossed into the risky regime. It does not yet reliably tell you how severe a given cross-task merge will be across backbones.

That remains open research.

Recent follow-up studies tested:

  • exact task-pair identity
  • core-space shared-basis as a severity signal

Neither turned out to be stable enough across backbones to justify featurization.

So the current position is pretty clean:

  • boundary detection is solved
  • search-space reduction is solved enough to be useful
  • severity grading remains open

Bottom line

The repo, examples, docs, and utility evidence are now aligned around one practical use case:

run Gradience before merge evaluation on a mixed-task inventory if you want to shrink the candidate space and avoid spending eval budget on the wrong region of the pool.

That is the strongest current version of the project.
