Hands-on testing of HY-World 2.0 shows a significant improvement in end-to-end engineering maturity over version 1.5.
The model accepts text, single-frame images, and video directly as input. Inference can be launched without camera intrinsic/extrinsic calibration or additional preprocessing.
After panorama generation, the built-in Spatial Agent automatically plans a semantic navigation path. Combined with the spatial consistency constraints from HY-WorldStereo, this keeps multi-view generation free of visible artifacts and geometric alignment stable.
Outputs come in standard 3D asset formats (mesh, 3DGS, and point clouds) that can be imported directly into Unity/UE.
It is suitable for engineering scenarios such as game level prototyping, digital twins, and embodied simulation.
Ran a small controlled study on a frozen 40-task slice of Harbor Terminal-Bench-Pro, using the same model (minimax/minimax-m2.5) with two agent harnesses: Goose and OpenHands-SDK.
Under the base setup, reducing the turn budget from 100 to 60 pushed the two harnesses in opposite directions:
A tweaked 60-turn setup brought OpenHands-SDK back to 0.575. At their best, both harnesses reached the same 0.575 pass rate.
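For scale, here is a minimal sketch of what a 0.575 pass rate means on a 40-task slice, assuming each task is scored once as a binary pass/fail and the pass rate is simply passes divided by tasks. The `pass_rate` helper and the outcome list are hypothetical illustrations, not output from either harness.

```python
def pass_rate(outcomes):
    """Return the fraction of tasks marked as passed (True)."""
    if not outcomes:
        raise ValueError("outcomes must not be empty")
    return sum(outcomes) / len(outcomes)


# Hypothetical outcome vector: 23 passes and 17 failures over 40 tasks.
# Only the totals matter here; the order of tasks is made up.
outcomes = [True] * 23 + [False] * 17

rate = pass_rate(outcomes)
print(rate)               # 0.575
print(1 / len(outcomes))  # 0.025, the change in pass rate from one task
```

Under that assumption, 0.575 corresponds to 23 of 40 tasks, and a single task flipping moves the score by 0.025, so small differences between runs on this slice amount to only a handful of tasks.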
What surprised me most was the token profile: in this setup, the reported token usage for OpenHands-SDK was dramatically higher than for Goose, even though both converged to the same best score.
Same model, same task slice, different harness behavior under a tighter interaction budget.