@startuml environment_overview
!theme plain
top to bottom direction

skinparam backgroundColor #FEFEFE
skinparam defaultFontName Arial
skinparam defaultFontSize 14
skinparam ArrowColor #334155
skinparam RectangleBorderColor #64748B
skinparam RectangleFontColor #0F172A
skinparam roundcorner 10
skinparam linetype ortho
skinparam packageStyle rectangle
skinparam nodesep 42
skinparam ranksep 42

title AxiomForgeAI - Phase-Controlled Math Reasoning Loop

rectangle "Small Math Model\n1.5B parameters" as MODEL #DBEAFE
rectangle "Phase Controller\nwarmup: grounded only\nramp: gradual self-play\ncontinuous: capped mix + fallback" as PHASE #E2E8F0
rectangle "Task Source\nfor each GRPO group" as SELECT #E2E8F0

rectangle "Grounded Source\nKnown-answer practice" as GLANE #ECFDF5 {
  rectangle "Dataset problem\nGSM8K / MATH" as GQ #CCFBF1
  rectangle "Gold answer\navailable" as GOLD #CCFBF1
  rectangle "Model samples\nK solutions" as GSOL #CCFBF1
}

rectangle "Self-Play Source\nModel-made challenges" as SLANE #EEF2FF {
  rectangle "Curriculum picks\nskill + difficulty" as CURRIC #E0E7FF
  rectangle "Model writes\na new question" as SQ #E0E7FF
  rectangle "Model samples\nK solutions" as SSOL #E0E7FF
}

rectangle "Shared Grading\nanswer, steps, arithmetic, format\n+ question quality for self-play" as GRADERS #F1F5F9
rectangle "Group Comparison\nWhich attempts worked best?" as COMPARE #EDE9FE
rectangle "GRPO Update\nReinforce stronger reasoning" as GRPO #DDD6FE
rectangle "Improved Model\nfor the next round" as NEXT #DBEAFE

MODEL -down-> PHASE
PHASE -down-> SELECT
note right of PHASE
  sets mix
end note

SELECT -left-> GQ : grounded slot
GQ --> GOLD
GOLD --> GSOL

SELECT -right-> CURRIC : self-play slot
CURRIC --> SQ
SQ --> SSOL

GSOL -down-> GRADERS
SSOL -down-> GRADERS
GRADERS -right-> COMPARE
COMPARE -right-> GRPO
GRPO -right-> NEXT
NEXT -up-> MODEL : repeat

note bottom of SELECT
  Each batch is randomly interleaved.
  Phase 1 uses grounded only.
  Later phases add self-play slots by ratio.
end note
@enduml