In recent years, the reasoning capabilities of Large Language Models (LLMs) have improved significantly. In mathematics, coding, scientific Q&A, and complex logical tasks, **explicit multi-step reasoning processes** have been shown to improve problem-solving stability and interpretability. Early research on **Chain-of-Thought (CoT) prompting** showed that letting a model generate intermediate reasoning steps before the final answer substantially improves performance on complex tasks. Subsequently, reasoning-focused models have learned to generate longer and more systematic reasoning traces internally through supervised fine-tuning (SFT), reinforcement learning (RL), and distillation of high-quality reasoning data.
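
As a quick illustration (the prompt wording and arithmetic below are a toy example of ours, not drawn from any specific paper), CoT prompting simply elicits intermediate steps before the final answer:

```python
# Toy illustration of Chain-of-Thought prompting; wording is illustrative only.
direct_prompt = (
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A:"
)

# The CoT variant elicits (or, as a few-shot exemplar, demonstrates)
# the intermediate steps before the final answer.
cot_prompt = (
    "Q: A train travels 60 km in 45 minutes. What is its speed in km/h?\n"
    "A: Let's think step by step.\n"
    "45 minutes is 45 / 60 = 0.75 hours.\n"
    "Speed = distance / time = 60 / 0.75 = 80 km/h.\n"
    "The answer is 80 km/h."
)

# In reasoning distillation, a student model is trained on full traces like
# cot_prompt, not just the final line "The answer is 80 km/h."
```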
![image](https://cdn-uploads.huggingface.co/production/uploads/66303d3e9b18090ad0961eb2/eYpF8aCKS3rx_qcJl6GMT.png)

However, the openness of full reasoning traces has brought new challenges. Owing to commercial competition and safety considerations, current public information indicates that **commercial models such as OpenAI's GPT series and Anthropic's Claude series deliberately hide their true internal reasoning chains. For these models, what we see in APIs or front-end interfaces is often just a "Reasoning Bubble": a highly compressed summary of a much larger internal reasoning process.** For smaller models aiming to improve their capabilities through data distillation, these heavily compressed reasoning chains do not provide sufficient step-level learning signals. Worse, because of large logical gaps and missing intermediate derivations, training smaller models directly on these summaries can lead to confusion and a failure to acquire genuine reasoning skills.
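
To make the contrast concrete, here is a hedged sketch of the two kinds of training records; the field names and contents are our own illustration, not any vendor's actual API schema:

```python
# Hypothetical distillation records; field names are illustrative assumptions.

# What a hidden-CoT API typically exposes: the answer plus a "Reasoning Bubble".
compressed_record = {
    "question": "Is 391 prime?",
    "reasoning_summary": "Checked small prime divisors and found a factor.",
    "final_answer": "No, 391 = 17 x 23.",
}

# The step-level supervision a small student model actually needs.
full_trace_record = {
    "question": "Is 391 prime?",
    "reasoning_trace": [
        "391 is odd, so 2 is not a factor.",
        "3 + 9 + 1 = 13, which is not divisible by 3.",
        "391 does not end in 0 or 5, so 5 is not a factor.",
        "391 / 7, 391 / 11, and 391 / 13 are not integers.",
        "391 / 17 = 23, so 391 = 17 x 23 and is composite.",
    ],
    "final_answer": "No, 391 = 17 x 23.",
}

# Trained only on compressed_record, the student must bridge the logical gap
# between summary and answer on its own; full_trace_record closes that gap.
```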
This hidden design rests on an implicit assumption: as long as the full Chain-of-Thought is not exposed, the risk of "stealing" model capabilities is minimized. However, **Trace Inversion** research suggests that, starting from only the "**Final Answer + Compressed Summary**," external models can still synthesize valuable, detailed reasoning traces. **Jackrong/Trace-Inverter-4B** is a 4B-parameter experiment built on this premise, designed to verify whether a model can expand these compressed clues into a full reasoning path.
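
As a minimal sketch of what trace inversion looks like in practice (the prompt wording below is an assumption for illustration, not the model's documented input template):

```python
# Minimal sketch of trace inversion with this model. The prompt wording is an
# illustrative assumption, not the model's documented input template.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Jackrong/Trace-Inverter-4B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The two signals a hidden-CoT API still exposes, plus the inversion request.
user_msg = (
    "Question: Is 391 prime?\n"
    "Compressed summary: Checked small prime divisors and found a factor.\n"
    "Final answer: No, 391 = 17 x 23.\n"
    "Reconstruct the full step-by-step reasoning trace."
)

input_ids = tokenizer.apply_chat_template(
    [{"role": "user", "content": user_msg}],
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=512)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```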
---