tpwang199655 posted an update Mar 1
[Empirical Study] DeepSeek's New 1M Context Model: Full-Window Stress Test & Cognitive Emergence
Overview
This post shares an empirical study on DeepSeek's new long-context model (released Feb 2026, web/mobile version), which extends the context window to 1,000,000 tokens.
We conducted a full-window stress test, pushing the limit to ~1.53M tokens, and analyzed the model's behavior across three key dimensions:
Key Findings:
Interaction Token Budget: A complete project lifecycle consumes 1.2M–1.6M tokens, varying by input format and internal sparse attention mechanisms.
Long-Range Recall & Synthesis: The model demonstrates high-fidelity memory across the entire context, capable of retrieving initial instructions and synthesizing comprehensive reports without external RAG.
Emergence of Collaborative Cognition: Beyond a certain threshold, the model shifts from a "Q&A Engine" to a "Cognitive Partner", adopting user reasoning styles and maintaining global coherence—a capability absent in standard 128k windows.
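The long-range recall finding can be probed with a standard needle-in-a-haystack check: bury one instruction deep in filler context and ask for it at the end. A minimal sketch is below; note that `query_model` is a hypothetical stand-in that simulates perfect recall by string search, since reproducing the result requires sending the context to the actual DeepSeek chat interface.

```python
def build_haystack(needle: str, n_paragraphs: int, position: float) -> str:
    """Bury a 'needle' instruction inside filler paragraphs.

    position is a fraction in [0, 1]: 0.0 plants the needle at the very
    start of the context, 1.0 at the end.
    """
    filler = ["Paragraph %d: routine project notes with no secrets." % i
              for i in range(n_paragraphs)]
    idx = int(position * n_paragraphs)
    filler.insert(idx, needle)
    return "\n".join(filler)

def query_model(context: str, question: str) -> str:
    # Hypothetical placeholder, NOT DeepSeek's API: a real run would send
    # context + question to the model and return its reply. Here we
    # simulate a model with perfect recall via a direct search.
    for line in context.split("\n"):
        if "magic number" in line:
            return line
    return "not found"

# Plant the needle at the very start, the hardest position for recall.
needle = "REMEMBER: the magic number is 7421."
context = build_haystack(needle, n_paragraphs=1000, position=0.0)
answer = query_model(context, "What is the magic number?")
print(answer)
```

Sweeping `position` from 0.0 to 1.0 (and growing `n_paragraphs` toward the full window) is the usual way to map where recall degrades.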
Evidence
The test reached the hard limit at 1,536,000 tokens (see attached screenshot: "Conversation length limit reached").
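For readers estimating how many interactions fit before hitting that wall, here is a rough budgeting sketch. It uses the crude "~4 characters per token" heuristic rather than DeepSeek's actual tokenizer, and the filler message is an assumption, so real counts will differ.

```python
# Observed hard limit from the stress test ("Conversation length limit reached").
HARD_LIMIT = 1_536_000  # tokens

def approx_tokens(text: str) -> int:
    # Crude heuristic (~4 chars/token); swap in a real tokenizer for
    # precise counts.
    return max(1, len(text) // 4)

# Hypothetical filler message: 1080 chars, ~270 tokens by the heuristic.
filler = "lorem ipsum dolor sit amet " * 40

used = 0
messages = 0
while used + approx_tokens(filler) <= HARD_LIMIT:
    used += approx_tokens(filler)
    messages += 1

print(f"{messages} messages, {used} tokens used of {HARD_LIMIT}")
```

Under these assumptions a few thousand medium-length messages exhaust the window, which is consistent with the 1.2M–1.6M-token project-lifecycle figure above.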

Resources
Full reports (EN/CN PDFs), source code, and detailed data analysis are open-sourced at:
🔗 Project Page: https://tpwang-lab.github.io
🔗 GitHub Repo: https://github.com/tpwang-lab/deepseek-million-token
Feedback and reproduction attempts from the community are welcome!
Tags: #DeepSeek #LLM #LongContext #EmpiricalStudy #AI

Thanks for this solid empirical study! It’s super useful to see the real usable context limit at around 1.5M tokens and the emergence of collaborative cognition. Great to have all the reports and code open-sourced—I’ll definitely check them out and try to reproduce!


Thanks for the kind words! Really appreciate you taking the time to read through.
You hit the nail on the head: the shift from "tool" to "cognitive partner" is exactly what surprised us most during the 1.5M-token run.
Looking forward to your feedback if you manage to reproduce any part of it. Feel free to open an issue on the GitHub repo if you hit any snags!