🙋🏽‍♂️ Is your "multi-agent" system really multi-agentic? Or is it just a modular setup with a bunch of different prompts? 🤨
I’ve had this discussion way too often, so I finally wrote it all down. If you’re building with agents, you need to read this.
Here’s the TL;DR:
✅ True multi-agent systems require:
• Persistent, private state per agent
• Memory that impacts future decisions
• Adaptation based on past experiences
❌ Just having modular components, function calls, or multiple LLMs doesn't cut it. That’s not multi-agentic. It’s just pipelining.
🤝 The magic is in evolving relationships, context retention, and behavioral shifts over time.
🧠 If your agents aren’t learning from each other or changing based on past experience… you are missing the point.
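To make the distinction concrete, here is a minimal sketch (the class names, the `call_llm` stub, and the in-memory list are my own placeholders, not any particular framework): a stateless pipeline step behaves identically every run, while an agent keeps private memory that changes its future behavior.

```python
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned response for this sketch."""
    return f"<response to {len(prompt)} chars of prompt>"

@dataclass
class StatelessPipelineStep:
    """Just a prompt plus an LLM call: no memory, no adaptation."""
    prompt: str

    def run(self, task: str) -> str:
        return call_llm(self.prompt + task)  # same behavior every time

@dataclass
class Agent:
    """Persistent, private state that shapes future decisions."""
    name: str
    memory: list = field(default_factory=list)  # survives across tasks

    def act(self, task: str, peer_messages: list[str]) -> str:
        # Past experience and messages from other agents change the prompt,
        # and therefore the behavior, on every subsequent call.
        context = "\n".join(self.memory[-5:] + peer_messages)
        result = call_llm(f"{context}\nTask: {task}")
        self.memory.append(f"task={task} -> {result}")  # adaptation hook
        return result
```

The pipeline step is perfectly useful, but only the second pattern gives you state that persists, memory that feeds back into decisions, and behavior that drifts with experience.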
What do you think? Curious what patterns you're experimenting with 🧐
When OpenAI released its Computer-Using Agent (CUA) API, I happened to be playing Wordle 🧩 and thought, why not see how the model handles it? Spoiler: Wordle turned out to be a surprisingly effective benchmark. So Romain Cosentino Ph.D. and I dug in and analyzed the results of several hundred runs.
🔑 Takeaways
1️⃣ Even the best computer-using models struggle with simple, context-dependent tasks.
2️⃣ Visual perception and reasoning remain major hurdles for multimodal agents.
3️⃣ Real-world use cases reveal significant gaps between hype and reality.
Perception accuracy drops to near zero by the last turn 📉
Some interesting architectural choices made in Llama 4 models -- were these key to the 10M context? Possibly 🤔
🔍 Takeaways:
🧩 Interleaved attention without position encoding
- Llama 4 removes explicit positional encoding in some attention layers to boost performance on longer contexts.
- The principle may be similar to residual connections: those layers can attend to early tokens without positional decay.
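A toy sketch of the interleaving idea, assuming a simple "every Nth layer skips positional encoding" pattern and a sinusoidal stand-in for rotary embeddings; the actual layer ratio and RoPE details in Llama 4 are not reproduced here.

```python
import torch
import torch.nn as nn

class InterleavedAttentionStack(nn.Module):
    """Toy stack: every `nope_every`-th layer skips positional encoding,
    the rest mix in a positional signal. Pattern and ratio are
    illustrative assumptions, not Llama 4's actual configuration."""

    def __init__(self, d_model=64, n_heads=4, n_layers=8, nope_every=4):
        super().__init__()
        self.layers = nn.ModuleList(
            nn.MultiheadAttention(d_model, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.nope_every = nope_every

    def add_positions(self, x):
        # Simple sinusoidal stand-in for a rotary-style positional encoding.
        t = torch.arange(x.size(1), dtype=x.dtype).unsqueeze(-1)
        freqs = torch.exp(-torch.arange(x.size(-1), dtype=x.dtype) / x.size(-1))
        return x + torch.sin(t * freqs)

    def forward(self, x):
        for i, attn in enumerate(self.layers):
            if (i + 1) % self.nope_every == 0:
                q = k = x                      # NoPE layer: no positional signal
            else:
                q = k = self.add_positions(x)  # positionally encoded layer
            out, _ = attn(q, k, x)
            x = x + out
        return x
```

The NoPE layers have no built-in notion of distance, so nothing decays with position; attention to token 5 looks the same whether the context is 8K or 1M tokens long.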
⚖️ Scaled softmax to sharpen attention at inference time
- The maximum attention value (the softmax output) decreases as context size increases.
- Llama 4 incorporates a context-size-dependent temperature in the softmax to modify its slope, allowing the model to focus better on relevant tokens.
- Done only at inference time -- my guess is it was a choice made after observations on eval datasets.
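Here is a sketch of the general technique, assuming a logarithmic scaling of the query by position; the exact functional form and the constants (`floor_scale`, `attn_scale`) are my own illustrative choices, not Llama 4's published values.

```python
import torch

def scaled_softmax_attention(q, k, v, floor_scale=8192.0, attn_scale=0.1):
    """Length-aware softmax temperature, applied only at inference.
    q, k, v: (batch, seq_len, d) tensors.
    The log form and constants are assumptions about the general idea,
    not Llama 4's exact implementation."""
    seq_len, d = q.size(-2), q.size(-1)
    positions = torch.arange(seq_len, dtype=q.dtype)
    # Scale is 1.0 for short contexts and grows slowly with position,
    # steepening the softmax so the max attention weight doesn't flatten
    # out as the context gets longer.
    scale = torch.log(torch.floor(positions / floor_scale) + 1.0) * attn_scale + 1.0
    q = q * scale.view(1, seq_len, 1)              # per-query-position scaling
    attn = torch.softmax(q @ k.transpose(-2, -1) / d ** 0.5, dim=-1)
    return attn @ v
```

Because the scale only kicks in beyond the `floor_scale` threshold, short-context behavior is unchanged, which is consistent with it being an inference-time-only adjustment rather than something the model was trained with.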