Sleeping Agents CARB Failure Observability
Failure analysis for LM reasoning via the Hugging Face (HF) Inference API