new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

May 7

Ice Cream Doesn't Cause Drowning: Benchmarking LLMs Against Statistical Pitfalls in Causal Inference

Reliable causal inference is essential for making decisions in high-stakes areas like medicine, economics, and public policy. However, it remains unclear whether large language models (LLMs) can handle rigorous and trustworthy statistical causal inference. Current benchmarks usually involve simplified tasks. For example, these tasks might only ask LLMs to identify semantic causal relationships or draw conclusions directly from raw data. As a result, models may overlook important statistical pitfalls, such as Simpson's paradox or selection bias. This oversight limits the applicability of LLMs in the real world. To address these limitations, we propose CausalPitfalls, a comprehensive benchmark designed to rigorously evaluate the capability of LLMs in overcoming common causal inference pitfalls. Our benchmark features structured challenges across multiple difficulty levels, each paired with grading rubrics. This approach allows us to quantitatively measure both causal reasoning capabilities and the reliability of LLMs' responses. We evaluate models using two protocols: (1) direct prompting, which assesses intrinsic causal reasoning, and (2) code-assisted prompting, where models generate executable code for explicit statistical analysis. Additionally, we validate the effectiveness of this judge by comparing its scoring with assessments from human experts. Our results reveal significant limitations in current LLMs when performing statistical causal inference. The CausalPitfalls benchmark provides essential guidance and quantitative metrics to advance the development of trustworthy causal reasoning systems.

  • 9 authors
·
Mar 3

Resolving the measurement uncertainty paradox in ecological management

Ecological management and decision-making typically focus on uncertainty about the future, but surprisingly little is known about how to account for uncertainty of the present: that is, the realities of having only partial or imperfect measurements. Our primary paradigms for handling decisions under uncertainty -- the precautionary principle and optimal control -- have so far given contradictory results. This paradox is best illustrated in the example of fisheries management, where many ideas that guide thinking about ecological decision making were first developed. We find that simplistic optimal control approaches have repeatedly concluded that a manager should increase catch quotas when faced with greater uncertainty about the fish biomass. Current best practices take a more precautionary approach, decreasing catch quotas by a fixed amount to account for uncertainty. Using comparisons to both simulated and historical catch data, we find that neither approach is sufficient to avoid stock collapses under moderate observational uncertainty. Using partially observed Markov decision process (POMDP) methods, we demonstrate how this paradox arises from flaws in the standard theory, which contributes to over-exploitation of fisheries and increased probability of economic and ecological collapse. In contrast, we find POMDP-based management avoids such over-exploitation while also generating higher economic value. These results have significant implications for how we handle uncertainty in both fisheries and ecological management more generally.

  • 2 authors
·
Dec 28, 2018