arxiv:2506.02048

Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges

Published on Jun 1, 2025
Abstract

A tool-augmented Llama-3.1-8B model, fine-tuned with GRPO, demonstrates improved performance on random-crypto CTF challenges and generalizes to external datasets through enhanced Python code synthesis and tool invocation.
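The tool augmentation described here lets the model emit Python that is run in an isolated interpreter, with the output fed back to the agent. A minimal sketch of such a sandboxed execution tool, assuming a subprocess-based design (the function name, interface, and timeout are illustrative, not the paper's actual harness):

```python
import subprocess
import sys

def run_python_tool(code: str, timeout: float = 5.0) -> str:
    """Execute agent-generated Python in a separate process.

    Illustrative stand-in for an isolated REPL tool: -I runs the child
    interpreter in isolated mode (no user site-packages, no env hooks),
    and a timeout bounds runaway computations.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-I", "-c", code],
            capture_output=True, text=True, timeout=timeout,
        )
        return result.stdout if result.returncode == 0 else result.stderr
    except subprocess.TimeoutExpired:
        return "error: execution timed out"

# Example: the kind of crypto helper snippet an agent might emit
print(run_python_tool("print(bytes.fromhex('68656c6c6f').decode())"))
```

A real deployment would add stricter isolation (containers, resource limits) than a bare subprocess provides.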

AI-generated summary

Large Language Models (LLMs) still struggle with the structured reasoning and tool-assisted computation needed for problem solving in cybersecurity applications. In this work, we introduce "random-crypto", a cryptographic Capture-the-Flag (CTF) challenge generator framework that we use to fine-tune a tool-augmented Llama-3.1-8B with Group Relative Policy Optimization (GRPO), allowing the agent to iteratively write and execute Python inside an isolated REPL. GRPO yields a +53 percentage point (pp) absolute jump in Pass@8 on unseen "random-crypto" tasks (0.35 -> 0.88) and raises Majority@8 to 0.41. The fine-tuned agent also generalizes to an external dataset: on a subset of picoCTF cryptography problems, it improves Pass@8 by +13 pp. Ablations show the gains stem from more reliable tool invocation and code synthesis, rather than superficial prompt adaptation.
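The Pass@8 and Majority@8 numbers above follow the usual per-challenge definitions: Pass@k asks whether any of k sampled attempts recovers the flag, while Majority@k asks whether the most frequent answer across the k attempts is the flag. A minimal sketch (the function names and the aggregation over a single challenge are illustrative; the paper reports averages over a task set):

```python
from collections import Counter

def pass_at_k(attempts: list[str], true_flag: str) -> bool:
    """Pass@k: at least one of the k sampled attempts recovers the flag."""
    return true_flag in attempts

def majority_at_k(attempts: list[str], true_flag: str) -> bool:
    """Majority@k: the most common answer across the k attempts is the flag."""
    most_common, _ = Counter(attempts).most_common(1)[0]
    return most_common == true_flag

# Eight sampled attempts on one hypothetical challenge
attempts = ["flag{x}", "flag{ctf}", "flag{ctf}", "flag{y}",
            "flag{ctf}", "flag{z}", "flag{ctf}", "flag{ctf}"]
print(pass_at_k(attempts, "flag{ctf}"))      # → True
print(majority_at_k(attempts, "flag{ctf}"))  # → True
```

Majority@k is the stricter metric, which is why it trails Pass@8 (0.41 vs 0.88): a single lucky sample suffices for Pass@k, but Majority@k requires consistent agreement.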


