Quantizations of https://huggingface.co/aws-prototyping/codefu-7b-v0.1
Open source inference clients/UIs
Closed source inference clients/UIs
- LM Studio
- More will be added...
From the original README:
CodeFu-7B-v0.1 is a 7B parameter model trained using Reinforcement Learning for competitive programming tasks. Built on the DeepSeek-R1-Distill-Qwen-7B base model, CodeFu is capable of algorithmic reasoning to solve complex problems and generate efficient C++ solutions.
Specifically, CodeFu-7B-v0.1 achieves 13.7% Pass@1 on the USACO benchmark, outperforming models more than 4x its size.
Trained solely on problem statements, without access to any ground-truth solutions, CodeFu achieved a more than 10x performance improvement over its base model, demonstrating the effectiveness of our RL approach.
Model Specs
- Base Model: DeepSeek-R1-Distill-Qwen-7B
- Model Size: 7.61B parameters
- License: MIT
- Task: Competitive Programming / Algorithmic Problem Solving
Starting from the DeepSeek-R1-Distill-Qwen-7B model, we trained CodeFu with RL on selected competitive-programming problems (without solutions) from the DeepMind CodeContests dataset.
Evaluation
To assess CodeFu's genuine problem-solving abilities, we used the USACO benchmark, which consists of 307 high-quality problems from past USA Computing Olympiad contests.
| Model | Size | USACO Pass@1 | Notes |
|---|---|---|---|
| Claude-3.7-Sonnet | UNK | 31.9 | |
| OlympicCoder-32B | 32B | 18.9 | |
| QwQ-32B | 32B | 17.3 | |
| Qwen2.5-Coder-32B-Instruct | 32B | 16.3 | |
| CodeFu-7B-v0.1 | 7B | 13.7 | |
| DeepSeek-R1-Distill-Qwen-32B | 32B | 11.7 | |
| OlympicCoder-7B | 7B | 9.1 | |
| GPT-4-1106-preview | UNK | 8.7 | |
| Qwen2.5-Coder-7B-Instruct | 7B | 5.9 | |
| DeepSeek-R1-Distill-Qwen-7B | 7B | 1.0 | Base model |
| GPT-3.5-turbo-1106 | UNK | 0.6 | |
CodeFu Key Highlights:
- 📊 Leading 7B model on the USACO benchmark
- ⚡ Outperforms the 32B distill, DeepSeek-R1-Distill-Qwen-32B (13.7% vs 11.7% Pass@1)
- 📈 >10x improvement over its 7B base model (13.7% vs 1.0%)
For systematic and robust evaluation, we used standardized code extraction logic across all model responses. This process identifies solution code by parsing either <code></code> tags or ```cpp code blocks, always selecting the final code block to ensure we capture each model's ultimate solution after any intermediate reasoning steps. GPT-3.5/4 scores are copied from the USACO benchmark as baselines.
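The extraction rule above can be sketched in a few lines of Python; this is an illustrative regex-based version, not the actual evaluation harness:

```python
import re

def extract_solution(response: str):
    """Return the final code block from a model response.

    Finds every <code>...</code> span and every ```cpp fenced block,
    then keeps the last one, mirroring the "final code block" rule
    described above.
    """
    pattern = re.compile(r"<code>(.*?)</code>|```cpp\s*(.*?)```", re.DOTALL)
    # Each findall tuple has one non-empty group, depending on which
    # alternative matched.
    blocks = [a or b for a, b in pattern.findall(response)]
    return blocks[-1].strip() if blocks else None
```

The function returns `None` when a response contains no recognizable code, which is how the "No code" cases in the result analysis below would surface.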
All extracted code solutions are executed with strict time-limit enforcement: any code exceeding the problem's specified time limit is marked as incorrect, ensuring realistic competitive-programming conditions.
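A minimal sketch of this time-limit rule, assuming an already-compiled solution binary and using `subprocess`'s wall-clock timeout (the actual harness is not released):

```python
import subprocess

def run_with_limit(cmd, stdin_text, time_limit_s):
    """Run a solution under a wall-clock limit.

    Returns (status, stdout): "timeout" if the limit is exceeded
    (counted as incorrect in the evaluation described above),
    "runtime_error" on a non-zero exit, else "ok".
    """
    try:
        proc = subprocess.run(
            cmd,
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=time_limit_s,
        )
    except subprocess.TimeoutExpired:
        return ("timeout", None)
    status = "ok" if proc.returncode == 0 else "runtime_error"
    return (status, proc.stdout)
```

A real harness would also compare `stdout` against the expected output before marking a case as passed.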
All open-weight models were tested using vLLM v0.6.3 with identical sampling parameters: a temperature of 0.8 and a top_p of 0.95. Claude-3.7-Sonnet was evaluated at a temperature of 1.0. We set the maximum output length (max_tokens) to 28,672 for all models to ensure sufficient length for reasoning and code solutions.
Result Analysis
We provide access to the complete CodeFu-7B-v0.1 evaluation results on the USACO benchmark as a CSV file containing fields such as problem_name, prompt, response, response_length, solution_code, status, and score. Notably, the status field breakdown is as follows:
- Success: 42 cases
- Failure (code runs but incorrect or timed out): 37 cases
- Fail to compile: 8 cases
- No code: 220 cases
Analysis of the response length distribution shows that successful solutions typically have concise responses around 5,000 tokens, while unsuccessful attempts often reach the maximum token limit. While some correct solutions do exceed 20,000 tokens, the vast majority of long responses fall into the "No code" category, where the model engages in extensive reasoning that eventually degenerates into repetitive patterns or incoherent text without producing executable code. Future work is needed on training objectives that better distinguish between useful deliberation and unproductive verbosity.
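The released CSV can be summarized along these lines; the field names (`status`, `response_length`) are from the file description above, while the sample values below are made up for illustration:

```python
from statistics import mean

def summarize(rows):
    """Group evaluation rows by status: count and mean response length."""
    by_status = {}
    for row in rows:
        by_status.setdefault(row["status"], []).append(int(row["response_length"]))
    return {
        status: {"count": len(lengths), "mean_response_length": mean(lengths)}
        for status, lengths in by_status.items()
    }

# Illustrative rows in the shape of the released CSV (values invented):
rows = [
    {"status": "Success", "response_length": 5000},
    {"status": "Success", "response_length": 4800},
    {"status": "No code", "response_length": 28672},
]
print(summarize(rows))
```

On the real file, the same grouping reproduces the status breakdown above and makes the "long responses are mostly No code" pattern easy to verify.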
Usage
```python
# CodeFu works with vLLM for inference
# pip install vllm==0.6.3
from vllm import LLM, SamplingParams

model_name = "aws-prototyping/codefu-7b-v0.1"

# Initialize vLLM
llm = LLM(model=model_name, trust_remote_code=True)
sampling_params = SamplingParams(
    temperature=0.8,
    top_p=0.95,
    max_tokens=28672,
)

# The `Hay Bales` problem in the USA Computing Olympiad benchmark
prompt = """In your role as an algorithmic problem-solver, write a C++ solution for this problem. Put your thought process in <think> tags and your solution in <code> tags.
Problem:
Problem 1: Hay Bales [Brian Dean, 2011]

The cows are at it again! Farmer John has carefully arranged N (1 <= N <=
10,000) piles of hay bales, each of the same height. When he isn't
looking, however, the cows move some of the hay bales between piles, so
their heights are no longer necessarily the same. Given the new heights of
all the piles, please help Farmer John determine the minimum number of hay
bales he needs to move in order to restore all the piles to their original,
equal heights.

PROBLEM NAME: haybales

INPUT FORMAT:

* Line 1: The number of piles, N (1 <= N <= 10,000).

* Lines 2..1+N: Each line contains the number of hay bales in a single
pile (an integer in the range 1...10,000).

SAMPLE INPUT:

4
2
10
7
1

INPUT DETAILS:

There are 4 piles, of heights 2, 10, 7, and 1.

OUTPUT FORMAT:

* Line 1: An integer giving the minimum number of hay bales that need
to be moved to restore the piles to having equal heights.

SAMPLE OUTPUT:

7

OUTPUT DETAILS:

By moving 7 hay bales (3 from pile 2 to pile 1, 2 from pile 2 to pile 4, 2
from pile 3 to pile 4), we can make all piles have height 5.
"""

# Generate solution
outputs = llm.generate([prompt], sampling_params)
solution = outputs[0].outputs[0].text
print(solution)
```
```python
# Alternative: OpenAI-compatible API server
# Start the vLLM server first:
# python -m vllm.entrypoints.openai.api_server --model aws-prototyping/codefu-7b-v0.1 --port 8000
from openai import OpenAI

client = OpenAI(
    api_key="EMPTY",
    base_url="http://localhost:8000/v1",
)

response = client.completions.create(
    model="aws-prototyping/codefu-7b-v0.1",
    prompt=prompt,
    temperature=0.8,
    top_p=0.95,
    max_tokens=28672,
)
solution = response.choices[0].text
print(solution)
```
We can examine CodeFu's generated solution for this problem, which has been verified as correct.
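For reference, the expected answer for this sample can be checked with a short Python sketch (CodeFu itself emits C++): every pile must end at the mean height, and each moved bale leaves exactly one above-target pile, so the answer is the total surplus above the target.

```python
def min_moves(heights):
    """Minimum bales to move so all piles reach the (conserved) mean height.

    The total is divisible by the pile count in valid inputs, since
    moving bales between piles conserves the total.
    """
    target = sum(heights) // len(heights)
    return sum(h - target for h in heights if h > target)

print(min_moves([2, 10, 7, 1]))  # sample input from the problem; prints 7
```

With the sample heights [2, 10, 7, 1] the target is 5, and the surplus is (10 - 5) + (7 - 5) = 7, matching the sample output.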
Prompt Format
CodeFu works best with structured prompts that request both reasoning and code:
```
[Role] Please solve this programming problem in C++. Show your thinking process in <think> tags and provide your solution in <code> tags.

[Problem Description]
```
Replace [Role] with phrases like:
- "As a competitive programming expert"
- "Working as an experienced competitive programmer"
- "As a master of algorithms and data structures"
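A small helper can assemble prompts from this template; the function name and wording below are ours, not part of the model's API:

```python
def build_prompt(role, problem):
    """Fill the CodeFu prompt template with a role phrase and a problem
    statement, requesting <think> reasoning and a <code> solution."""
    return (
        f"{role}, write a C++ solution for this problem. "
        "Put your thought process in <think> tags and your solution in <code> tags.\n"
        "Problem:\n"
        f"{problem}\n"
    )

prompt = build_prompt("As a competitive programming expert",
                      "Read N integers and print their sum.")
```

The resulting string can be passed directly to `llm.generate` or the completions API shown in the Usage section.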
Available quantizations
- 1-bit
- 2-bit
- 3-bit
- 4-bit
- 5-bit
- 6-bit
- 8-bit