IndicatorsEnv β€” GRPO Fine-tuned Qwen-7B (NSE India)

This model is a LoRA Adapter for Qwen2.5-7B-Instruct, specifically fine-tuned to solve high-precision directional trading signals in the IndicatorsEnv (OpenEnv) reinforcement learning environment.

πŸš€ Breakthrough Results

Unlike zero-shot foundation models which exhibit a "Neutral Bias" (collapsing into "Neutral" guesses to maintain safe accuracy), this agent was trained using Group Relative Policy Optimization (GRPO) with a Symmetric Anti-Bias Reward Function.

Metric Zero-Shot Baseline GRPO Fine-Tuned (Step 220+) Ξ” Status
Bullish Recall 36.0% 56.0% +20.0% πŸš€
Bearish Recall 0.0% 20.0% +20.0% πŸš€
Neutral Recall 70.0% 0.0% Active Decision Making

πŸ› οΈ Model Description

The "Neutral Mode Collapse" Solution

In quantitative finance, "Mode Collapse" occurs when an AI agent defaults to the majority class (Neutral) to minimize penalty. We solved this by implementing a reward function that explicitly penalizes inaction and rewards directional conviction. This forced the model to actively interpret 25+ technical indicators (Momentum, Trend, Volatility) rather than guessing.

πŸ›’ Usage (PEFT)

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_id = "Qwen/Qwen2.5-7B-Instruct"
adapter_id = "bawsi99/indicators-grpo-qwen7b"

# Load Base Model
model = AutoModelForCausalLM.from_pretrained(
    base_model_id,
    torch_dtype="auto",
    device_map="auto",
)

# Load Adapter
model = PeftModel.from_pretrained(model, adapter_id)

# Ready for Inference

🏁 Evaluation Results

The model was evaluated on a held-out dataset of NSE India stocks from the 2023-2024 period. It demonstrates a significant surge in signal detection accuracy, specifically outperforming the baseline in identifying high-momentum Bullish setups.


Created for the Meta Γ— PyTorch Hackathon (2024)

Downloads last month
33
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for bawsi99/indicators-grpo-qwen7b

Base model

Qwen/Qwen2.5-7B
Adapter
(1769)
this model