πŸ€– Career OS β€” SOTA Multi-Agent Career Assistant

A production-quality, research-grade personal career agent built with 3-stage iterative training (SFT β†’ DPO β†’ GRPO), multi-agent orchestration, and Karpathy-style training visualization. Designed for publication and real-world deployment.


πŸ—οΈ Architecture

```
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                    Career OS Orchestrator                    β”‚
β”‚    (routes requests, manages context, synthesizes output)    β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
                              β”‚
        β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
        β–Ό             β–Ό               β–Ό             β–Ό
  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
  β”‚ Resume  β”‚  β”‚  Job     β”‚  β”‚  Career    β”‚  β”‚   Salary     β”‚
  β”‚ Parser  β”‚  β”‚  Matcher β”‚  β”‚  Advisor   β”‚  β”‚ Negotiator   β”‚
  β”‚  Agent  β”‚  β”‚  Agent   β”‚  β”‚   Agent    β”‚  β”‚    Agent     β”‚
  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
```
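The routing logic itself lives in career_os_orchestrator.py and isn't reproduced on this card. As a rough illustration only, a minimal keyword-based router could look like the sketch below; the `AGENT_KEYWORDS` table and `route` helper are hypothetical, not the actual API:

```python
# Hypothetical sketch of keyword-based request routing (not the actual
# career_os_orchestrator.py implementation).
AGENT_KEYWORDS = {
    "resume_parser": ["parse", "extract", "resume text"],
    "job_matcher": ["job description", "fit", "match"],
    "career_advisor": ["career path", "interview", "skill gap"],
    "salary_negotiator": ["salary", "compensation", "negotiate"],
}

def route(request: str) -> str:
    """Pick the agent whose keywords best match the request."""
    request_lower = request.lower()
    scores = {
        agent: sum(kw in request_lower for kw in keywords)
        for agent, keywords in AGENT_KEYWORDS.items()
    }
    best_agent = max(scores, key=scores.get)
    # Fall back to the general advisor when nothing matches.
    return best_agent if scores[best_agent] > 0 else "career_advisor"
```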

🧬 3-Stage Training Pipeline

Inspired by Self-Rewarding LLMs, iGRPO, and LoRA Without Regret.

Stage 1: SFT β€” High-Rank LoRA (r=256)

  • Method: Supervised fine-tuning on ~16K multi-turn career conversations
  • Data: Resume reviews, job-fit assessments, JSON parsing, coaching dialogues, reasoning chains
  • Config: LoRA r=256, Ξ±=512, target_modules="all-linear", use_rslora=True
  • Time: ~60 min on A100 40GB
  • LR: 2e-4, cosine schedule, 2 epochs

Stage 2: DPO β€” Preference Optimization (r=64)

  • Method: Direct Preference Optimization on model-generated preference pairs
  • Innovation: Self-generated pairs using the Stage 1 model β€” quality scores for structure, actionability, career relevance
  • Config: LoRA r=64, Ξ±=128, Ξ²=0.1
  • Time: ~30 min on A100 40GB
  • LR: 5e-7, 1 epoch

Stage 3: GRPO β€” Multi-Component Reward (r=32)

  • Method: Group Relative Policy Optimization with custom career reward function
  • Reward components:
    • Structure (25%): Headers, bullet points, sections
    • JSON correctness (25%): Valid JSON when requested
    • Actionability (25%): Action verbs and concrete steps
    • Career relevance (25%): Domain-specific terminology
  • Config: LoRA r=32, Ξ±=64, num_generations=4
  • Time: ~40 min on A100 40GB
  • LR: 1e-6, 1 epoch

Total Pipeline Time: ~2.5–3 hours on A100 40GB


πŸ“Š Training Dashboard

Karpathy-style visualization tracks all training stages:

| Metric | What It Shows |
|---|---|
| Loss curve | Raw + moving average, best-loss point marked |
| Learning rate | Cosine schedule with warmup |
| Reward trajectory | Mean reward + trend line across DPO/GRPO |
| Response length | Histogram + time series (detects collapse/explosion) |
| Gradient norm | Training stability monitoring |
| Career quality | JSON correctness + actionability scores |

All plots are auto-generated during training and saved to ./training_dashboard/.
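
training_dashboard.py itself isn't reproduced here; a minimal sketch of the loss-curve panel (raw loss, moving average, best-loss marker) might look like this, assuming matplotlib and a list of per-step losses:

```python
# Sketch: loss curve with moving average and best-loss marker,
# saved under ./training_dashboard/ as the dashboard does.
import os
import matplotlib.pyplot as plt
import numpy as np

def plot_loss(losses, window=25, out_dir="./training_dashboard"):
    os.makedirs(out_dir, exist_ok=True)
    losses = np.asarray(losses, dtype=float)
    steps = np.arange(len(losses))
    moving = np.convolve(losses, np.ones(window) / window, mode="valid")
    best = losses.argmin()

    plt.figure(figsize=(8, 4))
    plt.plot(steps, losses, alpha=0.3, label="raw loss")
    plt.plot(steps[window - 1:], moving, label=f"moving avg (w={window})")
    plt.scatter([best], [losses[best]], marker="*", s=120, label="best loss")
    plt.xlabel("step")
    plt.ylabel("loss")
    plt.legend()
    plt.savefig(os.path.join(out_dir, "loss_curve.png"), dpi=150)
    plt.close()
```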


πŸš€ Quick Start (Google Colab Pro A100)

Step 1: Set up Colab

  1. Open a new notebook at Google Colab
  2. Runtime β†’ Change runtime type β†’ GPU β†’ A100 (requires Colab Pro/Pro+)
  3. Secrets (left sidebar) β†’ Add HF_TOKEN with write permission
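
Inside the notebook, the one-cell script would typically read this secret via the Colab secrets API, roughly like this (the actual script's login code may differ):

```python
# Read the HF token from Colab's Secrets panel and log in to the Hub.
from google.colab import userdata
from huggingface_hub import login

login(token=userdata.get("HF_TOKEN"))
```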

Step 2: Run everything

Copy the entire contents of career_os_complete_colab.py into one code cell and press Shift+Enter.

That's it. The script runs all stages end-to-end and pushes the final model to the Hub.

What you get after ~3 hours: the final model pushed to the Hugging Face Hub and the full set of training-dashboard plots in ./training_dashboard/.


🎯 Agent Capabilities

| Agent | What It Does | Output Format |
|---|---|---|
| Resume Parser | Extracts structured data from raw resume text | JSON (name, skills, experience, education) |
| Job Matcher | Scores resume against job description | JSON (score 0-100, strengths, gaps, suggestions) |
| Career Advisor | Career path planning, skill gaps, interview prep | Markdown with structured headers + action items |
| Salary Negotiator | Compensation strategies, market research, scripts | Markdown with data-backed scripts |
| Orchestrator | Routes requests, chains agents, synthesizes output | Unified career report |

πŸ§ͺ Running Individual Agents

```python
from career_os_orchestrator import CareerOS

cos = CareerOS(agent_model="Builder-Neekhil/career-agent-v1")

# Single task: the orchestrator routes the request to the right agent
result = cos.process("Review my resume", resume_text="...")
print(result["synthesized"])

# Full pipeline: run all agents and synthesize a unified career report
results = cos.full_pipeline(
    resume_text="...",
    job_description="...",
    target_role="Senior Software Engineer",
)
```

πŸ“ File Structure

| File | Purpose | When to Use |
|---|---|---|
| career_os_complete_colab.py ⭐ | ONE-COPY COLAB SCRIPT β€” everything in one cell | Copy into Colab, run once |
| career_os_sota.py | Standalone 3-stage pipeline script | Local GPU or scripted training |
| career_os_orchestrator.py | Multi-agent Career OS with 4 specialized agents + orchestrator | After training, for inference |
| training_dashboard.py | Karpathy-style visualization dashboard | Import during training or for analysis |
| train_colab.py | Original single-stage SFT script | Simple SFT only |
| train.py | Original standalone training script | Simple SFT only |
| inference.py | Basic inference script | Quick testing |
| demo.py | Gradio web UI | Interactive demo |
| requirements.txt | Pinned dependencies | Setup |

πŸ”¬ Research Foundation

| Technique | Paper | arXiv ID | What We Used |
|---|---|---|---|
| Self-Rewarding LLMs | Iterative DPO for instruction following | 2401.10020 | Stage 2: self-generated preference pairs |
| iGRPO | Iterative GRPO with self-feedback | 2602.09000 | Stage 3: multi-component reward function |
| LoRA Without Regret | High-rank LoRA for SFT, low-rank for RL | Blog post | r=256β†’64β†’32 pipeline |
| AgentOrchestra | Multi-agent orchestration framework | 2506.12508 | Career OS agent architecture |
| ResumeFlow | LLM resume generation pipeline | 2402.06221 | Structured output patterns |
| TALLRec | Lightweight LoRA tuning for recommendation | 2305.00447 | SFT data efficiency |

πŸ“Š Dataset Composition

| Source | Size | Purpose |
|---|---|---|
| cnamuangtoun/resume-job-description-fit | ~4.6K | Job-fit assessment with JSON output |
| opensporks/resumes | ~7.2K | Resume review + interview + career path |
| sandeeppanem/resume-json-extraction-5k | ~4.9K | Structured resume parsing |
| Synthetic coaching dialogues | 800 | Salary, pivot, networking, gaps, promotion, interview prep |
| Reasoning boost | 400 | Step-by-step career reasoning chains |

Total: ~17K multi-turn conversations with system prompts.
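
As a rough sketch of how the three public sources could be pulled and combined into one SFT mix (dataset IDs from the table above; per-source column harmonization is elided and the `to_messages` converter is hypothetical):

```python
# Sketch: load the public sources and concatenate them into one training mix.
# `to_messages` (per-source conversion to chat format) is hypothetical.
from datasets import concatenate_datasets, load_dataset

sources = [
    "cnamuangtoun/resume-job-description-fit",
    "opensporks/resumes",
    "sandeeppanem/resume-json-extraction-5k",
]
parts = [load_dataset(name, split="train") for name in sources]
parts = [ds.map(to_messages, remove_columns=ds.column_names) for ds in parts]
mix = concatenate_datasets(parts).shuffle(seed=42)
```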


πŸ“ˆ Evaluation Benchmark

Since no standard career agent benchmark exists, we propose the Career Agent Evaluation Suite (CAES):

| Task | Metric | What We Measure |
|---|---|---|
| Resume parsing | F1 on extracted entities | Name, skills, experience, education |
| Job fit assessment | Accuracy vs ground truth | Match/no-match classification |
| Career advice quality | Human evaluation (1-5) | Helpfulness, specificity, actionability |
| JSON correctness | Valid JSON rate | % of responses with parseable JSON |
| Response structure | Section headers, lists | Markdown structure score |
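
Two of the automatic metrics are simple enough to sketch directly (a minimal version, assuming responses are plain strings and entities are compared as sets):

```python
# Sketch: valid-JSON rate and set-based entity F1 for the CAES metrics.
import json

def valid_json_rate(responses):
    """Fraction of responses that parse as JSON."""
    ok = 0
    for r in responses:
        try:
            json.loads(r)
            ok += 1
        except ValueError:
            pass
    return ok / len(responses)

def entity_f1(predicted, gold):
    """Set-based F1 over extracted entities (e.g., skill lists)."""
    pred, gold = set(predicted), set(gold)
    if not pred or not gold:
        return 0.0
    precision = len(pred & gold) / len(pred)
    recall = len(pred & gold) / len(gold)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```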

πŸ”§ Hyperparameters

Stage 1: SFT

| Param | Value |
|---|---|
| Model | Qwen/Qwen2.5-1.5B-Instruct |
| LoRA rank | 256 |
| LoRA alpha | 512 |
| Target modules | all-linear |
| rsLoRA | True |
| Epochs | 2 |
| Learning rate | 2e-4 |
| Batch size (effective) | 8 (2 Γ— 4) |
| Max length | 2048 |
| Precision | bf16 |
| Gradient checkpointing | True |
| Assistant-only loss | True |

Stage 2: DPO

| Param | Value |
|---|---|
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Epochs | 1 |
| Learning rate | 5e-7 |
| Batch size (effective) | 8 (1 Γ— 8) |
| Beta (DPO temperature) | 0.1 |
| Max length | 2048 |
| Precision | bf16 |

Stage 3: GRPO

| Param | Value |
|---|---|
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| Batch size (effective) | 4 (1 Γ— 4) |
| Completions per prompt | 4 |
| Max completion length | 512 |
| Precision | bf16 |

πŸ“€ Publishing Checklist

To make this publication-ready:

  • Run full 3-stage training and push weights
  • Generate training dashboard plots
  • Run CAES evaluation benchmark
  • Compare against baseline (untrained Qwen2.5-1.5B-Instruct)
  • Ablation: SFT-only vs SFT+DPO vs SFT+DPO+GRPO
  • Human evaluation with 5 annotators on 50 samples
  • Write arXiv paper with methodology + results

πŸ“„ License

Apache-2.0 (inherits from Qwen/Qwen2.5-1.5B-Instruct)


Built for those who take their career seriously. πŸš€
