# 🤖 Career OS: SOTA Multi-Agent Career Assistant

A production-quality, research-grade personal career agent built with 3-stage iterative training (SFT → DPO → GRPO), multi-agent orchestration, and Karpathy-style training visualization. Designed for publication and real-world deployment.
## 🏗️ Architecture

```
┌──────────────────────────────────────────────────────────────┐
│                    Career OS Orchestrator                    │
│   (routes requests, manages context, synthesizes output)     │
└──────────────────────────────────────────────────────────────┘
                               │
         ┌─────────────┬───────┴──────┬────────────────┐
         ▼             ▼              ▼                ▼
   ┌──────────┐  ┌───────────┐  ┌────────────┐  ┌─────────────┐
   │  Resume  │  │    Job    │  │   Career   │  │   Salary    │
   │  Parser  │  │  Matcher  │  │  Advisor   │  │ Negotiator  │
   │  Agent   │  │   Agent   │  │   Agent    │  │    Agent    │
   └──────────┘  └───────────┘  └────────────┘  └─────────────┘
```
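As a rough illustration of the routing layer, here is a minimal keyword-based dispatch sketch. It is not the actual logic in `career_os_orchestrator.py` (which also manages context, chains agents, and synthesizes output); the route table and agent keys are hypothetical:

```python
# Hypothetical keyword router (illustrative only); the real orchestrator
# in career_os_orchestrator.py is richer than this.
ROUTES = {
    "resume": "resume_parser",
    "job": "job_matcher",
    "salary": "salary_negotiator",
}

def route(request: str) -> str:
    """Pick an agent key from keywords; fall back to the career advisor."""
    text = request.lower()
    for keyword, agent in ROUTES.items():
        if keyword in text:
            return agent
    return "career_advisor"

print(route("Help me negotiate my salary"))  # -> salary_negotiator
```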
## 🧬 3-Stage Training Pipeline
Inspired by Self-Rewarding LLMs, iGRPO, and LoRA Without Regret.
### Stage 1: SFT - High-Rank LoRA (r=256)
- Method: Supervised fine-tuning on ~16K multi-turn career conversations
- Data: Resume reviews, job-fit assessments, JSON parsing, coaching dialogues, reasoning chains
- Config: LoRA r=256, α=512, target_modules="all-linear", use_rslora=True
- Time: ~60 min on A100 40GB
- LR: 2e-4, cosine schedule, 2 epochs
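For reference, a minimal sketch of this stage wired up with TRL + PEFT, using the values above. Dataset preparation, max-length, and the assistant-only-loss option are elided, and argument names can differ slightly across TRL versions:

```python
# Stage 1 sketch (assumes `train_dataset` is the prepared conversational set).
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

peft_config = LoraConfig(
    r=256,
    lora_alpha=512,
    target_modules="all-linear",
    use_rslora=True,                # rank-stabilized LoRA scaling
    task_type="CAUSAL_LM",
)
args = SFTConfig(
    output_dir="sft-stage1",
    num_train_epochs=2,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch size 8
    bf16=True,
    gradient_checkpointing=True,
)
trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    args=args,
    train_dataset=train_dataset,    # ~16K multi-turn career conversations
    peft_config=peft_config,
)
trainer.train()
```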
### Stage 2: DPO - Preference Optimization (r=64)
- Method: Direct Preference Optimization on model-generated preference pairs
- Innovation: Preference pairs are self-generated with the Stage 1 model, then ranked by quality scores for structure, actionability, and career relevance
- Config: LoRA r=64, α=128, β=0.1
- Time: ~30 min on A100 40GB
- LR: 5e-7, 1 epoch
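A matching sketch for this stage with TRL's `DPOTrainer`; building the preference pairs (prompt/chosen/rejected rows scored by the Stage 1 model) is elided, and the checkpoint path is assumed:

```python
# Stage 2 sketch (assumes `preference_pairs` has prompt/chosen/rejected
# columns and "sft-stage1" is the merged Stage 1 checkpoint).
from peft import LoraConfig
from trl import DPOConfig, DPOTrainer

args = DPOConfig(
    output_dir="dpo-stage2",
    num_train_epochs=1,
    learning_rate=5e-7,
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,  # effective batch size 8
    beta=0.1,                       # DPO temperature
    bf16=True,
)
trainer = DPOTrainer(
    model="sft-stage1",
    args=args,
    train_dataset=preference_pairs,
    peft_config=LoraConfig(r=64, lora_alpha=128, task_type="CAUSAL_LM"),
)
trainer.train()
```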
### Stage 3: GRPO - Multi-Component Reward (r=32)
- Method: Group Relative Policy Optimization with custom career reward function
- Reward components (a minimal scoring sketch follows below):
- Structure (25%): Headers, bullet points, sections
- JSON correctness (25%): Valid JSON when requested
- Actionability (25%): Action verbs and concrete steps
- Career relevance (25%): Domain-specific terminology
- Config: LoRA r=32, α=64, num_generations=4
- Time: ~40 min on A100 40GB
- LR: 1e-6, 1 epoch
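Here is a minimal sketch of an equally weighted four-component reward of this shape. The keyword lists and thresholds are illustrative assumptions; the actual scorer in `career_os_sota.py` may differ:

```python
import json
import re

# Illustrative keyword sets, not the shipped ones.
ACTION_VERBS = {"apply", "quantify", "practice", "schedule", "update", "research"}
CAREER_TERMS = {"resume", "interview", "salary", "role", "skills"}

def career_reward(completion: str, expects_json: bool = False) -> float:
    """Equally weighted reward in [0, 1]; each component is clipped to [0, 1]."""
    # Structure (25%): markdown headers and bullet points
    structure = min(1.0, (len(re.findall(r"^#{1,3} ", completion, re.M))
                          + len(re.findall(r"^[-*] ", completion, re.M))) / 5)
    # JSON correctness (25%): parseable JSON when the task asks for it
    json_ok = 1.0
    if expects_json:
        try:
            json.loads(completion[completion.find("{"):completion.rfind("}") + 1])
        except ValueError:
            json_ok = 0.0
    # Actionability (25%): concrete action verbs
    words = set(completion.lower().split())
    action = min(1.0, len(words & ACTION_VERBS) / 3)
    # Career relevance (25%): domain-specific terminology
    relevance = min(1.0, len(words & CAREER_TERMS) / 3)
    return 0.25 * (structure + json_ok + action + relevance)
```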
Total Pipeline Time: ~2.5–3 hours on A100 40GB
## 📊 Training Dashboard
Karpathy-style visualization tracks all training stages:
| Metric | What It Shows |
|---|---|
| Loss curve | Raw + moving average, best-loss point marked |
| Learning rate | Cosine schedule with warmup |
| Reward trajectory | Mean reward + trend line across DPO/GRPO |
| Response length | Histogram + time series (detects collapse/explosion) |
| Gradient norm | Training stability monitoring |
| Career quality | JSON correctness + actionability scores |
All plots are auto-generated during training and saved to `./training_dashboard/`.
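To illustrate the style, a minimal version of the loss panel (raw curve, moving average, best-loss marker); `training_dashboard.py` is the full implementation:

```python
# Minimal loss-panel sketch in the dashboard's style.
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np

def plot_loss(losses, window=20, out="training_dashboard/loss.png"):
    losses = np.asarray(losses, dtype=float)
    steps = np.arange(len(losses))
    smooth = np.convolve(losses, np.ones(window) / window, mode="valid")
    best = int(losses.argmin())
    plt.figure(figsize=(8, 4))
    plt.plot(steps, losses, alpha=0.3, label="raw")
    plt.plot(steps[window - 1:], smooth, label=f"moving avg ({window})")
    plt.scatter(best, losses[best], marker="*", s=150,
                label=f"best = {losses[best]:.3f} @ step {best}")
    plt.xlabel("step")
    plt.ylabel("loss")
    plt.legend()
    Path(out).parent.mkdir(parents=True, exist_ok=True)
    plt.savefig(out, dpi=150, bbox_inches="tight")
```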
## 🚀 Quick Start (Google Colab Pro A100)

### Step 1: Set up Colab

- Open a new notebook at Google Colab
- Runtime → Change runtime type → GPU → A100 (requires Colab Pro/Pro+)
- Secrets (left sidebar) → add `HF_TOKEN` with write permission (readable in the notebook as shown below)
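Once the secret exists, code running in the notebook can read it with Colab's `userdata` API and log in to the Hub, for example:

```python
# Read the Colab secret and authenticate to the Hugging Face Hub.
from google.colab import userdata
from huggingface_hub import login

login(token=userdata.get("HF_TOKEN"))
```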
### Step 2: Run everything

Copy the entire contents of `career_os_complete_colab.py` into one code cell and press Shift+Enter.
That's it. The script runs all stages end-to-end and pushes the final model to the Hub.
What you get after ~3 hours:
- ✅ Fine-tuned model weights on huggingface.co/Builder-Neekhil/career-agent-v1
- ✅ Dataset cached on huggingface.co/datasets/Builder-Neekhil/career-agent-dataset-v1
- ✅ Training dashboard with publication-quality plots in `./training_dashboard/`
- ✅ Ready-to-use multi-agent Career OS
## 🎯 Agent Capabilities
| Agent | What It Does | Output Format |
|---|---|---|
| Resume Parser | Extracts structured data from raw resume text | JSON (name, skills, experience, education) |
| Job Matcher | Scores resume against job description | JSON (score 0-100, strengths, gaps, suggestions) |
| Career Advisor | Career path planning, skill gaps, interview prep | Markdown with structured headers + action items |
| Salary Negotiator | Compensation strategies, market research, scripts | Markdown with data-backed scripts |
| Orchestrator | Routes requests, chains agents, synthesizes output | Unified career report |
## 🧪 Running Individual Agents

```python
from career_os_orchestrator import CareerOS

cos = CareerOS(agent_model="Builder-Neekhil/career-agent-v1")

# Single task
result = cos.process("Review my resume", resume_text="...")
print(result["synthesized"])

# Full pipeline
results = cos.full_pipeline(
    resume_text="...",
    job_description="...",
    target_role="Senior Software Engineer",
)
```
## 📁 File Structure

| File | Purpose | When to Use |
|---|---|---|
| `career_os_complete_colab.py` | ⭐ ONE-COPY COLAB SCRIPT: everything in one cell | Copy into Colab, run once |
| `career_os_sota.py` | Standalone 3-stage pipeline script | Local GPU or scripted training |
| `career_os_orchestrator.py` | Multi-agent Career OS with 4 specialized agents + orchestrator | After training, for inference |
| `training_dashboard.py` | Karpathy-style visualization dashboard | Import during training or for analysis |
| `train_colab.py` | Original single-stage SFT script | Simple SFT only |
| `train.py` | Original standalone training script | Simple SFT only |
| `inference.py` | Basic inference script | Quick testing |
| `demo.py` | Gradio web UI | Interactive demo |
| `requirements.txt` | Pinned dependencies | Setup |
## 🔬 Research Foundation

| Technique | Summary | Reference | What We Used |
|---|---|---|---|
| Self-Rewarding LLMs | Iterative DPO for instruction following | 2401.10020 | Stage 2: self-generated preference pairs |
| iGRPO | Iterative GRPO with self-feedback | 2602.09000 | Stage 3: multi-component reward function |
| LoRA Without Regret | High-rank LoRA for SFT, low-rank for RL | Blog post | r=256 → 64 → 32 pipeline |
| AgentOrchestra | Multi-agent orchestration framework | 2506.12508 | Career OS agent architecture |
| ResumeFlow | LLM resume generation pipeline | 2402.06221 | Structured output patterns |
| TALLRec | Lightweight LoRA tuning for recommendation | 2305.00447 | SFT data efficiency |
## 📊 Dataset Composition
| Source | Size | Purpose |
|---|---|---|
| `cnamuangtoun/resume-job-description-fit` | ~4.6K | Job-fit assessment with JSON output |
| `opensporks/resumes` | ~7.2K | Resume review + interview + career path |
| `sandeeppanem/resume-json-extraction-5k` | ~4.9K | Structured resume parsing |
| Synthetic coaching dialogues | 800 | Salary, pivot, networking, gaps, promotion, interview prep |
| Reasoning boost | 400 | Step-by-step career reasoning chains |
Total: ~17K multi-turn conversations with system prompts.
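For a quick look at the public sources above, they can be pulled with the `datasets` library as below; the per-source conversion into a shared chat schema, plus the synthetic coaching and reasoning data, lives in `career_os_complete_colab.py` and is elided here:

```python
# Inspect the three public sources listed in the table.
from datasets import load_dataset

sources = [
    "cnamuangtoun/resume-job-description-fit",
    "opensporks/resumes",
    "sandeeppanem/resume-json-extraction-5k",
]
for name in sources:
    split = load_dataset(name, split="train")
    print(f"{name}: {len(split)} rows")
```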
## 📏 Evaluation Benchmark
Since no standard career agent benchmark exists, we propose the Career Agent Evaluation Suite (CAES):
| Task | Metric | What We Measure |
|---|---|---|
| Resume parsing | F1 on extracted entities | Name, skills, experience, education |
| Job fit assessment | Accuracy vs ground truth | Match/no-match classification |
| Career advice quality | Human evaluation (1-5) | Helpfulness, specificity, actionability |
| JSON correctness | Valid JSON rate | % of responses with parseable JSON |
| Response structure | Section headers, lists | Markdown structure score |
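As one concrete example, the resume-parsing metric can be computed as a per-field set F1 over normalized entities; this is a sketch of the metric, not the exact CAES scorer:

```python
def entity_f1(predicted: set[str], gold: set[str]) -> float:
    """Set-based F1 for one field (e.g. lowercased skill names)."""
    if not predicted or not gold:
        return 0.0
    tp = len(predicted & gold)          # true positives: shared entities
    precision = tp / len(predicted)
    recall = tp / len(gold)
    if precision + recall == 0.0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(entity_f1({"python", "sql"}, {"python", "sql", "docker"}))  # 0.8
```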
## 🔧 Hyperparameters

### Stage 1: SFT
| Param | Value |
|---|---|
| Model | Qwen/Qwen2.5-1.5B-Instruct |
| LoRA rank | 256 |
| LoRA alpha | 512 |
| Target modules | all-linear |
| RSLora | True |
| Epochs | 2 |
| Learning rate | 2e-4 |
| Batch size (effective) | 8 (2 × 4) |
| Max length | 2048 |
| Precision | bf16 |
| Gradient checkpointing | True |
| Assistant-only loss | True |
### Stage 2: DPO
| Param | Value |
|---|---|
| LoRA rank | 64 |
| LoRA alpha | 128 |
| Epochs | 1 |
| Learning rate | 5e-7 |
| Batch size (effective) | 8 (1 × 8) |
| Beta (DPO temperature) | 0.1 |
| Max length | 2048 |
| Precision | bf16 |
### Stage 3: GRPO
| Param | Value |
|---|---|
| LoRA rank | 32 |
| LoRA alpha | 64 |
| Epochs | 1 |
| Learning rate | 1e-6 |
| Batch size (effective) | 4 (1 × 4) |
| Completions per prompt | 4 |
| Max completion length | 512 |
| Precision | bf16 |
## 📤 Publishing Checklist
To make this publication-ready:
- Run full 3-stage training and push weights
- Generate training dashboard plots
- Run CAES evaluation benchmark
- Compare against baseline (untrained Qwen2.5-1.5B-Instruct)
- Ablation: SFT-only vs SFT+DPO vs SFT+DPO+GRPO
- Human evaluation with 5 annotators on 50 samples
- Write arXiv paper with methodology + results
## 📄 License
Apache-2.0 (inherits from Qwen/Qwen2.5-1.5B-Instruct)
Built for those who take their career seriously. 🚀