Commit e9064f3
Parent(s): 7554f29
feat: add data audit section to system prompt
Instructs the agent to inspect datasets before working with them —
check schema, distributions, sample rows — and surface findings to
the user before proceeding.
agent/prompts/system_prompt_v3.yaml
CHANGED
@@ -53,6 +53,14 @@ system_prompt: |
 DPO: "prompt", "chosen", "rejected"
 GRPO: "prompt"
 
+# Data audit
+
+Before working with any dataset, audit it first. Do not assume you know what the data looks like — inspect it.
+
+Use hf_inspect_dataset to check: schema/columns, number of rows per split, value distributions for key columns, sample rows. Surface anything notable: class imbalance, missing values, unexpected formats, outliers, duplicate rows, etc.
+
+Looking at data is the best way to boost performance of any ML model plus it reduces the likelihood of failed jobs later.
+
 # When submitting a training job
 
 Before calling hf_jobs, output a pre-flight check:
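
Reviewer note: the checks listed in the new "Data audit" section can be illustrated with a small standalone sketch. This is not the hf_inspect_dataset tool, whose interface is not shown in this diff. The function name `audit_rows`, its parameters, and the toy rows are hypothetical, and the rule that empty strings count as missing is an assumption. It uses only the Python standard library and operates on rows that are already loaded as dictionaries.

```python
from collections import Counter


def audit_rows(rows, expected_columns, label_column=None, n_samples=2):
    """Summarize a list of dict rows: schema, size, gaps, duplicates, samples."""
    rows = list(rows)
    # Union of keys across rows, so ragged rows still show up in the schema.
    columns = sorted({key for row in rows for key in row})
    report = {
        "num_rows": len(rows),
        "columns": columns,
        # Columns the training format needs but the data lacks.
        "missing_columns": [c for c in expected_columns if c not in columns],
        # None and empty strings both count as missing here (an assumption).
        "missing_values": {
            c: sum(1 for row in rows if row.get(c) in (None, "")) for c in columns
        },
        # Exact duplicates, compared on a canonical repr of each row.
        "duplicate_rows": len(rows) - len({repr(sorted(row.items())) for row in rows}),
        "sample_rows": rows[:n_samples],
    }
    if label_column is not None:
        # Value distribution for one key column, to expose class imbalance.
        report["distribution"] = dict(Counter(row.get(label_column) for row in rows))
    return report


# Toy DPO-style rows (made up for illustration): one exact duplicate, one empty "chosen".
toy_rows = [
    {"prompt": "2+2?", "chosen": "4", "rejected": "5"},
    {"prompt": "2+2?", "chosen": "4", "rejected": "5"},
    {"prompt": "Capital of France?", "chosen": "", "rejected": "Rome"},
]

report = audit_rows(toy_rows, expected_columns=("prompt", "chosen", "rejected"))
for key, value in report.items():
    print(f"{key}: {value}")
```

Passing the DPO column names from the context lines above as `expected_columns` turns the schema check into a format check, so a dataset that lacks "chosen" or "rejected" is flagged before a training job is submitted.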