akseljoonas HF Staff committed on
Commit e9064f3 · 1 Parent(s): 7554f29

feat: add data audit section to system prompt

Instructs the agent to inspect datasets before working with them —
check schema, distributions, sample rows — and surface findings to
the user before proceeding.

agent/prompts/system_prompt_v3.yaml CHANGED
@@ -53,6 +53,14 @@ system_prompt: |
  DPO: "prompt", "chosen", "rejected"
  GRPO: "prompt"
 
+ # Data audit
+
+ Before working with any dataset, audit it first. Do not assume you know what the data looks like — inspect it.
+
+ Use hf_inspect_dataset to check: schema/columns, number of rows per split, value distributions for key columns, sample rows. Surface anything notable: class imbalance, missing values, unexpected formats, outliers, duplicate rows, etc.
+
+ Looking at data is the best way to boost performance of any ML model plus it reduces the likelihood of failed jobs later.
+
  # When submitting a training job
 
  Before calling hf_jobs, output a pre-flight check:
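
The checks listed in the new "Data audit" section can be illustrated with a small standalone sketch. This is not the implementation of hf_inspect_dataset, whose interface is not shown in this commit. It assumes a split is already loaded as a list of dicts, and the function name `audit_rows` and the report keys are hypothetical, chosen only for illustration.

```python
from collections import Counter


def audit_rows(rows, label_column=None, sample_size=3):
    """Summarize one dataset split given as a list of dicts (one dict per row).

    Hypothetical helper for illustration; not part of any Hugging Face API.
    """
    # Schema: union of keys across rows, in first-seen order.
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)

    # Missing values: key absent, None, or empty string.
    missing = {
        col: sum(1 for row in rows if row.get(col) in (None, ""))
        for col in columns
    }

    # Duplicate rows: rows are compared by their sorted (key, repr(value)) pairs.
    fingerprints = Counter(
        tuple(sorted((k, repr(v)) for k, v in row.items())) for row in rows
    )
    duplicate_rows = sum(count - 1 for count in fingerprints.values())

    # Value distribution for one key column, e.g. to spot class imbalance.
    label_counts = (
        dict(Counter(row.get(label_column) for row in rows)) if label_column else {}
    )

    return {
        "num_rows": len(rows),
        "columns": columns,
        "missing": missing,
        "duplicate_rows": duplicate_rows,
        "label_counts": label_counts,
        "samples": rows[:sample_size],
    }


# Toy data, invented for this example only.
rows = [
    {"text": "great", "label": "pos"},
    {"text": "bad", "label": "neg"},
    {"text": "great", "label": "pos"},
    {"text": "", "label": "pos"},
]
report = audit_rows(rows, label_column="label")
for key, value in report.items():
    print(f"{key}: {value}")
```

On the toy rows, the report would surface the kinds of findings the prompt asks the agent to raise: one empty `text` value, one duplicated row, and a skew toward the `pos` label.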