readCtrl_lambda / prompts /synthetic_data_generation_extract_subclaims.txt
mshahidul
**You are an expert medical annotator. Your task is to convert medical paragraphs into granular, factual *subclaims*.**
A subclaim is the smallest standalone factual unit that can be verified independently.
You must produce:
1. The original medical text
2. A list of subclaims (atomic facts), written clearly and objectively
You must also follow these rules:
* Do not hallucinate: only break down information that is present in the input.
* Keep subclaims short, specific, and verifiable.
---
### **📌 USER PROMPT TEMPLATE (Use to generate each sample)**
**Generate a synthetic medical example in JSON format with the following structure:**
```
{
  "id": "<unique_id>",
  "medical_text": "<write a realistic medical paragraph, 80–180 words>",
  "subclaims": [
    "<atomic factual statement 1>",
    "<atomic factual statement 2>",
    "<atomic factual statement 3>",
    ...
  ]
}
```
**Requirements for `medical_text`:**
* Should be realistic clinical, biomedical, or guideline-style text.
* Should include several independent facts that can be broken into subclaims.
* Should include entities such as diseases, symptoms, treatments, risks, lab values, diagnostics, outcomes, patient history, etc.
* No copyrighted text; fully synthetic.
**Requirements for `subclaims`:**
* Every subclaim must be derived **exactly** from the medical text, without adding or changing details.
* Do not use external medical knowledge.
* Each subclaim must be a **single verifiable idea**, not combined facts.
* Aim for **6–15 subclaims** depending on the paragraph complexity.
* Keep wording factual and unambiguous.
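
The structural requirements above can be checked mechanically once a sample has been generated. The following is a minimal sketch, not part of the original prompt: the function name `validate_sample` is hypothetical, word counts are approximated by splitting on whitespace, and the 6–15 subclaim range (which the prompt states only as a target) is treated here as a hard bound. It cannot check whether a subclaim is actually supported by the text; that still needs human or model review.

```python
import json

# Keys the template above asks every generated sample to contain.
REQUIRED_KEYS = {"id", "medical_text", "subclaims"}


def validate_sample(raw: str) -> list[str]:
    """Return a list of structural problems in one generated sample.

    An empty list means the sample passed every check. Requires Python 3.9+
    for the built-in generic annotation.
    """
    try:
        sample = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]

    if not isinstance(sample, dict):
        return ["top-level value is not a JSON object"]

    missing = REQUIRED_KEYS - sample.keys()
    if missing:
        return [f"missing keys: {sorted(missing)}"]

    problems = []

    text = sample["medical_text"]
    if not isinstance(text, str):
        problems.append("medical_text must be a string")
    else:
        # Whitespace splitting is only an approximation of a word count.
        n_words = len(text.split())
        if not 80 <= n_words <= 180:
            problems.append(f"medical_text has {n_words} words, expected 80-180")

    subclaims = sample["subclaims"]
    if not isinstance(subclaims, list) or not all(
        isinstance(s, str) and s.strip() for s in subclaims
    ):
        problems.append("subclaims must be a list of non-empty strings")
    elif not 6 <= len(subclaims) <= 15:
        problems.append(f"found {len(subclaims)} subclaims, expected 6-15")

    return problems
```

Calling `validate_sample` on each raw model response and discarding (or regenerating) any sample that returns a non-empty list is one way to keep malformed examples out of the synthetic dataset.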