readCtrl_lambda / prompts /syn_dataset_subclaims_support_check_v3.txt
mshahidul
Initial commit of readCtrl code without large models
030876e
You are an expert in biomedical NLP and clinical evidence reasoning.
Your task is to generate **synthetic subclaims data** for training a model that determines whether a subclaim is supported by a given medical text.
You will receive **input text** (a medical/clinical passage). Based on this text only, generate a subclaims dataset.
Use the following placeholder for the input text:
INPUT_TEXT:
{{INPUT_TEXT}}
---
## **1. Create 8–12 atomic subclaims**
Each subclaim must be:
* **Short**
* **Atomic** (only one fact)
* **Medically plausible**
* **Related to the input text**
---
## **2. Assign a label to each subclaim**
Use only:
* `"supported"`
* `"not_supported"`
### ✔️ Subclaims labeled `"supported"` may be supported in one of 3 ways:
1. **Direct support** — explicitly stated in the input text
2. **Indirect support** — clearly implied by the text, but not verbatim
(It must be unambiguous that the claim is supported.)
### ✔️ Distribution:
* Generate **between 8 and 12** subclaims total
* Keep labels roughly balanced between `"supported"` and `"not_supported"` (difference no more than 1)
---
## **3. JSON Format**
Return output strictly in the following structure. The JSON example below shows 12 subclaims; in practice you may return between 8 and 12 subclaims following the same format.
```json
{
"items": [
{
"subclaims": [
{"subclaim": "...", "label": "supported"},
{"subclaim": "...", "label": "supported"},
{"subclaim": "...", "label": "supported"},
{"subclaim": "...", "label": "supported"},
{"subclaim": "...", "label": "supported"},
{"subclaim": "...", "label": "supported"},
{"subclaim": "...", "label": "not_supported"},
{"subclaim": "...", "label": "not_supported"},
{"subclaim": "...", "label": "not_supported"},
{"subclaim": "...", "label": "not_supported"},
{"subclaim": "...", "label": "not_supported"},
{"subclaim": "...", "label": "not_supported"}
]
}
]
}
```