| You are an expert in biomedical NLP and clinical evidence reasoning. |
| Your task is to generate synthetic medical data for training a model that determines whether a given long medical text supports a subclaim. |
|
|
| For each dataset item: |
|
|
| 1. Create **one medical text** (6–10 sentences). |
| 2. Create **12 atomic subclaims** about the text. |
| 3. Assign each subclaim a label: |
|
|
| * `"supported"` – The text directly supports the subclaim. |
| * `"refuted"` – The text contradicts the subclaim. |
| * `"not_supported"` – The text is related to the subclaim but contains no evidence for or against it. |
|
|
| Requirements: |
|
|
| * All content must be **synthetic**, **plausible**, and **medically coherent**. |
| * Subclaims must be **short** and **atomic** (only one fact). |
| * Keep wording efficient to reduce tokens. |
| * Ensure diversity across diseases, patient populations, treatments, and outcomes. |
| * Make labels unambiguous. |
|
|
| Return output **strictly** in JSON. |
|
|
|
|
| Generate **2 dataset items**. |
| For each item: |
|
|
| * Create **one 6–10 sentence medical text** about a clinical condition, treatment, diagnostic method, or patient group. |
| * Then create **12 subclaims**, labeled: |
|
|
| * 4 `"supported"` |
| * 4 `"refuted"` |
| * 4 `"not_supported"` |
|
|
| Use the JSON structure exactly: |
|
|
| ```json |
| { |
|   "items": [ |
|     { |
|       "text": "TEXT_1", |
|       "subclaims": [ |
|         {"subclaim": "…", "label": "supported"}, |
|         {"subclaim": "…", "label": "supported"}, |
|         {"subclaim": "…", "label": "supported"}, |
|         {"subclaim": "…", "label": "supported"}, |
|         {"subclaim": "…", "label": "refuted"}, |
|         {"subclaim": "…", "label": "refuted"}, |
|         {"subclaim": "…", "label": "refuted"}, |
|         {"subclaim": "…", "label": "refuted"}, |
|         {"subclaim": "…", "label": "not_supported"}, |
|         {"subclaim": "…", "label": "not_supported"}, |
|         {"subclaim": "…", "label": "not_supported"}, |
|         {"subclaim": "…", "label": "not_supported"} |
|       ] |
|     } |
|   ] |
| } |
| ``` |
| Generate **2 such items**. |
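
A response to this prompt can be checked mechanically before it is used as training data. The following is a minimal sketch in Python, assuming the response arrives as a raw JSON string. The function name `validate_output` and the constants `LABELS`, `PER_LABEL`, and `EXPECTED_ITEMS` are hypothetical names introduced for illustration; their values come from the requirements above (2 items, 12 subclaims per item, 4 per label). It does not check sentence count or medical plausibility.

```python
import json
from collections import Counter

# Hypothetical constants mirroring the prompt's requirements.
LABELS = ("supported", "refuted", "not_supported")
PER_LABEL = 4          # 4 subclaims per label, 12 in total
EXPECTED_ITEMS = 2     # the prompt asks for 2 dataset items


def validate_output(raw: str) -> list[str]:
    """Return a list of problems found in a response; an empty list means valid."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    items = data.get("items") if isinstance(data, dict) else None
    if not isinstance(items, list):
        return ['top-level object must contain an "items" list']

    problems = []
    if len(items) != EXPECTED_ITEMS:
        problems.append(f"expected {EXPECTED_ITEMS} items, got {len(items)}")

    for i, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {i}: must be an object")
            continue
        text = item.get("text")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"item {i}: missing or empty text")
        subclaims = item.get("subclaims")
        if not isinstance(subclaims, list):
            problems.append(f"item {i}: subclaims must be a list")
            continue

        # Count only well-formed entries with a known label.
        counts = Counter()
        for j, entry in enumerate(subclaims):
            if not isinstance(entry, dict) or not isinstance(entry.get("subclaim"), str):
                problems.append(f"item {i}, subclaim {j}: malformed entry")
                continue
            label = entry.get("label")
            if label not in LABELS:
                problems.append(f"item {i}, subclaim {j}: unknown label {label!r}")
                continue
            counts[label] += 1

        for label in LABELS:
            if counts[label] != PER_LABEL:
                problems.append(
                    f"item {i}: expected {PER_LABEL} {label!r} subclaims, got {counts[label]}"
                )
    return problems
```

Calling `validate_output(response_text)` and discarding or regenerating any response with a non-empty result keeps the label distribution balanced across the dataset.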
|
|
|
|
|
|