readCtrl_lambda / prompts /syn_dataset_subclaims_support_check.txt

mshahidul

	You are an expert in biomedical NLP and clinical evidence reasoning.
	Your task is to generate synthetic medical data for training a model that determines whether a given long medical text supports, refutes, or fails to support a subclaim.

	For each dataset item:

	1. Create one medical text (6–10 sentences).
	2. Create 12 atomic subclaims about the text.
	3. Assign each subclaim a label:

	* `"supported"` → The text directly supports the subclaim.
	* `"refuted"` → The text contradicts the subclaim.
	* `"not_supported"` → The text is related to the subclaim but provides no evidence for or against it.

	Requirements:

	* All content must be synthetic, plausible, and medically coherent.
	* Subclaims must be short and atomic (only one fact).
	* Keep wording efficient to reduce tokens.
	* Ensure diversity across diseases, patient populations, treatments, and outcomes.
	* Make labels unambiguous.

	Return output strictly in JSON.
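
The instruction to return strict JSON can be checked on the consuming side. The following is a minimal sketch, assuming the model reply arrives as a single string and may be wrapped in a Markdown fence like the template in this prompt; the function name `parse_json_reply` is hypothetical and not part of the repository.

```python
import json

FENCE = "`" * 3  # a Markdown code fence marker


def parse_json_reply(reply: str) -> dict:
    """Parse a reply that should be a JSON object, tolerating one code fence."""
    text = reply.strip()
    if text.startswith(FENCE):
        lines = text.splitlines()[1:]  # drop the opening fence line
        if lines and lines[-1].strip() == FENCE:
            lines = lines[:-1]  # drop the closing fence line
        text = "\n".join(lines)
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

Replies that contain prose around the JSON fail loudly with this approach, which makes it easy to discard and regenerate them.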


	Generate 2 dataset items.
	For each item:

	* Create one 6–10 sentence medical text about a clinical condition, treatment, diagnostic method, or patient group.
	* Then create 12 subclaims, labeled:

	* 4 `"supported"`
	* 4 `"refuted"`
	* 4 `"not_supported"`

	Use the JSON structure exactly:

	```json
	{
	"items": [
	{
	"text": "TEXT_1",
	"subclaims": [
	{"subclaim": "…", "label": "supported"},
	{"subclaim": "…", "label": "supported"},
	{"subclaim": "…", "label": "supported"},
	{"subclaim": "…", "label": "supported"},
	{"subclaim": "…", "label": "refuted"},
	{"subclaim": "…", "label": "refuted"},
	{"subclaim": "…", "label": "refuted"},
	{"subclaim": "…", "label": "refuted"},
	{"subclaim": "…", "label": "not_supported"},
	{"subclaim": "…", "label": "not_supported"},
	{"subclaim": "…", "label": "not_supported"},
	{"subclaim": "…", "label": "not_supported"}
	]
	}
	]
	}
	```
	The `items` array must contain 2 such items.
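
The structure and label balance requested above can be verified after generation. This is a minimal sketch, assuming the reply has already been parsed into a Python dict; the names `validate_item` and `validate_output` are hypothetical helpers, not code from the repository, and the defaults (2 items, 4 subclaims per label) are taken from the prompt.

```python
from collections import Counter

LABELS = ("supported", "refuted", "not_supported")


def validate_item(item: dict, per_label: int = 4) -> list[str]:
    """Return a list of problems found in one dataset item (empty if none)."""
    problems = []
    text = item.get("text")
    if not isinstance(text, str) or not text.strip():
        problems.append("missing or empty 'text'")
    subclaims = item.get("subclaims")
    if not isinstance(subclaims, list):
        return problems + ["'subclaims' is not a list"]
    counts = Counter()
    for i, sc in enumerate(subclaims):
        if not isinstance(sc, dict) or not isinstance(sc.get("subclaim"), str):
            problems.append(f"subclaim {i}: malformed entry")
            continue
        label = sc.get("label")
        if label not in LABELS:
            problems.append(f"subclaim {i}: unknown label {label!r}")
            continue
        counts[label] += 1
    # The prompt asks for an equal number of subclaims per label.
    for label in LABELS:
        if counts[label] != per_label:
            problems.append(
                f"expected {per_label} '{label}' subclaims, found {counts[label]}"
            )
    return problems


def validate_output(data: dict, n_items: int = 2) -> list[str]:
    """Check the top-level 'items' array and every item inside it."""
    items = data.get("items")
    if not isinstance(items, list):
        return ["'items' is not a list"]
    problems = []
    if len(items) != n_items:
        problems.append(f"expected {n_items} items, found {len(items)}")
    for idx, item in enumerate(items):
        if not isinstance(item, dict):
            problems.append(f"item {idx}: not an object")
            continue
        problems.extend(f"item {idx}: {p}" for p in validate_item(item))
    return problems
```

An empty list means the output matches the requested shape. The check covers structure only; it does not judge whether each label is medically correct.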