readCtrl_lambda/prompts/synthetic_data_generation_extract_subclaims.txt

mshahidul


	You are an expert medical annotator. Your task is to convert medical paragraphs into granular, factual subclaims.
	A subclaim is the smallest standalone factual unit that can be verified independently.
	You must produce:

	1. The original medical text
	2. A list of subclaims (atomic facts), written clearly and objectively

	Follow these rules:

	* Do not hallucinate; only break down information present in the input.
	* Keep subclaims short, specific, and verifiable.

	---

	### 📌 USER PROMPT TEMPLATE (Use to generate each sample)

	Generate a synthetic medical example in JSON format with the following structure:

	```
	{
	  "id": "<unique_id>",
	  "medical_text": "<write a realistic medical paragraph, 80–180 words>",
	  "subclaims": [
	    "<atomic factual statement 1>",
	    "<atomic factual statement 2>",
	    "<atomic factual statement 3>",
	    ...
	  ]
	}
	```
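
A generated sample can be checked against the structure above before it is added to a dataset. The following is a minimal sketch, assuming the model's reply is a single JSON object with no surrounding text; the function name `parse_sample` and the constant `REQUIRED_KEYS` are hypothetical and not part of the original prompt or repository.

```python
import json

# Keys taken from the JSON structure in the prompt template.
REQUIRED_KEYS = {"id", "medical_text", "subclaims"}


def parse_sample(raw: str) -> dict:
    """Parse one generated sample and verify it has the expected shape.

    Raises ValueError if the text is not valid JSON or the shape is wrong
    (json.JSONDecodeError is itself a subclass of ValueError).
    """
    sample = json.loads(raw)
    if not isinstance(sample, dict):
        raise ValueError("sample must be a JSON object")

    missing = REQUIRED_KEYS - sample.keys()
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")

    if not isinstance(sample["id"], str) or not isinstance(sample["medical_text"], str):
        raise ValueError("'id' and 'medical_text' must be strings")

    subclaims = sample["subclaims"]
    if not isinstance(subclaims, list) or not all(isinstance(s, str) for s in subclaims):
        raise ValueError("'subclaims' must be a list of strings")

    return sample
```

Rejecting malformed replies at this stage keeps later checks simple, since they can assume every field exists and has the right type.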

	Requirements for `medical_text`:

	* Should be realistic clinical, biomedical, or guideline-style text.
	* Should include several independent facts that can be broken into subclaims.
	* Should include entities such as diseases, symptoms, treatments, risks, lab values, diagnostics, outcomes, patient history, etc.
	* No copyrighted text; fully synthetic.

	Requirements for `subclaims`:

	* Every subclaim must be derived exactly from the medical text.
	* No external medical knowledge.
	* Each subclaim must be a single verifiable idea, not combined facts.
	* Aim for 6–15 subclaims depending on the paragraph complexity.
	* Keep wording factual and unambiguous.
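
The countable requirements above (paragraph length and number of subclaims) can be checked mechanically. The sketch below is one possible approach, assuming words are separated by whitespace and that the sample is already a parsed dictionary; the function name `check_requirements` is hypothetical. Because the 6 to 15 subclaim range is stated as a target rather than a hard rule, the function reports problems instead of raising. It does not attempt to verify that subclaims are grounded in the text, which needs human or model review.

```python
def check_requirements(sample: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []

    # Length requirement from the template: 80-180 words (whitespace-separated).
    n_words = len(sample["medical_text"].split())
    if not 80 <= n_words <= 180:
        problems.append(f"medical_text has {n_words} words, expected 80-180")

    # Subclaim count target from the requirements: 6-15.
    n_claims = len(sample["subclaims"])
    if not 6 <= n_claims <= 15:
        problems.append(f"found {n_claims} subclaims, expected 6-15")

    # A blank subclaim cannot be a verifiable statement.
    if any(not claim.strip() for claim in sample["subclaims"]):
        problems.append("at least one subclaim is empty")

    return problems
```

A sample that returns a non-empty list could be regenerated or flagged for manual review, depending on how strictly the targets are enforced.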