def readability_judgment_single_prompt(reference_summary, generated_summary,
                                       readability_level, subclaim_text,
                                       result, evaluation):
    system_prompt = f"""
You are an impartial medical summarization evaluator. Your goal is to decide
whether the inclusion or omission of ONE specific subclaim from the reference
summary is *reasonable*, given the readability level of the generated summary.

Readability guidelines:
- Easy: for general readers; omit detailed numbers, anatomy, or diagnostic test specifics.
- Intermediate: maintain main medical ideas and reasoning; simplify complex phrasing only.
- Hard: preserve nearly all technical and diagnostic detail, except redundant measurements.

### Inputs
Readability Level: {readability_level}
Reference Summary: {reference_summary}
Generated Summary: {generated_summary}
Subclaim: "{subclaim_text}"
Result: {result}  # 1 = supported (included in generated summary), 0 = omitted (not included)

### Consistency rules
* If result = 0 (omitted) and the subclaim is purely technical or numerical for
  this readability level, it is likely "reasonable".
* If result = 0 and the subclaim expresses a central event, diagnosis, or reason
  for a treatment outcome, mark it "unreasonable".

### Task
Judge whether this inclusion or omission is:
- "reasonable" → appropriate for this readability level
- "partially_reasonable" → oversimplified but acceptable
- "unreasonable" → harms completeness or clinical meaning

Output format rule: produce exactly the JSON object below, with no extra commentary.
{{
    "reasonableness": "",
    "justification": ""
}}
"""
    # Build a single-turn conversation: the prompt as the user message and the
    # stored evaluation as the assistant response.
    conversation = {}
    conversation['conversations'] = [
        {'from': 'user', 'content': system_prompt},
        {'from': 'assistant', 'content': str(evaluation)},
    ]
    return conversation
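The function returns turns keyed by `'from'`, while most chat-completion APIs expect a `'role'` key. A minimal sketch of how the returned dict might be adapted for such an API — the sample contents and the `to_chat_messages` helper are hypothetical, not part of the original code:

```python
import json

# A conversation dict shaped like the function's return value
# (contents abbreviated; a real prompt would be the full evaluator text).
conversation = {
    'conversations': [
        {'from': 'user',
         'content': 'You are an impartial medical summarization evaluator. ...'},
        {'from': 'assistant',
         'content': json.dumps({
             'reasonableness': 'reasonable',
             'justification': 'The omitted subclaim is purely numerical detail.',
         })},
    ]
}

def to_chat_messages(conversation):
    """Map each {'from', 'content'} turn to the {'role', 'content'}
    shape expected by most chat-completion APIs."""
    return [{'role': turn['from'], 'content': turn['content']}
            for turn in conversation['conversations']]

messages = to_chat_messages(conversation)
print(messages[0]['role'])  # 'user'
print(messages[1]['role'])  # 'assistant'
```

The assistant turn here doubles as the ground-truth label when the pair is used as a fine-tuning or evaluation record, which is presumably why `evaluation` is serialized into the second message.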