def readability_judgment_single_prompt(reference_summary, generated_summary,
                                       readability_level, subclaim_text,
                                       result, evaluation):
    """Build one training conversation asking a judge model whether including
    or omitting a single subclaim is reasonable at the given readability level."""
    system_prompt = f"""
You are an impartial medical summarization evaluator.
Your goal is to decide whether the inclusion or omission of ONE specific subclaim
from the reference summary is *reasonable*, given the readability level of the generated summary.

### Inputs
Readability Level: {readability_level}

Reference Summary:
{reference_summary}

Generated Summary:
{generated_summary}

Subclaim:
"{subclaim_text}"

Result:
{result}  # 1 = supported (included in generated summary), 0 = omitted (not included)

### Task
Judge whether this inclusion or omission is:
- "reasonable" → appropriate for this readability level
- "partially_reasonable" → oversimplified but acceptable
- "unreasonable" → harms completeness or clinical meaning

Respond only with a JSON object:
{{
  "reasonableness": "<reasonable | partially_reasonable | unreasonable>",
  "justification": "<short explanation>"
}}
"""
    # Use a list (not a tuple) so the conversation serializes cleanly to JSON.
    conversation = {}
    conversation['conversations'] = [
        {'from': "user", 'content': system_prompt},
        {'from': "assistant", 'content': str(evaluation)},
    ]
    return conversation
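For context, a minimal sketch of how the returned dictionary might be serialized as one JSONL training line. The shape mirrors the function's output; the prompt text and evaluation string below are placeholders, not real model inputs or outputs.

import json

# Illustrative conversation in the same shape the function returns;
# the content strings here are placeholders.
conversation = {
    'conversations': [
        {'from': "user", 'content': "<judgment prompt text>"},
        {'from': "assistant",
         'content': '{"reasonableness": "reasonable", "justification": "clinically minor detail"}'},
    ]
}

# One training example per judged subclaim, serialized as a single JSONL line.
line = json.dumps(conversation)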