Description
The CLA-QA dataset contains 25 common patient questions about Complex Lymphatic Anomalies (CLAs), 175 responses to those questions generated by seven large language models (LLMs), and accuracy scores assigned to the responses by three board-certified clinical experts on a 5-point Likert scale. The dataset was developed to support research on automated methods for evaluating free-text LLM responses in rare diseases. It serves as a benchmark for comparing traditional NLP similarity metrics and LLM-based evaluation against expert physician judgment.
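One way to use the benchmark for this comparison is to measure how closely an automatic metric tracks the physicians' ratings. The following is a minimal sketch that assumes the three physician scores for each response are already loaded as a list of lists, and that a candidate metric (for example, a similarity score or an LLM-as-judge score) has produced one number per response. The function names and data layout are hypothetical and are not part of the dataset. Averaging the three Likert scores and using Spearman rank correlation are illustrative choices; the dataset description does not prescribe a particular aggregation or agreement statistic.

```python
from statistics import mean


def average_ratings(ratings):
    """Average the per-physician Likert scores (1-5) for each response.

    `ratings` is a hypothetical layout: one inner list of physician scores per response.
    """
    return [mean(r) for r in ratings]


def rank(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(rank(x), rank(y))
```

In use, one would compute `physician = average_ratings(per_response_ratings)` and then `spearman(metric_scores, physician)`, where `per_response_ratings` and `metric_scores` are hypothetical lists aligned by response. A rank-based statistic is used here because Likert ratings are ordinal, so only the ordering of scores, not their spacing, affects the result.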