Title Quality Evaluation of Automatically Generated Q&A Datasets from Built Environment Literature Using the On-Device LLM Exaone 4.0 1.2B
Authors 정창헌(Cheong, Chang Heon)
DOI https://doi.org/10.5659/JAIK.2025.41.11.259
Page pp.259-269
ISSN 2733-6247
Keywords Large language model; Exaone; Automatic item generation; Quality evaluation; Built environment
Abstract This study explores the application of large language models (LLMs) in architectural engineering education by evaluating the quality of question-answer (Q&A) pairs automatically generated from architectural environment literature using Exaone 4.0 1.2B, an on-device LLM. A total of 36 papers in the architectural environment domain were collected and preprocessed into text files, which were then used as input for zero-shot prompting. This process generated 1,913 Q&A pairs. Evaluation was conducted using ROUGE-L, containment, and cosine similarity (SBERT), along with a review of formal errors such as incomplete sentences, encoding corruption, typographical or spacing issues, meta-utterances, metadata exposure, residual citations, numerical or unit errors, and answer duplication (A=Q). The results show an average containment score of 0.399 and an average cosine similarity of 0.420. In addition, 272 formal errors were identified, representing 14.2 percent of all generated pairs. These findings provide a baseline assessment of Exaone 4.0 1.2B’s performance in automatic Q&A generation for the architectural environment domain. Future research is expected to focus on reducing formal errors and improving semantic quality to enhance educational and practical applications.
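
The three similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation and rests on several assumptions: whitespace tokenization, ROUGE-L reported as an LCS-based F1 score, containment defined as the share of answer tokens that also occur in the source text, and cosine similarity computed on embedding vectors that would come from an SBERT model (plain lists stand in for them here). The function names are hypothetical.

```python
import math


def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(candidate, reference):
    """ROUGE-L as an F1 score over the LCS (assumed variant)."""
    c, r = candidate.split(), reference.split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def containment(answer, source):
    """Share of answer tokens that also appear in the source (assumed definition)."""
    tokens = answer.split()
    if not tokens:
        return 0.0
    source_tokens = set(source.split())
    return sum(tok in source_tokens for tok in tokens) / len(tokens)


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors, e.g. sentence embeddings."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


# Formal error rate from the counts reported in the abstract.
error_rate_percent = round(272 / 1913 * 100, 1)
```

The last line reproduces the reported error rate from the counts in the abstract (272 of 1,913 pairs). For Korean text, a morphological tokenizer would likely be more appropriate than whitespace splitting.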