| Title |
Quality Evaluation of Automatically Generated Q&A Datasets from Built Environment Literature Using the On-Device LLM Exaone 4.0 1.2B |
| DOI |
https://doi.org/10.5659/JAIK.2025.41.11.259 |
| Keywords |
Large language model; Exaone; Automatic item generation; Quality evaluation; Built environment |
| Abstract |
This study explores the application of large language models (LLMs) in architectural engineering education by evaluating the quality of
question-answer (Q&A) pairs automatically generated from architectural environment literature using Exaone 4.0 1.2B, an on-device LLM. A
total of 36 papers in the architectural environment domain were collected and preprocessed into text files, which were then used as input for
zero-shot prompting. This process generated 1,913 Q&A pairs. Evaluation was conducted using ROUGE-L, containment, and cosine similarity
(SBERT), along with a review of formal errors such as incomplete sentences, encoding corruption, typographical or spacing issues,
meta-utterances, metadata exposure, residual citations, numerical or unit errors, and answer duplication (A=Q). The results show an average
containment score of 0.399 and an average cosine similarity of 0.420. In addition, 272 formal errors were identified, representing 14.2 percent
of all generated pairs. These findings provide a baseline assessment of Exaone 4.0 1.2B’s performance in automatic Q&A generation for the
architectural environment domain. Future research is expected to focus on reducing formal errors and improving semantic quality to enhance
educational and practical applications. |
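The lexical metrics named in the abstract can be sketched in plain Python. This is a minimal illustration, not the authors' code: it assumes a common definition of containment (the fraction of reference tokens that also appear in the generated answer) and the standard LCS-based ROUGE-L F1; the SBERT cosine-similarity step is omitted because it requires a pretrained sentence-encoder model.

```python
# Hedged sketch of two Q&A evaluation metrics mentioned in the abstract.
# Assumed definitions: containment = |candidate tokens ∩ reference tokens| / |reference tokens|;
# ROUGE-L F1 = harmonic mean of LCS precision and recall over whitespace tokens.

def lcs_len(a, b):
    # Classic dynamic-programming longest-common-subsequence length.
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[-1][-1]

def rouge_l_f1(candidate: str, reference: str) -> float:
    c, r = candidate.split(), reference.split()
    lcs = lcs_len(c, r)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(c), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def containment(candidate: str, reference: str) -> float:
    c, r = set(candidate.split()), set(reference.split())
    return len(c & r) / len(r) if r else 0.0
```

In practice the reported averages (containment 0.399, cosine similarity 0.420) would be obtained by running such metrics over all 1,913 generated pairs and taking the mean.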