JAIK - Journal of the Architectural Institute of Korea

Title	An Exploratory Study of Benchmark Construction for Performance Evaluation of Large Language Models in the Building Environmental Domain
Authors	정창헌(Cheong, Chang Heon)
DOI	https://doi.org/10.5659/JAIK.2026.42.5.291
Page	pp.291-299
ISSN	2733-6247
Keywords	Large language model; Built Environment; Benchmark; Domain Performance Evaluation; Expert Consensus
Abstract	This study proposes a benchmark construction method for systematically evaluating the performance of large language models (LLMs) in the building environmental domain and presents an exploratory experiment applying the benchmark to on-device models. A dual-tier benchmark framework was developed, categorizing items into core performance indicators and extended performance indicators based on expert consensus. A total of 120 question-answer items were created using educational materials in the building environmental field, and their importance was assessed by experts. As a result, 32 items, or 26.7 percent, were classified as core performance indicators, 83 items, or 69.2 percent, as extended performance indicators, and 5 items, or 4.2 percent, were excluded from the benchmark. The proposed benchmark was then applied to evaluate two on-device LLMs. The results showed that the models achieved accuracy rates of 43.8 to 59.4 percent on core performance indicators and 39.1 to 52.2 percent on extended performance indicators. Both models demonstrated higher accuracy on the core performance indicators, suggesting that concepts with stronger expert consensus were more likely to be reflected in training data for LLMs. Overall, the findings indicate that a dual-tier benchmark based on expert consensus can serve as an effective tool for evaluating domain-specific knowledge in LLMs within the building environmental field.
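The tier shares quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the item counts come from the abstract, while the names `TOTAL_ITEMS`, `tier_counts`, and `tier_shares` are hypothetical and not taken from the paper, and the expert-consensus rule that assigns items to tiers is not reproduced because the abstract does not state it.

```python
# Item counts as reported in the abstract (120 items in total).
TOTAL_ITEMS = 120
tier_counts = {"core": 32, "extended": 83, "excluded": 5}


def tier_shares(counts, total):
    """Return each tier's share of the item pool, in percent, to one decimal place."""
    return {tier: round(100 * n / total, 1) for tier, n in counts.items()}


shares = tier_shares(tier_counts, TOTAL_ITEMS)

for tier, pct in shares.items():
    print(f"{tier}: {tier_counts[tier]} items ({pct} percent)")
```

The three counts sum to the 120 items in the pool, and the rounded shares agree with the percentages given in the abstract.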

Copyright © ARCHITECTURAL INSTITUTE OF KOREA. All rights reserved.

No part of this publication may be reproduced or distributed in any form or by any means, or stored in a database or retrieval system, without the prior permission of the publisher (www.aik.or.kr).