Title |
Traffic Sequence Vectorization and Ensemble Algorithm Classification for Tor Website Fingerprinting |
Authors |
오형석(Hyoungseok Oh) ; 황두성(Doosung Hwang) ; 김원겸(Wongyum Kim) |
DOI |
https://doi.org/10.5573/ieie.2020.57.5.55 |
Keywords |
Tor; Website fingerprinting; Ensemble algorithm |
Abstract |
The Tor network ensures anonymity by encrypting traffic through a series of relay nodes. Website fingerprinting aims to identify the website a user visits by analyzing the traffic sequence between the user and the entry node. This paper proposes a website fingerprinting method that remains effective over time. Traffic data are collected by accessing websites in selected web categories at random times, so that the data reflect random access times, and the training data are prepared through preprocessing and feature extraction steps. Each training vector consists of features extracted from the traffic sequence, such as incoming/outgoing time intervals, packet length, and bursts, which reflect network performance and protocols. Tree ensemble algorithms are applied to compare the classification performance of website fingerprinting over time. The average detection rate of the ensemble models is over 90.0%, and the extra tree algorithm achieves a high detection rate of 93.0%. Compared with a model trained continuously on data collected over 30 days, the original model's performance was 10.0% lower. Therefore, a website fingerprinting method based on machine learning requires a strategy of regular model retraining. |
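The feature extraction step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the trace representation (a list of `(timestamp, signed_length)` pairs with positive lengths for outgoing packets and negative lengths for incoming ones), the definition of a burst as a run of consecutive packets in one direction, the function names, and the eight chosen features are all assumptions made for illustration.

```python
from statistics import mean


def bursts(directions):
    """Group consecutive packets with the same direction into (direction, count) runs."""
    runs = []
    for d in directions:
        if runs and runs[-1][0] == d:
            runs[-1][1] += 1
        else:
            runs.append([d, 1])
    return [(d, n) for d, n in runs]


def mean_gap(times):
    """Mean interval between successive timestamps; 0.0 if fewer than two packets."""
    gaps = [b - a for a, b in zip(times, times[1:])]
    return mean(gaps) if gaps else 0.0


def vectorize(trace):
    """Turn one traffic sequence into a fixed-length feature vector.

    Assumed input: list of (timestamp_seconds, signed_length),
    where length > 0 is outgoing and length < 0 is incoming.
    """
    trace = [(t, s) for t, s in trace if s != 0]  # ignore empty packets
    out_t = [t for t, s in trace if s > 0]
    in_t = [t for t, s in trace if s < 0]
    out_bytes = sum(s for _, s in trace if s > 0)
    in_bytes = sum(-s for _, s in trace if s < 0)
    runs = bursts([1 if s > 0 else -1 for _, s in trace])
    out_runs = [n for d, n in runs if d == 1]
    in_runs = [n for d, n in runs if d == -1]
    return [
        mean_gap(out_t),            # mean outgoing time interval
        mean_gap(in_t),             # mean incoming time interval
        out_bytes,                  # total outgoing packet length
        in_bytes,                   # total incoming packet length
        len(out_runs),              # number of outgoing bursts
        len(in_runs),               # number of incoming bursts
        max(out_runs, default=0),   # longest outgoing burst (packets)
        max(in_runs, default=0),    # longest incoming burst (packets)
    ]
```

Vectors produced this way, one per captured page visit, could then be stacked into a training matrix and passed to a tree ensemble classifier such as scikit-learn's `ExtraTreesClassifier`. The paper's actual feature set and model settings may differ from this sketch.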