A. Joshua Jafferson 1
Vijayakumar Ponnusamy 2
Jovana Jović 3
Miroslav Trajanovic 4

1 Department of Electronics and Communication, SRM Institute of Science and Technology,
Chennai, India (joshuaj@srmist.edu.in)
2 Department of Electronics and Communication, SRM Institute of Science and Technology,
Chennai, India (vijayakp@srmist.edu.in)
3 Belgrade Metropolitan University, Serbia (jovana.jovicic@metropolitan.ac.rs)
4 University of Nis, Serbia (miroslav.trajanovic@gmail.com)
Copyright © The Institute of Electronics and Information Engineers (IEIE)
Keywords
AWS Lambda, Brain-computer interface, Cloud computing, CNN, EEG signal, IoT, Imagined speech to text
1. Introduction
The brain-computer interface (BCI) is a cutting-edge technology that helps physically
challenged people interact with the world. Reading and analyzing various properties
of the brain, such as electrical activity, magnetic activity, and blood oxygen levels,
helps better understand brain activities such as alertness, focus level, the sleep
cycle, and even motor control signals [1]. These motor control signals are involved
in actuating muscles throughout the body. Many studies have shown that extracting and
analyzing the signals associated with speech articulation helps identify the word
being imagined in the brain [2-4].
This field of research is still in its infancy: only six words have been identified
so far [5], and increasing the number of words decreases the accuracy. Moreover, the
duration of the imagined word varies from time to time and from person to person.
Another challenge in EEG signal processing is tagging the signal with the actual word.
Unlike an audio signal, the EEG is not easy to tag because the word imagined by the
participant cannot be observed directly.
Moreover, there is a considerable gap between the stimuli and response. The response
does not have an indicator or marker to specify where the cue of the stimulus starts
and where it ends. One more challenge in developing a system that converts an imagined
word into text or sound is the computational complexity of the algorithms that process
and classify the EEG signals. Ideally, only the application itself would be deployed
on a handheld device, but handheld and portable devices have limited throughput,
making them less suitable for this type of processing.
This article contributes to the following areas: (i) introducing a mechanism to
synchronize the stimuli and response by adding an indicator channel in the EEG signal
to indicate where the imagined speech starts and ends to increase the accuracy of
detecting the word; (ii) a method of repeating the thought of the same word with different
experiment durations is proposed to improve data reliability; (iii) a method of
segmenting the EEG data is proposed to enhance the accuracy of recognizing an
increased vocabulary of 10 words; (iv) an IoT framework with cloud computing is
proposed in this paper to handle the computational complexity.
IoT and cloud computing have become more versatile and provide stable solutions
[6]. Available web services include Amazon Web Services (AWS), Google Cloud Platform,
Microsoft Azure, IBM Cloud, Jelastic, DigitalOcean, and Salesforce. The system in
this paper uses Amazon Web Services because it is a reliable, scalable, and
inexpensive cloud computing service. The remainder of the article is organized
as follows: Section II summarizes the related works, while Section III details the
materials and methods followed in this paper. The results are reported in Section
IV, and Section V discusses the opportunities and challenges of future research in
BCI for imagined speech. Section VI presents the conclusions and a summary of the
work done.
2. Related Work
Many studies have provided evidence for the relationship between the brain activities
and their corresponding outcomes as motor actions and speech [7-9]. The technologies involved in BCI include electroencephalography (EEG), magnetoencephalography
(MEG), functional magnetic resonance imaging (fMRI), functional near-infrared spectroscopy (fNIR),
and electrocorticography (ECoG) [10].
fNIR and fMRI technologies are used to find the active part of the brain while
performing a particular task. The active brain region consumes more oxygen, which can
be monitored by fNIR. In fNIR, the blood oxygen level is measured using infrared
sources and detectors placed on the scalp in a particular order. A change in infrared
light intensity is converted to the level of oxygenated and deoxygenated hemoglobin
[11,12]. The human auditory system was mapped to Broca's and Wernicke's areas of the
brain using this information. On the other hand, these methods suffer from low
temporal resolution, whereas recording imagined speech requires high temporal
resolution. MEG is a non-invasive method that requires a bulky fixed instrument, which
makes the experimental procedure expensive and complex.
ECoG has a very high temporal resolution. It has less noise because it is placed
inside the skull and on the surface of the brain. Electrode placement requires surgery
and may produce scarring on the brain surface. Therefore, this method carries more
risk [13].
Among these BCI technologies, EEG stands out because of its low cost, low risk,
wearability, and excellent temporal resolution [14,15]. The signals are captured while
the subject performs a task. The task in the proposed system was to imagine a word,
with or without articulating the sound, after an audio or visual cue. The maximum
number of words processed thus far is only six, which is very limited for daily usage.
The support vector machine (SVM) is a widely used algorithm for EEG signal processing
[16,17]. However, before applying any machine-learning algorithm, the data needs
to be noise- and artifact-free. The common pre-processing methods are the bandpass
filter and common spatial patterns (CSP).
All the algorithms above require machines with high processing capacity.
An IoT-based cloud computing system can provide a solution for these compute-hungry
applications. Such a system also provides a centralized kernel (the learning and
classification algorithms) that keeps updating with each data input. Owing to the
centralization of the kernel, learning and classification occur uniformly across all
the systems connected to the cloud. Another advantage of the cloud system is
scalability, which enables an increase in the amount of training data when the
classification accuracy is poor.
3. Materials and Methods
3.1 Data Acquisition
The EEG signal was collected from 18 right-handed volunteers aged between 13
and 51; six were male and 12 were female. Each volunteer was asked to rest in a
comfortable chair in a calm room and to repeatedly imagine one of the following
words: ‘Bath,’ ‘Cold,’ ‘Doctor,’ ‘Food,’ ‘Hot,’ ‘No,’ ‘Pain,’ ‘Toilet,’ ‘Water,’
and ‘Yes.’ Each volunteer's data for a given word was acquired over one-, five-,
and ten-second intervals to improve the reliability of the data. Each time a word
was imagined, the EEG signal was captured using a NeuroSky MindWave Mobile 2 device.
This apparatus is a single-channel EEG device with 12-bit resolution and a maximum
sampling rate of 512 Hz. The raw data was then transferred to a Raspberry Pi 3 as
packets over Bluetooth.
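The raw-value packets follow NeuroSky's published ThinkGear serial protocol: two 0xAA sync bytes, a payload-length byte, the payload, and a one-byte checksum, with raw EEG carried as a 2-byte big-endian signed value under code 0x80. A minimal sketch of the extraction step that could run on the Raspberry Pi (the function and buffer handling are illustrative, not the authors' code) is:

```python
def parse_thinkgear(stream):
    """Extract raw EEG samples (signed 16-bit) from a ThinkGear byte stream."""
    samples, i = [], 0
    while i + 3 < len(stream):
        # Every packet starts with two 0xAA sync bytes.
        if stream[i] != 0xAA or stream[i + 1] != 0xAA:
            i += 1
            continue
        plen = stream[i + 2]
        if i + 3 + plen >= len(stream):
            break                        # incomplete packet at end of buffer
        payload = stream[i + 3:i + 3 + plen]
        checksum = stream[i + 3 + plen]
        if (~sum(payload)) & 0xFF != checksum:
            i += 1                       # bad checksum: resynchronize
            continue
        j = 0
        while j < plen:
            code = payload[j]
            if code >= 0x80:             # multi-byte value: code, length, data
                vlen = payload[j + 1]
                if code == 0x80:         # RAW_WAVE: 2-byte big-endian signed
                    samples.append(int.from_bytes(payload[j + 2:j + 4],
                                                  'big', signed=True))
                j += 2 + vlen
            else:                        # single-byte value: code, data
                j += 2
        i += 4 + plen                    # sync(2) + length(1) + payload + checksum(1)
    return samples
```

The checksum is the bitwise inverse of the low byte of the payload sum, so corrupted Bluetooth packets are simply skipped and the parser resynchronizes on the next 0xAA pair.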
On the Raspberry Pi 3, Python code was used to extract the raw EEG data from the
received packets. Part of the code removed outlier values and eye-blink artifacts
during pre-processing of the data.
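The paper does not give the exact outlier-removal rule; one simple stand-in, assuming blink artifacts appear as large-amplitude excursions, is to clip samples that deviate more than k standard deviations from the mean:

```python
from statistics import mean, stdev

def remove_artifacts(samples, k=3.0):
    """Clip samples deviating more than k standard deviations from the mean,
    a simple stand-in for the outlier and eye-blink artifact removal step."""
    mu, sigma = mean(samples), stdev(samples)
    lo, hi = mu - k * sigma, mu + k * sigma
    return [min(max(s, lo), hi) for s in samples]
```

Clipping (rather than deleting) keeps the sample count and timing intact, which matters for the segmentation step that follows.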
As mentioned in Section I (Introduction), the EEG data requires synchronization
between stimulus and response. As a solution, the participants were asked to imagine
the word after each bell sound. This bell sound, recorded by a microphone with the
same timestamp, was later added as a second channel to the recorded response. The
audio signal was used as a synchronizing pulse for each imagined word.
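The segmentation this enables can be sketched as follows, assuming the indicator channel is a normalized bell-sound envelope and treating each rising edge as the start of a new word sample (the threshold and signal representation are illustrative assumptions):

```python
def segment_by_indicator(eeg, indicator, threshold=0.5):
    """Split the EEG channel into per-word samples, using rising edges in the
    synchronized indicator (bell-sound) channel as segment boundaries."""
    # Indices where the indicator crosses the threshold upward.
    edges = [i for i in range(1, len(indicator))
             if indicator[i] >= threshold > indicator[i - 1]]
    # Each segment runs from one bell to the next (last segment to end of trial).
    return [eeg[start:end]
            for start, end in zip(edges, edges[1:] + [len(eeg)])]
```

Each returned segment is then treated as an independent sample of the imagined word, which is what allows one five- or ten-second trial to yield multiple samples.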
Accordingly, a participant repeats a word twice in five seconds and four times
in 10 seconds. Therefore, the response with the indicator can be segmented and the
segments treated as separate samples. A comparison was made between ‘without
indicator’ and ‘with indicator’; Table 1 lists the performance for five seconds.
The sample size without an indicator is 180 because each complete trial is
considered one sample. For the remainder of this study, the samples were collected
with indicators.
The volunteers' data for a given word was acquired over one-, five-, and
10-second intervals. The one-second samples were too short for the participant to
imagine the word, so they were discarded. For the five- and 10-second trials, the
samples were collected with an indicator. The total number of samples is calculated
as follows:

Sample size = (Number of participants) × (Number of words) × (Number of repetitions per trial)

That is, 18 × 10 × 2 = 360 samples for the five-second interval and
18 × 10 × 4 = 720 samples for the 10-second interval. Of these samples, 80% were
used for training and 20% for testing and validation: 288 training and 72 test
samples for the 5-second interval, and 576 training and 144 test samples for the
10-second interval.
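The sample counts and the 80/20 split above can be reproduced with a few lines of arithmetic:

```python
participants, words = 18, 10

for trial_s, repeats in [(5, 2), (10, 4)]:
    total = participants * words * repeats  # samples per interval
    train = int(total * 0.8)                # 80% for training
    test = total - train                    # 20% for testing/validation
    print(f"{trial_s}-second trial: {total} samples "
          f"({train} train / {test} test)")
# 5-second trial: 360 samples (288 train / 72 test)
# 10-second trial: 720 samples (576 train / 144 test)
```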
The Raspberry Pi 3 was used as a local host on which the received raw data was
preprocessed to remove artifacts and noise and stored in memory in spreadsheet
(Excel) format. The Raspberry Pi 3 was also used to communicate with the cloud
system and to display the final text result received from it. The algorithm and the
GUI that communicates and displays the messages were written in Python.
An AWS message broker client was installed on the Raspberry Pi 3, and an MQTT
publisher and an MQTT subscriber were configured. The MQTT publisher pushed the
pre-processed data to the cloud system, and the MQTT subscriber received the
resulting text from the cloud system. This final text was shown on the Raspberry
Pi 3 display.
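The paper does not specify the MQTT topic names or message format; a minimal sketch of the payload framing on the Raspberry Pi side, with hypothetical topic names, could look like this:

```python
import json
import time

# Hypothetical topic names -- the paper does not list the actual ones.
TOPIC_DATA = "bci/eeg/preprocessed"   # publisher -> cloud
TOPIC_TEXT = "bci/result/text"        # cloud -> subscriber

def make_payload(device_id, samples, fs=512):
    """Serialize a pre-processed EEG window into the JSON string that the MQTT
    publisher would send, e.g. client.publish(TOPIC_DATA, payload)."""
    return json.dumps({
        "device": device_id,
        "fs_hz": fs,
        "timestamp": time.time(),
        "eeg": samples,
    })

def parse_result(message):
    """Decode the classified word received on TOPIC_TEXT by the subscriber."""
    return json.loads(message)["word"]
```

A JSON envelope like this keeps the device ID and sampling rate attached to every window, which the cloud-side Rules engine can then use for routing.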
Fig. 1. Response with no indicator.
Fig. 2. Response with indicator and data after segmentation into separate samples.
Table 1. Comparison of accuracy with respect to the indicator.

Data acquisition method | Sample size | Accuracy (%)
Without indicator       | 180         | 58.4
With indicator          | 360         | 77.3
3.2 Cloud System
The implementation of the proposed system used Amazon Web Services (AWS) as the
cloud system, which executed the code of the machine learning and classification
algorithms. AWS provides various services, such as AWS Lambda, which is a serverless
service, and Amazon SageMaker, which requires a dedicated server to implement the
machine-learning algorithms. These services support various operating system
platforms and programming languages.
The messages received by the AWS message broker were processed and integrated
by the Rules engine, which selects the data from message payloads, processes it,
and forwards it to the AWS Lambda service and Amazon DynamoDB. Amazon DynamoDB is
a multi-master, internet-scale cloud database with built-in security that can
handle more than 20 million requests per second.
In AWS Lambda, the data was pre-processed and prepared for the machine-learning
algorithm. AWS Lambda also contains services related to machine learning and data
classification, and the Lambda service was implemented along with Amazon Kinesis,
a data-streaming service provided by AWS. A convolutional neural network (CNN) was
used as the classification algorithm. Here, the data was first converted from a
one-dimensional data stream to a two-dimensional representation, and this
two-dimensional vector was applied to the CNN in image form.
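The exact 2-D shape is not given in the paper; a sketch of the folding step, assuming the stream is simply reshaped row by row (the width here is arbitrary, for illustration only):

```python
def to_2d(stream, width):
    """Fold a 1-D EEG stream into a 2-D 'image' of shape (len//width, width),
    dropping trailing samples that do not fill a complete row."""
    rows = len(stream) // width
    return [stream[r * width:(r + 1) * width] for r in range(rows)]
```

For example, a five-second window at 512 Hz (2560 samples) could be folded into a 32 × 80 image before being fed to the CNN.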
The input layer was connected to a convolution layer with 20 filters and a 5×5
kernel. One batch-normalization layer was used to speed up training by reducing the
sensitivity to initialization. ReLU was used as the activation function. Two fully
connected layers followed by a softmax layer were employed as the classification
stage for computing the probability of the classes. The classification layer output
a one-dimensional array of size ten, representing the ten classes.
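The layer sizes implied by this description can be checked with some shape arithmetic. The 32 × 80 input shape below is an assumption (the paper does not state the 2-D input shape), as is the 'valid' padding with stride 1:

```python
def conv_out(h, w, k=5, stride=1, pad=0):
    """Output height/width of a square-kernel convolution."""
    return ((h - k + 2 * pad) // stride + 1,
            (w - k + 2 * pad) // stride + 1)

H, W = 32, 80                    # assumed 2-D input shape (not stated in the paper)
FILTERS, KERNEL, CLASSES = 20, 5, 10

h, w = conv_out(H, W, KERNEL)
conv_params = FILTERS * (KERNEL * KERNEL * 1 + 1)  # weights + bias, 1 input channel
flat = FILTERS * h * w                             # features entering the FC layers

print(f"conv output: {FILTERS} x {h} x {w} ({conv_params} parameters)")
print(f"flattened feature vector feeding the fully connected layers: {flat}")
print(f"softmax output size: {CLASSES}")
```

Under these assumptions, the single 5×5 convolution costs only 520 parameters; almost all of the model's capacity sits in the two fully connected layers that follow.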
The final text result of the classifier was forwarded to the Amazon Dynamo Cloud
Database and the MQTT subscriber using the Rules engine.
4. Performance Analysis and Result Discussion
4.1 Maximum Latency
Table 2 compares the latency for different sample sizes. The latency depends on
the throughput of the server and the network capacity.
Table 2. Accuracy and latency results.

Sample size | Accuracy | Latency (s)
720         | 82.1%    | Training = 1.5; test = 0.1283
360         | 77.3%    | Training = 0.7; test = 0.1144
4.2 Different Dataset Size
An increase in dataset size generally increases the accuracy of a CNN system.
A comparison study was performed by varying the dataset size, and Table 2 lists the
accuracy levels. As the table shows, the accuracy increased with increasing dataset
size, but the latency also increased.
4.3 Accuracy and Loss
Fig. 4 presents the classification accuracy versus the number of epochs. It shows
that the accuracy was above 70% at 140 epochs with the smaller training dataset.
By increasing the training set size to 720, the accuracy improved to 82%.
Fig. 5 shows the loss value as a function of the number of epochs. The minimum
error/loss value of 8.2311e-08 was obtained after training the CNN for 144 epochs.
Fig. 3. Block diagram of the proposed IoT-based brain signal classifier.
Fig. 4. Accuracy of the classification with respect to the number of epochs.
Fig. 5. Mean squared error as a function of the number of epochs.
5. Opportunities and Challenges
BCI research is still in its infancy: the maximum number of imagined words
identified so far is six. Classifying more words is the current requirement for
understanding the actual needs of a patient. The main challenge lies in the data
acquisition methods: considerable electrical interference, and even brain signals
responsible for other actions, result in poor accuracy. These challenges open the
door for further research.
6. Conclusion
Thought-to-text conversion using EEG signal analysis is a challenge. An attempt
was made to recognize ten specific words that express the basic needs of paralyzed
people. The brain signals corresponding to the text of these words were mapped with
the help of a deep-learning algorithm; the imagined-speech-to-text mapping problem
was converted into a classification problem in this work. The accuracy of the
mapping provides hope of realizing the proposed system in a practical scenario.
Approximately 82% accuracy was achieved using a single-channel EEG signal without
feature extraction or signal processing. Future work will attempt to improve the
accuracy of the proposed system by applying a signal-processing algorithm before
feeding the data to the classifier.
REFERENCES
[1] Pereira Joana, Sburlea Andreea Ioana, Müller-Putz Gernot R., September 2018,
EEG patterns of self-paced movement imaginations towards externally-cued and
internally-selected targets, Nature Scientific Reports.
[2] Sereshkeh Alborz Rezazadeh, Trott Robert, Bricout Aurelien, Chau Tom, December
2017, EEG Classification of Covert Speech Using Regularized Neural Networks,
IEEE/ACM Transactions on Audio, Speech and Language Processing, Vol. 25, pp. 2292-2300.
[3] Hickok Gregory, February 2012, Computational neuroanatomy of speech production,
Nature Reviews Neuroscience, Vol. 13, pp. 135-145.
[4] Rahman K. A. A., Ibrahim B. S. K. K., Leman A. M., Jamil M. M. A., December 2012,
Fundamental study on brain signal for BCI-FES system development, IEEE-EMBS
Conference on Biomedical Engineering and Sciences, pp. 195-198.
[5] Matsumoto Mariko, Hori Junichi, November 2013, Classification of silent speech
using support vector machine and relevance vector machine, Applied Soft Computing,
Vol. 20, pp. 95-102.
[6] Xu Xiaolong, Liu Qingxiang, Luo Yun, Peng Kai, Zhang Xuyun, Meng Shunmei,
January 2019, A computation offloading method over big data for IoT-enabled
cloud-edge computing, Future Generation Computer Systems, Vol. 95, pp. 522-533.
[7] Jafferson A. Joshua, January 2020, A Review on Machine Learning Mechanisms for
Imagined Speech Classification, Journal of Advanced Research in Dynamical and
Control Systems, Vol. 12, No. 1, pp. 137-142.
[8] Vijayakumar P., Abajieet, Balaji, February 2019, A Palm Vein Recognition System
based on Support Vector Machine, IEIE Transactions on Smart Processing and
Computing, Vol. 8, No. 1.
[9] Ganga Revanth Chowdary, Vijayakumar P., Badrinath Pratyush, Singh Ankit Raj,
Singh Mohit, Drone Control Using EEG Signal, Journal of Advanced Research in
Dynamical and Control Systems, Vol. 11, No. 4, pp. 2107-2113.
[10] Cooney Ciaran, Folli Raffaella, Coyle Damien, October 2018, Neurolinguistics
Research Advancing Development of a Direct-Speech Brain-Computer Interface,
iScience (Cell Press), Vol. 8, pp. 103-125.
[11] Sereshkeh Alborz Rezazadeh, Yousefi Rozhin, Wong Andrew T., November 2018,
Online classification of imagined speech using functional near-infrared
spectroscopy signals, Journal of Neural Engineering, Vol. 16.
[12] Valente Giancarlo, Kaas Amanda L., Formisano Elia, Goebel Rainer, 2019,
Optimizing fMRI experimental design for MVPA-based BCI control: Combining the
strengths of block and event-related designs, NeuroImage, Vol. 186, pp. 369-381.
[13] Brumberg Jonathan S., Krusienski Dean J., Chakrabarti Shreya, Gunduz Aysegul,
Brunner Peter, Ritaccio Anthony L., Schalk Gerwin, November 2016, Spatio-temporal
progression of cortical activity related to continuous overt and covert speech
production in a reading task, PLoS ONE, Vol. 11, pp. 1-21.
[14] Jahangiri Amir, Sepulveda Francisco, 2019, The Relative Contribution of
High-Gamma Linguistic Processing Stages of Word Production, and Motor Imagery of
Articulation in Class Separability of Covert Speech Tasks in EEG Data, Journal of
Medical Systems, Vol. 43.
[15] Wang Li, Zhang Xiong, Zhong Xuefei, Zhang Yu, 2013, Analysis and classification
of speech imagery EEG for BCI, Biomedical Signal Processing and Control, Vol. 8,
pp. 901-908.
[16] Siuly Siuly, Li Yan, Zhang Yanchun, 2016, EEG Signal Analysis and
Classification: Techniques and Applications, Health Information Science.
[17] Martin Stephanie, Brunner Peter, Iturrate Iñaki, Millán José del R., Schalk
Gerwin, Knight Robert T., Pasley Brian N., May 2016, Word pair classification during
imagined speech using direct brain recordings, Nature Scientific Reports, Vol. 6.
Author
A. Joshua Jafferson is currently working as an Assistant Professor in the Department
of Electronics and Communication Engineering at SRM Institute of Science and
Technology, Chennai, Tamil Nadu, India. He is pursuing a PhD in the biomedical
signal processing domain under the guidance of Dr. P. Vijayakumar. Earlier, he
received his Master's in Embedded Systems from SASTRA University (2008), Thanjavur,
Tamil Nadu, India.
Vijayakumar Ponnusamy completed his Ph.D. at SRM IST (2018) in applied machine
learning in wireless communication (cognitive radio), his Master's in Applied
Electronics from the College of Engineering, Guindy (2006), and his B.E. (ECE) from
Madras University (2000). He is a certified "IoT Specialist" and "Data Scientist"
and a recipient of the NI India Academic Award for Excellence in Research (2015).
His current research interests are machine and deep learning, IoT-based intelligent
system design, blockchain technology, and cognitive radio networks. He is a senior
member of IEEE and is currently working as an Associate Professor in the ECE
Department, SRM IST, Chennai, Tamil Nadu, India.
Jovana Jović, MSc, is a junior research assistant and a teaching assistant at the
Faculty of Information Technologies at the Belgrade Metropolitan University, where
she is also a Ph.D. student in Software Engineering. She completed her MSc studies
at the Faculty of Electronic Engineering in Niš, University of Niš, in 2015. She has
been employed at the Belgrade Metropolitan University since 2015, where she is
involved in teaching object-oriented programming, objects and data abstraction,
introduction to software engineering, and software architecture design.
Miroslav Trajanovic, Professor at the Mechanical Engineering Faculty, University of
Nis, Nis, Serbia, has 30 years of experience in the application of IT in mechanical
engineering and education. This experience includes the use of the most popular CAE
programs, writing programs for solving various engineering problems, and educating
students in IT. He is an expert in computer programming, CAD, the finite element
method, and expert systems. He is the author of more than 140 scientific and
professional papers (published up to 2010). He has also taken part in 15 scientific
and professional team projects supported by the Serbian government and industry, and
was the project leader for 10 projects, mainly in IT and mechanical engineering, as
well as two European FP6 and six FP7 projects.