Research Experience
Comparison Between MFCC and PLP in Automatic Speech Recognition (ASR) of Bangla Speech Corpus
Publications
-
Publisher: Przegląd Elektrotechniczny
In this study, the effectiveness of six machine learning and eight deep learning algorithms in analyzing electroencephalogram (EEG) signals for detecting epileptic seizures has been investigated. The study uses the 14 channels of the EMOTIV EPOC+ device, which are placed according to the international 10-20 system. To find the most informative and sensitive channel, each of the 14 channels has been dropped, one at a time. The accuracy of every method was determined on two publicly available datasets: the Guinea-Bissau epilepsy dataset and the Nigeria epilepsy dataset. Among the machine learning models, the SVM classifier performs best, with a maximum accuracy of 83.2% (Guinea-Bissau) and 77% (Nigeria) when no channel is excluded, and no significant performance degradation has been observed for this classifier when a single channel is excluded. Among the deep learning models, the four best-performing models in terms of accuracy are CNN-LSTM (92.5%), IC-RNN (91.8%), ChronoNet (91.1%), and C-DRNN (88.6%). Excluding one channel at a time and examining the effect on these four DL models shows that the most significant and most sensitive channels lie within the frontal and parietal zones. This finding is useful in practice, as it indicates that the electrodes in the frontal and parietal zones should be placed with high precision for accurate diagnosis of the disease. In addition, this study explores the effectiveness of the selected classifiers in detecting seizures when a particular EEG channel fails.
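The channel-exclusion procedure described above can be sketched as a leave-one-channel-out loop. This is a minimal sketch, not the authors' code: `evaluate` is a hypothetical callback standing in for a full train/test run on a channel subset, and the toy evaluator in the usage example uses made-up weights purely to exercise the loop. The channel labels are the standard 14 EMOTIV EPOC+ electrode positions.

```python
from typing import Callable, Dict, List, Sequence

# The 14 EEG channels of the EMOTIV EPOC+ headset (10-20 system labels).
EPOC_CHANNELS = ["AF3", "F7", "F3", "FC5", "T7", "P7", "O1",
                 "O2", "P8", "T8", "FC6", "F4", "F8", "AF4"]


def channel_ablation(
    evaluate: Callable[[Sequence[str]], float],
    channels: Sequence[str] = EPOC_CHANNELS,
) -> Dict[str, float]:
    """Return the accuracy drop caused by removing each channel in turn.

    `evaluate` is a hypothetical callback that trains and tests a classifier
    on the given channel subset and returns its accuracy.
    """
    baseline = evaluate(list(channels))
    drops = {}
    for ch in channels:
        subset = [c for c in channels if c != ch]  # drop exactly one channel
        drops[ch] = baseline - evaluate(subset)
    return drops


def most_sensitive(drops: Dict[str, float], k: int = 3) -> List[str]:
    """The k channels whose removal lowers accuracy the most."""
    return sorted(drops, key=drops.get, reverse=True)[:k]


if __name__ == "__main__":
    # Toy stand-in for a real pipeline: each channel adds a made-up amount
    # of accuracy. Replace with an actual classifier train/test run.
    weights = {c: 0.01 for c in EPOC_CHANNELS}
    weights.update({"F3": 0.05, "P7": 0.04})
    toy = lambda chs: 0.5 + sum(weights[c] for c in chs)
    print(most_sensitive(channel_ablation(toy), 2))  # prints ['F3', 'P7']
```

Because each channel is removed independently, the loop costs one extra training run per channel (14 here) on top of the baseline.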
-
Publisher: IEEE
Seizure detection using electroencephalogram (EEG) signals plays a crucial role in diagnosing and treating epilepsy. However, the cost of deploying EEG-based seizure detection systems limits their widespread adoption, particularly in resource-constrained healthcare settings such as remote areas of countries like Bangladesh. This research aims to foster cost-efficient, fully automated initial seizure diagnosis in remote areas. This paper presents a novel approach to classifier selection in EEG seizure detection systems that improves cost-effectiveness while maintaining high detection accuracy. Two publicly available EEG datasets recorded with a low-cost, consumer-grade 14-channel EEG headset have been chosen to keep data-acquisition cost low and to reduce the complexity of classifier design. Fourteen state-of-the-art classifiers from recent literature have been used to distinguish epileptic individuals from healthy individuals. First, we selected the four most accurate classifiers: ChronoNet (94.6%), CNN-LSTM (92.5%), IC-RNN (91.8%), and C-DRNN (88.6%). We then compared these four models in terms of evaluation time (ET), number of trainable parameters (TP), and Cohen's kappa score, and discarded the more complex CNN-LSTM model because of its very high ET (1.48 s) and TP (about 2.2 million). We have also investigated the sensitivity of the best-performing classifiers to individual channels and found that the most sensitive channels lie in the frontal and parietal zones. After comparing robustness to the failure of any single EEG channel, C-DRNN was found to be the most robust and the fastest model for initial diagnosis with the fewest resources.
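Cohen's kappa, one of the comparison criteria above, measures agreement between true labels and predictions beyond what chance would produce: κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected if predictions were independent of the labels. The sketch below is a plain-Python illustration of that standard definition, not the paper's evaluation code; in practice `sklearn.metrics.cohen_kappa_score` computes the same quantity. The label vectors in the usage note are invented.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Agreement between labels and predictions, corrected for chance."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    # observed agreement: fraction of samples where prediction == label
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # expected agreement if predictions were independent of the true labels
    p_e = sum(true_counts[c] * pred_counts[c] for c in true_counts) / n**2
    if p_e == 1.0:
        # a single class everywhere: kappa is undefined; this sketch returns 1.0
        return 1.0
    return (p_o - p_e) / (1 - p_e)
```

For example, with `y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]` and `y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]`, accuracy is 0.8 but kappa is 7/12 ≈ 0.583, because 0.52 of the agreement would be expected by chance from the class balance alone.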
-
Publisher: MAT Journals
Electroencephalograms (EEGs) are commonly used to diagnose brain conditions such as epilepsy, and deep learning algorithms have proven effective in analyzing EEG signals for detecting epileptic seizures. In this study, one machine learning model (Random Forest Tree, RFT), two deep learning models (Convolutional Neural Network, CNN, and ChronoNet), and two transformer models (Vision Transformer, ViT, and Swin Transformer, ST) have been proposed to distinguish epileptic individuals from healthy individuals. In a Random Forest Tree, features are chosen randomly, each tree is trained on different data, and the predictions of all trees are combined to obtain the final prediction. ChronoNet is a recurrent architecture popular for modeling time-series data. ViT is a neural network architecture designed for image recognition tasks: unlike traditional CNNs, which extract features through stacked convolutional layers with local receptive fields, ViTs use self-attention mechanisms to learn relevant image features directly from patches of raw pixel values. The Swin Transformer is a ViT variant that processes the image hierarchically. ChronoNet performed best among the models, with accuracies of 94% and 88.7% on the Guinea-Bissau and Nigeria datasets, respectively. The transformer models achieved poor accuracy in this study because, owing to limited resources, only the first 3 of the 14 channels of the 10-20 system were used to create images from the signals. The Vision Transformer classified epileptic patients and healthy individuals more accurately than the Swin Transformer. The performance of the RFT and CNN models was also satisfactory. All models were trained and tested on publicly available data.
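The Random Forest mechanism described above (random feature choice, trees trained on different data, predictions combined) can be illustrated with a deliberately simplified ensemble. This is a sketch under stated assumptions, not the study's model: each "tree" is a depth-1 decision stump, each stump sees a bootstrap sample of rows and a random subset of features, and the forest predicts by majority vote. Full implementations such as scikit-learn's `RandomForestClassifier` grow deep trees and resample features at every split. The toy data in the usage note are invented.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# A depth-1 "tree": (feature index, threshold, label if <= threshold, label if > threshold)
Stump = Tuple[int, float, int, int]


def majority(labels: Sequence[int], default: int) -> int:
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default


def fit_stump(X: List[List[float]], y: List[int], features: Sequence[int]) -> Stump:
    """Pick the single split with the fewest training errors among `features`."""
    fallback = majority(y, 0)
    best, best_err = None, float("inf")
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= thr]
            right = [lab for row, lab in zip(X, y) if row[f] > thr]
            l_lab, r_lab = majority(left, fallback), majority(right, fallback)
            err = sum(lab != l_lab for lab in left) + sum(lab != r_lab for lab in right)
            if err < best_err:
                best, best_err = (f, thr, l_lab, r_lab), err
    return best


def fit_forest(X, y, n_trees: int = 25, n_features: int = 1, seed: int = 0) -> List[Stump]:
    """Train each stump on a bootstrap sample using a random subset of features."""
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    forest = []
    for _ in range(n_trees):
        rows = [rng.randrange(n) for _ in range(n)]   # sample rows with replacement
        feats = rng.sample(range(d), n_features)      # random feature subset
        forest.append(fit_stump([X[i] for i in rows], [y[i] for i in rows], feats))
    return forest


def predict(forest: List[Stump], row: Sequence[float]) -> int:
    """Combine the stumps' predictions by majority vote."""
    votes = [l_lab if row[f] <= thr else r_lab for f, thr, l_lab, r_lab in forest]
    return majority(votes, 0)
```

Usage: with `X = [[float(i), float(i)] for i in range(10)]` and `y = [0] * 5 + [1] * 5`, `predict(fit_forest(X, y), [9.0, 9.0])` returns 1. Bootstrapping and random feature subsets decorrelate the individual trees, which is what lets the vote reduce variance.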
-
Publisher: MECS Press
In this paper, the bit error rate (BER) performance of SFBC-OFDM systems over frequency-selective fading channels is examined for various antenna configurations and modulation schemes. The objective is to find a suitable configuration, with the minimum number of receiving antennas, that requires the minimum received signal power to provide reliable voice and video communication. Both M-ary phase shift keying (MPSK) and M-ary quadrature amplitude modulation (MQAM) are considered in the performance analysis, under both perfect and imperfect channel state information (CSI). The authors express the BER under imperfect channel estimation as a function of the BER under perfect channel conditions. The findings show that, for a base transceiver station (BTS) with 4 transmitting antennas and a mobile station (MS) with 2 receiving antennas, BPSK performs best for both perfect and imperfect CSI. The maximum permissible channel estimation error increases as more receiving antennas are used, at the expense of increased cost.
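To give a feel for why extra receiving antennas help, the sketch below evaluates two textbook BPSK error-rate formulas. It is an idealised illustration, not the paper's imperfect-CSI expression: it assumes perfect CSI, i.i.d. Rayleigh fading, an orthogonal block code achieving full diversity order L = n_tx × n_rx, and total transmit power split equally across the n_tx antennas. Under those assumptions the BER equals that of L-branch maximal-ratio combining, P_b = p^L Σ_{k=0}^{L−1} C(L−1+k, k) (1−p)^k with p = (1 − μ)/2 and μ = √(γ/(1+γ)), where γ is the average SNR per branch.

```python
import math


def ber_bpsk_awgn(snr_db: float) -> float:
    """BPSK bit error rate on an AWGN channel: 0.5 * erfc(sqrt(Eb/N0))."""
    g = 10 ** (snr_db / 10)
    return 0.5 * math.erfc(math.sqrt(g))


def ber_bpsk_rayleigh_diversity(snr_db: float, n_tx: int, n_rx: int) -> float:
    """Idealised BPSK BER with full diversity L = n_tx * n_rx over i.i.d. Rayleigh fading.

    Assumes perfect CSI, an orthogonal space(-frequency) block code, and the
    total transmit power split equally across the n_tx antennas.
    """
    L = n_tx * n_rx
    g = 10 ** (snr_db / 10) / n_tx          # average SNR per diversity branch
    mu = math.sqrt(g / (1 + g))
    p = (1 - mu) / 2
    return p ** L * sum(math.comb(L - 1 + k, k) * (1 - p) ** k for k in range(L))


if __name__ == "__main__":
    for n_rx in (1, 2):
        print(f"4x{n_rx} at 10 dB: BER = {ber_bpsk_rayleigh_diversity(10.0, 4, n_rx):.2e}")
```

With one antenna at each end the formula reduces to the familiar 0.5(1 − √(γ/(1+γ))). Doubling n_rx doubles the diversity order, which makes the BER curve fall much more steeply with SNR; this steeper slope is one intuitive reason the system can tolerate a larger channel estimation error as receiving antennas are added.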
-
Publisher: MECS Press
In this work, a 5-state left-to-right HMM-based Bangla isolated-word speech recognizer has been developed. To train and test the recognizer, a small corpus has been built at various sampling frequencies in both noisy and noiseless environments. The number of filters in the filter bank is varied during the feature extraction phase for both MFCC and PLP, and the effects of the 2nd and 3rd differential coefficients have also been observed. Experimental results show that MFCC-based feature extraction performs better in the CLASSROOM environment, whereas PLP-based feature extraction performs better both in a noiseless environment and when AC or FAN noise is present. We have also found that a higher sampling frequency and a higher filter order do not always improve performance.
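The recognizer described above scores an utterance against one left-to-right HMM per word and picks the best-scoring word. The sketch below shows that decision rule with the forward algorithm in log space. It is a simplification under stated assumptions, not the authors' recognizer: observations are discrete symbols (a real MFCC/PLP system uses continuous, typically Gaussian-mixture, emission densities), the self-loop probability `p_stay` is arbitrary, paths must start in the first state and end in the last, and the word names `"ek"` and `"dui"` in the usage note are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Matrix = List[List[float]]


def left_to_right_transitions(n_states: int = 5, p_stay: float = 0.6) -> Matrix:
    """Each state can only repeat itself or advance to the next state."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = p_stay
        A[i][i + 1] = 1.0 - p_stay
    A[-1][-1] = 1.0  # the last state loops on itself
    return A


def _log(p: float) -> float:
    return math.log(p) if p > 0 else -math.inf


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    if m == -math.inf:
        return -math.inf
    return m + math.log(sum(math.exp(x - m) for x in xs))


def forward_log_likelihood(obs: Sequence[int], A: Matrix, B: Matrix) -> float:
    """log P(obs | model) by the forward algorithm.

    B[i][o] is the probability that state i emits discrete symbol o. The path
    must start in the first state and end in the last one.
    """
    n = len(A)
    alpha = [_log(B[0][obs[0]]) if i == 0 else -math.inf for i in range(n)]
    for o in obs[1:]:
        alpha = [
            _logsumexp([alpha[i] + _log(A[i][j]) for i in range(n)]) + _log(B[j][o])
            for j in range(n)
        ]
    return alpha[-1]


def recognize(obs: Sequence[int], models: Dict[str, Tuple[Matrix, Matrix]]) -> str:
    """Isolated-word decision: the word whose model gives the highest likelihood."""
    return max(models, key=lambda w: forward_log_likelihood(obs, *models[w]))
```

Usage: build `A = left_to_right_transitions()`, give the hypothetical word `"ek"` emissions `[[0.9, 0.1]] * 5` and `"dui"` emissions `[[0.1, 0.9]] * 5`; then `recognize([0] * 6, {"ek": (A, B_ek), "dui": (A, B_dui)})` returns `"ek"`. Working in log space avoids numerical underflow on long utterances.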
-
Publisher: IEEE
This paper examines the effects of different coefficient sets for the MFCC and PLP feature extraction techniques on a Bangla speech corpus. We first observed the effect of 12 coefficients for every 10 ms frame, and then added the delta and acceleration coefficients to obtain 24- and 36-coefficient vectors per frame, respectively. We then observed the effect of appending the power coefficient and its first and second derivatives, yielding a 39-coefficient feature vector per frame. In addition, we appended 13 third differential coefficients to form a 52-coefficient vector per frame in order to observe the effect of third differential coefficients as well. The experimental results show that, for gender-unbiased models, adding delta coefficients gave the highest detection rate for both speaker-dependent and speaker-independent systems. For speaker-independent gender-biased models, however, adding acceleration, power, and third differential coefficients increased detection for both MFCC and PLP on noise-free audio samples recorded at a sampling rate of 44.1 kHz.
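The coefficient counts above follow from stacking time derivatives onto the static features: 12 cepstra give 24 with deltas and 36 with acceleration; 12 cepstra plus power (13 values) give 39 with first and second derivatives and 52 with a third derivative. The sketch below uses the common regression formula d_t = Σ_{n=1}^{N} n (c_{t+n} − c_{t−n}) / (2 Σ_{n=1}^{N} n²), with edge frames repeated at the boundaries. The window N = 2 and the edge handling are assumptions; the abstract does not state which derivative formula was used.

```python
from typing import List

Frame = List[float]


def deltas(frames: List[Frame], N: int = 2) -> List[Frame]:
    """Regression-based time derivative of a sequence of feature vectors.

    d_t = sum_{n=1..N} n * (c[t+n] - c[t-n]) / (2 * sum_{n=1..N} n^2),
    where frames beyond either end are replaced by the first/last frame.
    """
    T = len(frames)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        d = [0.0] * len(frames[t])
        for n in range(1, N + 1):
            fwd = frames[min(t + n, T - 1)]
            bwd = frames[max(t - n, 0)]
            for k in range(len(d)):
                d[k] += n * (fwd[k] - bwd[k])
        out.append([v / denom for v in d])
    return out


def stack_with_derivatives(static: List[Frame], order: int) -> List[Frame]:
    """Append derivatives up to `order` (1 = delta, 2 = acceleration, 3 = third)."""
    blocks = [static]
    for _ in range(order):
        blocks.append(deltas(blocks[-1]))  # each order differentiates the previous one
    return [sum((b[t] for b in blocks), []) for t in range(len(static))]
```

For 13 static values per frame, `stack_with_derivatives(static, 2)` yields 39 values per frame and `order=3` yields 52. On a linear ramp c_t = t, the interior deltas are exactly 1, which is a quick sanity check on the normalisation.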
-
Publisher: IEEE
This paper shows that different environmental noises and sampling frequencies severely affect the performance of MFCC- and PLP-based Bangla isolated-word recognition systems. We have examined the effects of different environments on MFCC and PLP with 39 and 52 coefficients at sampling frequencies of 8 kHz, 16 kHz, 32 kHz, and 44.1 kHz. The experimental results show that, across sampling frequencies and in both noiseless and noisy media, PLP models detect better than MFCC models, except in the CLASSROOM environment, where different types of noise are present simultaneously.
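One concrete way the sampling frequency enters MFCC extraction is through the filter bank: the triangular filters are usually spaced evenly on the mel scale up to the Nyquist frequency fs/2, so the same number of filters covers a wider band, more coarsely, at 44.1 kHz than at 8 kHz. The sketch below uses the widely used mel mapping m = 2595 log10(1 + f/700). The filter count of 26 in the usage note is an arbitrary assumption, and PLP uses a Bark-scale critical-band filter bank instead, which this sketch does not cover.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    """Mel value of a frequency in Hz (2595 * log10(1 + f / 700))."""
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10 ** (m / 2595.0) - 1.0)


def mel_filter_centres(n_filters: int, fs: float, f_low: float = 0.0) -> List[float]:
    """Centre frequencies (Hz) of triangular filters spaced evenly on the mel scale up to fs/2."""
    lo, hi = hz_to_mel(f_low), hz_to_mel(fs / 2.0)
    step = (hi - lo) / (n_filters + 1)  # n filters need n + 2 edge points
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]


if __name__ == "__main__":
    for fs in (8000, 16000, 32000, 44100):
        centres = mel_filter_centres(26, fs)
        print(fs, [round(c) for c in centres[:3]], round(centres[-1]))
```

Printing the first and last centres for each rate shows how the low-frequency filters stay relatively narrow while the top of the bank stretches toward fs/2, which is one reason a higher sampling frequency does not automatically add useful detail for speech.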
-
Publisher: ELSEVIER
In terms of mechanical flexibility, organic SRAM offers better designs and a commercially feasible option that can deliver acceptable performance. This paper investigates the implementation of different SRAM topologies based on organic thin-film transistors (OTFTs). A compact SPICE model is used to simulate pOTFTs and nOTFTs in LTspice. Time delay, power consumption, power-delay product (PDP), and static noise margin (SNM) are calculated for read and write operations, and a comparative analysis of OTFT-based 6T, 7T, 8T, and 9T SRAM topologies is performed. Among these topologies, the 9T OTFT SRAM cell achieves a 1.67× increase in SNM compared with the conventional 6T OTFT-based SRAM cell. The 9T SRAM cell also has the highest figure of merit, which indicates its suitability for various applications.
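The metrics above combine by simple arithmetic: the power-delay product is average power multiplied by delay (energy per operation), and an improvement such as "1.67× SNM" is a ratio against the 6T reference cell. The small helper below only illustrates that bookkeeping; the class and field names are hypothetical, the numbers in the usage note are made up and are not the paper's results, and the paper's figure-of-merit formula is not given in the abstract, so it is deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class SramMetrics:
    """Hypothetical container for one SRAM cell's simulated metrics."""
    name: str
    delay_s: float   # read or write delay, seconds
    power_w: float   # average power during the operation, watts
    snm_v: float     # static noise margin, volts

    @property
    def pdp_j(self) -> float:
        """Power-delay product (energy per operation) in joules."""
        return self.power_w * self.delay_s


def snm_ratio(cell: SramMetrics, reference: SramMetrics) -> float:
    """How many times larger the cell's SNM is than the reference cell's."""
    return cell.snm_v / reference.snm_v
```

Usage with illustrative values: `ref = SramMetrics("6T", 2e-6, 5e-6, 0.3)` has `ref.pdp_j` of 1e-11 J, and a cell with `snm_v=0.5` gives `snm_ratio(cell, ref)` ≈ 1.67. A lower PDP means less energy per access, while a higher SNM means better tolerance to noise, so the two are usually reported together.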
-
Publisher: World Scientific
This paper presents a performance analysis of indium-gallium-zinc-oxide (IGZO)- and pentacene-based top-gate-top-contact (TGTC) and bottom-gate-top-contact (BGTC) thin-film transistors (TFTs). Extensive simulations have been performed to assess performance in terms of threshold voltage, subthreshold slope, on-off current ratio, mobility, and figure of merit (FoM). Results indicate a trade-off between mobility and current ratio with respect to the permittivity of the dielectric layer, with tantalum oxide (Ta2O5) providing the optimum FoM. The mobility of IGZO is significantly higher than that of pentacene in both structures, whereas the current ratio of IGZO is higher than that of pentacene in the BGTC configuration. Comparing the structural configurations, the Ta2O5-IGZO-based BGTC achieves 5.92× better mobility and 41.8× better current ratio than the TGTC structure. The threshold voltage of the IGZO-based TFT is observed to increase with dielectric permittivity in the TGTC configuration but to decrease in the BGTC configuration. Meanwhile, increasing the oxide and active layer thicknesses decreases the threshold voltage. Moreover, both mobility and current ratio improve as the oxide or active layer thickness decreases. A maximum mobility of 32.30 cm²/V·s and a maximum current ratio of 7.54 × 10⁸ are achieved for the Ta2O5-IGZO-based BGTC TFT with a 10 μm channel thickness and a 5 μm oxide thickness.
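The abstract does not say how mobility and threshold voltage were extracted from the simulations. A common approach, shown here only as an assumed illustration, uses the saturation-region square law I_D = (W / 2L) μ C_i (V_GS − V_th)², where C_i = ε0 εr / t_ox is the gate capacitance per unit area. Plotting √I_D against V_GS gives a straight line whose slope s yields μ = 2L s² / (W C_i) and whose x-intercept is V_th. The C_i expression also shows why the dielectric permittivity and oxide thickness appear in the results. The sketch assumes an n-type device (such as IGZO) and above-threshold points only; for p-type pentacene one would use |I_D| and negative gate voltages. All device dimensions and values in the usage note are hypothetical.

```python
import math
from typing import Sequence, Tuple

EPS0 = 8.854e-12  # vacuum permittivity, F/m


def gate_capacitance(eps_r: float, t_ox_m: float) -> float:
    """Gate-dielectric capacitance per unit area, C_i = eps0 * eps_r / t_ox (F/m^2)."""
    return EPS0 * eps_r / t_ox_m


def extract_saturation_params(
    vgs: Sequence[float], id_sat: Sequence[float], W: float, L: float, c_i: float
) -> Tuple[float, float]:
    """Least-squares fit of sqrt(I_D) = s * (V_GS - V_th) for an n-type TFT.

    Pass only above-threshold, saturation-region points.
    Returns (mobility in m^2/(V*s), threshold voltage in V).
    """
    pts = [(v, math.sqrt(i)) for v, i in zip(vgs, id_sat)]
    n = len(pts)
    mx = sum(v for v, _ in pts) / n
    my = sum(r for _, r in pts) / n
    slope = sum((v - mx) * (r - my) for v, r in pts) / sum((v - mx) ** 2 for v, _ in pts)
    intercept = my - slope * mx
    mobility = 2 * L * slope ** 2 / (W * c_i)
    v_th = -intercept / slope  # where the fitted line crosses sqrt(I_D) = 0
    return mobility, v_th


def on_off_ratio(id_values: Sequence[float]) -> float:
    """Ratio of the largest to the smallest drain current in a transfer sweep."""
    return max(id_values) / min(id_values)
```

Usage with synthetic data: generate `ids` from the square law with W = 100 μm, L = 10 μm, εr = 25, t_ox = 100 nm, μ = 30 cm²/V·s (3e-3 m²/V·s) and V_th = 1 V, then `extract_saturation_params` recovers both values; multiply the returned mobility by 1e4 to express it in cm²/V·s, the unit used in the abstract.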