Abstract:
Tiny Machine Learning (TinyML) is a rapidly growing field that aims to bring machine learning to resource-constrained embedded systems such as microcontrollers. These devices have limited processing power, memory, and energy, which makes it challenging to deploy traditional machine learning models designed to run on powerful servers with large amounts of memory and processing capabilities. To address this challenge, TinyML models are highly optimized and compressed, using techniques such as quantization, pruning, and weight sharing to reduce their memory footprint and increase their computational efficiency. This allows intelligent applications to run on devices that were previously incapable of running complex algorithms.
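To make the core idea concrete, the following is a minimal, illustrative sketch of per-tensor symmetric 8-bit quantization, one of the compression techniques named above. It is not the paper's actual pipeline (which would typically use a framework such as TensorFlow Lite); the function names and the per-tensor scaling scheme are assumptions for illustration only.

```python
def quantize_int8(weights):
    """Illustrative per-tensor symmetric quantization of float weights to int8.

    Maps each float weight w to round(w / scale), clipped to [-128, 127],
    where scale is chosen so the largest-magnitude weight maps to +/-127.
    Returns the int8 values and the scale needed to dequantize.
    """
    max_abs = max(abs(w) for w in weights) or 1.0
    scale = max_abs / 127.0
    q = [max(-128, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.03, 0.88, -0.51]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

# Storage drops from 4 bytes per float32 weight to 1 byte per int8
# weight, i.e. roughly a 75% reduction for the weights alone; larger
# savings reported for TinyML models also come from pruning and
# weight sharing.
fp32_bytes = len(weights) * 4
int8_bytes = len(q) * 1
```

The worst-case rounding error of this scheme is half a quantization step (scale / 2) per weight, which is why small-vocabulary models can often be quantized with little accuracy loss.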
This study investigates the impact of model compression techniques on the performance of four deep learning models (Convolutional Neural Networks, Long Short-Term Memory, Gated Recurrent Units, and Bidirectional Long Short-Term Memory) for a limited-vocabulary speech processing task in Arabic, specifically the Levantine dialect. We evaluate the effectiveness of these techniques in reducing the memory footprint of the models, improving their accuracy and performance, and minimizing inference time and energy consumption. To assess real-world performance, we deploy the optimized models on two distinct edge devices that represent different resource-constrained environments: one has limited processing power and memory, while the other has relatively more computational resources but is still constrained by limited memory. By analyzing the performance of our models on these two devices, we gain insights into the effectiveness of different compression techniques for TinyML models and their suitability for deployment on edge devices.
Our experiments demonstrate the efficacy of model compression techniques, reducing the memory footprint of deep learning models by up to 89% while maintaining an accuracy of over 97%. Moreover, the optimized models reduce inference time and energy consumption by 99%, making them highly suitable for deployment on resource-constrained edge devices. They achieve real-time performance for limited-vocabulary speech recognition, with an average inference time of less than 500 ms on both edge devices.
Overall, this study highlights the potential of model compression techniques for developing efficient TinyML models that can be deployed on resource-constrained edge devices. The significant reduction in memory footprint, inference time, and energy consumption of our optimized models demonstrates their practicality for real-world applications. The efficient and accurate speech processing techniques developed in this study can improve Arabic speech applications, for example by increasing the accuracy of Arabic speech recognition and enabling efficient speech-to-text translation, and their deployment on resource-constrained edge devices can facilitate the development of Arabic speech applications in domains such as healthcare, education, and business. Furthermore, the optimized models can enhance communication and accessibility for Arabic speakers with speech impairments or disabilities. This study thus provides valuable insights into the development of efficient and practical TinyML-based Arabic speech processing models.