
Research on voice control of a wheeled mobile robot combining CTC and Transformer

Authors: TANG Xianrong, GAO Ruizhen

Affiliation: School of Mechanical and Equipment Engineering, Hebei University of Engineering, Handan 056038, Hebei, China


Keywords: wheeled mobile robot; voice control; Transformer; connectionist temporal classification


Abstract:

To address the problem that manually controlling a wheeled mobile robot during human-computer interaction involves cumbersome steps and ties up the user's hands, a voice control system for a wheeled mobile robot based on a deep learning algorithm is proposed and implemented. The system uses a Raspberry Pi 4B development board as the main controller, an iFLYTEK six-microphone array module as the voice collector, and an STM32 microcontroller as the controller of the underlying wheeled mobile robot. For the speech recognition algorithm, an end-to-end model based on the Transformer is designed, and a connectionist temporal classification (CTC) loss is added to assist training, which accelerates the model's convergence and improves its robustness. The model achieves a character error rate (CER) of 5.57% on the AISHELL-1 speech dataset, a relative reduction of 5.1% compared with training the Transformer alone. Platform construction and experiments confirm that the wheeled mobile robot performs the corresponding action in response to the user's voice command, which helps improve work efficiency and frees the user's hands.
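
The joint objective interpolates the CTC loss computed on the encoder output with the cross-entropy loss of the attention decoder, L = λ·L_CTC + (1 − λ)·L_att; the CTC branch enforces a monotonic frame-to-label alignment, which is what speeds up and stabilizes training of the attention decoder. (A relative reduction of 5.1% from a CER of 5.57% implies a standalone-Transformer baseline of roughly 5.57 / (1 − 0.051) ≈ 5.87%.) Below is a minimal PyTorch sketch of such a joint loss; the interpolation weight, tensor shapes, and all names are illustrative assumptions, not taken from the paper.

```python
# Minimal sketch of a joint CTC/attention training objective for a
# Transformer ASR model. All names, shapes, and the 0.3 weight are
# illustrative assumptions, not the paper's actual configuration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointCTCAttentionLoss(nn.Module):
    """L = ctc_weight * L_ctc + (1 - ctc_weight) * L_att."""

    def __init__(self, blank_id: int, pad_id: int, ctc_weight: float = 0.3):
        super().__init__()
        self.ctc_loss = nn.CTCLoss(blank=blank_id, zero_infinity=True)
        self.att_loss = nn.CrossEntropyLoss(ignore_index=pad_id)
        self.ctc_weight = ctc_weight

    def forward(self, ctc_log_probs, att_logits, targets,
                input_lengths, target_lengths):
        # ctc_log_probs: (T, N, V) log-softmax over encoder frames
        # att_logits:    (N, U, V) decoder logits, one step per target token
        # targets:       (N, U) label sequences (reused for both branches
        #                here for brevity; real systems shift decoder targets)
        l_ctc = self.ctc_loss(ctc_log_probs, targets,
                              input_lengths, target_lengths)
        l_att = self.att_loss(att_logits.transpose(1, 2), targets)
        return self.ctc_weight * l_ctc + (1.0 - self.ctc_weight) * l_att

if __name__ == "__main__":
    T, N, U, V = 50, 2, 10, 4000  # frames, batch, labels, vocab (illustrative)
    loss_fn = JointCTCAttentionLoss(blank_id=0, pad_id=-100)
    ctc_lp = F.log_softmax(torch.randn(T, N, V), dim=-1)
    att_logits = torch.randn(N, U, V)
    targets = torch.randint(1, V, (N, U))  # index 0 is reserved for blank
    loss = loss_fn(ctc_lp, att_logits, targets,
                   input_lengths=torch.full((N,), T),
                   target_lengths=torch.full((N,), U))
    print(loss.item())
```

In hybrid systems the same interpolation is often reused at decoding time to rescore beam-search hypotheses with CTC prefix scores; the sketch above covers only the training loss.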


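On the control side, the recognized transcript produced on the Raspberry Pi has to be turned into a motion command for the STM32 that drives the wheels. The paper's actual protocol is not given here, so the following Python sketch is purely illustrative: the serial device path, baud rate, keyword table, and command bytes are all assumptions.

```python
# Hypothetical glue logic on the Raspberry Pi: map a recognized transcript
# to a one-byte motion command and send it to the STM32 over a serial link.
# Device path, baud rate, keywords, and command bytes are assumptions.
import serial  # provided by the pyserial package

COMMANDS = {        # recognized keyword -> command byte for the STM32
    "forward":  b"\x01",
    "backward": b"\x02",
    "left":     b"\x03",
    "right":    b"\x04",
    "stop":     b"\x00",
}

def dispatch(transcript: str, port: serial.Serial) -> bool:
    """Send the first motion command whose keyword occurs in the transcript."""
    for keyword, code in COMMANDS.items():
        if keyword in transcript:
            port.write(code)
            return True
    return False  # no known keyword; the robot keeps its current state

if __name__ == "__main__":
    with serial.Serial("/dev/ttyUSB0", baudrate=115200, timeout=1) as stm32:
        dispatch("please move forward", stm32)
```

Keeping the keyword-to-byte table on the Pi means new voice commands can be added without reflashing the STM32, as long as the firmware's command set stays stable.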
2024, 50(6): 117-123    Received: 2022-04-07; revised manuscript received: 2022-06-17
Funding: Science and Technology Research Project of Colleges and Universities in Hebei Province (ZD2018207)
About the author: TANG Xianrong (b. 1998), male, a native of Nantong, Jiangsu, is a master's student specializing in speech recognition and deep learning.