Note: This tutorial assumes that you have completed the previous tutorials: SLAM Map Building with xbot.

Please ask about problems and questions regarding this tutorial on answers.ros.org. Don't forget to include in your question the link to this page, the versions of your OS & ROS, and also add appropriate tags.

Baidu Speech AND Recognition

Description: This tutorial shows how to convert text to speech (TTS) and speech to text (speech recognition) using Baidu's Speech API.

Keywords: navigation

Tutorial Level: BEGINNER

Contents

Package Summary
Overview
Hardware Requirements
Launch Example
Node
Baidu speech Home page
What Next?

See simple_voice on index.ros.org for more info including anything ROS 2 related.

Package Summary

The simple_voice package

Maintainer status: maintained
Maintainer: zhihao <zhihao AT iscas.ac DOT cn>
Author: zhihao <zhihao AT iscas.ac DOT cn>
License: TODO
Source: git https://github.com/DinnerHowe/simple_voice.git (branch: 0.1.1)

Package Summary

A package for speech recognition and text to speech (TTS) using Baidu's Speech API. To customize its behavior, change the parameters listed in the Parameters section.

Maintainer status: maintained
Maintainer: Howe
Download: https://github.com/DinnerHowe/baidu_speech.git
Author: Howe
License: TODO

Overview

Before starting this tutorial, make sure the two Python libraries pyaudio and vlc are installed.
This package provides a Baidu speech solution for ROS in both Chinese and English.
It was tested and runs well on Ubuntu 14.04 on a ThinkPad T440.
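
A quick way to confirm that both libraries are importable before launching anything is a small check script. This is only a sketch: the helper name missing_modules is made up for this example, and it assumes the vlc module is provided by the python-vlc bindings.

```python
import importlib


def missing_modules(names=("pyaudio", "vlc")):
    """Return the module names that cannot be imported by this interpreter."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # ImportError means the module is absent; any other error usually
            # means a native library failed to load, which is just as unusable.
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing Python modules: " + ", ".join(absent))
    else:
        print("pyaudio and vlc are both importable")
```

Run it with the same Python interpreter that ROS uses, since a library installed for a different interpreter will still show up as missing.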

Hardware Requirements

A laptop or other hardware that can run the ROS core and vlc. A microphone and a speaker are also needed.

Launch Example

- To make the computer speak (Text To Speech):
```
   roslaunch simple_voice simple_speaker.launch
```
- To make the computer understand what you say (Speech Recognition):
```
   roslaunch simple_voice simple_speaker.launch
```

Node

There are three nodes in this package: node_main.py, simple_speek.py and voice_node.py.

node_main.py is a TTS (Text To Speech) demo that works with a laser scanner: when the laser detects an obstacle, node_main triggers simple_speek.py to speak a phrase such as 'excuse me' or 'make a way for me pls'.
simple_speek.py subscribes to a std_msgs/String message and speaks its text aloud.
voice_node.py recognizes what you say within 5 seconds and prints the result in the terminal.

Subscribed Topic

TTS(simple_speek.py):
- /speak_string (std_msgs/String)
  - The text message for the machine to speak.
Demo(node_main.py):
- /SpeakerSubTopic (std_msgs/String)
  - Used to trigger TTS; feel free to rename it. When you send 'stop' to this topic, node_main triggers TTS to speak the words you assigned to it.
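
To illustrate the /speak_string interface, here is a minimal sketch of a helper that publishes one sentence for simple_speek.py to speak. The function name speak, the node name speak_once and the one-second waits are assumptions made for this example and are not part of the package. It also assumes that roscore and the TTS node are already running.

```python
def speak(text, topic="/speak_string", wait=1.0):
    """Publish `text` once as a std_msgs/String on `topic`."""
    # Imported inside the function so this file can still be loaded
    # on a machine where ROS is not installed.
    import rospy
    from std_msgs.msg import String

    rospy.init_node("speak_once", anonymous=True)  # hypothetical node name
    pub = rospy.Publisher(topic, String, queue_size=1)
    rospy.sleep(wait)   # give the TTS node time to connect to the new publisher
    pub.publish(String(data=text))
    rospy.sleep(wait)   # let the message leave before the process exits
```

Calling speak('excuse me') from a Python shell with the ROS environment sourced should make the TTS node speak the phrase. The same helper could send 'stop' to the demo's trigger topic by passing that topic name as the second argument.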

Published Topic

Demo(node_main.py):
- /speak_string (std_msgs/String)
  - The text message for the machine to speak; subscribed to by the TTS node.
Speech Recognition(voice_node.py):
- /Rog_result (std_msgs/String)
  - Recognition is currently triggered by pressing ENTER on the keyboard. You could associate it with node_main, as was done for the TTS node; the author may add this later.

Parameters

TTS

Gender:
- The gender of the speaker's voice, default 'women'; supports 'man' and 'women'.
CTP:
- Client terminal type, default 1 (website).
LAN:
- The language to speak, default 'zh'; supports Chinese ('zh'), Cantonese ('ct') and English ('en').
USER_ID:
- Baidu USER_ID, default '8168466'; feel free to change it to your own.
SPEED:
- Speaking speed, default 5; supports 0~9.
PIT:
- Intonation, default 5; supports 0~9.
VOL:
- Volume, default 5; supports 0~9.
Api_Key:
- Baidu Api_Key, default "pmUzrWcsA3Ce7RB5rSqsvQt2"; feel free to change it to your own.
Secrect_Key:
- Baidu Secrect_Key, default "d39ec848d016a8474c7c25e308b310c3"; feel free to change it to your own.
Grant_type:
- Baidu Grant_type, default "client_credentials"; feel free to change it to your own.
Token_url:
- Baidu Token_url, default 'https://openapi.baidu.com/oauth/2.0/token'; feel free to change it to your own.
Speeker_url:
- Baidu speak URL, default 'http://tsn.baidu.com/text2audio'.
FORMAT:
- The type of voice file to generate, default 'mp3'.
ResponseSensitivity:
- The speaker's response sensitivity, default 0.2.
WorkSpaces:
- Your code workspace.
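
The following sketch shows how the TTS parameters above might be combined into the two HTTP requests: one to fetch an access token and one to request the audio. It is not the package's actual code. The query field names (tex, lan, tok, ctp, cuid, spd, pit, vol, per) follow Baidu's REST speech API as the editor understands it, and the mapping of Gender to per (0 for the female voice, 1 for the male voice) is an assumption. The function names are hypothetical and the key values are placeholders. No network request is made; the functions only build URLs.

```python
try:
    from urllib.parse import urlencode  # Python 3
except ImportError:
    from urllib import urlencode        # Python 2

TOKEN_URL = "https://openapi.baidu.com/oauth/2.0/token"   # Token_url default
SPEAKER_URL = "http://tsn.baidu.com/text2audio"           # Speeker_url default


def token_request_url(api_key, secret_key, grant_type="client_credentials"):
    """Build the URL that exchanges Api_Key / Secrect_Key for an access token."""
    query = urlencode([
        ("grant_type", grant_type),
        ("client_id", api_key),
        ("client_secret", secret_key),
    ])
    return TOKEN_URL + "?" + query


def tts_request_url(text, token, user_id, lan="zh", ctp=1,
                    spd=5, pit=5, vol=5, per=0):
    """Build the text2audio URL; spd, pit and vol must lie in 0..9."""
    for name, value in (("spd", spd), ("pit", pit), ("vol", vol)):
        if not 0 <= value <= 9:
            raise ValueError("%s must be between 0 and 9, got %r" % (name, value))
    query = urlencode([
        ("tex", text), ("lan", lan), ("tok", token), ("ctp", ctp),
        ("cuid", user_id), ("spd", spd), ("pit", pit), ("vol", vol),
        ("per", per),  # assumed mapping: 0 = 'women', 1 = 'man'
    ])
    return SPEAKER_URL + "?" + query


if __name__ == "__main__":
    print(token_request_url("YOUR_API_KEY", "YOUR_SECRET_KEY"))
    print(tts_request_url("excuse me", "YOUR_TOKEN", "YOUR_USER_ID", lan="en"))
```

Range-checking SPEED, PIT and VOL locally gives a clear error message before any request is sent.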

Speech Recognition

REG_NUM_SAMPLES:
- The number of samples to listen to, default 2000.
REG_SAMPLING_RATE:
- Voice data sampling rate, default 8000.
REG_UPPER_LEVEL:
- Upper threshold for voice data readings, default 5000.
REG_LOWER_LEVEL:
- Lower threshold for voice data readings, default 500.
REG_COUNT_NUM:
- Decides whether the sampled voice data is recorded, default 20: the data is handled only when more than 20 of its samples are greater than the lower level.
REG_SAVE_LENGTH:
- Minimum record length for voice data, default 8.
REG_TIME_OUT:
- Maximum recording time once someone starts speaking, default 60 s.
REG_NO_WORDS:
- How long to wait before recording stops when no words are coming in, default 6.
REG_Api_Key:
- Baidu Api_Key, default "pmUzrWcsA3Ce7RB5rSqsvQt2"; feel free to change it to your own.
REG_Secrect_Key:
- Baidu Secrect_Key, default "d39ec848d016a8474c7c25e308b310c3"; feel free to change it to your own.
REG_Grant_type:
- Baidu Grant_type, default "client_credentials"; feel free to change it to your own.
REG_Token_url:
- Baidu Token_url, default 'https://openapi.baidu.com/oauth/2.0/token'; feel free to change it to your own.
REG_Reg_url:
- Baidu Reg_url, default 'http://vop.baidu.com/server_api'; feel free to change it to your own.
REG_USER_ID:
- Baidu USER_ID, default '8168466'; feel free to change it to your own.
REG_FORMAT:
- The type of voice file to record, default 'wav'; other file types that vlc can read are supported as well.
REG_LAN:
- The language to recognize, default 'zh'; supports Chinese ('zh'), Cantonese ('ct') and English ('en').
REG_nchannel:
- The number of channels used to record voice data, default 1.
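
The interplay of REG_NUM_SAMPLES, REG_SAMPLING_RATE, REG_LOWER_LEVEL and REG_COUNT_NUM can be sketched in a few lines. This is an illustration of the rule described above, not the code in voice_node.py: the function names are hypothetical, samples are assumed to be plain integer amplitudes, and REG_UPPER_LEVEL is not modelled.

```python
REG_NUM_SAMPLES = 2000     # samples read per chunk
REG_SAMPLING_RATE = 8000   # samples per second
REG_LOWER_LEVEL = 500      # amplitude a sample must exceed to count as loud
REG_COUNT_NUM = 20         # a chunk needs more than this many loud samples


def chunk_seconds(num_samples=REG_NUM_SAMPLES, rate=REG_SAMPLING_RATE):
    """Duration of one chunk of audio in seconds."""
    return float(num_samples) / rate


def chunk_has_speech(samples, lower_level=REG_LOWER_LEVEL,
                     count_num=REG_COUNT_NUM):
    """True when more than `count_num` samples are greater than `lower_level`."""
    loud = sum(1 for s in samples if s > lower_level)
    return loud > count_num


if __name__ == "__main__":
    quiet = [100] * REG_NUM_SAMPLES
    talking = [100] * (REG_NUM_SAMPLES - 21) + [800] * 21
    print(chunk_seconds())            # 0.25
    print(chunk_has_speech(quiet))    # False
    print(chunk_has_speech(talking))  # True
```

With the defaults, each chunk covers 2000 / 8000 = 0.25 seconds of audio, so raising REG_COUNT_NUM makes the node less likely to treat short noises as speech.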

Demo

words
- The words you want the speaker to say.
SpeakerSubTopic
- The topic the demo subscribes to, of type std_msgs/String, default '/stop_flag'. You can change the topic name to anything you like.

Baidu speech Home page

For more up-to-date information, please visit http://yuyin.baidu.com/

What Next?

Interactive or return to Xbot main page.
