Note: This tutorial assumes that you have completed the previous tutorials: SLAM Map Building with xbot.

Baidu Speech AND Recognition

Description: This tutorial shows how to convert Text To Speech (TTS) and Speech to Text (Speech Recognition) using Baidu's Speech API.

Keywords: navigation

Tutorial Level: BEGINNER

Only released in EOL distros:  

Package Summary

The simple_voice package

Package for speech recognition and Text To Speech (TTS) using Baidu's Speech API. You may need to change the parameters if you want to customize your settings.

Overview

  • Before starting this tutorial, make sure you have installed two Python libraries: pyaudio and vlc.
  • This package provides a ROS solution for Baidu speech applications in both Chinese and English.
  • This package was tested and runs well on Ubuntu 14.04 with a ThinkPad T440.
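
A quick way to confirm that both libraries are importable before launching the nodes is sketched below. It assumes the Python modules are named `pyaudio` and `vlc` (the latter comes from the python-vlc bindings); the helper name `missing_modules` is hypothetical and not part of simple_voice.

```python
import importlib


def missing_modules(names=("pyaudio", "vlc")):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # ImportError if the package is absent; some native bindings may
            # raise other errors when their underlying C library is missing.
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_modules()
    if absent:
        print("Missing Python modules: " + ", ".join(absent))
    else:
        print("pyaudio and vlc are both importable")
```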

Hardware Requirements

  • A laptop or other hardware that can run roscore and VLC; of course, a microphone and a speaker are needed as well.

Launch Example

  • To make the computer speak (Text To Speech):
       roslaunch simple_voice simple_speaker.launch
  • To make the computer understand what you say (Speech Recognition):
       roslaunch simple_voice simple_speaker.launch

Node

There are three nodes in this package: node_main.py, simple_speek.py and voice_node.py.

  • node_main.py is a TTS (Text To Speech) demo that works with a laser scanner: when the laser detects an obstacle, node_main triggers simple_speek.py to speak phrases such as 'excuse me' or 'make a way for me pls'.
  • simple_speek.py subscribes to a std_msgs/String message and speaks the text out loud.
  • voice_node.py listens for 5 seconds, recognises what you said, and prints the result in the terminal.

Subscribed Topic

  • TTS(simple_speek.py):
    • /speak_string (std_msgs/String)
      • the text for the machine to speak, published by the demo
  • Demo(node_main.py):
    • /SpeakerSubTopic (std_msgs/String)

      • This topic is used to trigger TTS; feel free to rename it. When you publish 'stop' to this topic, node_main triggers TTS to speak the words you assigned.
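
The 'stop' trigger described above can be illustrated without ROS by injecting the publish function. This is a minimal sketch, not the actual node_main.py code; the class name `StopTrigger` is hypothetical. In a real rospy node, `publish` would be the `publish` method of a `rospy.Publisher` on /speak_string, and `on_message` would be called from a `rospy.Subscriber` callback with the `.data` field of the incoming std_msgs/String.

```python
class StopTrigger:
    """Forward a fixed phrase to TTS whenever 'stop' arrives on the trigger topic."""

    def __init__(self, words, publish, trigger_word="stop"):
        self.words = words              # phrase to speak (cf. the Demo `words` parameter)
        self.publish = publish          # callable that sends text to the TTS topic
        self.trigger_word = trigger_word

    def on_message(self, text):
        """Handle one incoming string; return True if TTS was triggered."""
        if text == self.trigger_word:
            self.publish(self.words)
            return True
        return False


if __name__ == "__main__":
    sent = []
    trigger = StopTrigger("excuse me", sent.append)
    trigger.on_message("go")    # ignored
    trigger.on_message("stop")  # forwards "excuse me"
    print(sent)                 # ['excuse me']
```

Injecting `publish` keeps the decision logic testable on a machine without a running roscore.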

Published Topic

  • Demo(node_main.py):
    • /speak_string (std_msgs/String)

      • the text message for the machine to speak; subscribed to by TTS
  • Speech Recognition(voice_node.py):
    • /Rog_result (std_msgs/String)

      • Recognition is currently triggered by pressing ENTER on the keyboard. You could associate it with node_main, as is done for the TTS node; this may be added later.

Parameters

TTS

  • Gender:
    • the speaker's gender, default 'women'; supports 'man' and 'women'
  • CTP:
    • client terminal type, default 1 (web)
  • LAN:
    • the language to speak, default 'zh'; supports Chinese ('zh'), Cantonese ('ct') and English ('en')
  • USER_ID:
    • Baidu USER_ID, default '8168466'; feel free to change it to your own
  • SPEED:
    • speaking speed, default 5; supports 0~9
  • PIT:
    • pitch (intonation), default 5; supports 0~9
  • VOL:
    • volume, default 5; supports 0~9
  • Api_Key:
    • Baidu Api_Key, default "pmUzrWcsA3Ce7RB5rSqsvQt2"; feel free to change it to your own
  • Secrect_Key:
    • Baidu Secret Key, default "d39ec848d016a8474c7c25e308b310c3"; feel free to change it to your own
  • Grant_type:
    • Baidu Grant_type, default "client_credentials"; feel free to change it to your own
  • Token_url:
    • URL used to request the Baidu access token
  • Speeker_url:
    • URL of the Baidu TTS service
  • FORMAT:
    • type of audio file to generate, default 'mp3'
  • ResponseSensitivity:
    • speaker response sensitivity, default 0.2
  • WorkSpaces:
    • your code workspace
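
Because several TTS parameters accept only a small set of values, a node could check them before contacting Baidu. The sketch below encodes the ranges listed above (SPEED, PIT and VOL in 0~9; Gender in 'man'/'women'; LAN in 'zh'/'ct'/'en'). It also builds a token request query from Api_Key, Secrect_Key and Grant_type, assuming the standard OAuth 2.0 client-credentials field names `client_id` and `client_secret`. Both function names are hypothetical, and the real Token_url value is not shown here.

```python
from urllib.parse import urlencode

# Allowed values taken from the TTS parameter list.
LEVEL_PARAMS = ("SPEED", "PIT", "VOL")
ALLOWED = {"Gender": {"man", "women"}, "LAN": {"zh", "ct", "en"}}


def validate_tts_params(params):
    """Return a list of problems; an empty list means the parameters look valid."""
    problems = []
    for key in LEVEL_PARAMS:
        value = params.get(key, 5)  # 5 is the documented default
        if not isinstance(value, int) or not 0 <= value <= 9:
            problems.append("%s must be an integer in 0~9, got %r" % (key, value))
    for key, choices in ALLOWED.items():
        if key in params and params[key] not in choices:
            problems.append("%s must be one of %s, got %r" % (key, sorted(choices), params[key]))
    return problems


def token_query(api_key, secret_key, grant_type="client_credentials"):
    """Build the query string for an access-token request (field names are assumed)."""
    return urlencode({
        "grant_type": grant_type,
        "client_id": api_key,
        "client_secret": secret_key,
    })
```

For example, `token_query("my_key", "my_secret")` yields `grant_type=client_credentials&client_id=my_key&client_secret=my_secret`, which would be appended to Token_url after a `?`.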

Speech Recognition

  • REG_NUM_SAMPLES:
    • the number of samples to listen for, default 2000
  • REG_SAMPLING_RATE:
    • sampling rate of the voice data, default 8000 (Hz)
  • REG_UPPER_LEVEL:
    • upper threshold for reading voice data, default 5000
  • REG_LOWER_LEVEL:
    • lower threshold for reading voice data, default 500
  • REG_COUNT_NUM:
    • decides whether the sampled voice data is recorded, default 20: the voice data is handled only when more than 20 samples in the sampled data exceed the lower level
  • REG_SAVE_LENGTH:
    • minimum recording length for voice data, default 8
  • REG_TIME_OUT:
    • maximum recording time once someone starts speaking, default 60 s
  • REG_NO_WORDS:
    • how long recording continues without any speech before it stops, default 6
  • REG_Api_Key:
    • Baidu Api_Key, default "pmUzrWcsA3Ce7RB5rSqsvQt2"; feel free to change it to your own
  • REG_Secrect_Key:
    • Baidu Secret Key, default "d39ec848d016a8474c7c25e308b310c3"; feel free to change it to your own
  • REG_Grant_type:
    • Baidu Grant_type, default "client_credentials"; feel free to change it to your own
  • REG_Token_url:
    • URL used to request the Baidu access token
  • REG_Reg_url:
    • URL of the Baidu speech recognition service
  • REG_USER_ID:
    • Baidu USER_ID, default '8168466'; feel free to change it to your own
  • REG_FORMAT:
    • type of audio file to record, default 'wav'; other formats that VLC can read are supported as well
  • REG_LAN:
    • the language to recognise, default 'zh'; supports Chinese ('zh'), Cantonese ('ct') and English ('en')
  • REG_nchannel:
    • number of channels to record, default 1
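
The REG_COUNT_NUM rule can be illustrated with a small function that decides whether one chunk of mono audio contains speech. This is a sketch under assumptions: samples are signed 16-bit integers in native byte order (the layout pyaudio returns for `paInt16`), amplitude is compared as an absolute value, and the function name `chunk_has_voice` is hypothetical; voice_node.py may implement the check differently.

```python
from array import array

REG_LOWER_LEVEL = 500  # documented default
REG_COUNT_NUM = 20     # documented default


def chunk_has_voice(raw_bytes, lower_level=REG_LOWER_LEVEL, count_num=REG_COUNT_NUM):
    """Return True if more than `count_num` samples exceed `lower_level` in magnitude."""
    samples = array("h")           # signed 16-bit integers, native byte order
    samples.frombytes(raw_bytes)
    loud = sum(1 for s in samples if abs(s) > lower_level)
    return loud > count_num


if __name__ == "__main__":
    quiet = array("h", [100] * 2000).tobytes()
    speech = array("h", [100] * 1970 + [3000] * 30).tobytes()
    print(chunk_has_voice(quiet), chunk_has_voice(speech))  # False True
```

A recorder would call this on every chunk read from the microphone and keep only the chunks for which it returns True.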

Demo

  • words
    • the words you want the speaker to say
  • SpeakerSubTopic
    • the topic the demo subscribes to, of type std_msgs/String; you can change it to any topic name you like, default '/stop_flag'

Baidu Speech Home Page

For more up-to-date information, please visit http://yuyin.baidu.com/


Wiki: xbot/tutorials/indigo/Baidu Speech AND Recognition (last edited 2017-04-10 02:01:49 by DinnerHowe)