keywords = audio detection, microphone, ROAR

Note: This tutorial assumes that you have completed the previous tutorials: ROS tutorials.
(!) Please ask about problems and questions regarding this tutorial on answers.ros.org. Don't forget to include in your question the link to this page, the versions of your OS & ROS, and also add appropriate tags.

Learning a new audio event using the ROAR trainer

Description: This tutorial teaches you how to use the ROAR package to create and train a new model for audio-event detection.

Tutorial Level: BEGINNER

Next Tutorial: Detecting an audio-event using ROAR

This tutorial assumes you have downloaded and compiled ROAR as described on the ROAR wiki page.

Launching the ROAR Trainer

The first step is to launch the ROAR trainer program. You can do this by typing the following in a console window:

roslaunch roar roar_trainer.launch

If everything runs correctly you will be presented with the menu:

===== Audio Learner =====
1. Estimate background subtraction
2. Create a new model or open an existing model
3. Inspect size of currently open model
4. Learn new model entries
5. Learn new model entries (continuous mode)
6. Save current model to file
7. Test microphone (1.5s)
0. Quit
Select a menu option: 

Microphone Functionality Verification

The first thing to do is to test whether your microphone is set up properly. To do so, select menu option 7. The microphone will record 1.5 seconds of audio and then play it back through your default speakers. If you don't hear the audio clip, first make sure your speakers are working properly. If you still don't hear it, go back and check that your microphone is set up properly. Some notes on microphone setup can be found on the ROAR wiki page. If the test runs successfully you should see something like the following:

Select a menu option: 7
Playing Sparc Audio 'stdin' : Signed 16 bit Big Endian, Rate 44100 Hz, Mono
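As a quick sanity check on the test clip, the format reported in the playback line (signed 16 bit, 44100 Hz, mono) lets you work out how much raw audio a 1.5 second recording should contain. The sketch below is only that arithmetic; the function name clip_size is made up for this illustration and is not part of ROAR.

```python
# Format values copied from the playback line shown above.
SAMPLE_RATE_HZ = 44100   # "Rate 44100 Hz"
BYTES_PER_SAMPLE = 2     # "Signed 16 bit"
CHANNELS = 1             # "Mono"
CLIP_SECONDS = 1.5       # menu option 7 records 1.5 s


def clip_size(seconds, rate=SAMPLE_RATE_HZ, width=BYTES_PER_SAMPLE,
              channels=CHANNELS):
    """Return (frames, size_in_bytes) for a raw PCM clip of this length.

    Hypothetical helper for illustration only; not a ROAR function.
    """
    frames = int(round(seconds * rate))
    return frames, frames * width * channels


frames, size_bytes = clip_size(CLIP_SECONDS)
print("frames:", frames, "bytes:", size_bytes)
```

If you save the raw audio from your own microphone and its size differs greatly from this figure, the device is probably not recording at the format shown above.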

Making ROAR Models

After you have verified your microphone setup, you can move on to creating an audio model of your task by collecting a set of sound samples. This doesn't all have to be done at one time: the ROAR trainer program allows you to save your work and continue it later.

Estimating the Background Noise

The first step when recording audio samples is always to estimate the background noise of the room you are in. To run the background estimation, select menu option 1. During background estimation it is important that no loud, impulsive audio events occur in the room. If they do, you will need to repeat the estimation. If it runs successfully you will see the following:

Select a menu option: 1
Estimating background noise
Noise estimated

Creating a new Model or Opening an Existing Model

Now we need to define the path and name of the model we want to open or create. To create or open a model, select menu option 2. At first you will probably want to create your models in the default directory that ROAR uses when opening models for detection. To find this directory, type 'roscd roar/models/; pwd;' in a new console and copy the output. You should always name your model with the .mat extension. If model creation runs correctly you should see something like this:

Select a menu option: 2
Enter the path to the model you want to open or create: /home/jromano/Code/penn-ros-packages/roar_stack/roar/models/mymodel.mat
Created new model: /home/jromano/Code/penn-ros-packages/roar_stack/roar/models/mymodel.mat
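If you build model paths in a script rather than typing them, a small helper can make sure the .mat extension is never forgotten. This is a minimal sketch using only the Python standard library; the function name model_path and the example directory are assumptions for illustration, not part of ROAR.

```python
import os


def model_path(models_dir, name):
    """Join a models directory and a model name, adding .mat if missing.

    Hypothetical helper for illustration only; not a ROAR function.
    """
    _, ext = os.path.splitext(name)
    if ext.lower() != ".mat":
        name = name + ".mat"
    return os.path.join(models_dir, name)


# Example directory only; use the output of 'roscd roar/models/; pwd;'.
print(model_path("/path/to/roar/models", "mymodel"))
```

The resulting string is what you would paste at the "Enter the path to the model" prompt.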

Adding Entries to a Model

At this point you are ready to collect some data samples with which to train your model. Do this by selecting menu option 4. Once you select this option you should see the following output:

Enter number of seconds to record for (<= 3.0), or 0 for main menu: 1

You should now enter the number of seconds to record audio for. Usually you will want to record for somewhere between 1 and 3 seconds. You cannot record for more than 3 seconds. If you just hit <enter> without entering a number, the program will use the default value of 1. As soon as you hit <enter> the program will start recording, and you should make whatever sound you want to learn while it records.
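The rules for this prompt (empty input means 1 second, 0 returns to the main menu, nothing above 3 seconds) can be summarised in a few lines of Python. This is a sketch of the behaviour described above, not the trainer's actual code; in particular, how the real program reacts to invalid input is not documented here, so returning None for it is an assumption.

```python
MAX_SECONDS = 3.0      # upper limit stated in the prompt
DEFAULT_SECONDS = 1.0  # used when the user just hits <enter>


def parse_duration(text):
    """Interpret the text typed at the recording-length prompt.

    Returns the number of seconds to record, 0.0 for "back to main menu",
    or None when the input is not acceptable (assumed behaviour).
    Hypothetical helper for illustration only; not a ROAR function.
    """
    text = text.strip()
    if text == "":
        return DEFAULT_SECONDS
    try:
        seconds = float(text)
    except ValueError:
        return None
    # The chained comparison also rejects NaN and infinity.
    if not (0 <= seconds <= MAX_SECONDS):
        return None
    return seconds


for typed in ["", "2.5", "0", "4", "abc"]:
    print(repr(typed), "->", parse_duration(typed))
```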

After the selected recording time has elapsed, a spectrogram of your sound will be displayed, similar to the figure seen below.

Figure: spectrogram displayed by the ROAR trainer (roar_trainer.png)

You should now use your cursor to left-click the point in time (x-axis) that represents the feature you want to learn. The program will sample roughly 40 milliseconds on each side of the click point. It will then play back the clip you selected and ask whether you'd like to accept your selection and add it to the model.

[INFO] [WallTime: 1311099141.393338] You selected an event centered at 0.523222497932 seconds.
Playing Sparc Audio 'stdin' : Signed 16 bit Big Endian, Rate 44100 Hz, Mono
Add to current model? (y/n - any other key to replay clip):
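To get a feel for how small the selected clip is, the window of roughly 40 milliseconds on each side of the click can be converted into sample indices at the 44100 Hz rate shown in the playback line. The sketch below does that conversion for the event time printed above; the function name click_window and the clamping to the clip boundaries are assumptions for illustration, and the trainer may choose its window differently.

```python
SAMPLE_RATE_HZ = 44100  # rate shown in the playback line
HALF_WINDOW_S = 0.040   # roughly 40 ms on each side of the click


def click_window(click_time_s, total_samples, rate=SAMPLE_RATE_HZ,
                 half_window_s=HALF_WINDOW_S):
    """Return (start, end) sample indices around a click time.

    The window is clamped so it never runs outside the recorded clip.
    Hypothetical helper for illustration only; not a ROAR function.
    """
    center = int(round(click_time_s * rate))
    half = int(round(half_window_s * rate))
    start = max(0, center - half)
    end = min(total_samples, center + half)
    return start, end


# Event time from the log line above, within a 1 second recording.
start, end = click_window(0.523222497932, total_samples=SAMPLE_RATE_HZ)
print("samples", start, "to", end, "=", end - start, "samples")
```

At this rate the whole selection is only a few thousand samples long, which is why clicking consistently on the same part of the sound matters so much.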

After you select y/n this process will repeat until you enter 0 for the number of seconds to record data, after which you will return to the main menu.

NOTE 1: Before you start testing your recognition, it is recommended that you collect a minimum of ~10 audio samples. Significant recognition refinement is possible as you increase your number of samples.

NOTE 2: It is important to click at a similar spot for each sample. Failure to click in approximately the same area can lead to a model with a large range of data, and thus a large number of false positives during detection.

The ROAR trainer currently also supports a continuous mode (main menu option 5) for recording data, in which no graphical plot or clicking is required and sound is sampled whenever it exceeds a certain volume level. This may be useful where a graphical environment is not available, but it is not recommended.
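The idea behind continuous mode, sampling only when the sound exceeds a volume level, can be illustrated with a simple level trigger. The sketch below is not ROAR's implementation: the use of RMS as the volume measure, the frame length, the threshold value, and the function names are all assumptions chosen to keep the example short.

```python
import math


def rms(frame):
    """Root-mean-square level of one frame of samples."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def triggered_frames(samples, frame_len, threshold):
    """Return the indices of frames whose RMS level exceeds the threshold.

    Hypothetical helper for illustration only; not a ROAR function.
    """
    hits = []
    for i in range(0, len(samples) - frame_len + 1, frame_len):
        if rms(samples[i:i + frame_len]) > threshold:
            hits.append(i // frame_len)
    return hits


# Synthetic signal: quiet frame, loud frame, quiet frame.
quiet = [0.01, -0.01] * 4
loud = [0.8, -0.8] * 4
print(triggered_frames(quiet + loud + quiet, frame_len=8, threshold=0.1))
# prints [1]
```

A trigger like this fires on any loud sound, not only the one you mean to teach, which is one reason the click-based mode gives you more control over what goes into the model.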

Saving your Model

When you are done collecting data samples, you must explicitly save your work to a file. To do so, select menu option 6. Saving performs a series of calculations on your collected data and writes the result to the model file you specified. If everything runs correctly you should see a variety of output that starts and ends with something like:

Select a menu option: 6
...
[INFO] [WallTime: 1310494473.416188] Normalized and saved model as: /home/jromano/Code/penn-ros-packages/roar_stack/roar/models/mymodel.mat

Quitting

When you are finished select menu option 0 to quit the trainer program.

Wiki: roar/Tutorials/Trainer (last edited 2011-08-31 16:14:00 by JoeRomano)