Only released in EOL distros:  

Package Summary

Based on the literate_pr2 package by Menglong Zhu: menglong(at)seas.upenn.edu

Detailed Documentation

Overview

cob_read_text detects and recognizes text in natural images. The system was trained for indoor household environments using Care-O-bot's stereo vision colour cameras. Text localisation and OCR run as separate steps and can be evaluated using precision and recall. Text is located in an image with the Stroke Width Transform (SWT). Text whose baseline was found is transformed back into horizontal text using Bézier curves and then passed to the OCR software for recognition.
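
The baseline straightening step depends on evaluating a Bézier curve. As a rough illustration only, the following Python sketch evaluates a quadratic Bézier curve defined by three control points. The curve degree, the function names quadratic_bezier and sample_baseline, and the sample coordinates are assumptions made for this example and are not taken from the cob_read_text source.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def quadratic_bezier(p0: Point, p1: Point, p2: Point, t: float) -> Point:
    """Evaluate B(t) = (1-t)^2 p0 + 2(1-t)t p1 + t^2 p2 for t in [0, 1]."""
    u = 1.0 - t
    x = u * u * p0[0] + 2.0 * u * t * p1[0] + t * t * p2[0]
    y = u * u * p0[1] + 2.0 * u * t * p1[1] + t * t * p2[1]
    return (x, y)


def sample_baseline(p0: Point, p1: Point, p2: Point, n: int) -> List[Point]:
    """Return n points (n >= 2) along the curve, evenly spaced in t."""
    return [quadratic_bezier(p0, p1, p2, i / (n - 1)) for i in range(n)]


if __name__ == "__main__":
    # Hypothetical bent baseline: start, control point, end (pixel coordinates).
    for point in sample_baseline((0.0, 0.0), (50.0, 20.0), (100.0, 0.0), 5):
        print(point)
```

Sampling the curve in this way gives the positions along a bent baseline from which a text line could be resampled into a horizontal strip.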

To use cob_read_text, clone the repository to your disk and build it.

Usage

Single Image

To read text on a single image: roslaunch cob_read_text read_text.launch image:=path_to_image

read_text.launch -> run_detection -> text_detect

Image Stream

To read text from a camera stream: roslaunch cob_read_text read_text_from_camera.launch

read_text_from_camera.launch -> text_detect

Remap "image_color" to the topic that publishes the images to be processed.

Evaluation Database

To read text in the images of a database with ground-truth data: roslaunch cob_read_text read_text_with_eval.launch xmlfile:=path_to_xmlfile

read_text_with_eval.launch -> run_detection -> text_detect

Parameters for read_text

Parameters are specified in launch/params.yaml.

Parameters

smoothImage (bool, default: false)
  • smoothing of whole input image before processing
maxStrokeWidthParameter (int, default: 50)
  • maximum width in pixels of recognized strokes, good for big text: <50, small text: >50
useColorEdge (bool, default: false)
  • use rgb channels to compute edgeMap
cannyThreshold1 (int, default: 120)
  • upper threshold for canny algorithm
cannyThreshold2 (int, default: 50)
  • lower threshold for canny algorithm
compareGradientParameter (double, default: 1.57)
  • parameter when comparing gradients of opposing edges, in paper: 3.14 / 6
swCompareParameter (double, default: 3.0)
  • parameter when comparing stroke widths between pixels
colorCompareParameter (int, default: 100)
  • parameter when comparing color between pixels, set to 255 to deactivate
maxLetterHeight_ (int, default: 100)
  • maximum height of component in pixels
varianceParameter (double, default: 1.5)
  • maximum variance of stroke width inside component
diagonalParameter (int, default: 10)
  • diagonal of component has to be smaller than diagonalParameter*medianStrokeWidth
pixelCountParameter (int, default: 5)
  • number of pixels inside component belonging to letter has to be bigger than maxStrokeWidth * pixelCountParameter
innerLetterCandidatesParameter (int, default: 5)
  • maximum number of foreign components inside component
heightParameter (double, default: 2.5)
  • [turned off], width has to be smaller than heightParameter * height
clrComponentParameter (int, default: 20)
  • maximum color value difference
distanceRatioParameter (double, default: 2.0)
  • maximum Pythagorean (Euclidean) distance between 2 components
medianSwParameter (double, default: 2.5)
  • maximum quotient of median sw of 2 components
diagonalRatioParameter (double, default: 2.0)
  • maximum diagonal ratio between 2 components
grayClrParameter (double, default: 10.0)
  • maximum gray color value difference between components
clrSingleParameter (double, default: 20.0)
  • maximum single colour value difference between components
areaParameter (double, default: 1.5)
  • maximum area ratio between components
pixelParameter (double, default: 0.3)
  • [turned off] maximum pixel number ratio (pixels belonging to letter)
p (double, default: 0.99)
  • probability p
maxE (double, default: 0.7)
  • maximum percentage of outliers in dataset
minE (double, default: 0.4)
  • minimum percentage of outliers in dataset
bendParameter (int, default: 30)
  • maximum angle of bezier curve
distaneParameter (double, default: 0.8)
  • parameter, with which maximum distance between curve and point is calculated
sigma_sharp (double, default: 1.0)
  • strength of the blur when sharpening with an unsharp mask
threshold_sharp (double, default: 3.0)
  • sets minimum brightness change that will be sharpened
amount_sharp (double, default: 1.5)
  • magnitude, how much contrast is added at the edges when sharpening
result_ (int, default: 2)
  • which spell check method is used
showEdge (bool, default: false)
  • show gray colored edgemap image
showSWT (bool, default: false)
  • show swt map
showLetterCandidates (bool, default: false)
  • show connected components
showLetters (bool, default: false)
  • show connected components recognized as letter
showPairs (bool, default: false)
  • show pairs (2 letters belonging together)
showChains (bool, default: false)
  • show all text segments after pairs were merged together
showBreakLines (bool, default: false)
  • show how text blocks are broken into several lines
showBezier (bool, default: false)
  • show all bezier models that are created within ransac
showRansac (bool, default: false)
  • show ransac result -> show best bezier model only
showNeighborMerging (bool, default: false)
  • show how neighbor texts are merged if they fit together and were accidentally separated before
showResult (bool, default: false)
  • show results after ocr
transformImages (bool, default: true)
  • use bezier-transformed images (true) or normal rectangles around bezier lines
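
To make the role of the component-level parameters more concrete, here is a minimal Python sketch of how four of them could act as a letter-candidate filter. It is a literal reading of the parameter descriptions above, not the package's implementation: the Component and LetterParams classes, the is_letter_candidate function, and the way each rule is combined are assumptions. Only the parameter names and default values come from the list.

```python
from dataclasses import dataclass


@dataclass
class Component:
    """Hypothetical summary of one connected component in the SWT map."""
    height: int                   # bounding-box height in pixels
    diagonal: float               # bounding-box diagonal in pixels
    median_stroke_width: float    # median stroke width of its pixels
    stroke_width_variance: float  # variance of stroke width inside it
    inner_components: int         # foreign components inside it


@dataclass
class LetterParams:
    """Defaults copied from the parameter list above."""
    maxLetterHeight_: int = 100
    varianceParameter: float = 1.5
    diagonalParameter: int = 10
    innerLetterCandidatesParameter: int = 5


def is_letter_candidate(c: Component, p: LetterParams) -> bool:
    """Apply the four rules literally; the real code may normalize differently."""
    if c.height > p.maxLetterHeight_:
        return False
    if c.stroke_width_variance > p.varianceParameter:
        return False
    if c.diagonal >= p.diagonalParameter * c.median_stroke_width:
        return False
    if c.inner_components > p.innerLetterCandidatesParameter:
        return False
    return True


if __name__ == "__main__":
    params = LetterParams()
    plausible = Component(40, 60.0, 8.0, 1.0, 1)
    too_tall = Component(150, 60.0, 8.0, 1.0, 1)
    print(is_letter_candidate(plausible, params))
    print(is_letter_candidate(too_tall, params))
```

Under this reading, raising maxLetterHeight_ or diagonalParameter admits larger components, while lowering varianceParameter rejects components with uneven stroke width.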

Additional Applications

read_evaluation

read_evaluation is the application used to evaluate cob_read_text. It takes a .xml-file containing a list of images, each with ground-truth bounding boxes enclosing every piece of text that can be found in the image.

Example input file:

<?xml version="1.0" encoding="UTF-8"?>
<tagset>
 <image>
 <imageName>example_image2.png</imageName>
  <taggedRectangles>
   <taggedRectangle center_x="605.894" center_y="636.016" width="94" height="40" angle="-20" text="Sprite" />
   <taggedRectangle center_x="752.5" center_y="626.5" width="47" height="25" angle="0" text="Red" />
  </taggedRectangles>
 </image>
</tagset>

All rectangles are described via:

  • their center point
  • their width and height
  • their rotation angle (clockwise)
  • and the text written inside of them.
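
For readers who want to inspect such a file outside of read_evaluation, the following Python sketch reads the tagged rectangles with the standard library's xml.etree.ElementTree. The function name load_rectangles and the dictionary layout are assumptions made for this example; the element and attribute names are the ones in the example input file.

```python
import xml.etree.ElementTree as ET

# The example input file from above, embedded as bytes so that the
# encoding declaration is handled by the XML parser.
EXAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<tagset>
 <image>
 <imageName>example_image2.png</imageName>
  <taggedRectangles>
   <taggedRectangle center_x="605.894" center_y="636.016" width="94" height="40" angle="-20" text="Sprite" />
   <taggedRectangle center_x="752.5" center_y="626.5" width="47" height="25" angle="0" text="Red" />
  </taggedRectangles>
 </image>
</tagset>
"""


def load_rectangles(xml_bytes):
    """Return {image name: [rectangle dicts]} for every <image> element."""
    root = ET.fromstring(xml_bytes)
    result = {}
    for image in root.findall("image"):
        name = image.findtext("imageName")
        rects = []
        for rect in image.findall("taggedRectangles/taggedRectangle"):
            rects.append({
                "center_x": float(rect.get("center_x")),
                "center_y": float(rect.get("center_y")),
                "width": float(rect.get("width")),
                "height": float(rect.get("height")),
                "angle": float(rect.get("angle")),  # degrees, clockwise
                "text": rect.get("text"),
            })
        result[name] = rects
    return result


if __name__ == "__main__":
    for name, rects in load_rectangles(EXAMPLE).items():
        for r in rects:
            print(name, r["text"], r["center_x"], r["center_y"], r["angle"])
```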

For every image referenced between <image> and </image> in the .xml-file, read_evaluation calls cob_read_text and compares the results with the ground truth.

The evaluation is based on the system of the ICDAR 2005 Robust Reading Competitions, using recall and precision as measures:

"Precision p is defined as the number of correct estimates divided by the total number of estimates. Recall r is defined as the number of correct estimates divided by the total number of targets. We define the match m_p between two rectangles as the area of intersection divided by the area of the minimum bounding box containing both rectangles. For each rectangle in the set of estimates we find the closest match in the set of targets, and vice versa. Hence, the best match m(r, R) for a rectangle r in a set of Rectangles R is defined as:

m(r, R) = max { m_p(r, r') | r' ∈ R }

Then, our new more forgiving definitions of precision and recall, where T and E are the sets of ground-truth and estimated rectangles respectively:

p = ( Σ_{r_e ∈ E} m(r_e, T) ) / |E|

r = ( Σ_{r_t ∈ T} m(r_t, E) ) / |T|"

(S. M. Lucas. ICDAR 2005 text locating competition results. In Proc. 8th Int’l Conf. Document Analysis and Recognition, volume 1, pages 80–84, 2005)
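
The quoted definitions can be tried out with a short Python sketch. It is a simplification under stated assumptions: rectangles are axis-aligned and given as (x_min, y_min, x_max, y_max), so the rotation angle stored in the .xml-file is ignored, and the function names are made up for this example. It is not the code used by read_evaluation.

```python
from typing import List, Tuple

Rect = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def area(r: Rect) -> float:
    """Area of a rectangle; degenerate or inverted rectangles count as 0."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def match(a: Rect, b: Rect) -> float:
    """m_p: intersection area divided by the area of the minimum bounding box."""
    inter = (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))
    bbox = (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))
    bbox_area = area(bbox)
    return area(inter) / bbox_area if bbox_area > 0 else 0.0


def best_match(r: Rect, others: List[Rect]) -> float:
    """m(r, R): the best m_p between r and any rectangle in R (0 if R is empty)."""
    return max((match(r, o) for o in others), default=0.0)


def precision_recall(estimates: List[Rect], targets: List[Rect]) -> Tuple[float, float]:
    """Return (p, r) following the forgiving definitions quoted above."""
    p = sum(best_match(e, targets) for e in estimates) / len(estimates) if estimates else 0.0
    r = sum(best_match(t, estimates) for t in targets) / len(targets) if targets else 0.0
    return p, r


if __name__ == "__main__":
    targets = [(0.0, 0.0, 2.0, 2.0)]
    estimates = [(1.0, 0.0, 3.0, 2.0)]
    print(precision_recall(estimates, targets))
```

In the usage example the two rectangles overlap in an area of 2 and their common bounding box has an area of 6, so both precision and recall come out at one third.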

Results are saved in a folder named 'results' inside the folder that contains the images.

Running the application:

roslaunch cob_read_text read_text_with_eval.launch xmlfile:=path_to_xmlfile

The .xml-file has to be specified in read_text_with_eval.launch.

LabelBox

LabelBox can be used to generate a .xml-file for read_evaluation. Given an image or a folder of images, LabelBox produces a .xml-file like the one shown above.

Running the application:

$(rospack find cob_read_text)/bin/labelBox path_to_image

$(rospack find cob_read_text)/bin/labelBox path_to_folder/

labelBox Help

  • Draw box with mouse.
  • Press direction keys to move/resize drawn box before entering text.
  • Press [d] for rotating clockwise.
  • Press [a] for rotating anticlockwise.
  • Press [s] for switching between moving and resizing with direction keys.
  • Press [c] for color change.
  • Press Return for entering text in drawn and resized box.
  • Press left mouse button on box to show text.
  • Press right mouse button to delete a box (after text was written inside).
  • Press [z] to show data of all drawn rectangles.
  • Press [r] to reset complete image.
  • Press [ESC] to quit/move on to next image.

Allowed file types: 'png', 'PNG', 'jpg', 'JPG', 'jpeg', 'JPEG', 'bmp', 'BMP', 'tiff', 'TIFF', 'tif' or 'TIF'.

Wiki: cob_read_text (last edited 2012-08-29 11:18:23 by RobertHeinze)