Overview
This package is a modified version of the well-known monocular SLAM framework PTAM, presented by Klein & Murray in their paper at ISMAR07. We modified the original code such that:
- it is compatible with ROS.
- it runs on-board a computationally limited platform (e.g. at 20Hz on an ATOM 1.6GHz single core processor)
- it is robust in outdoor and self-similar environments such as grass.
This version of PTAM was successfully used in the European project sFly to estimate a 6DoF pose for long-term vision based navigation of a micro helicopter in large outdoor environments.
Please study the original PTAM website and the corresponding paper before using this code. Also, be aware of the license that comes with it.
Note: This package is under active development. It is meant for research use only (license); use it at your own risk.
Nodes
ptam: The main node described here
remote_ptam: Node for displaying PTAM information and managing PTAM from a remote (ground) station
ptam_visualizer: Node to store map, keyframe and path information and visualize it in RViz
cameracalibrator: Node to calibrate the camera. Copy the obtained calibration to the fixed parameter file.
Modifications to the Original Code
PTAM has been ported to be compatible with the Robot Operating System (ROS) such that:
- the input image is taken from an image node, and a verification image showing the features found is published. This enables the user to handle PTAM on an embedded system without human-machine interfaces.
- the 6DoF pose is published as a pose with a covariance estimation calculated from PTAM's internal bundle adjustment.
- the visualization of camera keyframes, trajectory and features is ported to RViz such that visualization can be done on a ground station, if necessary.
- tuning parameters can be changed dynamically in a GUI for dynamic reconfiguration.
Keyframe Handling
In PTAM, the map is defined as a set of keyframes together with their observed features. In order to minimize the computational complexity, here we set a maximum number of keyframes retained in the map. If this number is exceeded, the keyframe furthest away from the current MAV pose gets deleted along with the features associated with it. If the maximum number of retained keyframes is infinite, then the algorithm is equivalent to the original PTAM, while if we set a maximum of 2 keyframes we obtain a visual odometry framework. Naturally, the larger the number of retained keyframes, the lower the estimation drift, but also the larger the computational complexity.
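The pruning rule above can be sketched in a few lines. This is only an illustration of the idea, not the package's implementation: the dictionary layout of a keyframe, the function name `prune_keyframes` and the use of `None` for "no limit" are assumptions made for this example.

```python
import math

def prune_keyframes(keyframes, current_position, max_keyframes):
    """Drop the keyframes furthest from current_position until at most
    max_keyframes remain. Returns (kept, removed).

    keyframes: list of dicts with a "position" entry (x, y, z). In this
    sketch each keyframe owns its features, so removing a keyframe also
    removes the features associated with it.
    max_keyframes: None means "no limit" (hypothetical convention).
    """
    kept = list(keyframes)
    removed = []
    if max_keyframes is None:
        return kept, removed
    while len(kept) > max_keyframes:
        # Select the keyframe with the largest distance to the current pose.
        furthest = max(
            kept, key=lambda kf: math.dist(kf["position"], current_position)
        )
        kept.remove(furthest)
        removed.append(furthest)
    return kept, removed
```

With five keyframes spaced along a line and a limit of three, the two keyframes furthest from the current position are removed first.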
Improved Feature Handling for More Robust Maps
When flying outdoors, we experienced severe issues with self-similarity of the environment - e.g. the asphalt in urban areas or the grass in rural areas. Naturally, features extracted at higher pyramidal levels are more robust to scene ambiguity. Thus, while the finest-scale features are included for tracking, we omit them in map-handling - i.e. we only store features extracted in the highest 3 pyramidal levels. This improves tracking quality when moving away from a given feature (e.g. when taking-off with a MAV with a downward-looking camera), making it possible to navigate over both grass and asphalt.
Since this vision algorithm is keyframe-based, it has high measurement rates when tracking. However, at keyframe generation the frame rate drops remarkably. Using only features detected at the highest pyramidal levels also drastically reduces the number of newly added features upon keyframe generation. This results in great speed-ups, with keyframe generation running at 13Hz (in contrast to the 7Hz of the original PTAM) and normal tracking rates of around 20Hz on an onboard 1.6GHz Atom computer.
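The level filter described above amounts to keeping all corners for tracking while only admitting corners from levels 1 to 3 into the map. A minimal sketch, assuming a four-level pyramid where level 0 is the full-resolution image; the tuple layout and the function name `select_map_candidates` are hypothetical:

```python
def select_map_candidates(corners, no_level_zero=True):
    """Return the corners that may be stored as map points.

    corners: list of (level, x, y) tuples; level 0 is the finest scale.
    no_level_zero: when True, level-0 corners are kept out of the map
    (they can still be used for tracking elsewhere).
    """
    if not no_level_zero:
        return list(corners)
    return [corner for corner in corners if corner[0] >= 1]
```

Because the finest level usually contributes the most corners, dropping it also shrinks the number of points added per keyframe.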
Concerning the type of features, we extended the choice of corner detectors with AGAST features. Compared to the FAST derivatives, the AGAST corner detector is more repeatable and usually slightly faster, as described in this paper. In self-similar structures, it is crucial that the corner detector is highly repeatable. Hence we suggest using the AGAST corner detector instead of FAST.
Re-Initialization After Failure Mode
For automatic initialization we ensure that the baseline is sufficiently large by calculating the rotation-compensated median pixel disparity. For rotation compensation we use efficient second-order minimization (ESM) techniques in order to keep PTAM independent of IMU readings. For re-initializations, we store the median scene depth and pose of the closest keyframe and propagate this information to the newly initialized map. This way we avoid large jumps in scale and pose at re-initializations.
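The disparity check can be illustrated as follows. This sketch assumes the inter-frame rotation is already known and passed in as a 3x3 matrix (in the package it is estimated with ESM), and it uses a plain pinhole projection for brevity rather than PTAM's wide-angle camera model. All names are hypothetical.

```python
import math
import statistics

def rotate(R, v):
    """Multiply the 3x3 matrix R (list of rows) with the 3-vector v."""
    return tuple(sum(R[i][j] * v[j] for j in range(3)) for i in range(3))

def median_disparity(pts_a, pts_b, R_ba, fx, fy, cx, cy):
    """Median pixel disparity between matched points after removing the
    image motion explained by the rotation R_ba (frame A to frame B).

    pts_a, pts_b: lists of matched (u, v) pixel coordinates.
    """
    disparities = []
    for (ua, va), (ub, vb) in zip(pts_a, pts_b):
        # Back-project the point of frame A to a viewing ray.
        ray = ((ua - cx) / fx, (va - cy) / fy, 1.0)
        # Rotate the ray into frame B and project it again.
        x, y, z = rotate(R_ba, ray)
        u_pred = fx * x / z + cx
        v_pred = fy * y / z + cy
        # What remains is the displacement caused by translation.
        disparities.append(math.hypot(ub - u_pred, vb - v_pred))
    return statistics.median(disparities)
```

A pure rotation yields a median close to zero, so only translation contributes to the value that is compared against the minimum pixel distance for auto initialization.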
Inverted Index Structure for Map-point Filtering
On each frame, PTAM projects the 3D points from the map into the current image according to the motion-model prior, which then allows point correspondences to be established for tracking. Since no filtering on point visibility precedes this step, it scales linearly with the number of points in the map. We implemented an inverted index structure based on the grouping of map points inside keyframes, which allows discarding large groups of map points with low probability of being in the field of view. The search for visible points is performed by re-projecting a small set of distinct map points from every keyframe, which permits inference on their visibility from the current keyframe. The total number of points that need evaluation by reprojection is thereby significantly reduced, so the system scales linearly with the number of visible keyframes rather than with the overall number of keyframes in the map.
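The grouping idea can be sketched as below. This is a simplified illustration under several assumptions: each map point belongs to exactly one keyframe group, the probe points are simply the first few points of each group, and the reprojection test is passed in as a callable `in_view`. None of these names come from the package.

```python
def visible_candidates(points_by_keyframe, in_view, probes_per_keyframe=3):
    """Collect map points that pass the visibility test, skipping whole
    keyframe groups whose probe points are all out of view.

    points_by_keyframe: dict mapping a keyframe id to its list of points.
    in_view: callable(point) -> bool, the (expensive) reprojection test.
    Returns (visible_points, number_of_reprojections_performed).
    """
    visible = []
    evaluated = 0
    for points in points_by_keyframe.values():
        probes = points[:probes_per_keyframe]
        probe_hits = [in_view(p) for p in probes]
        evaluated += len(probes)
        if not any(probe_hits):
            # No probe is visible: discard the whole group untested.
            continue
        visible.extend(p for p, hit in zip(probes, probe_hits) if hit)
        rest = points[probes_per_keyframe:]
        evaluated += len(rest)
        visible.extend(p for p in rest if in_view(p))
    return visible, evaluated
```

For a map with one nearby and one distant keyframe of five points each, only eight instead of ten reprojections are needed; the saving grows with the number of keyframes that are out of view.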
Using ETHZASL-PTAM
To use ETHZASL_PTAM you need an environment with sufficient contrast (i.e. texture) and light. If you need a shutter speed above 5ms to obtain an image with sufficient contrast, it is very likely that the algorithm will not work properly because of motion blur. This depends on the vehicle's motion and vibrations. For good performance on MAVs you should ensure:
- a shutter speed below 5ms
- no over/under saturated image regions
- no self-similar structures
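To get a feeling for why the shutter speed matters, the blur caused by camera rotation can be estimated with simple geometry. The sketch below assumes a pinhole camera and a point near the image center; the numbers in the usage note are purely illustrative and not measured values from this package.

```python
import math

def blur_pixels(angular_rate_deg_s, exposure_s, focal_length_px):
    """Approximate image-space blur (in pixels) near the image center
    for a camera rotating at a constant rate during the exposure."""
    angle = math.radians(angular_rate_deg_s) * exposure_s
    return focal_length_px * math.tan(angle)
```

For example, `blur_pixels(100, 0.005, 300)` gives roughly 2.6 pixels, and the blur grows about linearly with the exposure time.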
Camera Calibration
The camera model used in PTAM is made for wide-angle lenses (>90°). The cameracalibrator node in this package is unchanged with respect to the original code, except for ROS compatibility. You should obtain a re-projection error below 0.5. If you have problems calibrating your camera, please have a look at this page.
Initialization
The manual initialization procedure is the same as on the PTAM website:
- Hit space to start initializing
- translate the camera (no rotation)
- Hit space again as soon as the disparity is sufficient
For automatic initialization, enable the AutoInit checkbox in the dynamic reconfigure GUI. This also activates the re-initialization procedure if the map is lost.
Remote Interface
The node remote_ptam receives the image and information published by PTAM such that you can visualize it on a ground station offboard your vehicle. This remote interface accepts nearly the same inputs as the original PTAM GUI:
- [space] for initialization
- [r] for full map reset
- [q] quits PTAM
- [s] enable/disable image streaming for remote node
- [a] experimental: map reset while storing last pose and scale for propagation upon re-initialization
It also displays the map grid and the initialization trails but not the features in the image nor in a 3D view. See the remote_ptam node documentation for more details.
Running onboard the vehicle
Usually the robot on which PTAM runs does not have a display, and PTAM is controlled remotely using remote_ptam. In such cases the built-in PTAM GUI can be disabled to free additional computation resources: set the fixed parameter gui=False. For further speed-up and better performance of PTAM onboard the robot, you may consider the following settings:
- InitLevel: 1. For robust and fast initialization on self-similar structures.
- MaxStereoInitLoops: 4. For fast map initialization; prevents hanging on degenerate sets.
- MaxPatchesPerFrame: 300. Maximal number of features per frame.
- MaxKF: 15. Maximal number of KFs to be retained in the map; do not go lower than 5 to maintain good drift performance.
- UseKFPixelDist: True. Generates new KFs based on pixel disparity. This is usually more robust than the built-in KF request in PTAM.
- NoLevelZeroMapPoints: True. Only use map features in down-sampled image pyramid levels. This greatly improves robustness in self-similar structures.
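The recommended onboard settings can be kept together in one place and written out as `name: value` lines. The sketch below only collects the values listed above; whether a given parameter belongs in the fixed parameter file or has to be set through dynamic reconfigure depends on the parameter, and the helper names `to_param_lines` and `check_settings` are hypothetical.

```python
# Values taken from the recommendations above.
ONBOARD_SETTINGS = {
    "gui": False,
    "InitLevel": 1,
    "MaxStereoInitLoops": 4,
    "MaxPatchesPerFrame": 300,
    "MaxKF": 15,
    "UseKFPixelDist": True,
    "NoLevelZeroMapPoints": True,
}

def to_param_lines(settings):
    """Render the settings as 'name: value' lines (YAML-style)."""
    return ["%s: %s" % (name, value) for name, value in settings.items()]

def check_settings(settings):
    """Return warnings for values the text above advises against."""
    warnings = []
    # 0 or 1 means "unlimited" according to the parameter list.
    if 1 < settings.get("MaxKF", 0) < 5:
        warnings.append("MaxKF below 5: expect more drift")
    return warnings
```

Calling `to_param_lines(ONBOARD_SETTINGS)` yields lines such as `MaxKF: 15`, which can be copied into a parameter file after checking which parameters are static.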
Map Export and Map Display in RViz
The node ptam_visualizer fetches the 3D features, keyframes and actual pose from the PTAM framework and prepares the data to be visualized in RViz. Simply start the node and RViz to start the streaming. The node can also store map, path and keyframe data to a file.
Tutorials
!! under construction !! please check regularly for updates
Node Information
ptam
main framework derived from the original PTAM
Subscribed Topics
image (sensor_msgs/Image) - the input image to be processed by PTAM
- topic used by the remote_ptam node to send keyboard commands to the PTAM interface
- Experimental: IMU readings to estimate a better rotation between frames. We do not recommend using this, since PTAM's rotation estimate based on the Small Blurry Images is already very good.
Published Topics
vslam/info (ptam_com/ptam_info) - Contains information on the current status of PTAM such as framerate, number of keyframes, tracking and map quality as well as string messages
- PoseWithCovarianceStamped for the 6DoF pose calculated by PTAM. This is the world seen from the camera.
- down sampled image used by the remote_ptam node to visualize the current camera view and PTAM status
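If the published pose is read as the transform that maps world coordinates into the camera frame (one interpretation of "the world seen from the camera", stated here as an assumption), the camera pose in the world frame is obtained by inverting that rigid transform. The sketch uses a rotation matrix instead of the quaternion carried by the message, and the function name is hypothetical.

```python
def invert_pose(R, t):
    """Invert a rigid transform p_cam = R * p_world + t.

    R: 3x3 rotation matrix as a list of rows, t: list of 3 numbers.
    Returns (R_inv, t_inv) such that p_world = R_inv * p_cam + t_inv.
    """
    # The inverse of a rotation matrix is its transpose.
    R_inv = [[R[j][i] for j in range(3)] for i in range(3)]
    # The inverse translation is -R^T * t, i.e. the camera center in
    # world coordinates.
    t_inv = [-sum(R_inv[i][j] * t[j] for j in range(3)) for i in range(3)]
    return R_inv, t_inv
```

Applying `invert_pose` twice returns the original transform, which is a quick sanity check when wiring this into a pose consumer.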
Services
vslam/pointcloud - point cloud service to visualize 3D points in RViz
- keyframe service to visualize keyframes in RViz
Parameters
Dynamically Reconfigurable Parameters
See the dynamic_reconfigure package for details on dynamically reconfigurable parameters.
- Scale Range: 0.01 to 30.0
- selects the source for the motion model. Possible values are: MM_CONSTANT (CONSTANT): use constant motion model; MM_IMU (IMU): use IMU orientation for the motion model; MM_FULL_POSE (FULL_POSE): use full pose estimated externally for motion model.
- max features per frame Range: 10.0 to 1000.0
- 'distance' after which a new kf is requested Range: 0.1 to 10.0
- use AutoInitPixel as new KF request criteria
- do not add map points at level zero
- depth variance to search for features Range: 1.0 to 100.0
- min number of features for coarse tracking Range: 1.0 to 100.0
- max number of features for coarse tracking Range: 1.0 to 100.0
- Pixel search radius for coarse features Range: 1.0 to 100.0
- coarse tracking sub-pixel iterations Range: 1.0 to 100.0
- enable/disable coarse tracking
- speed above which coarse stage is used Range: 0.0 to 1.0
- min ratio features visible/features found for good tracking Range: 0.0 to 1.0
- max ratio features visible/features found before lost Range: 0.0 to 1.0
- min pixels to be found for good tracking Range: 1 to 1000
- max iterations for bundle adjustment Range: 1.0 to 100.0
- number of keyframes kept in the map (0 or 1 = inf) Range: 0 to 1000
- bundleadjustment method Possible values are: LOCAL_GLOBAL (LOCAL_GLOBAL): local and global bundle adjustment, LOCAL (LOCAL): local bundle adjustment only, GLOBAL (GLOBAL): global bundle adjustment only
- limit for convergence in bundle adjustment Range: 0.0 to 1.0
- print bundle debug messages
- FAST corner method Possible values are: FAST9 (FAST9): FAST 9, FAST10 (FAST10): FAST 10, FAST9_nonmax (FAST9_nonmax): FAST 9 with nonmax suppression, AGAST12d (AGAST12d): AGAST 12 pixel diamond, OAST16 (OAST16): AGAST 16 pixel circular
- threshold for FAST features on level 0 Range: 0 to 255
- threshold for FAST features on level 1 Range: 0 to 255
- threshold for FAST features on level 2 Range: 0 to 255
- threshold for FAST features on level 3 Range: 0 to 255
- adaptive threshold for corner extraction
- controls adaptive threshold to MaxPatches*N corners Range: 0.5 to 20.0
- small images for the rotation estimator blur Range: 0.0 to 10.0
- small images for the rotation estimator enable/disable
- MiniPatch tracking threshold Range: 0 to 10000000
- number of dominant plane RANSACs Range: 0 to 1000
- score for relocalization Range: 0 to 90000000
- enable auto initialization
- min pixel distance for auto initialization Range: 1 to 100
- max # of loops for stereo initialization Range: 1 to 100
Static parameters
parameters that are statically set
- minimal pyramidal level for map initialization. Use higher levels on self-similar structures Range: 0 to 3
- enables/disables the graphical feedback. Disable this on onboard computers
- x-dimensions of the input image to be processed by PTAM
- y-dimensions of the input image to be processed by PTAM
- width resolution of the graphical feedback window
- height resolution of the graphical feedback window
- Multiplier for the distance at which a new KF is requested
- Estimator to be used for bundle adjustment. Possible values are: Huber, Cauchy, Tukey
- Estimator to be used for bundle adjustment. Possible values are: Huber, Cauchy, Tukey
- Sigma value for Tukey estimator
- minimal Shi-Tomasi score for a point to be considered as a candidate feature
- Camera focal length in x. Retrieve this value from the cameracalibrator node
- Camera focal length in y. Retrieve this value from the cameracalibrator node
- Camera center in x. Retrieve this value from the cameracalibrator node
- Camera center in y. Retrieve this value from the cameracalibrator node
- Camera distortion factor. Retrieve this value from the cameracalibrator node
- blur factor for the cameracalibrator
- mean value for the cameracalibrator
- min # of corners for the cameracalibrator
- optimize flag for the cameracalibrator
- show flag for the cameracalibrator
- enable/disable distortion flag for the cameracalibrator
- max step for the cameracalibrator
- size for patches used in the cameracalibrator
- use the openGL window
- openGL window size
- offset for text display in the openGL window