Package proposal

This will be a cleaned-up, generic implementation of Nister vocabulary trees intended to replace the one in place_recognition. Problems with the existing version:

  • Only supports dense uint8 Calonder descriptors.
    • The new version will allow any feature such that it can compute a distance between two instances.
    • Calonder-specific feature quantization code will be moved out of the VocabularyTree class, into a Calonder-specific training program.

  • Assumes L2 distance between features.
    • The new version will allow other metrics (L1, EMD, ...).
  • Complicated implementation.
    • New version will separate tasks across multiple classes instead of putting everything in a monolithic VocabularyTree class.

    • Existing version supports hierarchical scoring (every tree node is considered a word and receives a weight). This considerably complicates the implementation for little or no benefit; Nister got the best recognition performance using only the leaves as words. The new version will not bother with hierarchical scoring. (Neither does Fraundorfer's implementation)

Vocabulary Tree

Unlike the current all-in-one class, the new VocabularyTree is responsible for exactly one task: quantizing features into "words."

   1 typedef uint32_t Word;
   3 template<class Feature, class Distance = L2>
   4 class VocabularyTree
   5 {
   6   VocabularyTree(Distance d = Distance());
   8   Word quantize(const Feature& f) const;
  10   void save(const std::string& file) const;
  11   void load(const std::string& file);
  12 };


The Database stores "documents" (images represented as words). It can find the set of documents best matching a query. Internally, it keeps an inverted file for each word mapping that word to the documents containing it.

   1 typedef uint32_t DocId;
   2 typedef std::vector<Word> Document;
   4 class Database
   5 {
   6   // Insert a new document. The returned DocId identifies it.
   7   DocId insert(const Document& words);
   9   struct Match
  10   {
  11     DocId id;
  12     float score;
  13   };
  14   typedef std::vector<Match> Matches;
  16   // Find the top N matches in the database for the query document.
  17   void find(const Document& words, size_t N, Matches& matches) const;
  19   // Find the top N matches AND insert the query document. insert and find
  20   // are essentially the same operation, so this is twice as fast as doing
  21   // them sequentially.
  22   DocId findAndInsert(const Document& words, size_t N, Matches& matches);
  24   // Compute the TF-IDF weights of all the words. To be called after inserting
  25   // a corpus of training examples into the database.
  26   void computeWeights();
  28   void saveWeights(const std::string& file) const;
  29   void loadWeights(const std::string& file);
  30 };


place_recognition already contains a rather fast implementation of K-means. It uses Elkan's algorithm, which greatly reduces the number of distance computations while producing the same results as standard K-means at each iteration. It can use the k-means++ seeding algorithm, which tends to reduce the number of iterations required for convergence. It is already templated to work with float or double features.

Needed changes:

  • Support user-specified features and distance (currently assume dense features with L2).
  • Tweak as needed to work with integer features.

   1 template<class Feature, class Distance>
   2 class Kmeans
   3 {
   4   Kmeans(Distance d = Distance());
   6   // random, use given, kmeans++, ...
   7   void setInitMethod(...);
   9   // Stop after convergence or some number of iterations.
  10   void setMaxIterations(size_t iters);
  12   // Number of times to run the algorithm, if seeding is non-deterministic.
  13   void setStarts(size_t starts);
  15   // Cluster a set of features into k centers, with each input feature
  16   // assigned membership to some center. Returns the error.
  17   distance_type cluster(const std::vector<Feature>& features,
  18                         size_t k,
  19                         std::vector<Feature>& centers,
  20                         std::vector<size_t>& membership) const;
  21 };

A reusable HierarchicalKmeans class would be nice to have also, but requires a more complicated interface. That can wait I think.

Feature concept

Features must have a value type (e.g. float, uint8_t) and dimension (e.g. 176 for Calonder) known at compile time. This information can be accessed through type traits. Normally a feature class will simply be a wrapper for a stack-allocated array, such as fixed-size Eigen vectors or boost::array.

Distance concept

   1 concept Distance
   2 {
   3   typedef ... value_type;  // the coordinate type of the feature
   4   typedef ... result_type; // capable of accumulating value_type
   6   // Returns the (squared) distance between a and b.
   7   result_type operator() (const Feature& a, const Feature& b) const;
   8 };

Package review meeting notes

Create new package review

Enter the date in the appropriate box to generate a template for an API or code review meeting. Check the QAProcess page for more information.

Wiki: vocabulary_tree/Reviews (last edited 2010-01-28 05:13:03 by PatrickMihelich)