Reference:
Ralf Herbrich, Thore Graepel, and Klaus Obermayer, "Support Vector Learning for Ordinal Regression," May 20, 1999.
The first two algorithms refine the well-known k-d tree data structure. The first uses incremental distance calculation to reduce the cost of traversing the k-d tree to O(1) per step (independent of dimension), which is tricky and very useful. The second uses a priority queue to determine the order of searching, which lets the query terminate earlier.
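The two refinements can be combined in one sketch: a best-first k-d tree nearest-neighbor search that orders subtrees with a priority queue and maintains each subtree's lower-bound distance incrementally, touching only the offset along the current splitting axis. This is my own minimal Python sketch of the idea (the names `build` and `nn_best_first` are mine, not from the paper):

```python
import heapq
import numpy as np

def build(points, depth=0):
    # Recursively build a k-d tree node: (point, axis, left, right).
    if len(points) == 0:
        return None
    axis = depth % points.shape[1]
    points = points[points[:, axis].argsort()]
    mid = len(points) // 2
    return (points[mid], axis,
            build(points[:mid], depth + 1),
            build(points[mid + 1:], depth + 1))

def nn_best_first(root, q):
    # Best-first search: a min-heap orders subtrees by a lower bound on
    # the squared distance from q to their region. The bound is updated
    # incrementally in O(1) per descent: only the offset along the
    # current splitting axis changes.
    best, best_d = None, float('inf')
    counter = 0                     # tie-breaker so nodes never compare
    heap = [(0.0, counter, root, (0.0,) * len(q))]
    while heap:
        bound, _, node, off = heapq.heappop(heap)
        if bound >= best_d:
            break                   # no remaining region can be closer
        if node is None:
            continue
        point, axis, left, right = node
        d = float(np.sum((point - q) ** 2))
        if d < best_d:
            best, best_d = point, d
        diff = q[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        # Incremental bound update: replace the old offset along this
        # axis with the distance to the new splitting plane.
        far_bound = bound - off[axis] ** 2 + diff * diff
        far_off = off[:axis] + (diff,) + off[axis + 1:]
        counter += 1
        heapq.heappush(heap, (bound, counter, near, off))
        counter += 1
        heapq.heappush(heap, (far_bound, counter, far, far_off))
    return best

pts = np.random.default_rng(0).random((200, 3))
tree = build(pts.copy())
nearest = nn_best_first(tree, np.array([0.55, 0.4, 0.7]))
```

The early `break` is what makes the priority-queue ordering pay off: once the smallest remaining lower bound exceeds the best distance found, no unexplored subtree can improve the answer.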
The third algorithm is based on neighborhood graphs. Any two connected nodes in a neighborhood graph must satisfy some local criterion, which can be used to accelerate the query. However, the preprocessing step is costly.
Reference:
S. Arya and D. M. Mount, "Algorithms for fast vector quantization," Proceedings of DCC '93: Data Compression Conference, IEEE Press, pp. 381-390, 1993.
Comment:
I dislike the way they explain the technical material. The symbols and equations used in the paper are ambiguous (especially in the explanation of Locality-Sensitive Hashing). I remember that KT presented a very similar topic before, and she did a much better job than this paper does!
Reference:
Gionis, Indyk, and Motwani, "Similarity Search in High Dimensions via Hashing," VLDB: International Conference on Very Large Data Bases, Morgan Kaufmann Publishers, 1999.
Then, the paper gives two example applications of NMF: one is text mining and the other is spectral data analysis. In text mining, NMF uses a document-term matrix constructed from the weights of various terms in a set of documents. This matrix is factored into a term-feature matrix and a feature-document matrix, similar to LSI. As a matter of fact, NMF is equivalent to PLSA when it is obtained by minimizing the Kullback–Leibler divergence.
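The KL-divergence variant mentioned above can be sketched with the classic Lee–Seung multiplicative updates; this is my own toy implementation on a small term-document matrix, not code from the paper (`nmf_kl` and the test data are mine):

```python
import numpy as np

def nmf_kl(V, r, iters=500, seed=0):
    # Multiplicative updates (Lee & Seung) for NMF minimizing the
    # generalized Kullback-Leibler divergence D(V || WH).
    # V: nonnegative term-document matrix (terms x documents).
    # W: term-feature matrix, H: feature-document matrix.
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, r)) + 0.1
    H = rng.random((r, m)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / (W.sum(axis=0)[:, None] + eps)
        WH = W @ H + eps
        W *= ((V / WH) @ H.T) / (H.sum(axis=1)[None, :] + eps)
    return W, H

# Toy rank-2 term-document matrix built from known factors.
W0 = np.array([[1., 0.], [0., 1.], [1., 2.], [2., 1.]])
H0 = np.array([[1., 0., 2.], [0., 2., 1.]])
V = W0 @ H0
W, H = nmf_kl(V, 2)
```

The updates keep W and H nonnegative by construction (each step multiplies by a nonnegative factor), which is what gives NMF its parts-based, additive interpretation.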
Reference:
Michael W. Berry, Murray Browne, Amy Nicole Langville, V. Paul Pauca, and Robert J. Plemmons, "Algorithms and applications for approximate nonnegative matrix factorization," Computational Statistics & Data Analysis, 52(1), pp. 155-173, 2007.
Since the scene often changes in a music video, the experiment shows that the lighting conditions dominate the results. However, when the lighting is roughly the same, PCA can more or less distinguish the people. If I had known about Fisherfaces at the time, maybe I would have tried FLD to see whether it could solve the lighting problem.
Summarization:
Paper [1]:
A face image is treated as a feature vector. However, the dimension of an image is too large to handle directly, so they reduce the dimension with PCA. An important point in this paper is that the complexity of solving PCA can be reduced when the number of vectors is smaller than the dimension.
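That complexity reduction is the "snapshot" trick: with M images of dimension d (M << d), eigen-decompose the small M-by-M matrix AᵀA instead of the d-by-d covariance AAᵀ; if v is an eigenvector of AᵀA, then Av is an eigenvector of AAᵀ with the same eigenvalue. A minimal numpy sketch of the idea (my own illustration with random data, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, M = 400, 6                         # pixel count, number of images
A = rng.random((d, M))                # columns are (fake) face images
A -= A.mean(axis=1, keepdims=True)    # subtract the mean image

small = A.T @ A                       # M x M instead of d x d
vals, vecs = np.linalg.eigh(small)    # cheap eigen-decomposition

u = A @ vecs[:, -1]                   # map top eigenvector back to image space
u /= np.linalg.norm(u)                # u is now a (unit-norm) eigenface
top_eigenvalue = vals[-1]
```

The check that u really is an eigenvector of the full covariance is just (AAᵀ)u = A(AᵀA)v = λAv = λu.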
Paper [2]:
They argued that PCA preserves all the differences between feature vectors, which is inadequate for face recognition. FLD is more suitable because it diminishes the in-class differences while doing dimension reduction.
Comments:
PCA provides a systematic way to obtain the most distinguishable directions (principal components) from a collection of feature vectors. However, it is very sensitive to "noise": when applied to classification, PCA does not know which differences can be ignored. FLD enlarges the between-class differences and diminishes the in-class differences during dimension reduction. PCA performs better for feature reconstruction; FLD is more suitable for classification.
In my opinion, we cannot say that FLD is better than PCA because their main goals are different. Besides, a prerequisite of FLD is that the data must be supervised, that is, each vector must be labeled beforehand. Conversely, PCA can be directly applied to unsupervised data.
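The contrast can be seen in a tiny numerical sketch (entirely my own toy data): the within-class scatter lies along x while the two classes are separated along y, so PCA's top component follows the large but class-irrelevant x variance, while the Fisher direction w = Sw⁻¹(m2 − m1) picks the discriminative y axis.

```python
import numpy as np

# Two classes: same spread along x, separated along y.
A = np.array([[-4., 0.], [-2., 0.], [0., 0.], [2., 0.], [4., 0.]])
B = A + np.array([0.3, 1.0])            # shifted copy: second class
X = np.vstack([A, B])

# PCA: top right-singular vector of the centered data.
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc)
pca_dir = Vt[0]                          # follows the big x variance

# FLD: within-class scatter, then w = Sw^{-1} (m2 - m1).
Sw = ((A - A.mean(0)).T @ (A - A.mean(0))
      + (B - B.mean(0)).T @ (B - B.mean(0)))
w = np.linalg.solve(Sw + 1e-6 * np.eye(2), B.mean(0) - A.mean(0))
w /= np.linalg.norm(w)                   # points along y, the class axis

print(abs(pca_dir[0]), abs(w[1]))        # both close to 1
```

Projecting onto pca_dir mixes the classes; projecting onto w separates them perfectly, even though PCA captures far more total variance.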
Reference:
[1] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, 3(1), pp. 71-86, 1991.
[2] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class-Specific Linear Projection," European Conference on Computer Vision, pp. I:43-58, 1996.
Unlike Lowe's paper, which addresses every component of a feature-point matching system (from point detection, description, and indexing to matching), this paper mainly focuses on interest point detection.
In his paper, Lowe claimed that none of the previous approaches were fully affine invariant, as they start with initial feature scales and locations selected in a non-affine-invariant manner due to the prohibitive cost of exploring the full affine space. Although this paper uses an affine-variant method to choose candidate positions at the very beginning, it then uses an affine-invariant method to refine each point's position and the shape of its neighborhood. Therefore, the feature comes very close to an affine-invariant detector by Lowe's argument.
The authors first introduce a scale-invariant interest point detection method called the Harris-Laplace detector. It uses the Harris detector and the Laplacian-of-Gaussian (LoG) alternately to find the best position for a candidate point in a 3D space composed of the image coordinates (x, y) and the scale. In other words, this detector is similar to the interest point detector used in SIFT, since both are scale invariant. In addition to the iterative approach, the authors give a simplified Harris-Laplace detector to reduce the computational complexity.
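The scale-selection half of this detector can be illustrated in 1-D (my own toy sketch, not the authors' code): for a 1-D Gaussian blob of width s0, the scale-normalized response sigma^2 * d^2/dx^2 (G_sigma * f) attains its maximum magnitude near sigma = sqrt(2) * s0, so the argmax over scales recovers the blob's characteristic scale.

```python
import numpy as np

x = np.arange(-60, 61, dtype=float)
s0 = 6.0
f = np.exp(-x**2 / (2 * s0**2))            # a 1-D "blob" of width s0

def normalized_log_response(sigma):
    # Smooth with a normalized Gaussian, take the discrete second
    # derivative at the center, and scale-normalize by sigma^2.
    k = np.exp(-x**2 / (2 * sigma**2))
    k /= k.sum()
    g = np.convolve(f, k, mode='same')
    c = len(x) // 2                        # x[c] == 0, the blob center
    d2 = g[c - 1] - 2 * g[c] + g[c + 1]
    return sigma**2 * abs(d2)

sigmas = np.arange(2.0, 14.1, 0.25)
best = sigmas[np.argmax([normalized_log_response(s) for s in sigmas])]
print(best)                                 # near sqrt(2) * 6 ≈ 8.49
```

Without the sigma^2 normalization the response would simply decay with scale and no meaningful maximum would exist; that normalization is what makes scale selection work.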
Then the affine-invariant interest point detector is introduced. In essence, an affine transformation can be viewed as a different scale change in each direction. The second moment matrix is used to estimate the anisotropic shape of a local image structure. The authors give several examples comparing the results of scale-invariant and affine-invariant detectors; from those figures, it is easy to see that the regions covered by each interest point are much less sensitive to the viewpoint.
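The second moment matrix is just an accumulation of gradient products over a window; a toy sketch (my own, not the authors' code) shows how its eigenvalue ratio exposes anisotropy: for a pure ramp, all gradient energy lies along one axis and one eigenvalue collapses to zero.

```python
import numpy as np

def second_moment(patch):
    # M = sum over the window of [gx^2, gx*gy; gx*gy, gy^2].
    gy, gx = np.gradient(patch.astype(float))
    return np.array([[np.sum(gx * gx), np.sum(gx * gy)],
                     [np.sum(gx * gy), np.sum(gy * gy)]])

ramp = np.tile(np.arange(16.0), (16, 1))   # intensity varies only along x
lam = np.linalg.eigvalsh(second_moment(ramp))
print(lam)   # one eigenvalue ~ 0: a strongly anisotropic structure
```

The affine detector uses exactly this anisotropy estimate to warp the local neighborhood until the two eigenvalues are equal, i.e. until the structure looks isotropic.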
An experiment tests the repeatability of the different detectors. Not surprisingly, their affine detector performs much better than the others under strong affine deformations. A picture-matching application is given at the end of the paper.
Reference:
K. Mikolajczyk and C. Schmid, "Scale & Affine Invariant Interest Point Detectors," Int'l J. Computer Vision, vol. 60, no. 1, pp. 63-86, 2004.
3. Orientation assignment
So far, the candidate positions have been found and will not be rejected in the following steps. Indeed, the methods used to find these positions are invariant to scale, rotation, and changes in illumination. However, we also need a point descriptor with the same invariances. In this step, the author determines one or more dominant directions for a point according to its gradient histogram. Allowing more than one dominant direction gives flexibility, since under image noise the second (or third?) most dominant direction may become the first. The dominant direction is used in the next step to achieve invariance to rotation.
4. The local image descriptor
The location, scale, and orientation of each keypoint have already been assigned by the previous steps. The main goal of this step is to find a proper descriptor that reduces the effects of illumination variation and viewpoint. By adopting gradients instead of pixel values, the descriptor becomes invariant to additive illumination changes. The effect of image contrast is removed by normalizing the gradient histogram, and the effect of non-linear illumination changes is reduced by restricting the (normalized) gradient magnitudes to be less than 0.2. Again, the author tested several parameter settings and chose the one with good performance and low complexity.
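That normalize-clip-renormalize step can be sketched directly (my own sketch; the 0.2 threshold is the value quoted above):

```python
import numpy as np

def normalize_descriptor(v, thresh=0.2):
    # Normalize to unit length (contrast invariance), clip large
    # entries (damps non-linear illumination effects), renormalize.
    v = v / np.linalg.norm(v)
    v = np.minimum(v, thresh)
    return v / np.linalg.norm(v)

raw = np.array([10.0, 1.0, 1.0, 1.0, 1.0])   # one dominant gradient bin
d = normalize_descriptor(raw)
print(d)   # the dominant entry's influence is greatly reduced
```

Without the clip, the first entry would keep nearly all of the descriptor's energy (about 0.98 after plain normalization); clipping caps its influence so matching is driven by the distribution of orientations rather than a single strong gradient.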
Application
The author gives an object recognition application using SIFT. He suggests using the ratio of the distance to the closest neighbor over the distance to the second-closest neighbor to achieve reliable matching, and then recommends an approximate nearest-neighbor indexing method to find the nearest neighbor with high probability more efficiently. Since methods such as RANSAC or Least Median of Squares perform poorly when outliers dominate, the Hough transform is used to help identify clusters. At the end, the author gives the method for solving the affine parameters.
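The ratio test is simple enough to sketch directly (toy data and function name are my own; 0.8 is the rejection threshold the paper suggests): a match is accepted only when the nearest neighbor is clearly closer than the runner-up.

```python
import numpy as np

def ratio_test_match(desc, db, ratio=0.8):
    # Return the index of the nearest database descriptor, or None
    # if the match is ambiguous (nearest not clearly better than
    # the second nearest).
    dists = np.linalg.norm(db - desc, axis=1)
    i, j = np.argsort(dists)[:2]
    return i if dists[i] < ratio * dists[j] else None

db = np.array([[0.0, 0.0], [5.0, 5.0], [5.1, 5.0]])
good = ratio_test_match(np.array([0.1, 0.0]), db)    # unambiguous -> 0
bad = ratio_test_match(np.array([5.05, 5.0]), db)    # two near-ties -> None
print(good, bad)
```

The point of the test is that an incorrect match tends to have several database descriptors at similar distances, so the ratio is close to 1 and the match is discarded.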
My conclusion
This is not the first work addressing scale-invariant, illumination-invariant, or viewpoint-invariant features. The main contribution of this paper is a very complete procedure for computing the SIFT feature, integrating and improving the merits of many previous works. To make SIFT more practical, the author gives several tricks to reduce the computational cost. However, the paper also has some drawbacks: the application example somewhat deviates from the main subject and takes too many (7) pages.
Reference:
D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.
A three-pass approach is recommended in this article. Here I interpret each pass in my own way.
1. First pass : Understand whether the paper is what I need.
The most important parts of a paper are the title, abstract, introduction, headings, and conclusion. From these parts we can usually tell whether a paper is worth reading.
In fact, when we write a paper, we also should deal with these parts very carefully to attract reviewers' attention.
2. Second pass : Grasp the content of the paper.
Be able to summarize the main thrust of the paper; it is not necessary to know every detail.
3. Third pass : Absorb but also doubt every detail.
In this pass, we are already sure that the paper is what we need, so pay more attention to the details. Simultaneously, challenge the work to get more novel ideas.
Reference:
S. Keshav, "How to Read a Paper," ACM SIGCOMM Computer Communication Review, 2007.
http://www.sigcomm.org/ccr/drupal/files/p83-keshavA.pdf