Friday, February 29, 2008

[Reading] Lecture 03 - Distinctive Image Features from Scale-Invariant Keypoints

SIFT is such a powerful method for extracting distinctive feature points that it is used not only in CBIR, but also in object recognition, image matching, stereo vision, and so on. Since different cameras can capture the same scene from different positions, at different times, with different viewing angles, or with different camera settings, a good feature describing a point should be invariant to viewing-angle and illumination changes. Although SIFT does NOT support all kinds of transformations (caused by different viewing angles), its computational efficiency and stable matching power make it so popular that many researchers use it as their tool.

In the "Related research" section of this paper, the author gives a good review of the area of feature point matching. From this history, we can more clearly understand the main concern behind each component of SIFT. The author also explains why SIFT does not try to cover wider affine invariance. He claims that none of the previous approaches are yet fully affine invariant, as they start with initial feature scales and locations selected in a non-affine-invariant manner. Giving up some affine invariance makes SIFT more robust against changes in 3D viewpoint for non-planar surfaces, allows more efficient feature extraction, and enables the identification of larger numbers of features.

Since SIFT is composed of so many components, I won't give too much technical detail in the following. The focus will instead be on what problems each component is trying to solve.

1. Detection of scale-space extrema

SIFT detects feature points using cascade filtering for efficiency. In this step, SIFT tries to find candidate locations that are invariant to scale (in fact, also invariant to some simple transformations and illumination changes). The candidate locations are found as the local maxima/minima of the difference-of-Gaussian (DoG) function convolved with the image. The DoG function is a close approximation of the LoG function, but the DoG can significantly accelerate the computation. Since some parameters of the DoG function need to be determined, the author chooses them according to the repeatability of the candidate locations under some synthesized image transformations. Some tricky implementation details are given to improve the computational efficiency.
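To make this concrete, below is a minimal sketch of the extrema detection in Python (NumPy/SciPy). It handles a single octave only and uses a naive triple loop; the function name dog_extrema is my own, though sigma = 1.6 and 3 scales per octave follow the settings suggested in the paper.

    # Minimal single-octave DoG extrema detection (illustrative sketch).
    import numpy as np
    from scipy.ndimage import gaussian_filter

    def dog_extrema(img, sigma=1.6, num_scales=3):
        # Scale step per level within the octave.
        k = 2 ** (1.0 / num_scales)
        blurred = [gaussian_filter(img, sigma * k**i) for i in range(num_scales + 3)]
        # DoG = difference of adjacent Gaussian levels (approximates the LoG).
        dog = [b2 - b1 for b1, b2 in zip(blurred, blurred[1:])]
        keypoints = []
        for s in range(1, len(dog) - 1):
            below, cur, above = dog[s - 1], dog[s], dog[s + 1]
            for y in range(1, cur.shape[0] - 1):
                for x in range(1, cur.shape[1] - 1):
                    patch = np.stack([below[y-1:y+2, x-1:x+2],
                                      cur[y-1:y+2, x-1:x+2],
                                      above[y-1:y+2, x-1:x+2]])
                    # Keep a sample only if it beats all 26 neighbors.
                    if cur[y, x] in (patch.max(), patch.min()):
                        keypoints.append((y, x, s))
        return keypoints

A real implementation builds a full image pyramid across octaves (downsampling by 2 each time) and vectorizes the neighbor comparison, which is where much of the paper's efficiency comes from.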

2. Accurate keypoint localization

After the candidate positions are found, the position of each extremum is refined by setting the derivative of a Taylor expansion around the candidate position to zero. Candidates are further rejected if they have low contrast or lie along an edge. The author determines whether a point is on an edge by checking the ratio of the two principal curvatures (rejecting it if the ratio is too large).
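The paper performs this curvature test without explicitly computing eigenvalues, using the trace and determinant of the 2x2 Hessian of the DoG image. Here is a sketch of that check in Python (dog is the DoG image as a 2D NumPy array), with r = 10 as in the paper; the function name and finite-difference details are my own illustration:

    def passes_edge_test(dog, y, x, r=10.0):
        # 2x2 Hessian of the DoG image at (y, x) via finite differences.
        dxx = dog[y, x+1] + dog[y, x-1] - 2.0 * dog[y, x]
        dyy = dog[y+1, x] + dog[y-1, x] - 2.0 * dog[y, x]
        dxy = (dog[y+1, x+1] - dog[y+1, x-1]
               - dog[y-1, x+1] + dog[y-1, x-1]) / 4.0
        trace, det = dxx + dyy, dxx * dyy - dxy * dxy
        if det <= 0:
            return False  # curvatures of opposite signs: reject
        # Equivalent to requiring the ratio of principal curvatures to be < r.
        return trace * trace / det < (r + 1) ** 2 / r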

3. Orientation assignment

So far, the keypoint positions have been found and will not be rejected in the following steps. Indeed, the methods used to find these positions are invariant to scale, rotation, and changes in illumination. However, we also need a point descriptor that is invariant to scale, rotation, and changes in illumination. In this step, the author determines one or more dominant directions for a point according to a gradient orientation histogram. The author allows more than one dominant direction to be chosen, since the second (or third?) dominant direction may become the first dominant direction when there is image noise. The dominant direction will be used in the next step to achieve invariance to rotation.
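As a sketch, assuming the gradient magnitudes mag and orientations theta (in radians) around the keypoint have already been computed, the 36-bin histogram and the keep-peaks-within-80% rule from the paper look roughly like this (the Gaussian weighting of samples and the parabolic peak interpolation used in the paper are omitted for brevity):

    import numpy as np

    def dominant_orientations(mag, theta, num_bins=36, peak_ratio=0.8):
        # Each sample votes for its orientation bin, weighted by magnitude.
        hist, _ = np.histogram(theta, bins=num_bins,
                               range=(-np.pi, np.pi), weights=mag)
        peaks = []
        for i in range(num_bins):
            left = hist[(i - 1) % num_bins]
            right = hist[(i + 1) % num_bins]
            # Keep every local peak within 80% of the global maximum,
            # so one keypoint may receive several dominant directions.
            if hist[i] > left and hist[i] > right and hist[i] >= peak_ratio * hist.max():
                peaks.append(-np.pi + (i + 0.5) * 2 * np.pi / num_bins)
        return peaks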

4. The local image descriptor

The location, scale, and orientation of each keypoint have already been assigned by the previous steps. In this step, the main goal is to find a proper descriptor that reduces the effects caused by illumination variation and viewpoint change. By adopting gradients instead of raw pixel values, the descriptor becomes invariant to additive illumination changes. The effect of different image contrast is removed by normalizing the gradient histogram, and the effect of non-linear illumination changes is reduced by restricting the gradient magnitudes (after normalization) to be no more than 0.2. Again, the author tested the performance using several parameter settings and chose the one with good performance and low complexity.
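This normalize-clamp-renormalize step is simple enough to show directly. Below, desc stands for the 128-dimensional vector built from the 4x4 grid of 8-bin orientation histograms; the 0.2 threshold is the value reported in the paper:

    import numpy as np

    def normalize_descriptor(desc, clamp=0.2):
        # Unit-length normalization cancels linear contrast changes.
        desc = desc / (np.linalg.norm(desc) + 1e-12)
        # Clamping large components reduces the impact of non-linear
        # illumination changes (e.g., saturation, specular highlights).
        desc = np.minimum(desc, clamp)
        # Renormalize so descriptors are compared as unit vectors.
        return desc / (np.linalg.norm(desc) + 1e-12)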

Application

The author demonstrates an object recognition application using SIFT. He suggests using the ratio of the distance to the closest neighbor over the distance to the second-closest neighbor to achieve reliable matching, and then recommends an approximate nearest-neighbor indexing method to find the nearest neighbor with high probability more efficiently. Since methods such as RANSAC or Least Median of Squares perform poorly when outliers dominate, the Hough transform is used to help identify clusters. At the end, the author gives the method to solve for the affine parameters.
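As an illustration, here is a brute-force version of the ratio test in Python; the paper instead uses an approximate best-bin-first index to avoid the linear scan, and the 0.8 cutoff below is the threshold suggested in the paper:

    import numpy as np

    def ratio_test_match(query, database, max_ratio=0.8):
        # Accept a match only if the closest descriptor is clearly
        # closer than the second-closest one.
        dists = np.linalg.norm(database - query, axis=1)
        i1, i2 = np.argsort(dists)[:2]
        if dists[i1] < max_ratio * dists[i2]:
            return i1      # index of the matched database descriptor
        return None        # ambiguous match: discard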

My conclusion

This is not the first work addressing scale-invariant, illumination-invariant, or viewpoint-invariant features. The main contribution of this paper is that the author introduces a very complete procedure for extracting SIFT features by integrating and improving the merits of many previous works. To make SIFT more practical, the author gives several tricks to reduce the computation cost. However, there are also some drawbacks in this paper: the application example somewhat deviates from the main subject and takes too many (7) pages.

Reference:

D. Lowe, "Distinctive Image Features from Scale-Invariant Keypoints," Int'l J. Computer Vision, vol. 60, no. 2, pp. 91-110, 2004.

Monday, February 25, 2008

[Reading] Lecture 02 - Image Retrieval: Ideas, Influences, and Trends of the New Age

I used to do research on video retrieval when I was in Eric's small group, so I thought this paper would not be too hard for me to read. However, it turns out there are so many things I had never heard of before. Besides, I only understand how some developing techniques can be used in CBIR, but not why they were brought into this area. I'm such a frog in a well...

Although this paper only focuses on the history of CBIR after 2000, we can still understand how the mainstream of this area has evolved over the last decades. In the very beginning, the research subjects for CBIR were very limited; people usually took only two steps - feature extraction and similarity calculation - to retrieve what they wanted. After several years, more and more new areas such as signature extraction, clustering (metric learning), classification (SVM), and relevance feedback began thriving. The author ran some experiments with Google Scholar's search tool to demonstrate this phenomenon.

Then, the author discusses several aspects of the design of a CBIR system. Because CBIR applications are interactive, the author usually thinks from the users' point of view. Some constraints, such as searching speed, are added to ensure the practicability of a CBIR system. Those constraints make a CBIR system even harder to design, since we cannot solve the problem by brute force.

There is always a trade-off between performance and complexity. Take feature/signature extraction for example: in order to obtain a thorough description of an image, we should also consider the spatial information (including shape). However, to get more concise and meaningful spatial information, higher computation cost is often unavoidable (e.g., segmentation). Similarly, the trade-off between performance and complexity can be seen in similarity calculation. The Mallows distance, which performs well, is defined by a minimization problem. However, since the complexity of computing this value is so high, some people would rather choose integrated region matching (IRM) as their metric, which can be computed significantly faster and is not much inferior to the Mallows distance.

Another important issue that deserves to be mentioned is evaluation strategy. It is hard to evaluate whether the result of a CBIR system is good or bad because the judgment is very subjective, not to mention how to compare two different CBIR systems. Therefore, researchers started to agree on certain evaluation datasets, benchmarks, and forums.

Overall, this paper gives a comprehensive survey and also points out some possible offshoots in the future. However, we won't get much from this paper if we only skim through it. The author tries to cover so many things that he sacrifices some of the details. Of course, if we really want to understand a certain subject well, reading the references listed by the author would be a good start.

Reference:
R. Datta, D. Joshi, J. Li, and J. Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age," ACM Computing Surveys, 39(65), 2007.

Thursday, February 21, 2008

[Reading] Lecture 02 - How to Give a Good Research Talk

First, we should understand that this article focuses on the situation where the audience consists of people who may not know much about your work, and you are presenting a technical talk. Knowing your audience is important; it determines what kind of material should be put on your slides. To make sure that everyone can get something from your talk, give enough clear information instead of too much detail. Giving examples is a very good way to explain an idea.

The author also tells us how to prepare the slides, but I think some of his suggestions are out of date. For example, I don't think preparing slides by typing is a bad idea; handwriting is not always good for everyone.

I do like the author's suggestion about how to help re-orient the audience. By saying something like "This is what I have discussed so far, and now I'm going on to cover these areas," we get another chance to recapture the attention of people who are already lost. I have used a similar technique in my own talks before.

Another important concept is: avoid the temptation to conceal the problems you know about in your work; get your audience to help you do your research! I think most people would try to hide their weaknesses instead, so I'm a little surprised at this philosophy.

Reference:
Simon L Peyton Jones, John Hughes, and John Launchbury, "How to Give a Good Research Talk," SIGPLAN Notices, 28(11), Nov. 1993.
http://research.microsoft.com/~simonpj/papers/giving-a-talk/giving-a-talk-html.html

[Reading] Lecture 02 - How to Read a Paper

A three-pass approach is recommended in this article. Here, I interpret each pass in my own way.

1. First pass: Understand whether the paper is what I need.
The most important parts of a paper are the title, abstract, introduction, headings, and conclusion. We can usually tell from these parts whether a paper is worth reading.
In fact, when we write a paper, we should also handle these parts very carefully to attract reviewers' attention.

2. Second pass: Grasp the content of the paper.
Be able to summarize the main thrust of the paper; it is not necessary to know every detail.

3. Third pass: Absorb, but also doubt, every detail.
In this pass, we are already sure that this paper is what we need. Therefore, pay more attention to the details. Simultaneously, challenge the work to come up with more novel ideas.

Reference:
S. Keshav, "How to Read a Paper," ACM SIGCOMM Computer Communication Review, 2007.
http://www.sigcomm.org/ccr/drupal/files/p83-keshavA.pdf