Tuesday, June 10, 2008

[Reading] Lecture 16 - Support Vector Learning for Ordinal Regression

This paper proposed a machine learning method to solve the ordinal regression problem. The task is similar to classification, but there exist ordering relationships among the different classes. The authors solve the problem by modifying the ordinary SVM framework. This paper was hard for me to read since I am not familiar with the mathematical parts of SVM.
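
As far as I understand, the key idea is to learn a latent utility function from pairs of examples with different ranks. Below is a rough sketch of that pairwise reduction using a plain linear SVM; the toy data and the use of scikit-learn's LinearSVC are my own illustration, not the authors' exact formulation.

```python
import numpy as np
from sklearn.svm import LinearSVC

# Toy ordinal data: a 1-D feature, ranks 1 < 2 < 3 (illustrative only).
X = np.array([[0.1], [0.3], [0.9], [1.2], [2.0], [2.4]])
ranks = np.array([1, 1, 2, 2, 3, 3])

# Reduce ordinal regression to binary classification on pairwise differences:
# each pair with different ranks gives a difference vector, labelled by which
# example should be ranked higher.
diffs, labels = [], []
for i in range(len(X)):
    for j in range(len(X)):
        if ranks[i] != ranks[j]:
            diffs.append(X[i] - X[j])
            labels.append(1 if ranks[i] > ranks[j] else -1)

svm = LinearSVC(C=1.0).fit(np.array(diffs), np.array(labels))
w = svm.coef_.ravel()   # direction of the latent utility function
print(X @ w)            # projecting onto w should respect the rank ordering
```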

Reference:
Ralf Herbrich, Thore Graepel, and Klaus Obermayer, "Support Vector Learning for Ordinal Regression," May 20, 1999.

Monday, June 2, 2008

[Reading] Lecture 15 - The PageRank Citation Ranking: Bringing Order to the Web

This paper provides a method to evaluate the importance of a webpage according to its backlinks. Using backlinks as an indication of quality is not a new idea. In academia, people look at the number of citations, which can be viewed as the backlinks of a paper, to evaluate its importance. The quality of citations is under control (you must be able to publish a paper before you can cite), so it is reasonable to believe that a paper with only a few citations won't be too important. However, it's different on the Web. Building a website is easy, and so is putting a forelink on a webpage. A link from Yahoo is more valuable than a link from a personal webpage. Therefore, we should not only count the number of backlinks. Instead, we should also consider the quality of the backlink providers.

Once you understand the spirit of the method, it is easy to see what it does during propagation: in each iteration, an important webpage contributes more to the pages it links to than a common webpage does. People in that era really liked to use propagation...

There can be problems with this propagation approach. For instance, a rank sink occurs if there exists a local loop in the model. The authors solve this problem by providing a source of rank, which ensures that each page contributes at least a little instead of zero. From a random-walk point of view, you can think of the source of rank as a random surfer who occasionally jumps to a webpage that is not linked by the current webpage.
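
A minimal sketch of this random-surfer view as a power iteration, assuming a damping factor of 0.85 for the source of rank; the toy link graph and variable names below are my own illustration, not taken from the paper.

```python
import numpy as np

def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict {page: [pages it links to]}."""
    pages = sorted(links)                       # every page must appear as a key
    index = {p: i for i, p in enumerate(pages)}
    n = len(pages)
    rank = np.full(n, 1.0 / n)

    for _ in range(iterations):
        new_rank = np.full(n, (1.0 - damping) / n)   # the "source of rank"
        for page, outlinks in links.items():
            if outlinks:   # a page splits its rank evenly among its forelinks
                share = damping * rank[index[page]] / len(outlinks)
                for target in outlinks:
                    new_rank[index[target]] += share
            else:          # dangling page: spread its rank over all pages
                new_rank += damping * rank[index[page]] / n
        rank = new_rank
    return dict(zip(pages, rank))

# Toy graph: the "important" page A passes more rank to C than D does.
graph = {"A": ["B", "C"], "B": ["A"], "C": ["A"], "D": ["C"]}
print(pagerank(graph))
```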

Indeed, PageRank provides a good way to evaluate importance according to backlinks. However, I think it still ignores forelinks: a webpage that provides high-quality forelinks won't benefit much from its contribution (although it might benefit a little if there exists a loop that includes the webpage and the pages it links to).

Reference:
Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd, "The PageRank Citation Ranking: Bringing Order to the Web," Technical Report SIDL-WP-1999-0120, November 1999.

Tuesday, May 27, 2008

[Reading] Lecture 14 - Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach

This paper proposed an efficient algorithm for finding frequent itemsets. It utilizes a data structure called the FP-tree to do the job. There are two advantages to this algorithm. First, the FP-tree represents the data very compactly without any information loss; no approximation is used, so the result is exactly the same as the exact answer. Second, mining is efficient: the database only has to be scanned twice, which is much better than the repeated scans of Apriori-based methods. A deficiency of this work is its memory consumption.
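
A minimal sketch of how an FP-tree compresses transactions by sharing prefixes; the transactions and the minimum support below are made up for illustration, and this only builds the tree rather than implementing the full FP-growth mining step.

```python
from collections import Counter

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    # First scan: count item frequencies and keep only the frequent items.
    counts = Counter(item for t in transactions for item in t)
    frequent = {i for i, c in counts.items() if c >= min_support}

    root = FPNode(None, None)
    # Second scan: insert each transaction with items sorted by descending
    # frequency, so common prefixes share the same path (the compression).
    for t in transactions:
        items = sorted((i for i in t if i in frequent),
                       key=lambda i: (-counts[i], i))
        node = root
        for item in items:
            node = node.children.setdefault(item, FPNode(item, node))
            node.count += 1
    return root

transactions = [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "e"]]
tree = build_fp_tree(transactions, min_support=2)
```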

Reference:
"Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach," Han, Data Mining and Knowledge Discovery, 2004.

Tuesday, May 20, 2008

[Reading] Lecture 13 - On Spectral Clustering: Analysis and an algorithm

A big problem of spectral clustering is that solving the eigenvector problem is very slow when the Laplacian matrix is big. Therefore, it is a little wasteful to exploit only one (the second) eigenvector and abandon the others, producing only one normalized cut at a time. This paper presents how to use the 3rd, 4th, ... eigenvectors as well. According to their algorithm, after the top eigenvectors are obtained and stacked as the columns of a matrix M, each row of M is normalized and the rows are then clustered by k-means. I've tried to understand the physical meaning of the row-normalization step, but in vain. I think maybe it is just a mathematical trick that is not easy to interpret physically.
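
A minimal sketch of this multi-eigenvector procedure (affinity matrix, normalized Laplacian, top-k eigenvectors, row normalization, then k-means); the Gaussian kernel width sigma and the toy data are my own choices for illustration, not parameters from the paper.

```python
import numpy as np
from scipy.linalg import eigh
from sklearn.cluster import KMeans

def spectral_clustering(X, k, sigma=1.0):
    # Affinity matrix with a Gaussian kernel (zero diagonal).
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # Symmetrically normalized affinity D^{-1/2} A D^{-1/2}.
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

    # Take the k largest eigenvectors as the columns of M.
    _, vecs = eigh(L)
    M = vecs[:, -k:].copy()

    # Normalize each row of M to unit length, then run k-means on the rows.
    M /= np.linalg.norm(M, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit_predict(M)

# Two well-separated blobs should end up in different clusters.
X = np.vstack([np.random.randn(20, 2), np.random.randn(20, 2) + 5])
print(spectral_clustering(X, k=2))
```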

Reference:
Andrew Y. Ng, Michael I. Jordan, and Yair Weiss, "On Spectral Clustering: Analysis and an Algorithm," Advances in Neural Information Processing Systems (NIPS), 2001.

[Reading] Lecture 13 - Normalized Cuts and Image Segmentation

Spectral clustering is not easy to understand. Although I presented this topic, I dare not say I am familiar with it.

Usually it is easier to understand an algorithm from its physical meaning. In this paper, clustering of feature points is first modeled as a graph G = (V, E), and the act of clustering is interpreted as partitioning the nodes into different groups. A simple way to evaluate the quality of a partition is to compute the total weight of the edges that cross between the groups, Cut(A, B). A smaller Cut(A, B) is preferred since it means the similarity between the groups is small. However, minimizing the cut usually produces clusters with very few members. To solve this problem, the authors proposed another measure of cut quality, the Normalized Cut, NCut(A, B).

The idea of NCut is to normalize the cut by each group's total connection to the whole graph, so a partition that isolates a small group with few edges is penalized. However, this makes the problem NP-hard. The authors show that the NCut value can be written in a matrix form involving the Laplacian matrix L and an indicator vector f determined by the partition. That is, you could minimize NCut by minimizing this matrix form, but by itself that does not help: once we have f, we already have the NCut value, so minimizing the matrix form directly is equivalent to brute force. To make the matrix form solvable, the authors relax some constraints on f, and thanks to this approximation the problem can be solved as an eigenvector problem.
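
To make the matrix form concrete, this is roughly how I understand the relaxation, with W the affinity matrix, D the diagonal degree matrix, and L = D - W:

```latex
\mathrm{NCut}(A,B) = \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(A,V)}
                   + \frac{\mathrm{cut}(A,B)}{\mathrm{assoc}(B,V)},
\qquad
\min_{A,B} \mathrm{NCut}(A,B)
  = \min_{y} \frac{y^{\top}(D - W)\,y}{y^{\top} D\, y}
\quad \text{s.t. } y^{\top} D \mathbf{1} = 0,\; y_i \in \{1, -b\}.
```

Dropping the discrete constraint y_i ∈ {1, -b} (where b is a constant determined by the partition) and keeping only y^T D 1 = 0 turns this into the generalized eigenvector problem (D - W) y = λ D y, whose second-smallest eigenvector gives a real-valued partition indicator.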

The paper mainly focuses on how to cluster the data into two groups and does not put much emphasis on how to cluster the data into several groups simultaneously.

The authors also compared the normalized cut algorithm with other algorithms. Unsurprisingly, their method performs better than the others under several different settings.

Reference:
Jianbo Shi and Jitendra Malik, "Normalized Cuts and Image Segmentation," IEEE Trans. PAMI, 2000.

Monday, May 12, 2008

[Reading] Lecture 12 - A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models

This paper demonstrates several inference and learning algorithms on a simple computer vision application: occlusion. The graphical models introduced in the paper include the Bayesian network, the Markov random field, and the factor graph. Since the occlusion application is simple, we can easily learn the pros and cons of each graphical model from the examples. Bayesian networks are good at indicating marginal independence, while Markov random fields are good at indicating conditional independence.

The rest of the paper discusses many different algorithms for inference, both exact and approximate. Since exact inference is often intractable, we are more interested in the characteristics of the approximate inference algorithms. For example, ICM (iterated conditional modes) lowers the energy quickly at first, but converges at a higher energy than the other algorithms.
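
A minimal sketch of the ICM idea on a toy pairwise MRF (binary labels on a 1-D chain): greedily re-label one variable at a time to minimize its local energy. The unary and pairwise costs below are made-up illustrations, not the occlusion model from the paper.

```python
import numpy as np

def icm(unary, pairwise_weight, iterations=10):
    """Iterated conditional modes on a 1-D chain of discrete variables.

    unary[i, l] is the cost of giving label l to node i; neighbouring nodes
    pay pairwise_weight whenever their labels disagree (a Potts-style term).
    """
    n, num_labels = unary.shape
    labels = np.argmin(unary, axis=1)        # start from the unary-only minimum

    for _ in range(iterations):
        changed = False
        for i in range(n):
            costs = unary[i].copy()
            for j in (i - 1, i + 1):         # chain neighbours
                if 0 <= j < n:
                    costs += pairwise_weight * (np.arange(num_labels) != labels[j])
            best = int(np.argmin(costs))
            if best != labels[i]:
                labels[i], changed = best, True
        if not changed:                      # converged to a local minimum
            break
    return labels

unary = np.array([[0.0, 1.0], [0.9, 1.0], [1.0, 0.0], [1.0, 0.2]])
print(icm(unary, pairwise_weight=0.5))
```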

Reference:
B. J. Frey and N. Jojic, "A Comparison of Algorithms for Inference and Learning in Probabilistic Graphical Models," IEEE Trans. Pattern Analysis and Machine Intelligence, 27(9), pp. 1392-1416, September 2005.

Friday, May 9, 2008

[Whisper] Humble webpage

My very humble webpage, which records my project results from some courses.
http://mpac.ee.ntu.edu.tw/~sutony
or
http://su.sutony.googlepages.com/sutony