This paper presents a semi-supervised learning method, focusing on the problem of how to combine different kinds of information in the learning procedure. Stated more mathematically, the problem is equivalent to training a classifier from two feature sets given limited labeled data and a large number of unlabeled samples.
The main idea of co-training is to let one classifier (trained on feature set 1) teach the other classifier (trained on feature set 2), and vice versa. One benefit is that data on which the two classifiers agree about the label is more likely to be labeled correctly, and can be used as labeled data in the next iteration. The other benefit is that each classifier can learn from the other something it could not learn on its own, which makes up for the deficiency of training the two classifiers completely separately.
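Below is a minimal, illustrative sketch of this loop in Python (assuming NumPy and scikit-learn). The choice of logistic regression, the confidence threshold, and the agreement filter are my own simplifications of the idea described above; the original Blum and Mitchell algorithm instead lets each classifier independently add its most confidently labeled positive and negative examples from a small pool.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y, U1, U2, rounds=10, threshold=0.95):
    """(X1, X2): the two views of the labeled data; y: labels;
    (U1, U2): the two views of the unlabeled pool."""
    X1, X2, y = np.asarray(X1), np.asarray(X2), np.asarray(y)
    U1, U2 = np.asarray(U1), np.asarray(U2)
    h1 = LogisticRegression().fit(X1, y)
    h2 = LogisticRegression().fit(X2, y)
    for _ in range(rounds):
        if len(U1) == 0:
            break
        p1, p2 = h1.predict_proba(U1), h2.predict_proba(U2)
        pred1 = h1.classes_[p1.argmax(axis=1)]
        pred2 = h2.classes_[p2.argmax(axis=1)]
        # Keep unlabeled points where both views are confident and agree;
        # agreement between the two views makes the label more reliable.
        pick = (p1.max(axis=1) >= threshold) & \
               (p2.max(axis=1) >= threshold) & (pred1 == pred2)
        if not pick.any():
            break
        # Promote the agreed-upon points to labeled data and retrain,
        # so each classifier learns from what the other view found.
        X1 = np.vstack([X1, U1[pick]])
        X2 = np.vstack([X2, U2[pick]])
        y = np.concatenate([y, pred1[pick]])
        U1, U2 = U1[~pick], U2[~pick]
        h1 = LogisticRegression().fit(X1, y)
        h2 = LogisticRegression().fit(X2, y)
    return h1, h2
```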
Comment:
After understanding the idea of this paper, I think there is a question that must be answered: when should we split the features into two vectors (f1, f2) and train two classifiers, rather than simply concatenating them into a single feature vector? The authors do assume that f1 and f2 are conditionally independent given the label, but is that assumption reasonable? And how can I tell whether f1 and f2 are actually independent?
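One rough heuristic for the last question (my own sketch, not from the paper): if f1 and f2 were conditionally independent given the label, the errors of two classifiers trained on the separate views should be roughly uncorrelated on held-out labeled data; strongly correlated errors suggest the views are redundant and co-training will have little to teach. A hypothetical diagnostic, again assuming scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def view_error_correlation(X1, X2, y):
    """Correlation between the two per-view classifiers' errors.
    Near zero is consistent with (though does not prove) conditional
    independence; a high value warns that the views are redundant."""
    e1 = cross_val_predict(LogisticRegression(), X1, y, cv=5) != y
    e2 = cross_val_predict(LogisticRegression(), X2, y, cv=5) != y
    return np.corrcoef(e1.astype(float), e2.astype(float))[0, 1]
```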
Reference:
A. Blum and T. Mitchell, "Combining Labeled and Unlabeled Data with Co-Training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT), Morgan Kaufmann, 1998.