This paper presents a semi-supervised learning method, focusing on the problem of how to combine different kinds of information in the learning procedure. Stated more mathematically, the problem is equivalent to training a classifier from two feature sets given limited labeled data and a large number of unlabeled samples.
The main idea of co-training is to let one classifier (trained on feature set 1) teach the other classifier (trained on feature set 2), and vice versa. One benefit is that data on which the two classifiers agree about the label is more likely to be labeled correctly, and can be used as labeled data in the next iteration. The other benefit is that each classifier can learn from the other something it could not learn on its own, which makes up for the deficiency of training the two classifiers completely separately.
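Below is a minimal, illustrative sketch of this loop in Python (assuming NumPy and scikit-learn). The choice of logistic regression, the confidence threshold, and the agreement filter are my own simplifications of the idea described above; the original Blum and Mitchell algorithm instead lets each classifier independently add its most confidently labeled positive and negative examples from a small pool.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def co_train(X1, X2, y, U1, U2, rounds=10, threshold=0.95):
    """(X1, X2): the two views of the labeled data; y: labels;
    (U1, U2): the two views of the unlabeled pool."""
    X1, X2, y = np.asarray(X1), np.asarray(X2), np.asarray(y)
    U1, U2 = np.asarray(U1), np.asarray(U2)
    h1 = LogisticRegression().fit(X1, y)
    h2 = LogisticRegression().fit(X2, y)
    for _ in range(rounds):
        if len(U1) == 0:
            break
        p1, p2 = h1.predict_proba(U1), h2.predict_proba(U2)
        pred1 = h1.classes_[p1.argmax(axis=1)]
        pred2 = h2.classes_[p2.argmax(axis=1)]
        # Keep unlabeled points where both views are confident and agree;
        # agreement between the two views makes the label more reliable.
        pick = (p1.max(axis=1) >= threshold) & \
               (p2.max(axis=1) >= threshold) & (pred1 == pred2)
        if not pick.any():
            break
        # Promote the agreed-upon points to labeled data and retrain,
        # so each classifier learns from what the other view found.
        X1 = np.vstack([X1, U1[pick]])
        X2 = np.vstack([X2, U2[pick]])
        y = np.concatenate([y, pred1[pick]])
        U1, U2 = U1[~pick], U2[~pick]
        h1 = LogisticRegression().fit(X1, y)
        h2 = LogisticRegression().fit(X2, y)
    return h1, h2
```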
Comment:
After understanding the idea of this paper, I think there is a question that must be answered: when should we split the features into two vectors (f1, f2) and train two classifiers, rather than simply concatenating them into a single feature vector? The authors do assume that f1 and f2 are conditionally independent given the label, but is that assumption reasonable? And how can I tell whether f1 and f2 are actually independent?
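One rough heuristic for the last question (my own sketch, not from the paper): if f1 and f2 were conditionally independent given the label, the errors of two classifiers trained on the separate views should be roughly uncorrelated on held-out labeled data; strongly correlated errors suggest the views are redundant and co-training will have little to teach. A hypothetical diagnostic, again assuming scikit-learn:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

def view_error_correlation(X1, X2, y):
    """Correlation between the two per-view classifiers' errors.
    Near zero is consistent with (though does not prove) conditional
    independence; a high value warns that the views are redundant."""
    e1 = cross_val_predict(LogisticRegression(), X1, y, cv=5) != y
    e2 = cross_val_predict(LogisticRegression(), X2, y, cv=5) != y
    return np.corrcoef(e1.astype(float), e2.astype(float))[0, 1]
```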
Reference:
A. Blum and T. Mitchell, "Combining Labeled and Unlabeled Data with Co-Training," in Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT), Morgan Kaufmann, 1998.