{{machine learning bar}}
In [[machine learning]], a '''probabilistic classifier''' is a [[statistical classification|classifier]] that is able to predict, given a sample input, a [[probability distribution]] over a set of classes, rather than only predicting a class for the sample. Probabilistic classifiers provide classification with a degree of certainty, which can be useful when combining them into larger systems, for example in [[ensemble classifier]]s.

Formally, a probabilistic classifier is a [[conditional probability|conditional]] distribution <math>\operatorname{P}(Y \vert X)</math> over a set of classes {{mvar|Y}}, given inputs {{mvar|X}}. Deciding on the best class label <math>\hat{y}</math> for {{mvar|X}} can then be done using the [[Bayes estimator|optimal decision rule]]<ref name="bishop">{{cite book |first=Christopher M. |last=Bishop |year=2006 |title=Pattern Recognition and Machine Learning |publisher=Springer }}</ref>{{rp|39–40}}

:<math>\hat{y} = \operatorname{\arg\max}_{y} \operatorname{P}(Y=y \vert X)</math>
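
For instance (with purely illustrative numbers), a probabilistic classifier might assign a sample <math>x</math> the distribution
:<math>\operatorname{P}(Y=1 \vert X=x) = 0.2, \quad \operatorname{P}(Y=2 \vert X=x) = 0.5, \quad \operatorname{P}(Y=3 \vert X=x) = 0.3,</math>
in which case the decision rule selects <math>\hat{y} = 2</math>, while the full distribution is retained as a measure of how confident that decision is.
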
Binary probabilistic classifiers are also called [[binomial regression]] models in [[statistics]]. In [[econometrics]], probabilistic classification in general is called [[discrete choice]].

Some classification models, such as [[naive Bayes classifier|naive Bayes]], [[logistic regression]] and [[multilayer perceptron]]s (when trained under an appropriate [[loss function]]) are naturally probabilistic. Other models such as [[support vector machine]]s are not, but [[#Probability calibration|methods exist]] to turn them into probabilistic classifiers.
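
The following minimal sketch illustrates the distinction; the [[scikit-learn]] library and the synthetic data are used purely as an example and are not part of the definitions above.
<syntaxhighlight lang="python">
# Illustrative sketch only: scikit-learn and the synthetic data are example
# choices, not prescribed by the models themselves.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# A naturally probabilistic classifier: predicts a distribution over the classes.
clf = LogisticRegression().fit(X, y)
print(clf.predict_proba(X[:1]))      # two class probabilities summing to 1
print(clf.predict(X[:1]))            # the arg-max class label

# A non-probabilistic classifier: returns only an uncalibrated decision score
# (the signed distance to the separating hyperplane), not a probability.
svm = SVC(kernel="linear").fit(X, y)
print(svm.decision_function(X[:1]))
</syntaxhighlight>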


==Generative and conditional training==
Some models, such as logistic regression, are conditionally trained: they optimize the conditional probability <math>\operatorname{P}(Y \vert X)</math> directly on a training set (see [[empirical risk minimization]]). Other classifiers, such as naive Bayes, are trained [[Generative model|generatively]]: at training time, the class-conditional distribution <math>\operatorname{P}(X \vert Y)</math> and the class [[Prior probability|prior]] <math>P(Y)</math> are found, and the conditional distribution <math>\operatorname{P}(Y \vert X)</math> is derived using [[Bayes' theorem|Bayes' rule]].<ref name="bishop"/>{{rp|43}}
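
Concretely, for a generatively trained classifier the predicted class probabilities are recovered from the learned pieces via Bayes' rule,
:<math>\operatorname{P}(Y=y \vert X=x) = \frac{\operatorname{P}(X=x \vert Y=y)\,\operatorname{P}(Y=y)}{\sum_{y'} \operatorname{P}(X=x \vert Y=y')\,\operatorname{P}(Y=y')},</math>
where the denominator simply renormalises the joint probabilities over all classes.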

==Probability calibration==
Not all classification models are naturally probabilistic, and some that are, notably naive Bayes classifiers and [[Boosting (machine learning)|boosting]] methods, produce distorted class probability distributions.<ref name="Niculescu">{{cite conference |last1=Niculescu-Mizil |first1=Alexandru |first2=Rich |last2=Caruana |title=Predicting good probabilities with supervised learning |conference=ICML |year=2005 |url=http://machinelearning.wustl.edu/mlpapers/paper_files/icml2005_Niculescu-MizilC05.pdf}}</ref>
However, for classification models that produce some kind of "score" on their outputs (such as a distorted probability distribution or the "signed distance to the hyperplane" in a support vector machine), there are several methods that turn these scores into properly [[Calibration (statistics)|calibrated]] class membership probabilities.

For the [[binary classification|binary]] case, a common approach is to apply [[Platt scaling]], which learns a [[logistic regression]] model on the scores.<ref name="platt99">{{cite journal |last=Platt |first=John |authorlink=John Platt (computer scientist) |title=Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods |journal=Advances in large margin classifiers |volume=10 |issue=3 |year=1999 |pages=61–74 |url=http://www.researchgate.net/publication/2594015_Probabilistic_Outputs_for_Support_Vector_Machines_and_Comparisons_to_Regularized_Likelihood_Methods/file/504635154cff5262d6.pdf}}</ref>
An alternative method using [[isotonic regression]]<ref>{{cite doi|10.1145/775047.775151}}</ref> is generally superior to Platt's method when sufficient training data is available.<ref name="Niculescu"/>
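
As a minimal sketch of both approaches (using the [[scikit-learn]] library, a linear SVM and synthetic data purely as stand-ins; none of these choices is prescribed by the methods themselves), the raw SVM scores can be wrapped in either a sigmoid (Platt-style) or an isotonic calibrator fitted on held-out folds:
<syntaxhighlight lang="python">
# Illustrative sketch only: the library, data and parameters are example choices.
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

base = LinearSVC()  # produces only signed distances to the hyperplane

# Platt-style scaling: fit a logistic sigmoid to the SVM scores.
platt = CalibratedClassifierCV(base, method="sigmoid", cv=5).fit(X_train, y_train)

# Isotonic calibration: fit a non-decreasing step function to the scores.
iso = CalibratedClassifierCV(base, method="isotonic", cv=5).fit(X_train, y_train)

print(platt.predict_proba(X_test[:1]))  # calibrated class membership probabilities
print(iso.predict_proba(X_test[:1]))
</syntaxhighlight>
In Platt's method the calibrated probability has the parametric form <math>\operatorname{P}(y=1 \vert f) = \left(1 + \exp(Af + B)\right)^{-1}</math> for a score <math>f</math>, with <math>A</math> and <math>B</math> fitted by maximum likelihood; isotonic regression instead fits an arbitrary non-decreasing mapping from scores to probabilities.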

In the [[multiclass classification|multiclass]] case, one can use a reduction to binary tasks, calibrate each binary problem with one of the univariate methods described above, and then combine the resulting estimates using the pairwise coupling algorithm of Hastie and Tibshirani.<ref>{{cite doi|10.1214/aos/1028144844}}</ref> An alternative one-step method, Dirichlet calibration, is introduced by Gebel and Weihs.<ref>{{cite doi|10.1007/978-3-540-78246-9_4}}</ref>
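
The reduction itself can be sketched as follows; for simplicity this example couples the calibrated binary estimates by plain renormalisation (a cruder step than the pairwise coupling algorithm cited above), and again uses [[scikit-learn]] and synthetic data only as illustrations.
<syntaxhighlight lang="python">
# Illustrative sketch only: one-vs-rest reduction with sigmoid calibration,
# coupled by simple renormalisation rather than pairwise coupling.
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=1500, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)
classes = np.unique(y)

# One calibrated binary (class k vs. rest) classifier per class.
binary_probs = np.column_stack([
    CalibratedClassifierCV(LinearSVC(), method="sigmoid", cv=5)
    .fit(X, (y == k).astype(int))
    .predict_proba(X)[:, 1]
    for k in classes
])

# Couple the binary estimates into a distribution by renormalising.
multiclass_probs = binary_probs / binary_probs.sum(axis=1, keepdims=True)
print(multiclass_probs[:3])  # each row sums to 1
</syntaxhighlight>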


==Evaluating probabilistic classification==

Commonly used loss functions for probabilistic classification include log loss and the [[mean squared error]] between the predicted and the true probability distributions. The former of these is commonly used to train logistic models.
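
As a small worked example (the predictions and labels below are made up), both quantities can be computed directly from the predicted distributions and the one-hot encoded true classes:
<syntaxhighlight lang="python">
# Illustrative sketch only: the numbers are made up for the example.
import numpy as np

# Predicted class distributions for three samples over three classes ...
p_pred = np.array([[0.7, 0.2, 0.1],
                   [0.1, 0.8, 0.1],
                   [0.3, 0.3, 0.4]])
# ... and the true classes as one-hot distributions.
p_true = np.array([[1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [0.0, 0.0, 1.0]])

# Log loss: average negative log-probability assigned to the true class.
log_loss = -np.mean(np.sum(p_true * np.log(p_pred), axis=1))

# Mean squared error between the distributions (the original multi-class
# form of the Brier score).
mse = np.mean(np.sum((p_pred - p_true) ** 2, axis=1))

print(round(log_loss, 3))  # ~0.499
print(round(mse, 3))       # ~0.247
</syntaxhighlight>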

==See also==

==References==
{{reflist}}

{{Compu-ai-stub}}
