Package weka.classifiers.evaluation
Class ThresholdCurve
- java.lang.Object
-
- weka.classifiers.evaluation.ThresholdCurve
-
- All Implemented Interfaces:
RevisionHandler
public class ThresholdCurve extends java.lang.Object implements RevisionHandler
Generates points illustrating the prediction tradeoffs that can be obtained by varying the threshold value between classes. For example, the typical threshold value of 0.5 means that the predicted probability of "positive" must be higher than 0.5 for the instance to be predicted as "positive". The resulting dataset can be used to visualize the precision/recall tradeoff or for ROC curve analysis (true positive rate vs. false positive rate). In each case, Weka simply varies the threshold on the class probability estimates. The AUC is calculated using the Mann-Whitney statistic.
- Version:
- $Revision: 7833 $
- Author:
- Len Trigg (len@reeltwo.com)
-
-
Field Summary
Fields (Modifier and Type | Field | Description)
- static java.lang.String | FALLOUT_NAME | attribute name: Fallout
- static java.lang.String | FALSE_NEG_NAME | attribute name: False Negatives
- static java.lang.String | FALSE_POS_NAME | attribute name: False Positives
- static java.lang.String | FMEASURE_NAME | attribute name: FMeasure
- static java.lang.String | FP_RATE_NAME | attribute name: False Positive Rate
- static java.lang.String | LIFT_NAME | attribute name: Lift
- static java.lang.String | PRECISION_NAME | attribute name: Precision
- static java.lang.String | RECALL_NAME | attribute name: Recall
- static java.lang.String | RELATION_NAME | The name of the relation used in threshold curve datasets
- static java.lang.String | SAMPLE_SIZE_NAME | attribute name: Sample Size
- static java.lang.String | THRESHOLD_NAME | attribute name: Threshold
- static java.lang.String | TP_RATE_NAME | attribute name: True Positive Rate
- static java.lang.String | TRUE_NEG_NAME | attribute name: True Negatives
- static java.lang.String | TRUE_POS_NAME | attribute name: True Positives
-
Constructor Summary
Constructors
- ThresholdCurve()
-
Method Summary
Methods (Modifier and Type | Method | Description)
- Instances | getCurve(FastVector predictions) | Calculates the performance stats for the default class and returns the results as a set of Instances.
- Instances | getCurve(FastVector predictions, int classIndex) | Calculates the performance stats for the desired class and returns the results as a set of Instances.
- static double | getNPointPrecision(Instances tcurve, int n) | Calculates the n-point precision result, which is the precision averaged over n evenly spaced (with respect to recall) samples of the curve.
- java.lang.String | getRevision() | Returns the revision string.
- static double | getROCArea(Instances tcurve) | Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
- static int | getThresholdInstance(Instances tcurve, double threshold) | Gets the index of the instance with the closest threshold value to the desired target.
- static void | main(java.lang.String[] args) | Tests the ThresholdCurve generation from the command line.
-
-
-
Field Detail
-
RELATION_NAME
public static final java.lang.String RELATION_NAME
The name of the relation used in threshold curve datasets.
- See Also:
- Constant Field Values
-
TRUE_POS_NAME
public static final java.lang.String TRUE_POS_NAME
attribute name: True Positives
- See Also:
- Constant Field Values
-
FALSE_NEG_NAME
public static final java.lang.String FALSE_NEG_NAME
attribute name: False Negatives
- See Also:
- Constant Field Values
-
FALSE_POS_NAME
public static final java.lang.String FALSE_POS_NAME
attribute name: False Positives
- See Also:
- Constant Field Values
-
TRUE_NEG_NAME
public static final java.lang.String TRUE_NEG_NAME
attribute name: True Negatives
- See Also:
- Constant Field Values
-
FP_RATE_NAME
public static final java.lang.String FP_RATE_NAME
attribute name: False Positive Rate
- See Also:
- Constant Field Values
-
TP_RATE_NAME
public static final java.lang.String TP_RATE_NAME
attribute name: True Positive Rate
- See Also:
- Constant Field Values
-
PRECISION_NAME
public static final java.lang.String PRECISION_NAME
attribute name: Precision
- See Also:
- Constant Field Values
-
RECALL_NAME
public static final java.lang.String RECALL_NAME
attribute name: Recall
- See Also:
- Constant Field Values
-
FALLOUT_NAME
public static final java.lang.String FALLOUT_NAME
attribute name: Fallout
- See Also:
- Constant Field Values
-
FMEASURE_NAME
public static final java.lang.String FMEASURE_NAME
attribute name: FMeasure
- See Also:
- Constant Field Values
-
SAMPLE_SIZE_NAME
public static final java.lang.String SAMPLE_SIZE_NAME
attribute name: Sample Size
- See Also:
- Constant Field Values
-
LIFT_NAME
public static final java.lang.String LIFT_NAME
attribute name: Lift
- See Also:
- Constant Field Values
-
THRESHOLD_NAME
public static final java.lang.String THRESHOLD_NAME
attribute name: Threshold
- See Also:
- Constant Field Values
-
-
Method Detail
-
getCurve
public Instances getCurve(FastVector predictions)
Calculates the performance stats for the default class and returns the results as a set of Instances. The structure of these Instances is as follows:
- True Positives
- False Negatives
- False Positives
- True Negatives
- False Positive Rate
- True Positive Rate
- Precision
- Recall
- Fallout
- Threshold contains the probability threshold that gives rise to the previous performance values.
For the definitions of these measures, see TwoClassStats
- Parameters:
predictions - the predictions to base the curve on
- Returns:
- datapoints as a set of instances, null if no predictions have been made.
- See Also:
TwoClassStats
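To make the structure above concrete, here is a minimal sketch of how one row of such a curve could be computed for a single threshold. It is not Weka's implementation: the class `ThresholdRowSketch`, its method `rowAt`, the array layout, and the choice to report 0 when a denominator is zero are all assumptions made for illustration. It uses the strict "higher than the threshold" rule from the class description and leaves Fallout to the definitions in TwoClassStats.

```java
/** Hypothetical helper, not part of Weka: one row of a threshold curve. */
final class ThresholdRowSketch {
    // Positions in the returned array (names chosen for this sketch only).
    static final int TP = 0, FN = 1, FP = 2, TN = 3, FP_RATE = 4, TP_RATE = 5, PRECISION = 6;

    /**
     * Counts true/false positives/negatives at one threshold and derives the rates.
     * An instance is predicted "positive" when its probability is strictly
     * higher than the threshold.
     */
    static double[] rowAt(double[] probPositive, boolean[] isPositive, double threshold) {
        double tp = 0, fn = 0, fp = 0, tn = 0;
        for (int i = 0; i < probPositive.length; i++) {
            boolean predictedPositive = probPositive[i] > threshold;
            if (isPositive[i]) {
                if (predictedPositive) tp++; else fn++;
            } else {
                if (predictedPositive) fp++; else tn++;
            }
        }
        // Assumption of this sketch: a rate with a zero denominator is reported as 0.
        double fpRate = (fp + tn) == 0 ? 0.0 : fp / (fp + tn);
        double tpRate = (tp + fn) == 0 ? 0.0 : tp / (tp + fn); // same quantity as recall
        double precision = (tp + fp) == 0 ? 0.0 : tp / (tp + fp);
        return new double[] {tp, fn, fp, tn, fpRate, tpRate, precision};
    }
}
```

Calling `rowAt` once per distinct predicted probability, in sorted order, would yield one data point per threshold, which is the general shape of the dataset described above.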
-
getCurve
public Instances getCurve(FastVector predictions, int classIndex)
Calculates the performance stats for the desired class and returns the results as a set of Instances.
- Parameters:
predictions - the predictions to base the curve on
classIndex - index of the class of interest.
- Returns:
- datapoints as a set of instances.
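One plausible reading of "class of interest" is a one-vs-rest reduction: the probability assigned to the class at classIndex becomes the score, and an instance counts as positive when its actual class equals classIndex. The sketch below illustrates that idea only. `ClassOfInterestSketch` and its methods are hypothetical names, and the plain arrays stand in for Weka's prediction objects.

```java
/** Hypothetical helper, not part of Weka: reduce multi-class predictions to two classes. */
final class ClassOfInterestSketch {

    /** Picks, for every instance, the probability assigned to the class of interest. */
    static double[] scores(double[][] distributions, int classIndex) {
        double[] result = new double[distributions.length];
        for (int i = 0; i < distributions.length; i++) {
            result[i] = distributions[i][classIndex];
        }
        return result;
    }

    /** Marks an instance as positive when its actual class is the class of interest. */
    static boolean[] positives(int[] actualClass, int classIndex) {
        boolean[] result = new boolean[actualClass.length];
        for (int i = 0; i < actualClass.length; i++) {
            result[i] = actualClass[i] == classIndex;
        }
        return result;
    }
}
```

The two arrays produced this way are the kind of input a two-class threshold sweep needs: one score and one positive/negative label per instance.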
-
getNPointPrecision
public static double getNPointPrecision(Instances tcurve, int n)
Calculates the n-point precision result, which is the precision averaged over n evenly spaced (with respect to recall) samples of the curve.
- Parameters:
tcurve - a previously extracted threshold curve Instances.
n - the number of points to average over.
- Returns:
- the n-point precision.
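As an illustration of the averaging described above, the following sketch samples precision at n evenly spaced recall levels (0, 1/(n-1), ..., 1) and averages the samples. It is a simplified sketch, not Weka's code: the class `NPointPrecisionSketch`, the use of linear interpolation between neighbouring curve points, clamping outside the observed recall range, and returning NaN for empty input or n < 2 are all assumptions.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper, not part of Weka: n-point averaged precision. */
final class NPointPrecisionSketch {

    static double nPointPrecision(double[] recall, double[] precision, int n) {
        if (recall.length == 0 || recall.length != precision.length || n < 2) {
            return Double.NaN; // assumption: signal unusable input with NaN
        }
        // Sort point indices by ascending recall.
        Integer[] order = new Integer[recall.length];
        for (int i = 0; i < order.length; i++) {
            order[i] = i;
        }
        Arrays.sort(order, Comparator.comparingDouble((Integer i) -> recall[i]));

        double sum = 0.0;
        for (int k = 0; k < n; k++) {
            double target = (double) k / (n - 1); // evenly spaced recall level
            sum += precisionAt(recall, precision, order, target);
        }
        return sum / n;
    }

    /** Linearly interpolates precision at the given recall level, clamping at the ends. */
    private static double precisionAt(double[] recall, double[] precision,
                                      Integer[] order, double target) {
        int first = order[0];
        int last = order[order.length - 1];
        if (target <= recall[first]) return precision[first];
        if (target >= recall[last]) return precision[last];
        for (int j = 1; j < order.length; j++) {
            int lo = order[j - 1];
            int hi = order[j];
            if (target <= recall[hi]) {
                // Here recall[lo] < target <= recall[hi], so the span is positive.
                double w = (target - recall[lo]) / (recall[hi] - recall[lo]);
                return precision[lo] + w * (precision[hi] - precision[lo]);
            }
        }
        return precision[last];
    }
}
```

For a curve with points (recall, precision) = (0, 1.0), (0.5, 0.8), (1.0, 0.5), the three sampled precisions for n = 3 are 1.0, 0.8, and 0.5, so the sketch returns their mean.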
-
getROCArea
public static double getROCArea(Instances tcurve)
Calculates the area under the ROC curve as the Wilcoxon-Mann-Whitney statistic.
- Parameters:
tcurve - a previously extracted threshold curve Instances.
- Returns:
- the ROC area, or Double.NaN if the supplied Instances were not generated by ThresholdCurve.
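The Wilcoxon-Mann-Whitney view of the AUC is the fraction of (positive, negative) pairs in which the positive instance receives the higher score, with ties counted as one half. The sketch below computes that fraction directly from raw scores. It works on plain arrays rather than on a threshold curve, so it illustrates the statistic and not this method's implementation. `RocAreaSketch` is a hypothetical name, and returning NaN when one of the classes is absent is an assumption.

```java
/** Hypothetical helper, not part of Weka: AUC as the Mann-Whitney statistic. */
final class RocAreaSketch {

    static double rocArea(double[] scores, boolean[] isPositive) {
        double wins = 0.0; // pairs ranked correctly, ties counted as 0.5
        long pairs = 0;    // number of (positive, negative) pairs
        for (int i = 0; i < scores.length; i++) {
            if (!isPositive[i]) continue;
            for (int j = 0; j < scores.length; j++) {
                if (isPositive[j]) continue;
                pairs++;
                if (scores[i] > scores[j]) {
                    wins += 1.0;
                } else if (scores[i] == scores[j]) {
                    wins += 0.5;
                }
            }
        }
        // Assumption: with no positives or no negatives the area is undefined.
        return pairs == 0 ? Double.NaN : wins / pairs;
    }
}
```

The quadratic pair loop keeps the definition visible. A rank-based formulation gives the same value in O(n log n) time when the number of predictions is large.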
-
getThresholdInstance
public static int getThresholdInstance(Instances tcurve, double threshold)
Gets the index of the instance with the closest threshold value to the desired target.
- Parameters:
tcurve - a set of instances that have been generated by this class
threshold - the target threshold
- Returns:
- the index of the instance that has threshold closest to the target, or -1 if this could not be found (i.e. no data, or bad threshold target)
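The lookup described above amounts to a nearest-value search over the Threshold column. A minimal sketch on a plain array follows. `ClosestThresholdSketch` is a hypothetical name, and treating a NaN target or a target outside [0, 1] as a "bad threshold target" is an assumption of this sketch, since the documentation does not spell out the exact condition.

```java
/** Hypothetical helper, not part of Weka: index of the closest threshold value. */
final class ClosestThresholdSketch {

    static int closestIndex(double[] thresholds, double target) {
        // Assumption: "bad threshold target" means NaN or outside [0, 1].
        if (thresholds.length == 0 || Double.isNaN(target) || target < 0.0 || target > 1.0) {
            return -1;
        }
        int best = 0;
        for (int i = 1; i < thresholds.length; i++) {
            // Keep the earliest index on ties by requiring a strictly smaller distance.
            if (Math.abs(thresholds[i] - target) < Math.abs(thresholds[best] - target)) {
                best = i;
            }
        }
        return best;
    }
}
```

With thresholds {0.1, 0.35, 0.6, 0.9} and a target of 0.5, the closest value is 0.6, so the sketch returns index 2.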
-
getRevision
public java.lang.String getRevision()
Returns the revision string.
- Specified by:
getRevision in interface RevisionHandler
- Returns:
- the revision
-
main
public static void main(java.lang.String[] args)
Tests the ThresholdCurve generation from the command line. The classifier is currently hardcoded. Pipe in an ARFF file.
- Parameters:
args - currently ignored
-
-