Biorithm
1.1
|
Public Member Functions | |
Reporter (string outDir, string cmdLine) | |
void | addCV (string function, MyGainCVResult result, MyNT threshold, string algorithm="Hopfield") |
void | addCV (string gene, string function, MyGainAnnotationType correctState, MyGainAnnotationType predictedState, MyNT correctStateConfidence, MyNT predictionConfidence, string algorithm="Hopfield") |
Add results of cross validation of a single gene/protein for a function. The method also updates the overall cross validation results. | |
void | addPrediction (string gene, string function, MyNT prob, MyNT input, MyNT threshold, string algorithm="Hopfield") |
void | addPredictionCutBasedConfidence (string gene, string function, MyNT confidence, MyNT threshold, string algorithm="Hopfield") |
void | checkTruePathRuleForPredictions (string algorithm, MyAnnotations &annotations, const GeneOntology &go, ostream &ostr, map< string, set< string > > &tprViolations) |
Check whether the predictions follow the true path rule. | |
void | comparePredictions (string algo1, string algo2, ostream &ostr) |
Compare the predictions for algo1 and algo2. | |
void | computePredictionRanks (string algo, MyAnnotations &differentAnnotations, GeneOntology &go) |
For each prediction made by an algorithm, use the confidence of the prediction to compute the rank of that prediction in the list of all predictions and in the list of predictions for that function. | |
void | evaluatePredictions (string algo, MyAnnotations &currentAnnotations, MyAnnotations &newAnnotations, GeneOntology &go) |
Evaluate quality of predictions based on new functional annotations in newAnnotations that are not in currentAnnotations. | |
void | evaluatePredictionsForROCCurvesUsingRanks (string algo, MyAnnotations &differentAnnotations, GeneOntology &go) |
Evaluate quality of predictions based on new functional annotations in newAnnotations that are not in currentAnnotations and generate ROC curves. | |
void | getAlgorithms (set< string > &algorithms) |
Return the set of algorithms with stored predictions. | |
void | getGenesWithPredictions (string algo, set< string > &genes) |
void | printComparisonPredictionEvaluationROCCurves (const set< string > &algorithms, string outputDir, MyAnnotations &annotations, const GeneOntology &go) |
Print ROC curves comparing the predictions for multiple algorithms. | |
void | printComparisonPredictionEvaluationAUCScatterPlots (const set< string > &algorithms, string outputDir, MyAnnotations &annotations) |
void | printDetailedCVResults (ostream &dcvfstr, bool flush=1) |
void | printCVResults (ostream &cvfstr, bool flush=1, const BioFunction *functionToPrint=NULL) |
Print results of cross-validation, one function per line. | |
void | printPredictions (ostream &predfstr, int numPredictionsToPrintPerFunction, bool flush=1, const BioFunction *functionToPrint=NULL) |
Print prediction results, one (gene, function, algorithm) triple per line. | |
void | printPredictionEvaluationROCCurves (ostream &ostr) |
Print ROC curves that summarise the evaluation. | |
void | printPredictionEvaluations (ostream &ostr, GeneOntology *go=NULL) |
Print results of evaluating predictions, one (gene, function, algorithm) triple per line. | |
void | printPredictionEvaluationSummary (ostream &ostr) |
Print various statistics that summarise the evaluation. | |
void | readGeneUniverse (string guFile, set< string > &universe) |
Read the universe of genes over which GAIN operated in a previous invocation. | |
void | readPredictions (string predFile, set< string > *onlyFunctions=NULL, string convertFunction="") |
void | readDetailedCVResults (string predFile, const set< string > *onlyFunctions=NULL, string convertFunction="") |
void | setExperimentName (string dataset) |
void | printCurveDataFromCV (ostream &out, GeneOntology &go, set< BioFunction > functions=set< BioFunction >(), string extra="") |
Print data for ROC and precision-recall curves from cross validation results. | |
void | printECWeightedCurveDataFromCV (ostream &out, GeneOntology &go, MyAnnotations &annotations, set< BioFunction > functions=set< BioFunction >(), string extra="") |
Print data for ROC and precision-recall curves from cross validation results, taking evidence code weights into account. | |
void | clear () |
void Reporter::addCV | ( | string | function, |
MyGainCVResult | result, | ||
MyNT | threshold, | ||
string | algorithm = "Hopfield" |
||
) |
Add cumulative results of cross validation for a function.
void Reporter::addCV | ( | string | gene, |
string | function, | ||
MyGainAnnotationType | correctState, | ||
MyGainAnnotationType | predictedState, | ||
MyNT | correctStateConfidence, | ||
MyNT | predictionConfidence, | ||
string | algorithm = "Hopfield" |
||
) |
Add results of cross validation of a single gene/protein for a function. The method also updates the overall cross validation results.
[in] | gene | The gene involved in the cross validation. |
[in] | function | The function involved in the cross validation. |
[in] | correctState | The correct annotation of gene with respect to function. |
[in] | predictedState | The predicted annotation of gene with respect to function. |
[in] | correctStateConfidence | A number (should be between 0 and 1) indicating the (a priori) confidence in correctState. Normally, this value is 1. For some algorithms, such as SinkSource with evidence codes, this value can depend on the weights assigned to the evidence codes. |
[in] | predictionConfidence | The confidence in the prediction. |
[in] | algorithm | The name of the algorithm being cross validated. |
void Reporter::addPredictionCutBasedConfidence | ( | string | gene, |
string | function, | ||
MyNT | confidence, | ||
MyNT | threshold, | ||
string | algorithm = "Hopfield" |
||
) |
Add the confidence for a prediction where the confidence is computed using distance to a cut in the FLN.
void Reporter::checkTruePathRuleForPredictions | ( | string | algorithm, |
MyAnnotations & | annotations, | ||
const GeneOntology & | go, | ||
ostream & | ostr, | ||
map< string, set< string > > & | tprViolations | ||
) |
Check whether the predictions follow the true path rule.
[in] | algorithm | the algorithm whose predictions need checking. |
[in] | annotations | a reference to an instance of MyAnnotations. |
[in] | go | a reference to an instance of GeneOntology. |
[in] | ostr | an output stream to print results to. |
[out] | tprViolations | a map from strings to sets of strings that stores, for each function, the set of genes predicted not to have that function but predicted to have some child of that function. |
For each (gene, function) prediction made by algorithm, and for each parent of the function, the method checks whether the gene is either predicted to have or annotated with the parent.
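The check described above can be sketched as follows. This is a minimal illustration, not the Reporter implementation: the type aliases GeneFunctionPairs and ParentMap and the function findTruePathViolations are hypothetical stand-ins for MyAnnotations, GeneOntology, and the real method, and the sketch returns the violations instead of filling an output parameter.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-ins: the real method works on MyAnnotations and
// GeneOntology. Each pair is (gene, function).
using GeneFunctionPairs = std::set<std::pair<std::string, std::string>>;
// Maps a function to its direct parents in the ontology.
using ParentMap = std::map<std::string, std::set<std::string>>;

// For each parent function, collect the genes that are predicted to have a
// child of that function but are neither predicted to have nor annotated
// with the parent itself.
std::map<std::string, std::set<std::string>> findTruePathViolations(
    const GeneFunctionPairs &predictions,
    const GeneFunctionPairs &annotations,
    const ParentMap &parents) {
  std::map<std::string, std::set<std::string>> violations;
  for (const auto &prediction : predictions) {
    const std::string &gene = prediction.first;
    const auto it = parents.find(prediction.second);
    if (it == parents.end()) continue;  // no recorded parents for this function
    for (const std::string &parent : it->second) {
      const std::pair<std::string, std::string> key(gene, parent);
      if (predictions.count(key) == 0 && annotations.count(key) == 0) {
        violations[parent].insert(gene);
      }
    }
  }
  return violations;
}
```

The returned map has the same shape as tprViolations: function identifiers as keys and sets of gene identifiers as values.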
void Reporter::comparePredictions | ( | string | algo1, |
string | algo2, | ||
ostream & | ostr | ||
) |
Compare the predictions for algo1 and algo2.
[in] | algo1 | the name of the first algorithm to compare. |
[in] | algo2 | the name of the second algorithm to compare. |
For each prediction made by algo1, the method checks if algo2 also made that prediction. If only algo1 made the prediction or if the confidence values are different, the method prints out the details of the predictions.
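A minimal sketch of this comparison, under stated assumptions: ConfidenceMap and differingPredictions are hypothetical names, predictions are modelled as a map from (gene, function) to confidence, and confidences are compared with exact inequality. The real method prints details to ostr; the sketch only returns the differing pairs.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical model of one algorithm's predictions:
// (gene, function) -> confidence.
using GeneFunction = std::pair<std::string, std::string>;
using ConfidenceMap = std::map<GeneFunction, double>;

// Return every prediction of the first algorithm that the second algorithm
// either did not make or made with a different confidence value.
std::vector<GeneFunction> differingPredictions(const ConfidenceMap &first,
                                               const ConfidenceMap &second) {
  std::vector<GeneFunction> differences;
  for (const auto &entry : first) {
    const auto match = second.find(entry.first);
    // Assumption: "different" means exact floating-point inequality.
    if (match == second.end() || match->second != entry.second) {
      differences.push_back(entry.first);
    }
  }
  return differences;
}
```

As in the description above, the comparison is one-directional: predictions made only by the second algorithm are not reported.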
void Reporter::computePredictionRanks | ( | string | algo, |
MyAnnotations & | differentAnnotations, | ||
GeneOntology & | go | ||
) |
For each prediction made by an algorithm, use the confidence of the prediction to compute the rank of that prediction in the list of all predictions and in the list of predictions for that function.
[in] | algo | the name of the algorithm to evaluate. |
[in] | differentAnnotations | the difference between the new annotations and the current/old annotations. |
[in] | go | a reference to an instance of GeneOntology. The method uses this variable to decide if a predicted function is not valid (e.g., it is now obsolete). |
The method assumes that the calling context computed differentAnnotations after computing the transitive closure of the new and the current/old annotations.
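The two ranks can be illustrated with a short sketch. The Prediction struct and computeRanks function are hypothetical; the sketch assumes that rank 1 is the highest confidence and that ties keep their input order, neither of which the documentation specifies. It also omits the validity check against the ontology.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical record for one prediction and its two ranks.
struct Prediction {
  std::string gene;
  std::string function;
  double confidence;
  std::size_t overallRank;   // rank among all predictions (1 = most confident)
  std::size_t functionRank;  // rank among predictions for the same function
};

// Sort by decreasing confidence, then assign both ranks in a single pass.
void computeRanks(std::vector<Prediction> &predictions) {
  std::stable_sort(predictions.begin(), predictions.end(),
                   [](const Prediction &a, const Prediction &b) {
                     return a.confidence > b.confidence;
                   });
  std::map<std::string, std::size_t> seenPerFunction;
  for (std::size_t i = 0; i < predictions.size(); ++i) {
    predictions[i].overallRank = i + 1;
    // Counts how many predictions for this function have been seen so far.
    predictions[i].functionRank = ++seenPerFunction[predictions[i].function];
  }
}
```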
void Reporter::evaluatePredictions | ( | string | algo, |
MyAnnotations & | currentAnnotations, | ||
MyAnnotations & | newAnnotations, | ||
GeneOntology & | go | ||
) |
Evaluate quality of predictions based on new functional annotations in newAnnotations that are not in currentAnnotations.
[in] | algo | the name of the algorithm to evaluate. |
[in] | currentAnnotations | the "old" annotations that were the basis of the predictions. |
[in] | newAnnotations | the "new" annotations that are the basis of the evaluations. |
[in] | go | a reference to an instance of GeneOntology. The method uses this variable to decide if a predicted function is not valid (e.g., it is now obsolete). |
The method performs the following computations:
(i) Find those annotations in newAnnotations that are not in currentAnnotations.
(ii) Restrict this difference to verifiable genes, i.e., genes for which the algorithm has predicted at least one function.
(iii) For each verifiable gene, find the most specific predictions (MSPs). For each such prediction, compute the closest function annotating the gene in the difference. Consider a prediction (and the gene) to be verified if the closest function is a descendant of the function in the prediction; to assist this calculation, when comparing functions, the method considers a descendant to be closer than an ancestor or a relative.
(iv) For verified MSPs, compute the distribution (histogram) of distances from the predicted function to the verifying function.
(v) For verified genes, compute the distribution (histogram) of the smallest distance from a predicted function to the verifying function; the minimum is over all verified MSPs for a gene.
(vi) For verified genes, compute the distribution of confidence values for verified MSPs and for unverified MSPs.
(vii) For unverified genes (i.e., genes without any verified predictions), compute the distribution of confidence values for all MSPs.
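Steps (i) and (ii) above amount to a set difference followed by a filter. The sketch below shows only those two steps, with annotations modelled as a set of (gene, function) pairs; AnnotationSet and verifiableNewAnnotations are hypothetical names, and the later steps, which need ontology distances, are not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <string>
#include <utility>

// Hypothetical model of an annotation set: (gene, function) pairs.
using GeneFunctionPair = std::pair<std::string, std::string>;
using AnnotationSet = std::set<GeneFunctionPair>;

AnnotationSet verifiableNewAnnotations(
    const AnnotationSet &currentAnnotations,
    const AnnotationSet &newAnnotations,
    const std::set<std::string> &genesWithPredictions) {
  // Step (i): annotations present in the new set but not in the current set.
  AnnotationSet difference;
  std::set_difference(newAnnotations.begin(), newAnnotations.end(),
                      currentAnnotations.begin(), currentAnnotations.end(),
                      std::inserter(difference, difference.end()));
  // Step (ii): keep only verifiable genes, i.e. genes for which the
  // algorithm predicted at least one function.
  AnnotationSet verifiable;
  for (const GeneFunctionPair &annotation : difference) {
    if (genesWithPredictions.count(annotation.first) > 0) {
      verifiable.insert(annotation);
    }
  }
  return verifiable;
}
```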
void Reporter::evaluatePredictionsForROCCurvesUsingRanks | ( | string | algo, |
MyAnnotations & | differentAnnotations, | ||
GeneOntology & | go | ||
) |
Evaluate quality of predictions based on new functional annotations in newAnnotations that are not in currentAnnotations and generate ROC curves.
[in] | algo | the name of the algorithm to evaluate. |
[in] | differentAnnotations | the difference between the new annotations and the current/old annotations. |
[in] | go | a reference to an instance of GeneOntology. The method uses this variable to decide if a predicted function is not valid (e.g., it is now obsolete). |
The method assumes that the calling context computed differentAnnotations after computing the transitive closure of the new and the current/old annotations.
void Reporter::getGenesWithPredictions | ( | string | algo, |
set< string > & | genes | ||
) |
Return all the genes that have a predicted function, where the prediction is made by the algorithm called algo.
void Reporter::printComparisonPredictionEvaluationROCCurves | ( | const set< string > & | algorithms, |
string | outputDir, | ||
MyAnnotations & | annotations, | ||
const GeneOntology & | go | ||
) |
Print ROC curves comparing the predictions for multiple algorithms.
[in] | algorithms | a set of names of algorithms to compare. |
[in] | outputDir | the name of the directory to print plots to. |
The method prints a single plot containing the ROC curves for all algorithms. For each function with at least one verifiable prediction, the method also prints a plot containing the ROC curves for the algorithms for that function.
void Reporter::printCurveDataFromCV | ( | ostream & | out, |
GeneOntology & | go, | ||
set< BioFunction > | functions = set< BioFunction >() , |
||
string | extra = "" |
||
) |
Print data for ROC and precision-recall curves from cross validation results.
[out] | out | the stream to output to. |
[in] | go | an instance of GeneOntology |
[in] | functions | a set of BioFunctions. If this set is not empty, then the method combines the results of all the functions in this set. |
[in] | extra | an extra string to print as part of the information output. This string goes into a comment printed before the cross-validation curve data. Scripts such as eval-gain.pl can use the contents of the string in their output files. |
The method prints the following pieces of information for every function, or for the entire group in the parameter functions: confidence cutoff, desired recall (the method tries to print recall values at regular intervals), actual recall (a desired recall may not be achievable because many results have identical confidence values), precision, false positive rate, the number of true positives, the number of false positives, the number of true negatives, and the number of false negatives.
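To make the listed quantities concrete, the sketch below computes one curve point at a given confidence cutoff. It is an illustration only: CVResult, CurvePoint, and curvePointAt are hypothetical names, a result is assumed to count as predicted positive when its confidence is at least the cutoff, and ratios with a zero denominator are reported as 0.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical record for one cross-validation result.
struct CVResult {
  double confidence;  // confidence that the gene has the function
  bool isPositive;    // true if the gene is really annotated with the function
};

// Quantities for one confidence cutoff.
struct CurvePoint {
  std::size_t truePositives, falsePositives, trueNegatives, falseNegatives;
  double precision, recall, falsePositiveRate;
};

CurvePoint curvePointAt(const std::vector<CVResult> &results, double cutoff) {
  CurvePoint p = {0, 0, 0, 0, 0.0, 0.0, 0.0};
  for (const CVResult &r : results) {
    // Assumption: confidence >= cutoff means "predicted positive".
    const bool predictedPositive = r.confidence >= cutoff;
    if (predictedPositive && r.isPositive) ++p.truePositives;
    else if (predictedPositive) ++p.falsePositives;
    else if (r.isPositive) ++p.falseNegatives;
    else ++p.trueNegatives;
  }
  const std::size_t predicted = p.truePositives + p.falsePositives;
  const std::size_t positives = p.truePositives + p.falseNegatives;
  const std::size_t negatives = p.falsePositives + p.trueNegatives;
  // Assumption: an undefined ratio (zero denominator) is reported as 0.
  if (predicted > 0) p.precision = static_cast<double>(p.truePositives) / predicted;
  if (positives > 0) p.recall = static_cast<double>(p.truePositives) / positives;
  if (negatives > 0) p.falsePositiveRate = static_cast<double>(p.falsePositives) / negatives;
  return p;
}
```

Because many results can share a confidence value, moving the cutoff across such a tie changes several counts at once, which is why a desired recall may not be exactly achievable.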
void Reporter::printECWeightedCurveDataFromCV | ( | ostream & | out, |
GeneOntology & | go, | ||
MyAnnotations & | annotations, | ||
set< BioFunction > | functions = set< BioFunction >() , |
||
string | extra = "" |
||
) |
Print data for ROC and precision-recall curves from cross validation results, taking evidence code weights into account.
The parameters for this method are identical to those for Reporter::printCurveDataFromCV(). This method modifies the counts for the number of true positives, false positives, true negatives, and false negatives, based on weights associated with evidence codes. The method relies on the values of correctStateConfidence input to Reporter::addCV().
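One plausible reading of the weighting is that each cross-validation result contributes its correctStateConfidence to the relevant count instead of contributing 1. The sketch below illustrates that reading only; the documentation does not state the exact formula, and WeightedCVResult, WeightedCounts, and weightedCountsAt are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Hypothetical record: a cross-validation result plus the a priori
// confidence in its correct state, as passed to Reporter::addCV().
struct WeightedCVResult {
  double confidence;              // confidence in the prediction
  bool isPositive;                // correct state of the gene for the function
  double correctStateConfidence;  // weight in [0, 1], normally 1
};

struct WeightedCounts {
  double truePositives, falsePositives, trueNegatives, falseNegatives;
};

// Assumption: each result adds its weight, rather than 1, to its count, and
// confidence >= cutoff means "predicted positive".
WeightedCounts weightedCountsAt(const std::vector<WeightedCVResult> &results,
                                double cutoff) {
  WeightedCounts c = {0.0, 0.0, 0.0, 0.0};
  for (const WeightedCVResult &r : results) {
    assert(r.correctStateConfidence >= 0.0 && r.correctStateConfidence <= 1.0);
    const bool predictedPositive = r.confidence >= cutoff;
    if (predictedPositive && r.isPositive) c.truePositives += r.correctStateConfidence;
    else if (predictedPositive) c.falsePositives += r.correctStateConfidence;
    else if (r.isPositive) c.falseNegatives += r.correctStateConfidence;
    else c.trueNegatives += r.correctStateConfidence;
  }
  return c;
}
```

With every weight equal to 1, these counts reduce to the unweighted counts used by Reporter::printCurveDataFromCV().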
void Reporter::printPredictionEvaluations | ( | ostream & | ostr, |
GeneOntology * | go = NULL |
||
) |
Print results of evaluating predictions, one (gene, function, algorithm) triple per line.
On each line, the method prints a predicted function, the name of the prediction algorithm, and the closest function in the new annotations. The method prints this information only for genes that have a new annotation in the new set of annotations. If there are multiple closest functions, the method prints just one.
void Reporter::printPredictions | ( | ostream & | predfstr, |
int | numPredictionsToPrintPerFunction, | ||
bool | flush = 1 , |
||
const BioFunction * | functionToPrint = NULL |
||
) |
Print prediction results, one (gene, function, algorithm) triple per line.
void Reporter::readDetailedCVResults | ( | string | predFile, |
const set< string > * | onlyFunctions = NULL , |
||
string | convertFunction = "" |
||
) |
Read detailed cross-validation results from detailedCVFile.
[in] | detailedCVFile | the name of the file to read results from. |
[in] | onlyFunctions | a pointer to a set of function ids. |
[in] | convertFunction | the name of a mathematical function to use to convert confidence values. Currently supported values are "oneminusexpminus" for $1 - \exp(-x)$. |
If onlyFunctions is not NULL, the method will ignore detailed CV results for all functions not present in onlyFunctions.
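The conversion named by convertFunction can be sketched as below. The helper convertConfidence is hypothetical; the sketch assumes that the default empty string leaves the confidence unchanged and that an unrecognised name is an error, neither of which the documentation confirms.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Hypothetical helper: apply the conversion named by convertFunction to a
// confidence value x.
double convertConfidence(const std::string &convertFunction, double x) {
  // Assumption: the default "" means "no conversion".
  if (convertFunction.empty()) return x;
  // "oneminusexpminus" maps x to 1 - exp(-x), which sends [0, inf) to [0, 1).
  if (convertFunction == "oneminusexpminus") return 1.0 - std::exp(-x);
  // Assumption: any other name is rejected.
  throw std::invalid_argument("unsupported conversion: " + convertFunction);
}
```

The conversion is monotonically increasing, so it preserves the ordering of confidence values while squeezing unbounded scores into the unit interval.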
void Reporter::readGeneUniverse | ( | string | guFile, |
set< string > & | universe | ||
) |
Read the universe of genes over which GAIN operated in a previous invocation.
[in] | guFile | a file containing identifiers of the genes in the FLN used by GAIN in a previous invocation. |
This method is useful for ensuring (for example) that eval-gain operates on precisely the same set of genes as gain.
void Reporter::readPredictions | ( | string | predFile, |
set< string > * | onlyFunctions = NULL , |
||
string | convertFunction = "" |
||
) |
Read predictions from predFile.
[in] | predFile | the name of the file to read predictions from. |
[in] | onlyFunctions | a pointer to a set of function ids. |
[in] | convertFunction | the name of a mathematical function to use to convert confidence values. Currently supported values are "oneminusexpminus" for $1 - \exp(-x)$. |
If onlyFunctions is not NULL, the method will ignore predictions for all functions not present in onlyFunctions.
void Reporter::setExperimentName | ( | string | dataset | ) | [inline] |
Set the name of the dataset used to obtain these results.
Use this method to set the name of the dataset corresponding to these results. The name could correspond to a gene expression experiment, a method of constructing the FLN, an algorithm comparison, etc.