Public Member Functions
	Reporter (string outDir, string cmdLine)
void	addCV (string function, MyGainCVResult result, MyNT threshold, string algorithm="Hopfield")
void	addCV (string gene, string function, MyGainAnnotationType correctState, MyGainAnnotationType predictedState, MyNT correctStateConfidence, MyNT predictionConfidence, string algorithm="Hopfield")
	Add results of cross validation of a single gene/protein for a function. The method also updates the overall cross validation results.
void	addPrediction (string gene, string function, MyNT prob, MyNT input, MyNT threshold, string algorithm="Hopfield")
void	addPredictionCutBasedConfidence (string gene, string function, MyNT confidence, MyNT threshold, string algorithm="Hopfield")
void	checkTruePathRuleForPredictions (string algorithm, MyAnnotations &annotations, const GeneOntology &go, ostream &ostr, map< string, set< string > > &tprViolations)
	Check whether the predictions follow the true path rule.
void	comparePredictions (string algo1, string algo2, ostream &ostr)
	Compare the predictions for algo1 and algo2.
void	computePredictionRanks (string algo, MyAnnotations &differentAnnotations, GeneOntology &go)
	For each prediction made by an algorithm, use the confidence of the prediction to compute its rank in the list of all predictions and in the list of predictions for that function.
void	evaluatePredictions (string algo, MyAnnotations &currentAnnotations, MyAnnotations &newAnnotations, GeneOntology &go)
	Evaluate quality of predictions based on new functional annotations in newAnnotations that are not in currentAnnotations.
void	evaluatePredictionsForROCCurvesUsingRanks (string algo, MyAnnotations &differentAnnotations, GeneOntology &go)
	Evaluate quality of predictions based on the new functional annotations in differentAnnotations (annotations that are new relative to the current annotations) and generate ROC curves.
void	getAlgorithms (set< string > &algorithms)
	Return the set of algorithms with stored predictions.
void	getGenesWithPredictions (string algo, set< string > &genes)
void	printComparisonPredictionEvaluationROCCurves (const set< string > &algorithms, string outputDir, MyAnnotations &annotations, const GeneOntology &go)
	Print ROC curves comparing the predictions for multiple algorithms.
void	printComparisonPredictionEvaluationAUCScatterPlots (const set< string > &algorithms, string outputDir, MyAnnotations &annotations)
void	printDetailedCVResults (ostream &dcvfstr, bool flush=1)
void	printCVResults (ostream &cvfstr, bool flush=1, const BioFunction *functionToPrint=NULL)
	Print results of cross-validation, one function per line.
void	printPredictions (ostream &predfstr, int numPredictionsToPrintPerFunction, bool flush=1, const BioFunction *functionToPrint=NULL)
	Print prediction results, one (gene, function, algorithm) triple per line.
void	printPredictionEvaluationROCCurves (ostream &ostr)
	Print ROC curves that summarise the evaluation.
void	printPredictionEvaluations (ostream &ostr, GeneOntology *go=NULL)
	Print results of evaluating predictions, one (gene, function, algorithm) triple per line.
void	printPredictionEvaluationSummary (ostream &ostr)
	Print various statistics that summarise the evaluation.
void	readGeneUniverse (string guFile, set< string > &universe)
	Read the universe of genes over which GAIN operated in a previous invocation.
void	readPredictions (string predFile, set< string > *onlyFunctions=NULL, string convertFunction="")
void	readDetailedCVResults (string predFile, const set< string > *onlyFunctions=NULL, string convertFunction="")
void	setExperimentName (string dataset)
void	printCurveDataFromCV (ostream &out, GeneOntology &go, set< BioFunction > functions=set< BioFunction >(), string extra="")
	Print data for ROC and precision-recall curves from cross validation results.
void	printECWeightedCurveDataFromCV (ostream &out, GeneOntology &go, MyAnnotations &annotations, set< BioFunction > functions=set< BioFunction >(), string extra="")
	Print data for ROC and precision-recall curves from cross validation results, taking evidence code weights into account.
void	clear ()

Member Function Documentation

void Reporter::addCV	(	string	function,
		MyGainCVResult	result,
		MyNT	threshold,
		string	algorithm = `"Hopfield"`
	)

Add cumulative results of cross validation for a function.

void Reporter::addCV	(	string	gene,
		string	function,
		MyGainAnnotationType	correctState,
		MyGainAnnotationType	predictedState,
		MyNT	correctStateConfidence,
		MyNT	predictionConfidence,
		string	algorithm = `"Hopfield"`
	)

Add results of cross validation of a single gene/protein for a function. The method also updates the overall cross validation results.

Parameters:

[in]	gene	The gene involved in the cross validation.
[in]	function	The function involved in the cross validation.
[in]	correctState	The correct annotation of gene with respect to function.
[in]	predictedState	The predicted annotation of gene with respect to function.
[in]	correctStateConfidence	A number (should be between 0 and 1) indicating the (a priori) confidence in correctState. Normally, this value is 1. For some algorithms, such as SinkSource with evidence codes, this value can depend on the weights assigned to the evidence codes.
[in]	predictionConfidence	The confidence in the prediction.
[in]	algorithm	the name of the algorithm being cross validated.

Warning: If you invoke this method, you should invoke it for every cross-validated gene/function pair. You should not invoke the other addCV() method.

void Reporter::addPredictionCutBasedConfidence	(	string	gene,
		string	function,
		MyNT	confidence,
		MyNT	threshold,
		string	algorithm = `"Hopfield"`
	)

Add the confidence for a prediction where the confidence is computed using distance to a cut in the FLN.

void Reporter::checkTruePathRuleForPredictions	(	string	algorithm,
		MyAnnotations &	annotations,
		const GeneOntology &	go,
		ostream &	ostr,
		map< string, set< string > > &	tprViolations
	)

Check whether the predictions follow the true path rule.

Parameters:

[in]	algorithm	the algorithm whose predictions need checking.
[in]	annotations	a reference to an instance of MyAnnotations.
[in]	go	a reference to an instance of GeneOntology.
[in]	ostr	an output stream to print results to.
[out]	tprViolations	a map from strings to sets of strings that will store, for each function, the set of genes predicted not to have that function but predicted to have some child of that function.

For each predicted (gene, function) pair made by algorithm, and for each parent of the function, the method checks whether the gene is either predicted to have the parent or annotated with it.
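
The check described above can be sketched as follows. This is only an illustration under stated assumptions, not the Reporter implementation: plain std::map and std::set containers stand in for MyAnnotations and GeneOntology, only direct parents are examined, and the names GeneFunctions, ParentMap, and findTprViolations are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-ins: gene -> functions, and function -> direct parents.
using GeneFunctions = std::map<std::string, std::set<std::string>>;
using ParentMap = std::map<std::string, std::set<std::string>>;

// For each predicted (gene, function) pair, look at every parent of the
// function. If the gene is neither predicted to have the parent nor
// annotated with it, record the gene under that parent.
std::map<std::string, std::set<std::string>> findTprViolations(
    const GeneFunctions& predicted, const GeneFunctions& annotated,
    const ParentMap& parents) {
  std::map<std::string, std::set<std::string>> violations;
  for (const auto& [gene, functions] : predicted) {
    const auto annotatedIt = annotated.find(gene);
    for (const auto& function : functions) {
      const auto parentIt = parents.find(function);
      if (parentIt == parents.end()) continue;  // no parents to check
      for (const auto& parent : parentIt->second) {
        const bool isPredicted = functions.count(parent) > 0;
        const bool isAnnotated = annotatedIt != annotated.end() &&
                                 annotatedIt->second.count(parent) > 0;
        if (!isPredicted && !isAnnotated) violations[parent].insert(gene);
      }
    }
  }
  return violations;
}
```

The result is keyed by the parent function, mirroring the description of tprViolations: each key maps to the genes that have a predicted child of that function without the function itself.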

void Reporter::comparePredictions	(	string	algo1,
		string	algo2,
		ostream &	ostr
	)

Compare the predictions for algo1 and algo2.

Parameters:

[in]	algo1	the name of the first algorithm to compare.
[in]	algo2	the name of the second algorithm to compare.

For each prediction made by algo1, the method checks if algo2 also made that prediction. If only algo1 made the prediction or if the confidence values are different, the method prints out the details of the predictions.

Note: To print a list of predictions made by algo2 but not by algo1, invoke the method with the first two arguments swapped.
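
A minimal sketch of this one-directional comparison, assuming predictions can be modelled as a map from (gene, function) pairs to confidence values. The alias Predictions and the function differingPredictions are hypothetical and are not part of Reporter.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in: (gene, function) -> confidence.
using Predictions = std::map<std::pair<std::string, std::string>, double>;

// Return the (gene, function) pairs predicted by `first` that `second`
// either did not predict or predicted with a different confidence.
std::vector<std::pair<std::string, std::string>> differingPredictions(
    const Predictions& first, const Predictions& second) {
  std::vector<std::pair<std::string, std::string>> differing;
  for (const auto& [key, confidence] : first) {
    const auto it = second.find(key);
    if (it == second.end() || it->second != confidence) {
      differing.push_back(key);
    }
  }
  return differing;
}
```

Because the loop only walks the first map, predictions that appear solely in the second map are never reported; calling the function again with the arguments swapped covers that case.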

void Reporter::computePredictionRanks	(	string	algo,
		MyAnnotations &	differentAnnotations,
		GeneOntology &	go
	)

For each prediction made by an algorithm, use the confidence of the prediction to compute its rank in the list of all predictions and in the list of predictions for that function.

Parameters:

[in]	algo	the name of the algorithm to evaluate.
[in]	differentAnnotations	the difference between the new annotations and the current/old annotations.
[in]	go	a reference to an instance of GeneOntology. The method uses this variable to decide if a predicted function is not valid (e.g., it is now obsolete).

The method assumes that the calling context computed differentAnnotations after computing the transitive closure of the new and the current/old annotations.
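
The two ranks can be illustrated with a short sketch. It assumes that rank 1 goes to the highest confidence and that ties are broken by input order; the source does not say how Reporter handles either point. RankedPrediction and computeRanks are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <vector>

struct RankedPrediction {  // hypothetical record, not a GAIN type
  std::string gene;
  std::string function;
  double confidence = 0.0;
  std::size_t overallRank = 0;   // 1 = most confident among all predictions
  std::size_t functionRank = 0;  // 1 = most confident for this function
};

void computeRanks(std::vector<RankedPrediction>& predictions) {
  // Sort indices by decreasing confidence; stable_sort keeps input order on ties.
  std::vector<std::size_t> order(predictions.size());
  std::iota(order.begin(), order.end(), std::size_t{0});
  std::stable_sort(order.begin(), order.end(),
                   [&](std::size_t a, std::size_t b) {
                     return predictions[a].confidence > predictions[b].confidence;
                   });
  // Walk in sorted order, counting how many predictions each function has seen.
  std::map<std::string, std::size_t> seenPerFunction;
  for (std::size_t i = 0; i < order.size(); ++i) {
    RankedPrediction& p = predictions[order[i]];
    p.overallRank = i + 1;
    p.functionRank = ++seenPerFunction[p.function];
  }
}
```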

void Reporter::evaluatePredictions	(	string	algo,
		MyAnnotations &	currentAnnotations,
		MyAnnotations &	newAnnotations,
		GeneOntology &	go
	)

Evaluate quality of predictions based on new functional annotations in newAnnotations that are not in currentAnnotations.

Parameters:

[in]	algo	the name of the algorithm to evaluate.
[in]	currentAnnotations	the "old" annotations that were the basis of the predictions.
[in]	newAnnotations	the "new" annotations that are the basis of the evaluations.
[in]	go	a reference to an instance of GeneOntology. The method uses this variable to decide if a predicted function is not valid (e.g., it is now obsolete).

The method performs the following computations:

(i) Find those annotations in newAnnotations that are not in currentAnnotations.

(ii) Restrict this difference to verifiable genes, i.e., genes for which the algorithm has predicted at least one function.

(iii) For each verifiable gene, find the most specific predictions (MSPs). For each such prediction, compute the closest function annotating the gene in the difference. Consider a prediction (and the gene) to be verified if the closest function is a descendant of the function in the prediction; to assist this calculation, when comparing functions, the method considers a descendant to be closer than an ancestor or a relative.

(iv) For verified MSPs, compute the distribution (histogram) of distances from the predicted function to the verifying function.

(v) For verified genes, compute the distribution (histogram) of the smallest distance from a predicted function to the verifying function; the minimum is over all verified MSPs for a gene.

(vi) For verified genes, compute the distribution of confidence values for verified MSPs and for unverified MSPs.

(vii) For unverified genes (i.e., genes without any verified predictions), compute the distribution of confidence values for all MSPs.
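
The first two steps of the list above (the annotation difference and its restriction to verifiable genes) can be sketched as below. This assumes annotations can be modelled as a map from gene to a set of function identifiers; AnnotationMap and verifiableDifference are hypothetical names, and the later steps, which need the ontology structure, are not covered.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <map>
#include <set>
#include <string>

// Hypothetical stand-in for MyAnnotations: gene -> annotated functions.
using AnnotationMap = std::map<std::string, std::set<std::string>>;

AnnotationMap verifiableDifference(
    const AnnotationMap& currentAnnotations,
    const AnnotationMap& newAnnotations,
    const std::set<std::string>& genesWithPredictions) {
  AnnotationMap difference;
  for (const auto& [gene, newFunctions] : newAnnotations) {
    // Step (ii): keep only genes with at least one predicted function.
    if (genesWithPredictions.count(gene) == 0) continue;
    // Step (i): functions present in the new annotations but not the old ones.
    std::set<std::string> added;
    const auto it = currentAnnotations.find(gene);
    if (it == currentAnnotations.end()) {
      added = newFunctions;
    } else {
      std::set_difference(newFunctions.begin(), newFunctions.end(),
                          it->second.begin(), it->second.end(),
                          std::inserter(added, added.end()));
    }
    if (!added.empty()) difference[gene] = added;
  }
  return difference;
}
```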

void Reporter::evaluatePredictionsForROCCurvesUsingRanks	(	string	algo,
		MyAnnotations &	differentAnnotations,
		GeneOntology &	go
	)

Evaluate quality of predictions based on the new functional annotations in differentAnnotations (annotations that are new relative to the current annotations) and generate ROC curves.

Parameters:

[in]	algo	the name of the algorithm to evaluate.
[in]	differentAnnotations	the difference between the new annotations and the current/old annotations.
[in]	go	a reference to an instance of GeneOntology. The method uses this variable to decide if a predicted function is not valid (e.g., it is now obsolete).

The method assumes that the calling context computed differentAnnotations after computing the transitive closure of the new and the current/old annotations.
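
One common way to turn ranked predictions into an ROC curve is sketched below. It is an assumption about the general technique, not a description of what Reporter does internally: predictions are visited in rank order, and each one moves the curve up (if verified) or right (if not). The function rocPointsFromRanks is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// verifiedInRankOrder[i] is true when the prediction with rank i + 1 was
// verified. Returns (false positive rate, true positive rate) points.
std::vector<std::pair<double, double>> rocPointsFromRanks(
    const std::vector<bool>& verifiedInRankOrder) {
  std::size_t positives = 0;
  for (bool verified : verifiedInRankOrder) positives += verified ? 1 : 0;
  const std::size_t negatives = verifiedInRankOrder.size() - positives;

  std::vector<std::pair<double, double>> points;
  if (positives == 0 || negatives == 0) return points;  // curve is undefined

  points.emplace_back(0.0, 0.0);
  std::size_t tp = 0;
  std::size_t fp = 0;
  for (bool verified : verifiedInRankOrder) {
    if (verified) {
      ++tp;
    } else {
      ++fp;
    }
    points.emplace_back(static_cast<double>(fp) / negatives,
                        static_cast<double>(tp) / positives);
  }
  return points;
}
```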

void Reporter::getGenesWithPredictions	(	string	algo,
		set< string > &	genes
	)

Return all the genes that have a predicted function, where the prediction is made by the algorithm called algo.

void Reporter::printComparisonPredictionEvaluationROCCurves	(	const set< string > &	algorithms,
		string	outputDir,
		MyAnnotations &	annotations,
		const GeneOntology &	go
	)

Print ROC curves comparing the predictions for multiple algorithms.

Parameters:

[in]	algorithms	a set of names of algorithms to compare.
[in]	outputDir	the name of the directory to print plots to.

The method prints a single plot containing the ROC curves for all algorithms. For each function with at least one verifiable prediction, the method also prints a plot containing the ROC curves for the algorithms for that function.

void Reporter::printCurveDataFromCV	(	ostream &	out,
		GeneOntology &	go,
		set< BioFunction >	functions = `set< BioFunction >()`,
		string	extra = `""`
	)

Print data for ROC and precision-recall curves from cross validation results.

Parameters:

[out]	out	the stream to output to.
[in]	go	an instance of GeneOntology
[in]	functions	a set of BioFunctions. If this set is not empty, then the method combines the results of all the functions in this set.
[in]	extra	an extra string to print as part of the information output. This string goes into a comment printed before the cross-validation curve data. Scripts such as eval-gain.pl can use the contents of the string in their output files.

The method prints out the following pieces of information for every function or for the entire group in the parameter functions: confidence cutoff, desired recall (the method tries to print recall values at regular intervals), actual recall (a desired recall may not be achievable because many results have identical confidence values), precision, false positive rate, the number of true positives, the number of false positives, the number of true negatives, and the number of false negatives.
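
To make the listed quantities concrete, the sketch below computes one curve point at a given confidence cutoff. It assumes a result counts as a positive prediction when its confidence is at least the cutoff, and it reports 0 for a ratio whose denominator is zero; both conventions are assumptions. CvOutcome, CurvePoint, and curvePointAt are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct CvOutcome {  // hypothetical: one cross-validated gene/function pair
  double confidence;
  bool isPositive;  // true when the gene really has the function
};

struct CurvePoint {
  int tp = 0, fp = 0, tn = 0, fn = 0;
  double recall = 0.0, precision = 0.0, falsePositiveRate = 0.0;
};

CurvePoint curvePointAt(const std::vector<CvOutcome>& outcomes, double cutoff) {
  CurvePoint p;
  for (const CvOutcome& o : outcomes) {
    const bool predictedPositive = o.confidence >= cutoff;
    if (predictedPositive && o.isPositive) ++p.tp;
    else if (predictedPositive) ++p.fp;
    else if (o.isPositive) ++p.fn;
    else ++p.tn;
  }
  // Guard each ratio against an empty denominator.
  if (p.tp + p.fn > 0) p.recall = static_cast<double>(p.tp) / (p.tp + p.fn);
  if (p.tp + p.fp > 0) p.precision = static_cast<double>(p.tp) / (p.tp + p.fp);
  if (p.fp + p.tn > 0) {
    p.falsePositiveRate = static_cast<double>(p.fp) / (p.fp + p.tn);
  }
  return p;
}
```

When several results share the same confidence, moving the cutoff across that value changes many counts at once, which is why a desired recall may not be exactly achievable.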

void Reporter::printECWeightedCurveDataFromCV	(	ostream &	out,
		GeneOntology &	go,
		MyAnnotations &	annotations,
		set< BioFunction >	functions = `set< BioFunction >()`,
		string	extra = `""`
	)

Print data for ROC and precision-recall curves from cross validation results, taking evidence code weights into account.

The parameters for this method are identical to those for Reporter::printCurveDataFromCV(). This method modifies the counts for the number of true positives, false positives, true negatives, and false negatives, based on weights associated with evidence codes. The method relies on the values of correctStateConfidence input to Reporter::addCV().
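
The source does not spell out how the counts are modified. One plausible reading, shown here purely as an assumption, is that each cross-validated pair contributes its correctStateConfidence to the relevant count instead of contributing 1. WeightedCvOutcome, WeightedCounts, and weightedCounts are hypothetical names.

```cpp
#include <cassert>
#include <vector>

struct WeightedCvOutcome {  // hypothetical record, not a GAIN type
  bool correctIsPositive;
  bool predictedIsPositive;
  double correctStateConfidence;  // weight in [0, 1], e.g. from evidence codes
};

struct WeightedCounts {
  double tp = 0.0, fp = 0.0, tn = 0.0, fn = 0.0;
};

// Assumption: every outcome adds its weight, rather than 1, to its count.
WeightedCounts weightedCounts(const std::vector<WeightedCvOutcome>& outcomes) {
  WeightedCounts counts;
  for (const WeightedCvOutcome& o : outcomes) {
    const double w = o.correctStateConfidence;
    if (o.correctIsPositive && o.predictedIsPositive) counts.tp += w;
    else if (o.correctIsPositive) counts.fn += w;
    else if (o.predictedIsPositive) counts.fp += w;
    else counts.tn += w;
  }
  return counts;
}
```

With all weights equal to 1, this reduces to the unweighted counts.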

void Reporter::printPredictionEvaluations	(	ostream &	ostr,
		GeneOntology *	go = `NULL`
	)

Print results of evaluating predictions, one (gene, function, algorithm) triple per line.

On each line, the method prints a predicted function, the name of the prediction algorithm, and the closest function in the new annotations. The method prints this information only for genes that have a new annotation in the new set of annotations. If there are multiple closest functions, the method prints just one.

void Reporter::printPredictions	(	ostream &	predfstr,
		int	numPredictionsToPrintPerFunction,
		bool	flush = `1`,
		const BioFunction *	functionToPrint = `NULL`
	)

Print prediction results, one (gene, function, algorithm) triple per line.

void Reporter::readDetailedCVResults	(	string	predFile,
		const set< string > *	onlyFunctions = `NULL`,
		string	convertFunction = `""`
	)

Read detailed cross-validation results from detailedCVFile.

Parameters:

[in]	detailedCVFile	the name of the file to read results from (declared as predFile in the signature above).
[in]	onlyFunctions	a pointer to a set of function ids.
[in]	convertFunction	the name of a mathematical function to use to convert confidence values. Currently supported values are "oneminusexpminus" for $1 - \exp(-x)$.

If onlyFunctions is not NULL, the method will ignore detailed CV results for all functions not present in onlyFunctions.
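
The confidence conversion can be sketched as follows, assuming that "oneminusexpminus" denotes 1 - exp(-x), as its name suggests, and that an empty name (the default argument) leaves the value unchanged. The helper convertConfidence is hypothetical, and throwing on an unrecognised name is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cmath>
#include <stdexcept>
#include <string>

// Apply the named conversion to a confidence value.
double convertConfidence(double x, const std::string& convertFunction) {
  if (convertFunction.empty()) return x;  // assumed: no conversion requested
  if (convertFunction == "oneminusexpminus") return 1.0 - std::exp(-x);
  throw std::invalid_argument("unsupported conversion: " + convertFunction);
}
```

This conversion maps non-negative scores into the interval [0, 1), which makes them comparable with confidence values that are already probabilities.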

void Reporter::readGeneUniverse	(	string	guFile,
		set< string > &	universe
	)

Read the universe of genes over which GAIN operated in a previous invocation.

Parameters:

[in]	guFile	a file containing identifiers of the genes in the FLN used by GAIN in a previous invocation.

This method is useful for ensuring (for example) that eval-gain operates on precisely the same set of genes as gain.

void Reporter::readPredictions	(	string	predFile,
		set< string > *	onlyFunctions = `NULL`,
		string	convertFunction = `""`
	)

Read predictions from predFile.

Parameters:

[in]	predFile	the name of the file to read predictions from.
[in]	onlyFunctions	a pointer to a set of function ids.
[in]	convertFunction	the name of a mathematical function to use to convert confidence values. Currently supported values are "oneminusexpminus" for $1 - \exp(-x)$.

If onlyFunctions is not NULL, the method will ignore predictions for all functions not present in onlyFunctions.

void Reporter::setExperimentName ( string dataset ) [inline]

Set the name of the dataset used to obtain these results.

Use this method to set the name of the dataset corresponding to these results. The name could correspond to a gene expression experiment, a method of constructing the FLN, an algorithm comparison, etc.

The documentation for this class was generated from the following files:

/home/poirel/src/c++/biorithm/gain/reporter.h
/home/poirel/src/c++/biorithm/gain/reporter.C
