Biorithm: GAIN

This manual is for GAIN, a software package for computational prediction of gene function.

Index

Introduction : What Does GAIN Do?
Installation : Downloading and Installing GAIN
Invoking GAIN : Details of the GAIN Command Line

Introduction

Gene Annotation using Integrated Networks (GAIN) is a computational system for automatically and robustly predicting the functions of genes. GAIN operates on a functional linkage network (FLN), which is a graph whose nodes are genes and whose edges connect genes that may share the same function. In its current form, GAIN constructs an FLN for a single organism by integrating functional genomic information such as gene expression data, protein-protein interactions, and protein-DNA binding data. GAIN includes a number of algorithms for systematically propagating annotations through the entire FLN. A feature of many of these algorithms is that we can represent the flow of information in the FLN as a directed graph and provide visualisations of this graph to biologists.

Installation

Download the Biorithm package and follow the installation instructions for Biorithm. The executable file will be available as gain/gain.

Invoking GAIN

How GAIN Works : Details on How GAIN Works
Input Files : Specifying Input Files for GAIN
GAIN Generated Output Files : Details of the output files generated by GAIN
Prediction Algorithms : Selecting which gene function prediction algorithm to invoke
Command-line Options : All command line options

How GAIN Works

GAIN operates by making predictions for each gene function independently. Therefore, it can make multiple predictions for the same gene.

Most algorithms in GAIN are semi-supervised learning algorithms, i.e., they simultaneously analyse the relationships (as encoded in the FLN) between positive, negative, and unknown examples to make predictions regarding the unknown examples. An important aspect of GAIN is how it generates training data from the input files. For a specific function, a positive example is a gene known to be annotated with the function, a negative example is a gene known not to be annotated with the function, and an unknown example is a gene for which the status with respect to the function is unknown. Typically we represent a positive example with +1, negative example with -1, and an unknown example as 0. However, this representation varies between the current algorithms. Please note that we use the term "state" to refer to the numerical value assigned to a gene (+1, -1, 0).

Positive examples are easy to generate since they are the gene-function pairs in the annotations file. More specifically, if the functions come from the Gene Ontology (GO), then GAIN considers a gene to be annotated with a function if there is a direct annotation (the gene-function pair explicitly appears in the annotations file) or if the gene is annotated with a descendant of that function.

Unknown examples are also easy to generate. In doing so, GAIN only considers functions in the same category as the current function. A gene is an unknown example for a function if the gene has no functions annotating it or if the most specific annotation for the gene is an ancestor of the function.

Negative examples are harder to come by. GAIN uses the following heuristic: a gene is a negative example for a function if it is not a positive example or an unknown example for that function. In other words, the gene must not be annotated with that function, a descendant of the function, or an ancestor of the function, and must be annotated with some other function.

Like positives, unknowns and negatives may also be specified explicitly in the annotations file, with a value of 0 or -1, respectively, in the "annotation type" column.
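The three rules above can be sketched in a few lines of Python. This is only an illustration of the heuristic as described in this manual, not GAIN's actual implementation: the toy ontology `PARENTS`, the helper names, and the reading of "most specific annotation" as "any annotation that is not an ancestor of another annotation of the same gene" are all assumptions. The sketch also assumes that `gene_terms` has already been restricted to the category of the function of interest.

```python
from typing import Dict, Set

POSITIVE, UNKNOWN, NEGATIVE = 1, 0, -1

# Hypothetical toy ontology: child term -> set of parent terms.
PARENTS: Dict[str, Set[str]] = {"A": {"root"}, "B": {"A"}, "C": {"root"}}


def ancestors_of(term: str, parents: Dict[str, Set[str]]) -> Set[str]:
    """Return every strict ancestor of `term` in the ontology DAG."""
    seen: Set[str] = set()
    stack = list(parents.get(term, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, ()))
    return seen


def most_specific(gene_terms: Set[str], parents: Dict[str, Set[str]]) -> Set[str]:
    """Keep only annotations that are not ancestors of another annotation."""
    covered: Set[str] = set()
    for term in gene_terms:
        covered |= ancestors_of(term, parents)
    return gene_terms - covered


def label_gene(function: str, gene_terms: Set[str],
               parents: Dict[str, Set[str]]) -> int:
    """Return +1, 0, or -1 for one gene with respect to one function."""
    # Positive: direct annotation, or annotation with a descendant.
    for term in gene_terms:
        if term == function or function in ancestors_of(term, parents):
            return POSITIVE
    # Unknown: no annotations, or a most specific annotation is an ancestor.
    function_ancestors = ancestors_of(function, parents)
    if not gene_terms or most_specific(gene_terms, parents) & function_ancestors:
        return UNKNOWN
    # Negative: annotated, but only with unrelated functions.
    return NEGATIVE
```

With the toy ontology, a gene annotated with B is a positive example for A, a gene annotated only with A is an unknown example for B, and a gene annotated with C is a negative example for A.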

GAIN is a command line tool. Numerous options govern its behavior and performance. The following sections first highlight the most important input files and output files, followed by explanations of each algorithm, and some sample invocations. Finally comes a comprehensive list of all command line options.

Input Files

GAIN can make predictions for both GO terms and non-GO terms. The following sections outline the files you usually need, depending on the type of terms you wish to make predictions for.

Predicting GO Terms

Interactions File

File describing the functional linkage network (FLN) to be constructed. GAIN assumes that the FLN is undirected. This file is tab-delimited, with one edge per line, and must have at least two columns specifying the IDs of the two nodes connected by each edge. An optional third column specifies the type of the edge, and an optional fourth column specifies the weight of the edge. The node IDs should use the same naming scheme as that of the orf column of the annotations file. The interactions file is passed through the -i command line option. You can use this option multiple times.
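
As an illustration of this layout, the following Python sketch reads such a file into an undirected edge map. It is not part of GAIN. The function name, the placeholder type `"unspecified"`, the default weight of 1.0 for edges without a weight column, and the gene IDs in the usage note are assumptions made for the example.

```python
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str]


def read_interactions(lines: Iterable[str],
                      default_type: str = "unspecified",
                      default_weight: float = 1.0) -> Dict[Edge, Dict[str, float]]:
    """Map each undirected edge to a {edge type: weight} dictionary."""
    network: Dict[Edge, Dict[str, float]] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        if len(fields) < 2:
            raise ValueError(f"expected at least two columns: {line!r}")
        node_a, node_b = fields[0], fields[1]
        # Columns 3 (type) and 4 (weight) are optional.
        edge_type = fields[2] if len(fields) > 2 and fields[2] else default_type
        weight = float(fields[3]) if len(fields) > 3 and fields[3] else default_weight
        # The FLN is undirected, so store each pair in a canonical order.
        key = tuple(sorted((node_a, node_b)))
        network.setdefault(key, {})[edge_type] = weight
    return network
```

For example, the lines `"YAL001C\tYBR123W\tppi\t0.8"` and `"YBR123W\tYAL001C\tcoexpression\t0.5"` end up under the same edge key, with one weight per edge type.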
Annotations File

File containing Gene Ontology (GO) functional annotations for the genes in the FLN. This file is tab-delimited. The first line is a header line, with the following elements:
- orf A systematic name for the gene.
- goid The ID of the GO function. You can leave in the "GO:0+" prefix for a function. GAIN will strip it out.
- hierarchy The GO category the function belongs to. You can use either the abbreviations "c", "f", and "p", the capitalised forms "C", "F", and "P", or the complete names "cellular_component", "molecular_function", and "biological_process".
- evidencecode The evidence code for an annotation. GAIN currently ignores this information. Future versions of GAIN will use this information.
- annotation type The value is either 1 (indicating that the gene is annotated with the function), 0 (indicating that the gene is unknown for the function), or -1 (indicating that the gene is not annotated with the function).
In principle, this file can contain any annotations for the genes, e.g., phenotype information. In this case, you can invent your own hierarchies and evidence codes.

The annotations file is passed through the -f command line option.
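
The column layout above can be illustrated with a small Python reader. This is a sketch under stated assumptions rather than GAIN's own parser: it reads the five columns by position, skips the header line, normalises the hierarchy to a single lowercase letter, and strips the "GO:" prefix with leading zeros using a regular expression. The names `Annotation` and `parse_annotations` are invented for the example.

```python
import re
from typing import Iterable, List, NamedTuple


class Annotation(NamedTuple):
    orf: str
    goid: str
    hierarchy: str
    evidence_code: str
    state: int


# Accepted spellings of the three GO categories, mapped to one letter.
_HIERARCHY = {
    "c": "c", "cellular_component": "c",
    "f": "f", "molecular_function": "f",
    "p": "p", "biological_process": "p",
}


def parse_annotations(lines: Iterable[str]) -> List[Annotation]:
    """Parse tab-delimited annotation lines; the first line is a header."""
    rows: List[Annotation] = []
    for number, line in enumerate(lines):
        line = line.rstrip("\n")
        if number == 0 or not line.strip():
            continue  # header line or blank line
        fields = line.split("\t")
        if len(fields) < 5:
            raise ValueError(f"expected five columns: {line!r}")
        goid = re.sub(r"^GO:0*", "", fields[1].strip())
        hierarchy = _HIERARCHY[fields[2].strip().lower()]
        state = int(fields[4])
        if state not in (1, 0, -1):
            raise ValueError(f"annotation type must be 1, 0, or -1: {line!r}")
        rows.append(Annotation(fields[0], goid, hierarchy, fields[3], state))
    return rows
```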
Gene Ontology File

A .obo file downloaded from the Gene Ontology that provides a controlled vocabulary for describing the annotation data.

The Gene Ontology file is passed through the --go-file command line option.

Predicting non-GO Terms

Predicting non-GO terms is nearly equivalent to predicting GO terms. You will need the same files and file structure that is used to predict GO terms. However, since you will not be predicting GO terms, any positive and negative annotations to a non-GO term must be mapped to a GO term. You must also include an only-functions file that contains the positive mapped GO-term IDs so that GAIN will only make predictions for those GO terms. See the example invocation section for information on invoking GAIN.

GAIN Generated Output Files

GAIN contains two command line options that will be helpful in interpreting output files.

To specify where GAIN should place the output files, you should use:

-o, --output-directory directory: Name of directory to output all results to. If you do not provide this option, GAIN will not print any results to any files.

To specify an experiment name, you should use the -e option:

-e, --experiment-name string: A string describing the experiment that generated the gene expression data.

The experiment name will be used in naming the output files generated by GAIN. By providing a comprehensive experiment name, such as "ova-local-yeast-2011-01", it will be easier to correlate your results to a particular experiment.

GAIN will output the following text files. Remember that "experiment-name" will be replaced by the string provided from use of the -e option:

db-experiment-name-cv.txt

Contains cross validation results for each function predicted in tabular format with the following data:
- confidence cutoff: The confidence cutoff is a threshold at which GAIN makes predictions. In other words, if a gene is predicted to have a function with confidence of 0.4 and the threshold is 0.5, GAIN will say that the gene does not meet the threshold and therefore is not a prediction.
- desired recall: The desired recall is the recall that you expected to obtain. Recall is the fraction of all positive instances that were correctly predicted, i.e., how many of the predictions that should have been made were actually made. Mathematically, recall is the number of true positives divided by the number of true positives plus the number of false negatives, or TP / (TP + FN).
- actual recall: The actual recall is the recall that was actually achieved. It is computed in the same way as the desired recall, i.e., TP / (TP + FN).
- precision: Precision is the fraction of all positive predictions that were correct. Mathematically, precision is the number of true positives divided by the total number of positive predictions, or TP / (TP + FP).
- false positive(FP) rate: The FP rate is the rate at which genes are incorrectly predicted to have functions that they do not have. Mathematically, FP rate is the number of false positives divided by the number of false positives plus the number of true negatives, or FP / (FP + TN).
- true positive(TP): A TP occurs when a gene is correctly predicted to have a function.
- false positive(FP): A FP occurs when a gene is incorrectly predicted to have a function that it does not have.
- true negative(TN): A TN occurs when a gene is correctly predicted to not have a function.
- false negative(FN): A FN occurs when a gene is incorrectly predicted not to have a function that it has.
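
The quantities in this table follow directly from the four counts. The Python sketch below shows one way to compute them for a single function at a single confidence cutoff. It is illustrative only: the function names are invented, a confidence equal to the cutoff is assumed to count as a prediction, and ratios with a zero denominator are reported as 0.0.

```python
from typing import Dict


def confusion_counts(confidences: Dict[str, float],
                     truth: Dict[str, bool],
                     cutoff: float) -> Dict[str, int]:
    """Count TP, FP, TN, and FN for one function at one confidence cutoff."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for gene, confidence in confidences.items():
        predicted = confidence >= cutoff  # assumption: ties count as predictions
        if predicted and truth[gene]:
            counts["TP"] += 1
        elif predicted:
            counts["FP"] += 1
        elif truth[gene]:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


def cv_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Recall, precision, and FP rate as defined in the list above."""
    return {
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "fp_rate": fp / (fp + tn) if fp + tn else 0.0,
    }
```

For instance, with confidences 0.9, 0.4, 0.7, and 0.1 for four genes, of which only the first two truly have the function, a cutoff of 0.5 gives one of each count, so recall, precision, and FP rate are all 0.5.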
db-experiment-name-gene-universe.txt

Contains the structure of the generated graph in two column format with the first column specifying GraphID and the second column specifying NodeID. This file essentially contains the universe of genes over which GAIN operated.
db-experiment-name-grouped-cv.txt

Contains cross validation results based on groups of functions in tabular format with the same structure as that of db-experiment-name-cv.txt
db-experiment-name-invocation.txt

Contains parsed command line options that were invoked to run the algorithm.
db-experiment-name-log.txt

Contains the output from the console as the algorithm was run. It also contains additional log information from GAIN that is dependent upon the algorithm that was run.
db-experiment-name-stats.txt

Contains statistics regarding the algorithm chosen to run, e.g., the number of iterations to convergence.

Prediction Algorithms

GAIN operates on each function of interest in turn. It makes predictions independently for each function. There are two types of function prediction algorithms implemented in GAIN: One-Versus-All (OVA) and One-Versus-None (OVN).

Given a specific function f, OVA algorithms predict which genes have function f by propagating the labels f and not f across the FLN. Genes that have been annotated with some function other than f, or an ancestor/descendant of f, initially get the label not f, hence the name one-versus-all. You can invoke an OVA algorithm using the command-line switch --one-versus-all or --ova.

Given a specific function f, OVN algorithms predict which genes have function f by propagating the label f across the FLN. The name one-versus-none comes from the fact that these labels have no competition from not f and can thus overrun the entire FLN. You can invoke an OVN algorithm using the command-line switch --one-versus-none or --ovn.

Each of the following sections on prediction algorithms contains an explanation of the algorithm, an example invocation of the algorithm, and if appropriate, a citation to the paper describing the algorithm. Arguments in the following invocations are as follows:

-i interactions.txt: the interactions file used to construct the FLN.
-f annotations.txt: the annotations file.
-o output-dir: placing output files in the directory ./output-dir
--only-predictions: only compute predictions and perform no cross-validation.

One-Versus-All (OVA) algorithms
- Local rule (--ova local)
  
  gain -i interactions.txt --ova local -f annotations.txt -o output-dir --only-predictions
  
  This algorithm computes functional assignment by examining only the immediate neighbors of each unannotated node. In essence, this is a guilt-by-association algorithm. For each unannotated node, the algorithm sets its state to the weighted majority of its neighbors in the FLN.
- Hopfield network
  - Normal Hopfield network (--ova hopfield)
    
    gain -i interactions.txt --ova hopfield -f annotations.txt -o output-dir -O onlyTheseFunctions.txt
    
    This algorithm applies the local-rule algorithm repeatedly and serially to all the genes until the gene labels do not change.
    
    [3]
- Mincut (--ova mincut)
  
  gain -i interactions.txt --ova mincut -f annotations.txt -o output-dir --only-predictions
  
  This algorithm predicts gene functions by minimising the total weight of inconsistent edges in the functional linkage network. A consistent edge is an edge whose nodes share the same state. In essence, this is a global guilt-by-association technique. Please note that Mincut relies on you having the HIPR program installed on your computer. If you do not have hi_pr in your PATH, you will have to specify the path to the executable by using the --hipr-directory option.
  
  [9]
- SinkSource (--ova sinksource)
  
  gain -i interactions.txt --ova sinksource -f annotations.txt -o output-dir --only-predictions
  
  This algorithm is similar to FunctionalFlow, except that it incorporates negative examples as sinks. While FunctionalFlow allows flow to continue to propagate throughout (and ultimately fill) the network, SinkSource absorbs flow through the use of sinks. Sinks allow an infinite amount of fluid (or influence) to flow into them, thereby stopping that influence from propagating further through the network.
  
  [10]
- GeneMANIA (--ova genemania)
  
  gain -i interactions.txt --ova genemania -f annotations.txt -o output-dir --only-predictions
  
  The algorithm is derived from ridge regression and operates by integrating multiple functional association networks into a single process-specific network to predict gene function.
  
  [7]
- Support Vector Machines (SVM)
  
  GAIN supports making predictions using several SVM-based approaches. GAIN does not implement SVMs itself. It assumes you have downloaded and installed the appropriate packages and that the executables are in your PATH. GAIN uses SVMs by feeding them the adjacency matrix of the FLN. By default, GAIN trains these SVMs using a linear kernel and assumes that the positive and negative examples are separable. Here are the different SVMs supported:
  - libSVM (--ova libsvm)
    
    gain -i interactions.txt --ova libsvm -f annotations.txt -o output-dir --only-predictions
    
    Use the libSVM library. You can tell GAIN in which directory on your machine the libSVM executables are located using the --libsvm-directory option. In addition, you can pass specific options to the libSVM trainer and tester with the --libsvm-train-options and --libsvm-test-options, respectively.
  - SVMLight (--ova svmlight)
    
    gain -i interactions.txt --ova svmlight -f annotations.txt -o output-dir --only-predictions
    
    Use the SVMLight package to train SVMs and make predictions. You can tell GAIN in which directory the SVMLight executables are located using the --svmlight-directory option. In addition, you can pass specific options to the SVMLight trainer and tester with the --svmlight-train-options and --svmlight-test-options, respectively.
  - Transductive SVMLight (--ova svmlight-transductive)
    
    gain -i interactions.txt --ova svmlight-transductive -f annotations.txt -o output-dir --only-predictions
    
    Use the transductive learner in the SVMLight package to train SVMs and make predictions. The advantage of this learner is that it exploits edges in the FLN between unlabelled and labelled examples to train the SVM and to make predictions.
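
To make the local rule and the Hopfield network descriptions above more concrete, here is a minimal Python sketch. It is not GAIN's implementation: the graph representation, the function names, the choice to leave a node at 0 when the weighted votes tie, the fixed (sorted) update order, and the `max_sweeps` safeguard are all assumptions. Positive and negative examples stay clamped, which corresponds to not passing --unclamp-positives or --unclamp-negatives.

```python
from typing import Dict

# node -> {neighbour: edge weight}; every edge is listed in both directions.
Graph = Dict[str, Dict[str, float]]


def local_rule_state(node: str, graph: Graph, states: Dict[str, int]) -> int:
    """Weighted majority vote of the neighbours' states (+1, -1, or 0)."""
    total = sum(weight * states[nbr] for nbr, weight in graph[node].items())
    if total > 0:
        return 1
    if total < 0:
        return -1
    return 0  # assumption: a tie leaves the node unknown


def hopfield(graph: Graph, initial: Dict[str, int],
             max_sweeps: int = 100) -> Dict[str, int]:
    """Apply the local rule serially until no label changes."""
    states = dict(initial)
    # Only nodes that start as unknown (0) are allowed to change.
    free = [node for node in sorted(graph) if initial[node] == 0]
    for _ in range(max_sweeps):
        changed = False
        for node in free:
            new_state = local_rule_state(node, graph, states)
            if new_state != states[node]:
                states[node] = new_state
                changed = True
        if not changed:
            break
    return states


# Hypothetical toy FLN: a is positive, d is negative, b, c, and e are unknown.
GRAPH: Graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 0.5, "e": 1.0},
    "c": {"b": 0.5, "d": 1.0},
    "d": {"c": 1.0},
    "e": {"b": 1.0},
}
INITIAL = {"a": 1, "b": 0, "c": 0, "d": -1, "e": 0}
```

In this toy network the local rule alone cannot label e, because its only neighbour b is itself unknown. The serial Hopfield updates first label b and then propagate that label to e.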
One-Versus-None (OVN) algorithms
- FunctionalFlow (--ovn functional-flow)
  
  gain -i interactions.txt --ovn functional-flow -f annotations.txt -o output-dir --only-predictions
  
  This algorithm implements the FunctionalFlow algorithm where each annotated node is an infinite reservoir of functional flow. Initially, the reservoir for every unknown node is empty. In each round, "function" flows along the edges of the graph, "downhill" from larger reservoirs to smaller reservoirs. The amount of flow through each edge is bounded by the weight of that edge.
  
  [11]
- Local rule (--ovn local)
  
  gain -i interactions.txt --ovn local -f annotations.txt -o output-dir --only-predictions
  
  This algorithm is the same as the OVA version, except that negative examples are treated as unknowns.
- SinkSource (--ovn sinksource)
  
  gain -i interactions.txt --ovn sinksource -f annotations.txt -o output-dir --only-predictions
  
  This algorithm is the same as the OVA version, except that negative examples are treated as unknowns. Additionally, an artificial sink is connected to all unknowns to absorb flow. The weight of this connection can be controlled with the --ovn-sinksource-edge-weight option.
  
  [10]
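
The FunctionalFlow description above can be sketched as follows. This is a simplified illustration under several assumptions, not the code used by GAIN: a node splits its reservoir among its neighbours in proportion to edge weight, flow moves only from a strictly larger reservoir to a smaller one, edge weights are positive, and a node's score is the total flow it has received. The default of five rounds mirrors the default of the --num-rounds option.

```python
import math
from typing import Dict, Set

# node -> {neighbour: edge weight}; every edge is listed in both directions.
Graph = Dict[str, Dict[str, float]]


def functional_flow(graph: Graph, positives: Set[str],
                    num_rounds: int = 5) -> Dict[str, float]:
    """Return, for every node, the total flow received over all rounds."""
    # Annotated nodes are infinite reservoirs; all other nodes start empty.
    reservoir = {n: (math.inf if n in positives else 0.0) for n in graph}
    score = {n: 0.0 for n in graph}
    for _ in range(num_rounds):
        delta = {n: 0.0 for n in graph}
        for u, neighbours in graph.items():
            total_weight = sum(neighbours.values())
            for v, weight in neighbours.items():
                if reservoir[u] <= reservoir[v]:
                    continue  # flow only runs "downhill"
                if math.isinf(reservoir[u]):
                    share = weight  # an infinite source saturates the edge
                else:
                    share = reservoir[u] * weight / total_weight
                flow = min(weight, share)  # bounded by the edge weight
                delta[u] -= flow
                delta[v] += flow
                score[v] += flow
        # Update all reservoirs simultaneously at the end of the round.
        for n in graph:
            reservoir[n] += delta[n]
    return score
```

On a path s, a, b with unit edge weights and s as the only annotated node, two rounds give a a score of 2.0 and b a score of 0.5, so genes closer to annotated genes receive more flow.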

Sample Invocations

The following are sample invocations using various algorithms and optional arguments. In each of the following sample invocations, arguments are as follows:

-i interactions.txt: the interactions file used to construct the FLN.
-f annotations.txt: the annotations file.
-o output-dir: placing output files in the directory ./output-dir

gain -i interactions.txt --ovn local -f annotations.txt -o output-dir --only-functions-file only-functions.txt --cross-validate-fold 5 --unclamp-positives

This invocation states that GAIN should run the OVN local algorithm, place results in the directory output-dir, only make predictions for the functions located in only-functions.txt, use 5-fold cross validation, and allow positive examples to change state.

gain -i interactions.txt --ovn local -f annotations.txt -o output-dir --only-functions-file only-functions.txt --cross-validate-fold 3 --unclamp-negatives --distance 3 --detailed-cross-validation-results --use-custom-RNG-seed 2393849 --visualise

This invocation states that GAIN should run the OVN local algorithm, place results in the directory output-dir, only make predictions for the functions located in only-functions.txt, use 3-fold cross validation, allow negative examples to change state, treat nodes up to distance 3 away as neighbours in the local algorithm, return detailed cross validation results, use a custom integer seed of 2393849 for random number generation, and visualise the rationale for each prediction made by GAIN.
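
If you launch GAIN from a script, it can help to assemble the command line programmatically. The Python helper below is a hypothetical convenience wrapper, not something shipped with GAIN. It only uses options documented in this manual and assumes that the gain executable is in your PATH.

```python
import shlex
from typing import List, Optional


def build_gain_command(interactions: str, annotations: str, output_dir: str,
                       algorithm: str, mode: str = "ova",
                       only_functions: Optional[str] = None,
                       folds: Optional[int] = None,
                       only_predictions: bool = False,
                       executable: str = "gain") -> List[str]:
    """Assemble an argument list for one GAIN run."""
    if mode not in ("ova", "ovn"):
        raise ValueError("mode must be 'ova' or 'ovn'")
    command = [executable, "-i", interactions, "--" + mode, algorithm,
               "-f", annotations, "-o", output_dir]
    if only_functions is not None:
        command += ["--only-functions-file", only_functions]
    if folds is not None:
        command += ["--cross-validate-fold", str(folds)]
    if only_predictions:
        command.append("--only-predictions")
    return command


command = build_gain_command("interactions.txt", "annotations.txt", "output-dir",
                             "local", mode="ovn",
                             only_functions="only-functions.txt", folds=5)
# Show the command as it would be typed in a shell.
print(shlex.join(command))
# To actually run it: subprocess.run(command, check=True)
```

Passing the arguments as a list avoids shell quoting problems when file names contain spaces.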

Command-line Options

General Options

These options specify how to control GAIN in general.

Short	Long	Type	Description	Default Value
-e	--experiment-name	string	A string describing the experiment that generated the gene expression data.
-f	--functions-file	string	Name of file containing a list of proteins and their functions.
-g	--gene-expression-file	string	Name of file containing the gene expression data.
	--go-file	string	Name of file containing the definition of the gene ontology in OBO format. You can download this from the GO website.
-I	--ignore	string	Information to ignore. Use this option to tell GAIN to ignore particular functional categories, GO evidence codes, or interaction types. You may use this option multiple times.
	--ignore-evidence-code	string	GO evidence code to ignore. You may use this option multiple times.
-i	--interactions-file	string	Name of file containing pairs of linked genes in the FLN.
-o	--output-directory	string	Name of directory to output all results to. If you do not provide this option, GAIN will not print any results to any files.
-N	--overlapping-functions	string	Name of file containing a list of functions which have overlapping annotations.
-T	--type	string	Type of interaction data. Use 'unweighted' for interaction datasets that do not contain edge weights and 'weighted' for interaction data sets that contain edge weights.	'unweighted'

Options Related to Functional Linkage Network Construction

These options specify how to construct the Functional linkage network, including how to assess edge weights.

Short	Long	Type	Description	Default Value
-G	--group-functions-method	string	The method by which to group functions in GO in order to convert gene expression correlations to estimates of probabilities of shared function. Allowed values are parent (a group of functions share the same parent) and depth (a group of functions have the same minimum depth in the GO DAG).	'parent'
	--just-use-correlations	boolean	Use the absolute values of the correlations in gene expression data as edge weights.	OFF
	--integrate	string	Type of data integration to do. Allowed values are 'and' and 'or'.	'or'
	--original-annotations	boolean	The annotations file contains the original annotations downloaded from the GO website. They do not contain transitively closed annotations.	OFF
	--minimum-weight	float	Discard interactions/edges with weight/confidence less than this parameter.	1
	--apply-true-path-rule	boolean	Assuming that the annotations file contains the original annotations downloaded from the GO website, apply the true path rule to transfer annotations up the GO DAG, and the annotation status of 'unknown' down and sideways in the GO DAG. Use this option to allow any prediction algorithm to potentially make predictions that follow the true path rule, not just algorithms such as hierarchical-hopfield, which have been designed to do so.	OFF
	--no-true-path-rule-downward	boolean	Assuming that the annotations file contains the original annotations downloaded from the GO website, do not apply the true path rule to transfer the annotation status of 'unknown' down and sideways in the GO DAG. This option is useful for post-2009 versions of the GO DAG, which cause downward application of the true path rule to use more than 3 GB of RAM. This option will force GAIN to explicitly check if a gene is in HYPOTHETICAL_STATE with respect to a GO term, potentially slowing down the prediction and cross-validation stages.	OFF
-S	--sanity-check	boolean	Perform a sanity check of the data. This option checks whether the IDs in the gene expression file (when provided) match those in the annotations file. GAIN outputs the results to a file whose name ends in the string 'sanity-check.txt'.	OFF

Options to Select Algorithms

These options specify which function prediction algorithms to run.

Short	Long	Type	Description	Default Value
-d	--degree	boolean	Divide the input from a node's neighbours by the degree of that node.	ON
-D	--distance	integer	With the local neighbourhood algorithm, use this option to specify the maximum distance at which a node is still considered a neighbour.	1
	--num-rounds	integer	This option has different meanings depending on the algorithm being used. When applied to the one-versus-none algorithm functional-flow, this option specifies how many rounds of flow the algorithm should push.	5
	--one-versus-all	string	The argument specifies which one-versus-all algorithm to run. You can use this option multiple times. If you specify any of the hierarchical algorithms, you must also provide (i) a GO OBO file using the --go-file option, (ii) original, non-transitively-closed functional annotations using the -f option, and (iii) pass the --original-annotations option. (possible values are "genemania", "hopfield", "local", "libsvm", "mincut", "sinksource", "svmlight", "svmlight-transductive")
	--ova	string	An alias for the --one-versus-all option.
	--one-versus-none	string	The argument specifies which one-versus-none algorithm to run. You can use this option multiple times. (possible values are "functional-flow", "local", "sinksource")
	--ovn	string	An alias for the --one-versus-none option.
	--hipr-directory	string	The name of the directory containing the HIPR executable `hi_pr`. Use this option if this executable is not in your path.
	--libsvm-directory	string	The name of the directory containing the libSVM executables `svm-train` and `svm-predict`. Use this option if these executables are not in your path.
	--svmlight-directory	string	The name of the directory containing the SVMLight executables `svm_learn` and `svm_classify`. Use this option if these executables are not in your path.

Algorithm Control Options

These options control how the selected algorithms are executed and which biological functions the algorithms make predictions for. The option description will specify if the option applies only to a subset of the algorithms.

Short	Long	Type	Description	Default Value
	--maximum-go-depth	integer	Do not make predictions or perform cross-validation for functions with depth greater than this parameter in the GO Directed Acyclic Graph. This option speeds up GAIN by preventing it from performing any operations for very specific functions in GO, which may have very few annotations. If you use this option, you must also provide the GO file in OBO format using the `--go-file` option.	Maximum depth in the GO DAG
	--minimum-go-depth	integer	Do not make predictions or perform cross-validation for functions with depth less than this parameter in the GO Directed Acyclic Graph. This option speeds up GAIN by preventing it from performing any operations for very general functions in GO. If you use this option, you must also provide the GO file in OBO format using the `--go-file` option.	1
	--maximum-annotated-genes	integer	Do not make predictions or perform cross-validation for functions annotating more genes than this parameter.	Number of genes annotated by the largest function
	--minimum-annotated-genes	integer	Do not make predictions or perform cross-validation for functions annotating fewer genes than this parameter.	1
	--num-print-predictions	integer	The number of predictions to print per function. Some algorithms, e.g., sinksource, predict a confidence for every gene for every function. This option allows the user to control the number of predictions per function that are output. Use -1 if you want all predictions to be printed, but be warned that the output file can be very large.	100
	--number-runs	integer	The number of times to run the algorithm. When applied to the one-versus-all algorithms 'semi-hierarchical-hopfield' and 'hierarchical-hopfield', this option specifies how many times to run the algorithm, with each run using a different permutation of the nodes.	1
	--only-category	string	Run the GAIN algorithm only for functions belonging to this category. (possible values are "cellular component", "c", "molecular function", "f", "biological process", "p")
	--only-cv	boolean	Do ONLY cross-validation. Do NOT do predictions.	OFF
-O	--only-functions-file	string	Run the GAIN algorithm only for functions in this file. The file should contain one GO id per line.
	--only-predictions	boolean	Do ONLY predictions. Do NOT do cross-validation.	OFF
	--ovn-sinksource-edge-weight	float	The weight of the artificial edges added to the FLN by the one-versus-none SinkSource algorithm.	1
	--libsvm-test-options	string	Extra options to pass to the libSVM testing programme. If an option contains a space, enclose that option in quotes.
	--libsvm-train-options	string	Extra options to pass to the libSVM training programme. If an option contains a space, enclose that option in quotes.
	--svmlight-test-options	string	Extra options to pass to the SVMLight testing programme. If an option contains a space, enclose that option in quotes.
	--svmlight-train-options	string	Extra options to pass to the SVMLight training programme. If an option contains a space, enclose that option in quotes.
-t	--threshold	float	Threshold for the Hopfield network algorithm.	0
	--unclamp-positives	boolean	Allow the state of a positive (+1) node to change.	OFF
	--unclamp-negatives	boolean	Allow the state of a negative (-1) node to change.	OFF
	--weight-evidence-codes	boolean	Weight (GO) evidence codes (TAS = 0.9, IDA = 0.9, IMP = 0.7, IGI = 0.7, IPI = 0.7, ISS = 0.6, IEP = 0.6, NAS = 0.4, IEA = 0.1). These weights are based on the loose hierarchy of evidence codes available at the Gene Ontology. They are hard-coded in the software.	OFF
	--weight-interaction-types-cutoff	string	Weight interaction types for each max-gene-count given in this file. If no file is given, the default scheme is used.
	--weight-interaction-types-depth	boolean	Weight interaction types for each depth in the Gene Ontology hierarchy.	OFF
	--group-interaction-types	string	When weighting interaction types, lump them into groups as specified in this file.
	--edge-weighting-scheme	string	Specifies how multiple types per edge should be handled. See documentation for details. (possible values are "linear", "probabilistic", "shared")	shared

Options to Evaluate Performance

These options specify how to evaluate the performance of GAIN.

Short	Long	Type	Description	Default Value
-v	--cross-validate	boolean	Cross-validate the GAIN net.	ON
-F	--cross-validate-fold	integer	Specify the k in k-fold cross-validation. If this option is 1, do leave-one-out cross validation.	5
	--detailed-cross-validation-results	boolean	Print detailed results on cross validation. For every gene-function pair tested by the cross-validation procedure, print the predicted function, the true function, and whether the result was a true/false positive or a true/false negative. WARNING: This option will create a very large output file, especially if you are analysing many functions.	OFF
-X	--file-cross-validate	string	Cross-validate the GAIN net using the cross-validation set in the file. To indicate that the file contains information for leave-one-out cross validation, use the -F 1 option. In this case, each line of the file contains three items, separated by tabs: the GO id, the gene id, and +1 if the GO id annotates the gene or -1 if the GO id does not annotate this gene but annotates another gene.
	--evaluate-predictions	string	Evaluate the correctness of the predictions based on the (new) annotations in the argument.
	--no-graphviz	boolean	Do not run the Graphviz programmes to create image files. You must use this option together with one of the visualisation options. GAIN will then only create the input files for the Graphviz programmes, thus saving considerable running time and disk space.	OFF
	--min-confidence	float	Do not consider predictions with confidence below this value. The current version of GAIN uses this value only in conjunction with the --visualise option: It does not create propagation diagrams for gene-function pairs whose confidence is less than this value.	0
	--predictions-file	string	File containing predictions made in an earlier run of GAIN or by another algorithm. Use this option with the --evaluate-predictions option when you want to run GAIN only to check how many predictions made by an earlier run of GAIN or by another algorithm are correct.
	--visualise	boolean	Visualise the rationale behind each prediction. GAIN will lay out the propagation diagram for each prediction (gene-function pair). For this option to work, you must have the programmes `dot` and `neato` (available in the graphviz package) in your path. Using this option will cause GAIN to slow down by a factor of 10 and use up gobs of disk space.	OFF
	--visualise-cut	boolean	Visualise the cut induced by the predictions for each function. For each function, GAIN lays out the subgraph of the FLN induced by the set of genes predicted/annotated to have the function as well as the edges in the FLN connecting these nodes to any nodes predicted/annotated not to have the function. For this option to work, you must have the programmes `dot` and `neato` (available in the graphviz package) in your path. Using this option will cause GAIN to slow down by a factor of 10 and use up gobs of disk space.	OFF
	--visualise-cross-validation	boolean	Visualise the rationale behind the cross validation results. For each gene in each cross validation set for each function, GAIN lays out the propagation diagram that yields the prediction for that gene. For this option to work, you must have the programmes `dot` and `neato` (available in the graphviz package) in your path. Using this option will cause GAIN to slow down by a factor of 10 and use up gobs of disk space.	OFF
	--visualise-params-file	string	The name of the file containing parameters setting up how nodes and edges look in the propagation diagrams.	'params.txt'
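
The k-fold procedure selected with --cross-validate-fold can be pictured with the following Python sketch. It shows generic k-fold partitioning and is not taken from GAIN's source: the function names, the shuffling strategy, and the idea of hiding a held-out gene by resetting its state to 0 are assumptions. As described for the -F option, a value of 1 is treated as leave-one-out.

```python
import random
from typing import Dict, Iterable, List


def cross_validation_folds(genes: Iterable[str], k: int,
                           seed: int = 0) -> List[List[str]]:
    """Partition labelled genes into k folds; k == 1 means leave-one-out."""
    ordered = sorted(genes)
    if k == 1:
        return [[gene] for gene in ordered]
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    rng.shuffle(ordered)
    # Deal genes round-robin so fold sizes differ by at most one.
    return [ordered[i::k] for i in range(k)]


def hide_fold(states: Dict[str, int], fold: Iterable[str]) -> Dict[str, int]:
    """Return a copy of the states with the held-out genes set to unknown."""
    hidden = dict(states)
    for gene in fold:
        hidden[gene] = 0
    return hidden
```

For each fold, a prediction algorithm would be run on the output of `hide_fold`, and its predictions for the held-out genes compared against their true states to obtain the TP, FP, TN, and FN counts.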

Miscellaneous Options

These options have not yet been categorised or do not belong to any category. You probably do not need them at all. They may drop off the face of the earth in the future.

Short	Long	Type	Description	Default Value
-p	--check-propagation	boolean	Check (and prove) whether information propagates in the FLN.	OFF
-P	--pvalue	boolean	Compute p-values for functional assignments.	OFF
	--print-all	boolean	Print the states of all nodes in the network.	OFF
	--randomise	boolean	Generate a random graph with the same degree distribution as the input graph.	OFF
	--no-reduce	boolean	The normal behaviour of GAIN is to remove all connected components from the FLN that do not contain both positive and negative examples. This option turns off this behaviour. This behaviour is also automatically turned off if any algorithm is invoked with the --one-versus-none option. If you want to run a one-versus-all algorithm with this option and a one-versus-none algorithm, invoke the algorithms in separate runs of GAIN.	OFF
	--treewidth	boolean	Compute the treewidth of the FLN. At the moment, GAIN will compute an upper bound on the treewidth, an associated tree decomposition and exit. Future versions of GAIN will incorporate the tree decomposition into the function prediction engines.	OFF
-w	--weights-file	string	Name of file containing interaction weights. Each line of the file should contain the two proteins and the weight separated by tabs.
-z	--allow-zero-states	boolean	Allow the state of a node to be zero (i.e., HYPOTHETICAL).	ON
	--min-threshold	float	Begin threshold search here. This parameter can be a floating-point number.	-1.0
	--max-threshold	float	End threshold search here. This parameter can be a floating-point number.	1.0
	--num-thresholds	integer	Do this many threshold steps.	10
	--explain	boolean	Output a description of the neighbourhood of all +1 nodes.	OFF
	--use-custom-RNG-seed	integer	Use a custom seed for random number generation.	0