Biorithm  1.1
GAIN

This manual is for GAIN, a software package for computational prediction of gene function.

Index

  1. Introduction : What Does GAIN Do?
  2. Installation : Downloading and Installing GAIN
  3. Invoking GAIN : Details of the GAIN Command Line

Introduction

Gene Annotation using Integrated Networks (GAIN) is a computational system for automatically and robustly predicting the functions of genes. GAIN operates on a functional linkage network (FLN), which is a graph whose nodes are genes and whose edges connect genes that may share the same function. In its current form, GAIN constructs an FLN for a single organism by integrating functional genomic information such as gene expression data, protein-protein interactions, and protein-DNA binding data. GAIN includes a number of algorithms for systematically propagating annotations through the entire FLN. A feature of many of these algorithms is that the flow of information in the FLN can be represented as a directed graph, and visualisations of this graph can be provided to biologists.

Installation

Download the Biorithm package and follow the installation instructions for Biorithm. The executable file will be available as gain/gain.

Invoking GAIN

  1. How GAIN Works : Details on How GAIN Works
  2. Input Files : Specifying Input Files for GAIN
  3. GAIN Generated Output Files : Details of the output files generated by GAIN
  4. Prediction Algorithms : Selecting which gene function prediction algorithm to invoke
  5. Command-line Options : All command line options

How GAIN Works

GAIN operates by making predictions for each gene function independently. Therefore, it can make multiple predictions for the same gene.

Most algorithms in GAIN are semi-supervised learning algorithms, i.e., they simultaneously analyse the relationships (as encoded in the FLN) between positive, negative, and unknown examples to make predictions regarding the unknown examples. An important aspect of GAIN is how it generates training data from the input files. For a specific function, a positive example is a gene known to be annotated with the function, a negative example is a gene known not to be annotated with the function, and an unknown example is a gene whose status with respect to the function is unknown. Typically, we represent a positive example with +1, a negative example with -1, and an unknown example with 0. However, this representation varies between the current algorithms. Note that we use the term "state" to refer to the numerical value assigned to a gene (+1, -1, 0).

Positive examples are easy to generate since they are the gene-function pairs in the annotations file. More specifically, if the functions come from the Gene Ontology (GO), then GAIN considers a gene to be annotated with a function if there is a direct annotation (the gene-function pair explicitly appears in the annotations file) or if the gene is annotated with a descendant of that function.

Unknown examples are also easy to generate. In doing so, GAIN only considers functions in the same category as the current function. A gene is an unknown example for a function if the gene has no functions annotating it or if the most specific annotation for the gene is an ancestor of the function.

Negative examples are harder to come by. GAIN uses the following heuristic: a gene is a negative example for a function if it is not a positive example or an unknown example for that function. In other words, the gene must not be annotated with that function, a descendant of the function, or an ancestor of the function, and must be annotated with some other function.

Like positives, unknowns and negatives may also be specified explicitly in the annotations file, with a value of 0 or -1, respectively, in the "annotation type" column.
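
The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not GAIN's implementation: the names `state_for`, `most_specific`, and the `ancestors` mapping (each GO term mapped to the set of all its ancestors) are hypothetical, GO categories are ignored, and a gene with several most specific annotations is treated as unknown if any one of them is an ancestor of the function.

```python
POSITIVE, UNKNOWN, NEGATIVE = 1, 0, -1

def most_specific(terms, ancestors):
    """Keep only the terms that are not an ancestor of another term in the set."""
    implied = set()
    for term in terms:
        implied |= ancestors.get(term, set())
    return {term for term in terms if term not in implied}

def state_for(gene_terms, function, ancestors):
    """Return the state (+1, 0, -1) of a gene with respect to one function."""
    # Positive: annotated with the function itself or with one of its descendants.
    for term in gene_terms:
        if term == function or function in ancestors.get(term, set()):
            return POSITIVE
    # Unknown: the gene has no annotations at all.
    if not gene_terms:
        return UNKNOWN
    # Unknown (assumed reading): a most specific annotation is an ancestor of the function.
    if most_specific(gene_terms, ancestors) & ancestors.get(function, set()):
        return UNKNOWN
    # Negative: annotated, but only with unrelated functions.
    return NEGATIVE

# Toy ontology (made up for illustration): root -> A -> B, and root -> C.
toy_ancestors = {"root": set(), "A": {"root"}, "B": {"A", "root"}, "C": {"root"}}

states = {
    "annotated with B": state_for({"B", "A", "root"}, "A", toy_ancestors),
    "annotated with root only": state_for({"root"}, "A", toy_ancestors),
    "no annotations": state_for(set(), "A", toy_ancestors),
    "annotated with C": state_for({"C", "root"}, "A", toy_ancestors),
}
```

In this toy ontology, a gene annotated with B is a positive example for A, a gene annotated only with root or with nothing is unknown, and a gene annotated with C is a negative example.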

GAIN is a command line tool. Numerous options govern its behavior and performance. The following sections first highlight the most important input files and output files, followed by explanations of each algorithm, and some sample invocations. Finally comes a comprehensive list of all command line options.

Input Files

GAIN can make predictions for both GO terms and non-GO terms. The following sections outline the files needed in each case.

Predicting GO Terms

  1. Interactions File

    File describing the functional linkage network (FLN) to be constructed. GAIN assumes that the FLN is undirected. This file is tab-delimited, with one edge per line. Each line must have at least two columns, specifying the IDs of the two nodes connected by the edge. An optional third column specifies the type of the edge, and an optional fourth column specifies the weight of the edge. The node IDs should use the same naming scheme as that of the orf column of the annotations file. The interactions file is passed through the -i command line option. You can use this option multiple times.

  2. Annotations File

    File containing Gene Ontology (GO) functional annotations for the genes in the FLN. This file is tab-delimited. The first line is a header line, with the following elements:

    • orf A systematic name for the gene.
    • goid The ID of the GO function. You can leave in the "GO:0+" prefix for a function. GAIN will strip it out.
    • hierarchy The GO category the function belongs to. You can use either the abbreviations "c", "f", and "p", the capitalised forms "C", "F", and "P", or the complete names "cellular_component", "molecular_function", and "biological_process".
    • evidencecode The evidence code for an annotation. GAIN currently ignores this information. Future versions of GAIN will use this information.
    • annotation type The value is either 1 (indicating that the gene is annotated with the function), 0 (indicating that the gene is unknown for the function), or -1 (indicating that the gene is not annotated with the function).

    In principle, this file can contain any annotations for the genes, e.g., phenotype information. In this case, you can invent your own hierarchies and evidence codes.

    The annotations file is passed through the -f command line option.

  3. Gene Ontology File

    A file in OBO format (.obo), downloaded from the Gene Ontology, that provides a controlled vocabulary for describing the annotation data.

    The Gene Ontology file is passed through the --go-file command line option.
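
As a rough illustration of the file formats described above, the following Python sketch reads a tab-delimited interactions file and normalises two of the annotation columns. The function names, the default weight of 1.0 for edges without a fourth column, and the sample lines are assumptions made for this example; GAIN's own parser may behave differently.

```python
import csv
import io
import re

# Accepted spellings of the three GO categories, mapped to a single letter.
HIERARCHIES = {
    "c": "c", "cellular_component": "c",
    "f": "f", "molecular_function": "f",
    "p": "p", "biological_process": "p",
}

def read_interactions(handle, default_weight=1.0):
    """Return a list of (node_a, node_b, edge_type, weight), one per edge line."""
    edges = []
    for row in csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE):
        if len(row) < 2:
            continue  # assumption: silently skip blank or incomplete lines
        edge_type = row[2] if len(row) > 2 and row[2] else None
        weight = float(row[3]) if len(row) > 3 and row[3] else default_weight
        edges.append((row[0], row[1], edge_type, weight))
    return edges

def normalise_goid(goid):
    """Strip a leading "GO:" followed by zeros, leaving the bare numeric ID."""
    return re.sub(r"^GO:0+", "", goid)

def normalise_hierarchy(name):
    """Map "C", "c" or "cellular_component" (and so on) to "c", "f" or "p"."""
    return HIERARCHIES[name.lower()]

# Two made-up edges: one with a type and a weight, one with node IDs only.
sample = io.StringIO("geneA\tgeneB\tppi\t0.8\ngeneA\tgeneC\n")
edges = read_interactions(sample)
```

Here `edges` holds one tuple per line, with `None` and the default weight filled in for the columns the second line omits.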

Predicting non-GO Terms

Predicting non-GO terms is nearly equivalent to predicting GO terms. You will need the same files and file structure that is used to predict GO terms. However, since you will not be predicting GO terms, any positive and negative annotations to a non-GO term must be mapped to a GO term. You must also include an only-functions file that contains the positive mapped GO-term IDs so that GAIN will only make predictions for those GO terms. See the example invocation section for information on invoking GAIN.

GAIN Generated Output Files

GAIN contains two command line options that will be helpful in interpreting output files.

  1. To specify where GAIN should place the output files, you should use:

    -o --output-directory filename: Name of directory to output all results to. If you do not provide this option, GAIN will not print any results to any files.

  2. To specify an experiment name, you should use the -e option:

    -e --experiment-name string: A string describing the experiment that generated the gene expression data.

    The experiment name will be used in naming the output files generated by GAIN. By providing a comprehensive experiment name, such as "ova-local-yeast-2011-01", it will be easier to correlate your results to a particular experiment.

GAIN will output the following text files. Remember that "experiment-name" will be replaced by the string provided from use of the -e option:

  1. db-experiment-name-cv.txt

    Contains cross validation results for each function predicted in tabular format with the following data:

    • confidence cutoff: The confidence cutoff is a threshold at which GAIN makes predictions. In other words, if a gene is predicted to have a function with confidence of 0.4 and the threshold is 0.5, GAIN will say that the gene does not meet the threshold and therefore is not a prediction.
    • desired recall: The desired recall is the recall that you expected to obtain. Recall is the fraction of all truly positive instances that were predicted as positive, i.e., how many of the predictions you should have made you actually made. Mathematically, recall is the number of true positives divided by the number of true positives plus the number of false negatives, or TP / (TP + FN).
    • actual recall: The actual recall is the recall that was actually achieved, computed as TP / (TP + FN).
    • precision: Precision is the fraction of all positive predictions that were correct. Mathematically, precision is the number of true positives divided by the total number of positive predictions, or TP / (TP + FP).
    • false positive(FP) rate: The FP rate is the rate at which genes are incorrectly predicted to have functions that they do not have. Mathematically, the FP rate is the number of false positives divided by the number of false positives plus the number of true negatives, or FP / (FP + TN).
    • true positive(TP): A TP occurs when a gene is correctly predicted to have a function.
    • false positive(FP): A FP occurs when a gene is incorrectly predicted to have a function that it does not have.
    • true negative(TN): A TN occurs when a gene is correctly predicted to not have a function.
    • false negative(FN): A FN occurs when a gene is incorrectly predicted not to have a function that it has.

  2. db-experiment-name-gene-universe.txt

    Contains the structure of the generated graph in two column format with the first column specifying GraphID and the second column specifying NodeID. This file essentially contains the universe of genes over which GAIN operated.

  3. db-experiment-name-grouped-cv.txt

    Contains cross validation results based on groups of functions in tabular format with the same structure as that of db-experiment-name-cv.txt

  4. db-experiment-name-invocation.txt

    Contains parsed command line options that were invoked to run the algorithm.

  5. db-experiment-name-log.txt

    Contains the output from the console as the algorithm was run. It also contains additional log information from GAIN that is dependent upon the algorithm that was run.

  6. db-experiment-name-stats.txt

    Contains statistics about the algorithm that was run, e.g., the number of iterations to convergence.
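
The columns of db-experiment-name-cv.txt follow directly from the four outcome counts. The Python sketch below shows the arithmetic. The function names are hypothetical, and treating a confidence equal to the cutoff as a positive prediction is an assumption, since the text only says that a confidence below the cutoff is not a prediction.

```python
def count_outcomes(confidences, truths, cutoff):
    """Count TP, FP, TN, FN for one function at one confidence cutoff."""
    tp = fp = tn = fn = 0
    for confidence, has_function in zip(confidences, truths):
        predicted = confidence >= cutoff  # assumption: ties count as predictions
        if predicted and has_function:
            tp += 1
        elif predicted and not has_function:
            fp += 1
        elif not predicted and has_function:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def cv_metrics(tp, fp, tn, fn):
    """Return (recall, precision, FP rate); a ratio with an empty denominator is 0.0."""
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    fp_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return recall, precision, fp_rate

# Four made-up genes: confidences from a predictor and their true status.
counts = count_outcomes([0.9, 0.6, 0.4, 0.2], [True, False, True, False], cutoff=0.5)
metrics = cv_metrics(*counts)
```

With a cutoff of 0.5, the gene with confidence 0.4 is not predicted, so it counts as a false negative because it truly has the function.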

Prediction Algorithms

GAIN operates on each function of interest in turn and makes predictions independently for each function. Two types of function prediction algorithms are implemented in GAIN: One-Versus-All (OVA) and One-Versus-None (OVN).

Given a specific function f, OVA algorithms predict which genes have function f by propagating the labels f and not f across the FLN. Genes that have been annotated with some function other than f, or an ancestor/descendant of f, initially get the label not f, hence the name one-versus-all. You can invoke an OVA algorithm using the command-line switch --one-versus-all or --ova.

Given a specific function f, OVN algorithms predict which genes have function f by propagating the label f across the FLN. The name one-versus-none comes from the fact that these labels have no competition from not f and can thus overrun the entire FLN. You can invoke an OVN algorithm using the command-line switch --one-versus-none or --ovn.
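
The difference between the two families lies in the initial labels. A minimal Python sketch, assuming states are stored as +1, -1, and 0 in a dictionary keyed by gene (the function name and data layout are hypothetical): OVA keeps the negative examples, while OVN treats them as unknowns so that only the label f propagates.

```python
def initial_labels(states, mode):
    """Return the starting labels for one function under OVA or OVN."""
    if mode == "ova":
        # Both f (+1) and not f (-1) compete in the network.
        return dict(states)
    if mode == "ovn":
        # Only f (+1) is propagated; negatives are treated as unknowns (0).
        return {gene: (1 if state == 1 else 0) for gene, state in states.items()}
    raise ValueError("mode must be 'ova' or 'ovn'")

# Made-up states for three genes with respect to one function.
example_states = {"geneA": 1, "geneB": -1, "geneC": 0}
ova_labels = initial_labels(example_states, "ova")
ovn_labels = initial_labels(example_states, "ovn")
```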

Each of the following sections on prediction algorithms contains an explanation of the algorithm, an example invocation of the algorithm, and, if appropriate, a citation to the paper describing the algorithm. The arguments used in these invocations are described in the Input Files and Command-line Options sections.

  1. One-Versus-All (OVA) algorithms

    • Local rule (--ova local)

      gain -i interactions.txt --ova local -f annotations.txt -o output-dir --only-predictions

      This algorithm computes functional assignment by examining only the immediate neighbors of each unannotated node. In essence, this is a guilt-by-association algorithm. For each unannotated node, the algorithm sets its state to the weighted majority of its neighbors in the FLN.

    • Hopfield network

      • Normal Hopfield network (--ova hopfield)

        gain -i interactions.txt --ova hopfield -f annotations.txt -o output-dir -O onlyTheseFunctions.txt

        This algorithm applies the local-rule algorithm repeatedly and serially to all the genes until the gene labels do not change.

        [3]

    • Mincut (--ova mincut)

      gain -i interactions.txt --ova mincut -f annotations.txt -o output-dir --only-predictions

      This algorithm predicts gene functions by minimising the total weight of inconsistent edges in the functional linkage network. A consistent edge is an edge whose nodes share the same state. In essence, this is a global guilt-by-association technique. Please note that Mincut relies on you having the HIPR program installed on your computer. If you do not have hi_pr in your PATH, you will have to specify the path to the executable by using the --hipr-directory option.

      [9]

    • SinkSource (--ova sinksource)

      gain -i interactions.txt --ova sinksource -f annotations.txt -o output-dir --only-predictions

      This algorithm is similar to FunctionalFlow, except that it incorporates negative examples as sinks. While FunctionalFlow allows flow to continue to propagate throughout (and ultimately fill) the network, SinkSource absorbs flow through the use of sinks. Sinks allow an infinite amount of fluid (or influence) to flow into them, thereby stopping that influence from propagating further through the network.

      [10]

    • GeneMANIA (--ova genemania)

      gain -i interactions.txt --ova genemania -f annotations.txt -o output-dir --only-predictions

      The algorithm is derived from ridge regression and operates by integrating multiple functional association networks into a single process-specific network to predict gene function.

      [7]

    • Support Vector Machines (SVM)

      GAIN supports making predictions using several SVM-based approaches. GAIN does not implement SVMs itself. It assumes you have downloaded and installed the appropriate packages and that the executables are in your PATH. GAIN uses SVMs by feeding them the adjacency matrix of the FLN. By default, GAIN trains these SVMs using a linear kernel and assumes that the positive and negative examples are separable. Here are the different SVMs supported:

      • libSVM (--ova libsvm)

        gain -i interactions.txt --ova libsvm -f annotations.txt -o output-dir --only-predictions

        Use the libSVM library. You can tell GAIN in which directory on your machine the libSVM executables are located using the --libsvm-directory option. In addition, you can pass specific options to the libSVM trainer and tester with the --libsvm-train-options and --libsvm-test-options, respectively.

      • SVMLight (--ova svmlight)

        gain -i interactions.txt --ova svmlight -f annotations.txt -o output-dir --only-predictions

        Use the SVMLight package to train SVMs and make predictions. You can tell GAIN in which directory the SVMLight executables are located using the --svmlight-directory option. In addition, you can pass specific options to the SVMLight trainer and tester with the --svmlight-train-options and --svmlight-test-options, respectively.

      • Transductive SVMLight (--ova svmlight-transductive)

        gain -i interactions.txt --ova svmlight-transductive -f annotations.txt -o output-dir --only-predictions

        Use the transductive learner in the SVMLight package to train SVMs and make predictions. The advantage of this learner is that it exploits edges in the FLN between unlabelled and labelled examples to train the SVM and to make predictions.

  2. One-Versus-None (OVN) algorithms

    • FunctionalFlow (--ovn functional-flow)

      gain -i interactions.txt --ovn functional-flow -f annotations.txt -o output-dir --only-predictions

      This algorithm implements the FunctionalFlow algorithm where each annotated node is an infinite reservoir of functional flow. Initially, the reservoir for every unknown node is empty. In each round, "function" flows along the edges of the graph, "downhill" from larger reservoirs to smaller reservoirs. The amount of flow through each edge is bounded by the weight of that edge.

      [11]

    • Local rule (--ovn local)

      gain -i interactions.txt --ovn local -f annotations.txt -o output-dir --only-predictions

      This algorithm is the same as the OVA version, except that negative examples are treated as unknowns.

    • SinkSource (--ovn sinksource)

      gain -i interactions.txt --ovn sinksource -f annotations.txt -o output-dir --only-predictions

      This algorithm is the same as the OVA version, except that negative examples are treated as unknowns. Additionally, an artificial sink is connected to all unknowns to absorb flow. The weight of this connection can be controlled with the --ovn-sinksource-edge-weight option.

      [10]
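
To make the relationship between the OVA local rule and the Hopfield network concrete, here is a small Python sketch under simplifying assumptions. It is not GAIN's code: the function names and the adjacency layout are hypothetical, positive and negative examples stay clamped, a weighted sum of exactly zero leaves a node unknown, and the threshold and degree-normalisation options are ignored.

```python
def local_rule(node, state, neighbours):
    """Weighted majority of the neighbours' states: +1, -1, or 0 on a tie."""
    total = sum(weight * state[nbr] for nbr, weight in neighbours[node].items())
    return (total > 0) - (total < 0)

def local_predictions(state, neighbours):
    """One pass of the local rule over the unknown nodes, using initial states only."""
    return {n: local_rule(n, state, neighbours) for n, s in state.items() if s == 0}

def hopfield(state, neighbours, max_sweeps=100):
    """Apply the local rule serially and repeatedly until no label changes."""
    state = dict(state)
    free = [n for n, s in state.items() if s == 0]  # labelled nodes stay clamped
    for _ in range(max_sweeps):
        changed = False
        for node in free:
            new_state = local_rule(node, state, neighbours)
            if new_state != state[node]:
                state[node] = new_state
                changed = True
        if not changed:
            break
    return state

# Made-up path a - u1 - u2 - b, with a positive, b negative, u1 and u2 unknown.
toy_neighbours = {
    "a": {"u1": 1.0},
    "u1": {"a": 1.0, "u2": 1.0},
    "u2": {"u1": 1.0, "b": 0.5},
    "b": {"u2": 0.5},
}
toy_state = {"a": 1, "u1": 0, "u2": 0, "b": -1}

one_pass = local_predictions(toy_state, toy_neighbours)
converged = hopfield(toy_state, toy_neighbours)
```

In this toy network, a single local pass labels u2 as negative because its only labelled neighbour is b. The serial, repeated updates instead label u2 as positive, because the newly positive u1 is connected to it by a heavier edge than b is.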

Invoking GAIN

The following are sample invocations using various algorithms and optional arguments. Each invocation is followed by an explanation of its arguments.

  1. gain -i interactions.txt --ovn local -f annotations.txt -o output-dir --only-functions-file only-functions.txt --cross-validate-fold 5 --unclamp-positives

    This invocation states that GAIN should run the OVN local algorithm, place results in the directory output-dir, only make predictions for the functions located in only-functions.txt, use 5-fold cross validation, and allow positive examples to change state.

  2. gain -i interactions.txt --ovn local -f annotations.txt -o output-dir --only-functions-file only-functions.txt --cross-validate-fold 3 --unclamp-negatives --distance 3 --detailed-cross-validation-results --use-custom-RNG-seed 2393849 --visualise

    This invocation states that GAIN should run the OVN local algorithm, place results in the directory output-dir, only make predictions for the functions located in only-functions.txt, use 3-fold cross validation, allow negative examples to change state, treat nodes up to distance 3 away as neighbors in the local algorithm, return detailed cross validation results, use a custom random number seed of 2393849, and visualise the rationale for each prediction made by GAIN.
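
If you drive GAIN from a script, it can help to assemble the argument list programmatically. The Python sketch below rebuilds the first sample invocation using only options documented in this manual. The helper name `gain_command` is hypothetical, and the executable is assumed to be reachable as `gain`. The code only builds and prints the command; it does not run GAIN.

```python
import shlex

def gain_command(interactions, annotations, output_dir, algorithm,
                 family="ovn", extra=()):
    """Build the argument list for one GAIN run (family is 'ova' or 'ovn')."""
    if family not in ("ova", "ovn"):
        raise ValueError("family must be 'ova' or 'ovn'")
    return ["gain", "-i", interactions, "--" + family, algorithm,
            "-f", annotations, "-o", output_dir, *extra]

command = gain_command(
    "interactions.txt", "annotations.txt", "output-dir", "local",
    extra=["--only-functions-file", "only-functions.txt",
           "--cross-validate-fold", "5", "--unclamp-positives"],
)
command_line = shlex.join(command)  # quoted form, suitable for logging
print(command_line)
```

The resulting list can be passed to `subprocess.run(command, check=True)` once GAIN is installed. Passing a list rather than a single string avoids quoting problems with file names that contain spaces.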

Command-line Options

General Options

These options specify how to control GAIN in general.

Short Long Type Description Default Value
-e --experiment-name string A string describing the experiment that generated the gene expression data.
-f --functions-file string Name of file containing a list of proteins and their functions.
-g --gene-expression-file string Name of file containing the gene expression data.
--go-file string Name of file containing the definition of the gene ontology in OBO format. You can download this from the GO website.
-I --ignore string Information to ignore. Use this option to tell GAIN to ignore particular functional categories, GO evidence codes, or interaction types. You may use this option multiple times.
--ignore-evidence-code string GO evidence code to ignore. You may use this option multiple times.
-i --interactions-file string Name of file containing pairs of linked genes in the FLN.
-o --output-directory string Name of directory to output all results to. If you do not provide this option, GAIN will not print any results to any files.
-N --overlapping-functions string Name of file containing a list of functions which have overlapping annotations.
-T --type string Type of interaction data. Use 'unweighted' for interaction datasets that do not contain edge weights and 'weighted' for interaction data sets that contain edge weights. 'unweighted'

Options Related to Functional Linkage Network Construction

These options specify how to construct the Functional linkage network, including how to assess edge weights.

Short Long Type Description Default Value
-G --group-functions-method string The method by which to group functions in GO in order to convert gene expression correlations to estimates of probabilities of shared function. Allowed values are parent (a group of functions share the same parent) and depth (a group of functions have the same minimum depth in the GO DAG). 'parent'
--just-use-correlations boolean Use the absolute values of the correlations in gene expression data as edge weights. OFF
--integrate string Type of data integration to do. Allowed values are 'and' and 'or'. 'or'
--original-annotations boolean The annotations file contains the original annotations downloaded from the GO website. They do not contain transitively closed annotations. OFF
--minimum-weight float Discard interactions/edges with weight/confidence less than this parameter. 1
--apply-true-path-rule boolean Assuming that the annotations file contains the original annotations downloaded from the GO website, apply the true path rule to transfer annotations up the GO DAG, and the annotation status of 'unknown' down and sideways in the GO DAG. Use this option to allow any prediction algorithm to potentially make predictions that follow the true path rule, not just algorithms such as hierarchical-hopfield, which have been designed to do so. OFF
--no-true-path-rule-downward boolean Assuming that the annotations file contains the original annotations downloaded from the GO website, do not apply the true path rule to transfer the annotation status of 'unknown' down and sideways in the GO DAG. This option is useful for post-2009 versions of the GO DAG, which cause downward application of the true path rule to use more than 3 GB of RAM. This option will force GAIN to explicitly check if a gene is in HYPOTHETICAL_STATE with respect to a GO term, potentially slowing down the prediction and cross-validation stages. OFF
-S --sanity-check boolean Perform a sanity check of the data. This option checks whether the IDs in the gene expression file (when provided) match those in the annotations file. GAIN outputs the results to a file whose name ends in 'sanity-check.txt'. OFF

Options to Select Algorithms

These options specify which function prediction algorithms to run.

Short Long Type Description Default Value
-d --degree boolean Divide the input from a node's neighbours by the degree of that node. ON
-D --distance integer With the local neighbourhood algorithm, use this option to specify the distance a node can be to be considered a neighbour. 1
--num-rounds integer This option has different meanings depending on the algorithm being used. When applied to the one-versus-none algorithm functional-flow, this option specifies how many rounds of flow the algorithm should push. 5
--one-versus-all string The argument specifies which one-versus-all algorithm to run. You can use this option multiple times. If you specify any of the hierarchical algorithms, you must also provide (i) a GO OBO file using the --go-file option, (ii) original, non-transitively-closed functional annotations using the -f option, and (iii) pass the --original-annotations option. (possible values are "genemania", "hopfield", "local", "libsvm", "mincut", "sinksource", "svmlight", "svmlight-transductive")
--ova string An alias for the --one-versus-all option.
--one-versus-none string The argument specifies which one-versus-none algorithm to run. You can use this option multiple times. (possible values are "functional-flow", "local", "sinksource")
--ovn string An alias for the --one-versus-none option.
--hipr-directory string The name of the directory containing the HIPR executable hi_pr. Use this option if this executable is not in your path.
--libsvm-directory string The name of the directory containing the libSVM executables svm-train and svm-predict. Use this option if these executables are not in your path.
--svmlight-directory string The name of the directory containing the SVMLight executables svm_learn and svm_classify. Use this option if these executables are not in your path.

Algorithm Control Options

These options control how the selected algorithms are executed and which biological functions the algorithms make predictions for. The option description will specify if the option applies only to a subset of the algorithms.

Short Long Type Description Default Value
--maximum-go-depth integer Do not make predictions or perform cross-validation for functions with depth greater than this parameter in the GO Directed Acyclic Graph. This option speeds up GAIN by preventing it from performing any operations for very specific functions in GO, which may have very few annotations. If you use this option, you must also provide the GO file in OBO format using the --go-file option. Maximum depth in the GO DAG
--minimum-go-depth integer Do not make predictions or perform cross-validation for functions with depth less than this parameter in the GO Directed Acyclic Graph. This option speeds up GAIN by preventing it from performing any operations for very general functions in GO. If you use this option, you must also provide the GO file in OBO format using the --go-file option. 1
--maximum-annotated-genes integer Do not make predictions or perform cross-validation for functions annotating more genes than this parameter. Number of genes annotated by the largest function
--minimum-annotated-genes integer Do not make predictions or perform cross-validation for functions annotating fewer genes than this parameter. 1
--num-print-predictions integer The number of predictions to print per function. Some algorithms, e.g., sinksource, predict a confidence for every gene for every function. This option allows the user to control the number of predictions per function that are output. Use -1 if you want all predictions to be printed, but be warned that the output file can be very large. 100
--number-runs integer The number of times to run the algorithm. When applied to the one-versus-all algorithms 'semi-hierarchical-hopfield' and 'hierarchical-hopfield', this option specifies how many times to run the algorithm, with each run using a different permutation of the nodes. 1
--only-category string Run the GAIN algorithm only for functions belonging to this category. (possible values are "cellular component", "c", "molecular function", "f", "biological process", "p")
--only-cv boolean Do ONLY cross-validation. Do NOT do predictions. OFF
-O --only-functions-file string Run the GAIN algorithm only for functions in this file. The file should contain one GO id per line.
--only-predictions boolean Do ONLY predictions. Do NOT do cross-validation. OFF
--ovn-sinksource-edge-weight float The weight of the artificial edges added to the FLN by the one-versus-none SinkSource algorithm. 1
--libsvm-test-options string Extra options to pass to the libSVM testing programme. If an option contains a space, enclose that option in quotes.
--libsvm-train-options string Extra options to pass to the libSVM training programme. If an option contains a space, enclose that option in quotes.
--svmlight-test-options string Extra options to pass to the SVMLight testing programme. If an option contains a space, enclose that option in quotes.
--svmlight-train-options string Extra options to pass to the SVMLight training programme. If an option contains a space, enclose that option in quotes.
-t --threshold float Threshold for the Hopfield network algorithm. 0
--unclamp-positives boolean Allow the state of a positive (+1) node to change. OFF
--unclamp-negatives boolean Allow the state of a negative (-1) node to change. OFF
--weight-evidence-codes boolean Weight (GO) evidence codes (TAS = 0.9, IDA = 0.9, IMP = 0.7, IGI = 0.7, IPI = 0.7, ISS = 0.6, IEP = 0.6, NAS = 0.4, IEA = 0.1). These weights are based on the loose hierarchy of evidence codes available at the Gene Ontology. They are hard-coded in the software. OFF
--weight-interaction-types-cutoff string Weight interaction types for each max-gene-count given in this file. If no file is given, the default scheme is used.
--weight-interaction-types-depth boolean Weight interaction types for each depth in the Gene Ontology hierarchy. OFF
--group-interaction-types string When weighting interaction types, lump them into groups as specified in this file.
--edge-weighting-scheme string Specifies how multiple types per edge should be handled. See documentation for details. (possible values are "linear", "probabilistic", "shared") shared

Options to Evaluate Performance

These options specify how to evaluate the performance of GAIN.

Short Long Type Description Default Value
-v --cross-validate boolean Cross-validate the GAIN net. ON
-F --cross-validate-fold integer Specify the k in k-fold cross-validation. If this option is 1, do leave-one-out cross validation. 5
--detailed-cross-validation-results boolean Print detailed results on cross validation. For every gene-function pair tested by the cross-validation procedure, print the predicted function, the true function, and whether the result was a true/false positive or a true/false negative. WARNING: This option will create a very large output file, especially if you are analysing many functions. OFF
-X --file-cross-validate string Cross-validate the GAIN net using the cross-validation set in the file. To indicate that the file contains information for leave-one-out cross validation, use the -F 1 option. In this case, each line of the file contains three items, separated by tabs: the GO id, the gene id, and +1 if the GO id annotates the gene or -1 if the GO id does not annotate this gene but annotates another gene.
--evaluate-predictions string Evaluate the correctness of the predictions based on the (new) annotations in the argument.
--no-graphviz boolean Do not run the Graphviz programmes to create images files. You must use this option with one of the visualisation options. However, this option will only create the input files for the Graphviz programmes, thus saving considerable running time and disk space. OFF
--min-confidence float Do not consider predictions with confidence below this value. The current version of GAIN uses this value only in conjunction with the --visualise option: It does not create propagation diagrams for gene-function pairs whose confidence is less than this value. 0
--predictions-file string File containing predictions made in an earlier run of GAIN or by another algorithm. Use this option with the --evaluate-predictions option when you want to run GAIN only to check how many predictions made by an earlier run of GAIN or by another algorithm are correct.
--visualise boolean Visualise the rationale behind each prediction. GAIN will lay out the propagation diagram for each prediction (gene-function pair). For this option to work, you must have the programmes dot and neato (available in the graphviz package) in your path. Using this option will cause GAIN to slow down by a factor of 10 and use up gobs of disk space. OFF
--visualise-cut boolean Visualise the cut induced by the predictions for each function. For each function, GAIN lays out the subgraph of the FLN induced by the set of genes predicted/annotated to have the function as well as the edges in the FLN connecting these nodes to any nodes predicted/annotated not to have the function. For this option to work, you must have the programmes dot and neato (available in the graphviz package) in your path. Using this option will cause GAIN to slow down by a factor of 10 and use up gobs of disk space. OFF
--visualise-cross-validation boolean Visualise the rationale behind the cross validation results. For each gene in each cross validation set for each function, GAIN lays out the propagation diagram that yields the prediction for that gene. For this option to work, you must have the programmes dot and neato (available in the graphviz package) in your path. Using this option will cause GAIN to slow down by a factor of 10 and use up gobs of disk space. OFF
--visualise-params-file string The name of the file containing parameters setting up how nodes and edges look in the propagations diagrams. 'params.txt'
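
The -F option above sets the k in k-fold cross-validation, with 1 meaning leave-one-out. The Python sketch below shows one plausible way to form such folds. It is an illustration only: how GAIN actually partitions genes is not documented here, and the function name, the shuffling, and the use of a seed are assumptions.

```python
import random

def cv_folds(genes, k, seed=0):
    """Split genes into k folds; k == 1 means leave-one-out (one gene per fold)."""
    genes = list(genes)
    if k < 1:
        raise ValueError("k must be at least 1")
    if k == 1:
        return [[gene] for gene in genes]
    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    rng.shuffle(genes)
    # Deal the shuffled genes round-robin so that fold sizes differ by at most one.
    return [genes[i::k] for i in range(k)]

# Ten made-up gene IDs split into five folds, then into leave-one-out folds.
gene_ids = ["gene%d" % i for i in range(10)]
five_folds = cv_folds(gene_ids, 5, seed=2393849)
leave_one_out = cv_folds(gene_ids, 1)
```

Each fold is held out in turn: its genes are treated as unknown, and the predictions for them are compared with their true annotations.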

Miscellaneous Options

These options have not yet been categorised or do not belong to any category. You probably do not need them at all. They may drop off the face of the earth in the future.

Short Long Type Description Default Value
-p --check-propagation boolean Check (and prove) whether information propagates in the FLN. OFF
-P --pvalue boolean Compute p-values for functional assignments. OFF
--print-all boolean Print the states of all nodes in the network. OFF
--randomise boolean Generate a random graph with the same degree distribution as the input graph. OFF
--no-reduce boolean The normal behaviour of GAIN is to remove all connected components from the FLN that do not contain both positive and negative examples. This option turns off this behaviour. This behaviour is also automatically turned off if any algorithm is invoked with the --one-versus-none option. If you want to run a one-versus-all algorithm with this option and a one-versus-none algorithm, invoke the algorithms in separate runs of GAIN. OFF
--treewidth boolean Compute the treewidth of the FLN. At the moment, GAIN will compute an upper bound on the treewidth, an associated tree decomposition and exit. Future versions of GAIN will incorporate the tree decomposition into the function prediction engines. OFF
-w --weights-file string Name of file containing interaction weights. Each line of the file should contain the two proteins and the weight separated by tabs.
-z --allow-zero-states boolean Allow the state of a node to be zero (i.e., HYPOTHETICAL). ON
--min-threshold float Begin threshold search here. This parameter can be a floating-point number. -1.0
--max-threshold float End threshold search here. This parameter can be a floating-point number. 1.0
--num-thresholds integer Do this many threshold steps. 10
--explain boolean Output a description of the neighbourhood of all +1 nodes. OFF
--use-custom-RNG-seed integer Use a custom seed for random number generation. 0