Biorithm
1.1
|
The reconcile tool implements several methods that reconcile gene expression data with an underlying molecular interaction network. For this type of analysis, the user typically possesses a (possibly weighted) undirected interaction network and weights for some or all of the nodes in the network. Higher weight indicates greater importance for that node. For example, given treatment and control gene expression samples, we could use limma to compute a p-value representing the statistical significance of the level of differential expression of the gene between treatment and control. By taking the negative logarithm of each gene's p-value, we have an initial score for each gene where high scores represent a large degree of differential expression. The reconciliation algorithms weight each node in the underlying interaction network by these initial node scores and proceed to diffuse this value throughout the network. The diffusion process typically ensures that scores on neighboring nodes are similar, while each node's score does not deviate too much from its initial weight.
Download the Biorithm package and follow the installation instructions for Biorithm. The reconcile
executable will be available in the reconcile
subdirectory of Biorithm.
reconcile accepts three different categories of command line options:
Exactly one of these options specifies which reconciliation algorithm to run. reconcile will print an error to standard output if the user provides an incorrect number of algorithms.
Flag | Long | Description |
---|---|---|
- | vanilla | Compute node scores by minimizing the energy function for the Vanilla algorithm. |
- | pagerank | Compute node scores by minimizing the energy function for the PageRank algorithm. |
- | genemania | Compute node scores by minimizing the energy function for the GeneMANIA algorithm. |
- | heat-kernel | Compute node scores by applying the heat diffusion kernel. |
Each algorithm can be run in two different ways. The user can provide multiple networks (-N
) and the chosen algorithm will be run on each network using uniform start weights for each node. Alternatively, the user can provide a single network (-n
) and one or more collections of node weights (-W
), and the chosen algorithm will be run on the same network using the various specified starting node weights.
Flag | Long | Description |
---|---|---|
N | networks | Name of the tab-delimited file containing multiple interaction networks with the following 5-column format: graphId, node1, node2, edge type, edge weight. The last 2 columns are optional. |
W | node-weights | Name of the tab-delimited file containing node weights. The first column contains node identifiers, and remaining columns contains node weights, each column for a different sample. |
n | network-file | Name of the tab-delimited file containing the interaction network with the following 2-column format: node1, node2. An optional 3rd column may be provided with edge-weight. Other columns will be ignored. |
These options control various parametric aspects of running the network reconciliation algorithms.
Flag | Long | Description |
---|---|---|
- | max-num-iterations | An integer that controls the maximum number of iterations for the reconciliation algorithms before terminating. |
- | epsilon | A small value used for checking convergence of reconciliation algorithms. |
q | q-param | The parameter 0<q<=1 in some reconciliation algorithms. In the case of the HeatKernel method, the temperature is given by . High q gives more weight to the initial node values, while low q gives more weight to the network-based term in the energy function. |
o | output | A prefix for the output files that are generated by the reconciliation algorithms. |
A typical invocation of any network reconciliation algorithm will look like the following command:
reconcile --<algorithm> -n <network> -W <node-weights> -o <output>
If the -N
option is used, an output file is generated for each input network. If the -n
and -W
options are used, an output file is generated for each sample in the input weights matrix. Each output file contains one row for each node in the network and several columns of information for the node in each row:
Column Header | Description |
---|---|
NodeId | The node identifier. |
initial_<sample-name> | The initial node value for the current sample. Sample name is determined by the column header of that sample in the weights file. |
final_<sample-name> | The final node value for the current sample. |
degree | The number of neighbors for this node in the network. |
weighted_deg | The sum of the weights of the edges incident on this node in the network. |
Let be an undirected network, where is the set of nodes and is the set of edges. Denote the weight of edge by . Let denote the neighbors of node in the graph, and let denote the weighted degree of node . Let be the starting value of node (specified by the -W
option). Each network reconciliation algorithm computes a final score for each node by smoothing the initial node scores across the network.
Since is a quadratic function of the values, we can minimize it by setting its partial derivative with respect to each to 0, resulting in the following linear system:
Vanilla iteratively applies this update function to each node until convergence.
Since is a quadratic function of the values, we can minimize it by setting its partial derivative with respect to each to 0, resulting in the following linear system:
PageRank iteratively applies this update function to each node until convergence.
Since is a quadratic function of the values, we can minimize it by setting its partial derivative with respect to each to 0, resulting in the following linear system:
GeneMANIA iteratively applies this update function to each node until convergence.
where and are the vectors of starting and final values, repsectively, and parameterizes the rate of heat dispersion through the network. We define , where is the identity matrix, and is a normalized edge weith matrix such that . We use the following approximation for the above matrix exponential:
which converges to the heat kernel as tends to infinity. Let and recursively define as follows:
HeatKernel iteratively computes up to , where is supplied by the user (
--max-num-iterations
).