Biorithm: ActiveNetwork and NetworkLego

Index

Introduction to NetworkLego : What does NetworkLego do?
Installing Active Network and Network Lego : Installing Active Network and NetworkLego
Invoking NetworkLego : Command-line options supported by NetworkLego
Interpreting the NetworkLego Output : Information and analysis that NetworkLego outputs

Introduction to NetworkLego

NetworkLego is a general computational framework for

detecting cellular networks that are activated in a particular cell state or in response to a perturbation
determining core cellular networks that are activated in response to multiple perturbations.

This system consolidates datasets that contain information on known molecular interactions into a universal network. When queried with a molecular profile (typically, a gene expression data set) representing a cell state, it searches the universal network using an algorithm for finding heavily-weighted subgraphs to retrieve an "active network," which is a sub-network whose constituent molecules act in concert (as determined by the combined similarity of their measured profiles) to determine the state of the cell. NetworkLego also works when there is no network of molecular interactions available. In this situation, it finds dense subgraphs in a network consisting of edges between pairs of genes with statistically significant correlations in the molecular profile data.
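The text above does not say which dense-subgraph algorithm NetworkLego uses. As an illustration only, the following minimal sketch shows one well-known heuristic, greedy peeling, applied to a weighted edge list. The function name densest_subgraph_greedy and the toy gene identifiers are hypothetical, and the density measure (total edge weight divided by number of nodes) is an assumption for the sketch.

```python
from collections import defaultdict


def densest_subgraph_greedy(edges):
    """Greedy peeling heuristic (an illustration, not NetworkLego's own code).

    edges: list of (node_a, node_b, weight) with node_a != node_b.
    Repeatedly removes the node with the smallest weighted degree and
    returns the intermediate node set with the highest density, where
    density = total edge weight / number of nodes.
    """
    adj = defaultdict(dict)
    for a, b, w in list(edges):
        adj[a][b] = w
        adj[b][a] = w

    # Weighted degree of each node; every edge is counted at both ends.
    degree = {node: sum(nbrs.values()) for node, nbrs in adj.items()}
    total = sum(degree.values()) / 2
    alive = set(adj)

    best_nodes, best_density = set(), 0.0
    while alive:
        density = total / len(alive)
        if density > best_density:
            best_nodes, best_density = set(alive), density
        # Remove the weakest node (ties broken by name for determinism).
        victim = min(alive, key=lambda node: (degree[node], node))
        alive.remove(victim)
        total -= degree[victim]
        for nbr, w in adj[victim].items():
            if nbr in alive:
                degree[nbr] -= w
    return best_nodes, best_density


if __name__ == "__main__":
    toy = [("A", "B", 1.0), ("B", "C", 1.0), ("A", "C", 1.0), ("C", "D", 0.5)]
    print(densest_subgraph_greedy(toy))
```

In the toy graph, the weakly attached node D is peeled off first, leaving the triangle as the densest node set.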

Installing Active Network and Network Lego

Download the Biorithm package and follow the installation instructions for Biorithm. The executable file will be available as active-networks/active-networks.

Invoking NetworkLego

To run NetworkLego, you usually need several input files:

A network file containing interactions. Each line of the file is tab-delimited and contains the two interacting molecules and the type of the interaction. This file is optional if you provide a file containing measurements of molecular profiles.
A file containing measurements of molecular profiles, such as gene expression measurements.
A file specifying the "class" that each condition (sample, chip) in the molecular file belongs to. This file is tab-delimited, with each line containing the name of one condition and the name of the class. Typically, all the classes in one file are the same. However, NetworkLego can handle a class file with multiple classes by processing the conditions in each class separately.
A file containing functional annotations for the genes in the interaction data set and molecular profiles. While this file is optional, you will probably want to provide it to find out whether the computed network legos are enriched in any functions.
A file defining the functions in the Gene Ontology and the relationships between them, in OBO format. This file is optional.
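The two tab-delimited formats described above (the network file and the class file) are simple enough to check before a run. The sketch below reads them in Python. The function names and the sample identifiers are hypothetical, and the sketch assumes that neither file has a header row.

```python
import io
from collections import defaultdict


def read_network(handle):
    """Return a list of (molecule_a, molecule_b, interaction_type) tuples."""
    edges = []
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        edges.append((fields[0], fields[1], fields[2]))
    return edges


def read_classes(handle):
    """Return {class_name: [condition, ...]} so each class can be handled separately."""
    by_class = defaultdict(list)
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        condition, class_name = fields[0], fields[1]
        by_class[class_name].append(condition)
    return dict(by_class)


if __name__ == "__main__":
    # Hypothetical file contents, used only to exercise the parsers.
    network_text = "geneA\tgeneB\tPPI\ngeneB\tgeneC\tregulatory\n"
    class_text = "chip1\ttreated\nchip2\ttreated\nchip3\tcontrol\n"
    print(read_network(io.StringIO(network_text)))
    print(read_classes(io.StringIO(class_text)))
```

Grouping conditions by class mirrors the statement that a class file with several classes is processed one class at a time.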

The current NetworkLego pipeline runs as follows:

Compute correlations by randomly permuting the gene expression data:
pathway -d . -C 100
Manually select a statistically significant correlation threshold from the results of the previous step. The results are in a file that ends with the string "pvalues".

Compute NetworkLego using this threshold. The standard incantation is
active-networks -d <directory> -n <network-file> -f <functions-file> -B <OBO-file> -p <pvalue-threshold> -C 100 -m <multiple-hypothesis-correction>

The options mean the following:

Flag	Description
B	the file containing the structure of the Gene Ontology (in OBO format). This file is optional.
c	the name of the file containing “class” information. If you do not use the -d option, you must use this option in conjunction with the -g option.
C	the number of times NetworkLego should randomise the gene expression data to compute the null distribution of gene expression correlation values in "random" gene expression data.
d	the directory containing dataset.txt and class.txt
f	the file containing functional annotations.
g	the name of the file containing gene expression data. If you do not use the -d option, you must use this option in conjunction with the -c option.
m	a string describing the type of correction you want to perform when testing multiple hypotheses. Legal values of this string are "Bonferroni", "Holms", and "FDR". If you do not provide this option, NetworkLego will not perform any correction. Currently, this option only controls the correction done to select statistically significant co-expressed pairs of genes when you do not provide a network of molecular interactions.
n	the file containing the interactions.
p	the p-value cutoff that determines if a correlation value between two genes is statistically significant. The default is 0.01.
r	the number of times you want to randomise the network to compute p-values for NetworkLego.
t	the correlation threshold you computed in the previous step.
z	just use this option blindly. It may go away in a future version of NetworkLego.
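The roles of the -C and -p options can be illustrated with a small permutation test. The sketch below is not NetworkLego's implementation. It assumes Pearson correlation, assumes that each gene's values are shuffled independently across conditions, and uses a one-sided empirical p-value. All function names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def null_correlations(profiles, rounds, rng):
    """Collect pairwise correlations from `rounds` randomised copies of the data.

    profiles: {gene: [value per condition]}. `rounds` plays the role of -C.
    """
    genes = sorted(profiles)
    null = []
    for _ in range(rounds):
        # Shuffle each gene's values independently to break real co-expression.
        shuffled = {g: rng.sample(profiles[g], len(profiles[g])) for g in genes}
        for i, a in enumerate(genes):
            for b in genes[i + 1:]:
                null.append(pearson(shuffled[a], shuffled[b]))
    return null


def empirical_pvalue(observed, null):
    """Fraction of null values at least as large as `observed` (with +1 smoothing)."""
    at_least = sum(1 for value in null if value >= observed)
    return (at_least + 1) / (len(null) + 1)


if __name__ == "__main__":
    rng = random.Random(0)
    data = {
        "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
        "geneB": [1.1, 2.2, 2.9, 4.1, 5.3],
        "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    }
    null = null_correlations(data, rounds=100, rng=rng)
    observed = pearson(data["geneA"], data["geneB"])
    print(observed, empirical_pvalue(observed, null))
```

A pair of genes would then be kept as an edge when its empirical p-value falls below the cutoff given with -p.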

Interpreting the NetworkLego Output

The software typically produces numerous output files containing different types of information about the networks analysed. All files are tab-delimited, unless otherwise noted, and hence can be opened in a spreadsheet or slurped into a database. All files have a common prefix (e.g., what you supply through the --experiment-name option or a string the software guesses from the argument to the --edges-file option). For the purpose of this documentation, let us assume that this common prefix is "network-lego-". The files produced by the software are:

network-lego-stats.txt

This file has two parts. The first part details various statistics about each active network or network lego in a tab-delimited (columnar) format. The second part lists information about the connected components in each network.

Each line in the first part of the file corresponds to a single active network or network lego. The columns in the first part contain the following information (Note: only some columns may be relevant or useful for a particular application of NetworkLego):

Column Header	Description
ActiveNetworkName	An identifier for the active network.
#Nodes	The number of nodes (genes, proteins, molecules, etc.) in the active network.
#Edges	The number of edges (physical, regulatory, functional, or other type of interactions) in the active network.
Total Edge Weight	The total weight of the edges in the network.
Weighted density	The total weight of the edges in the network divided by the number of nodes in the network.
Average Edge Weight	The total weight of the edges in the network divided by the number of edges in the network.
Unweighted Density	The number of edges in the network divided by the number of nodes in the network.
Unweighted Completeness	The number of edges in the network divided by the maximum number of edges possible (i.e., the number of pairs of nodes) in the network.
Stouffer's z-score	When applicable, the Liptak-Stouffer z-score of the network.
Gaussian p-value of Stouffer's z-score	The Gaussian p-value corresponding to the Liptak-Stouffer z-score.
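The column definitions above translate directly into arithmetic on a weighted edge list. The sketch below recomputes them for one network, which can be handy for checking a row of the stats file. The function names are hypothetical. The stouffer function assumes the unweighted form of the Liptak-Stouffer statistic and an upper-tail Gaussian p-value. The documentation does not state which variant NetworkLego reports.

```python
import math


def network_stats(edges):
    """Compute the density columns for one network.

    edges: non-empty list of (node_a, node_b, weight), one entry per edge.
    """
    nodes = {node for a, b, _ in edges for node in (a, b)}
    n_nodes, n_edges = len(nodes), len(edges)
    total = sum(weight for _, _, weight in edges)
    pairs = n_nodes * (n_nodes - 1) // 2  # maximum possible number of edges
    return {
        "#Nodes": n_nodes,
        "#Edges": n_edges,
        "Total Edge Weight": total,
        "Weighted density": total / n_nodes,
        "Average Edge Weight": total / n_edges,
        "Unweighted Density": n_edges / n_nodes,
        "Unweighted Completeness": n_edges / pairs if pairs else 0.0,
    }


def stouffer(z_scores):
    """Unweighted Stouffer z-score and its upper-tail Gaussian p-value (assumed form)."""
    z = sum(z_scores) / math.sqrt(len(z_scores))
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p


if __name__ == "__main__":
    path = [("A", "B", 0.5), ("B", "C", 0.25), ("C", "D", 0.75)]
    for name, value in network_stats(path).items():
        print(name, value, sep="\t")
    print(stouffer([1.0, 1.0, 1.0, 1.0]))
```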

The second part of the file also contains one line per active network. The first three columns are identical to the first part of the file (i.e., "ActiveNetworkName", "#Nodes", and "#Edges"). The other two columns are:

Column Header	Description
#Components	The number of connected components in the active network.
Component sizes (#nodes, #edges)	For each component, the number of nodes and edges in the component, separated by a comma and placed within parentheses.
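The component columns can be reproduced from an edge list with a standard graph traversal. The following sketch uses a depth-first search. The function name component_sizes is hypothetical, and nodes that have no edges are not counted because they never appear in an edge list.

```python
from collections import defaultdict


def component_sizes(edges):
    """Return [(n_nodes, n_edges), ...], one tuple per connected component.

    edges: list of (node_a, node_b) pairs for a single network.
    """
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)

    seen = set()
    sizes = []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first search collects every node reachable from `start`.
        component = {start}
        seen.add(start)
        stack = [start]
        while stack:
            node = stack.pop()
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    component.add(nbr)
                    stack.append(nbr)
        # Both endpoints of an edge lie in the same component, so checking one suffices.
        n_edges = sum(1 for a, _ in edges if a in component)
        sizes.append((len(component), n_edges))
    return sizes


if __name__ == "__main__":
    print(component_sizes([("A", "B"), ("B", "C"), ("D", "E")]))
```

The length of the returned list corresponds to #Components, and each tuple corresponds to one parenthesised pair in the last column.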

network-lego-edges.txt

The edges in each network. Each line of the file contains information on one network and one edge in that network. The columns of the file are the name of the network, the identifiers of the two nodes incident on the edge, the type of the edge (e.g., PPI), and the weight of the edge.

network-lego-nodes.txt

The nodes in each network. Each line of the file contains information on one network and one node in that network. The columns of the file are the name of the network and the identifier of the node.

... more to be documented