Biorithm
1.1
Enrichment Class.
#include <enrichment.h>
Public Member Functions | |
Enrichment () | |
void | clear () |
double | computeHyperGeometricProbability (int globTotal, int globTrue, int locTotal, int locTrue) |
void | addUnannotated (int numS, int numT) |
bool | check (double value, LIBENRICHMENT_TEST_TYPE test, double alpha, unsigned int numTests, unsigned int rank) |
double | correct (double value, LIBENRICHMENT_TEST_TYPE test, double alpha, unsigned int numTests, unsigned int rank) |
void | check (vector< EnrichmentRecord< S, T > > &in, vector< EnrichmentRecord< S, T > > &out, LIBENRICHMENT_TEST_TYPE test, double alpha) |
void | getEnrichments (set< S > &in, vector< EnrichmentRecord< S, T > > &out) |
void | getEnrichmentsGSEA (set< S > &in, vector< EnrichmentRecord< S, T > > &out) |
bool | isPair (const S &s, const T &t) const |
void | loadPairs (const vector< pair< S, T > > &in) |
void | loadPairs (const map< S, set< T > > &in) |
void | FormSettings (string Db, string Host, string User, string Passwd) |
The default setting forms a connection to oncogroup. | |
vector< long > | LocusIndex (vector< long > LL) |
Retrieve database ids from locus link ids. | |
void | SetUniverse (vector< long > Uni) |
Set the full universe of functions and associated probe counts. | |
void | SetQuery (vector< long > Qry) |
Set the probes associated with a single bicluster. | |
int | SetBicluster (long clusId) |
Set the universe and query sets through a single database call. | |
void | Enrich (double threshold) |
Calculate Enrichment. | |
map< long, vector< long > > | getEnrich_p2f (void) |
Get an enrichment map from probes to functions. | |
map< long, vector< long > > | getEnrich_f2p (void) |
Get an enrichment map from functions to probes. | |
map< long, double > | getEnrich_f2val (void) |
Get a map from functions to associated p-values. | |
map< long, vector< long > > | getUni_f2p (void) |
Get a map of the universal functions to associated probes. | |
map< long, vector< long > > | getUni_p2f (void) |
Get a map of the universal probes to associated functions. | |
int | InsertEnrich (void) |
Insert the enrichment results into the database. | |
void | ClearResults (void) |
Clear the enrichment results from the last query. | |
void | ClearUniverse (void) |
Clear all data. | |
void | functionFile (string fileName) |
Produce function reference file. |
Enrichment Class.
The Enrichment class reads data from a MySQL database and calculates p-values for biclusters. Each experiment has a range of associated probes and functions, and each bicluster associated with the experiment is defined by its own range of probes and functions. From this data, a p-value is calculated for each function associated with the bicluster. The results, sets of probe-to-function references with p-values, can be returned as maps or inserted directly into the MySQL database.
NOTE: The MySQL database referenced in this program is oncogroup at whipple.cs.vt.edu. This database was generated and maintained by Greg Grothaus.
Enrichment< S, T >::Enrichment | ( | ) |
Constructor
void Enrichment< S, T >::addUnannotated | ( | int | numS, |
int | numT | ||
) |
Adds unpaired elements of each type.
numS | number of unpaired elements of type S |
numT | number of unpaired elements of type T |
bool Enrichment< S, T >::check | ( | double | value, |
LIBENRICHMENT_TEST_TYPE | test, | ||
double | alpha, | ||
unsigned int | numTests, | ||
unsigned int | rank | ||
) |
Given a value of statistical significance, checks if the value is significant by performing a correction for testing multiple hypotheses.
value | The uncorrected statistical significance. |
test | The type of test to be used. Possible values are LIBENRICHMENT_NONE, LIBENRICHMENT_BONFERRONI, LIBENRICHMENT_HOLMS, and LIBENRICHMENT_FALSE_DISCOVERY_RATE. |
alpha | The alpha cutoff specifying the probability of a Type I statistical error |
numTests | The number of multiple hypotheses being tested. |
rank | The index of this particular test in the list of all tested hypotheses sorted by uncorrected statistical significance. |
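The per-hypothesis decision described above can be sketched as follows. This is an illustrative sketch, not the library's implementation: the enum `Correction` and the function `isSignificant` are hypothetical names, `rank` is assumed to be 1-based (1 is the smallest p-value), and the false discovery rate test is assumed to be Benjamini-Hochberg. Holm and Benjamini-Hochberg are stepwise procedures, so the sketch shows only the threshold applied at a given rank.

```cpp
// Hypothetical stand-in for LIBENRICHMENT_TEST_TYPE.
enum class Correction { None, Bonferroni, Holm, FalseDiscoveryRate };

// Returns true if p passes the rank-specific threshold of the chosen test.
// Assumes 1 <= rank <= numTests, with rank 1 being the smallest p-value.
bool isSignificant(double p, Correction test, double alpha,
                   unsigned int numTests, unsigned int rank) {
    switch (test) {
        case Correction::None:
            return p <= alpha;                           // no correction
        case Correction::Bonferroni:
            return p <= alpha / numTests;                // same cutoff for every test
        case Correction::Holm:
            return p <= alpha / (numTests - rank + 1);   // cutoff loosens with rank
        case Correction::FalseDiscoveryRate:
            return p <= alpha * rank / numTests;         // Benjamini-Hochberg cutoff
    }
    return false;
}
```

With alpha = 0.05 and ten tests, the Bonferroni cutoff is 0.005 at every rank, while the Holm cutoff rises from 0.005 at rank 1 to 0.05 at rank 10.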
void Enrichment< S, T >::check | ( | vector< EnrichmentRecord< S, T > > & | in, |
vector< EnrichmentRecord< S, T > > & | out, | ||
LIBENRICHMENT_TEST_TYPE | test, | ||
double | alpha | ||
) |
Given an EnrichmentRecord list of all hypotheses, performs a multiple hypothesis correction.
[in] | in | A vector of EnrichmentRecords sorted by increasing p-value (generally the output from getEnrichments) |
[in] | test | The type of test to be used. Possible values are LIBENRICHMENT_NONE, LIBENRICHMENT_BONFERRONI, LIBENRICHMENT_HOLMS, and LIBENRICHMENT_FALSE_DISCOVERY_RATE |
[in] | alpha | The alpha cutoff specifying the probability of a Type I statistical error |
[out] | out | A subset of the input vector that passes the multiple hypotheses test |
void Enrichment< S, T >::clear | ( | ) |
Removes all stored data.
void Enrichment< S, T >::ClearResults | ( | void | ) |
Clear the enrichment results from the last query.
This function removes all of the probes, functions, and enrichment values associated with the last query. This should be done before every query.
void Enrichment< S, T >::ClearUniverse | ( | void | ) |
Clear all data.
This function will remove all of the data from this data structure. This should be done before a set of experiment probes are defined.
double Enrichment< S, T >::computeHyperGeometricProbability | ( | int | globTotal, |
int | globTrue, | ||
int | locTotal, | ||
int | locTrue | ||
) |
Calculates the hypergeometric statistic using summary values.
globTotal | The total number of elements in your global set of objects |
globTrue | The total number of elements in your global set of objects with property P. |
locTotal | The total number of elements in your subset of objects |
locTrue | The total number of elements in your subset of objects with property P |
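As an illustration of how the four summary values combine, the sketch below computes an upper-tail hypergeometric probability: the chance of seeing at least locTrue elements with property P when locTotal elements are drawn without replacement from the global set. Whether the library reports this tail or a point probability is not stated here, so the tail is an assumption, and `logChoose` and `hypergeometricUpperTail` are hypothetical names.

```cpp
#include <algorithm>
#include <cmath>

// Natural log of the binomial coefficient C(n, k); lgamma avoids overflow.
double logChoose(int n, int k) {
    return std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
}

// P(X >= locTrue) for a hypergeometric X with population globTotal,
// globTrue successes in the population, and locTotal draws.
double hypergeometricUpperTail(int globTotal, int globTrue,
                               int locTotal, int locTrue) {
    // Largest and smallest feasible counts of "true" elements in the subset.
    int maxTrue = std::min(globTrue, locTotal);
    int minTrue = std::max(0, locTotal - (globTotal - globTrue));
    double logDenominator = logChoose(globTotal, locTotal);
    double p = 0.0;
    for (int k = std::max(locTrue, minTrue); k <= maxTrue; ++k) {
        p += std::exp(logChoose(globTrue, k)
                      + logChoose(globTotal - globTrue, locTotal - k)
                      - logDenominator);
    }
    return std::min(p, 1.0);  // guard against rounding slightly above 1
}
```

For example, with 10 global elements of which 5 have property P, a subset of 5 that all have P has tail probability 1/C(10,5) = 1/252.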
double Enrichment< S, T >::correct | ( | double | value, |
LIBENRICHMENT_TEST_TYPE | test, | ||
double | alpha, | ||
unsigned int | numTests, | ||
unsigned int | rank | ||
) |
Given a value of statistical significance, corrects the value to account for testing multiple hypotheses.
value | The uncorrected statistical significance. |
test | The type of test to be used. Possible values are LIBENRICHMENT_NONE, LIBENRICHMENT_BONFERRONI, LIBENRICHMENT_HOLMS, and LIBENRICHMENT_FALSE_DISCOVERY_RATE. |
alpha | The alpha cutoff specifying the probability of a Type I statistical error |
numTests | The number of multiple hypotheses being tested. |
rank | The index of this particular test in the list of all tested hypotheses sorted by uncorrected statistical significance. |
void Enrichment< S, T >::Enrich | ( | double | threshold | ) |
Calculate Enrichment.
This function uses the data defined by SetUniverse and SetQuery to calculate the hypergeometric p-value of each function found to be associated with the query probes. All functions with p-values less than the threshold are stored.
threshold | Defines the maximum p-value of interest. All p-values less than the threshold are stored. |
void Enrichment< S, T >::FormSettings | ( | string | Db, |
string | Host, | ||
string | User, | ||
string | Passwd | ||
) |
The default setting forms a connection to oncogroup.
This function allows the user to define a database other than the default, oncogroup at whipple.cs.vt.edu.
Db | is the database name |
Host | is the host server |
User | is the account used to access the database |
Passwd | is the associated password |
void Enrichment< S, T >::functionFile | ( | string | fileName | ) |
Produce function reference file.
This function will produce a file which contains all of the functions upon which an enrichment calculation is made. The file contains four columns:
[database function ID] tab [Num of associated probes in the universe] tab [Num of associated probes in the bicluster] tab [Enrichment Score]
The file will also contain the total number of probes in the entire universe and the entire bicluster.
map< long, vector< long > > Enrichment< S, T >::getEnrich_f2p | ( | void | ) |
Get an enrichment map from functions to probes.
This function produces a map that associates each function whose p-value is less than the threshold with a vector of its probes.
map< long, double > Enrichment< S, T >::getEnrich_f2val | ( | void | ) |
Get a map from functions to associated p-values.
This function produces a map to associate each function with its associated p-value.
map< long, vector< long > > Enrichment< S, T >::getEnrich_p2f | ( | void | ) |
Get an enrichment map from probes to functions.
This function returns a map that associates each probe with a vector of functions, all of which have a p-value less than the threshold.
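The probe-to-function and function-to-probe maps share the same shape, `map< long, vector< long > >`. Assuming they describe the same set of associations, one can be derived from the other by swapping keys and values, as in this sketch. `IdMap` and `invertMap` are hypothetical names, not part of the class.

```cpp
#include <map>
#include <vector>

using IdMap = std::map<long, std::vector<long>>;

// Builds the reverse map: every value in `forward` becomes a key that
// lists all the original keys it appeared under.
IdMap invertMap(const IdMap& forward) {
    IdMap reverse;
    for (const auto& entry : forward) {
        for (long value : entry.second) {
            reverse[value].push_back(entry.first);
        }
    }
    return reverse;
}
```

Because `std::map` iterates keys in ascending order, each vector in the result is sorted by the original key.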
void Enrichment< S, T >::getEnrichments | ( | set< S > & | in, |
vector< EnrichmentRecord< S, T > > & | out | ||
) |
Calculates enrichments for all T objects given a set of S objects.
[in] | in | A set of S objects for which to find enrichments of T objects |
[out] | out | A vector of EnrichmentRecords sorted in increasing order of p-value |
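To show what an enrichment calculation needs for each T object, the sketch below tallies the four summary counts accepted by computeHyperGeometricProbability from a map of S to sets of T and a query set of S. It is a sketch under assumptions, not the library's algorithm: the universe is taken to be the keys of the map, query elements absent from the map are ignored, unannotated elements are not modelled, and `Counts` and `tallyCounts` are hypothetical names.

```cpp
#include <map>
#include <set>
#include <string>

// Hypothetical holder for the four hypergeometric summary values.
struct Counts {
    int globTotal;  // S objects in the universe
    int globTrue;   // S objects in the universe paired with this T
    int locTotal;   // S objects in the query (and in the universe)
    int locTrue;    // S objects in the query paired with this T
};

template <typename S, typename T>
std::map<T, Counts> tallyCounts(const std::map<S, std::set<T>>& annotations,
                                const std::set<S>& query) {
    std::map<T, Counts> counts;
    int globTotal = static_cast<int>(annotations.size());
    int locTotal = 0;
    for (const auto& entry : annotations) {
        bool inQuery = query.count(entry.first) > 0;
        if (inQuery) ++locTotal;
        for (const T& t : entry.second) {
            Counts& c = counts[t];  // zero-initialized on first access
            ++c.globTrue;
            if (inQuery) ++c.locTrue;
        }
    }
    // The totals are shared by every T object.
    for (auto& entry : counts) {
        entry.second.globTotal = globTotal;
        entry.second.locTotal = locTotal;
    }
    return counts;
}
```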
map< long, vector< long > > Enrichment< S, T >::getUni_f2p | ( | void | ) |
Get a map of the universal functions to associated probes.
This function will return a map that associates every function in the universe to all of its associated probes.
map< long, vector< long > > Enrichment< S, T >::getUni_p2f | ( | void | ) |
Get a map of the universal probes to associated functions.
This function will return a map that associates every probe in the universe to all of its associated functions.
int Enrichment< S, T >::InsertEnrich | ( | void | ) |
Insert the enrichment results into the database.
This function will take all of the enrichments defined by the Enrich function and insert the results into the table biclusterXfunction. The enrichment values will be inserted with the database ids for the function and the bicluster.
bool Enrichment< S, T >::isPair | ( | const S & | s, |
const T & | t | ||
) | const |
Checks if a pair of objects is related in the set of stored pairs.
[in] | s | An instance of S. |
[in] | t | An instance of T. |
void Enrichment< S, T >::loadPairs | ( | const vector< pair< S, T > > & | in | ) |
Loads relationship data as a set of pairs.
[in] | in | A vector of pairs of objects S and T such that there is a relationship between S and T. For example, S is a gene or protein and T is a function. |
void Enrichment< S, T >::loadPairs | ( | const map< S, set< T > > & | in | ) |
Loads relationship data as a map of sets.
[in] | in | A map keyed by elements of type S, where each value is a set of elements of type T. For example, S is a gene or protein and T is a function. |
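The two loadPairs overloads accept the same relation in different shapes. Assuming duplicates carry no extra meaning, a vector of pairs can be grouped into the map-of-sets form as in this sketch. `groupPairs` is a hypothetical helper, not part of the class.

```cpp
#include <map>
#include <set>
#include <utility>
#include <vector>

// Groups (S, T) pairs by their S element; repeated pairs collapse
// because each value is a set.
template <typename S, typename T>
std::map<S, std::set<T>> groupPairs(const std::vector<std::pair<S, T>>& pairs) {
    std::map<S, std::set<T>> grouped;
    for (const auto& p : pairs) {
        grouped[p.first].insert(p.second);
    }
    return grouped;
}
```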
vector< long > Enrichment< S, T >::LocusIndex | ( | vector< long > | LL | ) |
Retrieve database ids from locus link ids.
SetUniverse and SetQuery expect database ids as input, which may not always be convenient for the user. This function therefore produces a vector of database ids from a vector of locus link ids.
LL | is a vector of long values holding the locus link ids of the probes |
int Enrichment< S, T >::SetBicluster | ( | long | clusId | ) |
Set the universe and query sets through a single database call.
This function will generate the full set of probes for the universe and the query by traversing the MySQL database oncogroup, based on a single reference to a defined bicluster.
clusId | is the database index for a bicluster that has already been defined in the database |
void Enrichment< S, T >::SetQuery | ( | vector< long > | Qry | ) |
Set the probes associated with a single bicluster.
This function will traverse all of the functions associated with the probes defined in the vector Qry. The associations will be used to calculate p-values.
Qry | is a vector of longs. The longs are database ids for the probes. |
void Enrichment< S, T >::SetUniverse | ( | vector< long > | Uni | ) |
Set the full universe of functions and associated probe counts.
The function traverses all of the functional annotations associated with the probes in the vector Uni. In the process, association counts are recorded for each function found. These values are required for further enrichment analysis.
Uni | is a vector of longs. The longs are database ids for the probes. |