Biorithm
1.1
A class that implements the Apriori algorithm for computing closed itemsets in a binary matrix. It also computes the lattice connecting the closed itemsets.
#include <apriori.h>
Public Member Functions
Apriori (string filename)
Constructor that reads a binary matrix from a file.
Apriori (vector< vector< unsigned int > > &matrix, map< unsigned int, string > &majorNames, map< unsigned int, string > &minorNames)
virtual ~Apriori ()
Destructor.
virtual bool checkIfClosed (Itemset &itemset) const
virtual void computeItemsetFromRows (vector< unsigned int > &rowIndices, Itemset &itemset)
Compute an itemset from a set of rows.
virtual void computeItemsetFromRows (Itemset &itemset)
Compute the columns in the itemset given the rows already stored in the itemset.
virtual void computeItemsets (const set< unsigned int > *rowsToAvoidInSeed=NULL, const set< unsigned int > *rowsToAvoidCompletely=NULL)
virtual void computeItemsetsLowRAM (const set< unsigned int > *rowsToAvoidInSeed=NULL, const set< unsigned int > *rowsToAvoidCompletely=NULL)
This method is identical to Apriori::computeItemsets() except that it uses less memory by not storing column names with each computed itemset.
virtual AprioriLatticeEdgeType computeLatticeEdgeType (const Itemset &itm1, const Itemset &itm2) const
virtual void computeLattice ()
virtual void computeRandomDistribution (ostream &ostrm, unsigned int numTries, map< ItemsetRandomRowDistKeyType, MyHistogram > &rowDists, map< unsigned int, MyHistogram > &columnDists, MyHistogram &sizeDist)
bool containsColumn (Itemset &itemset, unsigned int index) const
bool containsRow (Itemset &itemset, unsigned int index) const
virtual void getItemsets (vector< Itemset > &itemsets, bool deleteItemsets=false)
virtual void getLattice (ItemsetLattice &lattice, bool deleteLattice=false)
virtual void getClosedLattice (ItemsetLattice &closedLattice, bool deleteLattice=false)
Returns the transitive closure of the lattice connecting the closed itemsets in the data.
virtual unsigned int getNumItemsets () const
Return the number of computed itemsets.
virtual unsigned int getNumLatticeEdges () const
Return the number of edges in the lattice.
virtual void printItemsets (ostream &ostr) const
Print itemsets to the output stream in "itemset" format.
virtual void printItemsets (string outputFile) const
virtual void printItemsetsGraph (ostream &ostr) const
Print the bipartite graph induced by each itemset to the output stream.
virtual void printItemsetsGraph (string outputFile) const
virtual void printItemsetStatistics (ostream &ostr) const
For each k, print the number of itemsets with k rows and the number of itemsets with k columns.
void readFile (string filename)
Read a binary matrix from a file into this object.
virtual void setItemsets (const vector< Itemset > &isets)
Store itemsets (computed by another method) internally.
virtual unsigned int setMinimums (unsigned int rows, unsigned int columns)
Protected Member Functions
virtual void _computeAllRows (vector< unsigned int > &allRows) const
virtual unsigned int _computeRandomRows (const vector< unsigned int > &allRows, vector< unsigned int > &randomRows) const
virtual bool _areRowsInSameDataset (unsigned int i, unsigned int j) const
Protected Attributes
unsigned int minrows
unsigned int mincols
bool transposed
vector< vector< unsigned int > > data
map< unsigned int, string > columnNames
map< unsigned int, string > rowNames
unsigned int _numDatasets
vector< unsigned int > _rowIndexToDatasetIndex
vector< Itemset > itemsets
bool _itemsetsComputed
ItemsetLattice _closedLattice
ItemsetLattice _reducedLattice
bool _latticeComputed
Detailed Description
A class that implements the Apriori algorithm for computing closed itemsets in a binary matrix. It also computes the lattice connecting the closed itemsets.
Apriori::Apriori (string filename)
Constructor that reads a binary matrix from a file and creates an instance of Apriori from it. The file must be a tab-delimited matrix whose first row contains the column names. Each subsequent row must begin with the row name. The first token of the first row must exist, but can be any value.
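The file format above can be illustrated with a minimal parsing sketch. It is not Biorithm's readFile(): ParsedMatrix, splitTabs and parseBinaryMatrix are hypothetical names, and the sketch assumes a dense 0/1 layout in which a cell equal to "1" is read as 1 and any other token as 0.

```cpp
#include <cassert>
#include <istream>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for a parsed matrix (not a Biorithm type).
struct ParsedMatrix {
    std::vector<std::vector<unsigned int>> data;      // assumed dense 0/1 rows
    std::map<unsigned int, std::string> rowNames;     // row index -> row name
    std::map<unsigned int, std::string> columnNames;  // column index -> column name
};

// Split one line into tokens on tab characters.
std::vector<std::string> splitTabs(const std::string &line) {
    std::vector<std::string> tokens;
    std::string token;
    std::istringstream ss(line);
    while (std::getline(ss, token, '\t')) tokens.push_back(token);
    return tokens;
}

// Header row: placeholder token followed by column names.
// Every other row: row name followed by one value per column.
ParsedMatrix parseBinaryMatrix(std::istream &in) {
    ParsedMatrix m;
    std::string line;
    if (!std::getline(in, line)) return m;
    std::vector<std::string> header = splitTabs(line);
    // header[0] is the placeholder token; its value is ignored.
    for (std::size_t c = 1; c < header.size(); ++c)
        m.columnNames[static_cast<unsigned int>(c - 1)] = header[c];
    while (std::getline(in, line)) {
        if (line.empty()) continue;
        std::vector<std::string> tokens = splitTabs(line);
        unsigned int r = static_cast<unsigned int>(m.data.size());
        m.rowNames[r] = tokens[0];
        std::vector<unsigned int> row;
        for (std::size_t c = 1; c < tokens.size(); ++c)
            row.push_back(tokens[c] == "1" ? 1u : 0u);  // assumption: "1" means 1
        m.data.push_back(row);
    }
    return m;
}
```

Keeping the names in maps keyed by index mirrors the rowNames and columnNames attributes listed above, so a parsed matrix maps naturally onto the second constructor's arguments; whether majorNames and minorNames correspond to rows and columns is not stated here and is left open.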
bool Apriori::checkIfClosed (Itemset &itemset) const [virtual]
Return true if and only if the itemset is closed.
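A closedness test can be sketched from the row and column conditions that containsRow() and containsColumn() describe, assuming "closed" means that no row and no column outside the itemset could be added to it. SimpleItemset and isClosed are hypothetical stand-ins rather than Biorithm's Itemset API, and the matrix is assumed to be stored as dense 0/1 rows.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Assumption: the matrix is stored as dense rows of 0/1 values.
using Matrix = std::vector<std::vector<unsigned int>>;

// Hypothetical stand-in for Itemset: a set of row indices and a set of column indices.
struct SimpleItemset {
    std::set<unsigned int> rows;
    std::set<unsigned int> cols;
};

// Closed: no outside row has a 1 in all of the itemset's columns, and
// no outside column has a 1 in all of the itemset's rows.
bool isClosed(const Matrix &m, const SimpleItemset &s) {
    for (unsigned int r = 0; r < m.size(); ++r) {
        if (s.rows.count(r)) continue;
        bool fits = true;
        for (unsigned int c : s.cols) fits = fits && m[r][c] == 1;
        if (fits) return false;  // row r could be added, so not closed
    }
    if (m.empty()) return true;
    for (unsigned int c = 0; c < m[0].size(); ++c) {
        if (s.cols.count(c)) continue;
        bool fits = true;
        for (unsigned int r : s.rows) fits = fits && m[r][c] == 1;
        if (fits) return false;  // column c could be added, so not closed
    }
    return true;
}
```

For the matrix {{1,1,0},{1,1,1},{0,1,1}}, rows {0,1} with columns {0,1} form a closed itemset, while row {0} with columns {0,1} does not, because row 1 also has 1s in both columns.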
void Apriori::computeItemsets (const set< unsigned int > *rowsToAvoidInSeed = NULL, const set< unsigned int > *rowsToAvoidCompletely = NULL) [virtual]
Compute closed itemsets in the data using the Apriori algorithm.
[in] rowsToAvoidInSeed: a pointer to a set of rows to avoid when computing single-row itemsets. The method starts by computing itemsets with single rows. If a row is a member of this set, the method will not use that row to compute a single-row itemset. The row may still be used later in computing itemsets with more than one row.
[in] rowsToAvoidCompletely: a pointer to a set of rows to avoid when computing any itemset, including single-row itemsets.
Reimplemented in AprioriWithComplement.
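The seed-then-grow behaviour described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the library's implementation: it generates closed column sets by repeatedly intersecting newly found sets with the single-row seeds (closure under intersection, run level by level in the spirit of Apriori), prunes sets with fewer than mincols columns, and then recovers each itemset's rows. Matrix, IndexSet, SketchItemset, rowsCovering and closedItemsets are hypothetical names, and the two avoidance sets are passed by value rather than as pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <set>
#include <vector>

// Assumption: the matrix is stored as dense rows of 0/1 values.
using Matrix = std::vector<std::vector<unsigned int>>;
using IndexSet = std::set<unsigned int>;

// Hypothetical result type: a closed itemset as row and column index sets.
struct SketchItemset {
    IndexSet rows;
    IndexSet cols;
};

// Rows not in `avoid` that have a 1 in every column of `cols`.
IndexSet rowsCovering(const Matrix &m, const IndexSet &cols, const IndexSet &avoid) {
    IndexSet rows;
    for (unsigned int r = 0; r < m.size(); ++r) {
        if (avoid.count(r)) continue;
        bool all = true;
        for (unsigned int c : cols) all = all && m[r][c] == 1;
        if (all) rows.insert(r);
    }
    return rows;
}

std::vector<SketchItemset> closedItemsets(const Matrix &m, unsigned int minrows,
                                          unsigned int mincols,
                                          const IndexSet &avoidInSeed = {},
                                          const IndexSet &avoidCompletely = {}) {
    // Level 1: the column set of every row that may act as a seed.
    std::vector<IndexSet> seeds;
    for (unsigned int r = 0; r < m.size(); ++r) {
        if (avoidInSeed.count(r) || avoidCompletely.count(r)) continue;
        IndexSet cols;
        for (unsigned int c = 0; c < m[r].size(); ++c)
            if (m[r][c] == 1) cols.insert(c);
        if (cols.size() >= mincols) seeds.push_back(cols);
    }
    // Later levels: intersect each newly found column set with every seed.
    // Intersections only shrink, so a set below mincols can be pruned for good.
    std::set<IndexSet> found(seeds.begin(), seeds.end());
    std::set<IndexSet> frontier = found;
    while (!frontier.empty()) {
        std::set<IndexSet> next;
        for (const IndexSet &a : frontier) {
            for (const IndexSet &s : seeds) {
                IndexSet meet;
                std::set_intersection(a.begin(), a.end(), s.begin(), s.end(),
                                      std::inserter(meet, meet.begin()));
                if (meet.size() >= mincols && found.insert(meet).second)
                    next.insert(meet);
            }
        }
        frontier = next;
    }
    // Recover the rows of each closed column set. Rows avoided only in the
    // seed step can reappear here; rows avoided completely never do.
    std::vector<SketchItemset> result;
    for (const IndexSet &cols : found) {
        IndexSet rows = rowsCovering(m, cols, avoidCompletely);
        if (rows.size() >= minrows) result.push_back({rows, cols});
    }
    return result;
}
```

With the 3 × 3 matrix {{1,1,0},{1,1,1},{0,1,1}} and minimums of one row and one column, this sketch finds four closed itemsets; raising the row minimum to two leaves three.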
void Apriori::computeItemsetsLowRAM (const set< unsigned int > *rowsToAvoidInSeed = NULL, const set< unsigned int > *rowsToAvoidCompletely = NULL) [virtual]
This method is identical to Apriori::computeItemsets() except that it uses less memory by not storing column names with each computed itemset.
void Apriori::computeLattice () [virtual]
Compute the lattice induced by subset relationships between the itemsets.
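The subset relation behind the lattice can be sketched by comparing itemsets through their row sets (for closed itemsets, a smaller row set goes with a larger column set). In this hedged sketch, closedEdges lists every strict-containment pair, which is already transitively closed, and reducedEdges keeps only cover edges with no itemset strictly in between. The function names and the (index, index) edge representation are hypothetical, and the correspondence to getClosedLattice() and getLattice() is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <utility>
#include <vector>

using RowSet = std::set<unsigned int>;
// Hypothetical edge: (index of smaller itemset, index of larger itemset).
using Edge = std::pair<std::size_t, std::size_t>;

// True if a is a proper subset of b.
bool properSubset(const RowSet &a, const RowSet &b) {
    return a.size() < b.size() && std::includes(b.begin(), b.end(), a.begin(), a.end());
}

// Every pair (i, j) with rows[i] strictly inside rows[j]; transitively closed.
std::vector<Edge> closedEdges(const std::vector<RowSet> &rows) {
    std::vector<Edge> edges;
    for (std::size_t i = 0; i < rows.size(); ++i)
        for (std::size_t j = 0; j < rows.size(); ++j)
            if (properSubset(rows[i], rows[j])) edges.emplace_back(i, j);
    return edges;
}

// Cover edges only: drop (i, j) when some k lies strictly between i and j.
std::vector<Edge> reducedEdges(const std::vector<RowSet> &rows) {
    std::vector<Edge> kept;
    for (const Edge &e : closedEdges(rows)) {
        bool between = false;
        for (std::size_t k = 0; k < rows.size() && !between; ++k)
            between = properSubset(rows[e.first], rows[k]) &&
                      properSubset(rows[k], rows[e.second]);
        if (!between) kept.push_back(e);
    }
    return kept;
}
```

The quadratic and cubic loops keep the sketch short; they are fine for a handful of itemsets but would need indexing for large outputs.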
AprioriLatticeEdgeType Apriori::computeLatticeEdgeType (const Itemset &itm1, const Itemset &itm2) const [virtual]
Compute the type of the edge between two itemsets.
Reimplemented in AprioriWithComplement.
void Apriori::computeRandomDistribution (ostream &ostrm, unsigned int numTries, map< ItemsetRandomRowDistKeyType, MyHistogram > &rowDists, map< unsigned int, MyHistogram > &columnDists, MyHistogram &sizeDist) [virtual]
Compute a distribution of itemset sizes by picking subsets of rows uniformly at random.
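The sampling idea can be sketched as follows, assuming the quantity of interest is how many columns are shared by k rows chosen uniformly at random. The real method also fills row distributions keyed by ItemsetRandomRowDistKeyType and uses MyHistogram; this sketch substitutes a plain std::map histogram, and commonColumns, randomColumnCountHistogram and the seed parameter are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <random>
#include <vector>

// Assumption: the matrix is stored as dense rows of 0/1 values.
using Matrix = std::vector<std::vector<unsigned int>>;

// Number of columns with a 1 in every one of the given rows.
std::size_t commonColumns(const Matrix &m, const std::vector<unsigned int> &rows) {
    if (m.empty()) return 0;
    std::size_t count = 0;
    for (std::size_t c = 0; c < m[0].size(); ++c) {
        bool all = true;
        for (unsigned int r : rows) all = all && m[r][c] == 1;
        if (all) ++count;
    }
    return count;
}

// Histogram (shared-column count -> frequency) over numTries random k-row subsets.
std::map<std::size_t, unsigned int> randomColumnCountHistogram(
        const Matrix &m, unsigned int k, unsigned int numTries, unsigned int seed) {
    std::vector<unsigned int> allRows(m.size());
    std::iota(allRows.begin(), allRows.end(), 0u);
    std::mt19937 gen(seed);
    std::map<std::size_t, unsigned int> hist;
    for (unsigned int t = 0; t < numTries; ++t) {
        std::shuffle(allRows.begin(), allRows.end(), gen);
        // The first k rows of a uniformly random permutation form a
        // uniformly random k-subset.
        std::size_t take = std::min<std::size_t>(k, allRows.size());
        std::vector<unsigned int> chosen(allRows.begin(), allRows.begin() + take);
        ++hist[commonColumns(m, chosen)];
    }
    return hist;
}
```

A fixed seed makes runs reproducible, which is convenient when comparing such a random baseline against the itemsets found in the real data.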
bool Apriori::containsColumn (Itemset &itemset, unsigned int index) const
Return true if and only if the column index should be part of itemset.
The method checks if the column index has a 1 in all the rows currently in itemset.
bool Apriori::containsRow (Itemset &itemset, unsigned int index) const
Return true if and only if the row index should be part of itemset.
The method checks if the row index has a 1 in all the columns currently in itemset.
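The two checks compose naturally into computing an itemset from a set of rows: first add every column that fits all current rows, then every row that fits all of those columns. The sketch below illustrates that composition under the assumption of a dense 0/1 matrix; SimpleItemset, columnFits, rowFits and itemsetFromRows are hypothetical names, and whether computeItemsetFromRows() performs both steps exactly this way is an assumption.

```cpp
#include <cassert>
#include <set>
#include <vector>

// Assumption: the matrix is stored as dense rows of 0/1 values.
using Matrix = std::vector<std::vector<unsigned int>>;

// Hypothetical stand-in for Itemset.
struct SimpleItemset {
    std::set<unsigned int> rows;
    std::set<unsigned int> cols;
};

// Mirrors containsColumn: column c has a 1 in all rows currently in s.
bool columnFits(const Matrix &m, const SimpleItemset &s, unsigned int c) {
    for (unsigned int r : s.rows)
        if (m[r][c] != 1) return false;
    return true;
}

// Mirrors containsRow: row r has a 1 in all columns currently in s.
bool rowFits(const Matrix &m, const SimpleItemset &s, unsigned int r) {
    for (unsigned int c : s.cols)
        if (m[r][c] != 1) return false;
    return true;
}

// Grow seed rows into an itemset: add fitting columns, then fitting rows.
SimpleItemset itemsetFromRows(const Matrix &m, const std::set<unsigned int> &seedRows) {
    SimpleItemset s;
    s.rows = seedRows;
    if (m.empty()) return s;
    for (unsigned int c = 0; c < m[0].size(); ++c)
        if (columnFits(m, s, c)) s.cols.insert(c);
    for (unsigned int r = 0; r < m.size(); ++r)
        if (rowFits(m, s, r)) s.rows.insert(r);
    return s;
}
```

Starting from row 0 of {{1,1,0},{1,1,1},{0,1,1}}, the columns {0,1} fit, and then row 1 also fits those columns, giving rows {0,1} and columns {0,1}.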
void Apriori::getClosedLattice (ItemsetLattice &closedLattice, bool deleteLattice = false) [virtual]
Returns the transitive closure of the lattice connecting the closed itemsets in the data.
[in] deleteLattice: a boolean; if true, delete the closed lattice stored internally.
void Apriori::getItemsets (vector< Itemset > &itemsets, bool deleteItemsets = false) [virtual]
Returns a vector of closed itemsets in the data.
[out] itemsets: a vector of Itemset in which to return the closed itemsets.
[in] deleteItemsets: a boolean; if true, delete the itemsets stored internally.
void Apriori::getLattice (ItemsetLattice &lattice, bool deleteLattice = false) [virtual]
Returns the lattice connecting the closed itemsets in the data.
[out] lattice: an instance of ItemsetLattice in which to return the lattice.
[in] deleteLattice: a boolean; if true, delete the lattice stored internally.
void Apriori::printItemsets (string outputFile) const [virtual]
Print itemsets to the output file in "itemset" format.
Each itemset appears on one line containing the name of the itemset, the names of the rows in the itemset, and the names of the columns in the itemset, all separated by tabs. The name of the itemset is the string "itemset_<index>_<number of rows>_<number of columns>".
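The "itemset" format can be sketched as a small writer. This is an illustration, not Biorithm's printItemsets(): SimpleItemset and printItemsetsSketch are hypothetical, row and column names are looked up in index-to-name maps like the rowNames and columnNames attributes, and counting <index> from 0 in vector order is an assumption.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Itemset.
struct SimpleItemset {
    std::set<unsigned int> rows;
    std::set<unsigned int> cols;
};

// One line per itemset: name, row names, column names, all tab-separated.
// Assumption: <index> counts from 0 in vector order.
void printItemsetsSketch(std::ostream &out, const std::vector<SimpleItemset> &itemsets,
                         const std::map<unsigned int, std::string> &rowNames,
                         const std::map<unsigned int, std::string> &columnNames) {
    for (std::size_t i = 0; i < itemsets.size(); ++i) {
        const SimpleItemset &s = itemsets[i];
        out << "itemset_" << i << '_' << s.rows.size() << '_' << s.cols.size();
        for (unsigned int r : s.rows) out << '\t' << rowNames.at(r);
        for (unsigned int c : s.cols) out << '\t' << columnNames.at(c);
        out << '\n';
    }
}
```

An itemset with rows r0 and r1 and the single column b is written as `itemset_0_2_1`, followed by the tab-separated fields `r0`, `r1` and `b`.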
void Apriori::printItemsetsGraph (string outputFile) const [virtual]
Print the bipartite graph induced by each itemset to the output file.
Each line of the file contains the itemset name, a row name, and a column name, all separated by tabs.
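Each itemset induces a complete bipartite graph between its rows and its columns, so the graph format amounts to one line per (row, column) pair. The sketch below assumes the itemset name follows the same itemset_<index>_<rows>_<cols> pattern as printItemsets() with the index counted from 0; SimpleItemset and printItemsetsGraphSketch are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <ostream>
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for Itemset.
struct SimpleItemset {
    std::set<unsigned int> rows;
    std::set<unsigned int> cols;
};

// One line per edge (row, column) of each itemset's bipartite graph.
// Assumption: names follow itemset_<index>_<rows>_<cols>, index from 0.
void printItemsetsGraphSketch(std::ostream &out, const std::vector<SimpleItemset> &itemsets,
                              const std::map<unsigned int, std::string> &rowNames,
                              const std::map<unsigned int, std::string> &columnNames) {
    for (std::size_t i = 0; i < itemsets.size(); ++i) {
        const SimpleItemset &s = itemsets[i];
        const std::string name = "itemset_" + std::to_string(i) + "_" +
                                 std::to_string(s.rows.size()) + "_" +
                                 std::to_string(s.cols.size());
        for (unsigned int r : s.rows)
            for (unsigned int c : s.cols)
                out << name << '\t' << rowNames.at(r) << '\t' << columnNames.at(c) << '\n';
    }
}
```

An itemset with R rows and C columns therefore contributes R × C lines, which is worth keeping in mind for large itemsets.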
void Apriori::setItemsets (const vector< Itemset > &isets) [virtual]
Store itemsets (computed by another method) internally.
[in] isets: a vector of Itemsets to be stored.
Use this method when you have a set of itemsets that you would like to process further with methods of the Apriori class, e.g., Apriori::computeLattice(). A typical use is to compute itemsets with this class, prune them with an external method, and then compute the lattice of the pruned itemsets.
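The external pruning step in that workflow could look like the sketch below. The criterion (keep itemsets covering at least minArea matrix cells) is purely a hypothetical example, as are SimpleItemset and pruneByArea; any filter that returns a subset of the itemsets fits the same slot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <set>
#include <vector>

// Hypothetical stand-in for Itemset.
struct SimpleItemset {
    std::set<unsigned int> rows;
    std::set<unsigned int> cols;
};

// Hypothetical pruning rule: keep itemsets whose rows x columns area is at least minArea.
std::vector<SimpleItemset> pruneByArea(const std::vector<SimpleItemset> &in,
                                       std::size_t minArea) {
    std::vector<SimpleItemset> out;
    std::copy_if(in.begin(), in.end(), std::back_inserter(out),
                 [minArea](const SimpleItemset &s) {
                     return s.rows.size() * s.cols.size() >= minArea;
                 });
    return out;
}
```

In the workflow above, the surviving itemsets would then be converted back to Itemset objects (the conversion is not shown and depends on the Itemset API), passed to Apriori::setItemsets(), and followed by Apriori::computeLattice().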