Biorithm
1.1
|
Real-world data can often be mapped to a binary matrix. For example, the position (hypothetical data) of some senators on some bills is represented by the binary matrix in Fig. 1(a). Here a 1 means the senator supported/voted for the bill (e.g. senator Clinton supported the bill on Tax Cuts), whereas a 0 indicates that the senator opposed the bill (e.g. senator Clinton opposed the bill on War). Now look at Fig 1(b). It is apparent from this figure that, the bills: War and Tax Cuts constitute indpendent dimensions along which senators distinguish themselves. Why is this apparent from Fig 1(b) but not from Fig 1(a)? Because in Fig 1(b), the matrix is partitioned (by dotted lines) in such a way that each row in a partition has same values for War and Tax Cuts (e.g. senators Clinton and Dwight took same positions on War and Tax Cuts), but rows from different partitions (e.g. Clinton VS Edwards) have different values for War and Tax Cuts. In this case, we say that the columns War and Tax Cuts forms an exact truth table because the submatrix formed by War and Tax Cuts contains all possible (here 22 = 4) distinct bit-patterns in its rows. On the other hand, we say that the sub-matrix formed by the columns War, Tax Cuts, and Environmental Reforms is an approximate truth-table, since not all possible (here 23 = 8) bit-patterns are present in its rows.
The goodness of a truth-table can be measured by two quantities: balance and support. Balance of a truth-table is the ratio of the minimum number of rows in a partition (of this truth table) over the total number of rows. For example, in Fig. 1(b), the balance of the truth-table formed by War and Tax Cuts is 2/9. The support of an n-column truth-table is defined as the ratio of the number of distinct bit-patterns (or partitions) in it over 2n. For example, in Fig. 1(b), the support of the truth-table formed by War, Tax Cuts, and Environmental Reforms is 5/8.
Truthiness miner is a program that can mine 'sufficiently good' truth-tables from a given binary matrix. In other words, given a minimum balance threshold and a minimum support threshold, it can mine truth tables having enough balance and support (i.e. having balance and support greater than or equal to the balance threshold and support threshold respectively).
Truthiness Miner has been designed and developed by Clifford Conley Owens III, T. M. Murali, and Naren Ramakrishnan. It uses a level-wise algorithm for mining truth-tables. For details of the method, confer the paper: Capturing Truthiness: Mining Truth Tables in Binary Datasets .
Download the Biorithm package and follow the installation instructions for Biorithm. The executable file will be available as truth-tables/truth
To run Truthiness Miner you will need a single input file containing a binary matrix along with a header. You can generate a binary matrix file using the genmatrix script in the synth sub-directory of truth-tables directory. To generate a 100*10 binary matrix of sparsity 0.5 with this script and store it in a file called synth.mat:
./genmatrix 10 100 0.5 > synth.mat
Now to mine all the truth-tables with balance at least 0.1 and support at least 0.8:
Flag | Long | Description |
---|---|---|
f | matrix-file | The file that contains the input binary matrix. |
b | min-balance | The minumum balance threshold. i.e., if the input matrix has n rows, each truth table partition will need at least b * n rows to succeed. |
s | min-support | The minumum support threshold. i.e., if a truth-table has y columns, number of balanced truth-table-partitions (i.e. those having at least b * n rows) must be at least s*(2^y) to succeed. |
z | buffer-size | This program stores the itemsets in a buffer in each level; when the buffer becomes full, it checks each of the itemsets whether it is a truth-table or not. Using this option you can tell the program what should be the size of this buffer, i.e. how many itemsets can be stored in the buffer at once. |
u | usage | Display brief message about how to use this tool. |
q | quite-level | Suppress output. When the value of this parameter is 0: the output will be the same as running without this flag, 1: the program will not show miner specific data, 2: it will only show errors, 3: it will not show anything, including errors. |
t | transpose | If this option is specified, the given binary-matrix is transposed, stored in a new file called <original file-name>_transpose.txt, and then truth tables are mined from this transposed matrix. If no value for this option is given, then the top-left corner of the new file contains ROW, otherwise it contains whatever string is provided. |
i | status-interval | Number of seconds to wait before a status update |
Executing Truthiness Miner results in several output files:
This file contains the number of truth-tables found in each level in separate lines. Specifically, the first line contains the number of valid (having sufficient balance and support) truth-tables having no (or zero) columns, the second line contains the number of valid truth-tables having one column, the third line contains the number of valid truth-tables having two columns, and so on.
Elapsed time for completing the execution of truthiness miner in seconds.
There may be several files with names <input_file_name>_pborders_1.dat, <input_file_name>_pborders_2.dat, and so on. In general <input_file_name>_pborders_n.dat will contain all the valid truth tables with (n-1) columns. Each line of this file will contain the column-headers of one truth-table in the following format:
<column number (starts from 0)>:<column header>="">, <column number>="">:<column header>="">, ...
This pattern: "<column header>, <column number>" will appear (n-1) times in each line of the <input_file_name>_pborders_n.dat file.