Biorithm  1.1
Truthiness Miner

Index

  1. Introduction : What does Truthiness Miner do?
  2. Installing Truthiness Miner : Installing Truthiness Miner
  3. Invoking Truthiness Miner : Command-line options supported by Truthiness Miner
  4. Interpreting Truthiness Miner Output : Information and analysis of Truthiness Miner outputs

Introduction

Real-world data can often be mapped to a binary matrix. For example, the position (hypothetical data) of some senators on some bills is represented by the binary matrix in Fig. 1(a). Here a 1 means the senator supported/voted for the bill (e.g. senator Clinton supported the bill on Tax Cuts), whereas a 0 indicates that the senator opposed the bill (e.g. senator Clinton opposed the bill on War). Now look at Fig 1(b). It is apparent from this figure that, the bills: War and Tax Cuts constitute indpendent dimensions along which senators distinguish themselves. Why is this apparent from Fig 1(b) but not from Fig 1(a)? Because in Fig 1(b), the matrix is partitioned (by dotted lines) in such a way that each row in a partition has same values for War and Tax Cuts (e.g. senators Clinton and Dwight took same positions on War and Tax Cuts), but rows from different partitions (e.g. Clinton VS Edwards) have different values for War and Tax Cuts. In this case, we say that the columns War and Tax Cuts forms an exact truth table because the submatrix formed by War and Tax Cuts contains all possible (here 22 = 4) distinct bit-patterns in its rows. On the other hand, we say that the sub-matrix formed by the columns War, Tax Cuts, and Environmental Reforms is an approximate truth-table, since not all possible (here 23 = 8) bit-patterns are present in its rows.

truth-table1.png

The goodness of a truth-table can be measured by two quantities: balance and support. Balance of a truth-table is the ratio of the minimum number of rows in a partition (of this truth table) over the total number of rows. For example, in Fig. 1(b), the balance of the truth-table formed by War and Tax Cuts is 2/9. The support of an n-column truth-table is defined as the ratio of the number of distinct bit-patterns (or partitions) in it over 2n. For example, in Fig. 1(b), the support of the truth-table formed by War, Tax Cuts, and Environmental Reforms is 5/8.

Truthiness miner is a program that can mine 'sufficiently good' truth-tables from a given binary matrix. In other words, given a minimum balance threshold and a minimum support threshold, it can mine truth tables having enough balance and support (i.e. having balance and support greater than or equal to the balance threshold and support threshold respectively).

Truthiness Miner has been designed and developed by Clifford Conley Owens III, T. M. Murali, and Naren Ramakrishnan. It uses a level-wise algorithm for mining truth-tables. For details of the method, confer the paper: Capturing Truthiness: Mining Truth Tables in Binary Datasets .

Installing Truthiness Miner

Download the Biorithm package and follow the installation instructions for Biorithm. The executable file will be available as truth-tables/truth

Invoking Truthiness Miner

To run Truthiness Miner you will need a single input file containing a binary matrix along with a header. You can generate a binary matrix file using the genmatrix script in the synth sub-directory of truth-tables directory. To generate a 100*10 binary matrix of sparsity 0.5 with this script and store it in a file called synth.mat:

Now to mine all the truth-tables with balance at least 0.1 and support at least 0.8:

Command Line Options

Flag Long Description
f matrix-file The file that contains the input binary matrix.
b min-balance The minumum balance threshold. i.e., if the input matrix has n rows, each truth table partition will need at least b * n rows to succeed.
s min-support The minumum support threshold. i.e., if a truth-table has y columns, number of balanced truth-table-partitions (i.e. those having at least b * n rows) must be at least s*(2^y) to succeed.
z buffer-size This program stores the itemsets in a buffer in each level; when the buffer becomes full, it checks each of the itemsets whether it is a truth-table or not. Using this option you can tell the program what should be the size of this buffer, i.e. how many itemsets can be stored in the buffer at once.
u usage Display brief message about how to use this tool.
q quite-level Suppress output. When the value of this parameter is 0: the output will be the same as running without this flag, 1: the program will not show miner specific data, 2: it will only show errors, 3: it will not show anything, including errors.
t transpose If this option is specified, the given binary-matrix is transposed, stored in a new file called <original file-name>_transpose.txt, and then truth tables are mined from this transposed matrix. If no value for this option is given, then the top-left corner of the new file contains ROW, otherwise it contains whatever string is provided.
i status-interval Number of seconds to wait before a status update

Interpreting Truthiness Miner Output

Executing Truthiness Miner results in several output files:

<input file name>_level.dat

This file contains the number of truth-tables found in each level in separate lines. Specifically, the first line contains the number of valid (having sufficient balance and support) truth-tables having no (or zero) columns, the second line contains the number of valid truth-tables having one column, the third line contains the number of valid truth-tables having two columns, and so on.

<input file name>_time.dat

Elapsed time for completing the execution of truthiness miner in seconds.

<input file name>_pborders_<an integer>.dat

There may be several files with names <input_file_name>_pborders_1.dat, <input_file_name>_pborders_2.dat, and so on. In general <input_file_name>_pborders_n.dat will contain all the valid truth tables with (n-1) columns. Each line of this file will contain the column-headers of one truth-table in the following format:

<column number (starts from 0)>:<column header>="">, <column number>="">:<column header>="">, ...

This pattern: "<column header>, <column number>" will appear (n-1) times in each line of the <input_file_name>_pborders_n.dat file.

 All Classes Functions Variables Typedefs Friends