Biorithm 1.1
BiVoC

This manual is for BiVoC, a software package for automatic layout and visualisation of biclusters.

Index

  1. Introduction: What Does BiVoC Do?
  2. Installing BiVoC
  3. Running BiVoC
  4. Command Line Options: Details of the BiVoC Command Line

Introduction

Biclustering has emerged as a powerful algorithmic tool for analyzing measurements of gene expression, and a number of different methods have been developed for computing biclusters in gene expression data. Many of these algorithms can output a very large number of biclusters with varying degrees of overlap, which makes the results difficult to inspect without a visual layout.

BiVoC is an algorithm for laying out biclusters in a two-dimensional matrix whose rows (respectively, columns) are rows (respectively, columns) of the original dataset. BiVoC displays each bicluster as a contiguous submatrix in the layout. It allows the layout to have repeated rows and/or columns from the original matrix as required, but seeks a layout of the smallest size. BiVoC also includes a simple web-based search interface for the user to query the layout for genes and samples of interest.

Installing BiVoC

BiVoC is part of the Biorithm package but must be built separately. Before building BiVoC, you need the libGD graphics library and its development files installed on your computer. On a Debian- or Ubuntu-based machine, you can install these packages using a command such as

apt-get install libgd2-xpm libgd2-xpm-dev

Download the Biorithm package, then build BiVoC by following these steps:

  1. Unpack biorithm.tar.gz: tar -xvzf biorithm.tar.gz
  2. Move to the newly-unpacked BiVoC subdirectory and run the following commands:

    cd biorithm-1.1/bivoc
    ./configure
    make
    

    Now the executables should be ready to run.

Running BiVoC

There are two binaries that must be run in order to use BiVoC: src/layout and src/drawlayout.

To use layout, you need an input file containing the biclusters to be displayed. The format of this file is the same as that of the files output by the bicluster miner and the xMotif algorithm included in Biorithm. The file contains one bicluster per line, and each line consists of the following tab-delimited fields:

  1. the name of the bicluster, in the format <id>_<number of rows>_<number of columns>
  2. a colour (at the moment, this string is ignored)
  3. the list of columns in the bicluster
  4. the list of rows in the bicluster

Bicluster names should be unique. Because the fields in each line are tab-separated, row and column names may contain spaces.

For example, you can process the simple example file bivoc/examples/allaml/allaml-3.txt, which contains three biclusters, as follows:

src/layout -o examples/allaml/allaml-3.layout examples/allaml/allaml-3.txt
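The bicluster input format described above can be illustrated with a short Python sketch. It is not part of BiVoC, and the names Bicluster and parse_bicluster_line are hypothetical. The sketch also assumes that the column and row names follow the colour field as consecutive tab-separated fields, columns first, and that the counts embedded in the bicluster name mark where one list ends and the other begins. The manual does not spell out this detail, so check it against a real input file.

```python
from typing import List, NamedTuple


class Bicluster(NamedTuple):
    name: str
    colour: str          # currently ignored by BiVoC, according to the manual
    columns: List[str]
    rows: List[str]


def parse_bicluster_line(line: str) -> Bicluster:
    """Parse one tab-delimited bicluster line.

    ASSUMPTION: after the name and colour fields come the column names and
    then the row names, one per tab-separated field, and the name has the
    form <id>_<number of rows>_<number of columns>.
    """
    fields = line.rstrip("\n").split("\t")
    name, colour, members = fields[0], fields[1], fields[2:]
    # rsplit keeps any underscores that belong to <id> itself.
    _id, n_rows_text, n_cols_text = name.rsplit("_", 2)
    n_rows, n_cols = int(n_rows_text), int(n_cols_text)
    if len(members) != n_rows + n_cols:
        raise ValueError(
            f"{name}: expected {n_rows + n_cols} names, found {len(members)}"
        )
    return Bicluster(name, colour, members[:n_cols], members[n_cols:])


# Made-up example line: 2 rows, 3 columns; names may contain spaces.
example = "b1_2_3\tred\tsample A\tsample B\tsample C\tgene 1\tgene 2\n"
bicluster = parse_bicluster_line(example)
```

Splitting only on tabs is what allows row and column names to contain spaces, as noted above.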

The layout file allaml-3.layout created by src/layout stores the row and column orderings of the close-to-optimal layout. Processing this file with the src/drawlayout program produces an image according to your specifications. In addition to the layout file, the drawlayout executable requires the original microarray data file. The expected format of the microarray data file is a tab-delimited two-dimensional array with columns representing samples and rows representing genes. Using the microarray data file examples/allaml/ALL-AML-all-no-name.txt as an example, you can create a PostScript image with the default parameters by running

src/drawlayout -o allaml-3.ps -a examples/allaml/ALL-AML-all-no-name.txt examples/allaml/allaml-3.layout

If you would like to display the classes associated with the samples in question, you can add the -classes command-line option, as follows:

src/drawlayout -o allaml-3.ps -a examples/allaml/ALL-AML-all-no-name.txt examples/allaml/allaml-3.layout -classes examples/allaml/classConvert.txt

The resulting image should look similar to the following:

[Image: allaml-3.png]
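The two-step workflow in this section can also be scripted. The following Python sketch only assembles the same command lines shown above and hands them to subprocess; the function names and the cwd parameter are hypothetical, and it assumes the commands are run from the bivoc directory so that the relative paths src/layout and src/drawlayout resolve.

```python
import subprocess
from typing import List, Optional


def layout_command(biclusters: str, layout_out: str) -> List[str]:
    """Command line for the first step: compute the layout."""
    return ["src/layout", "-o", layout_out, biclusters]


def drawlayout_command(
    layout_file: str,
    microarray: str,
    image_out: str,
    classes: Optional[str] = None,
    image_format: Optional[str] = None,
) -> List[str]:
    """Command line for the second step: draw the layout as an image."""
    cmd = ["src/drawlayout", "-o", image_out, "-a", microarray, layout_file]
    if classes is not None:
        cmd += ["-classes", classes]      # show sample classes
    if image_format is not None:
        cmd += ["-t", image_format]       # ps, png or gif
    return cmd


def run_pipeline(biclusters: str, microarray: str, layout_out: str,
                 image_out: str, cwd: str = ".") -> None:
    """Run layout, then drawlayout, stopping at the first failure."""
    commands = [
        layout_command(biclusters, layout_out),
        drawlayout_command(layout_out, microarray, image_out),
    ]
    for cmd in commands:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(cmd, check=True, cwd=cwd)
```

For the example data, run_pipeline("examples/allaml/allaml-3.txt", "examples/allaml/ALL-AML-all-no-name.txt", "examples/allaml/allaml-3.layout", "allaml-3.ps") would issue the same two commands as the manual, without the -classes option.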

Command Line Options

To see all parameters for either executable, run it with the -h command-line flag. The drawlayout binary is the only one that has optional command-line parameters:

Option          Parameter Type   Description
-o              filename         Specify the output image file.
-a              string           Name of the microarray data file.
-startSamples   integer          Number of columns before the first data column (default=1).
-nameColumn     integer          Number of columns before the column specifying the row names (default=0).
-sizeModifier   real             Multiply all dimensions by this value. A value of 2 makes the image twice as tall and twice as wide, effectively quadrupling its area.
-width          real             Multiply the bicluster line width by this value to make the boxes easier to see.
-cr             string           Name of a tab-delimited file. Replaces all column names given in the first column of the file with the column names in the second column of the file.
-rr             string           Name of a tab-delimited file. Replaces all row names given in the first column of the file with the row names in the second column of the file.
-binary         none             Tells drawlayout that the microarray input file is binary, not real-valued, and draws the heat diagram colors accordingly.
-colors         none             Draws the bicluster boxes in the color specified in the microarray file (only red, green, blue, and black are supported).
-showColumns    none             Show all columns in the expression file regardless of their presence in a bicluster.
-showRows       none             Show all rows in the expression file regardless of their presence in a bicluster.
-classes        string           Name of an input file. Displays the classes of the samples as specified in the file.
-t              string           Specify the output image format. Options are ps, png, and gif; ps is the default.
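The -nameColumn and -startSamples options both count the columns that come before a given column, which makes them equivalent to zero-based column indices. The Python sketch below illustrates that reading on a single data row. It is an interpretation of the option descriptions, not drawlayout's actual parsing code; the function name split_data_row is hypothetical, and header lines are not handled.

```python
from typing import List, Tuple


def split_data_row(
    line: str, name_column: int = 0, start_samples: int = 1
) -> Tuple[str, List[str]]:
    """Split one tab-delimited microarray row into (row name, sample values).

    name_column   : number of columns before the row-name column
    start_samples : number of columns before the first data column
    The defaults mirror the documented defaults of -nameColumn and
    -startSamples. Values are returned as strings, unconverted.
    """
    fields = line.rstrip("\n").split("\t")
    return fields[name_column], fields[start_samples:]


# Default layout: the name in the first column, data from the second onwards.
name, values = split_data_row("gene 1\t0.5\t-1.2\t3.0")

# Made-up layout with an extra description column before the data,
# corresponding to -nameColumn 0 -startSamples 2.
name2, values2 = split_data_row(
    "gene 2\tsome description\t1\t0", name_column=0, start_samples=2
)
```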