Meeting Notes for August 12, 2003

Attendance: Dr. Lenwood Heath, Dr. Ruth Grene, Cecilia Vasquez-Robinet, Vibha Singhal, Allan Sioson, Tom Panning, John Archie

The purpose of this meeting was to discuss the different ways of identifying promoters mentioned in the paper by Rombauts et al., which was discussed at the last meeting. The three methods of promoter detection were:

  1. Markov chain statistics,
  2. a lexicon, and
  3. random reassembly of sequences.

These approaches are mentioned in the left-hand column on page 1170.

The random reassembly of sequences refers to a tool called shufflet, which takes a sequence and produces a random permutation of it that preserves the frequency of every k-mer, (k - 1)-mer, ..., 1-mer. The shuffled sequence can serve as a control for alignments and other analyses, showing what "normal" results look like for sequences with similar statistical properties.
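The property described above can be checked with a short Python sketch. This is not shufflet's algorithm: the helper names (kmer_counts, shuffle_1mers, preserves_kmers) are hypothetical, and the shuffle shown only handles the simplest case, k = 1, where any permutation of the letters preserves the counts. Preserving counts for larger k needs a more careful construction, which is what a tool like shufflet is for.

```python
import random
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def shuffle_1mers(seq, rng):
    """Randomly permute the letters of seq (preserves 1-mer counts only)."""
    letters = list(seq)
    rng.shuffle(letters)
    return "".join(letters)


def preserves_kmers(original, candidate, k):
    """True if candidate has the same j-mer counts as original for j = 1..k."""
    return all(
        kmer_counts(original, j) == kmer_counts(candidate, j)
        for j in range(1, k + 1)
    )


seq = "ACGTACGGTCA"  # made-up example sequence
control = shuffle_1mers(seq, random.Random(0))
print(preserves_kmers(seq, control, 1))  # letter counts always match
print(preserves_kmers(seq, control, 2))  # dinucleotide counts usually do not
```

A check like preserves_kmers is also a convenient way to confirm that the output of any shuffling tool has the statistical properties expected of a control sequence.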

The lexicon approach builds a dictionary of sequences to which an enzyme is likely to bind.
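As a rough illustration of the lexicon idea, the Python sketch below scans a sequence for exact matches to a small dictionary of motifs. The function name find_lexicon_hits, the motif labels, and the example sequence are assumptions made for illustration; they are not taken from the Rombauts paper, and a real lexicon would likely allow for degenerate or inexact matches.

```python
def find_lexicon_hits(sequence, lexicon):
    """Return sorted (position, motif_name) pairs for every exact match."""
    hits = []
    for name, motif in lexicon.items():
        start = sequence.find(motif)
        while start != -1:
            hits.append((start, name))
            # Resume one base later so overlapping matches are found too.
            start = sequence.find(motif, start + 1)
    return sorted(hits)


# Illustrative lexicon; the labels and entries are assumptions.
lexicon = {"TATA-like": "TATAAA", "CAAT-like": "CCAAT"}
hits = find_lexicon_hits("GGCCAATCTTATAAAGG", lexicon)
print(hits)
```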

The remaining method, listed above as Markov chain statistics, constructs a position-weight matrix for each sequence element to which regulatory enzymes bind. (The position-weight matrix is generated from sequence data given to the program.) A neural network then uses these matrices to select the locations where regulatory enzymes are likely to bind.
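The position-weight matrix step can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the example binding sites are made up, the background frequency is assumed uniform (0.25 per base), a pseudocount of 1 is assumed, and the neural network stage is replaced by simply picking the best-scoring window. The function names are hypothetical.

```python
import math


def build_pwm(sites, alphabet="ACGT", pseudocount=1.0, background=0.25):
    """Build a log-odds position-weight matrix from equal-length sites."""
    total = len(sites) + pseudocount * len(alphabet)
    pwm = []
    for pos in range(len(sites[0])):
        column = [site[pos] for site in sites]
        pwm.append({
            base: math.log2(((column.count(base) + pseudocount) / total) / background)
            for base in alphabet
        })
    return pwm


def score_window(pwm, window):
    """Sum the per-position log-odds scores for one window."""
    return sum(col[base] for col, base in zip(pwm, window))


def best_site(pwm, sequence):
    """Return (position, score) of the highest-scoring window."""
    width = len(pwm)
    positions = range(len(sequence) - width + 1)
    best = max(positions, key=lambda i: score_window(pwm, sequence[i:i + width]))
    return best, score_window(pwm, sequence[best:best + width])


# Made-up aligned binding sites, for illustration only.
sites = ["TATAAA", "TATAAT", "TATATA", "TACAAA"]
pwm = build_pwm(sites)
position, score = best_site(pwm, "GGCGTATAAAGCC")
print(position, round(score, 2))
```

In the method described in the notes, the scores from such matrices would be passed to a neural network rather than to a simple maximum, so this sketch covers only the matrix construction and scoring.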