Dr. Grene, Johnathan, and Ceci are back from Europe, and Ranjit is going on vacation for the summer.
Last week John A. finished two scripts.
- One creates a database table for gene ontology categories, so relevant data may be included in reports.
- Another draws images illustrating locations of known regulatory sequences.
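For reference, a table like the one in the first script might look roughly like the sketch below. It assumes SQLite; the table name, column names, and gene IDs are hypothetical and are not taken from John A.'s script.

```python
import sqlite3


def create_go_table(conn):
    # Hypothetical schema: one row per (gene, GO category) pair.
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS go_category (
            gene_id TEXT NOT NULL,
            go_id   TEXT NOT NULL,
            go_name TEXT,
            PRIMARY KEY (gene_id, go_id)
        )
        """
    )


def add_annotation(conn, gene_id, go_id, go_name):
    # Insert or overwrite a single gene-to-category annotation.
    conn.execute(
        "INSERT OR REPLACE INTO go_category (gene_id, go_id, go_name) VALUES (?, ?, ?)",
        (gene_id, go_id, go_name),
    )


def categories_for_gene(conn, gene_id):
    # Return (go_id, go_name) rows for one gene, e.g. for inclusion in a report.
    cursor = conn.execute(
        "SELECT go_id, go_name FROM go_category WHERE gene_id = ? ORDER BY go_id",
        (gene_id,),
    )
    return cursor.fetchall()
```

A report script could then call `categories_for_gene(conn, gene_id)` for each gene it lists.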
The images of regulatory sequences sometimes contain too much data to be reasonably interpreted by a human. Possible solutions are:
- The patterns for elements might be too generic; finding more specific patterns, if possible, could produce fewer results.
- Limit the results to hits within 1,000 bp upstream of the coding region.
- Find characteristics of a match and assign a score to each element based on those characteristics. Then show only elements with a score above some arbitrary value. Possible characteristics for determining a score are:
  - If an element has known flanking sequences, a higher score should be assigned.
  - If promoters of a certain type are found many times in one area, they are probably part of some repeat and should be given a lower score.
  - We can use a program to analyze upstream regions for sequence redundancy. If a region contains similar sequences repeated many times, we will assign a lower score to elements found in that region.
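The scoring idea in the list above could be sketched as follows. This is only an illustration: the `Hit` fields, the weights, the window size, the thresholds, and the k-mer based `repeat_fraction` (a crude stand-in for a real redundancy program) are all assumptions, not values anyone has agreed on.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Hit:
    element: str        # name of the regulatory element
    distance: int       # bp upstream of the coding region
    has_flanking: bool  # True if a known flanking sequence is also present


def repeat_fraction(seq, k=8):
    """Fraction of k-mers in seq that occur more than once (crude redundancy measure)."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        return 0.0
    counts = Counter(kmers)
    return sum(1 for kmer in kmers if counts[kmer] > 1) / len(kmers)


def score_hit(hit, all_hits, region_seq, window=200, crowd_limit=3, max_repeat=0.3):
    """Score one hit; all weights and limits here are arbitrary placeholders."""
    score = 1.0
    if hit.has_flanking:
        score += 1.0  # known flanking sequence raises the score
    # Count hits of the same element within `window` bp (includes the hit itself).
    neighbours = sum(
        1
        for other in all_hits
        if other.element == hit.element
        and abs(other.distance - hit.distance) <= window
    )
    if neighbours > crowd_limit:
        score -= 1.0  # many copies in one area: probably part of a repeat
    if repeat_fraction(region_seq) > max_repeat:
        score -= 1.0  # the whole region looks redundant
    return score


def filter_hits(hits, region_seq, threshold=0.5):
    """Keep only hits whose score is above the (arbitrary) threshold."""
    return [h for h in hits if score_hit(h, hits, region_seq) > threshold]
```

With these placeholder weights, an isolated hit in a non-repetitive region scores 1.0 and is shown, while a hit inside a cluster of the same element scores 0.0 and is hidden unless it also has a known flanking sequence.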
When many elements are found for a gene, this is usually because the upstream region is huge. Only displaying elements within 1 kb of the coding region should filter out most of the results. One possible cause for huge upstream regions is that the gene is near a centromere.
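A minimal sketch of the 1 kb filter, assuming the upstream region is stored 5' to 3' and ends immediately before the coding region, and that each hit is recorded as a (name, 0-based start) pair within that region. Both the layout and the function names are assumptions.

```python
def distance_to_coding_region(region_length, hit_start):
    """Number of bp from the start of a hit to the end of the upstream region."""
    return region_length - hit_start


def hits_near_gene(region_length, hits, limit_bp=1000):
    """Keep hits that start within limit_bp of the coding region.

    hits: list of (name, start) tuples, start being a 0-based offset
    from the 5' end of the upstream region.
    """
    return [
        (name, start)
        for name, start in hits
        if distance_to_coding_region(region_length, start) <= limit_bp
    ]
```

For a 5,000 bp upstream region, only hits starting at offset 4,000 or later would be displayed.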
Human TonE elements (a dehydration response element) are found roughly two to three times as often as probability indicates on chromosome. This is not terribly surprising, because TonE elements perform a biological function. In the future, John A. will count the TonE elements that occur within coding regions and those that occur between coding regions.
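As a rough check on the comparison with probability, the expected number of matches can be estimated from the pattern alone. The sketch below assumes equal base frequencies (25% each) and counts forward-strand matches only. The TonE pattern used in the usage note is an assumed IUPAC consensus and should be replaced by whatever pattern the scripts actually use; all function names are illustrative.

```python
import re

# Bases matched by each IUPAC nucleotide code.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def match_probability(pattern):
    """Chance that a random position matches, assuming equal base frequencies."""
    p = 1.0
    for code in pattern:
        p *= len(IUPAC[code]) / 4.0
    return p


def expected_count(pattern, seq_len):
    """Expected number of forward-strand matches in a random sequence."""
    positions = max(seq_len - len(pattern) + 1, 0)
    return positions * match_probability(pattern)


def observed_count(pattern, seq):
    """Count forward-strand matches, including overlapping ones."""
    body = "".join("[" + IUPAC[code] + "]" for code in pattern)
    return sum(1 for _ in re.finditer("(?=" + body + ")", seq.upper()))


def enrichment(pattern, seq):
    """Observed / expected ratio; NaN if no match position exists."""
    expected = expected_count(pattern, len(seq))
    if expected == 0:
        return float("nan")
    return observed_count(pattern, seq) / expected
```

For example, `enrichment("TGGAAANNYNY", chromosome_seq)` gives the observed-to-expected ratio for that pattern. Real chromosomes do not have equal base frequencies, so this uniform model is only a first approximation.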