Attendance: Ranjit Randhawa, Dr Ramakrishnan, Xiaofeng Bao.
Yeast paper we discussed:
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES.
Sequencing and comparison of yeast species to identify genes and regulatory elements.
Nature. 2003 May 15;423(6937):233-4.
Given a set of genes, look at their upstream regions.
- This needs some formalization
- Algorithm to do this must know full sequence
- Algorithm to do this must know TATA box
- Algorithm to do this must know different motifs
- We want to know level of covariance structure
- Need to classify blocks of nucleotides into classes
Signal of classes:
- TATA box
- Motif 1 (M1)
- Motif 2 (M2)
- Motif 3 (M3)
- Note, not all have to be present in all genes
These are the classes used in Gaussian Process (GP) modelling.
We need to know nucleotide and class information.
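One way to hold nucleotide and class information together is sketched below. This is only an illustration: the `Site` record, the `annotate` helper, the "BG" (background) label for unclassified positions, and the example sequence are all assumptions, not something agreed in the meeting.

```python
from dataclasses import dataclass

# Class labels from the list above; "BG" (background) is an assumed
# label for positions that belong to no signal.
CLASSES = ("TATA", "M1", "M2", "M3", "BG")


@dataclass(frozen=True)
class Site:
    position: int    # 0-based offset within the upstream region
    nucleotide: str  # one of A, C, G, T
    label: str       # one of CLASSES


def annotate(upstream, blocks):
    """Label every nucleotide of an upstream region.

    blocks is a list of (start, end, label) tuples, end exclusive.
    Positions not covered by any block keep the "BG" label.
    """
    labels = ["BG"] * len(upstream)
    for start, end, label in blocks:
        if label not in CLASSES:
            raise ValueError(f"unknown class: {label}")
        for i in range(start, min(end, len(upstream))):
            labels[i] = label
    return [Site(i, nt, labels[i]) for i, nt in enumerate(upstream)]


# Made-up upstream fragment with a TATA-like block at positions 2..7.
sites = annotate("GGTATAAAGC", [(2, 8, "TATA")])
```

Genes that lack a given motif simply have no block with that label, which matches the note that not all classes have to be present in all genes.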
How will this be used?
- Given an unknown gene and its upstream region.
- See whether its covariance structure matches what we already have.
What does it mean to be similar?
- Covariance structure reflects a random process
- If you pick a process, there are many functions that can be drawn from it
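The point that one process gives rise to many functions can be illustrated with a small sketch. It assumes a zero-mean GP with a squared-exponential kernel, which is a common default and not a choice made in the meeting; `rbf_kernel`, `cholesky`, and `sample_gp` are hypothetical helper names, and only the standard library is used.

```python
import math
import random


def rbf_kernel(x1, x2, length_scale=2.0):
    """Squared-exponential covariance (an assumed kernel, not from the notes)."""
    return math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(a):
    """Lower-triangular L with L L^T = a, for symmetric positive-definite a."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sample_gp(xs, n_draws, seed=0, jitter=1e-6):
    """Draw n_draws functions from a zero-mean GP evaluated at positions xs."""
    rng = random.Random(seed)
    # Covariance matrix over the positions; jitter on the diagonal keeps
    # the factorisation numerically stable.
    cov = [[rbf_kernel(a, b) + (jitter if i == j else 0.0)
            for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    low = cholesky(cov)
    draws = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in xs]
        # f = L z has covariance L L^T = cov
        draws.append([sum(low[i][k] * z[k] for k in range(i + 1))
                      for i in range(len(xs))])
    return draws


# Three different functions from the same covariance structure.
draws = sample_gp(list(range(10)), n_draws=3)
```

All three draws share one covariance structure yet differ point by point, so comparing genes by covariance structure means comparing the processes rather than individual functions.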
At the end of the day we want to group genes into different gene families and put new genes into the family with the greatest probability.
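A minimal sketch of the assignment step follows, assuming each family's GP can already score a new gene with a log-likelihood. How those scores are computed is left open; `assign_family`, the family names, the numbers, and the uniform default prior are all hypothetical.

```python
import math


def assign_family(log_likelihoods, log_priors=None):
    """Return (best_family, posterior) from per-family log-likelihoods."""
    families = list(log_likelihoods)
    if log_priors is None:
        # Uniform prior over families (an assumption).
        log_priors = {f: 0.0 for f in families}
    scores = {f: log_likelihoods[f] + log_priors[f] for f in families}
    top = max(scores.values())
    # Log-sum-exp keeps the normalisation stable for very negative scores.
    log_norm = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    posterior = {f: math.exp(s - log_norm) for f, s in scores.items()}
    best = max(posterior, key=posterior.get)
    return best, posterior


# Made-up log-likelihoods of one new gene under three hypothetical families.
best, posterior = assign_family(
    {"family_A": -120.4, "family_B": -118.9, "family_C": -131.0}
)
```

Working in log space matters here because likelihoods over long upstream regions are tiny and would underflow if exponentiated directly.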
In contrast to a traditional GP, which lives in the x-y plane, our GP will have x, y and z.
- X is the position, Y the class and Z the nucleotide (A, G, C, T).
- Covariance structure comes from 2 concentrations.
- The point is that the GP is still a process over x, since z is a function of x.