GeneSieve

A Probe Selection Tool for cDNA Microarrays

GeneSieve: Protein Homology (PH) Score > 1


1. "Mismatch" between EST and Contig sequences
2. "Hanging" EST Sequence

1. "Mismatch" between EST and Contig sequences

The consensus sequence of a contig is generated as a summary of the
highest-quality parts of its constituent ESTs. Each base in the contig sequence
is determined by the consensus of the bases in the EST sequences aligning
to that position. As a consequence, the consensus sequence of a contig may differ
from one of its constituent ESTs in the region where they align to each other.
In such cases, an EST sequence may show a better alignment with a protein
than the corresponding contig-protein alignment.
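
The effect can be illustrated with a small sketch. It uses a plain per-column majority vote, which is a simplification chosen for illustration and not the quality-based procedure an assembler actually applies; the function name `column_consensus` and the three short reads are hypothetical.

```python
from collections import Counter

def column_consensus(aligned_reads):
    """Majority-vote consensus over equal-length, gap-padded reads.

    '-' marks a gap in a read. A column whose most common symbol is a
    gap contributes nothing to the consensus. This is a simplified
    stand-in for a real assembler's consensus step.
    """
    if len({len(r) for r in aligned_reads}) != 1:
        raise ValueError("reads must be padded to the same length")
    consensus = []
    for column in zip(*aligned_reads):
        base, _count = Counter(column).most_common(1)[0]
        if base != "-":
            consensus.append(base)
    return "".join(consensus)

# Hypothetical reads: the first one lacks two bases that the other two carry.
reads = [
    "gatatc--tttata",
    "gatatccttttata",
    "gatatccttttata",
]
consensus = column_consensus(reads)
first_read = reads[0].replace("-", "")
print(consensus, len(consensus) - len(first_read))
```

In this toy case the consensus keeps the two bases supported by the majority, so it is two bases longer than the first read, which is the kind of difference that matters once the sequences are translated.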


Example: Pine EST - ST40G03

EST ST40G03 belongs to Contig5940. The EST-protein BLAST score is 243 while the
contig-protein BLAST score is 214. Hence, the PH score for ST40G03 is 1.13.
Below is the table showing ESTs constituting Contig5940 and their alignments with
the consensus sequence.

EST             EST length  Serial no.  EST start  EST end  Contig start  Contig end
NXSI_002_G12_F  416         1           0          415      0             415
ST40G03         587         2           67         586      93            614
ST71A05         405         3           67         404      93            430
ST21H08         601         4           0          579      24            614
PC15B10         520         5           9          517      93            602
NXSI_024_A05_F  253         6           0          252      92            344
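
The PH score quoted for ST40G03 is consistent with dividing the EST-protein BLAST bit score by the contig-protein bit score. That formula is an assumption inferred from the numbers in this example, and the function name `ph_score` is hypothetical. A minimal sketch:

```python
def ph_score(est_protein_bits, contig_protein_bits):
    """Ratio of EST-protein to contig-protein BLAST bit scores.

    Assumed definition, inferred from the example values on this page.
    """
    if contig_protein_bits <= 0:
        raise ValueError("contig-protein score must be positive")
    return est_protein_bits / contig_protein_bits

# Scores taken from the ST40G03 example: 243 (EST) and 214 (contig).
ratio = ph_score(243, 214)
print(f"{ratio:.4f}")
```

Under this assumption a value above 1 simply means that the EST aligns to the protein with a higher score than its contig does. The computed ratio agrees with the reported 1.13 to within 0.01.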


EST-Contig Alignment

>Contig5940 
          Length = 615

 Score =  923 bits (480), Expect = 0.0
 Identities = 511/522 (97%), Gaps = 2/522 (0%)
 Strand = Plus / Plus

                                                                       
Query: 68  cagtttcacaatcatgccgatggctccagtggtggacgccgcgtatctcaagtccattga 127
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 94  cagtttcacaatcatgccgatggctccagtggtggacgccgcgtatctcaagtccattga 153

                                                                       
Query: 128 caaggcacgccgagacctgcgggctctcattgctgaaaagaattgcgcgcccatcatgct 187
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 154 caaggcacgccgagacctgcgggctctcattgctgaaaagaattgcgcgcccatcatgct 213

                                                                       
Query: 188 tcgtctcgcatggcatgatgcaggcacttatgatgcaaaaacgaagacgggtggggcaaa 247
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 214 tcgtctcgcatggcatgatgcaggcacttatgatgcaaaaacgaagacgggtggggcaaa 273

                                                                       
Query: 248 tggttccattagaaacggggaggaactcaatcacagtgcaaataatgggctgaaaattgc 307
           ||||||||||||||||| ||||||||||||||||||||||||||||||||||||||||||
Sbjct: 274 tggttccattagaaacgaggaggaactcaatcacagtgcaaataatgggctgaaaattgc 333

                                                                       
Query: 308 acttacattgtgtgaaccgatcaaggcaaagtacccaaatattacttatgcagaccttta 367
           |||| ||||||||||||| |||||||||||||||||||||||||||||||||||||||||
Sbjct: 334 acttgcattgtgtgaaccaatcaaggcaaagtacccaaatattacttatgcagaccttta 393

                                                                       
Query: 368 tcagctggctggtgtagttgctgttgaggttacaggaggtcccacaattgagtttgtccc 427
           ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Sbjct: 394 tcagctggctggtgtagttgctgttgaggttacaggaggtcccacaattgagtttgtccc 453

                                                                       
Query: 428 tggtcgtaaggattcactggcatcaccacgagaanggcggcttcctgatgcgnnnnnnng 487
           ||||||||||||||||||||||||||||||||||||||||||||||||||||      ||
Sbjct: 454 tggtcgtaaggattcactggcatcaccacgagaanggcggcttcctgatgcgaaaaaang 513

                                                                       
Query: 488 nccacaacaacctaagggatatc--tttataggatgggcctatctganaagggtattgtt 545
           |||||||||||||||||||||||  |||||||||||||||||||||||||||||||||||
Sbjct: 514 nccacaacaacctaagggatatccttttataggatgggcctatctganaagggtattgtt 573

                                                     
Query: 546 gccctttctggggcgcacacattgggaaaacccatccanaaa 587
           ||||||||||||||||||||||||||||||||||||||||||
Sbjct: 574 gccctttctggggcgcacacattgggaaaacccatccanaaa 615

Here, two gaps are inserted in the EST in order to align it to the contig.
The corresponding bases in the consensus sequence are supported by two other ESTs.
Because of these two extra bases, the consensus sequence cannot continue its alignment
with the protein without a frame shift. The EST-protein and contig-protein alignments
are given below.

EST-Protein Alignment
>At4g35000.1 ascorbate peroxidase, putative (APX)   /  identical to L-ascorbate
           peroxidase [Arabidopsis thaliana]
           gi|1523791|emb|CAA66926; similar to ascorbate peroxidase
           [Gossypium hirsutum] gi|1019946|gb|AAB52954; supported
           by full-length cDNA: Ceres:21896.
          Length = 287
                                                                                                                                                              
 Score =  243 bits (621), Expect = 4e-65
 Identities = 122/162 (75%), Positives = 134/162 (82%)
 Frame = +3
                                                                                                                                                              
Query: 90  APVVDAAYLKSIDKARRDLRALIAEKNCAPIMLRLAWHDAGTYDAKTKTGGANGSIRNGE 269
           AP+VDA YLK I KARR+LR+LIA KNCAPIMLRLAWHDAGTYDA++KTGG NGSIRN E
Sbjct: 3   APIVDAEYLKEITKARRELRSLIANKNCAPIMLRLAWHDAGTYDAQSKTGGPNGSIRNEE 62
                                                                                                                                                              
Query: 270 ELNHSANNGLKIALTLCEPIKAKYPNITYADLYQLAGVVAVEVTGGPTIEFVPGRKDSLA 449
           E  H AN+GLKIAL LCE +KAK+P ITYADLYQLAGVVAVEVTGGP I FVPGRKDS
Sbjct: 63  EHTHGANSGLKIALDLCEGVKAKHPKITYADLYQLAGVVAVEVTGGPDIVFVPGRKDSNV 122

Query: 450 SPREXRLPDAKKXPQQPKGYLYRMGLSXKGIVALSGAHTLGK 575
            P+E RLPDAK+  Q  +   YRMGLS K IVALSG HTLG+ 
Sbjct: 123 CPKEGRLPDAKQGFQHLRDVFYRMGLSDKDIVALSGGHTLGR 164 


Contig-Protein Alignment
>At4g35000.1 ascorbate peroxidase, putative (APX)   /  identical to L-ascorbate
           peroxidase [Arabidopsis thaliana]
           gi|1523791|emb|CAA66926; similar to ascorbate peroxidase
           [Gossypium hirsutum] gi|1019946|gb|AAB52954; supported
           by full-length cDNA: Ceres:21896.
          Length = 287
                                                                                                                                                              
 Score =  214 bits (545), Expect(2) = 1e-61
 Identities = 105/132 (79%), Positives = 115/132 (87%)
 Frame = +2
                                                                                                                                                              
Query: 116 APVVDAAYLKSIDKARRDLRALIAEKNCAPIMLRLAWHDAGTYDAKTKTGGANGSIRNEE 295
           AP+VDA YLK I KARR+LR+LIA KNCAPIMLRLAWHDAGTYDA++KTGG NGSIRNEE
Sbjct: 3   APIVDAEYLKEITKARRELRSLIANKNCAPIMLRLAWHDAGTYDAQSKTGGPNGSIRNEE 62
                                                                                                                                                              
Query: 296 ELNHSANNGLKIALALCEPIKAKYPNITYADLYQLAGVVAVEVTGGPTIEFVPGRKDSLA 475
           E  H AN+GLKIAL LCE +KAK+P ITYADLYQLAGVVAVEVTGGP I FVPGRKDS
Sbjct: 63  EHTHGANSGLKIALDLCEGVKAKHPKITYADLYQLAGVVAVEVTGGPDIVFVPGRKDSNV 122
                                                                                                                                                              
Query: 476 SPREXRLPDAKK 511
            P+E RLPDAK+
Sbjct: 123 CPKEGRLPDAKQ 134

 Score = 39.7 bits (91), Expect(2) = 1e-61
 Identities = 18/22 (81%), Positives = 19/22 (86%)
 Frame = +1
                                                                                                                                                              
Query: 538 FYRMGLSXKGIVALSGAHTLGK 603
           FYRMGLS K IVALSG HTLG+
Sbjct: 143 FYRMGLSDKDIVALSGGHTLGR 164
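
The frame change between the two segments of the contig-protein alignment above can be checked with a little arithmetic. The sketch below assumes the usual convention for plus-strand translated searches, where a 1-based query start position p lies in frame ((p - 1) mod 3) + 1; the function names are hypothetical.

```python
def forward_frame(query_start):
    """Forward reading frame (1, 2 or 3) for a 1-based nucleotide start."""
    return (query_start - 1) % 3 + 1

def frame_after_insertion(frame, inserted_bases):
    """Frame of downstream codons after extra bases are inserted upstream."""
    return (frame - 1 + inserted_bases) % 3 + 1

# Query start positions copied from the alignments above.
print(forward_frame(90))    # EST-protein alignment
print(forward_frame(116))   # contig-protein alignment, first segment
print(forward_frame(538))   # contig-protein alignment, second segment

# Two extra bases in the consensus move downstream codons to another frame.
print(frame_after_insertion(forward_frame(116), 2))
```

The start positions 90, 116 and 538 reproduce the reported frames +3, +2 and +1, and inserting two bases into a +2 frame yields +1, which matches the split of the contig-protein alignment into two segments.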



2. "Hanging" EST Sequence

While constructing the consensus sequence, Phrap trims off the ends of a contig
wherever fewer than two ESTs confirm the sequence. As a result, the
full length of an EST may not contribute to the consensus sequence of a contig.
In some cases, the "hanging" piece of an EST, which does not contribute to the consensus
sequence, may also align to the protein. As a consequence, the BLAST score for the EST-protein
alignment may be better than that for the contig-protein alignment.
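
The trimming rule described above can be sketched as a coverage computation. This is an illustration of the "at least two ESTs" idea only, not Phrap's actual implementation; the function name `supported_span`, the inclusive coordinate convention, and the three read intervals are all hypothetical.

```python
def supported_span(intervals, min_depth=2):
    """Span from the first to the last position covered by >= min_depth reads.

    `intervals` holds inclusive (start, end) read coordinates on a common
    layout. Returns (first, last), or None if no position is deep enough.
    """
    delta = {}
    for start, end in intervals:
        delta[start] = delta.get(start, 0) + 1          # read begins
        delta[end + 1] = delta.get(end + 1, 0) - 1      # read has ended
    positions = sorted(delta)
    depth = 0
    first = last = None
    for i, pos in enumerate(positions):
        depth += delta[pos]
        if depth >= min_depth:
            if first is None:
                first = pos
            # depth stays constant until the next event position
            last = positions[i + 1] - 1
    return None if first is None else (first, last)

# Hypothetical layout of three overlapping reads.
reads = [(0, 99), (50, 199), (150, 399)]
print(supported_span(reads))
```

In this toy layout only positions 50 to 199 are confirmed by two reads, so positions 200 to 399 of the third read would be a "hanging" piece that is left out of the consensus.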

Example: Pine EST NXSI_116_B11_F

EST NXSI_116_B11_F belongs to Contig6072. The EST is 548 bp long, of which only
the first 345 bp contribute to the consensus sequence. Here, the EST-protein BLAST score is
286 while the contig-protein BLAST score is 183. Hence, the PH score for NXSI_116_B11_F is 1.56.
The alignments of the constituent ESTs with the contig are summarized in the table below.

EST             EST length  Serial no.  EST start  EST end  Contig start  Contig end
NXSI_085_E06_F  549         1           0          548      0             548
NXNV_129_D03_F  477         2           0          476      0             476
NXNV001E04      487         3           0          486      3             489
NXSI_025_A07_F  225         4           0          224      183           407
NXSI_116_B11_F  548         5           0          365      183           548
NXCI_050_A09_F  209         6           8          168      388           548


EST-Protein Alignment

>At5g48300.1 ADPG pyrophosphorylase small subunit (gb AAC39441.1)   /  ;
           supported by cDNA: gi_15146247_gb_AY049265.1_
          Length = 520
                                                                                                                                                              
 Score =  286 bits (733), Expect = 4e-78
 Identities = 140/163 (85%), Positives = 150/163 (92%)
 Frame = +3
                                                                                                                                                              
Query: 60  VVSPRAVSDTFSELTCLDPVASRSVLGIILGGGAGTRLYPLTKKRAKPAVPLGANYRLID 239
           +VSP+AVSD+ +  TCLDP AS SVLGIILGGGAGTRLYPLTKKRAKPAVPLGANYRLID
Sbjct: 66  IVSPKAVSDSQNSQTCLDPDASSSVLGIILGGGAGTRLYPLTKKRAKPAVPLGANYRLID 125
                                                                                                                                                              
Query: 240 IPVSNCINSNISKIYVLTQFNSASLNRHLSRAYSSNMGSYKDEXXVXVLAAQQSPENPNW 419
           IPVSNC+NSNISKIYVLTQFNSASLNRHLSRAY+SNMG YK+E  V VLAAQQSPENPNW
Sbjct: 126 IPVSNCLNSNISKIYVLTQFNSASLNRHLSRAYASNMGGYKNEGFVEVLAAQQSPENPNW 185
                                                                                                                                                              
Query: 420 FQGTADAVRQYLWLFEEQPVMEFLILAGDHLYRMDYQKFIXXH 548
           FQGTADAVRQYLWLFEE  V+E+LILAGDHLYRMDY+KFI  H
Sbjct: 186 FQGTADAVRQYLWLFEEHNVLEYLILAGDHLYRMDYEKFIQAH 228


Contig-Protein Alignment
>At5g48300.1 ADPG pyrophosphorylase small subunit (gb AAC39441.1)   /  ;
           supported by cDNA: gi_15146247_gb_AY049265.1_
          Length = 520
                                                                                                                                                              
 Score =  183 bits (464), Expect = 6e-47
 Identities = 107/173 (61%), Positives = 123/173 (71%), Gaps = 2/173 (1%)
 Frame = +3
                                                                                                                                                              
Query: 36  MAGVMAAGVANLNVLGRETAEFTSFRPVFLRGNSQGLSSASSLCDYRIFADS--KRKKHA 209
           MA V A GV  L V    T+  T      +   +   SS+ +  D +I   S   R   +
Sbjct: 1   MASVSAIGV--LKVPPASTSNSTGKATEAVPTRTLSFSSSVTSSDDKISLKSTVSRLCKS 58
                                                                                                                                                              
Query: 210 IFRKQNINRSTVVSPRAVSDTFSELTCLDPVASRSVLGIILGGGAGTRLYPLTKKRAKPA 389
           + R+  I    +VSP+AVSD+ +  TCLDP AS SVLGIILGGGAGTRLYPLTKKRAKPA
Sbjct: 59  VVRRNPI----IVSPKAVSDSQNSQTCLDPDASSSVLGIILGGGAGTRLYPLTKKRAKPA 114
                                                                                                                                                              
Query: 390 VPLGANYRLIDIPVSNCINSNISKIYVLTQFNSASLNRHLSRAYSSNMGSYKD 548
           VPLGANYRLIDIPVSNC+NSNISKIYVLTQFNSASLNRHLSRAY+SNMG YK+
Sbjct: 115 VPLGANYRLIDIPVSNCLNSNISKIYVLTQFNSASLNRHLSRAYASNMGGYKN 167