Computational Systems Biology Group Meetings

Meetings for Fall 2007 resume the week of September 3 in McBryde 655. The journal club meeting is on Tuesday from 11am-12:15pm. The research presentation is on Thursday from 11am-12:15pm. Murali and Naren expect students to volunteer themselves for presentations. For research presentations, please send Murali a title and an abstract. For journal club meetings, we expect each student to read the paper(s) being presented.

Ground Rules

  1. Attendance is mandatory. If you cannot attend a meeting, inform your advisor (Murali or Naren or both) in advance.
  2. Be on time for the meetings.
  3. Be diligent in following the journals assigned to you. Post interesting articles to our CiteULike group. Do not forget to post the PDF file too.

Schedule

Date Journal paper Date Research presentation
Fall 2007
September 4, 2007 David Badger
Information Theory Applied to the Sparse Gene Ontology Annotation Network to Predict Novel Gene Function, ISMB, 2007
September 6, 2007 Clifford Conley Owens
Capturing Truthiness: Mining Truth Tables in Binary Datasets
September 11, 2007 Corban Rivera
Comparing Protein Interaction Networks via a Graph Match-and-Split Algorithm, JCB, September 2007
September 13, 2007 Matt Dyer
The Landscape of Human Proteins Targeted by Pathogens
September 18, 2007 Naveed Massjouni
QNet: A Tool for Querying Protein Interaction Networks, RECOMB, 2007
September 20, 2007 Cancelled
Work on RECOMB submissions.
September 25, 2007 Sheng Guo
A new method to measure the semantic similarity of GO terms, Bioinformatics, 2007
September 27, 2007 Corban Rivera
Detecting Pathway Perturbation in Cancer
October 2, 2007 Satish Tadepalli
Continuous Hidden Process Model for Time Series Expression Experiments, ISMB, 2007
October 4, 2007 Corban Rivera
Detecting Pathway Perturbation in Cancer part II
October 9, 2007 Matt Dyer
Computation of significance scores of unweighted Gene Set Enrichment Analyses, BMC Bioinformatics, 2007
October 11, 2007 David Badger and T. M. Murali
Gene Function Prediction using Spectral Graph Partitioning
October 16, 2007 Vandana Sreedharan
Automated Discovery of Functional Generality of Human Gene Expression Programs, PLoS Computational Biology, 2007
October 18, 2007 David Badger and T. M. Murali
Gene Function Prediction using Spectral Graph Partitioning, part deux
October 23, 2007 Clifford Conley Owens
A Tight Upper Bound on the Number of Candidate Patterns, ACM TODS, 2005
October 25, 2007 Research project planning
October 30, 2007 General discussion
November 1, 2007 Cancelled
November 6, 2007 Corban Rivera
Metagenes and molecular pattern discovery using matrix factorization, PNAS, 2004 and
Metagene projection for cross-platform, cross-species characterization of global transcriptional states, PNAS, 2007
November 8, 2007 John Thomas, Department of Computer Science, Dartmouth College
Graphical Models of Evolutionary Constraints in Protein Families
November 13, 2007 General discussion
November 15, 2007
November 27, 2007 General discussion
November 29, 2007 Cancelled
December 4, 2007 David Badger
Systematic Discovery of Functional Modules and Context-Specific Functional Annotation of Human Genome, ISMB, 2007
December 6, 2007
December 11, 2007 Matt Dyer
Functional Annotation of Regulatory Pathways, ISMB, 2007
December 13, 2007
December 18, 2007 Ying Jin
Nested effects models for high-dimensional phenotyping screens, ISMB, 2007
December 20, 2007
Summer 2007
No research presentations during summer 2007
May 17, 2007 Cancelled
May 24, 2007 Clifford Conley Owens
Non-derivable Itemsets
May 31, 2007 Ying Jin
Compositional Data Mining
June 7, 2007 T. M. Murali
Report on First Bertinoro Systems Biology Workshop
(Also, shortest obstacle-avoiding paths in the plane)
June 14, 2007 Cancelled due to ACM TKDD deadline.
June 21, 2007 Satish Tadepalli
An Efficient Method for Dynamic Analysis of Gene Regulatory Networks and in silico Gene Perturbation Experiments, RECOMB, 2007
June 28, 2007 Matt Dyer, ISMB 2007 practice talk
Computational Prediction of Host-Pathogen Protein-Protein Interactions
No meetings in July 2007
August 2, 2007 Matt Dyer and T. M. Murali
Report on ISMB 2007
Spring 2007
Jan 25, 2007 Division of labour Joe Gresock
Storytelling: Creative Discovery in Biomedical Literature
Feb 1, 2007 David Badger and Naveed Massjouni
Data-driven modelling of signal-transduction networks
John Gordon's M.S. thesis defence
A new computationally facile analytical approximation of
electrostatic potential suitable for macromolecules
Feb 8, 2007 Cancelled GBCB seminar@VBI, Jill Sible, Dept. of Biology, VT
Building a systems-level view of cell cycle checkpoints
Feb 9, 2007
Friday
MCBB Seminar Series@VBI, 12:20pm, Barry Whitman, University of Georgia
The number and diversity of bacteria on Earth and why we care
Feb 15, 2007 David Badger and Naveed Massjouni
Data-driven modelling of signal-transduction networks
Satish Tadepalli
Clustering by preserving dependencies between datasets
Feb 22, 2007 Mahima Gopalakrishnan
Identification of functional modules using network topology and high-throughput data
Cancelled
Mar 1, 2007 Cancelled Cancelled
Mar 8, 2007 Spring Break
Mar 15, 2007 Satish Tadepalli
Clustering by Passing Messages Between Data Points
Corban Rivera
Pathway Annotation Using Network-Constrained Biclustering
Mar 22, 2007 Tony Xu Han
CS Seminar Series, 655 McBryde
Watching Humans and Detecting Their Behaviors
Ying Jin
Connections between Closed Itemsets and Minimal Separators
Mar 29, 2007 Joe Gresock
Predicting interactions in protein networks by completing defective cliques
GBCB seminar@VBI, Ron Breaker
Discovering Genetic Switches and Logic Gates Made of RNA
Apr 5, 2007 Ying Jin
Prediction of Phenotype and Gene Expression for Combinations of Mutations
Naveed Massjouni
Tree Decompositions of Molecular Interaction Networks
Apr 12, 2007 Arjun Krishnan
Integrative molecular concept modeling of prostate cancer progression
GBCB seminar@VBI, Matthew D. Dyer
Computational Prediction of Host-Pathogen Protein-Protein Interactions
Apr 15, 2007 Concert by Blind Elephant Quintet (featuring our very own Joseph Gresock), 8pm, Squires recital salon
Programme: Schubert Piano Quintet in A Major, "The Trout", Brahms Piano Quartet in C minor
Apr 19, 2007 Deept Kumar's Ph.D. thesis defence, 2-4pm, 110 KWII
The Redescription Algorithmic Framework: Applications to Bioinformatics Data
Apr 26, 2007 T. M. Murali and Corban Rivera
Report on RECOMB 2007.
Cancelled
May 3, 2007 Reading day
May 10, 2007 Corban Rivera
Refinement and expansion of signaling pathways: The osmotic response network in yeast

Talk Abstracts

Graphical Models of Evolutionary Constraints in Protein Families, John Thomas, November 8, 2007

Evolutionary pressure on proteins to maintain structure and function has constrained their sequences over time and across species. Thus, the sequence record of these proteins contains valuable information about the acceptable variation and covariation of amino acids in members of a protein family. We have designed an approach to model these evolutionary constraints using undirected graphical models. Our graphical models of residue coupling (GMRCs) have a formal probabilistic framework and are a strict generalization of conservation-based approaches such as hidden Markov models. Our GMRCs can provide insight into biological activities, transparently classify members of a protein family by functional class, design new sequences that obey the evolutionary constraints, and predict whether or not two members from interacting protein families will interact. Results on PDZ domains, G-protein coupled receptors, and WW domains show that GMRCs are able to successfully model evolutionary constraints and assist in a wide range of problems in bioinformatics.

The Landscape of Human Proteins Targeted by Pathogens, Matthew D. Dyer, September 13, 2007

(This research is joint work with T. M. Murali and Bruno Sobral.) Infectious diseases result in millions of deaths each year. Mechanisms of infection for many pathogens have been studied in detail. However, relatively unexplored are questions such as which infection mechanisms and pathways are commonly triggered by multiple pathogens, what the properties of the human proteins they target are, and whether pathogens interact with certain functional classes of human proteins. In this paper, we integrate human-pathogen protein-protein interactions (PPIs) for 179 pathogen strains from five public databases to provide the first study of the landscape of human proteins interacting with pathogens. We analyze the network of PPIs between these human proteins, and find that pathogens specifically interact with hubs (proteins with many interacting partners) and bottlenecks (proteins that are central to many paths in the network). We construct a network of PPIs between human proteins interacting with at least two pathogens. Gene Ontology functions enriched in this network reveal a number of processes and complexes commonly participating in interactions with multiple pathogens. Supplementary data is available.

Capturing Truthiness: Mining Truth Tables in Binary Datasets, Clifford Conley Owens, September 6, 2007

(This research is joint work with T. M. Murali and Naren Ramakrishnan.) We introduce a new data mining problem, namely that of mining truth tables in binary datasets. Given a matrix of objects and the properties they satisfy, a truth table identifies a subset of properties that exhibit maximal variability (and hence, complete independence) in occurrence patterns over the underlying objects. This problem is relevant in many domains, e.g., bioinformatics where we seek to identify and model independent components of combinatorial regulatory pathways, and in social/economic demographics where we desire to determine independent behavioral attributes of populations. Besides intrinsic interest in such patterns, we show how the problem of mining truth tables is dual to the problem of mining redescriptions, in that a set of properties involved in a truth table cannot participate in any possible redescription. This allows us to adapt our algorithm to the problem of mining redescriptions as well, by first identifying regions where redescriptions cannot happen, and then pursuing a divide and conquer strategy around these regions. Furthermore, our work suggests dual mining strategies where both classes of algorithms can be brought to bear upon either data mining task. We outline a family of levelwise approaches adapted to mining truth tables, algorithmic optimizations, and applications to bioinformatics and political datasets.
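
The definition above can be illustrated with a small sketch. It assumes one particular reading of "maximal variability": a set of k properties forms a truth table when all 2^k true/false combinations occur among the objects. That reading, the function names, and the simple Apriori-style levelwise search are illustrative assumptions, not the algorithm presented in the talk.

```python
from itertools import combinations

def is_truth_table(rows, props):
    """Assumed reading: props form a truth table if every one of the
    2^k true/false combinations occurs in at least one object (row)."""
    patterns = {tuple(p in row for p in props) for row in rows}
    return len(patterns) == 2 ** len(props)

def mine_truth_tables(rows, all_props):
    """Levelwise search. Under the reading above, every subset of a truth
    table is itself a truth table, so candidates of size k+1 are built only
    from truth tables of size k (hypothetical sketch)."""
    level = [(p,) for p in sorted(all_props) if is_truth_table(rows, (p,))]
    found = []
    while level:
        found.extend(level)
        current = set(level)
        candidates = set()
        for a in level:
            for b in level:
                # Join two sets that share all but their last property.
                if a[:-1] == b[:-1] and a[-1] < b[-1]:
                    cand = a + (b[-1],)
                    # Prune candidates that have a non-truth-table subset.
                    if all(s in current for s in combinations(cand, len(a))):
                        candidates.add(cand)
        level = sorted(c for c in candidates if is_truth_table(rows, c))
    return found

# Each object is the set of properties it satisfies (toy data).
objects = [set(), {"a"}, {"b"}, {"a", "b"}, {"a", "b", "c"}]
tables = mine_truth_tables(objects, {"a", "b", "c"})
```

In the toy data, properties a and b occur in all four combinations, while c never occurs without both a and b, so no set containing c together with another property qualifies.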

Computational Prediction of Host-Pathogen Protein-Protein Interactions, Matthew D. Dyer, April 12, 2007, GBCB seminar@VBI.

Infectious diseases such as malaria result in millions of deaths each year. An important aspect of any host-pathogen system is the mechanism by which a pathogen can infect its host. One method of infection is via protein-protein interactions (PPIs) where pathogen proteins target host proteins. Computational methods for predicting PPIs have been applied only to proteins from the same species. Developing computational methods that identify which PPIs enable a pathogen to infect a host has great implications in identifying potential targets for therapeutics.
We present a method that integrates known intra-species PPIs with protein domain profiles to predict PPIs between host and pathogen proteins. Given a set of intra-species PPIs, we identify the functional domains in each of the interacting proteins. For every pair of functional domains, we use Bayesian statistics to assess the probability that two proteins with that pair of domains will interact. We apply our method to the Homo sapiens - Plasmodium falciparum host-pathogen system. Our system predicts 516 PPIs between proteins from these two organisms. We show that human protein pairs we predict to interact with the same Plasmodium protein are close to each other in the human PPI network and that Plasmodium pairs predicted to interact with the same human protein are co-expressed in DNA microarray datasets measured during various stages of the Plasmodium life cycle. Finally, we identify functionally enriched sub-networks spanned by the predicted interactions and discuss the plausibility of our predictions.

Tree Decompositions of Molecular Interaction Networks, Naveed Massjouni, April 5, 2007

Graph theory is a natural way to approach many problems in the field of computational biology, especially in the context of molecular interaction networks. A number of optimisation problems involving such networks are intractable. We know that many hard graph problems can be solved in linear or polynomial time if there exists a tree decomposition of the graph with small treewidth. Tree decompositions may be a natural way to model molecular interaction networks and may provide new and useful representations of these networks. In this presentation, I will introduce the concepts of tree decompositions and the related concept of treewidth. Since computing the treewidth of a graph is also intractable, I will describe some heuristics to estimate upper and lower bounds on the treewidth and present results on human and yeast molecular interaction networks. I will also discuss some potential ways in which this powerful tool may be utilized to study interaction networks.
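
As an illustration of how an upper bound on treewidth can be estimated, the sketch below uses the well-known minimum-degree elimination heuristic. The abstract does not say which heuristics the talk covers, so this choice and the function name are assumptions. Eliminating a vertex connects its remaining neighbours into a clique; the largest neighbourhood seen at elimination time is the width of the tree decomposition induced by that ordering, and hence an upper bound on the treewidth.

```python
def min_degree_treewidth_upper_bound(adj):
    """Upper bound on treewidth via minimum-degree elimination.

    adj maps each vertex to the set of its neighbours and is assumed to be
    symmetric with no self-loops (hypothetical input format).
    """
    g = {v: set(ns) for v, ns in adj.items()}  # work on a copy
    width = 0
    while g:
        # Pick a vertex of minimum degree, breaking ties by label.
        v = min(g, key=lambda u: (len(g[u]), str(u)))
        nbrs = g.pop(v)
        width = max(width, len(nbrs))
        # Remove v and turn its former neighbours into a clique.
        for a in nbrs:
            g[a].discard(v)
            g[a] |= nbrs - {a}
    return width

path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
cycle = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {3, 1}}
clique = {i: {j for j in range(4) if j != i} for i in range(4)}
```

On these small graphs the heuristic happens to return the exact treewidth (1 for the path, 2 for the cycle, 3 for the four-vertex clique); on larger networks it only guarantees an upper bound.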

Discovering Genetic Switches and Logic Gates Made of RNA, Ron Breaker, Yale University, March 29, 2007, GBCB seminar@VBI

Riboswitches are natural metabolite-binding RNAs that control gene expression in many bacterial species. They typically are found in the non-coding regions of certain messenger RNAs, and control gene expression by ligand-induced allosteric changes in RNA structure. The most highly conserved portion of riboswitches is the metabolite-binding aptamer domain, whose sequence and structural features define each riboswitch class. Of the proven riboswitch classes, the shortest aptamer domain measures only 34 nucleotides while the largest requires approximately 200 nucleotides. Interestingly, even the smaller riboswitch domains appear to be capable of functioning as complex genetic switches. For example, we have identified a riboswitch that uses two ligand-binding domains to bind two glycine molecules. Together, these domains cooperate to function as a more "digital" genetic switch, thus allowing the organism to control gene expression in response to small changes in the concentration of glycine. More recently, we have identified natural RNAs that use components of different riboswitch classes to function like two-input Boolean logic gates. Our findings support the view that RNA is a versatile medium for the construction of complex genetic elements.

Connections between Closed Itemsets and Minimal Separators, Ying Jin, Mar 29, 2007

In this presentation, I am going to present the relationship between closed itemsets and minimal separators. Each element of a concept lattice can be described as a closed itemset. Given a binary relationship, we can define a concept lattice and an underlying graph, such that each element of the concept lattice maps to a minimal separator of the graph. The connections between closed itemsets and minimal separators can potentially be used to analyze and understand biological networks.

Tony Xu Han
Beckman Institute and ECE Department
University of Illinois at Urbana-Champaign
Watching Humans and Detecting Their Behaviors, Department of Computer Science Seminar Series, 655 McBryde, 2pm, March 22, 2007

Because of their broad and important applications, robust detection and tracking of humans have drawn attention from researchers in the computer vision, machine learning, and multimedia retrieval communities. However, robust human detection and tracking are extremely difficult owing to occlusion, illumination variation, the high number of degrees of freedom (usually 20-68) of articulated body movement, and clothing. Hence, no existing approach or model can achieve robust detection and tracking.

We propose a fusion framework to integrate multiple cues by finding a set of optimal dynamic weights for different tracking modalities. In the setup of Bayesian sequential estimation, we give an optimal criterion to find the dynamic weight for each modality. The fusion problem is then formulated as an optimization problem with a non-convex objective function. We further frame the optimization problem as a constrained convex programming problem. The equations for finding the global optimal solution are given and an approximate analytical solution is derived. The fusion framework can determine reliable cues and apply them dynamically. With the proposed framework, we achieve robust human tracking in various challenging surveillance video sequences.

An efficient Nonparametric Belief Propagation (NBP) algorithm is also proposed, which can be applied to many tasks such as articulated human tracking, image segmentation, image denoising, and superresolution.

We also extend the HMAX model of the visual cortex for recognizing objects in still images to a space-time version to detect human behaviors in video. We introduce a coarse-to-fine search and verification scheme in the space-time HMAX model for behavior matching. Our strategy enables the searching of human behaviors with motions ranging from drastic movements (e.g. sports) to small movements as subtle as facial expressions. The convincing results validate that the space-time HMAX model is both selective and robust for searching human behaviors.

The recognition and learning algorithms we present, including dynamic fusion, efficient Nonparametric Belief Propagation, and the space-time HMAX model, are general tools, which can be applied to many other tasks such as medical image analysis, human animation, information retrieval, bioinformatics, and robotics.

Pathway Annotation Using Network-Constrained Biclustering, Corban Rivera, Mar 15, 2007

New molecular interactions are being uncovered at an alarming rate. Whole genome biological assays such as yeast 2-hybrid and coimmunoprecipitation are revealing many previously unknown molecular interactions. The purpose of many of these interactions remains unknown. We would like to discover the pathways to which these interactions belong. We have developed a novel method for annotating interactions with pathways. We compare our method to alternate interaction annotation methods. Finally, we examine a few of the novel annotations we predict for interactions between human proteins.

Clustering by preserving dependencies between datasets, Satish Tadepalli, Feb 15, 2007

In this talk I am going to describe algorithms that can simultaneously find clusters in different data sets. The resultant clusters in each data set contain not only related data points but also preserve the dependencies between the data sets. The simplest case is when we have paired data samples and one-to-one relationships between the two data sets. I will describe our implementation of this case and the extensions we propose when we have multiple data sets and many-to-many relationships between the data sets that need to be preserved.

Building a Systems-level view of cell cycle checkpoints, Jill Sible, GBCB seminar, Feb 8, 2007

Because the cell cycle underlies the growth and development of all eukaryotes, and misregulation of the cell cycle typifies cancers, achieving a systems-level understanding of how the cell cycle is controlled ranks among the most important goals in modern cell biology. We have paired experimental biology with mathematical modeling to discover that mitotic transitions are regulated by hysteresis and bistability. We are building on this foundation to address how the cell cycle is affected by external events, in particular those that threaten the integrity of the genome. Checkpoints arrest the cell cycle when a threat to genomic stability, such as unreplicated or damaged DNA, exists. Loss of checkpoint control characterizes nearly all cancers. We are addressing checkpoint control in three systems: 1) the Xenopus laevis cell-free egg extract, a highly tractable experimental system where we can gather quantitative data to inform the model, 2) the developing X. laevis embryo where the checkpoint response is dynamic as the cell cycle remodels, and 3) wild populations of amphibian embryos where varying checkpoint responses may contribute to fitness in response to UV radiation. These studies have revealed a high level of plasticity in the checkpoint response spanning the cellular through the ecosystem levels.

Storytelling: Creative Discovery in Biomedical Literature, Joe Gresock, Jan 25, 2007

We have developed a method of finding pathways between documents and concepts among biomedical papers. My implementation of the process, called storytelling, applies a single-pair-shortest-path algorithm to a similarity graph of a subset of the documents in the PubMed database. Each document is represented as a weighted set of terms. The graph contains each document as a node and each edge as a connection between two "similar" documents as determined by a weighted Jaccard's coefficient between their terms. I will describe my implementation in more detail as well as its application in a case study. Finally, I will present some of the results of the case study both graphically and in the form of abstract, sentence, and keyword chains.
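
The pipeline described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the implementation from the talk: the weighted Jaccard coefficient is taken to be the sum of per-term minimum weights divided by the sum of per-term maximum weights, two documents are linked when their similarity reaches a hypothetical `threshold`, edges cost `1 - similarity`, and Dijkstra's algorithm stands in for the single-pair shortest-path step.

```python
import heapq

def weighted_jaccard(a, b):
    """Assumed form: sum of min term weights over sum of max term weights."""
    terms = set(a) | set(b)
    num = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    den = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0

def build_similarity_graph(docs, threshold):
    """Link documents whose similarity is at least threshold (an assumption);
    more similar documents get cheaper edges."""
    graph = {name: {} for name in docs}
    names = sorted(docs)
    for i, u in enumerate(names):
        for v in names[i + 1:]:
            s = weighted_jaccard(docs[u], docs[v])
            if s >= threshold:
                graph[u][v] = graph[v][u] = 1.0 - s
    return graph

def shortest_story(graph, source, target):
    """Dijkstra's algorithm; returns the document chain or None."""
    dist, prev, done = {source: 0.0}, {}, set()
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]

# Toy documents as weighted term sets (made-up data for illustration).
docs = {
    "A": {"x": 1.0, "y": 1.0},
    "B": {"y": 1.0, "z": 1.0},
    "C": {"z": 1.0, "w": 1.0},
}
story = shortest_story(build_similarity_graph(docs, 0.2), "A", "C")
```

Documents A and C share no terms, so the only story connecting them passes through B, which overlaps with each.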

Papers to be Presented

Journal Assignments

Each student has a journal or two assigned to him/her. The easiest way to follow journals is to sign up for email delivery of tables of contents. The student is responsible for keeping track of the papers published and suggesting interesting and/or relevant papers to Murali for scheduling. For each relevant paper, please send the title, authors, journal issue information, and a Pubmed ID, if available. Also send a few keywords to help us organise the papers. Keywords to use will become apparent as the papers section of this webpage becomes fleshed out.
Journal Frequency Student
Algorithms in Molecular Biology Online Ying Jin
Bioinformatics Fortnightly Sheng Guo, Satish Tadepalli
Cell Monthly
Biology Direct Online
BMC Bioinformatics Monthly David Badger, Naveed Massjouni
BMC Systems Biology Monthly David Badger, Mahima Gopalakrishnan
Genome Biology Monthly Naveed Massjouni, Corban Rivera
Genome Research Monthly Ying Jin, Clifford Owens
IEEE TKDE Monthly Clifford Owens, Mahima Gopalakrishnan
IEEE TCBB Monthly Mahima Gopalakrishnan
Journal of Biology Monthly
Journal of Computational Biology Monthly Mahima Gopalakrishnan, Ying Jin
Nature Weekly Mahima Gopalakrishnan, Corban Rivera
Nature Genetics Monthly Sheng Guo, Corban Rivera
Nature Biotechnology Monthly Sheng Guo, Corban Rivera
Nature Reviews Cancer Monthly T. M. Murali
Nature Reviews Genetics Monthly T. M. Murali
Nature Reviews Molecular Cell Biology Monthly T. M. Murali
Molecular Systems Biology Online Naveed Massjouni, Satish Tadepalli
Nucleic Acids Research Monthly
Nucleic Acids Research database issue Jan 1 each year Everyone
Nucleic Acids Research web server issue July 1 each year Everyone
OMICS Monthly Clifford Owens
PLoS Biology Monthly David Badger, Matt Dyer
PLoS Computational Biology Monthly Sheng Guo, Satish Tadepalli
PLoS One Online Naren Ramakrishnan
PNAS Weekly David Badger
Science Weekly Matt Dyer, Satish Tadepalli
There are a number of conferences our group should follow. Since each conference has a number of publications, no single person is responsible for any conference proceedings. The list below is made up of conferences devoted to computational biology, data mining, and machine learning. Conferences on algorithms such as SODA, FOCS, and STOC are also important.

T. M. Murali
Last modified: Tue Nov 27 14:19:22 EST 2007