Computational Systems Biology Group Meetings
Meetings for Fall 2007 resume the week of September 3 in McBryde 655. The journal club meeting is on Tuesday from 11am-12:15pm. The research presentation is on Thursday from 11am-12:15pm. Murali and Naren expect students to volunteer themselves for presentations. For research presentations, please send Murali a title and an abstract. For journal club meetings, we expect each student to read the paper(s) being presented.
Ground Rules
- Attendance is mandatory. If you cannot attend a meeting, inform your advisor (Murali or Naren or both) in advance.
- Be on time for the meetings.
- Be diligent in following the journals assigned to you. Post interesting articles to our CiteULike group. Do not forget to post the PDF file too.
Schedule
| Date | Journal paper | Date | Research presentation |
|---|---|---|---|
| Fall 2007 | | | |
| September 4, 2007 | David Badger Information Theory Applied to the Sparse Gene Ontology Annotation Network to Predict Novel Gene Function, ISMB, 2007 | September 6, 2007 | Clifford Conley Owens Capturing Truthiness: Mining Truth Tables in Binary Datasets |
| September 11, 2007 | Corban Rivera Comparing Protein Interaction Networks via a Graph Match-and-Split Algorithm, JCB, September 2007 | September 13, 2007 | Matt Dyer The Landscape of Human Proteins Targeted by Pathogens |
| September 18, 2007 | Naveed Massjouni QNet: A Tool for Querying Protein Interaction Networks, RECOMB, 2007 | September 20, 2007 | Cancelled. Work on RECOMB submissions. |
| September 25, 2007 | Sheng Guo A new method to measure the semantic similarity of GO terms, Bioinformatics, 2007 | September 27, 2007 | Corban Rivera Detecting Pathway Perturbation in Cancer |
| October 2, 2007 | Satish Tadepalli Continuous Hidden Process Model for Time Series Expression Experiments, ISMB, 2007 | October 4, 2007 | Corban Rivera Detecting Pathway Perturbation in Cancer, part II |
| October 9, 2007 | Matt Dyer Computation of significance scores of unweighted Gene Set Enrichment Analyses, BMC Bioinformatics, 2007 | October 11, 2007 | David Badger and T. M. Murali Gene Function Prediction using Spectral Graph Partitioning |
| October 16, 2007 | Vandana Sreedharan Automated Discovery of Functional Generality of Human Gene Expression Programs, PLoS Computational Biology, 2007 | October 18, 2007 | David Badger and T. M. Murali Gene Function Prediction using Spectral Graph Partitioning, part deux |
| October 23, 2007 | Clifford Conley Owens A Tight Upper Bound on the Number of Candidate Patterns, ACM TODS, 2005 | October 25, 2007 | Research project planning |
| October 30, 2007 | General discussion | November 1, 2007 | Cancelled |
| November 6, 2007 | Corban Rivera Metagenes and molecular pattern discovery using matrix factorization, PNAS, 2004 and Metagene projection for cross-platform, cross-species characterization of global transcriptional states, PNAS, 2007 | November 8, 2007 | John Thomas, Department of Computer Science, Dartmouth College Graphical Models of Evolutionary Constraints in Protein Families |
| November 13, 2007 | General discussion | November 15, 2007 | |
| November 27, 2007 | General discussion | November 29, 2007 | Cancelled |
| December 4, 2007 | David Badger Systematic Discovery of Functional Modules and Context-Specific Functional Annotation of Human Genome, ISMB, 2007 | December 6, 2007 | |
| December 11, 2007 | Matt Dyer Functional Annotation of Regulatory Pathways, ISMB, 2007 | December 13, 2007 | |
| December 18, 2007 | Ying Jin Nested effects models for high-dimensional phenotyping screens, ISMB, 2007 | December 20, 2007 | |
| Summer 2007 | | | |
| No research presentations during summer 2007 | | | |
| May 17, 2007 | Cancelled | | |
| May 24, 2007 | Clifford Conley Owens Non-derivable Itemsets | | |
| May 31, 2007 | Ying Jin Compositional Data Mining | | |
| June 7, 2007 | T. M. Murali Report on First Bertinoro Systems Biology Workshop (Also, shortest obstacle-avoiding paths in the plane) | | |
| June 14, 2007 | Cancelled due to ACM TKDD deadline. | | |
| June 21, 2007 | Satish Tadepalli An Efficient Method for Dynamic Analysis of Gene Regulatory Networks and in silico Gene Perturbation Experiments, RECOMB 2007. | | |
| June 28, 2007 | Matt Dyer, ISMB 2007 practice talk Computational Prediction of Host-Pathogen Protein-Protein Interactions | | |
| No meetings in July 2007 | | | |
| August 2, 2007 | Matt Dyer and T. M. Murali Report on ISMB 2007 | | |
| Spring 2007 | | | |
| Jan 25, 2007 | Division of labour | | Joe Gresock Storytelling: Creative Discovery in Biomedical Literature |
| Feb 1, 2007 | David Badger and Naveed Massjouni Data-driven modelling of signal-transduction networks | | John Gordon's M.S. thesis defence A new computationally facile analytical approximation of electrostatic potential suitable for macromolecules |
| Feb 8, 2007 | Cancelled | | GBCB seminar@VBI, Jill Sible, Dept. of Biology, VT Building a systems-level view of cell cycle checkpoints |
| Feb 9, 2007 Friday | MCBB Seminar Series@VBI, 12:20pm, Barry Whitman, University of Georgia The number and diversity of bacteria on Earth and why we care | | |
| Feb 15, 2007 | David Badger and Naveed Massjouni Data-driven modelling of signal-transduction networks | | Satish Tadepalli Clustering by preserving dependencies between datasets |
| Feb 22, 2007 | Mahima Gopalakrishnan Identification of functional modules using network topology and high-throughput data | | Cancelled |
| Mar 1, 2007 | Cancelled | | Cancelled |
| Mar 8, 2007 | Spring Break | | |
| Mar 15, 2007 | Satish Tadepalli Clustering by Passing Messages Between Data Points | | Corban Rivera Pathway Annotation Using Network-Constrained Biclustering |
| Mar 22, 2007 | Tony Xu Han CS Seminar Series, 655 McBryde Watching Humans and Detecting Their Behaviors | | Ying Jin Connections between Closed Itemsets and Minimal Separators |
| Mar 29, 2007 | Joe Gresock Predicting interactions in protein networks by completing defective cliques | | GBCB seminar@VBI, Ron Breaker Discovering Genetic Switches and Logic Gates Made of RNA |
| Apr 5, 2007 | Ying Jin Prediction of Phenotype and Gene Expression for Combinations of Mutations | | Naveed Massjouni Tree Decompositions of Molecular Interaction Networks |
| Apr 12, 2007 | Arjun Krishnan Integrative molecular concept modeling of prostate cancer progression | | GBCB seminar@VBI, Matthew D. Dyer Computational Prediction of Host-Pathogen Protein-Protein Interactions |
| Apr 15, 2007 | Concert by Blind Elephant Quintet (featuring our very own Joseph Gresock), 8pm, Squires recital salon Programme: Schubert Piano Quintet in A Major, "The Trout", Brahms Piano Quartet in C minor | | |
| Apr 19, 2007 | Deept Kumar's Ph.D. thesis defence, 2-4pm, 110 KWII The Redescription Algorithmic Framework: Applications to Bioinformatics Data | | |
| Apr 26, 2007 | T. M. Murali and Corban Rivera Report on RECOMB 2007. | | Cancelled |
| May 3, 2007 | Reading day | | |
| May 10, 2007 | Corban Rivera Refinement and expansion of signaling pathways: The osmotic response network in yeast | | |
Talk Abstracts
Graphical Models of Evolutionary Constraints in Protein Families, John Thomas, November 8, 2007
Evolutionary pressure on proteins to maintain structure and function
has constrained their sequences over time and across species. Thus,
the sequence record of these proteins contains valuable information
about the acceptable variation and covariation of amino acids in
members of a protein family. We have designed an approach to model
these evolutionary constraints using undirected graphical models.
Our graphical models of residue coupling (GMRCs) have a formal
probabilistic framework and are a strict generalization of
conservation based approaches such as hidden Markov models. Our
GMRCs can provide insight into biological activities, transparently
classify members of a protein family by functional class, design new
sequences that obey the evolutionary constraints, and predict
whether or not two members from interacting protein families will
interact. Results on PDZ domains, G-protein coupled receptors, and
WW domains show that GMRCs are able to successfully model
evolutionary constraints and assist in a wide range of problems in
bioinformatics.
The Landscape of Human Proteins Targeted by Pathogens, Matthew D. Dyer, September 13, 2007
(This research is joint work with T. M. Murali and Bruno Sobral)
Infectious diseases result in millions of deaths each year.
Mechanisms of infection for many pathogens have been studied in
detail. However, relatively unexplored are questions such as which
infection mechanisms and pathways are commonly triggered by multiple
pathogens, what the properties of the human proteins they target
are, and whether pathogens interact with certain functional classes
of human proteins. In this paper, we integrate human-pathogen
protein-protein interactions (PPIs) for 179 pathogen strains from
five public databases to provide the first study of the landscape of
human proteins interacting with pathogens. We analyze the network of
PPIs between these human proteins, and find that pathogens
specifically
interact with hubs (proteins with many interacting partners) and
bottlenecks (proteins that are central to many paths in the network).
We construct a network of PPIs between human
proteins interacting with at least two pathogens. Gene Ontology
functions enriched in this network reveal a number of processes and
complexes commonly participating in interactions with
multiple pathogens. Supplementary data is available.
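The two network notions used in the abstract above can be made concrete with a small sketch. It assumes that "hubs" are ranked by node degree and "bottlenecks" by betweenness centrality (computed here with Brandes' algorithm for unweighted, undirected graphs); the abstract does not state which measures the authors used, and the toy network `ppi` and its protein names are purely hypothetical.

```python
from collections import deque


def degree(graph):
    """Number of interaction partners of each protein (hub score)."""
    return {v: len(neighbors) for v, neighbors in graph.items()}


def betweenness(graph):
    """Betweenness centrality (bottleneck score) via Brandes' algorithm.

    Assumes an unweighted, undirected graph given as a symmetric
    adjacency dict {node: set_of_neighbors}.
    """
    score = {v: 0.0 for v in graph}
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = {v: 0 for v in graph}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in graph}
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    # Each unordered pair is counted twice in an undirected graph.
    return {v: b / 2 for v, b in score.items()}


# Hypothetical toy network: A is the hub, A and D are the bottlenecks.
ppi = {
    "A": {"B", "C", "D"},
    "B": {"A"},
    "C": {"A"},
    "D": {"A", "E"},
    "E": {"D"},
}
hub_scores = degree(ppi)
bottleneck_scores = betweenness(ppi)
```

Sorting proteins by `hub_scores` or `bottleneck_scores` and checking where pathogen-targeted proteins fall in that ranking is one simple way to explore the kind of question the abstract raises.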
Capturing Truthiness: Mining Truth Tables in Binary Datasets, Clifford Conley Owens, September 6, 2007
(This research is joint work with T. M. Murali and Naren Ramakrishnan.) We
introduce a new data mining problem, namely that of mining truth
tables in binary datasets. Given a matrix of objects and the
properties they satisfy, a truth table identifies a subset of
properties that exhibit maximal variability (and hence, complete
independence) in occurrence patterns over the underlying objects.
This problem is relevant in many domains, e.g., bioinformatics
where we seek to identify and model independent components of
combinatorial regulatory pathways, and in social/economic
demographics where we desire to determine independent behavioral
attributes of populations. Besides intrinsic interest in such
patterns, we show how the problem of mining truth tables is dual
to the problem of mining redescriptions, in that a set of
properties involved in a truth table cannot participate in any
possible redescription. This allows us to adapt our algorithm to
the problem of mining redescriptions as well, by first identifying
regions where redescriptions cannot happen, and then pursuing a
divide and conquer strategy around these regions. Furthermore, our
work suggests dual mining strategies where both classes of
algorithms can be brought to bear upon either data mining task. We
outline a family of levelwise approaches adapted to mining truth
tables, algorithmic optimizations, and applications to
bioinformatics and political datasets.
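As an illustration of the definition in the abstract above, the following sketch reads "maximal variability" as meaning that a set of k properties shows all 2^k possible 0/1 patterns across the objects. That reading, the brute-force search, and the toy matrix `data` are assumptions made for illustration; this is not the authors' levelwise algorithm.

```python
from itertools import combinations


def is_truth_table(rows, props):
    """True if the chosen properties show all 2^k patterns over the objects."""
    seen = {tuple(row[p] for p in props) for row in rows}
    return len(seen) == 2 ** len(props)


def mine_truth_tables(rows, num_props, size):
    """Brute force: every property subset of the given size that is a truth table."""
    return [
        subset
        for subset in combinations(range(num_props), size)
        if is_truth_table(rows, subset)
    ]


# Hypothetical dataset: 4 objects (rows) by 3 binary properties (columns).
data = [
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 1),
    (1, 1, 0),
]
pairs = mine_truth_tables(data, num_props=3, size=2)
triples = mine_truth_tables(data, num_props=3, size=3)
```

Under this reading, every subset of a truth table is itself a truth table, which is the kind of property a levelwise search can exploit: candidate sets of size k+1 only need to be built from truth tables of size k.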
Computational Prediction of Host-Pathogen Protein-Protein Interactions, Matthew D. Dyer, April 12, 2007, GBCB seminar@VBI.
Infectious diseases such as malaria result in millions of deaths
each year. An important aspect of any host-pathogen system is the
mechanism by which a pathogen can infect its host. One method of
infection is via protein-protein interactions (PPIs) where
pathogen proteins target host proteins. Computational methods for
predicting PPIs have been applied only to proteins from the same
species. Developing computational methods that identify which PPIs
enable a pathogen to infect a host has great implications in
identifying potential targets for therapeutics.
We present a method that integrates known intra-species PPIs with
protein domain profiles to predict PPIs between host and pathogen
proteins. Given a set of intra-species PPIs, we identify the
functional domains in each of the interacting proteins. For every
pair of functional domains, we use Bayesian statistics to assess
the probability that two proteins with that pair of domains will
interact. We apply our method to the Homo sapiens - Plasmodium
falciparum host-pathogen system. Our system predicts 516 PPIs
between proteins from these two organisms. We show that human
protein pairs we predict to interact with the same Plasmodium
protein are close to each other in the human PPI network and that
Plasmodium pairs predicted to interact with the same human protein are
co-expressed in DNA microarray datasets measured during various
stages of the Plasmodium life cycle. Finally, we identify
functionally enriched sub-networks spanned by the predicted
interactions and discuss the plausibility of our predictions.
Tree Decompositions of Molecular Interaction Networks, Naveed Massjouni, April 5, 2007
Graph theory is a natural way to approach many problems in the
field of computational biology, especially in the context of
molecular interaction networks. A number of optimisation problems
involving such networks are intractable. We know that many hard
graph problems can be solved in linear or polynomial time if there
exists a tree decomposition of the graph with small treewidth.
Tree decompositions may be a natural way to model molecular
interaction networks and may provide new and useful
representations of these networks. In this presentation, I will
introduce the concepts of tree decompositions and the related
concept of treewidth. Since computing the treewidth of a graph is
also intractable, I will describe some heuristics to estimate
upper and lower bounds on the treewidth and present results on
human and yeast molecular interaction networks. I will also
discuss some potential ways in which this powerful tool may be
utilized to study interaction networks.
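One well-known heuristic for an upper bound on treewidth is min-degree elimination: repeatedly remove a vertex of minimum degree, connect its remaining neighbours into a clique, and record the largest neighbourhood seen. The sketch below shows that heuristic only as an example of the kind of bound the abstract mentions; the talk does not say which heuristics were actually used, and the function name and toy graphs are hypothetical.

```python
def treewidth_upper_bound(adjacency):
    """Upper bound on treewidth via the min-degree elimination heuristic.

    `adjacency` is a symmetric dict {node: set_of_neighbors} describing an
    undirected graph. The input is copied, not modified.
    """
    graph = {v: set(neighbors) for v, neighbors in adjacency.items()}
    width = 0
    while graph:
        # Eliminate a vertex with the fewest remaining neighbours.
        v = min(graph, key=lambda u: len(graph[u]))
        neighbors = graph.pop(v)
        width = max(width, len(neighbors))
        # Turn the neighbours of v into a clique (fill-in edges).
        for a in neighbors:
            graph[a].discard(v)
            graph[a] |= neighbors - {a}
    return width


# Hypothetical toy graphs.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
cycle = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "a"}}
bound_path = treewidth_upper_bound(path)
bound_cycle = treewidth_upper_bound(cycle)
```

The elimination order produced this way also defines a tree decomposition whose largest bag has `width + 1` vertices, which is why the returned value is a valid upper bound rather than the exact treewidth.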
Discovering Genetic Switches and Logic Gates Made of RNA, Ron Breaker, Yale University, March 29, 2007, GBCB seminar@VBI
Riboswitches are natural metabolite-binding RNAs that control gene
expression in many bacterial species. They typically are found in
the non-coding regions of certain messenger RNAs, and control gene
expression by ligand-induced allosteric changes in RNA structure.
The most highly conserved portion of riboswitches is the
metabolite-binding aptamer domain, whose sequence and structural
features define each riboswitch class. Of the proven riboswitch
classes, the shortest aptamer domain measures only 34 nucleotides
while the largest requires approximately 200 nucleotides.
Interestingly, even the smaller riboswitch domains appear to be
capable of functioning as complex genetic switches. For example,
we have identified a riboswitch that uses two ligand-binding
domains to bind two glycine molecules. Together, these domains
cooperate to function as a more "digital" genetic switch, thus
allowing the organism to control gene expression in response to
small changes in the concentration of glycine. More recently, we
have identified natural RNAs that use components of different
riboswitch classes to function like two-input Boolean logic gates.
Our findings support the view that RNA is a versatile medium for
the construction of complex genetic elements.
Connections between Closed Itemsets and Minimal Separators, Ying Jin, Mar 29, 2007
In this presentation, I am going to present the relationship
between closed itemsets and minimal separators. Each element of a
concept lattice can be described as a closed itemset. Given a
binary relationship, we can define a concept lattice and an
underlying graph, such that each element of the concept lattice
maps to a minimal separator of the graph. The connections between
closed itemsets and minimal separators could potentially be used to
analyze and understand biological networks.
Watching Humans and Detecting Their Behaviors, Tony Xu Han, Beckman Institute and ECE Department, University of Illinois at Urbana-Champaign, Department of Computer Science Seminar Series, 655 McBryde, 2pm, March 22, 2007
Because of their broad and important applications, robust
detection and tracking of humans have drawn attention from
researchers in the computer vision, machine learning, and multimedia
retrieval communities. However, robust human detection and tracking
are extremely difficult owing to occlusion, illumination
variation, the many degrees of freedom (usually 20-68) of
articulated body movement, and clothing. Hence, no existing
approach or model can achieve robust detection and tracking.
We propose a fusion framework to integrate multiple cues by finding a set of optimal dynamic weights for different tracking modalities. In the setup of Bayesian sequential estimation, we give an optimal criterion to find the dynamic weight for each modality. The fusion problem is then formulated as an optimization problem with a non-convex objective function. We further frame the optimization problem as a constrained convex programming problem. The equations for finding the global optimal solution are given and an approximate analytical solution is derived. The fusion framework can determine reliable cues and apply them dynamically. With the proposed framework, we achieve robust human tracking in various challenging surveillance video sequences.
An efficient Nonparametric Belief Propagation (NBP) algorithm is also proposed, which can be applied to many tasks such as articulated human tracking, image segmentation, image denoising, and superresolution.
We also extend the HMAX model of the visual cortex for recognizing objects in still images to a space-time version to detect human behaviors in video. We introduce a coarse-to-fine search and verification scheme in the space-time HMAX model for behavior matching. Our strategy enables the searching of human behaviors with motions ranging from drastic movements (e.g. sports) to small movements as subtle as facial expressions. The convincing results validate that the space-time HMAX model is both selective and robust for searching human behaviors.
The recognition and learning algorithms we present, including dynamic fusion, efficient Nonparametric Belief Propagation, and the space-time HMAX model, are general tools, which can be applied to many other tasks such as medical image analysis, human animation, information retrieval, bioinformatics, and robotics.
Pathway Annotation Using Network-Constrained Biclustering, Corban Rivera, Mar 15, 2007
New molecular interactions are being uncovered at an alarming rate. Whole genome biological assays such as yeast 2-hybrid and coimmunoprecipitation are revealing many previously unknown molecular interactions. The purpose of many of these interactions remains unknown. We would like to discover the pathways to which these interactions belong. We have developed a novel method for annotating interactions with pathways. We compare our method to alternate interaction annotation methods. Finally, we examine a few of the novel annotations we predict for interactions between human proteins.
Clustering by preserving dependencies between datasets, Satish Tadepalli, Feb 15, 2007
In this talk I am going to describe algorithms that can simultaneously find clusters in different data sets. The resultant clusters in each data set contain not only related data points but also preserve the dependencies between the data sets. The simplest case is when we have paired data samples and one-to-one relationships between the two data sets. I will describe our implementation of this case and the extensions we propose when we have multiple data sets and many-to-many relationships between the data sets that need to be preserved.
Building a Systems-level view of cell cycle checkpoints, Jill Sible, GBCB seminar, Feb 8, 2007
Because the cell cycle underlies the growth and development of all eukaryotes, and misregulation of the cell cycle typifies cancers, achieving a systems-level understanding of how the cell cycle is controlled ranks among the most important goals in modern cell biology. We have paired experimental biology with mathematical modeling to discover that mitotic transitions are regulated by hysteresis and bistability. We are building on this foundation to address how the cell cycle is affected by external events, in particular those that threaten the integrity of the genome. Checkpoints arrest the cell cycle when a threat to genomic stability, such as unreplicated or damaged DNA, exists. Loss of checkpoint control characterizes nearly all cancers. We are addressing checkpoint control in three systems: 1) the Xenopus laevis cell-free egg extract, a highly tractable experimental system where we can gather quantitative data to inform the model, 2) the developing X. laevis embryo where the checkpoint response is dynamic as the cell cycle remodels, and 3) wild populations of amphibian embryos where varying checkpoint responses may contribute to fitness in response to UV radiation. These studies have revealed a high level of plasticity in the checkpoint response spanning the cellular through the ecosystem levels.
Storytelling: Creative Discovery in Biomedical Literature, Joe Gresock, Jan 25, 2007
We have developed a method of finding pathways between documents and concepts among biomedical papers. My implementation of the process, called storytelling, applies a single-pair-shortest-path algorithm to a similarity graph of a subset of the documents in the PubMed database. Each document is represented as a weighted set of terms. The graph contains each document as a node and each edge as a connection between two "similar" documents as determined by a weighted Jaccard's coefficient between their terms. I will describe my implementation in more detail as well as its application in a case study. Finally, I will present some of the results of the case study both graphically and in the form of abstract, sentence, and keyword chains.
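The storytelling pipeline described above can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: it assumes the weighted Jaccard coefficient is the sum of per-term minimum weights divided by the sum of per-term maximum weights, that two documents are linked when that score reaches a chosen threshold, and that the shortest path is measured in number of hops (breadth-first search). The documents, term weights, and threshold are hypothetical.

```python
from collections import deque


def weighted_jaccard(a, b):
    """Weighted Jaccard coefficient of two {term: weight} dicts."""
    terms = set(a) | set(b)
    shared = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    total = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return shared / total if total else 0.0


def similarity_graph(docs, threshold):
    """Link every pair of documents whose similarity reaches the threshold."""
    ids = list(docs)
    graph = {d: set() for d in ids}
    for i, u in enumerate(ids):
        for v in ids[i + 1:]:
            if weighted_jaccard(docs[u], docs[v]) >= threshold:
                graph[u].add(v)
                graph[v].add(u)
    return graph


def story(graph, start, goal):
    """Shortest chain of documents from start to goal, or None if unreachable."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            chain = []
            while node is not None:
                chain.append(node)
                node = parent[node]
            return chain[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical documents with made-up term weights.
docs = {
    "d1": {"kinase": 2.0, "yeast": 1.0},
    "d2": {"kinase": 1.0, "yeast": 1.0, "stress": 1.0},
    "d3": {"stress": 2.0, "osmotic": 1.0},
}
graph = similarity_graph(docs, threshold=0.15)
chain = story(graph, "d1", "d3")
```

Here `d1` and `d3` share no terms, so the only story between them runs through `d2`. Raising the threshold removes weaker links and can make a story disappear entirely, which is the main tuning decision in this kind of setup.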
Papers to be Presented
- Our first list of papers is from the Nature web focus on systems biology.
- Nature's essays in the Connections series (it started on Jan 24, 2007) also seem to be interesting.
- I maintain a loosely-organised list of papers I want to read.
- We have local copies of all papers published in RECOMB 2007 and in ISMB 2007.
- Molecular Systems Biology has a blog.
- A multidimensional analysis of genes mutated in breast and colorectal cancers Jimmy Lin, Christine M. Gan, Xiaosong Zhang, Sian Jones, Tobias Sjoblom, Laura D. Wood, D. Williams Parsons, Nickolas Papadopoulos, Kenneth W. Kinzler, Bert Vogelstein, Giovanni Parmigiani, and Victor E. Velculescu Genome Res. published 10 August 2007, 10.1101/gr.6431107
- A strategy for predicting the chemosensitivity of human cancers and its application to drug discovery Jae K. Lee, Dmytro M. Havaleshko, HyungJun Cho, John N. Weinstein, Eric P. Kaldjian, John Karpovich, Andrew Grimshaw, and Dan Theodorescu
- Real and artificial immune systems: computing the state of the body Irun R. Cohen What is immune computing? Can the immune system compute? Does it use a computational strategy to function? In this Opinion article, Irun Cohen proposes that the answer to these questions is yes, and applies these ideas to different types of immunity.
- Quantitative Morphological Signatures Define Local Signaling Networks Regulating Cell Morphology Chris Bakal, John Aach, George Church, and Norbert Perrimon Science 22 June 2007: Vol. 316. no. 5832, pp. 1753 - 1756
- Chromosomal periodicity of evolutionarily conserved gene pairs Matthew A. Wright, Peter Kharchenko, George M. Church, and Daniel Segrè, PNAS, 19 June 2007; Vol. 104, No. 25
- Molecular Systems Biology has published papers from the Second Annual RECOMB Workshop on Systems Biology.
- Multiple High-Throughput Analyses Monitor the Response of E. coli to Perturbations, Ishii et al, Science, Volume 316, Issue 5824, p. 593, Apr 27, 2007
- Fast Routing in Road Networks with Transit Nodes Holger Bast, Stefan Funke, Peter Sanders, and Dominik Schultes Science, Volume 316, Issue 5824, p. 566, Apr 27, 2007
- Increasing prose quality by decreasing word repetition, Cheryl Strauss, Nature 12 April 2007 Volume 446 Number 7137 p725, 10.1038/446725c
- Network-based prediction of protein function, Roded Sharan, Igor Ulitsky & Ron Shamir, Molecular Systems Biology 3:88
- From the Cover: Emergence of tempered preferential attachment from optimization, Raissa M. D'Souza, Christian Borgs, Jennifer T. Chayes, Noam Berger, and Robert D. Kleinberg, PNAS, 10 April 2007; Vol. 104, No. 15
- Discovering transcriptional regulatory regions in Drosophila by a nonalignment method for phylogenetic footprinting, Alona Sosinsky, Barry Honig, Richard S. Mann, and Andrea Califano, PNAS, 10 April 2007; Vol. 104, No. 15
- Metagene projection for cross-platform, cross-species characterization of global transcriptional states, Pablo Tamayo, Daniel Scanfeld, Benjamin L. Ebert, Michael A. Gillette, Charles W. M. Roberts, and Jill P. Mesirov, Proceedings of the National Academy of Sciences, 3 April 2007; Vol. 104, No. 14
- Clustering by Passing Messages Between Data Points, Brendan J. Frey and Delbert Dueck, Science, Volume 315, Issue 5814, p. 972, February 16 2007.
- High-Throughput Oncogene Mutation Profiling in Human Cancer, Nature Genetics 39, 347 - 351 (2007).
- An Integrated Mass Spectrometric and Computational Framework for the Analysis of Protein Interaction Networks, Nature Biotechnology 25, 345 - 352 (2007).
Journal Assignments
Each student has a journal or two assigned to him/her. The easiest way to follow journals is to sign up for email delivery of tables of contents. The student is responsible for keeping track of the papers published and suggesting interesting and/or relevant papers to Murali for scheduling. For each relevant paper, please send the title, authors, journal issue information, and a Pubmed ID, if available. Also send a few keywords to help us organise the papers. Keywords to use will become apparent as the papers section of this webpage becomes fleshed out.
| Journal | Frequency | Student |
|---|---|---|
| Algorithms in Molecular Biology | Online | Ying Jin |
| Bioinformatics | Fortnightly | Sheng Guo, Satish Tadepalli |
| Cell | Monthly | |
| Biology Direct | Online | |
| BMC Bioinformatics | Monthly | David Badger, Naveed Massjouni |
| BMC Systems Biology | Monthly | David Badger, Mahima Gopalakrishnan |
| Genome Biology | Monthly | Naveed Massjouni, Corban Rivera |
| Genome Research | Monthly | Ying Jin, Clifford Owens |
| IEEE TKDE | Monthly | Clifford Owens, Mahima Gopalakrishnan |
| IEEE TCBB | Monthly | Mahima Gopalakrishnan, |
| Journal of Biology | Monthly | |
| Journal of Computational Biology | Monthly | Mahima Gopalakrishnan, Ying Jin |
| Nature | Weekly | Mahima Gopalakrishnan, Corban Rivera |
| Nature Genetics | Monthly | Sheng Guo, Corban Rivera |
| Nature Biotechnology | Monthly | Sheng Guo, Corban Rivera |
| Nature Reviews Cancer | Monthly | T. M. Murali |
| Nature Reviews Genetics | Monthly | T. M. Murali |
| Nature Reviews Molecular Cell Biology | Monthly | T. M. Murali |
| Molecular Systems Biology | Online | Naveed Massjouni, Satish Tadepalli |
| Nucleic Acids Research | Monthly | |
| Nucleic Acids Research database issue | Jan 1 each year | Everyone |
| Nucleic Acids Research web server issue | July 1 each year | Everyone |
| OMICS | Monthly | Clifford Owens |
| PLoS Biology | Monthly | David Badger, Matt Dyer |
| PLoS Computational Biology | Monthly | Sheng Guo, Satish Tadepalli |
| PLoS One | Online | Naren Ramakrishnan |
| PNAS | Weekly | David Badger |
| Science | Weekly | Matt Dyer, Satish Tadepalli |
- RECOMB
- ISMB
- ICSB
- PSB
- KDD
- ICDM
- ICML
- NIPS