Abstract:
Protein complexes are groups of interacting proteins that are central to multiple
biological processes. Studying protein complexes as well as their constituents can
enhance our understanding of cellular functions and malfunctions, and thus lead to
the development of more effective cures for diseases. High-throughput experimental
techniques allow the generation of large-scale protein-protein interaction datasets.
Accordingly, various computational approaches have been proposed to predict protein
complexes from protein-protein interaction networks in which nodes and edges represent
proteins and their interactions, respectively. State-of-the-art approaches mainly
rely on clustering static networks to identify complexes. However, since protein interactions
are highly dynamic in nature, recent approaches seek to model these dynamics, typically by
integrating gene expression data, and to identify protein complexes accordingly.
We propose MComplex, a method that combines time-series gene expression data
with interaction data to generate a temporal network, which is passed to a generative
adversarial network that uses a graph convolutional network as its generator. The
resulting embeddings are then analyzed using a modified graph-based version
of the Mapper algorithm to detect the corresponding protein complexes. We test our
approach on multiple benchmark datasets and compare identified complexes against
gold-standard protein complex datasets. Our results show that MComplex outperforms
existing methods on several evaluation measures, namely recall and sensitivity,
as well as on a composite score that aggregates multiple evaluation measures.
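
As a rough illustration of the temporal-network step described above, the following minimal sketch derives one network snapshot per time point from a static interaction list and time-series expression values. The activity rule used here (a protein counts as active when its expression is at or above its own mean over the series) is an assumption chosen for illustration, not necessarily the rule MComplex uses, and the names `active_threshold` and `build_temporal_network` as well as the toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def active_threshold(series: List[float]) -> float:
    """Hypothetical activity cutoff: the mean of a protein's expression series.

    This is an illustrative assumption; other cutoffs could be substituted.
    """
    return sum(series) / len(series)


def build_temporal_network(
    static_edges: List[Edge],
    expression: Dict[str, List[float]],
) -> List[Set[Edge]]:
    """Return one edge set (snapshot) per time point.

    An interaction (u, v) from the static network is kept at time t only
    if both u and v are active at t, i.e. their expression at t is at or
    above their own cutoff. Proteins without expression data are never
    active, so their interactions are dropped from every snapshot.
    """
    n_steps = len(next(iter(expression.values())))
    cutoffs = {p: active_threshold(s) for p, s in expression.items()}

    snapshots: List[Set[Edge]] = []
    for t in range(n_steps):
        active = {p for p, s in expression.items() if s[t] >= cutoffs[p]}
        snapshots.append(
            {(u, v) for (u, v) in static_edges if u in active and v in active}
        )
    return snapshots


# Toy data (hypothetical): three proteins, three time points.
static_edges = [("A", "B"), ("B", "C"), ("A", "C")]
expression = {
    "A": [5.0, 1.0, 5.0],
    "B": [4.0, 4.0, 0.5],
    "C": [0.2, 3.0, 3.0],
}
snapshots = build_temporal_network(static_edges, expression)
```

With this toy input, each snapshot retains a different single interaction, which shows how a static network can hide time-dependent structure. In a full pipeline, the snapshots would be the input to the embedding stage.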