REFBSS: Reference Based Similarity Search in Biological Network Databases
Abstract
Biological networks, usually abstracted as graphs, are central to many important activities inside the cell. Similarity-based analysis is one technique for understanding the role of a query network: a database of biological networks is aligned with the query network, and the networks whose similarity scores lie above a predefined cutoff value are separated from those below it. Because computing a nontrivial similarity score involves the NP-complete subgraph isomorphism problem, it is computationally very expensive, and several methods have been proposed in the literature to obtain acceptable solutions. Reference-based indexing is one popular approach: it indexes the network database by extracting small networks as references to be aligned with the query network. Building on this strategy, we propose a novel model with methodological and heuristic improvements for approximate similarity search, which prove to be both fast and accurate. We also provide a high-performance implementation on Hadoop that achieved an 11.42-fold speedup on an 18-core Hadoop cluster using a sample KEGG network database.
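To illustrate the general idea behind reference-based indexing, the following is a minimal sketch of generic reference (pivot) filtering, not the RefBSS model itself. It assumes the comparison is a metric distance, so the triangle inequality bounds the query-to-network distance using only precomputed distances to the references; a network is pruned when the lower bound exceeds the cutoff, accepted when the upper bound is within it, and compared exactly only otherwise. The Jaccard distance over labeled edge sets is a hypothetical, cheap stand-in for the expensive alignment score, and all names (`jaccard_distance`, `build_index`, `range_search`, the toy networks) are illustrative assumptions.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Edge = Tuple[str, str]
Network = FrozenSet[Edge]  # assumption: a network is a set of labeled directed edges
Dist = Callable[[Network, Network], float]


def jaccard_distance(a: Network, b: Network) -> float:
    """Hypothetical stand-in metric: 1 - |A & B| / |A | B| over edge sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def build_index(db: Dict[str, Network], refs: List[Network], dist: Dist) -> Dict[str, List[float]]:
    """Offline step: distance from every database network to every reference."""
    return {name: [dist(g, r) for r in refs] for name, g in db.items()}


def range_search(query: Network, db: Dict[str, Network], refs: List[Network],
                 index: Dict[str, List[float]], cutoff: float,
                 dist: Dist) -> Tuple[List[str], int]:
    """Return networks with dist(query, g) <= cutoff and the number of exact comparisons."""
    q_to_refs = [dist(query, r) for r in refs]  # only len(refs) online comparisons
    hits: List[str] = []
    exact_calls = 0
    for name, g_to_refs in index.items():
        # Triangle inequality: |d(q,r) - d(g,r)| <= d(q,g) <= d(q,r) + d(g,r)
        lower = max(abs(qr - gr) for qr, gr in zip(q_to_refs, g_to_refs))
        upper = min(qr + gr for qr, gr in zip(q_to_refs, g_to_refs))
        if lower > cutoff:
            continue  # pruned: cannot be within the cutoff
        if upper <= cutoff:
            hits.append(name)  # accepted without an exact comparison
            continue
        exact_calls += 1  # bounds inconclusive: fall back to the exact distance
        if dist(query, db[name]) <= cutoff:
            hits.append(name)
    return sorted(hits), exact_calls


# Toy data (hypothetical): two references and four database networks.
refs = [frozenset({("A", "B"), ("B", "C")}), frozenset({("X", "Y"), ("Y", "Z")})]
db = {
    "g1": frozenset({("A", "B"), ("B", "C")}),
    "g2": frozenset({("A", "B"), ("B", "C"), ("C", "D")}),
    "g3": frozenset({("X", "Y"), ("Y", "Z")}),
    "g4": frozenset({("A", "B"), ("C", "D"), ("D", "E")}),
}
query = frozenset({("A", "B"), ("B", "C"), ("C", "D")})
index = build_index(db, refs, jaccard_distance)
hits, exact_calls = range_search(query, db, refs, index, 0.4, jaccard_distance)
```

On this toy data, `g1` is accepted from its bounds alone, `g3` and `g4` are pruned, and only `g2` needs an exact comparison. The saving grows with database size, because each query is compared exactly only with the references and the inconclusive candidates.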