SCOP Family Fingerprints: An Information Theoretic Approach to Structural Classification of Protein Domains
Casagrande, A;
2011-01-01
Abstract
Protein domain classification is a useful instrument for deducing the functional properties of proteins. Several databases collect domains of known structure, and SCOP is probably the most widely used. It classifies domains in a four-level hierarchy, grouping sequences according to both structural similarity and phylogenetic relationship. Many automatic tools have been proposed to classify domains according to the available databases. In this paper we introduce the notion of "fingerprint" as an easy and readable digest of the similarities between a sequence and an entire set of sequences. This concept offers a rationale for building an automatic SCOP classifier that assigns a query sequence to the most likely family. Fingerprint-based analysis has been implemented in a software tool, and we report some experimental validations of it.