Protein classification - Quick Links
Protein classification - Web Resources
Automatic classification of SWISS-PROT and TrEMBL proteins. The CluSTr (Clusters of SWISS-PROT and TrEMBL proteins) database offers an automatic classification of SWISS-PROT and TrEMBL proteins into groups of related proteins. The clustering is based on an analysis of all pairwise comparisons between protein sequences. The analysis has been carried out at different levels of protein similarity, yielding a hierarchical organisation of clusters. The database provides links to InterPro, which integrates information on protein families, domains and functional sites from PROSITE, PRINTS, Pfam, ProDom and SMART. Links to the InterPro graphical interface allow users to see at a glance whether proteins from a cluster share particular functional sites. CluSTr also provides cross-references to HSSP and PDB.
Phylogenetic classification of proteins from 44 complete genomes. The database of Clusters of Orthologous Groups of proteins (COGs) is an attempt at a phylogenetic classification of the proteins encoded in complete genomes. Each COG includes proteins that are inferred to be orthologs (direct evolutionary counterparts). The current release consists of 3166 COGs, which include 75725 proteins from 33 bacterial genomes, 9 archaeal genomes and two genomes of unicellular eukaryotes, the yeasts Saccharomyces cerevisiae and Candida albicans. The COG database is updated periodically as new genomes become available. The COGs can be used for functional annotation of newly sequenced genomes with the COGNITOR program, which is available on the COG front page.
Web-based support vector machine (SVM) software for classifying a protein into a functional family from its primary sequence. The SVMProt classification system is trained on representative proteins from a number of functional families and on seed proteins of Pfam curated protein families. It currently covers 54 functional families, and additional families will be added in the near future. The computed accuracy for protein family classification is in the range of 69.1–99.6%. SVMProt shows some capability for classifying distantly related proteins and homologous proteins of different function, and thus may be used as a protein function prediction tool that complements sequence alignment methods.<pubmed argument="SVM-Prot">12824396</pubmed>
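Before an SVM can classify a protein from its primary sequence, the sequence has to be turned into a fixed-length numeric vector. The sketch below shows one simple encoding, amino-acid composition. It is an illustrative assumption only: the text above does not say which descriptors SVMProt uses, and the function name `composition` is hypothetical.

```python
# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(sequence):
    """Return a 20-element vector of amino-acid frequencies.

    Non-standard characters (X, B, Z, gaps, ...) are ignored.
    This encoding is an assumption for illustration, not SVMProt's own.
    """
    counts = {aa: 0 for aa in AMINO_ACIDS}
    n = 0
    for ch in sequence.upper():
        if ch in counts:
            counts[ch] += 1
            n += 1
    if n == 0:
        raise ValueError("no standard amino acids in sequence")
    return [counts[aa] / n for aa in AMINO_ACIDS]

vec = composition("MKKLA")
```

Vectors like `vec` can be fed to any SVM implementation together with family labels; sequences of different lengths all map to the same 20 dimensions.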
Annotation of Domains (AnDom) helps to assign structural domains to protein sequences and to classify them according to SCOP. <pubmed argument="AnDom">11911710</pubmed>
The SYSTERS (SYSTEmatic Re-Searching) search algorithm is based on all-against-all runs of a traditional database search tool such as BLAST. To circumvent problems caused by asymmetric search results, a local alignment is computed with LALIGN for each pair of sequences identified by a database search, and a symmetric E-value is recomputed from it.
The clustering is done in two steps. First, a database search is performed for every sequence in the database and the search results are recomputed as described above. Then a single linkage tree is constructed and superfamilies are derived from it. These superfamilies are then split into family clusters.
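The two steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the SYSTERS implementation: directional hits are made symmetric by keeping the larger E-value of the two directions (SYSTERS recomputes the value from a LALIGN alignment), and the split into families is imitated by applying a stricter cutoff. The function names and cutoffs are hypothetical.

```python
from collections import defaultdict

def symmetric_scores(hits):
    """Collapse directional hits {(query, subject): evalue} to one value per pair.

    Assumption: keep the larger (more conservative) E-value of the two directions.
    """
    sym = {}
    for (a, b), e in hits.items():
        if a == b:
            continue  # self-hits carry no clustering information
        key = tuple(sorted((a, b)))
        sym[key] = max(sym.get(key, 0.0), e)
    return sym

def single_linkage(sequences, sym, cutoff):
    """Join sequences linked by any chain of pairs with E-value <= cutoff."""
    parent = {s: s for s in sequences}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), e in sym.items():
        if e <= cutoff:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for s in sequences:
        clusters[find(s)].add(s)
    return sorted(sorted(c) for c in clusters.values())

# Toy search results (made-up E-values).
hits = {("A", "B"): 1e-30, ("B", "A"): 1e-20, ("B", "C"): 1e-12, ("C", "D"): 1e-3}
sequences = ["A", "B", "C", "D", "E"]
sym = symmetric_scores(hits)
superfamilies = single_linkage(sequences, sym, cutoff=1e-2)
families = single_linkage(sequences, sym, cutoff=1e-15)
```

With the permissive cutoff, A to D chain into one superfamily even though A and D were never matched directly, which is the defining behaviour of single linkage; the stricter cutoff leaves only A and B together.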
ProtoNet: this release features a major database upgrade and improved tools for analysis and visualization of the ProtoNet hierarchy.
The SUPERFAMILY database provides structural assignments to protein sequences and a framework for analysis of the results. At the core of the database is a library of profile Hidden Markov Models that represent all proteins of known structure. The library is based on the SCOP classification of proteins: each model corresponds to a SCOP domain and aims to represent an entire superfamily. We have applied the library to predicted proteins from all completely sequenced genomes (currently 154), the Swiss-Prot and TrEMBL databases and other sequence collections. Close to 60% of all proteins have at least one match, and one half of all residues are covered by assignments.<pubmed argument="ProtoNet v4.0">14681402</pubmed>
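The two coverage figures quoted above (proteins with at least one match, and residues covered by assignments) can be computed from a table of domain assignments along these lines. This is a sketch with hypothetical data structures and made-up numbers, not SUPERFAMILY's own code.

```python
def coverage(proteins, assignments):
    """Return (fraction of proteins matched, fraction of residues covered).

    proteins:    {protein_id: sequence length}
    assignments: {protein_id: [(start, end), ...]}, 1-based inclusive ranges
    Overlapping ranges are counted once; ranges are clipped to the sequence length.
    """
    matched = 0
    covered = 0
    total = sum(proteins.values())
    for pid, length in proteins.items():
        regions = sorted(assignments.get(pid, []))
        if regions:
            matched += 1
        last_end = 0  # highest residue position already counted
        for start, end in regions:
            start = max(start, last_end + 1)
            end = min(end, length)
            if end >= start:
                covered += end - start + 1
                last_end = end
    return matched / len(proteins), covered / total

# Toy example: three proteins, two of which have domain assignments.
proteins = {"P1": 100, "P2": 200, "P3": 100}
assignments = {"P1": [(1, 60), (50, 80)], "P2": [(101, 220)]}
protein_fraction, residue_fraction = coverage(proteins, assignments)
```

In this toy input two of three proteins are matched, and 180 of 400 residues are covered once the overlap in P1 and the overhang in P2 are accounted for.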
TIGRFAMs Home Page
TIGRFAMs are protein families based on Hidden Markov Models (HMMs). Use this page to see the curated seed alignment for each TIGRFAM, the full alignment of all family members and the cutoff scores for ...
ProtoMap - automatic hierarchical classification of proteins
ProtoMap+ is an automatic hierarchical classification of all SWISSPROT and TrEMBL proteins. Release 3.0. Total of 365174 proteins. Access/Search the hierarchy of clusters. Classify your new protein sequence. Introduction. Guided tour. ...
Protein Family Identification with Structure Anchored HMMs (FISH)
Accurate in identifying the family membership of domains in a query protein sequence, even in the case of very low sequence identities to known homologues.
Protein classification benchmark collections (for developers)
A Protein Classification Benchmark collection for machine learning was created to provide standard datasets on which the performance of machine learning methods for structural and functional annotation of proteins can be compared. It is primarily meant for method developers and for users interested in comparing methods under standardized conditions. The collection contains datasets of sequences and structures, and each set is subdivided into positive/negative and training/test sets in several ways. It includes 6405 classification tasks, 3297 on protein sequences, 3095 on protein structures and 10 on protein coding regions in DNA. Typical tasks include the classification of structural domains in the SCOP and CATH databases based on their sequences or structures, as well as various functional and taxonomic classification problems. <pubmed argument="Protein Classification Benchmark">17142240</pubmed>
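To compare methods on one such task, a classifier is typically scored on the positive and negative test examples and the result is summarised in a single number. The sketch below uses the area under the ROC curve as that number; this choice of metric is an assumption for illustration (the description above does not prescribe one), and the scores are made up.

```python
def roc_auc(pos_scores, neg_scores):
    """Area under the ROC curve for one classification task.

    Equals the probability that a randomly chosen positive test example
    scores higher than a randomly chosen negative one (ties count half).
    """
    if not pos_scores or not neg_scores:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical classifier scores on the test part of one task.
positives = [0.9, 0.8, 0.4]
negatives = [0.7, 0.3, 0.4, 0.1]
auc = roc_auc(positives, negatives)
```

A value of 1.0 means every positive outranks every negative, and 0.5 corresponds to random ranking. The pairwise loop is quadratic, which is fine for a sketch; a rank-based formulation would scale better on large test sets.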