Protein databases

Proteome databases

Protein databases - Web Resources

ExPASy - UniProt Knowledgebase: Swiss-Prot and TrEMBL
Provides a high level of annotation, minimal level of redundancy and high level of integration with other databases.
ExPASy proteomic server - most of the protein sequences are inferred from the gene nucleotide sequence.

PDB Protein Data Bank
Single worldwide repository for processing and distribution of 3-D biological macromolecular structure data

PIR (Protein Information Resource)
Contains curated protein families, classification-driven rules for the propagation of position-specific features, protein names, and GO terms to protein entries, as well as bibliographic attribution of experimental features.
Protein Information Database including direct sequencing data.

The PEDANT genome database provides exhaustive automatic analysis of genomic sequences from 468 genomes using a large number of tools. Its GUI includes DNA and protein viewers, and the data are also accessible through Web Services. Pre-computed analyses of protein function are available, including BLAST searches and motif searches against public databases. Cellular roles and functions are predicted from BLAST searches against protein sequences with manually assigned functional categories (FunCat). Protein structure analysis includes similarity-based identification of known 3D structures and structural domains by searching against PDB and SCOP, and prediction of transmembrane regions, low similarity regions and non-globular domains. (1996-2006) <pubmed argument="Pedant">17148486</pubmed>

Graphical integration of annotation for sets of proteins. PANDORA 1.1, developed at The Hebrew University of Jerusalem, lets users search any non-uniform set of proteins to detect subsets that share unique biological properties. PANDORA supports integration of many annotation sources. It is integrated into the ProtoNet system, which allows thousands of automatically generated protein families to be tested.

Conformational Angles Database of Proteins
CADB (Conformation Angles DataBase) is an online resource for the conformation angles (both main-chain and side-chain) of protein structures available in the Protein Data Bank, together with the associated crystallographic parameters. The data are organized in two sets, corresponding to 25% and 90% sequence identity between any two proteins. The package has several flexible options and display facilities to visualize the main-chain and side-chain conformation angles for a particular amino acid residue, and it can also be used to study the interrelationship between main-chain and side-chain conformation angles. A web-based Java graphics interface displays the requested information on the client machine.
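As an illustration of what a conformation angle is, the following minimal sketch computes a torsion (dihedral) angle from four atom positions. It is not CADB's own code; the function name `dihedral` and the coordinates are hypothetical, and the sign follows the usual IUPAC convention. For a residue i, the main-chain angle phi would be taken over the atoms C(i-1), N(i), CA(i), C(i), and psi over N(i), CA(i), C(i), N(i+1).

```python
import math


def dihedral(p0, p1, p2, p3):
    """Return the torsion angle in degrees defined by four 3-D points.

    The angle is measured about the p1 -> p2 bond and lies in (-180, 180].
    """
    def sub(a, b):
        return (a[0] - b[0], a[1] - b[1], a[2] - b[2])

    def dot(a, b):
        return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)  # first bond vector
    b1 = sub(p2, p1)  # central bond vector (rotation axis)
    b2 = sub(p3, p2)  # last bond vector

    n1 = cross(b0, b1)  # normal of the plane through p0, p1, p2
    n2 = cross(b1, b2)  # normal of the plane through p1, p2, p3

    x = dot(n1, n2)
    y = math.sqrt(dot(b1, b1)) * dot(b0, n2)
    return math.degrees(math.atan2(y, x))


# Hypothetical coordinates: the two outer bonds are perpendicular
# when viewed along the central bond.
angle = dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
print(round(angle, 1))
```

Using `atan2` on both the sine-like and cosine-like terms keeps the sign of the angle, which a plain `acos` of the normals' dot product would lose.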

EXProt (database for EXPerimentally verified Protein functions)
Non-redundant protein database with entries from genome annotation projects and public databases, aiming at including only proteins with an experimentally verified function with links to references in Medline/PubMed.
EXProt can be searched using a FASTA or BLAST service. EXProt Release 2.01 is a selection of 6491 entries described as having an experimentally verified function, drawn from the Pseudomonas Community Annotation Project (1; PseudoCAP), the E. coli genome and proteome database (2; GenProtEC) and the Prokaryotes division of the EMBL Nucleotide Sequence Database (3), Release 69. Each entry has a unique ID number and provides information about the organism, protein sequence and functional annotation, a link to the entry in the original database and, if known, the gene name and links to references in PubMed.

Database of annotated comparative protein structure models, and associated resources. MODBASE is a relational database of annotated comparative protein structure models for all available protein sequences matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on the MODELLER package for fold assignment, sequence–structure alignment, model building and model assessment.

Database of eukaryotic protein-encoding genes. Xpro is a relational database that contains all the eukaryotic protein-encoding DNA sequences in GenBank, with the associated data required for the analysis of eukaryotic gene architecture. In addition to the information found in the GenBank records, which includes the sequence, position, length and description of introns, exons and protein-coding regions, Xpro provides annotations on splice sites and intron phases. Xpro also validates intron positions using alignments between each record's sequence and EST sequences found in dbEST. Alternative splicing information is obtained during validation and is stored in the database. Intron-containing genes in Xpro are classified as experimental or predicted, based on the intron position validation and on specific keywords that appear in the GenBank records of predicted genes. An Entrez-like query system, familiar to most biologists, is provided for accessing the database. A non-redundant set of Xpro contents is obtained by cross-referencing to the Swiss-Prot/TrEMBL and Pfam databases. The database currently contains information for 493 983 genes: 351 918 intron-containing genes and 142 065 intron-less genes. <pubmed argument="Xpro">14681359</pubmed>
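The intron phase annotation mentioned above can be illustrated with a short sketch. It uses the standard definition (the number of coding bases that precede the intron, modulo 3) and is not taken from Xpro itself; the function name `intron_phases` and the exon lengths are hypothetical, and the input is assumed to list only the coding portion of each exon, in transcript order.

```python
def intron_phases(coding_exon_lengths):
    """Return the phase (0, 1 or 2) of each intron in a gene.

    coding_exon_lengths: lengths of the coding part of each exon, in order.
    Phase 0 means the intron falls between two codons, phase 1 after the
    first base of a codon, and phase 2 after the second base.
    """
    phases = []
    coding_bases_so_far = 0
    # An intron follows every exon except the last one.
    for length in coding_exon_lengths[:-1]:
        coding_bases_so_far += length
        phases.append(coding_bases_so_far % 3)
    return phases


# Hypothetical gene with three coding exons and therefore two introns.
print(intron_phases([100, 50, 33]))
```

A gene with a single coding exon yields an empty list, which matches the intron-less case.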

Trome, trEST and trGEN
Databases of predicted protein sequences. <pubmed argument="Trome, trEST and trGEN">14681469</pubmed>

The Kabat Database Website
The Kabat Database of Sequences of Proteins of Immunological Interest

OWL
OWL is a non-redundant composite of 4 publicly available primary sources: SWISS-PROT, PIR (1-3), GenBank (translation) and NRL-3D. SWISS-PROT is the highest priority source, all others being compared against it to eliminate identical ...
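The idea of a priority-ordered, non-redundant composite can be sketched as follows. This is an illustration only, not OWL's actual procedure: it assumes that redundancy means exact sequence identity and that sources are processed in a fixed priority order, and the function name `merge_non_redundant`, the entry IDs and the sequences are all hypothetical.

```python
def merge_non_redundant(sources):
    """Merge protein entries from several sources, dropping exact duplicates.

    sources: list of (source_name, entries) pairs in priority order, where
    entries is a list of (entry_id, sequence) pairs. When the same sequence
    occurs in more than one source, the copy from the highest-priority
    source is the one that is kept.
    """
    seen = set()
    merged = []
    for source_name, entries in sources:
        for entry_id, sequence in entries:
            key = sequence.upper()  # compare sequences case-insensitively
            if key in seen:
                continue  # already contributed by an earlier source
            seen.add(key)
            merged.append((source_name, entry_id, sequence))
    return merged


# Hypothetical toy input with SWISS-PROT listed first as the priority source.
toy_sources = [
    ("SWISS-PROT", [("P1", "MKTAYIAK")]),
    ("PIR", [("A1", "MKTAYIAK"), ("A2", "MKVLAAGI")]),
]
for row in merge_non_redundant(toy_sources):
    print(row)
```

Processing the highest-priority source first means its annotation wins whenever two sources hold the same sequence.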

BIOBASE's Proteome databases contain comprehensive information about the entire proteomes of more than twenty species, including human, mouse, rat, C. elegans, and budding and fission yeast. The BioKnowledge Workspace Viewer enables visualization of protein-protein interactions and interaction networks.

