Sequence information began to grow enormously in the early 1980s.
Many universities and government organizations started database projects to store it; these became the primary databases.
Nucleic acid Primary Databases:
- Nucleic acid sequence database at NCBI: GenBank
- Nucleic acid sequence database at EMBL
- DDBJ: DNA Data Bank of Japan
Protein Primary Databases:
- PIR
- MIPS
- SWISS-PROT
- TrEMBL
- NRL-3D
GenBank
GenBank is the DNA sequence database maintained at NCBI, the National Center for Biotechnology Information, USA.
As I mentioned in the previous post, it is a primary database.
GenBank is the major destination for DNA sequences on the American continent.
GenBank is an annotated collection of DNA sequences.
GenBank is accessible from the NCBI home page by several methods:
- Text and similarity searching - ENTREZ Browser.
- BLAST sequence similarity searching.
EMBL
The EMBL Nucleotide Sequence Database is maintained at the European Bioinformatics Institute (EBI).
It was established in the 1980s.
DNA sequences generated by individual researchers and by genome sequencing centers such as the Sanger Centre are deposited in EMBL through the World Wide Web.
Each entry is given an accession number.
EMBL is managed with ORACLE, which allows interoperability between databases and integration of databases.
DDBJ
DDBJ was established in 1986 at the National Institute of Genetics in Japan.
DDBJ is one of the major DNA sequence databases, along with GenBank and the EMBL nucleotide sequence database.
It is the official database in Japan for accepting data and assigning accession numbers to new entries.
DDBJ also develops many tools for the retrieval and analysis of nucleotide sequences.
The three nucleotide sequence databases are collaboratively maintained and are referred to as the "International Nucleotide Sequence Database Collaboration" (INSDC).
They exchange new and updated data with one another every 24 hours.
The three databases collect DNA sequences from the public and make them available on the World Wide Web (WWW).
These databases are heterogeneous in the sense that they contain DNA sequences from different sources, and the extent of annotation also differs from entry to entry.
Each entry in the database is given a unique accession number (AC).
An accession number is an alphanumeric string.
It has the form X00000 or XY000000, i.e., one letter followed by five digits or two letters followed by six digits.
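To make the two layouts concrete, here is a minimal Python sketch that checks whether a string looks like an accession number of either form. The names `ACCESSION_RE` and `is_valid_accession` are my own, and I am assuming uppercase letters; the check covers only the two layouts described here and does not ask any database whether the entry really exists.

```python
import re

# One uppercase letter + 5 digits, or two uppercase letters + 6 digits.
# (Assumption: letters are uppercase; only these two layouts are covered.)
ACCESSION_RE = re.compile(r"(?:[A-Z][0-9]{5}|[A-Z]{2}[0-9]{6})")


def is_valid_accession(text):
    """Return True if text has one of the two accession layouts."""
    return ACCESSION_RE.fullmatch(text) is not None


if __name__ == "__main__":
    # Illustrative strings, chosen only to show the pattern.
    for candidate in ["X00000", "XY000000", "XY00000", "x00000"]:
        print(candidate, is_valid_accession(candidate))
```

Using `fullmatch` rather than `match` means that trailing characters, such as a version suffix, make the check fail instead of being silently ignored.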
Accession numbers do not change for an entry. An accession number can be removed only when the entry corresponding to it is removed from the database.
Deposited sequences are sometimes modified, so a sequence version number (SV No) is given to each new version of a sequence.
An SV No has the form XY000000.1, i.e., the AC No followed by a period and then a number indicating the sequence version.
For different versions of a sequence, the AC No part remains the same but the number after the period differs.
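The relationship between the AC No and the SV No can be sketched in a few lines of Python. The helper names `split_version` and `same_entry` are hypothetical, and the identifiers in the usage lines are made up for illustration.

```python
def split_version(identifier):
    """Split 'XY000000.1' into the accession part and the version number."""
    accession, sep, version = identifier.rpartition(".")
    if not sep or not accession or not version.isdigit():
        raise ValueError(f"not an accession.version identifier: {identifier!r}")
    return accession, int(version)


def same_entry(first, second):
    """Two versioned identifiers belong to one entry if the AC part matches."""
    return split_version(first)[0] == split_version(second)[0]


if __name__ == "__main__":
    print(split_version("XY000000.2"))
    print(same_entry("XY000000.1", "XY000000.2"))
```

Here the accession part is kept as a string and only the version is converted to an integer, so versions can be compared numerically.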
Because many researchers deposit large amounts of data in these databases, repetitions sometimes occur, so these databases are redundant. Although efforts are made to correct the redundancy, it remains a problem.
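As a rough illustration of what redundancy means in practice, the sketch below groups entries whose sequences are exactly identical. It is only an assumption-laden toy: the function name `find_redundant` and the sample records are invented, and real redundancy also includes near-identical or overlapping sequences, which exact matching does not catch.

```python
from collections import defaultdict


def find_redundant(records):
    """Group accession numbers that share an identical sequence.

    records: iterable of (accession, sequence) pairs.
    Returns a list of accession lists, one per repeated sequence.
    """
    by_sequence = defaultdict(list)
    for accession, sequence in records:
        # Normalise case and drop whitespace before comparing.
        key = "".join(sequence.split()).upper()
        by_sequence[key].append(accession)
    return [group for group in by_sequence.values() if len(group) > 1]


if __name__ == "__main__":
    # Made-up records, for illustration only.
    sample = [
        ("XY000001", "acgt acgt"),
        ("XY000002", "ACGTACGT"),
        ("XY000003", "ACGTACGA"),
    ]
    print(find_redundant(sample))
```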