INTRODUCTION TO PRIMARY SEQUENCE DATABASES

Posted by m.s.chowdary at 2:14 AM

Monday, November 24, 2008

Sequence information began to grow enormously in the early 1980s.
Many universities and government organizations began to develop database projects; these are the primary databases.

Nucleic acid Primary Databases:

Nucleic acid sequence database at NCBI : GenBank
Nucleic acid sequence database at EMBL
DDBJ: DNA Data Bank of Japan

Protein sequence Primary Databases:

PIR
MIPS
SWISS-PROT
TrEMBL
NRL-3D

GenBank
GenBank is the DNA sequence database maintained at the NCBI (National Center for Biotechnology Information), USA.
As mentioned in the previous post, it is a primary database.
GenBank is the major destination for DNA sequences in the American continent.
GenBank is an annotated collection of DNA sequences.
GenBank is accessible from the NCBI home page through several methods:

Text and similarity searching - ENTREZ Browser.
BLAST sequence similarity searching.

EMBL Nucleotide Sequence Database
The EMBL Nucleotide Sequence Database is maintained at the European Bioinformatics Institute (EBI).
It was established in the 1980s.
DNA sequences generated by individual researchers and by genome sequencing centers such as the Sanger Centre are deposited in EMBL through the World Wide Web.
Each entry is given an accession number.
EMBL is managed through ORACLE and allows interoperability between databases and integration of databases.

DDBJ
DDBJ was established in 1986 at the National Institute of Genetics in Japan.
DDBJ is one of the major DNA sequence databases, along with GenBank and the EMBL Nucleotide Sequence Database.
It is the official database in Japan to accept the data and give accession numbers to the new entries.
DDBJ also develops many tools for retrieval and analysis of the nucleotide sequences.

The three nucleotide sequence databases are maintained collaboratively and are referred to as the "International Nucleotide Sequence Database Collaboration".
They exchange new and updated data with one another every 24 hours.
The three databases work to collect DNA sequences from the public and make them available on the World Wide Web (WWW).
These databases are heterogeneous in the sense that they contain DNA sequences from different sources, and the extent of annotation also differs.
Each entry in the database is given a unique accession number (AC).
An accession number is an alphanumeric string.
It is of the form X00000 or XY000000, i.e., one letter followed by five digits or two letters followed by six digits.
The accession number of an entry does not change. It is removed only when the corresponding entry is removed from the database.
The deposited sequences may sometimes be modified, so a sequence version number (SV) is given to each new version of the sequence.
An SV is of the form XY000000.1, i.e., the accession number followed by a period and then a number indicating the sequence version.
For different sequence versions, the accession number part remains the same but the number after the period differs.
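
The accession and version formats described above can be checked with a short script. This is only a minimal sketch: it covers just the two formats named in this post (one letter plus five digits, two letters plus six digits), and the function name parse_accession is my own choice, not part of any database tool.

```python
import re

# One letter + 5 digits, or two letters + 6 digits, with an optional ".version".
# Only the two formats described in the post are covered (an assumption).
ACCESSION_RE = re.compile(r"^([A-Z]\d{5}|[A-Z]{2}\d{6})(?:\.(\d+))?$")


def parse_accession(text):
    """Return (accession, version), or None if the text does not match.

    version is an int, or None when there is no ".N" suffix.
    """
    match = ACCESSION_RE.match(text.strip())
    if match is None:
        return None
    accession = match.group(1)
    version = match.group(2)
    return (accession, int(version) if version is not None else None)


print(parse_accession("X00000"))      # ('X00000', None)
print(parse_accession("XY000000.1"))  # ('XY000000', 1)
print(parse_accession("XY000000.2"))  # ('XY000000', 2)
```

The last two calls show the point made above: the accession part stays the same and only the number after the period changes.
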
Because a large amount of data is deposited in these databases by many researchers, repetitions sometimes occur. Thus these databases are redundant. Although efforts are made to correct the redundancy, it remains a problem.

SECONDARY DATABASES

Posted by m.s.chowdary at 2:10 AM

Follow the link below for material on secondary databases. It also covers PDB.

Secondary Databases and PDB

Pfam DATABASE

Posted by m.s.chowdary at 11:09 PM

Sunday, November 23, 2008

The Pfam database uses hidden Markov models (HMMs).
HMMs are statistically based mathematical treatments, consisting of linear chains of match, delete or insert states, that attempt to encode the sequence conservation within aligned families.
The Pfam database holds a collection of HMMs for a range of protein domains and is maintained at the Sanger Centre.
The Pfam database is based on two distinct classes of alignments:
1) Hand-edited seed alignments: these are considered to be accurate and are used in Pfam-A.
2) Alignments derived by automatic clustering of SWISS-PROT: these are less reliable.

The high-quality seed alignments are used to build HMMs, to which sequences are automatically aligned to generate the final full alignments.
If the alignments do not produce diagnostically sound HMMs, the seed is improved and the gathering process is iterated until a good result is achieved.
The collection of seed and full alignments, the HMMs, and the database and literature cross-references constitute Pfam-A.
All sequence domains that are not included in Pfam-A are automatically clustered and deposited in Pfam-B.

STRUCTURE OF Pfam-A Entries

AC PF00001
ID 7tm
DE 7 transmembrane receptor (rhodopsin Family)
AU ..
AL ..
AM ..
SE ..
DR ..
DR ..
GA ..
DR ..

The format is compatible with PROSITE.

A brief description of the lines is given below:

ID Line: A single keyword is used.
AC Line: Has the accession number. The form of the accession number is PF00000.
DE Line: Provides the title and a description of the family.
AU Line: Indicates the author of the entry.
AL Line: Methods used to create the seed alignment.
AM Line: Methods used to create the full alignment.
SE Line: Information about the source database.
DR Lines: Contain the database cross-references.
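
Since every line of an entry starts with a two-letter code, an entry can be read into a small lookup table. The sketch below is only an illustration of that idea; the function name parse_entry is mine, and the sample text reuses a few lines of the entry shown above, with the ".." placeholders left exactly as they are.

```python
def parse_entry(text):
    """Group the lines of a flat-file entry by their two-letter line code."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Everything before the first space is the line code (AC, ID, DE, ...).
        code, _, value = line.partition(" ")
        fields.setdefault(code, []).append(value.strip())
    return fields


entry = """AC PF00001
ID 7tm
DE 7 transmembrane receptor (rhodopsin Family)
DR ..
DR ..
"""

fields = parse_entry(entry)
print(fields["AC"])       # ['PF00001']
print(len(fields["DR"]))  # 2
```

Each code maps to a list because some codes, such as DR, can appear more than once in an entry.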

PROFILES DATABASE

Posted by m.s.chowdary at 10:21 PM

PRINTS, BLOCKS, PROSITE and IDENTIFY are motif-based databases.
An alternative philosophy to the motif-based approach of protein family characterization is that the variable regions between the conserved motifs are also valuable for characterizing a protein family.
Just as blocks are the discriminators in the BLOCKS database, a complete sequence alignment is used as the discriminator in the PROFILES database.
The discriminator built from this alignment is called a profile. A profile is also called a weight matrix because it is weighted to indicate where insertions and deletions are allowed, where the most conserved regions are, and so on.
Profiles provide a sensitive means of detecting distant relationships where only a few residues are well conserved; in these circumstances regular expressions cannot provide good discrimination.
These profiles are the basis of the PROFILES database.
The PROFILES database was developed at the Swiss Institute for Experimental Cancer Research (ISREC) in Lausanne.
Each profile has separate data and family annotation files, just as PROSITE has separate data and documentation files.
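
To give a feel for how a weight matrix scores a sequence, here is a toy sketch. The three-position profile and all its scores are made-up values for illustration, and insertions and deletions (which real profiles do allow, with position-specific penalties) are left out to keep it short.

```python
# Toy profile: one dict of residue scores per position (made-up values).
PROFILE = [
    {"A": 2, "G": 1},
    {"L": 3, "I": 2, "V": 2},
    {"P": 4},
]
DEFAULT_SCORE = -1  # score for any residue not listed at a position


def score_window(window, profile=PROFILE):
    """Sum the position-specific scores for one ungapped window."""
    return sum(col.get(res, DEFAULT_SCORE) for res, col in zip(window, profile))


def best_match(sequence, profile=PROFILE):
    """Return (best_score, start_index) over all ungapped windows."""
    width = len(profile)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the profile")
    scored = [(score_window(sequence[i:i + width], profile), i)
              for i in range(len(sequence) - width + 1)]
    return max(scored, key=lambda pair: pair[0])


print(best_match("MKALPQ"))  # (9, 2)
```

The window "ALP" starting at index 2 scores 2 + 3 + 4 = 9, while every other window scores -3, so the profile picks out the conserved region even though no exact pattern was specified.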

STRUCTURE OF PROSITE PROFILE ENTRIES:

ID SH3; MATRIX
AC PS50002
DT NOV-1995 (CREATED); NOV-1995 (DATA UPDATED); NOV-1995 (INFO UPDATED)
DE Src homology 3 (SH3) domain profile.
MA ..
MA ..
..

The above figure illustrates a part of the profile used to characterize the SH3 domain.

The structure of a profile entry is based on PROSITE, but with some obvious differences.
The structure has the following lines:
1) ID line: the identification line.
2) AC line (accession number line): the accession number does not change and remains the same as long as the entry exists in the database. It is of the form PS00000.
3) DT line (date line): follows the AC line. It records when the entry was created and when it was last updated.
4) DE line: gives the description of the profile.
5) MA lines (matrix lines): follow the description line. The PA (pattern) lines of PROSITE are replaced by the matrix lines. MA lines list the various parameter specifications used to derive and describe the profile. They include details of the alphabet used (i.e., whether nucleic acid {AGCT} or amino acid {ABCD...}), the length of the profile, etc.

The PROFILES database is accessible through the ISREC web server.

IDENTIFY

Posted by m.s.chowdary at 10:12 PM

The IDENTIFY database is derived from BLOCKS and PRINTS.
It was developed at the Department of Biochemistry at Stanford University.
It is generated using the program eMOTIF.
The eMOTIF technique is designed to be more flexible than exact regular expression matching, but it carries an inherent signal-to-noise trade-off: the resulting patterns have the potential to find more true positives, but they will consequently also match more false positives.
IDENTIFY and its search software eMOTIF are accessible via the protein function web server of the Department of Biochemistry at Stanford University.

DYNAMIC PROGRAMMING Vs HEURISTIC METHODS

Posted by m.s.chowdary at 10:55 AM

Dynamic Programming Methods

In dynamic programming, every pair of residues is compared and an alignment is generated that gives the maximum score.

The dynamic programming method used to create a global alignment is the Needleman-Wunsch algorithm.

The dynamic programming method used to create a local alignment is the Smith-Waterman algorithm.
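
The two algorithms differ only in a few lines, which a small sketch makes easier to see. The scoring values below (match +1, mismatch -1, linear gap penalty -2) are arbitrary choices for illustration, and only the best score is returned, not the alignment itself.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score for aligning all of a with all of b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Leading gaps are penalised in a global alignment.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The extra 0 lets a local alignment restart anywhere.
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


print(global_score("ACGT", "AGT"))            # 1
print(local_score("TTTACGTTT", "GGACGGG"))    # 3
```

In the first call, three matches and one gap give 3 - 2 = 1. In the second, only the shared "ACG" contributes, because the local version is free to ignore the flanking residues. Both functions fill a table of (n+1) by (m+1) cells.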

Heuristic Methods

Dynamic programming methods are guaranteed to find the optimal alignment. In particular, a scoring scheme with affine gap scores is generally regarded as providing the most sensitive sequence matches. However, they are not the fastest sequence alignment procedures. For applications such as homology searches against a large database, speed becomes a prime issue. Dynamic programming has a time complexity of O(n*m), where n and m are the lengths of the sequences being compared (i.e., the numbers of residues or nucleotides). To search a database containing 100 million residues with a query of about 1,000 residues, we need to evaluate about 10 to the power 11 matrix elements.

This may take several hours of computer time. If we want to run homology searches for many sequences, the cost in computer time becomes prohibitive.

In order to reduce the time needed for comparison, several heuristic methods have been developed. A heuristic is an approximate procedure, typically based on an educated guess, that is not guaranteed to find the optimal answer. In heuristic methods there is always a trade-off between sensitivity and speed. The most popular methods are BLAST and FASTA.

PAM Vs BLOSUM

Posted by m.s.chowdary at 10:54 AM

Difference Between PAM & BLOSUM

1. PAM matrices are based on an explicit evolutionary model (i.e. replacements are counted on the branches of a phylogenetic tree), whereas the BLOSUM matrices are based on an implicit model of evolution.
2. The PAM matrices are based on mutations observed throughout a global alignment, which includes both highly conserved and highly mutable regions. The BLOSUM matrices are based only on highly conserved regions in series of alignments forbidden to contain gaps.
3. The method used to count the replacements is different: unlike the PAM matrix, the BLOSUM procedure uses groups of sequences within which not all mutations are counted the same.
4. Higher numbers in the PAM matrix naming scheme denote larger evolutionary distance, while larger numbers in the BLOSUM matrix naming scheme denote higher sequence similarity and therefore smaller evolutionary distance. Example: PAM150 is used for more distant sequences than PAM100; BLOSUM62 is used for closer sequences than BLOSUM50.

PAM & BLOSUM Equivalents

PAM 100 => BLOSUM 90
PAM 120 => BLOSUM 80
PAM 160 => BLOSUM 60
PAM 200 => BLOSUM 52
PAM 250 => BLOSUM 45

BLOSUM

Posted by m.s.chowdary at 10:53 AM

BLOSUM matrices were constructed by Henikoff and Henikoff.

They derived the BLOSUM matrices from a set of ungapped aligned regions taken from a protein database called the BLOCKS database. The procedure is as follows:
Whenever two sequences were found to have sequence identity greater than some threshold L%, they were put into the same cluster. In this way they clustered the sequences in each block.
Then they counted the frequency A_ab of observing residue a in one cluster aligned against residue b in another cluster. A correction for cluster size is made by weighting each occurrence by 1/(n1 n2), where n1 and n2 are the respective cluster sizes.
From A_ab they estimated q_a (the frequency of residue a) and p_ab (the frequency of the aligned pair a, b). From these they derived the score matrix entries using the standard equation s(a,b) = log( p_ab / (q_a q_b) ).
The resulting log-odds matrices were scaled and rounded to the nearest integer value.
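
The last two steps of the procedure can be sketched in a few lines. The pair counts below are made-up numbers for a toy two-letter alphabet, the clustering and weighting steps are assumed to have been done already, and the scale factor of 2 with base-2 logarithms (half-bit units) is my assumption for the "scaled" step.

```python
import math


def log_odds_scores(counts, scale=2.0):
    """Turn a symmetric table of weighted pair counts A_ab into integer scores.

    counts[(a, b)] must equal counts[(b, a)].
    Each score is round(scale * log2(p_ab / (q_a * q_b))).
    """
    total = sum(counts.values())
    p = {pair: n / total for pair, n in counts.items()}
    # q_a is the marginal frequency of residue a.
    q = {}
    for (a, _b), prob in p.items():
        q[a] = q.get(a, 0.0) + prob
    return {(a, b): round(scale * math.log2(prob / (q[a] * q[b])))
            for (a, b), prob in p.items() if prob > 0}


# Made-up counts for a two-letter alphabet {A, B}.
toy_counts = {("A", "A"): 6, ("A", "B"): 1, ("B", "A"): 1, ("B", "B"): 2}
scores = log_odds_scores(toy_counts)
print(scores[("A", "A")], scores[("A", "B")], scores[("B", "B")])  # 1 -2 2
```

Pairs seen more often than chance (A with A, B with B) get positive scores and the pair seen less often than chance (A with B) gets a negative score, which is exactly what the log-odds form is meant to capture.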

The matrices for L = 62 and L = 50 are widely used for pairwise alignment and database searching.

BLOSUM62 is the standard for ungapped alignments and BLOSUM50 is the standard for gapped alignments.

(Lower L values correspond to longer evolutionary time and are applicable for more distant searches.)

PAM MATRICES

Posted by m.s.chowdary at 10:50 AM

PAM stands for 'Point Accepted Mutation'.

PAM matrices are developed based on the PAM model of evolution. This model assumes that evolutionary changes occur according to a Markov model, which states that residue mutations are independent of previous mutations.

One PAM is a unit of evolutionary divergence in which 1% of the amino acids have changed. This does not mean that after 100 PAMs all the amino acids are different, because some positions change more than once and others may revert.

The PAM (Point Accepted Mutation) matrices were developed by Margaret Dayhoff and co-workers in the 1970s. The basis of their approach is to obtain substitution data from alignments between very similar proteins, allowing for the evolutionary relationships of the protein families, and then to extrapolate this information to longer evolutionary distances. They constructed hypothetical phylogenetic trees using the parsimony method.

The similarity scores are obtained by taking logarithms of the substitution frequencies relative to the frequencies expected by chance. In this way they constructed the PAM1 matrix. The PAM1 matrix estimates what rate of substitution would be expected if 1% of the amino acids had changed. The PAM1 matrix is used as the basis for calculating other matrices by assuming that repeated mutations would follow the same pattern as those in the PAM1 matrix, and that multiple substitutions can occur at the same site. Using this logic, Dayhoff derived matrices as high as PAM250. PAM30 and PAM70 are commonly used.

A matrix for divergent sequences can be calculated from a matrix for closely related sequences by raising the latter to a power. For instance, the matrix for two PAM units is the square of the matrix for one PAM unit. This is how the PAM250 matrix is calculated: the PAM1 mutation probability matrix is raised to the 250th power.
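
The extrapolation step is just repeated matrix multiplication, which a toy example can show. The two-letter alphabet and the 1% change per step below are made-up values standing in for the real 20 x 20 PAM1 mutation probability matrix.

```python
def mat_mul(x, y):
    """Multiply two square matrices given as lists of rows."""
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, power):
    """Raise a square matrix to a non-negative integer power."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(power):
        result = mat_mul(result, m)
    return result


# Toy "1 PAM" step: each residue changes with probability 0.01 (made-up).
STEP = [[0.99, 0.01],
        [0.01, 0.99]]

after_100 = mat_pow(STEP, 100)
print(after_100[0][0])  # probability that a residue is unchanged after 100 steps
```

In this toy model the probability of finding the original residue after 100 steps is still above one half, which illustrates why 100 PAMs does not mean that every amino acid is different.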

Problems with PAM Matrices:

1) Evolutionary rates vary greatly within a protein.
2) Each position has its own 3-D environment.
3) The environment changes over evolutionary time.

SUBSTITUTION MATRICES

Posted by m.s.chowdary at 10:49 AM

A substitution matrix describes the rate at which one character in a sequence changes to other character states over time. Substitution matrices are usually seen in the context of sequence alignments of amino acid or nucleotide sequences, where the similarity between sequences depends on their divergence time and the substitution rates as represented in the matrix.

PAM and BLOSUM are the most widely used substitution matrices.

DOT MATRIX ANALYSIS

Posted by m.s.chowdary at 10:48 AM

This is a graphical method

Primarily used for finding regions of local matches between sequences

Introduced by Gibbs and McIntyre in 1970.

It’s a very simple method.

The two sequences to be compared are placed as the rows and columns of a matrix.

Then the residues are compared and a dot is plotted at every point where a match is found.

A modification of the above method has been suggested. Instead of looking for a match at every residue, a minimum number of matches is sought in every successive overlapping window of residues. Usually, for DNA a window of 11 residues with at least 7 matches is used, and for protein sequences a window of 3 residues with at least 2 matches. This kind of smoothing procedure leads to a plot with dots representing only significant matches.

There are a number of programs available for dot matrix analysis, such as DOTTER, DOTLET and PALIGN.

A number of alignments are possible from a dot matrix plot. We have to find the optimum alignment from these plots.

The dot matrix technique was developed in 3 stages:

1) Residue to residue matching

There are certain disadvantages with this:

a) The signal-to-noise ratio is low

Signal means the visual pattern on the graph

b) The chance of a random match is high

So the technique was improved by the use of words or oligomers.

2) Use of words

Advantages: this method decreases the chance of a random match.

Disadvantages: when the word is too long, the number of dots decreases, as there is a low probability that a given word matches between the two sequences.

3) Use of windows

Instead of words or oligomers, here we use windows. For protein sequences a window should have 3 residues, and for nucleotide sequences a window has 11 residues.

Within a window we set a minimum number of matches.

This method has the advantage of decreasing the chance of a random match and at the same time overcomes the disadvantage associated with using words/oligomers.
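
The three stages can all be expressed with one small function by changing the window size and the minimum number of matches. This is only a sketch with names of my own choosing (dot_plot, render); it is not how DOTTER or DOTLET are implemented.

```python
def dot_plot(seq_a, seq_b, window=1, min_matches=1):
    """Return the set of (i, j) window start positions that get a dot.

    window=1, min_matches=1      -> residue to residue matching
    min_matches == window        -> exact word matching
    min_matches < window         -> window with a minimum number of matches
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(1 for k in range(window)
                          if seq_a[i + k] == seq_b[j + k])
            if matches >= min_matches:
                dots.add((i, j))
    return dots


def render(dots, rows, cols):
    """Draw the plot as text: rows follow seq_a, columns follow seq_b."""
    return "\n".join(
        "".join("*" if (i, j) in dots else "." for j in range(cols))
        for i in range(rows)
    )


a, b = "ACGTACGT", "ACGTTCGT"
words = dot_plot(a, b, window=3, min_matches=3)    # exact 3-letter words
windows = dot_plot(a, b, window=3, min_matches=2)  # 2 matches out of 3
print((2, 2) in words, (2, 2) in windows)  # False True
print(render(windows, len(a) - 2, len(b) - 2))
```

At position (2, 2) the two sequences differ in one residue out of three, so the exact word method leaves a hole in the diagonal while the window method keeps the dot. This is the advantage of windows described above.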

BIO INFORMATICS Question Papers (Supple, 2007)

Posted by m.s.chowdary at 7:50 AM

Wednesday, October 15, 2008

SET : 1

1. What is Bioinformatics? Describe its scope in modern biology? [16]

2. Write short notes on:
(a) Gapped BLAST
(b) PSI-BLAST
(c) BLAST (2) [16]

3. What is multiple sequence alignment? Describe the applications of multiple sequence alignments? [16]

4. Describe the following:
(a) Structural classifications of proteins
(b) The CATH (Class, Architecture, Topology, Homology) databases [16]

5. Explain Database searching with Smith-waterman Dynamic programming method? [16]

6. Describe about Phylogenetic tree construction by using UPGMA method? [16]

7. Define Genome? Outline the structure and composition of prokaryotic and Eukaryotic genomes? [16]

8. Write short notes on:
(a) GRAIL II
(b) ORF [16]

SET : 2

1. Describe the following:
(a) Uniform Resource Locator (URL)
(b) Role of Internet in Bioinformatics [16]

2. Write a note on Parametric sequence alignment? [16]

3. How to multiply Aligned sequences and assessing quality of alignment? [16]

4. Define Secondary databases? Give an overview of secondary databases? [16]

5. What is Block Substitution Matrices (BLOSUM)? Describe them in detail? [16]

6. Write short notes on:
(a) Star decomposition
(b) PHYLIP programs [16]

7. Define Genome? Outline the structure and composition of prokaryotic and Eukaryotic genomes? [16]

8. Write an essay on gene Prediction in Eukaryotes? [16]

SET :3

1. What are the basic Computer skills required for Bioinformatician? Write elementary commands in Linux operating system? [16]

2. Explain about different BLAST programs in detail with a neat flow charts and how are they useful? [16]

3. How to multiply Aligned sequences and assessing quality of alignment? [16]

4. Write short notes on:
(a) Tr-EMBL
(b) P-fam [16]

5. Name the Database search algorithm employed in alignment of sequences and explain in detail about any one of it? [16]

6. Discuss in detail about Character based methods in phylogenetic Analysis? [16]

7. Describe briefly about different types of Gene mapping? [16]

8. Write short notes on:
(a) GRAIL II
(b) ORF [16]

SET : 4

1. Write short notes on the following:
(a) Domain and Domain name
(b) Modem
(c) Routers
(d) FQDN [16]

2. Describe the following:
(a) Importance of GAPs and GAP penalties in sequence alignment
(b) Edit distance of two strings
(c) Dot matrix Analysis [16]

3. What is a multiple alignment and why we do it? [16]

4. Write short notes on:
(a) PDB
(b) SCOP [16]

5. What is Block Substitution Matrices (BLOSUM)? Describe them in detail? [16]

6. Describe about Phylogenetic tree construction by using UPGMA method? [16]

7. Write short notes on:
(a) Computer tools for sequencing
(b) DNA fingerprinting [16]

8. Write short notes on:
(a) GRAIL II
(b) ORF [16]

BIO INFORMATICS Question Papers (Regular, 2007)

Posted by m.s.chowdary at 7:50 AM

SET : 1

1. Write short notes on the following :

a) Domain and domain name

b) Modem

c) Routers

d) FQDN

2. What types of scoring matrix is used by BLAST? Explain about different substitution matrices?

3. Explain in detail about the databases of multiple alignments?

4. Write an essay on different genome (DNA) information resources?

5. Who created BLAST and explain the type of scoring matrix used by BLAST?

6. Discuss about relationships of phylogenetic analysis to sequence alignment?

7. What are genomic MAP elements? Explain about different types of Maps?

8. What is a gene? Write the fine structure of the gene and compare the structural differences of gene between prokaryotes and eukaryotes?

SET : 2

1. What is telnet? Explain its functioning system?

2. What is meant by similarity searching? Explain in detail about sequence similarity search tools?

3. What is multiple sequence alignment? Describe the applications of multiple sequence alignments?

4. Describe the following :

a) Structural classification of proteins

b) The CATH (Class, Architecture, Topology, Homology ) databases.

5. What is Block Substitution Matrices (BLOSUM)? Describe them in detail.

6. Explain the significance of phylogenetic analysis in inferring the relationship between distantly related sequences?

7. Describe the steps involved in sequence assembly?

8. Write short notes on :

a) Content based methods

b) Comparative methods

c) Central dogma in gene expression.

SET : 3

1. Explain the steps involved in protein prediction and modeling by using Bioinformatics?

2. What are Direct and Inverted repeats? Suggest a comparision method for knowing direct and inverted repeats and explain about it?

3. Write short notes on :

a) CLUSTAL W

b) PILEUP

4. What is meant by database similarity searches? Explain different ways of database similarity searching?

5. What is Block Substitution Matrices (BLOSUM)? Describe them in detail.

6. What are ultra metric trees and distances? Describe them in detail.

7. What are the important problems that will be encountered in sequencing and explain about them?

8. Write short notes on :

a) GRAIL II

b) ORF

SET : 4

1. What is Bioinformatics? Describe its scope in modern biology?

2. Explain about different BLAST programs in detail with a neat flow chart and how are they useful?

3. Define Multiple Alignments? Describe in detail methods employed for Multiple sequence alignment?

4. Explain about FASTA sequence database similarity searching?

5. Describe the different BLAST suites used to search DNA and protein databases.

6. Discuss in detail about character based methods in phylogenetic analysis?

7. Define gene mapping? Explain about applications of mapping?

8. Write short notes on :

a) GRAIL II

b) ORF

BIO INFORMATICS Question Papers (Supple, 2006)

Posted by m.s.chowdary at 7:49 AM

SET : 1

1. Why is bioinformatics important and explain its applications in the field of biology?

2. What is a sequence alignment? Describe the significance of sequence alignment in detail and list out the types of sequence alignment?

3. Write about hidden markov models of multiple sequence alignment?

4. How can you classify sequence database? Describe about nucleotide sequence databases?

5. Who created BLAST and explain the type of scoring matrix used by BLAST?

6. Discuss about relationships of phylogenetic analysis to sequence alignment?

7. Write short notes on :

a) Computer tools for sequencing

b) DNA finger printing

8. What is a gene? Write the fine structure of the gene and compare the structural differences of gene between prokaryotes and eukaryotes?

SET : 2

1. Describe the following :

a) Uniform resource locator

b) Role of internet in bioinformatics

2. Explain about different BLAST program in detail with a neat flow charts and how are they useful?

3. Write short notes on :

a) Relationship of multiple sequence alignment to phylogenetic analysis.

b) Uses of multiple sequence alignment.

4. Write an essay on different genome (DNA) information resources?

5. Explain the importance of substitution matrices in sequence alignment?

6. Describe about phylogenetic tree construction by using UPGMA method?

7. Define genome? Outline the structure and composition of prokaryotic and eukaryotic genomes?

8. What are neural networks? How they are useful in predicting a gene structure?

SET : 3

1. What are the main objectives of bioinformatics and what does bioinformatics comprise of?

2. Describe dynamic programming algorithm in detail?

3. What is multiple sequence alignment? Describe the applications of multiple sequence alignments?

4. Write a brief notes on the following :

a) Subdivisions of GenBank

b) Structure of a GenBank record

5. What is Block Substitution Matrices (BLOSUM)? Describe them in detail.

6. Describe about phylogenetic tree construction by using UPGMA method?

7. Define genome? Outline the structure and composition of prokaryotic and eukaryotic genomes?

8. Write short notes on :

a) GRAIL II

b) ORF

SET : 4

1. What is bioinformatics? Describe its scope in modern biology?

2. Explain about computational method for aligning DNA and protein sequences?

3. Describe any one progressive method of multiple alignment?

4. Define databases? Give an overview of biological databases?

5. Name the database search algorithm employed in alignment of sequences and explain in detail about any one of it?

6. Describe the following :

a) Perfect phylogeny

b) The relationship of phylogenetic analysis and sequence alignment.

7. Write short notes on :

a) GRAIL II

b) ORF.

BIO INFORMATICS Question Papers (Regular, 2006)

Posted by m.s.chowdary at 7:47 AM

SET : 1

1. What is Bioinformatics? Describe its scope in modern biology?

2. Write short note on :

a) Dot plot

b) Local alignment

3. Write short notes on :

a) Relationship of multiple sequence alignment to phylogenetic analysis

b) Uses of multiple sequence alignment

4. Explain in detail about structural databases?

5. What is Block Substitution Matrices (BLOSUM)? Describe them in detail.

6. Describe the following :

a) Perfect phylogeny

b) The relationship of phylogenetic analysis and sequence assembly?

7. Discuss in detail about the steps involved in sequence assembly?

8. Write brief account on any two of the following :

a) Codon usage

b) Intron and exons

c) Regulatory sequences and their role.

SET : 2

1. Explain the steps involved in protein prediction and modeling by using bioinformatics?

2. Explain in detail about the dynamic programming method for sequence alignment?

3. Give the general account on different common multiple sequence alignment methods?

4. Explain about BLAST sequence database search?

5. What is Block Substitution Matrices (BLOSUM)? Describe them in detail.

6. Describe the following :

a) Perfect phylogeny

b) The relationship of phylogenetic analysis and sequence assembly?

7. Define genome? Outline the structure and composition of prokaryotic and eukaryotic genomes?

8. Write short notes on :

a) GRAIL II

b) ORF

SET : 3

1. What is Bioinformatics? Describe its scope in modern biology?

2. Explain about computational method for aligning DNA and protein sequences.

3. What is multiple sequence alignment? Describe the applications of multiple sequence alignments?

4. Describe the following :

a) Structural classification of proteins

b) The CATH (Class, Architecture, Topology, Homology ) databases.

5. What is Block Substitution Matrices (BLOSUM)? Describe them in detail.

6. Define parsimony? Write in detail how parsimony is used to infer phylogenetic relationships?

7. Write short notes on :

a) Clone map

b) STS map

8. Write short notes on :

a) GRAIL II

b) ORF

BIO INFORMATICS Question Papers (Supple, 2005)

Posted by m.s.chowdary at 7:46 AM

SET : 1

1. What is HTTP? Explain about operation of HTTP? [16]

2. Explain about computational method for aligning DNA and Protein sequences?[16]

3. Define Multiple Alignment? Describe in detail about methods employed for Multiple sequence alignment? [16]

4. What is meant by Database Similarity searches? Explain different ways of Database similarity searching? [16]

5. Explain about Percent Accepted Mutation? [16]

6. Describe the following:
(a) Perfect phylogeny
(b) The relationship of Phylogenetic analysis and sequence alignment. [16]

7. Define Genome? Outline the structure and composition of prokaryotic and Eukaryotic genomes? [16]

8. What is a gene? Write the fine structure of the gene and compare the structural differences of gene between prokaryotes and eukaryotes? [16]

SET :2

1. Explain the steps involved in protein prediction and modeling by using Bioinformatics? [16]

2. Give an account of BLAST programs and how are they useful? [16]

3. Explain about Localized alignments in sequences? [16]

4. Describe the different databases available for the storage of protein information resources? [16]

5. Explain about Percent Accepted Mutation? [16]

6. Discuss about Concept of Trees in phylogenetic Analysis? [16]

7. Define Genome? Outline the structure and composition of prokaryotic and Eukaryotic genomes? [16]

8. Write short notes on:
(a) GRAIL II
(b) ORF [16]

SET :3

1. Describe the following:
(a) Uniform Resource Locator (URL)
(b) Role of Internet in Bioinformatics [16]

2. Define Multiple sequence alignment? Describe in detail about methods employed in Multiple sequence alignment? [16]

3. Describe any one progressive method of multiple sequence alignment? [16]

4. How can you classify sequence databases? Describe about Nucleotide sequence databases? [16]

5. Explain about Pairwise database searching. [16]

6. Discuss about relationships of Phylogenetic analysis to sequence alignment? [16]

7. Write an essay on sequence assembly and Gene Identification? [16]

8. Write short notes on:
(a) GRAIL II
(b) ORF [16]

SET :4
1. What is Bioinformatics? Describe its scope in modern biology? [16]

2. What type of scoring matrix is used by BLAST? Explain about different substitution matrices? [16]

3. What is multiple sequence alignment? Describe the applications of multiple sequence alignments? [16]

4. Write in detail about DNA versus Protein searches? [16]

5. Write short notes on:
(a) BLOSUM
(b) Differentiate between PAM & BLOSUM [16]

6. Discuss about relationships of Phylogenetic analysis to sequence alignment? [16]

7. What is DNA mapping? Describe its applications? [16]

8. Explain in detail about Feature based approaches to Gene Prediction? [16]

Biotechnology Blog 4 JNTU B.Tech Students
