By Ms Lim Yun Ping, Bioinformatics Institute, A*STAR Singapore
A. Database searching using SRS and Entrez against Pubmed, Genbank and SwissProt
1. Search the NCBI databases using Entrez for Panagrellus. How many entries can you find ?
| Database | Number of entries |
| Pubmed | |
| Nucleotide | |
| Protein | |
| Taxonomy | |
2. Click on Taxonomy hyperlink.
How many Panagrellus entries are listed in the Taxonomy database ?
Which of them has more sequence information in the database ?
3. Click on the nucleotide hyperlink in the Entrez results page and select the following 2 entries :
- AY112716 (actin)
- X82199 (ornithine decarboxylase)
| Fields | AY112716 (actin) | X82199 (ornithine decarboxylase) |
| Length (bp) | | |
| CDS position | | |
| Molecule Type | | |
| Pubmed ID | | |
4. Search the EMBL database using SRS for Panagrellus.
How many entries can you find ?
Is the search using SRS more specific than Entrez ? Why is it so ?
Hint : Compare the search results you have obtained using SRS and Entrez.
5. Extract the Panagrellus redivivus actin DNA sequence in FASTA format.
6. Use the limits option to remove entries which are not Panagrellus and only select mRNA sequences.
Retrieve both the actin and ornithine decarboxylase mRNA and protein sequences in FASTA format.
Select entries --> display FASTA --> send to text
7. Locate the pathway in which ornithine decarboxylase is involved, using the KEGG database.
Search from this page : http://www.genome.ad.jp/kegg-bin/mk_point_html
Select organism : C. elegans, EC number : 4.1.1.17 (the EC number for ornithine decarboxylase)
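The Entrez searches in steps 1 and 5 can also be scripted through NCBI's E-utilities web service. The sketch below only builds the request URLs and parses a reply; it does not contact NCBI. The database names `pubmed`, `nuccore`, `protein` and `taxonomy` are assumed to correspond to the four rows of the table in step 1, and the helper names are illustrative only.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# Base address of NCBI's E-utilities web service.
EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"


def esearch_count_url(db, term):
    """Build an ESearch URL that asks only for the number of matching records."""
    query = urlencode({"db": db, "term": term, "rettype": "count"})
    return f"{EUTILS}/esearch.fcgi?{query}"


def efetch_fasta_url(db, accession):
    """Build an EFetch URL that returns one record as FASTA text."""
    query = urlencode({"db": db, "id": accession,
                       "rettype": "fasta", "retmode": "text"})
    return f"{EUTILS}/efetch.fcgi?{query}"


def parse_count(xml_text):
    """Read the <Count> element from an ESearch XML reply."""
    return int(ET.fromstring(xml_text).findtext("Count"))


# Assumed Entrez database names for the table in step 1.
for db in ("pubmed", "nuccore", "protein", "taxonomy"):
    print(db, esearch_count_url(db, "Panagrellus"))

# FASTA record for the actin entry selected in step 3.
print(efetch_fasta_url("nuccore", "AY112716"))
```

Opening one of the printed ESearch URLs (in a browser or with `urllib.request.urlopen`) returns a small XML document; passing its text to `parse_count` gives the number to enter in the table. The counts change as the databases grow, so none are shown here.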
B. Sequence similarity search using BLAST
1. Use the BLASTP tool to find sequences from the NR database which are similar to the actin and ornithine decarboxylase translated CDS sequences you retrieved.
Change BLAST's default parameters to search SwissProt and uncheck the low complexity filter. Click on the taxonomy report to view the other organisms which have sequences similar to the entry.
2. Next, change the database parameter to NR. Does the search slow down significantly ?
Are more entries retrieved ? Why is this so ?
3. Go through your entries carefully, looking at the E value, score and alignment to determine which entries you wish to select.
Select the entries and save them in FASTA format.
4. Repeat the same procedure with ornithine decarboxylase. Search the SwissProt database using BLAST from the ExPASy site.
5. From the ExPASy BLAST results obtained from SwissProt, click on the entry with accession number P49725.
Look at the annotations.
| Question | Answer |
| Which pathway is this enzyme involved in ? | |
| Which cofactor plays a role in the enzyme reaction ? | |
| Which protein family does it belong to ? | |
| How many amino acids are there ? | |
| What is the molecular weight of the protein ? | |
| What is the EC number for this enzyme ? | |
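When weighing E value against score in step 3, it can help to see how the two are related. For a given bit score S', the expected number of chance hits is E = m × n × 2^(−S'), where m and n are the effective lengths of the query and the database. The sketch below applies that formula; the lengths used in the usage note are made-up round numbers, not the real sizes of SwissProt or NR.

```python
def expected_hits(bit_score, query_len, db_len):
    """Expected number of chance alignments with at least this bit score.

    Implements E = m * n * 2**(-S'), where m and n are the effective
    query and database lengths in residues.
    """
    return query_len * db_len * 2.0 ** (-bit_score)
```

For example, `expected_hits(40, 100, 1_000_000)` is smaller than `expected_hits(40, 100, 10_000_000)`: the same alignment score is less significant in a larger database, which is worth keeping in mind when comparing the SwissProt and NR searches.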
C. Sequence Comparison using CLUSTALW
1. Compare the FASTA sequences you have extracted for both actin and ornithine decarboxylase using the multiple sequence alignment program CLUSTALW.
2. Draw the phylogram generated from the multiple sequence alignment for the ornithine decarboxylase proteins from the various organisms.
3. Where are the conserved regions in the multiple sequence alignment of the actin proteins from the nematodes ?
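For step 3, the conserved regions are the stretches of alignment columns in which every sequence carries the same residue (the columns CLUSTALW marks with `*`). A minimal sketch of how such columns could be located in an alignment that has already been read into equal-length strings is given below; the function names and the toy alignment in the usage note are hypothetical, not actin data.

```python
def conserved_columns(aligned):
    """Return 0-based indices of columns where all sequences share one residue."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must all have the same length")
    return [i for i, column in enumerate(zip(*aligned))
            if len(set(column)) == 1 and column[0] != "-"]


def conserved_runs(columns, min_len=3):
    """Group consecutive column indices into (first, last) runs of at least min_len."""
    runs, start, prev = [], None, None
    for col in columns:
        if start is None:
            start = prev = col
        elif col == prev + 1:
            prev = col
        else:
            if prev - start + 1 >= min_len:
                runs.append((start, prev))
            start = prev = col
    if start is not None and prev - start + 1 >= min_len:
        runs.append((start, prev))
    return runs
```

With the toy alignment `["MKT-AYIA", "MKTLAYLA", "MRT-AYIA"]`, `conserved_columns` returns `[0, 2, 4, 5, 7]` and `conserved_runs(..., min_len=2)` returns `[(4, 5)]`.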
D. Sequence Analysis using EMBOSS
1. Design primers for the Panagrellus redivivus actin gene using the EMBOSS eprimer3 program.
Primer Design Considerations :
- 18 to 30 bases in length
- GC content of 40% to 60%
- A melting temperature (Tm) in the range of 52 °C to 65 °C
2. Use EMBOSS's garnier program to predict the secondary structure of both actin and ornithine decarboxylase.
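The primer design considerations listed under step 1 can be checked by hand or with a few lines of code. The sketch below tests a candidate primer against the three criteria. It estimates Tm with the simple Wallace rule, Tm = 2(A+T) + 4(G+C), which is an assumption made here for brevity; eprimer3 uses its own, more detailed Tm calculation, so its values will differ. The function names are illustrative.

```python
def gc_percent(primer):
    """Percentage of G and C bases in the primer."""
    primer = primer.upper()
    return 100.0 * (primer.count("G") + primer.count("C")) / len(primer)


def wallace_tm(primer):
    """Rough melting temperature in degrees C: 2 per A/T plus 4 per G/C."""
    primer = primer.upper()
    at = primer.count("A") + primer.count("T")
    gc = primer.count("G") + primer.count("C")
    return 2 * at + 4 * gc


def check_primer(primer):
    """Report which of the worksheet's three criteria the primer meets."""
    return {
        "length_ok": 18 <= len(primer) <= 30,    # 18 to 30 bases
        "gc_ok": 40 <= gc_percent(primer) <= 60,  # GC content 40% to 60%
        "tm_ok": 52 <= wallace_tm(primer) <= 65,  # Tm 52 to 65 degrees C
    }
```

For the made-up 20-mer `ATGCATGCATGCATGCATGC` (not an actin primer), `gc_percent` gives 50.0, `wallace_tm` gives 60, and `check_primer` reports all three criteria as met.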
E. 3D structure visualization using Cn3D
From the BLAST search, we have picked out the following 2 PDB entries :
actin : 1D4X (Chain A, Crystal Structure Of Caenorhabditis Elegans Mg-Atp Actin)
ornithine decarboxylase : 7ODC_A (Chain A, Crystal Structure Ornithine Decarboxylase From Mouse)
1. Search the NCBI's Molecular Modeling Database (MMDB) for the accession numbers 1D4X and 7ODC
2. Download the files and view them using Cn3D viewer.
Cn3D viewer can be downloaded from NCBI.
Cn3D is an application that allows you to view 3-dimensional structures from NCBI's MMDB (Molecular Modeling Database).
It can display structure and sequence simultaneously and correlate structure and sequence information.
Users can quickly locate and highlight interesting parts of the protein from the sequence window, such as a cluster of active site residues.
Default : For single structures, secondary structure is colored as follows :
helices - green
strands - orange
coils - blue
Rendering allows you to select one of the following drawing styles: worms, tubes, wire, ball and stick, space fill.
Style --> Rendering shortcuts --> Worms / Wire frame
Coloring styles allow you to view the following:
- by molecule (each chain is assigned a different color)
- by charge (blue for positively charged, red for negatively charged, grey for neutral)
- by hydrophobicity (red for hydrophobic and blue for hydrophilic residues)
Style --> Coloring shortcuts --> Molecule / Hydrophobicity
| Description | 1D4X | 7ODC |
| Number of molecules | | |
Test Your Understanding
You have just been given an unknown sequence from the lab.
>unknown seq
GGATTCTGTTCATTTGAAGGAAGGCCCTTTCCTGGGGTACGCGTCGATTCGCATTTTGAGTCTCTCTTCC ACGGTACCAAACCTCACACATCCACGATGAGGAAACTCTTTGCGTTGGGCCTTTTGGCTCTCTTCGCATT CTCACACATCGTCGCCGACGAAGATGCCAAGGAGAAGGACAAGAAGTACGGCACCATCATCGGTATCGAT CTCGGAACCACTTACTCGTGTGTCGGTGTCTACAAGAACGGTCGTGTTGAAATCATTGCCAATGACCAGG GTAACCGTATCACTCCCTCGTACGTCGGATTCACAGCTGAAACTGGAGAGCGTCTCATCGGTGATGCCGC CAAGAACCAGCTTACCACCAACCCCGAGAACACCATCTTCGACGCCAAGCGTTTGATCGGTCGCGAGTTC AATGACAAGACGGTCCAGGCTGATATGAAGTTGTGGCCTTTCAAGATCACCAACAAGAACTCCAAGCCCC ATGTCAACGTCGCCGTCGGCAACGACCGCAAGGAATTCACGCCTGAAGAGGTTTCCGCCATGGTCCTCGG CAAGATGAAGGAAATCGCTGAATCGTACCTCGGTTACGAGGTCAAGCACGCCGTTGTCACCGTCCCGGCC TACTTCAACGACGCCCAGCGTCAGGCTACCAAGGACGCCGGTACCATCGCCGGTTTGAACGTTGTCCGTA TCATCAACGAGCCCACCGCGGCCGCCATCGCCTACGGTCTTGACAAGAAGGACGGCGAACGCAACATCCT CGTTTTCGATCTTGGTGGCGGTACCTTCGATGTCTCCCTCCTGACCATCGACAATGGCGTCTTTGAGGTG TTGGCCACCAACGGTGATACCCATTTGGGTGGTGAAGATTTCGATCAGCGCGTCATGGAGTACTTCATCA AGTTGTACAAGAAGAAGACCGGCAAGGATCTCCGCAAGGACCACCGCGCCACCCAGAAGCTCCGTCGTGA AGTTGAAAAGGCTAAGCGCGCTCTGTCCACCCAGCATCAAGTCAAGGTCGAAGTCGAGTCCATCATCGAC GGCGAAGACTTCTCCGAAACCCTGACCCGTGCCAAGTTCGAGGAGCTCAACATGGACCTCTTCCGCAACA CCATCAAGCCCGTCCAGAAGGTTCTCGATGACGCTGACCTCAAGAAGGATGACGTCCACGAGATCGTCCT CGTCGGTGGCTCCACCCGTATCCCCAAGATCCAGCAACTCATCAAGGAATTCTTCAATGGCAAGGAGCCC TCTCGCGGCATTAACCCCGATGAGGCTGTCGCTTATGGTGCTGCCGTCCAGGGTGGTGTCATCTCCGGCG AAGAGGACACCGAAATCGTCTTGCTCGATGTCAACCCGTTGACCATGGGTATCGAGACCGTCGGTGGCGT CATGACCAAGCTCATCACCCGCAACACCGTCATCCCGACCAAGAAGTCGCAGGTCTTCTCGACTGCTGCC GATAACCAGCCCACCGTCACCATCCAGGTCTTTGAGGGTGAACGCCCCATGACCAAGGACAACCATCAGC
1. Find out the potential function of this sequence.
2. Pick out potential open reading frames (ORFs) using NCBI's ORF Finder.
Translate the longest frame into protein and use BLAST to find other similar proteins in the database.
3. What is the role of this protein ? (hint : look at SwissProt and InterPro)
4. Where can you find this protein ? (hint : subcellular location)
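To see what an ORF search does in step 2, the idea can be sketched in a few lines: scan all six reading frames for an ATG, extend to the next in-frame stop codon, and translate with the standard genetic code. This is a simplified illustration, not NCBI's ORF Finder. The 75-nucleotide minimum length is an assumed threshold, the names are hypothetical, and ORFs that run off the end of the sequence are reported as incomplete because a lab sequence may be a partial transcript.

```python
from collections import namedtuple
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
STOPS = {"TAA", "TAG", "TGA"}

# start/end are 0-based positions on the strand that was scanned.
ORF = namedtuple("ORF", "strand frame start end complete protein")


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate whole codons; unknown codons (e.g. containing N) become X."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X")
                   for i in range(0, len(seq) - 2, 3))


def find_orfs(seq, min_len=75):
    """Return ATG-initiated ORFs in all six frames, longest first."""
    seq = "".join(seq.split()).upper()  # drop spaces and line breaks
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOPS:
                    if i + 3 - start >= min_len:
                        orfs.append(ORF(strand, frame + 1, start, i + 3,
                                        True, translate(s[start:i])))
                    start = None
            if start is not None:  # ORF still open at the end of the sequence
                end = start + 3 * ((len(s) - start) // 3)
                if end - start >= min_len:
                    orfs.append(ORF(strand, frame + 1, start, end,
                                    False, translate(s[start:end])))
    return sorted(orfs, key=lambda o: o.end - o.start, reverse=True)
```

Calling `find_orfs` on the unknown sequence (spaces are removed automatically) and taking the first result gives the longest candidate; its `protein` field is the sequence to paste into BLASTP.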