Data Retrieval and Storage
From 2008.igem.org
If one of the parts retrieved from the iGEM registry is a protein, the registry provides its amino acid sequence, from which all other required information can be inferred. The program sends this amino acid sequence to UniProt, a large database of proteins and enzymes. UniProt can be searched with BLAST, a sequence-similarity search algorithm: given a DNA or amino acid sequence, it returns the entries that most closely match the query. Besides the name of the matched protein, UniProt gives the reagents of the reaction that the protein catalyzes. All of this information is useful for EvoGEM and is stored in a local database. Visit [http://www.uniprot.com UniProt] to see this database.
The Perl script's algorithm works as follows. First, the program goes to the iGEM registry and retrieves one of the parts, recording its name, type, and sequence. Then, if the part is a protein, it sends the sequence to UniProt, where it is run through a BLAST search. From the result, it extracts the names of the reactants and products and stores them in a file for EvoGEM's local database. Next, if the protein catalyzes a reaction, the program searches ChemSpider for the compounds involved in that reaction. There, it retrieves more information, such as the InChI, and stores it in the local database. The result is a large database ready for use by EvoGEM.
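The retrieval steps described above can be sketched in a few lines. This is only an illustration in Python, not the team's Perl script: the three lookup functions are passed in as parameters so that no real registry, UniProt, or ChemSpider call is assumed, and the table layout, the dictionary keys, the part name `BBa_DEMO`, and all demo data are hypothetical.

```python
import sqlite3

def build_local_database(part_names, fetch_part, blast_uniprot, lookup_chemspider):
    """Run the retrieval steps for each part and store the results in SQLite."""
    db = sqlite3.connect(":memory:")  # hypothetical schema, in memory for the demo
    db.execute("CREATE TABLE parts (name TEXT PRIMARY KEY, type TEXT, sequence TEXT)")
    db.execute("CREATE TABLE reactions (part TEXT, compound TEXT, role TEXT)")
    db.execute("CREATE TABLE compounds (name TEXT PRIMARY KEY, inchi TEXT)")
    for name in part_names:
        # Step 1: retrieve the part and record its name, type, and sequence.
        part = fetch_part(name)
        db.execute("INSERT INTO parts VALUES (?, ?, ?)",
                   (part["name"], part["type"], part["sequence"]))
        if part["type"] != "protein":
            continue
        # Step 2: search UniProt with the sequence and take the closest hit.
        hit = blast_uniprot(part["sequence"])
        if hit is None:
            continue
        # Step 3: store each reactant and product, then look it up in ChemSpider.
        for key, role in (("reactants", "reactant"), ("products", "product")):
            for compound in hit.get(key, []):
                db.execute("INSERT INTO reactions VALUES (?, ?, ?)",
                           (name, compound, role))
                info = lookup_chemspider(compound)
                if info is not None:
                    db.execute("INSERT OR IGNORE INTO compounds VALUES (?, ?)",
                               (compound, info["inchi"]))
    db.commit()
    return db

# Stand-in lookups with made-up data; real ones would query the remote services.
EXAMPLE_INCHI = "1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1"

def fake_fetch_part(name):
    return {"name": name, "type": "protein", "sequence": "MSTNPKPQRK"}

def fake_blast_uniprot(sequence):
    return {"reactants": ["example compound"], "products": ["placeholder product"]}

def fake_lookup_chemspider(compound):
    if compound == "example compound":
        return {"inchi": EXAMPLE_INCHI}
    return None  # compound not found

db = build_local_database(["BBa_DEMO"], fake_fetch_part,
                          fake_blast_uniprot, fake_lookup_chemspider)
print(db.execute("SELECT name, inchi FROM compounds").fetchall())
```

Passing the lookups in as functions keeps the control flow testable offline; swapping in real network clients would not change the loop.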
Latest revision as of 02:22, 30 October 2008
Home | The Team | The Project | Notebook |
---|
Evolutionary Algorithm | Data Retrieval | Modeling | Graphical User Interface |
---|
Perl
- If it is an enzyme, what reactions does it catalyze?
- If it is a molecule, what is its molecular structure, and what are the synonyms for the molecule name?
UniProt
ChemSpider
The reagents of the reaction that the protein catalyzes are then looked up in ChemSpider. This large database is much like UniProt, except that it covers chemistry. Searching ChemSpider is simple because molecules can be queried by their synonyms. For each molecule queried, ChemSpider returns information such as its SMILES (simplified molecular-input line-entry system) string. Useful as this information is, we needed a machine-readable format that could be used to compare metabolic pathways. That format is the IUPAC International Chemical Identifier (InChI). An InChI is a unique "fingerprint" of the molecule: unlike SMILES, which can describe the same molecule with several different strings, it is unambiguous, and it is defined by IUPAC. An example of an InChI looks like this:
1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
To see this database, visit: [http://www.chemspider.com ChemSpider]
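Because an InChI is a sequence of layers separated by slashes, it is easy to take apart programmatically. The following is a minimal Python sketch (not the team's Perl code) that splits the example string above into its layers. The function name and the descriptive layer labels are our own, and the sketch assumes that a formula layer is present and that no layer prefix repeats.

```python
# Descriptive labels for common InChI layer prefixes (labels are our own choice).
LAYER_NAMES = {
    "c": "connectivity",
    "h": "hydrogens",
    "q": "charge",
    "p": "protons",
    "b": "double_bond_stereo",
    "t": "tetrahedral_stereo",
    "m": "stereo_parity",
    "s": "stereo_type",
    "i": "isotopes",
}

def parse_inchi_layers(inchi):
    """Split an InChI string into a dict of layers keyed by a readable label."""
    # The "InChI=" prefix is optional here; the example on this page omits it.
    if inchi.startswith("InChI="):
        inchi = inchi[len("InChI="):]
    parts = inchi.split("/")
    if len(parts) < 2:
        raise ValueError("not an InChI string: %r" % inchi)
    # The first two fields are the version and the chemical formula.
    layers = {"version": parts[0], "formula": parts[1]}
    for part in parts[2:]:
        if not part:
            continue  # skip empty fields
        prefix, value = part[0], part[1:]
        # Unknown prefixes are kept under their single-letter name.
        layers[LAYER_NAMES.get(prefix, prefix)] = value
    return layers

example = "1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1"
for label, value in parse_inchi_layers(example).items():
    print(label, "=", value)
```

Two molecules can then be compared either by their full InChI strings or layer by layer, for example by formula and connectivity alone when stereochemistry does not matter.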
The Algorithm