Data Retrieval and Storage
From 2008.igem.org
The Perl script’s algorithm works in the following manner. First, the program goes to the iGEM registry and takes one of the parts, recording its name, type, and sequence. Then, if the part is a protein, it sends the information to UniProt, where it is run through the BLAST algorithm. From the result, it extracts the names of the reactants and products and stores them in a file for EvoGEM's local database. Afterwards, if the protein catalyzes a reaction, the program searches ChemSpider for the compounds involved in the catalyzed reaction. There, it retrieves further information, such as the InChI, and stores it in the local database for later use. The result is a large database ready for use by EvoGEM.
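The retrieval steps (registry part, then UniProt BLAST, then reactants and products, then ChemSpider and InChI) can be sketched roughly as follows. This is an illustration in Python, not the team's Perl script. The names `Part`, `Reaction`, `build_database`, `find_reaction`, and `find_inchi` are hypothetical, and the two lookup functions are stand-ins supplied by the caller rather than real UniProt or ChemSpider APIs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Part:
    """One registry part: the three fields the script records."""
    name: str
    type: str
    sequence: str


@dataclass
class Reaction:
    """Reactant and product names extracted for a protein."""
    reactants: List[str]
    products: List[str]


def build_database(
    parts: Iterable[Part],
    find_reaction: Callable[[str], Optional[Reaction]],
    find_inchi: Callable[[str], Optional[str]],
) -> Dict[str, dict]:
    """Collect one record per part, keyed by part name.

    find_reaction stands in for the UniProt/BLAST step and find_inchi
    for the ChemSpider lookup. Both are hypothetical callables.
    """
    database: Dict[str, dict] = {}
    for part in parts:
        record = {
            "type": part.type,
            "sequence": part.sequence,
            "reactants": [],
            "products": [],
            "inchi": {},
        }
        # Only proteins are sent on to the reaction lookup.
        if part.type == "protein":
            reaction = find_reaction(part.sequence)
            if reaction is not None:
                record["reactants"] = reaction.reactants
                record["products"] = reaction.products
                # Look up every compound of the catalyzed reaction.
                for compound in reaction.reactants + reaction.products:
                    inchi = find_inchi(compound)
                    if inchi is not None:
                        record["inchi"][compound] = inchi
        database[part.name] = record
    return database


# Placeholder data only: none of these values come from a real database.
demo_parts = [
    Part("example_enzyme", "protein", "PLACEHOLDER_SEQ"),
    Part("example_promoter", "regulatory", "PLACEHOLDER_DNA"),
]
demo_reactions = {"PLACEHOLDER_SEQ": Reaction(["compound_A"], ["compound_B"])}
demo_inchis = {"compound_A": "inchi-placeholder-A"}

demo_db = build_database(demo_parts, demo_reactions.get, demo_inchis.get)
print(demo_db["example_enzyme"]["inchi"])
```

Passing the lookups in as functions keeps the sketch runnable offline. A real version would replace them with calls to the services themselves.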
Revision as of 00:07, 29 October 2008
Perl
- If they were enzymes, what reactions were they catalyzing?
- If they were molecules, what were the molecular structures or other synonyms for these compounds?
UniProt
ChemSpider
The reagents from the reaction that the protein catalyzes are put through ChemSpider. This large database is much like UniProt, except that it covers chemical compounds rather than proteins. Searching ChemSpider is quite simple because molecules can be queried by any of their synonyms. Once a molecule is found, ChemSpider returns information about it, such as its synonyms and its SMILES (simplified molecular-input line-entry system) string. As useful as this information can be, the reason for coming to this database is to get something that is machine readable and can be used to compare metabolic pathways. That machine-readable format is the IUPAC International Chemical Identifier (InChI). An InChI is a unique "fingerprint" of the molecule: unlike SMILES, where the same molecule can be written in more than one way, it is not ambiguous, and its format is defined by IUPAC. An example of an InChI looks like this:
1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
To see this database, visit ChemSpider: http://www.chemspider.com
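To show why the InChI is convenient for a program, here is a minimal Python sketch that splits the example string above into its slash-separated layers. The function name `split_inchi_layers` is hypothetical. The sketch assumes the common layout of a version segment, then the formula, then layers tagged with one letter (`c`, `h`, `t`, `m`, `s`). It does not handle identifiers in which a tag occurs more than once or the formula is absent.

```python
from typing import Dict


def split_inchi_layers(inchi: str) -> Dict[str, str]:
    """Split an InChI string into a dict of its layers (simplified)."""
    # Drop the optional "InChI=" prefix if it is present.
    prefix = "InChI="
    if inchi.startswith(prefix):
        inchi = inchi[len(prefix):]

    segments = inchi.split("/")
    layers: Dict[str, str] = {"version": segments[0]}
    if len(segments) > 1:
        layers["formula"] = segments[1]
    for segment in segments[2:]:
        if segment:
            # The first character is the layer tag; the rest is its content.
            layers[segment[0]] = segment[1:]
    return layers


example = "1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1"
layers = split_inchi_layers(example)
print(layers["formula"])  # C6H8O6
print(layers["c"])        # connectivity layer
```

Because each layer has a fixed position or tag, two compounds can be compared field by field, for example on formula and connectivity alone, without interpreting the chemistry.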
The Algorithm