Data Retrieval and Storage
From 2008.igem.org
Revision as of 05:42, 28 October 2008
Home | The Team | The Project | Modeling | Notebook |
---|
Data Retrieval | Modeling | Evolutionary Algorithm | Graphical User Interface |
---|
Contents
Perl
UniProt
ChemSpider
After the results have gone through UniProt, any molecules involved in the reaction that are not proteins are looked up in ChemSpider. This large database is much like UniProt, except that it covers chemical compounds rather than proteins. Searching ChemSpider is simple because a molecule can be queried by any of its synonyms, which makes it a very useful tool. For each molecule queried, ChemSpider returns information such as its synonyms and its SMILES string (SMILES stands for simplified molecular input line entry specification). Useful as this information is, the reason for coming to this database is to get something that is machine readable and can be used to compare metabolic pathways. That machine-readable format is the IUPAC International Chemical Identifier (InChI). The InChI, which is defined by IUPAC, acts as a unique "fingerprint" of the molecule: unlike SMILES, where the same molecule can be written in several different ways, it is designed so that a given structure always yields the same identifier. An example of an InChI looks like this:
1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1
To browse this database, visit [http://www.chemspider.com ChemSpider].
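To illustrate why the InChI is convenient for a program, the following is a minimal Python sketch (not part of the team's Perl script) that splits the example identifier above into its layers and compares two identifiers. The names `parse_inchi`, `strip_prefix`, and `same_compound` are hypothetical. The sketch assumes the common case in which the first field is the version and the second is the chemical formula, and it accepts the identifier with or without the leading `InChI=` tag that full identifiers normally carry.

```python
from typing import List, NamedTuple, Tuple


class ParsedInChI(NamedTuple):
    version: str
    formula: str
    layers: List[Tuple[str, str]]  # (prefix letter, layer body), in original order


def strip_prefix(identifier: str) -> str:
    """Remove surrounding whitespace and an optional leading 'InChI=' tag."""
    text = identifier.strip()
    if text.startswith("InChI="):
        text = text[len("InChI="):]
    return text


def parse_inchi(identifier: str) -> ParsedInChI:
    """Split an InChI on '/' into version, formula, and prefixed layers.

    Simplifying assumption: the second field is the chemical formula,
    which holds for the example in the text but not for every InChI.
    """
    fields = strip_prefix(identifier).split("/")
    if len(fields) < 2 or not fields[0] or not fields[1]:
        raise ValueError(f"not a recognisable InChI: {identifier!r}")
    # Each remaining layer starts with a one-letter prefix, for example
    # 'c' (connectivity), 'h' (hydrogens), and 't', 'm', 's' (stereochemistry).
    layers = [(field[0], field[1:]) for field in fields[2:] if field]
    return ParsedInChI(fields[0], fields[1], layers)


def same_compound(first: str, second: str) -> bool:
    """Treat two compounds as identical when their InChI strings match."""
    return strip_prefix(first) == strip_prefix(second)


example = "1/C6H8O6/c7-1-2(8)5-3(9)4(10)6(11)12-5/h2,5,7-8,10-11H,1H2/t2-,5+/m0/s1"
parsed = parse_inchi(example)
print(parsed.version, parsed.formula)
for prefix, body in parsed.layers:
    print(prefix, body)
```

Because the identifier is canonical, a plain string comparison such as `same_compound` is enough to decide whether two reactions refer to the same molecule, which is what makes it suitable for comparing metabolic pathways.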
The Algorithm
The Perl script’s algorithm works as follows. First, it goes to the iGEM registry and takes one of the parts, recording its name, type, and sequence. If the part is a protein, its information is sent to UniProt, where it is run through the BLAST algorithm. From the results, the names of the reactants and products are extracted and stored in a file for EvoGEM's local database. If other molecules are involved in the protein's reaction, these compounds are then searched for in ChemSpider, and further information such as the InChI is also stored in the local database for later use (see Figure 2.0). The result is a large database ready for use by EvoGEM.
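The control flow described above can be sketched as follows. This is an illustrative Python outline, not the team's Perl script: `Part`, `Reaction`, and `build_database` are hypothetical names, the label `"protein"` for the part type is an assumption, and the two lookups (`find_reaction` for the UniProt/BLAST step, `find_inchi` for the ChemSpider step) are passed in as functions because the actual queries are not shown in the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Optional


@dataclass
class Part:
    name: str
    part_type: str  # assumed label; the registry's own type names may differ
    sequence: str


@dataclass
class Reaction:
    reactants: List[str]
    products: List[str]


def build_database(
    parts: Iterable[Part],
    find_reaction: Callable[[str], Optional[Reaction]],  # stands in for UniProt/BLAST
    find_inchi: Callable[[str], Optional[str]],           # stands in for ChemSpider
) -> Dict[str, dict]:
    """Collect reactions per protein part and an InChI per compound name."""
    database: Dict[str, dict] = {"reactions": {}, "compounds": {}}
    for part in parts:
        # Only protein parts are sent on to the sequence search.
        if part.part_type != "protein":
            continue
        reaction = find_reaction(part.sequence)
        if reaction is None:
            continue
        database["reactions"][part.name] = reaction
        # Look up every reactant and product once and keep its identifier.
        for compound in reaction.reactants + reaction.products:
            if compound not in database["compounds"]:
                database["compounds"][compound] = find_inchi(compound)
    return database
```

Passing the lookups in as functions keeps the outline independent of any particular web service and lets the flow be exercised with stand-in functions that return placeholder values.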