Team:Calgary Software
From 2008.igem.org
The University of Calgary Software Team
The Project
EvoGEM is agent-based evolutionary design software developed at the University of Calgary in the summer of 2007. Although it attracted some interest at the 2007 Jamboree, the project presented that year was only a starting point. This summer, the team's goal is to bring EvoGEM to the point where it can design sophisticated systems of practical use in synthetic biology.
This goal has two parts: improving the simulation side of the software, which strengthens the credibility of its results, and adding functionality to the evolutionary algorithm, which allows a more sophisticated search of the huge design space that the registry, and synthetic biology in general, present. We will also integrate chemical structures into the model and interpret them in terms of the systems biology and synthetic biology of the simulated system.
The Plan
Currently, EvoGEM does not understand or distinguish between the chemical and physical properties of the various compounds that move through the system. We are therefore introducing chemistry into the model. To do this, we first need a basis of relevant data. We search the registry for a part and analyze its sequence to determine which protein it codes for. Once the protein is found, we determine its enzymatic function (if any) and the reaction involved. Any chemical compounds associated with the protein are examined and stored in the InChI and SMILES formats. This information gives the system a foundation for understanding which chemicals exist within its environment.
Step I
First off, we will characterize the following sample parts from the registry.
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45120 BBa_J45120]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45200 BBa_J45200]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45250 BBa_J45250]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_I0462 BBa_I0462]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45996 BBa_J45996]
We translate the DNA sequence of each part into an amino acid sequence with [http://www.vivo.colostate.edu/molkit/translate/index.html this program].
For example, this DNA sequence:
atggaagttgttgaagttcttcacatgaatggaggaaatggagacagtagctatgcaaacaattctttggttcagcaaaaggtgattctcatgacaaagc caataactgagcaagccatgattgatctctacagcagcctctttccagaaaccttatgcattgcagatttgggttgttctttgggagctaacactttctt ggtggtctcacagcttgttaaaatagtagaaaaagaacgaaaaaagcatggttttaagtctccagagttttattttcacttcaatgatcttcctggcaat gattttaatacactttttcagtcactgggggcatttcaagaagatttgagaaagcatataggggaaagctttggtccatgttttttcagtggagtgcctg gttcattttatactagacttttcccttccaaaagtttacattttgtttactcctcctacagtctcatgtggctatctcaggtgcctaatgggattgaaaa taacaagggaaacatttacatggcaagaacaagccctctaagtgttattaaagcatactacaagcaatatgaaatagatttttcaaattttctcaagtac cgttcagaggaattgatgaaaggtggaaagatggtgttaacactcctaggtagagaaagtgaggatcctactagcaaagaatgctgttacatttgggagc ttctagccatggccctcaataagttggttgaagagggattgataaaagaagagaaagtagatgcattcaatattcctcaatacacaccatcaccagcaga agtaaagtacatagttgagaaggaaggatcattcaccattaatcgcttggaaacatcaagagttcattggaatgcttctaataatgagaagaatggtggt tacaatgtgtcaaggtgcatgagagctgtggctgagcctttgcttgtcagccactttgacaaggaattgatggatttagtgttccacaagtacgaagaga ttgtttctgattgcatgtccaaagagaatactgagtttataaatgtcatcatctccttgaccaaaataaattaa
Translates into this amino acid sequence:
MEVVEVLHMNGGNGDSSYANNSLVQQKVILMTKPITEQAMIDLYSSLFPETLCIADLGCS LGANTFLVVSQLVKIVEKERKKHGFKSPEFYFHFNDLPGNDFNTLFQSLGAFQEDLRKHI GESFGPCFFSGVPGSFYTRLFPSKSLHFVYSSYSLMWLSQVPNGIENNKGNIYMARTSPL SVIKAYYKQYEIDFSNFLKYRSEELMKGGKMVLTLLGRESEDPTSKECCYIWELLAMALN KLVEEGLIKEEKVDAFNIPQYTPSPAEVKYIVEKEGSFTINRLETSRVHWNASNNEKNGG YNVSRCMRAVAEPLLVSHFDKELMDLVFHKYEEIVSDCMSKENTEFINVIISLTKIN-
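The translation step above can be sketched in a few lines of Python. This is a minimal illustration that assumes the standard genetic code and reading frame 1; it is not the implementation of the linked tool, and the names `CODON_TABLE` and `translate` are hypothetical. Stop codons are written as `-` to match the output shown above.

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
# Stop codons are written as "-" to match the translation output above.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY--CC-W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a DNA string in reading frame 1.

    Whitespace is ignored, case does not matter, and any codon that is
    not in the table (for example one containing 'n') becomes 'X'.
    A trailing partial codon is dropped.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)


if __name__ == "__main__":
    # First eight codons of the example sequence above.
    print(translate("atggaagttgttgaagttcttcac"))
```

Whitespace is stripped before translation because the sequence above contains spaces that do not fall on codon boundaries.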
Step II
We have developed a web browsing script in [http://www.comp.leeds.ac.uk/Perl/start.html Perl] that retrieves relevant information from [http://partsregistry.org/Main_Page the parts registry] as well as from other databases such as [http://www.pir.uniprot.org/ UniProt], [http://pubchem.ncbi.nlm.nih.gov/ PubChem] and [http://www.chemspider.com/ ChemSpider].
The final goal is to build a large-scale local registry of the same type as the one we made in Step I.
The workflow of the software is as follows:
- Retrieve information about every part in the registry
- Use that information to characterize promoters, RBS and terminators
- Retrieve information about produced proteins from different databases and use that to characterize reporters and protein coding regions
- Retrieve information about reactions from different databases and use that to characterize specific enzymes produced by protein coding regions
- Retrieve information about reactants and products from different databases to characterize specific compounds related to proteins (substrates, products, inducers, inhibitors, etc.)
- Use all that information to build a flat file registry for EvoGEM to use
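As an illustration of the first workflow step, the sketch below builds a part page address from a part name, following the link pattern used for the sample parts in Step I. It is written in Python rather than Perl and is not the team's script. The names `part_url` and `fetch_page` are hypothetical, and whether the registry still serves pages at this address is an assumption.

```python
from urllib.parse import quote
from urllib.request import urlopen

# URL pattern taken from the sample part links in Step I.
REGISTRY_PART_URL = "http://partsregistry.org/wiki/index.php?title=Part:"


def part_url(part_name):
    """Return the registry page address for a part such as 'BBa_J45120'."""
    return REGISTRY_PART_URL + quote(part_name)


def fetch_page(url, timeout=10):
    """Download a page and return its text (assumes the server responds)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    for name in ["BBa_J45120", "BBa_J45200", "BBa_I0462"]:
        print(part_url(name))
```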
The following is the format of the file that stores our information.
In parts
[part name]
type = protein coding region
sequence = atggtgcatctatagacgtacgtcatgcagtactactgattattttagctgcacgtcagtacgctaac...taa
output = Heme oxygenase 1
In proteins_and_molecules
[Heme oxygenase 1]
sequence = MVGTRDSSFFT...*
input = InChI=1/C34H34N4O4.Fe/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16- + 3AH(2) + 3O(2) (notice InChI)
output = biliverdin + Fe(2+) + CO + 3 A + 3 H(2)O (notice no InChI)
weight = ... (This should be only a number, and should be in kiloDaltons (kDa))
[Heme]
InChI = InChI=1/C34H34N4O4.Fe/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-
SMILES = [Fe+2].O=C(O)CCc1c(c3[n-]c1cc/5nc(cc2[n-]c(c(c2C)\C=C)cc/4nc(c3)C(\C=C)=C\4C)\C(=C\5CCC(=O)O)C)C
synonyms = ...
weight = ... (This should be only a number, and should be in kiloDaltons (kDa))
Fields whose values could not be obtained should not appear.
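The format above can be read and written with very little code. The following Python sketch assumes the layout is exactly as shown: a `[name]` line opens a record, and each following line is `key = value`, split at the first `=`. The function names `parse_records` and `format_record` are hypothetical and are not part of EvoGEM.

```python
def parse_records(text):
    """Parse flat file text into {record name: {field: value}}."""
    records = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A "[name]" line starts a new record.
            current = records.setdefault(line[1:-1], {})
        elif current is not None and "=" in line:
            # Split at the first "=" only, so values such as
            # "InChI=1/..." keep their own "=" characters.
            key, _, value = line.partition("=")
            current[key.strip()] = value.strip()
    return records


def format_record(name, fields):
    """Format one record, leaving out fields that have no value."""
    lines = ["[" + name + "]"]
    for key, value in fields.items():
        if value is None or value == "":
            continue
        lines.append(key + " = " + str(value))
    return "\n".join(lines)


if __name__ == "__main__":
    sample = "[Heme oxygenase 1]\noutput = biliverdin + Fe(2+) + CO\n"
    print(parse_records(sample))
    print(format_record("Heme", {"synonyms": None, "InChI": "InChI=1/..."}))
```

Splitting at the first `=` matters here because InChI strings themselves begin with `InChI=`.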
Step III
With all of the data gathered, we will run several experiments with EvoGEM and the produced files, and further optimize the simulation parameters for the evolution of the required circuits. We will implement an AVL tree data structure to hold the flat file information at run time for fast access. While these experiments are running, we will add several features to EvoGEM, namely mRNA and ribosome objects in the simulation space.
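To make the AVL tree idea concrete, here is a minimal Python sketch of insertion and lookup keyed by record name. It only illustrates the data structure and is not EvoGEM's implementation; all names in it are hypothetical, and deletion is left out.

```python
class Node:
    def __init__(self, key, value):
        self.key = key
        self.value = value
        self.left = None
        self.right = None
        self.height = 1


def height(node):
    return node.height if node else 0


def update_height(node):
    node.height = 1 + max(height(node.left), height(node.right))


def balance_factor(node):
    return height(node.left) - height(node.right)


def rotate_right(y):
    x = y.left
    y.left = x.right
    x.right = y
    update_height(y)
    update_height(x)
    return x


def rotate_left(x):
    y = x.right
    x.right = y.left
    y.left = x
    update_height(x)
    update_height(y)
    return y


def insert(node, key, value):
    """Insert or overwrite key and return the new subtree root."""
    if node is None:
        return Node(key, value)
    if key < node.key:
        node.left = insert(node.left, key, value)
    elif key > node.key:
        node.right = insert(node.right, key, value)
    else:
        node.value = value
        return node
    update_height(node)
    factor = balance_factor(node)
    if factor > 1:  # left-heavy
        if balance_factor(node.left) < 0:  # left-right case
            node.left = rotate_left(node.left)
        return rotate_right(node)
    if factor < -1:  # right-heavy
        if balance_factor(node.right) > 0:  # right-left case
            node.right = rotate_right(node.right)
        return rotate_left(node)
    return node


def lookup(node, key):
    """Return the value stored under key, or None if it is absent."""
    while node is not None:
        if key == node.key:
            return node.value
        node = node.left if key < node.key else node.right
    return None
```

Because the tree rebalances on every insert, lookups stay logarithmic in the number of records even when names are inserted in sorted order, which is likely when loading an alphabetized flat file.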