Team:Calgary Software/Project

From 2008.igem.org

Revision as of 03:24, 24 October 2008 by Jleung (Talk | contribs)

The Project - EvoGEM

Introduction

Evolutionary and genetic strategies have proven useful in modeling and are often coupled with agent-based designs. EvoGEM was created in this stream of modern heuristics and is based on the registry of genetic parts provided by the iGEM competition. In this design methodology, the paradigm of evolution is harnessed to select for efficient designs, so that the system produces its output autonomously. EvoGEM uses the strategies of genetics and evolution to simulate the environment inside a prokaryotic cell. This includes structures and events present in the organism, such as RNA polymerase, messenger RNA and ribosomes, as well as transcription and translation. Taking into account the essential processes the cell needs to function, each generation is assembled from a selection of parts retrieved from the iGEM registry, and genetic circuits are created. The circuits proliferate in each generation and the best combinations of parts are selected. Eventually, the best circuit is obtained from the system and the desired functionality is returned. The agent-based logic, which requires minimal prior assumptions about the overall behavior of the system, combined with the empirically proven evolutionary design, creates a system that is able to both emulate and develop iGEM circuits.


EvoGEM was briefly presented during the 2007 iGEM jamboree and has sparked quite a lot of interest amongst the different teams. This summer, our team plans to further develop the fitness function EvoGEM employs, introduce more complex pattern recognition, and test the system under a much larger search space than before. The final goal is to produce a system sophisticated enough to rebuild working designs from previous years' teams' projects, as well as intelligent enough to simulate successes and failures of working and non-working systems, respectively.


The main focus of this project is to build Perl scripts that support EvoGEM's requirement of a flat file registry, create an Objective-C based graphical user interface (GUI) that makes the software easy to use, extend the EvoGEM code with the behaviors described above, and simulate cell processes such as transcription and translation.

Project Details

The first part of the project is to create a Perl script that queries the registry and other databases to retrieve critical information about different BioBricks. After that, the goal will be to improve the EvoGEM code by introducing the changes discussed in the previous section.

Step I

First off, we will characterize the parts that Kent has pulled from the registry as ideas to simulate.

The systems Kent suggested are linked here:

[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45120 BBa_J45120]

[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45200 BBa_J45200]

[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45250 BBa_J45250]

[http://partsregistry.org/wiki/index.php?title=Part:BBa_I0462 BBa_I0462]

[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45996 BBa_J45996]

Once these are simulated well, we can begin inserting more systems to increase the search space and try to find more complex pathways.


When characterizing proteins, include a field in the parts file titled "sequence = " followed by the protein sequence in one-letter amino acid notation.

Translate DNA code into amino acids with [http://www.vivo.colostate.edu/molkit/translate/index.html this program]

e.g.

This DNA sequence:

atggaagttgttgaagttcttcacatgaatggaggaaatggagacagtagctatgcaaacaattctttggttcagcaaaaggtgattctcatgacaaagc caataactgagcaagccatgattgatctctacagcagcctctttccagaaaccttatgcattgcagatttgggttgttctttgggagctaacactttctt ggtggtctcacagcttgttaaaatagtagaaaaagaacgaaaaaagcatggttttaagtctccagagttttattttcacttcaatgatcttcctggcaat gattttaatacactttttcagtcactgggggcatttcaagaagatttgagaaagcatataggggaaagctttggtccatgttttttcagtggagtgcctg gttcattttatactagacttttcccttccaaaagtttacattttgtttactcctcctacagtctcatgtggctatctcaggtgcctaatgggattgaaaa taacaagggaaacatttacatggcaagaacaagccctctaagtgttattaaagcatactacaagcaatatgaaatagatttttcaaattttctcaagtac cgttcagaggaattgatgaaaggtggaaagatggtgttaacactcctaggtagagaaagtgaggatcctactagcaaagaatgctgttacatttgggagc ttctagccatggccctcaataagttggttgaagagggattgataaaagaagagaaagtagatgcattcaatattcctcaatacacaccatcaccagcaga agtaaagtacatagttgagaaggaaggatcattcaccattaatcgcttggaaacatcaagagttcattggaatgcttctaataatgagaagaatggtggt tacaatgtgtcaaggtgcatgagagctgtggctgagcctttgcttgtcagccactttgacaaggaattgatggatttagtgttccacaagtacgaagaga ttgtttctgattgcatgtccaaagagaatactgagtttataaatgtcatcatctccttgaccaaaataaattaa


Translates into this amino acid sequence:

MEVVEVLHMNGGNGDSSYANNSLVQQKVILMTKPITEQAMIDLYSSLFPETLCIADLGCS LGANTFLVVSQLVKIVEKERKKHGFKSPEFYFHFNDLPGNDFNTLFQSLGAFQEDLRKHI GESFGPCFFSGVPGSFYTRLFPSKSLHFVYSSYSLMWLSQVPNGIENNKGNIYMARTSPL SVIKAYYKQYEIDFSNFLKYRSEELMKGGKMVLTLLGRESEDPTSKECCYIWELLAMALN KLVEEGLIKEEKVDAFNIPQYTPSPAEVKYIVEKEGSFTINRLETSRVHWNASNNEKNGG YNVSRCMRAVAEPLLVSHFDKELMDLVFHKYEEIVSDCMSKENTEFINVIISLTKIN-
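The same translation can be sketched in a few lines of code. The block below is a minimal illustration in Python (not the PERL module planned for this project), assuming the standard genetic code and reading frame 1. The names `CODON_TABLE` and `translate` are made up for this example. Stop codons are written as "-" to match the output above, and unrecognized codons become "X".

```python
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, stop_symbol="-"):
    """Translate a DNA string (frame 1) into one-letter amino acid notation.

    Whitespace is ignored, case does not matter, and any trailing bases
    that do not form a complete codon are dropped.
    """
    seq = "".join(dna.split()).upper()
    protein = []
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" marks an unknown codon
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


if __name__ == "__main__":
    # First 33 bases of the sequence shown above.
    print(translate("atggaagttgttgaagttcttcacatgaatgga"))
```

Passing `stop_symbol="*"` gives the "*" notation used in the proteins_and_molecules example further down this page.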

Step II

We will be developing a web browsing script that uses [http://www.comp.leeds.ac.uk/Perl/start.html PERL] to retrieve relevant information from [http://partsregistry.org/Main_Page the Parts Registry] as well as from other databases such as [http://www.pir.uniprot.org/ Uniprot], [http://pubchem.ncbi.nlm.nih.gov/ PubChem] and [http://www.chemspider.com/ Chemspider]. Take the time to at least glance at these databases and at PERL.

Here is a nice tutorial on [http://www.perl.com/pub/a/2002/08/20/perlandlwp.html web-browsing using PERL]

The final goal will be to use PERL to be able to build a large scale local registry of the same type we made in step I.
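As a rough illustration of the retrieval step, here is a minimal sketch, written in Python rather than PERL/LWP purely for brevity. It assumes that part pages follow the URL pattern of the links in Step I. `part_url` and `fetch_part_page` are hypothetical helper names, and the registry address may have changed since this page was written.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Base address taken from the part links in Step I (assumed to still apply).
REGISTRY_INDEX = "http://partsregistry.org/wiki/index.php"


def part_url(part_name):
    """Build the page address for one part, e.g. BBa_J45120."""
    query = urlencode({"title": "Part:" + part_name}, safe=":")
    return REGISTRY_INDEX + "?" + query


def fetch_part_page(part_name, timeout=30):
    """Download the raw HTML of a part page (requires network access)."""
    with urlopen(part_url(part_name), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(part_url("BBa_J45120"))
```

The returned HTML would still need to be parsed (for example with regular expressions) to pull out the part type and sequence.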

The workflow of the software is as follows:

  • Retrieve information about every part in the registry
  • Use that information to characterize promoters, RBS and terminators
  • Retrieve information about produced proteins from different databases and use that to characterize reporters and protein coding regions
  • Retrieve information about reactions from different databases and use that to characterize specific enzymes produced by protein coding regions
  • Retrieve information about reactants and products from different databases to characterize specific compounds related to proteins (substrates, products, inducers, inhibitors, etc.)
  • Use all that information to build a flat file registry for EvoGEM to use


The first thing to do tomorrow is to make sure PERL and BioPerl are installed on all machines.

The division of roles is as follows:


  • Boris - Build a module that retrieves all the parts from the registry and characterizes promoters, RBS and terminators. This will be stored in 4 arrays: @final_parts_names, @final_parts_types, @final_parts_parts (if the part is a composite part, this is a breakdown of its subparts; if it is not, it contains the string "NOTHING") and @final_parts_sequences.


  • Taras - Build a module that accepts a DNA sequence, converts it into an amino acid sequence (a hash table that matches codons to amino acids should be useful), searches a protein database using that sequence, and retrieves the protein name and, if it is an enzyme, its reaction. You should produce 4 arrays: @final_part_products (stores what protein each part produces; if a part, like an RBS, produces nothing, the array element will be the string "NOTHING"), @final_protein_names (a list of all protein names), @final_protein_inputs (a list of inputs for all proteins that are enzymes; if they are not enzymes, their input is "NOTHING") and @final_protein_outputs.


  • Neven - Build a module that accepts a chemical substance's name, uses a chemical database to convert the name into an InChI string, and then retrieves the InChI, SMILES and mass (in kDa) for that molecule from the database. Any field that cannot be found should be "NOTHING".


  • Josh - Build a module that accepts all that information and constructs 2 files ("parts", "proteins_and_molecules") with the following fields (just an example):


In parts

[part name]

type = protein coding region

sequence = atggtgcatctatagacgtacgtcatgcagtactactgattattttagctgcacgtcagtacgctaac...taa

output = Heme oxygenase 1


In proteins_and_molecules

[Heme oxygenase 1]

sequence = MVGTRDSSFFT...*

input = InChI=1/C34H34N4O4.Fe/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16- + 3AH(2) + 3O(2) (notice InChI)

output = biliverdin + Fe(2+) + CO + 3 A + 3 H(2)O (notice no InChI)

weight = ... (This should be only a number, and should be in kiloDaltons (kDa))



[Heme]

InChI = InChI=1/C34H34N4O4.Fe/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-

SMILES = [Fe+2].O=C(O)CCc1c(c3[n-]c1cc/5nc(cc2[n-]c(c(c2C)\C=C)cc/4nc(c3)C(\C=C)=C\4C)\C(=C\5CCC(=O)O)C)C

synonyms = ...

weight = ... (This should be only a number, and should be in kiloDaltons (kDa))


Fields whose values could not be obtained should not appear.
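The file layout above can be sketched as follows. This is only an illustration in Python (the actual module is to be written in PERL). It assumes one `key = value` pair per line and one blank line between records. `format_record` and `format_registry` are hypothetical names. Fields whose value is missing or is the placeholder string "NOTHING" are left out.

```python
def format_record(name, fields):
    """Format one record as a "[name]" header followed by "key = value" lines.

    Fields with no value (None) or the placeholder "NOTHING" are skipped.
    """
    lines = ["[" + name + "]"]
    for key, value in fields.items():
        if value is None or value == "NOTHING":
            continue
        lines.append(f"{key} = {value}")
    return "\n".join(lines)


def format_registry(records):
    """Join all records, separated by a blank line, into one file body."""
    body = "\n\n".join(
        format_record(name, fields) for name, fields in records.items()
    )
    return body + "\n"


if __name__ == "__main__":
    proteins = {
        "Heme oxygenase 1": {
            "sequence": "MVGTRDSSFFT...*",
            "output": "biliverdin + Fe(2+) + CO + 3 A + 3 H(2)O",
            "weight": "NOTHING",  # not found, so it will not be written
        },
    }
    print(format_registry(proteins), end="")
```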

Meanwhile, Terrance will be responsible for obtaining MATLAB for our team for the purpose of designing a network system wrap-around for the simulation environment. Kent will be responsible for developing a bridge between MATLAB and EvoGEM.


Since RE (Regular Expressions) will be quite a big portion of this kind of search, here is a [http://www.comp.leeds.ac.uk/Perl/matching.html link to RE in PERL] and here is [http://www.troubleshooters.com/codecorn/littperl/perlreg.htm another one].
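As a small example of the kind of matching involved, the sketch below pulls part names out of a block of text. It is written in Python for illustration, though the same pattern syntax works in PERL. The pattern assumes names look like the five parts listed in Step I ("BBa_", one capital letter, then digits). Other naming schemes in the registry would need a broader pattern. `find_part_names` is a hypothetical helper.

```python
import re

# Assumed naming pattern, inferred from names such as BBa_J45120 and BBa_I0462.
PART_NAME = re.compile(r"\bBBa_[A-Z]\d+\b")


def find_part_names(text):
    """Return the distinct part names in text, in order of first appearance."""
    names = []
    for match in PART_NAME.findall(text):
        if match not in names:
            names.append(match)
    return names


if __name__ == "__main__":
    sample = "index.php?title=Part:BBa_J45120 BBa_J45120 and Part:BBa_I0462"
    print(find_part_names(sample))
```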

Here is another useful link about [http://www.troubleshooters.com/codecorn/littperl/perlsub.htm subroutines in PERL]

In case LWP is not present on your computer, [http://search.cpan.org/~gaas/libwww-perl-5.800/lib/LWP.pm here is a link] where you can download it.


Also, I emailed everyone the "Perl goodies" Vlad has sent out to the emails you posted on this wiki.

Step III

Once we have a script that performs all the required functions, the next step will be to run several experiments with EvoGEM and the produced files. We will also further optimize the simulation parameters for the evolution of the required circuits.

While these experiments are running, we will add several properties to EvoGEM, namely mRNA and ribosome objects in the simulation space.

Neven, Josh and Taras - you're in charge of adding the ribosome class to EvoGEM. The first step will probably be just a simple outline of the class and some of its properties and methods. By Wednesday you should have a basic understanding of ribosomes, ribosome binding sites, mRNA and tRNA. You don't need to know every little detail but the main idea of translation should be familiar. Wikipedia should be useful.

As you may have guessed, I ask that you know this by Wednesday because you will be giving a presentation on this topic as well as presenting your basic class outline (just the idea of what the ribosome class can and can't do; you don't need to have a bulletproof implementation by then).

You can find a copy of EvoGEM on chuck in the drop boxes.

Kent - I'll need you to get off the AVL tree concept for a while and add an mRNA class to EvoGEM. Again, by Wednesday you should have a basic concept of what mRNA is, how it works, and how your class will work in relation to that. Be ready to give a 5-7 minute talk about mRNA and the mRNA class.

You can find a copy of EvoGEM on beagle in your drop box.


While all of this is going on, feel free to ask me any questions about EvoGEM and the VIGO::3D environment, since you don't have much experience with it (looking at the code for some of the examples is very useful though since Ian documented those fairly well for those purposes exactly).

*If anyone wants to run through a practice presentation, let me know and I'll be more than happy to listen.

*If you wish to use Keynote but it is not installed on your computer, download a free iWork trial version for now. Once Christian comes back, we can make sure all the machines have it.

The Experiments

Part 3

Results
