Team:Calgary Software/Project
From 2008.igem.org
The Project
EvoGEM is an agent-based system that designs iGEM "genetic circuits" by evolutionary design. In this methodology, evolution is used to select for efficient designs. The survival criteria are determined by the purpose for which the circuits are built: circuits that fulfill that purpose well survive and produce even better offspring, while those that fail to meet the criteria are removed from the gene pool. Agent-based logic, which requires minimal prior assumptions about the overall behavior of the system, combined with the empirically proven evolutionary design approach, yields a system that can both emulate and develop iGEM circuits.
EvoGEM was briefly presented during the 2007 iGEM Jamboree and sparked considerable interest among the different teams. This summer, our team plans to further develop the fitness function EvoGEM employs, introduce more complex pattern recognition, and test the system under a much larger search space than before. The final goal is to produce a system sophisticated enough to rebuild working designs from previous years' team projects, and intelligent enough to simulate the successes and failures of working and non-working systems, respectively.
The main focus of this project is to build Perl scripts that support EvoGEM's requirement for a flat file registry, and to develop the EvoGEM code to include the behaviors described above.
Project Details
The focus of the first part of the project is to create a Perl script that queries the registry and other databases to retrieve critical information about different bio-bricks. After that, the goal will be to improve the EvoGEM code by introducing the changes discussed in the previous section.
Step I
First off, we will characterize the parts that Kent has pulled from the registry as ideas to simulate.
The systems Kent suggested are linked here:
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45120 BBa_J45120]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45200 BBa_J45200]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45250 BBa_J45250]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_I0462 BBa_I0462]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45996 BBa_J45996]
Once these are simulated well, we can begin adding more systems to enlarge the search space and try to find more complex pathways.
When characterizing proteins, include a field in the parts file titled "sequence = " followed by the protein sequence in one-letter amino acid notation.
Translate DNA code into amino acids with [http://www.vivo.colostate.edu/molkit/translate/index.html this program]
e.g.
This DNA sequence:
atggaagttgttgaagttcttcacatgaatggaggaaatggagacagtagctatgcaaacaattctttggttcagcaaaaggtgattctcatgacaaagc caataactgagcaagccatgattgatctctacagcagcctctttccagaaaccttatgcattgcagatttgggttgttctttgggagctaacactttctt ggtggtctcacagcttgttaaaatagtagaaaaagaacgaaaaaagcatggttttaagtctccagagttttattttcacttcaatgatcttcctggcaat gattttaatacactttttcagtcactgggggcatttcaagaagatttgagaaagcatataggggaaagctttggtccatgttttttcagtggagtgcctg gttcattttatactagacttttcccttccaaaagtttacattttgtttactcctcctacagtctcatgtggctatctcaggtgcctaatgggattgaaaa taacaagggaaacatttacatggcaagaacaagccctctaagtgttattaaagcatactacaagcaatatgaaatagatttttcaaattttctcaagtac cgttcagaggaattgatgaaaggtggaaagatggtgttaacactcctaggtagagaaagtgaggatcctactagcaaagaatgctgttacatttgggagc ttctagccatggccctcaataagttggttgaagagggattgataaaagaagagaaagtagatgcattcaatattcctcaatacacaccatcaccagcaga agtaaagtacatagttgagaaggaaggatcattcaccattaatcgcttggaaacatcaagagttcattggaatgcttctaataatgagaagaatggtggt tacaatgtgtcaaggtgcatgagagctgtggctgagcctttgcttgtcagccactttgacaaggaattgatggatttagtgttccacaagtacgaagaga ttgtttctgattgcatgtccaaagagaatactgagtttataaatgtcatcatctccttgaccaaaataaattaa
Translates into this amino acid sequence:
MEVVEVLHMNGGNGDSSYANNSLVQQKVILMTKPITEQAMIDLYSSLFPETLCIADLGCS LGANTFLVVSQLVKIVEKERKKHGFKSPEFYFHFNDLPGNDFNTLFQSLGAFQEDLRKHI GESFGPCFFSGVPGSFYTRLFPSKSLHFVYSSYSLMWLSQVPNGIENNKGNIYMARTSPL SVIKAYYKQYEIDFSNFLKYRSEELMKGGKMVLTLLGRESEDPTSKECCYIWELLAMALN KLVEEGLIKEEKVDAFNIPQYTPSPAEVKYIVEKEGSFTINRLETSRVHWNASNNEKNGG YNVSRCMRAVAEPLLVSHFDKELMDLVFHKYEEIVSDCMSKENTEFINVIISLTKIN-
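The translation step above can also be done locally with a codon lookup table. The following is a minimal sketch, written in Python rather than the Perl the project targets, and assuming the standard genetic code. The names `CODON_TABLE` and `translate` are illustrative and not part of EvoGEM. Stop codons are written as "-" to match the output shown above, and whitespace inside the sequence is ignored.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order
# (first base varies slowest, third base varies fastest).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna, stop_symbol="-"):
    """Translate a DNA string into one-letter amino acid notation.

    Whitespace is stripped, reading starts at the first base, and any
    trailing bases that do not form a full codon are dropped. Codons
    containing unknown characters are reported as "X". Translation does
    not halt at a stop codon; the stop is emitted as `stop_symbol`.
    """
    seq = "".join(dna.split()).upper()
    usable = len(seq) - len(seq) % 3
    protein = []
    for i in range(0, usable, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)


# First seven codons and last five codons of the example sequence above.
print(translate("atggaagttgttgaagttctt"))  # MEVVEVL
print(translate("accaaaataaattaa"))        # TKIN-
```

Passing `stop_symbol="*"` gives the other common convention for marking the stop codon.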
Step II
We will be developing a web-browsing script in [http://www.comp.leeds.ac.uk/Perl/start.html Perl] to retrieve relevant information from [http://partsregistry.org/Main_Page the parts registry] as well as from other databases such as [http://www.pir.uniprot.org/ Uniprot], [http://pubchem.ncbi.nlm.nih.gov/ PubChem] and [http://www.chemspider.com/ Chemspider]. Take the time to at least glance at these databases and at Perl.
Here is a nice tutorial on [http://www.perl.com/pub/a/2002/08/20/perlandlwp.html web-browsing using PERL]
The final goal is to use Perl to build a large-scale local registry of the same type we made in Step I.
The workflow of the software is as follows:
- Retrieve information about every part in the registry
- Use that information to characterize promoters, RBS and terminators
- Retrieve information about produced proteins from different databases and use that to characterize reporters and protein coding regions
- Retrieve information about reactions from different databases and use that to characterize specific enzymes produced by protein coding regions
- Retrieve information about reactants and products from different databases to characterize specific compounds related to proteins (substrates, products, inducers, inhibitors, etc.)
- Use all that information to build a flat file registry for EvoGEM to use
The first thing to do tomorrow is to make sure everyone has Perl and BioPerl installed on all machines.
The division of roles is as follows:
- Boris - Build a module that retrieves all the parts from the registry and characterizes promoters, RBS and terminators. The results will be stored in 4 arrays: @final_parts_names, @final_parts_types, @final_parts_parts (if a part is composite, this is a breakdown of its subparts; if it is not, it contains the string "NOTHING") and @final_parts_sequences.
- Taras - Build a module that accepts a DNA sequence, converts it into an amino acid sequence (a hash table that matches codons to amino acids should be useful), searches a protein database with that sequence and retrieves the protein name and, if the protein is an enzyme, its reaction. You should produce 4 arrays: @final_part_products (stores what protein each part produces; if a part, like an RBS, produces nothing, the array element will be the string "NOTHING"), @final_protein_names (a list of all protein names), @final_protein_inputs (a list of inputs for all proteins that are enzymes; if they are not enzymes, their input is "NOTHING") and @final_protein_outputs.
- Neven - Build a module that accepts a chemical substance's name and uses a chemical database to convert the name into an InChI string. It then retrieves the InChI, SMILES and mass (in kDa) for that molecule from the database. Any field that cannot be found should be "NOTHING".
- Josh - Build a module that accepts all that information and constructs 2 files ("parts", "proteins_and_molecules") with the following fields (just an example):
In parts
[part name]
type = protein coding region
sequence = atggtgcatctatagacgtacgtcatgcagtactactgattattttagctgcacgtcagtacgctaac...taa
output = Heme oxygenase 1
In proteins_and_molecules
[Heme oxygenase 1]
sequence = MVGTRDSSFFT...*
input = InChI=1/C34H34N4O4.Fe/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16- + 3AH(2) + 3O(2) (notice InChI)
output = biliverdin + Fe(2+) + CO + 3 A + 3 H(2)O (notice no InChI)
weight = ... (This should be only a number, and should be in kiloDaltons (kDa))
[Heme]
InChI = InChI=1/C34H34N4O4.Fe/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16-
SMILES = [Fe+2].O=C(O)CCc1c(c3[n-]c1cc/5nc(cc2[n-]c(c(c2C)\C=C)cc/4nc(c3)Cā(\C=C)=C\4C)\C(=C\5CCC(=O)O)C)C
synonyms = ...
weight = ... (This should be only a number, and should be in kiloDaltons (kDa))
Fields whose values could not be obtained should not appear.
- Kent - Maintain the UofC iGEM software team page
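The "parts" file layout described in the list above can be sketched as follows. This is only an illustration in Python (the project itself targets Perl), assuming the retrieval modules hand over parallel lists that mirror the arrays named above, with "NOTHING" marking missing values. The function names, the "parts" field used for the sub-part breakdown, and the example part names and sequences are all hypothetical.

```python
NOTHING = "NOTHING"  # placeholder string produced by the retrieval modules


def format_record(name, fields):
    """Render one flat-file record: a [name] header plus 'key = value' lines.

    Fields whose value is missing (None, empty, or "NOTHING") are left out.
    """
    lines = [f"[{name}]"]
    for key, value in fields:
        if value is None or value == "" or value == NOTHING:
            continue
        lines.append(f"{key} = {value}")
    return "\n".join(lines)


def build_parts_file(names, types, subparts, sequences, products):
    """Combine the parallel lists into the text of the "parts" file."""
    records = []
    for name, ptype, sub, seq, prod in zip(
        names, types, subparts, sequences, products
    ):
        records.append(format_record(name, [
            ("type", ptype),
            ("parts", sub),       # hypothetical field name for sub-parts
            ("sequence", seq),
            ("output", prod),
        ]))
    return "\n".join(records) + "\n"


# Hypothetical input: one coding region and one RBS that produces nothing.
text = build_parts_file(
    names=["example_cds", "example_rbs"],
    types=["protein coding region", "RBS"],
    subparts=[NOTHING, NOTHING],
    sequences=["atgaaataa", "aaagag"],
    products=["Heme oxygenase 1", NOTHING],
)
print(text, end="")
```

The same `format_record` helper could be reused for the "proteins_and_molecules" file by passing it a different list of field names.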
Meanwhile, Terrance will be responsible for obtaining MATLAB for our team, for the purpose of designing a network system wrap-around for the simulation environment. Kent will be responsible for developing a bridge between MATLAB and EvoGEM.
Since RE (Regular Expressions) will be quite a big portion of this kind of search, here is a [http://www.comp.leeds.ac.uk/Perl/matching.html link to RE in PERL] and here is [http://www.troubleshooters.com/codecorn/littperl/perlreg.htm another one].
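As a small illustration of the kind of pattern matching involved, the sketch below pulls part identifiers out of registry links like those listed in Step I. It is written in Python rather than Perl, but the pattern itself should carry over to Perl essentially unchanged. It assumes that identifiers always follow "Part:" and start with "BBa_". The name `extract_part_ids` is illustrative.

```python
import re

# A part identifier is taken to be "BBa_" followed by letters, digits or
# underscores, appearing directly after "Part:" in a registry URL.
PART_ID = re.compile(r"Part:(BBa_[A-Za-z0-9_]+)")


def extract_part_ids(text):
    """Return the part identifiers found in text, in order, without repeats."""
    found = []
    for part_id in PART_ID.findall(text):
        if part_id not in found:
            found.append(part_id)
    return found


links = """
[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45120 BBa_J45120]
[http://partsregistry.org/wiki/index.php?title=Part:BBa_I0462 BBa_I0462]
"""
print(extract_part_ids(links))  # ['BBa_J45120', 'BBa_I0462']
```

Anchoring the pattern on "Part:" means the link label that repeats the identifier is not matched a second time.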
Here is another useful link about [http://www.troubleshooters.com/codecorn/littperl/perlsub.htm subroutines in PERL]
In case LWP is not present on your computer, [http://search.cpan.org/~gaas/libwww-perl-5.800/lib/LWP.pm here is a link] where you can download it.
Also, I emailed everyone the "Perl goodies" Vlad has sent out to the emails you posted on this wiki.
Step III
The Experiments
Part 3
Results