Team:Calgary Software/Project

From 2008.igem.org


Revision as of 19:04, 10 June 2008



The Project

EvoGEM is an agent-based system that seeks to design iGEM "genetic circuits" through evolutionary design. In this methodology, the paradigm of evolution is used to select for efficient designs. The survival criteria are set by the purpose the circuits are built for: circuits that fulfill that purpose well survive and produce even better offspring, while those that fail to meet the criteria are removed from the gene pool. Agent-based logic requires minimal assumptions about the overall behavior of the system. Combined with evolutionary design, it yields a system that can both emulate and develop iGEM circuits.
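The selection loop described above can be made concrete with a toy sketch. It is written in Python for illustration only; EvoGEM's own code is not shown on this page. The sketch makes three assumptions:

- a circuit is a list of part types;
- fitness is a made-up score that counts positions matching a hypothetical target design;
- the top half of each generation survives and each survivor produces one mutated offspring.

None of this reproduces EvoGEM's actual fitness function or agent model.

```python
import random

# Hypothetical part vocabulary and target design, used only to drive the toy
# fitness function below; EvoGEM's real representation is not described here.
PART_TYPES = ["promoter", "rbs", "coding", "terminator"]
TARGET = ["promoter", "rbs", "coding", "terminator"]


def fitness(circuit):
    """Toy survival criterion: how many positions match the hypothetical TARGET."""
    return sum(1 for got, want in zip(circuit, TARGET) if got == want)


def mutate(circuit, rng, rate=0.25):
    """Copy a parent, replacing each part with a random part type at probability `rate`."""
    return [rng.choice(PART_TYPES) if rng.random() < rate else part for part in circuit]


def evolve(generations=30, pop_size=20, seed=0):
    """Run truncation selection; return the best circuit and per-generation best fitness."""
    rng = random.Random(seed)
    population = [[rng.choice(PART_TYPES) for _ in TARGET] for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        # Rank by fitness: the top half survives, the rest leaves the gene pool.
        population.sort(key=fitness, reverse=True)
        survivors = population[: pop_size // 2]
        history.append(fitness(survivors[0]))
        # Survivors are kept as-is and each one produces a single mutated offspring.
        offspring = [mutate(parent, rng) for parent in survivors]
        population = survivors + offspring
    population.sort(key=fitness, reverse=True)
    return population[0], history
```

Calling `best, history = evolve()` runs 30 generations. Survivors are carried over unchanged, so the best fitness never decreases from one generation to the next. Whether that kind of elitism suits EvoGEM is a design choice this sketch leaves open.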

EvoGEM was briefly presented at the 2007 iGEM Jamboree and generated considerable interest among the teams. This summer, our team plans to:

- further develop EvoGEM's fitness function;
- introduce more complex pattern recognition;
- test the system on a much larger search space than before.

The final goal is a system sophisticated enough to rebuild working designs from previous years' projects. It should also be intelligent enough to simulate both the successes of working systems and the failures of non-working ones.

The main focus of this project is to build Perl scripts that produce the flat-file registry EvoGEM requires. The second aim is to extend the EvoGEM code with the behaviors described above.

Project Details

The first part of the project focuses on creating a Perl script that queries the registry and other databases to retrieve critical information about different BioBricks. After that, the goal will be to improve the EvoGEM code to introduce the changes discussed in the previous section.

Step I

First, we will characterize the parts Kent has pulled from the registry as candidate systems to simulate.

The systems Kent suggested are linked here:

[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45120 BBa_J45120]

[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45200 BBa_J45200]

[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45250 BBa_J45250]

[http://partsregistry.org/wiki/index.php?title=Part:BBa_I0462 BBa_I0462]

[http://partsregistry.org/wiki/index.php?title=Part:BBa_J45996 BBa_J45996]

Once these are simulated well, we can begin adding more systems to increase the search space and try to find more complex pathways.


When characterizing proteins, include a field titled "sequence = " in the parts file, followed by the protein sequence in one-letter amino acid notation.

Translate DNA sequences into amino acid sequences with [http://www.vivo.colostate.edu/molkit/translate/index.html this program].

For example, this DNA sequence:

atggaagttgttgaagttcttcacatgaatggaggaaatggagacagtagctatgcaaacaattctttggttcagcaaaaggtgattctcatgacaaagc caataactgagcaagccatgattgatctctacagcagcctctttccagaaaccttatgcattgcagatttgggttgttctttgggagctaacactttctt ggtggtctcacagcttgttaaaatagtagaaaaagaacgaaaaaagcatggttttaagtctccagagttttattttcacttcaatgatcttcctggcaat gattttaatacactttttcagtcactgggggcatttcaagaagatttgagaaagcatataggggaaagctttggtccatgttttttcagtggagtgcctg gttcattttatactagacttttcccttccaaaagtttacattttgtttactcctcctacagtctcatgtggctatctcaggtgcctaatgggattgaaaa taacaagggaaacatttacatggcaagaacaagccctctaagtgttattaaagcatactacaagcaatatgaaatagatttttcaaattttctcaagtac cgttcagaggaattgatgaaaggtggaaagatggtgttaacactcctaggtagagaaagtgaggatcctactagcaaagaatgctgttacatttgggagc ttctagccatggccctcaataagttggttgaagagggattgataaaagaagagaaagtagatgcattcaatattcctcaatacacaccatcaccagcaga agtaaagtacatagttgagaaggaaggatcattcaccattaatcgcttggaaacatcaagagttcattggaatgcttctaataatgagaagaatggtggt tacaatgtgtcaaggtgcatgagagctgtggctgagcctttgcttgtcagccactttgacaaggaattgatggatttagtgttccacaagtacgaagaga ttgtttctgattgcatgtccaaagagaatactgagtttataaatgtcatcatctccttgaccaaaataaattaa


Translates into this amino acid sequence:

MEVVEVLHMNGGNGDSSYANNSLVQQKVILMTKPITEQAMIDLYSSLFPETLCIADLGCS LGANTFLVVSQLVKIVEKERKKHGFKSPEFYFHFNDLPGNDFNTLFQSLGAFQEDLRKHI GESFGPCFFSGVPGSFYTRLFPSKSLHFVYSSYSLMWLSQVPNGIENNKGNIYMARTSPL SVIKAYYKQYEIDFSNFLKYRSEELMKGGKMVLTLLGRESEDPTSKECCYIWELLAMALN KLVEEGLIKEEKVDAFNIPQYTPSPAEVKYIVEKEGSFTINRLETSRVHWNASNNEKNGG YNVSRCMRAVAEPLLVSHFDKELMDLVFHKYEEIVSDCMSKENTEFINVIISLTKIN-
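The same translation can be done locally with a codon-to-amino-acid hash table, which is also what Taras's module in Step II will need. The sketch below does this in Python with a dictionary built from the standard genetic code. It assumes the sequence is already in reading frame 1. By default it writes "*" for a stop codon; the online tool's output above uses "-", so the stop symbol is a parameter.

```python
from itertools import product

# Standard genetic code. Codons are enumerated in TCAG order (first, second, third
# position), and the string lists the matching one-letter amino acids; "*" = stop.
BASES = "tcag"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(dna, stop_symbol="*"):
    """Translate DNA in reading frame 1 into one-letter amino acid notation."""
    # Remove whitespace (sequences are often printed in blocks) and normalise case.
    seq = "".join(dna.split()).lower()
    protein = []
    # Step through complete codons only; a trailing partial codon is ignored.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        aa = CODON_TABLE.get(seq[i:i + 3], "X")  # "X" for codons containing N, etc.
        protein.append(stop_symbol if aa == "*" else aa)
    return "".join(protein)
```

For example, `translate("atggaagtt gttgaagtt")` gives "MEVVEV", which matches the start of the amino acid sequence above.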

Step II

We will be developing a web-browsing script that uses [http://www.comp.leeds.ac.uk/Perl/start.html Perl] to retrieve relevant information from [http://partsregistry.org/Main_Page the parts registry]. It will also query other databases such as [http://www.pir.uniprot.org/ UniProt], [http://pubchem.ncbi.nlm.nih.gov/ PubChem] and [http://www.chemspider.com/ ChemSpider]. Take the time to at least glance at these databases and at Perl.

Here is a good tutorial on [http://www.perl.com/pub/a/2002/08/20/perlandlwp.html web browsing using Perl].

The final goal is to use Perl to build a large-scale local registry of the same type as the one we made in Step I.
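The Perl tutorial above fetches pages with LWP. For comparison, here is a rough Python equivalent using only the standard library; it is not the team's script. The URL template is copied from the part links in Step I and may no longer be valid. `part_url` and `fetch_part_page` are hypothetical helper names.

```python
from urllib.parse import quote
from urllib.request import urlopen

# URL template copied from the part links in Step I. The 2008 registry address may
# have changed since, so treat it as an assumption.
REGISTRY_PAGE = "http://partsregistry.org/wiki/index.php?title=Part:{}"


def part_url(part_name):
    """Build the registry page URL for a part such as 'BBa_J45120'."""
    return REGISTRY_PAGE.format(quote(part_name))


def fetch_part_page(part_name, timeout=30):
    """Download a part page as text (hypothetical helper, roughly LWP::Simple's get)."""
    with urlopen(part_url(part_name), timeout=timeout) as response:
        # Use the charset declared by the server, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

There is one behavioral difference from LWP::Simple's `get()`, which returns undef when a request fails. `urlopen` raises an exception (for example `HTTPError` or `URLError`) instead, so a crawler looping over every part would need to catch it.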

The workflow of the software is as follows:

  • Retrieve information about every part in the registry
  • Use that information to characterize promoters, RBS and terminators
  • Retrieve information about produced proteins from different databases and use that to characterize reporters and protein coding regions
  • Retrieve information about reactions from different databases and use that to characterize specific enzymes produced by protein coding regions
  • Retrieve information about reactants and products from different databases to characterize specific compounds related to proteins (substrates, products, inducers, inhibitors, etc.)
  • Use all that information to build a flat-file registry for EvoGEM


The first thing to do tomorrow is to make sure Perl and BioPerl are installed on everyone's machine.

The division of roles is as follows:


  • Boris - Build a module that retrieves all the parts from the registry and characterizes promoters, RBSs and terminators. The results will be stored in 4 arrays:
      @final_parts_names
      @final_parts_types
      @final_parts_parts (for a composite part, a breakdown of its subparts; otherwise the string "NOTHING")
      @final_parts_sequences


  • Taras - Build a module that accepts a DNA sequence and converts it into an amino acid sequence (a hash table that maps codons to amino acids should be useful). The module then searches a protein database with that sequence and retrieves the protein's name and, if it is an enzyme, its reaction. You should produce 4 arrays:
      @final_part_products (the protein each part produces; if a part such as an RBS produces nothing, the element is the string "NOTHING")
      @final_protein_names (a list of all protein names)
      @final_protein_inputs (the inputs of every protein that is an enzyme; for non-enzymes the input is "NOTHING")
      @final_protein_outputs


  • Neven - Build a module that accepts a chemical substance's name and uses a chemical database to convert the name into an InChI string. It then retrieves the InChI, SMILES and mass (in kDa) for that molecule from the database. Any field that cannot be found should be "NOTHING".


  • Josh - Build a module that accepts all of that information and constructs 2 files ("parts" and "proteins_and_molecules") with fields like the following (just an example):


In parts

[part name]

type = protein coding region

sequence = atggtgcatctatagacgtacgtcatgcagtactactgattattttagctgcacgtcagtacgctaac...taa

output = Heme oxygenase 1


In proteins_and_molecules

[Heme oxygenase 1]

sequence = MVGTRDSSFFT...*

input = InChI=1/C34H34N4O4.Fe/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16- + 3AH(2) + 3O(2) (notice InChI)

output = biliverdin + Fe(2+) + CO + 3 A + 3 H(2)O (notice no InChI)

weight = ... (This should be only a number, in kilodaltons (kDa))



[Heme]

InChI = InChI=1/C34H34N4O4.Fe/c1-7-21-17(3)25-13-26-19(5)23(9-11-33(39)40)31(37-26)16-32-24(10-12-34(41)42)20(6)28(38-32)15-30-22(8-2)18(4)27(36-30)14-29(21)35-25;/h7-8,13-16H,1-2,9-12H2,3-6H3,(H4,35,36,37,38,39,40,41,42);/q;+2/p-2/b25-13-,26-13-,27-14-,28-15-,29-14-,30-15-,31-16-,32-16- + 3AH(2) + 3O(2)

SMILES = [Fe+2].O=C(O)CCc1c(c3[n-]c1cc/5nc(cc2[n-]c(c(c2C)\C=C)cc/4nc(c3)C(\C=C)=C\4C)\C(=C\5CCC(=O)O)C)C

synonyms = ...

weight = ... (This should be only a number, in kilodaltons (kDa))


Fields whose values could not be obtained should not appear.
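The flat-file layout above can be sketched as a small writer, including the rule that missing fields are left out. This Python sketch makes two assumptions:

- each entry arrives as a name plus an ordered mapping of field names to values;
- the other modules mark unknown values with the string "NOTHING".

`part_entries`, `format_entry` and `write_registry` are hypothetical names, not part of any existing module.

```python
# Sentinel that the other modules use for values they could not find.
MISSING = "NOTHING"


def part_entries(names, types, sequences, products):
    """Zip parallel lists (in the style of the @final_* arrays) into (name, fields) pairs."""
    return [
        (name, {"type": kind, "sequence": seq, "output": product})
        for name, kind, seq, product in zip(names, types, sequences, products)
    ]


def format_entry(name, fields):
    """Render one '[name]' block of 'key = value' lines, skipping missing values."""
    lines = [f"[{name}]"]
    for key, value in fields.items():
        if value is None or value in (MISSING, ""):
            continue  # fields whose values could not be obtained are left out
        lines.append(f"{key} = {value}")
    return "\n".join(lines) + "\n"


def write_registry(path, entries):
    """Write (name, fields) pairs to `path`, with a blank line between entries."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(format_entry(name, fields) for name, fields in entries))
```

With parallel lists from the parts and protein modules, `write_registry("parts", part_entries(names, types, sequences, products))` would produce the "parts" file. Any part whose product is "NOTHING" simply gets no `output =` line.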

  • Kent - Maintain the [https://2008.igem.org/Team:Calgary_Software UofC iGEM software team page]

Meanwhile, Terrance will be responsible for obtaining MATLAB for our team for the purpose of designing a network-system wrap-around for the simulation environment. Kent will be responsible for developing a bridge between MATLAB and EvoGEM.


Regular expressions (REs) will be a big part of this kind of search. Here is a [http://www.comp.leeds.ac.uk/Perl/matching.html link to REs in Perl], and here is [http://www.troubleshooters.com/codecorn/littperl/perlreg.htm another one].
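As a small illustration of the pattern matching involved, the Python sketch below pulls BioBrick part names and long runs of DNA letters out of page text. Both patterns are assumptions:

- the part-name pattern is based only on the names listed in Step I ("BBa_" followed by a letter and digits);
- the 12-base minimum for a DNA run is an arbitrary threshold.

The same patterns can be written unchanged in Perl, for example with `m/.../g` and the `i` flag.

```python
import re

# Assumed part-name shape, based only on the names listed in Step I:
# "BBa_", one capital letter, then digits (e.g. BBa_J45120, BBa_I0462).
PART_NAME = re.compile(r"\bBBa_[A-Z]\d+\b")
# Assumed: any run of 12 or more A/C/G/T letters is treated as a DNA sequence.
DNA_RUN = re.compile(r"\b[acgt]{12,}\b", re.IGNORECASE)


def find_part_names(text):
    """Return distinct part names in order of first appearance."""
    return list(dict.fromkeys(PART_NAME.findall(text)))


def find_dna_runs(text):
    """Return every long A/C/G/T run in `text`, lower-cased."""
    return [run.lower() for run in DNA_RUN.findall(text)]
```

Sequences printed in blocks, like the Step I example, come back as one run per block. They would need to be joined afterwards.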

Here is another useful link about [http://www.troubleshooters.com/codecorn/littperl/perlsub.htm subroutines in Perl].

If LWP is not installed on your computer, [http://search.cpan.org/~gaas/libwww-perl-5.800/lib/LWP.pm here is a link] where you can download it.


Also, I have emailed the "Perl goodies" Vlad sent out to everyone, at the addresses you posted on this wiki.

Step III

The Experiments

Part 3

Results
