Team:Newcastle University/Parts Repository

From 2008.igem.org

Image gallery (images not preserved):

  • Erd-big.jpg: The Entity Relationship Diagram for the parts repository database.
  • database_screen_shot.jpg: A screen shot of the populated database.
  • relationships.jpg: The relationships between the Parts and their Ids. The most fundamental table is the Parts table, as it holds all of the parts and their associated models. The Part Type table is needed to assign each part to a type category and, for the iGEM project, to enable mutation of parts that belong to the same type category.
  • Db.jpg: Another view of the populated database, with defined functions for Parts.

Revision as of 17:59, 28 October 2008



Newcastle University

GOLD MEDAL WINNER 2008


The objectives of this project are:

  • Decide what information is required about circuit components and how it will be stored.
  • Design user stories to show the requests that the API may need to support.
  • Outline the structure of the database.
  • Carry out background research into example components that will be used to answer competency questions on the database.
  • Identify all parts that will be defined in the initial database.
  • Collect information on the selected parts.
  • Begin to build up the database with parts and their information.
  • Model each part in the selected markup language.
  • Design an API that will support software that requires access to the information, such as CellML.

User Stories

The Parts Repository (PR) provides computationally defined descriptions of the behaviour of specific parts. This information is intended to support the design of synthetic systems based on that behaviour. User stories were written to define how the database information could be used, covering both the use and the maintenance of the database.

The user stories formed the basis for SQL statements that query the repository and return the requested information. A query was written for each type of information that users in the iGEM project might require. SQL statements were also written for future maintenance of the database, such as adding a new part and updating the information on existing parts.

Workbench

  1. The workbench will request a list of all part names, types, Ids and part type Ids. The program will retrieve these from the repository and send them back to the workbench as ordered lists that map each name to its corresponding Id.
  2. The workbench will request all chassis and their Ids. The program will retrieve these from the repository and output them to the workbench as lists that map each chassis name to its corresponding Id.
  3. The workbench will provide the Id of a part; the program will find this part in the database and return all of the information stored on it to the workbench.
  4. The workbench will provide the Id of the part it would like a model for; the program will look up the part in the database, find its model and return the model to the workbench with the Id attached.
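The four workbench stories reduce to simple lookups by name or Id. As a rough illustration, the Java sketch below stands in for the Access database with in-memory maps so that it runs on its own. The class, record and field names (`WorkbenchQueries`, `Part`, `Chassis`, `model`) and the sample rows are hypothetical. They are not the project's actual schema or API.

```java
import java.util.*;

// In-memory sketch of the four workbench user stories.
// Hypothetical names throughout; the real PR held this data in Microsoft Access.
class WorkbenchQueries {
    record Part(int id, String name, int typeId, String typeName, String model) {}
    record Chassis(int id, String name) {}
    record PartModel(int partId, String model) {}

    private final Map<Integer, Part> parts = new TreeMap<>();
    private final Map<Integer, Chassis> chassis = new TreeMap<>();

    void addPart(Part p) { parts.put(p.id(), p); }
    void addChassis(Chassis c) { chassis.put(c.id(), c); }

    // Story 1: part names mapped to part Ids, ordered by name.
    SortedMap<String, Integer> partNamesToIds() {
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Part p : parts.values()) result.put(p.name(), p.id());
        return result;
    }

    // Story 1 (continued): part type names mapped to part type Ids.
    SortedMap<String, Integer> partTypeNamesToIds() {
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Part p : parts.values()) result.put(p.typeName(), p.typeId());
        return result;
    }

    // Story 2: chassis names mapped to chassis Ids.
    SortedMap<String, Integer> chassisNamesToIds() {
        SortedMap<String, Integer> result = new TreeMap<>();
        for (Chassis c : chassis.values()) result.put(c.name(), c.id());
        return result;
    }

    // Story 3: everything stored on one part; empty if the Id is unknown.
    Optional<Part> partById(int id) { return Optional.ofNullable(parts.get(id)); }

    // Story 4: the part's model, returned with the part Id attached.
    Optional<PartModel> modelById(int id) {
        return partById(id).map(p -> new PartModel(p.id(), p.model()));
    }

    public static void main(String[] args) {
        WorkbenchQueries wb = new WorkbenchQueries();
        wb.addPart(new Part(2, "PspaS", 1, "Promoter", "PspaS.txt"));
        wb.addPart(new Part(1, "ComD", 5, "Sensor", "ComD.txt"));
        wb.addChassis(new Chassis(1, "Bacillus subtilis"));
        System.out.println(wb.partNamesToIds()); // prints {ComD=1, PspaS=2}
        System.out.println(wb.modelById(2).map(PartModel::model).orElse("none")); // prints PspaS.txt
    }
}
```

Sorted maps give the "ordered lists that map between names and Ids" of story 1 directly. In the real repository the same ordering would presumably come from an `ORDER BY` clause in the SQL.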

Evolutionary algorithm (EA)

  1. The EA will provide the Id of the required part type together with the parameter that the part should satisfy. The program will find a part of that type that fits this parameter and does not have the same Id as the part being replaced, then output this part to the EA.
  2. The EA will provide a part Id and request the model for this part. The program will find the part in the database by its Id, find the model for this part and return it to the EA.
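EA story 1 amounts to a filtered substitution within a part type, which is what allows the EA to mutate a design by swapping parts of the same type. The sketch below makes two assumptions. Each part carries a single numeric parameter (a hypothetical `strength`), and the EA states its constraint as an inclusive range on it. The random choice among eligible candidates is an illustrative design choice; the original does not specify it. EA story 2 is the same lookup as workbench story 4.

```java
import java.util.*;

// Sketch of EA story 1: offer another part of the same type as a mutation.
// Assumptions (hypothetical): each part has one numeric parameter, "strength",
// and the EA's constraint is an inclusive [min, max] range on it.
class EaPartSelector {
    record Part(int id, int typeId, double strength) {}

    private final List<Part> parts;
    private final Random random;

    EaPartSelector(List<Part> parts, long seed) {
        this.parts = List.copyOf(parts);
        this.random = new Random(seed); // seeded so runs are reproducible
    }

    // Picks, at random, a part of the requested type whose parameter lies in
    // [min, max] and whose Id differs from the part currently in use.
    Optional<Part> alternative(int typeId, double min, double max, int currentId) {
        List<Part> candidates = new ArrayList<>();
        for (Part p : parts) {
            boolean sameType = p.typeId() == typeId;
            boolean differentPart = p.id() != currentId;
            boolean fits = p.strength() >= min && p.strength() <= max;
            if (sameType && differentPart && fits) candidates.add(p);
        }
        if (candidates.isEmpty()) return Optional.empty();
        return Optional.of(candidates.get(random.nextInt(candidates.size())));
    }

    public static void main(String[] args) {
        EaPartSelector ea = new EaPartSelector(List.of(
                new Part(1, 1, 0.2), new Part(2, 1, 0.8), new Part(3, 2, 0.5)), 42L);
        // Only part 2 is of type 1, lies in [0.5, 1.0] and differs from part 1.
        System.out.println(ea.alternative(1, 0.5, 1.0, 1).map(Part::id).orElse(-1)); // prints 2
    }
}
```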

Constraints repository

Initially, all interactions between the constraints repository and the parts repository will go through the workbench. If the two were to interact directly, the stories would be as follows:

  1. The constraints repository will request all parts, part types, part Ids and part type Ids. The program will query the database for all of these and return them to the constraints repository.
  2. Once a new part is entered, the program will inform the constraints repository that a new part is available.
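If the constraints repository talked to the PR directly, story 2 would fit a listener (observer) pattern. The PR keeps a list of subscribers and calls each one whenever a part is entered. This is a minimal sketch under that assumption. The original does not say how the notification would be delivered, and all names here are hypothetical.

```java
import java.util.*;
import java.util.function.IntConsumer;

// Sketch of the constraints-repository stories, assuming direct interaction.
// The listener mechanism is hypothetical: the original only says the program
// "will inform" the constraints repository of new parts.
class PartsRepositoryNotifier {
    private final Map<Integer, String> partsById = new TreeMap<>();
    private final List<IntConsumer> newPartListeners = new ArrayList<>();
    private int nextId = 1;

    // Lets a client such as the constraints repository subscribe to new parts.
    void onNewPart(IntConsumer listener) { newPartListeners.add(listener); }

    // Story 2: store the part under a fresh Id, then notify every subscriber.
    int enterPart(String name) {
        int id = nextId++;
        partsById.put(id, name);
        for (IntConsumer listener : newPartListeners) listener.accept(id);
        return id;
    }

    // Story 1 (simplified): all part Ids and names; types are omitted here.
    Map<Integer, String> allParts() { return Collections.unmodifiableMap(partsById); }

    public static void main(String[] args) {
        PartsRepositoryNotifier pr = new PartsRepositoryNotifier();
        pr.onNewPart(id -> System.out.println("Constraints repository: new part " + id));
        pr.enterPart("PcotA"); // prints "Constraints repository: new part 1"
    }
}
```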

Person/other user

  1. A user wants to enter a new part. The program tells the database to create a new record and Id for this part. The user provides the information for the new part, and the program stores this information in the newly created record.
  2. A user wants to update a part. The user provides the Id of the part to update, the field to update and the new information to store. The program finds the part and field in the database and replaces the existing information.
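The two maintenance stories can be sketched as a create-then-fill insert and a single-field update. In this illustration each part is a map from field name to value, so that any named field can be updated. The field names and the `PartMaintenance` class are hypothetical. In the Access database these operations would correspond to SQL `INSERT` and `UPDATE` statements.

```java
import java.util.*;

// Sketch of the two maintenance stories. Each part is a map of field name to
// value so that story 2 can update any named field; field names are hypothetical.
class PartMaintenance {
    private final Map<Integer, Map<String, String>> parts = new HashMap<>();
    private int nextId = 1;

    // Story 1, step 1: create an empty record for the new part and return its Id.
    int createPart() {
        int id = nextId++;
        parts.put(id, new HashMap<>());
        return id;
    }

    // Story 1, step 2: store the user's information in the newly created record.
    void fillPart(int id, Map<String, String> info) { partOrThrow(id).putAll(info); }

    // Story 2: replace the value of one existing field of one part.
    void updateField(int id, String field, String value) {
        Map<String, String> part = partOrThrow(id);
        if (!part.containsKey(field)) {
            throw new IllegalArgumentException("Part " + id + " has no field " + field);
        }
        part.put(field, value);
    }

    String get(int id, String field) { return partOrThrow(id).get(field); }

    private Map<String, String> partOrThrow(int id) {
        Map<String, String> part = parts.get(id);
        if (part == null) throw new NoSuchElementException("No part with Id " + id);
        return part;
    }

    public static void main(String[] args) {
        PartMaintenance pm = new PartMaintenance();
        int id = pm.createPart();
        pm.fillPart(id, Map.of("name", "veg", "organism", "Bacillus subtilis"));
        pm.updateField(id, "name", "Pveg");
        System.out.println(pm.get(id, "name")); // prints Pveg
    }
}
```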


Outcomes

The work for the Parts Repository (PR) was concluded on 1 Sept 2008.

Parts Repository

A database repository of genetic parts for bottom-up modelling in synthetic biology, implemented as a Microsoft Access database.

The PR attempts to completely define a "part", which can be any genetically defined construct. These can include BioBrick parts and devices, as well as independently defined objects like the 2-part quorum sensing systems used in the BugBuster project.

There is only one relationship between the tables in the database: the TypeId field, which appears in both the Part Types table and the Parts table. Figure 2-Table 1 shows the table of part types, which defines nine different types of genetic part. Of these nine part types, six are represented computationally by complete CellML models (CellML, 2003). The part type models are based on the mathematics provided by Cooling M, 2008.

Every part is assigned a unique numerical identifier. Where a part is an encoded protein, the part name is the protein name; all other parts are given their respective names. The source organism is recorded to help identify parts and to avoid confusion where the same name is used for a part in more than one organism. Where possible, protein accession numbers are recorded, and BioBrick numbers are recorded for parts that have already been registered in the BioBricks registry (Registry, 2008). Each part's sequence is hyperlinked to a text file containing that sequence, so that the model-to-sequence converter can access it. For individual parts, the relevant components of the part models are stored in text format, with the file path recorded in the repository. This was done for practical reasons relating to the iGEM project as a whole.
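The table design described above can be mirrored in Java as two record types linked by `TypeId`, with a join that groups parts by their type. The field names are assumptions based on the fields described in this section, not the actual Access column names. Optional fields use `Optional` because accession and BioBrick numbers are only recorded where possible. The sample rows are illustrative only.

```java
import java.util.*;
import java.util.stream.Collectors;

// Sketch of the Part Types and Parts tables and their single relationship (TypeId).
// Field names are assumptions; Optional fields reflect "where possible" columns.
class PartsSchema {
    record PartType(int typeId, String name) {}

    record Part(int partId, String name, int typeId, String sourceOrganism,
                Optional<String> accession, Optional<String> bioBrick,
                String sequenceFile, String modelComponentFile) {}

    // Equivalent of joining Parts to Part Types on TypeId: part names grouped by type name.
    // An unknown TypeId is rejected, as a foreign-key constraint would reject it.
    static Map<String, List<String>> partsByTypeName(List<PartType> types, List<Part> parts) {
        Map<Integer, String> typeNames = types.stream()
                .collect(Collectors.toMap(PartType::typeId, PartType::name));
        Map<String, List<String>> result = new TreeMap<>();
        for (Part p : parts) {
            String typeName = typeNames.get(p.typeId());
            if (typeName == null) {
                throw new IllegalStateException(
                        "Part " + p.partId() + " has unknown TypeId " + p.typeId());
            }
            result.computeIfAbsent(typeName, k -> new ArrayList<>()).add(p.name());
        }
        return result;
    }

    public static void main(String[] args) {
        List<PartType> types = List.of(new PartType(1, "Promoter"), new PartType(2, "Encoded protein"));
        List<Part> parts = List.of(
                new Part(10, "PspaS", 1, "Bacillus subtilis", Optional.empty(), Optional.empty(),
                        "sequences/PspaS.txt", "components/PspaS.txt"),
                new Part(11, "GFP", 2, "Aequorea victoria", Optional.empty(), Optional.empty(),
                        "sequences/GFP.txt", "components/GFP.txt"));
        System.out.println(partsByTypeName(types, parts)); // prints {Encoded protein=[GFP], Promoter=[PspaS]}
    }
}
```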

Part Sequences

The nucleotide sequence for each part is stored as a text file that is available from the PR. Sequences are available for most parts in the PR; where a part has no associated sequence, it is because the sequence information is not available. The nucleotide sequences were retrieved from EMBL-Bank. Where possible, the specific strain from which a sequence was obtained is given in the source species. For example, sequences from S. pneumoniae were taken from the sequenced strain R6.
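Because each sequence is a plain text file, reading one back is straightforward. The sketch below assumes the file holds only the nucleotide sequence, possibly split over several lines and possibly preceded by a FASTA-style `>` header, which is a common export format for EMBL-Bank entries. The actual file layout used in the PR is not stated. The accepted alphabet (A, C, G, T, N) is a deliberate simplification.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Locale;

// Sketch of reading a part's sequence text file back from the PR.
// Assumed file layout (hypothetical): optional '>' header lines, then the
// sequence, possibly wrapped over several lines.
class SequenceFiles {
    static String readSequence(Path file) throws IOException {
        StringBuilder seq = new StringBuilder();
        for (String line : Files.readAllLines(file)) {
            if (line.startsWith(">")) continue;          // skip FASTA-style headers
            seq.append(line.replaceAll("\\s+", ""));      // drop wrapping and spaces
        }
        String s = seq.toString().toUpperCase(Locale.ROOT);
        // Simplified alphabet: unambiguous bases plus N; other IUPAC codes are rejected.
        if (!s.matches("[ACGTN]*")) {
            throw new IllegalArgumentException(file + " contains non-nucleotide characters");
        }
        return s;
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("part", ".txt");
        Files.writeString(tmp, ">example\nacgtac\nGGTT\n");
        System.out.println(readSequence(tmp)); // prints ACGTACGGTT
        Files.delete(tmp);
    }
}
```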

CellML models

The behaviour of each part was defined by a CellML model (http://www.cellml.org/).

Some of the models are complete and others are incomplete. We suggest compiling the models in COR (http://cor.physiol.ox.ac.uk/) and running them in PCEnv (http://www.cellml.org/tools/pcenv/).

There are also components: fragments of CellML code associated with each part, together with the fragments needed to assemble complete models from them. They are provided exactly as they are stored in the parts repository for use by the iGEM project as a whole.
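The components archive includes `Header.txt` and `Footer.txt` alongside the per-part fragments. This suggests that a complete model is built by placing part fragments between a shared header and footer. The sketch below assumes exactly that ordering and checks that the result is well-formed XML. The actual joining procedure is described in `To join the components together.doc` and may differ. A well-formedness check is also not a CellML validity check.

```java
import java.io.IOException;
import java.io.StringReader;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Sketch: assemble a model as Header.txt + part fragments + Footer.txt.
// The ordering is an assumption; the project's own instructions may differ.
class CellMLAssembler {
    static String assemble(Path dir, List<String> fragments) throws IOException {
        StringBuilder model = new StringBuilder(Files.readString(dir.resolve("Header.txt")));
        for (String name : fragments) {
            model.append('\n').append(Files.readString(dir.resolve(name)));
        }
        model.append('\n').append(Files.readString(dir.resolve("Footer.txt")));
        return model.toString();
    }

    // True if the text parses as well-formed XML (not a full CellML validation).
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            DocumentBuilder builder = factory.newDocumentBuilder();
            builder.setErrorHandler(new DefaultHandler()); // fail quietly instead of printing
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("cellml");
        Files.writeString(dir.resolve("Header.txt"),
                "<?xml version=\"1.0\"?>\n<model xmlns=\"http://www.cellml.org/cellml/1.1#\" name=\"demo\">");
        Files.writeString(dir.resolve("Example.txt"), "<component name=\"example\"/>");
        Files.writeString(dir.resolve("Footer.txt"), "</model>");
        String model = assemble(dir, List.of("Example.txt"));
        System.out.println(isWellFormed(model)); // prints true
    }
}
```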

Download: File:Newcastle-igem2008-CellML-models.zip contains:

  • Alt.promoter.constitutive.cellml
  • Alt.promoter.constitutiveImport.cellml
  • Alt.promoter.inductive.cellml
  • Alt.promoter.inductive3.cellml
  • Alt.promoter.respressive.cellml
  • Alt.protein.cellml
  • Alt.RBS.cellml
  • Bacillus RBS.cellml
  • Basic promoter 2.cellml
  • Basic promoter 3.cellml
  • Basic Promoter.cellml
  • Check comp.cellml
  • CodingRegion.cellml
  • ComD(NoInducSynthWithComEP).cellml
  • ComD.cellml
  • ComDAGAIN.cellml
  • composite2.cellml
  • CompositeModel.cellml
  • constitutive promoter.cellml
  • DNAa.cellml
  • EncodedProtein.cellml
  • example tut3main.cellml
  • example tut3sub..cellml
  • FluxComEP.cellml
  • ftsA.cellml
  • ftsW.cellml
  • HySpank.cellml
  • InitialRBS.cellml
  • minC.cellml
  • NewComD.cellml
  • PbofA.cellml
  • PcotA.cellml
  • Protein.cellml
  • proteinCoding.cellml
  • PspaS.cellml
  • PspollQ.cellml
  • Pxyl.cellml
  • ResponseRegulator.cellml
  • Sensor.cellml
  • Tanscription factor.cellml
  • To join the components together.doc
  • veg.cellml
  • XComDP.cellml
  • zapA.cellml

File:Newcastle-igem2008-CellML-components.zip contains:

  • AgrA.txt
  • AgrC.txt
  • Bind Components.txt
  • CFP.txt
  • Cin-box.txt
  • ComD.txt
  • ComE.txt
  • ConstitutivePromoter.txt
  • DNAa.txt
  • Environment.txt
  • Essentials.txt
  • Footer.txt
  • ftsA.txt
  • ftsW.txt
  • GFP.txt
  • Header.txt
  • HySpank.txt
  • InducablePromoter.txt
  • mCherry.txt
  • minC.txt
  • OptimalRBS.txt
  • P2.txt
  • P3.txt
  • PbofA.txt
  • PbofAmod.txt
  • PcotA.txt
  • PlcA.txt
  • PromoterToRBS.txt
  • Protein.txt
  • PspaS.txt
  • PspollQ.txt
  • Pxyl.txt
  • RBS.txt
  • RBSToCodingRegion.txt
  • RepressablePromoter.txt
  • ResponseRegulator.txt
  • Sensor.txt
  • Units.txt
  • veg.txt
  • XyloseIsomerase.txt
  • YFP.txt
  • YocF.txt
  • YocG.txt
  • zapA.txt

iGEM Java programme

The Java programme needs to communicate with the PR in order to query it. JDBC was used to connect the Java code to the repository, which is stored as a Microsoft Access database. First, a JDBC-ODBC bridge was installed; then, using the ODBC Data Source Administrator, a Microsoft Access driver was set up to access the PR. The Sun JDBC-ODBC bridge driver (sun.jdbc.odbc.JdbcOdbcDriver) was used. A Java programme was then written to access data from the repository. Once an interface and its related classes had been created, a temporary interface was designed for interaction between the parts repository and the evolutionary algorithm.
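As an illustration, a lookup such as workbench story 3 looks roughly as follows with the standard `java.sql` API. The table and column names (`Parts`, `PartId`) are hypothetical. With the 2008 setup, the connection URL would have named the ODBC data source (`jdbc:odbc:<DSN name>`). The JDBC-ODBC bridge was removed in Java 8, so on a current JDK a different Access driver would be needed, for example the open-source UCanAccess driver.

```java
import java.sql.*;
import java.util.*;

// Sketch of a JDBC query for workbench story 3 (all information on one part).
// Table and column names are hypothetical; the connection URL is passed in.
class PartsRepositoryJdbc {
    static final String PART_BY_ID_SQL = "SELECT * FROM Parts WHERE PartId = ?";

    // Returns every column of the matching row as label -> value, or empty if none.
    static Optional<Map<String, Object>> partById(Connection conn, int partId) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(PART_BY_ID_SQL)) {
            ps.setInt(1, partId); // bind the Id rather than concatenating it into the SQL
            try (ResultSet rs = ps.executeQuery()) {
                if (!rs.next()) return Optional.empty();
                ResultSetMetaData md = rs.getMetaData();
                Map<String, Object> row = new LinkedHashMap<>(); // allows null column values
                for (int i = 1; i <= md.getColumnCount(); i++) {
                    row.put(md.getColumnLabel(i), rs.getObject(i));
                }
                return Optional.of(row);
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        if (args.length < 2) {
            System.err.println("usage: PartsRepositoryJdbc <jdbc-url> <part-id>");
            return;
        }
        try (Connection conn = DriverManager.getConnection(args[0])) {
            System.out.println(partById(conn, Integer.parseInt(args[1])));
        }
    }
}
```

Binding the Id with `setInt` rather than building the SQL string by hand avoids quoting problems and SQL injection.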

This interface allows easy access to the items stored in the PR database without requiring knowledge of the database structure. It is written in Java and is intended to be accessed via web services. The Javadoc is available.

Further Reading

Contributors

Lead: Megan Aylward