Team:Newcastle University/Parts Repository

Aims and Objectives
To develop a system that will define the kinds of parts that make up a biological circuit and their behaviour.

Computers do not know any biology. Many of the other projects involve manipulating data without any need to understand its biological meaning. This module is responsible for capturing biological knowledge and how it relates to that data. The parts that make up a biological system therefore need to be described in a computationally accessible manner, and this system will model the parts that make up the biological circuits. Efforts to define parts are already underway in the BioBricks community, and this project will build upon them.

The need for a repository of models to complement defined parts and assist synthetic design has been recognised (Le Novere et al., 2006; Rodrigo et al., 2007; Rouilly et al., 2007). However, to date this need has not been met by a suitable repository of part models. In January 2007, Rouilly et al. stated that creating a repository of CellML models to correspond to the BioBricks parts registry was both useful and achievable. Since then, a registry of biological models has been created, but it is populated by barely more than 10 models, many of which are not validated (Registry of Standard Biological Models, 2007). CellML models have been created for generic parts such as promoters, ribosome-binding sites, mRNA, proteins and oscillators (Registry of Standard Biological Models, 2007).

The objectives of this project are:
 * Decide what information is required about circuit components and how this will be stored.
 * Design user stories to show requests that the API may need to support.
 * Outline the structure of the database.
 * Carry out background research into example components that will be used to answer competency questions on the database.
 * Identify all parts that will be defined in the initial database.
 * Collect information on the selected parts.
 * Begin to build up the database with parts and their information.
 * Model each part in the selected markup language.
 * Design an API that will support software that requires access to the information, such as CellML.



User Stories
The PR provides a resource of computationally defined behaviour for specific parts. The overall purpose of the parts information is to facilitate the design of synthetic systems based on this behaviour. The ways in which the database information could be used were defined by user stories, which were created for both use and maintenance of the database.

These user stories provided the basis for the SQL statements that query the repository and return the requested information. A query was written for each type of information that users in the iGEM project could require. SQL statements were also developed for future uses of the database, including adding a new part and updating information on existing parts.
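As a rough illustration of how such statements might look, the sketch below collects a few of them as Java string templates. Only the Parts and Part types tables and the TypeId field are named on this page; the remaining column names (PartId, PartName, TypeName, SourceOrganism, SequencePath) and the class name PartsRepositoryQueries are assumptions made for the example, not the schema that was actually used.

```java
import java.util.Set;

/**
 * Hypothetical SQL templates for some of the user stories.
 * Assumed names: every column except TypeId, and this class itself.
 */
final class PartsRepositoryQueries {

    // Workbench story 1: all parts with their types, ordered by name.
    static final String ALL_PARTS =
        "SELECT p.PartId, p.PartName, t.TypeId, t.TypeName "
        + "FROM Parts AS p INNER JOIN [Part types] AS t ON p.TypeId = t.TypeId "
        + "ORDER BY p.PartName";

    // Workbench story 3: everything stored about one part.
    static final String PART_BY_ID = "SELECT * FROM Parts WHERE PartId = ?";

    // Person/other user story 1: add a new part (the Id is assumed to be auto-generated).
    static final String INSERT_PART =
        "INSERT INTO Parts (PartName, TypeId) VALUES (?, ?)";

    // A column name cannot be bound as a "?" parameter, so it is checked
    // against this whitelist before being spliced into the SQL text.
    private static final Set<String> UPDATABLE_COLUMNS =
        Set.of("PartName", "SourceOrganism", "SequencePath");

    // Person/other user story 2: update one field of one part.
    static String updateField(String column) {
        if (!UPDATABLE_COLUMNS.contains(column)) {
            throw new IllegalArgumentException("Unknown column: " + column);
        }
        return "UPDATE Parts SET " + column + " = ? WHERE PartId = ?";
    }

    private PartsRepositoryQueries() { }
}
```

Values are left as "?" placeholders so that they can be bound through a prepared statement rather than concatenated into the query.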

Workbench

 * 1) The workbench will request a list of all part names, part types, part Ids and part type Ids. The program will retrieve all of these from the repository and send them back to the workbench as ordered lists that map each name to its corresponding Id.
 * 2) The workbench will request all Chassis and their Ids. The program will retrieve these from the repository and output them to the workbench as lists that map each Chassis name to its corresponding Id.
 * 3) The workbench will provide the Id of a certain part. The program will find this part in the database and return all of the information stored on it to the workbench.
 * 4) The workbench will provide the Id of the part that it would like a model for. The program will look up the part in the database, find the model and return it to the workbench with the Id attached.

Evolutionary algorithm

 * 1) The EA will provide the Id of the part type required, together with the parameter that the part should adhere to. The program will find a part of that type that fits the parameter set by the EA and that doesn't have the same Id, then output this part to the EA.
 * 2) The EA will provide a part Id and request the model for this part. The program will find the part in the database based on its Id, then find the model for this part and return it to the EA.
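
The first of these stories can be sketched in memory as a filter over candidate parts. The text does not say what form "the parameter" takes, so this example assumes a single numeric value that must fall within a range, and it assumes the Id to avoid is that of a part the EA already holds. The Part class, its fields and the method name are likewise hypothetical.

```java
import java.util.List;
import java.util.Optional;

final class ReplacementPartFinder {

    /** Minimal stand-in for a repository row; all field names are assumptions. */
    static final class Part {
        final int id;
        final int typeId;
        final double parameter;

        Part(int id, int typeId, double parameter) {
            this.id = id;
            this.typeId = typeId;
            this.parameter = parameter;
        }
    }

    /**
     * Returns the first part of the requested type whose parameter lies in
     * [min, max] and whose Id differs from excludedId, or empty if none fits.
     */
    static Optional<Part> findReplacement(List<Part> parts, int typeId,
                                          int excludedId, double min, double max) {
        return parts.stream()
            .filter(p -> p.typeId == typeId)
            .filter(p -> p.id != excludedId)
            .filter(p -> p.parameter >= min && p.parameter <= max)
            .findFirst();
    }

    private ReplacementPartFinder() { }
}
```

Returning an Optional makes the "no suitable part" case explicit, which the EA would need to handle when no alternative of that type exists.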

Constraints repository

Initially, all interactions between the constraints repository and the parts repository will go through the workbench. If they were to interact directly, it would be as follows:


 * 1) The constraints repository will request all parts, all part types, all part Ids and all part type Ids. The program will query the database for all of these and return them to the constraints repository.
 * 2) Once a part is entered, the program will inform the constraints repository that a new part is available.
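
If the two repositories did interact directly, the second story could follow a simple listener pattern, sketched below. The PartNotifier class and the NewPartListener interface are hypothetical names introduced for illustration; nothing on this page states how the notification was implemented.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical notifier: tells registered listeners when a part is entered. */
final class PartNotifier {

    /** Implemented by anything that wants to hear about new parts. */
    interface NewPartListener {
        void partAdded(int partId);
    }

    private final List<NewPartListener> listeners = new ArrayList<>();

    /** Registers a listener, for example the constraints repository. */
    void register(NewPartListener listener) {
        listeners.add(listener);
    }

    /** Called after a new part has been stored; notifies every listener in turn. */
    void partEntered(int partId) {
        for (NewPartListener listener : listeners) {
            listener.partAdded(partId);
        }
    }
}
```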

Person/other user
 * 1) Someone wants to enter a new part. The program tells the database to create a new space and Id for this part. The user provides the information for the new part, and the program inputs this information into the space that was created in the database.
 * 2) A user wants to update a part. The user provides the Id of the part, the field that they want to update and the information to be stored. The program finds the part and field in the database and replaces the information.

Outcomes
The work for the Parts Repository (PR) was concluded on 1 Sept 2008.



Parts Repository
The PR is a database repository of genetic parts for bottom-up modelling in synthetic biology. It is implemented as a Microsoft Access database.

The PR attempts to completely define a "part", which can be any genetically defined construct. These can include BioBrick parts and devices, as well as independently defined objects like the 2-part quorum sensing systems used in the BugBuster project.

There is only one relationship between the tables in the database: the TypeId field links the Part types and Parts tables. Figure 2-Table 1 is the table of part types, which represents nine different types of genetic part. Of these nine types, six are represented computationally using complete CellML models (CellML, 2003). The models for part types are based on mathematics provided by Cooling M, 2008.

All parts are assigned a unique numerical identifier. Where a part encodes a protein, the part name is the protein name; all other parts are given their respective names. The source organism is recorded to help identify parts and to prevent confusion where the same name is used for parts in more than one organism. Where possible, protein accession numbers are recorded, along with BioBrick numbers where parts have already been registered in the BioBricks registry (Registry, 2008). Each part sequence is hyperlinked to a text file containing the sequence, to enable access by the model-to-sequence converter. For individual parts, the relevant components of the part models are stored in text format, with the file path provided in the repository. This is for practical reasons concerning the iGEM project as a whole.


 * Progress towards defining Parts



Part Sequences
The nucleotide sequence for each Part is stored as a text file, which is available from the PR. Sequences are available for most parts stored in the PR; where a part has no associated sequence, this is because the information is not available. The nucleotide sequences are retrieved from EMBL-Bank. Where possible, the specific strain that the sequence came from is given with the source species. For example, sequences from S. pneumoniae are taken from the sequencing strain R6.



CellML models
The behaviour of each part was defined by a CellML model.

Some of the models are complete and others are incomplete. We suggest compiling the models in COR and running them in PCEnv.

There are also components, which are the fragments of CellML code associated with each part, together with the fragments needed to create complete models from them. They appear as they are stored in the parts repository for access by the iGEM project as a whole.

See the Modelling page for more information, and for downloads.



iGEM Java programme
The Java programme needs a means of communicating with the PR in order to query it. JDBC(TM) was used to provide a connection between the Java code and the repository, which is stored as a Microsoft Access database. First a JDBC-ODBC bridge was installed; then, using the ODBC data source administrator, a Microsoft Access driver was set up to access the PR. The Sun JDBC-ODBC driver was used (sun.jdbc.odbc.JdbcOdbcDriver). A Java programme was written to access data from the repository. Once an interface and its related classes had been created, a temporary interface was designed for the interaction between the parts repository and the evolutionary algorithm.
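
A minimal sketch of this kind of connection is shown below. It uses the driver class named above; the data source name passed in, the PartName column, and the class and method names are assumptions. Note that the JDBC-ODBC bridge was removed in Java 8, so the open method can only work on older Java runtimes, and a different Access driver would be needed today.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for reaching the PR through the JDBC-ODBC bridge. */
final class PartsRepositoryConnection {

    /** Builds the bridge URL for a data source set up in the ODBC administrator. */
    static String odbcUrl(String dataSourceName) {
        return "jdbc:odbc:" + dataSourceName;
    }

    /** Loads the bridge driver and opens a connection (pre-Java 8 runtimes only). */
    static Connection open(String dataSourceName)
            throws SQLException, ClassNotFoundException {
        Class.forName("sun.jdbc.odbc.JdbcOdbcDriver"); // driver named in the text
        return DriverManager.getConnection(odbcUrl(dataSourceName));
    }

    /** Reads all part names; the PartName column is an assumed name. */
    static List<String> partNames(Connection connection) throws SQLException {
        List<String> names = new ArrayList<>();
        String sql = "SELECT PartName FROM Parts ORDER BY PartName";
        try (PreparedStatement statement = connection.prepareStatement(sql);
             ResultSet rows = statement.executeQuery()) {
            while (rows.next()) {
                names.add(rows.getString("PartName"));
            }
        }
        return names;
    }

    private PartsRepositoryConnection() { }
}
```

The try-with-resources block closes the statement and result set even if the query fails, which matters for a file-based database such as Access.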

This interface allows easy access to the items stored in the PR database without needing knowledge of the database structure. It is written in Java for access via web services. The Javadoc is available.
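
To illustrate the idea of hiding the database structure behind an interface, here is a small sketch. It is not the project's actual API (for that, see the Javadoc): the names PartsRepositoryApi, InMemoryPartsRepository, partNames and modelFor are hypothetical, and an in-memory map stands in for the Access database so the example is self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical facade: callers see parts and models, never tables or SQL. */
interface PartsRepositoryApi {
    /** Part Ids mapped to part names (workbench story 1). */
    Map<Integer, String> partNames();

    /** The model stored for a part, if one exists (workbench story 4). */
    Optional<String> modelFor(int partId);
}

/** In-memory stand-in so the sketch runs without the Access database. */
final class InMemoryPartsRepository implements PartsRepositoryApi {
    private final Map<Integer, String> names = new LinkedHashMap<>();
    private final Map<Integer, String> models = new LinkedHashMap<>();

    /** Stores a part; pass null as the model when none is available. */
    void addPart(int id, String name, String modelOrNull) {
        names.put(id, name);
        if (modelOrNull != null) {
            models.put(id, modelOrNull);
        }
    }

    @Override
    public Map<Integer, String> partNames() {
        return new LinkedHashMap<>(names); // copy, so callers cannot alter the store
    }

    @Override
    public Optional<String> modelFor(int partId) {
        return Optional.ofNullable(models.get(partId));
    }
}
```

Because callers depend only on the interface, the storage behind it could change from Access to another database without affecting the workbench or the EA.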

Contributors
Lead: Megan Aylward