Team:Newcastle University/Mark's Lab Journal

Mark Wappett's Lab Journal
16/05/2008

Following Thursday's meeting I have been researching quorum sensing peptides that are unique to Staphylococcus aureus. After consulting the literature I have noted the presence of four quorum sensing systems: agr, sae, arlRS and SrrAB. The relevant papers are linked to below. I had previously found that there are four different autoinducing peptides (AIPs) in Staphylococcus aureus. It has now become clear that these are indeed unique to S. aureus and are not present (with the same sequence) in any other Staphylococcus species. The agrC receptors also fall into different groups that correspond to the four AIPs. It may be possible to use all four AIPs and agrC receptors as an S. aureus biosensor.

At this point I went on to retrieve the sequences using the NCBI and EBI databases.

19/05/2008

Carried out a BLAST analysis of the different agrC groups. There are very small differences between the agrC protein sequences within Staphylococcus aureus; however, the differences are much greater between Staphylococcus aureus and other Staphylococcus species, namely S. epidermidis and S. haemolyticus. As a result, the agrC protein could well be used as a receptor for the agrD-derived autoinducing peptides.

The alignment for agrD contains the group I, III and IV agrD peptides. The alignment for the agrC sensor protein contains three S. aureus agrC sequences and two from Staphylococcus haemolyticus and Staphylococcus epidermidis. The differences mentioned above are documented in these alignments.
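To make "very small differences" versus "much greater differences" concrete, here is a minimal sketch of percent identity computed over two sequences that are already aligned (equal length, '-' for gaps). It assumes the BLAST-style convention of counting identical columns over the full alignment length; it does not build the alignment itself, and the toy sequences are hypothetical, not real agrC data.

```java
// Percent identity over two pre-aligned sequences ('-' marks a gap).
// This only scores an existing alignment; it is NOT how BLAST aligns sequences.
class PercentIdentity {

    static double percentIdentity(String alignedA, String alignedB) {
        if (alignedA.length() != alignedB.length()) {
            throw new IllegalArgumentException("aligned sequences must have equal length");
        }
        int identical = 0;
        for (int i = 0; i < alignedA.length(); i++) {
            char a = alignedA.charAt(i);
            // A column is identical only if both residues match and neither is a gap
            if (a != '-' && a == alignedB.charAt(i)) {
                identical++;
            }
        }
        // Divide by the whole alignment length, gap columns included
        return 100.0 * identical / alignedA.length();
    }

    public static void main(String[] args) {
        // Hypothetical toy fragments: 5 of 7 columns identical
        System.out.println(percentIdentity("MKLV-TQ", "MKIVSTQ"));
    }
}
```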

Have started looking at Listeria monocytogenes. It is a Gram-positive human pathogenic bacterium and the cause of listeriosis, a serious infection characterized by high mortality rates in immunocompromised individuals and pregnant women. It is often involved in food poisoning and, although listeriosis is rare, it has a mortality rate of over 25%, compared with about 1% for Salmonella. The bacterium uses a two-component quorum sensing system similar to that of Staphylococcus aureus, the agr system.

I am now going to BLAST the genomic sequence data I have found on the agr system for Listeria. The results for the Listeria agrC protein are promising, as it is distinctly different from any agrC present in any other species (including, importantly, Staphylococcus aureus). The result is similar for the agrD protein. The results are listed below:

I am now moving on to look at the neural network training data and to incorporate some noise into the network. I want to map multiple instances and inputs to a single output and see how the network trains on the data. Using the Emergent software I created 8 inputs and 25 hidden layer nodes, and mapped them all to a single output. As intended, the data was extremely noisy, as the graphical output showed. I first specified 10 batches and then 50 batches, each run until the error reached zero. This demonstrates network learning: as the network learns, the time and the number of epochs required to reach zero error are significantly reduced, even when the data is noisy.

20/05/2008

Began the day by reading my new neural networks paper, then moved on to the CellML tutorials. Worked through the CellML tutorials, printing out the XML and working through the signal transduction examples.

I then downloaded the COR program and started to familiarise myself with the language and how it fits in with XML.

Have also begun writing a Java program that reads perfect data from a text file so that it can be used on a neural network. The program reads in each double value and adds Gaussian noise to it, generating a population of 1,000 noisy values. It will write these to a file that can be input into the neural network program.
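A minimal sketch of the noise step, assuming "performing a Gaussian distribution" means drawing each noisy copy from a normal distribution centred on the perfect value. The standard deviation (0.5) and the fixed seed are illustrative assumptions, not values from the journal.

```java
import java.util.Random;

// Turn one "perfect" value into a population of noisy copies with Gaussian noise.
class NoiseGenerator {

    static double[] noisyPopulation(double perfectValue, int populationSize,
                                    double stdDev, Random rng) {
        double[] population = new double[populationSize];
        for (int i = 0; i < populationSize; i++) {
            // nextGaussian() draws from N(0, 1); scale and shift to N(perfectValue, stdDev^2)
            population[i] = perfectValue + stdDev * rng.nextGaussian();
        }
        return population;
    }

    public static void main(String[] args) {
        // 1,000 noisy copies of 5.0; the stdDev of 0.5 and seed 42 are assumptions
        double[] noisy = noisyPopulation(5.0, 1000, 0.5, new Random(42));
        double sum = 0;
        for (double v : noisy) {
            sum += v;
        }
        System.out.println("mean of 1,000 noisy copies: " + sum / noisy.length);
    }
}
```

Passing the `Random` in, rather than creating it inside the method, makes runs reproducible when a seed is supplied.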

I started off by constructing a file parser which scans the file for doubles. I also wrote a method that performs the Gaussian distribution step. I am now working on how to generate the population of 1,000 and how to integrate the method with the parser.
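The file parser could look something like this sketch, which scans any `Readable` source for doubles and skips other tokens. A `StringReader` stands in for the data file here, and the file name in the comment is hypothetical. The locale is pinned to `Locale.US` as an assumption, so that "0.5" parses the same way regardless of the machine's default locale.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Scanner;

// Scan a text source for doubles, ignoring any non-numeric tokens.
class DoubleParser {

    static List<Double> readDoubles(Readable source) {
        List<Double> values = new ArrayList<>();
        Scanner scanner = new Scanner(source).useLocale(Locale.US); // '.' as decimal point
        while (scanner.hasNext()) {
            if (scanner.hasNextDouble()) {
                values.add(scanner.nextDouble());
            } else {
                scanner.next(); // skip labels or other non-numeric tokens
            }
        }
        scanner.close();
        return values;
    }

    public static void main(String[] args) {
        // In the real program this could be new FileReader("perfect.txt") (hypothetical name)
        List<Double> values = readDoubles(new StringReader("1.5 abc 2 -3.25"));
        System.out.println(values); // [1.5, 2.0, -3.25]
    }
}
```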

Plan for tomorrow is to finish off the Java program and to start looking at further data structures for my evolutionary algorithm.

Came back in to uni this evening and managed to get my double array input to work. I have also now written a full method for performing the Gaussian distribution step.

21/05/2008

Plan for today is basically to get my Java program to work. Been working on it all morning, trying to access my array of stored double numbers from my Gaussian method in a for loop. This has proven a sticking point and is very annoying because, for some reason, the array cannot be accessed even though both methods are in the same class. Have e-mailed Matt about this and we are now communicating freely via MSN Messenger.
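The journal does not say what caused the access problem, so this is only a hypothetical illustration of one common cause: a variable declared inside one method (for example `main`) is local to that method, so another method in the same class cannot see it. Storing the array in a field, or passing it in as a parameter, makes it reachable. All names here are made up for the example.

```java
// Two ways to share an array between methods of the same class.
class ScopeExample {

    // Field: visible to every instance method of the class
    private final double[] storedValues;

    ScopeExample(double[] values) {
        this.storedValues = values;
    }

    // Option 1: read the field directly in a for loop
    double sumFromField() {
        double total = 0;
        for (int i = 0; i < storedValues.length; i++) {
            total += storedValues[i];
        }
        return total;
    }

    // Option 2: pass the array in as a parameter (works from static methods too)
    static double sumFromParameter(double[] values) {
        double total = 0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        double[] data = {1.0, 2.5, 3.5}; // local to main: other methods cannot name it
        System.out.println(new ScopeExample(data).sumFromField()); // 7.0
        System.out.println(sumFromParameter(data));                // 7.0
    }
}
```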

Have also (under Matt's instructions) e-mailed Dan at support@bsu.ncl.ac.uk to set up and log into the SVN repository, which is used for developing and sharing code. Have also downloaded the Tortoise tool (an SVN client). It is worth mentioning how useful this tool is, in that I now have access to a code repository. As I alter my code, I update my working copy and then commit the changes, which can then be accessed by others in the repository. The Tortoise tool simply provides an easy way to carry out this task.

It is just after lunchtime, and I have fixed a big problem with file access on the server. I am now waiting for Matt to have a look at my code. In the meantime I am going to look through past APIs on SourceForge to get an idea of how to structure my evolutionary algorithm.

Link to SourceForge

Started looking through SourceForge for relevant programs.

22/05/2008

Began this morning by finishing off my Java program for making noisy data. The program now reads in a file containing the relevant perfect data and produces noisy data, which helps prevent overfitting of the network. This is written to an output file. The exercise has been useful both as a proof of concept and for improving my Java skills, and the program should be of real use with the neural network software. Have also decided to add to this week's agenda a discussion of which programs to write for Java practice, provided they are relevant to the topic.

Matt came in at 11 and helped me finish off my program. Note to self: there is a programmer in there somewhere; it's just actually learning how to get the code down that is the problem at the moment. Need more practice. Matt went through the SVN system with everyone (including some non-iGEM intruders), and we used the setup that we got running yesterday as an example of how SVN actually works. It was developed for open source code sharing.

At 12 Marcus Kaiser came in and went through where each of us is with our projects. I gave him a working demo of my neural net and my new noisy-data Java program, outlined my plan for the coming weeks, and talked about the biological concept.

Going back to looking through the SourceForge database for relevant programs and structures. Before the meeting at 2.30 we should also have all our biological concept material nailed down and ready to discuss.

We are also just about to do a bit of last-minute marketing brainstorming.

Had the meeting at 2.30 and it was very productive. Aims for the week are to look in greater depth at the proteins involved in the Listeria and Staphylococcus systems, and to start writing an evolutionary algorithm that maps input nodes to output nodes and evaluates fitness based on the number of connections between those nodes, with more connections scoring higher.

23/05/2008

Came in for 10 today as I overslept slightly (landlord finally replaced my broken bed and it was comfy is my excuse). Started planning my evolutionary algorithm this morning, with a view to getting it coded up next week. Started reading up on back-propagation neural networks. Spent the afternoon with Morgan going through possible program structures and how neural networks are structured.
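As a reference point for the back-propagation reading, here is a minimal sketch of the standard algorithm for one hidden layer of sigmoid units, trained online on squared error. The 2-3-1 layout, the OR truth table as training data, the learning rate of 0.5, the epoch count and the seed are all illustrative assumptions, not choices from the journal.

```java
import java.util.Random;

// 2 inputs -> 3 sigmoid hidden nodes -> 1 sigmoid output, trained by back-propagation.
class BackpropSketch {

    static final int IN = 2, HID = 3;
    final double[][] wIH = new double[HID][IN + 1]; // last column = hidden bias
    final double[] wHO = new double[HID + 1];       // last entry = output bias
    final double[] hidden = new double[HID];        // cached activations for the backward pass

    BackpropSketch(Random rng) {
        for (double[] row : wIH)
            for (int j = 0; j < row.length; j++) row[j] = rng.nextDouble() * 2 - 1;
        for (int j = 0; j < wHO.length; j++) wHO[j] = rng.nextDouble() * 2 - 1;
    }

    static double sigmoid(double x) { return 1.0 / (1.0 + Math.exp(-x)); }

    double forward(double[] x) {
        double out = wHO[HID];
        for (int h = 0; h < HID; h++) {
            double sum = wIH[h][IN];
            for (int i = 0; i < IN; i++) sum += wIH[h][i] * x[i];
            hidden[h] = sigmoid(sum);
            out += wHO[h] * hidden[h];
        }
        return sigmoid(out);
    }

    // One gradient-descent step on E = 0.5 * (y - target)^2 for a single example
    void train(double[] x, double target, double rate) {
        double y = forward(x);
        double deltaOut = (y - target) * y * (1 - y);
        for (int h = 0; h < HID; h++) {
            // hidden delta uses the output weight before it is updated
            double deltaHid = deltaOut * wHO[h] * hidden[h] * (1 - hidden[h]);
            wHO[h] -= rate * deltaOut * hidden[h];
            for (int i = 0; i < IN; i++) wIH[h][i] -= rate * deltaHid * x[i];
            wIH[h][IN] -= rate * deltaHid;
        }
        wHO[HID] -= rate * deltaOut;
    }

    double meanSquaredError(double[][] xs, double[] ts) {
        double err = 0;
        for (int k = 0; k < xs.length; k++) {
            double d = forward(xs[k]) - ts[k];
            err += d * d;
        }
        return err / xs.length;
    }

    public static void main(String[] args) {
        double[][] xs = {{0, 0}, {0, 1}, {1, 0}, {1, 1}};
        double[] ts = {0, 1, 1, 1}; // OR truth table
        BackpropSketch net = new BackpropSketch(new Random(1));
        System.out.println("error before: " + net.meanSquaredError(xs, ts));
        for (int epoch = 0; epoch < 10000; epoch++)
            for (int k = 0; k < xs.length; k++) net.train(xs[k], ts[k], 0.5);
        System.out.println("error after:  " + net.meanSquaredError(xs, ts));
    }
}
```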

Also read through some papers on Listeria, in particular searching for quorum sensing systems other than agr, and discovered PrfA which is a diffusible autorepressor protein.

27/05/2008

Today I have started coding up my evolutionary algorithm, after agreeing with Matt to start by coding up the structure. The network will map two inputs to two outputs via three hidden layer nodes. Each layer is represented as an array of double arrays: there is one array per layer, with an element for each node. Each element is a double array holding the values that make up the weights and threshold value for that node. These will be input by the user.

At this stage the structure of the network is being coded in; tomorrow I will start working on the functionality. I finished the overall plan for the network earlier today. The network will map two input values to two output values, and will be trained to do this by learning, varying its weights. I wrote some pseudocode for the start of the structural programming, and this is shown below:

import java.util.Scanner;

public class NeuralNetwork {

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in); // one Scanner, re-used for every value
        // Nodes per layer (1st array size), then values per node (2nd array size)
        int inputNodes = in.nextInt(), hiddenNodes = in.nextInt(), outputNodes = in.nextInt();
        int valuesPerNode = in.nextInt(); // the weights plus one threshold
        double[][] input = new double[inputNodes][valuesPerNode];
        double[][] hidden = new double[hiddenNodes][valuesPerNode];
        double[][] output = new double[outputNodes][valuesPerNode];
        for (double[][] layer : new double[][][] {input, hidden, output}) {
            for (double[] node : layer) {
                // enter the weights; the last value entered is the threshold
                for (int v = 0; v < valuesPerNode; v++) {
                    node[v] = in.nextDouble();
                }
            }
        }
    }
}

This has been hard-coded today. The next step is to add some functionality. This will involve designing methods that mutate node weights and implementing a fitness function.

Also continued reading into the Listeria organism and located a couple of new quorum sensing targets worth discussing.

28/05/2008

Continued coding up the EA. Have completely finished the structural section, and am now working on a fitness function and a mutation method that can change the weights on these nodes. Also need to specify the bounds for the double values stored in the double arrays; thinking of between 0.1 and 10.
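One way to enforce the 0.1 to 10 bounds is sketched below: draw initial weights uniformly between the bounds, and clamp each mutated weight back into range. The bounds come from the entry above; the uniform step and the clamping strategy are assumptions (rejecting and redrawing out-of-range values would be an alternative).

```java
import java.util.Random;

// Bounded random weights and a mutation that keeps weights inside the bounds.
class BoundedWeights {

    static final double MIN = 0.1, MAX = 10.0;

    // Uniform draw between MIN and MAX (nextDouble() is in [0, 1))
    static double randomWeight(Random rng) {
        return MIN + (MAX - MIN) * rng.nextDouble();
    }

    // Add a random step in [-maxStep, maxStep) and clamp the result to [MIN, MAX]
    static double mutateWeight(double weight, double maxStep, Random rng) {
        double step = (rng.nextDouble() * 2 - 1) * maxStep;
        return Math.max(MIN, Math.min(MAX, weight + step));
    }

    public static void main(String[] args) {
        Random rng = new Random(3);
        double w = randomWeight(rng);
        for (int gen = 0; gen < 5; gen++) {
            w = mutateWeight(w, 1.0, rng);
            System.out.println("generation " + gen + ": weight = " + w);
        }
    }
}
```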

Have also done some biological research today concerning the Listeria monocytogenes organism and identified a couple of characterised signal transduction systems that are not homologous to the agr signalling system. This might well make things easier and cleaner than having two agr signalling systems. The systems are the LisRK and the CesRK two-component systems.

Going to start coding up the next part of the EA tomorrow: the mutation method and more functionality.

29/05/2008

Continuing on with coding up algorithm, and collating all biosensor data for the meeting at 2.30.

The CesRK Two-component System

• The most recently discovered two-component system in Listeria monocytogenes.

• cesK encodes the histidine kinase element

• cesR encodes the response regulator element

• The autoinducing (AI) peptide is not clear and literature mining is not really helping; it could be the protein encoded at orf2420

The LisRK two-component system

• Identified early in Listeria monocytogenes

• lisR is the response regulator element

• lisK is the histidine kinase component

• The AI transmitter peptide is encoded in the lisK region and mediates signalling

• Similar system to the one above

• Genbank holds entire two-component system – good potential alternative target to the agr system.

Meeting will commence soon. Need to decide on some further program structure with Matt and Morgan to be coding up over the next couple of days.

30/05/08

Came in first thing and attended the Research group meeting where Jen went through a paper and Frank discussed his latest work.

Went on to a Java tutorial with Morgan, where she recapped all the basics and set a number of exercises. Completed the tray of muffins, triangle of asterisks and triangle of numbers exercises.

Marcus came in at lunchtime and we talked through where I was at.

31/05/08

Continued working on my own part of the Java tutorial: calling classes. I will be looking to implement this in my basic evolutionary algorithm, which will be completed by the end of the week.

01/06/08

Finishing off the Java tutorial, and reading up on and practising inheritance. Looked at possible logo and motto designs for the marketing, and sent off a brief to a marketing and advertising agency who are going to come up with a series of professional logos from which we can choose.

02/06/08

Today I began implementing the exercises set in the Java tutorial into my EA. I created two classes, Node and Promoter, and created a number of Promoter objects to be associated with each node. These have now replaced the double arrays that I started off with in my program. Each Promoter object has a name prefixed with P, followed by a number (from a counter). I have defined 14 different promoters, and for each node 5 of these are selected at random and put into a sub-array, which is then added to the Promoter array in the Promoter class. This is carried out for each node in the three layers of the network: the input, hidden and output layers. A double threshold value has also been set, in this case to ten. Only when the values of the five promoters add up to ten will the fitness be accepted; until then the EA will mutate the network. I will begin writing this mutate method tonight and tomorrow.
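The Node/Promoter structure described above might be sketched as follows. The pool of 14 promoters named P1 to P14, the 5 per node and the threshold of ten come from the entry; the initial strengths (0 to 4, mirroring the values the mutate method later draws), giving each node its own Promoter objects, and reading "add up to ten" as "total strength reaches ten" are assumptions.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class Promoter {
    private final String name;
    private double strength;

    Promoter(String name, double strength) { this.name = name; this.strength = strength; }
    String getName() { return name; }
    double getStrength() { return strength; }
    void setStrength(double strength) { this.strength = strength; }
}

class Node {
    private final Promoter[] carry; // the promoters attached to this node

    Node(Promoter[] carry) { this.carry = carry; }
    Promoter[] getCarry() { return carry; }

    double totalStrength() {
        double sum = 0;
        for (Promoter p : carry) sum += p.getStrength();
        return sum;
    }
}

class PromoterNetwork {
    static final int POOL_SIZE = 14, PER_NODE = 5;
    static final double THRESHOLD = 10.0;

    // Give one node PER_NODE distinct promoters chosen at random from P1..P14
    static Node randomNode(Random rng) {
        List<Integer> ids = new ArrayList<>();
        for (int count = 1; count <= POOL_SIZE; count++) ids.add(count);
        Collections.shuffle(ids, rng);
        Promoter[] chosen = new Promoter[PER_NODE];
        for (int k = 0; k < PER_NODE; k++) {
            chosen[k] = new Promoter("P" + ids.get(k), rng.nextInt(5)); // assumed 0-4 start
        }
        return new Node(chosen);
    }

    // A node is accepted once its promoter strengths reach the threshold
    static boolean meetsThreshold(Node node) {
        return node.totalStrength() >= THRESHOLD;
    }

    public static void main(String[] args) {
        Random rng = new Random();
        Node[] hidden = new Node[3]; // three hidden nodes, as in the earlier design
        for (int i = 0; i < hidden.length; i++) {
            hidden[i] = randomNode(rng);
            StringBuilder names = new StringBuilder();
            for (Promoter p : hidden[i].getCarry()) names.append(p.getName()).append(' ');
            System.out.println("node " + i + ": " + names + "total=" + hidden[i].totalStrength()
                    + " accepted=" + meetsThreshold(hidden[i]));
        }
    }
}
```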

Have also spent some time today investigating the LisRK and agr two-component systems in Listeria monocytogenes. Delved deep into the literature and also carried out a number of bioinformatics analyses using UniProt, Pfam (although it was down for most of the day), KEGG and BLAST. My conclusion is that the LisRK system is highly distinctive in terms of sequence identity, but we would be sensing the unknown, as an AI has not yet been identified for this organism. For the agr system, I have found that again there is 100% identity only for the agrC sensor kinase, but similarity is quite high for one other Listeria species. These findings need to be discussed in full at the meeting on Thursday before going any further with this organism. The findings in terms of papers and BLAST results are shown below for Listeria monocytogenes:

This evening continuing in the same vein with Staphylococcus aureus.

03/06/08

Continued coding up EA.

Wrote up slides for presentation in the evening

04/06/08

Spent the majority of the day wrestling with the EA and getting the bugs out of the program. The basic EA now works and evolves the hidden nodes until the threshold value is met.

Code sample for the mutate method is shown below:

public void mutate() {
    int mutateThreshold = 4;
    Random random = new Random(); // needs import java.util.Random
    // nextInt() with no bound returns any int, so the branch below runs about half the time
    int mutate = random.nextInt();
    // The original left the "<= threshold" branch empty, so only the else branch did any work
    if (mutate > mutateThreshold) {
        // hidden is the class's Node[] field for the hidden layer
        for (int i = 0; i < hidden.length; i++) {
            double value = random.nextInt(5); // new promoter strength, 0 to 4
            int change = random.nextInt(5);   // which of the node's 5 promoters to change
            Node node = hidden[i];
            Promoter[] promoters = node.getCarry();
            Promoter promoter = promoters[change];
            promoter.setStrength(value);
        }
    }
}

Spent some time with Matt and Morgan doing user stories for the interface. Wrote some down for the EA itself.

05/06/08

Spent the morning preparing for the presentation and the meeting, and also wrote a few user stories.

Wrote an agenda

Had a meeting with Marcus Kaiser and went through current achievements and where I am at.

Had the iGEM meeting itself.

06/06/08

Went to the research group meeting and went through the CellML tutorial.

Then went through the CellML tutorial myself and started to plan how to build composite models from part models. It is going to be very important to plan the interfaces with Nina, as she will be planning the constraint models and I will be assembling the part models together to create a composite model based on these constraints.

Got some logos sent through and sent them round for feedback. Then did some more user stories. Then e-mailed Jan-Willem about COSHH and the availability and cost of the strains we are planning to use.

09/06/08

Designed a system architecture based on user interactions. It was made up of an overall architecture, a mutate parts architecture, a fitness function architecture, and an import/output information architecture.

Had a meeting with Jen at 1.00 in which we discussed the system architecture and the planned architecture, and made some modifications. We also decided on a fitness function. As the parts for the model are not going to be ready for a while, Jen suggested first coding up the EA using the lac operon sequence data, as it is well characterised and useful to the project. When the parts and constraints repositories are up and running, these can be added.

I also did Morgan's Java exercises, replaced the arrays with collections (mostly Lists), and incorporated these into my program, mostly as ArrayLists. This involved changing quite a lot of code involving variables, but once I had got the hang of it, it did not take too long.
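A small sketch of the kind of change involved in moving from arrays to ArrayLists. The promoter names are placeholders; the main practical differences are that a list can grow and shrink, and that `array.length` and `array[i]` become `list.size()` and `list.get(i)`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ArraysToLists {

    // Arrays.asList returns a fixed-size view; copying it into an ArrayList makes it resizable
    static List<String> toResizableList(String[] array) {
        return new ArrayList<>(Arrays.asList(array));
    }

    public static void main(String[] args) {
        String[] promoterArray = {"P1", "P4", "P9"}; // placeholder names
        List<String> promoters = toResizableList(promoterArray);

        promoters.add("P12");   // a plain array cannot grow like this
        promoters.remove("P4"); // remove(Object) removes by value and shifts later elements

        // array.length -> list.size(), array[i] -> list.get(i)
        for (int i = 0; i < promoters.size(); i++) {
            System.out.println(i + ": " + promoters.get(i));
        }
    }
}
```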

10/06/08

Spent some time with Matthew in the morning going through static architectures and communication with Morgan's workbench. This led to the completion of an inputs diagram and a steering diagram that can be used to help code up interfaces.

Then spent some time researching the lac operon so that it can be used as the model parts for coding up an evolutionary algorithm. I then planned how I would implement this in an EA. Basically, there are four states under which the lac operon functions:

1. High expression - Low glucose, and lactose available

2. No expression - High glucose, and lactose unavailable

3. No expression - Low glucose, and lactose unavailable

4. Low expression - High glucose, and lactose available
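The four states above can be sketched as a lookup from two boolean inputs to an expression level, which is roughly the input/output mapping the EA will need to reproduce. This is a qualitative, textbook-level simplification (lactose absent means the repressor stays bound; with lactose present, glucose level decides between low and high expression), not the CellML model itself, and the enum and method names are made up for the example.

```java
// Qualitative lac operon behaviour for the four glucose/lactose combinations.
enum Expression { NONE, LOW, HIGH }

class LacOperon {

    static Expression expression(boolean highGlucose, boolean lactoseAvailable) {
        if (!lactoseAvailable) {
            return Expression.NONE; // LacI repressor stays bound to the operator
        }
        // Lactose releases the repressor; high glucose means little CAP-cAMP activation
        return highGlucose ? Expression.LOW : Expression.HIGH;
    }

    public static void main(String[] args) {
        for (boolean glucose : new boolean[] {false, true}) {
            for (boolean lactose : new boolean[] {true, false}) {
                System.out.println("high glucose=" + glucose + ", lactose=" + lactose
                        + " -> " + expression(glucose, lactose));
            }
        }
    }
}
```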

I then designed a CellML model for the lac operon system that can fit into my EA. The hidden layer will be a single promoter that can be affected by a number of proteins, leading to the various levels of transcription seen in the lac operon. The CellML model incorporates various units for substance amount and time, along with a number of variables for the different proteins around the promoter start site. I then started to do some maths to code up the different scenarios (represented as components) that would enable the model to produce the four different outcomes. Ultimately these will be mutated in the EA.

11/06/08

I finished off my CellML model and implemented all the appropriate units and variables in a single component. This component contains four equations corresponding to the four different states in the model. I then carried out a number of simulations on the model, testing the variables to see whether the model behaved as intended.

The next stage is to implement this in my EA, but before I can do this I need a Java simulation package that can be plugged into the EA itself, as the CellML models cannot be incorporated directly. After consulting with Morgan I decided that JSim was the best option, especially as it is listed on the CellML website.

12/06/08

Downloaded the JSim program as a source tarball and spent some time trying to compile it. Spent some time preparing for the meeting and practising Java. Had the meeting at 2.30 and planned for the week.

13/06/08

Had to do some troubleshooting with the JSim program, as it would not compile. It turned out a different version of Java was required, and the import statements in a number of the class files were wrong and needed altering. There was also a major problem in one of the files with a getState method, caused by the difference between Java versions. This too needed to be fixed before the program would compile. Got the program to run.

Then found a newer version of JSim with a better GUI to work in, and incorporated the lac operon CellML model that I had produced. The language of the CellML model is slightly different from the JSim version, so several alterations needed to be made; I also had problems with the units that needed to be solved.

Plan for the weekend: have a good go at a flyer, sort out some logo issues, and start planning the incorporation of the part models into the EA.

15/06/08

Did some research into the flyer design - looked at some possible layouts, will start drafting up a flyer asap.

16/06/08

Below is some system architecture for communicating with the various repositories:

for (each part) {
    getPartType                        // from the Parts Repository
    getPartID                          // from the Parts Repository
    getPartModel(PartID, PartType)     // select a new model of the same type, based on ID and type
    getInteractionModel(PartID)        // retrieve the interaction model and use it to evaluate constraints
    if (value is better) {
        change part
    } else {
        select a different part
    }
}

Workbench architecture

Receive requests for:

• newEA

• newPartSet(Part details)

• newWiring

• newFitness

• newPopulationSize

• runNewEA

• requestFitnessValue(generation n)

• requestCurrentGenerationNumber(n)

• stopEA

• requestFinalFitnessValue

• requestCompositeModelSet

and return to the workbench:

• newEAID

• fitnessValue(generation n)

• currentGenerationNumber

• finalFitnessValue

• compositeModel

We have a tutorial with Matt that we can use to learn how to construct the architecture.
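The per-part loop in the repository architecture above is essentially a hill-climbing step: for each part, try alternatives of the same type and swap one in only if it evaluates better. Below is a hypothetical sketch of that step. `Part`, the in-memory map standing in for the Parts Repository, and the `score` field standing in for a value computed from the interaction model are all assumptions, not the real repository interfaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.function.ToDoubleFunction;

class PartSelection {

    // Hypothetical stand-in for a part record from the Parts Repository
    static final class Part {
        final String id;
        final String type;
        final double score; // stand-in for a value derived from the interaction model

        Part(String id, String type, double score) {
            this.id = id;
            this.type = type;
            this.score = score;
        }
    }

    static List<Part> improve(List<Part> design, Map<String, List<Part>> repositoryByType,
                              ToDoubleFunction<Part> evaluate, Random rng) {
        List<Part> result = new ArrayList<>(design); // leave the input design untouched
        for (int i = 0; i < result.size(); i++) {
            Part current = result.get(i);
            // candidate models of the same part type, tried in random order
            List<Part> candidates =
                    new ArrayList<>(repositoryByType.getOrDefault(current.type, List.of()));
            Collections.shuffle(candidates, rng);
            double currentValue = evaluate.applyAsDouble(current);
            for (Part candidate : candidates) {
                if (evaluate.applyAsDouble(candidate) > currentValue) {
                    result.set(i, candidate); // value is better: change part
                    break;
                }
                // otherwise select a different part and try again
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Part> design = List.of(new Part("p1", "promoter", 0.2), new Part("r1", "rbs", 0.9));
        Map<String, List<Part>> repo = Map.of(
                "promoter", List.of(new Part("p2", "promoter", 0.5), new Part("p3", "promoter", 0.1)),
                "rbs", List.of(new Part("r2", "rbs", 0.3)));
        for (Part p : improve(design, repo, part -> part.score, new Random())) {
            System.out.println(p.type + ": " + p.id); // promoter: p2, then rbs: r1
        }
    }
}
```

Returning a new list rather than editing the design in place keeps the previous generation available, which the workbench's per-generation fitness requests would need.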

17/06/08

Continued working on the system architecture and designing our own.

18/06/08

Again worked on the system architecture; also had a meeting with Jen and got the different system modules to send data to each other.

19/06/08

Wrote up the presentation for the meeting this afternoon