Mutation Model

=Part I Mutation Model=

Hypothesis

 * The mutation efficiency from the initial gene to the target gene should be a continuous curve over successive replications, or over time. Here we discretize the mutation rate as a linear function of replication time, which in fact creates several points of discontinuity.
 * The DNA lesion repair rate appears to be much higher during replication than during other cell activities, so outside the replication process lesion repair can be folded into the background mutations.
 * The whole yeast population is large enough to obey statistical rules.

Model Construction

 * Probability Model For each DNA sequence status, we record a probability matrix for the coding sequence, which gives the probability of each base appearing at every site. Each site is a 5*1 matrix (Pa, Pt, Pc, Pg, Pu) with ΣP = 1. The whole sequence status is therefore recorded as a 5*length matrix. E.g., for the Gal4 DNA, with a length of 2646, one status is a 5*2646 matrix.
 * Scanning Model The hotbox appearance rate at a specific site i is calculated as below.
 * W: a, t; R: a, g; H: a, c, t;
 * Hotbox appearance rate Phot(i) = PW(i-2)*PR(i-1)*PC(i)*PH(i+1), where (i) refers to site i and PX(j) is the total probability of the bases in class X at site j.
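
The probability and scanning models above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the original implementation: the row order (a, t, c, g, u), the function names `one_hot_status`, `class_prob` and `hotbox_rate`, and the placement of the W, R, C, H positions at sites i-2, i-1, i, i+1 are all assumptions. The base sets for W, R and H are ordinary parameters so that a different definition can be substituted; the default for H follows the IUPAC code (a, c, t).

```python
BASES = "atcgu"  # assumed row order of the 5 x length status matrix


def one_hot_status(seq):
    """Return a 5 x len(seq) matrix (list of rows) for a known sequence.

    Each column has probability 1 on the observed base and 0 elsewhere.
    """
    status = [[0.0] * len(seq) for _ in BASES]
    for j, base in enumerate(seq.lower()):
        status[BASES.index(base)][j] = 1.0
    return status


def class_prob(status, members, j):
    """Summed probability at site j of the bases listed in `members`."""
    return sum(status[BASES.index(b)][j] for b in members)


def hotbox_rate(status, w="at", r="ag", h="act"):
    """Hotbox rate for every site i (0-based).

    Assumes the motif W, R, C, H occupies sites i-2, i-1, i, i+1.
    Sites too close to either end to hold the full motif get rate 0.
    """
    length = len(status[0])
    rate = [0.0] * length
    for i in range(2, length - 1):
        rate[i] = (class_prob(status, w, i - 2)
                   * class_prob(status, r, i - 1)
                   * class_prob(status, "c", i)
                   * class_prob(status, h, i + 1))
    return rate
```

For a fully known sequence the rate is 1 exactly where the motif occurs and 0 elsewhere; once mutation steps spread probability across bases, the same function returns fractional rates.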


 * Model on DNA lesion repair As reviewed in the paper, a U:G mismatch DNA lesion is repaired through one of three pathways. Lacking knowledge of which repair mechanism is preferred in yeast, we provisionally assume that the mechanisms occur at random.
 * Copy Status distribution along replication
 * Background mutations happen all the time at a fixed rate.
 * Hotspot mutation happens on the coding DNA during transcription, while lesion repair happens during replication.
 * The figure below shows the whole process. Si is a sequence distribution matrix.
 * After n replications, the proportions of copies in the states S1 to Sn+1 follow the binomial expansion of (1+1)^n, that is, C(n,0) : C(n,1) : C(n,2) : ... : C(n,n).
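
The binomial spread of copies can be checked numerically with a short sketch. The function name `copy_proportions` is hypothetical, and the code assumes that state S(k+1) receives the k-th binomial coefficient, normalised by the total 2^n.

```python
from math import comb


def copy_proportions(n):
    """Fractions of copies in states S1 .. S(n+1) after n replications.

    The k-th entry is C(n, k) / 2**n, so the entries sum to 1.
    """
    total = 2 ** n  # sum of all binomial coefficients C(n, k)
    return [comb(n, k) / total for k in range(n + 1)]
```

For example, `copy_proportions(3)` gives the ratio 1 : 3 : 3 : 1 as fractions of 8.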


 * Result Calculation We want the probability of going from the initial gene to the target gene. Consider the Gal4 DNA with a length of 2646; the target gene (coding sequence) is a1a2a3...a2645a2646.
 * Pa(j) is the probability of base a at site j, and (Pa1(1)*Pa2(2)*Pa3(3)*...*Pa2645(2645)*Pa2646(2646))i means that the product is evaluated on the status matrix Si.
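
The per-status product can be sketched as below. A product of 2646 probabilities can underflow double precision when many factors are well below 1, so this sketch sums logarithms instead. The function name `target_log_prob`, the row order (a, t, c, g, u) and the list-of-rows layout of the status matrix are assumptions made for illustration.

```python
import math

BASES = "atcgu"  # assumed row order of the 5 x length status matrix


def target_log_prob(status, target):
    """Natural log of the product over sites j of P_{a_j}(j).

    `status` is a 5 x length matrix (list of rows) and `target` is the
    target sequence a1 a2 ... Returns -inf if any required base has
    probability 0 at its site.
    """
    total = 0.0
    for j, base in enumerate(target.lower()):
        p = status[BASES.index(base)][j]
        if p == 0.0:
            return float("-inf")
        total += math.log(p)
    return total
```

Calling `math.exp` on the result recovers the plain probability whenever it is large enough to be represented.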