From 2008.igem.org

Part I Mutation Model

Hypothesis

The mutation efficiency from the initial gene to the target gene should be a continuous curve over successive generations, or over time. Here we discretize it: the mutation rate is treated as a linear function of the replication time (), which in practice introduces several points of discontinuity.
The DNA lesion repair rate appears to be much higher during replication than during other cell activities, so outside the replication process lesion repair can be folded into the background mutations.
The whole yeast population is large enough to obey statistical rules.

Model Construction

Probability Model
For each DNA sequence status, we record a probability matrix for the coding sequence that gives the probability of each base appearing at every site. Each site is described by a 5×1 matrix (P_a, P_t, P_c, P_g, P_u) with ΣP = 1, so the status of the whole sequence is recorded as a 5×length matrix. For example, for the Gal4 DNA, with a length of 2646, one status is a 5×2646 matrix.
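As a minimal sketch of this data structure, the snippet below builds the status matrix of a fully known sequence, where every site has probability 1 for its observed base. The row order (a, t, c, g, u) follows the text; the names `one_hot_status`, `column` and `is_valid_status` are illustrative and not part of the original model.

```python
BASES = ("a", "t", "c", "g", "u")  # row order of the 5 x length matrix


def one_hot_status(sequence):
    """Return a 5 x len(sequence) status matrix (list of 5 rows) for a known sequence."""
    seq = sequence.lower()
    unknown = set(seq) - set(BASES)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    # Row r holds, for every site, the probability of base BASES[r] at that site.
    return [[1.0 if base == row_base else 0.0 for base in seq] for row_base in BASES]


def column(status, site):
    """Return the 5 x 1 vector (P_a, P_t, P_c, P_g, P_u) at a 0-based site."""
    return [row[site] for row in status]


def is_valid_status(status, tol=1e-9):
    """Check that the probabilities at every site sum to 1."""
    length = len(status[0])
    return all(abs(sum(column(status, j)) - 1.0) <= tol for j in range(length))


status = one_hot_status("atgc")
print(len(status), len(status[0]))  # rows, sites
print(column(status, 0))            # probabilities at the first site
```

For the Gal4 coding sequence the same call would return a matrix with 5 rows and 2646 columns.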
Scanning Model
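The hotbox appearance rate at a specific site i is calculated as follows.

W: a, t; R: a, g; H: a, c, g

Hotbox appearance rate: P_hotbox^(i) = P_W^(i-2) · P_R^(i-1) · P_C^(i) · P_H^(i+1), where the superscript (i) refers to site i and the probability of a base class is the sum of the probabilities of its member bases at that site, e.g. P_W = P_a + P_t.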
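The scan can be sketched as follows. This is an illustration under stated assumptions rather than the team's actual code: it assumes the four motif positions W, R, C, H sit at sites i-2, i-1, i and i+1 (with the C at site i), it takes the base classes exactly as listed above, and it uses 0-based site indices. The function names are hypothetical.

```python
BASES = ("a", "t", "c", "g", "u")  # row order of the status matrix

# Base classes copied from the text; "C" is the single base c.
CLASSES = {"W": "at", "R": "ag", "C": "c", "H": "acg"}


def class_prob(status, site, members):
    """Probability that the base at `site` belongs to the class `members`."""
    return sum(status[BASES.index(b)][site] for b in members)


def hotbox_rate(status, i):
    """Hotbox appearance rate at 0-based site i (assumed position of the C).

    Returns 0.0 when the 4-site window does not fit inside the sequence.
    """
    length = len(status[0])
    if i - 2 < 0 or i + 1 >= length:
        return 0.0
    return (class_prob(status, i - 2, CLASSES["W"])
            * class_prob(status, i - 1, CLASSES["R"])
            * class_prob(status, i, CLASSES["C"])
            * class_prob(status, i + 1, CLASSES["H"]))


# Status matrix for the known sequence "agca" (rows a, t, c, g, u).
status = [
    [1.0, 0.0, 0.0, 1.0],  # a
    [0.0, 0.0, 0.0, 0.0],  # t
    [0.0, 0.0, 1.0, 0.0],  # c
    [0.0, 1.0, 0.0, 0.0],  # g
    [0.0, 0.0, 0.0, 0.0],  # u
]
rates = [hotbox_rate(status, i) for i in range(4)]
print(rates)
```

Because the class probabilities are sums over rows of the status matrix, the same function works unchanged when a site is uncertain, for example when the first site is a or c with probability 0.5 each.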

Model on DNA lesion repair
As reviewed by Odegard and Schatz, the U:G mismatch DNA lesion can be repaired through three pathways. Lacking knowledge of which repair mechanism yeast prefers, we provisionally assume that the mechanisms occur at random.

VH Odegard, DG Schatz. Targeting of somatic hypermutation. Nat Rev Immunol. 2006 Aug;6(8):573-83

Copy Status distribution along replication
- Background mutations happen all the time at a fixed rate.
- Hotspot mutations happen during transcription of the coding DNA, while lesion repair happens at replication.
- The figure below shows the whole process. S_i is a sequence distribution matrix.

After n rounds of replication, the copy concentrations of S₁ through S_(n+1) are in the proportions given by the binomial expansion of (1+1)ⁿ, that is, C_n⁰ : C_n¹ : C_n² : ... : C_nⁿ.
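The proportions can be computed directly from the binomial coefficients. The short sketch below normalizes them by 2ⁿ so that they sum to 1; the function name `copy_proportions` is illustrative.

```python
from math import comb


def copy_proportions(n):
    """Fractions of copies in statuses S_1 .. S_(n+1) after n replications."""
    total = 2 ** n  # sum of the binomial coefficients C(n, 0) .. C(n, n)
    return [comb(n, k) / total for k in range(n + 1)]


proportions = copy_proportions(3)
print(proportions)  # [0.125, 0.375, 0.375, 0.125]
```

For n = 3 the raw ratio is 1 : 3 : 3 : 1, which gives the fractions shown in the comment.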

Result Calculation
We want the probability of going from the initial gene to the target gene. Consider the DNA of Gal4, with a length of 2646; the target gene (coding sequence) is a₁a₂a₃...a₂₆₄₅a₂₆₄₆.

P_a^(j) is the probability of base a at site j, and (P_a₁^(1) · P_a₂^(2) · ... · P_a₂₆₄₅^(2645) · P_a₂₆₄₆^(2646))_i means that the calculation is based on the S_i status matrix.
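A minimal sketch of the per-status product is given below. It assumes the status matrix is stored as 5 rows in the order (a, t, c, g, u) and uses 0-based sites. The product is accumulated in log space, because a product of 2646 factors smaller than 1 can underflow to zero in double precision. The function names are hypothetical.

```python
import math

BASES = ("a", "t", "c", "g", "u")  # row order of the status matrix


def log_target_probability(status, target):
    """Log of the product over sites j of the probability of target[j] at site j."""
    if len(status[0]) != len(target):
        raise ValueError("status matrix and target sequence differ in length")
    total = 0.0
    for j, base in enumerate(target.lower()):
        p = status[BASES.index(base)][j]
        if p == 0.0:
            return float("-inf")  # the target is unreachable from this status
        total += math.log(p)
    return total


def target_probability(status, target):
    """Plain probability; may underflow to 0.0 for long sequences."""
    return math.exp(log_target_probability(status, target))


# Two-site status: site 0 is a or t (0.5 each); site 1 is c (0.25) or g (0.75).
status = [
    [0.5, 0.0],   # a
    [0.5, 0.0],   # t
    [0.0, 0.25],  # c
    [0.0, 0.75],  # g
    [0.0, 0.0],   # u
]
print(target_probability(status, "ac"))  # 0.5 * 0.25
print(target_probability(status, "ag"))  # 0.5 * 0.75
```

For a sequence as long as Gal4, comparing the log values across the statuses S_i is usually more practical than comparing the raw products.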