          ProcessDB User Guide Wiki    Computational Cell Biology Textbook Chapter 4

## Translating Your System Diagram to Mathematics

Conservation of mass is the principle that underlies translation of your diagram to the language of mathematics. For each symbol in your diagram, you will write a differential equation, and the terms on the right hand side of the differential equation will correspond to the solid arrows that originate or terminate on that symbol. Consider the example of a specific messenger RNA (mRNA). Recalling that a compartment is a chemical species in a physical place, you want to answer the question: What are the determinants of the rate of change of mRNA (the chemical species) in the cytosol (the physical place)? The answer, as adumbrated in the previous chapter, is that all the processes (arrows) delivering mRNA to the cytosol or removing mRNA from the cytosol must be represented by terms on the right hand side of the differential equation you write. where m is the mass of mRNA in the cytosol, Fnpc is the flux of this particular mRNA through the nuclear pore complex, Fnuc is the flux of mRNA degradation to nucleotides by cytosolic nucleases, Fbind is the flux of mRNA binding to ribosomes, and Funbind is the flux of mRNA being released from ribosomes. This example was chosen because it includes all three types of biological processes: translocation of mRNA from the nucleus to the cytosol, transformation of mRNA to individual nucleotides by the action of a cytoplasmic nuclease, and binding of mRNA to cytoplasmic ribosomes. The differential equation is the mathematical statement of conservation of mass for this particular mRNA.

The flux terms on the right hand side of the equation can be considered as inputs and outputs. These fluxes are thus the determinants of all changes in cytosolic mRNA mass. Any change in one of these fluxes must result in at least a transient change in mRNA mass. As written here, the derivative on the left hand side of the differential equation has dimensions of mass per unit time. This means, of course, that the flux terms also have dimensions of mass per unit time.

Among biochemists, such differential equations are commonly written in terms of concentration. This makes good physical-chemical sense. Unfortunately, writing differential equations in terms of concentration produces several practical problems when building kinetic models of living systems. The most obvious of these is that the underlying principle is conservation of mass, not conservation of concentration. To see this difficulty more clearly, calculate what happens to the nuclear and cytosolic mRNA concentration if 20 copies of the mRNA are translocated from the nucleus to the cytoplasm. To make this comparison it is essential to know the relative volumes of the two cellular compartments; so if we make the reasonable approximation that cytosolic volume is 1 pl and nuclear volume is 0.1 pl, the change in nuclear mRNA concentration is -200 copies per pl while the change in cytosolic mRNA concentration is only +20 copies per pl. Our conclusion must be that whenever your system consists of two or more compartments with different volumes, it is prudent to write the differential equations in terms of mass, rather than concentration. It is, of course, possible to use concentration state variables, but your equations will be cluttered with a variety of volume ratios whose net effect is to recast the equations in mass terms.

Biological experiments are conventionally begun at a time when nothing in the system is changing. This is the condition of steady state or stationary state. To state this condition in precise terms, it is a state characterized by constancy of all the state variables, xi: where the inverted A means "for all".

Steady state is the essence of life. If a cell has the means to maintain a steady state, it lives; if not, it dies. It is critical to realize that the steady state does not imply or require that the fluxes are all zero. In the case of the mRNA system, for example, steady state is achieved whenever that is, whenever the inputs balance the outputs. Rarely do we have the means to prove that our biological system is in steady state. We simply cannot measure all of the state variables. Despite this obstacle, the steady state assumption is a nearly universal feature of modern biological investigation. Few scientists, however, can distinguish the concept of steady state from the more restrictive concept of equilibrium. If you are interested in the construction of biological kinetic models, you should be expert about this distinction.

Steady State Is Not Equilibrium

At equilibrium all chemical potential gradients are zero. Moreover, the principle of microscopic reversibility guarantees that, at equilibrium, the rate at which any process proceeds in the "forward" direction is exactly balanced by the rate of that process in the "reverse" direction. This holds for every individual process, and means that at equilibrium there can be no net flux through any enzyme, any ion channel, or any transport protein. This is easy to understand intellectually because a nonzero chemical potential difference is absolutely required to drive a net flux. But steady state and equilibrium are so commonly used as if they were synonyms, that you will likely have to think about these points for hours before you can parry the objections of scientists who are sure of their misinformation.

Translocation and transformation processes are rarely at equilibrium in living systems, but binding processes often are. If we accept, for the moment, the oversimplification that mRNA binding to ribosomes is typical of other binding processes, then a quantitative description of this process could be based on classical binding theory and the resulting equations could be used for a wide range of ligand-receptor interactions.

The equilibrium constant for a binding reaction is called the dissociation constant if the numerator of the equilibrium concentration ratio is chosen as the product of the unbound ligand and unbound receptor concentrations. In this case the denominator is the concentration of the ligand receptor complex. where KD is the dissociation constant, [m] is the equilibrium concentration of unbound mRNA, r is unbound ribosomes, and mr represents the mRNA-ribosome complex. The units of a dissociation constant are concentration. If the binding process is viewed from the other side of the reaction, the equilibrium constant is referred to as an association constant and the corresponding equation is where KA is the association constant (units of concentration-1 ) and the other symbols are as before.

A simplification, characteristic of biological systems, is that one of the reactants, usually the "receptor", is limiting. As a consequence, the binding process saturates when there are no unbound receptors remaining. If the total amount of "receptor" (in this case, ribosomes) is given the symbol, Btot, then conservation requires that Btot=[mr]+[r] or [r]= Btot-[mr]. Substituting this for [r] in the expression for KD and solving for [mr] yields the classical binding expression: which gives the amount of bound mRNA as a function of the "free", or unbound, mRNA concentration.

This is such a compact description of binding, that modelers are often tempted to use this equation to represent binding processes which they believe to be in "rapid equilibrium". The assertion of "rapid equilibrium" has to be evaluated in two parts. We have already explored what is meant by "equilibrium", so the question is what additional assumption is conveyed by adding the word, "rapid." What is usually meant is that the processes represented by Fbind and Funbind are so fast that reasonable transients in the concentration of ligand never displace the binding reaction from it's equilibrium point. In terms of our example, this means that a sudden increase in transcription of the gene that codes for our mRNA, and the resulting increase in translocation of this mRNA from the nucleus to the cytoplasm will cause an increase in [m] that will result instantaneously in an increase in [mr] that always keeps the ratio of [m][r]/[mr] constant and equal to the KD.

Thus, the system of equations that would be written is  Notice that the two fluxes, Fbind and Funbind, are missing from the differential equation. Instead, the binding process is embodied in the algebraic binding equation, and these equations can be solved simultaneously to give m(t) and mr(t).

Unfortunately, these equations fail to enforce conservation of mass on the dynamic system. You can see this by imagining the consequences of a sudden increase in the number of ribosomes, that is an increase in Btot. The calculated amount of bound mRNA, [mr], will increase immediately since [mr] is proportional to Btot. But where does this increase in bound mRNA come from? The disturbing answer is that the increase does not come at the expense of free mRNA. In fact, the increase arises from mathematical "thin air." Because there is no term in the differential equation for m that represents loss of mRNA to the bound state, the equations are "unaware" that free m must decline. Clearly, this violates conservation of mass.

There are two methods for solving this problem. The first is to write out the complete system of differential equations including the Fbind and Funbind terms. This enforces conservation of mass automatically, but can lead to stiff differential equations if the binding and unbinding processes are much faster than the other processes in the system. We will have more to say about stiff systems later.

The second is to write the differential equation for total mRNA (call it M) in the cytosol rather than for free mRNA in the cytosol. The equations are then   In effect, this method uses the differential equation to keep track of the total amount of mRNA, and uses the equilibrium binding equation to distribute this total between the bound and unbound forms. The two algebraic equations can be solved simultaneously for m and mr, but the result is a fairly involved quadratic of the kind that professors always seem to leave as an exercise for the student.

I got and I would love to be corrected or corroborated by my students.

Experience suggests that, whenever possible, you should write out and solve the complete system of differential equations. Numerical methods for the solution systems of nonlinear differential equations are much more efficient and robust than methods for the solution of systems of nonlinear algebraic equations.

Updating Your Assumption List

If you assume that your biological system is in steady state during all or part of your experiment, a statement to this effect should be added to your assumption list. If you assume that a binding process is at equilibrium throughout your experiment, this should be made explicit as well.

Translation Versus Interpretation

The process of translating a working hypothesis to mathematics is, perhaps, better seen as interpretation. The process of translation can be carried out by anyone with a language1-to-language2 dictionary, but the result is nearly worthless. Interpretation, as it is done by the unseen experts at the United Nations General Assembly, requires that the interpreter be expert in the idioms and culture of both the speaker and the listener. Your job is equivalent; you must be expert in the idioms and culture of both physical chemistry and biology.

Recognizing this as the ideal situation, we can embark on a journey that teaches this wisdom by the method of John Dewey. His maxim: "We learn what we do."

A Step-by-Step Procedure for Translation
1. Define Your System Boundary in Quantitative Terms
2. Formulate Your Rate Laws
3. Add Regulation and Control to Your Rate Laws
4. Link Your State Variables to Experimental Measurements

Step 1: Define Your System Boundary in Quantitative Terms

In reality, no system variable is ever constant, but for practical purposes many system variables may be treated as constant during the experimental protocols you choose to impose. Constancy, whether measured or assumed, constrains your model and helps define the boundary of your system.

Some common examples are: constant

• temperature
• pressure
• pH
• osmolality
• enzyme concentration
• PO2
• volume

These features of the system do not define the physical boundary of the system, but they are part of the boundary nonetheless. This is because they define parts of the system for which you will not write differential equations. Sometimes the actual value of the constant is known. If so, you convert the value to consistent units and enter the result in the appropriate part of your model file. If the value is not known, you add it to the parameter list if you expect the experimental data to contain information on its value, or you assume a value and add this assumption to your Assumption List.

When a system variable has a known time course during your experimental protocol, it is usually used as a constraint on model structure by requiring that the model's differential equations yield the observed time course. But if the variable whose time course is known lies on the boundary of your model, then we frequently impose the constraint by using a forcing function. In the context of kinetic modeling, a forcing function is either a time series of data points or an algebraic function of time that serves as a look-up table supplying the variable's value whenever the model solution requires it. When a series of data points is used, they are usually connected by linear interpolation, so that values of the variable are available to the numerical integrator at times when no observation was made. Alternatively, the data points may be fitted to an algebraic function, such as a power series in t, and this function can supply the required values at any required time, t.

Occasionally, forcing functions are used in place of system variables whose differential equations are part of the model. This is done to decouple the system so that related parts of an unfinished model can be developed in parallel. The advantage of using forcing functions is that each part of the model is developed using "correct" (as defined by the experimental data) inputs, rather than the "incorrect" inputs that result from solving the differential equations of a model still under development. As the modeling process converges on a consistent model, the forcing functions are removed, and the model's differential equations take over as suppliers of inputs to the rest of the system.

In summary, the boundary of your dynamic model is defined by specifying those system variables for which you will not write differential equations. This boundary is quantified by specifying the constant numerical values of these variables, or by providing an explicit function of time to be used by the model whenever a value for the variable is required.

### Step 2: Formulate Your Rate Laws

Rate laws are the heart of kinetic modeling. A rate law is an algebraic expression which can be evaluated to give the flux of a given chemical species through a given process. The term flux is used here to mean the mass (of the chemical species) per unit time passing or moving through the process. Typical units of flux are fmol/sec or umol/min. Since most biological processes are mediated by proteins, most rate laws are quantitative statements of the function of one or another protein. In some biological disciplines, the word flux is reserved for a normalized quantity, namely, the mass per unit time per unit cross-sectional area. This is especially useful in studies of membrane transport. In this case, typical units are pmol min-1cm-2.

Writing rate laws requires careful application of physics, chemistry, and scientific imagination. A fundamental tenet of chemical kinetics is that the probability that a process proceeds or that a reaction takes place depends on the frequency of molecular collision of the reactants. Moreover, collision frequency is proportional to the product of the concentrations of the reactants since the probability that a molecule of each reactant is present in a given microvolume increases with its concentration in the bulk solution.

Hint concerning units: In solution, of course, concentration is measured as mass per unit volume. Typical units are pmol/liter or mmol/liter. SI units no longer recognize the simpler, but frequently misused, concept of Molar. Nevertheless, the symbol, M, frequently appears in the biological literature and means 1 mole/liter. Thus, 1 pmol/liter = 1 pM. The symbol, M, must never be used to signify moles. A concentration given as 1 pM/liter is nonsense.

Concentration, however, is not the only physical determinant of collisions. If two solutions have the same concentrations of the reactants, it is still possible for one of these solutions to produce many more collisions per unit time than the other. This could happen for two reasons; one simple, one not. First, the simple reason: there might be more of one solution than the other; more collisions will take place because there are more molecules of both reactants available. This approach to increasing the flux through a given process is adopted in essentially all multicellular organisms. By synthesizing a molecule, say insulin, in many pancreatic beta cells rather than in only one, the organism can produce insulin at a rate proportional to the number of working beta cells as well as proportional to the concentrations of the required substrates.

A less obvious means of increasing collision frequency (and therefore flux), without increasing concentration, is to increase the kinetic energy of the reactants by increasing the temperature. This works is two ways: 1) increasing the temperature increases the number of collisions that occur per unit time because the reactants travel at greater velocities between collisions so the next collision will occur sooner, and 2) increasing the temperature increases the number of collisions between reactants with energies sufficient to drive the reaction. These collisions may be referred to as successful collisions.

Combining these ideas yields the fundamental rate law: where Flux is the mass of product formed per unit time (mol sec-1 ), A and B are the concentrations (mol/liter) of the two reactants, fc(T) is the temperature dependence (sec-1 M-1 )of the frequency of collision, S is the probability (unitless) that the spatial arrangement of the reactants at the moment of collision is appropriate for initiating the reaction, Fa(T) is the fraction (unitless) of the reactant molecules having sufficient energy to yield a successful collision, and V is the volume (liters) in which the reaction takes place.

Temperature increases fc only linearly so a 10o change will typically alter fc by less than 5%, but the temperature dependence of Fa is much more dramatic. You can see this by starting with the concept of activation energy. The activation energy, Ea, of the reaction is the minimum free energy the reactants must have to initiate a successful collision. We know from the theoretical work of Maxwell and Boltzman that the fraction of molecules having energy greater than or equal to Ea is an exponential function of temperature, namely This means that the flux will approximately double, for an activation energy in the usual range of 6000 to 15,000 cal/mol (6 to 15 kcal/mol), if the temperature is increased by 10o. Since values of RT are about 0.54 kcal/mol at 0 oC, 0.58 kcal/mol at 20 oC , and 0.62 kcal/mol at 37 oC, values of Fa are very small numbers in the range of 10-8 to 10-7. Nevertheless, because of the exponential dependence, Fa changes by significant factors over this small range of RT. Consequently, you must expect changes in reaction rates when temperature changes. If, on the other hand, temperature is constant, the flux equation becomes where k(T) is the second order rate constant of the reaction, and is itself frequently written as the Arrhenius equation: formulated empirically in 1889, well before the development of the collision theory by Eyring in 1935. Here, KA is the Arrhenius constant that lumps together all the factors that either do not depend on temperature or depend on it only weakly.

Exercise to get a feel for the magnitude of KA: You have found that the rate constant for a given reaction is 0.1 sec-1. What is the value of KA if the activation energy of the reaction is 10 kcal/mol and the temperature is 20 oC?

Your choice of rate law is determined, in part, by your knowledge (or assumptions) concerning which of the chemical species involved are changing with time. Indeed, if you believe that one participating species is present in such large quantities, that its concentration can be assumed constant, then this species becomes part of the boundary of your system, and its constancy becomes one of your assumptions.

### Absolute values and choosing the units for your States

Units have fascinated and tortured me for 44 years. From freshman physics at MIT where we amused ourselves calculating the speed of light in furlongs per fortnight (life was simpler then), to the present day when Im wordlessly tormented by famous biologists who blithely speak of moles and molar as if they were synonyms or, worse, by experimentalists who have no interest at all in real absolute values of the things they measure. Oh, thats not important, they say. Were only interested in differences here.

Years ago, in a fit of cynicism, I gave up complaining about axes labeled percent of control or fold-increase or even amplitude. But having become part of the system, a card carrying biologist, I find I cannot resist fighting a sort of guerrilla war to unseat normalized values and put units on their rightful throne. I love biology, and I am a biologist, but Im quite sure we cannot reach the goal of understanding human biology well enough to solve complex diseases without understanding and using units well.

Engineers and physicists and chemists do not need convincing. They already wince when reminded of a multi million dollar mission to Mars that was lost because a single number was not converted from English to metric units. If you have read this far, you must have begun to suspect that biology is too complex to be comprehended without mathematical and computational help. But if you finish this book and go back to publishing percent of control graphs without at least thinking to yourself, It would be better to have absolute values here, I will have failed in an important respect.

What do I mean by an absolute value? I mean a number with units and a numerical value I can trust. It doesnt have to have 15 significant figures, like the speed of light in a vacuum or the gas constant. In fact, if its accurate to within 15 or 20 percent, Ill be endlessly grateful, but we biologists have forgone accuracy and seem to worship at the altar of precision or reproducibility instead.

Everyone aims for reproducibility. Exquisitely tight error bars abound. But only transplanted physicists and engineers seem willing to invest the time, and it can take a lot of time, to make a measurement that is not merely proportional to the quantity of interest, but actually is the quantity of interest.

Obviously, if you only have a measurement that is merely proportional to the real value, normalizing or forming a ratio is an excellent strategy. The unknown constant of proportionality disappears in the magic of long division. and you reap the added benefit that even if you had no idea what the correct units were, they have become immaterial since the ratio is both unitless and beyond criticism. No wonder people like ratios!

In other words, reproducibility is no guarantee of accuracy. A wonderfully reproducible measurement can be inaccurate by orders of magnitude if the person making the measurement makes the same error every time.

There are several ways you can convince yourself that this is a common problem. First, go to your favorite biological journal and look at the vertical axes of 100 graphs in 100 different published papers. Somewhere between 70 and 90 of those graphs will plot a normalized variable or a ratio of experimental to control. Absolute values are rare; ratios are rampant.

A second way to convince yourself that this is a serious issue is to search through the biological literature for the intracellular concentration or abundance of your favorite protein. The web search is straightforward: type your proteins name and molecules per cell in the search box.

What you will find, I think, is that reported values (even in the same cell type) commonly differ by factors of 5 and sometimes differ by orders of magnitude. Can you imagine any other human endeavor where the available measurements of important variables differ to this extent and yet do not provoke serious doubts about methodology?

There are individual investigators and even entire sub-fields of biomedical research for whom accuracy is a high priority, but many of us, recognizing that its harder and costlier to make absolute measurements, choose the easier and widely sanctioned course. We normalize measurements to their control values or their peak values or to a housekeeping gene, or to some other measurement that makes every subsequent measurement relative.

### Problems caused by arbitrary scaling of states

Why not? you might ask.

What is really wrong with this approach?

There are two important problems: one theoretical and one practical.

The theoretical problem is that whenever you make two measurements and then report only the ratio of the two, you lose information. Suppose you are the GM of a major league baseball team. Youre considering two minor league prospects; player A has a batting average of .333 and player B a batting average of .310. Would you really want to make your decision without knowing that A had only 3 at-bats and B had 500? A genie emerges from your great-great grandfathers oil lamp and offers you one of two rewards for setting him free: one is 50% gold and the other is 10% gold. Would you really want to choose without knowing that the first is a ring and the second is a castle?

The practical problem is more subtle, but its implications are profound, especially for modelers of biological systems. It arises when a modeler decides to express all his/her system variables as normalized or scaled quantities.

Chemical engineering has a long and distinguished tradition of expressing its equations in unitless scaled variables. This has always been attractive to mathematicians and software engineers for whom the speed of light is c. They have little interest in whether your speed of light is in meters per second or furlongs per fortnight. They simply, and reasonably, assume you will not presume to apply their formulas without using consistent units for all the quantities involved. Personally, I agree that non-dimensionalized equations are more general, but I find them far less intuitive, so I dont recommend them. But the success of non-dimensionalization hinges on a very careful procedure, and I fear that the prominence of this approach has led to the potentially dangerous practice of arbitrary normalization of state variables.

What do I mean by dangerous? Lets consider a simple biological subsystem:

Shown here is a simple prototype biochemical reaction regulated by a modifier molecule. The modifier is itself regulated by binding to a partner molecule forming the modifier:binding partner complex. What I want you to think about is that choosing to normalize state variables in an arbitrary way, such as all concentration variables are scaled to be dimensionless numbers of order 1, can have undesirable consequences. In particular, if the molecular abundances of the modifier and its binding partner are very different, the ability of the modifier to control conversion of reactant to product will depend strongly on whether modifier >> (is much greater than) binding partner or binding partner >> modifier. The problem with scaling the state variables is that this potentially crucial dependence on relative abundance can never be observed.

You can test this for yourself. Create this model in your favorite ODE solver or use the public model, modulator binding in the ProcessDB database. Create a model realization of your own and specify the following initial conditions:

modifier = 100 molecules/cell

reactant = 10,000 molecules/cell

binding partner = 100,000 molecules/cell

modifier:binding partner complex = 0

product = 0

second order rate constant for modifier binding = 1E-5

rate constant for modifier:binding partner unbinding = 0.2

A simple rate law for the reaction will suffice:

reaction = k*reactant*modifier

k = 0.01

Solve this model. Save the solutions for modifier(t) and reaction(t). Then change the initial conditions so that

modifier = 100,000

binding partner = 100

This is just the opposite of the first solution. Solve it and compare the reaction(t) solution with the saved solution. The difference between solutions should leave no doubt in your mind that the model is very sensitive to modifier and binding partner abundances.

Now the next part of the argument aims to convince you that arbitrary scaling will lead to the wrong answer. You can test this idea either by simulation or by just thinking. Heres the thinking approach.

If the abundances of modifier and binding partner were scaled so that each was normalized to its initial value, then both initial conditions would be 1.0. I imagine it will be obvious to you that interchanging these scaled initial conditions will cause absolutely no change in the simulation results for reaction(t).

Consequently, arbitrary scaling of state variables cannot be recommended because there are many situations where it will yield the wrong answer.

One further biological example may help solidify this concept if you have ever made up a buffer. Buffers are basically weak acids with equilibrium constants near the pH you want to maintain. But if you make up your buffer with 1 nM HEPES instead of 10 mM HEPES, your buffer strength is totally inadequate and your solution is at risk for wild swings in pH. If you substitute H+ for modifier and weak acid for binding partner, then the above diagram would represent a pH-controlled reaction in a buffered solution. Obviously, if protons are far in excess of weak acid, the reaction will in all likelihood stop dead. Relative concentrations or relative abundances are important; arbitrary scaling should not be employed without careful consideration of these issues.

### A recommendation on units

Here, then, is a simple recommendation on units for States that will serve you well. For cellular systems where you are primarily or exclusively interested in intracellular processes, consider using molecular abundance (expressed in molecules/cell or copies per cell) instead of any volume based measurement of concentration such as pmol/liter or µM. For extracellular States it is often easier to use concentration units, but remember that you will want to be sure that all your rate laws yield fluxes in molecules/sec or molecules per some other unit of time or pmol/sec, especially if any of your processes are transport processes that move molecules from one physiological place to another.

Abundance is particularly useful when some of the models molecular species are (or are bound to) integral membrane proteins or lipids in the membrane bilayer. Some modelers will insist that the only acceptable measure of chemical activity for use in rate laws is concentration, but this is probably a historical bias based on chemical kinetics in a spectrophotometer cuvette.

Consider two rate laws for the same second order process: The first expresses the flux, P1, as a function of the molar concentrations of the reactants, [S1] and [S2]. The second expresses the same flux as a function of the abundances S1 and S2. Both are correct, so they must be equivalent. What must be true for this equivalence to be true? First we can express the concentrations as where S1 and S2 are the abundances of the two reactants in units of molecules/cell, V1 and V2 are the appropriate volumes of distribution (in units of liters per cell) for the intracellular compartments where these molecules are distributed and NA is Avagodros number, 6E23 molecules/mole.

So all that is required for equivalence of the two rate laws is that In other words, the only difference between using concentration and using abundance is the value of the rate constant. In return for this unfamiliar rate constant you avoid the nagging problem of never knowing the appropriate volumes of distribution for membrane proteins and lipids. By using abundances, you reduce the number of unknown parameters in your model and improve your quality of life.

Next, we consider how to write rate laws for the major classes of processes occurring in biological systems: binding, transformation and translocation.

Chapter 5   