ProcessDB User Guide Wiki    Computational Cell Biology Textbook  

Chapter 2

Assembling Your Experimental Data

If you are working on a modeling project based on data collected by someone else, this first section is addressed to you. There are several different kinds of information you will want to collect from whatever sources are available to you.

  • What measurements were made?
  • What protocols were followed, and what perturbations were applied?
  • What constraints are available?

Examine the available information very carefully, paying particular attention to numbers. It's often useful to construct tables, grouping the available information in ways that make sense to you. When you add a number to one of your tables, be sure to include the units of measurement. The units you record in the tables should begin to suggest the most convenient set of units to use in your analysis. No matter what units you choose to use in the internal calculations of your kinetic model, we strongly suggest that you present the observed variables in the same units used by the experimentalist. Experience has shown that this promotes communication and understanding between experimental and theoretical investigators. Dimensionless quantities are frequently constructed by theoreticians in an effort to identify characteristic features of a biological system, but while these may make your point effectively to a theoretical audience, they are among the most powerful soporifics ever discovered for an audience of biologists.

As you fill in your data tables, you may be tempted to choose a consistent set of units for your modeling work, but we suggest that you delay this decision until you have chosen the basis for your model, as described in the next chapter. For modeling projects that are based on published data, the Methods section of the published paper is an essential source of detailed information. You should go through this section with a magnifying glass (especially for those journals which relegate Methods to miniprint). Useful numbers, often found here, include temperatures at which the various experiments were performed, molecular weights of relevant reagents and biomolecules, composition and pH of buffers and media, Km values for enzymes and membrane transport systems, equilibrium (binding) constants for receptor-ligand interactions, concentrations of physiological and pharmacological agents used in the experimental protocols, and, importantly, the timing of the experimental protocols. Add all such information to your data tables.

Recommendation: As you enter numbers in your data tables, annotate each number with the source of the information. Initially, you may find this time-consuming and irksome, but experience suggests that you will find many occasions when you want to be able to cite the source of your information. Common questions you will ask yourself during the modeling process are, "Am I sure I have the right value for this number?" and "What was the context in which this number was reported?" Even if you never need to look it up yourself, another scientist is very likely to be skeptical about a given number, and journal reviewers often ask how one or another number was obtained.

Time course data, of course, form the focus of your modeling work. If you have access to the original data points as they were recorded in a laboratory notebook, by all means take advantage of your good fortune. Your goal is a columnar table with time in the first column and the measured values of experimentally determined quantities in succeeding columns. If you have the original lab notebook, you can probably just copy the appropriate pages, but if you have only a published paper, you will have to read these numbers from published graphs. This method has inherent weaknesses, notably the difficulty of reading points that are close to either axis, and the substantial possibility that errors have crept into the figure that were not part of the original data set. Whenever possible, verify that your data tables are correct by contacting the person who did the experiments. Unless you can identify an internal inconsistency, such as a violation of conservation of mass, you cannot be expected to detect such errors at a distance, but there are practical techniques for improving the accuracy of the values you read from published graphs.
One we have often used is to enlarge the published figure with the aid of a modern photocopy machine, and then, using a millimeter scale, measure the x and y coordinates of each point on the enlarged graph. To calculate the conversion factors between your millimeter measurements and the actual values, it is best to measure the entire length of the time axis in mm. Then form the ratio of the number of seconds (or minutes, or hours) represented by the entire time axis to the number of mm you measured. This ratio will have units of, say, seconds/mm, and can be used to convert all of your x coordinates (measured in mm) to seconds. A similar strategy can be applied to your experimental measurements on the y axis. Two common situations complicate this method of assigning numbers to data points. First, the published graph may have "breaks" in the axis. These present no problem unless the millimeter measurements are made without taking them into account. Second, one or both axes of the published graph may be plotted on a logarithmic scale. In this case, you can measure the number of mm per decade and then form the ratio of the measured mm for each data point to this number of mm per decade. This tells you how many decades the point lies above the value you or the authors chose as the minimum on that axis. To convert this number of decades to a real number, calculate the antilog of the number of decades, that is, 10^n, where n is the number of decades. Then multiply the resulting factor by the minimum value on that axis, that is, the value plotted at the origin of that axis. The result is the desired datum value.
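The conversions described above reduce to two small formulas, one for linear axes and one for logarithmic axes. A minimal sketch (function names and example numbers are illustrative, not from the text):

```python
def mm_to_value_linear(mm, axis_length_mm, axis_span, axis_min=0.0):
    """Convert a millimeter coordinate on an enlarged graph to data units.

    axis_length_mm: measured length of the whole axis, in mm.
    axis_span:      quantity represented by the whole axis (e.g. 120 s).
    The ratio axis_span / axis_length_mm has units of, say, seconds/mm.
    """
    return axis_min + mm * (axis_span / axis_length_mm)

def mm_to_value_log(mm, mm_per_decade, axis_min):
    """Convert a millimeter coordinate on a logarithmic axis to data units.

    mm / mm_per_decade gives the number of decades the point lies above
    axis_min, so the value is axis_min * 10**decades.
    """
    decades = mm / mm_per_decade
    return axis_min * 10 ** decades

# Hypothetical examples:
# a point 50 mm along a 100 mm time axis spanning 120 s lies at 60 s:
t = mm_to_value_linear(50.0, 100.0, 120.0)      # -> 60.0
# a point 40 mm up a log axis with 20 mm per decade, minimum 1.0, is 100.0:
y = mm_to_value_log(40.0, 20.0, 1.0)            # -> 100.0
```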

Another useful source of information is figure legends. Indeed, for some journals, this is where all quantitative information on methods will be found. Two sorts of numbers that are often relegated to figure legends are normalization and recovery information. An effective reviewer will insist that if a graph is plotted as "percent of control" or "fraction of initial value", then the figure legend must contain a real number for the control or the initial value. As a modeler, you will often need these numbers. Recovery calculations are frequently required if you wish to convert reported activities into in vivo quantities. For example, if an enzyme activity is reported as nmol min^-1 mg^-1, you need to know how many mg of "equivalent" protein are present in the cells or tissue you are studying. Information on this is frequently found in the figure legend. As a last resort, you may find useful numbers in the articles cited in the bibliography. It is prudent to treat these as tentative since you cannot be certain they apply to the situation described in the primary source.
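The recovery calculation described above is simple arithmetic; a minimal sketch, with hypothetical function and variable names:

```python
def to_absolute_rate(specific_activity, mg_protein):
    """Convert a specific enzyme activity (e.g. nmol min^-1 mg^-1) into an
    absolute rate (nmol min^-1) by multiplying by the mg of "equivalent"
    protein present in the cells or tissue under study. The mg figure is
    the recovery information often found in the figure legend."""
    return specific_activity * mg_protein

# Hypothetical example: 12.5 nmol min^-1 mg^-1 with 4.0 mg of protein
# gives an absolute rate of 50.0 nmol min^-1.
rate = to_absolute_rate(12.5, 4.0)
```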

If you or your colleagues collected the data you plan to analyze, you are in the best possible situation. This is because you have access to the raw experimental data, you can look up forgotten details in the lab notebooks, you can determine precisely how the experiments were carried out, and you are keenly (perhaps even painfully) aware of the limitations of the experiments and the resulting data. In this situation, your biggest problem may be finding the required information in some dusty filing cabinet in the least-used corner of the lab. Yours is also the ideal situation because you have the option of using what you learn from the model to design informative experiments, and to complete the cycle of experiment - model - experiment - model as many times as required to answer your particular scientific question. This is the environment in which kinetic modeling is most powerful. Also, if the data are your own, you may find it helpful to convert to absolute numbers all numbers expressed as "% of control" or "% increase". The power of knowing the absolute value of a number has been amply demonstrated in the field of physics, but remains largely untapped in biology. Perhaps the most common example of this is the situation in which a complete kinetic analysis is carried out on the "%-of-control" data, and as the analysis nears completion someone asks to see the model predictions expressed in absolute terms. It is very likely that the model which accounts beautifully for the "%-of-control" data will fail miserably in accounting for the pre- and post-stimulus steady states when these are expressed in the same units as the transient data.
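Converting "% of control" and "% increase" data back to absolute values, as recommended above, is a one-line calculation each. A minimal sketch (names and numbers are hypothetical):

```python
def percent_of_control_to_absolute(percent, control_value):
    """'120 % of control', with a control of 2.5 mM, is 3.0 mM."""
    return percent / 100.0 * control_value

def percent_increase_to_absolute(percent_increase, control_value):
    """'A 20 % increase' over a control of 2.5 mM is also 3.0 mM.
    Note the two conventions differ by 100 percentage points."""
    return (1.0 + percent_increase / 100.0) * control_value
```

Having both conventions side by side guards against the common slip of treating "a 20 % increase" as "20 % of control".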

Recording Experimental Protocols

In parallel with your data tables, it's essential to construct a timeline for each experimental protocol you wish to impose on your kinetic model. A simple example is shown in the figure below.

Quantitative information about the protocol, such as the rate of infusion, the concentration of phenylephrine in the infusate, and the concentration of the alpha blocker serves as an extremely useful annotation of your protocol timeline.

The key questions that should be answered by this timeline are:

  • What was done?
  • When was it done?
  • How long did it last?

Many experimental designs impose delays between the onset of a perturbation and the time when the perturbation is first sensed by the cells in the living system. As a kineticist, you must remain alert to this possibility. If you suspect such delays, you will need to include them in your model. It is essential that such experimental delays not be rate limiting for the measured dynamics. Control experiments can and should be designed to ensure that it is the biological system, not the experimental setup, that determines the onset and relaxation dynamics seen in the experimental data.

An example of how the experimental setup limits the dynamic resolution of the experiment is provided by the common situation in which cells are growing in a perfused chamber. Typically, the perfusate is changed by turning a stopcock so that the perfusate now contains the hormone or other perturbation being studied. You must first consider the transport delay between the stopcock and the cell chamber, but even if this is negligible you must still be certain that the mixing time of the chamber is short. A useful rule of thumb is the 70% rule, which says the half-time for mixing the chamber is 0.7V/F, where V is the volume of the chamber and F is the flow rate of the perfusate. A half-time is the time required to achieve half the final hormone concentration in the chamber. If your experimentally determined biological measurement has a half-time close to the one calculated by the 70% rule, you should consider the possibility that the observed dynamics are caused by the experimental setup, not by the cellular response. The 70% rule works because F/V (which has units of time^-1) is the first order rate constant, k, characterizing chamber mixing. To find the time when 1-exp(-kt) reaches half its final value, solve exp(-kt) = 0.5 for t. The solution (which is the half-time) is ln2 divided by k, and since the natural logarithm of 2 is 0.693, the half-time is 0.693V/F, which is the 70% rule.
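The 70% rule is easy to check numerically. A minimal sketch assuming a well-mixed chamber; the volume and flow values in the example are hypothetical:

```python
import math

def mixing_half_time(volume, flow):
    """Half-time for washing a new perfusate into a well-mixed chamber.

    k = F/V is the first-order rate constant characterizing chamber
    mixing; solving exp(-k*t) = 0.5 for t gives
        t_half = ln(2) / k = ln(2) * V / F  (~0.7 V/F, the 70% rule).
    Units: if volume is in mL and flow in mL/min, the result is in min.
    """
    k = flow / volume          # first-order rate constant, 1/time
    return math.log(2) / k

# Hypothetical example: a 2 mL chamber perfused at 1 mL/min has a
# mixing half-time of about 1.39 min; a cellular response with a
# similar half-time may simply be reporting chamber mixing.
t_half = mixing_half_time(2.0, 1.0)
```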

If you have data for many protocols, you should expect to be able to extract a great deal of mechanistic information from your data. The time required to carry out the modeling task grows faster than linearly with the number of data sets, though not as fast as quadratically. Consequently, you may want to concentrate on one or two informative protocols for your first modeling effort. This raises the question of how to identify informative protocols. First, they should provide "complete" dynamics. This means that data are available on the pre-stimulus value of the measured quantity, and that after a perturbation is applied, measurements are made until a new steady state is achieved. For tracer experiments, it is desirable that data be recorded until radioactivity is only about twice the background level. Second, informative protocols should be related to one another in the sense that each involves variables that are part of the theory or hypothesis you wish to test. We will have more to say on this topic when we take up the construction of rigorous symbol and arrow diagrams in the next chapter.

Beginning Your Assumption List

Kinetic modeling is no different from the rest of scientific endeavor in its dependence on assumptions. But one of the significant benefits of the modeling process is that your assumptions are made explicit. As the chief modeler on your project, you are responsible for knowing what assumptions are built into your analysis. A good way to carry out your responsibility is to maintain an assumption list. This task is often avoided by beginners because they have been trained to believe that assumptions are weaknesses. You may hope, for example, that if you do not reveal your assumptions, your reviewers (who are all pressed for time) will miss them. This is self-defeating because it is much more likely that an unstated assumption will come back to haunt you than that it will come back to haunt your reviewer. Furthermore, when things are going poorly in a modeling project, an explicit assumption list serves as an excellent source of possible explanations. Just as in human affairs, it is the unconscious or unstated assumption that leads to trouble. Start your assumption list before you are emotionally invested in a particular point of view. An effective way to do this is to list the assumptions made by the experimentalist who produced the data you plan to analyze. This is harder, but just as valuable, if you are the experimentalist. Some of these assumptions will have the character of laws; they are assertions that any reasonable colleague would make if he or she were in your shoes. But what about the unreasonable colleague? In the long run, if not the short, you serve yourself best by highlighting your weakest assumptions. Certainly on a personal assumption list, you should not hesitate to bare your craziest assumptions. At the very least, your readers or listeners may suggest a measurement you could make to substantiate your assumption. At best, you may find that your assumption is known to be correct based on published work of which you were unaware.

Look for Applicable Constraints

A final class of information that can be invaluable as your modeling project proceeds is constraints that can be enforced as you build your model. A constraint is an absolute rule whose truth is unquestioned and which can be expressed in the language of mathematics. These have a way of being specific for individual projects, but there are a few common ones that are frequently useful. First, if your experimental system is closed, such as a test tube, or a Petri dish, or a culture flask, you may find it useful to impose conservation of mass as a constraint. Second, if your experimental system is open, in the sense that it exchanges material with its environment, you will want to know if the system is in a steady state at the beginning of your experiment. This information is particularly valuable if you are carrying out a tracer experiment. Another constraint that can be enormously helpful is direct measurement of the initial values of any of the variables that appear in your theory of how the system works. These may already be recorded in your data tables; if so, go back and highlight them.
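A conservation-of-mass constraint for a closed system can be checked directly against your data tables. A minimal sketch with hypothetical species concentrations and a hypothetical tolerance:

```python
def mass_conserved(concentrations, total, rel_tol=0.02):
    """Check a closed-system constraint: at every time point, the species
    concentrations should sum to the known total, within a relative
    tolerance that allows for measurement error.

    concentrations: list of per-time-point lists, e.g. [[A0, B0], [A1, B1], ...]
    total:          the conserved total, in the same units.
    """
    return all(abs(sum(point) - total) <= rel_tol * total
               for point in concentrations)

# Hypothetical example: two species that should always sum to 5.0 mM.
ok  = mass_conserved([[1.0, 4.0], [2.0, 3.0]], 5.0)   # conserved
bad = mass_conserved([[1.0, 3.0]], 5.0)               # 1.0 mM is missing
```

A check like this is also one of the few ways to catch digitization errors at a distance, as noted earlier in the chapter.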

Armed with your data tables, your protocol timelines, your assumption list, and your constraints you are ready to embark on the process of constructing a rigorous symbol and arrow diagram for your system. That is the subject of the next chapter.

Chapter 3