Chapter 2
Assembling Your Experimental Data
If you are working on a modeling project based on data
collected by someone else, this first section is addressed to
you. There are several different kinds of information you will
want to collect from whatever sources are available to you.
- What measurements were made?
- What protocols were followed, and what perturbations were
applied?
- What constraints are available?
Examine the available information very carefully, paying
particular attention to numbers. It's often useful to construct
tables, grouping the available information in ways that make
sense to you. When you add a number to one of your tables, be
sure to include the units of measurement. The units you record in
the tables should begin to suggest the most convenient set of
units to use in your analysis. No matter what units you choose to
use in the internal calculations of your kinetic model, we
strongly suggest that you present the observed variables in the
same units used by the experimentalist. Experience has shown that
this promotes communication and understanding between
experimental and theoretical investigators. Dimensionless
quantities are frequently constructed by theoreticians in an
effort to identify characteristic features of a biological
system, but while these may make your point effectively in a
theoretical audience, they are among the most powerful soporifics
ever discovered for an audience of biologists.
As you fill in your data tables, you may be tempted to choose
a consistent set of units for your modeling work, but we suggest
that you delay this decision until you have chosen the basis for
your model, as described in the next chapter. For modeling
projects that are based on published data, the Methods section of
the published paper is an essential source of detailed
information. You should go through this section with a magnifying
glass (especially for those journals which relegate Methods to
miniprint). Useful numbers, often found here, include
temperatures at which the various experiments were performed,
molecular weights of relevant reagents and biomolecules,
composition and pH of buffers and media, Km values for enzymes
and membrane transport systems, equilibrium (binding) constants
for receptor-ligand interactions, concentrations of physiological
and pharmacological agents used in the experimental protocols,
and, importantly, the timing of the experimental protocols. Add
all such information to your data tables.
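If you keep your data tables in electronic form, one way to make sure no number is ever separated from its units is to store the two together. The Python sketch below is a minimal illustration only. It assumes a hypothetical `TableEntry` record of our own devising, which also includes a field noting where each number came from. The quantities and values are invented placeholders, not data from any study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableEntry:
    """One number from the literature, never stored without its units."""
    quantity: str  # what the number describes
    value: float
    units: str     # units exactly as the experimentalist reported them
    source: str    # where the number was found (paper section, figure, notebook)


# Hypothetical entries; the values are placeholders, not real data.
data_table = [
    TableEntry("incubation temperature", 37.0, "degC", "Methods, paragraph 2"),
    TableEntry("Km of transporter X", 25.0, "uM", "Methods, Table 1"),
    TableEntry("agonist concentration", 10.0, "uM", "Figure 3 legend"),
]


def lookup(table, quantity):
    """Return the entry for a quantity so value, units and source travel together."""
    for entry in table:
        if entry.quantity == quantity:
            return entry
    raise KeyError(quantity)


if __name__ == "__main__":
    km = lookup(data_table, "Km of transporter X")
    print(f"{km.quantity}: {km.value} {km.units} [{km.source}]")
```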
Recommendation: As you enter numbers in your data tables,
annotate each number with the source of the information.
Initially, you may find this time-consuming and irksome, but
experience suggests that you will find many occasions when you
want to be able to cite the source of your information. Common
questions you will ask yourself during the modeling process are
"Am I sure I have the right value for this number?" and
"What was the context in which this number was
reported?" Even if you never need to look it up yourself,
another scientist is very likely to be skeptical about a given
number, and journal reviewers often ask how one or another number
was obtained.
Time course data, of course, form the focus of your
modeling work. If you have access to the original data points as
they were recorded in a laboratory notebook, by all means take
advantage of your good fortune. Your goal is a columnar table
with time in the first column and the measured values of
experimentally determined quantities in succeeding columns. If
you have the original lab notebook, you can probably just copy
the appropriate pages, but if you have only a published paper,
you will have to read these numbers from published graphs. This
method has inherent weaknesses, notably the difficulty of reading
points that are close to either axis, and the substantial
possibility that errors not present in the original data set crept
in when the figure was prepared. Whenever possible, verify that
your data tables are correct by contacting the person who did the
experiments. Unless you can identify an internal inconsistency,
such as a violation of conservation of mass, you cannot be expected
to detect such errors at a distance. You can, however, improve the
accuracy of the values you read from published graphs. One
technique we have often used is to enlarge the
published figure with the aid of a modern photocopy machine, and
then, using a millimeter scale, measure the x and y coordinates
of each point on the enlarged graph. To calculate the conversion
factors between your millimeter measurements and the actual
values, it is best to measure the entire length of the time axis
in mm. Then form the ratio of the number of seconds, minutes, or
hours represented by the entire time axis to the number of mm you
measured. This ratio will have units of, say, seconds/mm, and can
be used to convert all of your x coordinates (measured in mm from
the origin) to seconds. A similar strategy can be applied to your
experimental measurements on the y axis.
Two common situations complicate this method of assigning numbers
to data points. First, the published graph may have "breaks" in
the axis. These present no problem as long as you take them into
account when making your millimeter measurements, for example by
calibrating each segment of the broken axis separately. Second,
one or both axes of the published graph may be plotted on a
logarithmic scale. In this case, measure the number of mm per
decade, and then divide the measured mm for each data point (its
distance from the axis origin) by this number of mm per decade.
The result, $n$, tells you how many decades the point lies above
the value you or the authors chose as the minimum on that axis. To
convert $n$ to a real number, calculate its antilog, $10^{n}$, and
multiply this factor by the minimum value on that axis, that is,
the value plotted at the origin of the axis. The result is the
desired datum value: $y = y_{\min} \times 10^{n}$.
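The arithmetic for converting ruler readings into data values is easy to get wrong when it is done by hand for many points. The following Python sketch makes three assumptions. First, the axis has no breaks. Second, each reading is measured in mm from the axis origin. Third, as a small generalization of the procedure above, a linear axis may start at a nonzero minimum. The function names and example numbers are hypothetical.

```python
def linear_axis_value(mm_from_origin, axis_length_mm, axis_min, axis_max):
    """Convert a ruler reading on a linear axis into a data value.

    The scale factor (data units per mm) comes from measuring the whole axis,
    e.g. seconds/mm for a time axis.
    """
    units_per_mm = (axis_max - axis_min) / axis_length_mm
    return axis_min + mm_from_origin * units_per_mm


def log_axis_value(mm_from_origin, mm_per_decade, axis_min):
    """Convert a ruler reading on a base-10 logarithmic axis into a data value.

    n is the number of decades the point lies above the axis minimum
    (the value plotted at the origin); the datum is axis_min * 10**n.
    """
    n = mm_from_origin / mm_per_decade
    return axis_min * 10 ** n


if __name__ == "__main__":
    # Hypothetical readings from an enlarged figure.
    print(linear_axis_value(48.0, axis_length_mm=120.0, axis_min=0.0, axis_max=60.0))
    print(log_axis_value(60.0, mm_per_decade=40.0, axis_min=0.1))
```

For instance, suppose a 120 mm time axis spans 0 to 60 min. A point 48 mm from the origin then corresponds to 24 min. On a log axis with 40 mm per decade and a minimum of 0.1, a point 60 mm up lies 1.5 decades above the minimum, at about 3.16.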
Another useful source of information is figure legends.
Indeed, for some journals, this is where all quantitative
information on methods will be found. Two sorts of numbers that
are often relegated to figure legends are normalization and
recovery information. An effective reviewer will insist that if a
graph is plotted as "percent of control" or
"fraction of initial value", then the figure legend
must contain a real number for the control or the initial value.
As a modeler, you will often need these numbers. Recovery
calculations are frequently required if you wish to convert
reported activities into in vivo quantities. For example, if an
enzyme activity is reported as nmol min$^{-1}$ mg$^{-1}$, you need to know
how many mg of "equivalent" protein are present in the
cells or tissue you are studying. Information on this is
frequently found in the figure legend. As a final resort, you may
find useful numbers in the articles cited in the bibliography. It
is prudent to treat these as tentative since you cannot be
certain they apply to the situation described in the primary
source.
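Once the legend supplies the missing number, both kinds of conversion reduce to simple multiplications. The minimal Python sketch below uses hypothetical function names and invented example values.

```python
def absolute_from_percent_of_control(percent, control_value):
    """Undo a "% of control" normalization using the control value from the legend."""
    return percent / 100.0 * control_value


def absolute_from_fraction_of_initial(fraction, initial_value):
    """Undo a "fraction of initial value" normalization."""
    return fraction * initial_value


def total_activity(specific_activity, protein_mg):
    """Convert a specific activity (nmol min^-1 mg^-1) into nmol min^-1.

    protein_mg is the mg of "equivalent" protein in the cells or tissue studied.
    """
    return specific_activity * protein_mg


if __name__ == "__main__":
    # Invented numbers: control activity 2.4 nmol min^-1 mg^-1, a point plotted
    # at 150% of control, and 0.5 mg of protein in the preparation.
    specific = absolute_from_percent_of_control(150.0, 2.4)
    print(specific, total_activity(specific, 0.5))
```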
If you or your colleagues collected the data you plan to
analyze, you are in the best possible situation. This is because
you have access to the raw experimental data, you can look up
forgotten details in the lab notebooks, you can determine
precisely how the experiments were carried out, and you are
keenly (perhaps even painfully) aware of the limitations of the
experiments and the resulting data. In this situation, your
biggest problem may be finding the required information in some
dusty filing cabinet in the least-used corner of the lab. Yours
is also the ideal situation because you have the option of using
what you learn from the model to design informative experiments,
and to complete the cycle of experiment - model - experiment -
model as many times as required to answer your particular
scientific question. This is the environment in which kinetic
modeling is most powerful. Also, if the data are your own, you
may find it helpful to convert all quantities expressed as
"% of control" or "% increase" to absolute values.
The power of knowing the absolute value of a number has been
amply demonstrated in the field of physics, but remains largely
untapped in biology. Perhaps the most common illustration of what
is lost is the situation in which a complete kinetic analysis is carried out
on the "%-of-control" data, and as the analysis nears
completion someone asks to see the model predictions expressed in
absolute terms. It is very likely that the model which accounts
beautifully for the "%-of-control" data will fail
miserably in accounting for the pre- and post-stimulus steady
states when these are expressed in the same units as the
transient data.
Recording Experimental Protocols
In parallel with your data tables, it's essential to construct
a timeline for each experimental protocol you wish to impose on
your kinetic model. A simple example is shown in the figure
below.

Quantitative information about the protocol, such as the rate
of infusion, the concentration of phenylephrine in the infusate,
and the concentration of the alpha blocker, serves as an extremely
useful annotation of your protocol timeline.
The key questions that should be answered by this timeline
are:
- What was done?
- When was it done?
- How long did it last?
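You can also keep a timeline as a small data structure that records the answers to these three questions for each step. The same record can later drive the inputs to your kinetic model. The sketch below is hypothetical: the class name is our own, and the start times and durations are invented for illustration rather than taken from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProtocolStep:
    what: str            # what was done
    start_min: float     # when it was done (minutes from the start of the record)
    duration_min: float  # how long it lasted
    detail: str = ""     # quantitative annotation (rate, concentration, ...)

    @property
    def end_min(self):
        return self.start_min + self.duration_min

    def active_at(self, t_min):
        """True while the step is in effect (start inclusive, end exclusive)."""
        return self.start_min <= t_min < self.end_min


# Hypothetical protocol; times are invented, doses left as placeholders.
timeline = [
    ProtocolStep("baseline", 0.0, 10.0),
    ProtocolStep("phenylephrine infusion", 10.0, 20.0, "infusion rate, infusate conc."),
    ProtocolStep("alpha blocker", 20.0, 10.0, "blocker concentration"),
]


def active_steps(steps, t_min):
    """List the perturbations the system is experiencing at time t_min."""
    return [s.what for s in steps if s.active_at(t_min)]


if __name__ == "__main__":
    for t in (5.0, 15.0, 25.0):
        print(t, active_steps(timeline, t))
```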
Many experimental designs impose delays between the onset of
a perturbation and the time when the perturbation is first sensed
by the cells in the living system. As a kineticist, you must
remain alert to this possibility. If you suspect such delays, you
will need to include them in your model. It is also essential to
confirm that such experimental delays are not rate limiting for the measured
dynamics. Control experiments can and should be designed to
ensure that it is the biological system, not the experimental
setup, that determines the onset and relaxation dynamics seen in
the experimental data.
An example of how the experimental setup limits the dynamic
resolution of the experiment is provided by the common situation
in which cells are growing in a perfused chamber. Typically, the
perfusate is changed by turning a stopcock so that the perfusate
now contains the hormone or other perturbation being studied. You
must first consider the transport delay between the stopcock and
the cell chamber, but even if this is negligible you must still
be certain that the chamber mixes rapidly. A useful
rule of thumb is the 70% rule, which says that the half-time for mixing
the chamber is $0.7\,V/F$, where $V$ is the volume of the chamber and $F$
is the flow rate of the perfusate. A half-time is the time
required to achieve half the final hormone concentration in the
chamber. If your experimentally determined biological measurement
has a half-time close to the one calculated by the 70% rule, you
should consider the possibility that the observed dynamics are
caused by the experimental setup, not by the cellular response.
The 70% rule works because $F/V$ (which has units of time$^{-1}$) is the
first-order rate constant, $k$, characterizing chamber mixing. To
find the time when $1 - e^{-kt}$ reaches half its final value, solve
$e^{-kt} = 0.5$ for $t$. The solution, which is the half-time, is
$t_{1/2} = \ln 2 / k = \ln 2 \cdot V/F$, and since the natural logarithm
of 2 is 0.693, the 70% rule works well.
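The calculation is easy to automate when you are checking several chambers or flow rates. The minimal Python sketch below assumes a well-mixed chamber, with the volume and flow rate expressed in the same volume unit. The function names and the chamber dimensions in the example are hypothetical.

```python
import math


def mixing_rate_constant(volume, flow_rate):
    """First-order rate constant k = F/V for washing a new perfusate into the chamber."""
    return flow_rate / volume


def mixing_half_time(volume, flow_rate):
    """Half-time for chamber mixing: ln(2)/k = ln(2) * V/F, roughly 0.7 V/F."""
    return math.log(2) / mixing_rate_constant(volume, flow_rate)


def fraction_of_final(t, volume, flow_rate):
    """Fraction of the final hormone concentration reached at time t: 1 - exp(-k t)."""
    return 1.0 - math.exp(-mixing_rate_constant(volume, flow_rate) * t)


if __name__ == "__main__":
    # Hypothetical chamber: 0.5 mL volume perfused at 1.0 mL/min.
    t_half = mixing_half_time(0.5, 1.0)
    print(f"mixing half-time = {t_half:.3f} min ({60 * t_half:.0f} s)")
```

With these numbers the half-time is about 0.35 min, roughly 21 s. Before you attribute a measured response with a half-time of that order to the cells, it would deserve a control experiment.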
If you have data for many protocols, you should expect to be
able to extract a great deal of mechanistic information from your
data. However, the time required to carry out the modeling task
grows faster than linearly with the number of data sets, though
not as fast as quadratically. Consequently,
you may want to concentrate on one or two informative protocols
for your first modeling effort. This raises the question of how
to identify informative protocols. First, they should provide
"complete" dynamics. This means that data are available
on the pre-stimulus value of the measured quantity, and that
after a perturbation is applied, measurements are made until a
new steady state is achieved. For tracer experiments, it is
desirable that data be recorded until radioactivity is only twice
the background level. Second, informative protocols should be
related to one another in the sense that each involves variables
that are part of the theory or hypothesis you wish to test. We
will have more to say on this topic when we take up the
construction of rigorous symbol and arrow diagrams in the next
chapter.
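When you are screening many data sets, you can check these criteria mechanically as a first pass. The Python sketch below is a rough, hypothetical screen. It treats "a new steady state" as the last few points agreeing within a relative tolerance. Both `rel_tol` and `n_tail` are arbitrary choices that you would tune, or replace with judgment, for your own data.

```python
def has_complete_dynamics(times, values, stimulus_time, rel_tol=0.05, n_tail=3):
    """Rough screen for "complete" dynamics in a time-ordered record.

    Requires at least one pre-stimulus point and at least n_tail
    post-stimulus points, the last n_tail of which all lie within
    rel_tol of their mean (a crude stand-in for "a new steady state").
    """
    pre = [v for t, v in zip(times, values) if t < stimulus_time]
    post = [v for t, v in zip(times, values) if t >= stimulus_time]
    if not pre or len(post) < n_tail:
        return False
    tail = post[-n_tail:]
    mean = sum(tail) / n_tail
    return all(abs(v - mean) <= rel_tol * abs(mean) for v in tail)


def tracer_recorded_long_enough(counts, background):
    """True if the last tracer reading has fallen to twice background or less."""
    return counts[-1] <= 2.0 * background


if __name__ == "__main__":
    # Invented record: stimulus applied at t = 8 min.
    times = [0, 5, 10, 15, 20, 25, 30]
    values = [1.0, 1.0, 3.0, 2.5, 2.0, 2.02, 1.99]
    print(has_complete_dynamics(times, values, stimulus_time=8))
    print(tracer_recorded_long_enough([1000, 400, 150, 90], background=50))
```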
Beginning Your Assumption List
Kinetic modeling is no different from the rest of scientific
endeavor in its dependence on assumptions. But one of the
significant benefits of the modeling process is that your
assumptions are made explicit. As the chief modeler on your
project, you are responsible for knowing what assumptions are
built into your analysis. A good way to carry out your
responsibility is to maintain an assumption list. This task is
often avoided by beginners because they have been trained to
believe that assumptions are weaknesses. You may hope, for
example, that if you do not reveal your assumptions, your
reviewers (who are all pressed for time) will miss them. This is
self-defeating because it is much more likely that an unstated
assumption will come back to haunt you than that it will
come back to haunt your reviewer. Furthermore, when things are
going poorly in a modeling project, an explicit assumption list
serves as an excellent source for possible explanations. Just as
in human affairs, it is the unconscious or unstated assumption
that leads to trouble. Start your assumption list before you are
emotionally invested in a particular point of view. An effective
way to do this is to list the assumptions made by the
experimentalist who produced the data you plan to analyze. This
is harder, but just as valuable, if you are the experimentalist.
Some of these assumptions will have the character of laws; they
are assertions that any reasonable colleague would make if he or
she were in your shoes. But what about the unreasonable
colleague? In the long run, if not the short, you serve yourself
best by highlighting your weakest assumptions. Certainly on a
personal assumption list, you should not hesitate to bare your
craziest assumptions. At the very least, your readers or
listeners may suggest a measurement you could make to
substantiate your assumption. At best, you may find that your
assumption is known to be correct based on published work of
which you were unaware.
Look for Applicable Constraints
A final class of information that can be invaluable as your
modeling project proceeds consists of constraints that can be
enforced as you build your model. A constraint is an absolute rule
whose truth is unquestioned and which can be expressed in the
language of mathematics. Constraints tend to be specific to
individual projects, but there are a few common ones that are
frequently useful. First, if your experimental system is closed, such as a
test tube, or a Petri dish, or a culture flask, you may find it
useful to impose conservation of mass as a constraint. Second, if
your experimental system is open, in the sense that it exchanges
material with its environment, you will want to know if the
system is in a steady state at the beginning of your experiment.
This information is particularly valuable if you are carrying out
a tracer experiment. Another constraint that can be enormously
helpful is a directly measured initial value of any of the
variables that appear in your theory of how the system works.
These may already be recorded in your data tables; if so, go back
and highlight them.
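For a closed system, you can check conservation of mass directly against your data tables before you impose it on the model. The minimal Python sketch below assumes that every species is measured in the same units (e.g. nmol) at each time point. It also assumes that `rel_tol`, a hypothetical tolerance, is chosen to reflect measurement error.

```python
def totals_by_time(amounts_by_time):
    """Sum the amounts of all species at each time point."""
    return [sum(row) for row in amounts_by_time]


def mass_conserved(amounts_by_time, rel_tol=0.02):
    """True if the total amount stays within rel_tol of its initial value.

    amounts_by_time: one list per time point, giving the amount of every
    species (or compartment) in a closed system, all in the same units.
    """
    totals = totals_by_time(amounts_by_time)
    reference = totals[0]
    return all(abs(total - reference) <= rel_tol * abs(reference) for total in totals)


if __name__ == "__main__":
    # Invented data: substrate, complex, product (nmol) at three time points.
    closed = [[10.0, 0.0, 0.0], [6.0, 3.0, 1.0], [2.0, 5.0, 3.0]]
    print(totals_by_time(closed), mass_conserved(closed))
```

A total that drifts beyond the tolerance points either to a measurement problem or to a species missing from your tables.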
Armed with your data tables, your protocol timelines, your
assumption list, and your constraints, you are ready to embark on
the process of constructing a rigorous symbol and arrow diagram
for your system. That is the subject of the next chapter.