A B E
Version: 1.0, November 2002
By Gordon Webster
gwebster@users.sourceforge.net
Copyright © 2002, 2003 Gordon Webster, EMD Lexigen Research Center, Bedford Campus, 45A Middlesex Turnpike, Billerica, MA 01821 USA
Preparing your bioassay data in XML format
Estimating the Curve Midpoint (ED50)
Fitting the Four-Parameter Data Model
Problems Fitting the Four-Parameter Model
Fitting the Polynomial Data Model
General Tips and Tricks for Working With ABE
Appendix
A: The GNU General Public License (GPL)
ABE 1.0 (Copyright © 2002, 2003 Gordon Webster, EMD Lexigen Research Center) is Open Source software
that is available at no charge and may be freely used and distributed under the
terms of the GNU General Public License (GPL) as described in Appendix A
of this manual.
ABE 1.0 is supplied for free, as is, without
warranty or support of any kind. The author would be grateful to hear of any
bugs or problems that arise in the course of the (sensible) use of this
program, but makes no claims, implicitly or otherwise inferred, as to its
usability or the validity of the results of its use (see Appendix A for a full
disclosure of the GNU GPL licensing terms and conditions that apply to this
software).
In the event that the use of ABE contributes to any
scientific work that is published in any medium or presented to an audience in
any oral or visual format, the author would be grateful if users could cite
their use of ABE and the contribution that it made to their work. The following
citation should suffice in most cases:
Webster, G.D. (2002) ABE, A bioassay data analysis
package written in Python, EMD Lexigen Research Center, MA 01821 USA
Gordon Webster
EMD Lexigen Research Center
November 2002
ABE is a fast and convenient program for modeling
bioassay data using either polynomials or a four-parameter model based upon the
standard, sigmoidal dose-response curve. ABE stands for Analysis
of Bioassay Experiments.
ABE is written in Python 2.2 and uses the Tkinter
graphic library to generate a friendly graphical user interface (GUI) for manipulating
and visualizing the data. As a result, the program should be capable of being
run on any platform with a standard Python 2.2 (or higher) distribution,
including Intel-based platforms running Windows or Linux as well as Apple
platforms running OSX.
For the data modeling, ABE uses the nonlinear
regression routines from the Scientific Python (SciPy) and Numeric
libraries as well as the polyfit function from Raymond Hettinger's Matfunc
module, for computing the fitted polynomial coefficients from the supplied
data. Functions for computing the derivatives of the fitted polynomials,
solving polynomial roots and estimating the initial parameters for the
nonlinear regression were added by the author and are included in the body of
the main Abe module. Therefore, in addition to the libraries of the standard
Python distribution, only the SciPy, Numeric and Matfunc modules need to be
packaged with the Abe module for installation on other platforms.
Given the lack of any current standard format for bioassay
data, ABE accepts bioassay data in XML (eXtensible Markup Language)
format since this has the virtue of being concise, unambiguous and
human-readable, and also because XML is rapidly becoming a standard for data
exchange. The input XML data is parsed using the functions and classes supplied
in the xml.parsers module that is included as part of the current,
standard Python distribution. No additional Python modules are therefore
required for reading and parsing data in XML format, making ABE more modular,
portable and flexible than it would be if a specialized data format unique to
this program were used. XML is also very easy to produce from the kind of
tabular data used in applications such as Microsoft Excel™.
ABE is a work in progress, to which more features can and will be added as demand
requires and time allows. The current version (1.0) has sufficient
functionality to perform a useful analysis of experimental bioassay data,
offering a couple of standard methods for modeling the data and generating
fitted ED50 values. Future versions should include a more exhaustive
statistical analysis of the data model and the option to compare different
bioassays and calculate relative potencies (including parallel line analysis).
Basic XML (the only kind that ABE uses) is very
intuitive and easy to learn. If you are not familiar with it, there are many excellent
books available, as well as many free online references, introductions and
tutorials. ABE accepts bioassay data formatted in XML, according to the
following schema:
<bioassay id='bioassay name'
          x='name of x column'
          y='name of y column'
          err='name of error column'
          columns='column-1 column-2 column-3 … column-n'>
    <molecule id='name of molecule'>
        <data> column-1-data column-2-data column-3-data … column-n-data </data>
        .
        .
        .
    </molecule>
    .
    .
    .
</bioassay>
The literal XML tag names and attributes that are
required are shown in bold, field entries to be supplied by the user are shown
in italics and a vertical column of periods indicates that an arbitrary number
of the last-described data block can be inserted at that point. The bioassay
data is described in a bioassay block, within which, an arbitrary number
of molecule blocks may occur, each of which contains an arbitrary number
of data records.
The bioassay tag must be the top-level XML tag in the
data file and should supply the attributes id, x, y, err
and columns, where x is the name of the data column containing
the x-axis data (normally the protein concentration), y is the name of
the y data column (normally the measure of the protein’s activity e.g. in
counts per minute), err is the name of the column containing the
standard errors and columns is a list of the column names in the order
that they occur in the data records. Note that this list is white space
delimited; therefore, no column name may contain white space. For
example, a column can NOT be called protein conc, but protein.conc,
protein-conc and protein_conc are all valid column names.
Obviously, there should be the same number of column labels as there are
columns in each of the data records that follow. NB: The bioassay and
molecule id attributes are always inside quotes and therefore CAN
contain any amount of white space.
The molecule tag has only a single required
attribute, id, which uniquely identifies the molecule. Within the
molecule, any number of data records may be given. Each data record is also
read as a white space delimited list, therefore no field in the list should
contain any white-space e.g. in the data record:
<data> Protein 1 70.00 85323 1243.2 1028.5 978 134.6 </data>
The protein name field Protein 1 would
be read as two fields Protein and 1, even if it was
intended to be read as a single field according to the description supplied for
the columns attribute in the bioassay tag.
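The effect is easy to reproduce in Python. The short sketch below only illustrates white space delimited splitting with the standard str.split() method; it is not taken from the ABE source, and the variable names are hypothetical.

```python
# Illustration only: how a white space delimited record breaks into fields.
record = "Protein 1 70.00 85323 1243.2 1028.5 978 134.6"
fields = record.split()            # split on any run of white space
bad_count = len(fields)            # 'Protein' and '1' end up as two fields

# Replacing the inner space keeps the molecule name in a single field.
fixed_fields = "Protein_1 70.00 85323 1243.2 1028.5 978 134.6".split()
good_count = len(fixed_fields)

print(bad_count, fields[:2])
print(good_count, fixed_fields[0])
```

With the space in the name, the record yields one more field than intended, so every following value would be shifted into the wrong column.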
Any other valid XML components such as XML
comments or even other valid XML tags are allowed, but these will be ignored by
ABE. The data file must only contain valid XML components otherwise an error
will be generated when the data are parsed by ABE. One easy way to check your
files for XML compliance is to open them with an XML-compliant web browser such
as Microsoft’s Internet Explorer™ 5.0. If your XML formatting is OK, the web browser
will display the data, formatted and color-coded according to its XML data
structure.
Tip: Make sure your XML data files have the file
extension .xml so that they can automatically be opened and viewed with a
(properly configured) web browser if you double-click on the file icons.
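A data file can also be pre-checked with a few lines of Python. The sketch below is a hypothetical helper, not part of ABE: it uses the standard xml.etree.ElementTree module (available in current Python versions) to test that the file is well-formed, and then checks the tags and attributes that this manual says ABE expects. The function name check_abe_xml and the exact set of checks are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Attributes this manual says the bioassay tag should supply.
BIOASSAY_ATTRS = ("id", "x", "y", "err", "columns")

def check_abe_xml(text):
    """Return a list of problem descriptions; an empty list means none found."""
    try:
        root = ET.fromstring(text)
    except ET.ParseError as exc:
        return ["not well-formed XML: %s" % exc]
    if root.tag != "bioassay":
        return ["top-level tag is <%s>, not <bioassay>" % root.tag]
    problems = []
    for name in BIOASSAY_ATTRS:
        if name not in root.attrib:
            problems.append("bioassay tag lacks the '%s' attribute" % name)
    columns = root.get("columns", "").split()
    for name in ("x", "y", "err"):
        if name in root.attrib and root.get(name) not in columns:
            problems.append("%s='%s' is not listed in columns"
                            % (name, root.get(name)))
    for molecule in root.findall("molecule"):
        label = molecule.get("id")
        if label is None:
            problems.append("a molecule tag lacks the 'id' attribute")
        for data in molecule.findall("data"):
            n_fields = len((data.text or "").split())
            if n_fields != len(columns):
                problems.append("molecule %r: record has %d fields, expected %d"
                                % (label, n_fields, len(columns)))
    return problems

good = ("<bioassay id='Demo' x='conc' y='cpm' err='se' columns='conc cpm se'>"
        "<molecule id='Protein 1'><data> 1.0 200 10 </data></molecule></bioassay>")
bad = good.replace("1.0 200 10", "1.0 200")   # one field short

print(check_abe_xml(good))
print(check_abe_xml(bad))
print(check_abe_xml("<bioassay>"))            # unclosed tag, not well-formed
```

A check of this kind catches both plain XML errors and files that are valid XML but do not follow the ABE layout.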
Remember, XML is just a meta-format for describing
data. Just because your file is valid XML, does not necessarily mean it will be
recognized by ABE unless you correctly include all of the tags and attributes
that ABE recognizes, as described above. A typical bioassay data file for input
to ABE, should look something like the data shown below (the pretty indents and
tabs are not necessary for XML but they make the XML easier to read and
understand for humans!)
<!-- This is my bioassay data in XML format -->
<bioassay id='Assay 112602A'
          x='protein.conc' y='avg.cpm' err='std.err'
          columns='protein.conc molecule avg.cpm std.dev percent.cv std.err'>
<molecule id="Protein 1">
<data> 70.0000 Protein_1 85323 2164 3 1082 </data>
<data> 35.0000 Protein_1 81292 5963 7 2981 </data>
<data> 17.5000 Protein_1 84020 4796 6 2398 </data>
<data> 8.7500 Protein_1 82278 3843 5 1921 </data>
<data> 4.3750 Protein_1 74464 6975 9 3487 </data>
<data> 2.1875 Protein_1 42086 12911 31 6456 </data>
<data> 1.0938 Protein_1 22193 11927 54 5963 </data>
<data> 0.5469 Protein_1 8639 5414 63 2707 </data>
<data> 0.2734 Protein_1 2280 1005 44 50 </data>
<data> 0.1367 Protein_1 697 121 17 60 </data>
<data> 0.0684 Protein_1 229 61 27 30 </data>
</molecule>
<molecule id="Protein 2">
<data> 16.0000 Protein_2 82926 4807 6 2404 </data>
<data> 8.0000 Protein_2 80540 6001 7 3000 </data>
<data> 4.0000 Protein_2 78766 2046 3 1023 </data>
<data> 2.0000 Protein_2 77948 5670 7 2835 </data>
<data> 1.0000 Protein_2 56149 4027 7 2013 </data>
<data> 0.5000 Protein_2 38119 5113 13 2556 </data>
<data> 0.2500 Protein_2 23296 4842 21 2421 </data>
<data> 0.1250 Protein_2 12010 1607 13 804 </data>
<data> 0.0625 Protein_2 4906 702 14 351 </data>
<data> 0.0313 Protein_2 2270 449 20 225 </data>
<data> 0.0156 Protein_2 1040 269 26 135 </data>
</molecule>
</bioassay>
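To make the roles of the x, y, err and columns attributes concrete, the following sketch reads a shortened copy of the file above and pulls out the (x, y, error) values for each molecule. It is a minimal illustration written with the standard xml.etree.ElementTree module, not the parsing code that ABE itself uses; the function name load_bioassay is hypothetical.

```python
import xml.etree.ElementTree as ET

def load_bioassay(text):
    """Return (assay name, {molecule id: [(x, y, err), ...]})."""
    root = ET.fromstring(text)
    columns = root.get("columns").split()
    # Positions of the x, y and error columns within each data record.
    ix, iy, ie = [columns.index(root.get(key)) for key in ("x", "y", "err")]
    datasets = {}
    for molecule in root.findall("molecule"):
        rows = []
        for data in molecule.findall("data"):
            fields = data.text.split()
            rows.append((float(fields[ix]), float(fields[iy]), float(fields[ie])))
        datasets[molecule.get("id")] = rows
    return root.get("id"), datasets

sample = """<!-- Shortened copy of the example file -->
<bioassay id='Assay 112602A'
          x='protein.conc' y='avg.cpm' err='std.err'
          columns='protein.conc molecule avg.cpm std.dev percent.cv std.err'>
  <molecule id="Protein 1">
    <data> 70.0000 Protein_1 85323 2164 3 1082 </data>
    <data> 35.0000 Protein_1 81292 5963 7 2981 </data>
  </molecule>
</bioassay>"""

name, datasets = load_bioassay(sample)
print(name)
print(datasets["Protein 1"])
```

Only the three columns named by x, y and err are used here; the remaining columns are read but ignored.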
Tip: XML can be easily generated from tabular data in
Excel™, using a macro to add the XML tags and saving the
resulting table as white space delimited text.
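If a spreadsheet macro is not convenient, the same conversion can be sketched in Python. The helper below is hypothetical (it is not supplied with ABE) and assumes the table is already available as rows of values. It uses quoteattr and escape from the standard xml.sax.saxutils module so that characters with a special meaning in XML are written safely, and it rejects any field that contains white space.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def table_to_abe_xml(assay_id, x, y, err, columns, molecules):
    """Build ABE-style XML text.

    molecules maps each molecule name to a list of rows, where every row
    holds one value per entry in columns.
    """
    lines = ["<bioassay id=%s x=%s y=%s err=%s columns=%s>" % (
        quoteattr(assay_id), quoteattr(x), quoteattr(y), quoteattr(err),
        quoteattr(" ".join(columns)))]
    for name, rows in molecules.items():
        lines.append("  <molecule id=%s>" % quoteattr(name))
        for row in rows:
            fields = [str(value) for value in row]
            # Each field must be a single white space free token.
            if len(fields) != len(columns) or any(len(f.split()) != 1 for f in fields):
                raise ValueError("bad data record: %r" % (row,))
            lines.append("    <data> %s </data>" % escape(" ".join(fields)))
        lines.append("  </molecule>")
    lines.append("</bioassay>")
    return "\n".join(lines)

xml_text = table_to_abe_xml(
    "Demo assay", "conc", "cpm", "se", ["conc", "cpm", "se"],
    {"Protein 1": [(1.0, 200, 10), (2.0, 410, 12)]})
print(xml_text)

# Round trip: the generated text parses as XML.
root = ET.fromstring(xml_text)
```

The values in the usage example are made up purely to show the call.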
ABE
currently supports two kinds of data model that can be fitted to your experimental
data. The four-parameter dose-response model takes the form:
y = d + (a - d) / (1 + (x/c)^b)
with a and d being the y asymptotes as x→0 and x→∞
respectively, b being proportional to the slope at the midpoint of the
curve and c being the value of x at which the midpoint occurs (equivalent
to the ED50 value for a bioassay).
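The behaviour of the four parameters can be checked numerically. The sketch below assumes the commonly used four-parameter logistic form y = d + (a - d) / (1 + (x/c)^b), which matches the parameter descriptions given here; it is an illustration rather than code from the ABE source, and the parameter values are hypothetical.

```python
def four_param(x, a, b, c, d):
    """Four-parameter logistic curve (assumed form, see the text above)."""
    return d + (a - d) / (1.0 + (x / c) ** b)

# Hypothetical parameter values chosen only for illustration.
a, b, c, d = 200.0, 2.0, 1.5, 85000.0

y_low = four_param(1e-9, a, b, c, d)    # x close to zero
y_high = four_param(1e9, a, b, c, d)    # x very large
y_mid = four_param(c, a, b, c, d)       # x equal to c

print(y_low, y_high, y_mid)
```

At very small x the curve sits at a, at very large x it approaches d, and at x = c it lies exactly half way between the two asymptotes, which is why c plays the role of the ED50.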
Since nonlinear least squares regression applied to a function of this type can be
unstable (for example, if c gets close to zero or becomes negative), a good
initial estimate of the parameters being optimized can greatly increase the
chances of the regression algorithm converging upon a sensible solution. For
this reason, prior to performing the nonlinear regression, ABE estimates the
parameters a, b and d, and uses the user-supplied estimate
of c (from the graph) as initial values for a four-dimensional search
that produces refined estimates of these parameters. These are generally very
close to the final best-fit parameters derived from the regression, but the
small trade-off in time (a few seconds on my IBM laptop) greatly increases the
reliability of the nonlinear regression. ABE uses the Levenberg-Marquardt
algorithm implemented in the Optimize module of the Scientific Python (SciPy)
library, to compute optimized values of the four parameters by nonlinear least
squares regression.
ABE also
supports the fitting of polynomials of arbitrary degree. For n data
pairs (x, y) in a two-dimensional function, a polynomial of up to degree (n-1)
can be fitted. A polynomial of degree n takes the form:
y = a0 + a1 x + a2 x^2 + … + an x^n
The polynomial-fitting algorithm in ABE also requires an initial estimate of the
curve midpoint to be supplied by the user (from the graph). In this case,
however, it is not used for fitting the polynomial, but rather as an initial
estimate for the iterative Newton-Raphson function that is used to find the
local root of the second derivative of the polynomial. This root should
correspond to the point of inflexion at the midpoint of the curve, which should
be the ED50 value if the data are from a bioassay experiment.
In general, the higher the degree of the polynomial that is chosen, the more
complex the curve and the greater the number of potential points of inflexion
(local roots of the second derivative). For this reason, the initial estimate
of the midpoint is somewhat more critical when fitting polynomials of higher
order, since the Newton-Raphson solution for the second derivative might
converge at a local root away from the ‘true’ midpoint. Generally, a polynomial
of degree n=3 to n=5 should suffice for the vast majority of
‘well-behaved’ bioassay datasets.
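The dependence on the initial estimate can be seen in a small, self-contained sketch of Newton-Raphson iteration applied to the second derivative of a polynomial. This is an illustration of the technique only, not the routine in the Abe module; the function names and the example quintic are hypothetical. Coefficients are stored lowest power first, as [a0, a1, ..., an].

```python
def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n using Horner's rule."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def poly_deriv(coeffs):
    """Coefficients of the first derivative."""
    return [i * a for i, a in enumerate(coeffs)][1:]

def inflection_near(coeffs, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson search for a root of the second derivative near x0."""
    d2 = poly_deriv(poly_deriv(coeffs))   # second derivative
    d3 = poly_deriv(d2)                   # its slope, needed by Newton-Raphson
    x = x0
    for _ in range(max_iter):
        slope = poly_eval(d3, x)
        if slope == 0.0:
            raise ZeroDivisionError("third derivative is zero; try another start")
        step = poly_eval(d2, x) / slope
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# Example quintic whose second derivative is (x - 1)(x - 2)(x - 4),
# so it has three points of inflexion.
quintic = [0.0, 0.0, -4.0, 14.0 / 6.0, -7.0 / 12.0, 1.0 / 20.0]

root_a = inflection_near(quintic, 0.8)   # start near the first inflexion
root_b = inflection_near(quintic, 4.3)   # start near the last inflexion
print(root_a, root_b)
```

The same polynomial gives different answers from different starting points, which is why a sensible initial ED50 estimate matters more as the degree increases.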
For
ease of use, ABE recognizes a set of environment variables that allow the user
to define the working directory that ABE will go to by default for opening or
saving files, as well as an HTML version of the manual for displaying help
information.
ABE currently recognizes two environment variables which can be set either in the
Windows environment (e.g. using the “System” icon in the control panel for
Win2000) or in the user's “.login” file for POSIX operating systems (such as
UNIX and Linux platforms).
ABE_PATH: The path to the default working directory for ABE to open/save files.
ABE_HELP: The path to a local or online HTML version of the current ABE manual.
Operating systems vary in whether they treat the names of environment variables
as case-sensitive, so it is safest to use all uppercase characters; otherwise
the variables may not be recognized. The default working
directory defined in ABE_PATH can be overridden using the File->Set Working Directory-> menu
function (described below).
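For readers curious how such variables are typically read from Python, here is a minimal sketch using the standard os module. It is not copied from the ABE source; the function names are hypothetical, and the fall-back to the current directory mirrors the behaviour this manual describes for an unset ABE_PATH (the directory from which ABE was launched).

```python
import os

def default_working_dir(environ=None):
    """Directory to open first: ABE_PATH if set, else the current directory."""
    environ = os.environ if environ is None else environ
    return environ.get("ABE_PATH") or os.getcwd()

def help_location(environ=None):
    """Path or URL of the HTML manual, or None if ABE_HELP is not set."""
    environ = os.environ if environ is None else environ
    return environ.get("ABE_HELP")

# A plain dictionary stands in for the real environment in this example.
print(default_working_dir({"ABE_PATH": "/home/me/assays"}))
print(default_working_dir({}))
print(help_location({}))
```

Note that the lookup uses the exact uppercase names, which is one reason to define the variables in uppercase.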
Note: Menu functions in ABE are written in Courier typeface as lists separated by arrows to indicate
their positions in the menu tree. For example:
File->Load Bioassay Data->
This refers to the ‘Load Bioassay Data’ function in
the top-level ‘File’ menu.
Upon launching ABE, a dialog window appears offering
a link to the text of the terms and conditions for the use of ABE and the
options to either accept or reject these terms and conditions. These terms and
conditions MUST be accepted before the program will continue (rejecting them
terminates the program).
Tip: The terms and conditions for the use of ABE
under the GNU GPL can be viewed at any time via the Help->About-> menu function.
Upon
accepting the terms and conditions for use, a single window with a series of
menus and a square display area is visible. The display area is used for
plotting both the experimental data and the fitted models and it also allows
the user to interact with the graph in order to make initial estimates of the
midpoint of the curve prior to data fitting. A second window can be displayed
or hidden by clicking on:
Window->Activity Log->
This toggles the activity log window (ALW) on and
off. The ALW records the ABE session in detail, and can optionally be saved as
a text file, providing the user with a complete record of the data analysis.
ABE also provides the user with several options for the way that the observed
and fitted data are plotted, and the graphs that are drawn in the display can
also be saved to a file (as PostScript™).
In broad outline, the data analysis consists of
loading the data from the bioassay data file (in XML format), then, for each
individual molecule dataset, estimating the curve midpoint from the graph,
fitting one or more of the available data models and plotting the observed and
fitted data, and finally, once all of the data have been processed, saving the
results.
Data
can be loaded from file using the menu function
File->Load Bioassay Data->
This launches a dialog window that allows you to select the file that contains your
data. Depending upon the OS you are using, this dialog will look and function
exactly like the standard “Open File” windows that you are used to seeing.
If the environment variable ABE_PATH is set in your operating system environment,
ABE will, by default, go initially to the directory defined by this variable
whenever a file is to be opened, either to be read as input or to be written as
output. If this variable is not set, ABE will default to the directory from which the ABE application was launched.
This default working directory path can be set (or overridden) using the menu
function:
File->Set Working Directory->
Once
the bioassay data have been loaded, the Data->
menu will contain a list of all of the molecule datasets found in the data file
and the status bar under the display area will show the full directory path of
the file from which the data was read. You will notice that if you click on one
of the molecules in the menu list, a check mark will appear next to it to
indicate that this data is selected as the current dataset for processing
(obviously, you can only select one dataset at a time).
Nothing
will happen when you select the dataset you wish to process, until you use the
menu function
Data->Process Data->
When
this function is selected, the dataset you previously selected by clicking on
the list of molecules, will become the current dataset for processing. The
display area is cleared and the selected data are plotted. If any prior
analysis and data fitting has been done using this data, the results will also
be plotted if the appropriate graph options are selected (see the section on
graph options below). The status bar underneath the display area will also be
updated to show the name of the bioassay and the currently selected molecule.
For
example, if your bioassay data file contains experimental data for a molecule
called My Protein, the following sequence will select that data for
processing, load it into memory as the current dataset and plot the graph of
the observed data and any previous models generated for this data, during the
current session.
Data->My Protein->
Data->Process Data->
NB:
Once you have selected and loaded the data for a particular molecule, all
subsequent data fitting and analysis will only affect the current
molecule. Only the plotting options in the Graph->
menu remain current for all molecules.
The
first thing to be done with any new dataset is to provide an estimate of the ED50
value, using the graph. This is done using the menu function:
Data Model->Estimate ED50->
Once
this function is selected, the status bar displays the message “Click on the
graph to estimate the ED50 value”. Using the mouse, click on the graph
at the point (approximately) where you estimate the curve midpoint to be. This
will normally be roughly the middle of the central ‘linear’ segment of the
curve that lies between the minimum and the maximum plateaus. The accuracy of
this initial estimate of the ED50 is not critical at all for most
‘well-behaved’ datasets. Indeed, an initial estimate that lies almost anywhere
between the plateaus of the curve will almost always yield the same modeled ED50
using either the four-parameter or polynomial fitting algorithms. Clicking on
the graph, displays a vertical blue line at the value of x chosen for
the initial ED50 estimate. If you are unhappy with this choice you
may repeat this step as many times as desired until you are satisfied with your
choice.
Once
an initial estimate of the ED50 has been made, you may proceed
directly to the data modeling if you plan to use the four-parameter model (if you
wish to fit a polynomial, you still need to choose the degree of the fitted
polynomial – see the section on polynomial fitting below). The fitting
algorithm for the four-parameter model is launched by using the menu function:
Data Model->Fit 4-Parameter Model->
ABE
refines the initial estimate of the ED50 supplied by the user, along with its
own internally generated estimates of the other three parameters, and then
feeds these refined estimates to the nonlinear least squares regression
function. A dialog window appears during the optimization of the four
parameters, showing the progress of the computation and the final result. For
every value of x in the observed data, a fitted value of y using
the newly derived 4-parameter model is calculated and the fitted data are
plotted for comparison, in green, on the same axes as the observed data. In
addition, a vertical green line is plotted at the value of x
corresponding to the newly fitted ED50. If the newly fitted data are
not automatically plotted, it is because the corresponding options in the Graph-> menu have been
changed from their default values. Also, in some rare cases, the ED50
might actually lie outside the range of x values in your data, in which
case it will not appear as a line on the graph. The plotting of observed and
modeled data in the display area and the options in the Graph-> menu are described in
more detail in the section ‘Plotting your data’ below.
The
nonlinear least squares optimization of the 4-parameter model implemented in
ABE sacrifices a little time in favor of robustness. Sometimes however, it may
fail to produce a sensible solution, particularly if your data are noisy,
poorly measured, contain no ‘signal’ or are otherwise pathologically awful.
Remember that in fitting a 4-parameter sigmoidal dose-response model to your
data, you are already making certain assumptions about the physical phenomena
underlying your experimental data.
If
your data do not conform to a sigmoidal dose-response model, you will probably
not get a good result trying to fit them to one (duh)!
Sometimes
however, you may be missing some data, for example, the upper plateau of the
sigmoidal curve that defines the y asymptote. In such cases, ABE
provides a mechanism for you to supply these values yourself. In nearly all
cases, this should not be necessary and it is absolutely not recommended unless
you really cannot get a satisfactory, fitted data model using just the data you
have. If you really (really) must, you can set the upper and lower y
asymptotes yourself using the menu function:
Options->Estimate Curve Max/Min->
Selecting
this function displays a dialog window that shows the current estimated values
for each of the four parameters if estimates for them have already been made.
This dialog window will only allow you to set the upper and lower y asymptotes.
Estimates for the ED50 (the c parameter) must be supplied using the Data Model-> Estimate ED50->
menu function. Estimates for the proportional slope (b) cannot be made
manually in the current version of ABE.
If the initial refinement of the four parameters prior to the nonlinear regression
optimization fails to yield sensible estimates, the regression algorithm may
well fall over and/or converge on nonsensical values. This is extremely
unlikely if your data describe a normal dose-response curve, but ABE does
provide you with a really (really) last-resort mechanism for tinkering with the
pre-regression search algorithm via the menu function:
Options->4-Parameter Search->
Study the 4-parameter model equation given earlier in this manual and be sure
you understand what you are doing before attempting to change any of these
search parameters.
The
fractional y-search and fractional x-search
parameters define the range on each axis (as a fraction of the respective data
ranges) to search around the initial estimates of a, c and d.
The initial estimates for a, c and d will be min(y),
the user-supplied ED50 and max(y) respectively, unless a
and d were set manually. The parameters y-search iterations
and ED50-search iterations determine the sampling rate of the y
and x axes over their respective search ranges. The optimal proportional
slope parameter (b) is initially searched for from just above zero up to
maximum slope, in increments determined by initial slope
search iterations, before being optimized jointly with the other
parameters over the range fractional x-search in increments
determined by the slope search iterations parameter. In general,
this two-stage search yields estimates for the four-parameter model that are
pretty close to the final optimized values produced by the subsequent nonlinear
regression fitting, and helps to ensure that the nonlinear regression is stable
and converges on a sensible solution.
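The idea of a pre-regression search can be illustrated with a much simplified sketch. The code below is NOT the two-stage search implemented in ABE: it is a single exhaustive grid search, it assumes the four-parameter form y = d + (a - d) / (1 + (x/c)^b), and its argument names (frac_y, frac_x, y_iter, c_iter, max_slope, slope_iter) are hypothetical stand-ins loosely modeled on the options described above. The data are synthetic and noise-free.

```python
import itertools

def four_param(x, a, b, c, d):
    """Assumed four-parameter logistic form (not quoted from the ABE source)."""
    return d + (a - d) / (1.0 + (x / c) ** b)

def sum_sq(params, xs, ys):
    """Sum of squared residuals for params = (a, b, c, d)."""
    a, b, c, d = params
    return sum((y - four_param(x, a, b, c, d)) ** 2 for x, y in zip(xs, ys))

def linspace(lo, hi, n):
    """n evenly spaced values from lo to hi inclusive (n >= 2)."""
    step = (hi - lo) / (n - 1)
    return [lo + i * step for i in range(n)]

def grid_search(xs, ys, c0, frac_y=0.1, frac_x=0.05,
                y_iter=5, c_iter=9, max_slope=5.0, slope_iter=10):
    """Search around a=min(y), c=c0 and d=max(y); b runs up to max_slope."""
    dy = frac_y * (max(ys) - min(ys))
    dx = frac_x * (max(xs) - min(xs))
    a_grid = linspace(min(ys) - dy, min(ys) + dy, y_iter)
    d_grid = linspace(max(ys) - dy, max(ys) + dy, y_iter)
    # Keep c positive, since the model is unstable for c <= 0.
    c_grid = [c for c in linspace(c0 - dx, c0 + dx, c_iter) if c > 0]
    b_grid = linspace(max_slope / slope_iter, max_slope, slope_iter)
    candidates = itertools.product(a_grid, b_grid, c_grid, d_grid)
    return min(candidates, key=lambda p: sum_sq(p, xs, ys))

# Synthetic data generated from known, made-up parameters.
xs = [0.125, 0.25, 0.5, 1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
ys = [four_param(x, 100.0, 2.0, 2.0, 1000.0) for x in xs]

start = (min(ys), 1.0, 3.0, max(ys))     # crude start, ED50 guessed as 3.0
best = grid_search(xs, ys, c0=3.0)
print("start SSE:", sum_sq(start, xs, ys))
print("best  SSE:", sum_sq(best, xs, ys), "at (a, b, c, d) =", best)
```

In this sketch the number of candidate points is y_iter × slope_iter × c_iter × y_iter, so raising several iteration counts at once multiplies the work accordingly.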
If this is not the case and the nonlinear regression fails, a warning message is
entered into the activity log and the optimized parameters that are quoted will
be the refined estimates produced by the initial four-dimensional search. In
this case, the following strategy could be tried. If the estimates of a
and d seem OK, first try increasing the range and/or resolution of the slope
parameter (b) search by increasing maximum slope and/or initial slope search
iterations. Typical values for b should lie in the range 1.0
< b < 5.0. Next, try increasing the range and/or resolution
of the y-axis search by increasing fractional y-search and/or y-search iterations.
Finally, you might want to look carefully at your ED50 estimate and
possibly increase the range and resolution of
the ED50 parameter (c) search by increasing fractional
x-search and/or ED50-search iterations.
Warning:
Increasing the sampling rates on multiple axes of a four-dimensional search
can increase the compute time dramatically, because the increases on the
individual axes multiply together.
Fitting
a polynomial model to your bioassay data requires only the additional step of choosing
the degree of polynomial to be fitted. A minimum degree of 3 is recommended for
the kind of sigmoidal dose-response curve that a good bioassay data set should
conform to. In general, for n data points, a polynomial of maximum
degree n-1 can be fitted. In practice, however, the use of polynomials of
degree higher than 5 is not recommended. Using very high order
polynomials will produce an excellent fit to your data but, in effect, you are
also modeling the errors and irregularities in the data. In addition, higher
order polynomials can have more points of inflexion, making the initial estimate
of the ED50 more critical. In general, a polynomial of degree 5
is recommended.
To
fit a polynomial model to your data, first use the menu function:
Data Model->Choose Polynomial->
Having
selected the degree of the polynomial to be fitted, use the menu function:
Data Model->Fit Polynomial->
This
will perform the polynomial fitting and for every value of x in the
observed data, a fitted value of y using the newly derived polynomial
model is calculated and the fitted data are plotted for comparison, in red, on
the same axes as the observed data. In addition, a vertical red line is plotted
at the value of x corresponding to the newly fitted ED50. If
the newly fitted data are not automatically plotted, it is because the
corresponding options in the Graph->
menu have been changed from their default values. The plotting of observed and
modeled data in the display area and the options in the Graph-> menu are described in
more detail in the section ‘Plotting your data’ below.
Tip:
If the fitted ED50 derived from the polynomial model seems to fall
at the wrong place on the curve, try re-estimating the initial ED50
and/or fitting a lower order polynomial.
After fitting the four-parameter and/or polynomial
data models for the current molecule, a table of the observed and modeled data
can be generated in the activity log window, using the menu function:
Data Model->Show Fitted Data->
For each value of x in the observed data, the
fitted y values for the four-parameter and polynomial models are listed
next to the observed y value in tabular columns labeled 4Par and Poly
respectively.
ABE
provides numerous options for the graphical display of the observed and modeled
data. The graphical display itself is color coded, with everything related to the
observed data shown in blue, everything related to the four-parameter model
shown in green and everything related to the polynomial model shown in red. The
graph is automatically redrawn after each model fitting. All of the functions
related to the graphical display are in the menu:
Graph->
Each time the graph is redrawn, it is drawn according to the options currently
selected in the Graph-> menu. Any changes
made to the graph plotting options will not be visible until the graph is
redrawn using the menu function:
Graph->Redraw Graph->
The
graphical display options are as follows:
The
tightness of fit of the graph within the display area can be controlled using
the menu function:
Graph->Border Width->
This
is useful if some of the fitted data points fall outside the display area or if
you simply wish to rescale the graph for presentation purposes. The default is
20 pixels. Increasing this value reduces the scale of the data graph.
The
graph key (top left) and legend (lower right) can be toggled on and off using
the menu functions:
Graph->Show Key->
Graph->Show Legend->
The
key and legend are also color coded for observed and modeled data, according to
the scheme described at the start of this section (default = On).
Whether
or not the graph and corresponding legends for the respective data models are
shown in the graphical display area, can be controlled using the menu
functions:
Graph->Show 4-Parameter Model->
Graph->Show Polynomial Model->
If
you perform a data-modeling step (fitting either a four-parameter or polynomial
model) and the graph is redrawn without the new model, it is probably because
one or other of these options is switched off. These options are useful for
inspecting the individual data models in detail, switching off one of the data
models to allow a clearer view of the graph.
Tip:
If the data fitting works well, the fitted curves and ED50 values
for the different models should overlap very closely. If they do not, something
may be wrong with one or both models.
A
dark border is normally drawn around the graphical display area as this
produces a neater effect when the graph is saved as PostScript (see the section
Saving Your Results below). It can be toggled on and off however, using
the menu function:
Graph->Draw Border->
If
a data column describing the errors in the observed data is designated in the
input file using the err attribute in the bioassay tag (see
section Preparing your bioassay data in XML format above), the error
bars for the graph of the observed data can be toggled on and off using the
menu function:
Graph->Show Error Bars->
The
default for this option is Off (NOT to draw any error bars).
Once
you are satisfied with the data processing for each of the molecules in your
bioassay data file, the results of your analysis can be saved using the menu
functions in the File->
menu:
File->Save Activity Log->
File->Save Graph Image->
File->Export Results Table->
File->Save Activity Log->
allows the user to save the entire activity log as a text file that will have
the exact same contents as the activity log window.
File->Save Graph Image->
will save the currently displayed graph, rendered according to the currently
selected graph options, as a PostScript™ file. It is often
useful to play with the options in the Graph-> menu to get exactly the image you want before
saving each graph.
NB:
Unlike the menu function File->Save Activity Log->, the menu function File->Save Graph Image->
only saves the currently displayed graph for the current molecule and
will need to be used for each molecule if you wish to have a complete graphical
record of your analysis.
File->Export Results Table->
will save the current data model(s) (four-parameter and polynomial) as a
tab-delimited data table suitable for importing into a spreadsheet application
such as Microsoft Excel™.
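As an illustration of what a tab-delimited table looks like, the sketch below writes one with Python's standard csv module. The exact layout of the file that ABE exports is not specified in this manual, so the column headings (borrowed from the 4Par and Poly labels used in the activity log) and the numbers are assumptions made purely for the example.

```python
import csv
import io

def export_table(rows, handle):
    """Write rows of (x, observed y, 4-parameter y, polynomial y) as tab-delimited text."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["x", "observed", "4Par", "Poly"])   # hypothetical headings
    for row in rows:
        writer.writerow(row)

# Made-up values, used only to show the output format.
buffer = io.StringIO()
export_table([(1.0, 56149, 55800.2, 56020.7)], buffer)
text = buffer.getvalue()
print(text)
```

A file written this way opens directly in most spreadsheet applications when imported as tab-delimited text.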
Tip:
Once you are satisfied with your results, go back through the list of molecules
in the Data-> menu, selecting each molecule in turn, setting the graph options you require,
then redrawing and saving the individual graphs as PostScript™ files.
Avoid
any special characters in your molecule or column names when preparing your
data in XML format. Symbols such as the ampersand (&) for example may
have a special meaning in XML depending upon the context in which they appear
and are therefore best avoided.
Make
sure your bioassay data files have the suffix .xml so that they can be
easily read and verified by any XML-compliant software.
Pre-check
the validity of your XML data files by reading them into an XML-compliant web
browser such as Microsoft’s Internet Explorer 5.0. If there are any XML errors,
they will be listed as the browser attempts to parse the XML data.
Just because your data file conforms to XML specifications does not mean it
will automatically be acceptable for input into ABE. XML is just a meta-format
for describing data, and any application that must read and parse XML (such as
ABE) must still find XML tags and attributes that are meaningful for its
purposes.
Make sure all of the data columns in your <data> blocks are properly
white-space delimited and contain no white space or special characters
themselves. Make sure that the number and order of the columns is correctly
described in the columns attribute of the <bioassay> tag and that the x, y and
err attributes correctly identify which columns are to be used for the data
analysis.
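A check of this kind can be sketched in a few lines of Python 3. The layout
used here is an assumption made for illustration and may not match the format
that ABE actually expects: the columns attribute is taken to be a white-space
separated list of column names, the x, y and err attributes are each taken to
hold one of those names, and each line of a <data> block is taken to be one
row. The function name check_columns is hypothetical. Consult the section of
this manual on preparing XML data for the real layout before relying on it.

```python
import xml.etree.ElementTree as ET


def check_columns(xml_text):
    """Return a list of problem descriptions; an empty list means none found."""
    problems = []
    root = ET.fromstring(xml_text)
    for assay in root.iter("bioassay"):
        # ASSUMPTION: 'columns' is a white-space separated list of names.
        names = assay.get("columns", "").split()
        # ASSUMPTION: x, y and err each name one of the listed columns.
        for attr in ("x", "y", "err"):
            value = assay.get(attr)
            if value is not None and value not in names:
                problems.append("%s=%r is not listed in columns" % (attr, value))
        for block in assay.iter("data"):
            lines = (block.text or "").strip().splitlines()
            for number, line in enumerate(lines, 1):
                fields = line.split()
                if fields and len(fields) != len(names):
                    problems.append("data row %d has %d fields, expected %d"
                                    % (number, len(fields), len(names)))
    return problems


# Hypothetical input: the second data row is missing its error column.
sample = """<bioassay columns="conc resp sd" x="conc" y="resp" err="sd">
<data>
0.1 0.05 0.01
1.0 0.48
</data>
</bioassay>"""

for problem in check_columns(sample):
    print(problem)
```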
The menus in ABE all have a dotted line separator in the uppermost position. If
you click on this separator, you can “tear off” the menus and position them
anywhere you want on your desktop, as separate windows. This is especially
useful for the Data-> and Graph-> menus, which must be frequently accessed
during data processing with ABE. Tearing these two menus off at the beginning
of your analysis and positioning them next to the main console window of ABE
makes life a lot easier.
Do not “over-fit” your data using polynomials of a high degree. Just because
they fit your data points more closely does not necessarily mean that they are
better models for your data. You may just end up fitting the “noise” in your
data, as well as the real “signal”.
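The point can be illustrated numerically. The sketch below is not ABE's fitting
code. It is a plain least-squares polynomial fit written for Python 3, with
hypothetical function names and made-up data: a straight line plus a fixed
set of small "noise" values. Because every polynomial of lower degree is also
available to a fit of higher degree, the residual sum of squares cannot get
worse as the degree rises, so a smaller residual on its own says nothing about
whether the extra terms describe signal or noise.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients, lowest power first, via normal equations."""
    n = degree + 1
    # Augmented matrix [A^T A | A^T y] for the monomial basis 1, x, x^2, ...
    rows = []
    for i in range(n):
        row = [sum(x ** (i + j) for x in xs) for j in range(n)]
        row.append(sum(y * x ** i for x, y in zip(xs, ys)))
        rows.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(col + 1, n):
            factor = rows[r][col] / rows[col][col]
            for c in range(col, n + 1):
                rows[r][c] -= factor * rows[col][c]
    # Back substitution.
    coeffs = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(rows[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (rows[i][n] - tail) / rows[i][i]
    return coeffs


def residual_sum_of_squares(xs, ys, coeffs):
    """Sum of squared differences between the data and the fitted curve."""
    def model(x):
        return sum(c * x ** k for k, c in enumerate(coeffs))
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys))


# Made-up data: the underlying "signal" is the line y = 2x + 1.
xs = [i / 4.0 - 1.0 for i in range(9)]
noise = [0.3, -0.2, 0.1, -0.4, 0.2, 0.4, -0.3, -0.1, 0.25]
ys = [2.0 * x + 1.0 + e for x, e in zip(xs, noise)]

for degree in (1, 3, 6):
    coeffs = fit_polynomial(xs, ys, degree)
    print(degree, residual_sum_of_squares(xs, ys, coeffs))
```

The residual shrinks as the degree goes from 1 to 6 even though the data were
generated from a straight line, which is exactly the situation the tip above
warns about.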
Learn a little Python. ABE is written entirely in Python and is Open Source. If
you feel like adding, changing or tweaking a function in ABE, some knowledge of
Python and the availability of the ABE source code make this entirely possible.
For bug reports or questions regarding ABE, please contact:
Gordon Webster
EMD Lexigen Research Center
Bedford Campus, 45A Middlesex Turnpike
Billerica, MA 01821-3936
Email: gwebster@users.sourceforge.net
… but I don’t expect the Spanish Inquisition*
*Respecting the tradition of the Python developer community to (wherever
possible) refer directly or indirectly to Monty Python’s Flying Circus, the
British TV comedy from the 1960s that provided the inspiration for the Python
name.
GNU GENERAL PUBLIC LICENSE
Version 2, June 1991
Copyright (C) 1989, 1991 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.
Preamble
The licenses for most software are designed to take away your
freedom to share and change it. By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users. This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it. (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.) You can apply it to
your programs, too.
When we speak of free software, we are referring to freedom, not
price. Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have. You must make sure that they, too, receive or can get the
source code. And you must show them these terms so they know their
rights.
We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software. If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
Finally, any free program is threatened constantly by software
patents. We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary. To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
The precise terms and conditions for copying, distribution and
modification follow.
GNU GENERAL PUBLIC LICENSE
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License. The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language. (Hereinafter, translation is included without limitation in
the term "modification".) Each licensee is addressed as "you".
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope. The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
a) You must cause the modified files to carry prominent notices
stating that you changed the files and the date of any change.
b) You must cause any work that you distribute or publish, that in
whole or in part contains or is derived from the Program or any
part thereof, to be licensed as a whole at no charge to all third
parties under the terms of this License.
c) If the modified program normally reads commands interactively
when run, you must cause it, when started running for such
interactive use in the most ordinary way, to print or display an
announcement including an appropriate copyright notice and a
notice that there is no warranty (or else, saying that you provide
a warranty) and that users may redistribute the program under
these conditions, and telling the user how to view a copy of this
License. (Exception: if the Program itself is interactive but
does not normally print such an announcement, your work based on
the Program is not required to print an announcement.)
These requirements apply to the modified work as a whole. If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works. But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
a) Accompany it with the complete corresponding machine-readable
source code, which must be distributed under the terms of Sections
1 and 2 above on a medium customarily used for software interchange; or,
b) Accompany it with a written offer, valid for at least three
years, to give any third party, for a charge no more than your
cost of physically performing source distribution, a complete
machine-readable copy of the corresponding source code, to be
distributed under the terms of Sections 1 and 2 above on a medium
customarily used for software interchange; or,
c) Accompany it with the information you received as to the offer
to distribute corresponding source code. (This alternative is
allowed only for noncommercial distribution and only if you
received the program in object code or executable form with such
an offer, in accord with Subsection b above.)
The source code for a work means the preferred form of the work for
making modifications to it. For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable. However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License. Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
5. You are not required to accept this License, since you have not
signed it. However, nothing else grants you permission to modify or
distribute the Program or its derivative works. These actions are
prohibited by law if you do not accept this License. Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions. You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License. If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all. For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices. Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded. In such case, this License incorporates
the limitation as if written in the body of this License.
9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time. Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
Each version is given a distinguishing version number. If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation. If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission. For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this. Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
NO WARRANTY
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
END OF TERMS AND CONDITIONS