A B E

 

Version: 1.0, November 2002

By Gordon Webster

gwebster@users.sourceforge.net

Copyright © 2002, 2003 Gordon Webster, EMD Lexigen Research Center, Bedford Campus, 45A Middlesex Turnpike, Billerica, MA 01821 USA

 

Table of Contents

License and disclaimer

Introduction

Preparing your bioassay data in XML format

Data Modeling in ABE

ABE Environment Variables

Overview of ABE

Loading Bioassay Data

Estimating the Curve Midpoint (ED50)

Fitting the Four-Parameter Data Model

Problems Fitting the Four-Parameter Model

Fitting the Polynomial Data Model

Viewing the Fitted Data Model

Plotting Your Data

Saving Your Results

General Tips and Tricks for Working With ABE

Contact Details

Appendix A: The GNU General Public License (GPL)

 

 

License and disclaimer

 

ABE 1.0 (Copyright © 2002, 2003 Gordon Webster, EMD Lexigen Research Center) is Open Source software that is available at no charge and may be freely used and distributed under the terms of the GNU General Public License (GPL) as described in Appendix A of this manual.

 

ABE 1.0 is supplied for free, as is, without warranty or support of any kind. The author would be grateful to hear of any bugs or problems that arise in the course of the (sensible) use of this program, but makes no claims, express or implied, as to its usability or the validity of the results of its use (see Appendix A for the full GNU GPL licensing terms and conditions that apply to this software).

 

In the event that the use of ABE contributes to any scientific work that is published in any medium or presented to an audience in any oral or visual format, the author would be grateful if users could cite their use of ABE and the contribution that it made to their work. The following citation should suffice in most cases:

 

Webster, G.D. (2002) ABE, A bioassay data analysis package written in Python, EMD Lexigen Research Center, MA 01821 USA

 

Gordon Webster

EMD Lexigen Research Center

November 2002

 

 

Introduction

 

ABE is a fast and convenient program for modeling bioassay data using either polynomials or a four-parameter model based upon the standard, sigmoidal dose-response curve. ABE stands for Analysis of Bioassay Experiments.

 

ABE is written in Python 2.2 and uses the Tkinter graphics library to provide a friendly graphical user interface (GUI) for manipulating and visualizing the data. As a result, the program should run on any platform with a standard Python 2.2 (or higher) distribution, including Intel-based platforms running Windows or Linux as well as Apple platforms running OS X.

 

For the data modeling, ABE uses the nonlinear regression routines from the Scientific Python (SciPy) and Numeric libraries as well as the polyfit function from Raymond Hettinger's Matfunc module, for computing the fitted polynomial coefficients from the supplied data. Functions for computing the derivatives of the fitted polynomials, solving polynomial roots and estimating the initial parameters for the nonlinear regression were added by the author and are included in the body of the main Abe module. In addition to the libraries of the standard Python distribution therefore, only the SciPy, Numeric and Matfunc modules need to be packaged with the Abe module, for installation on other platforms.

 

Given the lack of any current standard format for bioassay data, ABE accepts bioassay data in XML (eXtensible Markup Language) format, since this has the virtue of being concise, unambiguous and human-readable, and also because XML is rapidly becoming a standard for data exchange. The input XML data is parsed using the functions and classes supplied in the xml.parsers module that is included as part of the current, standard Python distribution. No additional Python modules are therefore required for reading and parsing data in XML format, making ABE more modular, portable and flexible than it would be if a specialized data format unique to this program were used. XML is also very easy to produce from the kind of tabular data used in applications such as Microsoft Excel™.

 

ABE is a work in progress, to which more features can and will be added as demand requires and time allows. The current version (1.0) has sufficient functionality to perform a useful analysis of experimental bioassay data, offering a couple of standard methods for modeling the data and generating fitted ED50 values. Future versions should include a more exhaustive statistical analysis of the data model and the option to compare different bioassays and calculate relative potencies (including parallel line analysis).

 

 

 

Preparing your bioassay data in XML format

 

Basic XML (the only kind that ABE uses) is very intuitive and easy to learn. If you are not familiar with it, there are many excellent books available, as well as many free online references, introductions and tutorials. ABE accepts bioassay data formatted in XML, according to the following schema:

 

<bioassay id='bioassay name'

 x='name of x column'

 y='name of y column'

 err='name of error column'

 columns='column-1 column-2 column-3 … column-n'>

 

     <molecule id='name of molecule'>

          <data> column-1-data column-2-data column-3-data … column-n-data </data>

          .

          .

          .

     </molecule>

     .

     .

     .

</bioassay>

 

In the schema above, the tag names (bioassay, molecule and data) and the attribute names (id, x, y, err and columns) are literal and required, the quoted attribute values and the contents of each data record are field entries to be supplied by the user, and a vertical column of periods indicates that an arbitrary number of the last-described data block can be inserted at that point. The bioassay data is described in a bioassay block, within which an arbitrary number of molecule blocks may occur, each of which contains an arbitrary number of data records.

 

The bioassay tag must be the top-level XML tag in the data file and should supply the attributes id, x, y, err and columns, where x is the name of the data column containing the x-axis data (normally the protein concentration), y is the name of the y data column (normally the measure of the protein’s activity, e.g. in counts per minute), err is the name of the column containing the standard errors and columns is a list of the column names in the order that they occur in the data records. Note that this list is white space delimited, so no white space may occur within a column name. For example, a column can NOT be called protein conc, but protein.conc, protein-conc and protein_conc are all valid column names. There must be the same number of column labels as there are columns in each of the data records that follow. NB: The bioassay and molecule id attributes are always inside quotes and therefore CAN contain any amount of white space.

 

The molecule tag has only a single required attribute, id, which uniquely identifies the molecule. Within the molecule, any number of data records may be given. Each data record is also read as a white space delimited list, so no field in the list may contain white space. For example, in the data record:

 

<data> Protein 1 70.00 85323 1243.2 1028.5 978 134.6 </data>

 

the protein name field Protein 1 would be read as two fields, Protein and 1, even if it was intended to be read as a single field according to the description supplied for the columns attribute in the bioassay tag.
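
The effect is easy to reproduce with Python's own white space splitting. The lines below are only an illustration (they are not ABE's parser), using the record quoted above and a corrected version of it in which the space is replaced by an underscore:

```python
# Illustrative only: how white space delimiting treats a field that contains a space.
record_bad = "Protein 1 70.00 85323 1243.2 1028.5 978 134.6"
record_ok = "Protein_1 70.00 85323 1243.2 1028.5 978 134.6"

fields_bad = record_bad.split()   # split() breaks on any run of white space
fields_ok = record_ok.split()

print(len(fields_bad), fields_bad[:2])   # 8 fields: 'Protein' and '1' are separate
print(len(fields_ok), fields_ok[:1])     # 7 fields: 'Protein_1' stays whole
```

Every field after the split name is shifted one column to the right, so the x, y and error values would be read from the wrong columns.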

 

Any other valid XML components such as XML comments or even other valid XML tags are allowed, but these will be ignored by ABE. The data file must contain only valid XML components, otherwise an error will be generated when the data are parsed by ABE. One easy way to check your files for XML compliance is to open them with an XML-compliant web browser such as Microsoft’s Internet Explorer™ 5.0. If your XML formatting is OK, the web browser will display the data, formatted and color-coded according to its XML data structure.

 

Tip: Make sure your XML data files have the file extension .xml so that they can automatically be opened and viewed with a (properly configured) web browser if you double-click on the file icons.

 

Remember, XML is just a meta-format for describing data. The fact that your file is valid XML does not necessarily mean it will be recognized by ABE unless you correctly include all of the tags and attributes that ABE recognizes, as described above. A typical bioassay data file for input to ABE should look something like the data shown below (the pretty indents and tabs are not necessary for XML but they make the XML easier to read and understand for humans!):

 

  <!-- This is my bioassay data in XML format -->

                                 

  <bioassay id='Assay 112602A' x='protein.conc' y='avg.cpm' err='std.err'

       columns='protein.conc molecule avg.cpm std.dev percent.cv std.err'>

 

       <molecule id="Protein 1">

 

              <data> 70.0000       Protein_1     85323  2164   3      1082   </data>

              <data> 35.0000       Protein_1     81292  5963   7      2981   </data>

              <data> 17.5000       Protein_1     84020  4796   6      2398   </data>

              <data> 8.7500        Protein_1     82278  3843   5      1921   </data>

              <data> 4.3750        Protein_1     74464  6975   9      3487   </data>

              <data> 2.1875        Protein_1     42086  12911  31     6456   </data>

              <data> 1.0938        Protein_1     22193  11927  54     5963   </data>

              <data> 0.5469        Protein_1     8639   5414   63     2707   </data>

              <data> 0.2734        Protein_1     2280   1005   44     50     </data>

              <data> 0.1367        Protein_1     697    121    17     60     </data>

              <data> 0.0684        Protein_1     229    61     27     30     </data>

 

       </molecule>

 

       <molecule id="Protein 2">

 

              <data> 16.0000       Protein_2     82926  4807   6      2404   </data>

              <data> 8.0000        Protein_2     80540  6001   7      3000   </data>

              <data> 4.0000        Protein_2     78766  2046   3      1023   </data>

              <data> 2.0000        Protein_2     77948  5670   7      2835   </data>

              <data> 1.0000        Protein_2     56149  4027   7      2013   </data>

              <data> 0.5000        Protein_2     38119  5113   13     2556   </data>

              <data> 0.2500        Protein_2     23296  4842   21     2421   </data>

              <data> 0.1250        Protein_2     12010  1607   13     804    </data>

              <data> 0.0625        Protein_2     4906   702    14     351    </data>

              <data> 0.0313        Protein_2     2270   449    20     225    </data>

              <data> 0.0156        Protein_2     1040   269    26     135    </data>

 

       </molecule>

 

</bioassay>
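
For readers who want to check a data file outside ABE, the following is a minimal sketch of how a file in this format could be read. It is not ABE's own code: it uses the xml.etree.ElementTree module from the Python 3 standard library rather than xml.parsers, the function name load_bioassay is hypothetical, and it assumes that the x, y and err attributes are all present.

```python
import xml.etree.ElementTree as ET

def load_bioassay(xml_text):
    """Return (assay_id, {molecule_id: [(x, y, err), ...]}) from ABE-style XML."""
    root = ET.fromstring(xml_text.strip())
    if root.tag != "bioassay":
        raise ValueError("top-level tag must be <bioassay>")
    columns = root.get("columns").split()
    # Positions of the x, y and error columns within each data record.
    ix, iy, ie = (columns.index(root.get(k)) for k in ("x", "y", "err"))
    molecules = {}
    for mol in root.findall("molecule"):
        rows = []
        for rec in mol.findall("data"):
            fields = rec.text.split()
            if len(fields) != len(columns):
                raise ValueError("record has %d fields, expected %d"
                                 % (len(fields), len(columns)))
            rows.append((float(fields[ix]), float(fields[iy]), float(fields[ie])))
        molecules[mol.get("id")] = rows
    return root.get("id"), molecules

sample = """
<!-- This is my bioassay data in XML format -->
<bioassay id='Assay 112602A' x='protein.conc' y='avg.cpm' err='std.err'
     columns='protein.conc molecule avg.cpm std.dev percent.cv std.err'>
  <molecule id="Protein 1">
    <data> 70.0000 Protein_1 85323 2164 3 1082 </data>
    <data> 35.0000 Protein_1 81292 5963 7 2981 </data>
  </molecule>
</bioassay>
"""
assay_id, data = load_bioassay(sample)
print(assay_id, data["Protein 1"][0])
```

The field count check catches the white space problem described earlier before any numbers are converted.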

 

Tip: XML can be easily generated from tabular data in Excel™, using a macro to add the XML tags and saving the resulting table as white space delimited text.
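
If a spreadsheet macro is not convenient, the same wrapping can be done with a few lines of Python. This is only a sketch under stated assumptions: the function name rows_to_abe_xml and the layout of the input table (a dictionary of rows per molecule) are hypothetical and are not part of ABE.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def rows_to_abe_xml(assay_id, x, y, err, columns, molecules):
    """Build ABE-style XML text from {molecule_id: [row, ...]} tabular data."""
    head = tuple(quoteattr(v) for v in (assay_id, x, y, err))
    lines = ["<bioassay id=%s x=%s y=%s err=%s" % head,
             "     columns=%s>" % quoteattr(" ".join(columns))]
    for mol_id, rows in molecules.items():
        lines.append("  <molecule id=%s>" % quoteattr(mol_id))
        for row in rows:
            fields = [str(v) for v in row]
            # Every field must be a single token with no white space in it.
            if len(fields) != len(columns) or any(len(f.split()) != 1 for f in fields):
                raise ValueError("bad record: %r" % (row,))
            lines.append("    <data> %s </data>" % escape(" ".join(fields)))
        lines.append("  </molecule>")
    lines.append("</bioassay>")
    return "\n".join(lines)

table = {"Protein 1": [(70.0, "Protein_1", 85323, 1082),
                       (35.0, "Protein_1", 81292, 2981)]}
text = rows_to_abe_xml("Assay 112602A", "protein.conc", "avg.cpm", "std.err",
                       ["protein.conc", "molecule", "avg.cpm", "std.err"], table)
root = ET.fromstring(text)          # round-trip check: the output parses as XML
first = root.find("molecule/data").text.split()
print(first)
```

Quoting the attributes with quoteattr and escaping the record text keeps characters such as the ampersand from producing invalid XML.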

 

 

 

Data modeling in ABE

 

ABE currently supports two kinds of data model that can be fitted to your experimental data. The four-parameter dose-response model takes the form:

 

$$y = d + \frac{a - d}{1 + (x/c)^{b}}$$

 

where a and d are the y asymptotes as x → 0 and x → ∞ respectively, b is proportional to the slope at the midpoint of the curve and c is the value of x at which the midpoint occurs (equivalent to the ED50 value for a bioassay).
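
The roles of the four parameters can be checked numerically. The sketch below assumes the widely used four-parameter logistic form y = d + (a - d) / (1 + (x/c)^b), which matches the parameter definitions given above; the parameter values are made up for illustration and the function is not taken from the ABE source.

```python
def four_param(x, a, b, c, d):
    """Four-parameter logistic curve.

    a = y as x -> 0, d = y as x -> infinity, c = midpoint (ED50),
    b = slope factor. Assumes x >= 0 and c > 0.
    """
    return d + (a - d) / (1.0 + (x / c) ** b)

a, b, c, d = 200.0, 2.0, 1.5, 85000.0    # made-up values for illustration

y_zero = four_param(0.0, a, b, c, d)     # equals the asymptote a
y_mid = four_param(c, a, b, c, d)        # halfway between a and d at x = c
y_far = four_param(1e9, a, b, c, d)      # approaches the asymptote d

print(y_zero, y_mid, y_far)
```

Differentiating this form at x = c gives a slope of (d - a) b / (4c), which is the sense in which b is proportional to the slope at the midpoint.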

 

Since nonlinear least squares regression applied to a function of this type can be unstable (if c gets close to zero or becomes negative, for example), a good initial estimate of the parameters being optimized can greatly increase the chances of the regression algorithm converging upon a sensible solution. For this reason, prior to performing the nonlinear regression, ABE estimates the parameters a, b and d, and uses these together with the user-supplied estimate of c (from the graph) as initial values for a four-dimensional search that produces refined estimates of all four parameters. The refined estimates are generally very close to the final best-fit parameters derived from the regression, and the small cost in time (a few seconds on my IBM laptop) greatly increases the reliability of the nonlinear regression. ABE uses the Levenberg-Marquardt algorithm implemented in the Optimize module of the Scientific Python (SciPy) library to compute optimized values of the four parameters by nonlinear least squares regression.
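
The idea of refining the starting values by a coarse search before the regression can be sketched in plain Python. This is not ABE's actual search: the grid layout, the names coarse_search and around, the default settings and the synthetic data are all assumptions made for illustration, and the model is assumed to be the standard four-parameter logistic. In practice the result would then be passed to a Levenberg-Marquardt routine (in current SciPy, scipy.optimize.leastsq is one).

```python
import itertools

def four_param(x, a, b, c, d):
    return d + (a - d) / (1.0 + (x / c) ** b)

def sum_sq(params, xs, ys):
    """Sum of squared residuals between the data and the model."""
    a, b, c, d = params
    return sum((y - four_param(x, a, b, c, d)) ** 2 for x, y in zip(xs, ys))

def around(center, half_width, half_steps):
    """Evenly spaced values centred exactly on `center`."""
    return [center + half_width * i / half_steps
            for i in range(-half_steps, half_steps + 1)]

def coarse_search(xs, ys, c_guess, frac=0.2, half_steps=2):
    """Pick the grid point with the smallest sum of squares."""
    y_span = max(ys) - min(ys)
    x_span = max(xs) - min(xs)
    a_grid = around(min(ys), frac * y_span, half_steps)
    d_grid = around(max(ys), frac * y_span, half_steps)
    # Keep c strictly positive, since the model is unstable otherwise.
    c_grid = [c for c in around(c_guess, frac * x_span, half_steps) if c > 0]
    b_grid = [0.5 * k for k in range(1, 11)]          # slopes 0.5 .. 5.0
    grid = itertools.product(a_grid, b_grid, c_grid, d_grid)
    return min(grid, key=lambda p: sum_sq(p, xs, ys))

# Synthetic, noise-free data from known parameters (illustration only).
true_params = (100.0, 2.0, 1.0, 1000.0)
xs = [0.0625 * 2 ** k for k in range(9)]              # 0.0625 .. 16
ys = [four_param(x, *true_params) for x in xs]

start = (min(ys), 1.0, 1.2, max(ys))                  # naive starting values
best = coarse_search(xs, ys, c_guess=1.2)
print(sum_sq(start, xs, ys), sum_sq(best, xs, ys), best)
```

Because the naive starting values are themselves one of the grid points, the search can never return something worse than where it started.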

 

ABE also supports the fitting of polynomials of arbitrary degree. For n data pairs (x, y), a polynomial of up to degree (n-1) can be fitted. A polynomial of degree n takes the form:

 

$$y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$$

 

The polynomial-fitting algorithm in ABE also requires an initial estimate of the curve midpoint to be supplied by the user (from the graph). In this case, however, it is not used for fitting the polynomial, but rather as the starting value for the iterative Newton-Raphson function that is used to solve for the local root of the second derivative of the polynomial. This root corresponds (hopefully) to the point of inflexion at the midpoint of the curve, which should be the ED50 value if the data are from a bioassay experiment.

 

In general, the higher the degree of the polynomial that is chosen, the more complex the curve and the greater the number of potential points of inflexion (local roots of the second derivative). For this reason, the initial estimate of the midpoint is somewhat more critical when fitting polynomials of higher order, since the Newton-Raphson solution for the second derivative might converge at a local root away from the ‘true’ midpoint. Generally, a polynomial of degree 3 to 5 should suffice for the vast majority of ‘well-behaved’ bioassay datasets.
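
The dependence on the starting estimate can be seen in a small sketch of Newton-Raphson applied to the second derivative of a polynomial. The code is illustrative rather than ABE's implementation; the function names and the example quartic (whose second derivative is 12(x-1)(x-4), so it has points of inflexion at x = 1 and x = 4) are chosen only for the demonstration.

```python
def poly_eval(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x**n (coefficients lowest degree first)."""
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result

def poly_deriv(coeffs):
    """Coefficients of the derivative polynomial."""
    return [k * a for k, a in enumerate(coeffs)][1:]

def inflexion_near(coeffs, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson root of the second derivative, started at x0."""
    d2 = poly_deriv(poly_deriv(coeffs))
    d3 = poly_deriv(d2)
    x = x0
    for _ in range(max_iter):
        slope = poly_eval(d3, x)
        if slope == 0.0:
            raise ZeroDivisionError("third derivative vanished; try another start")
        step = poly_eval(d2, x) / slope
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton-Raphson did not converge")

# y = x**4 - 10*x**3 + 24*x**2, so y'' = 12*(x - 1)*(x - 4)
quartic = [0.0, 0.0, 24.0, -10.0, 1.0]
low = inflexion_near(quartic, 0.5)    # converges to the root near x = 1
high = inflexion_near(quartic, 4.5)   # converges to the root near x = 4
print(low, high)
```

Two different starting estimates land on two different points of inflexion of the same curve, which is why a poor midpoint estimate matters more as the degree of the polynomial rises.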

 

 

 

ABE environment variables

 

For ease of use, ABE recognizes a set of environment variables that allow the user to define the working directory that ABE will go to by default for opening or saving files, as well as an HTML version of the manual for displaying help information.

 

ABE currently recognizes two environment variables, which can be set either in the Windows environment (e.g. using the “System” icon in the control panel for Win2000) or in the user’s “.login” file for POSIX operating systems (such as UNIX and Linux platforms).

 

ABE_PATH

The path to the default, working directory for ABE to open/save files

 

ABE_HELP

The path to a local or online HTML version of the current ABE manual.

 

Operating systems vary in whether or not they treat environment variable names as case-sensitive, so it is safest to use all uppercase characters, exactly as shown above; on many systems the variables will not be recognized otherwise. The default working directory defined in ABE_PATH can be overridden using the File->Set Working Directory-> menu function (described below).
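
As a rough illustration of how a Python program can pick up these two variables, the sketch below reads them with os.environ. The function names are hypothetical and this is not the ABE source; falling back to the current directory is an approximation of the directory from which the application was launched.

```python
import os

def abe_working_dir(environ=os.environ):
    """Directory to start file dialogs in: ABE_PATH if set, else the current directory."""
    return environ.get("ABE_PATH") or os.getcwd()

def abe_help_location(environ=os.environ):
    """Path or URL of the HTML manual, or None if ABE_HELP is not set."""
    return environ.get("ABE_HELP")

print(abe_working_dir())
print(abe_help_location())
```

Passing the environment in as an argument makes the lookups easy to try out with a plain dictionary, for example abe_working_dir({"ABE_PATH": "/data/assays"}).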

 

 

 

Overview of ABE

 

Note: Menu functions in ABE are written as lists separated by arrows to indicate their positions in the menu tree. For example:

 

File->Load Bioassay Data->

 

This refers to the ‘Load Bioassay Data’ function in the top-level ‘File’ menu.

 

Upon launching ABE, a dialog window appears offering a link to the text of the terms and conditions for the use of ABE and the options to either accept or reject these terms and conditions. These terms and conditions MUST be accepted before the program will continue (rejecting them terminates the program).

 

Tip: The terms and conditions for the use of ABE under the GNU GPL can be viewed at any time via the Help->About-> menu function.

 

 Upon accepting the terms and conditions for use, a single window with a series of menus and a square display area is visible. The display area is used for plotting both the experimental data and the fitted models and it also allows the user to interact with the graph in order to make initial estimates of the midpoint of the curve prior to data fitting. A second window can be displayed or hidden by clicking on:

 

Window->Activity Log->

 

This toggles the activity log window (ALW) on and off. The ALW records the ABE session in detail, and can optionally be saved as a text file, providing the user with a complete record of the data analysis. ABE also provides the user with several options for the way that the observed and fitted data are plotted, and the graphs that are drawn in the display can also be saved to file (as PostScript™).

 

In broad outline, the data analysis consists of loading the data from the bioassay data file (in XML format), then, for each individual molecule dataset, estimating the curve midpoint from the graph, fitting one or more of the available data models and plotting the observed and fitted data, and finally, once all of the data have been processed, saving the results.

 

 

 

Loading bioassay data

 

Data can be loaded from file using the menu function

 

File->Load Bioassay Data->

 

This launches a dialog window that allows you to select the file that contains your data. This dialog will look and function like the standard “Open File” window of the OS you are using.

 

If the environment variable ABE_PATH is set in your operating system environment, ABE will by default go initially to the directory defined by this variable whenever a file is to be opened, either to be read as input or to be written as output. If this variable is not set, ABE will default to the directory from which the ABE application was launched. This default working directory path can be set (or overridden) using the menu function:

 

File->Set Working Directory->

 

 

Once the bioassay data have been loaded, the Data-> menu will contain a list of all of the molecule datasets found in the data file and the status bar under the display area will show the full directory path of the file from which the data was read. You will notice that if you click on one of the molecules in the menu list, a check mark will appear next to it to indicate that this data is selected as the current dataset for processing (obviously, you can only select one dataset at a time).


Nothing will happen when you select the dataset you wish to process, until you use the menu function

 

Data->Process Data->

 

When this function is selected, the dataset you previously selected by clicking on the list of molecules, will become the current dataset for processing. The display area is cleared and the selected data are plotted. If any prior analysis and data fitting has been done using this data, the results will also be plotted if the appropriate graph options are selected (see the section on graph options below). The status bar underneath the display area will also be updated to show the name of the bioassay and the currently selected molecule.

 

For example, if your bioassay data file contains experimental data for a molecule called My Protein, the following sequence will select that data for processing, load it into memory as the current dataset and plot the graph of the observed data and any previous models generated for this data, during the current session.

 

Data->My Protein->

Data->Process Data->

 

NB: Once you have selected and loaded the data for a particular molecule, all subsequent data fitting and analysis will only affect the current molecule. Only the plotting options in the Graph-> menu remain current for all molecules.

 

 

 

Estimating the curve midpoint (ED50)

 

The first thing to be done with any new dataset is to provide an estimate of the ED50 value, using the graph. This is done using the menu function:

 

Data Model->Estimate ED50->

 

Once this function is selected, the status bar displays the message “Click on the graph to estimate the ED50 value”. Using the mouse, click on the graph at the point (approximately) where you estimate the curve midpoint to be. This will normally be roughly the middle of the central ‘linear’ segment of the curve that lies between the minimum and the maximum plateaus. The accuracy of this initial estimate of the ED50 is not critical at all for most ‘well-behaved’ datasets. Indeed, an initial estimate that lies almost anywhere between the plateaus of the curve will almost always yield the same modeled ED50 using either the four-parameter or polynomial fitting algorithms. Clicking on the graph displays a vertical blue line at the value of x chosen for the initial ED50 estimate. If you are unhappy with this choice, you may repeat this step as many times as desired until you are satisfied.

 

 

 

Fitting the four-parameter data model

 

Once an initial estimate of the ED50 has been made, you may proceed directly to the data modeling if you plan to use the four-parameter model (if you wish to fit a polynomial, you still need to choose the degree of the fitted polynomial – see the section on polynomial fitting below). The fitting algorithm for the four-parameter model is launched by using the menu function:

 

Data Model->Fit 4-Parameter Model->

 

ABE refines the initial estimate of the ED50 supplied by the user, along with its own internally generated estimates of the other three parameters, and then feeds these refined estimates to the nonlinear least squares regression function. A dialog window appears during the optimization of the four parameters, showing the progress of the computation and the final result. For every value of x in the observed data, a fitted value of y using the newly derived 4-parameter model is calculated and the fitted data are plotted for comparison, in green, on the same axes as the observed data. In addition, a vertical green line is plotted at the value of x corresponding to the newly fitted ED50. If the newly fitted data are not automatically plotted, it is because the corresponding options in the Graph-> menu have been changed from their default values. Also, in some rare cases, the ED50 might actually lie outside the range of x values in your data and will not appear, plotted as a line on the graph. The plotting of observed and modeled data in the display area and the options in the Graph-> menu are described in more detail in the section ‘Plotting your data’ below.

 

 

 

Problems fitting the four-parameter model

 

The nonlinear least squares optimization of the 4-parameter model implemented in ABE sacrifices a little time in favor of robustness. Sometimes however, it may fail to produce a sensible solution, particularly if your data are noisy, poorly measured, contain no ‘signal’ or are otherwise pathologically awful. Remember that in fitting a 4-parameter sigmoidal dose-response model to your data, you are already making certain assumptions about the physical phenomena underlying your experimental data.

 

If your data do not conform to a sigmoidal dose-response model, you will probably not get a good result trying to fit them to one (duh)!

 

Sometimes however, you may be missing some data, for example, the upper plateau of the sigmoidal curve that defines the y asymptote. In such cases, ABE provides a mechanism for you to supply these values yourself. In nearly all cases, this should not be necessary and it is absolutely not recommended unless you really cannot get a satisfactory, fitted data model using just the data you have. If you really (really) must, you can set the upper and lower y asymptotes yourself using the menu function:

 

Options->Estimate Curve Max/Min->


Selecting this function displays a dialog window that shows the current estimated values for each of the four parameters if estimates for them have already been made. This dialog window will only allow you to set the upper and lower y asymptotes. Estimates for the ED50 (the c parameter) must be supplied using the Data Model-> Estimate ED50-> menu function. Estimates for the proportional slope (b) cannot be made manually in the current version of ABE.

 

If the initial refinement of the 4 parameters prior to the nonlinear regression optimization fails to yield sensible estimates, the regression algorithm may well fall over and/or converge on nonsensical values. This is extremely unlikely if your data describe a normal dose-response curve, but ABE does provide you with a really (really) last-resort mechanism for tinkering with the pre-regression search algorithm via the menu function:

 

Options->4-Parameter Search->

 

Study the 4-parameter model equation in the section ‘Data modeling in ABE’ above and be sure you understand what you are doing before attempting to change any of these search parameters.

 

The fractional y-search and fractional x-search parameters define the range on each axis (as a fraction of the respective data ranges) to search around the initial estimates of a, c and d. The initial estimates for a, c and d will be min(y), the user-supplied ED50 and max(y) respectively, unless a and d were set manually. The parameters y-search iterations and ED50-search iterations determine the sampling rate of the y and x axes over their respective search ranges. The optimal proportional slope parameter (b) is initially searched for from just above zero up to maximum slope, in increments determined by initial slope search iterations, before being optimized jointly with the other parameters over the range fractional x-search in increments determined by the slope search iterations parameter. In general, this two-stage search yields estimates for the four-parameter model that are pretty close to the final optimized values produced by the subsequent nonlinear regression fitting, and helps to ensure that the nonlinear regression is stable and converges on a sensible solution.

 

If this is not the case and the nonlinear regression fails, a warning message is entered into the activity log and the optimized parameters that are quoted will be the refined estimates produced by the initial four-dimensional search. In this case, the following strategy could be tried. If the estimates of a and d seem OK, first try increasing the range and/or resolution of the slope parameter (b) search by increasing maximum slope and/or initial slope search iterations. Typical values for b should lie in the range 1.0 < b < 5.0. Next, try increasing the range and/or resolution of the y-axis search by increasing fractional y-search and/or y-search iterations. Finally, you might want to look carefully at your ED50 estimate and possibly increase the range and resolution of the ED50 parameter (c) search by increasing fractional x-search and/or ED50-search iterations.

 

Warning: The compute time of a four-dimensional search scales with the product of the sampling rates on each axis, so increasing the sampling rates on several axes at once can make the search dramatically slower.
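
A little arithmetic shows why. Assuming, for illustration only, that every combination of sampled values on the four axes is evaluated once (ABE's actual two-stage search may visit fewer points), the number of model evaluations is the product of the iteration counts:

```python
def grid_points(*iterations):
    """Number of points in a full grid with the given samples per axis."""
    total = 1
    for n in iterations:
        total *= n
    return total

print(grid_points(10, 10, 10, 10))   # 10000
print(grid_points(20, 20, 20, 20))   # 160000
print(grid_points(20, 10, 10, 10))   # 20000
```

Doubling the sampling on all four axes costs sixteen times as much, whereas doubling it on a single axis only doubles the cost, so it pays to change one search parameter at a time.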

 

 

 

Fitting the polynomial data model

 

Fitting a polynomial model to your bioassay data requires only the additional step of choosing the degree of polynomial to be fitted. A minimum degree of 3 is recommended for the kind of sigmoidal dose-response curve that a good bioassay data set should conform to. In general, for n data points, a polynomial of maximum degree n-1 can be fitted. In practice, however, the use of polynomials of degree higher than 5 is not recommended. Using very high order polynomials will produce an excellent fit to your data, but in effect, you are also modeling the errors and irregularities in the data. In addition, higher order polynomials can have more points of inflexion, making the initial estimate of the ED50 more critical. In general, a polynomial of degree 5 is recommended.
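
The point about modeling the errors can be demonstrated with a small least-squares fit. The sketch below is not the Matfunc polyfit routine that ABE uses; it solves the normal equations with exact fractions from the Python standard library so that the results can be checked by hand, and the four data points are made up.

```python
from fractions import Fraction

def polyfit_exact(xs, ys, degree):
    """Least-squares polynomial coefficients (lowest degree first)."""
    if degree > len(xs) - 1:
        raise ValueError("degree must be at most n - 1 for n data points")
    xs = [Fraction(x) for x in xs]
    ys = [Fraction(y) for y in ys]
    m = degree + 1
    # Augmented normal-equation matrix [A^T A | A^T y] for the Vandermonde matrix A.
    rows = [[sum(x ** (i + j) for x in xs) for j in range(m)] +
            [sum(y * x ** i for x, y in zip(xs, ys))] for i in range(m)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [v / rows[col][col] for v in rows[col]]
        for r in range(m):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [rows[i][m] for i in range(m)]

def poly_eval(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))

xs, ys = [0, 1, 2, 3], [1, 3, 2, 5]           # made-up data, n = 4 points
line = polyfit_exact(xs, ys, 1)               # smooths over the scatter
cubic = polyfit_exact(xs, ys, 3)              # degree n - 1

line_resid = [y - poly_eval(line, x) for x, y in zip(xs, ys)]
cubic_resid = [y - poly_eval(cubic, x) for x, y in zip(xs, ys)]
print(line, line_resid)
print(cubic, cubic_resid)
```

With the maximum degree of n-1 every residual is exactly zero: the curve passes through each point, scatter included, which is an excellent fit but not a meaningful model.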

 

To fit a polynomial model to your data, first use the menu function:

 

Data Model->Choose Polynomial->

 

Having selected the degree of the polynomial to be fitted, use the menu function:

 

Data Model->Fit Polynomial->

 

This will perform the polynomial fitting and for every value of x in the observed data, a fitted value of y using the newly derived polynomial model is calculated and the fitted data are plotted for comparison, in red, on the same axes as the observed data. In addition, a vertical red line is plotted at the value of x corresponding to the newly fitted ED50. If the newly fitted data are not automatically plotted, it is because the corresponding options in the Graph-> menu have been changed from their default values. The plotting of observed and modeled data in the display area and the options in the Graph-> menu are described in more detail in the section ‘Plotting your data’ below.

 

Tip: If the fitted ED50 derived from the polynomial model seems to fall at the wrong place on the curve, try re-estimating the initial ED50 and/or fitting a lower order polynomial.

 

 

 

Viewing the fitted data model

 

After fitting the four-parameter and/or polynomial data models for the current molecule, a table of the observed and modeled data can be generated in the activity log window, using the menu function:

 

Data Model->Show Fitted Data->

 

For each value of x in the observed data, the fitted y values for the four-parameter and polynomial models are listed next to the observed y value in tabular columns labeled 4Par and Poly respectively.

 

 

 

Plotting your data

 

ABE provides numerous options for the graphical display of the observed and modeled data. The graphical display itself is color coded, with everything related to the observed data shown in blue, everything related to the four-parameter model shown in green and everything related to the polynomial model shown in red. The graph is automatically redrawn after each model fitting. All of the functions related to the graphical display are in the menu:

 

Graph->

 

Each time the graph is redrawn, it is drawn according to the options currently selected in the Graph-> menu. Any changes made to the graph plotting options will not be visible until the graph is redrawn using the menu function:

 

Graph->Redraw Graph->

 

The graphical display options are as follows:

 

The tightness of fit of the graph within the display area can be controlled using the menu function:

 

Graph->Border Width->

 

This is useful if some of the fitted data points fall outside the display area or if you simply wish to rescale the graph for presentation purposes. The default is 20 pixels. Increasing this value reduces the scale of the data graph.

 

The graph key (top left) and legend (lower right) can be toggled on and off using the menu functions:

 

Graph->Show Key->

Graph->Show Legend->

 

The key and legend are also color coded for observed and modeled data, according to the scheme described at the start of this section (default = On).

 

Whether or not the graph and corresponding legends for the respective data models are shown in the graphical display area, can be controlled using the menu functions:

 

Graph->Show 4-Parameter Model->

Graph->Show Polynomial Model->

 

If you perform a data-modeling step (fitting either a four-parameter or polynomial model) and the graph is redrawn without the new model, it is probably because one or other of these options is switched off. These options are useful for inspecting the individual data models in detail, switching off one of the data models to allow a clearer view of the graph.

 

Tip: If the data fitting works well, the fitted curves and ED50 values for the different models should overlap very closely. If they do not, something may be wrong with one or both models.

 

A dark border is normally drawn around the graphical display area as this produces a neater effect when the graph is saved as PostScript (see the section Saving Your Results below). It can be toggled on and off however, using the menu function:

 

Graph->Draw Border->

 

If a data column describing the errors in the observed data is designated in the input file using the err attribute in the bioassay tag (see section Preparing your bioassay data in XML format above), the error bars for the graph of the observed data can be toggled on and off using the menu function:

 

Graph->Show Error Bars->

 

The default for this option is Off (NOT to draw any error bars).

 

 

 

Saving your results

 

Once you are satisfied with the data processing for each of the molecules in your bioassay data file, the results of your analysis can be saved using the menu functions in the File-> menu:

 

File->Save Activity Log->

File->Save Graph Image->

File->Export Results Table->

 

File->Save Activity Log-> allows the user to save the entire activity log as a text file that will have the exact same contents as the activity log window.

 

File->Save Graph Image-> will save the currently displayed graph, rendered according to the currently selected graph options, as a PostScript file. It is often useful to play with the options in the Graph-> menu to get exactly the image you want, before saving each graph.

 

NB: Unlike the menu function File->Save Activity Log->, the menu function File->Save Graph Image-> only saves the currently displayed graph for the current molecule and will need to be used for each molecule if you wish to have a complete graphical record of your analysis.

 

File->Export Results Table-> will save the current data model(s) (four-parameter and polynomial) as a tab-delimited data table suitable for importing into a spreadsheet application such as Microsoft Excel.
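
A tab-delimited table can also be read back into a script for further processing. The following is a minimal sketch in Python 3 using only the standard library. The table contents and column names shown here are made up for illustration; the actual layout written by ABE may differ.

```python
import csv
import io

# Made-up table contents; the real column layout written by ABE may differ.
exported = "x\tobserved\tmodel\n0.1\t0.05\t0.06\n1.0\t0.42\t0.40\n"

# csv.reader splits each line on the tab character.
reader = csv.reader(io.StringIO(exported), delimiter="\t")
header = next(reader)                      # first line holds the column names
rows = [[float(value) for value in row] for row in reader]

print(header)
print(rows)
```

To read a real file, replace io.StringIO(exported) with an open file object.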

 

Tip: Once you are satisfied with your results, go back through the list of molecules in the Data-> menu, selecting each molecule in turn, setting the graph options you require, then redrawing and saving the individual graphs as PostScript files.

 

 

 

General tips and tricks for working with ABE

 

Avoid special characters in your molecule or column names when preparing your data in XML format. Symbols such as the ampersand (&) and the angle brackets (< and >) have a special meaning in XML and are therefore best avoided.
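
To see why these characters cause trouble, it helps to look at what they must become before they can legally appear in XML. The sketch below (Python 3, standard library only) uses a made-up molecule name. Whether ABE accepts names written this way has not been verified here, so avoiding the characters remains the safer course.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical molecule name containing XML special characters.
name = "IL-2 & Fc <v2>"

# escape() replaces &, < and > with entities, as needed in element text.
text_safe = escape(name)

# quoteattr() does the same and also wraps the value in quotes,
# which is the form an attribute value requires.
attr_safe = quoteattr(name)

print(text_safe)
print(attr_safe)
```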

 

Make sure your bioassay data files have the suffix .xml so that they can be easily read and verified by any XML-compliant software.

 

Pre-check the validity of your XML data files by reading them into an XML-compliant web browser such as Microsoft’s Internet Explorer 5.0. If there are any XML errors, they will be listed as the browser attempts to parse the XML data.
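
The same pre-check can be done without a browser. This is a minimal sketch in Python 3 using the standard library XML parser. The function name check_well_formed and the two XML fragments are illustrative only and are not complete ABE input files. The check only confirms that the XML is well formed, not that ABE will accept it.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

def check_well_formed(xml_text):
    """Return None if xml_text parses, otherwise the parser's error message."""
    try:
        xml.dom.minidom.parseString(xml_text)
    except ExpatError as exc:
        return str(exc)
    return None

good = "<bioassay><data>1 2 3</data></bioassay>"
bad = "<bioassay><data>1 2 3</bioassay>"   # <data> is never closed

print(check_well_formed(good))   # None
print(check_well_formed(bad))    # message giving the line and column of the error
```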

 

A data file that conforms to the XML specification is not automatically acceptable as input to ABE. XML is just a meta-format for describing data, and any application that reads and parses XML (such as ABE) must still find tags and attributes that are meaningful for its purposes.

 

Make sure all of the data columns in your <data> blocks are properly white space delimited and contain no white space or special characters themselves. Make sure that the number and order of the columns is correctly described in the columns attribute of the <bioassay> tag and that the x, y and err attributes correctly identify which columns are to be used for the data analysis.
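
These checks are easy to automate before loading a file. The sketch below (Python 3, standard library only) is not part of ABE. The function check_data_block and the column names are hypothetical, and it assumes you have already extracted the list of declared column names and the text of one data block from your file.

```python
def check_data_block(column_names, data_text, x, y, err=None):
    """Return a list of problems found in a white-space-delimited data block."""
    problems = []
    # The x, y and (optional) err columns must be among the declared columns.
    for role, name in (("x", x), ("y", y), ("err", err)):
        if name is not None and name not in column_names:
            problems.append("%s column '%s' is not a declared column" % (role, name))
    # Every row must split into exactly one field per declared column.
    for number, line in enumerate(data_text.strip().splitlines(), start=1):
        fields = line.split()
        if len(fields) != len(column_names):
            problems.append("row %d has %d fields, expected %d"
                            % (number, len(fields), len(column_names)))
    return problems

columns = ["conc", "response", "sd"]   # hypothetical column names
rows = """
0.1  0.05  0.01
1.0  0.42  0.03
10.0 0.91
"""
print(check_data_block(columns, rows, x="conc", y="response", err="sd"))
```

An empty list means no problems were found. In this example the third row is reported because it is missing a value.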

 

The menus in ABE all have a dotted line separator in the uppermost position. If you click on this separator, you can “tear off” the menus and position them anywhere you want on your desktop as separate windows. This is especially useful for the Data-> and Graph-> menus, which must be accessed frequently during data processing with ABE. Tearing these two menus off at the beginning of your analysis and positioning them next to the main console window of ABE makes life a lot easier.

 

Do not “over-fit” your data using polynomials of a high degree. Just because they fit your data points more closely does not necessarily mean that they are better models for your data. You may just end up fitting the “noise” in your data, as well as the real “signal”.
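
The effect can be demonstrated with a few lines of Python 3. This sketch is not ABE's fitting code: it uses made-up data on a straight line with alternating noise, and compares a least-squares line with a polynomial of the highest possible degree, which passes through every point exactly. All names in it are illustrative.

```python
def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns a function of x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def fit_exact(xs, ys):
    """Polynomial of degree n-1 through every point (Lagrange form)."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(zip(xs, ys)):
            term = yi
            for j, xj in enumerate(xs):
                if j != i:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

def signal(x):
    """The underlying noise-free relationship."""
    return 2.0 * x + 1.0

xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
ys = [signal(x) + e for x, e in zip(xs, noise)]

line = fit_line(xs, ys)
exact = fit_exact(xs, ys)

# Compare both models with the true signal at a point not used in the fit.
new_x = 6.0
line_error = abs(line(new_x) - signal(new_x))
exact_error = abs(exact(new_x) - signal(new_x))
print(line_error, exact_error)
```

The high-degree polynomial reproduces every observed point, noise included, yet its prediction at the new point is far worse than that of the simple line.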

 

Learn a little Python. ABE is written entirely in Python and is Open Source. If you feel like adding, changing or tweaking a function in ABE, some knowledge of Python and the availability of the ABE source code make this entirely possible.

 

 

 

Contact details

 

For bug reports or questions regarding ABE, please contact:

 

Gordon Webster

EMD Lexigen Research Center

Bedford Campus, 45A Middlesex Turnpike

Billerica, MA 01821-3936

Email: gwebster@users.sourceforge.net

 

 

… but I don’t expect the Spanish Inquisition*

 

*Respecting the tradition of the Python developer community to (wherever possible) refer directly or indirectly to Monty Python’s Flying Circus, the British TV comedy from the 1960s that provided the inspiration for the Python name.

 

 

 

Appendix A: The GNU General Public License (GPL)

 

                                   GNU GENERAL PUBLIC LICENSE
                                      Version 2, June 1991
 
 Copyright (C) 1989, 1991 Free Software Foundation, Inc.
 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.
 
                                                  Preamble
 
  The licenses for most software are designed to take away your
freedom to share and change it.  By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users.  This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it.  (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.)  You can apply it to
your programs, too.
 
  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.
 
  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.
 
  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must show them these terms so they know their
rights.
 
  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.
 
  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.
 
  Finally, any free program is threatened constantly by software
patents.  We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary.  To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.
 
  The precise terms and conditions for copying, distribution and
modification follow.
 
                                   GNU GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
 
  0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License.  The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language.  (Hereinafter, translation is included without limitation in
the term "modification".)  Each licensee is addressed as "you".
 
Activities other than copying, distribution and modification are not
covered by this License; they are outside its scope.  The act of
running the Program is not restricted, and the output from the Program
is covered only if its contents constitute a work based on the
Program (independent of having been made by running the Program).
Whether that is true depends on what the Program does.
 
  1. You may copy and distribute verbatim copies of the Program's
source code as you receive it, in any medium, provided that you
conspicuously and appropriately publish on each copy an appropriate
copyright notice and disclaimer of warranty; keep intact all the
notices that refer to this License and to the absence of any warranty;
and give any other recipients of the Program a copy of this License
along with the Program.
 
You may charge a fee for the physical act of transferring a copy, and
you may at your option offer warranty protection in exchange for a fee.
 
  2. You may modify your copy or copies of the Program or any portion
of it, thus forming a work based on the Program, and copy and
distribute such modifications or work under the terms of Section 1
above, provided that you also meet all of these conditions:
 
    a) You must cause the modified files to carry prominent notices
    stating that you changed the files and the date of any change.
 
    b) You must cause any work that you distribute or publish, that in
    whole or in part contains or is derived from the Program or any
    part thereof, to be licensed as a whole at no charge to all third
    parties under the terms of this License.
 
    c) If the modified program normally reads commands interactively
    when run, you must cause it, when started running for such
    interactive use in the most ordinary way, to print or display an
    announcement including an appropriate copyright notice and a
    notice that there is no warranty (or else, saying that you provide
    a warranty) and that users may redistribute the program under
    these conditions, and telling the user how to view a copy of this
    License.  (Exception: if the Program itself is interactive but
    does not normally print such an announcement, your work based on
    the Program is not required to print an announcement.)
 
These requirements apply to the modified work as a whole.  If
identifiable sections of that work are not derived from the Program,
and can be reasonably considered independent and separate works in
themselves, then this License, and its terms, do not apply to those
sections when you distribute them as separate works.  But when you
distribute the same sections as part of a whole which is a work based
on the Program, the distribution of the whole must be on the terms of
this License, whose permissions for other licensees extend to the
entire whole, and thus to each and every part regardless of who wrote it.
 
Thus, it is not the intent of this section to claim rights or contest
your rights to work written entirely by you; rather, the intent is to
exercise the right to control the distribution of derivative or
collective works based on the Program.
 
In addition, mere aggregation of another work not based on the Program
with the Program (or with a work based on the Program) on a volume of
a storage or distribution medium does not bring the other work under
the scope of this License.
 
  3. You may copy and distribute the Program (or a work based on it,
under Section 2) in object code or executable form under the terms of
Sections 1 and 2 above provided that you also do one of the following:
 
    a) Accompany it with the complete corresponding machine-readable
    source code, which must be distributed under the terms of Sections
    1 and 2 above on a medium customarily used for software interchange; or,
 
    b) Accompany it with a written offer, valid for at least three
    years, to give any third party, for a charge no more than your
    cost of physically performing source distribution, a complete
    machine-readable copy of the corresponding source code, to be
    distributed under the terms of Sections 1 and 2 above on a medium
    customarily used for software interchange; or,
 
    c) Accompany it with the information you received as to the offer
    to distribute corresponding source code.  (This alternative is
    allowed only for noncommercial distribution and only if you
    received the program in object code or executable form with such
    an offer, in accord with Subsection b above.)
 
The source code for a work means the preferred form of the work for
making modifications to it.  For an executable work, complete source
code means all the source code for all modules it contains, plus any
associated interface definition files, plus the scripts used to
control compilation and installation of the executable.  However, as a
special exception, the source code distributed need not include
anything that is normally distributed (in either source or binary
form) with the major components (compiler, kernel, and so on) of the
operating system on which the executable runs, unless that component
itself accompanies the executable.
 
If distribution of executable or object code is made by offering
access to copy from a designated place, then offering equivalent
access to copy the source code from the same place counts as
distribution of the source code, even though third parties are not
compelled to copy the source along with the object code.
 
  4. You may not copy, modify, sublicense, or distribute the Program
except as expressly provided under this License.  Any attempt
otherwise to copy, modify, sublicense or distribute the Program is
void, and will automatically terminate your rights under this License.
However, parties who have received copies, or rights, from you under
this License will not have their licenses terminated so long as such
parties remain in full compliance.
 
  5. You are not required to accept this License, since you have not
signed it.  However, nothing else grants you permission to modify or
distribute the Program or its derivative works.  These actions are
prohibited by law if you do not accept this License.  Therefore, by
modifying or distributing the Program (or any work based on the
Program), you indicate your acceptance of this License to do so, and
all its terms and conditions for copying, distributing or modifying
the Program or works based on it.
 
  6. Each time you redistribute the Program (or any work based on the
Program), the recipient automatically receives a license from the
original licensor to copy, distribute or modify the Program subject to
these terms and conditions.  You may not impose any further
restrictions on the recipients' exercise of the rights granted herein.
You are not responsible for enforcing compliance by third parties to
this License.
 
  7. If, as a consequence of a court judgment or allegation of patent
infringement or for any other reason (not limited to patent issues),
conditions are imposed on you (whether by court order, agreement or
otherwise) that contradict the conditions of this License, they do not
excuse you from the conditions of this License.  If you cannot
distribute so as to satisfy simultaneously your obligations under this
License and any other pertinent obligations, then as a consequence you
may not distribute the Program at all.  For example, if a patent
license would not permit royalty-free redistribution of the Program by
all those who receive copies directly or indirectly through you, then
the only way you could satisfy both it and this License would be to
refrain entirely from distribution of the Program.
 
If any portion of this section is held invalid or unenforceable under
any particular circumstance, the balance of the section is intended to
apply and the section as a whole is intended to apply in other
circumstances.
 
It is not the purpose of this section to induce you to infringe any
patents or other property right claims or to contest validity of any
such claims; this section has the sole purpose of protecting the
integrity of the free software distribution system, which is
implemented by public license practices.  Many people have made
generous contributions to the wide range of software distributed
through that system in reliance on consistent application of that
system; it is up to the author/donor to decide if he or she is willing
to distribute software through any other system and a licensee cannot
impose that choice.
 
This section is intended to make thoroughly clear what is believed to
be a consequence of the rest of this License.
 
  8. If the distribution and/or use of the Program is restricted in
certain countries either by patents or by copyrighted interfaces, the
original copyright holder who places the Program under this License
may add an explicit geographical distribution limitation excluding
those countries, so that distribution is permitted only in or among
countries not thus excluded.  In such case, this License incorporates
the limitation as if written in the body of this License.
 
  9. The Free Software Foundation may publish revised and/or new versions
of the General Public License from time to time.  Such new versions will
be similar in spirit to the present version, but may differ in detail to
address new problems or concerns.
 
Each version is given a distinguishing version number.  If the Program
specifies a version number of this License which applies to it and "any
later version", you have the option of following the terms and conditions
either of that version or of any later version published by the Free
Software Foundation.  If the Program does not specify a version number of
this License, you may choose any version ever published by the Free Software
Foundation.
 
  10. If you wish to incorporate parts of the Program into other free
programs whose distribution conditions are different, write to the author
to ask for permission.  For software which is copyrighted by the Free
Software Foundation, write to the Free Software Foundation; we sometimes
make exceptions for this.  Our decision will be guided by the two goals
of preserving the free status of all derivatives of our free software and
of promoting the sharing and reuse of software generally.
 
                                                  NO WARRANTY
 
  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW.  EXCEPT WHEN
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.  THE ENTIRE RISK AS
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU.  SHOULD THE
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
REPAIR OR CORRECTION.
 
  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
POSSIBILITY OF SUCH DAMAGES.
 
                                    END OF TERMS AND CONDITIONS
 
 
 
 
