Text File | 1989-07-31 | 9.4 KB | 299 lines | [TEXT/nX^n] |
-
-
- Dotty Plotter
- © 1989 by Don Gilbert
-
- version 1.0c
-
-
-
-
-
-
- Dotty Plotter is a tool for drawing dot matrix comparisons of sequences in
- molecular biology.
-
- Dot plots are used to view all areas of homology between two nucleic acid or protein
- sequences. Dot plots are useful for determining if there are one or more segments of
- similarity between sequences.
-
- The dot plot is generated by lining up the sequences, and plotting a dot where bases in
- sequence A match bases in sequence B. The major diagonal is the line of matches when A
- and B are lined up from start to finish. Each diagonal off the major diagonal is the line of
- matches when A is shifted left or right from the start of B. A dot, or match, is placed
- where the bases in a given range, or window, produce a certain number of matches, or
- stringency. For perfect matches the stringency is equal to the window (n:n). For a dot at
- every matching base, the stringency:window is 1:1.
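The window and stringency rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dotty Plotter's actual code: the function name `dot_plot` is hypothetical, positions are 0-based, a window qualifies when it holds at least `stringency` matches, and the dot is placed at the middle of the window.

```python
def dot_plot(seq_a, seq_b, window=1, stringency=1):
    """Return a list of (i, j) dots for two sequences.

    Assumption: a dot is placed at the middle of every window pair that
    contains at least `stringency` matching symbols out of `window`.
    Comparison is verbatim, so "A" does not match "a".
    """
    if not 1 <= stringency <= window:
        raise ValueError("stringency must be between 1 and window")
    half = window // 2
    dots = []
    # Slide a window along every diagonal: i indexes seq_a, j indexes seq_b.
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            matches = sum(
                1 for k in range(window) if seq_a[i + k] == seq_b[j + k]
            )
            if matches >= stringency:
                dots.append((i + half, j + half))
    return dots


if __name__ == "__main__":
    # Comparing a sequence to itself at 1:1 always fills the major diagonal.
    for dot in dot_plot("ACGT", "ACGT", window=1, stringency=1):
        print(dot)
```

With stringency:window set to 1:1 every matching base gets a dot; raising both values (for example 2:2) keeps only runs of consecutive matches.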
-
-
-
-
- Dotty Plotter has three "views":
-
- • The text edit view, where you can view or type any text (including sequence files).
-
- • The sequence view, where sequences are listed one per line, for selections, alignments
- and editing.
-
- • The dotplot view, where two sequences are plotted against each other to compare regions
- of similarity.
-
-
-
- This release of Dotty Plotter contains a minimum of features, and potential bugs. I make it
- available on the condition that you send me suggestions and report any problems you
- encounter using it. I hope to improve Dotty Plotter's sequence size and memory limits,
- and add features. Please send me your comments if you wish to see improvements.
-
-
- Don Gilbert
-
- BioComputing Office
- Biology Dept., Indiana University
- Bloomington, IN 47405
- Bitnet: GilbertD @ IUBACS
- Internet: GilbertD @ Gold.Bacs.Indiana.Edu
-
-
-
-
-
-
- Input Data
-
- Dotty Plotter currently accepts data in several standard formats, including UWGCG,
- GenBank, Stanford / IG, EMBL, NBRF/PIR, Fitch, Pearson, and DNA Strider (see
- Appendix). It also accepts unformatted sequences. The data files should be of plain TEXT
- type. DNA Strider native format sequences must be converted to plain text with the
- File:Write menu option of DNA Strider.
-
-
-
- Editing Data
-
- The text edit view of Dotty Plotter works like the basic text editing in most Macintosh
- programs. You can open a window from a text file of sequence data, or create a new
- one, edit it and save it. There is a limit of 32,000 characters per text window.
-
- The sequence view of Dotty Plotter displays one sequence per line (see Fig. 1). The top line of
- this view marks sequence position. You can select all of a sequence for analysis by
- double-clicking on it. You can select a portion of a sequence by either (a) pressing the
- mouse button on the starting base and dragging, with the button still down, to the end
- base, or (b) single-clicking on the start base, scrolling to the end base, then shift-clicking
- (clicking with the mouse while holding down the shift key). If you click on the name of a
- sequence, at the left side of the window, an information box will list the sequence length
- and the range of any active selection.
-
- In this release of Dotty Plotter, the sequence view is not editable. There are also display
- problems for sequences longer than about 3000 characters (with 12 pt font). You can
- increase the display length by reducing font size. Also you must use monospaced fonts
- such as Courier and Monaco for a sequence to line up properly with the position marks.
-
-
- Dot Plots
-
- Two selected sequences are compared to each other, or one sequence to itself, for a dot
- plot. With one or more sequence views open, select the base range of one or two
- sequences to compare, using the mouse and shift-click selection methods of standard
- Macintosh editing. Then select the Format Dot Plot item. When
- you select the Format Methods… menu item, an option dialog is displayed (see Fig.
- 1). Here you may set the window width and number of matches (stringency) per window.
- The Plot all dots check box will plot all dots in a match window, rather than just one
- dot in the middle of each window that contains a match. If your top window is a dot plot, then
- the Methods… item will change options and redraw that window.
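The difference between the default display and the Plot all dots option can be illustrated with a small Python sketch. It rests on one reading of the text: "all dots in a match window" is taken to mean every matching position inside a window that meets the stringency. The function name `window_dots` and the `plot_all` flag are hypothetical, and positions are 0-based.

```python
def window_dots(seq_a, seq_b, window, stringency, plot_all=False):
    """Return sorted (i, j) dots for windows meeting the stringency.

    plot_all=False: one dot at the middle of each qualifying window.
    plot_all=True:  one dot per matching position in each qualifying
                    window (an assumed reading of "Plot all dots").
    """
    dots = set()
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Offsets within the window where the two symbols agree.
            hits = [k for k in range(window) if seq_a[i + k] == seq_b[j + k]]
            if len(hits) < stringency:
                continue
            if plot_all:
                dots.update((i + k, j + k) for k in hits)
            else:
                dots.add((i + window // 2, j + window // 2))
    return sorted(dots)
```

Note that with the middle-dot display a dot can land on a mismatched base, as long as the window around it holds enough matches; under this reading, plotting all dots marks only positions that actually match.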
-
-
-
-
- Editing Plots
-
- Maybe in a later version. For now, save a plot to disk or clipboard as a PICTure file and
- edit with your favorite drawing program (MacDraw, Canvas, SuperPaint…).
-
- The sizing options (Reduce, Reduce to fit, Enlarge, Normal size) affect
- only the screen display of the plot. Each plot is sized to fit your printer page size (Page
- Setup selects this). Future versions of Dotty Plotter may include a drawing size selection
- for multiple page plots.
-
-
-
-
- Saving Plots
-
- If you get a drawing that looks presentable, you may print a drawing (File Print), save
- it as a standard Macintosh PICTure file (File Save), or copy it to the clipboard (Edit
- Copy). The Page Setup item can be used to configure page size, and whether to print in
- landscape or portrait orientation.
-
-
-
-
- Speed of Dotty Plotter
-
- Times for a 2000 x 2000 sequence comparison (Blue.Seq x BlueKsm.seq) with a 15/25
- stringency/window, which results in 2407 dots plotted:
-
- Mac SE/30    40 seconds
- Mac II       53 seconds
- Mac SE      230 seconds
- µVax II     229 seconds
-
-
-
- Some Limitations of Dotty Plotter
-
- • Display limitation: About 3,000 bases per sequence can be selected in a 12 pt font.
- Reducing the font size increases the number of bases that can be selected.
-
- • Plots are limited to one page only (page may be any size that printer can handle).
-
- • Only one sequence at a time in a file/window containing several sequences may be
- selected.
-
- • Sequence symbols are compared in a case-sensitive, verbatim manner. The base "A"
- does not match the base "a". Ambiguity codes are not recognized as such, nor are match
- probabilities used. The following symbols are defined as valid sequence symbols; all
- others are ignored when reading sequence strings:
-
- seqCharSet:= ['A'..'Z','a'..'z','_','@','+','-','*','.','&'];
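For readers who do not use Pascal, the set above can be restated as a rough Python equivalent. The names `SEQ_CHARS` and `clean_sequence` are made up for this sketch; only the symbol set itself comes from the line above.

```python
import string

# Same members as seqCharSet: letters of both cases plus _ @ + - * . &
SEQ_CHARS = frozenset(string.ascii_letters + "_@+-*.&")


def clean_sequence(raw):
    """Drop every character that is not a valid sequence symbol.

    Digits, spaces and line breaks are discarded; case is preserved,
    because symbols are compared verbatim ("A" is not "a").
    """
    return "".join(ch for ch in raw if ch in SEQ_CHARS)


if __name__ == "__main__":
    print(clean_sequence("   1 ACGT acgt\n  11 GG-CC*"))
```

Because case is kept, a sequence typed in lower case will show no matches against the same sequence in upper case; converting both to one case before plotting avoids this.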
-
-
-
-
- Figure 1. Options dialog in Dotty Plotter.
-
- Note the two sequence windows (blue.seq and blueksm.seq). The blueksm.seq window
- shows sequences up to position 501 as selected (dark).
-
-
-
-
- Figure 2. Sequence Information.
-
- Sequence information is displayed by clicking on the name of a sequence.
-
- APPENDIX
- Sequence formats known to Dotty Plotter
-
-
- Stanford/IG format
- ;comments
- ;...
- seq1 info
- abcd...
- efgh...1 (1 or 2 = terminator)
- ;another seq
- ;....
- seq2 info
- abcd...1
- --- for example ---
- ; Dro5s-T.Seq Length: 120 April 6, 1989 21:22 Check: 9487 ..
- dro5stseq
- GCCAACGACCAUACCACGCUGAAUACAUCGGUUCUCGUCCGAUCACCGAAAUUAAGCAGCGUCGCGGGCG
- GUUAGUACUUAGAUGGGGGACCGCUUGGGAACACCGCGUGUUGUUGGCCU1
-
- ; TOIG of: Dro5srna.Seq check: 9487 from: 1 to: 120
- ; another sequence here...
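The Stanford/IG layout above could be read with a short Python routine such as the following. It is a sketch based only on the layout shown here (comment lines start with ";", then one name line, then sequence lines ending in a "1" or "2" terminator); `parse_ig` is a hypothetical name and this is not Dotty Plotter's own reader.

```python
def parse_ig(text):
    """Parse Stanford/IG text into a list of (name_line, sequence) pairs."""
    records = []
    name = None
    chunks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue  # blank lines and ";" comment lines carry no sequence
        if name is None:
            name = line  # first non-comment line of a record is its name
            continue
        # A trailing 1 or 2 marks the end of the sequence.
        finished = line[-1] in "12"
        if finished:
            line = line[:-1]
        chunks.append(line)
        if finished:
            records.append((name, "".join(chunks)))
            name, chunks = None, []
    return records
```

A record with no terminator is silently dropped by this sketch; a more careful reader would report it.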
-
-
- GenBank format
- LOCUS seq1 ID..
- ...
- ORIGIN ...
- 123456789abcdefg....(1st 9 columns are formatting)
- hijkl...
- // (end of sequence)
- LOCUS seq2 ID ..
- ...
- ORIGIN
- abcd...
- //
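A minimal Python reader for the GenBank layout above might look like this. It assumes, as the outline states, that the first 9 columns of each sequence line are formatting (the position number), that the sequence name is the second word of the LOCUS line, and that blanks between blocks of bases can be discarded. `parse_genbank` is a hypothetical name.

```python
def parse_genbank(text):
    """Parse GenBank-style text into a list of (locus_name, sequence) pairs."""
    records = []
    name = None
    in_sequence = False
    chunks = []
    for line in text.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            name = fields[1] if len(fields) > 1 else ""
            in_sequence, chunks = False, []
        elif line.startswith("ORIGIN"):
            in_sequence = True  # sequence lines follow
        elif line.startswith("//"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name, in_sequence, chunks = None, False, []
        elif in_sequence:
            # Skip the 9 formatting columns, then remove embedded blanks.
            chunks.append("".join(line[9:].split()))
    return records
```

Everything between the LOCUS and ORIGIN lines (definition, features and so on) is ignored by this sketch.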
-
-
- NBRF & PIR format
- > seq1 id
- ?? junk 2nd line
- abcdefg...
- hijkl...
- > seq2 ID
- ?? junk
- abcd....
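The NBRF/PIR outline above can be read along the same lines. This Python sketch follows only what the outline shows: a ">" line starts a record, the line after it is skipped, and the remaining lines up to the next ">" are sequence. `parse_nbrf` is a hypothetical name, and no special handling of any end-of-sequence marker is attempted.

```python
def parse_nbrf(text):
    """Parse NBRF/PIR-style text into a list of (id_line, sequence) pairs."""
    records = []
    name = None
    skip_next = False
    chunks = []
    for line in text.splitlines():
        if line.startswith(">"):
            if name is not None:
                records.append((name, "".join(chunks)))
            name = line[1:].strip()
            chunks = []
            skip_next = True  # the line after ">" is a title, not sequence
        elif skip_next:
            skip_next = False
        elif name is not None:
            chunks.append("".join(line.split()))
    if name is not None:
        records.append((name, "".join(chunks)))
    return records
```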
-
- (from UWGCG's ToNBRF)
- >DL;DRO5SRNA
- Iubio$Dua0:[Gilbertd.Gcg]Dro5srna.Seq;2 => DRO5SRNA
-
- 1