Day 5: Design

Traditionally, programmers sketched out a quick design on a scrap of paper and then started writing code. For small C programs, this method worked fairly well and, in truth, many useful programs were produced with only the flimsiest of preliminary designs.

C++ tends to be somewhat less tolerant of this approach. Countless hours of programming experience have shown that good C++ programs are designed well before coding begins. The reward for this diligence is programs that are flexible, extensible, and rock-solid.

The Personal Information Manager you will be writing over the course of this book is a nontrivial, real-world program and, as such, it deserves and demands an appropriate investment in design.

Today you will learn

How to approach designing the PIM
How to build the program in stages
How to plan for data storage, persistence, and high performance
How to plan for specification changes and extensions

The Specification

Specifications can usefully be divided into functional specifications and technical specifications. Functional specifications tell you what the user's experience will be like, and technical specifications tell you how the program actually will work.

Technical specifications then can be divided into class interfaces, database designs, inheritance diagrams, and so on.

The Functional Specification

ROBIN is, from a functional viewpoint, a Personal Information Manager (PIM) with which the user can store and quickly retrieve notes and other text-based information.

It is designed to provide quick access to the kinds of notes one usually writes in a notebook. The quick access, however, invites the user to use ROBIN as a phone book or a database of personal contacts. One can imagine storing away recipes, inventories of videotapes, and creating other general-purpose, simple databases.

What sets ROBIN apart from other database products is that the data is entered without regard to keywords, titles, subject headings, or other classifications. All the text in all the notes is indexed, and it later can be retrieved with a very simple interface.

Version 1 is designed to be a text-based product, with only a rudimentary command-line interface. Output is streamed to the console or screen and can be redirected to a file. Version 1 does not envision menus, dialog boxes, windows, buttons, or any of the other interface objects characteristic of graphical user interfaces, these are reserved for a subsequent release.

The command-line interface works with flags. A flag is a dash followed by a letter or word indicating a command. Every command can be abbreviated to a single letter. For example, -Index can be abbreviated as -I, and -Search can be abbreviated as -S.

Storing Data

The user stores data in ROBIN by one of three methods:

Direct entry
Providing a file as a note
Providing a file, each line of which is to be made into a note

To provide direct entry, the user enters

    ROBIN -I (Index)

followed by the text to index. All the text entered until the user presses Enter will be indexed on a new note.

Suppose that the user enters the following:

    ROBIN -Index Remember to put out the recycling on the second Tuesday of every month.

A new note will be created and time stamped with the following text:

    Remember to put out the recycling on the second Tuesday of every month.

The user can open any word processor or editor that saves flat-text files and enter as much text as wanted, without regard to new lines and paragraphs. The user then would save the file (for example, with the name NOTE1.TXT) and then feed that file to ROBIN by writing the following:

    ROBIN -File Note1.txt

ROBIN then would read the entire file and create a new, time-stamped note with the contents of that file.

Alternatively, the user can point to a file and ask ROBIN to incorporate a note that just references that file. In this case, the user might enter the following:

    ROBIN -Reference Note2.txt How to make chicken soup

This line would create a note with the text How to make chicken soup and a reference to the file NOTE2.TXT.

Finally, the user also can provide a file with records separated by a new line. Such a file might have phone book entries, with one entry on each line. If the file were called PHONE.TXT, the user would enter the following:

    ROBIN -Lines phone.txt

ROBIN would read the file and make a new time-stamped note for each line in the file.

A potential extension to this capability, not implemented in version 1, would be to let the users provide their own record delimiters, rather than using the new line character. Perhaps ROBIN could support the syntax -L";", where whatever appears between the quotation marks (in this case, a semicolon) would become the delimiter for new records. Thus, if the user wrote

    ROBIN -L";" myRecords.txt

ROBIN would open a new time-stamped note for each record, where a record is all the text between semicolons.

Searching for Data

The user searches for data by entering ROBIN followed by the command -Search and then a list of target words. The user might write the following line, for example:

    ROBIN -Search computer disk

This line will find any note with either the word computer or the word disk. Note that -? functions as a synonym for Search, so the user can enter the following:

    ROBIN -? computer disk

One issue to consider is whether to support wild cards. You may want to use comput*, for example, to match computer, compute, computing, and so on. A more advanced version of this capability would allow the user to enter p*y, which would match party, parity, pinkey, and so on. Some systems use the question mark (?) to match any single letter, so p??y would match pony and play, but not party. Version 1 will support only the wild card *, used at the end of a word to indicate any number of additional letters.

Future releases of ROBIN could be extended to include more powerful wild cards and perhaps Boolean search terms such as AND, OR, and NOT. You even may want to consider adding general regular expression parsing (GREP), which supports very powerful text searching.

Displaying Found Data

If the search returns no notes, the message Nothing found is displayed. If a single note is found, that note is displayed. If multiple notes are found, a menu of choices is provided.

In version 1, the user interface will be rudimentary. The menu of results will be numbered, with each entry displaying the first 30 characters of the note and its date. If the note is longer than 30 characters, the display will end with an ellipsis (). At the bottom of the message, a menu is displayed.

If the user chooses a note, it is displayed, and then the user is returned to the menu of notes.

Database Management

ROBIN should be fairly easy to maintain by the user. Periodically, however, the user will want to be able to delete notes and to pack the database. The user is provided with three tools for database management: Delete, Undelete, and Pack.

Every message has a message number, which is displayed along with the message itself. The user can enter

    ROBIN -Delete 123

to delete message 123. The system will display the first few lines of the message and the following prompt:

    Delete this message (y/N)?

The user can press Y and then Enter to confirm the deletion. Pressing any other letter or just pressing Enter cancels the deletion.

The user can request to see all deleted messages that have not yet been packed (removed from the database) by entering the following:

    ROBIN -Morgue

This provides a list of all notes marked as deleted, much like the list displayed after a search. A message can be undeleted by using the -Undelete flag. For example, ROBIN -U 123 will undelete message 123.

The deleted messages can be fully removed by packing the database with the -Pack flag. If the user enters

    ROBIN  -Pack

the system confirms with the following message:

    Really pack the database (y/N)

The user presses Y and Enter to confirm; pressing any other letter cancels the operation.

The system needs to know where to find its database of existing notes. It first searches for an environment variable, ROBIN. If this is set to a path, the system looks at that location for a file named ROBIN.DB. If the environment variable is not set, the system searches the user's path for the file. If ROBIN.DB is not found, the system displays the following message:

    No database found, create one (y/N)

Confirmation works in the usual way; if the user presses Y, a new ROBIN.DB file is created.

Getting Help

Entering ROBIN with no parameters prints the preliminary Help message:

    ROBIN (c) copyright 1994 Jesse Liberty Version 1.0
    Usage:  -Index | -File | -Reference -Lines | -Search (-?) | -Delete | -Morgue
     -Undelete | -Pack  | -Help

Entering ROBIN with -Help provides the Help prompt:

    Help is available on a number of subjects. Enter Help Help for a full explanation
    of Help, or enter Help followed by one of the following keywords: Index, File,
    Lines, Search, Delete, Morgue, Undelete, Pack, for help on that subject.

Help itself is just a set of notes provided with the software, which uses a special search to find the designated topics.

A Moving Target

Developers always want a solid, immovable specification coupled with the flexibility to change their own designs as needed. Reality is quite different; your clients will, no doubt, reserve the right to change their specifications at will. Other developers working on your project, on the other hand, will complain long and loud if you change the interface to classes on which they depend!

It would be nice if we were all smart enough to get our designs right the first time and to write code that compiles, links, and runs without bugs on the first attempt; none of that is likely any time soon, however. Design, like coding, is iterative; this is programmer talk for the need to keep trying after the first attempt fails miserably.

The Technical Specification

Supporting the functional specification with an object-oriented design takes a great deal of thought and effort. There is no one, clear, true, and perfect way to design anything, and the world is full of other designers who might do whatever you do differently.

That doesn't mean that there aren't good designs and bad designs, however, or that it is impossible to tell the difference between them. A good design is flexible and extensible, but not at the expense of being so general as to be insupportably complex. A good design adjusts to a changing specification, but still exhibits high performance and good reliability.

Iterative Design

I could cheat and write the whole program, debug it, release it, and then come back and write this chapter. It would be easy to do, and no one would be any the wiser. "What a genius," you would think, "his very first design worked perfectly." You would look inward with despair and lament "How come I can't get my design right the first time?"

Such an approach would leave you with that overwhelming feeling of inadequacy that technical writers delight in instilling in new programmers. But it would be a lie. No design of a complex program is 100 percent correct the first time you write it down. Let me rephrase that: No design of a complex program is 100 percent correct the first time I write it down.

What I will do, however, is present my unedited initial design, blemishes and all. I'll try to think out loud (okay, I'll try to think into my word processor, but let's not quibble). This will, I hope, give some insight into the design process.

Where Do I Start?

When creating a design from scratch, it is common to become overwhelmed by the complexity of the design. A good way to get grounded is to make a list of what you already know, and what you don't know.

In this case, you might write down that you know you will need notes, menus, and so on; but that you don't know whether there are derived types of notes, who will manage the display, how you will manage switching on the command line, and so on.

The second rule of good design is to concentrate on what 90 percent of the people will do 90 percent of the time. This rule was taught to me by Jay Leve at Citibank, and his point was that techies tend to get over-focused on boundary conditions (yes, but what if the user does this weird thing...). We lose sight of the 90-percent case.

Once your design handles the 90-percent case well, you always can work on the exceptions to the rule. You cannot stop at the 90-percent case, of course. Truly professional code is bulletproof and handles anything the user might possibly do, but that can be worried about fairly late in the design phase.

The third rule is not to confuse analysis with design. Although you will iterate over all of analysis, design, and programming, it is important to know when you are doing each. Analysis is the assessment of the problem domain, and design is exploration of the solution space. This goes back to differentiating between what you know and what you don't know.

It also is possible to become immobilized by conflicting designs. You finally decide on one approach, only later to realize that you should have gone with the earlier approach. You rip up your work, start over, and then halfway through decide you were right the first time.

This is called analysis paralysis (and a few other choice words). The only cure is, at some point, to make a decision and press onward. After you ship version 1, you may want to go back and redesign, but sooner or later it comes down to getting a product out the door. I would wager that more products have been destroyed by failing to ship than by poor design.

What Are the Initial Classes?

Certainly ROBIN will require a Note class. It is useful to append all the classes with a letter indicating that they belong to the same collection of classes, so I'll create an rNote class (pronounced are-note).

Because I'll be working with many rNotes, I'll need a Collection class, although it isn't immediately obvious what kinds of collections I'll need. A likely candidate is a sparse array. Sparse arrays are discussed in detail on day 9, but for now think of them as arrays with missing elements. The idea is to enable the user to delete rNotes without incurring any penalty, and to be able to recover the saved disk space.

I'll need a way to quickly find all the indexed words, both for when I'm adding new words and for when I'm conducting a search. Clearly, the words will have to be indexed with a tight and efficient structure, as described on day 7.

Regardless of what structure the words are stored in, clearly an rWord class will be required. Each rWord object will have the text of the term, and then a list of the rNotes associated with that rWord.

Assembling the menu will require yet another collection this time of rHit objects. An rHit object will point to an rNote and will keep count of the number of hits on that rNote. This count will allow ROBIN to display the notes in order of how well they match the terms.

Event-Driven Programming

When the user starts ROBIN with a flag, a series of actions follows. Typically, these actions end with the presentation of a menu. From that time until the program is exited, ROBIN must respond to the user's requests, known as events.

New Term: An event is anything that causes a response in your program. Typical events include the user pressing a key or clicking the mouse button, but also may include timer events and other interrupts provided by the system.

Event-driven programming is discussed in depth on day 13, but essentially the idea is to create a program that responds to the user's requests instead of imposing a sequence of actions on the user.

Object Orientation

The hallmarks of object-oriented design are encapsulation, inheritance, and polymorphism. The central idea of encapsulation is to ensure that implementation details of the program are hidden in the objects most likely to need to know those details. Rather than your program maintaining an omniscient overview of every detail, it delegates responsibility for much of the action to the various classes.

There is no reason for any part of your program other than the rWord class itself, for example, to know how rWord objects are kept in memory. The memory management is an implementation detail that the clients (users) of rWord don't need or want to know. It may be that rWord objects are kept in a linked list or a binary tree (see Day 7), but this is a detail wholly encapsulated by the rWord class itself.

Inheritance refers to the capability for new types to be built up out of existing types. When setting out to create an object-oriented design, you want to look for is-a and has-a relationships. An is-a relationship implies public inheritance, and a has-a relationship probably implies containment or perhaps private inheritance.

In the case of ROBIN, few such relationships are obvious immediately. The rWord object is not a kind of rNote, nor is the rNote a kind of rHit. Both rWord and rNote seem to be at the top of the inheritance chain. Over time, however, different kinds of rNotes may emerge. Perhaps, for example, the Help notes really are a different but related type. Perhaps Help notes need additional data or behavior, even though they are rNotes at heart. This would imply that rHelp objects are a kind of rNote and should inherit publicly from rNote. It is too early to be certain, however, so I'll just file that away as a possibility.

Polymorphism refers to the capability to treat objects in a common way and to trust the individual objects to "do the right thing." Therefore, if you do create an rHelp object and derive it from rNote, you can tell either object to display(), and each will respond according to the semantics of its own type.

Getting Started

When the user enters the following, a new note is created, indexed, and stored:

    ROBIN -I the quick brown fox

Now, stop right there. Who creates the note?

It would be nice to have a command-line parser that takes the flags and other terms entered on the command line and assumes responsibility to process them and create the right objects. Command-line processing is discussed in detail on day 6, but for now it will suffice to say that it is possible to read in these options as they are entered by the user, and to switch on the flags provided.

This implies that an rCommandLine object is needed, whose job it is to process the command line and create the right kinds of command objects. The command family might include the indexer, the searcher, and the database manager.

Presumably, the CommandLine object instantiates the right kind of command object, and that object in turn begins a process of fulfilling the request. In the case under discussion, an Indexer object would be created and would take, as the parameters to its constructor, the text of the message.

This constructor might be overloaded to take a file name, allowing for the -F flag as well as the -L flag. It may prove true that the indexer needs to be subclassed into two or three subclasses; such as CommandLineIndexer, FileIndexer, and LineIndexer; but that will become clearer as the design evolves.

The indexer will be responsible for creating the rNote, but not for storing the rNote in its collection, or for writing the rNote to disk; these functions will be the responsibility of the rNote constructor.

Additionally, the indexer will need to create rWord objects for every word in the rNote's text. These rWord objects will, likewise, be responsible for storing themselves in the index, as well as for writing themselves to disk.

It isn't clear whether the indexer has much to do that isn't covered in these other classes, so I'll consider having the CommandLine object just create an rNote object, bypassing the rIndexer altogether. The rNote then could index itself and then store itself, without any other help at all.

If the user enters an -S flag, the rCommandLine object will not create an rIndexer; instead, it will create an rMenu object. The rMenu will need to know how to get the index of words, and how to interact with that index to create a menu of rHits.

Likewise, if the user enters one of the database commands (such as Pack or Undelete), the rCommandLine object probably will create an rDataBaseManager object, which will take over responsibility for managing the database.

In summary, and after reflection, here's the plan: main() will pass the command line to the CommandLine object, which will read and switch on the flag. When it reads -i, -l, or -f, it will create a new rNote, calling the appropriate constructor and passing in the flag and the rest of the command line.

When the rNote is constructed, it will instantiate rWord objects for every word in the note; then it will time-stamp itself and store itself away.

If the rCommandLine object receives -S or -?, it will create an rSearcher object; and if it receives a database switch, it will create an rDataBaseManager object. The rCommandLine object also will be responsible for managing errors on the command line, and possibly for working with the environment variables to find the appropriate files.

Abstract Data Types

The program allows for the creation of two types of notes: with and without file references. Most of the objects don't really care which type of rNote is being used, but the two types must be distinguished in their behaviors.

The need to distinguish between notes with and without file references indicates that the two types of notes are related, but in what way? Call the type of note with a file an rFileNote, and call the type with all its data kept as text an rInternalNote. Does one inherit from the other?

An rFileNote might at first appear to be a type of rNote, indicating that it is derived from rNote, but there are methods in rNote that rFileNote doesn't need or care about. On the other hand, rFileNote has data and methods relating to the referred to file that rNote certainly doesn't need.

The answer is to create a common base class, rNote, and inherit both rFileNotes and rNote from that base. Because you never will create a pure rNote, you can make the rNote an abstract data type. Most of the clients of your code will deal with references or pointers to rNotes, but the actual objects will be of one of the derived types.

Searching

When the CommandLine object switches on an -S flag, it will instantiate an rMenu object, passing in the terms on which to search. The rMenu object will ask the collection of rWords to find each term in turn. It then will ask each rWord for its list of rNotes, adding rHit objects to its own collection for each rNote as it is found. The rMenu will add only one rHit object for each rNote; multiple hits on the same note will be tracked by a counter in the rHit object.

Note that each object does only those things that make sense for it to know how to do. The collection knows how to find each matching rWord. The rWord itself knows which notes it is associated with. The rHit object knows how many hits have been registered on each rNote.

The Collections

Although it is too early to describe the various collections in detail, some aspects now are well understood.

Make note of the fact that all our Collectable classes (rWord, rNote, and rHit) probably will need nodes as well. This is a good indication that a template may be required, even if these node objects may be stored in different types of collections.

Getting Started

It is impossible to write the program all at once, and it isn't wise to try to do so. My preference is to write a small part, get it working, and then write some more.

That said, I do find it useful to write a preliminary version of the interface to the principal classes even before getting any of it working. This provides a reference point, although very often the initial interface to the class is barely recognizable in the final product. Listing 5.1 provides a first cut at an rNote class.

Listing 5.1 Using the rNote Class

    1:     // Listing 5.1 - Using the rNote Class
    2: #include "string.hpp"
    3:
    4:     class rNote
    5:     {
    6:     public:
    7:
    8:        // constructors
    9:        rNote(char Flag, const String * const text);
    10:       rNote(const DeSerializer&); // for reading from disk
    11:       rNote(const rNote&);  // copy constructor
    12:       ~rNote();
    13:
    14:       // accessors
    15:       const String& GetText() const { return *myText; }
    16:       ULONG GetNoteNumber() const { return myNoteNumber; }
    17:       ULONG GetCreationDate() const { return myCreationDate; }
    18:       ULONG GetModificationDate() const { return myModificationDate; }
    19:       BOOL IsDeleted() const { return myIsDeleted; }
    20:
    21:       void SetText(const String& rhs)  { *myText = rhs; }
    22:       void SetNoteNumber(ULONG num)  { myNoteNumber = num; }
    23:       void SetCreationDate(ULONG newDate) { myCreationDate = newDate; }
    24:       void SetModificationDate(ULONG newDate) { myModificationDate = newDate; }
    25:       void SetDeleted(BOOL flag) { myIsDeleted = flag; }
    26:
    27:       // operators
    28:       const rNote& operator=(const rNote&);
    29:       DeSerializer operator>>(DeSerializer &);
    30:
    31:       // Display
    32:       void Display();
    33:
    34:       //
    35:
    36:    private:
    37:       String* myText;
    38:       ULONG myNoteNumber;
    39:       ULONG myCreationDate;
    40:       ULONG myModificationDate;
    41:       BOOL  myIsDeleted;
    42:
    43:    };

Output:

None. This is a class declaration.

Analysis:

The rNote object takes three constructors. The first, shown in line 9, is called by the CommandLine object, and takes a flag (I, F, or L) and a block of text. There is no convenient way to overload this function based on whether the text actually is to be indexed or is the name of a file; it is simpler to just have the constructor switch on the flag.

The second constructor, in line 10, takes a DeSerializer reference. Although the design has not yet specified how objects will be created from disk, it is clear that a constructor will be required that will take some sort of object from which the rNote will be created. Persistence is covered in depth on day 8, "Writing Objects To Disk", and at that time the nature of this DeSerializer can be determined.

A copy constructor (line 11) and an operator= (line 29) are provided, because this class controls dynamic memory in its myText pointer.

Various accessor functions (lines 15 through 25) allow the client to obtain and to set the various member data, which itself is in lines 37 through 41. The operator>>() function, shown in line 29, is a stand-in for whatever function will be used to serialize the object's data to disk.

The rWord Class

Although rNotes consist of strings, the user actually will search on words, as described earlier. The preliminary rWord class is shown in listing 5.2.

Listing 5.2 Using the Preliminary rWord Class

    1:      // Listing 5.2 - Using the Preliminary rWord Class
    2:
    3:      class rWord
    4:      {
    5:
    6:      public:
    7:
    8:         // constructors
    9:         rWord(const String&);
    10:        rWord(const char * const);
    11:        rWord(const DeSerializer&); // for reading from disk
    12:        rWord(const rWord&);  // copy constructor
    13:        ~rWord();
    14:
    15:        // accessors
    16:        const String& GetText() const { return *myText; }
    17:        void SetText(const String& rhs)  { *myText = rhs; }
    18:        const rNoteNode& GetHeadNode();
    19:        void SetHeadNode(const rNoteNode&);
    20:
    21:        // operators
    22:        const rWord& operator=(const rWord&);
    23:        DeSerializer operator>>(DeSerializer &);
    24:
    25:        // data manipulation
    26:        long AddToIndex();
    27:
    28:        // Display
    29:        void Display();
    30:
    31:        //
    32:
    33:     private:
    34:        String* myText;
    35:        rNoteNode myFirstNode;
    36:
    37:     };

Output:

None. This is a class declaration.

Analysis:

Listing 5.2 represents a preliminary first take at the rWord class declaration. There are four constructors that provide the capability to create an rWord object from a String object (line 9) or from a C-style string (line 10). The constructor in line 11 creates rWord objects from disk, and line 12 includes the copy constructor.

The rWord object contains a pointer to the word itself, along with an array of nodes, each of which point to an rNote object.

Each rNoteNode will contain the offset of the word in the text, as well as the ordinal position of the word in the text. This data will be unused in version 1, but is reserved for future use. The offset will be used to highlight the word in the text after ROBIN is ported to a GUI. The ordinal position will be used for more sophisticated searching, so that the user can request matches of this word in proximity to another word (computer within five words of screen, for example). Listing 5.3 provides the preliminary interface of the rNoteNode object.

Listing 5.3 Using rNoteNode

    1:      // Listing 5.3 - Using rNoteNode
    2:
    3:      class rNoteNode
    4:      {
    5:
    6:      public:
    7:
    8:         // constructors
    9:         rNoteNode(const rNote&);
    10:        rNoteNode(const rNote&, ULONG offset, ULONG position);
    11:        rNoteNode(const DeSerializer&); // for reading from disk
    12:        rNoteNode(const rNoteNode&);  // copy constructor
    13:        ~rNoteNode();
    14:
    15:        // accessors
    16:        ULONG getNoteNumber() const { return myNoteNumber; }
    17:        void setNoteNumber(ULONG number) { myNoteNumber = number; }
    18:
    19:        ULONG getOffset() const { return myOffset; }
    20:        void setOffset(ULONG number) { myOffset = number; }
    21:
    22:        ULONG getPosition() const { return myPosition; }
    23:        void setPosition(ULONG number) { myPosition = number; }
    24:
    25:        rNoteNode* GetNext() { return myNextNode; }
    26:         void SetNext(rNoteNode* next) { myNextNode = next ; }
    27:
    28:        // operators
    29:        const rNoteNode& operator=(const rNoteNode&);
    30:        DeSerializer operator>>(DeSerializer &);
    31:
    32:     private:
    33:        ULONG myNoteNumber;
    34:        ULONG myOffset;
    35:        ULONG myPosition;
    36:        rNoteNode *myNextNode;
    37:     };

Output:

None. This is a class declaration.

Analysis:

Listing 5.3 provides the preliminary interface to the rNoteNode class. When a word is added to the index, the rNote in which it appears must be registered, along with its position and offset in that rNote. To manage this relationship, an rNoteNode is created and added to the rWord's list.

The constructor for rNoteNode can take an rNote object, either with the offset and position of the matching word (line 10) or without (line 9). rNoteNodes can be stored to and retrieved from disk (lines 11 and 30), and a copy constructor (line 12) and operator=() (line 29) are provided as well.

Reviewing a Technical Specification

With the preliminary class interfaces and decisions made earlier in this chapter, you are ready to outline your technical specification sufficient to start writing the program.

The classes will be arranged in a number of data structures, all of which can be written to and read from disk. rWord objects will be indexed for quick retrieval in a tree structure to be determined later. Each node in that tree, however, will consist of an rWordNode object.

The rWord also will maintain a list of rNoteNode objects, each of which will provide access to the rNote in which the word appears, along with its offset and ordinal position.

When a menu of choices is created, it will consist of a list of rHit object nodes. Each rHit object will maintain a link to the particular rNote, and will provide a count of the number of matches in the current search. This method allows the menu to be sorted by how well each rNote matches the search.

Next Steps

To get started, you first will want to explore how the command flags will be read and processed. This process is covered on day 6 in an in-depth study of command-line processing, environment, and configuration files.

Next, you will want to create the set of indexed words, so on day 7 you learn about trees and complex data structures.

After you can read the command line and index the words, you need to be able to save that index to disk so that you don't have to re-create it each time you rerun the program. On day 8, you learn about object persistence.

The rMenu will be a collection, but probably not an indexed tree as the list of rWords will be, so on day 9 you learn about Collection classes. On day 10, you learn about providing iterators to your collections.

With these skills, you will be able to build a solid preliminary version of ROBIN. On day 11, you learn how to manage the growing database and keep the data free of corruption.

Up until day 11, you will be working with single words, but ROBIN needs to be able to parse all the words from a string. Therefore, on day 12, you learn more about text parsing. At that point, it will be time to turn to the user interface, building the menu of choices and responding to the user's choices. Day 13 includes an in-depth discussion of event-driven programming.

The second week ends with a review of object database issues, and here you learn how to delete and pack records, as well as how to provide the Undelete facility.

The third week takes a detour to talk about manipulating bits, which can be an important skill in working with databases. On day 16, you learn some advanced memory-management techniques.

On day 17, you look at providing synonyms for search terms and on reducing the size of your index through the use of stop words.

On day 18, you examine the performance characteristics of your program; on day 19, you explore advanced exception handling and error management. Day 20 reviews techniques for debugging your application, and the three weeks end with a discussion of marketing your program and enhancing it.

Other Designs: CLiDE

Although this book focuses on the Personal Information Manager, it can be beneficial to take a look at other products and designs from time to time. Doing so can help bring out other approaches and algorithms, and these then can be contrasted with the design of ROBIN to help build a more rounded picture of what professional programming is all about.

With this goal in mind, I will describe a second project--a command-line directory enhancer named CLiDE. As shown in the coming chapters, CLiDE provides all the normal directory capabilities (listing the files, showing their attributes, sorting them, and so on), along with some additional features that you will not see implemented but which I leave, as they say, as an exercise for the reader.

CLiDE's Functional Specifications

As stated previously, all good programs begin with a specification and a design. From a functional viewpoint, CLiDE is a powerful command-line utility that enables you to list files, combine directories, filter and sort the output, and call on DOS commands and internal commands to take action.

In version 1, CLiDE implements the capability to sort the directory based on file name, extension, size, and date. It also enables you to filter based on the usual DOS and UNIX wild cards. Internal commands such as Copy, Move, View, and others will be planned for, but not implemented, in version 1.

Portability Issues

One of the first issues facing such a utility is that it is, by its nature, operating-system specific. Although UNIX and DOS directories seem similar, they are quite different in some critical ways (for example, DOS doesn't use INODES). The MAC System 7 is so different that even the functional specification will not match.

If you don't have a DOS machine, don't worry; you will not spend much time implementing CLiDE in any case. I'm using it just to illustrate some of the other issues that might arise in object-oriented design.

The User Interface

To continue with the functional specification, you use CLiDE by typing CLIDE at the DOS prompt, followed by one or more switches and one or more file specifications. The switches are each a single letter, and they can be combined (you can write CLIDE -i -a -o or you can write CLIDE -iao, for example). The switches can appear in any order, and can follow or precede the file specifications.

Some switches require an argument; these always are followed by a colon and then the argument, with no spaces. In future versions, CLiDE will allow advanced wild cards in the file specifications, including general regular expressions. For version 1.0, however, only the standard DOS wild cards will be supported.

The result of a CLiDE search will be a set, and sets can be named or unnamed. Sets can be mixed and matched with each other, using set operators such as Union and Intersection. Named sets can be saved in CLiDE configuration files, and recalled as needed. A set is not a group of files, it is more like a stored query against the file system. A set is a specification for a sorted, filtered group of files.

The Technical Specification

CLiDE will attempt as much as possible to isolate the platform-specific code from the platform-independent code. It is anticipated, for example, that the DOS version of CLiDE might be migrated to Windows and then to Windows 95.

What Are the Objects?

Even a preliminary design of CLiDE is likely to include directory objects, file objects, sets, collections, filters, and sorters. One can imagine that the CommandLine object from ROBIN might work with CLiDE as well; parsing the flags here is a superset of what is required for ROBIN.

You immediately confront this question: Are directories collections of files, are they types of files, or are they something altogether different? At first glance, a directory might seem to be a collection, because the contents of the directory are files and other directories. DOS, however, considers directories and files as being very much the same when it comes to locating and displaying their names.

It is possible to combine both concepts into one hierarchy of objects by saying that directories inherit publicly from files and therefore are a kind of file, but that directories add the capability to contain files and other directories. A directory therefore would inherit much of the display and other characteristics of the file, but would add its own methods to manage its collections.

A second immediate design question is this: Do sets know how to filter their contents, or are there filter objects that know how to act on sets and directories? Similarly, do sets know how to order their collections, or are there sorters that know how to do this?

Prototyping

One way to test your tentative answers to these design questions is to build a prototype. Because CLiDE is a large, complex utility, there is great risk inherent in plowing ahead with a design before you have done any testing.

Building a prototype provides a safer alternative, one that enables you to prove your early design assumptions and explore which areas of implementation are likely to be most difficult.

Carefully deciding which methods to implement first is more important than it might sound at first. It might be tempting to get all the sorting working before worrying about the filtering, but that would undermine the point of the early prototype version. The idea is to get one sort and one filter criterion working, to display one type of directory listing, and to implement other representative methods.

Sorting, Filtering, and Displaying

The essence of CLiDE is this: Read the command-line switches, find the requested files, filter based on the switches, sort based on the switches, and then display 20 lines at a time.

CLiDE is, at least at first glance, quite procedural. The actual work of assembling the files is likely to be a large for loop that iterates through the different files in the requested directories and applies the filters. After the list is assembled, it will need to be sorted and then displayed.

Version 1 of CLiDE probably isn't even event driven; after the files are displayed, the program exits. This will change when CLiDE is ported to a windowing environment, where the user might ask for files to be moved or copied by dragging-and-dropping within the GUI or by choosing menu commands.

Object-oriented design and programming does have a role to play in the creation of CLiDE, however. The relationship between the files and the directories and sets cries out for inheritance. The discrete jobs of sorting, filtering, and displaying call for encapsulation. Finally, the capability to treat directories and files as a single type, asking each to display itself or otherwise be manipulated, looks very much like a candidate for polymorphism.

CLiDE's First Steps

Over the next few days, you will see many of the component requirements for CLiDE: sorting, filtering, parsing the command line, and so on. Even as you work on ROBIN, keep an eye out for how the lessons apply to CLiDE and to other projects of your own that you might have in mind. Think of CLiDE as ROBIN's dopey younger brother along for the design ride and taking the hand-me-down algorithms.

Q&A

Q: Why do C++ programs require more analysis and design than C programs?

A: The purpose of C++ is to make it possible to write much larger and more complex programs than were practical in C. The cost, however, is more time spent thinking before you start to code.

Q: Is analysis and design in C++ different than in C?

A: Yes. You think in a very different way. In C, or in any procedural language (BASIC or Pascal, for example), you start the analysis by asking yourself "What is going to happen? What are the significant operations?" In C++, or in any object-oriented language (Smalltalk or Eiffel, for example), you start by asking yourself "What are the important things in this problem domain? What are the responsibilities of each thing? What does it need to know?"

Q: Why are inheritance and polymorphism treated as different things? Isn't polymorphism just one of the uses for inheritance?

A: In C++, polymorphism is implemented via the inheritance hierarchy: You define an abstract type with some virtual functions, and then you override the functions in your subclasses. In other object-oriented languages, however, polymorphism has nothing to do with inheritance. In Smalltalk, for example, any object can be sent any message (the equivalent of calling a method), as long as the class has declared an operation for that message. There is no need for two classes to share a common base class in order for them to be able to respond to the same message.

The term inheritance (when talking about encapsulation, polymorphism, and inheritance) refers to the idea of subclassing existing, working classes; modifying only a small part of their functionality; and very quickly producing other working classes. Although C++ uses the inheritance relationship to implement polymorphism, these two terms are not intrinsically related.

Q: How detailed should the technical specification be? What is its purpose?

A: The technical specification should be detailed enough that you understand the types of data that will be important, and the range of operations that you will want to do with the data. You should be able to comprehend the breadth and borders of the problem domain. There isn't much point to specifying the user interface in minute detail, because it probably will change three or four times before you ship the product anyway.

In ROBIN, for example, the basic functionality has been set down, and a very simple command-line user interface has been proposed for invoking the different operations. This is good enough to know what operations are needed, and the user interface will be fine for testing those operations, but it probably will change to something more user friendly before we "ship."

Q: How detailed should my analysis and design be before I start implementing?

A: This depends on the scope of the project. For a small, one-person project, you probably can get away with just a few notes scribbled on a sheet of paper. For a large, 40-person effort, you will need much more. Remember that the goal of the initial design is to implement something that works. The sooner you get something working, the sooner you will locate the inevitable design bugs. With this goal, you should identify the classes that are needed for minimal functionality and design those, considering (but not designing in detail) the immediate clients of these classes. Then get started on the code.

Q: How do I know when to use an abstract data type?

A: Whenever you are designing the public interface for a class, you should be thinking about the clients (the other classes or code) that will use this class. You need an ADT when your clients want to use more than one class as if they were all of the same type, but each class needs its own mutually exclusive data or behavior.

In ROBIN, for example, two types of notes exist: the regular notes that are stored in the file ROBIN.DB, and the notes that have the body stored in another file. For the latter type, only the file name is stored in ROBIN.DB.

Think about the clients of these two types of notes: They are whatever classes display notes, and whatever indexes notes. You may not know much (right now) about how the notes will be displayed, and less about how they will be indexed, but you know that the job of writing the displayer and the indexer will be much simpler if they just have to deal with the abstract concept of an rNote. It's true that the process of getting the text for one type of note is much harder than getting it for the other, but that is the problem of the specific types of notes. The displayer and the indexer don't care, and shouldn't have to care.

Workshop

The Workshop provides quiz questions to help you solidify your understanding of the material covered, and exercises to provide you with experience in using what you have learned. Try to answer the quiz and exercise questions before checking the answers in Appendix A, and make sure that you understand the answers before continuing to the next chapter.

Quiz

What are the hallmarks of a good design?
What are the major steps involved in creating a piece of software?
What is the first step of the analysis and design of a program, after you have figured out more or less what the program is going to do?
In ROBIN, what is the purpose of the rNote class?
In ROBIN, what is the purpose of therWord class?
In ROBIN, what is the purpose of therHit class?

[Click here for Answers]

Exercises

Suppose that you are designing an e-mail system. What will it need to do (the functional specification)? You don't need to detail how the user will invoke these operations the way you did for ROBINjust identify the operations.
What are the major classes in the problem domain? (Remember, skip the storage and transport issues.)
Flesh out what is contained in each of your four major classes. Identify the type of the data members, if it isn't obvious.
The To field in both MailMessage and MailingList seems awkward. Any ideas? (Hint: What should you think about when two things are used in the same way by some clients?)
Redo exercise 3, in light of your clever design decision in exercise 4. Be sure to show the inheritance relationships.

Go to: Table of Contents | Next Page