Frakes - WISR9 Position Paper

Domain Engineering Education

Bill Frakes

Virginia Tech

Computer Science Department

Falls Church, VA 22042

Tel: (703) 698-4712, fax: (703) 698-6062

Email: wfrakes@vt.edu

URL: http://frakes.cs.vt.edu

Abstract

This paper presents some observations and research questions about domain engineering, based on academic and industrial teaching and research.

Keywords: domain analysis, education, systematic reuse

Workshop Goals: Discuss domain engineering and education issues with other active researchers.

Working Groups: Key issues and educational methods for domain engineering.

1 Background

Domain engineering is the key to systematic software reuse. Reuse education is also a key factor in making systematic reuse work. Many years of experience teaching domain engineering in both industry and academia have surfaced several research problems in this area. Some of these issues have been discussed in working groups at previous reuse workshops, e.g. [ASSET 93], but remain unresolved.

2 Position

To learn how to do domain engineering, people often seem to need to be shown, or to work through, a problem in a domain they know well.

Search and retrieval are much more difficult and important than many researchers believe. Most students do not know how to search effectively.

Finding example domains for instruction is difficult.

The parts of domain analysis that can be automated are easiest to teach. The parts requiring induction, model building, pattern matching and so on are much more difficult.

Linking domain analysis and domain implementation is critical, but insufficiently understood [Frakes 98].

3 Approach

I have been teaching systematic reuse and domain engineering for many years now, both in industry and academia. My observations here are based on data collected from these courses, and on research on domain engineering and systematic reuse, e.g. [FPF98]. One source of data was an advanced topics graduate course I taught at Virginia Tech in the spring semester of 1998 (http://sarvis.cs.vt.edu/~frakes/6704s98.html). The course was presented live to seven students at the NOVA campus, and via two-way television to seventeen students at the Blacksburg campus. Most of the NOVA students were part-time students who work in technology companies around the beltway. Most of the Blacksburg students were full-time. Each student was required to take a midterm and to write a report on a reuse book.

I also had each student do a domain analysis of conflation algorithms [Frakes 92]. Term conflation is the process of relating variant word forms. Words may be related semantically or phonetically. Semantic relationships involve words that have a common meaning but variant forms. Phonetic relationships involve words with the same or similar sounds, but different spellings, e.g. Khan, Cohen, Kahn, Kohn. This is usually handled with algorithms like Soundex. I had previously tried using the domain of information retrieval systems, but found it too large and complex for a semester course.
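As an illustration of phonetic conflation, the following is a minimal Python sketch of the classic American Soundex coding. It is not taken from the course materials; the names CODES and soundex are assumptions made for this example.

```python
# Minimal sketch of classic American Soundex (illustrative, not course code).
# Consonants are grouped into six digit classes; vowels and y carry no code.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a name ("" if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    result = first.upper()
    prev = CODES.get(first, "")      # code of the previously examined letter
    for ch in letters[1:]:
        if ch in "hw":
            continue                 # h and w do not separate equal codes
        code = CODES.get(ch, "")     # vowels and y map to "" and reset prev
        if code and code != prev:
            result += code
        prev = code
    return (result + "000")[:4]      # pad with zeros, truncate to four


if __name__ == "__main__":
    for name in ("Khan", "Cohen", "Kahn", "Kohn"):
        print(name, soundex(name))
```

Under this coding Khan, Kahn, and Kohn all map to K500, while Cohen maps to C500, because classic Soundex keeps the first letter of the name. Grouping all four names would need an additional rule for the leading letter, which is the kind of variation a domain analysis of conflation algorithms has to capture.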

I used the DARE domain analysis method in the class, with the following major steps.

1. Domain Feasibility Analysis

2. Scoping the Domain

3. Domain Search

4. Develop Domain Book Structure

5. Vocabulary Analysis

6. Code Analysis

7. Expert Analysis

Project deliverables were:

- a domain search

- domain scope document

- facet/template analysis

- code analysis

- a generic architecture

- a feature table

- implementation of at least one reusable component, either n-gram or successor variety (Hafer-Weiss)

- certification of component(s)

- a written summary of each student project describing what the student did and how

- implementation of an application generator (extra credit)
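To make the n-gram component in the deliverables concrete, here is a minimal Python sketch of one common formulation of n-gram conflation: two words are compared by the unique digrams they share, scored with Dice's coefficient. This is an illustration under assumptions, not a student deliverable; the function names unique_ngrams and dice_similarity and the default n=2 are choices made for this example.

```python
def unique_ngrams(word, n=2):
    """Return the set of distinct character n-grams in a word."""
    word = word.lower()
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice_similarity(a, b, n=2):
    """Dice's coefficient over unique n-grams: 2C / (A + B).

    A and B are the numbers of unique n-grams in each word,
    and C is the number of unique n-grams they share.
    """
    grams_a = unique_ngrams(a, n)
    grams_b = unique_ngrams(b, n)
    if not grams_a and not grams_b:
        return 0.0               # words too short to form any n-gram
    shared = grams_a & grams_b
    return 2 * len(shared) / (len(grams_a) + len(grams_b))


if __name__ == "__main__":
    # 7 and 8 unique digrams, 6 shared: 2 * 6 / (7 + 8) = 0.8
    print(dice_similarity("statistics", "statistical"))
```

A conflation component built on this measure would treat word pairs whose similarity exceeds some threshold as variants of the same term. The threshold is a tuning parameter and is not specified here.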

There was much early reuse work on the searching and library problem. A consensus then developed that this was largely a solved problem, and that the main research in the area needed to focus on how to create reusable artifacts. I no longer believe that this is true. Students had real difficulty finding existing domain specific artifacts on the web. Many implementations of code components in this domain were available, but students couldn't find them. Typical of their input was the following email I received from the student who eventually received the highest grade in the class.

I'm still a little confused about what we should produce for the code analysis part of the project. I know we will try to come up with a generic architecture by looking for similarities in the code. I think this will be hard, considering the fact that I have only found code for one algorithm (Porter). Are we supposed to compare different implementations of the same algorithm?

I found in working with the students that they did not know how to formulate good search queries.

Another problem in teaching domain engineering is that students seem to need to develop a somewhat deep understanding of a domain in order to analyze it and thus understand the domain engineering process. I often find this in industrial teaching where students are very uneasy until examples can be given in a domain they know. I believe Don Batory has also noted this.

Another problem is finding an example domain that is rich enough to require a domain analysis, but small enough to be analyzed within a semester or an industrial class.

The parts of domain analysis that we have been able to automate, such as domain text analysis, are easiest to teach. Much of domain analysis still lacks a repeatable process.
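As a small illustration of why text analysis lends itself to automation, the following Python sketch counts candidate domain terms across a set of documents. It is a simplified stand-in, not the DARE tooling; the stop word list, the tokenization rule, and the function name term_frequencies are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop word list for the example.
STOP_WORDS = frozenset({
    "a", "an", "and", "the", "of", "to", "in", "is", "are",
    "for", "that", "with", "or",
})


def term_frequencies(documents, stop_words=STOP_WORDS):
    """Count how often each non-stop-word token occurs across documents."""
    counts = Counter()
    for text in documents:
        # Tokens are maximal runs of ASCII letters, compared in lowercase.
        for token in re.findall(r"[a-z]+", text.lower()):
            if token not in stop_words:
                counts[token] += 1
    return counts


if __name__ == "__main__":
    docs = [
        "A stemmer removes suffixes.",
        "The stemmer maps variant word forms to a stem.",
    ]
    for term, count in term_frequencies(docs).most_common(3):
        print(term, count)
```

High-frequency terms from such a count are only candidates for the domain vocabulary. Deciding which of them name real features or facets is the inductive, model-building step that remains hard to teach.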

References

[ASSET 93] ASSET_A_541: Reuse Education and Training Workshop Proceedings, October 1993

[FPF98] Frakes, William, Ruben Prieto-Diaz, and Christopher Fox, "DARE: Domain Analysis and Reuse Environment", Annals of Software Engineering 5 (1998), pp. 125-141.

[Frakes 98] Frakes, Bill, "Linking Domain Analysis and Domain Implementation", Proceedings of Fifth International Conference on Software Reuse. 1998. Victoria, BC: IEEE CS Press, pp. 348-349.

[Frakes 92] Frakes, W. B., "Stemming Algorithms", in Frakes, William B. and Baeza-Yates, Ricardo (Eds.) Information Retrieval: Data Structures and Algorithms, Englewood Cliffs, NJ: Prentice-Hall, 1992. pp. 131-160.

Biography

Bill Frakes is an associate professor in the computer science program at Virginia Tech. Recent reuse activities include editing an issue of the Annals of Software Engineering on Reuse (http://www.baltzer.nl/ansoft/5.html), and participating in the recent European Reuse Workshop. He chairs the IEEE TCSE committee on software reuse, and edits ReNews.