The Results

We have just started to analyze the data and cannot yet report everything that might be lurking in them. We plan to document statistical details of the following (and other results) in a future paper.

Assignment 1

Examining the average total effort data for the two parts of the assignment (Table 1), we noted that, overall, the students spent less than half as much total time on the layered implementation as on the non-layered one. Even those who did part (a) first spent less total time on the layered implementation than on the non-layered one. Looking at design/coding effort alone gave a similar picture.

Table 1:

Average Total Times for Assignment 1

          Group A          Group B         All
          (Layered First)  (Direct First)  Students
Layered       145               57            101
Direct        182              261            222
Total         327              318            323

To test the statistical significance of these observations, we performed an analysis of variance [Hicks 73], looking for the significance of three primary effects on the total effort required for the assignment: (1) the effect due to the treatment, i.e., the difference in times to implement the secondary operations with layering and without layering; (2) the effect due to the group, i.e., the effect, on total time to do the two implementations, of the order in which layering and non-layering were done; and (3) the interaction effect between treatment and order, i.e., the potential ``learning'' effect that completing the first implementation had on the time to do the other implementation. Our nested-factorial model also included the effect due to students within groups and the interaction effect between students and treatments, but these effects were untestable because we had only one point per student for each level of treatment. In this model, effects (1) and (3) are tested against the interaction between students and treatments, while effect (2) is tested against the student effect. We looked for F values that were significant at the 5% level; with 1 and 16 degrees of freedom, the minimum significant F is 4.49.
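
In conventional nested-factorial notation (our reconstruction of the design; [Hicks 73] gives the general form), the model is

    y_{ijk} = \mu + T_i + G_j + (TG)_{ij} + S_{k(j)} + (TS)_{ik(j)}

where T_i is the treatment (layered or direct), G_j is the group (order), and S_{k(j)} is the student nested within group. The 16 student degrees of freedom in Table 2 correspond to nine students in each of the two groups. With only one observation per student per treatment level, (TS)_{ik(j)} is confounded with residual error, which is why the student and student-by-treatment effects themselves cannot be tested.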

We found (Table 2) that effects (1) and (3) were statistically significant, and that effect (2) was not. That is, non-layering took significantly more total time than layering. Furthermore, there was an apparent learning effect, in the sense that the time students spent on the treatment condition they faced first was significantly greater than the time they spent on the one they faced second. We found no significant difference between the two groups in the total time to do both parts of the assignment.

Table 2:

Analysis of Variance for Total Time for Assignment 1

Source/Effect                 df  Sum of Squares  Mean Square      F
Treatment (layering)           1         130,321      130,321  23.70*
Group (order)                  1             160          160   0.02
Treatment X Group (learning)   1          63,001       63,001  11.46*
Student (within Group)        16         149,566        9,348
Treatment X Student           16          87,970        5,498

* Significant at the 5% level, i.e., F > F(1, 16) = 4.49.
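
The F values in Table 2 follow directly from the mean squares: each testable effect is divided by its designated error term. As a check on the arithmetic, the following Python sketch reproduces them (the mean squares are copied from Table 2; scipy is assumed only for computing the critical value of F):

    # Reproduce the F ratios in Table 2 from its mean squares.
    from scipy.stats import f

    ms_treatment       = 130_321  # layering effect
    ms_group           = 160      # order effect
    ms_learning        = 63_001   # treatment-by-group (learning) effect
    ms_student         = 9_348    # students within groups: error term for (2)
    ms_treat_x_student = 5_498    # treatment-by-student: error term for (1), (3)

    print(ms_treatment / ms_treat_x_student)  # ~23.70, significant
    print(ms_group / ms_student)              # ~0.02, not significant
    print(ms_learning / ms_treat_x_student)   # ~11.46, significant

    # Minimum significant F at the 5% level with 1 and 16 degrees of freedom.
    print(f.ppf(0.95, dfn=1, dfd=16))         # ~4.49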

These data indicate a measurable productivity advantage when secondary operations are implemented without violating A/E/L principles. Several students noted in their lab reports that it was far easier to think abstractly about queues when designing and coding the secondary operations than it was to worry about the nodes and pointers of the underlying representation. This seems to be the most reasonable explanation of the observed data—exactly what A/E/L advocates might have predicted.

The lack of a significant effect due to order is also plausible from common sense. While there is reason to expect that something about the task will be learned from the first treatment condition, in fact the mode of thinking, algorithms, and code for layering and non-layering are quite different. Therefore, the total time to complete both parts of the assignment should (intuitively) be independent of which one was done first. Indeed, this is what we observed.

We also found a significant difference in the quality of the code, as measured by the number of bugs causing run-time errors that students found and fixed before testing revealed no more. The layered implementations had significantly fewer bugs than the non-layered ones. Using the Mann-Whitney U test [Downie 65], we rejected, at the 5% level, the hypothesis of no difference between the number of bugs in the layered and non-layered implementations.
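
For readers who want to run this kind of comparison themselves, the sketch below applies the test in Python with scipy; the per-student bug counts in it are hypothetical placeholders, not our data:

    # Mann-Whitney U test on per-student bug counts.
    # The counts below are hypothetical, for illustration only.
    from scipy.stats import mannwhitneyu

    layered_bugs = [0, 1, 0, 2, 1, 0, 1, 1, 0]  # hypothetical
    direct_bugs  = [2, 3, 1, 4, 2, 3, 2, 5, 3]  # hypothetical

    stat, p = mannwhitneyu(layered_bugs, direct_bugs, alternative="two-sided")
    print(stat, p)  # reject "no difference" at the 5% level if p < 0.05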

Assignment 2

In the second assignment the students undertook a typical maintenance task: change the representation of an abstraction and all the code that depends on it. Using layering, as in part (a) of the first assignment, means that the code for the secondary operations can be written once and certified to be correct. A change to the underlying representation costs only as much as changing the primary operations. The students, however, also had to change the secondary operations, because they were implemented without layering. It was this extra—and with A/E/L principles, unnecessary—effort that the assignment was intended to help us measure.
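
To make the cost difference concrete, here is a sketch of a layered secondary operation (in Python; the queue interface and operation names are our own illustration, not the assignment's actual specification). Because it uses only primary operations, it needs no change when the representation changes:

    # Illustrative sketch: a secondary operation layered on primary operations.
    # The names (enqueue, dequeue, is_empty) are hypothetical.
    class Queue:
        def __init__(self):
            self._items = []       # underlying representation; free to change

        def enqueue(self, x):      # primary operation
            self._items.append(x)

        def dequeue(self):         # primary operation
            return self._items.pop(0)

        def is_empty(self):        # primary operation
            return not self._items

    def reverse(q):
        # Secondary operation written purely in terms of primary operations.
        stack = []
        while not q.is_empty():
            stack.append(q.dequeue())
        while stack:
            q.enqueue(stack.pop())  # re-enqueue in reverse order

A non-layered reverse would instead manipulate the nodes and pointers of the representation directly, and so would have to be redesigned, recoded, and re-debugged along with the primary operations whenever the representation changes.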

We found that the students spent, on average, about half of their total redesign and recoding effort on the four secondary operations. However, these operations accounted for an average of two-thirds of all the bugs they had to find and fix. These data have such large confidence intervals that we hesitate to draw serious conclusions from a small sample and a single example. Nonetheless, it is entirely plausible that secondary operations should generally be more difficult to get right than primary operations: secondary operations perform more complicated manipulations than the primary operations, which are chosen precisely because they are ``primitive.''