Ada 95 Quality and Style Guide Chapter 10
In many ways, performance is at odds with maintainability and portability. To achieve improved speed or memory usage, the most clear algorithm sometimes gives way to confusing code. To exploit special purpose hardware or operating system services, nonportable implementation dependencies are introduced. When concerned about performance, you must decide how well each algorithm meets its performance and maintainability goals. Use the guidelines in this chapter with care; they may be hazardous to your software.
The best way to build a system that satisfies its performance requirements is through good design. You should not assume that speeding up your code will result in a visible increase in system execution. In most applications, the overall throughput of the system is not defined by the execution speed of the code but by the interaction between concurrent processes and the response time of the system peripherals.
Most of the guidelines in this chapter read ". . . when measured performance indicates." "Indicates" means that you have determined that the benefit in increased performance to your application in your environment outweighs the negative side effects on understandability, maintainability, and portability of the resulting code. Many of the guideline examples show the alternatives that you will need to measure in order to determine if the guideline is indicated.
... Initial : Matrix; begin -- Find_Solution Initialize_Solution_Matrix: for Row in Initial'Range(1) loop for Col in Initial'Range(2) loop Initial (Row, Col) := Get_Value (Row, Col); end loop; end loop Initialize_Solution_Matrix; Converge_To_The_Solution: declare Solution : Matrix := Identity; Min_Iterations : constant Natural := ...; begin -- Converge_To_The_Solution for Iterations in 1 .. Min_Iterations loop Converge (Solution, Initial); end loop; end Converge_To_The_Solution; ... end Find_Solution;
rationale
Late initialization allows a compiler more choices in register usage optimization. Depending on the circumstance, this may introduce a significant performance improvement.
Some compilers incur a performance penalty when declarative blocks are introduced. Careful analysis and timing tests by the programmer may identify those declarative blocks that should be removed.
notes
It is difficult to accurately predict through code inspections which declarative blocks improve performance and which degrade performance. However, with these general guidelines and a familiarity with the particular implementation, performance can be improved.
For some compilers, offset calculations for an array whose lower bound is 0 (either the integer zero or the first value of an enumeration type) are simplified. For other compilers, optimization is more likely if the lower bound is 1.
subtype Line_Range is Integer range 0 .. Max_Lines; subtype Length_Range is Integer range 0 .. Max_Length; -- Note that Max_Lines and Max_Length need to be static type Paragraph_Body is array (Line_Range range <>, Length_Range range <>) of Character; type Paragraph (Lines : Line_Range := 0; Line_Length : Length_Range := 0) is record Text : Paragraph_Body (1 .. Lines, 1 .. Line_Length); end record;
rationale
Determine the size and speed impact of unconstrained records having components depending on discriminants. Some compilers will allocate the maximum possible size to each object of the type; others will use pointers to the dependent components, incurring a possible heap performance penalty. Consider the possibility of using fixed-size components.
-- Array of records Process (Student (Index).Name, Student (Index).Grade); -- Record of arrays Process (Student.Name (Index), Student.Grade (Index)); -- Parallel arrays Process (Name (Index), Grade (Index));
rationale
Determine the impact of structuring data as arrays of records, records containing arrays, or parallel arrays. Some implementations of Ada will show significant performance differences among these examples.
10.4.5 Record and Array Aggregates
Determine the impact of using an aggregate versus a sequence of assignments. Using an aggregate generally requires the use of a temporary variable. If the aggregate is "static" (i.e., its size and components are known at compile- or link-time, allowing link-time allocation and initialization), then it will generally be more efficient than a sequence of assignments. If the aggregate is "dynamic," then a series of assignments may be more efficient because no temporary variable is needed.
See Guideline 5.6.10 for a discussion of aggregates from the point of view of readability and maintainability.
See Guideline 10.6.1 for a discussion of extension aggregates.
-- Nested "if" if Last >= Target_Length then if Buffer (1 .. Target_Length) = Target then ... end if; end if; -- "and then" if Last >= Target_Length and then Buffer (1 .. Target_Length) = Target then ... end if;
rationale
Determine the impact of using nested if statements versus using the and then or or else operator. One of the above may be significantly more efficient than the other.
10.5.3 Case Statement Versus elsif
subtype Small_Int is Integer range 1 .. 5; Switch : Small_Int; ... -- Case statement case Switch is when 1 => ... when 2 => ... when 3 => ... when 4 => ... when 5 => ... end case; -- "elsif construct" if Switch = 1 then ... elsif Switch = 2 then ... elsif Switch = 3 then ... elsif Switch = 4 then ... elsif Switch = 5 then ... end if;
rationale
Determine the impact of using case statements versus the elsif construct. If the case statement is implemented using a small jump table, then it may be significantly more efficient than the if .. then .. elsif construct.
See also Guideline 8.4.6 for a discussion of the table-driven programming alternative.
10.5.4 Checking for Constraint Errors
subtype Small_Int is Positive range Lower .. Upper; Var : Small_Int; ... -- Using exception handler Double: begin Var := 2 * Var; exception when Constraint_Error => ... end Double; -- Using hard-coded check if Var > Upper / 2 then ... else Var := 2 * Var; end if;
rationale
Determine the impact of using exception handlers to detect constraint errors. If the exception handling mechanism is slow, then hard-coded checking may be more efficient.
10.5.5 Order of Array Processing
type Table_Type is array (Row_Min .. Row_Max, Col_Min .. Col_Max) of ... Table : Table_Type; ... -- Row-order processing for Row in Row_Min .. Row_Max loop for Col in Col_Min .. Col_Max loop -- Process Table (Row, Col) end loop; end loop; -- Column-order processing for Col in Col_Min .. Col_Max loop for Row in Row_Min .. Row_Max loop -- Process Table (Row, Col) end loop; end loop;
rationale
Determine the impact of processing two-dimensional arrays in row-major order versus column-major order. While most Ada compilers are likely to use row-major order, it is not a requirement. In the presence of good optimization, there may be no significant difference in the above examples. Using static array bounds is also likely to be significant here. See Guidelines 10.4.1 and 10.4.2.
-- Using "if .. else" if Condition then Var := One_Value; else Var := Other_Value; end if; -- Using overwriting Var := Other_Value; if Condition then Var := One_Value; end if;
rationale
Determine the impact of styles of assigning alternative values. The examples illustrate two common methods of doing this; for many systems, the performance difference is significant.
10.5.7 Packed Boolean Array Shifts
subtype Word_Range is Integer range 0 .. 15; type Flag_Word is array (Word_Range) of Boolean; pragma Pack (Flag_Word); Word : Flag_Word; ... -- Loop to shift by one bit for Index in 0 .. 14 loop Word (Index) := Word (Index + 1); end loop; Word (15) := False; -- Use slice assignment to shift by one bit Word (0 .. 14) := Word (1 .. 15); Word (15) := False;
rationale
Determine the impact of slice manipulation when shifting packed Boolean arrays. For Ada 83 implementations using packed Boolean arrays, shift operations may be much faster when slice assignments are used as opposed to for loop moving one component at a time. For Ada 95 implementations, consider using modular types instead (see Guideline 10.6.3).
example
The term "static dispatching" in this example refers to the use of if/elsif sequences to explicitly determine which subprograms to call based on certain conditions:
-- (1) Dispatching where tag is not known at compile time -- (See ACES V2.0 test "a9_ob_class_wide_dynamic_01") -- Object_Type is a tagged type -- The_Pointer designates Object_Type'Class; -- Subclass1_Pointer designates Subclass1 (derived from Object_Type) -- Subclass2_Pointer designates Subclass2 (derived from Subclass1) -- Subclass3_Pointer designates Subclass3 (derived from Subclass2) Random_Value := Simple_Random; -- Call to a random number generator if Random_Value < 1.0/3.0 then The_Pointer := Subclass1_Pointer; elsif Random_Value > 2.0/3.0 then The_Pointer := Subclass2_Pointer; else The_Pointer := Subclass3_Pointer; end if; Process (The_Pointer.all); -- Tag is unknown -- (2) Tag is determinable at compile time (static dispatching) -- (See ACES V2.0, test "a9_ob_class_wide_static_01") -- Object_Type is a tagged type -- The_Pointer designates Object_Type'Class; -- Subclass1_Pointer designates Subclass1 (derived from Object_Type) -- Subclass2_Pointer designates Subclass2 (derived from Subclass1) -- Subclass3_Pointer designates Subclass3 (derived from Subclass2) Random_Value := Simple_Random; -- Call to a random number generator if Random_Value < 1.0/3.0 then Process (Subclass1_Pointer.all); elsif Random_Value > 2.0/3.0 then Process (Subclass2_Pointer.all); else Process (Subclass3_Pointer.all); end if; -- (3) No tagged types are involved (no dispatching) -- (See ACES V2.0, test "ap_ob_class_wide_01") -- Object_type is a discriminated type with variants; possible -- discriminant values are Subclass1, Subclass2, and Subclass3 -- All the pointers designate values of Object_Type -- Subclass1_Pointer := new Object_Type (Subclass1); -- Subclass2_Pointer := new Object_Type (Subclass2); -- Subclass3_Pointer := new Object_Type (Subclass3); -- There is only one "Process" procedure (operating on Object_Type) Random_Value := Simple_Random; -- Call to a random number generator if Random_Value < 1.0/3.0 then Process (Subclass1_Pointer.all); elsif Random_Value > 2.0/3.0 then Process (Subclass2_Pointer.all); else Process (Subclass3_Pointer.all); end if;
rationale
Determine the impact of dynamic and static subprogram dispatching. The compiler may generate much more efficient code for one form of dispatching than the other.
notes
Dynamic dispatching will almost certainly be more efficient than an explicit if . . . elsif sequence. However, you should be aware of any optimizing decisions made by a compiler that might affect this situation.
-- (1) Using protected objects -- (See ACES V2.0, test "a9_pt_prot_access_02") protected Object is function Read return Float; procedure Write (Value : in Float); private Data : Float; end Object; protected body Object is function Read return Float is begin return Data; end Read; procedure Write (Value : in Float) is begin Data := Value; end Write; end Object; task type Modify is end Modify; type Mod_Bunch is array (1 .. 5) of Modify; task body Modify is ... begin -- Modify for I in 1 .. 200 loop The_Value := Object.Read; Object.Write (The_Value - 0.125); if The_Value < -1.0E7 then The_Value := 1.0; end if; end loop; end Modify; ... -- Block statement to be timed declare Contending_Tasks : array (1 .. 5) of Modify; begin null; -- 5 tasks contend for access to protected data end; ------------------------------------------------------------------------------ -- (2) Using monitor task -- (See ACES V2.0, test "tk_rz_entry_access_02") Task Object is entry Write (Value : in Float); entry Read (Value : out Float); end Object; task body Object is Data : Float; begin -- Object loop select accept Write (Value : in Float) do Data := Value; end Write; or accept Read (Value : out Float) do Value := Data; end Read; or terminate; end select; end loop; end Object; -- Task type Modify declared as above -- Block statement to be timed as above
rationale
Protected objects are meant to be much faster than tasks used for the same purpose (see Guideline 6.1.1). Determine the impact of using protected objects to provide access safely to encapsulated data in a concurrent program.
10.6.3 Bit Operations on Modular Types
-- (1) Packed Boolean arrays -- (See ACES V2.0, test "dr_ba_bool_arrays_11") type Set is array (0 .. 15) of Boolean; pragma Pack (Set); S1 : Set; S2 : Set; Empty : Set := (Set'Range => False); Result : Boolean; ... -- Is S1 a subset of S2? Result := ((S1 and not S2) = Empty); --------------------------------------------------------------------- -- (2) Modular types -- (See ACES V2.0, test "a9_ms_modular_oper_02") type Set is mod 16; S1 : Set; S2 : Set; Empty : Set := 0; Result : Boolean; ... -- Is S1 a subset of S2? Result := ((S1 and not S2) = Empty);
rationale
Determine the impact of performing bit-wise operations on modular types. The performance of these operations may be significantly different from similar operations on packed Boolean arrays. See also Guideline 10.5.7.
The unbounded strings may be allocated on the heap. If bounded strings are not allocated on the heap, then they may provide better performance. Determine the impact of using the string type declared in instantiations of Ada.Strings.Bounded.Generic_Bounded_Length versus the type declared in Ada.Strings.Unbounded.
The predefined Ada 95 language environment defines packages that support both bounded and unbounded strings. Using bounded strings may avoid the unpredictable duration of delays associated with using heap storage.
10.6.5 String Handling Subprograms
Determine the relative performance cost of functions and procedures having the same name and functionality in Ada.Strings.Fixed, Ada.Strings.Bounded, Ada.Strings.Unbounded and the corresponding child packages whose names include Wide.
While functional notation typically leads to clearer code, it may cause the compiler to generate additional copying operations.
In this example, two potential constraint checks are eliminated. If the function Get_Response returns String, then the initialization of the variable Input would require constraint checking. If the variable Last is type Positive, then the assignment inside the loop would require constraint checking:
... subtype Name_Index is Positive range 1 .. 32; subtype Name is String (Name_Index); ... function Get_Response return Name is separate; ... begin ... Find_Last_Period: declare -- No Constraint Checking needed for initialization Input : constant Name := Get_Response; Last_Period : Name_Index := 1; begin -- Find_Last_Period for I in Input'Range loop if Input(I) = '.' then -- No Constraint Checking needed in this `tight' loop Last_Period := I; end if; end loop; ... end Find_Last_Period;
rationale
Because run-time constraint checking is associated with slow performance, it is not intuitive that the addition of constrained subtypes could actually improve performance. However, the need for constraint checking appears in many places regardless of the use of constrained subtypes. Even assignments to variables that use the predefined subtypes may need constraint checks. By consistently using constrained subtypes, many of the unnecessary run-time checking can be eliminated. Instead, the checking is usually moved to less frequently executed code involved in system input. In the example, the function Get_Response may need to check the length of a user-supplied string and raise an exception.
Some compilers can do additional optimizations based on the information provided by constrained subtypes. For example, although an unconstrained array does not have a fixed size, it has a maximum size that can be determined from the range of its index. Performance can be improved by limiting this maximum size to a "reasonable" number. Refer to the discussion on unconstrained arrays found in NASA (1992).
The packages Ada.Synchronous_Task_Control and Ada.Asynchronous_Task_Control have been defined to provide an alternative to tasking and protected types for use in applications where a minimal run-time is desired (Ada Reference Manual 1995, Annex D).
This may facilitate the construction of simpler run-time environments (Ada Reference Manual 1995, §§13.12, D.7, and H.4).
This may reduce memory write operations after load time (Ada Reference Manual 1995, §§10.2.1 and C.4).
This may permit the compiler to omit calls on library-level subprograms of the library unit if the results are not needed after the call (Ada Reference Manual 1995, §10.2.1).
This may reduce the memory needed to store names of Ada entities, where no operation uses those names (Ada Reference Manual 1995, §C.5).
See the Ada Reference Manual (1995, Annex H).