The most common way to measure presence is to ask observers to report their sense of presence on a numeric scale, or to adjust a device (such as a lever) to indicate it. This technique is known as ``magnitude estimation''. It is an example of a ``Class B'' measure, because participants must evaluate their own mental state in order to respond. Magnitude estimation has the advantage of simplicity, which is why it is so widely used (including in the research reported in this dissertation). Unfortunately, it also has fundamental flaws.
Numeric verbal magnitude estimation of presence is likely to produce results of marginal validity. Following its introduction by Stevens [98], magnitude estimation was widely adopted and produced many useful observations. However, numerous limitations soon became apparent [76,25]. One of these is the ``range effect'' [100]: participants' numerical ratings are strongly influenced by the range of physical stimuli to which they are exposed. The possible influence of range effects can be controlled or evaluated when the physical stimulus dimension is well described, e.g., when relating physical sound intensity to perceived loudness. No such control or evaluation is available when the domain being rated is not well described. Presence is a product of the observer's cognitive processes, and the physical manipulations that reliably alter perceived presence are only vaguely known. Consequently, we are unable to evaluate possible range effects when performing magnitude estimation of presence. (Similar limitations arise when magnitude estimation is used to assess cognitively mediated percepts such as pain and motion sickness.)
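To make the range effect concrete, the following Python sketch uses a deliberately simplified rating model, assumed here purely for illustration and loosely in the spirit of Parducci's range principle: each rating reflects where a stimulus falls within the range of stimuli presented in the session, rather than its absolute intensity. The function, the session values, and the 0 to 10 rating scale are all hypothetical.

```python
def range_normalized_rating(stimulus, presented, scale_max=10.0):
    """Hypothetical rating model (illustration only): the rating depends on
    where the stimulus lies within the range of stimuli presented in the
    session, not on its absolute physical magnitude."""
    lo, hi = min(presented), max(presented)
    if hi == lo:
        # Only one stimulus level presented: place it at mid-scale.
        return scale_max / 2
    # Linear position of the stimulus within the presented range.
    return scale_max * (stimulus - lo) / (hi - lo)


# Two hypothetical sessions that share one physical stimulus level (60 units).
narrow_session = [40, 50, 60]
wide_session = [20, 50, 60, 80]

print(range_normalized_rating(60, narrow_session))          # 10.0
print(round(range_normalized_rating(60, wide_session), 2))  # 6.67
```

Under this toy model the same physical stimulus receives the top rating when it is the strongest item in a narrow set, but only about 6.7 in a wider set. Where the stimulus dimension is known, such a distortion can be modeled and corrected; for presence, the underlying dimension is not known, so it cannot.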
A second difficulty with magnitude estimation is the anchor effect [103]: the value an observer assigns to a given condition may depend on the conditions to which it is compared.
Range and anchor effects in magnitude estimation have serious consequences. In particular, they sharply limit our ability to draw valid conclusions when comparing data gathered in separate experiments with different conditions.