(1) Human Interface Technology Laboratory
University of Washington
Box 352142
Seattle, WA 98195 USA
grof@hitl.washington.edu
(2) Advanced Perception Unit
BT Laboratories
Martlesham Heath
Ipswich, IP5 3RE
United Kingdom
{jerry.bowskill, jason.morphett}@bt-sys.bt.co.uk
ABSTRACT
Wearable computers provide constant access to computing and communications resources. We apply traditional virtual reality techniques to develop a wearable body-stabilized space which allows multiple remote people to communicate with a wearable computer user. By combining communication and computing facilities we are able to use spatialised 3D graphics and audio cues to aid communication. The result is a portable augmented reality communication space with audio-enabled avatars of the remote collaborators surrounding the user. The wearable user can use natural head motion to attend to the collaborators they wish to communicate with while remaining aware of other conversations, and remote users can freely connect to or disconnect from the communication space.
KEYWORDS
Collaborative Wearable Computing, CVE, CSCW
1 INTRODUCTION
One of the broad trends emerging in advanced human-computer interaction is the increasing portability of computing and communication facilities. With mobile phones and conference calling, people already have access to wearable collaborative audio spaces. The addition of mobile computing and display facilities enables visual and audio enhancements to aid the communication process. However, it remains an open question how computing and communications can best be integrated on a portable platform.
Wearable computers are the most recent generation of portable machines. Worn on the body, they provide constant access to computing and communications resources. In general, a wearable computer may be defined as a computer that is subsumed into the personal space of the user, is controlled by the wearer, and has both operational and interactional constancy, i.e. is always on and always accessible [Mann 97]. Wearables are typically composed of a belt or backpack PC, a see-through or see-around head mounted display (HMD), wireless communications hardware and an input device such as a touchpad or chording keyboard. This configuration has been demonstrated in a number of real world applications including aircraft maintenance [Esposito 97], navigational assistance [Feiner 97] and vehicle mechanics [Bass 97]. In such applications wearables have dramatically improved user performance, reducing task time by half in the case of vehicle inspection [Bass 97].
Many of the target application areas for wearable computers are those where the user could benefit from expert assistance. Internet-enabled wearable computers can be used as communications devices that allow remote experts to collaborate with the wearable user. The presence of remote experts has been found to significantly improve task performance [Siegel 95], [Kraut 96]. However, current collaborative wearable applications have only involved connections between one local and one remote user. The problem we are addressing is how the computing power of the wearable can be used to support collaboration between multiple remote people. In particular we want to explore the following issues:
These issues are becoming increasingly important as telephones incorporate more computing power and portable computers become more like telephones. A key question is whether visual and audio enhancements are necessary in collaborative spaces: do we need computer mediated communication when a conference phone call may be just as effective?
In the next section we review relevant related
research from the teleconferencing and collaborative virtual environment
(CVE) fields. The remainder of the paper then describes how communications and computing facilities can be combined in a wearable to form a collaborative wearable communication space. We describe our initial prototype, initial user experiences and possible application areas.
2 BACKGROUND
Previous research on the roles of audio and visual cues in teleconferencing has produced mixed results. There have been many experiments comparing face-to-face, audio-and-video, and audio-only communication conditions, as summarized by Sellen [Sellen 95]. While people generally do not prefer the audio-only condition, they are often able to perform tasks as effectively, or almost as effectively, as in the face-to-face or video conditions. Sellen reports that the main effect on collaborative performance was whether the collaboration was technologically mediated at all, not the type of technology mediation used.
Naturally this varies somewhat according to task. While Williams [Williams 77] finds that face-to-face interaction is no better than speech-only communication for cognitive problem solving tasks, Chapanis [Chapanis 75] finds that visual cues were important in tasks requiring negotiation. Even though the outcome may be the same, the process of communication is affected by the presence or absence of visual cues [O'Malley 96], although not for managing turn taking [Whittaker 95].
However, there is strong evidence that video
transmits social cues and affective information, establishing
"Social Presence"[Whittaker 97], although not as effectively
as facetoface interaction [Heath 91]. In general,
the usefulness of video for transmitting nonverbal cues
may be overestimated and video may be better used to show the
communication availability of others or views of shared workspaces
[Whittaker 95]. So even when users attempt nonverbal communication in a video conferencing environment, their gestures must be wildly exaggerated to be recognized as the equivalent face-to-face gestures [Heath 91].
Based on these results, and the fact that speech
is the critical medium in teleconferencing experiments [Whittaker
95], audio alone should be suitable for a shared communication
space. An example of this, Thunderwire [Hindus 96], was a purely
audio system which allowed high quality audio conferencing between
multiple participants at the flip of a switch. In a three-month trial Hindus et al. found that audio can be sufficient for a usable
communication space and that Thunderwire afforded a social space
for its users.
However there were several major problems:
In addition, Thunderwire was rarely used by
more than two or three users at once. With more users it becomes
increasingly difficult to discriminate between speakers and there
is a higher incidence of speaker overlap and interruptions. These
problems are typical of audio-only spaces and suggest that while audio may be useful for small group interactions, it becomes less usable the more people are present.
These shortcomings can be overcome through
the use of visual and spatial cues. For example, at BT Labs, enhanced
audio conferencing is being developed in which a supporting web
page shows visual information. This can include icons of those
people within the conference and simple cues, such as microphone
on/off, which allow a participant's activity or interest to be inferred. In face-to-face interaction, speech, gesture, body language and other nonverbal cues combine to show attention
and interest in collaborative conversations. However the absence
of spatial cues in most video conferencing systems means that
users often find it difficult to know when people are paying attention
to them, to hold side conversations, and to establish eye contact
[Sellen 92]. This may explain the similarity in results between the audio-only and the audio-and-video teleconferencing conditions, and the difference they both produce from face-to-face results.
In collaborative virtual environments spatial cues can combine with visual and audio cues in a natural way to aid communication [Benford 93]. The well known "cocktail party" effect shows that people can easily monitor several spatialised audio streams at once, selectively focusing on those of interest [Bregman 90]. Schmandt shows how a spatial sound system with non-spatial audio enhancements can allow a person to listen simultaneously to several sound sources [Schmandt 95]. Even a simple virtual avatar representation and spatial audio model of other users in the collaborative space enables users to discriminate between multiple speakers [Nakanishi 96]. Spatialised interactions are particularly valuable for governing interactions between groups of people, enabling crowds of people to inhabit the same virtual environment and interact in a way impossible in traditional video or audio conferencing [Benford 97].
In the remainder of the paper we describe how
techniques from collaborative virtual environments can be used
to develop a wearable communication interface.
3 A WEARABLE COMMUNICATION SPACE
While considerable work has been conducted
on the development of collaborative immersive virtual environments
there has been almost no research on collaboration in an augmented
reality setting, such as with a wearable computer. Billinghurst et al. have found that users located in the same room prefer collaboration in an Augmented Reality setting over a fully immersive interface [Billinghurst 96] and perform better in the augmented reality environment. Similar results have also been reported by Schmalstieg et al. [Schmalstieg 96]. However there have been
no augmented reality interfaces which support remote collaborators
even though these types of interfaces are particularly relevant
for wearable computing. In an earlier paper we describe how wearable
computers are ideally suited for three dimensional CSCW because
they allow enhancement of real world tasks and seamless interaction
between the real world and virtual image overlays [Billinghurst
97].
The results in the previous section suggest
that an ideal wearable communications space should have three elements:
One of the most important aspects of creating
a collaborative communication interface is the visual and audio
presentation of information. Most current wearable computers use see-through or see-around monoscopic head mounted displays.
With these displays visual information can be presented in a combination
of three ways:
Head-stabilized: information is fixed to the user's viewpoint and doesn't change as the user changes viewpoint orientation or position.
Body-stabilized: information is fixed relative to the user's body position and varies as the user changes viewpoint orientation, but not as they change position. This requires the user's viewpoint orientation to be tracked.
World-stabilized: information is fixed to real world locations and varies as the user changes viewpoint orientation and position. This requires the user's viewpoint position and orientation to be tracked.
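The practical difference between these modes is the transform applied to virtual content before it is drawn. The following sketch (illustrative Python, not from the original system; it assumes a yaw-only head tracker and treats positions as simple 3-vectors) expresses each mode as a view-space transform:

    import numpy as np

    def yaw_matrix(yaw):
        # rotation about the vertical (y) axis by yaw radians
        c, s = np.cos(yaw), np.sin(yaw)
        return np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])

    def to_view_space(obj_pos, mode, head_yaw, user_pos):
        """Where a virtual object appears in the display's view space.

        head-stabilized : ignores head motion, so content never moves on screen
        body-stabilized : counter-rotates with head yaw (orientation tracked)
        world-stabilized: counter-rotates and counter-translates
                          (orientation and position tracked)
        """
        obj = np.asarray(obj_pos, dtype=float)
        if mode == "head":
            return obj
        if mode == "body":
            return yaw_matrix(-head_yaw) @ obj
        if mode == "world":
            return yaw_matrix(-head_yaw) @ (obj - np.asarray(user_pos, dtype=float))
        raise ValueError("unknown mode: " + mode)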
Body- and world-stabilized information display is attractive for a number of reasons. As Reichlen [Reichlen 93] demonstrates, a body-stabilized information space can overcome the resolution limitations of head mounted displays. In his work a user wears a head mounted display while seated on a rotatable chair. By tracking head orientation the user experiences a hemispherical information surround, in effect a "hundred million pixel display". World-stabilized information presentation
enables annotation of the real world with context dependent data,
creating information enriched environments [Rekimoto 95]. This
increases the intuitiveness of real world tasks. For example,
researchers at the University of North Carolina register virtual
fetal ultrasound views on the womb [Bajura 92]. Despite these advantages,
most wearables only use headstabilized information display.
In our work we have chosen to begin with the simplest form of body-stabilized display: one which uses a single degree of orientational freedom to give the user the impression they are surrounded by a virtual cylinder of visual and auditory information. Figure 1.0 contrasts this with the traditional head-stabilized wearable information presentation.
Figure 1.0a Head-Stabilized Information Display
In this case we just track head motion about the y (yaw) axis to change the user's view of the information space. Using only
one degree of freedom has a number of advantages:
A head mounted display only allows the portion of the information space in its field of view to be seen. There are therefore two ways the data can be viewed in a cylindrical body-stabilized space: by rotating the information space about the user's head, or by tracking the user's head as they look about the space. The first
requires no additional hardware and can be done by mapping mouse,
switch or voice input to direction and angle of rotation, while
the second requires only a simple one degree of freedom tracker.
The minimal hardware requirements make cylindrical spatial information
displays particularly attractive.
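As a minimal sketch of the head-tracked option, the view update reduces to deciding which slice of the cylinder falls inside the display's field of view. The constants below come from the display described later (30 degree field of view, 262 pixels across); the function name and the linear angle-to-pixel mapping are our own illustration, not the system's actual renderer:

    import math

    FOV = math.radians(30)   # field of view of the HMD described below
    SCREEN_WIDTH = 262       # horizontal resolution of the HMD described below

    def screen_x(item_azimuth, head_yaw):
        """Horizontal pixel position of an item on the virtual cylinder,
        or None if it lies outside the current field of view."""
        # signed offset of the item from the view direction, wrapped to [-pi, pi)
        offset = (item_azimuth - head_yaw + math.pi) % (2 * math.pi) - math.pi
        if abs(offset) > FOV / 2:
            return None      # not visible through the 30 degree window
        # linear map: -FOV/2 -> left edge, +FOV/2 -> right edge
        return int((offset / FOV + 0.5) * SCREEN_WIDTH)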
With this display configuration it is possible to have remote collaborators appear as virtual avatars distributed about the user (figure 2.0). As they speak, their audio streams can be spatialised in real time so that they appear to emanate from the corresponding avatar.
Figure 2.0 A Spatial Conferencing Space.
Just as in face-to-face collaboration, users can turn to face the collaborators they want to talk to while still being aware of the other conversations taking place. Since the displays are see-through or see-around the user can also see the real
world at the same time, enabling the remote collaborators to help
them with real world tasks. These remote users may also be using
wearable computers and head mounted displays or could be interacting
through a desktop workstation.
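One way such a space might assign avatar positions and head-dependent audio levels is sketched below. The even spacing and the crude equal-power panning model are our own assumptions standing in for a real 3D sound API; they are meant only to show how head yaw can drive per-speaker gains:

    import math

    def avatar_azimuths(n):
        # spread n collaborators evenly around the wearer's virtual cylinder
        return [2.0 * math.pi * i / n for i in range(n)]

    def stereo_gains(avatar_azimuth, head_yaw):
        """Sources ahead of the wearer are loudest and sources behind are
        attenuated; negative bearings pan left, positive bearings pan right."""
        bearing = (avatar_azimuth - head_yaw + math.pi) % (2 * math.pi) - math.pi
        front = 0.5 * (1.0 + math.cos(bearing))          # 1 ahead, 0 behind
        left = front * math.cos(bearing / 2 + math.pi / 4) ** 2
        right = front * math.sin(bearing / 2 + math.pi / 4) ** 2
        return left, right

Turning to face a speaker drives their bearing toward zero, so their stream dominates while the others remain audible in the background.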
Our research is initially focused on collaboration
between a single wearable computer user and several desktop PC
users. The aim is to develop software to support medium-sized meetings (5-6 people) in a manner that is natural and intuitive
to use.
4 PROTOTYPE SYSTEM
The wearable computer we use is a custom built 586 PC with 20 MB of RAM running Windows 95. A hand held Logitech wireless radio trackball with three buttons is used as the primary input device. The display is a pair of Virtual i-O i-glasses! converted into a monoscopic display by the removal of the left eyepiece. The Virtual i-O head mounted display can be used in either see-through or occluded mode, has a resolution of 262 by 230 pixels and a 30 degree field of view. The i-glasses! have stereo headphones and a sourceless inertial and gyroscopic three degree of freedom orientation tracker. A BreezeCom wireless LAN is used to give 2 Mb/s internet access up to 500 feet from a base station. The wearable also has a SoundBlaster-compatible sound board with a head-mounted microphone. Figure 3.0 shows a user wearing the display and computer. The desktop PCs are standard Pentium-class machines with internet connectivity and sound capability.
Figure 3.0 The Wearable Interface.
4.1 Wearable Interface
Our wearable computer has no graphics acceleration
hardware so the interface was deliberately kept simple. The conferencing
space runs as a full screen application that is initially blank
until remote users connect. When users join the conferencing space
they are represented by blocks with 128x128 pixel texture mapped
pictures of themselves on them. Although the resolution of the images is crude, it is sufficient to identify who the speakers are. The wearable user has their head tracked so they can simply
turn to face the speakers they are interested in. As they face
different speakers the speaker volume changes due to the 3D sound
spatialisation. Users can also navigate through the space: by rolling the trackball forwards or backwards, their viewpoint is moved forwards or backwards along the direction they are looking.
Since the virtual images are superimposed on the real world, when
the user rolls the trackball it appears to them as though they
are moving the virtual space around them, rather than navigating
through the space. Figure 4.0 shows the wearable interface from
the wearable user's perspective. The Microsoft DirectX libraries
were used to implement the interface.
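The trackball-to-viewpoint mapping described above might look like the following sketch; the function name and the speed constant are our own guesses rather than values from the implementation:

    import math

    MOVE_SPEED = 0.05   # distance per trackball count; a tuning guess

    def move_viewpoint(pos, head_yaw, trackball_dy):
        """Translate the viewpoint along the current gaze direction: rolling
        the trackball forward moves the user forward, so the surrounding
        virtual space appears to slide past the real world."""
        x, y, z = pos
        step = trackball_dy * MOVE_SPEED
        return (x + math.sin(head_yaw) * step, y, z + math.cos(head_yaw) * step)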
4.2 Desktop Interface
Users at a desktop workstation interact with the conferencing space through an interface similar to the wearable user's, although in this case it runs as a windowed application on the desktop. Users navigate through the space using the mouse. Mouse movements rotate the viewpoint when the left mouse button is held down; otherwise they translate the user backwards and forwards in space. Mapping avatar orientation to mouse movements means that the desktop interface is not quite as intuitive as the wearable interface. Users at the desktop machines wear head-mounted microphones to talk into the conferencing space.
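The corresponding desktop mapping, again only as an illustrative sketch with guessed tuning constants:

    import math

    TURN_RATE = 0.005   # radians per pixel of mouse motion; a guess
    MOVE_RATE = 0.02    # distance per pixel of mouse motion; a guess

    def handle_mouse(state, dx, dy, left_button_down):
        """With the left button held, horizontal mouse motion rotates the
        viewpoint; otherwise vertical motion translates the user along
        the current view direction."""
        if left_button_down:
            state["yaw"] += dx * TURN_RATE
        else:
            state["x"] += math.sin(state["yaw"]) * -dy * MOVE_RATE
            state["z"] += math.cos(state["yaw"]) * -dy * MOVE_RATE
        return state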
The wearable and desktop interfaces use multicast sockets to communicate with each other. As shown in figure 5.0, two multicast groups are used: one for user positions and orientations and one for audio communication. As users move, their avatar position and orientation values, tagged with unique avatar identity numbers, are streamed to the position multicast group and rebroadcast to all the interested interfaces. Similarly, when users speak their speech is digitized, an avatar identity number is added, and the speech is sent to the audio group to be rebroadcast. When the digitized speech arrives at the client computer the avatar identity number is used to find the speaker's position and spatialize the speech. All the connections to the multicast groups are bidirectional and users can connect and disconnect at will without affecting other users in the conferencing space.
Figure 5.0 Software Architecture
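The description above suggests packet formats along the following lines. This sketch uses standard UDP multicast sockets; the group addresses and the exact pose layout are hypothetical, since the paper does not specify them:

    import socket
    import struct

    # Hypothetical multicast groups; the real addresses are not given.
    POSE_GROUP = ("239.0.0.1", 5000)    # avatar position/orientation updates
    AUDIO_GROUP = ("239.0.0.2", 5001)   # digitized speech frames

    def open_sender():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        return s

    def send_pose(sock, avatar_id, x, y, z, yaw):
        # avatar identity number plus pose, so every receiver can
        # update the correct avatar
        sock.sendto(struct.pack("!I4f", avatar_id, x, y, z, yaw), POSE_GROUP)

    def send_audio(sock, avatar_id, pcm_frame):
        # avatar identity number prefixed to the raw speech frame; the
        # receiver looks up that avatar's pose to spatialize playback
        sock.sendto(struct.pack("!I", avatar_id) + pcm_frame, AUDIO_GROUP)

Because every interface simply joins or leaves the two groups, no central server has to track who is connected, which is what lets users come and go without disturbing the rest of the space.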
5 INITIAL USER EXPERIENCES
We are in the process of conducting user trials to evaluate how the use of spatialized audio and visual representations affects communication between collaborators. Pre-trial results have found that users can easily discriminate between three speakers when their audio streams are spatialized, but not when non-spatialized audio is used. Users also prefer seeing
a visual representation of their collaborators as opposed to just
hearing their speech. They found that they could continue doing
real world tasks while talking to collaborators in the conferencing
space and it was possible to move the conferencing space with
the trackball so that collaborators weren't blocking critical
portions of the user's field of view.
However, as more users connect to the conferencing space the need to spatialize multiple audio streams puts a severe load on the CPU, slowing down the graphics and head tracking. This makes it difficult for the wearable user to conference with more than two or three people simultaneously. This problem will be reduced as faster CPUs and hardware support for Direct3D graphics become available for wearable computers. Spatial culling of the audio streams could also be used to overcome this limitation.
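Spatial culling could be as simple as spatializing only the few nearest speakers and folding the remainder into a flat background mix; the sketch below is our illustration of the idea, not the authors' implementation:

    def cull_audio_streams(streams, listener_pos, k=3):
        """Spatialize only the k nearest speakers and return the rest for a
        single non-spatialized mix, bounding the per-frame processing cost."""
        def dist_sq(s):
            dx = s["x"] - listener_pos[0]
            dz = s["z"] - listener_pos[2]
            return dx * dx + dz * dz
        ordered = sorted(streams, key=dist_sq)
        return ordered[:k], ordered[k:]   # (spatialize these, mix these flat)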
6 CONCLUSIONS
We have presented a prototype wearable communication space that uses spatial visual and audio cues to enhance communication between remote groups of people. Our interface shows what is possible when computing and communications facilities are coupled together on a wearable platform. Preliminary results have found that users prefer using both the audio and visual cues together and that spatialised audio makes it easy for users to discriminate between speakers. Although our avatar representations are minimal, users found them to be socially engaging. We are currently conducting formal user studies to confirm these results and evaluate the effect of spatial cues on communication patterns.
In the future we plan to investigate how the
presence of spatialised video can further enhance communication.
We will incorporate live video texture mapping into our interface,
enabling users to see their remote collaborators as they speak.
This will also allow users to send views of their workspace as well, improving collaboration on real-world tasks.
ACKNOWLEDGMENTS
We would like to thank our colleagues at British
Telecom for many useful and productive conversations and Nick
Dyer for producing the renderings used in some of the figures.
REFERENCES
[Bajura 92] Bajura, M., Fuchs, H., Ohbuchi, R. "Merging Virtual Objects with the Real World: Seeing Ultrasound Imagery Within the Patient." In Proceedings of SIGGRAPH '92, Chicago, Illinois, July 26-31, 1992, New York: ACM, pp. 203-210.
[Bass 97] Bass, L., Kasabach, C., Martin, R., Siewiorek, D., Smailagic, A., Stivoric, J. The Design of a Wearable Computer. In Proceedings of CHI '97, Atlanta, Georgia, March 1997, New York: ACM, pp. 139-146.
[Benford 93] Benford, S. and Fahlen, L. A Spatial
Model of Interaction in Virtual Environments. In Proceedings
of Third European Conference on Computer Supported Cooperative
Work (ECSCW '93), Milano, Italy, September 1993.
[Benford 97] Benford, S., Greenhalgh, C., Lloyd, D. Crowded Collaborative Virtual Environments. In Proceedings of CHI '97, Atlanta, Georgia, March 1997, New York: ACM, pp. 59-66.
[Billinghurst 96] Billinghurst, M., Weghorst, S., Furness, T. Shared Space: Collaborative Augmented Reality. In Proceedings of the CVE '96 Workshop, 19-20th September 1996, Nottingham, Great Britain.
[Billinghurst 97] Billinghurst, M., Weghorst, S., Furness, T. Wearable Computers for Three Dimensional CSCW. In Proceedings of the International Symposium on Wearable Computers, Cambridge, MA, October 13-14, 1997, Los Alamitos: IEEE Press, pp. 39-46.
[Billinghurst 98] Billinghurst, M., Bowskill, J., Dyer, N., Morphett, J. An Evaluation of Wearable Information Spaces. Submitted to VRAIS '98.
[Bregman 90] Bregman, A. Auditory Scene
Analysis: The Perceptual Organization of Sound. MIT Press,
1990.
[Chapanis 75] Chapanis, A. Interactive Human Communication. Scientific American, 1975, Vol. 232, pp. 36-42.
[Esposito 97] Esposito, C. Wearable Computers: Field-Test Results and System Design Guidelines. In Proceedings of Interact '97, July 14th-18th, Sydney, Australia.
[Feiner 97] Feiner, S., MacIntyre, B., Hollerer, T. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment. In Proceedings of the International Symposium on Wearable Computers, Cambridge, MA, October 13-14, 1997, Los Alamitos: IEEE Press, pp. 74-81.
[Heath 91] Heath, C., Luff, P. Disembodied Conduct: Communication Through Video in a Multimedia Environment. In Proceedings of CHI '91 Human Factors in Computing Systems, 1991, New York, NY: ACM Press, pp. 99-103.
[Hindus 96] Hindus, D., Ackerman, M., Mainwaring, S., Starr, B. Thunderwire: A Field Study of an Audio-Only Media Space. In Proceedings of CSCW '96, Nov. 16th-20th, Cambridge, MA, 1996, New York, NY: ACM Press.
[Kraut 96] Kraut, R., Miller, M., Siegel, J. Collaboration in Performance of Physical Tasks: Effects on Outcomes and Communication. In Proceedings of CSCW '96, Nov. 16th-20th, Cambridge, MA, 1996, New York, NY: ACM Press.
[Mann 97] Mann, S. Smart Clothing: The "Wearable Computer" and WearCam. Personal Technologies, Vol. 1, No. 1, March 1997, Springer-Verlag.
[Nakanishi 96] Nakanishi, H., Yoshida, C., Nishimura, T., Ishida, T. FreeWalk: Supporting Casual Meetings in a Network. In Proceedings of CSCW '96, Nov. 16th-20th, Cambridge, MA, 1996, New York, NY: ACM Press, pp. 308-314.
[O'Malley 96] O'Malley, C., Langton, S., Anderson, A., Doherty-Sneddon, G., Bruce, V. Comparison of Face-to-Face and Video-Mediated Interaction. Interacting with Computers, Vol. 8, No. 2, 1996, pp. 177-192.
[Reichlen 93] Reichlen, B. SparcChair: One Hundred Million Pixel Display. In Proceedings of IEEE VRAIS '93, Seattle, WA, September 18-22, 1993, Los Alamitos: IEEE Press, pp. 300-307.
[Rekimoto 95] Rekimoto, J., Nagao, K. The World through the Computer: Computer Augmented Interaction with Real World Environments. In Proceedings of User Interface Software and Technology '95 (UIST '95), November 1995, New York: ACM, pp. 29-36.
[Schmalstieg 96] Schmalstieg, D., Fuhrmann, A., Szalavari, Z., Gervautz, M. Studierstube: An Environment for Collaboration in Augmented Reality. In Proceedings of the CVE '96 Workshop, 19-20th September 1996, Nottingham, Great Britain.
[Schmandt 95] Schmandt, C., Mullins, A. AudioStreamer: Exploiting Simultaneity for Listening. In Proceedings of CHI '95 Conference Companion, May 7-11, Denver, Colorado, 1995, New York: ACM, pp. 218-219.
[Sellen 92] Sellen, A. Speech Patterns in Video-Mediated Conversations. In Proceedings of CHI '92, May 3-7, 1992, New York: ACM, pp. 49-59.
[Sellen 95] Sellen, A. Remote Conversations: The Effects of Mediating Talk with Technology. Human Computer Interaction, 1995, Vol. 10, No. 4, pp. 401-444.
[Siegel 95] Siegel, J., Kraut, R., John, B., Carley, K. An Empirical Study of Collaborative Wearable Computer Systems. In Proceedings of CHI '95 Conference Companion, May 7-11, Denver, Colorado, 1995, New York: ACM, pp. 312-313.
[Whittaker 95] Whittaker, S. Rethinking Video as a Technology for Interpersonal Communications: Theory and Design Implications. Academic Press Limited, 1995.
[Whittaker 97] Whittaker, S., O'Connaill, B. The Role of Vision in Face-to-Face and Mediated Communication. In Video-Mediated Communication, Eds. Finn, K., Sellen, A., Wilbur, S., Lawrence Erlbaum Associates, New Jersey, 1997, pp. 23-49.
[Williams 77] Williams, E. Experimental Comparisons of Face-to-Face and Mediated Communication. Psychological Bulletin, 1977, Vol. 84, pp. 963-976.