(1) Human Interface Technology Laboratory
University of Washington
Box 352142
Seattle, WA 98195 USA
grof@hitl.washington.edu
(2) Advanced Perception Unit
BT Laboratories
Martlesham Heath
Ipswich, IP5 3RE
United Kingdom
{jerry.bowskill, jason.morphett}@bt-sys.bt.co.uk
ABSTRACT
Wearable computers provide constant access to computing and communications resources. We apply traditional virtual reality techniques to develop a wearable body-stabilized space which allows multiple remote people to communicate with a wearable computer user. By combining communication and computing facilities we are able to use spatialised 3D graphics and audio cues to aid communication. The result is a portable augmented reality communication space with audio-enabled avatars of the remote collaborators surrounding the user. The wearable user can use natural head motion to attend to the collaborators they wish to communicate with while remaining aware of other conversations, and remote users can freely connect to or disconnect from the communication space.
KEYWORDS
Collaborative Wearable Computing, CVE, CSCW
1 INTRODUCTION
One of the broad trends emerging in advanced human-computer interaction is the increasing portability of computing and communication facilities. With mobile phones and conference calling, people already have access to wearable collaborative audio spaces. The addition of mobile computing and display facilities enables visual and audio enhancements to aid the communication process. However, it remains an open question how computing and communications can best be integrated on a portable platform.
Wearable computers are the most recent generation of portable machines. Worn on the body, they provide constant access to computing and communications resources. In general, a wearable computer may be defined as a computer that is subsumed into the personal space of the user, is controlled by the wearer, and has both operational and interactional constancy, i.e. is always on and always accessible [Mann 97]. Wearables are typically composed of a belt or backpack PC, a see-through or see-around head mounted display (HMD), wireless communications hardware and an input device such as a touchpad or chording keyboard. This configuration has been demonstrated in a number of real world applications including aircraft maintenance [Esposito 97], navigational assistance [Feiner 97] and vehicle mechanics [Bass 97]. In such applications wearables have dramatically improved user performance, reducing task time by half in the case of vehicle inspection [Bass 97].
Many of the target application areas for wearable computers are those where the user could benefit from expert assistance. Internet-enabled wearable computers can be used as communications devices that allow remote experts to collaborate with the wearable user. The presence of remote experts has been found to significantly improve task performance [Siegel 95], [Kraut 96]. However, current collaborative wearable applications have only involved connections between one local and one remote user. The problem we are addressing is how the computing power of the wearable can be used to support collaboration between multiple remote people. In particular we want to explore the following issues:
These issues are becoming increasingly important as telephones incorporate more computing power and portable computers become more like telephones. A key question is whether visual and audio enhancements are necessary in collaborative spaces: do we need computer mediated communication when a conference phone call may be just as effective?
In the next section we review relevant related
research from the teleconferencing and collaborative virtual environment
(CVE) fields. The remainder of the paper then describes how communications and computing facilities can be combined in a wearable to form a collaborative wearable communication space. We describe our initial prototype, initial user experiences and possible application areas.
2 BACKGROUND
Previous research on the roles of audio and visual cues in teleconferencing has produced mixed results. There have been many experiments comparing face-to-face, audio-and-video, and audio-only communication conditions, as summarized by Sellen [Sellen 95]. While people generally do not prefer the audio-only condition, they are often able to perform tasks as effectively, or almost as effectively, as in the face-to-face or video conditions. Sellen reports that the main effect on collaborative performance was whether the collaboration was technologically mediated at all, not the type of technology mediation used.
Naturally this varies somewhat according to task. While Williams [Williams 77] finds that face-to-face interaction is no better than speech-only communication for cognitive problem solving tasks, Chapanis [Chapanis 75] finds that visual cues were important in tasks requiring negotiation. Even though the outcome may be the same, the process of communication is affected by the presence or absence of visual cues [O'Malley 96], although not for managing turn taking [Whittaker 95].
However, there is strong evidence that video
transmits social cues and affective information, establishing
"Social Presence"[Whittaker 97], although not as effectively
as facetoface interaction [Heath 91]. In general,
the usefulness of video for transmitting nonverbal cues
may be overestimated and video may be better used to show the
communication availability of others or views of shared workspaces
[Whittaker 95]. So even when users attempt nonverbal communication in a video conferencing environment, their gestures must be wildly exaggerated to be recognized as the equivalent face-to-face gestures [Heath 91].
Based on these results, and the fact that speech
is the critical medium in teleconferencing experiments [Whittaker
95], audio alone should be suitable for a shared communication
space. An example of this, Thunderwire [Hindus 96], was a purely
audio system which allowed high quality audio conferencing between
multiple participants at the flip of a switch. In a three-month trial Hindus et al. found that audio can be sufficient for a usable
communication space and that Thunderwire afforded a social space
for its users.
However there were several major problems:
In addition, Thunderwire was rarely used by
more than two or three users at once. With more users it becomes
increasingly difficult to discriminate between speakers and there
is a higher incidence of speaker overlap and interruptions. These
problems are typical of audio-only spaces and suggest that while audio may be useful for small group interactions, it becomes less usable the more people are present.
These shortcomings can be overcome through
the use of visual and spatial cues. For example, at BT Labs, enhanced
audio conferencing is being developed in which a supporting web
page shows visual information. This can include icons of those
people within the conference and simple cues, such as microphone
on/off, which allow a participant's activity or interest to be inferred. In face-to-face interaction, speech, gesture, body language and other nonverbal cues combine to show attention
and interest in collaborative conversations. However the absence
of spatial cues in most video conferencing systems means that
users often find it difficult to know when people are paying attention
to them, to hold side conversations, and to establish eye contact
[Sellen 92]. This may explain the similarity in results between the audio-only and the audio-and-video teleconferencing conditions, and the difference they both produce from face-to-face results.
In collaborative virtual environments spatial cues can combine with visual and audio cues in a natural way to aid communication [Benford 93]. The well known "cocktail party" effect shows that people can easily monitor several spatialised audio streams at once, selectively focusing on those of interest [Bregman 90]. Schmandt shows how a spatial sound system with non-spatial audio enhancements can allow a person to listen simultaneously to several sound sources [Schmandt 95]. Even a simple virtual avatar representation and spatial audio model of other users in the collaborative space enables users to discriminate between multiple speakers [Nakanishi 96]. Spatialised interactions are particularly valuable for governing interactions between groups of people, enabling crowds of people to inhabit the same virtual environment and interact in a way impossible in traditional video or audio conferencing [Benford 97].
In the remainder of the paper we describe how
techniques from collaborative virtual environments can be used
to develop a wearable communication interface.
3 A WEARABLE COMMUNICATION SPACE
While considerable work has been conducted
on the development of collaborative immersive virtual environments
there has been almost no research on collaboration in an augmented
reality setting, such as with a wearable computer. Billinghurst et al. have found that users located in the same room prefer collaboration in an Augmented Reality setting over a fully immersive interface [Billinghurst 96] and perform better in the augmented reality environment. Similar results have also been reported by Schmalstieg et al. [Schmalstieg 96]. However there have been
no augmented reality interfaces which support remote collaborators
even though these types of interfaces are particularly relevant
for wearable computing. In an earlier paper we describe how wearable
computers are ideally suited for three dimensional CSCW because
they allow enhancement of real world tasks and seamless interaction
between the real world and virtual image overlays [Billinghurst
97].
The results in the previous section suggest
that an ideal wearable communications space should have three elements:
One of the most important aspects of creating
a collaborative communication interface is the visual and audio
presentation of information. Most current wearable computers use see-through or see-around monoscopic head mounted displays.
With these displays visual information can be presented in a combination
of three ways:
Head-stabilized: information is fixed to the user's viewpoint and doesn't change as the user changes viewpoint orientation or position.
Body-stabilized: information is fixed relative to the user's body position and varies as the user changes viewpoint orientation, but not as they change position. This requires the user's viewpoint orientation to be tracked.
World-stabilized: information is fixed to real world locations and varies as the user changes viewpoint orientation and position. This requires the user's viewpoint position and orientation to be tracked.
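The practical difference between these modes is the transform applied to virtual content before it is drawn. The following sketch (illustrative Python, not from the original system; it assumes a yaw-only head tracker and treats positions as simple 3-vectors) expresses each mode as a view-space transform:

    import numpy as np

    def yaw_matrix(yaw):
        # rotation about the vertical (y) axis by yaw radians
        c, s = np.cos(yaw), np.sin(yaw)
        return np.array([[c, 0.0, s],
                         [0.0, 1.0, 0.0],
                         [-s, 0.0, c]])

    def to_view_space(obj_pos, mode, head_yaw, user_pos):
        """Where a virtual object appears in the display's view space.

        head-stabilized : ignores head motion, so content never moves on screen
        body-stabilized : counter-rotates with head yaw (orientation tracked)
        world-stabilized: counter-rotates and counter-translates
                          (orientation and position tracked)
        """
        obj = np.asarray(obj_pos, dtype=float)
        if mode == "head":
            return obj
        if mode == "body":
            return yaw_matrix(-head_yaw) @ obj
        if mode == "world":
            return yaw_matrix(-head_yaw) @ (obj - np.asarray(user_pos, dtype=float))
        raise ValueError("unknown mode: " + mode)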
Body- and world-stabilized information display is attractive for a number of reasons. As Reichlen [Reichlen 93] demonstrates, a body-stabilized information space can overcome the resolution limitations of head mounted displays. In his work a user wears a head mounted display while seated on a rotatable chair. By tracking head orientation the user experiences a hemispherical information surround, in effect a "hundred million pixel display". World-stabilized information presentation
enables annotation of the real world with context dependent data,
creating information enriched environments [Rekimoto 95]. This
increases the intuitiveness of real world tasks. For example,
researchers at the University of North Carolina register virtual
fetal ultrasound views on the womb [Bajura 92]. Despite these advantages,
most wearables only use headstabilized information display.
In our work we have chosen to begin with the simplest form of body-stabilized display: one which uses a single degree of orientational freedom to give the user the impression they are surrounded by a virtual cylinder of visual and auditory information. Figure 1.0 contrasts this with the traditional head-stabilized wearable information presentation.
Figure 1.0a Head-Stabilized Information Display
In this case we just track head motion about the y (yaw) axis to change the user's view of the information space. Using only
one degree of freedom has a number of advantages:
A head mounted display only allows the portion of the information space in its field of view to be seen. There are therefore two ways the data can be viewed in a cylindrical body-stabilized space: by rotating the information space about the user's head, or by tracking the user's head as they look about the space. The first
requires no additional hardware and can be done by mapping mouse,
switch or voice input to direction and angle of rotation, while
the second requires only a simple one degree of freedom tracker.
The minimal hardware requirements make cylindrical spatial information
displays particularly attractive.
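As a minimal sketch of the head-tracked option, the view update reduces to deciding which slice of the cylinder falls inside the display's field of view. The constants below come from the display described later (30 degree field of view, 262 pixels across); the function name and the linear angle-to-pixel mapping are our own illustration, not the system's actual renderer:

    import math

    FOV = math.radians(30)   # field of view of the HMD described below
    SCREEN_WIDTH = 262       # horizontal resolution of the HMD described below

    def screen_x(item_azimuth, head_yaw):
        """Horizontal pixel position of an item on the virtual cylinder,
        or None if it lies outside the current field of view."""
        # signed offset of the item from the view direction, wrapped to [-pi, pi)
        offset = (item_azimuth - head_yaw + math.pi) % (2 * math.pi) - math.pi
        if abs(offset) > FOV / 2:
            return None      # not visible through the 30 degree window
        # linear map: -FOV/2 -> left edge, +FOV/2 -> right edge
        return int((offset / FOV + 0.5) * SCREEN_WIDTH)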
With this display configuration it is possible to have remote collaborators appear as virtual avatars distributed about the user (figure 2.0). As they speak, their audio streams can be spatialised in real time so that they appear to emanate from the corresponding avatar.
Figure 2.0 A Spatial Conferencing Space.
Just as in face-to-face collaboration, users can turn to face the collaborators they want to talk to while still being aware of the other conversations taking place. Since the displays are see-through or see-around the user can also see the real
world at the same time, enabling the remote collaborators to help
them with real world tasks. These remote users may also be using
wearable computers and head mounted displays or could be interacting
through a desktop workstation.
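One way such a space might assign avatar positions and head-dependent audio levels is sketched below. The even spacing and the crude equal-power panning model are our own assumptions standing in for a real 3D sound API; they are meant only to show how head yaw can drive per-speaker gains:

    import math

    def avatar_azimuths(n):
        # spread n collaborators evenly around the wearer's virtual cylinder
        return [2.0 * math.pi * i / n for i in range(n)]

    def stereo_gains(avatar_azimuth, head_yaw):
        """Sources ahead of the wearer are loudest and sources behind are
        attenuated; negative bearings pan left, positive bearings pan right."""
        bearing = (avatar_azimuth - head_yaw + math.pi) % (2 * math.pi) - math.pi
        front = 0.5 * (1.0 + math.cos(bearing))          # 1 ahead, 0 behind
        left = front * math.cos(bearing / 2 + math.pi / 4) ** 2
        right = front * math.sin(bearing / 2 + math.pi / 4) ** 2
        return left, right

Turning to face a speaker drives their bearing toward zero, so their stream dominates while the others remain audible in the background.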
Our research is initially focused on collaboration
between a single wearable computer user and several desktop PC
users. The aim is to develop software to support medium-sized meetings (5-6 people) in a manner that is natural and intuitive
to use.
4 PROTOTYPE SYSTEM
The wearable computer we use is a custom built 586 PC with 20 MB of RAM running Windows 95. A hand held Logitech wireless radio trackball with three buttons is used as the primary input device. The display is a pair of Virtual i-O i-glasses! converted into a monoscopic display by the removal of the left eyepiece. The Virtual i-O head mounted display can be used in either see-through or occluded mode, has a resolution of 262 by 230 pixels and a 30 degree field of view. The i-glasses! have stereo headphones and a sourceless inertial and gyroscopic three degree of freedom orientation tracker. A BreezeCom wireless LAN is used to give 2 Mb/s internet access up to 500 feet from a base station. The wearable also has a SoundBlaster-compatible sound board with a head-mounted microphone. Figure 3.0 shows a user wearing the display and computer. The desktop PCs are standard Pentium-class machines with internet connectivity and sound capability.
Figure 3.0 The Wearable Interface.
4.1 Wearable Interface
Our wearable computer has no graphics acceleration
hardware so the interface was deliberately kept simple. The conferencing
space runs as a full screen application that is initially blank
until remote users connect. When users join the conferencing space
they are represented by blocks with 128x128 pixel texture mapped
pictures of themselves on them. Although the resolution of the images is crude, it is sufficient to identify who the speakers are. The wearable user has their head tracked so they can simply
turn to face the speakers they are interested in. As they face
different speakers the speaker volume changes due to the 3D sound
spatialisation. Users can also navigate through the space: by rolling the trackball forwards or backwards, their viewpoint is moved forwards or backwards along the direction they are looking.
Since the virtual images are superimposed on the real world, when
the user rolls the trackball it appears to them as though they
are moving the virtual space around them, rather than navigating
through the space. Figure 4.0 shows the wearable interface from
the wearable user's perspective. The Microsoft DirectX libraries
were used to implement the interface.
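The trackball-to-viewpoint mapping described above might look like the following sketch; the function name and the speed constant are our own guesses rather than values from the implementation:

    import math

    MOVE_SPEED = 0.05   # distance per trackball count; a tuning guess

    def move_viewpoint(pos, head_yaw, trackball_dy):
        """Translate the viewpoint along the current gaze direction: rolling
        the trackball forward moves the user forward, so the surrounding
        virtual space appears to slide past the real world."""
        x, y, z = pos
        step = trackball_dy * MOVE_SPEED
        return (x + math.sin(head_yaw) * step, y, z + math.cos(head_yaw) * step)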
4.2 Desktop Interface
Users at a desktop workstation interact with the conferencing space through an interface similar to the wearable user's, although in this case it runs as a windowed application on the desktop. Users navigate through the space using the mouse. Mouse movements rotate the viewpoint when the left mouse button is held down; otherwise they translate the user backwards and forwards in space. Mapping avatar orientation to mouse movements means that the desktop interface is not quite as intuitive as the wearable interface. Users at the desktop machines wear head-mounted microphones to talk into the conferencing space.
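The corresponding desktop mapping, again only as an illustrative sketch with guessed tuning constants:

    import math

    TURN_RATE = 0.005   # radians per pixel of mouse motion; a guess
    MOVE_RATE = 0.02    # distance per pixel of mouse motion; a guess

    def handle_mouse(state, dx, dy, left_button_down):
        """With the left button held, horizontal mouse motion rotates the
        viewpoint; otherwise vertical motion translates the user along
        the current view direction."""
        if left_button_down:
            state["yaw"] += dx * TURN_RATE
        else:
            state["x"] += math.sin(state["yaw"]) * -dy * MOVE_RATE
            state["z"] += math.cos(state["yaw"]) * -dy * MOVE_RATE
        return state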
The wearable and desktop interfaces use multicast sockets to communicate with each other. As shown in figure 5.0, two multicast groups are used: one for user positions and orientations and one for audio communication. As users move, their avatar position and orientation values, tagged with unique avatar identity numbers, are streamed to the position multicast group and rebroadcast to all the interested interfaces. Similarly, when users speak their speech is digitized, an avatar identity number is added, and the speech is sent to the audio group to be rebroadcast. When the digitized speech arrives at the client computer the avatar identity number is used to find the speaker's position and spatialize the speech. All the connections to the multicast groups are bidirectional and users can connect and disconnect at will without affecting other users in the conferencing space.
Figure 5.0 Software Architecture
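The description above suggests packet formats along the following lines. This sketch uses standard UDP multicast sockets; the group addresses and the exact pose layout are hypothetical, since the paper does not specify them:

    import socket
    import struct

    # Hypothetical multicast groups; the real addresses are not given.
    POSE_GROUP = ("239.0.0.1", 5000)    # avatar position/orientation updates
    AUDIO_GROUP = ("239.0.0.2", 5001)   # digitized speech frames

    def open_sender():
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        s.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
        return s

    def send_pose(sock, avatar_id, x, y, z, yaw):
        # avatar identity number plus pose, so every receiver can
        # update the correct avatar
        sock.sendto(struct.pack("!I4f", avatar_id, x, y, z, yaw), POSE_GROUP)

    def send_audio(sock, avatar_id, pcm_frame):
        # avatar identity number prefixed to the raw speech frame; the
        # receiver looks up that avatar's pose to spatialize playback
        sock.sendto(struct.pack("!I", avatar_id) + pcm_frame, AUDIO_GROUP)

Because every interface simply joins or leaves the two groups, no central server has to track who is connected, which is what lets users come and go without disturbing the rest of the space.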
5 INITIAL USER EXPERIENCES
We are in the process of conducting user trials to evaluate how the use of spatialized audio and visual representations affects communication between collaborators. Pre-trial results have found that users can easily discriminate between three speakers when their audio streams are spatialized, but not when non-spatialized audio is used. Users also prefer seeing
a visual representation of their collaborators as opposed to just
hearing their speech. They found that they could continue doing
real world tasks while talking to collaborators in the conferencing
space and it was possible to move the conferencing space with
the trackball so that collaborators weren't blocking critical
portions of the user's field of view.
However, as more users connect to the conferencing space the need to spatialize multiple audio streams puts a severe load on the CPU, slowing down the graphics and head tracking. This makes it difficult for the wearable user to conference with more than two or three people simultaneously. This problem will be reduced as faster CPUs and hardware support for Direct3D graphics become available for wearable computers. Spatial culling of the audio streams could also be used to overcome this limitation.
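Spatial culling could be as simple as spatializing only the few nearest speakers and folding the remainder into a flat background mix; the sketch below is our illustration of the idea, not the authors' implementation:

    def cull_audio_streams(streams, listener_pos, k=3):
        """Spatialize only the k nearest speakers and return the rest for a
        single non-spatialized mix, bounding the per-frame processing cost."""
        def dist_sq(s):
            dx = s["x"] - listener_pos[0]
            dz = s["z"] - listener_pos[2]
            return dx * dx + dz * dz
        ordered = sorted(streams, key=dist_sq)
        return ordered[:k], ordered[k:]   # (spatialize these, mix these flat)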
6 CONCLUSIONS
We have presented a prototype wearable communication space that uses spatial visual and audio cues to enhance communication between remote groups of people. Our interface shows what is possible when computing and communications facilities are coupled together on a wearable platform. Preliminary results have found that users prefer using both the audio and visual cues together and that spatialised audio makes it easy for users to discriminate between speakers. Although our avatar representations are minimal, users found them to be socially engaging. We are currently conducting formal user studies to confirm these results and evaluate the effect of spatial cues on communication patterns.
In the future we plan to investigate how the
presence of spatialised video can further enhance communication.
We will incorporate live video texture mapping into our interface,
enabling users to see their remote collaborators as they speak.
This will also allow users to send views of their workspace as well, improving collaboration on real-world tasks.
ACKNOWLEDGMENTS
We would like to thank our colleagues at British
Telecom for many useful and productive conversations and Nick
Dyer for producing the renderings used in some of the figures.
REFERENCES
[Bajura 92] Bajura, M., Fuchs, H., Ohbuchi, R. "Merging Virtual Objects with the Real World: Seeing Ultrasound Imagery Within the Patient." In Proceedings of SIGGRAPH '92, Chicago, Illinois, July 26-31, 1992, New York: ACM, pp. 203-210.
[Bass 97] Bass, L., Kasabach, C., Martin, R., Siewiorek, D., Smailagic, A., Stivoric, J. The Design of a Wearable Computer. In Proceedings of CHI '97, Atlanta, Georgia, March 1997, New York: ACM, pp. 139-146.
[Benford 93] Benford, S. and Fahlen, L. A Spatial
Model of Interaction in Virtual Environments. In Proceedings
of Third European Conference on Computer Supported Cooperative
Work (ECSCW '93), Milano, Italy, September 1993.
[Benford 97] Benford, S., Greenhalgh, C., Lloyd, D. Crowded Collaborative Virtual Environments. In Proceedings of CHI '97, Atlanta, Georgia, March 1997, New York: ACM, pp. 59-66.
[Billinghurst 96] Billinghurst, M., Weghorst, S., Furness, T. Shared Space: Collaborative Augmented Reality. In Proceedings of the CVE '96 Workshop, 19-20th September 1996, Nottingham, Great Britain.
[Billinghurst 97] Billinghurst, M., Weghorst, S., Furness, T. Wearable Computers for Three Dimensional CSCW. In Proceedings of the International Symposium on Wearable Computers, Cambridge, MA, October 13-14, 1997, Los Alamitos: IEEE Press, pp. 39-46.
[Billinghurst 98] Billinghurst, M., Bowskill, J., Dyer, N., Morphett, J. An Evaluation of Wearable Information Spaces. Submitted to VRAIS '98.
[Bregman 90] Bregman, A. Auditory Scene
Analysis: The Perceptual Organization of Sound. MIT Press,
1990.
[Chapanis 75] Chapanis, A. Interactive Human Communication. Scientific American, 1975, Vol. 232, pp. 36-42.
[Esposito 97] Esposito, C. Wearable Computers: Field-Test Results and System Design Guidelines. In Proceedings of Interact '97, July 14th-18th, Sydney, Australia.
[Feiner 97] Feiner, S., MacIntyre, B., Hollerer, T. A Touring Machine: Prototyping 3D Mobile Augmented Reality Systems for Exploring the Urban Environment. In Proceedings of the International Symposium on Wearable Computers, Cambridge, MA, October 13-14, 1997, Los Alamitos: IEEE Press, pp. 74-81.
[Heath 91] Heath, C., Luff, P. Disembodied Conduct: Communication Through Video in a Multimedia Environment. In Proceedings of CHI '91 Human Factors in Computing Systems, 1991, New York, NY: ACM Press, pp. 99-103.
[Hindus 96] Hindus, D., Ackerman, M., Mainwaring, S., Starr, B. Thunderwire: A Field Study of an Audio-Only Media Space. In Proceedings of CSCW '96, Nov. 16th-20th, Cambridge, MA, 1996, New York, NY: ACM Press.
[Kraut 96] Kraut, R., Miller, M., Siegel, J. Collaboration in Performance of Physical Tasks: Effects on Outcomes and Communication. In Proceedings of CSCW '96, Nov. 16th-20th, Cambridge, MA, 1996, New York, NY: ACM Press.
[Mann 97] Mann, S. Smart Clothing: The "Wearable Computer" and WearCam. Personal Technologies, Vol. 1, No. 1, March 1997, Springer-Verlag.
[Nakanishi 96] Nakanishi, H., Yoshida, C., Nishimura, T., Ishida, T. FreeWalk: Supporting Casual Meetings in a Network. In Proceedings of CSCW '96, Nov. 16th-20th, Cambridge, MA, 1996, New York, NY: ACM Press, pp. 308-314.
[O'Malley 96] O'Malley, C., Langton, S., Anderson, A., Doherty-Sneddon, G., Bruce, V. Comparison of Face-to-Face and Video-Mediated Interaction. Interacting with Computers, Vol. 8, No. 2, 1996, pp. 177-192.
[Reichlen 93] Reichlen, B. SparcChair: One Hundred Million Pixel Display. In Proceedings of IEEE VRAIS '93, Seattle, WA, September 18-22, 1993, Los Alamitos: IEEE Press, pp. 300-307.
[Rekimoto 95] Rekimoto, J., Nagao, K. The World through the Computer: Computer Augmented Interaction with Real World Environments. In Proceedings of User Interface Software and Technology '95 (UIST '95), November 1995, New York: ACM, pp. 29-36.
[Schmalstieg 96] Schmalstieg, D., Fuhrmann, A., Szalavari, Z., Gervautz, M. Studierstube: An Environment for Collaboration in Augmented Reality. In Proceedings of the CVE '96 Workshop, 19-20th September 1996, Nottingham, Great Britain.
[Schmandt 95] Schmandt, C., Mullins, A. AudioStreamer: Exploiting Simultaneity for Listening. In Proceedings of CHI '95 Conference Companion, May 7-11, Denver, Colorado, 1995, New York: ACM, pp. 218-219.
[Sellen 92] Sellen, A. Speech Patterns in Video-Mediated Conversations. In Proceedings of CHI '92, May 3-7, 1992, New York: ACM, pp. 49-59.
[Sellen 95] Sellen, A. Remote Conversations: The Effects of Mediating Talk with Technology. Human Computer Interaction, 1995, Vol. 10, No. 4, pp. 401-444.
[Siegel 95] Siegel, J., Kraut, R., John, B., Carley, K. An Empirical Study of Collaborative Wearable Computer Systems. In Proceedings of CHI '95 Conference Companion, May 7-11, Denver, Colorado, 1995, New York: ACM, pp. 312-313.
[Whittaker 95] Whittaker, S. Rethinking Video as a Technology for Interpersonal Communications: Theory and Design Implications. Academic Press Limited, 1995.
[Whittaker 97] Whittaker, S., O'Connaill, B. The Role of Vision in Face-to-Face and Mediated Communication. In Video-Mediated Communication, Eds. Finn, K., Sellen, A., Wilbur, S., Lawrence Erlbaum Associates, New Jersey, 1997, pp. 23-49.
[Williams 77] Williams, E. Experimental Comparisons of Face-to-Face and Mediated Communication. Psychological Bulletin, 1977, Vol. 84, pp. 963-976.