Why Do Users Like Video? Studies of Multimedia-Supported Collaboration
By John C. Tang and Ellen Isaacs. Published in 1993 in Computer-Supported Cooperative Work: An International Journal, Volume 1, Issue 3, pp. 163-196.
Abstract
Three studies of collaborative activity were conducted as part of a research effort to develop multimedia technology to support collaboration. One study surveyed users' opinions of a video conference room system. Users indicated that the availability of the video conferencing rooms was too limited, audio quality needed improvement, and a shared drawing space was needed. A second study analyzed videotape records of a work group when meeting face-to-face, video conferencing, and phone conferencing. The analysis indicated that the noticeable audio delay made it difficult for the participants to manage turn taking and to coordinate their gazes, which in turn impaired their ability to handle conflict and to engage in humor. In the third study, a distributed team was observed under three conditions: using their existing collaboration tools, adding a desktop conferencing prototype (audio, video, and shared drawing tool), and subtracting the video capability from the prototype. Data were collected by videotaping the team during all three conditions, by interviewing the team members, and by recording the team's usage of the phone, e-mail, face-to-face meetings, and desktop conferencing. The results showed that the presence of the video capability determined whether the group used the desktop conferencing prototype. Analysis of the videotape records showed that the video channel was used to help mediate their interaction and to convey visual information. Desktop conferencing was found to substitute for shorter, two-person meetings, longer phone calls, and e-mail usage.
1. The promise and perplexity of multimedia-supported collaboration
The growing need to support technical and social activity that occurs across geographic distances has not been fully satisfied by the current technologies of phones, faxes, electronic mail, and video conference rooms. 
Video conferencing systems have been promoted as a technology to support remote collaboration at least since AT&T unveiled the Picturephone in the mid-1960s (Falk, 1973), but so far this vision has not been fully realized. However, recent technology and infrastructure developments are lowering some of the barriers that have prevented the widespread adoption and use of multimedia to support remote collaboration (Gale, 1992). The emergence of digital audio and video technology allows voice and images to be computationally manipulated and transmitted across the existing computer networks. Improved compression algorithms running on faster hardware will soon provide acceptable audio-video quality at viable network bandwidth rates. The availability of affordable workstations, proliferation of digital networks, emergence of compression algorithms and network protocol standards, and marketing hype are all converging to bring multimedia capability to personal desktops.
Research prototypes that provide what is often referred to as desktop conferencing (audio, video, and computational connections between computer desktops) have already been demonstrated using analog (Root, 1988; Buxton & Moran, 1990; Bly et al., 1993) and digital (Watabe et al., 1990; Masaki et al., 1991) video technology. For example, Olson and Bly (1991) reported on the experiences of a distributed research group using a network of audio, video, and computer connections to explore ways of overcoming their separation in location and time. These prototypes have demonstrated the technical feasibility of desktop conferencing, experimented with some of its features, and provided a glimpse of how users might use it. However, increased costs (e.g., upgrading networks, buying media-equipped workstations) and uncertainty over the benefits of collaborative multimedia have been significant barriers to its widespread adoption and use. 
While videophone products have recently reappeared in the marketplace, the lack of commercial success of Picturephone since it was introduced almost 30 years ago indicates that there is much yet to be learned about the deployment and use of collaborative multimedia technologies (Francik et al., 1991). Furthermore, research to date on the effects of various communication media on collaborative activity has not provided convincing evidence of the intuitively presumed value of video (Williams, 1977). Ochsman and Chapanis (1974) examined problem-solving tasks in various communication modes including typewriting, video only, voice and video, and unrestricted (working side-by-side) communication. They concluded that relative to communication modes using an audio channel, "...there is no evidence in this study that the addition of a video channel has any significant effects on communication times or on communication behavior." (Ochsman & Chapanis, 1974, p. 618). Conrath et al. (1977) explored using telecommunication systems to support remote medical diagnosis. They compared four different systems: hands-free telephone (audio only), audio plus black and white still frame video images, audio plus full-motion black and white video, and audio plus color full-motion video. They concluded: "We found no significant differences in diagnostic accuracy, ... time taken for the diagnostic consultations, and the effectiveness of patient management across the four communication modes." (Conrath et al., 1977, p. 12). However, when they surveyed the patients' attitudes about the four systems, they found that the patients preferred the more sensory rich (full-motion video, color) modes. Gale (1990) compared computer-mediated collaboration on experimental tasks under three conditions: sharing data only (via a shared electronic whiteboard), sharing data and audio, and sharing data, audio, and video. 
He concluded, "The results showed no significant differences in the quality of the output, or the time taken to complete the tasks, under [the] three conditions." (Gale, 1990, p. 175). However, Gale did find that collaborators' perceptions of productivity increased as communication bandwidth increased and suggested that higher bandwidth media enabled the groups to perform more social activities. Some research has begun to identify uses of video in support of remote collaborations. Smith et al. (1989) compared computer-mediated problem-solving activity in audio only, audio and video, and face-to-face settings. They found that the presence of the video channel encouraged more discussion about the task rather than the mechanics of the computer tool being used for the task. Fish et al. (1992) equipped mentor-student pairs of researchers with a desktop conferencing prototype and studied their informal communication over several weeks. They found that the participants used the prototypes quite frequently for relatively short interactions, similar to how they used the telephone. They also reported an interesting novel use of the system: some participants occasionally created a "virtual shared office" where two people kept a connection open for long periods of time. Most of the studies to date have used artificial groups (subjects randomly assigned to work together) working on short, contrived tasks (problems unrelated to their actual work). We hypothesized that evidence for the value of video would be most visible in actual work activity of real working groups. In fact, while some of the previous research did not find that adding video affected their measures of the resulting product (quality of decisions, medical diagnosis, completion time, etc.), they did make observations on how video affected the process of interacting together (perceptions of productivity, task focus, degree of interactivity, etc.). 
If the main contribution of video is on the work processes, perhaps there would be no measurable effect on the work products on such short, artificial tasks. We set out to study real examples of synchronous, distributed, small group collaboration in order to understand how multimedia technology, and video in particular, could be designed to support that activity. The research pursued an iterative cycle of studying existing work activity, developing prototype systems to support that activity, and studying how people use those prototypes in their work (Tang, 1991b). In this paper, we first describe two background studies that examine existing remote collaboration work practice - a survey of users' perceptions of an existing video conference room system and a study of a geographically divided work group in various collaboration settings. Then we describe the development of a prototype desktop conferencing system that embodied some of the design implications identified by the background studies. In a third study, we used that prototype to observe a distributed team under three conditions: using their existing collaboration tools, adding the desktop conferencing prototype, and subtracting the video capability from the prototype. We conclude by discussing evidence from the three studies that helps explain why users like video and suggest ways in which video would be most effectively integrated into existing environments to support remote collaboration.
2. Survey of video conference room users
The first background study surveyed users' perceptions of an existing video conference room system. The survey was conducted within Sun Microsystems, Inc., which at the time used commercially available video conference room systems (PictureTel Corporation model CT3100 operating at 112kb/s bandwidth). These systems connected conference rooms among sites in Mountain View, California; Billerica, Massachusetts; Colorado Springs, Colorado; and Research Triangle Park, North Carolina. In addition to the audio-video link, they could also send high quality video still images on a separate video display.
A survey was sent via electronic mail to users of the video conference room system. The survey asked for usage information and for the users' perceptions of the system. A total of 76 users responded to the survey, with representatives from all four sites and a variety of types of meetings (e.g., staff meetings, presentations, design meetings).
2.1 Survey Results
Users were asked to indicate the best aspects of video conferencing. Respondents could check more than one item from a list of choices as well as add their own items. Most respondents (89%) liked having regular visual contact with remote collaborators. Many also indicated that it saved travel (70%) and time (51%), although this survey measured only the users' perceptions of saved time and travel, not whether those savings actually occurred.
Users were also asked to indicate the worst aspects of their video conferencing experience. The percentage of respondents who checked each aspect is shown in Figure 1. The most frequently indicated problem (72%) was difficulty in scheduling an available room. Poor audio quality (poor microphone pickup, moving the microphones into range of the speaker, echo, etc.) was indicated by 55% of the respondents, 53% mentioned not being able to see overheads and other materials used in presentations, and 52% complained about the time delay (latency) in transmitting audio and video through video conferencing. Between Mountain View, CA and Billerica, MA, the system exhibited about 0.57 second latency between capturing voice and video on one end and producing them on the other end. Poor video quality was relatively less troubling; only 28% mentioned it as a problem.
Some sample comments from the survey:
... we need more video rooms! They are overbooked.
... audio quality is the real problem. Both audio quality and delay.
Difficulty in being able to make a comment--can't see a verbal opening coming because of the delays and image and audio quality
Can't provide direct feedback on written material (e.g., by pointing and/or annotating slides and drawings presented remotely)
Respondents were also asked to rank order a list of additional capabilities (to which they could add their own) they would like to have in video conferencing. Figure 2 shows the five most frequently requested features with each item's average rank labelled on the bar chart (rank 1 is the most urgently desired). The need for a shared drawing surface stood out as the most commonly requested feature; 68% of the respondents mentioned it as a desired feature, and its average rank order was 1.76. Respondents also indicated that they wanted a larger video screen (34%) and the ability to connect multiple sites together at the same time (30%). Only 18% wanted to incorporate computer applications, but its low average rank (1.75) suggests that those who wanted it considered it a highly desirable feature. Users suggested incorporating software such as word processing, spreadsheet, and shared whiteboard applications.
More comments from the survey:
A shared drawing surface could be really useful. It should be a single device, so that you draw on the device and see your marks and the other person's marks on that same device.
Networked [computer workstation] in each conference room would be nice, especially one that could project onscreen at the local site and at the remote sites...
[Larger video screen] so I can really tell who's talking and get a fix on facial/body talk better...
A similar survey of users of videoconferencing systems (Masaki et al., 1991) identified some of the same features as requirements for improving video conferencing. They found that users wanted a virtual common space (which would include a shared drawing space), integration of teleconference and computational tools, and multiple-site conferencing capability.
2.2 Design Implications From Surveying Video Conferencing Users
The survey did show that video conferencing users broadly appreciated the capabilities of video conferencing. Users' comments indicated that collaboration between remote sites would not be as effective or even possible without video conferencing. The survey responses indicated that multimedia tools to support collaboration should:
3. Study of collaboration in various settings
After completing the video conferencing survey, we studied a work group composed of four members at two different sites: three in Billerica, near Boston, and one (the second author of this paper) in Mountain View, in the San Francisco Bay Area. Their discussions centered around graphical user interfaces for on-line help systems. The group conducted weekly video conference room meetings, supplemented by occasional phone conferences. At one point in the project, the primary participant in Mountain View visited Billerica for a week of face-to-face meetings. Although some participants knew each other from previous work contacts, this was the first time they worked together extensively as a project team.
Over two months, meetings in each of the collaboration settings were videotaped and analyzed to identify characteristics of their collaboration that varied among the settings. The data consisted of eight video conferences, five face-to-face meetings, and one phone conference, amounting to over 15 hours of data.
3.1 Findings from the study of collaboration settings
From reviewing the videotapes, we found that the team experienced certain problems while using the video conference rooms that did not arise in face-to-face settings:
The participants also had occasional difficulty directing a remote collaborator's attention to the video display so that their gestures would be seen. We observed several examples of "just missed" glances between remote collaborators when one participant looked up from her notes to glance at her remote collaborator, but after not getting a reciprocal glance within the usual wait time, looked back down at her notes just before the remote collaborator looked up at his video display to glance at her. These missed glances could largely be explained by delayed reactions caused by the transmission delays. However, just missed glances have also been observed in audio-video links that do not have any perceptible delay (Smith et al., 1989; Heath and Luff 1991). Current research suggests that lack of peripheral vision and division of attention between video windows and the shared workspace disrupt the coordination of mutual glances. Difficulties in negotiating turn-taking and directing participants' attention in video conferencing apparently combined to reduce the amount of interaction between the remote parties compared to face-to-face meetings. Video conferences tended to consist of a sequence of individual monologues rather than interactive conversations. We observed less frequent changes of speaker turns, longer turns, and less back-channeling in video conferencing than in face-to-face meetings. This reduced level of interaction appeared to affect the content of the video conferences by suppressing complex, subtle, or other difficult-to-manage interactions. Compared with face-to-face meetings, participants seemed inhibited from expressing their opinions and, in particular, avoided working through conflict and disagreement. Video conferences also exhibited a marked lack of humor (in part because humor relies on precise timing). 
3.2 Design implications from studying collaboration settings
Comparing collaboration in video conferencing with face-to-face and phone conferencing settings underscored the need to provide responsive (minimally delayed) audio in technology to support interaction. The work group we studied was so frustrated by the audio delays in video conferencing that they turned off the audio provided by the videoconferencing system and placed a phone call (using speakerphones) for their audio channel. Although this arrangement eliminated the audio delay, the audio now arrived before the accompanying video (i.e., audio and video were no longer synchronized), the audio quality was poorer, and speakerphone audio was only half-duplex (only one party's sound was transmitted at a time). Nonetheless, the collaborators strongly preferred this arrangement to the frustrations they experienced with the delayed audio. The meetings they conducted under this arrangement appeared to exhibit more frequent changes in speaker turns, more back-channeling, and more humor than those using the normal video conference configuration. A related study (Isaacs and Tang, 1993) explores some of these issues by comparing interactions among phone, desktop video conferencing, and face-to-face meetings.
This experience indicates that users prefer audio with minimal latency even at the expense of disrupting synchrony with the video. This observation again confirms research findings of the greater importance of audio relative to video (Gale, 1990; Ochsman & Chapanis, 1974), and is also consistent with users' perceptions from the survey that audio quality and responsiveness are more important than video quality. This finding suggests that, given the limited bandwidth and performance available for desktop conferencing, more attention should be devoted to providing responsive, interactive audio. 
More research on the trade-offs and limits of degrading other parameters of desktop conferencing (e.g., video quality, video refresh rate, audio quality, audio silence suppression) is needed. While our observations confirmed the greater importance of audio relative to video, they also provide evidence for the value of video in supporting remote collaboration. Through the video channel, gestures were used to demonstrate actions (e.g., enact how a user would interact with an interface) and to convey the participants' attitudes (e.g., lively head nodding indicating strong agreement). Especially under the delayed audio conditions of video conferencing, video was valuable in helping mediate interaction (e.g., using gestures to take a turn of talk) (Krauss et al., 1977).
4. Developing a desktop conferencing prototype
The observations gained from these two studies helped guide the design of our research prototypes for new multimedia technology to support collaboration. An initial phase of this research project was to design and implement a prototype desktop conferencing system that provided real-time audio and video links and a shared drawing program among participants at up to three sites. The desktop conferencing prototype was built on a prototype hardware card that enabled real-time video capture, compression, and display on a workstation desktop. This prototype card, in conjunction with the workstation's built-in audio capability, enabled digital audio-video links among workstations on a computer network (Pearl, 1992).
Figure 3 shows the user interface for establishing and managing desktop conferences. Initiating a conference was modeled after placing a telephone call. A user selected from a list of receivers to request a conference with them. An identical copy of the interface appeared on the receivers' screens, announced by three beeps. Each receiver could decide to join or decline the conference. 
A shared message area allowed users to send text messages to each other to negotiate joining or refusing a conference.
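The call-like initiation flow described here can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not the prototype's implementation: the class `ConferenceRequest`, the `Reply` states, and the participant names are all hypothetical, and the sketch does not model what the prototype did when a receiver declined, since the text does not say.

```python
from enum import Enum


class Reply(Enum):
    """Hypothetical reply states for an invited receiver."""
    PENDING = "pending"
    JOINED = "joined"
    DECLINED = "declined"


class ConferenceRequest:
    """Illustrative model of one conference request (not the prototype's code)."""

    def __init__(self, initiator, receivers):
        self.initiator = initiator
        # Every selected receiver starts out without having answered.
        self.replies = {name: Reply.PENDING for name in receivers}
        # Stand-in for the shared message area that every party can read.
        self.messages = []

    def post(self, sender, text):
        """Add a text message to the shared message area."""
        self.messages.append((sender, text))

    def respond(self, receiver, join):
        """Record a receiver's decision to join or decline."""
        if receiver not in self.replies:
            raise KeyError(f"{receiver} was not invited")
        self.replies[receiver] = Reply.JOINED if join else Reply.DECLINED

    def all_joined(self):
        """True once every invited receiver has joined."""
        return all(reply is Reply.JOINED for reply in self.replies.values())


# Hypothetical three-site conference: A calls B and C.
request = ConferenceRequest("A", ["B", "C"])
request.post("C", "Give me five minutes")
request.respond("B", join=True)
ready_before = request.all_joined()  # C has not answered yet
request.respond("C", join=True)
ready_after = request.all_joined()
```

Keeping the replies and the message area on one shared object mirrors the description of an identical copy of the interface appearing on every participant's screen.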
Once all receivers joined the conference, the collaborative tools selected in the conference manager were invoked. Figure 4 shows a screen image of the tools requested by the default settings in the desktop conferencing prototype. For a 2-way conference, each user's screen displayed:
Show Me allowed users to create shared freehand graphics and to grab bitmap images from their screen and share them with the others. By default, Show Me was started automatically as part of a desktop conference, although users could choose not to invoke it.
The two studies of existing collaboration activity helped shape the design of the desktop conferencing prototype. The survey identified users' need for a shared drawing space, prompting a significant investment in the development of the Show Me shared drawing tool. The design of Show Me drew on previous research on shared drawing activity (Tang & Minneman, 1991; Minneman & Bly, 1991) to provide a drawing surface that remote collaborators could share in much the same way that face-to-face collaborators use a whiteboard. The study of collaboration settings identified the problem of audio delay and underscored the importance of the audio channel in mediating interaction. The one-way audio delay in the desktop conferencing prototype was minimized to be in the range of 0.22-0.44 seconds (depending on computational and networking constraints). Since the audio and video data streams were being sent through a computer network that was being shared with other users, spurts of heavy network traffic affected the prototype's performance. When network load prevented video frames from being delivered at the requested video rate, the video image would occasionally freeze until an updated frame was received. More severe network load caused cut outs in the audio signal. One characteristic of handling the audio and video data streams separately is that the timely delivery of video would degrade before the delivery of audio was disrupted. This behavior reflected the study's finding that immediate and responsive audio was more important than preserving audio-video synchrony.
5. A study of the use of desktop conferencing
We studied a distributed team using the desktop conferencing prototype (DCP) in three conditions:
By measuring the team's use of these communication media and analyzing actual work activity across the three conditions, we sought to learn how desktop conferencing, and the video channel in particular, would be used in remote collaboration.
5.1 Background
The team we studied consisted initially of four members distributed across three locations. One member was located in Billerica, MA, another worked in a building in Mountain View, CA, and the remaining two members worked in offices near each other in a different building in Mountain View (over 500 yards away from the other building). The team worked together on developing automated software testing tools. They were previously all located together in neighboring cubicle offices at the Billerica site but, for reasons not related to this project, were relocated to these distributed sites a few months before the beginning of the study. During the second week of the study, a fifth team member was added at the Billerica site in a cubicle facing that of the other Billerica team member.
Although the team had no formal hierarchy, there were differences in their job responsibilities. The project leader (referred to as PL) was located alone in one Mountain View building. The two members located in the other Mountain View building were software developers (SD1 and SD2) who wrote most of the computer code. The customer representative (CR1) in Billerica communicated customers' needs, requirements, and experiences to the rest of the team. The newly added member in Billerica (CR2) had a similar job to CR1. Altogether, we studied the team's work activity for 14 weeks. After three weeks in the pre-DCP condition, the desktop conferencing prototype was installed into each team member's workstation. This installation involved inserting a prototype hardware card into their existing workstation, adding a second display screen, outfitting each office with a camera and speakers, and adding software to their system. 
The second display screen was added to display the output of the prototype hardware card. Because CR1 had limited desk space, he substituted this display screen for his usual display. Due to equipment limitations, we were unable to provide the added member of the team (CR2) with a prototype, but he often joined CR1 in some desktop conferences or used CR1's workstation when CR1 was not in the office. The team was studied in the full-DCP condition for 7 weeks in an attempt to go beyond the initial novelty effect of introducing a technology to a more routine pattern of use. For the DCP minus video condition, the hardware card and second display screen were removed; the speakers and camera (the camera's microphone was used for audio input) were left in their offices. The team was studied in the DCP minus video condition for four weeks. Although the effects of ordering of the conditions might be of interest, we arranged to study only one group and did not have the luxury of controlling for or exploring the order effects in an empirical study of this scope. We chose the order of the three conditions (pre-DCP, full-DCP, DCP minus video) to explore the transitions between conditions that we thought would be most interesting. We sought to see the effects of introducing a desktop conferencing system to a working group in the transition from pre-DCP to full-DCP, and we hoped to isolate the role of video in desktop conferencing in the transition from full-DCP to DCP minus video. When operating at 30 frames per second (fps), a desktop conference (audio, video, Show Me) consumed approximately 1.6 Mbit/s of network bandwidth. The bandwidth demand came mostly from the video stream and could be reduced almost directly in proportion to the requested video frame rate. At 10 fps, desktop conferences could use the existing local area networks without overly disrupting other network traffic. 
However, dedicated network bandwidth was needed for robust connections between the Billerica and Mountain View sites. A 0.5 Mbit/s link was leased that provided enough bandwidth to support conferencing at 5 fps. Because of this limitation, the default video frame rate was set to 5 fps for all desktop conferences among the team, although any user could change this rate before starting a conference. This video frame rate was noticeably less lively than the 30 fps used in full-motion video, and we wanted to learn if that video quality was usable.
5.2 Observation Methodologies
A combination of observational methods was used to obtain information from different perspectives for this study:
In addition to the quantitative data collected, we videotaped selected samples of collaborative activity in each of the three conditions. All videotapes were recorded from a stationary camcorder mounted on a tripod without any camera operator present at the meeting. Videotapes of the desktop conferences were captured by a camcorder aimed at the computer display screen of one of the participants. After each videotape was made, the participants were always given the option of erasing the videotape if they were uncomfortable with preserving a record of that interaction (which they requested in one instance). The videotape data captured 19 interactions including examples of: all-team video conference room meetings, all-team face-to-face meeting, two-person face-to-face meeting, three-way phone conference, two-way desktop conference, three-way desktop conference (involving all five team members), four-way desktop conference, and two-way Show Me conferences (with phone audio). These tapes were analyzed by a multi-disciplinary group that included the designers of the prototype, a psychologist, and user interface designers. The group studied the tapes in the tradition of interaction analysis (Tatar, 1989) to understand how the team accomplished their collaborative work and compared similar types of activity across different instances collected on videotape. Furthermore, we interviewed each team member individually to gather their perceptions about their work activity;
5.3 Limitations of the Data
It is important to note the context and limitations of these data to appropriately understand and apply the results from this study. Since this team previously had been co-located, they were in some respects not representative of distributed groups in general. On the other hand, they also knew how they had interacted when co-located, so they were in a better position to evaluate how well the prototype tools fulfilled those interactional needs. Since video is believed to be especially useful in supporting social activities (Gale, 1990), the team's existing social relationships provided an opportunity to see if video effectively supported those relations.
Although we intended to collect data that would provide a clean comparison among the three conditions, several factors combined to complicate the data collection and the analyses that can be drawn from the data. In general, the quantitative data were relatively sparse and had large variances, making it less likely that we could demonstrate statistically significant differences. Several factors contributed to the variance in the data. Company holidays shortened weeks 1 and 5 by one day. Training classes or travel caused one or more team members to be away from the office for an entire week during weeks 3, 4, 7, 8, and 11 of the study. These absences not only affected the data, but also caused some adjustments in the duration of the three conditions. A total of 15 other individual days of absence (e.g., illness, day off) occurred during the study. During the third week of the full-DCP condition (week 6), both CR1 and CR2 from Billerica traveled to Mountain View to meet with the team and others there. Besides affecting the data collected for that week, the visit had an effect on the progress and nature of the team's subsequent work. Also, several uncertainties were discovered in the phone, e-mail, and face-to-face meeting logging. 
Problems with the automatic phone logging of the Billerica team members resulted in lost data. Consequently, our analyses are based only on the data of calls received by the Mountain View team members. Because CR2 was a new employee to the company, we excluded his e-mail data throughout the study. After the study started, PL realized that he was logging e-mail from only one of the two sources from which he sent mail, resulting in some lost e-mail data from him. In addition, we allowed the team members to delete any messages that they did not want us to see before making the e-mail logs available to us. Although this added some uncertainty to the e-mail data collected, we felt that it was a worthwhile tradeoff to accommodate their participation in the study. Because we were relying on the team members to report their face-to-face meetings, some meetings were probably recorded inaccurately or not at all. These meeting logs also had some inherent uncertainty because individuals reported the same meeting differently (different start and stop times, different participants). We reminded them throughout the study to log their meetings to counteract any tendency to overlook their self-logging over time. While all of these factors frustrated our attempt to get clean, quantitative data to compare among the three conditions, they were accommodated to preserve the team's actual working activity with minimal disruption from the study. The quantitative data were used to identify trends and raise issues that we could examine through other qualitative data that we had collected. Even though these variations limit some of the claims we can make based on the quantitative data, we accept them as a characteristic of studying actual work activity, rather than studying behavior in an isolated, laboratory setting.
6. Analyzing the use of desktop conferencing
The quantitative and qualitative data were analyzed for any patterns or changes across the three conditions. 
We conducted statistical tests of the quantitative data to identify any significant differences across the conditions. We used the videotape and interview data to discover changes across the conditions and to help explain patterns that were observed in the quantitative data. These analyses revealed that desktop conferencing:
6.1 No increase in overall interactive communication usage

The data show no evidence that introducing desktop conferencing systematically changed the total amount of interactive communication (face-to-face meetings, phone calls, desktop conferences) among the team. A measure of usage for each medium of interactive communication was calculated by multiplying the duration of each interaction by the number of people involved. Figure 5 graphs the combined measures of usage for the interactive communication media per week. The most visible feature of this graph is the spike in week 6. This was the week when the team members from Billerica traveled to Mountain View to meet face-to-face together with the team. Because week 6 was not a typical week, data from week 6 are excluded from all subsequent quantitative analyses.

Besides the spike in week 6, there is no other visible pattern in the combined usage of interactive media throughout the three conditions. A one-way analysis of variance showed no significant differences in the total measure of usage across the three conditions (p < 0.53, where less than 0.05 indicates significance). This lack of an effect is itself a finding, suggesting that the team did not spend more time in interactive communication when they had the desktop conferencing capability. Instead, the introduction of desktop conferences apparently reduced the use of other forms of communication. A closer look at the data provides some insights into the usage relationships among the communication media.
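
The usage measure described in this section (the duration of each interaction multiplied by the number of people involved, summed per week, with the atypical week 6 excluded) can be illustrated with a minimal sketch. The record layout, the use of minutes as the unit of duration, and the sample log entries are all hypothetical and are not taken from the study's data.

```python
from dataclasses import dataclass


@dataclass
class Interaction:
    """One logged interaction. Field names are hypothetical."""
    medium: str        # e.g. "phone", "face-to-face", "desktop conference"
    week: int          # study week in which the interaction occurred
    minutes: float     # duration; minutes is an assumed unit
    participants: int  # number of people involved


def usage(interaction):
    """Usage measure: duration multiplied by the number of people involved."""
    return interaction.minutes * interaction.participants


def weekly_usage(interactions, excluded_weeks=frozenset()):
    """Sum the usage measure per week, skipping any excluded weeks."""
    totals = {}
    for item in interactions:
        if item.week in excluded_weeks:
            continue
        totals[item.week] = totals.get(item.week, 0.0) + usage(item)
    return totals


# Hypothetical log entries, for illustration only.
log = [
    Interaction("phone", 1, 10, 2),
    Interaction("face-to-face", 1, 30, 3),
    Interaction("desktop conference", 5, 20, 2),
    Interaction("face-to-face", 6, 120, 5),
]

# Week 6 is excluded, as in the analyses described above.
totals = weekly_usage(log, excluded_weeks={6})
print(totals)
```

Weighting duration by the number of participants means that a long all-team meeting counts for more than a brief two-person call, which is why the week 6 site visit dominates any unfiltered weekly total.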
![]()
6.2 Video determined whether desktop conferencing was used

The data clearly show that the presence of the video capability correlated with more use of the desktop conferencing prototype. Figure 6 plots each of the communication media for the 14 weeks of the study (excluding the data for the atypical week 6). For e-mail, the number of messages was counted as a measure of usage. An analysis of variance showed that the desktop conferencing prototype usage significantly decreased during the DCP minus video condition when the video capability was taken away (p < .02). This result indicates that the video capability was the determining factor in the team's decision to use the desktop conferencing prototype. Why did the users like using video so much?
![]()
Interviews with the team confirmed that they strongly liked the video because they could see each others' reactions, monitor if they were being understood, and engage in more social, personal contact through video. Besides using desktop conferencing for technical discussions, they also reported using it for informal chatting. Some members expressed that having the video markedly improved the communication among the team. Turning to the videotape data of their use of the desktop conferencing prototype, we could see specific evidence of their use of video that would contribute to these positive perceptions. Video played a crucial role in facilitating their interaction. For example, it clearly helped remote collaborators interpret long audio pauses. We observed many pauses in desktop conferences, lasting up to 15 seconds, but the participants did not mark them as problematic. The video channel provided visual cues that explained the purpose of the pause (e.g., looking elsewhere on the computer screen or in the office, looking up at the ceiling while considering how to respond, preparing an image to send in Show Me). Without the video channel, these pauses would have been mystifying, as evidenced in video records of phone calls where participants frequently asked for feedback (e.g., "Right?", "OK?"). Other visual cues that facilitated their interaction included leaning in to the camera when users could not hear what a remote collaborator said (usually prompting a repetition of the utterance) and hand gestures that indicated taking or yielding a turn of talk. Facial and body gestures often communicated whether a person was understanding what was being said, prompting the speaker to either continue explaining or move on to the next topic. The video channel conveyed many of the gestures people use to mediate their speech (Kendon, 1986). 
We also observed several examples of turn completions in desktop conferencing when one person would complete a sentence or a turn of talk for a remote collaborator. Completions are a demonstration of mutual understanding that require tight interaction and coordination among the participants (Wilkes-Gibbs, 1986). The prototype demonstrated that it can support accomplishing turn completions between remote participants. Completions were notably absent when using the video conference rooms, largely due to the more than half-second audio delay. The video channel was also used to visually convey information. Shrugs were often demonstrated through the video channel without any accompanying talk indicating indifference or "I don't know." Gestures and facial expressions were sometimes used to subtly express disagreement. In a three-way conference involving all five team members, shown in Figure 7, SD1 rhetorically asks "What does that benefit [this project]?" and emphatically answers the question by synchronizing a gesture indicating "zero" without saying anything else. Occasionally, objects were held in front of the camera to show them to the others. Team members sometimes noticed activities happening in the background through the video channel. For example, they could often see people walking by at the Billerica site (which had open cubicle offices) and would wave and engage them in conversations.
![]()
The team seemed to use gestures naturally in desktop conferences much as they would in face-to-face interaction. Some of the users' gestures were not transmitted through the video channel because they were not within the camera's field of view, indicating that in some ways the desktop conferencing prototype elicited an illusion of face-to-face interaction beyond what it could actually support. At other times, the team members were aware that they were deliberately using the video channel to convey gestures. In one example, PL noticed someone he knows in Billerica looking in on a desktop conference he is having with CR1. PL waved his hand, but it was not within the video camera's field of view. He quickly repositioned his wave to be within camera view, which finally elicited a response. The data contain evidence that users' activity both built on the familiar face-to-face experience and also accommodated the capabilities of desktop conferencing. As mentioned earlier, because of network bandwidth limitations between Billerica and Mountain View, the default video frame rate for desktop conferences was set to 5 fps. Although this rate is dramatically less than the 30 fps used in television video, the users found the lower frame rate to be usable for desktop conferencing purposes. There was only one instance (out of 72) where the users chose to increase the video frame rate (to 10 fps). When asked in the interviews about the slow frame rate, they commented that it was noticeably slow but tolerable. They did comment on a related problem of having the video image occasionally freeze when network traffic or computation load was heavy. Under severe loading conditions, images were sometimes frozen for several seconds before a new image was received. Users found this to be annoying, although it was sometimes amusing if the frozen image captured a humorous pose of one of the collaborators. 
In the interviews just prior to removing the video, all team members anticipated that they would hardly use the prototype once the video capability was removed. The prototype's audio quality was considerably worse than the telephone, due to the perceptible delay and echo. While the poor audio quality is another reason that the use of the prototype decreased in the DCP minus video condition, it actually provides more evidence of how much the participants valued the video capability. Despite the prototype's poor audio capability, they still used it when video was available, and reported positive perceptions of using it. However, once the video capability was removed, they were not interested in using the prototype relative to the superior audio provided by the phone. Although the Show Me shared drawing tool might have motivated continued use of the prototype after removing the video, it was not enough to counteract the lack of video and the poor audio quality. We did not collect statistics on the actual use of Show Me, but we got the impression that the team did not use Show Me heavily, even during the full-DCP condition. Comments from the interviews indicated that the team found Show Me very satisfying and helpful when they did use it. However, the artifact that the team worked on most often was a large document in a word processor, and Show Me was not optimally designed to work through many pages of a document. The awkwardness of using Show Me for multi-page editing may largely explain the relatively light use of Show Me during the study. Furthermore, it was less convenient to use Show Me in the DCP minus video condition. Show Me was integrated into their collaboration setting in the full-DCP condition, since it was started automatically with a desktop conference. In the DCP minus video condition, the team used the phone for interactive communication and Show Me would have been an additional tool to start on their computer workstations. 
Some participants commented that they did not think of starting Show Me while talking on the phone, even though they could think of situations when that might have been helpful.
6.3 Desktop conferencing reduced the use of e-mail messages

We did not expect the availability of desktop conferencing, an interactive communication medium, to have any effect on the use of e-mail, which is asynchronous. However, an analysis of variance of the e-mail statistics in Figure 6 and Table I shows a statistically significant drop in the average number of e-mail messages per day in the full-DCP condition compared with the pre-DCP or DCP minus video conditions (p < 0.02).
Table I. Overall e-mail statistics across conditions. Total number of e-mail messages, average number of messages per day, standard deviation of the average, and average basic and reply messages per day across the pre-DCP, full-DCP, and DCP minus video conditions.

Why would the availability of desktop conferencing affect the use of e-mail? In the interviews, the participants offered several explanations. One is that they would sometimes choose to respond to an e-mail by desktop conferencing instead of replying with e-mail. One member said that he sometimes started composing a reply e-mail message, but then realized that he could respond with a desktop conference instead and discarded the unfinished e-mail reply. Some team members also commented that they disliked using e-mail when handling certain topics because they often exchanged many messages before resolving an issue. Issues that might require several cycles of e-mail messages could be easily and quickly resolved in an interactive group desktop conference. We tested these explanations by reviewing the e-mail data to count the number of "reply" e-mail messages (containing the system generated "Re:" in the subject field) compared to the number of "initiated" messages (those not in reply to a previous message) across the three conditions, shown in Table I. The data show that the proportion of reply messages was lower in the full-DCP condition than the other two conditions, but an analysis of variance did not find this pattern to be statistically significant (p < 0.27). Some team members also mentioned that the team rarely used e-mail among themselves when they were located together in Billerica (except when trying to avoid personal contact with someone). After moving to the three sites, they began using e-mail heavily, especially since the three-hour time difference between Billerica and Mountain View made it difficult to catch remote team members by phone. 
Comments from the interviews indicated that they did not prefer using e-mail (except to send computer files), but resorted to using it because the other modes of communication were not effective, given the distribution of the team in time and space. The reduction in e-mail usage during the full-DCP condition could indicate that desktop conferencing restored some of the interactions they had when they were located at the same site, thereby reducing their reliance on e-mail. These factors alone would not explain why using the phone was not perceived as offering the same benefits as desktop conferencing for reducing e-mail use. Phone calls, like desktop conferences, afford interactive rather than asynchronous communication, but they do not allow visual contact with the remote party. Perhaps the novelty effect of introducing new technology (desktop conferencing) attracted the team to use it in ways that they did not use an existing technology (the telephone). However, the data show that the use of desktop conferencing did settle down after the first two weeks of the full-DCP condition, but the diminished use of e-mail stayed relatively constant throughout the full-DCP condition. There is no evidence in the data that the team, having learned the value of substituting interactive communication for e-mail, began using the phone to substitute for e-mail after the video capability was removed from the prototype. These observations reinforce the role of video in determining the use of communication media.
6.4 Perceived reduction of face-to-face and video conference room meetings

The Mountain View participants reported that they believed they were having fewer face-to-face meetings during the full-DCP condition, and that the meetings they had tended to be longer. We found no statistically significant patterns in the quantitative data to support their perceptions, although we did find some interesting anecdotal indications.

As Table II shows, the team averaged slightly fewer face-to-face meetings per day during the full-DCP condition than the pre-DCP and DCP minus video conditions, but an analysis of variance did not show these numbers to be significantly different (p < 0.86). Similarly, the average duration of face-to-face meetings was slightly longer in the full-DCP condition compared with the pre-DCP and DCP minus video conditions, but again, the differences were not significant (p < 0.60). High variability in the data may have precluded any indications of statistical significance.
Table II. Face-to-face meeting statistics across conditions. Total number of face-to-face meetings, average duration of meetings, and standard deviation of the average duration (indicating variance) across the pre-DCP, full-DCP, and DCP minus video conditions.

Still, we found at least one occasion when we know a desktop conference substituted for a face-to-face meeting. In a desktop conference that we had videotaped for our study, the participants discussed a sensitive personnel issue. They asked us to erase our videotape of the conference so that there would be no record of the interaction. They said they normally would have had that discussion face-to-face, but were apparently comfortable enough with the desktop conferencing that they were willing to use it for a sensitive discussion. The team members also indicated that they thought it was worth the effort of walking to a common meeting site for longer meetings with more than two people. These two observations indicate that the participants believed they were more likely to use desktop conferences than face-to-face meetings for short interactions, even though the statistics do not support this perception. However, since the face-to-face meeting data are based on self reports, and since the participants tended to round the meeting durations to the nearest quarter-hour, it would be interesting to investigate this perception with more precise data. The data do show that desktop conferencing eliminated this team's use of the video conference rooms. In the pre-DCP condition, the team had just started using a weekly one-hour time slot in the video conference rooms to have all-team meetings. In the full-DCP condition, the team never used that time slot, but they resumed using it two weeks into the DCP minus video condition. 
Even though the prototype was not designed to support a five-way connection, the team had all-team desktop meetings several times by having three-way conferences in which pairs of people shared a camera at two sites (see Figure 7). Although the team did not frequently meet all together via desktop conferencing, their sub-team desktop conferences apparently obviated the need for all-team video conference meetings. While the full desktop conferencing eliminated the use of video conference rooms for this five-person team, video conferencing between meeting rooms is generally a different kind of collaborative activity than desktop conferencing. Video conference rooms support conferencing between moderate-sized groups (perhaps five or more on each side) in an environment that is relatively free from interruptions (e.g., telephone calls, e-mail arrival, impromptu visitors). Desktop conferencing, on the other hand, allows spontaneous interactions between individuals or small groups where each person has access to the resources of their own workstation and office. Although desktop conferencing eliminated the use of video conference rooms for this small team of five people, we do not believe that desktop conferencing should be generally considered to replace video conference rooms. Desktop conferencing was used to increase the frequency of visual contact between Billerica and Mountain View. Of the 72 desktop conferences logged, 28 (39%) involved team members from both sites. Thus in the six typical weeks in the full-DCP condition, there were 28 cross-site desktop conferences, while in the seven weeks of the pre-DCP and DCP minus video conditions combined there were only seven video conference room meetings where collaborators at both sites could see each other. Although the team used desktop conferencing to gain visual access to remote team members, the majority of the desktop conferences (61%) were among the Mountain View team members. 
These team members could have walked to each other's building to meet face-to-face, but they elected to use the desktop conferencing prototype instead. Despite the degraded experience of desktop conferencing, it was used to avoid the six-minute walk between buildings. On the other hand, SD1 commented that he thought it would be "decadent" to have a desktop conference between him and SD2, whose office was about 30 feet down the hall in the same building. There was only one short desktop conference recorded between SD1 and SD2 toward the end of the full-DCP condition. Thus, the data indicate a use of desktop conferencing among collaborators who are separated by thousands of miles as well as several hundred yards, but not as close as a few tens of feet. There are several likely explanations for the relatively higher frequency of desktop conferences among the Mountain View members contrasted with cross-site desktop conferences between Mountain View and Billerica. There were more team members in Mountain View (three) compared to Billerica (two), and each Mountain View member had a DCP workstation in their own office while the two Billerica members shared one DCP workstation in CR1's cubicle. Furthermore, because of the three-hour time difference between the sites, there were effectively only three to five hours (depending on their availability during the two lunch hours) during the typical work day when members at both sites might be present for a cross-site conference, compared with seven to eight hours that were typically available among the Mountain View members.
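
The availability windows mentioned above can be reproduced with a short calculation. The sketch below assumes a 9:00 to 17:00 local work day and a 12:00 to 13:00 local lunch hour at each site; these hours are illustrative assumptions, not figures reported in the study. Only the three-hour time difference comes from the text.

```python
def overlap(a, b):
    """Length in hours of the overlap between two (start, end) intervals."""
    start = max(a[0], b[0])
    end = min(a[1], b[1])
    return max(0, end - start)


TIME_DIFFERENCE = 3     # hours between the sites (from the text)
WORK_DAY = (9, 17)      # assumed local working hours
LUNCH = (12, 13)        # assumed local lunch hour

# Express both sites' hours on the Mountain View clock.
mountain_view_day = WORK_DAY
billerica_day = (WORK_DAY[0] - TIME_DIFFERENCE, WORK_DAY[1] - TIME_DIFFERENCE)
mountain_view_lunch = LUNCH
billerica_lunch = (LUNCH[0] - TIME_DIFFERENCE, LUNCH[1] - TIME_DIFFERENCE)

# Hours when both sites are at work.
shared_hours = overlap(mountain_view_day, billerica_day)
shared_window = (max(mountain_view_day[0], billerica_day[0]),
                 min(mountain_view_day[1], billerica_day[1]))

# Hours lost if nobody is available during either site's lunch.
# The two lunch hours do not overlap each other under these assumptions,
# so their losses can simply be added.
lunch_loss = (overlap(shared_window, mountain_view_lunch)
              + overlap(shared_window, billerica_lunch))

cross_site_range = (shared_hours - lunch_loss, shared_hours)
same_site_range = (overlap(WORK_DAY, WORK_DAY) - overlap(WORK_DAY, LUNCH),
                   overlap(WORK_DAY, WORK_DAY))

print("cross-site hours:", cross_site_range)
print("same-site hours:", same_site_range)
```

Under these assumed hours, both lunch hours fall inside the shared window, which is why the cross-site range loses two hours while the same-site range loses only one.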
6.5 Perceived reduction in the use of phone calls

In the interviews before installing the desktop conferencing prototype, some team members said they expected to use a desktop conference for anything they currently did over the phone. In the interviews after they had used the prototype for a couple of weeks, all team members reported less phone use when they had the prototype. In contrast to their perceived reduction in phone call use, an analysis of variance on the measure of phone call usage (shown in Figure 6) did not show any significant differences across the three conditions (p < 0.17). The phone call statistics in Table III show a slight increase in the average number of phone calls per day over the three conditions. The average duration of phone calls was shorter in the full-DCP condition compared to the pre-DCP and DCP minus video conditions, suggesting that desktop conferences may have replaced longer phone calls but they still used the phone for shorter calls. However, there was wide variation in the relatively sparse data, and these patterns were not found to be statistically significant.
Table III. Phone call statistics across conditions. Total number of calls, average duration of calls, and average number of calls per day across the pre-DCP, full-DCP, and DCP minus video conditions.

Interviews with the participants provided some reasons why participants continued to use phone calls rather than desktop conferences for short calls. The prototype was not optimized for quick performance; starting a desktop conference could take on the order of a half-minute. For quick calls (e.g., "Ready to go to lunch?," checking if someone was in the office before visiting), the users did not want to incur the overhead of starting a desktop conference since using the phone would be much quicker. Sometimes the participants would call to check if someone was in the office before going through the trouble of starting a desktop conference. Audio quality was also relatively poor compared to the phone due to the delay and echo. As mentioned earlier, desktop conferencing to someone nearby was also perceived to be less appropriate in some situations than calling or just walking down the hall.

6.6 Desktop conferencing is a novel collaboration setting

From the analysis of the videotapes, it is clear that desktop conferencing is a distinctly different collaboration setting than meeting face-to-face or talking on the phone. In desktop conferences, all members are located in their own offices where each person has access to his or her own resources and distractions (e.g., phone calls, e-mail arrivals, passers-by). By contrast, face-to-face meetings are usually held in conference rooms (where everyone is isolated from their resources and distractions) or in one person's office (where only that person can access her books, phone calls, etc.). Consequently, in face-to-face office meetings, it is generally considered poor etiquette to take long phone calls or spend much time reading e-mail while other people are waiting for attention. 
In the desktop conferences we analyzed, there were several examples of people reading e-mail and taking phone calls during the conference. They seemed to treat desktop conferencing as a medium for focused interaction (like a phone call or meeting), but also one that tolerated significant amounts of attention to personal distractions. This kind of interaction is similar to the ebb and flow of group and individual activity that occurs when sharing an office or working in computer-augmented meeting rooms (Stefik et al., 1987).

There are several reasons why desktop conferencing afforded this type of collaborative activity. Because all participants are located in their own offices, if one member attends to a personal distraction, every other member can easily attend to their own personal work while waiting for the conference to refocus. Also, desktop conferencing affords many cues (largely through audio and video) that enable a remote collaborator to make sense of what is happening when one person temporarily stops participating. By contrast, in phone conversations it is often difficult to interpret long pauses. In addition, some users commented that since they did not have true eye contact with the remote collaborator, they felt that they were slightly detached from them, which allowed attending to personal work. Although the users of the desktop conferencing prototype found themselves in a novel collaboration setting, they interacted in this setting in a very routine and seemingly familiar manner. They smoothly migrated from group interaction to individual work in a way that could not occur in any other medium, yet they did so in a natural way without marking the activity as novel. We believe that this shows that desktop conferencing, largely through the video channel, provided enough cues for participants to interpret the transitions between group interaction and individual work and accommodate a new style of interaction. 
Although they were able to accommodate a new style of working in desktop conferences, interview comments indicate that they did not necessarily like it. Several team members found it annoying when someone stopped to take a long phone call or continued doing private work while desktop conferencing. Although the video channel helped them detect such distractions, it still required a delicate social negotiation to try to manage them directly. Just as participants in face-to-face conversation are often reluctant to direct their partner's action (e.g., "Excuse me, you need to wipe off some food smudged on your face"), so desktop conference participants did not feel free to tell their partners to stop doing other work or to reposition their head to be in camera view. It is notable that many of the rules of politeness that govern face-to-face interaction also appear to be in force in desktop conferencing. The group interaction that occurred in desktop conferencing was notably more like the exchanges seen in face-to-face meetings than in the commercial video conferencing rooms. In the videotaped desktop conferences, we saw instances when remote collaborators were able to interrupt each other, accomplish turn completions, and time jokes in their conversation. Such interactions were markedly absent in videotapes of their meetings held in the video conference rooms. This improved interaction was probably enabled by reducing the audio delay in the prototype. During the study, the one-way audio delay was measured to vary between 0.22 and 0.44 seconds (depending on processing and networking loads). Surprisingly, this slight improvement over the 0.57 second delay in the video conference rooms was apparently enough to noticeably affect the level of interaction they could accomplish. 
Still, the desktop conferencing's shorter audio delays were not ideal; while the participants were able to effectively compensate for them, comments in the interviews indicated that the delays were noticeable and sometimes bothersome.

6.7 Gaze awareness in desktop conferencing

To provide a sense of eye contact in desktop conferencing, the lens of the camera was positioned as close as possible to where the video window of the remote collaborator appeared on the screen. However, all the team members remarked that their inability to establish direct eye contact through the prototype felt strange. Rather than introducing half-silvered mirror devices that effectively provide eye contact (Buxton & Moran, 1990), we wanted to see if users could interact comfortably without true eye contact. Ishii and Kobayashi (1992) raised a distinction between eye contact (seeing eye-to-eye) and gaze awareness (being aware of where others are looking). While eye contact is the expected form of interaction from face-to-face meetings, providing each collaborator with a confident sense of gaze awareness may be sufficient to enable effective and comfortable interaction.

We found considerable evidence in the videotapes of desktop conferences that the collaborators had a strong sense of gaze awareness and were able to make use of that information. Figure 8 shows a sequence of video images that show one example of the use of gaze awareness. In a desktop conference between CR1 and PL, CR1 visually expresses continued disagreement with PL by avoiding "looking at" PL. PL notices that CR1 is avoiding looking at him, so he gazes and speaks to CR1 in ways that invite CR1 to look up at him. After over 40 seconds of gaze avoidance, PL moves on to another topic, at which point CR1 immediately resumes looking up at him. The timing of the interaction strongly suggests that CR1 deliberately used gaze avoidance to express disagreement.
![]()
In the interviews, we asked whether the team members could tell when collaborators in a desktop conference were looking at them. After two weeks of use, some members were occasionally uncertain, but by the end of the study everyone said that they could. We believe that if everyone's equipment is configured to provide near eye contact, users can quickly gain a sense of gaze awareness and use that to convey cues in their interactions. Of course establishing actual eye contact would be ideal in desktop conferencing, but there may be situations where the trade-offs currently needed to accomplish that (e.g., added footprint and volume occupied by current half-silvered mirror devices) are not merited.

6.8 Design implications from the study of desktop conferencing

This study indicates that, for a working team that is already familiar with each other, desktop conferencing is a useful medium for distributed collaboration. In contrast to studies that did not find a strong effect of a video channel, we found that video was the determining factor in how much desktop conferencing was used. The video provided visual and gestural cues that enabled them to interact smoothly. Gaze awareness among the collaborators in particular was used to convey cues in their interaction. When they used the shared drawing tool, they found it valuable in supporting distributed collaboration.

Users commented that the audio quality of the desktop conferencing prototype needed improvement. Because most team members used a speaker for audio output and an open mike for audio input, the system exhibited a considerable amount of audio echo. Those speaking often heard a delayed echo of their speech as it traveled to others' speakers and back through their microphones. The audio quality was worse in three-way conferencing, because mixing audio streams introduced even more echo, and the increased network traffic caused cut outs in the audio streams. 
Because of these problems, the team often resorted to using telephone audio in three-way conferences. Although we provided headsets that eliminated the audio echo problem, all but one user found them too bothersome to use. Additionally, our experiences with the desktop conferencing prototype indicated that the phone call model for establishing and managing conferences is too limited. Users were sometimes reluctant to use desktop conferencing to contact others because they could not tell in advance whether a person was available or interruptible. The prototype also did not have the equivalent of a phone answering machine to handle conference requests when no one was there, making it frustrating to try to catch someone. This problem is evidenced in the many unsuccessful conference attempts found in the logs of prototype use. In addition to the 72 desktop conferences recorded, 96 attempts to conference were unsuccessful (recipient not in office to receive conference request, recipient's workstation not operational, recipient declined to accept conference request). It would be helpful to integrate desktop conferencing with other communication mechanisms, such as e-mail or voice mail, which could be used to leave a message after an unsuccessful attempt to conference. The study also hinted at the value of the integration of tools. It was less likely that the team would use Show Me in the DCP minus video condition since it was no longer integrated with their medium for interactive communication (the phone). Tools such as a shared drawing space may not be used continuously throughout an interaction, but must be readily available when they are needed, or else users are less likely to expend the effort to access them. Environments that support interactive collaboration should integrate easy access to whatever shared tools the users commonly need to use. 
This study was also a methodological learning experience in trying to combine different observational perspectives to understand the team's work activity and reaction to the prototype. Although the quasi-experimental structure of the study (three conditions, quantitative measures) did not yield many statistically significant results, it was helpful in identifying patterns and trends that we could then explore by analyzing video-recorded examples of work activity or interviewing the users for their perceptions. User perceptions elicited by the interviews also helped guide us in selecting samples of videotaped activity on which to focus our analysis and in suggesting explanations for patterns observed in the qualitative data. The multiple perspectives also provided a broader understanding of the activity that could not be found in any single observational method. In retrospect, it would have been even more useful to add a fourth condition to observe how the group returned to "normal" work activity after we removed the prototype. The multiple observational perspectives also presented some new problems. Collecting multiple types of data added complexity to the data collection process and resulted in a vast amount of data to sift through. Since our primary commitment was in observing actual team work activity, we did not have the luxury of establishing control conditions and exercising other manipulations often used in laboratory experiments to produce clean quantitative data for statistical comparison.

7. What we learned about multimedia-supported collaboration

What can we learn from our studies about how to design multimedia technology to effectively support collaborative work? Two points clearly came out in each of the three studies presented: 1) users want video connections and 2) the quality of the audio connection is crucial.
It is also important to distinguish desktop conferencing from other types of communication media (e.g., face-to-face meetings, video conference room meetings, phone calls) to understand how it and other new multimedia collaboration technologies will be incorporated into everyday use with existing communication technologies.

7.1 Users want video

Each of the three studies clearly indicated that the users wanted to have a video capability that would provide visual contact during their interaction. Why did the users want this video capability? Although these studies do not claim to definitively answer this question, they do present a variety of supporting evidence that helps explain why users like video.

The video channel is clearly a valuable resource in mediating interpersonal interaction. Not only does the visual channel provide cues that facilitate the mechanics of turn-taking, but it also naturally affords gestures and other visual information that convey how much is being understood, reasons for pauses in speech, participants' attitudes, and other modifiers (e.g., humor, sarcasm) on what is being said. This support for interactional mechanisms makes video-mediated communications more efficient, effortless, and effective. A richer communication channel affords greater mutual understanding among the participants, and we would expect it to help improve the quality of their collaborative work in the long term. Isaacs and Tang (1993) provide more analyses of the benefits (and limitations) of video from this same body of data. Users' comments clearly show that they perceived added value from the video. Besides the benefits we identified in our analyses of video-mediated activity, users reported that the video capability made their interactions generally more satisfying.
While the long-term effects of this perceived increased satisfaction are difficult to assess, they could have a cumulatively positive effect on the amount of interaction and mutual understanding over time. These user perceptions should play a major role in guiding the design of technology to support collaboration. Why did our studies find such a strong effect of video whereas the studies cited earlier (Ochsman & Chapanis, 1974; Gale, 1990) found none? Firstly, the previous studies focused on effects that were associated with the resulting product (e.g., quality of the result, time to complete the task). We found that the video channel affected the process of interaction (e.g., supporting turn-taking mechanisms, demonstrating understanding and attitudes). Although these effects on interpersonal communication have been hypothesized (Short et al., 1976) and recognized (Conrath et al., 1977; Gale, 1990) in earlier studies, this paper presents specific evidence from real work activity of how video supports human interaction. While enhancing the interaction process may not have any measurable effect on the resulting product in the short term, we would expect tangible end-product benefits over the span of weeks and months that is typical in real-world work. Secondly, the observational methods used in this research differed from those used in the previous studies. The previous studies analyzed the activity of artificial groups working on contrived tasks for an hour or so. The studies presented in this paper examined the activity of actual working groups engaged in their real work activity over several weeks. Since audio and video tend to have the most effect on social, interpersonal communication, those effects would be most noticeable among a group in which social and personal relationships were well developed and exercised over time.
Our ability to see how video supports social interaction was a direct result of studying actual working activity that had real social elements in it. The value of video that we observed did not even include one of the inherent strengths of the video medium. Video is good for showing and manipulating three-dimensional objects, such as a component to be manufactured or a volumetric shape to be designed. Since the groups studied in this research worked mainly with documents and computer software, they did not exercise this potential capability of video. We would expect that working teams in a domain that involved physical artifacts would find even greater value in video.

7.2 Audio is crucial

In each of our three studies, audio quality was an issue. Audio plays a fundamental role in supporting human interaction, and users' expectations of audio are formed by their experiences in face-to-face and phone interactions. Technologies that degrade the audio channel (e.g., delays, echo, incomprehensible audio quality) disrupt people's ability to smoothly interact with each other. Although the team using our desktop conferencing prototype was willing to endure the degraded audio to have the video capability, it was clearly the aspect that they most wanted to see improved in the prototype.

Although the ideal is to strive for high fidelity audio and video, our experiences confirm that audio is relatively more important than video in supporting collaboration. Our desktop conferencing prototype design made several tradeoffs of degrading video performance to preserve audio quality. If high network traffic prevented transmitting all of the audio-video data between sites, the video data degraded first (image froze) to allow as much audio data to get through before cutting out. Audio was delivered with minimal delay, even allowing the audio to arrive before the accompanying video image, violating audio-video synchrony.
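The audio-first trade-off described above can be illustrated with a small sketch. This is not the prototype's implementation, which is not described at this level of detail; it is a simplified model, assuming a fixed bandwidth budget per interval and hypothetical names (`Allocation`, `allocate`) and bit rates chosen only for illustration. Audio is granted its full need first, video receives whatever remains, and the image is treated as frozen when nothing remains for it.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    audio_kbps: float   # bandwidth granted to audio
    video_kbps: float   # bandwidth granted to video
    video_frozen: bool  # True when video gets nothing and the last image stays on screen


def allocate(available_kbps: float, audio_need_kbps: float, video_need_kbps: float) -> Allocation:
    """Grant audio its need first; video receives only what is left over."""
    available = max(available_kbps, 0.0)
    audio = min(audio_need_kbps, available)
    video = min(video_need_kbps, available - audio)
    frozen = video <= 0.0 and video_need_kbps > 0.0
    return Allocation(audio, video, frozen)


# Hypothetical bit rates: 64 kbps audio, 500 kbps video.
for budget in (1000, 300, 50):
    print(budget, allocate(budget, audio_need_kbps=64, video_need_kbps=500))
```

With an ample budget both streams are served in full; as the budget shrinks, video is reduced and then frozen, and audio only starts to cut out once the budget falls below what audio alone requires.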
As long as network constraints require trade-offs to conserve bandwidth, our experiences indicate that degrading video quality before degrading audio quality provides a more usable experience.

7.3 Desktop conferencing is not face-to-face is not video conferencing is not...

The data from our study of desktop conferencing demonstrated that it reduced the use of certain other kinds of interaction (e.g., video conference room meetings, e-mail). Comments from the video conferencing room survey indicate that some users think that video conference room meetings substitute for face-to-face meetings. However, these findings should not be taken to imply that desktop conferencing could completely replace face-to-face meetings, video conference room meetings, e-mail, or any other form of interaction. As discussed earlier, desktop conferencing is a distinct setting for collaboration and is unlikely to completely replace existing forms of interaction. The adoption of video conferencing rooms and other multimedia technology has suffered from marketing myths that promote them as replacements for face-to-face interaction (Egido, 1990).

Rather, we should strive to understand how new forms of interaction can be integrated with the existing ones into people's day-to-day work. By understanding how these new technologies augment, complement, and interact with people's existing work practice, we can design new technology that can be smoothly and naturally adopted. As we develop new technology for collaboration, more research is needed to understand existing collaborative practice as well as how users respond to the new technology in the context of their actual work. More research is needed into new issues that these technologies raise, such as privacy concerns of having ubiquitously available audio and video and how to apply multimedia support to collaboration settings that are non-cooperative.
By iteratively cycling between developing new technology and studying how people use that technology, we can both design better technology that is matched to users' needs and increase our understanding of human work activity.

Acknowledgements

We would like to acknowledge the other members of the Conferencing and Collaboration group: David Gedye, Amy Pearl, Alan Ruberg, and Trevor Morris. They helped build the desktop conferencing prototype and provided many forms of support for this study. We thank the other participants in our regular video analysis sessions for the many insights and observations that they contributed: Monica Rua, Hagan Heller, Todd Macmillan, and Tom Jacobs. We thank Randy Smith and Jonathan Grudin for reviewing earlier versions of this paper. The Digital Integrated Media Environment (DIME) group in Sun Microsystems Laboratories, Inc. developed the SBus card that enabled the COCO conferencing prototype. This research was conducted at Sun Microsystems Laboratories, Inc. We especially thank our anonymous participants in the study for giving us generous access to their daily work activity.

References

Buxton, Bill and Tom Moran, 1990. EuroPARC's Integrated Interactive Intermedia Facility (IIIF): Early Experiences, Multi-User Interfaces and Applications, S. Gibbs and A. A. Verrijn-Stuart (Eds.), Amsterdam: Elsevier Science Publishers B.V., 11-34.
Bly, Sara A. 1988. A Use of Drawing Surfaces in Different Collaborative Settings. In Proceedings of the Conference on Computer-Supported Cooperative Work, 250-256. Portland, OR, USA.
Bly, Sara A., Steve R. Harrison and Susan Irwin. 1993. Media spaces: Bringing People Together in a Video, Audio, and Computing Environment. Communications of the ACM 35(1): 28-47.
Conrath, David W., Earl V. Dunn, William G. Bloor and Barbara Tranquada. 1977. A Clinical Evaluation of Four Alternative Telemedicine Systems. Behavioral Science 22: 12-21.
Egido, Carmen, 1990. Teleconferencing as a Technology to Support Cooperative Work: Its Possibilities and Limitations, Teamwork: Social and Technological Foundations of Cooperative Work, Jolene Galegher, Robert E. Kraut, and Carmen Egido (Eds.), Hillsdale, NJ: Lawrence Erlbaum Associates, Publishers, 351-371.
Falk, Howard. 1973. Picturephone and Beyond. IEEE Spectrum: 45-49.
Fish, Robert S., Robert E. Kraut, Robert W. Root, and Ronald E. Rice, 1992. Evaluating Video as a Technology for Informal Communication, Proceedings of the Conference on Computer Human Interaction (CHI) '92, Monterey, CA, 37-48.
Francik, Ellen, Susan Ehrlich Rudman, Donna Cooper and Stephen Levine. 1991. Putting Innovation to Work: Adoption Strategies for Multimedia Communication Systems. Communications of the ACM 34(12): 53-63.
Gale, Stephen, 1990. Human aspects of interactive multimedia communication, Interacting with Computers, 2(2), 175-189.
Gale, Steve, 1991. Recent Advances in Network and Video Communications, Hewlett-Packard Technical Report HPL-91-96, 1-21.
Heath, Christian and Paul Luff, 1991. Disembodied Conduct: Communication Through Video in a Multi-media Office Environment, Proceedings of the Conference on Computer Human Interaction (CHI) '91, New Orleans, LA, 99-103.
Isaacs, Ellen and John C. Tang. 1993. What Video Can and Can't Do For Collaboration: A Case Study. In Proceedings of the ACM Multimedia '93 Conference. Anaheim, CA.
Ishii, Hiroshi and Minoru Kobayashi, 1992. ClearBoard: A Seamless Medium for Shared Drawing and Conversation with Eye Contact, Proceedings of the Conference on Computer Human Interaction (CHI) '92, Monterey, CA, 525-532.
Kendon, Adam. 1986. Current Issues in the Study of Gesture. In The Biological Foundations of Gestures: Motor and Semiotic Aspects, eds. Jean-Luc Nespoulous, Paul Perron and Andre Roch Lecours, 23-47. Hillsdale, NJ: Lawrence Erlbaum Associates Publishers.
Krauss, Robert M., Connie M. Garlock, Peter D. Bricker, and Lee E. McMahon, 1977. The Role of Audible and Visible Back-Channel Responses in Interpersonal Communication, Journal of Personality and Social Psychology, 35(7), 523-529.
Masaki, Shigeki, Naobumi Kanemaki, Hiroya Tanigawa, Hideya Ichihara, and Kazunori Shimamura, 1991. Personal Multimedia-multipoint Teleconference System for Broadband ISDN, High Speed Networking, III, O. Spaniol and A. Danthine (Eds.), Amsterdam: Elsevier Science Publishers B.V., 215-230.
Minneman, Scott L. and Sara A. Bly, 1991. Managing a trois: a study of a multi-user drawing tool in distributed design work, Proceedings of the Conference on Computer Human Interaction (CHI) '91, New Orleans, LA, 217-224.
Ochsman, Robert B. and Alphonse Chapanis, 1974. The Effects of 10 Communication Modes on the Behavior of Teams During Co-operative Problem-solving, International Journal of Man-Machine Studies, Vol. 6, 579-619.
Olson, Margrethe H. and Sara A. Bly, 1991. The Portland Experience: A Report on a Distributed Research Group, International Journal of Man-Machine Studies, Vol. 34, No. 2, February 1991, pp. 211-228. Reprinted: Computer-supported Cooperative Work and Groupware, Saul Greenberg (Ed.), London: Academic Press, 81-98.
Pearl, Amy. 1992. System Support for Integrated Desktop Video Conferencing, Sun Microsystems Laboratories, Inc. Technical Report TR-92-4, Mountain View, CA.
Root, Robert W. 1988. Design of a Multi-Media Vehicle for Social Browsing, Proceedings of the Conference on Computer-Supported Cooperative Work, Portland, OR, 25-38.
Sacks, H., E. Schegloff, and G. Jefferson, 1974. A simplest systematics for the organization of turn-taking for conversation, Language, Vol. 50, 696-735.
Short, John, Ederyn Williams, and Bruce Christie, 1976. The Social Psychology of Telecommunications, London: John Wiley & Sons.
Smith, Randall B., Tim O'Shea, Claire O'Malley, Eileen Scanlon and Josie Taylor. 1989. Preliminary Experiments with a Distributed, Multimedia, Problem Solving Environment. In Proceedings of the First European Conference on Computer Supported Cooperative Work: ECSCW '89, 19-34. London, UK. Reprinted: 1991. In Studies in Computer Supported Cooperative Work: Theory Practice and Design, eds. J. Bowers and S. Benford, Amsterdam: Elsevier Science Publishers.
Stefik, Mark, Gregg Foster, Daniel G. Bobrow, Kenneth Kahn, Stan Lanning, and Lucy Suchman, 1987. Beyond the chalkboard: Computer support for collaboration and problem solving in meetings, Communications of the ACM, Vol. 30(1), 32-47. Reprinted: 1988. Computer-Supported Cooperative Work: A Book of Readings, Irene Greif (Ed.), San Mateo, CA: Morgan Kaufmann Publishers, Inc., 335-366.
Tang, John C., 1991. Findings from Observational Studies of Collaborative Work, International Journal of Man-Machine Studies, 34(2), 143-160. Reprinted: 1991. Computer-supported Cooperative Work and Groupware, Saul Greenberg (Ed.), London: Academic Press, 11-28.
Tang, John, 1991. Involving Social Scientists in the Design of New Technology, Taking Design Seriously: Exploring Techniques Useful in HCI Design, John Karat (Ed.), Boston: Academic Press, 1991, 115-126.
Tang, John C. and Scott L. Minneman, 1991. VideoDraw: A Video Interface for Collaborative Drawing, ACM Transactions on Information Systems, 9(2), 170-184.
Tatar, Deborah, 1989. Using Video-Based Observation to Shape the Design of a New Technology, SIGCHI Bulletin, 21(2), 108-111.
Watabe, Kazuo, Shiro Sakata, Kazutoshi Maeno, Hideyuki Fukuoka, Toyoko Ohmori, 1990. Distributed Multiparty Desktop Conferencing System: MERMAID, Proceedings of the Conference on Computer-Supported Cooperative Work, Los Angeles, CA, 27-38.
Wilkes-Gibbs, Deanna, 1986. Collaborative Processes of Language Use in Conversation, Ph.D. dissertation, Stanford University.
Williams, Ederyn. 1977. Experimental Comparisons of Face-to-Face and Mediated Communication: A Review. Psychological Bulletin 84(5): 963-976.
|