Direct Manipulation and Immersive Environments

Leibniz sought to make the form of a symbol reflect its content. "In signs," he wrote, "one sees an advantage for discovery that is greatest when they express the exact nature of a thing briefly and, as it were, picture it; then, indeed, the labor of thought is wonderfully diminished."

Frederick Kreiling, "Leibniz," Scientific American, May 1968

CHAPTER OUTLINE

7.1 Introduction

7.2 What Is Direct Manipulation?

7.3 Some Examples of Direct Manipulation

7.4 2-D and 3-D Interfaces

7.5 Teleoperation and Presence

7.6 Augmented and Virtual Reality



7.1 Introduction

Certain interactive systems generate a glowing enthusiasm among users that is in marked contrast with the more common reaction of reluctant acceptance or troubled confusion. The enthusiastic users report the following positive feelings:

• Mastery of the interface

• Competence in performing tasks

• Ease in learning originally and in assimilating advanced features

• Confidence in the capacity to retain mastery over time

• Enjoyment in using the interface

• Eagerness to show off the interface to novices

• Desire to explore more powerful aspects

These feelings convey an image of a truly pleased user. The central ideas in such satisfying interfaces, now widely referred to as direct-manipulation interfaces (Shneiderman, 1983), are visibility of the objects and actions of interest; rapid, reversible, incremental actions; and replacement of typed commands by a pointing action on the object of interest. Direct-manipulation ideas are at the heart of many contemporary and advanced non-desktop interfaces. Game designers continue to lead the way in creating visually compelling 3-D scenes with characters (sometimes designed and user-created) controlled by novel pointing devices. At the same time, interest in remote-operated (teleoperated) devices has blossomed, enabling operators to look through distant microscopes or fly drones. As the technology platforms mature, direct manipulation increasingly influences designers of mobile devices and webpages. It also inspires designers of information-visualization systems that present thousands of objects on the screen with dynamic user controls (Chapter 16).

Newer concepts that extend direct manipulation include virtual reality, augmented reality, and other tangible and touchable user interfaces. Augmented reality keeps users in the normal surroundings but adds a transparent overlay with information such as the names of buildings or visualizations of hidden objects. Tangible and touchable user interfaces give users physical objects to manipulate so as to operate the interface; for example, putting several plastic blocks near each other to create an office floor plan. Virtual reality puts users in an immersive environment in which the normal surroundings are blocked out by a head-mounted display that presents an artificial world; hand gestures allow users to point, select, grasp, and navigate. All of these concepts are being applied not only in individual interactions but also in wider artificial worlds, creating collaborative efforts and other types of social-media interactions.

This chapter defines the principles, attributes, and problems of direct manipulation, including a way to categorize direct manipulation (Section 7.2). Some examples of direct-manipulation use are provided in Section 7.3. Section 7.4 discusses 2-D and 3-D interfaces. Teleoperation and presence are covered in Section 7.5. Lastly, augmented and virtual reality are discussed in Section 7.6. Although the tenets of direct manipulation still hold true, regardless of the sophistication of the technology, the technology in this chapter is advancing rapidly. The references for this chapter include a combination of books and articles. The articles are taken from recent conference proceedings and show some of the innovations and interesting projects being developed in the research labs of industry and academia. Many pundits and popular press sources (Kushner, 2014; Kofman, 2015; Metz, 2015; Mims, 2015; Stein, 2015) feel the time for virtual and augmented reality is now. Researchers are looking into the theoretical challenges and opportunities in virtual worlds (de Castell et al., 2012) and continuing to improve upon the gaming experience (Kulshreshth and LaViola, 2015).

7.2 What Is Direct Manipulation?

Direct manipulation as a concept has been around since before computers. The metaphor of direct manipulation works well in computing environments and was introduced in the early days of Xerox PARC and then widely disseminated by Shneiderman (1983). Direct-manipulation designs can serve differing user populations and stretch easily across international boundaries. Section 7.2.1 explains the three principles of direct manipulation and the advantages of using direct manipulation. Section 7.2.2 provides a way of discussing direct manipulation using a translational concept of strength. Section 7.2.3 discusses some problems with direct manipulation. Section 7.2.4 discusses the continuing evolution of direct manipulation.

A favorite example of direct manipulation is driving an automobile. The scene is directly visible through the front window, and performance of actions such as braking and steering has become common knowledge in our culture. To turn left, for example, the driver simply rotates the steering wheel to the left. The response is immediate and the scene changes, providing feedback to refine the turn. Now imagine how difficult it would be trying to accurately turn a car by typing a command or selecting "turn left 30 degrees" from a menu. The graceful interaction in many applications is due to the increasingly elegant application of direct manipulation. Although there is lively discussion on the impact of driverless cars and their uses, research still continues. Driverless cars may soon respond to commands like "take me to Baltimore airport," but they are a long way from matching the skills of drivers at the wheel while navigating snow-covered roads or police hand signals at accident sites.

Before designing for current devices, it makes sense to reflect on where early design has been. In the early days of office automation, there was no such thing as a direct-manipulation word processor or a presentation system like PowerPoint. Word processors were command-line-driven programs where the user typically saw a single line at a time. Keyboard commands were used, along with inserted special commands that gave instructions for viewing and printing the documents, often as a separate operation. Similarly, with presentation programs, specialized commands were used to set the font style, color, and size. Obviously, these were very limited compared to the numerous font families available today. Most users today are used to a WYSIWYG (What You See Is What You Get) environment enhanced by direct-manipulation widgets.

7.2.1 The three principles and attributes of direct manipulation

The attraction of direct manipulation is apparent in the enthusiasm of the users. The designers of the examples provided in Section 7.3 had an innovative inspiration and an intuitive grasp of what users would want. Each example has problematic features, but they demonstrate the potent advantages of direct manipulation, which can be summarized by three principles:

1. Continuous representations of the objects and actions of interest with meaningful visual metaphors

2. Physical actions or presses of labeled interface objects (i.e., buttons) instead of complex syntax

3. Rapid, incremental, reversible actions whose effects on the objects of interest are visible immediately

Simple metaphors or analogies with a minimal set of concepts (for example, pencils and paintbrushes in a drawing tool) are a good starting point. Mixing metaphors from two sources may add complexity that contributes to confusion. Also, the emotional tone of the metaphor should be inviting rather than distasteful or inappropriate. Since the users are not guaranteed to share the designer's understanding of the metaphor, analogy, or conceptual model used, ample testing is required.

Using these three principles, it is possible to design systems that have these beneficial attributes:

• Novices can learn basic functionality quickly, usually through a demonstration by a more experienced user.

• Experts can work rapidly to carry out a wide range of tasks, even defining new functions and features.


• Knowledgeable intermittent users can retain operational concepts.

• Error messages are rarely needed.

• Users can immediately see whether their actions are furthering their goals, and if the actions are counterproductive, they can simply change the direction of their activity.

• Users experience less anxiety because the interface is comprehensible and because actions can be reversed easily.

• Users gain a sense of confidence and mastery because they are the initiators of action, they feel in control, and they can predict the interface's responses.

In contrast to textual descriptors, dealing with visual representations of objects may be more "natural" and in line with innate human capabilities: Action and visual skills emerged well before language in human evolution. Psychologists have long known that people grasp spatial relationships and actions more quickly when they are given visual rather than linguistic representations. Furthermore, intuition and discovery are often promoted by suitable visual representations of formal mathematical systems.

7.2.2 Translational distances with direct manipulation

The effectiveness and reality of the direct-manipulation interface are based on the validity and strength of the metaphor chosen to represent the actions and objects. Using familiar metaphors creates easier learning conditions for users and lessens the number of mistakes and incorrect actions. Adequate testing is needed to validate the metaphor. Special attention needs to be paid to user characteristics such as age, reading level, educational background, prior experiences, and any physical disabilities.

One way of trying to understand and categorize the direct-manipulation metaphor is by looking at the translational distance between users and the representation of the metaphor, which will be referred to as strength. Strength can be perceived along a continuum from weak to immersive (see Box 7.1). This can be further described as the level of indirectness between the user's physical actions and the actions in the virtual space.

BOX 7.1 Examples of translational distances (strength).

• Weak: early video game controllers (Fig. 7.5)

• Medium: touchscreens, multi-touch (Fig. 7.1)

• Strong: data glove, gesturing, manipulating tangible objects (Fig. 7.2)

• Immersive: virtual reality, e.g., Oculus Rift (Fig. 7.14)


Weak direct manipulation is what can be described as basic direct manipulation. There is a mouse, trackpad, joystick, or similar device translating the user's physical action into action in the virtual space using some mapping function. The translational distance is large because interaction is completely indirect. For example, the user moves the mouse on a 2-D desk within a small circumscribed region and the cursor moves on the screen (again 2-D). Because this mapping function is not always fully understood and processed correctly by the user, sometimes the user will actually run the mouse off the surface of the desk. Weak direct manipulation was used with early game controllers that provided buttons and joysticks, where the action of the controllers needed to be learned explicitly by the players.
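The mapping function mentioned above can be illustrated with a minimal sketch. It assumes a simple constant-gain mapping from physical mouse displacement to cursor displacement, clamped to the screen edges; the names `Cursor` and `move_cursor`, the gain value, and the screen size are hypothetical choices made for illustration, and real pointing systems typically use more elaborate, speed-dependent mappings.

```python
from dataclasses import dataclass


@dataclass
class Cursor:
    x: float  # horizontal position in pixels
    y: float  # vertical position in pixels


def move_cursor(cursor, dx_mm, dy_mm, gain=8.0, width=1920, height=1080):
    """Map a physical mouse displacement (mm) to a new cursor position (px).

    gain is an assumed constant number of pixels per millimeter of mouse travel.
    The result is clamped so the cursor never leaves the screen.
    """
    new_x = min(max(cursor.x + gain * dx_mm, 0), width - 1)
    new_y = min(max(cursor.y + gain * dy_mm, 0), height - 1)
    return Cursor(new_x, new_y)
```

Note that the clamp applies only to the on-screen cursor: the physical mouse has no matching boundary, which is one way to see why users sometimes run the mouse off the desk.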

FIGURE 7.1 Three users working concurrently on a large tabletop touch device. They can use their hands/fingers to manipulate the objects on the device. Note the different hand gestures being used. (www.reflectivethinking.com)

Medium direct manipulation is the next step moving along the continuum. The translational distance is reduced. Instead of communicating with the virtual space with the device, the user reaches out and touches, moves, and grabs the entities in the on-screen representation. Examples of this include touchscreens (mobile, kiosk, and desktop). This is still limited by the glass of the screen; the world is beyond the glass. This direct-manipulation strength supports pointing and tapping, but other activities that would include the third dimension, like reaching into the device, cannot be accommodated by the simple metaphor. Instead, creating these other actions requires stepping outside the metaphor with a new artifact such as double-tap and assigning a corresponding action to it. This again requires learning on the user's part. Multi-touch (Fig. 7.1) allows new actions to be assigned to various combinations of finger touches. The two-finger actions like zoom in/out are intuitive, but others must be learned and take longer to discover. This accounts for why a young child can easily learn to tap, change screens, and touch on a tablet (the intuitive actions) but doesn't have the skills to rearrange the icons on the screen (the learned actions).
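One reason the two-finger zoom feels intuitive is that its mapping is close to physical: the content scales by the same ratio as the distance between the fingers. The sketch below shows that ratio under stated assumptions; `pinch_scale` is a hypothetical helper, touch points are assumed to be plain (x, y) pixel tuples, and no particular touch framework's API is implied.

```python
import math


def pinch_scale(start_a, start_b, now_a, now_b):
    """Return the zoom factor implied by a two-finger pinch gesture.

    Each argument is an (x, y) touch position in pixels. The factor is the
    current distance between the fingers divided by their starting distance,
    so spreading the fingers gives a value above 1 (zoom in) and pinching
    them together gives a value below 1 (zoom out).
    """
    start_distance = math.dist(start_a, start_b)
    current_distance = math.dist(now_a, now_b)
    if start_distance == 0:
        return 1.0  # degenerate gesture: leave the zoom level unchanged
    return current_distance / start_distance
```

A gesture such as double-tap has no comparable physical analogue, which fits the observation that such actions must be learned rather than discovered.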

Strong direct manipulation involves actions such as gesture recognition with various body parts. It may be the user's hand, foot, head, or full body (whatever controls the action) that is "virtually" placed inside the physical space (Fig. 7.2). The users can see their hand in the 3-D space and can grasp, throw, drop, manipulate, and so forth. The users themselves still remain on the outside looking in. This works well when the spaces are small and simple, but when the spaces get bigger, the users need to move themselves outside the initial metaphor and enter another mode, such as move mode, and then traverse to the new region. See Chapter 10 for more on devices.

FIGURE 7.2 A tangible user interface for molecular biology, developed in Art Olson's Laboratory at the Scripps Research Institute, utilizes autofabricated molecular models tracked with the Augmented Reality Toolkit from the University of Washington Human Interface Technology Lab. The video camera on the laptop captures the molecule's position and orientation, enabling the molecular modeling software to display information such as the attractive/repulsive forces surrounding the molecule.

The notion of tangible and immersive user interfaces, in which users grasp physical objects to manipulate a graphical display that represents the object, is becoming quite popular. Tangible devices use haptic interaction skills to manipulate objects and convert the physical form to a digital form (Ishii, 2008).

The last dimension is immersive direct manipulation. Here is where direct manipulation is combined with virtual reality (see Section 7.6). The users put on glasses or some other device and they are inside the space. The users can see themselves and can walk/fly through the space by walking, leaning in, and so forth; the scenery changes with the moves.

7.2.3 Problems with direct manipulation

Graphical user interfaces were a setback for vision-impaired users, who appreciated the simplicity of linear command languages. However, screen readers for interfaces, speech-enabled devices, page readers for browsers, and audio designs for mobile devices enable vision-impaired users to understand some of the spatial relationships necessary to achieve their goals.

Direct-manipulation designs may consume valuable screen space and thus force valuable information off-screen, requiring scrolling or multiple actions. This is an issue in the mobile world, where screen space is very limited.

Another issue is that users must learn the meanings of visual representations and graphic icons. Titles that appear on icons (flyover help) when the cursor is over them offer only a partial solution. The visual representation may sometimes be misleading. Users may grasp the analogical representation rapidly but then may draw incorrect conclusions about permissible actions, overestimating or underestimating the functions of the computer-based analogy. Ample testing must be carried out to refine the displayed objects and actions and to minimize negative side effects.

For experienced typists, taking a hand off the keyboard to move a mouse or point with a finger may take more time than typing the relevant command. This problem is especially likely to occur if the users are familiar with a compact notation, such as for arithmetic expressions, that is easy to enter from a keyboard but may be more difficult to select with a mouse. While direct manipulation is often defined as replacing typing of commands with pointing with devices, sometimes the keyboard is the most effective direct-manipulation device. Rapid keyboard interaction can be extremely attractive for expert users, but the visual feedback must be equally rapid and comprehensible.

Small mobile devices have limited screen sizes. A finger pointing at a device may partially block the display, rendering a good portion of the device not visible. Also, if the icons are small because of the limited screen size, they may be hard to select or, because of limited resolution and viewing capabilities (especially for older adults), not clearly distinguishable, resulting in their meanings becoming lost or confused.

Some direct-manipulation principles can be surprisingly difficult to realize in software. Rapid and incremental actions have two strong implications: a fast perception/action loop (less than 100 ms) and reversibility (the undo action). A standard database query may take a few seconds to perform, so implementing a direct-manipulation interface on top of a database may require special programming techniques. The undo action may be even harder to implement, as it requires that each user action be recorded and that reverse actions be defined. It changes the style of programming because a nonreversible action is implemented by a simple function call whereas a reversible action requires recording the inverse action.
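The difference between a plain function call and a reversible action can be sketched as follows. This is only one possible arrangement, assuming each action is paired with an explicitly supplied inverse and recorded on a stack; the `History` class, the `shapes` dictionary, and the `move` function are hypothetical names introduced for illustration.

```python
class History:
    """Records the inverse of each performed action so it can be undone."""

    def __init__(self):
        self._inverses = []  # stack of callables, most recent action last

    def do(self, action, inverse):
        """Perform the action and record the inverse that reverses it."""
        action()
        self._inverses.append(inverse)

    def undo(self):
        """Reverse the most recent action; return False if nothing is left."""
        if not self._inverses:
            return False
        self._inverses.pop()()
        return True


# A toy drawing model: shape name -> (x, y) position.
shapes = {"box": (0, 0)}


def move(name, dx, dy):
    """Nonreversible version: a simple function call that changes state."""
    x, y = shapes[name]
    shapes[name] = (x + dx, y + dy)
```

A reversible move is then written as `history.do(lambda: move("box", 5, 0), lambda: move("box", -5, 0))`, after which `history.undo()` restores the original position. The extra work of defining an inverse for every action is the change in programming style that the paragraph above describes.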

7.2.4 The continuing evolution of direct manipulation

A successful direct-manipulation interface must present an appropriate representation or model of reality. With some applications, the jump to visual language may be difficult, but after using visual direct-manipulation interfaces, most users and designers can hardly imagine why anyone would want to use a complex syntactic notation to describe an essentially visual process. It is hard to conceive of learning the commands for the vast number of features in modern word processors, drawing programs, or spreadsheets, but the visual cues, icons, menus, and dialog boxes make it possible for even intermittent users to succeed. See Box 7.2 for a summary of the advantages and disadvantages of direct manipulation.

BOX 7.2 Advantages and disadvantages of direct manipulation.

Advantages

• Visually presents task concepts

• Allows easy learning

• Allows easy retention

• Allows errors to be avoided

• Encourages exploration

• Affords high subjective satisfaction

Disadvantages

• May be hard to program

• Accessibility requires special attention


Users are trying to better understand all the data and other visual content that are now available. One way to manage this information is through the use of a dashboard (Few, 2013). Being able to see a large volume of information (big data) at one time and to directly manipulate it and observe the impact visually is a powerful concept. Businesses and companies are bombarded by volumes of data every day. The ability to organize this user-generated data into a useful graphical format can help them manage resources and spot trends (Chapter 16). Dashboards provide ways for users to manipulate data using the various widgets provided. Companies such as Tableau Software, SAP Lumira, and IBM Cognos provide this capability, as do smaller user-oriented companies like dashboardsbyexample.

Weiser's (1991) influential vision of ubiquitous computing described a world where computational devices were everywhere: in your hands, on your body, in your car, built into your home, and pervasively distributed in your environment. The 1993 special issue of Communications of the ACM (Wellner et al., 1993) showed provocative prototypes that refined Weiser's vision. It offered multiple visions of beyond-the-desktop designs that used freehand gestures and small mobile devices whose displays changed depending on where users stood and how they pointed the devices. Almost 25 years later, Weiser's full vision has not yet been realized, but the social-media aspect of ubiquitous computing has blossomed.

Touchable displays from the small to the large (as large as wall size [Figs. 10.20 and 10.21] or even mall size) are becoming available as well. Interaction is all accomplished without users entering a long string of commands; instead, users physically manipulate the items of interest with their hands. An application of this is often seen on news programs, where the commentator can move the objects of interest on the screen and drill down to more detailed levels. Another application is virtual maps, which can be manipulated and zoomed by using hand motions as a multi-touch interface (Han, 2005). On a touchable display, interactions with both hands seem quite natural (although with small displays, issues of occlusion can be problematic).

There will certainly be many future variations of and extensions to direct manipulation, but the basic goals will remain similar: comprehensible interfaces that enable rapid learning, predictable and controllable actions, and appropriate feedback to confirm progress. Direct manipulation has the power to attract users because it is rapid, and even enjoyable. If actions are simple, reversibility is ensured, retention is easy, anxiety recedes, users feel in control, and satisfaction flows in.

7.3 Some Examples of Direct Manipulation

No single interface has every admirable attribute or design feature; such an interface might not be possible. Each of the examples discussed here, however, has sufficient features to win the enthusiastic support of many users.


7.3.1 Geographical systems including GPS (global positioning systems)

For centuries, travelers have relied on maps and globes to better understand the Earth and geographical systems. As graphic- and image-capture capabilities increased (both real-world and human-generated), it was a natural progression to create systems to represent both a current location ("where we are") and a target location ("where we want to go"). Of course, as prices dropped, these types of systems became available as commercial GPS systems for cars, for walking, and even for the mobile phone. Being able to directly see the alternatives on the devices as well as how to move from the current location to the target location, including manipulating the routes, is another application of direct manipulation.

Google Maps™, MapQuest, Google Street View, Garmin, National Geographic, and Google Earth™ combine geographic information from aerial photographs, satellite imagery, and other sources to create a vast database of graphical information that can easily be viewed and displayed. In some areas, the detail can go down to an individual house on a street or even inside a building (Fig. 7.3). With the well-populated databases of geographic points of interest, these systems provide an easy-to-use facility to point and select the nearest gas station or specific type of restaurant. Some systems provide real-time traffic to facilitate alternative routing in traffic-laden situations.

FIGURE 7.3 This is a screenshot from Google Street View of the inside of the University Center at Nova Southeastern University in Florida. On the bottom is a scrollable image of other views on campus. In the bottom left corner is a more conventional static map showing the physical street location of the campus. Users can move the "person" to a different location on campus, and the views will change accordingly.


7.3.2 Video games

For many people, the most exciting, well-engineered, and commercially successful application of the direct-manipulation concepts lies in the world of video games. The early but simple and popular game Pong® (created in 1972) required the user to rotate a knob that moved a white rectangle on the screen. A white spot acted as a ping-pong ball that ricocheted off the wall and had to be hit back by the movable white rectangle. Users developed speed and accuracy in placing the "paddle" to keep the increasingly speedy ball from getting past, while the computer speaker emitted a ponging sound when the ball bounced. Watching someone else play for 30 seconds is all the training that a person needs to become a competent novice, but many hours of practice are required to become a skilled expert. The interface objects were a single paddle, a ball, a single player, and some rudimentary sound. Games have come a long way with various controls, including full body, multiple objects of interest (both good and evil), full stereo sounds, detailed graphical environments, changing backgrounds, and the possibility of multiple players sitting physically next to one another or virtually across the globe.

Some cataloguers state that we are in the eighth generation of video games. Parkin (2014) provides an illustrated history of five decades of video games. Last generation's Nintendo Wii, Sony PlayStation 3, and Microsoft Xbox 360™ have given way to this generation's Nintendo Wii U, Sony PlayStation 4, and Microsoft Xbox One in a very short time, and continued advances are expected. These gaming platforms have brought powerful 3-D graphics hardware to the home and have created a remarkable international market. Gaming experiences are being enhanced by combining 3-D user-interface technologies, such as stereoscopic 3-D, head tracking, and finger-count gestures (Kulshreshth and LaViola, 2015). For a detailed survey of visual, mixed, and augmented reality gaming, refer to Thomas (2012).

Wildly successful games include violent first-person shooters, fast-paced racing games, and more sedate golfing games. Small handheld game devices still exist, but now users are playing games on their phones and other mobile devices. Multi-player games on the internet have also caught on with many users, providing the additional opportunity for social encounters and competitions. Gaming magazines and conferences attest to the widespread interest. In Rochester, New York, part of the Museum of Play houses the International Center for History of Electronic Games (http://www.museumofplay.org/icheg).

There is a wide range of game genres, and the borders between genres are becoming blurred. Some games are single-player games; others have multiple players. For a list of gaming genre acronyms, see Box 7.3. Players can be in the same physical space or a different physical space but shared virtual space. Players themselves can be virtual. For a more complete taxonomy of gaming systems, see Pagulayan et al. (2012). In research on player performance and experience, the games with multiple players seem to hold more interest in the social connection with others, teamwork, and collaboration. The single-player games seem to focus more on the game narrative and the characters, and players show more interest in the degree of immersion (Johnson et al., 2015).

BOX 7.3 Gaming genre acronyms.

The computer world is filled with gaming genre acronyms. Some of the more widely used acronyms include:

• AA: action adventure games

• ARPG: action role-playing games

• FPS: first-person shooter

• MMORPG: massively multi-player online role-playing games

• MOBA: multi-player online battle arena

• RPG: role-playing games

• RTS: real-time strategy

Game environments provide intriguing, successful applications of 3-D representations. These include first-person action games in which users patrol city streets or race down castle corridors while shooting at opponents as well as role-playing fantasy games with beautifully illustrated island havens or mountain strongholds. Many games are socially enriched by allowing users to choose avatars to represent themselves. Users can choose avatars that resemble themselves, but often they choose bizarre characters or fantasy images with desirable characteristics such as unusual strength or beauty (Boellstorff, 2008).

Some web-based game environments may involve millions of users and thousands of user-constructed "worlds," such as schools, shopping malls, or urban neighborhoods. Game devotees may spend dozens of hours per week immersed in their virtual worlds, chatting with collaborators or negotiating with opponents. World of Warcraft (developed and published by Blizzard Entertainment) has been the mainstay and most popular of the MMORPG games with more than 5.6 million subscribers as of 2015 (Fig. 7.4). New games are constantly hitting the market, and the competition is fierce. A relatively new entry to the market (2012), Guild Wars 2 (developed by ArenaNet and published by NCsoft) already has sold more than 5 million copies. This game is slightly different from other MMORPG games because the game is responsive to individual player actions, which is more common in single-player role-playing games.

FIGURE 7.4 A woman playing World of Warcraft. She is using both her keyboard and mouse. She also can hear the sounds of the game via her headset.

The Nintendo Wii, introduced in 2006, changed the demographics of the gaming world. Instead of young children (typically boys), older adults were using the Wii to play games like tennis and bowling. It also became an early fitness/wellness platform. With the introduction of the Kinect by Microsoft for Xbox in 2010 and then for Windows in 2012, more worlds opened up, and with a software development kit (SDK), developers can create their own worlds. These interfaces have been referred to as a natural user interface because the entire body can be used, but the possible actions still remain limited and need to be learned. The early Wii controller was modified with the addition of a wrist strap, since gamers were so immersed in play they sometimes accidentally hurled the controller at the screen. There is no syntax to remember, and therefore there are no syntax-error messages. Error messages in general are rare because the results of actions are obvious and can be reversed easily: If users move themselves too far to the left, they merely use the natural inverse action of moving back to the right. These principles, which have been shown to increase user satisfaction, could be applied to other environments. Examples of various game controllers are shown in Fig. 7.5. Customized controllers exist for games such as Guitar Hero (Fig. 1.8), flight control (Fig. 10.9), and Leap Motion (Fig. 10.16).

FIGURE 7.5 Various game controllers. Some are very specific and include a steering wheel or joystick; others use a series of buttons and direction arrows. The Wii Controller with the wrist strap is shown in the upper right corner. Although these game controllers do provide direct-manipulation actions, users still have to learn the meaning of the various buttons.

Most games continuously display a numeric score so that users can measure their progress and compete with their previous performance, with friends, or with the highest scorers. Typically, the 10 highest scorers get to store their initials in the game for public display. This strategy provides one form of positive reinforcement that encourages mastery. Studies with elementary-school children have shown that continuous display of scores is extremely valuable. Machine-generated feedback (such as "Very good" or "You're doing great!")


is not as effective, since the same score carries different meanings for different people. Most users prefer to make their own subjective judgments and perceive the machine-generated messages as an annoyance and a deception. Providing this combination of behavioral data and attitudinal data adds to the immersion quality of the game (Pagulayan et al., 2012).

Although the marketing focus and consumer popularity have concentrated on action-type games, there are other game environments, and gaming (or gamification) has become a popular metaphor used in training and evaluation. Simulation and educational games abound. Games have been developed for young children (pre-readers) where the intuitiveness of the icons and real-world-type interfaces (buttons, sliders, finger pointing, etc.) control the game. Females seem more interested in role-playing games and games with narratives. A whole new generation of female gamers now exists. Games are also used for wellness benefits (Calvo and Peters, 2014; Jones et al., 2014). Researchers are trying to better understand how users think and get into their flow state (Csikszentmihalyi, 1990; Ossola, 2015). Gaming can be used to learn and enhance physical skills, modify behaviors, and increase wellness. Although there are some negative implications of gaming, McGonigal (2011) offers some rules for the positive impact of gaming: limit yourself to no more than 21 hours a week; play games face to face with friends and family; and play cooperative games or games that have a creator mode.

Studying game design is fun (Lecky-Thompson, 2008), but there are limits to the applicability of the lessons. Game players are engaged in competition with the system or with other players, whereas applications-systems users prefer a strong internal locus of control, which gives them the sense of being in charge. Likewise, whereas game players seek entertainment and focus on the challenge, application users focus on their tasks and may resent too many playful distractions. The random events that occur in most games are meant to challenge the users; in non-game designs, however, predictable system behavior is preferred. Throughout this book, we discuss the user experience (UX); the gaming world now designs for the player experience (PX). Research is continuing with the development of a growing set of metrics to measure PX (Johnson et al., 2015). Additional research is ongoing in the quantification and evaluation of playfulness providing meaningful and memorable experiences (Lucero et al., 2014).

Courses and majors (or minors) in video game design exist. Some are in computer science departments, but others show the more interdisciplinary nature of the subject and can be found in media design, visual communication, and art departments. The important take-away is to use clear affordances, good instructions, and informative feedback; limit the complexity; and be aware of human variability (Fisher et al., 2014). All these are basic tenets of HCI design as described in Section 3.3.4 (The Eight Golden Rules of Interface Design).


7.3.3 Computer-aided design and fabrication

Most computer-aided design (CAD) systems for automobiles, electronic circuitry, aircraft, or mechanical engineering use principles of direct manipulation. Building and home architects now have at their disposal powerful tools, provided by companies such as Autodesk, that include components to handle structural engineering, floor plans, interiors, landscaping, plumbing, electrical installation, and much more. With such applications, the designer may see a circuit schematic on the screen and, with mouse clicks, be able to move components into or out of the proposed circuit. When the design is complete, the computer can provide information about current, voltage drops, and fabrication costs and warnings about inconsistencies or manufacturing problems. Similarly, newspaper-layout artists or automobile-body designers can easily try multiple designs in minutes and can record promising approaches until they find even better ones. The pleasure of using these systems stems from the capacity to manipulate the object of interest directly and to generate multiple alternatives rapidly.

There are large manufacturing companies using AutoCAD® and similar systems, but there are also other specialized design programs for kitchen and bathroom layouts, landscaping plans, and other homeowner-type situations. These programs allow users to control the angle of the sun during the various seasons to see the impact of the landscaping and shadows on various portions of the house. They allow users to view a kitchen layout and calculate square footage estimates for floors and countertops and even print out materials lists directly from the software. Some of the players in the field of interior-design software for residential and commercial markets include Floored, Inc. (Fig. 7.6), 2020 Spaces, and Home Designer Software. Their products are designed to work across multiple environments, desktop to web; they provide various views (top-down, architectural, front-view) to generate a more realistic overview of the design for the client.

Related applications are for computer-aided manufacturing (CAM) and process control. Honeywell's Experion® Process Knowledge System Orion provides the manager of an oil refinery, paper mill, or power-utility plant with a colored schematic view of the plant. The schematic may be displayed on multiple displays or on a large wall-sized map, with red lines indicating any sensor values that are out of the normal range. With a single click, the operator can get a more detailed view of the troubling component; with a second click, the operator can examine individual sensors or can reset valves and circuits. A basic strategy for this design is to eliminate the need for complex commands that the operator might need to recall only during a once-a-year emergency. The visual overview provided by the schematic facilitates problem solving by analogy because the linkage between the screen representations and the plant's temperatures or pressures is so close. The latest version of this software provides capabilities for

FIGURE 7.6 An office space layout from a company called Floored, Inc. This 3-D virtual CAD representation helps designers lay out office space. Items can be moved around between and within rooms; the design will be re-created to reflect any changes (http://www.floored.com).

virtualization and cloud support and includes customized dashboards to show status.
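The out-of-range highlighting described above can be illustrated with a minimal Python sketch. It is not Honeywell's implementation: the `Sensor` class, the `line_color` and `troubled` helpers, the "default" color name, and all readings are hypothetical and chosen only to show how an overview might decide which schematic lines to draw in red.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sensor:
    name: str
    value: float
    low: float   # lower bound of the normal range (assumed known per sensor)
    high: float  # upper bound of the normal range


def line_color(sensor: Sensor) -> str:
    """Return "red" for a reading outside its normal range, else "default"."""
    return "default" if sensor.low <= sensor.value <= sensor.high else "red"


def troubled(sensors):
    """Names of the sensors the overview would highlight in red."""
    return [s.name for s in sensors if line_color(s) == "red"]


# Hypothetical readings, invented for illustration only.
plant = [
    Sensor("boiler_temp_C", 412.0, 350.0, 400.0),
    Sensor("feed_pressure_kPa", 210.0, 180.0, 240.0),
    Sensor("valve_7_flow_Lps", 3.1, 5.0, 9.0),
]

for name in troubled(plant):
    print("out of range:", name)
```

Keeping the range check in one small function means the overview and any detailed view can share the same rule for what counts as abnormal.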

Another emerging use of direct manipulation involves home automation. Since so much of home control involves floor plans, direct-manipulation actions naturally take place on a display of the floor plan with selectable icons for each status indicator (such as a burglar alarm, heat sensor, or smoke detector) and for each activator (such as controls for opening and closing curtains or shades, for air conditioning and heating, or for audio and video speakers or screens). For example, users can route a recorded TV program being watched in the living room to the bedroom and kitchen by merely dragging the on-screen icon into those rooms, and they can adjust the volume by moving a marker on a linear scale. The action is usually immediate and visible and can be easily reversed as well.
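As a rough sketch of the drag-to-route interaction with easy reversal, the Python fragment below models a floor plan in which dropping a program icon on a room starts playback there and each drop can be undone. The `FloorPlan` class and its method names are assumptions made for illustration; no real home-automation API is implied.

```python
class FloorPlan:
    """Hypothetical model of a floor-plan display with reversible drops."""

    def __init__(self, rooms):
        # Map each room to the program currently playing there (None if idle).
        self.playing = {room: None for room in rooms}
        self._undo = []  # stack of (room, previous program) pairs

    def drop(self, program, room):
        """Handle an icon dropped on a room: remember the old state, then route."""
        self._undo.append((room, self.playing[room]))
        self.playing[room] = program

    def undo(self):
        """Reverse the most recent drop, if there is one."""
        if self._undo:
            room, previous = self._undo.pop()
            self.playing[room] = previous


plan = FloorPlan(["living room", "bedroom", "kitchen"])
plan.drop("recorded show", "bedroom")
plan.drop("recorded show", "kitchen")
plan.undo()  # the kitchen returns to its previous (idle) state
```

Storing the previous state with each action is one simple way to make every step reversible without recomputing the whole display.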

With the advent of these types of systems, not only are graphical, sophisticated 3-D displays generated, but with 3-D printing technology, actual workable models can be generated. These models provide a more realistic view for clients and customers. These models can include an overall outside view or even be broken down to show component parts if necessary. The cost saving of these models versus building the actual structure or device can be enormous, coupled


FIGURE 7.7 Astronaut Butch Wilmore onboard the International Space Station with the ratchet wrench that was created with Made in Space's 3-D printer. This device was designed, qualified, tested, and printed in space in less than one week.

with the ease of making incremental or larger modifications. 3-D printers have been installed on the International Space Station, where actual parts can be fabricated (Fig. 7.7).

7.3.4 Direct-manipulation programming and configuration

Performing tasks by direct manipulation is not the only goal. It should be pos­sible to do programming by direct manipulation as well, at least for certain problems. How about moving a drill press or a surgical tool through a complex series of motions that are then repeated exactly? Automobile seating positions and mirror settings can be set as a group of preferences for a particular driver and then adjusted as the driver settles in place. Likewise, some professional tele­vision-camera supports allow the operator to program a sequence of pans or zooms and then to replay it smoothly when required.

Programming of physical devices by direct manipulation seems quite natural, and an adequate visual representation of information may make direct-manipulation programming possible in other domains. Spreadsheet packages such as Excel™ have rich programming languages and allow users to create


portions of programs by carrying out standard spreadsheet actions. The result of the actions is stored in another part of the spreadsheet and can be edited, printed, and stored in a textual form. Database programs such as Access™ allow users to create buttons that when activated will set off a series of actions and commands and even generate a report. Similarly, Adobe Photoshop records a history of user actions and then allows users to create programs with action sequences and repetition using direct manipulation.

It would be helpful if the computer could recognize repeated patterns reliably and create useful macros automatically while the user was engaged in performing a repetitive interface task. Most cellphones have buttons that can be programmed to call home or call the doctor or another emergency number. This allows the user to encounter a simpler interface and be shielded from the details of the tasks.
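One simple way to approximate the idea of recognizing repeated patterns is to look for an action sequence that has just been repeated several times at the end of the interaction log and offer it as a macro. The Python sketch below makes that assumption explicit; the function names, the threshold of three repetitions, and the action labels are all hypothetical, and a reliable detector would need to tolerate variation that this exact-match version does not.

```python
def repeated_tail(actions, min_repeats=3):
    """Return the shortest sequence repeated at least min_repeats times
    in a row at the end of the log, or None if there is no such sequence."""
    n = len(actions)
    for length in range(1, n // min_repeats + 1):
        pattern = actions[n - length:]
        # Compare each of the last min_repeats blocks against the final block.
        if all(actions[n - (i + 1) * length: n - i * length] == pattern
               for i in range(min_repeats)):
            return pattern
    return None


def replay(pattern, perform, times=1):
    """Run a recorded pattern again by calling perform(action) for each step."""
    for _ in range(times):
        for action in pattern:
            perform(action)


# Hypothetical log of interface actions.
log = ["open", "select", "bold", "next",
       "select", "bold", "next",
       "select", "bold", "next"]

macro = repeated_tail(log)
if macro is not None:
    replay(macro, print)  # here "performing" an action just prints its name
```

Requiring several consecutive repetitions keeps the sketch from proposing a macro after a single coincidental repeat.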

7.4 2-D and 3-D Interfaces

Some designers dream about building interfaces that approach the richness of 3-D reality. They believe that the closer the interfaces are to the real world, the easier usage will be. This extreme interpretation of direct manipulation is a dubious proposition, since user studies show that disorienting navigation, complex user actions, and annoying occlusions can slow performance in the real world as well as in 3-D interfaces (Cockburn and McKenzie, 2002). Many interfaces (sometimes called 2-D interfaces) are designed to be simpler than the real world by constraining movement, limiting interface actions, and ensuring visibility of interface objects. However, the strong utility of "pure" 3-D interfaces for medical, architectural, product design, and scientific visualization purposes means that they remain an important challenge for interface designers. So the power of 3-D interfaces lies in applying them in the appropriate domain or context where the added dimension provides more understanding and improves task outcomes.

An intriguing possibility is that "enhanced" interfaces may be better than 3-D reality. Enhanced features might enable actions outside of real human capabilities, such as faster-than-light teleportation, flying through objects, multiple simultaneous views of objects, and x-ray vision. Playful game designers and creative applications developers have already pushed the technology further than those who seek merely to mimic reality.

For some computer-based tasks, such as medical imagery (Fig. 7.8), architectural drawing, computer-assisted design, chemical-structure modeling (Fig. 7.2), and scientific simulations, pure 3-D representations are clearly helpful and have become major industries. However, even in these cases, the successes are often due to design features that make the interface better than reality.


FIGURE 7.8 By using a medical simulation inserted into a large-scale visualization (using CAVE technology), physicians were able to find a solution that would not have been possible with the actual surgery. (http://www.nsf.gov/news/news_summ.jsp?cntn_id=126209)

Users can magically change colors or shapes, duplicate objects, shrink/expand objects, group/ungroup components, send them by various electronic means, and attach floating labels. Users can go back in time and even undo recent actions.

Among the many innovations, there have been questionable 3-D prototypes, such as for air-traffic control (showing altitude by perspective drawing only adds clutter when compared to an overview from directly above), digital libraries (showing books on shelves may be nice for browsing, but it inhibits searching and linking), and file directories (showing tree structures in three dimensions sometimes leads to designs that increase occlusion and navigation problems). Other questionable applications include ill-considered 3-D features for situations in which simple 2-D representations would do the job. For example, adding a third dimension to bar charts may slow users and mislead them (Hicks et al., 2003), but 3-D charts are such an attraction for some users that they are included in most business graphics packages (Cognos, SAS/GRAPH, SPSS/SigmaPlot).

A modest use of 3-D techniques is to add highlights to 2-D interfaces, such as buttons that appear to be raised or depressed, windows that overlap and leave shadows, or icons that resemble real-world objects. These may be enjoyable, recognizable, and memorable because of improved use of spatial memory, but


they can also be visually distracting and confusing because of additional visual complexity.

This enumeration of features for effective 3-D interfaces might serve as a checklist for designers, researchers, and educators:

• Use occlusion, shadows, perspective, and other 3-D techniques carefully.

• Minimize the number of navigation steps required for users to accomplish their tasks.

• Keep text readable (better rendering, good contrast with background, and no more than 30-degree tilt).

• Avoid unnecessary visual clutter, distraction, contrast shifts, and reflections.

• Simplify user movement (keep movements planar, avoid surprises like going through walls).

• Prevent errors (that is, create surgical tools that cut only where needed and chemistry kits that produce only realistic molecules and safe compounds).

• Simplify object movement (facilitate docking, follow predictable paths, limit rotation).

• Organize groups of items in aligned structures to allow rapid visual search.

• Enable users to construct visual groups to support spatial recall (placing items in corners or tinted areas).

Breakthroughs based on clever ideas seem possible. Enriching interfaces with stereo displays, haptic feedback, and 3-D sound may yet prove beneficial in more than specialized applications. Bigger payoffs are more likely to come sooner if these guidelines for inclusion of enhanced 3-D features are followed:

• Provide overviews so users can see the big picture (plan view display, aggregated views).

• Allow teleportation (rapid context shifts by selecting destination in an overview).

• Offer x-ray vision so users can see into or beyond objects.

• Provide history keeping (recording, undoing, replaying, editing).

• Permit rich user actions on objects (save, copy, annotate, share, send).

• Enable remote collaboration (synchronous, asynchronous).

• Give users control over explanatory text (pop-up, floating, or excentric labels and screen tips) and let them view details on demand.

• Offer tools to select, mark, and measure.

• Implement dynamic queries to rapidly filter out unneeded items.

• Support semantic zooming and movement (simple action brings object front and center and reveals more details).


• Enable landmarks to show themselves even at a distance.

• Allow multiple coordinated views (users can be in more than one place at a time and see data in more than one arrangement at a time).

• Develop novel 3-D icons to represent concepts that are more recognizable and memorable.
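To make the dynamic-queries guideline in the list above concrete, here is a minimal Python sketch in which each slider contributes a (low, high) range and only items inside every range remain visible. The `Item` fields, the `dynamic_query` function, and the sample values are hypothetical; a real interface would rerun the filter and redraw continuously as the sliders move.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    price: float
    distance_km: float


def dynamic_query(items, ranges):
    """Keep items whose attributes fall inside every (low, high) slider range.

    `ranges` maps an attribute name to its current slider bounds.
    """
    return [
        item for item in items
        if all(low <= getattr(item, attr) <= high
               for attr, (low, high) in ranges.items())
    ]


# Hypothetical objects in the scene.
items = [
    Item("A", 10.0, 1.0),
    Item("B", 25.0, 0.5),
    Item("C", 12.0, 5.0),
]

# Slider positions: price at most 15, distance at most 2 km.
visible = dynamic_query(items, {"price": (0.0, 15.0), "distance_km": (0.0, 2.0)})
print([item.name for item in visible])
```

Because the filter is a pure function of the slider state, widening a slider restores the hidden items, which keeps the action reversible.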

3-D environments are greatly appreciated by some users and are helpful for some tasks (Laha et al., 2012). They have the potential for novel social, scientific, and commercial applications if designers go beyond the goal of mimicking 3-D reality. Enhanced 3-D interfaces could be the key to making some kinds of 3-D teleconferencing, collaboration, teleoperation, and telepresence popular. Of course, it will take good design of 3-D interfaces (pure, constrained, or enhanced) and more research on finding the payoffs beyond the entertaining features that appeal to first-time users. Success will come to designers who provide compelling content, relevant features, appropriate entertainment, and novel social-media structure support. By studying user performance and measuring satisfaction, those designers will be able to polish their designs and refine guidelines for others to follow.

7.5 Teleoperation and Presence

Teleoperation has two parents: direct manipulation in personal computers and process control, where human operators control physical processes in complex environments. Typical tasks are operating power or chemical plants, controlling manufacturing, performing surgery, flying airplanes or drones, or steering vehicles. If the physical processes take place in a remote location, we talk about teleoperation or remote control. To perform the control task remotely, the human operator may interact with a computer, which may carry out some of the control tasks without any interference by the human operator.

There are great opportunities for the remote control or teleoperation of devices if acceptable user interfaces can be constructed. When designers can provide adequate feedback in sufficient time to permit effective decision making, attractive applications in manufacturing, medicine, military operations, and computer-supported collaborative work are viable. Home-automation applications extend remote operation of various devices to security and access systems, energy control, and operation of appliances. Scientific applications in space, underwater, or in hostile environments enable new research projects to be conducted economically and safely. The recent introduction of affordable drones will be yet another facet of teleoperation.


In traditional direct-manipulation interfaces, the objects and actions of interest are shown continuously; users generally point, click, or drag rather than type, and feedback indicating change is immediate. However, when the devices being operated are remote, these goals may not be realizable, and designers must expend additional effort to help users to cope with slower responses, incomplete feedback, increased likelihood of breakdowns, and more complex error-recovery procedures. The problems are strongly connected to the hardware, physical environment, network design, and task domain.

A typical remote application is telemedicine, or medical care delivered over communication links (Sonnenwald et al., 2014). Telemedicine can be used more broadly to allow physicians to examine patients remotely and surgeons to carry out operations across continents. Telehealth is being widely used in the Veteran's Administration (Fig. 7.9).

Veterans can come into the local VA office, where visits with the various medical personnel can be conducted via Telehealth technology. Cameras with

FIGURE 7.9 Erica Taylor, Nurse Director for the Telehealth Program at Landstuhl Regional Medical Center, demonstrates using the Telehealth cart otoscope to conduct a real-time tympanic membrane exam. On the screen is Physician Assistant Steven Cain, who from a remote location can see and evaluate the patient and provide an appropriate plan of care. Photo by Phil Jones.


FIGURE 7.10 When doing robotic surgery, the surgeon sits at the computer console and controls the robotic camera and surgical instruments remotely. Various devices on the controller can be adjusted by the surgeon, including adjustments/magnifiers to clearly see the field of view.

high-resolution images allow the doctor to see the patient's physical condition, with the added benefit of observing the patient's affect. A trained medical person can be in the office with the patient to help facilitate the examination. Other medical applications include robotic surgery. Robotic surgery is an alternative to conventional surgery that enables a smaller incision and more accurate and precise surgical movements. The robotic platform expands the surgeon's capabilities and provides a highly magnified 3-D image (Fig. 7.10). In addition, the surgeon has control over hand, wrist, and finger movement through robotic instrument arms. The surgeon is comfortably seated across the operating room at a console rather than being over the patient, and the system damps out some involuntary movements that can be problematic.

The architecture of remote environments introduces several complicating factors:

• Time delays. The network hardware and software cause delays in sending user actions and receiving feedback: a transmission delay, or the time it takes for the command to reach the microscope (in our example, transmitting the command over the network), and an operation delay, or the time until the microscope


responds. These delays in the system prevent the operator from knowing the current status of the system.

• Incomplete feedback. Devices originally designed for direct control may not have adequate sensors or status indicators. For instance, the microscope can transmit its current position, but it operates so slowly that it does not indicate the exact current position.

• Unanticipated interferences. Since the operated devices are remote, unanticipated interferences are more likely to occur than with physically present direct-manipulation environments. For instance, if a local operator accidentally moves the slide under the microscope, the positions indicated might not be correct. A breakdown might also occur during the execution of a remote operation without a good indication of this event being sent to the remote site.

One solution to these problems is to make explicit the network delays and breakdowns as part of the system. The user sees a model of the starting state of the system, the action that has been initiated, and the current state of the system as it carries out the action. It may be preferable for users to specify a destination (rather than a motion) and wait until the action is completed before readjusting the destination if necessary. Avenues for continuous feedback also are important.
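The idea of showing the starting state, the initiated action, and an estimate of the current state can be sketched in Python as follows. This is only an illustration under strong assumptions: the `RemoteMove` class is hypothetical, the device is assumed to move at a constant speed once the command arrives, and the delay on the return path for feedback is ignored.

```python
from dataclasses import dataclass


@dataclass
class RemoteMove:
    start: float               # position when the command was issued
    destination: float         # position the user asked for (a destination, not a motion)
    issued_at: float           # time the command was sent, in seconds
    transmission_delay: float  # seconds for the command to reach the device
    speed: float               # assumed constant device speed, units per second

    def _travel(self, now: float) -> float:
        """Distance covered so far, capped at the total distance requested."""
        moving_for = max(0.0, now - self.issued_at - self.transmission_delay)
        return min(abs(self.destination - self.start), self.speed * moving_for)

    def estimated_position(self, now: float) -> float:
        """Local estimate of where the device is; not a measured value."""
        direction = 1.0 if self.destination >= self.start else -1.0
        return self.start + direction * self._travel(now)

    def status(self, now: float) -> str:
        """Label the display can show next to the start and destination markers."""
        if now - self.issued_at < self.transmission_delay:
            return "command in transit"
        if self._travel(now) < abs(self.destination - self.start):
            return "device moving"
        return "completed (estimated)"


move = RemoteMove(start=0.0, destination=10.0, issued_at=0.0,
                  transmission_delay=0.5, speed=2.0)
for t in (0.25, 2.5, 6.0):
    print(t, move.status(t), move.estimated_position(t))
```

Labeling the final state as estimated is deliberate: until the device reports back, the interface should not present its own prediction as confirmed feedback.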

Teleoperation is also commonly used by the military and by civilian space projects. Military applications for unmanned aircraft gained visibility during the recent wars in Afghanistan and Iraq. Reconnaissance drones and teleoperated missile-firing aircraft were widely used. Agile and flexible mobile robots exist for many hazardous duty situations (Murphy, 2014). Military missions and harsh environments, such as undersea and space exploration, are strong drivers for improved designs.

Telepresence was initially defined by Marvin Minsky (1980), but today the operative term is presence. The concept was that of not being remote but giving the feeling of "being there." Advances are being made with telepresence, and today's technologies and the internet-connected world have opened up additional possibilities. The commercial market is seeing a set of technologies called mobile remote presence (MRP) systems (Fig. 7.11). These are advancing video conferencing systems and allowing remote workers to have a feeling of presence. These devices facilitate formal communications as well as more informal chats in hallways. Some of the companies creating these devices include Suitable Technologies Beam, Mantarobot, Doublerobotics, and VGO. The controlling of these devices is another application of direct manipulation. Another application that extends the idea of video conferencing, made popular by Skype and other technologies, is a shared work space called ImmerseBoard, where the users are co-located but can work on the same screen (Fig. 7.12).


FIGURE 7.11 Three people having a conversation in a work environment; two are participating using MRP devices.

FIGURE 7.12 ImmerseBoard allows two users to be co-located and work on the same shared screen (Higuchi et al., 2015).


Robotics is a subfield of telepresence. Robots are being used in medical settings, office settings, education, and other specialized applications. New usage norms are being established for these types of devices and interactions (Lee and Takayama, 2011). The remote coworkers are often referred to as pilots. They can wander the hallways or "just hang out." Frameworks are being created with various design dimensions to better understand presence (Rae et al., 2015). It is important to understand the perspective of the users and especially that of the remote user. Doing various tasks with remote users in this type of setup can increase cognitive load. The remote person needs to concentrate on the task at hand as well as operating and positioning the device properly (Rae et al., 2014). Kristoffersson et al. (2013) provide an in-depth review of mobile robotic presence. Future work needs to be done on how mobility affects remote collaboration and on better understanding the design of mobility features. For individuals with limited mobility, robotics can facilitate more active participation. A full discussion of robotics and HCI is beyond the scope of this book.

7.6 Augmented and Virtual Reality

Flight-simulator designers work hard to create the most realistic experience for fighter and airline pilots. The cockpit displays and controls are taken from the same production line that creates the real ones. Then the windows are replaced by high-resolution computer displays, and sounds are choreographed to give the impression of engine start or reverse thrust. Finally, the vibration and tilting during climbing or turning are created by hydraulic jacks and intricate suspension systems. This elaborate technology may cost $100 million, but even so, it is a lot cheaper, safer, and more useful for training than the $400-million jet that it simulates. (And for training actual pilots, the reasonable flight simulators that millions of home computer game players have purchased won't quite do the trick!) Flying a plane is a complicated and specialized skill, but simulators are available for more common (and some surprising) tasks under the alluring name of virtual reality or the more descriptive virtual environments.

The gurus of virtuality are promoting immersive experiences. The miniaturization of electronics has provided less bulky gear to do that exploring. As computer systems continue to run faster, the obstacles that were in the way of immersive experiences are disappearing and the technology is becoming more affordable. Head-mounted displays are available from various manufacturers: Oculus Rift, Razer OSVR, HTC Vive, Sensics, Sony Glasses, and


FIGURE 7.13 Image-guided surgery can be done with the surgeon's hand attached to multiple sensors that can mimic the hand and finger positions and create accurate control. In the past, gloves were often used to attach the sensors and did not offer the flexibility and accuracy of the directly attached sensors. (http://polhemus.com/micro-sensors)

Polhemus. Bulky gloves are being replaced by more-lightweight materials (Fig. 7.13) and less-cumbersome connections (Fig. 7.14). Companies are advancing this technology very quickly. Magic Leap has just applied for a patent for a con­tact lens to facilitate augmented or virtual reality (Kokalitcheva, 2015).

The direct-manipulation principles outlined in Section 7.2.1 may be helpful to people who are designing and refining virtual and augmented reality environments. When users can select actions rapidly by pointing or gesturing

FIGURE 7.14 Oculus Rift head gear. This is an example of a virtual reality head-mounted display.


FIGURE 7.15 This figure shows the reality-virtuality (RV) continuum initially sketched by Milgram and Kishino in 1994. It runs from the real environment through augmented reality and augmented virtuality to the virtual environment; the middle of the continuum is labeled mixed reality. It still holds true today. Mixed reality is the reality that has some aspects of augmented reality within a virtual environment.

and display feedback occurs immediately, users have a strong sense of causal­ity. Interface objects and actions should be simple so that users view and manip­ulate task-domain objects.

Graphics researchers have been perfecting image displays to simulate light­ing effects, textured surfaces, reflections, and shadows. Data structures and algorithms for zooming in or panning across an object rapidly and smoothly are now practical on common computers and even some mobile devices. The immersive environment has some problems, including simulator sickness, nau­sea, and discomfort from wearing head-mounted gear and other equipment. Some of these problems are minimized by less-jumpy graphic transitions. Better understanding of the usability challenges, such as how much reality should be incorporated and when and how it can improve the user experience, is needed (McGill et al., 2015).

As our systems become more sophisticated, the distinction between different levels of virtuality blurs. It is best portrayed as originally conceived by Milgram and Kishino (1994): a continuum (Fig. 7.15). The last two sections of this chapter discuss augmented reality (Section 7.6.1) and then virtual reality (Section 7.6.2).

7.6.1 Augmented reality

Augmented reality enables users to see the real world with an overlay of additional information; for example, while users are looking at the walls of a building, their semitransparent eyeglasses may show the location of electrical wires and studwork. Medical applications, such as allowing surgeons or their assistants to look at a patient while they see an overlay of a sonogram or other pertinent information to help locate a tumor, also seem compelling (Fig. 7.16). Augmented reality could show users how to repair equipment or guide visitors through cities (Fig. 7.17). Augmented reality strategies also enable users to



/ Incisio n 24 ,.,

FIGURE 7.16 Virtual reality might be used to help surgeons or their assistants during surgery, by showing pertinent information superimposed on a view of the real world. (http://augmentarium.umiacs.umd.edu)


manipulate real-world artifacts to see results on graphical models (Poupyrev et al., 2002; Ishii, 2008) with applications such as manipulating protein molecules to understand the attractive/repulsive force fields between them. Using augmented reality systems to enhance social pretend play by young children (ages 4-6) promotes reasoning about emotional states as well as communication and divergent thinking (Bai et al., 2015).

An interior designer walking through a house with a client should be able to pick up a window-stretching tool or pull on a handle to try out a larger window or to use a room-painting tool to change the wall colors while leaving the windows and furniture untouched. Companies like IKEA are providing augmented reality tools so customers can visualize the products via their catalog in their own homes and rooms (Fig. 7.18).

7.6.2 Virtual reality

The presence aspect of virtual reality breaks the physical limitations of space and allows users to act as though they are somewhere else. Practical thinkers immediately grasp the connection to remote direct manipulation, remote control, and remote vision, but the fantasists see the potential to escape current


FIGURE 7.17 Using augmented reality overlays, the HERE City Lens app shows various points of interest on a mobile phone. Icons represent the types of places (food, shopping, etc.) and distances from the current location. In addition, links are provided to user reviews.

FIGURE 7.18 Customers can use their personal mobile devices to pull up objects from the IKEA catalog and see how the various items would look in their own house.


reality and to visit science-fiction worlds, cartoonlands, previous times in history, galaxies with different laws of physics, or unexplored emotional territories.

There have been many medical successes using virtual environments. For example, virtual worlds can be used to treat patients with a fear of heights by giving them an immersive experience with control over their viewpoint and movement. The safe immersive environment enables phobia sufferers to accommodate themselves to frightening stimuli in preparation for similar experiences in the real world. Another dramatic result is that immersive environments provide distractions for patients so that some forms of pain are controlled (Fig. 7.19). The immersive virtual reality environment has been used to treat military personnel suffering from PTSD (Fig. 7.20). Virtual worlds can be used for positive computing (Calvo and Peters, 2014) and wellness issues (Fig. 7.21).

FIGURE 7.19 A patient using UW HITLab/Harborview's SnowWorld pain distraction at Shriners Children's Burn Center Galveston. UW designer/researcher Hunter Hoffman's latest version of SnowWorld was created for the UW by gifted worldbuilders at www.firsthand.com using www.3ds.com Virtual World Development Software. The immersive experience seems to lessen the painful experiences.


FIGURE 7.20 Soldiers can "re-live" portions of their combat experiences in a virtual reality setting with full immersion and sounds. Some systems even provide full immersion to include shaking and movement to make the experience as realistic as possible. Working with trained therapists, the soldier can be slowly desensitized from the traumatic experiences. (http://ict.usc.edu)

The opportunities for artistic expression and public-space installations are being explored by performance artists, museum designers, and building architects. Creative installations include projected images, 3-D sound, and sculptural components, sometimes combined with video cameras and user control by mobile devices. Other creative ideas include virtual dressing rooms where users can try on clothes on a model of themselves. The possibilities are truly endless.

Further information on virtual and augmented reality can be found in the wide assortment of textbooks available (Fuchs et al., 2011; Boellstorff et al., 2012; Kipper and Rampolla, 2012; Craig, 2013; Hale and Stanney, 2014; Barfield, 2015; Jerald, 2016). Billinghurst et al. (2014) recently compiled a comprehensive survey of augmented reality that gives both history of the field and details about the technologies and tools, including future research directions. The field is


FIGURE 7.21 Image of a virtual meditative world for engaging in meditation activities. The virtual world has sounds that change with each chakra (stage) of the meditation process. This is an application of positive computing. (http://nsuworks.nova.edu/gscis_etd/65/)

changing rapidly, and although avatars and virtual worlds still exist and are being explored (Blascovich and Bailenson, 2011), other virtual worlds like Second Life have almost disappeared (Boellstorff, 2008).

Practitioner's Summary

Among interactive systems that provide equivalent functionality and reliability, some systems have emerged to dominate the competition. Often, the most appealing systems have an enjoyable user interface with customized user-generated content that offers a natural representation of the task objects and actions, hence the term direct manipulation (Box 7.2). These interfaces are easy to learn, to use, and to retain over time. Novices can acquire a simple subset of the actions and then progress to more elaborate ones. Actions are rapid, incremental, and reversible, and they can be performed with physical movements instead of complex syntactic forms. The results of actions are visible immediately, and error messages are needed less often.


Using direct-manipulation principles in an interface does not ensure its success. A poor design, slow implementation, or inadequate functionality can undermine acceptance. For some applications, other approaches may be more appropriate. However, great potential exists for multiple and varied applications of direct-manipulation concepts. Compelling demonstrations of virtual and augmented reality are being applied in a growing set of applications with enhanced social interactions. Iterative design (Chapter 4) is especially important in testing advanced direct-manipulation systems because the novelty of these approaches may lead to unexpected problems for designers and users.

Researcher's Agenda

Research needs to refine our understanding of the contributions of each feature of direct manipulation: analogical representation, incremental action, reversibility, physical action instead of syntax, immediate visibility of results, characteristics such as translational distances, and graphic displays. Reversibility is easily accomplished by a generic undo action, but designing natural inverses for each action may be more attractive. Complex actions are well-represented with direct manipulation, but multi-layer design strategies for graceful evolution from novice to expert usage could be a major contribution. For expert users, direct-manipulation programming is still an opportunity, but good methods of history keeping and editing of action sequences are needed as well as increased attention to user-generated content. Better understanding of touchable interfaces and their uses as well as research on two-handed versus one-handed operations are needed. The allure of 3-D interaction is great, but researchers need to provide a better understanding of how and when (and when not) to use features such as occlusion, reduced navigation, and enhanced 3-D actions such as teleportation or x-ray vision and what are the best widths for field of view. Providing better semantic understanding of 3-D images can provide information for visually impaired users to better understand their environment. The impact of immersion on gaming and virtual worlds using rich social-media interactions across various ages and activities needs to be understood better.

Beyond the desktops and laptops, there is the allure of presence, virtual environments, augmented realities, and context-aware devices. Research is needed into how presence affects behaviors and interactions including privacy issues. The playful and enjoyable aspects will certainly be pursued, but the real challenge is to find the practical designs and a better understanding of "being there" when looking at 3-D worlds, both as individuals and as collaborators and players in the enriched social-media environments. A new set of tools is needed to investigate and better understand digital games research and its implications, both good and bad.

Discussion Questions


1. Describe three principles of direct manipulation.

2. Give four benefits of direct manipulation. Also list four problems of direct manipulation.

3. Explain the differences between various kinds of direct manipulation with respect to translational distances.

4. An airline company is designing a new online reservation system. They want to add some direct-manipulation features. For example, they would like customers to click a map to specify the departure cities and the destinations, and to click on the calendar to indicate their schedules. From your point of view, list four benefits and four problems of the new idea compared with their old system, which required the customer to do the job by typing text.


5. Explain how virtual reality can be used for medical purposes.

6. List an example of teleoperation or virtual reality. Consider what a future application (that does not presently exist) might do. Be creative!


W. H. Auden A Certain World, 1970


272 Chapter 8 Fluid Navigation

8.1 Introduction

This chapter addresses design issues related to navigation, which can be defined as enabling users to know where they are and to steer themselves to their intended destination. In short, navigation is about getting work done or having fun through a series of actions, much like sailors who steer their boat to a harbor. Navigation is key to successfully operating interactive applications, such as installing a mobile app, filling in a survey, or purchasing a train ticket (task navigation). It is also the key to finding information on a website or browsing social media (web navigation) or to finding the action needed in a desktop application (command menu navigation).

Navigation harnesses users' ability to rapidly skim choices, recognize what is relevant, and select what they need to realize their intentions. The goal for designers is to enable fluid navigation that allows users to gracefully and confidently get to where they want to go, explore novel possible routes, and backtrack when necessary. Navigation depends on recognition of landmarks that travelers use to guide their choices, which differs greatly from search, which requires users to describe what they want by typing keywords in a blank search box (see Chapter 15).

While the search box is the main technique to initiate the process of finding information in vast information spaces (like the internet or digital libraries), navigation techniques such as small or large menus, embedded links, or tool palettes are the workhorses of navigation. Users indicate their choices with a touch, tap, or swipe of the finger or by using a pointing device (see Chapters 7 and 10) and get immediate feedback indicating what they have done. Navigation by selection is an interaction style that is especially effective for users who are novice or first-time users, are knowledgeable intermittent users, or need help in structuring their decision-making processes. However, with careful design of complex menus and rapid interaction, menu selection can be appealing even to expert frequent users. These strategies can be used in combination with command languages (see Section 9.5), allowing users to transition smoothly from novice to expert because menus offer cues to elicit recognition rather than forcing users to recall the syntax of a command from memory. Careful design, keyboard shortcuts, and gestures allow expert users to navigate quickly through large information structures.

A loose definition of menus is used here as a representation of available choices, which can describe the rich array of techniques designers use to present choices and guide users as they select what they want. Arrays of check boxes or form fill-in can be seen as primarily data-entry techniques, but those techniques contribute to the experience of steering an application or website navigation (e.g., to complete a survey, sign up for a service, or make a purchase), so they are discussed in this


chapter as well. Similarly, dialog boxes contribute to allowing users to express their choices, so dialog box design is described at the end of the chapter.

Very early studies demonstrated the importance of organizing menus in a meaningful structure, resulting in faster selection time and higher user satisfaction (see Section 8.4). Navigation may follow a linear sequence (e.g., in a wizard or survey), a hierarchical structure that is natural and comprehensible (e.g., an ebook split into chapters, a store into departments, or the animal kingdom into species), or a network structure when choices may be reachable by more than one path (e.g., websites).
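To make the hierarchical case concrete, the sketch below represents a menu as nested Python dictionaries and finds the sequence of choices that leads to an item. It is only an illustration under stated assumptions: the function name `find_path` and the sample store menu are hypothetical, not taken from any real site.

```python
def find_path(menu, target, path=()):
    """Return the list of choices leading to `target` in a nested menu, or None."""
    for label, submenu in menu.items():
        here = path + (label,)
        if label == target:
            return list(here)
        if isinstance(submenu, dict):
            # Descend into the submenu and stop at the first match.
            found = find_path(submenu, target, here)
            if found:
                return found
    return None


# Hypothetical store split into departments; leaves are marked with None.
store = {
    "Clothing": {"Jackets": None, "Shoes": None},
    "Cycle": {"Bikes": None, "Helmets": None},
}

print(find_path(store, "Helmets"))
```

In a strict hierarchy each item has exactly one such path, so its length is the number of selections a user must make. In a network structure the same item could be reached by several paths, and a search like this one would simply report the first it finds.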

By harnessing the latest versions of HTML or CSS, even webpages and mobile applications now include smooth animations and sleek graphic design that turn basic menus into custom widgets that help define the entire look and feel of a website or application. When links and menus or choices and commands are designed using familiar terminology or recognizable visual elements and are organized in a meaningful structure and sequence, users can navigate complex information structures easily with a few mouse clicks or taps of the finger or smoothly scroll through sleek presentations of the possible next steps to accomplish their tasks. Carefully selected gestures can add a sense of delight and fluidness to the navigation on touchscreen devices.

Of course, just because a designer uses slick graphical menus, elegant form fill-in, or well-known gestures does not guarantee that the interface will be appealing and easy to use. Effective interfaces emerge only after careful consideration of and testing for numerous design issues, such as task-related organization, phrasing of items, sequence of items, graphic layout and design, responsive design to adapt to various sizes of devices, shortcuts for knowledgeable frequent users, online help, and error correction (Bailly et al., 2015).

This chapter starts by reviewing the rich array of available techniques for allowing users to specify their choices, from single techniques to the combinations of multiple techniques (Section 8.2). Section 8.3 discusses issues related to small displays. Content organization is discussed in Section 8.4. Finally, Section 8.5 discusses the needs of audio menus, and form fill-in and dialog boxes are covered in Section 8.6.


8.2 Navigation by Selection

Choices can be presented explicitly, in that there is an orderly enumeration of the items with little extraneous information, or they can be embedded in text or graphics and still be selectable. Embedded links of webpages were first popularized in the Hyperties system (Koved and Shneiderman, 1986), which was used for early commercial hypertext projects and became the inspiration for the hotlinks of the World Wide Web. Highlighted names, places, or phrases became menu items embedded in text that informs users and helps to clarify the meaning of the menu items. Graphical techniques are a particularly attractive way to present choices while providing context to help users specify what they want. For example, maps can orient users about the geography of the area before users select an item of interest, and calendars or timelines can inform users of availability and constraints before a date or time is selected (e.g., see HIPMUNK in Fig. 1.7). Interactive visualization of information can also help analysts navigate large amounts of data in a fluid visual manner (Elmqvist et al., 2011 and Chapter 16).

The simplest case of explicit menus is a binary menu for yes/no, true/false choices (Fig. 8.1).

Another example of a simple menu is the grid menu popularized by mobile devices, with a small set of icons and labels (Fig. 8.2).

When users need to make a series of choices (e.g., in a survey or to select parameters of an application), there are well-established methods of presenting choices.

Radio buttons support single-item selection from a multiple-item menu (Fig. 8.3), while check boxes allow the selection of one or more items in a menu. A multiple-selection menu is a convenient method for handling multiple binary


For an extra $5 you can add a gift wrap selected from dozens of choices

[ Add gift wrap ]   No thanks

FIGURE 8.1 A simple menu with two choices. A short explanation is provided. Buttons are large enough to be easy to select and have informative labels, and one answer has been highlighted as the most likely answer.

FIGURE 8.2 Two examples of simple menus. On the left, the NatureNet citizen science app shows the nine functions of the main menu. On the right, the Zombies, Run! app lists the possible missions of Season 1 of the immersive running game and audio adventure.


Does anyone in your household currently smoke?

○ Yes, someone does

○ No, no one does

○ Not sure

FIGURE 8.3 Three radio buttons constitute a menu that steers users to appropriate information in a health risk assessment website.



What treatment would you like to learn about?

Surgery

Physical therapy

Medication

(not available in your plan)

FIGURE 8.4 Check boxes allow users to indicate their preferences about treatment they would like to discuss. Feedback is provided by a check mark. Unavailable choices can be grayed out.

choices, since the user is able to scan the full list of items while deciding (Fig. 8.4). Unavailable choices can be grayed out.
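The difference between the two controls can be sketched in a few lines of Python. This is a minimal illustration, not tied to any GUI toolkit; the class names `RadioGroup` and `CheckBoxGroup` are hypothetical.

```python
class RadioGroup:
    """Single-item selection: choosing one option replaces any earlier choice."""

    def __init__(self, options):
        self.options = list(options)
        self.selected = None  # at most one option is selected at a time

    def select(self, option):
        if option not in self.options:
            raise ValueError(f"unknown option: {option!r}")
        self.selected = option


class CheckBoxGroup:
    """Multiple binary choices: each option is toggled independently."""

    def __init__(self, options):
        self.options = list(options)
        self.checked = set()

    def toggle(self, option):
        if option not in self.options:
            raise ValueError(f"unknown option: {option!r}")
        if option in self.checked:
            self.checked.remove(option)
        else:
            self.checked.add(option)


smoking = RadioGroup(["Yes, someone does", "No, no one does", "Not sure"])
smoking.select("Not sure")
smoking.select("No, no one does")   # replaces the earlier answer

treatments = CheckBoxGroup(["Surgery", "Physical therapy", "Medication"])
treatments.toggle("Surgery")
treatments.toggle("Medication")     # both remain checked
```

A radio group holds a single value, while a check-box group holds a set, which is why check boxes suit a list of independent yes/no decisions.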

8.2.1 Menu bars, pop-up menus, toolbars, palettes, and ribbons

Menu bars are typically found at the top of each application (Fig. 8.5 but also Fig. 1.2 or 1.4) or both at the top and on the side of the screen. Common items in desktop or tablet applications are File, Edit, View, and Help, and menus that follow this order will seem familiar to most users. Clicking on a menu title brings up a list of related items, and users can then make a selection by moving the pointing device over the items (which respond by highlighting) and clicking on the desired choice. Since positional constancy is such a strong principle, when an item is not available for selection, it is important to gray it out rather than removing it from the list (e.g., "Copy to Scrapbook" in Fig. 8.5).
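The point about positional constancy can be illustrated with a small text-only sketch. It assumes a hypothetical `render_menu` function in which parentheses stand in for gray text; the item names follow the Edit menu example mentioned above.

```python
def render_menu(items, available):
    """Return display lines; unavailable items are dimmed but keep their slot."""
    lines = []
    for position, label in enumerate(items, start=1):
        if label in available:
            lines.append(f"{position}. {label}")
        else:
            # Parentheses stand in for a grayed-out label.
            lines.append(f"{position}. ({label})")
    return lines


edit_items = ["Cut", "Copy", "Copy to Scrapbook", "Paste"]
everything = render_menu(edit_items, available=set(edit_items))
limited = render_menu(edit_items, available={"Cut", "Copy", "Paste"})

for line in limited:
    print(line)
```

Because the unavailable item is dimmed rather than dropped, "Paste" stays in the fourth slot in both renderings, so users who have learned its position do not have to search for it again.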

The increasing ease of creating custom widgets allows designers to create endless variants of the original menu bars. Preserving readability and ensuring that users will be able to identify menus as such are important goals when creating these new designs. Many rely on multiple menu bars, placing menus at the top but also on the side and bottom of the screen or webpage. When placed on the side, submenus can open in place using an accordion menu style expansion, or to the side. Accordion menus work well when the submenus have few items and do not force users to scroll too far to collapse the accordion, but accordions may also increase user disorientation when the indenting scheme is unclear or the menu structure is more than two or three levels deep. Large submenus are better expanded below or to the side (e.g., the REI

FIGURE 8.5 On the top menu bar of Microsoft PowerPoint, the Edit cascading pull-down menu (also called pulled-right) is open, followed by the Find menu. The menus allow users to explore the functions of the application. To facilitate discovery and learning, icons and keyboard shortcuts are indicated on the right of the menu items (for example, ⌘C for Copy or ⌘F for Find). A small black triangle indicates that selection of the menu item will lead to a submenu. Three dots (...) indicate that the selection will lead to a dialog box. Partially hidden behind the Edit menu, the application ribbon is visible, revealing the large number of choices available in the selected tab (Format).

website lists all the Cycle subcategories in a large menu that expands to the right, filling most of the screen; see Fig. 8.6).

The limited screen space of mobile devices leads designers to strive to limit the number of menu items. To leave more room for content, most or all menu items can be moved into a separate screen that is accessible from a main menu icon, sometimes called the hamburger menu icon (☰) for its shape and which can be placed on every screen (Fig. 8.7).

Toolbars, iconic menus, and palettes can offer many actions that users can select with a click and apply to a displayed object (Fig. 1.10). A large number of toolbars can be overwhelming, so users need to be able to customize which toolbars

FIGURE 8.6 In the REI website, the categories for "Cycle" are expanded all at once below the top menu, showing 34 items organized in a meaningful hierarchy as a large menu.


FIGURE 8.7 The main menu of Soundhound has only six items, but it is still too much to be displayed on every page, so a main menu "hamburger" icon appears at the top right of all appropriate pages; for example, it appears in A but not in the recording screen C, where only the X close icon is visible. The main menu (B) animates from the right, so most users will learn that a swipe to the left also opens the main menu and a swipe to the right will close it. In B, part of the previous screen is still visible on the left, reinforcing the suggestion that swiping can be used.


are visible and to control the placement of those toolbars. Tool palettes (such as color wheels or layers) may be separated from the menus and moved so they do not hide the content. Users who wish to conserve screen space can eliminate most or all of the toolbars and palettes. Dense menus with many small icons can be overwhelming for novice users but appreciated by experts because of their small footprint and quick access.

Pop-up menus appear on the display in response to a click or tap with a pointing device. When the content of the pop-up menu depends on the cursor position, it is called a context menu. Since the pop-up menu covers a portion of the display, there is strong motivation to keep the menu text short (so that it does not cover the context of the menu). Pop-up menus can be hard to discover, so alternative access may need to be provided. Pop-up menus can also be organized in a circle to form pie menus, also called marking menus (Figs. 2.5 and 8.8). Those menus have the advantage that the average distance to travel to select an item is smaller than with linear menus, and with practice they can be used without visual attention if users memorize the direction of the item (which is easier with four to eight items). This is particularly useful in design applications that require constant menu selections (Fig. 8.8).
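Selection by direction can be sketched as follows. This is a minimal illustration, not the algorithm of any particular product: the function name `pie_menu_item`, the 10-pixel dead zone, and the layout (first item centered on "up", the rest clockwise) are all assumptions made for the example.

```python
import math


def pie_menu_item(dx, dy, items, dead_zone=10.0):
    """Map a drag vector (screen pixels, y grows downward) to a pie-menu item.

    items[0] is centered on "up" and the rest follow clockwise.
    Returns None while the pointer is still inside the central dead zone.
    """
    if math.hypot(dx, dy) < dead_zone:
        return None
    # Angle measured clockwise from "up", in degrees.
    angle = math.degrees(math.atan2(dx, -dy)) % 360.0
    sector = 360.0 / len(items)
    # Shift by half a sector so each item is centered on its direction.
    index = int(((angle + sector / 2.0) % 360.0) // sector)
    return items[index]


# Eight hypothetical items named after their compass directions.
directions = ["N", "NE", "E", "SE", "S", "SW", "W", "NW"]
print(pie_menu_item(-50, 0, directions))  # a drag to the left
```

Only the direction of the movement matters once the pointer leaves the dead zone, which is why a practiced user can select with a quick flick without looking at the menu.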

Ribbons were introduced by Microsoft in Office 2007. Ribbons attempt to replace menus and toolbars by one-inch tabs grouping commands by task (Fig. 8.5). While this approach might be beneficial for new users, expert users had difficulties adapting to the reorganized menus and finding items they knew existed before, highlighting the challenge of versioning and menu reorganization in professional applications. Ribbons also reduce the screen space for the document, which is a drawback for many users.

8.2.2 Shortcuts and gestures for rapid interaction

For rapid selection, keyboard shortcuts (also sometimes called hotkeys, such as Ctrl-C on PCs or ⌘-C on Macs for Copy) are essential for expert users using desktop computers (Fig. 8.5). Users can memorize the keystrokes for the menu items they use often and thus speed up the interaction considerably. The first letter of the command is often used for the shortcut to favor memorability, but caution is required to avoid collisions. If at all possible, shortcuts should be used consistently across applications; for example, Ctrl-S on a PC or ⌘-S on a Mac is usually used for Save and Ctrl-P or ⌘-P for Print. Keyboard shortcuts should be indicated next to their corresponding menu items and in the tooltip of the menu icons. Learning shortcuts is one of the useful paths to reaching expert performance (Cockburn et al., 2014), but many users never even attempt to learn them. Using a modifier key to reveal all the available shortcuts at once was found to be helpful to increase their use (Malacria, 2013).

Since typing keyboard shortcuts becomes impractical or impossible with touchscreen devices, other techniques are being devised for smart phones and

FIGURE 8.8 Fusion 360™, an Autodesk™ 3D Computer Aided Design tool, allowed an engineer to design a utility knife. A click on the background of the image brings a pop-up marking menu with eight context-dependent menu items arranged in a circle (as well as a conventional linear menu below it). Sliding the mouse to the left selects the Undo command, now highlighted by a pie-shaped gray background. When the click + move is done rapidly, the menu itself doesn't appear on the screen, allowing rapid command selection via simple gestures (http://www.autodesk.com/products/fusion-360).

tablets. Gestures often serve as a shortcut for rapid selection (Box 8.1). First made widely available by the Apple iPhone, gestures have transformed navigation with tablets and smart phones. Still, they can be hard to discover and learn and have few or no affordances. Redundancy is recommended, i.e., an alternate traditional way of selecting the action may be needed instead of relying solely on gestures (see Fig. 8.7). Careful design and use of gestures (Wigdor and Wixon, 2011; Zhai et al., 2012) can lead to fluid navigation for expert users but cause frustration when actions are triggered inadvertently. Newer approaches take advantage of the multi-touch capabilities of touchscreens; for example, a two-finger swipe to the right might be associated with the back button of a browser.


BOX 8.1 Examples of Common Gestures and Their Effects

Gestures can speed interaction, and their directness is compelling, but they are hard to discover. Gestures may have different actions when applied on an object, on the background, or toward the edge of the screen, which can be frustrating when applied inadvertently; therefore, it is important to ensure easy reversal of actions. Consistent application of gestures remains an issue.

• Tap: select

• Long press: varied, from magnified cursor (iOS) to showing a tooltip (Windows 8)

• Double tap: varied (e.g., zoom [iOS])

• Small swipe: varied (e.g., move location or order of objects, reveal a delete button)

• Large swipe: usually scroll

• Rapid swipe or fling: fast scroll with inertia

• Pinch and spread: zoom in and out

• Variation with two or more fingers: varied effects
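How a system might tell some of these single-finger gestures apart can be sketched from the duration and length of a touch. The function `classify_touch` and every threshold in it (0.5 s, 10 px, 80 px, 1000 px/s) are hypothetical values picked for illustration, not figures from any platform guideline; double taps and pinches need information about several touches and are left out.

```python
def classify_touch(duration_s, distance_px,
                   long_press_s=0.5, move_px=10,
                   small_swipe_px=80, fling_px_per_s=1000):
    """Classify one single-finger touch from its duration and travel distance.

    All thresholds are illustrative assumptions, not platform values.
    """
    if distance_px < move_px:
        # The finger barely moved: tap or long press, depending on duration.
        return "long press" if duration_s >= long_press_s else "tap"
    speed = distance_px / duration_s if duration_s > 0 else float("inf")
    if speed >= fling_px_per_s:
        return "fling"
    return "small swipe" if distance_px < small_swipe_px else "large swipe"


print(classify_touch(0.1, 2))     # short and stationary
print(classify_touch(0.1, 300))   # long and fast
```

Because neighboring categories differ only by a threshold, a slightly slower or shorter movement can be interpreted as a different gesture, which is one reason inadvertent actions occur and easy reversal matters.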

FastTap allows users to select commands by combining a thumb tap (to display the menu) and an index finger tap to select (Gutwin et al., 2014). As users learn the location of menu items relative to their thumb, they can select rapidly before the menu is even displayed. Allowing users to customize the gestures may help users remember them and provide better accessibility than pre-defined gestures, but users have limited understanding of the recognizer's ability to recognize gestures they propose, often leading to poorly recognized gestures (Oh and Findlater, 2013).

Other aspects of design contribute to rapid navigation, such as error prevention, avoiding scrolling, and laying out menus on the screen such that the distance traveled to perform the most common tasks is minimized (see Chapters 12 and 13).

8.2.3 Long lists

Sometimes the list of menu items may be longer than the 30 to 40 lines that can reasonably fit on a display. One common solution is to create a tree-structured menu (Section 8.4.1), but sometimes the desire to limit the interface to one conceptual menu is strong, for example, when users must select a state from


the 50 states in the United States or a country from an extensive list of possibilities. Typical lists are alphabetically ordered, but categorical lists may be useful. The principles of menu-list sequencing apply (Section 8.4.2).

Scrolling menus, combo boxes, and fisheye menus. Scrolling menus display the first portion of the menu and an additional menu item, typically an arrow that leads to the next set of items in the menu sequence. The scrolling (or paging) menu might continue with dozens or thousands of items. Allowing users to type the letter "M" to scroll directly to the first word starting with the letter "M" will reduce manual scrolling, but this feature is not always discovered. Similarly, typing M twice can move to the second word starting with "M". Combo boxes make this option more evident by combining a scrolling menu with a text-entry field.

Users can type in leading characters to scroll quickly through the list. Another alternative is the fisheye menu, which displays all of the menu items on the screen" at once but shows only items near the cursor at full size; items further away are displayed at a smaller size. Fisheye menus have been made popular by Apple's Mac OS X (Fig. 1.2) and are attractive for menus of 10 to 20 items where the zoom ratio remains small and all items are readable at all times. When the num­ber of items is such that smaller items become u1ueadable, fisheye menus have the potential to improve speed over scrolling menus, but hierarchical menus remain faster (Hornb~k and Hertzum, 2007). Fisheye menus can be an eye­catching option but are not recommended as a default menu style for long lists.
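The type-ahead behavior described above (one press of "M" jumps to the first matching item, a second press to the next) can be sketched in a few lines. This is a minimal illustration, not the behavior of any particular toolkit: the function name `type_ahead_index` and the sample list are hypothetical, and wrapping around to the first match after the last one is an assumption.

```python
def type_ahead_index(items, letter, presses=1):
    """Return the index of the item reached after pressing `letter`
    `presses` times, or None if no item starts with that letter."""
    matches = [i for i, item in enumerate(items)
               if item.lower().startswith(letter.lower())]
    if not matches:
        return None
    # Assumption: repeated presses cycle through the matches and wrap around.
    return matches[(presses - 1) % len(matches)]


# Hypothetical excerpt from an alphabetical list of states.
states = ["Louisiana", "Maine", "Maryland", "Massachusetts", "Nebraska"]

first_m = type_ahead_index(states, "M")              # index of "Maine"
second_m = type_ahead_index(states, "M", presses=2)  # index of "Maryland"
```

A combo box makes the same lookup visible by showing the typed characters in a text field, which is one reason the feature is more readily discovered there.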

Sliders and alphasliders When the available choices are continuous numerical values, a slider is a natural choice to allow the selection of a single value. Ranges of values can also be selected with double-sided (range) sliders. Users select values by using a finger or pointing device to drag the slider thumb (scroll box) along the scale (see Fig. 1.7). When greater precision is needed, the slider thumb can be adjusted incrementally by clicking on arrows located at each end of the slider. A similar technique that allows users to select a name or category among even large numbers of ordered items is an alphaslider (see Fig. 8.9). Because of their compactness, sliders, range sliders, and alphasliders are often used in the control panels of interactive visualization systems (Chapter 16). When results are available in real time, a sweep of the slider thumb allows rapid comparisons between the results of dozens of choices within seconds (without having to even look at the slider). This would be very tedious with a standard menu that requires users to start the selection process again for each new value.
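As a rough sketch of how an alphaslider can work, the thumb position can be treated as a fraction of the track and mapped onto an ordered list, with the end arrows stepping one item at a time. The function names and sample names below are hypothetical, and the even spacing of items along the track is an assumption; real alphasliders may allocate track space differently.

```python
def alphaslider_item(items, position):
    """Map a thumb position in [0.0, 1.0] to one item of an ordered list."""
    if not items:
        raise ValueError("items must not be empty")
    position = min(max(position, 0.0), 1.0)  # clamp drags past either end
    index = min(int(position * len(items)), len(items) - 1)
    return items[index]


def step(items, current, delta):
    """Arrow click: move by `delta` items, stopping at the ends of the list."""
    index = items.index(current) + delta
    return items[min(max(index, 0), len(items) - 1)]


# Hypothetical ordered list of names.
names = ["Adams", "Baker", "Clark", "Davis"]
middle = alphaslider_item(names, 0.5)
```

Because every thumb position yields a valid item, a sweep across the track visits the whole list without restarting the selection process.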

Two-dimensional mega menus Alternatively, menus that fill all the available space might be used. Two-dimensional mega menus give users a good overview of the choices, reduce the number of required actions, and allow rapid selection. The ease of scrolling on touchscreens has encouraged designers to make heavy use of scrollable two-dimensional menus in webpage design (e.g., http://www.pinterest.com or the NASA website; see Fig. 8.10). Website competitions (e.g., http://www.awwwards.com or http://www.webbyawards.com/) gave the 2015 awards to sites with home pages filled with bright photos and snazzy graphics interspersed with selectable objects. The top section of the webpage that is visible at first (called the topfold or above the fold) remains critical, as users need the confidence that the site answers their needs before they start to scroll down (why go further if not impressed?), but users can be exposed to potentially hundreds of selectable zones or menu items within seconds of scrolling, which remains entirely in users' control.

FIGURE 8.9 An alphaslider (also called an item slider) in the Spotfire visualization tool from Tibco. The alphaslider allows users to select one item from a large number of categorical items and rapidly step through the other items (http://spotfire.tibco.com).

FIGURE 8.10 The NASA website consists of a large scrollable 2-dimensional menu. Below the main menu, each square or rectangle is a large button. Scrolling gives access to dozens of items that are easily updated and rearranged. This adaptive grid design scales down nicely to small displays. On the right, the same page is displayed on an Android phone. The grid now appears as a single column of items.

In stark contrast, some designers choose the more sober style of a text-only large 2-dimensional menu (e.g., craigslist in Fig. 8.11). Compact text menus allow users to rapidly scan hundreds of choices without dizzying effects or reorientation. This utilitarian solution is appealing for websites with little or no competition (e.g., company intranets) or home pages of sites whose success comes entirely from direct access to their lower-level pages through search engines. Similarly, a site map lists every single page of a website and is useful as a table of contents.

FIGURE 8.11 The craigslist home page is a text-only, 2-dimensional mega menu. It allows users to rapidly read hundreds of choices with little or no scrolling required. Items are organized hierarchically. (http://www.craigslist.org/)

With such compact text-oriented designs, as well as with all other more graphic-oriented designs, accessibility issues need to be addressed (Fig. 2.1).

Users browsing user-generated content such as photo or document collections also need to choose among non-curated lists of terms or tags attached to items in the collections. Tag clouds were fashionable until recently as compact 2-dimensional text menus. In tag clouds, the larger the font size of the tag, the more items are available. While attractive and fun, tag clouds are often misinterpreted because longer tags have more prominence than short ones and users believe that the position in the tag cloud has meaning even when it does not. To address this problem, tag indexes are now gaining popularity, with tags sorted by number of items so users make no mistakes when looking for the tags that have the most items (Fig. 8.12). A horizontal layout may be convenient when the list is long, but arranging the tags vertically will facilitate scanning of the list.
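A tag index of the kind described above amounts to a sort by item count with the count shown next to each tag. The following sketch assumes the tags arrive as a dictionary of counts; the function names and the sample counts are hypothetical, and breaking ties alphabetically is an assumption made so that the order is predictable.

```python
def tag_index(tag_counts):
    """Sort tags by item count (largest first), breaking ties alphabetically."""
    return sorted(tag_counts.items(),
                  key=lambda pair: (-pair[1], pair[0].lower()))


def render_tag_index(tag_counts):
    """Format each tag with its count in parentheses, in index order."""
    return [f"{tag} ({count})" for tag, count in tag_index(tag_counts)]


# Hypothetical tag counts for a small collection.
sample_tags = {"retro": 46, "clean": 592, "vector": 46, "video": 264}
index_lines = render_tag_index(sample_tags)
```

Unlike a tag cloud, the position of a tag in this list carries exactly one meaning (its rank by count), so a long tag name gains no extra prominence.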

8.2.4 Linear versus simultaneous presentation

Often, a sequence of interdependent menus can be used to guide users through a series of choices. For example, a pizza-ordering interface might include a linear sequence of menus in which users choose the size (small, medium, or large), thickness (thick, normal, or thin crust), and finally toppings. Other familiar examples are online examinations that have sequences of multiple-choice test items, each made up as a menu, or wizards (a Microsoft term) that steer users through software installation by presenting a sequence of menu options. Linear sequences guide users by presenting one decision at a time and are effective for novice users performing simple tasks. They may be the only possible option for a small display.

FIGURE 8.12 Awwwards.com gives awards to a large number of websites, which are tagged. A tag index at the top of the page displays all the tags sorted by total count. The counts are indicated in parentheses. The green-colored tags are the popular tags that have been selected more often (which most likely will lead to even more selection).

FIGURE 8.13 Faceted search interfaces allow shoppers looking for tents to narrow the list of results by indicating their choices in the simultaneous menus on the left: categories, sleeping capacity, brand, seasons, and so on. Results can be laid out in a row or a grid, and sorting can be done by price or rating. (http://www.rei.com/)

Simultaneous menus present multiple active menus (also called filters) on a screen at the same time and allow users to enter choices in any order. They require more display space; however, experienced users performing complex tasks benefit from simultaneous menus. Faceted search menus are a very powerful application of simultaneous menus now used extensively in online shopping, library catalogs, and other database searches (see Fig. 8.13 and Chapter 15).
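The defining property of simultaneous menus, that choices can be entered in any order, follows naturally when each menu is treated as an independent filter over the same result list. The sketch below is illustrative only: the product records, facet names, and function names are hypothetical and do not describe any retailer's implementation.

```python
def apply_filters(products, selections):
    """Keep products that match every selected facet value.
    The order in which selections were made does not affect the result."""
    return [p for p in products
            if all(p.get(facet) == value for facet, value in selections.items())]


def facet_counts(products, facet):
    """Count remaining products per value of one facet (shown beside each choice)."""
    counts = {}
    for p in products:
        counts[p[facet]] = counts.get(p[facet], 0) + 1
    return counts


# Hypothetical catalog of tents.
tents = [
    {"name": "Tent A", "capacity": 2, "seasons": 3, "brand": "X"},
    {"name": "Tent B", "capacity": 2, "seasons": 4, "brand": "Y"},
    {"name": "Tent C", "capacity": 4, "seasons": 3, "brand": "X"},
]

two_person_three_season = apply_filters(tents, {"capacity": 2, "seasons": 3})
```

Recomputing `facet_counts` after each selection is what lets a faceted interface display how many results remain behind every choice, which helps users avoid selections that lead to empty result lists.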


8.3 Small Displays

While most designs adapt fairly easily from desktop displays to the larger tablets (once the design has been reviewed for touchability), small displays make most desktop designs impractical, and dumbed-down designs are very likely to fail. Small displays require a radical rethinking of what functionalities should be included and often lead to novel interface and menu designs specifically adapted to particular devices and applications.

The smaller the screen, the more temporal the interface becomes (all the way to entirely linear audio interfaces when no display is available). For example, linear sequences of menus are possible, while simultaneous menus are much harder to fit in. On tiny devices (such as watches or fitness wearables), a deck-of-cards menu can be used, where each single tap advances to the next choice and a long press or two-finger press may select the item to access more information. Animated attention-catching ticker menus have also been used. Users don't need to manually scroll or page through the menu items, and with a single touch they can stop the scrolling and select an item in view. On the other hand, having to wait for an item to appear or reappear will be frustrating to some users, especially as the number of items grows.

The temptation is great to include menu items just because they fit, but successful designs limit the number of functions to the most essential ones (Box 8.2 and Fig. 8.14). They may push other features into less accessible parts of the interface, relegate them to counterpart applications on desktop or tablet, or eliminate the features altogether. An often-mentioned rule of thumb for small devices is "less is more."

BOX 8.2 Design considerations for small displays.

• Simplify: Less is more.

• Strive to reduce or eliminate data entry.

• Learnability is key.

• Consider use frequency and importance.

• Plan for interruptions.

• Use contextual information.

• Make clear what is selectable and what is not.

• Leave room for scroll and swipe gestures to avoid inadvertent actions.

• Consider relegating less important functions to other platforms.


FIGURE 8.14 Small devices have very focused functionalities and few selectable areas. Discoverability is often an issue.

Apps need to be learned in a few seconds or risk being abandoned. Sequencing menu items by frequency of use can be more useful than sequencing by category or alphabetical order, as speed of access to the most commonly used options is critical. For example, it is likely that flight status and check-in are more common than booking a flight on a mobile device. This can be verified by logging usage data.
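Verifying such an ordering from logged usage data can be as simple as counting selections. The sketch below assumes the log is a flat list of selected item labels; the function name and the sample log are hypothetical.

```python
from collections import Counter


def order_by_use(items, usage_log):
    """Return menu items ordered from most to least frequently selected."""
    counts = Counter(usage_log)
    # sorted() is stable, so items with equal counts keep their original order.
    return sorted(items, key=lambda item: -counts[item])


# Hypothetical menu for an airline app and a hypothetical selection log.
menu = ["Book a flight", "Check in", "Flight status"]
log = ["Flight status", "Check in", "Flight status",
       "Book a flight", "Check in", "Flight status"]

ordered_menu = order_by_use(menu, log)
```

A reordering like this is best done at design time from aggregate logs rather than continuously at run time, since a menu that keeps changing under the user undermines learning (see Section 8.4.2).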

Designers also need to allow users to deal with interruptions and distractions in their environment; for example, providing an automatic Save function addresses interruption issues (e.g., when the phone rings) and simplifies the interface.

Concise writing and careful editing of titles, labels, and instructions will lead to simpler and easier-to-use interfaces. Every word counts on a small screen, and even unnecessary letters or spaces should be eliminated. Consistency remains important, but clear differentiation of menu types helps users remain oriented when no context can be provided. Tiny icons are difficult to design and are rarely used, as they take up space and require labels anyway. On the other hand, large color icons, such as those used in car navigation systems or the main screen of most smartphones, can be used successfully because they can be recognized at a glance once they have been learned.

Data entry is a difficult challenge for small devices and should be avoided as much as possible. The use of contextual information such as location (e.g., with a global positioning system [GPS]) or the proximity to objects (e.g., using radio-frequency identification [RFID] tags or scanned QR codes) complemented with simple touch widgets may facilitate the navigation to relevant information. For example, using the current location as default when searching for a hotel using a smartphone will eliminate data entry in many situations. Making all phone numbers and e-mail addresses selectable for easy calls or e-mails, addresses loadable in maps, and dates a tap away from calendars can dramatically shorten navigation time. In certain cases, hand-off to another larger device may be the best approach (e.g., the login and password function on a watch app can be handed off to be executed on a nearby phone or laptop).

Position information relative to the user's body can also lead to new modes of interaction with menus of small devices. For example, users can move the device in front of them horizontally or vertically to scroll through long lists or pan across maps. Using the back of the device as a touch-sensitive pad might help enrich selection mechanisms.

Responsive menus that adapt to different screen sizes remain a challenge. Less important functions can be removed or relegated to other platforms (e.g., deleting names in a directory). Different stylings may allow more buttons to fit in small spaces, but they need to be large enough to allow easy selection on touchscreen devices. Menu labels can be carefully abbreviated or replaced by icons only. Menus can appear in a different location or be bundled on a separate screen (e.g., using the hamburger menu icon; Fig. 8.7). One successful strategy is to design for mobile first instead of dumbing down a design made for larger displays (Wroblewski, 2011).

Designing for older feature phones will open the door to a wider audience, for example, in emerging markets (Medhi et al., 2013, Fig. 2.1). Such phones typically have dedicated hard buttons to control the connect and disconnect functions and up and down buttons to navigate lists as well as "soft" keys with matching on-screen labels that change dynamically depending on the context. Soft keys are extremely useful as they allow designers to provide direct access to the next-most-logical command at every step. Consistent placement of the commands will speed interaction, for example, user selections on the left side and back or exit options on the right side.


8.4 Content Organization

Meaningful grouping and sequencing of choices, along with careful editing of titles and labels and appropriate layout design, can lead to easier-to-learn menus and increased navigation speed. This section reviews the content-organization issues and provides guidelines for design. This area of design has been heavily researched in the context of traditional menus for desktop applications, but most results are useful for website and phone application designs (Krug, 2014). Webpages act as large menus where items are the embedded links or buttons that can be used to navigate to another page.

Some lessons can be learned from restaurant menus. Restaurant menus separate appetizers, main dishes, desserts, and beverages to help customers organize their selections. Menu items should fit logically into categories and have readily understandable meanings. Restaurateurs who list dishes with idiosyncratic names such as "veal Amelie", unfamiliar labels such as "wor shu op", or vague terms such as "house dressing" should expect that customers will be puzzled or anxious and that waiters will waste time providing explanations. Similarly, for computer menus, the categories should be comprehensible and distinctive so that users are confident in making their selections and have a clear idea of what the result will be. Computer menus become more difficult to design than restaurant menus when the number of choices and the level of complexity increase, and there are no waiters to turn to for help.

8.4.1 Structure and breadth versus depth

When a collection of items grows, designers can form categories of similar items, creating a tree structure (Box 8.3). Some collections can be partitioned easily into mutually exclusive groups with distinctive identifiers. For example, the products in an online grocery store can be organized into categories such as produce, meat, dairy, cleaning products, and so on. Produce can then be organized into vegetables, fruits, and nuts, while dairy is organized into milk, cheese, yogurt, and so on.

Even these groupings may occasionally lead to confusion or disagreement. Classification and indexing are complex tasks, and in many situations, there is no single solution that is acceptable to everyone. Card sorting exercises are useful to engage users and reach an initial design, which can then be refined with usability or A/B testing (see Chapter 5). Over time, as the structure is improved and as users gain familiarity with it, success rates will improve.

Tree-structured menu systems have the power to make large collections of data available to novice or intermittent users. If each menu has 10 items, a menu tree with four levels has the capacity to lead an untrained user through a collection of 10,000 destinations. That number would be excessively large for a word processor but is realistic in a newspaper, a library, or an enterprise web portal.

BOX 8.3 Rules for forming menu trees.

Grouping menu items in a tree such that they are comprehensible to users and match the task structure is sometimes difficult. The problems are akin to putting kitchen utensils in order; steak knives go together and serving spoons go together, but where do you put butter knives or carving sets? Problems include overlapping categories, extraneous items, conflicting classifications in the same menu, unfamiliar jargon, and generic terms.

• Use task semantics to organize menus.

• Limit the number of levels (i.e., prefer broad-shallow to narrow-deep).

• Create groups of logically similar items: e.g., Level 1: countries, Level 2: states, Level 3: cities.

• Form groups that cover all possibilities: e.g., age ranges: [0-9], [10-19], [20-29], and [>= 30].

• Make sure that items are non-overlapping: e.g., use "Concerts" and "Sports" over "Entertainment" and "Events".

• Arrange items in each branch by natural sequence (not alphabetically) or group related items.

• Keep ordering of items fixed (or possibly duplicate frequent items in dedicated sections of the menu).
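Two of the rules in Box 8.3, that groups cover all possibilities and that items do not overlap, can be checked mechanically when the groups are numeric ranges such as age brackets. The following is a minimal sketch under stated assumptions: ranges are inclusive integer pairs, `None` marks an open-ended upper bound, and the function name `check_ranges` and the upper limit of 120 are hypothetical choices made for illustration.

```python
def check_ranges(ranges, low, high):
    """Report gaps and overlaps when inclusive (start, end) ranges are
    supposed to cover low..high exactly once. end=None means open-ended."""
    problems = []
    expected = low  # next value that still needs a group
    for start, end in sorted(ranges, key=lambda r: r[0]):
        last = high if end is None else end
        if start > expected:
            problems.append(f"gap: {expected}-{start - 1}")
        elif start < expected:
            problems.append(f"overlap: {start}-{min(last, expected - 1)}")
        expected = max(expected, last + 1)
    if expected <= high:
        problems.append(f"gap: {expected}-{high}")
    return problems


# The age ranges from Box 8.3: [0-9], [10-19], [20-29], and [>= 30].
good = check_ranges([(0, 9), (10, 19), (20, 29), (30, None)], 0, 120)

# A flawed grouping: 9 appears twice and ages 20-24 have no group.
bad = check_ranges([(0, 9), (9, 19), (25, None)], 0, 120)
```

Categorical groupings such as "Entertainment" versus "Events" cannot be verified this way; for those, card sorting and usability testing remain the appropriate tools.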

If the groupings at each level are natural and comprehensible to users and if users know the target, menu traversal can be accomplished in a few seconds; it is faster than flipping through a book. On the other hand, if the groupings are unfamiliar and users have only vague notions of the items that they're seeking, they may get lost for hours in the tree menus. Terminology from the user's task domain can help orient the user: Instead of using a title that is vague and emphasizes the computer domain, such as "Main Menu Options", use terms such as "Friendlibank Services" or simply "Games".

Menus using large indexes, such as library subject headings or comprehensive business classifications, are challenging to navigate, making search a valuable alternative (Chapter 15).

The depth, or number of levels, of a menu tree depends in part on the breadth, or number of items per level. If more items are put into the main menu, the tree spreads out and has fewer levels. This shape may be advantageous, but only if clarity is preserved. Several authors urge using four to eight items per menu, but at the same time, they urge using no more than three to four levels. With large menu applications, one or both of these guidelines must be compromised.

Many empirical studies have dealt with the depth/breadth tradeoff (Cockburn and Gutwin, 2008), and the evidence is strong that breadth should be preferred over depth as long as users can anticipate target location at each level. The navigation problem (getting lost or using an inefficient path) becomes more and more treacherous as the depth of the hierarchy increases. Of course, screen clutter must be considered in addition to the semantic organization. Given sufficient screen space, it is possible to show a large portion of the menu structure and to allow users to rapidly point in the flattened tree structure (Figs. 8.6 and 8.11).
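The arithmetic behind the depth/breadth tradeoff is easy to work through. A tree with b items per menu and d levels reaches b to the power d destinations, so the number of levels needed for a given collection shrinks as breadth grows. The function names below are hypothetical, and the sketch assumes a perfectly balanced tree, which real menus rarely are.

```python
def capacity(breadth, depth):
    """Number of destinations reachable in a balanced menu tree."""
    return breadth ** depth


def levels_needed(breadth, targets):
    """Smallest depth at which a balanced tree holds at least `targets` items."""
    if breadth < 2 or targets < 1:
        raise ValueError("breadth must be >= 2 and targets >= 1")
    depth = 0
    while breadth ** depth < targets:
        depth += 1
    return depth


ten_by_four = capacity(10, 4)           # the 10,000-destination example
eight_by_four = capacity(8, 4)          # upper end of both common guidelines
deep_tree = levels_needed(4, 10_000)    # levels required with only 4 items per menu
```

With eight items per menu and four levels, a tree holds only 4,096 destinations, which is why a 10,000-item application cannot satisfy both the "four to eight items" and the "three to four levels" guidelines at once.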

Although tree structures are appealing, sometimes network structures are more appropriate. For example, in online shopping, it might make sense to provide access to banking information from both the personal profile and the checkout section of a link structure. A second motivation for using menu networks is that it may be desirable to permit paths between disparate sections of a tree rather than requiring users to begin a new traversal from the main menu. It is helpful to provide site maps and to preserve the notion of levels, as users may feel more comfortable if they have a sense of how far they are from the main menu.

8.4.2 Sequence, phrasing, and layout

Sequence Once the items in a menu have been chosen, the designer is still confronted with the choice of presentation sequence. If the items have a natural sequence (such as days of the week, chapters in a book, or sizes of eggs), the decision is trivial. Many cases have no task-related ordering, though, so the designer must choose among alphabetic order, grouping of related items, and most frequently used items first. Categorical organization is generally preferable over alphabetical. Using frequency of use does speed up selection of the topmost items, but the loss of a meaningful ordering for low-frequency items may be disruptive, so it is best limited to small lists. Varying the sequence adaptively to reflect the current pattern of use has been shown to be disruptive, increasing confusion and selection time. In addition, users may become anxious that other changes might occur at any moment, undermining the users' learning of menu structures. To avoid disruption and unpredictable behavior, it is wise to allow users to specify if and when they want the menu restructured. A sensible compromise is to extract three or four of the most frequently selected items and put them near the top while preserving the order of the remaining items. This split-menu strategy proved appealing, improved performance by a statistically significant margin, and has been adopted by commercial software (Fig. 8.15).
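The split-menu compromise can be sketched as a function that lifts the few most frequently selected items into a top section and leaves the rest in their original order. The function name, the default of three items, and the sample font counts are hypothetical; this is an illustration of the idea, not the implementation used by any commercial product.

```python
def split_menu(items, counts, top_n=3):
    """Return (top_section, remaining_items) for a split menu.
    `counts` maps item -> number of past selections."""
    ranked = sorted(items, key=lambda item: -counts.get(item, 0))
    # Only items that have actually been selected earn a place at the top.
    top = [item for item in ranked[:top_n] if counts.get(item, 0) > 0]
    rest = [item for item in items if item not in top]  # original order kept
    return top, rest


# Hypothetical font menu and selection counts.
fonts = ["Arial", "Calibri", "Courier", "Georgia", "Times", "Verdana"]
selections = {"Times": 9, "Arial": 4, "Verdana": 6, "Georgia": 1}

top_section, remaining = split_menu(fonts, selections)
```

A variant, visible in Fig. 8.15, keeps the frequent items in the full list as well as in the top section, so that users who have learned an item's usual position still find it there.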

FIGURE 8.15 Example of adaptive split menus in Microsoft Office. A font-selection menu lists the theme fonts and then the recently used fonts near the top of the menu (as well as in the full list), making it easier to quickly select the popular fonts. A thin line separates the sections.

Adaptable menus (i.e., menus that give users control over the sequence of menu items) are an attractive alternative to adaptive menus that adapt automatically. One study compared the Microsoft Word version using adaptive menus with a variant providing users with the ability to switch between two modes of operation: the normal full-featured mode and a personal mode that users could customize by selecting which items were included in the menus (McGrenere et al., 2007). Results showed that participants were better able to learn and navigate through the menus with the personally adaptable version. Preferences varied greatly among users, and the study revealed some users' overall dissatisfaction with adaptive menus but also the reluctance of others to spend significant time customizing the interface. Novel approaches have used ephemeral adaptation (Findlater et al., 2009) to help users quickly identify important commands. With this technique, a small subset of menu items is immediately shown when the menu is displayed, while the remaining items are gradually faded into view over a few hundred milliseconds.
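The timing behind ephemeral adaptation can be illustrated by computing each item's opacity as a function of the time since the menu opened. This is a rough sketch, not the method of the cited study: the function name, the linear fade, and the 500 ms duration are all assumptions chosen only to make the idea concrete.

```python
def item_opacity(item, elapsed_ms, predicted, fade_ms=500):
    """Opacity in [0.0, 1.0] for one menu item.
    Items in `predicted` appear at once; the others fade in linearly."""
    if item in predicted:
        return 1.0
    return min(max(elapsed_ms / fade_ms, 0.0), 1.0)


# Hypothetical prediction: "Save" is the item most likely to be needed.
likely = {"Save"}

at_open = item_opacity("Print", 0, likely)       # not yet visible
halfway = item_opacity("Print", 250, likely)     # partly faded in
settled = item_opacity("Print", 900, likely)     # fully visible
```

Because every item ends up in its usual position at full opacity, the menu layout stays stable, which distinguishes this approach from adaptive menus that reorder items.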

Phrasing For single menus, a simple descriptive title that identifies the situation is all that is necessary. For tree-structured menus, choosing titles is more difficult. One helpful rule is to use the words used for the menu items as the titles for the submenu or next pages. For example, it is reassuring to users to find that when they select "Business and financial services", they are shown a display that is titled "Business and financial services". It might be unsettling to get a display titled "Managing your money", even though the intent is similar. For webpages, a distinctive short title displayed as the browser tab label will help users return to the page after they visit other tabs. A distinctive icon improves the tab label as well.
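The rule that a submenu's title should repeat the words of the item that leads to it lends itself to an automated check. The sketch below assumes a menu is stored as a nested dictionary with a "title" and an "items" mapping from labels to submenus (or `None` for leaf commands); this representation and the function name are hypothetical.

```python
def mismatched_titles(menu):
    """Return (item label, submenu title) pairs where the two differ."""
    problems = []
    for label, submenu in menu.get("items", {}).items():
        if submenu is None:  # leaf command, nothing to compare
            continue
        if submenu["title"] != label:
            problems.append((label, submenu["title"]))
        problems.extend(mismatched_titles(submenu))
    return problems


# Hypothetical menu tree using the labels from the text above.
bank_menu = {
    "title": "Friendlibank Services",
    "items": {
        "Business and financial services": {
            "title": "Managing your money",
            "items": {"Loans": None},
        },
        "Games": {"title": "Games", "items": {}},
    },
}

title_problems = mismatched_titles(bank_menu)
```

A check like this catches only literal mismatches; whether a matching title is also comprehensible to users still has to be established through testing.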

Just because an interface has words, phrases, or sentences as menu choices is no guarantee that it is comprehensible or provides adequate information scent (see Section 3.4 on theories).

Individual words (for example, "expunge") may not be familiar to some users, and often two menu items may appear to satisfy the user's needs when only one actually does (for example, "disconnect" or "eject"). This enduring problem has no perfect solution, but designers can gather useful feedback from colleagues, users, pilot studies, acceptance tests, and user-performance monitoring. The following directives may seem obvious but are listed here because they are so often violated:

• Use familiar and consistent terminology. Carefully select terminology that is familiar to the designated user community and keep a list of these terms to facilitate consistent use.

• Ensure that items are distinct from one another. Each item should be distinguished clearly from other items. For example, "Slow tours of the countryside", "Journeys with visits to parks", and "Leisurely voyages" are less distinctive than are "Bike tours", "Train tours to national parks", and "Cruise-ship tours".

• Use consistent and concise phrasing. Review the collection of items to ensure consistency and conciseness. Users are likely to feel more comfortable and to be more successful with "Animal", "Vegetable", and "Mineral" than with "Information about animals", "Vegetable choices you can make", and "Viewing mineral categories".

• Bring the keyword to the fore. Try to write menu items such that the first word aids the user in recognizing and discriminating between items: use "Size of type" instead of "Set the type size". Then, if the first word indicates that this item is not relevant, users can begin scanning the next item.

Layout While the layout of applications and websites can be assisted by the use of templates and website management tools, designers who establish guidelines for consistency across dozens or hundreds of screens will reduce users' anxiety by offering predictability (see Section 3.2). The following elements can be included:

• Titles. Some people prefer centered titles, but left justification is also acceptable.

• Item placement. Typically, items are left justified, with the item number or letter preceding the item description. Blank lines may be used to separate meaningful groups of items. If multiple columns are used, a consistent pattern of numbering or lettering should be used (for example, it is easier to scan down columns than across rows). See also Section 12.2 on display design.

• Instructions. The instructions should be identical in each menu and should be placed in the same position. This rule includes instructions about traversals, help, or function-key usage.

• Error messages. If the users make unacceptable choices, the error messages should appear in a consistent position and should use consistent terminology and syntax. Graying out unacceptable choices will help reduce errors.


Since disorientation is a potential problem, techniques to indicate position in the menu structure can be useful. In books, different fonts and typefaces may indicate chapter, section, and subsection organization. Similarly, in menu trees, as the user goes down the tree structure, the titles can be designed to indicate the level or distance from the main menu. Graphics, fonts, typefaces, or highlighting techniques can be used beneficially. For example, this set of headers from the Library of Congress collections webpages gives a clear indication of progress down the tree:

BROWSE BY TOPIC
Sports, Recreation & Leisure
Baseball
Baseball Cards 1887-1914

When users want to do a traversal back up the tree or to an adjoining menu at the same level, they will feel confident about what action to take.
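As a minimal sketch of how level-indicating headers like these might be generated, assume the current position is held as a list of menu titles from the main menu down. The function names and the indentation scheme are hypothetical illustrations, not the Library of Congress implementation.

```python
def render_headers(path):
    """Return one header line per menu level.

    The top level is upper-cased and each deeper level is indented two more
    spaces, so the visual form signals the distance from the main menu.
    """
    lines = []
    for depth, title in enumerate(path):
        text = title.upper() if depth == 0 else title
        lines.append("  " * depth + text)
    return lines


def parent_of(path):
    """Return the path one level up, the target of a traversal back up the tree."""
    return path[:-1] if len(path) > 1 else path


# Hypothetical position in the menu tree, taken from the example headers.
path = [
    "Browse by Topic",
    "Sports, Recreation & Leisure",
    "Baseball",
    "Baseball Cards 1887-1914",
]
print("\n".join(render_headers(path)))
```

Because the whole path stays visible, the target of a move back up the tree is simply the line above the current one.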

8.5 Audio Menus

Audio menus found in interactive voice response (IVR) systems (Lewis, 2010) are useful when hands and eyes are busy, such as when users are driving or testing equipment. They are also ubiquitous in phone surveys and services and in public-access situations that need to accommodate blind or vision-impaired users, such as information kiosks or voting machines.

With audio menus, instruction prompts and lists of options are spoken to users, who respond by using the keys of a keyboard or phone or by speaking. While visual menus have the distinct advantage of persistence, audio menus have to be memorized. Similarly, visual highlighting can confirm users' selections, while audio menus have to provide a confirmation step following the selection. As the list of options is read to them, users must compare each proposed option with their goal and place it on a scale from no match to perfect match. To reduce dependence on short-term memory, it is preferable to describe the item first and then give the number. A way to repeat the list of options and an exit mechanism must be provided (preferably by detecting user inaction).
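The prompt-ordering advice can be illustrated with a small sketch. It assumes a hypothetical list of (key, label) options and a repeat key of 9; the function names and the tuple-based return values are illustrative choices, not part of any real IVR toolkit.

```python
def build_prompt(options, repeat_key="9"):
    """Build the spoken prompt for an audio menu.

    Each option is described first and numbered second ("For X, press 1"),
    so listeners can judge the match before they must remember a number.
    """
    parts = [f"For {label}, press {key}." for key, label in options]
    parts.append(f"To hear the options again, press {repeat_key}.")
    return " ".join(parts)


def handle_input(options, key, repeat_key="9"):
    """Map a key press (or None for a timeout) to an action.

    Returns ("select", label), ("repeat", None), or ("exit", None).
    Treating inaction as an exit is one way to provide the exit mechanism.
    """
    if key is None:
        return ("exit", None)
    if key == repeat_key:
        return ("repeat", None)
    for option_key, label in options:
        if key == option_key:
            # The caller would speak a confirmation of this label, since
            # there is no visual highlight to confirm the selection.
            return ("select", label)
    # Unknown key: read the list again rather than failing silently.
    return ("repeat", None)


# Hypothetical options for a store's phone menu.
options = [("1", "the pharmacy"), ("2", "store hours")]
print(build_prompt(options))
```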

Complex and deep menu structures should be avoided. A simple guideline is to limit the number of choices to three or four to avoid memorization problems, but this rule should be re-evaluated in light of the application. For example, a theater information system will benefit from using a longer list of all the movie titles rather than breaking them into two smaller, arbitrarily grouped menus. Dial-ahead capabilities allow repeat users to skip through the prompts. For example, users of a drugstore telephone menu might remember that they can dial 1 followed by 0 to be connected to the pharmacy immediately without having to listen to the store's welcome message and the list of options.
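Dial-ahead can be sketched as a walk through the menu tree that consumes digits typed in advance, without playing the intermediate prompts. The menu contents and the function name below are hypothetical; only the "1 then 0 reaches the pharmacy" path comes from the example above.

```python
# Hypothetical drugstore menu: keys are digits, values are either a
# submenu (dict) or the name of a destination (str).
MENU = {
    "1": {"0": "pharmacy", "1": "refill status"},
    "2": "store hours",
}


def dial_ahead(menu, digits):
    """Follow pre-typed digits through the menu tree.

    Returns (destination, None) when the digits reach a destination, or
    (None, submenu) when the prompts for the remaining level must be played.
    Raises ValueError for a digit that is not a valid choice at its level.
    """
    node = menu
    for digit in digits:
        if not isinstance(node, dict) or digit not in node:
            raise ValueError(f"{digit!r} is not a valid choice here")
        node = node[digit]
    if isinstance(node, dict):
        return (None, node)
    return (node, None)
```

A repeat caller who types `10` is connected at once, while a caller who types only `1` still hears the prompts for that submenu.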

Voice recognition has finally reached an acceptable recognition rate and enables users to speak their options instead of pressing letter or number keys (see Section 9.2). Most systems still use numbered options to allow both keypad and voice entry (e.g., "To hear the options again, press or say nine"), but this leads to longer prompts and longer task-completion times.

To develop successful audio menus, it is critical to know the users' goals, make the most common tasks easy to perform rapidly, and keep prompts to a minimum (e.g., avoid a permanent "Listen carefully, as our menu options have recently changed." message). See Chapter 9, in particular Section 9.2, for more discussion of interactive voice response (IVR) systems.

8.6 Form Fill-in and Dialog Boxes

Selection is effective for choosing an item from a set of choices, but if the entry of names or numeric values is required, typing becomes more attractive. When many fields of data are necessary, the appropriate interaction style is form fill-in (Fig. 8.16). The combination of form fill-ins, menus, and custom widgets such as calendars or maps supports rapid navigation for a vast array of applications from airline-ticket booking to triage of new patients in the emergency room.

8.6.1 Form fill-in

There is a paucity of empirical research on form fill-in, but several design guidelines have emerged from practitioners (Jarrett and Gaffney, 2008). Software tools simplify design, help to ensure consistency, ease maintenance, and speed implementation, but even with excellent tools, the designer must still make many complex decisions.

The elements of form fill-in design include the following:

• Meaningful title. Identify the topic and avoid computer terminology.

• Comprehensible instructions. Describe the user's tasks in familiar terminology. Be brief; if more information is needed, make a set of help screens available to the novice user. A useful rule is to use the word "type" for entering information and the word "press" for special keys such as the Tab, Enter, or cursor-movement (arrow) keys. Since "Enter" often refers to the special key with that name, avoid using it in the instructions (for example, do not use "Enter the address"; instead, stick with "Type the address"). Once a grammatical style for instructions is developed, be careful to apply that style consistently.

FIGURE 8.16

[Screenshot: the "Create an IEEE Account" form, with groups of fields for personal information, e-mail address and password, and security questions. Required fields are marked with *. Visible messages include "The e-mail address provided is not in a valid e-mail format", "Your password is good", and "Passwords must be between 8 and 64 characters, and include at least one number."]

This form fill-in allows users to enter information when joining the IEEE Society. Fields are grouped meaningfully, and field-specific rules such as password requirements are provided next to the fields. The information is validated as it is provided (as opposed to when the form is submitted), and error messages explain how to correct problems (http://www.ieee.org).

• Label the fields. Place the label in a consistent location (e.g., top or left of the field). A less desirable location is to place labels inside the fields, using a grayed-out font. It saves space, but the labels disappear as soon as users start typing, requiring users to remember what is needed, which often leads to errors.

• Limit data entry. Make sure all fields are really needed. Carefully set default values (e.g., use the current location). This is particularly important for small displays (see Box 8.4); for example, use only the zip code instead of the city and state. Maybe only a single phone number is enough, instead of asking for several alternatives. Some fields may be removed entirely and reserved only for large devices.

BOX 8.4 Additional form fill-in guidelines for small displays.

• Include only critical data fields.

• Break long forms into multiple smaller ones.

• Use sensible defaults (e.g., current location or date).

• Place short labels on top of the fields, not to their left.

• Set the touch keyboard to match the data (e.g., numeric keyboard to enter a number).

• Explanatory messages for fields. Information about a field (e.g., "Your e-mail address will be the user name of your account") or its permissible values should appear in a standard position, such as next to or below the field, preferably using a different font and style.

• Error prevention. Where possible, prevent users from entering incorrect values. For example, in a field requiring a whole number, do not allow the user to enter letters or decimal points.

• Error recovery. Summarize errors at the top of the page. Highlight errors in the form. If users enter unacceptable values, indicate permissible values for the field; for example, if the zip code is entered as 28K21 or 2380, the message might be "Zip codes should have 5 digits".

• Immediate feedback. Immediate feedback about errors is preferable. When feedback can be provided only after the entire form has been submitted, the location of the field needing correction should be made clearly visible (for example, by displaying the error message in red next to the field in addition to general instructions at the top of the form).

• Logical grouping and sequencing of fields. Related fields should be adjacent and should be aligned with blank spaces for separation between groups. The sequencing should reflect common patterns (for example, city followed by state followed by zip code).

• Visually appealing layout of the form. Alignment creates a feeling of order and comprehensibility. For example, the field labels "Name", "Address", and "City" can be right-justified so that the data-entry fields are vertically aligned. This layout allows the frequent user to concentrate on the entry fields and to ignore the labels.


• Familiar field labels. Common terms should be used. If "Home Address" were replaced by "Domicile", many users would be uncertain or anxious about what to enter.

• Consistent terminology and abbreviations. Prepare a list of terms and acceptable abbreviations and use the list diligently, making additions only after careful consideration. Instead of varying between such terms as "Address", "Employee Address", "ADDR.", and "Addr.", stick to one term, such as "Address".

• Visible space and boundaries for data-entry fields. Users should be able to see the size of the field and to anticipate whether abbreviations or other trimming strategies will be needed. An appropriately sized box can show the maximum field length.

• Convenient cursor movement. Provide a mechanism for moving the cursor between fields using the keyboard, such as the Tab key or cursor-movement arrows.

• Required fields clearly marked. For fields that must be filled in, the word "Required" or some other indicator (e.g., *) should be visible. Optional fields should follow required fields whenever possible.

• Privacy and data sharing information. Users may be anxious about sharing their personal information and will want to know how the data will be used and who will have access to it.

• Accessibility. For example, make sure the forms are navigable with a screen reader.

• Completion signal. It should be clear to the users what they must do when they are finished filling in the fields. Generally, designers should avoid automatic form submission when the final field is filled in because users may wish to review or alter previous field entries. When the form is very long, multiple Submit or Save buttons can be provided at different points in the form.

These considerations may seem obvious, but often designers will omit the title or an obvious way to signal completion or will include unnecessary computer file names, strange codes, unintelligible instructions, unintuitive groupings of fields, cluttered layouts, obscure field labels, inconsistent abbreviations or field formats, awkward cursor movement mechanisms, confusing error-correction procedures, or hostile error messages.
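Several of the guidelines above (error prevention, error recovery, required fields) can be sketched in a few lines. This is a minimal illustration under stated assumptions: U.S. five-digit zip codes, a form represented as a dictionary of strings, and hypothetical function names; it is not tied to any particular form toolkit.

```python
import re


def filter_whole_number(typed):
    """Error prevention: keep only the digits 0-9 so letters and decimal
    points never make it into a whole-number field."""
    return "".join(ch for ch in typed if ch in "0123456789")


def check_zip(value):
    """Error recovery: return None if the value is acceptable, otherwise a
    message that states the permissible values."""
    if re.fullmatch(r"[0-9]{5}", value):
        return None
    return "Zip codes should have 5 digits"


def validate_form(fields, validators, required):
    """Return {field name: message} for every problem found.

    The result can be summarized at the top of the page and also shown
    next to each offending field.
    """
    errors = {}
    for name, value in fields.items():
        if name in required and not value.strip():
            errors[name] = "This field is required"
        elif value and name in validators:
            message = validators[name](value)
            if message:
                errors[name] = message
    return errors


# Hypothetical form contents.
errors = validate_form(
    {"name": "", "zip": "2380"},
    validators={"zip": check_zip},
    required={"name"},
)
print(errors)
```

Running the validators on each field as it is completed, rather than only on submission, gives the immediate feedback recommended above.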

8.6.2 Format-specific fields

Using custom widgets and direct-manipulation interaction techniques can facilitate data entry and reduce errors. Calendars can be used to enter dates, seating maps can help users select airplane seats, and menus using photographs might clarify choices of pizza style.


Apps for touchscreen devices need to open the keyboard with the appropriate preset; for example, when a number is requested, the numeric keyboard should appear by default. For e-mail addresses, the "@" and "." buttons need to be shown. For URLs, the ":" and "/" will be handy.

Alphabetic fields are customarily left-justified on entry and on display. Numeric fields may be left-justified on entry but then become right-justified on display. When possible, avoid entry and display of leftmost zeros in numeric fields (with zip codes being an exception). Numeric fields with decimal points should line up on the decimal points.
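The display rule for numeric fields can be sketched as follows, assuming a fixed number of decimal places and a monospaced display. The function name is a hypothetical illustration.

```python
def align_on_decimal(values, decimals=2):
    """Format numbers with a fixed number of decimals and right-justify them
    to a common width, which lines up the decimal points in a column."""
    formatted = [f"{value:.{decimals}f}" for value in values]
    if not formatted:
        return []
    width = max(len(text) for text in formatted)
    return [text.rjust(width) for text in formatted]


for line in align_on_decimal([3.5, 120, 0.25]):
    print(line)
```

Right-justifying works here only because every value carries the same number of decimals; values with varying precision would need padding on the right as well.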

Pay special attention to such common fields as these:

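• Telephone numbers. Offer a form to indicate the subfields:

[Form widget: a telephone field divided into subfields, with the area code (301) in parentheses and the note "If outside the U.S. 011 -" for international numbers.]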

Be alert to special cases, such as the addition of extensions or the need for nonstandard formats for international numbers. When the user has typed all the needed digits of a field, the cursor should jump to the leftmost position of the next field.

• Dates. Providing a pop-up graphical calendar showing the current month will reduce the number of errors in some cases, but users may still want to type in the numbered field if moving the calendar to the correct date requires a large number of clicks (e.g., to enter a date of birth). Different formats for dates are appropriate for different tasks, and European rules differ from American rules. An acceptable standard may never emerge. Instructions need to show an example of correct entry. For example:

Date: __/__/____ (04/22/2016 indicates April 22, 2016)

For many people, examples such as this one are more comprehensible than abstract descriptions like MM/DD/YYYY.

• Times. Even though the 24-hour clock is convenient, many people in the United States find it confusing and prefer a.m. and p.m. designations.

• Dollar amounts (or other currency). The currency sign should appear on the screen so users enter only the amount. If a large number of whole-dollar amounts are to be entered, users might be presented with a field such as

Deposit amount: $ _____.__

with the cursor to the left of the decimal point. As the user types, the numbers should shift left, calculator style. To enter an occasional cents amount, the user can place the cursor on the right field (but remember that countries have different conventions for entering numbers; for example, many countries use a comma instead of a decimal point).

• Passwords. When asked to type a password, users also need a means to retrieve or change the password if they have forgotten it, but it is also important to avoid malicious use of that functionality. Designers who work with a security team will reach a higher level of security that matches the importance of the data and application (Bonneau et al., 2015; Shay et al., 2015) (Box 8.5). For example, two-factor identification (e.g., password and a code sent to a separate device) is strongly recommended for a bank application or an e-mail password change, but users will be annoyed if such procedures are required for unimportant accounts with little or no personal information. When asking users to create a new password, having them enter the password twice helps users catch typos and provides an opportunity to practice typing the password just created. Providing guidance and explanations of why a proposed password is not acceptable will help users generate stronger passwords (with possibly a meter that reflects the strength of the password).

• CAPTCHAs. A CAPTCHA (acronym for Completely Automated Public Turing test to tell Computers and Humans Apart) requires users to type text presented graphically to be illegible to computers. Including an audio option is necessary to make the CAPTCHA accessible to users with visual impairments. Newer versions observe user behavior with the CAPTCHA to predict whether a human or a robot is interacting (Fig. 8.17).
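The advice on dates (show an example of correct entry, and remember that conventions differ) can be sketched with Python's standard `datetime` module. The two conventions below, month-first labeled "US" and day-first labeled "EU", and the function names are simplifying assumptions; real applications may need many more locales.

```python
from datetime import date, datetime

# Assumed conventions: month-first for U.S. users, day-first for most
# European users.
FORMATS = {"US": "%m/%d/%Y", "EU": "%d/%m/%Y"}


def example_hint(convention, sample=date(2016, 4, 22)):
    """Build a concrete hint to display next to the date field."""
    typed = sample.strftime(FORMATS[convention])
    # %B gives the full month name (English unless the program changes locale).
    spelled = f"{sample.strftime('%B')} {sample.day}, {sample.year}"
    return f"{typed} indicates {spelled}"


def parse_date(text, convention):
    """Return a date, or None if the text does not fit the convention."""
    try:
        return datetime.strptime(text.strip(), FORMATS[convention]).date()
    except ValueError:
        return None


print(example_hint("US"))
print(example_hint("EU"))
```

The same typed string can mean different days under the two conventions (03/04/2016 is March 4 or April 3), which is why a concrete example beside the field matters more than an abstract pattern.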

BOX 8.5 Guidelines for password creation.

• Use two-factor authentication for secure accounts.

• Indicate the rules for password creation.

• Ask for the password to be entered a second time.

• Hide the password with **** by default for privacy.

• Provide an option to unhide the password.

• Provide feedback encouraging strong password selection.
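A few of these guidelines can be sketched in code. The rule used here (8 to 64 characters with at least one number) is modeled on the IEEE form example earlier in this section and is only one possible policy; the function names and message wording are hypothetical.

```python
def password_problems(password, confirmation):
    """Return specific, actionable messages; an empty list means acceptable.

    Explaining why a password is rejected, rather than just refusing it,
    is the kind of guidance the text recommends.
    """
    problems = []
    if len(password) < 8:
        problems.append("Use at least 8 characters")
    if len(password) > 64:
        problems.append("Use at most 64 characters")
    if not any(ch in "0123456789" for ch in password):
        problems.append("Include at least one number")
    if password != confirmation:
        problems.append("The two entries do not match")
    return problems


def display_text(password, hidden=True):
    """Mask the password by default, with an option to unhide it."""
    return "*" * len(password) if hidden else password


print(password_problems("abc", "abc"))
print(display_text("secret1234"))
```

A strength meter, if added, would sit alongside these messages rather than replace them, since a score alone does not tell users what to change.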

[Screenshot: the reCAPTCHA "I'm not a robot" check box, shown checked and unchecked, and a "Type the text" challenge.]

FIGURE 8.17 Google introduced a new reCAPTCHA in 2014. Observing the interaction, it predicts whether a human or a robot is clicking on the box but presents a more difficult CAPTCHA when in doubt. An audio version can play hard-to-understand words instead of the visual hard-to-read text.

8.6.3 Dialog boxes

Many tasks are interrupted to request users to select options, perform limited data entry, or review alerts and error messages (see Section 12.8). The most common solution is to provide a dialog box (Fig. 8.18).

Dialog-box design combines menu-selection and form fill-in issues with additional concerns about consistency across potentially hundreds of dialog boxes and relationships with other items on the screen. A guidelines document for dialog boxes will help designers strive for consistency. Dialog boxes should have meaningful titles to identify them, and the titles should have consistent visual properties. Dialog boxes are often shaped and sized to fit each situation, but distinctive sizes or aspect ratios may be used to signal errors, confirmations, or components of the application.

FIGURE 8.18

[Screenshot: a "Delete File" dialog box asking "Are you sure you want to move this file to the Recycle Bin?" and showing the file's name, type, size, and modification date.]

This dialog box includes a binary menu with two choices ("Yes" and "No"). The blue highlighting on Yes indicates that this selection is the default and that pressing Return will select it. Specific keyboard shortcuts can be made available. Escape closes the dialog box. Typing the letter "N" will select No, as indicated by the underlined letter "N".


Since dialog boxes usually pop up on top of some portion of the screen, there is a danger that they will obscure relevant information. Therefore, dialog boxes should be as small as is reasonable to minimize the overlap and visual disruption. Dialog boxes should appear near, but not on top of, the related screen items: When a user clicks on a city on a map, the dialog box about the city should appear just next to the click point. The classic annoying example is to have the Find or Spell Check box obscure a relevant part of the text. When multiple large displays are used, placing the dialog box in multiple locations simultaneously can result in faster interaction (Hutchings and Stasko, 2007).

Dialog boxes should be distinct enough that users can easily distinguish them from the background but should not be so harsh as to be visually disruptive. On desktop computers, keyboard shortcuts are essential to speed the response to dialog boxes. A common convention is to use Escape to cancel and close the dialog box and Enter to select the default command when appropriate. Dialog boxes do not always require users to answer or close them (e.g., the Find box in many applications can remain open after the search is performed). Modal dialog boxes require users to indicate their choice immediately, but non-modal dialog boxes allow users to continue their work and return to the dialog box again at a later time. When an alert is critical, dialog boxes may require immediate attention (Fig. 8.19) (https://sbmi.uth.edu/nccd/SED/Briefs/sedb-mu03.htm).
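The keyboard conventions described here can be sketched as a small key dispatcher. The `Dialog` class, its fields, and the key names are hypothetical and independent of any GUI toolkit; a real toolkit would supply its own event names.

```python
from dataclasses import dataclass, field


@dataclass
class Dialog:
    """A hypothetical dialog model: button labels, a default, and mnemonics."""
    buttons: list
    default: str
    modal: bool = True
    mnemonics: dict = field(default_factory=dict)  # e.g. {"n": "No"}


def handle_key(dialog, key):
    """Translate a key press into a dialog action.

    Returns ("close", None) for Escape, ("choose", label) for Enter or a
    mnemonic letter, and None when the key is not a shortcut.
    """
    if key == "Escape":
        return ("close", None)
    if key == "Enter":
        return ("choose", dialog.default)
    label = dialog.mnemonics.get(key.lower())
    return ("choose", label) if label else None


# A binary Yes/No dialog with Yes as the default choice.
confirm = Dialog(buttons=["Yes", "No"], default="Yes",
                 mnemonics={"y": "Yes", "n": "No"})
print(handle_key(confirm, "Enter"))
print(handle_key(confirm, "N"))
```

For a destructive action, a designer might prefer to make the safer choice the default, which in this sketch is only a change to the `default` field.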

FIGURE 8.19

[Screenshot: an annotated drug-interaction alert. Annotations: "Signal word: provide clear visual cues and type of alerts"; "Nature of hazard: provide succinct reason for the alert"; "Provide a list of actions to respond to the alert"; "User feedback: provide ability to capture user feedback". Dialog text: "WARNING! Drug - Drug Interaction. Warfarin - Aspirin: Increased risk of bleeding", with a link to guidelines. Choices: "Aspirin: Keep Aspirin, do not order Warfarin"; "Warfarin: Keep Warfarin, cancel Aspirin"; "Override: Order both Warfarin and Aspirin", with a "Confirm override" check box and the advice "Check INR frequently and advise patient for warning signs of bleeding". Controls: "Cancel" and "Provide feedback on this alert".]

This dialog box is used to alert clinicians who try to prescribe the drug Warfarin because it increases the risk of bleeding in patients already on aspirin. Several possible actions are proposed. Overriding the alert is possible but requires confirmation by clicking a check box. Because of the severity of the alert, this is a modal dialog box and requires immediate action.


When tasks are complex, multiple dialog boxes may be needed, leading some designers to choose to use a tabbed dialog box in which two or more protruding tabs in one or several rows indicate the presence of multiple dialog boxes. This technique carries with it the potential problem of too much fragmentation; users may have a hard time finding what they want underneath the tabs. A smaller number of larger dialog boxes may be advantageous, since users usually prefer doing visual searches to having to remember where to find a desired control.

Practitioner's Summary

Designers who focus on organizing the structure and sequence of menus are more likely to match the users' tasks, priorities, and environment. If each menu is a meaningful, task-related unit, then the individual items will be distinctive and comprehensible. Favor broad and shallow hierarchical menus. For users who make frequent use of the system, shortcuts and gestures will greatly increase the speed of interaction. Permit simple traversals to the previously displayed menu and to the main menu. Remember that audio menus and menus designed for small devices require careful rethinking of what functions to include. For such menus, carefully limit the number of items, and consider frequency of use as a criterion for sequencing menu items. Gestures are useful for fluid interaction but are hard to discover and learn and often require complementary means of interaction. Consider direct-manipulation graphical widgets such as calendars or maps to facilitate data entry with form fill-in. Such widgets, along with immediate feedback and dynamic help, will help reduce errors and speed data entry.

Be sure to conduct usability tests and to involve human-factors specialists in the design process. When the interface is implemented, collect usage data, error statistics, and subjective reactions to guide refinement. Consider user-adaptable menu designs.

Researcher's Agenda

Experimental research could help to refine the design guidelines concerning organization of menus. How can differing communities of users be satisfied with a common organization when their information needs are markedly different? Should users be allowed to tailor the structure of the menus, or is there greater advantage in compelling everyone to use the same structure and terminology? Should a tree structure be preserved even if some redundancy is introduced? What's the best way to progressively introduce new users to large menu structures? How can users be encouraged to discover and learn new gestures or keyboard shortcuts? What further improvements will speed menu selection on small and very large displays? Can better guidance and feedback during password creation improve usability and security?

Research opportunities abound, and the quest for novel menu-selection strategies for small and large displays continues. Implementers would benefit from advanced software tools to automate the organization of menus (e.g., Bailly and Oulasvirta, 2014) and facilitate the design of responsive menus and their evolutionary refinements.


www.pearsonglobaleditions.com/shneiderman

• An extensive review of menu techniques: http://www.gillesbailly.fr/menua/

• Major suppliers describe the use of gestures in their guidelines: Google's Android, Apple's iOS, and Microsoft's Windows 8:
https://www.google.com/design/spec/patterns/gestures.html,
https://developer.apple.com/library/ios/documentation/UserExperience/Conceptual/MobileHIG/InteractivityInput.html#//apple_ref/doc/uid/TP40006556-CH55-SW1,
https://msdn.microsoft.com/en-us/library/windows/desktop/dd940543(v=vs.85).aspx

• Stories of "less is more" for mobile devices: http://www.fastcompany.com/1816610/sharing-app-bump-30-slashes-most-features-proves-less-really-can-be-more

• Design patterns suggested by the UK government: https://www.gov.uk/service-manual/user-centred-design/resources/patterns

• Design winners in various categories (website, tablet, smart phone, etc.): http://www.awwwards.com

• Website accessibility example: http://www.raisingthefloor.com

The most interesting experience is browsing the web to see how designers have laid out menus or form fill-ins in online commerce, government websites, and intranets.


Discussion Questions

1. A telephone-based menu system is being designed for a magazine subscription service system. There are seven magazines available: National Geographic, Travel and Leisure, Entrepreneur, Time, Golf, U.S. News & World Report, and Fortune. Describe three reasonable orderings of the voice menus and justify each.

2. What are the elements of form fill-in design?

3. Design a touch screen music jukebox, which allows the user to select from a menu of the five most popular songs of the week. Draw a sketch of this interface for each of the following menu types: binary menu, multiple-item menu, check boxes, pull-down menus. Argue which design serves the user best.

4. You are in charge of designing a menu tree for navigating 1,250 books in a digital library. Present an argument of whether the menu should have larger depth (number of levels) or breadth (number of items per level).

5. Frequent menu users can become annoyed if they must make several menu selections to complete a simple task. Suggest two ways you can refine the menu approach to accommodate expert or frequent users.

6. When users are navigating through a menu structure, they may become disoriented. The authors suggest techniques to help alleviate this disorientation such as indicating the current position in the menu. Draw a sketch of how you can show users their position for an online car showroom, assuming the user has browsed with the following path:

Main Menu - Mid-size Cars - Honda - Accord

7. Data entry is challenging for small devices. What are some of the ways in which this issue can be addressed?

8. Critique the design of the dialog box below. This dialog box is used to alert clinicians who try to prescribe the drug Warfarin because it increases the risk of bleeding in patients already on Aspirin.

[Screenshot: an annotated drug-interaction alert. Annotations: "Signal word: provide clear visual cues and type of alerts"; "Nature of hazard: provide succinct reason for the alert"; "Provide a list of actions to respond to the alert"; "User feedback: provide ability to capture user feedback". Dialog text: "WARNING! Drug - Drug Interaction. Warfarin - Aspirin: Increased risk of bleeding", with a link to guidelines. Choices: "Aspirin: Keep Aspirin, do not order Warfarin"; "Warfarin: Keep Warfarin, cancel Aspirin"; "Override: Order both Warfarin and Aspirin", with a "Confirm override" check box and the advice "Check INR frequently and advise patient for warning signs of bleeding". Controls: "Cancel" and "Provide feedback on this alert".]

References

Bailly, G., Lecolinet, E., and Nigay, L., Visual menu techniques, Research Report hal-01258368, Telecom ParisTech (2016). https://hal.archives-ouvertes.fr/hal-01258368

Bailly, G., and Oulasvirta, A., Toward optimal menu design, Interactions 21, 4 (2014), 40-45.

Bonneau, J., Herley, C., van Oorschot, P. C., and Stajano, F., Passwords and the evolution of imperfect authentication, Communications of the ACM 58, 7 (2015), 78-87.

Cockburn, A., Gutwin, C., Scarr, J., and Malacria, S., Supporting novice to expert transitions in user interfaces, ACM Comput. Surv. 47, 2 (2014), 36 pages.

Cockburn, A., and Gutwin, C., A predictive model of human performance with scrolling and hierarchical lists, Human Computer Interaction 24, 3 (2008), 273-314.

Elmqvist, N., Vande Moere, A., Jetter, H.-C., Cernea, D., Reiterer, H., and Jankun-Kelly, T., Fluid interaction for information visualization, Information Visualization 10, 4 (2011), 327-340.

Findlater, L., Moffatt, K., McGrenere, J., and Dawson, J., Ephemeral adaptation: The use of gradual onset to improve menu selection performance, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (2009), 1655-1664.

Gutwin, C., Cockburn, A., Scarr, J., Malacria, S., and Olson, S. C., Faster command selection on tablets with FastTap, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (2014), 2617-2626.

Hornbæk, K., and Hertzum, M., Untangling the usability of fisheye menus, ACM Transactions on Computer-Human Interaction 14, 2 (2007), 6.

Hutchings, D. R., and Stasko, J., Consistency, multiple monitors, and multiple windows, Proceedings SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (2007), 211-214.

Jarrett, C., and Gaffney, G., Forms That Work: Designing Web Forms for Usability, Morgan Kaufmann (2008).

Koved, L., and Shneiderman, B., Embedded menus: Menu selection in context, Communications of the ACM 29 (1986), 312-318.

Krug, S., Don't Make Me Think: A Common Sense Approach to Web and Mobile Usability, New Riders (2014).

Lewis, J., Practical Speech User Interface Design, CRC Press (2010).

Malacria, S., Bailly, G., Harrison, J., Cockburn, A., and Gutwin, C., Promoting hotkey use through rehearsal with ExposeHK, Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (2013), 573-582.

McGrenere, Joanna, Baecker, Ronald M., and Booth, Kellogg S., A field evaluation of an adaptable two-interface design for feature-rich software, ACM Transactions on Computer-Human Interaction 14, 1 (2007), 3.

Medhi, I., Toyama, K., Joshi, A., Athavankar, U., and Cutrell, E., A comparison of list vs. hierarchical UIs on mobile phones for non-literate users interface layout and data entry, Proceedings of IFIP INTERACT'13: Human-Computer Interaction 2 (2013), 497-504.

Oh, U., and Findlater, L., The challenges and potential of end-user gesture customization, Proceedings of SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (2013), 1129-1138.

Shay, R., Bauer, L., Christin, N., Cranor, L. G., Forget, A., Komanduri, S., Mazurek, M. L., Melicher, W., Segreti, S., and Ur, B., A spoonful of sugar? The impact of guidance and feedback on password-creation behavior, Proceedings SIGCHI Conference on Human Factors in Computing Systems, ACM Press, New York (2015), 2903-2912.

Wigdor, Daniel, and Wixon, Dennis, Brave NUI World: Designing Natural User Interfaces for Touch and Gesture, Morgan Kaufmann, San Francisco, CA (2011).

Wroblewski, L., Mobile First, A Book Apart (2011).

Zhai, S., Kristensson, P. O., Appert, C., Andersen, T. H., and Cao, X., Foundational issues in touch-screen stroke gesture design: An integrative review, Foundations and Trends in Human-Computer Interaction, The essence of knowledge, 5, 2 (2012), 97-205.

"I soon felt that the forms of ordinary language were far too diffuse. . . . I was not long in deciding that the most favorable path to pursue was to have recourse to the language of signs. It then became necessary to contrive a notation which ought, if possible, to be at once simple and expressive, easily understood at the commencement, and capable of being readily retained in the memory."

Charles Babbage
"On a Method of Expressing by Signs the Action of Machinery," 1826

CHAPTER OUTLINE

9.1 Introduction

9.2 Speech Recognition

9.3 Speech Production

9.4 Human Language Technology

9.5 Traditional Command Languages


312 Chapter 9 Expressive Human and Command Languages

9.1 Introduction

The dream of speaking to computers and having computers speak has long lured researchers and visionaries. Arthur C. Clarke's 1968 fantasy of the HAL 9000 computer in the book and movie 2001: A Space Odyssey has set the standard for performance of computers in science fiction and for developers of natural language systems. The reality is more complex and sometimes more frustrating than the dream, but much-improved speech recognizers have now joined the well-established telephone-based speech menu applications to reach a wide array of applications. Errors remain a significant challenge, and not all situations benefit enough from speech input to balance the cost of errors and the frustration of error correction. Once the commands, questions, or statements have been recognized, human language technologies may be needed to execute the appropriate action, initiate a clarifying dialogue, or provide translations.

Some applications simulate natural language interaction. They require users to speak from a restricted set of spoken commands that must be learned and remembered. Similarly, some textual interaction systems rely on the availability of vast text repositories that can be searched using standard search algorithms to find answers to questions written in full sentences. Repositories of translated text, such as the multiple language translations from the United Nations, can also help make good-quality translations of words, snippets, or full sentences.

See also:

Chapter 14, Documentation and User Support (a.k.a. Help)

Chapter 15, Information Search

The use of command languages in the early days of computing (e.g., DOS or Unix) receded with the advent of graphical user interfaces. However, command languages are still widely used by expert users of specialized applications, from computer programmers to the millions of engineers and scientists using tools like MATLAB®, which combine a command language and a graphical environment. In fact, one could argue that the spread of speech interfaces is reinvigorating the development of command languages as designers choose which combinations of words will be recognized as commands in speech interfaces.

While understanding natural language remains an unattainable dream, there are many applications that can successfully make use of the words people say, type, or listen to (Box 9.1).

This chapter starts with the rapidly growing speech interfaces (from speech recognition in Section 9.2 to speech production in Section 9.3) and then discusses human language technologies (Section 9.4), including translation and educational applications. Finally, Section 9.5 reviews the traditional, yet expressive, command language interfaces.

BOX 9.1 Speech technologies.

• Store and replay (museum guides)

• Dictation (document preparation, web search)

• Closed captioning, transcription

• Transactions over the phone

• Personal "assistant" (common tasks on mobile devices)

• Hands-free interaction with a device

• Adaptive technology for users with disabilities

• Translation

• Alerts

• Speaker identification

9.2 Speech Recognition

Speech recognition has made significant progress in recent years (Huang et al., 2014) and is now being used in a number of well-targeted knowledge domains such as airline information, lost luggage, medical-record data entry, and personal digital assistants (Cohen, 2004; Karat et al., 2012; Pieraccini, 2012; Bouzid and Ma, 2013; Neustein and Markowitz, 2013; Mariani et al., 2014). Driven by the difficulty of typing while using mobile devices (phones or touch tablets), spoken input has gained acceptability. More users learn to use spoken commands such as "Where is the closest coffee shop?" or "Tell John I will be late." Discoverability and learnability are often issues, but commands can be spoken without looking at the screen, while driving a car (equipped with a hands-free phone), or while hiking on a bumpy trail. However, commands such as "Make space in my drive" are still a great challenge and would require extensive dialog design (see Section 9.4 on human language technology). Improved recognition rates are making dictation and transcription possible, but error correction remains a challenge, and most applications require users to learn and remember complex sets of commands to accomplish their tasks. Background noise and variations in user speech performance make the challenge of speech recognition still greater.

9.2.1 The place for spoken interaction

While speech recognition is used successfully in a growing number of applications, the vision of computers chatting leisurely with users about varied open-ended topics remains more of a fantasy than a reality. While HAL 9000 of 2001: A Space Odyssey communicated with the ship crew mostly by voice, newer science-fiction writers have shifted their scenarios, with reduced use of spoken interaction in favor of larger visual displays and gestures, from Star Trek: Voyager to Minority Report and Avatar or Mission Impossible 4. Voice interaction with emotion-evoking robots remains a theme in movies such as Her and Ex Machina.

While early applications of speech recognition were mostly limited to discrete-word recognition (with extensive training for the system to learn a particular user's voice), the major breakthrough in the past decade has been the improvement of continuous-speech recognition algorithms and the availability of very large repositories of voice data on the web, which can be analyzed to train algorithms. The other significant advance that made speech recognition possible on mobile devices is the ability to process the spoken input remotely and quickly enough for rapid interaction. Reduced training (or its elimination with speaker-independent systems) has greatly expanded the scope of commercial applications. Quiet environments, head-mounted high-quality microphones, and careful choice of vocabularies improve recognition rates in all cases. Low-cost speech chips and compact microphones and speakers enable designers to include speech systems in high-volume products, such as dolls and other toys.

Applications are successful when certain conditions exist (see Box 9.2) and when they serve users' needs to work rapidly with low cognitive load and low error rates. Even as technical problems are being solved and the recognition rates are improving, spoken commands are more demanding of users' working memory than is hand/eye coordination and thus may be more disruptive to users while they are carrying out tasks. Speech requires use of limited resources, while hand/eye coordination is processed elsewhere in the brain, enabling a higher level of parallel processing. Planning and problem solving can proceed in parallel with hand/eye coordination, but they are more difficult to accomplish while speaking (Radvansky and Ashcraft, 2013). In short, speaking is more demanding than many advocates of speech recognition report.

Early applications include systems for aircraft-engine inspectors, who wear wireless microphones as they walk around the engine, their hands busy opening cover plates or adjusting components. They can issue orders, read serial numbers, or retrieve previous maintenance records by using a limited vocabulary. As with all speech input systems, they can be disruptive to others who find the noise a serious distraction.

The benefits of speech recognition to people with physical or visual disabilities, even temporary ones, are rewarding to see (Fig. 9.1). Its value during mobile use can be significant for users who take the time to learn and remember what can be accomplished with spoken commands, but general users of office or personal computers are not yet rushing to adopt speech input and output devices.

BOX 9.2 Speech recognition and production: Opportunities and obstacles.

Opportunities

• When users have physical impairments

• When the speaker's hands are busy

• When mobility is required

• When the speaker's eyes are occupied

• When harsh or cramped conditions preclude use of a keyboard

• When application domain vocabulary and tasks are limited

• When the user is unable to read or write (e.g., children)

Obstacles to speech recognition

• Interference from noisy environments and poor-quality microphones

• Commands need to be learned and remembered

• Recognition may be challenged by strong accents or unusual vocabulary

• Talking is not always acceptable (e.g., in shared office, during meetings)

• Error correction can be time-consuming

• Increased cognitive load compared with typing or pointing

• Math or programming difficult without extreme customization

Obstacles to speech production

• Slow pace of speech output when compared with visual displays

• Ephemeral nature of speech

• Not socially acceptable in public spaces (also privacy issues)

• Difficulty in scanning/searching spoken messages

9.2.2 Speech recognition applications

For designers of human-computer interaction systems, speech recognition technologies have many variations, which can also be combined productively with speech production (Li et al., 2015).

The goal of speech recognition is primarily to produce text based on spoken input (Lewis, 2011), the most straightforward application being dictation. Dictation systems have now reached recognition rates that are acceptable in many situations (e.g., Google Docs' Voice Typing). They allow users to compose a document or speak search terms such as "movie theater in college park" and then correct mistakes with the keyboard instead of typing all the text. Dictation can be a big time-saver with mobile devices, but keyboards, function keys, and pointing devices with direct manipulation often remain more rapid, depending on the quality of the recognition, the context of use (mobile or not), the user's typing abilities, the vocabulary complexity, whether the user is a nonnative speaker, and so on.

FIGURE 9.1 Using Dragon™ speech dictation and a head mouse (made visible by the little silver dot on his forehead), a computer scientist is able to overcome a temporary hand disability.

Ironically, technical fields with a lot of jargon are good candidates for speech recognition because of the distinctive nature of the terminology and the often-constrained documentation needs. For example, specialized systems for medical workers have become commercial successes (Nuance's Dragon® Medical, also embedded in electronic health record systems such as Cerner PowerChart Touch™). Dictation may be able to handle large vocabularies but inevitably requires specialized terminology to be useful to medical practitioners or lawyers.

While dictation is becoming practical, the cognitive burdens of dictation interfere with planning and sentence formation. In dictation, users may experi­ence more interference between outputting their initial thought and elaborating on it. Spoken language may also be too informal compared with carefully typed sentences.

Speech recognition can also be used to transcribe recorded audio materials, either in real time or in a delayed fashion. It can facilitate closed captioning of radio or television programs or transcription of court proceedings or lectures.

Some applications may be beneficial even when there are errors. For example, errors will be irritating but acceptable for most television or YouTube viewers, but the payoff is that searching becomes possible. Where exact spelling is required, such as with person or place names, careful checking and error correc­tion must be provided.

The other large category of speech recognition use is to allow users to speak commands that the user interface is trained to recognize effectively. This includes completing transactions over the phone, interacting with a device when direct manipulation is not convenient or possible, and using specialized voice services or "assistants." Dictation without a keyboard also requires commands to correct errors, start a new paragraph, or ask to spell a name.

Specialized voice services or "personal assistants" like Siri, Google Now, Cortana, and Hound have become the most visible use of speech recognition. Because mobile interaction makes the use of keyboards impractical, speech becomes attractive as a way for users to speak commands that execute the most common tasks performed on those devices, such as finding a location of interest, setting reminders, calendaring, communicating with others, or launching apps. Since the arrival of Apple's Siri in 2011, the competition has been fierce as companies strive to provide the most flexible and reliable services. The aim is to allow natural language, but users are often left wondering what they can say to get reliable results. This habitability problem remains a key issue, but logs of failed recognition facilitate efforts to broaden the acceptable inputs. Speaking assistants are now widely available, but many users never use them; others use only the few commands that they have learned and can remember; a smaller number of users can impress friends with apparent magic by having learned all the tricks. Companies do not report how much the assistants are being used, and while the demonstrations are impressive, the comparison tests often reveal problems (Ezzohari, 2015). Heavy use of spoken commands might be compared with the heavy use of keyboard shortcuts by traditional desktop users: not for everyone, but experts who have mastered them cannot live without them.

Speech is now widely used to complete transactions or access a service over the phone, for example, to report an electrical outage, trade stocks, or track lost luggage. These phone services, also called interactive voice response (IVR) systems, enable large financial savings for companies and provide 24/7 services for consumers (Lewis, 2011). Voice prompts welcome users and indicate what choices are possible. Users respond by pressing a number or speaking the word or short phrase that matches their choice. Simple IVRs can be seen as an audio menu (see further discussion in Section 8.5).

A particularly challenging application is the translation of speech to facilitate human communication, for example for foreign travelers or soldiers who must communicate in a language that they do not know well. Other emerging uses of speech recognition include the rapid spotting of specific words or topics in videos or telephone calls and speaker verification (also called voice biometrics). While users answer questions, the system verifies that they are who they claim to be. However, ensuring robust performance, coping with users with colds, and dealing with noisy environments are still challenges.

9.2.3 Designing spoken interaction

After designers have established that using voice is appropriate, they must decide whether the interaction will be conducted entirely via the audio channel (using speech recognition and production; e.g., on the phone or when users are driving or have visual impairments). Alternatively, they may integrate voice and visual channels to provide informative feedback or display results on the screen of a mobile device or a computer (Oviatt and Cohen, 2015). In general, combining input by voice with visual output is much preferable, as reading on the screen is much faster than listening to long prompts and allows rapid selection. Having access to a keyboard to correct errors is also of great help.

Initiation

The first step in using spoken interaction is for users to indicate that they wish to start the spoken interaction. In phone systems, a welcome prompt is sufficient to get started, but on the screen, a start button is needed (usually in the shape of a microphone), or an option is available to use a voice command to turn on the listening (e.g., "Hey Siri" or "Wake up"). This spoken command has to be very carefully chosen so that it is not misrecognized, but false positives will inevitably occur, causing frustration and possible chaos if further commands are recognized without users noticing. The initiation may be done for each command, or a separate spoken command may be needed to stop the recognition process. For example, the Nuance Dragon™ system uses "Wake up" and "Go to sleep" and allows users to chat with others, or just relax, in the middle of a spoken interaction session. An on-screen reminder of the stop command is helpful to novice and intermittent users.
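The wake and sleep commands can be pictured as a two-state machine. The following is a minimal sketch in Python, assuming the recognizer already delivers text. The class `ListeningSession` and its method `hear` are hypothetical names chosen for illustration. Only the phrases "Wake up" and "Go to sleep" come from the text.

```python
class ListeningSession:
    """Toggle between asleep and listening states on spoken phrases.

    Hypothetical illustration: a real recognizer would supply `utterance`.
    """

    WAKE = "wake up"
    SLEEP = "go to sleep"

    def __init__(self):
        self.listening = False
        self.accepted = []  # utterances passed on as commands or dictation

    def hear(self, utterance):
        """Process one recognized utterance and return what was done with it."""
        text = utterance.strip().lower()
        if not self.listening:
            # While asleep, everything except the wake phrase is ignored.
            if text == self.WAKE:
                self.listening = True
                return "listening"
            return "ignored"
        if text == self.SLEEP:
            self.listening = False
            return "asleep"
        self.accepted.append(text)
        return "accepted"


session = ListeningSession()
results = [session.hear(u) for u in
           ["call my brother", "Wake up", "set an alarm", "Go to sleep", "just chatting"]]
print(results)
```

In this sketch, a false positive on the wake phrase would silently move the session into the listening state, which is why the text recommends a carefully chosen phrase and an on-screen reminder of the stop command.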

Knowing what to say

Next, users need to know what can be said and reliably recognized. Learnability is one of the main issues of human language technologies that attempt to mimic natural language. In IVR phone systems, spoken prompts guide users and invite them to press keys or speak one of the proposed menu choices. Because they are typically used by novices or intermittent users, the possible transactions remain simple and the dialogue entirely directed (e.g., users are instructed to please say "account balance," "bill pay," or "fund transfer"). Some IVR systems use more open-ended prompts (e.g., "What service do you need?") and rely on a series of dialogues to clarify and confirm choices. The use of speech recognition allows users to shortcut through menu trees, which can be successful when users know the names of what they seek, such as a city, person, or stock name. They may even be able to speak while the instructional prompt is being read. This barge-in technique works well when most users are repeat users who can immediately speak the options they have learned from previous experience. In all cases, the challenge is to identify novice users who attempt to use commands that are not recognized and switch them to a more directed mode that lists the possible commands. Users will become frustrated when they have to navigate a complex and deep menu structure (Section 8.4), when they are not allowed to barge in, when long spoken information segments contain irrelevant information, or when the menu of choices does not address their information need.
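Switching struggling users to a more directed mode can be sketched as a simple counter of unrecognized attempts. This is a minimal Python sketch, not a description of any particular IVR product. The function `next_prompt` and the threshold `MAX_MISSES` are assumptions made for illustration. The menu choices and the open-ended prompt are the ones quoted in the text.

```python
MENU = ("account balance", "bill pay", "fund transfer")  # choices named in the text
MAX_MISSES = 2  # assumed threshold before switching to directed mode


def next_prompt(utterance, misses):
    """Return (prompt, misses) for one turn of a hypothetical IVR dialogue."""
    choice = utterance.strip().lower()
    if choice in MENU:
        return f"OK: {choice}", 0
    misses += 1
    if misses >= MAX_MISSES:
        # Directed mode: list every possible command explicitly.
        options = ", ".join(f'"{m}"' for m in MENU)
        return f"Please say {options}.", misses
    # Open-ended mode: repeat the short prompt.
    return "What service do you need?", misses


misses = 0
for spoken in ["check my money", "money please", "Bill pay"]:
    prompt, misses = next_prompt(spoken, misses)
    print(prompt)
```

Repeat users who already know the choices never hear the long list, while novices reach it after two failed attempts.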

Users of mobile digital "assistants" are left with the burden of learning and remembering what the effective commands are. They may quickly become frustrated and quit if none of their attempts leads to success. Help can be provided with examples of commands (Fig. 9.2), or users are left to search blog postings to find lists of effective prompts (Cross, 2015), but those lists may be very long and commands still have to be remembered.

Recognition errors

Slips produced by speech-recognition programs make for entertaining sections in product reviews in the trade press. Common errors occur when the vocabulary includes similar terms ("dime/time" or "Houston/Austin"). Challenges include dealing with regional or foreign accents and background noise. Users might also stutter, misspeak, or use the wrong terms. Dealing with unknown new words (and even failing to recognize that a word is unknown) can lead to confidently misrecognizing a similar-sounding word. Of course, the most difficult problem is matching the semantic interpretation and contextual understanding that humans apply easily to predict and disambiguate what was said. This problem was nicely highlighted in one of the few humorous titles of IBM technical reports: "How to Wreck a Nice Beach" (a play on "How to Recognize Speech"). To quote a summary of speech recognition's accomplishments, Huang et al. (2014) humbly report that "despite the impressive progress over the past decades, today's speech recognition systems still degrade catastrophically even when the deviations are small in the sense (that) the human listener exhibits little or no difficulty. Robustness of speech recognition remains a major research challenge" (p. 101). Finally, only a small portion of the myriad of world languages have adequate recognizers, and the mixing of two or more languages in the same sentence, which is common for multilingual speakers, also causes problems.

FIGURE 9.2 Mobile device assistants (from left to right: Siri, Google Now, Cortana, and Hound) all have similar microphone buttons but different ways of presenting suggestions.

Early speech recognition systems were speaker dependent, and users were required to train the system to recognize their voice or deal with a particular microphone. This is not the case anymore for mobile phone use but is encouraged for professional applications that incorporate some level of personalization to increase the recognition rate. Changing microphones also required recalibration. In all cases, limiting the world of possible commands and carefully selecting easily differentiated terms dramatically improve recognition.

Correcting errors

Correcting errors can be very taxing, especially when users do not have access to a keyboard or pointing device, so that all corrections have to be done using speech, possibly compounding errors with new ones. Even when a keyboard and pointer are available, having to correct errors is a significant distraction from the main task. A pause is generally required to separate dictation from editing commands, but providing correction commands that are very distinct will also facilitate their identification. Facilitating the erasing of the last spoken text (e.g., saying "scratch that") allows repeating or rephrasing. Once a correction command has been identified (Fig. 9.3), alternative text can be proposed, or users can add and record new terms (e.g., "IEEE" pronounced as "I triple E") or spell out words (e.g., for new names or cities).

FIGURE 9.3 Correcting a word during dictation using Nuance Dragon™. After saying "Correct finnish," the word is selected and possible corrections are displayed in a menu along with additional commands such as "spell that." Users can use the cursor, arrow keys, or voice to specify their choice.
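The "scratch that" behavior can be illustrated with a small dictation buffer. This is a minimal Python sketch under the assumption that each recognized utterance arrives as one string. The function `apply_utterances` is a hypothetical name, and real dictation systems handle far more correction commands than this one.

```python
def apply_utterances(utterances):
    """Build dictated text; "scratch that" erases the last spoken phrase."""
    phrases = []
    for spoken in utterances:
        if spoken.strip().lower() == "scratch that":
            # Remove the most recent phrase, if there is one to remove.
            if phrases:
                phrases.pop()
        else:
            phrases.append(spoken.strip())
    return " ".join(phrases)


text = apply_utterances(
    ["an example for the", "innis language", "scratch that", "Finnish language"]
)
print(text)
```

Because the correction command is matched as a whole utterance, the sketch relies on the pause between dictation and editing commands that the text describes.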

Mapping to possible actions

The secret of most successful speech recognition applications today is that they are limited to narrow application domains, so the world of actions is limited, and they use commands carefully chosen to increase recognition (e.g., using "scratch that" to delete text). Banking IVRs only know about banking terms and have a small set of possible actions. Users of personal assistants on mobile devices may impress friends with the variety of possible commands, but each app has a limited set of possibilities. This stems from two main causes: First, mobile application designers by nature focus on a limited number of functions that are used constantly. Second, because speech is a highly variable signal, large corpora of recorded speech matching the application domain are needed to achieve good recognition results, so speech recognition achieves much better results in application domains that have been studied and modeled extensively. Even if the speech recognition made no errors, there are many levels of possible errors in mapping the corresponding text to the expected action, as illustrated in Fig. 9.4. Companies continuously collect data from users as they speak and correct errors to improve both the speech recognition and the mapping to appropriate actions. Comparisons of today's assistants such as Siri, Google Now, Cortana, and Hound seem to suggest that mapping the recognized text to the most appropriate action is the most challenging task (Ezzohari, 2015).

FIGURE 9.4 It can be difficult to remember what exact command will accomplish the task. In this example, when the user said, "Search the web for glacier national park," a Google search was launched and a search executed as expected, but when the user said, "Do a web search for glacier national park," all the words were accurately recognized but not as a command, so the text was placed in the Nuance Dragon™ dictation box.
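The gap between recognizing words and recognizing a command can be reproduced with a small pattern matcher. This is a minimal Python sketch that assumes perfect recognition and a single registered phrasing. The pattern `SEARCH_COMMAND` and the function `interpret` are hypothetical and do not describe how Dragon is implemented.

```python
import re

# Hypothetical command pattern: only this phrasing is registered as a command.
SEARCH_COMMAND = re.compile(r"^search the web for (?P<query>.+)$", re.IGNORECASE)


def interpret(recognized_text):
    """Return ("web_search", query) for a known command, else ("dictate", text)."""
    text = recognized_text.strip()
    match = SEARCH_COMMAND.match(text)
    if match:
        return ("web_search", match.group("query"))
    # Every word was recognized, but no command matched: treat it as dictation.
    return ("dictate", text)


print(interpret("Search the web for glacier national park"))
print(interpret("Do a web search for glacier national park"))
```

Adding further patterns for synonymous phrasings is one way to broaden the acceptable inputs, at the cost of a larger set of commands to keep distinct.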

Being able to rely on contextual information such as location or text from previous commands gives the impression of a more conversational interaction. For example, it might be possible to say "show me close by restaurants," "how about in Baltimore instead," "with 3 or more star reviews." Those chains of commands are significantly more difficult to interpret correctly and are today only achieved in constrained applications and by trained users who have learned what will be successful.

Feedback and dialogues

During dictation or transcription, the recognized text is shown in the document being composed or in a dictation buffer, usually after a short delay (one to two seconds). Users can continue speaking or start correcting errors with the keyboard or by speaking navigation or editing commands. After correction, the text can also be transferred to a search box, the body of an e-mail message, or a field in a form. Applications tightly integrated with the speech recognition (as opposed to relying on a dictation buffer) are more likely to be attractive and can generate spoken feedback as well.

Commands are usually executed directly, unless confirmation is preferable (e.g., "I am ready to e-mail this to Ben Shneiderman, should I go ahead?") or additional information or disambiguation is needed (e.g., "There are 2 John Smiths in your address book, which one should the e-mail be sent to?"). When context information has been used, feedback indicates how it was used. Specific questions may be asked to fill the holes in the task model and its attributes; for example, saying "Set an alarm" triggers a response asking "Set an alarm for when today?" (i.e., the date and time are missing from the alarm-setting task model, today was selected as the default date attribute, and a time attribute is still needed).
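Filling the holes in a task model can be sketched with a small data structure whose unset attributes drive the next question. This is a minimal Python sketch. The class `AlarmTask` and the function `respond` are hypothetical names, and the confirmation wording is an assumption. The default date and the follow-up question mirror the alarm example in the text.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlarmTask:
    """Hypothetical task model for setting an alarm."""
    date: str = "today"         # default date attribute, as in the text's example
    time: Optional[str] = None  # still needed from the user


def respond(task):
    """Ask for the missing attribute, or confirm when the model is complete."""
    if task.time is None:
        return f"Set an alarm for when {task.date}?"
    return f"Alarm set for {task.time} {task.date}."


print(respond(AlarmTask()))                  # user said only "Set an alarm"
print(respond(AlarmTask(time="6:30 a.m.")))  # user supplied the time
```

Showing the default in the question ("today") is a form of feedback about how the system filled the gap on the user's behalf.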

The availability of a display can greatly speed up interaction by presenting the proposed action in detail and only asking users to confirm or cancel, but it precludes eyes-free operation (e.g., potentially endangering drivers). On the other hand, entirely spoken dialogues can be lengthy and can even reveal information that the user did not want others to hear.

9.2.4 Spoken prompts and commands

When human language technology has been identified as appropriate for an application, prompts and commands resembling natural languages have to be designed. A language may have a simple or complex syntax and may have a few operations or hundreds, but the key issue, and the main usability determinant, is to design clear prompts and a set of commands that users can speak comfortably and remember easily and that the system can recognize reliably.

The choice to use speech instead of keyboard entry is primarily a matter of user choice or possibility, but even with speech, designers are the ones who decide what features to support, what commands will be used, how users will discover what is possible, and what feedback or error messages will be provided.

The designer's first step is to study the users' task domain. The outcome is a list of task actions and objects, which is then abstracted into a set of interface actions and objects. These items, in turn, can be represented with the low-level interface syntax. Observing users speaking aloud is critical to discover commands that users might speak "naturally." Both commands and prompts may include terms that are rarely used in direct manipulation or menu systems; for example, users are likely to say "set an appointment for tomorrow" even though no specific menu for "tomorrow" exists in the menus of the graphical calendar interfaces.

A typical form is a verb followed by a noun object with qualifiers or arguments for the verb or noun; for example, users might say "launch Facebook" or "set an alarm for 7 a.m." Human learning is greatly facilitated by meaningful structure. If a set of commands is well designed, users will recognize its structure and easily encode it in their semantic-knowledge storage. For example, if users can uniformly edit words, sentences, and documents, this meaningful pattern is easy for them to learn, apply, and recall. On the other hand, if they must use different terms to change a word, revise a sentence, or alter a document, the challenge and potential for error grow substantially, no matter how elegant the syntax is. The "naturalness" will result from careful design and inclusion of synonyms (Fig. 9.5).

An effective way to test early versions of a spoken language interaction is to conduct a Wizard of Oz evaluation, in which a hidden person transcribes the spoken commands into text to simulate perfect recognition and types dialog prompts that are shown to the unsuspecting participant on a screen (for an example, see Dyke et al., 2013).

give me help

give me help on commands

[ ( go | move ) ] ( ( ( back | backward | backwards ) | ( forward | forwards ) ) | ( up | down ) ) ( one | a ) line

[ ( go | move ) ] ( ( ( back | backward | backwards ) | ( forward | forwards ) ) | ( up | down ) ) ( twenty | ... ) lines

( go | move ) ... [ ( ( one | one ) | ( twenty | ... ) ) ]

[ ( go | move ) ] ( ( left | right ) | ( ( back | backward | backwards ) | ( forward | forwards ) ) ) ( one | a ) character

[ ( go | move ) ] ( ( left | right ) | ( ( back | backward | backwards ) | ( forward | forwards ) ) ) ( twenty | ... ) characters

( go | move ) to [ the ] ( bottom | end )

( go | move ) to [ the ] ( bottom | end ) of [ the ] ( line | document )

( go | move ) to [ the ] ( start | top | beginning )

( go | move ) to [ the ] ( start | top | beginning ) of [ the ] ( line | document )

goto sleep go_to_sleep

help me

FIGURE 9.5 A small subset of the rich set of commands used in the Nuance Dragon™ speech recognition system. Synonyms are included and used consistently.
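The consistent use of synonyms in Fig. 9.5 can be illustrated with a short Python sketch. It is a simplification under stated assumptions, not the Dragon grammar format: the SYNONYMS table, the canonical token names, and the function normalize_command are hypothetical, and the sketch accepts any direction with any unit, whereas the grammar in the figure ties "up" and "down" to lines.

```python
import re

# Hypothetical synonym table: every spoken variant maps to one canonical token.
SYNONYMS = {
    "go": "move", "move": "move",
    "back": "backward", "backward": "backward", "backwards": "backward",
    "forward": "forward", "forwards": "forward",
    "up": "up", "down": "down", "left": "left", "right": "right",
    "a": "1", "one": "1", "twenty": "20",
    "line": "line", "lines": "line",
    "character": "character", "characters": "character",
}

# [verb] direction count unit; the verb is optional, as in Fig. 9.5.
PATTERN = re.compile(
    r"^(?:move )?(backward|forward|up|down|left|right) (\d+) (line|character)$"
)

def normalize_command(utterance):
    """Return (direction, count, unit), or None if the utterance is not recognized."""
    words = utterance.lower().split()
    if any(word not in SYNONYMS for word in words):
        return None
    canonical = " ".join(SYNONYMS[word] for word in words)
    match = PATTERN.match(canonical)
    if match is None:
        return None
    direction, count, unit = match.groups()
    return direction, int(count), unit
```

With this table, "go back one line" and "move backwards a line" both normalize to the same canonical action, which is the property that makes a synonym-rich command set predictable for users.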

324 Chapter 9 Expressive Human and Command Languages

9.3 Speech Production

Speech production is usually successful when the messages are simple and short and users' visual channels are overloaded; when users must be free to move around or are on the phone; or when the environment is too brightly lit, too poorly lit, subject to severe vibration, or otherwise unsuitable for visual displays. However, designers must cope with the four obstacles to speech output: the slow pace of speech output when compared with visual displays, the ephemeral nature of speech, acceptability and privacy issues in public spaces, and the difficulty in scanning/searching (Box 9.2).

There are three general methods to produce speech. A common type of speech generation available commercially is formant synthesis, which produces entirely machine-generated speech using a set of algorithms to produce sounds based on the phonetic representation of the text. The speech sounds somewhat artificial and robot-like. Concatenated synthesis instead combines tiny recorded human speech segments into phonemes, words, and phrases, and then into full sentences. The voice is more natural but requires significantly more storage and computing power to assemble sentences on the fly. Formant synthesis and concatenated synthesis can generate any sentence as needed. Finally, canned speech consists of a fixed set of digitized speech segments that can be assembled to create longer segments (e.g., "The next bus will arrive in" followed by "11" then "minutes"), but the number of possible complete sentences is limited and the seams between segments may sound awkward.
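The canned-speech approach can be sketched as a lookup over prerecorded clips. The following Python fragment is only an illustration: the CLIPS inventory, the file names, and the function assemble_announcement are hypothetical, and no audio is actually played.

```python
# Hypothetical inventory of prerecorded clips: segment text -> audio file name.
CLIPS = {
    "The next bus will arrive in": "next_bus.wav",
    "11": "num_11.wav",
    "minutes": "minutes.wav",
}

def assemble_announcement(segments):
    """Return the ordered playlist of clips for the given segments.

    Raises KeyError if a segment was never recorded, which reflects the
    limitation of canned speech: only prerecorded segments can be spoken.
    """
    return [CLIPS[segment] for segment in segments]

playlist = assemble_announcement(["The next bus will arrive in", "11", "minutes"])
```

Requesting a segment that is missing from the inventory (for example, "12") fails, whereas formant or concatenated synthesis could produce it on demand.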

The quality of generated speech can be evaluated in terms of understandability, naturalness, and acceptability. For some applications, a computer-like sound may be preferred. For example, the robot-like sounds used in the Atlanta airport subway drew more attention than did a tape recording of a human giving directions. Interactive voice response systems (IVRs) typically mix canned speech segments and speech synthesis to allow appropriate emotional tone and current information presentation.

Audio books or audio tours in museums and tourist sites also use canned speech. They are successful because they allow users to control the pace while conveying the curator's enthusiasm or author's emotion. Educational psychologists conjecture that if several senses (sight, touch, hearing) are engaged, learning can be facilitated. Adding a spoken component to an instructional system or an online help system (Section 14.3.2) may also improve the learning process.

Alerts and warnings can be presented using speech. They have been used in automobile navigation systems ("Turn right onto route M1"), internet services ("You've got mail"), or utility-control rooms ("Danger, temperature rising"), but in most cases, the novelty wears off quickly. Talking supermarket checkout machines that read out products and prices were found to violate shoppers' sense of privacy about purchases. Only generic instructions are spoken now, but many consumers still find them too noisy. Similarly, annoying warnings from cameras ("Too dark-use flash") and automobiles ("Your door is ajar") were removed and replaced with gentler tones or visual indicators. Spoken warnings in cockpits and control rooms are still used because they are omnidirectional and elicit rapid responses. However, even in these environments, spoken warnings can be missed, especially when in competition with human-human communication, so multiple methods are used simultaneously (e.g., a visual alert or a dialog box).

Applications for the visually impaired are an important success story. Utilities like the built-in Microsoft Windows Narrator or Apple VoiceOver can be used to read passages of text or hear descriptions of items on the screen. Screen readers like Freedom Scientific's JAWS, NV Access's NonVisual Desktop Access (NVDA), or Apple VoiceOver allow users with visual impairments to productively navigate between windows, select applications, browse graphical interfaces, and of course read text. Such tools rely on textual descriptions being made available for visual elements (labels for icons and image descriptions for graphics). Reading speed is adjustable, which allows interaction to be sped up when needed. Book readers are also widely used in libraries. Patrons can place a book on a copier-like device that scans the text and does an acceptable job of reading it.

The slow pace of normal spoken output, the ephemeral nature of speech, and the difficulty in scanning/searching remain challenges, but speech production is widely used because it enables services that would otherwise be too expensive; hiring well-trained customer-service representatives available 24 hours a day is not practical for many organizations.

9.4 Human Language Technology

Even before there were computers, people dreamed about creating machines that would be able to understand natural language, that is, be able to take the appropriate action in various contexts without users having to learn any command syntax or select from menus. It is a wonderful fantasy, but language is subtle; there are many special cases, contexts are complex, and emotional relationships have a powerful and pervasive effect in human-human communication. Although true comprehension and generation of open-ended language seem an inaccessible goal, there has been extensive research on human language technology; widespread use is slow in developing, primarily because the alternatives are more appealing. Contrary to common belief, human-human interaction is not necessarily an appropriate model for human operation of computers. Since computers can display information 1,000 times faster than people can enter commands, it is advantageous to use the computer to display large amounts of information and to allow users simply to choose among the items. Selection helps to guide users by making clear what objects and actions are available. For knowledgeable and frequent users who are thoroughly aware of the available functions, a precise, concise language (typed or spoken) is usually preferred (Section 9.5).

Natural language interaction (NLI) in the form of a series of exchanges resembling a dialogue is difficult to design and build for even a single topic. The key impediment is the habitability of the user interface, that is, how easy it is for users to determine what objects and actions are appropriate. Visual interfaces provide the cues for the semantics of interaction, but NLI interfaces typically depend on assumed user models. Users who are knowledgeable about their tasks, for example, stock-market brokers who know the stock codes (objects) and buy/sell actions, can place orders in natural language, but these users prefer compact command languages because they are more rapid and reliable.

While early conceptions of human language technology assumed that computers would parse natural language expressions in text or spoken forms and derive some level of "understanding" and description of users' "intent," the current successes rely instead on statistical methods based on the analysis of vast textual or spoken corpora and usage data of millions of users.

For example, question-answering strategies are successful in situations where there are relevant corpora and designers have crafted effective user interfaces that expand queries, search databases, show users alternatives, and present final results in ways that are most likely to be useful. Their success comes not from the understanding of the natural language but from the fact that the question at hand has already been asked before, using the same terminology, and has already been answered by others (Hearst, 2011). Another method is to analyze web search usage logs to find what results users seek often. For example, when users type "Leddo restaurant," human language technology extracts relevant queries from the dataset and identifies that "Leddo" does not exist but "Ledo" is a frequent entry. The word "restaurant," in turn, has been repeatedly identified as a term that leads users to look for an address, hours of operation, or a map, so that information can be presented by default. This can be done on the basis of frequency of past queries and on the log of previous users' actions.
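The log-based correction of "Leddo" to "Ledo" can be sketched with Python's standard library. This is an illustration under stated assumptions rather than a description of how any search engine works: the query log is invented, the 0.7 similarity cutoff is arbitrary, and suggest_term is a hypothetical helper that combines string similarity (difflib) with frequency of past queries.

```python
from collections import Counter
from difflib import get_close_matches

# Hypothetical query log: each entry is a term that past users searched for.
QUERY_LOG = ["ledo", "ledo", "ledo", "lego", "ledo", "lido", "lego"]
TERM_COUNTS = Counter(QUERY_LOG)

def suggest_term(term, cutoff=0.7):
    """Return the most frequent logged term that closely resembles `term`."""
    term = term.lower()
    if term in TERM_COUNTS:
        return term  # already a known query term
    candidates = get_close_matches(term, list(TERM_COUNTS), n=5, cutoff=cutoff)
    if not candidates:
        return term  # nothing similar in the log; leave the term unchanged
    # Among similar terms, prefer the one users have typed most often.
    return max(candidates, key=lambda candidate: TERM_COUNTS[candidate])
```

No linguistic understanding is involved: the suggestion depends only on what earlier users typed and how often.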

Other applications include extraction and tagging. Extraction refers to the process of analyzing human language to create a more structured format, such as a relational database. The advantage is that the parsing can be done once in advance to structure the entire database and to speed searches when users pose relational queries. Legal (Supreme Court decisions or state laws), medical (scientific journal articles or patient histories), and journalistic (Associated Press news stories or Wall Street Journal reports) texts have been used. A variant is to tag documents based on content. For example, it is useful to have an automated analysis of business news stories to classify them as covering mergers, bankruptcies, or initial public offerings for companies in various industries such as electronics, pharmaceutical, or petroleum. Extracting and tagging applications are promising because users appreciate even a modest increase in suitable retrievals, and imperfect retrievals are more acceptable than errors in natural language interaction. On the other hand, errors can become quite problematic when the extracted information is used to make decisions or inform policies. One example is the use of human language technology in medicine. A large amount of information about medical conditions, treatments, and outcomes is buried in textual notes written by physicians in electronic health records. Automatically extracting diagnoses or test results out of the text notes can be very useful to identify possible candidates for a clinical trial, as all records will be reviewed by a clinician. On the other hand, the use of automatic tags for clinical decision making can be problematic. The rare cases of success are limited to situations with specific users, document types, and decision support goals (Demner-Fushman et al., 2009). Sentiment analysis is a specialized form of tagging, which can be applied to groups of news articles, reviews, or social media to monitor global changes in opinion, but tagging of individual documents remains error-prone.
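The tagging of business news stories described above can be sketched as a keyword lookup. The example is deliberately naive: the TAG_KEYWORDS lists and the function tag_story are hypothetical, and a practical tagger would rely on statistical models trained on labeled stories, which is why individual documents can still be mistagged.

```python
# Hypothetical keyword lists, chosen only for illustration.
TAG_KEYWORDS = {
    "merger": {"merger", "merge", "acquisition", "acquire"},
    "bankruptcy": {"bankruptcy", "bankrupt", "insolvency"},
    "ipo": {"ipo", "initial public offering"},
}

def tag_story(text):
    """Return the sorted list of tags whose keywords occur in the story text."""
    lowered = text.lower()
    words = set(lowered.replace(",", " ").replace(".", " ").split())
    tags = []
    for tag, keywords in TAG_KEYWORDS.items():
        for keyword in keywords:
            # Multiword keywords are matched as substrings, single words as whole tokens.
            if (" " in keyword and keyword in lowered) or keyword in words:
                tags.append(tag)
                break
    return sorted(tags)
```

A story can receive several tags or none, and the tags can then be stored alongside the document to speed later retrieval.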

Human language text generation is used for simple tasks, such as the preparation of structured weather reports ("80% chance of light rain in northern suburbs by late Sunday afternoon") in which generated reports from structured databases can be sent out automatically. Automatically generated text can be used to supplement standard data charts such as bar charts or scatterplots in order to make them more accessible to users with visual impairments (e.g., Google Sheets' Explore or iweave.com). More elaborate applications of text generation include preparation of reports of medical laboratory or psychological tests. The computer generates not only readable reports ("White-blood-cell count is 12,000") but also warnings ("This value exceeds the normal range of 3,000 to 8,000 by 50%") or recommendations ("Further examination for systemic infection is recommended"). Still more involved scenarios for text generation involve the creation of legal contracts, wills, or business proposals. Text summarization remains a much greater challenge with limited success, as summaries must capture the essence of the content and convey it accurately in a compact manner (Liu et al., 2012).
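The laboratory-report example can be reproduced with a template-based sketch. The function describe_wbc is hypothetical, and the 3,000 to 8,000 range is taken from the example sentence above purely for illustration; it is not clinical guidance. The warning percentage is computed relative to the upper bound: (12,000 - 8,000) / 8,000 = 50%.

```python
def describe_wbc(count, low=3000, high=8000):
    """Generate report sentences for a white-blood-cell count.

    The default range comes from the example in the text and is
    illustrative only.
    """
    sentences = [f"White-blood-cell count is {count:,}."]
    if count > high:
        percent_over = round(100 * (count - high) / high)
        sentences.append(
            f"This value exceeds the normal range of {low:,} to {high:,} by {percent_over}%."
        )
        sentences.append("Further examination for systemic infection is recommended.")
    elif count < low:
        percent_under = round(100 * (low - count) / low)
        sentences.append(
            f"This value is below the normal range of {low:,} to {high:,} by {percent_under}%."
        )
    return " ".join(sentences)
```

Calling describe_wbc(12000) yields the readable report, the warning, and the recommendation quoted in the paragraph, while a value inside the range produces only the first sentence.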

Human language technologies are used in instructional systems. Successful examples are in grammatical error detection and proofreading. Also widely used, but more controversial, is the automated scoring of short-answer responses or essays during student assessment. Human language technology has been introduced into a variety of educational contexts such as reading support. Tutorials with materials and pedagogy that have been carefully tested can provide feedback in natural language, which encourages students to stay engaged in the educational process. Simulations can also be used to practice communication skills learned in other settings (Fig. 9.6).

A remaining question is whether learning differs when students speak their responses or type them. A Wizard of Oz experiment (where a human transcribed the learner's speech before submitting it to the tutor) suggests that learning gains and preferences are similar with both modalities, but highly motivated students reported lower cognitive load and demonstrated increased learning when typing compared with speaking (D'Mello et al., 2011).


Translation between human languages has long been a goal (Green et al., 2015), but older strategies of word replacement with some grammatical parsing have given way to statistical methods based on having large databases with correct human translations, such as United Nations documents that appear in five required languages. Well-designed user interfaces then clarify what users can input in text windows, present translation options, show the translation, and guide subsequent user actions (Fig. 9.7). This design effort gets more complex with input errors and with languages that may have unfamiliar characters, differ from English left-to-right formatting, and invoke words that do not exist in the target languages.



FIGURE 9.6 Using the Immersive Naval Officer Training System (INOTS), new Navy officers can practice their counseling skills in a virtual reality environment. Officers listen to an avatar and respond using spoken language, loosely following suggestions from multi-choice prompts presented on the screen and designed to match the learning objectives. The interaction is constrained, but assessment is facilitated (Dyke et al., 2013; http://www.netc.navy.mil/nstc/news_page_2012_02_24_2.asp).


FIGURE 9.7 Google Translate, showing a French sentence translated into English. A click on the word "drôle" displayed its definition. Selecting "funny" highlighted "these funny" as well as the matching French words, and an alternative translation can be selected.


9.5 Traditional Command Languages

Early tally marks and pictographs on cave walls existed for millennia before precise notations for numbers or other concepts appeared. Eventually, languages with a small alphabet and rules of word and sentence formation dominated. Computers were quickly found to be effective to manipulate logical expressions, operate on the real world, or search vast libraries. These applications encouraged designers to find convenient notations to direct the computer, leading to command languages. With a command-language interface, users type a command and watch what happens. If the result is correct, the next command is issued; if not, some other strategy is adopted. By contrast, menu-selection users view or hear the limited set of menu items and they respond more than initiate. With command-language interfaces, users must recall the exact words to be used and the correct syntax, although prompted input often supplies a visual list of correct completions. For example, in the early days of command languages, users who needed to print a document may have been instructed to type:


Another example is the Unix command used to delete blank lines from a file:

grep -v '^$' filea > fileb
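The command is terse: ^$ is a regular expression matching a line with nothing between its start and its end, -v inverts the match so that only non-matching lines are kept, and > redirects the result into fileb. A rough Python equivalent, offered as a sketch (the function name delete_blank_lines is hypothetical), makes the same steps explicit:

```python
import re

def delete_blank_lines(text):
    """Keep only the lines that do not match ^$ (the completely empty lines)."""
    empty = re.compile(r"^$")
    return "".join(
        line
        for line in text.splitlines(keepends=True)
        if not empty.match(line.rstrip("\n"))
    )
```

As with the grep command, a line containing only spaces is not considered empty and is therefore kept.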

Command-line interfaces are often preferred when the application is used in an advanced way (e.g., professionals using an application for hours every day). Casual users favor graphical user interfaces, but both styles of interface can be made available successfully because they do not always provide the same functionality. For example, in MATLAB, the command language can handle all the calculations, and a large subset of calculations is also available via the graphical user interface, which makes it easier for novice users to get started. Being able to type complex Boolean expressions using AND, OR, or NOT as well as regular expressions remains a key motivation for experienced users who can accomplish remarkable feats at amazing speed (Fig. 9.8).

Web addresses or URLs can be seen as a form of command language. Users come to memorize the structure of their favorite site addresses, even though the typical usage is to click on a link to select an address from a webpage or a search result page. The address field of browsers can also be used as a command line. For example, typing "(1024*768)/25" in the URL field in a Chrome browser will calculate the result, and typing "100 feet to meters" will launch the conversion tool and show the result: 30.48 meters.


FIGURE 9.8 Using the Sublime text editor, a user is doing a search and replace in a data table using regular expressions. Typing "\t.*? Police" in the search box searches for a tab followed by zero or more characters, a space, and then the word "Police." The patterns found in the document are highlighted with a thin black line, showing that both "Local Police" and "State Police" have been found and selected. An overview of the entire document is visible on the right, revealing the presence of many other matches that can now be replaced all at once.
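The same regular expression can be tried outside a text editor. The sketch below uses Python's re module on a few invented rows; the row layout (an identifier, a tab, then the event description) is an assumption modeled loosely on the figure, and the replacement text is arbitrary.

```python
import re

# Invented sample rows: an ID, a tab, an event description.
rows = [
    "XDOT_0000c7c\tState Police arrived",
    "XDOT_0000c7c\tFireboard arrived",
    "XDOT_0000c7c\tLocal Police arrived",
]

# A tab, then as few characters as possible (.*? is non-greedy),
# then a space and the word "Police".
pattern = re.compile(r"\t.*? Police")

matches = [m.group(0) for row in rows if (m := pattern.search(row))]

# Replace every match at once, collapsing "State Police" and "Local Police" to "Police".
replaced = [pattern.sub("\tPolice", row) for row in rows]
```

The non-greedy quantifier matters when a line contains the word "Police" more than once: the match stops at the first occurrence instead of extending to the last.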

Twitter tags (#hcil, $TWTR, or @benbendc) can also be considered an example of a new command language that needs to be learned and remembered, along with the proliferation of acronyms and abbreviations used by clever text-message writers (e.g., LOL for "laugh out loud" or 2G2BT for "too good to be true"). In the traditional desktop environment, shortcut keys also remain heavily used by users who take the time to learn them (e.g., typing Ctrl-Q for Quit or Ctrl-P for Print; see Section 8.2.2). Programmers or professionals who use a single application all day long (e.g., a computer-aided design or publishing application) can memorize hundreds of commands and shortcuts, helping them gain mastery of their application (Cockburn et al., 2014).

One important opportunity linked to command languages is that histories can easily be kept and macros or scripts created to automate actions, but the essence of command languages is that they have an ephemeral nature and they produce an immediate result on some object of interest. Feedback is generated for correct commands, and error messages (Section 12.7) result from unacceptable forms or typos. Auto-completion is critical to help prevent errors. Command-language systems may offer brief prompts with choices, becoming closer to menu-selection systems. Command languages typically do not require a pointing device and therefore can become a lifesaver for users with visual impairments, which make the use of mice and touchscreens impractical.


Database-query languages for relational databases were developed in the middle to late 1970s; they led to the still widely used Structured Query Language, or SQL™, which emphasized short segments of code (2 to 20 lines) that could be typed and executed immediately. For example:


Here the goal of the user is to create a result rather than a program. A key part of database-query languages and information-retrieval languages is the specification of Boolean operations (AND, OR, and NOT), which can be very challenging to specify. See Chapter 15 for more on advances regarding searching.
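One reason Boolean specifications are challenging is operator precedence: in SQL, AND binds more tightly than OR, so omitting parentheses can silently change the result. The sketch below uses Python's built-in sqlite3 module; the documents table, its columns, and its rows are hypothetical and invented for illustration only.

```python
import sqlite3

# Hypothetical table and rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT, topic TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO documents VALUES (?, ?, ?)",
    [
        ("Speech interfaces", "speech", 2014),
        ("Command shells", "commands", 2009),
        ("Voice dialing", "speech", 2008),
        ("Menu design", "menus", 2015),
    ],
)

# Intended query: either topic, and not published before 2010.
with_parentheses = """
    SELECT title FROM documents
    WHERE (topic = 'speech' OR topic = 'commands') AND NOT (year < 2010)
    ORDER BY title
"""

# Same words without the grouping: AND is applied before OR, so the
# year restriction covers only the 'commands' rows.
without_parentheses = """
    SELECT title FROM documents
    WHERE topic = 'speech' OR topic = 'commands' AND NOT (year < 2010)
    ORDER BY title
"""

intended = [row[0] for row in conn.execute(with_parentheses)]
surprising = [row[0] for row in conn.execute(without_parentheses)]
```

The first query returns only the 2014 speech document, whereas the second also returns the 2008 one, an error that is easy to make and hard to notice in a command language.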

Major considerations for expert users are the possibilities of tailoring the language to suit personal work styles and of creating named macros to permit several operations to be carried out with a single command. Macro facilities allow extensions that the designers did not foresee or that are beneficial to only a small fragment of the user community. A macro facility can grow to include specification of arguments, conditionals, iteration, integers, strings, and screen-manipulation primitives plus library and editing tools, until it resembles a full-blown programming language.
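A named-macro facility can be sketched in a few lines. The command names, the MACROS table, and the function expand below are hypothetical; the sketch covers only the simplest case, in which a macro is a named sequence of commands or other macros, without arguments or conditionals.

```python
# Hypothetical macro table: a macro name expands to commands or other macros.
MACROS = {
    "cleanup": ["delete-blank-lines", "trim-whitespace"],
    "publish": ["cleanup", "spell-check", "print"],
}

def expand(command, _active=()):
    """Recursively expand a command into a flat list of primitive commands."""
    if command not in MACROS:
        return [command]  # a primitive command expands to itself
    if command in _active:
        raise ValueError(f"macro {command!r} refers to itself")
    expanded = []
    for step in MACROS[command]:
        expanded.extend(expand(step, _active + (command,)))
    return expanded
```

Here the single command "publish" stands for four primitive operations. Adding arguments, conditionals, and iteration to such a table is the path by which macro facilities grow toward programming languages.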

In summary, while error rates remain high, the complexity and power of command languages have a certain attraction for a portion of the computer user community. Users gain satisfaction in overcoming the difficulties and becoming one of the inner circle "gurus" of their favorite command language.

Practitioner's Summary

The dream of natural language interaction has been mostly replaced by the effective use of statistical methods based on very large spoken and text corpora and logs of user interactions. Speech recognition for personal digital assistants and dictation has become increasingly successful, but errors and error correction remain issues. Speech-based approaches for guided interactions over telephones are also proving to be useful.

Speech generation, when well designed, can support effective applications with phones, mobile devices, or book readers. Well-designed user interfaces enable integration of visual displays and touchscreens with speech. Text analysis, generation, and translation are useful human language technologies based on large training databases and appropriate user interfaces to prompt users and handle interactions.

Command languages continue to be attractive for expert users who learn the semantics and syntax because they can rapidly specify actions involving multiple options. Command languages allow sequences of commands to be stored for future use as a macro or script.

For command languages as well as spoken command languages, designers begin with a careful task analysis to determine what functions should be provided. Meaningful specific names aid learning and retention.

Researcher's Agenda

Speech recognition and generation user interfaces are maturing rapidly as effective designs have a growing user community. Improved user interfaces that integrate speech with visual displays and touchscreen controls may attract still larger communities; however, research on error reduction and methods to facilitate error correction is still needed.

Natural language interaction success stories are still elusive, but human language technology has become an important part of the success of search technology (Chapter 15). Spoken and text generation has shown value, so further research is warranted. For those who continue to explore specific applications, empirical tests and long-term case studies offer successful strategies to identify the appropriate niches and designs.



• Designers will find many demonstrations of spoken interaction on YouTube. For example, the different styles of feedback and dialogue used by personal "assistants" can be seen at Hound Beta vs. Siri vs. Google Now vs. Cortana: https://www.youtube.com/watch?t=134&v=9zNh8kOLhfo

• Experimenting with common search engines and personal digital assistants such as Siri or Google Now provides hints about the current human language technology strategies used for question answering.

• Translation: http://translate.google.com or http://www.babelfish.com

• Speech recognition commercial systems: http://www.nuance.com/dragon

• IVR dialog system: http://www.ibm.com/smarterplanet/us/en/ibmwatson/developercloud/dialog.html


Discussion Questions

1. Consider voice-activated digital assistants such as Siri, Cortana, or Google Talk. Identify a situation or scenario where you chose to use this personal assistant, and identify a scenario where you chose to avoid it.

2. As a follow-on to the previous question, produce a thoughtful argument about what role spoken interaction should have in user interfaces. Be sure to list at least three benefits and limitations of spoken interaction.

3. Briefly describe the applications of speech recognition.

4. What are the obstacles to speech recognition and production?

5. There exist applications of human language understanding technology. Name some examples.

6. List several situations when command languages can be attractive for users.


Bouzid, Ahmed, and Ma, Weiye, Don't Make Me Tap! A Common Sense Approach to Voice Usability, Dakota Press (2013).

Cockburn, A., Gutwin, C., Scarr, J., and Malacria, S., Supporting novice to expert transitions in user interfaces, ACM Computing Surveys 47, 2 (2014), Article 2.

Cohen, M. H., Giangola, J. P., and Balogh, J., Voice User Interface Design, Addison Wesley (2004).

Cross, J., A list of all the Google Now voice commands, Greenbot blog http://www.greenbot.com/article/2359684/system-software/a-list-of-all-the-ok-google-voice-commands.htm (2015).

Demner-Fushman, D., Chapman, W. W., and McDonald, C. J., What can natural language processing do for clinical decision support? Journal of Biomedical Informatics 42, 5 (2009), 760-772.

D'Mello, S. K., Dowell, N. N., and Graesser, A., Does it really matter whether students' contributions are spoken versus typed in an intelligent tutoring system with natural language? Journal of Experimental Psychology 17, 1 (2011), 1-17.

Dyke, G., Adamson, A., Howley, I., and Rose, C. P., Enhancing scientific reasoning and discussion with conversational agents, IEEE Transactions on Learning Technologies 6, 3 (2013), 240-247.

Ezzohari, H., [ULTIMATE] personal assistant review: Hound vs Siri vs Google Now vs Cortana http://www.typhone.nl/blog/ultimate-voice-assistant-review/ (2015).

Green, S., Heer, J., and Manning, C. D., Natural language translation at the intersection of AI and HCI, Communications of the ACM 58, 9 (2015), 46-53.


Hearst, M. A., "Natural" search user interfaces, Communications of the ACM 54, 11 (2011), 60-67.

Huang, X., Baker, J., and Reddy, R., A historical perspective of speech recognition, Communications of the ACM 57, 1 (2014), 94-103.

Karat, M-C., Lau, J., Steward, O., and Yankelovich, N., Speech and language interfaces, applications and technologies, in Jacko, J. (Editor), The Human-Computer Interaction Handbook, CRC Press (2012), 367-386.

Lewis, J. R., Practical Speech User Interface Design, CRC Press (2011).

Li, Jinyu, Deng, Li, Haeb-Umbach, Reinhold, and Gong, Yifang, Robust Speech Recognition: A Bridge to Practical Applications, Academic Press (2015).

Liu, S., Zhou, M. X., Pan, S., Song, Y., Qian, W., Cai, W., and Lian, X., TIARA: Interactive, topic-based visual text summarization and analysis, ACM Transactions on Intelligent Systems and Technology 3, 2 (2012), 28 pages.

Mariani, Joseph, Rosset, Sophie, Garnier-Rizet, and Devillers, Laurence (Editors), Natural Interaction with Robots, Knowbots and Smartphones: Putting Spoken Dialog Systems into Practice, Springer (2014).

Neustein, A., and Markowitz, J. A. (Editors), Mobile Speech and Advanced Natural Language Solutions, Springer (2013).

Oviatt, Sharon, and Cohen, Philip, The Paradigm Shift to Multimodality in Contemporary Computer Interfaces, Morgan & Claypool (2015).

Pieraccini, Roberto, The Voice in the Machine: Building Computers that Understand Speech, MIT Press (2012).

Radvansky, G., and Ashcraft, M., Cognition, 6th Edition, Pearson (2013).

