Designing for Multi-Modal IVI Systems

Multi-tasking challenges us to understand, and to harness, human attention as a design medium

As the auto industry continues to work through the challenges of measuring, designing, and regulating the way attention (and distraction) function in the multi-modal vehicle cockpit, shared languages that bridge research and design are crucial to the creation of more efficient, engaging, and safer interactions with IVI systems

The automotive industry today faces a crisis: tech-savvy drivers expect access to all their familiar media, communications, and rich navigation while behind the wheel, and manufacturers stand to benefit enormously by making this possible. But there is also little doubt that interacting with this non-essential information can dangerously distract from the primary task of piloting a two-ton vehicle at high speed. As User Experience (UX) and interface designers working in the in-vehicle infotainment (IVI) space, we have been challenged to reconcile these two competing imperatives. We have come to believe not only that multi-tasking in the vehicle is inevitable, but that multi-tasking is likely to be the dominant form of human-computer interaction in the coming years, not just in the car but on mobile devices, PCs, and TVs. What multi-tasking challenges us to understand, and to harness, is the very medium of attention itself: how it encompasses, focuses, diverts, splits, and sustains; how it is apportioned across a sensory field of vision, audio, and touch. What we need is a new notation, a musical score, if you will, with which we can orchestrate interactions at the new scales, frequencies, and durations demanded by today's technologies.

Prevailing wisdom suggests that a driver's attention is a limited resource, primarily consumed by the driving task itself. Studies have shown, not surprisingly, that more of a driver's attention is required on curvy roads, in traffic, and at high speeds. Less attention is required when driving at slow speeds in familiar environments, and almost none when idling at a stoplight. The highly schematic "Attention Graph" below shows the fluctuations of a driver's attention load during a hypothetical trip.

Secondary, non-driving tasks like talking on the phone or sending a text message draw down, in theory, part or in some cases all of a driver's available attention. If we add the attention requirements of these secondary tasks to the driving attention load graph, we get a rough picture of the total attention demand at each point of the driver's trip, including points at which the composite load of multiple simultaneous tasks approaches dangerous limits.
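The additive model described above can be sketched in a few lines of Python. This is only an illustration: the 0-100 load scale, the danger threshold, and every number below are hypothetical placeholders rather than measured data, and the names `composite_load` and `risky_segments` are our own.

```python
# Hypothetical attention loads on a 0-100 scale, one value per trip segment.
# Every number here is an illustrative placeholder, not measured data.
DRIVING_LOAD = {
    "idling at stoplight": 5,
    "familiar street, low speed": 30,
    "highway, light traffic": 55,
    "curvy road, heavy traffic": 85,
}

# Assumed extra load contributed by each secondary task.
SECONDARY_LOAD = {"phone call": 25, "text message": 60}

DANGER_THRESHOLD = 90  # assumed limit of the driver's available attention


def composite_load(driving, secondary_tasks):
    """Add the loads of all simultaneous secondary tasks to the driving load."""
    return driving + sum(SECONDARY_LOAD[task] for task in secondary_tasks)


def risky_segments(secondary_tasks, threshold=DANGER_THRESHOLD):
    """Return the trip segments whose composite load reaches the threshold."""
    return [
        segment
        for segment, driving in DRIVING_LOAD.items()
        if composite_load(driving, secondary_tasks) >= threshold
    ]


print(risky_segments(["phone call"]))
# -> ['curvy road, heavy traffic']
print(risky_segments(["text message"]))
# -> ['familiar street, low speed', 'highway, light traffic', 'curvy road, heavy traffic']
```

Even with made-up numbers, the sketch shows the shape of the argument: the same secondary task is harmless in one segment and pushes the composite over the limit in another.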

While the overall attention load can be approximated as a single quantity, as shown above, the simultaneous, overlapping fields of visual, auditory, and tactile attention add complexity to the model. Driving is primarily a visual task, and looking away from the road while driving, even momentarily, can be fatal. US DoT guidelines, for example, recommend that any in-car interaction divert the driver's gaze from the road for no more than two seconds. In other words, the visual sensory channel, already occupied with the primary driving task, has very little 'bandwidth' available for secondary visual tasks (anything involving looking at a screen). With the visual channel, or modality, already near capacity, the auditory and tactile modalities have become important channels for the exchange of information between driver and IVI computer. In recent years, speech, gesture, vibro-tactile, eye-tracking, and other input and output systems have all been adopted within vehicle systems, often in concert to offset their individual limitations.

Structuring Attention Design

Faced with multiple sensory channels and highly variable task flows, how can we think about orchestrating multi-modal interactions in a more deliberate, systematic way? With few available HMI guidelines, and few cross-disciplinary metrics shared between modality research silos, we decided to build our own rough qualitative framework to better describe the characteristics of driver attention as it plays out across the multi-modal, spatially distributed interactions of the car cockpit. We defined three basic axes of attention within the vehicle cockpit: Location, Duration, and Intensity. Along these axes we mapped a range of automotive HMI modalities (speech, touchscreen, steering wheel controls, hand gesture, finger gesture, vibrotactile, and a deformable surface concept) according to the sensory parameters, limitations, and conveniences of each. This framework offered a way of organizing the knowledge we gained through conversations with HMI experts, user studies, and product tests. It shed light on the strengths and weaknesses of different HMI systems in various use contexts. And it gave us a simple visual model that we could use to structure a series of multi-modal interaction concepts and use case scenarios.
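One way to picture the three-axis framework is as a small data structure, sketched below in Python. The field names, the 0-to-1 scales, and all of the values are assumptions made for illustration; they are not the ratings from our diagrams, and `ModalityProfile` and `candidates` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ModalityProfile:
    """One HMI modality placed on the Location, Duration, and Intensity axes."""
    name: str
    location: float                 # 0.0 = core driving posture, 1.0 = far periphery
    duration: Tuple[float, float]   # comfortable range of use, in seconds
    intensity: Tuple[float, float]  # 0.0 = ambient, 1.0 = urgent alert


# Placeholder values for illustration only.
PROFILES = [
    ModalityProfile("speech", 0.1, (2.0, 30.0), (0.3, 0.9)),
    ModalityProfile("steering wheel controls", 0.1, (0.5, 5.0), (0.2, 0.5)),
    ModalityProfile("touchscreen", 0.7, (1.0, 10.0), (0.3, 0.7)),
    ModalityProfile("vibrotactile", 0.0, (0.2, 2.0), (0.1, 0.8)),
]


def candidates(max_location: float, duration_s: float, intensity: float) -> List[str]:
    """Names of modalities whose profile covers the requested interaction."""
    return [
        p.name
        for p in PROFILES
        if p.location <= max_location
        and p.duration[0] <= duration_s <= p.duration[1]
        and p.intensity[0] <= intensity <= p.intensity[1]
    ]


# A brief, fairly urgent cue that must stay close to the driving posture:
print(candidates(max_location=0.2, duration_s=1.0, intensity=0.7))
# -> ['vibrotactile']
```

Encoding each modality as a range on the duration and intensity axes, rather than as a single score, keeps the "deliberately fuzzy" character of the framework while still allowing simple comparisons.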

Location (Core-Periphery)

In the vehicle cockpit, the eyes-forward, hands-on-wheel driving posture and the fixed distribution of interfaces make the spatial dimension of attention a primary consideration for safe and efficient operation. The diagram below addresses the direction of gaze and the distance of tactile and gestural interaction points from the primary driving posture, as well as the spatial dimensions of other non-visual interfaces (for instance, drivers tend to look towards the direction of sound when using speech commands). As interaction systems link various screens, controllers, sensors, and other touchpoints across the user's physical environment, and utilize multiple points of the body (from pedometers to brain-wave controllers), designers must think in terms of a distributed ergonomics: the distinct spatial dimension of each interaction path.


Duration

Interaction paths no longer require the continuous, unbroken engagement of user attention from the start to the finish of a specific task. An increasing appetite for multitasking and on-the-go information access requires that interactions be broken down into smaller increments, segmented or "chunked" across longer time frames and often across multiple screens. Drivers might, for instance, begin entering a POI term into their head unit navigation system while stopped at a stop sign, select their target from a shortlist on their instrument cluster while on an open stretch of road, and then glance at the map at multiple points along their journey. Each HMI modality involved in this task flow has an optimal duration of use, based on ergonomics, convenience, and other factors. The diagram below maps the duration 'profile' of several HMI technologies.
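The chunking idea above can be illustrated with a small scheduling sketch in Python. It assumes a trip already divided into segments with a known driving load (0-100) and a usable time window, and it places each chunk of a task, in order, into the earliest later segment that is calm enough and long enough. The segment data, the load limits, and the function `schedule_chunks` are all hypothetical.

```python
def schedule_chunks(segments, chunks):
    """Greedily place each chunk, in order, in the earliest segment that fits.

    segments: list of (label, driving_load, available_seconds)
    chunks:   list of (name, seconds_needed, max_tolerable_driving_load)
    Returns a list of (chunk_name, segment_label or None).
    """
    plan = []
    start = 0  # index of the first segment still available to later chunks
    for name, needed_s, max_load in chunks:
        for i in range(start, len(segments)):
            label, load, free_s = segments[i]
            if load <= max_load and free_s >= needed_s:
                plan.append((name, label))
                start = i + 1  # the next chunk must happen in a later segment
                break
        else:
            plan.append((name, None))  # no suitable segment was found
    return plan


# Illustrative trip and task; all numbers are placeholders.
trip = [
    ("stop sign", 5, 8),
    ("merging", 85, 20),
    ("open road", 40, 120),
    ("city traffic", 75, 60),
]
poi_task = [
    ("enter POI term", 6, 10),
    ("select from shortlist", 3, 50),
    ("glance at map", 1, 80),
]

for chunk, segment in schedule_chunks(trip, poi_task):
    print(f"{chunk}: {segment}")
# enter POI term: stop sign
# select from shortlist: open road
# glance at map: city traffic
```

A greedy, in-order placement is the simplest possible policy; a real system would also need to account for the duration profile of the modality used for each chunk.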


Intensity

The spectrum of interaction intensity in the vehicle has expanded. At the low end of the spectrum, ambient sensing has blurred the line between active and passive interfaces. Eye-tracking and heartbeat sensors adjust screen interfaces and supply warnings based on driver drowsiness levels. Smiles, finger tapping, and other affirmative gestures give low-level feedback to the computer systems that determine music selections. At the high end of the spectrum, certain HMI modalities (auditory in particular) are better than others at alerts, warnings, and high-intensity interactions. The diagram below maps the intensity range of several HMI modalities.

The mapping framework presented here is a template: a sketch of what a notation for cataloguing, comparing, and eventually orchestrating interactions across multiple sensory modalities might look like. In a field of highly specialized HMI modality research, this framework is subjective, simplistic, and deliberately fuzzy. But it suggests a way forward. As the auto industry continues to work through the challenges of measuring, designing, and regulating the way attention (and distraction) function in the multi-modal vehicle cockpit, shared languages that bridge research and design are crucial to the creation of more efficient, engaging, and safer interactions with IVI systems.