Thinking Out Loud: An Invitation for Designers to Consider the Voice User Interface (VUI)
Current trends in computer-supported, digital technology point to voice-activated devices and interfaces, specifically the Voice User Interface (VUI), becoming much more normalized and prevalent in places like the U.S. that make regular use of electricity and internet-facilitated computational activity. According to a survey conducted by the Pew Research Center in 2019, 45 million households in the U.S. already have at least one smart speaker, and large numbers of homes in other countries are quickly adopting similar technology. Add to this the global number of smartphone users, and users of other VUI-enabled devices, and it becomes evident that using voice to interact with computationally facilitated technology has become increasingly commonplace. This context of use is reminiscent of an era in the 1980s when Graphical User Interfaces (GUIs), and the modalities of use facilitated by the “what-you-see-is-what-you-get,” or “WYSIWYG,” paradigm, became the main way users interacted with computers. These modalities afforded interaction designers opportunities to become effective ambassadors for ensuring that the needs and desires of those who used and would use these interfaces could be effectively met. We are currently witnessing a similarly important inflection point in the history of how humans interact with computers, during which visual communicators and user experience designers have the potential to significantly affect the functionalities of VUI interactions and the operations of the devices they support. This article aims to critically analyze some key aspects of VUI as they relate to the following questions.
- Is the current orientation of this technology toward audio-focused interfaces that rely on the metaphor of conversation best serving those who have chosen to integrate these interfaces into their lives?
- What are the design implications of asking humans to engage with an interface that is “there but not there,” or that users can only actuate by speaking?
- What types of expertise can visual communication and user experience designers bring to the development, implementation and testing of VUI-enabled or enhanced user experiences to improve the efficacy of their operations?
- What else might we do with these devices to take more effective advantage of their specialized affordances to create unique, useful and meaningful user experiences?
Introduction
According to the user experience research and consultancy organization the Nielsen Norman Group, “The holy grail of usability is to build an interface that requires zero interaction cost: [this means] being able to fulfill users’ needs without having them do anything”.[1] Cultures have imagined variations of the intelligent machine, one that is able to deftly communicate with humans, for hundreds of years. One popular and recurring trope is the computer that listens, understands, and speaks. While we haven’t yet perfected such a system (like the one known as HAL from the film 2001: A Space Odyssey[2], or the one known as Samantha from the film Her[3]), current trends in technology point to voice-activated devices and interfaces, as specifically operationalized by the Voice User Interface (VUI), becoming more normalized and prevalent. According to a survey conducted by the Pew Research Center in 2019, “One-quarter of US adults say they have a smart speaker in their home.”[4] This represents 45 million households in the US alone that already have at least one smart speaker, and large numbers of homes in other countries are quickly adopting similar technology. Add to this the global number of smartphone users, and users of other VUI-enabled devices (currently estimated to be over 2.5 billion[5]), and it becomes evident that using voice to interact with computationally facilitated technology has become increasingly commonplace.
Voice-activated systems have been around since the 1950s, when Bell Laboratories first started conducting research related to their potential, and their use has been common in automated phone-based assistance for decades. Contemporary voice-based assistants like Apple’s Siri have been in use since 2010. Ritwik Dasgupta discusses the importance of this contemporary moment for VUI, arguing that while these products have been explored for decades, “What makes the current offerings exciting is that they are finally widely commercially available, and we have a need for designers and engineers who can take up the challenge to develop scenarios to solve everyday problems for the user.”[6] Dasgupta goes on to compare this moment to when the use of Graphical User Interfaces (GUI) became the main way users interacted with computers, “where we felt the need for designers to clear up the clutter, simplify the data, and present the users with flows and solutions that were easier to grasp.”[7] We are witnessing an important inflection point during which visual communicators and user experience designers have the potential to make a significant impact on VUI interactions and devices.
Researcher Matthew Hoy describes VUI systems such as the Amazon Echo as objects connected to online software systems that are “[listening] for a keyword to wake [them] up. Once [they] hear that keyword, [they] record the user’s voice and send it to a specialized server, which processes and interprets it as a command.”[8] The software attempts to figure out the applicable context for this command and will “supply the voice assistant with appropriate information to be read back to the user ... or complete tasks with various connected services and devices.”[9] As much of contemporary digital media and technology is built to work with and across these connected services and devices, the primary design challenge entails creating a coherent and consistent user experience, which usually begins with a single (and simplified) access point. Google Search is one example of this. For all the complexity inherent in the combination of algorithms, the usage of machine learning and programmed web crawlers, and the employment of data mining that enable the functionalities of Google Search, the user experience itself is actuated through an intentionally sparse interface.
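To make the shape of this pipeline concrete, the following is a minimal sketch in Python. Every function here, and the server response, is a hypothetical stand-in rather than any vendor’s actual API; the sketch only illustrates the flow Hoy describes, in which nothing is recorded or transmitted until a local wake-word check passes.

```python
import queue

def detect_wake_word(audio_frame: bytes) -> bool:
    # On-device keyword spotting: this is the only check that runs locally.
    # (Stub: a real detector would run a small speech model on the audio.)
    return audio_frame.startswith(b"WAKE")

def record_utterance(mic: "queue.Queue[bytes]") -> bytes:
    # Capture the user's request after the wake word; stubbed as one frame.
    return mic.get()

def send_to_server(utterance: bytes) -> dict:
    # A real device would send the audio to a cloud service that transcribes
    # it and resolves it to a command; this stub returns a canned result.
    return {"intent": "weather.today", "speech": "Sunny and 72 degrees."}

def run_device(mic: "queue.Queue[bytes]") -> None:
    while True:
        frame = mic.get()
        if not detect_wake_word(frame):
            continue  # ambient audio is discarded, not recorded or sent
        result = send_to_server(record_utterance(mic))
        print("DEVICE SAYS:", result["speech"])  # read back, or trigger a connected service
        break  # one interaction is enough for the sketch

if __name__ == "__main__":
    mic = queue.Queue()
    for frame in [b"ambient noise", b"WAKE word heard", b"what's the weather?"]:
        mic.put(frame)
    run_device(mic)
```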
This is the promise that VUIs such as Alexa and Siri are making. With the aid of increasingly sophisticated and effective Artificial Intelligence (AI), they can connect an array of disparate databases, services, and interfaces, marshalling all that information under the command of one voice, one contact point. With the continued work being done with sensors, networked devices, and the Internet of Things (IoT), an additional promise is being made: VUIs can also help you to control many aspects of your environment.
While Voice User Interfaces have been widely available for more than a decade, researchers and designers are still working to figure out the best ways to integrate these interfaces into existing technology, and to create new types of interfaces centered on them and their unique potential. As we examine the VUI landscape, several questions become germane.
- Is the current orientation of this technology toward audio-focused interfaces that rely on the metaphor of conversation best serving the humans integrating these interfaces into their lives?
- What are the design implications of asking humans to engage with an interface that is there but not there?
- What expertise can visual communication and user experience designers bring to VUI enabled design experiences?
- Finally, what else might we do with these devices to take advantage of their affordances, creating unique and meaningful user experiences?
This paper aims to critically analyze aspects of VUI as they relate to these questions.
Background and Related Concerns
A comprehensive discussion of Voice User Interfaces would include exploring the implications for privacy and security, the ethics of mining consumer data, serious issues related to sustainability and the natural environment, an overview of accessibility concerns, and issues connected to the fomenting of gender and racial bias. While a deep exploration of these complex and connected issues is worthwhile, it is beyond the scope of this paper. However, the following serves as a modest overview of work related to these aspects of VUIs.
Privacy and security
Any conversation about networked technology in the 21st century must examine the impacts on privacy and security for individuals and groups of users. VUI-enabled devices have the potential to become invasive because users tend to leave them on at all times. This is the default state, allowing for the convenience of activating the system quickly using only one’s voice. While the companies selling these devices insist that they are not recording data from devices that haven’t been activated using a “wake-up word” or phrase, there have been reports of malfunctioning devices continuously sending information to the servers connected to them.[10] Seeking to protect and even expand their access to this data, Google and Amazon applied for a series of patents in late 2017[11] that would significantly amplify the amount of data perceived, recorded, and analyzed by smart speakers and other VUI-enabled devices.[12] While these patents may never be granted, the invasive nature of their intent is troubling. Digital privacy has been recently explored with some urgency,[13][14][15] and is an especially important focus as designers and technologists develop products with the capability of listening to our private conversations and personal lives.
Researchers have shown that Voice User Interface devices can be tricked by the use of lasers[16] or duped when they are addressed by inaudible ultrasonic frequencies.[17] Both of these methods allow an attacker to approach a device that makes use of VUI and activate it using signals that are inaudible to the human ear. This raises many important concerns regarding security, especially as we connect VUI devices to systems within our homes as essential components of security alarms, video surveillance, and entrance locks. Data from VUI devices has also been subpoenaed for court cases and for use by law enforcement, with troubling privacy implications.[18] Designers trained to create human-centered products and experiences have an important role to play in the continued development and implementation of these products, exploring use cases connected to privacy, security, and the physicality of the internet-connected devices that comprise the Internet of Things.
Ethics of user data and environmental sustainability
In 2018, Kate Crawford and Vladan Joler published research[19] that revealed some of the problematic ways that the infrastructure enabling the operation of a system like the Amazon Echo was, and still is, presented to the public. Specifically, they discussed the difference between how individual users are commodified and represented in the system versus how that system is understood by those users. The material and labor structures that support contemporary digital devices are so complex that even the manufacturers themselves have a hard time understanding the structures and functionalities that span the entirety of their supply chains. Devices that include VUI introduce yet another layer of complexity because of the invisible labor[a] and materials needed to build the underlying algorithmic infrastructure and software that make them work.
Crawford and Joler situate the user within the complexity of these systems: “Just as the Greek chimera was a mythological animal that was part lion, goat, snake and monster, the Echo user is simultaneously a consumer, a resource, a worker, and a product [...] and they provide labor.”[20] Problematic ethical implications are at play when humans are not only the consumers of digital products, but also supply labor unknowingly, or at the very least unconsciously, acting as an essential resource that facilitates the operation and analysis of whatever systems support them. Similarly, there are ecological and life-centered design[21] concerns entangled with the complex systems that enable VUI to function. These include the rare earth metals used in the construction of the devices themselves, the use of exploited labor to facilitate their manufacture, and the enormous expenditures of energy required to sustain their operations, as well as to make and ship them. Systems thinking designers, service designers, and sustainability-focused designers who work with the companies that develop, design, manufacture and disseminate these products are well-placed to increase understanding of not only their inherent complexities, but also the socio-cultural, economic, environmental and public policy complexities that their widespread use affects. Those fulfilling design roles across the VUI landscape are often best positioned to implement more ethical and sustainable approaches to creating and distributing these products, and to better inform users of the opportunities they have to affect the positive evolution of these systems.
Accessibility
One of the fundamental problems VUIs seem to be trying to solve is the requirement to type or click (or swipe or tap) to access information online. Instead of using a keyboard and what can be perceived in a visual display, users interact with these systems via the spoken word and what can be perceived auditorily. This can help people in two specific ways. First, if a user can speak to a device instead of typing in a command or question, they can do so while simultaneously engaged in other activities. Second, VUIs can help humans with different accessibility needs. If a given user cannot physically input commands into the system using a keyboard or mouse, voice activation is incredibly powerful and empowering — it can allow these specific users a mechanism for engagement they don’t always have with technology, enabling them to complete tasks as part of their daily lives. Because VUIs like Alexa and Siri have become popular, they can function as a less stigmatized tool for facilitating accessibility. People who are not literate, who have cognitive difficulties or sight impairments, or who have mobility differences can also find these devices helpful.
While the development of a device like the Echo potentially offers greater accessibility, much of the current focus of these products is on consumerism for the able-voiced. Moira Corcoran highlights how those with speech differences are not being effectively served by these devices: “If voice is the future, tech companies need to prioritize developing software that is inclusive of all speech. In the United States, 7.5 million people have trouble using their voice and more than 3 million people stutter, which can make it difficult for them to fully realize [the potential inherent in] voice-enabled technology.”[22]
The machine learning and algorithms that make Voice User Interfaces function are, in part, built using the input of actual humans speaking commands — during use, the system collects the words users say and fine-tunes the algorithms. The issue, as Corcoran states, is that “most of these data points come from younger [fully] abled users.”[23] In order for humans with speech impairments — or even those with different accents or speech behaviors — to fully benefit from VUI devices, the systems must start incorporating a more diverse array of human speakers in their development and testing. Some companies are starting to work in this space,[24] but more developers, designers, and researchers need to actively engage more diverse groups of people in the development of these systems.
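One way design and research teams could make this gap visible is to audit recognition accuracy per speaker group rather than only in aggregate. The sketch below is a hypothetical Python illustration: the word-error-rate (WER) calculation is the standard edit-distance measure used in speech recognition research, but the speaker groups and transcripts are invented for the example.

```python
from collections import defaultdict

def word_error_rate(reference: str, hypothesis: str) -> float:
    # Word-level Levenshtein distance, normalized by reference length.
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# (speaker group, what was said, what the system heard) -- invented data
samples = [
    ("young adult",         "turn on the kitchen light",   "turn on the kitchen light"),
    ("person who stutters", "turn on the kitchen light",   "turn on the chicken flight"),
    ("regional accent",     "set a timer for ten minutes", "set a timer for ten minutes"),
]

by_group = defaultdict(list)
for group, said, heard in samples:
    by_group[group].append(word_error_rate(said, heard))

# A large spread across groups signals who the system is failing.
for group, rates in by_group.items():
    print(f"{group}: mean WER = {sum(rates) / len(rates):.2f}")
```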
Inclusivity and inherent bias
Finally, important work is being done to document, analyze, and unpack the inherent biases within the systems and structures that make Voice User Interfaces function. Many recently published books explore these themes and issues,[25][26][27][28] and in so doing critically examine the ways that algorithms, data, and AI are perceived as neutral and unbiased, but are in fact quite often the opposite. It has now become crucial for designers to ensure that their teams, processes, and approaches are as empathetically informed, inclusive and unbiased as possible. Almost any digitally facilitated system that incorporates or actualizes data, algorithms, and AI now needs to be meticulously examined during user testing, and even after broad implementation, to ensure that inherent (and possibly hidden) biases that could be misleading or harmful to those who might use them can be revealed and, as necessary, removed before undesirable effects occur.
Categories, Frameworks, and Implications
It is useful to consider how Voice User Interfaces fit into the current landscape of interface design. Senior user experience designer Alex Mohebbi outlines the differences between four categories of user interface: Command Line Interfaces (CLI), Graphical User Interfaces (GUI), Conversational User Interfaces (CUI), and Voice User Interfaces (VUI) [see Figure 1].[29] These four interface types are used in various hardware, software, and use contexts, and some contemporary interfaces integrate more than one of these categories as options for interaction.
[Figure 1. The four categories of user interface outlined by Mohebbi: Command Line Interface (CLI), Graphical User Interface (GUI), Conversational User Interface (CUI), and Voice User Interface (VUI).]
Although VUIs have grown in popularity, many research studies show that the current interfaces are still not fully usable. The Nielsen Norman Group released a study in 2019 that bluntly states, “Our user research found that current intelligent assistants [have a] usability level that’s close to useless for even slightly complex interactions. For simple interactions, the devices do meet the bare minimum usability requirements.”[30] Alex Mohebbi discusses why Graphical User Interfaces (GUI) have proven more effective for users engaged in complex interactions:
“What makes GUI so successful ... is its ability to create information spaces that users can relate to. By designing pages, navigation menus and buttons, [they] provide users not only with a mental model of a system’s architecture and capabilities, but also a nomenclature to identify these capabilities and a formula to trigger them without having to explicitly (and literally) express need or intent.” [31]
Through decades of experience with designing, co-developing, testing and implementing GUIs, designers and users have developed several effective strategies and mental models for understanding and interacting with them.
Moreover, the very nature of GUI engages humans in ways that work with how they typically process information (at least for those who are sighted, which is an important caveat): visual content presented on a computationally rendered screen acts as a prosthesis that facilitates many types of understanding. VUI cannot engage users in the same way, because it relies on a fundamentally different process of facilitating user interaction. Interaction designer Jen Spatz explores many of these ideas in her work, specifically noting that, “Instead of a predictable system of containers and doorways, VUIs introduce helpful disembodied voices. New users don’t learn their capabilities by clicking around and exploring, but rather by cautiously mimicking behaviors witnessed in advertisements and other homes.”[32] The opaque nature of the VUI fosters an experience that is obscure, transactional, and turn-based, more like the experience of using the Command-Line Interfaces that were dominant during the early days of personal computing, from the late 1960s through the mid-1980s.
German scholars Dirk Schnelle and Fernando Lyardet analyze similar issues pertinent to the effective construction of VUI:
“VUIs are particularly difficult to build due to their transient and invisible nature. Unlike visual interfaces, once the commands and actions have been communicated to the user, they ‘are not there anymore’ ... the interaction with the user interface is not only affected by the objective limitations of a voice channel, but human factors play a decisive role: auditive and orientation capabilities, attention, clarity, diction, speed, and ambient noise.” [33]
A GUI allows users to perceive and recognize information at a glance, providing multiple pathways and choices within an interface in a single moment. A VUI cannot work the same way — it operates in the fourth dimension, and every spoken exchange requires time for a user to articulate his, her or their thoughts, and then for the system to process what it perceives. A VUI requires users to hold choices and information in their heads as they are being spoken, a task that is cognitively challenging and that is also limited by working memory.[34] This is exacerbated if the user is trying to multitask or is using the device in a chaotic context, such as a crowded commute on the subway, a busy workplace, a noisy home environment, or while driving in heavy traffic.
These concerns are echoed in the frameworks and models introduced by Donald Norman in his seminal text, The Design of Everyday Things. Norman introduces four “principles of good design” [35] that are useful to consider for VUI:
- Visibility: By looking, the user should be able to ascertain the state of the device and the alternatives for action.
- A good conceptual model: The designer provides a model for the user that consistently and sensibly represents operations and their results, so that the user is presented with a coherent, consistent depiction of the entire system.
- Good mappings: These make it possible to determine the relationships between actions and results, between the controls and their effects, and between the system state and what is visible.
- Feedback: The user receives full and continuous feedback about the results of his, her or their actions.
Arguably, VUIs utilize the conceptual model of a helpful assistant, which is easily understood by most users. These devices offer ready feedback when the system does not understand an utterance or command, although these feedback mechanisms do not always work perfectly. When users issue complex commands or ask nuanced questions, many VUI systems struggle to tell a user specifically what was not understood, or exactly what pertinent information might be missing from a given request or question.
The parameters articulated within the other two principles described above are not currently being met. Voice User Interfaces differ in level of visibility depending on their form-factors and how they deliver information or respond to inputs. The use of Siri on an iPhone is embedded into a highly visual GUI, but a hands-free interaction with it may not have a visual component, whereas the use of Alexa on an Amazon Echo is almost completely invisible beyond the indicator light that has been incorporated into some models. For many of these interfaces, the user cannot ascertain “alternatives for action,” or even determine whether the interface is functioning properly. Similarly, most VUI devices do not offer “good mappings” between the controls and the possible actions. Many of these systems are almost entirely audio-based, so users are left, as Jen Spatz has recounted, copying interactions they have seen in promotional materials or witnessed friends and acquaintances use.[36] These interfaces are designed to mimic everyday conversational interactions, but despite the best efforts, when we interact with a VUI we are not actually speaking to a human, and even the most sophisticated AI still struggles with the complexity, ambiguity, and nuance found in human communication. Designers should consider how other types of mappings could work more effectively than, or work adjacent to, the use of conversation as the overriding metaphor for VUIs; one such mapping is sketched below.
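One candidate, drawn from conversational-design practice, is explicit slot filling: the system tracks which pieces of a request it has and names the specific piece it still needs, rather than failing with a generic “Sorry, I didn’t catch that.” The Python sketch below is a minimal, hypothetical illustration (the intents and slots are invented), not a description of how any shipping assistant works.

```python
# Each intent declares the information ("slots") it needs to be actionable.
REQUIRED_SLOTS = {
    "set_reminder": ["task", "time"],
    "play_music": ["artist_or_playlist"],
}

def respond(intent: str, slots: dict) -> str:
    if intent not in REQUIRED_SLOTS:
        # Unknown request: state the system's actual capabilities.
        return "I can set reminders or play music. Which would you like?"
    missing = [s for s in REQUIRED_SLOTS[intent] if s not in slots]
    if missing:
        # Name the specific gap instead of failing generically.
        return f"I can do that. What {missing[0].replace('_', ' ')} should I use?"
    return f"Okay, your {intent.replace('_', ' ')} is set."

print(respond("set_reminder", {"task": "water the plants"}))
# -> I can do that. What time should I use?
```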
Norman also explores the concept of designing for “information in the head” versus “information in the world.”[37] As users of VUIs access information and complete tasks using audible means, they are required to keep relevant information in their heads, unless they use a VUI that is connected to a screen-based GUI. Humans naturally rely on the informational prostheses of screen displays, written notes, printed text, and other visible mechanisms to keep track of information they are absorbing through audible means. The limitations of working memory keep most of us from being able to juggle multiple verbal cues and data points in our brains. Well-designed VUIs must allow for these limitations, whether through verbal pacing, the ordering of information, or by providing a screen with appropriate information at the appropriate moment, so that users can complete more complex tasks with ease and confidence.
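As a sketch of what allowing for these limitations could mean in practice, the Python fragment below paces a long list of results into small spoken chunks, each ending with an explicit continuation prompt. The chunk size of three is an assumption made for illustration, loosely motivated by working-memory limits rather than drawn from a fixed rule.

```python
def speak_in_chunks(items, chunk_size=3):
    # Yield one short utterance per chunk, ending with a continuation
    # prompt so the user never has to hold the whole list in memory.
    for start in range(0, len(items), chunk_size):
        chunk = items[start:start + chunk_size]
        more_coming = start + chunk_size < len(items)
        suffix = " Say 'more' to continue." if more_coming else ""
        yield "; ".join(chunk) + "." + suffix

results = ["oat milk", "almond milk", "soy milk", "rice milk", "coconut milk"]
for utterance in speak_in_chunks(results):
    print("VUI:", utterance)  # a real device would speak, then wait for 'more'
```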
How might we reframe and reconsider the design of these devices?
When we engage with a screen, there is a natural framing of our focus, an obvious sense of inside and outside, an interaction that offers a defined scope for our attention. We experience a sense of physicality with GUI-bound information — the trash is located here, these files are held in this folder, we can touch the screen (and sometimes affect its functionality by doing this). Interacting with a Voice User Interface is far more ephemeral. You perceive the interaction using only your ears, and there is no physical evidence left behind. Listening is a completely different experience and skill from seeing. When we actively listen our bodies and minds are engaged in a specific way; we may close our eyes to help us focus, we may lean in, we may even hold our breath. Conversation is natural to humans; it is, after all, the way most of us communicate with other humans every day. But is this the best framework for VUIs? What do we miss when human conversation is the dominant metaphor used to design these interfaces? While GUI designers have created frameworks and systems that provide users with multiple avenues to guide their interactions, VUI designers have focused on conversation as the primary mechanism for engagement, which can feel static, confusing, and disorienting.
What is the experience of interacting with an interface facilitated by a system such as Alexa? We ask questions, bark orders, shout repeated commands, and may even clip our phrases to resemble keywords we would use in a Google search. We may or may not be polite. The ways we relate to the interface are predicated on how it is presented to us through marketing: as “the help,” an assistant we have a formal relationship with, as furniture that can control our playlists or a given thermostat, or as a quirky audible version of other quotidian interfaces.
But what if these experiences were framed in a different way? What if a VUI interface was presented to users as a wizard, an oracle, a mystic, a caregiver, a coach? How might designers utilize sounds of all types to annotate interactions, to signal different kinds of information, to present users with richer audible data to build new kinds of meaningful design patterns? How might we think about the use of VUI in more social or familial situations? What would it be like if multiple people could communicate with these devices together, echoing the flow of group conversations and other social interactions? Rather than try to recreate our typical interactions with GUI by simply accomplishing them as a result of voice commands, which forces a focus on conversation as the defining metaphor to guide this kind of interaction, we must begin by examining how people process sound information. This means we must more closely analyze how people think out loud, and how they want and need their environments and spaces to sound and respond to their often widely varied vocal inputs.
Design educators, researchers, and practitioners, especially those working in interaction and user experience design, need to embrace broadly informed examinations regarding the use of VUI, as well as research approaches and methods that have the potential to make it a more effective means of facilitating satisfying interactions between humans and computing systems. If we don’t accept this challenge, we risk having VUI implemented by those who might favor marketing-driven features over ones that are more human-centered, or who fail to grasp how much the daily lives of users are or could be adversely affected by the use of this technology.[38] The use of voice interaction is increasingly woven into the objects, devices, systems, and services that humans use to conduct business and engage in leisure activities.[39] Voice integration has the potential to create multi-modal and multi-sensory interactions, engaging humans in rich experiences, allowing for new kinds of accessibility. It is imperative that designers are a part of the creation and implementation of these interfaces, and that we are there to ask human-centered questions, to offer critical feedback, and to help frame and shape a new conversation around VUI.
The use of VUI, with and without accompanying visual elements and systems, will continue to be an important aspect of interface design and the user-centered decision-making processes that guide it. We need human- and life-centered designers to be at the forefront of the creation of these interfaces as the norms and patterns for their use and implementation are being established. As the immediate past illuminates, when these norms are left to engineers, computer scientists, and other types of “machine-first” professionals, or to those tasked with marketing and selling these interfaces, user experiences are often overloaded with features and options that don’t account for what users actually want and need. The ultimate aim of this paper is to spark curiosity about VUI among the visual communication design and user experience design communities, to highlight its increasing prevalence in the day-to-day lives of broad cross sections of people around the world, and to serve as a call to action for design practitioners, researchers, and educators to become more closely involved in the planning, development, design and implementation of these types of interfaces.
References
- 2001: A Space Odyssey (Beverly Hills, California, U.S.A. & London, U.K.: Metro-Goldwyn-Mayer, 1968).
- Author(s) Unknown. “Home Invasion: Google, Amazon Patent Filings Reveal Digital Home Assistant Privacy Problems.” Consumer Watchdog, December 2017. Online. Available at: https://www.consumerwatchdog.org/report/home-invasion-google-amazon-patent-filings-reveal-digital-home-assistant-privacy-problems. (Accessed June 1, 2022).
- Auxier, B. “5 Things to Know about Americans and Their Smart Speakers,” Pew Research Center (Pew Research Center, November 21, 2019). Online. Available at: https://www.pewresearch.org/fact-tank/2019/11/21/5-things-to-know-about-americans-and-their-smart-speakers/. (Accessed May 31, 2022).
- Baddeley, A. Working Memory, Thought, and Action. Oxford, U.K.: Oxford UP, 2010. Pgs. 8–9.
- Budiu, R. & Laubheimer, P. “Intelligent Assistants Have Poor Usability: A User Study of Alexa, Google Assistant, and Siri,” Nielsen Norman Group, 2019. Online. Available at: https://www.nngroup.com/articles/intelligent-assistant-usability/. (Accessed June 6, 2022).
- Chouhan, C. et al., “Co-Designing for Community Oversight,” Proceedings of the ACM on Human-Computer Interaction 3, no. CSCW (November 7, 2019): pgs. 1–31. Online. Available at: https://doi.org/10.1145/3359248. (Accessed June 5, 2022).
- Corcoran, M. “People with Speech Disabilities Are Being Left out of the Voice-Assistant Revolution,” Slate Magazine, 16 October, 2018. Online. Available at: https://slate.com/technology/2018/10/voice-assistants-alexa-siri-speech-disabilities-recognition.html. (Accessed June 6, 2022).
- Crawford, K. & Joler, V. “Anatomy of an AI System: The Amazon Echo As An Anatomical Map of Human Labor, Data and Planetary Resources,” AI Now Institute and Share Lab, 7 September 2018). Online. Available at: https://anatomyof.ai. (Accessed June 6, 2022).
- Dasgupta, R. Voice User Interface Design Moving from GUI to Mixed Modal Interaction. Berkeley, California, U.S.A.: Apress, 2018. P. 3.
- Fowler, G. “Perspective | Alexa Has Been Eavesdropping on You This Whole Time,” The Washington Post, May 6, 2019. Online. Available at: https://www.washingtonpost.com/technology/2019/05/06/alexa-has-been-eavesdropping-you-this-whole-time/. (Accessed May 31, 2022).
- Haney, J., Acar, Y., & Furman, S. “It’s the Company, the Government, You and I: User Perceptions of Responsibility for Smart Home Privacy and Security,” in 30th {USENIX} Security Symposium ({USENIX} Security 21). ({USENIX} Association, 2021), pgs. 411-428. Online. Available at: https://www.usenix.org/conference/usenixsecurity21/presentation/haney. (Accessed May 31, 2022).
- Her (Los Angeles, California, U.S.A.: Annapurna Pictures United States, 2013).
- Hoy, M.B. “Alexa, Siri, Cortana, and More: An Introduction to Voice Assistants,” Medical Reference Services Quarterly 37, no. 1 (January 2, 2018): pgs. 81–88. Online. Available at: https://doi.org/10.1080/02763869.2018.1404391. (Accessed June 4, 2022).
- Kropczynski, J. et al. “Towards Building Community Collective Efficacy for Managing Digital Privacy and Security within Older Adult Communities,” Proceedings of the ACM on Human-Computer Interaction 4, no. CSCW3 (January 5, 2021): pgs. 1–27. Online. Available at: https://doi.org/10.1145/3432954. (Accessed June 5, 2022).
- Lau, A. “Life Centered Design – a Paradigm for Engineering in the 21st Century,” Proceedings of the 2004 American Society for Engineering Education Annual Conference & Exposition, June 20, 2004. Online. Available at: https://doi.org/10.18260/1-2—13851. (Accessed June 6, 2022).
- Maheshwari, S. “Hey, Alexa, What Can You Hear? And What Will You Do with It?,” The New York Times, March 31, 2018. Online. Available at: https://www.nytimes.com/2018/03/31/business/media/amazon-google-privacy-digital-assistants.html. (Accessed June 1, 2022).
- Mohebbi, A. “Why Conversational Interfaces Are Taking Us back to the Dark Ages of Usability,” UXCollective: Medium, 22 December, 2019. Online. Available at: https://uxdesign.cc/why-conversational-interfaces-are-taking-us-back-to-the-dark-ages-of-usability-fa45fefb446b. (Accessed June 6, 2022).
- Noble, S.U. Algorithms of Oppression: How Search Engines Reinforce Racism. New York, New York, U.S.A.: New York University Press, 2018.
- Norman, D.A. The Design of Everyday Things. New York, New York, U.S.A.: Basic Books, 2002. Pgs. 54–55.
- O’Neil, C. Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy. New York, New York, U.S.A.: Broadway Books, 2017.
- Perez, C.C. Invisible Women: Data Bias in a World Designed for Men. New York, New York, U.S.A.: Harry N. Abrams, 2020.
- Pfeifle, A. “Alexa, What Should We Do about Privacy? Protecting Privacy for Users of Voice-Activated Devices,” Washington Law Review 93, no. 1 (2018): pgs. 421–58. Online. Available at: https://digitalcommons.law.uw.edu/wlr/vol93/iss1/9/. (Accessed June 5, 2022).
- Schnelle-Walke, D. & Lyardet, F. “Voice User Interface Design Patterns.” Paper presented at Eleventh EuroPLoP (Eleventh European Conference on Pattern Languages of Programs), Irsee, Germany, July 5-9, 2006. Online. Available at: https://www.researchgate.net/publication/221034540_Voice_User_Interface_Design_Patterns. (Accessed June 6, 2022).
- Spatz, J. “Talking to Computers” (Master’s Thesis, 2018). Online. Available at: https://digitalcommons.risd.edu/masterstheses/239. (Accessed June 6, 2022).
- Stephens-Davidowitz, S. & Pinker, S. Everybody Lies: Big Data, New Data, and What the Internet Reveals about Who We Really Are. New York, New York, U.S.A.: Dey St./William Morrow, 2018.
- Sugawara, T., Cyr, B., Rampazzi, S., Genkin, D., & Fu, K. “Light Commands: Laser-Based Audio Injection Attacks on Voice-Controllable Systems,” in 29th {USENIX} Security Symposium ({USENIX} Association, 2020), pgs. 2631–48.
- Vlahos, J. Talk to Me: How Voice Computing Will Transform the Way We Live, Work, and Think. Boston, Massachusetts, U.S.A.: Houghton Mifflin Harcourt, 2019: p. 7.
- Voiceitt. “Home,” n.d. Online. Available at: https://voiceitt.com/. (Accessed June 6, 2022).
- Winner, L. The Whale and the Reactor: A Search for Limits in an Age of High Technology (revised edition). Chicago, Illinois, U.S.A.: University of Chicago Press, 1986 (revised 2020): pgs. 6-15.
- Yeasmin, F. “Privacy Analysis of Voice User Interfaces.” Master of Science Thesis, 2020. Online. Available at: https://trepo.tuni.fi/handle/10024/120072.
- Zhang, G. et al. “DolphinAttack,” Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security - CCS ’17, 2017. Online. Available at: https://doi.org/10.1145/3133956.3134052. (Accessed May 31, 2022).
Biography
Liese Zahabi is an Assistant Professor of Design at the University of New Hampshire. Prior to her current appointment, she held a faculty position in Graphic and Interaction Design at the University of Maryland in College Park, Maryland, U.S.A., and before that, a position in the Visual Communication Design program at Weber State University in Ogden, Utah, U.S.A. She earned a Master of Graphic Design degree from North Carolina State University, and a Bachelor of Fine Arts degree with a concentration in graphic design from Eastern Michigan University. She has been working as a professional graphic designer since 2000, and teaches courses in graphic design, interaction design, motion design and animation, as well as typography, game design, user experience design, and design research. Professor Zahabi’s academic research focuses on search as a cognitive and cultural process, how the design of interfaces can change the experience of digital search tasks, how the nature of searching manifests itself in visual patterns and sense-making, how the digital record influences memory and our understanding of history, and how language and image intersect within the context of the Internet.
- a. Invisible labor refers to work that is unpaid and that, most of the time, goes unnoticed and unacknowledged, and that, for these reasons, tends to be unregulated.