In Chapter 1, we briefly talked about how the greatest challenge after prioritization is that teams often do not have experience with accessibility and assistive technologies.
In this chapter, we will walk through concrete examples of user needs that apply to mobile and web interfaces for different subsets of disabilities. In each case, we will also uncover growth opportunities made possible by accounting for these needs.
User Needs by Type of Disability and Examples of Assistive Technologies
In this section, we will walk through concrete examples of user needs that map roughly to Web Content Accessibility Guidelines (WCAG) levels A and AA. We briefly covered the WCAG in Chapter 1. WCAG is an ever-evolving, comprehensive set of guidelines best referenced at w3.org. We will not cover each guideline of the WCAG here. Instead, we will discuss different types of disabilities and how to think about basic user needs for each. Once you have a foundational understanding of user needs broken down by type of disability, you will be much better equipped to address more complex user needs, for example, for people with overlapping disabilities.
People with Visual Impairments
Visual impairments encompass blindness, partial vision loss, color blindness, and light sensitivity, among other conditions. In the United States alone, 12 million people over the age of 40 have a visual impairment, including 1 million who are blind.1 There are also millions of people who prefer to read larger or smaller text, use alternative color themes like Windows High Contrast Mode or dark mode, or rely on voice commands.
The most common type of assistive technology used by people with complete or partial blindness is the screen reader. Screen readers output the contents of a digital device as speech or braille. The first screen reader was developed in 1986 at IBM.2 Today, most devices come preloaded with a free screen reader. Some of the most popular screen readers are NVDA,3 JAWS,4 VoiceOver (Apple products), and TalkBack (Android).
About 8% of males and 0.5% of females have a condition called “red-green” color vision deficiency,5 which makes the perception of red or green difficult. For people with color blindness or light sensitivity, settings such as dark mode, brightness, or color inversion are other forms of assistive technology.
Below are a few basic guidelines that apply primarily to making products accessible for people with visual impairments:
Alternative Text
In all three of the preceding code samples, we used English descriptions. In applications that serve a global audience, we would replace the English text with a variable that assigns localized text according to a user’s location and language preferences.
Adding a label is not enough to make the experience usable. The label should describe what the component is, its current state (if the element represents a state of the application), and what action a user can take (if applicable). For example, if a checkbox or toggle component is selected or unselected, that state should be read with the description. For a volume slider on an audio component, the percentage or value at the user’s current position should be read with the description.
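To make this concrete, here is a minimal sketch of how the pieces of an accessible description combine into a single announcement. The function name and phrasing are hypothetical; real screen readers compose these utterances themselves from the platform’s accessibility APIs, and the exact wording and ordering differ between VoiceOver, TalkBack, and NVDA.

```python
# Hypothetical sketch: the announcement a screen reader might build from a
# component's label, role, state, and value. Only the principle matters:
# all four pieces of information should be exposed, not just the label.

def announce(label, role, state=None, value=None):
    """Join the pieces of an accessible description into one utterance."""
    parts = [label, role]
    if state is not None:
        parts.append(state)
    if value is not None:
        parts.append(value)
    return ", ".join(parts)

print(announce("Wi-Fi", "toggle", state="on"))           # Wi-Fi, toggle, on
print(announce("Volume", "slider", value="70 percent"))  # Volume, slider, 70 percent
```

A label-only version ("Wi-Fi") tells the user nothing about whether the toggle is on or what interacting with it will do; the role and state carry that context.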
What about user-generated content? There are two complementary approaches:
1. Encourage users to add alternative text at the upload stage or later.
2. Use image recognition to generate captions/alternative text for visual assets (Figure 3-5).
Image Processing and Machine Learning in Action
If the images are baked into the application, adding alternative text is pretty straightforward. If the content is user generated, as on Facebook or Instagram, the product can use image recognition to label images that lack captions, as a fallback to (or an enhancement of) captions supplied by the uploader.
In 2016, Facebook (now Meta) released a feature for users with visual impairments that uses neural networks to recognize faces and objects in images. In 2017, the company launched a feature for everyone which automatically tags people in photos using facial recognition. This technology detects people, objects, scenes, actions, places of interest, and whether an image or video contains objectionable content.7 With over 250 billion photos on the platform,8 you can imagine the enormous amounts of data such a feature generates, as well as its endless applications in search, content moderation, and ads personalization while providing a richer, more powerful experience for users with visual impairments.
Meaningful Sequence, Grouping, and Hierarchy
Screen readers allow navigation by headings, links, paragraphs, and other structural attributes. In order for them to work as expected, these elements must be programmatically identifiable, and in a focus order that is consistent with the intent of the content.
In terms of hierarchy, the page title is first, followed by the section headings “Shop Sale Items” and “Browse by Department,” followed by department cards such as “Office,” “Living,” “Dining,” and “Bath,” and finally the individual items under the sale tab.
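A meaningful hierarchy can also be checked programmatically. The sketch below is a hypothetical helper, not a real tool; it flags places where heading levels skip (say, an h2 followed directly by an h4), which is exactly what breaks heading-based navigation for screen reader users.

```python
# Hypothetical sketch: detect skipped heading levels in a page outline.
# Screen-reader users navigate by headings, so gaps break the page's
# mental model.

def heading_gaps(levels):
    """Return the indices where a heading skips more than one level down."""
    gaps = []
    for i in range(1, len(levels)):
        if levels[i] > levels[i - 1] + 1:
            gaps.append(i)
    return gaps

# Page title (h1), two section headings (h2), department cards (h3):
print(heading_gaps([1, 2, 3, 3, 3, 3, 2, 3]))  # [] -> no skipped levels
print(heading_gaps([1, 3]))                    # [1] -> h1 jumps straight to h3
```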
Not convinced? No worries. Search engines use text, headings,9 titles, alternative text, and page structure to determine search ranking. If your site isn’t taking meaningful hierarchy and headings into account, you won’t be getting many visitors anyway.
Dynamic Sizing
Users can adjust the size of text and other components such as images. The WCAG guideline for text resizing requires that text can be resized up to 200% without the use of assistive technology. This guideline also helps people with motor disabilities, such as tremors, who might have trouble reliably tapping a small target area.
Most dynamic sizing issues can be resolved by not restricting the viewport in both directions, that is, allowing overflowing content to wrap so that it scrolls either horizontally or vertically, but not both. Restricting the flow of content to a single scrolling direction (also known as reflow10) makes it easier for people with vision loss (who need enlarged text) and people with motor impairments to track and easily read the content.
Colors Alone Are Not Used to Convey Meaning
Content and instructions, if conveyed through color, shape, size, or other solely visual means, should have text equivalents or another marker.
Links and Other Actions That Lead to Context Change Have Clear Descriptions
An Auto-playing Video or Audio Should Not Interfere with the Screen Reader
This guideline requires that you provide user controls for auto-playing files if the content plays for more than three seconds, since the sound from the file can interfere with the screen reader.13
Synchronized Video Content Has Audio Descriptions
If there are movements or context changes in a movie scene while there is no sound, audio descriptions are the only way for blind users to follow along with their sighted counterparts.
Beyond Compliance
Aside from the benefits to people with permanent visual impairments, here are a few other use cases and opportunities unlocked by the above guidelines:
Defaulting to no volume regardless of video duration and letting the user decide whether a file should be played with the sound on is a better experience for everyone. Overall, auto-playing videos are quickly going out of style as people are realizing their accessibility pitfalls, as well as their adverse business repercussions.14 Auto-play also increases page load times, adds cognitive load, and introduces unnecessary data usage.
If an application supports voice interaction and audio descriptions, it is safer and more accessible while driving. The same goes for use of voice assistants such as home speakers without screens.
People might just prefer larger or smaller text sizes, regardless of disability. We have all, at some point, zoomed into a block of text on our phones or changed the font size on our Kindle devices.15 Some people prefer smaller text sizes so they can fit more content on a given screen, especially on mobile devices.
Accounting for dynamic views depending on magnification also allows for expansion to other languages that might require different screen space compared to the default.
Additionally, descriptions for images and audio descriptions for videos make content more discoverable given that search engines use alt text along with other metadata to rank and show images in search results.16
People with Hearing Impairment
Hearing impairment, similar to visual impairment, could mean complete loss of hearing, unilateral loss (in one ear), or partial deafness. In the United States alone, over 30 million17 people have hearing impairments, with over 11 million people who are deaf or have serious hearing limitations.18 This excludes millions of people with partial hearing loss or less severe conditions who would benefit from having access to a different modality than sound.
Assistive technology used by people with hearing loss includes hearing aids, captions, and transcripts. Things to keep in mind for making products more accessible to this cohort of users are as follows.
Audio and Video Content Include Captions
Note Audio descriptions (covered in the section on visual impairments), closed captions, and subtitles are sometimes used interchangeably, even though they are meant for different use cases. Audio descriptions include, in addition to the original soundtrack, a description of what is happening visually.19 Closed captions assume that the user cannot hear and include background sounds along with speaker changes and text. Subtitles assume that the user can hear and include only spoken dialogue. A transcript is a text file with the spoken dialogue of the entire video or audio file, separate from the video/audio file itself. It may or may not have speaker labels, time stamps, and audio descriptions.
The Telecommunications Act of 1996 in the United States requires broadcasters, cable companies, and satellite television service providers to provide closed captioning for 100% of all new, nonexempt, English-language video programming.20
Captions or transcripts can be manually generated, for instance, by creating a WebVTT file21 or with ASR (Automatic Speech Recognition). The trade-offs between manual and ASR options are price, speed (turnaround time), and most importantly, accuracy. Depending on the quality of the audio file (background noise, number of speakers, recording equipment, etc.), the accuracy of ASR captions can drop significantly.22 The accuracy of speech-to-text systems is typically measured by WER (Word Error Rate), that is, the percentage of transcription errors a system makes. A word error rate of 5-10% is considered good.23 Acceptable error tolerance will of course depend on the application and the brand using ASR models.
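Word error rate is straightforward to compute: it is the word-level edit distance (substitutions, insertions, and deletions) between the system’s output and a reference transcript, divided by the number of words in the reference. A minimal sketch:

```python
# Word error rate via word-level Levenshtein distance. Production systems
# typically normalize casing and punctuation before comparing.

def wer(reference, hypothesis):
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of five -> 20% WER, well above the 5-10% bar:
print(wer("the market closed higher today",
          "the market closed lower today"))  # 0.2
```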
Apart from availability of captions, giving users the option to adjust the size of captions ensures that people with overlapping hearing impairments and low vision can access the content. If the design and implementation account for different text size preferences, presenting captions in other languages that might take more screen space comes at no additional cost.
Sound Cues Alone Are Not Used to Convey Meaning
Visual or text alternatives for sound-based information should be present. For example, while playing an online game, if a user enters an erroneous command, a beep lets the user know they made an error. In this case, the game should also present a visual or text cue. On phones, haptic feedback is sometimes used to reinforce error messaging.
Beyond Compliance
The use cases for captions go well beyond accessibility. One video publisher with over 150 million users on Facebook reported that 85% of videos were watched without sound.24 Captions also allow for search, intelligent segmentation, translation (and therefore broader reach), and smart features on top of media files that would not otherwise be possible. Potential use cases of transcripts and captions are discussed in further detail in the case study below.
Case Study: Transcripts and Captions
A few years ago, when I worked at Yahoo Finance, our team collaborated with the accessibility team on a project called “Live tickers.” Closed captions on all prerecorded financial markets news already met the compliance requirement. We decided to go one step further.
In this project, we combined captions, machine learning, and front-end design to show the live stock price of companies that a reporter was talking about in real time as part of the video experience.
News providers in the financial markets sector already have this feature, as you might have seen on TV, but it requires someone to manually identify and post this information. The automated solution is far more scalable and customizable. For example, it can pull in stock prices from when the news came out and show the current price for comparison if the video is being watched later.
This was an example of assistive technology powering a richer experience for everyone. Another benefit of transcripts and captions is SEO or Search Engine Optimization. Discovery Digital Networks (DDN) performed an experiment on their YouTube channel, comparing videos with and without closed captions. They found that captioned videos enjoyed 7.32% more views on average.25 Captions and transcripts also give way to creating subtitles that can open up content to entirely new demographics.
Video platforms increasingly let viewers jump to labeled sections of a video; currently, the uploader decides where these breakpoints are. Captions combined with natural language processing might enable this feature by default, without manual segmentation, and even personalize sections for individual watchers.
Live captions also power real-time transcription and auto-generated meeting notes, so people can focus on the conversation during meetings.
People with Cognitive Impairment
Cognitive impairment is the second most common form of disability, after motor impairments. According to the CDC, over 16 million people in the United States live with cognitive impairments. Some examples are autism, attention deficit disorders, dyslexia, dyscalculia, and memory loss.
There is a lot of overlap between guidelines for cognitive impairments and other types of disabilities. At the same time, some guidelines can present conflicts, that is, they help one group of users while making an experience more challenging for another group.
One example of this is closed captions on videos. While they help users with hearing loss, some users with ADHD may have difficulty focusing on the video while there is text on the screen.26 For users with dyslexia who have difficulty reading, additional text on the screen might also cause anxiety. In Chapter 5, we will talk about personalization so users have the ability to choose fonts and other parts of the experience they feel most comfortable with.
Content Is Organized, Digestible, and Consistent
Adding features to an application is the easiest thing for teams to do. The less intuitive thing to do is to constantly evaluate the cognitive complexity of functionality on a given app or screen. This not only makes applications more accessible for people with cognitive impairments but also more intuitive for everyone, especially on mobile, where the number of UI components and actions a user can take on a given screen is limited to begin with.
Timeouts or Limits on Interactions Are Adjustable
If an interaction must be completed within a certain time frame, the user should be given enough warning (20 seconds according to WCAG guidelines) before expiration, or the ability to stop or adjust the timing. This especially applies to forms and other input activities, where users can lose data if they are unable to complete a time-based task. These controls give users with reading or learning disabilities, as well as those with less experience with technology products, sufficient time to complete tasks.
The exception to this rule is situations where real-time interaction is required, for example, in real-time auctions.
Animations, Complex Language, and Auto-updating Content Can Be Turned Off or Paused
For mobile developers, using the operating system’s standard APIs for animations is the easiest way to ensure that user preferences on disabling or slow animation are honored.
Instructions and Errors on Forms Are Presented as Text, and in Context
For example, if a user needs to input a date in MM-DD-YYYY format, that format should be part of the instructions next to the field. The user is then less likely to fill out an entire form only to come back, at the submission stage, to a list of errors that could have been avoided with clear instructions.
Below we have two sets of images that show common pitfalls in online forms and side-by-side comparisons with remediation for those pitfalls.
Next to the first image is an example with the error message appearing below the field in question, which updates dynamically as the user enters new information, instead of waiting for form submission.
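Inline, in-context validation like this can be sketched as a function that returns a field-level message (or nothing) as the user types. The helper name and messages below are hypothetical:

```python
# Hypothetical sketch: validate a date field as the user types, returning
# an in-context error message tied to the field rather than a list of
# errors at submission time.

import re

def validate_date(value):
    """Return None if valid, or an error message to show below the field."""
    if not re.fullmatch(r"\d{2}-\d{2}-\d{4}", value):
        return "Enter the date as MM-DD-YYYY, for example 03-14-2024."
    month, day = int(value[:2]), int(value[3:5])
    if not 1 <= month <= 12:
        return "The month must be between 01 and 12."
    if not 1 <= day <= 31:
        return "The day must be between 01 and 31."
    return None

print(validate_date("3/14/2024"))   # format error, shown immediately
print(validate_date("13-01-2024"))  # month error
print(validate_date("03-14-2024"))  # None -> the field is valid
```

Pairing a function like this with the field’s change event (and exposing the message programmatically, so screen readers announce it) covers both the sighted and non-sighted versions of the experience.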
Captcha and Other Authentication Methods Have Alternatives
This overlaps with user needs for people with visual impairments. Audio Captcha, two-factor authentication, or alternate ways of verifying that a user is human must be available, so users are not blocked from their online accounts, without compromising security.28 The first iteration of reCAPTCHA was rated among the most inaccessible components on the web; it now supports audio alternatives.29 The W3C publishes a detailed report on the accessibility challenges and proposed solutions on this topic, which include third-party authenticators such as password managers that verify humanity noninteractively.30
Focus Indicator Is Visible
This coincides with user needs for people with low vision and motor impairments who are using screen readers and keyboard navigation. A focus indicator helps the user know which component (a button, text, etc.) on the screen they are interacting with at all times. Browsers and mobile operating systems have built-in focus indicators that meet WCAG requirements if standardized UI components are used. The requirements center around two aspects, size and color contrast, so that the item in focus is distinguishable enough from items not in focus. The current guideline for the WCAG AA standard is a 3:1 contrast ratio between the focused and unfocused states and between the component and its adjacent UI elements, as well as a one-pixel outline or four-pixel shape around the component.31
It is usually best not to tamper with default focus indicators but if you are using custom views or focus indicators, make sure to review these requirements. This can be particularly challenging with dynamic or user-generated content, where for example, the colors of uploaded images are unknown. One way to tackle this is to leave some space between the UI component and the focus indicator, so you are working with a known background color.
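The contrast half of the requirement can be verified in code with WCAG’s published relative luminance formula for sRGB colors. A minimal sketch:

```python
# WCAG contrast check. Relative luminance and the (L1 + 0.05)/(L2 + 0.05)
# ratio follow the formulas published in the WCAG specification.

def relative_luminance(rgb):
    """WCAG relative luminance of an sRGB color given as (r, g, b) in 0-255."""
    def channel(c):
        c /= 255
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (channel(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(color1, color2):
    l1, l2 = relative_luminance(color1), relative_luminance(color2)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

# Black on white is the maximum possible ratio:
print(round(contrast_ratio((0, 0, 0), (255, 255, 255)), 1))  # 21.0
# A focus ring must reach at least 3:1 against adjacent colors:
print(contrast_ratio((0, 90, 200), (255, 255, 255)) >= 3)    # True
```

The same two functions cover the 4.5:1 body-text requirement; only the threshold changes.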
Beyond Compliance
Making content easily readable, less chaotic, and more understandable is simply good design and engineering practice. Simplifying user flows and removing unnecessary features is not only good for engagement and reach (and results in a smaller application or web page), but also makes the product easier for teams to maintain.
People with Speech Impairment
Voice as a mode of interaction with our devices is growing significantly with developments in the IoT (Internet of Things) space with devices such as voice assistants. According to Google, 27% of the global population is already using voice search32 on mobile. Voice-activated applications are vastly more accessible for people with visual and fine-motor impairments.
We need to make sure that we don’t leave out other subsets of the population, such as people with speech and hearing impairments.
According to NIH33 data, about 7.5 million people in the United States have trouble using their voice. For applications that move toward a voice-first approach, it will be important to still maintain alternate modalities of interaction that are visual or text-based.
Another reason to consider alternate modalities is that even within the voice domain, there is so much diversity in speech patterns and accents, even among able-bodied users, that providing alternatives is the only way to ensure that all users have access. ASR (Automatic Speech Recognition) accuracy is as high as 95% for native English speakers but significantly lower for non-native speakers. For less common speech patterns, accents, or languages, that number can drop dramatically. This is mainly because the data used to train the machine learning models for these situations is limited. Siri, the voice assistant on Apple devices, famously had trouble understanding accents34 when it first launched. Most voice assistants, including Siri,35 now allow users to pick their accents and languages and train the assistant to understand unique speech patterns.
Primarily Voice Input Applications Provide Alternative Ways of Interaction
The voice use case is not usually covered in guidelines due to its recent rise in popularity and the fact that most actions that can be accomplished with voice also have alternate equivalents (keyboard or touch through visual interfaces).
Google’s project Euphonia36 is a great example of an artificial intelligence application in making speech technology more accessible to people with conditions such as ALS (amyotrophic lateral sclerosis) and Down syndrome. The technology trains on people’s natural intonations and speech patterns during the early stages of their condition and helps them communicate with their own voice after they lose speech. It has also been used to trigger smart home devices using nonspeech sounds and, in the case of one user with ALS, to cheer during a sports game using facial gestures.37 According to a 2021 study,38 the technology already outperforms human transcribers, especially for the most severely affected people.
Beyond Compliance
Text alternatives to voice input can also provide a fallback for when the voice application doesn’t have sufficient data to understand uncommon accents and speech patterns.
Voice command features on mobile sometimes rely on accessibility labels, demonstrating yet another case where solving for one use case opens the door to several others.
People with Mobility Impairment
About 17% of the US population has some form of mobility impairment.39 Examples include muscular dystrophy, multiple sclerosis, ALS, Parkinson’s disease, and essential tremor.
These conditions can make using touch-screen interfaces, keyboards, and mice difficult. Assistive technologies for users with dexterity limitations include sip and puff sticks, single-switch access, adaptive keyboards, and eye or head-tracking software. Now let’s talk about guidelines that can make applications inclusive for these users:
All Content Is Accessible Through a Keyboard
This especially applies to mobile and touchscreen devices, to which users attach external keyboards or custom assistive devices. The most rudimentary way to check whether an application is largely keyboard accessible is to pair a keyboard with the device and navigate through the content (using the “Tab” key). This also ties in with the guideline on meaningful grouping and focus order we covered in the section on visual impairments and screen readers, where users can navigate by headings instead of tabbing through every element on a page. Additionally, for speech input users, as well as keyboard users who might not be able to type accurately, it should be possible to turn off or remap character key shortcuts.
All actions should be available without requiring specific timings for individual keystrokes. This is important for users with specialized or adapted input devices such as a head pointer, eye-gaze system, or speech-controlled mouse emulator, which can make certain gestures cumbersome, error-prone, or outright impossible.40 One example is a sortable list that requires a user to drag and drop items precisely into the position they want.
While drag and drop is a great interactive way to place elements where a user wants them, developers must also provide a way for users to reorder elements individually. For example, by entering the item’s place in the list, or moving an item up and down, one at a time.
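A single-step reorder alternative is simple to implement. The sketch below is hypothetical, but it captures the pattern: a “move up” or “move down” control that shifts one item one position per activation, which any keyboard or switch device can drive.

```python
# Hypothetical sketch: a single-step "move up / move down" alternative to
# drag and drop, so keyboard and switch users can reorder a list one
# position at a time.

def move(items, index, direction):
    """Return a new list with items[index] moved one step up (-1) or down (+1)."""
    target = index + direction
    if not (0 <= target < len(items)):
        return items  # already at the edge; nothing to do
    reordered = list(items)
    reordered[index], reordered[target] = reordered[target], reordered[index]
    return reordered

playlist = ["intro", "news", "weather"]
print(move(playlist, 2, -1))  # ['intro', 'weather', 'news']
print(move(playlist, 0, -1))  # unchanged: 'intro' is already first
```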
Motion Actuation, Pointer Focus, or Activation Is Reversible
A user is able to remove focus from an element that can receive keyboard focus. The same applies to the activation of functions upon clicking or by certain movements of the device. For example, some apps allow activation of functions when a user shakes the device. This might happen unintentionally for users with motor impairments, or for anyone, and the user should be able to either turn off the functionality or be able to reverse it when it happens. Equally, these applications can’t rely on the idea that a user will be able to shake the device at all, as the device may be mounted to a wheelchair or other assistive device.
Beyond Compliance
Providing keyboard access goes hand in hand with screen-reader access and focus order. Elements that are accessible by keyboard are also accessible to screen readers.
Everyone
Here are the guidelines that apply to everyone, whether or not they use assistive technology:
Avoid Using Jargon or Unnecessarily Complicated Language
There are tests such as the Flesch-Kincaid reading test,41 that you can run through text to ensure that the content isn’t too difficult to read for your target audience. This is particularly important for marketing material or user onboarding messaging.
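The Flesch reading ease score itself is easy to compute from word, sentence, and syllable counts. The sketch below uses a crude vowel-group heuristic for syllables (real implementations use pronunciation dictionaries), so treat the exact numbers as approximate:

```python
# A rough sketch of the Flesch reading ease score:
#   206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
# Higher scores mean easier text (60-70 reads as plain English).

import re

def count_syllables(word):
    # Crude heuristic: count groups of consecutive vowels.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_reading_ease(text):
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / sentences)
            - 84.6 * (syllables / len(words)))

simple = "We ship your order fast. You can track it online."
jargon = "Expedited fulfillment optimization facilitates logistical transparency."
print(flesch_reading_ease(simple) > flesch_reading_ease(jargon))  # True
```

Running a check like this over marketing copy or onboarding text before it ships is a cheap way to catch jargon creep.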
Adhere to Global Settings
We have touched on this in the previous sections on text sizes, animation settings, color inversion, and dark mode. If users have accessibility settings enabled on their device, developers must honor those settings within the application’s context. Similarly, if the user has chosen an in-app custom accessibility setting, it is a better user experience to persist (save) those settings between sessions, so the user does not have to reconfigure them every time.
Allow Users to Provide Feedback
Summary
The most common type of assistive technology used by people with complete or partial blindness is screen readers. For people with color blindness or light sensitivity, settings such as dark mode, brightness, or color inversion are other forms of assistive technology.
Screen readers use the alternative text provided by the website or application to read to the user when they focus on a component.
If the content is user generated, the product can encourage content creators to add description text as part of the captions section at the upload stage, or use image recognition to provide labels for images that don’t have captions.
Links should also be visually distinguishable from regular text, with text that describes where the link leads.
Assistive technology used by people with hearing loss includes hearing aids, captions, and transcripts.
Captions also allow for search, intelligent segmentation, translation (and therefore broader reach), and smart features on top of media files that would not otherwise be possible.
Some guidelines for cognitive impairments can present conflicts. For example, while captions help users with hearing loss, some users with ADHD may have difficulty focusing on a video while there is text on the screen.
It is usually best not to tamper with default focus indicators but if you are using custom views or focus indicators, make sure to review WCAG requirements.
Text alternatives to voice input can provide a fallback for when the voice application doesn’t have sufficient data to understand uncommon accents and speech patterns.
Providing keyboard access goes hand in hand with screen-reader access and focus order. Elements that are accessible by keyboard are also accessible to screen readers.
A broader definition of accessibility would include avoiding complicated language, adhering to global settings, and allowing users to provide feedback.