What exactly is the accessibility API?
I notice that a lot of content introducing the accessibility API (Application Programming Interface) only talks about how it might be used by screen readers. I'm guessing this is one of the reasons that people don't ever consider voice control or other assistive technologies when doing accessibility testing, and why people add questionable labels to certain controls.
I wrote this blog post to try to give an introduction to the accessibility API that doesn't only consider screen readers, and that introduces what kinds of things a browser considers when generating an accessibility tree for the accessibility API. I hope to share what I learned over the past year of doing open source work and digging into native accessibility code.
Foreword
All the content in this blog post assumes knowledge of the basics of HTML and CSS. I will link to C++ or Python source code that you can explore if you like, but it is by no means required for understanding this topic at a high level.
While any browser source code I link to will be biased toward Chromium, any browser behavior I mention below should hold for the following major browsers, unless explicitly mentioned otherwise:
- Chromium (Version 116)
- Firefox (Version 116)
- Safari (Version 16.6)
Finally, all links to source code, whether to browsers, assistive technology, or otherwise, are locked to a specific revision: their latest revision as of the time of this writing. I do this so future readers won't experience the phenomenon where I point to a line of code that no longer does what I describe.
What is assistive technology?
Before introducing the concept of an accessibility API, it's important to think about what kind of technology we're trying to support.
Assistive technology is any piece of technology that disabled people use to improve their quality of life. This can include things like:
- Wheelchairs
- Refreshable braille displays
- Magnification software
- Screen readers
- Voice recognition software
For the purpose of this post, let's focus on the last two bullet points in more detail.
What is a screen reader?
Wikipedia gives a decent summary:
A screen reader is a form of assistive technology (AT) that renders text and image content as speech or braille output. Screen readers are essential to people who are blind, and are useful to people who are visually impaired, illiterate, or have a learning disability.
- Wikipedia's Screen Reader Article
There are different popular screen readers depending on the platform you are on.
- JAWS and NVDA on Windows
- VoiceOver on Mac and iOS
- Orca on Linux
- TalkBack on Android
- ChromeVox on ChromeOS
Regardless of platform, screen readers are able to do things like:
- Announce what control is currently focused
- Announce any relevant state information of the control (Is it checked? Is it expanded?)
- Announce any other visually important information that a screen reader user should know
Screen readers are highly customizable. For example, screen reader users can control how fast or slow text is announced, or filter out specific bits of information they might not care about.
What is voice recognition software?
Voice recognition software is a form of assistive technology that allows a user to interact with their machine through voice commands. For example, if there is a link called "See my projects", a voice recognition user should be able to say something like "Click see my projects", and the software will programmatically click the link. The user can also dictate text into an editable text area, and the voice recognition software will automatically enter that text for them.
Some examples of voice recognition software:
- Windows Speech Recognition and Dragon Speech Recognition on Windows
- Voice Control on Mac and iOS
- Voice Access on Android
[Video: a demo of Dragon by Nuance Communications]
What information does assistive technology need to gather?
Suppose that we are developers for screen readers or voice recognition software. What kind of information would we need from any application that wants to support us?
Screen readers would need a way to do the following for any application:
- Programmatically access all user interface (UI) elements.
- Query for the name of a UI element: if focused, what should I announce?
- Query for what kind of UI element something is: Is it a button? Is it a link?
- Query for any state of those UI elements: Is it checked? Is it pressed?
- Programmatically ask the application to activate an element that might not be currently focused: for instance, to programmatically activate a control under a virtual cursor.
Voice recognition software would need a way to do the following for any application:
- Programmatically access all interactive UI elements of an application.
- Query for the name of a UI element: is my user trying to activate this element?
- Programmatically ask the application to activate an element for us: for instance, if the user commands us to activate a button that isn't currently focused.
- Programmatically insert text somewhere in the application if the user starts dictating.
These are very similar asks. We are asking for an API that lets us programmatically read and interact with an application, and that is precisely what an accessibility API is.
What is the accessibility API?
As mentioned above, an accessibility API allows a consumer to do two main things:
- Programmatically determine what is in the UI
- Programmatically interact with the UI
In the accessibility API, the UI of an application is exposed as something called the accessibility tree, with each node in the tree being some individual unit in the UI. Depending on the UI element a node represents, a consumer can query for the state of a node, or ask to programmatically take some kind of action on a node.
Depending on the operating system, the accessibility API can have different implementation details, but the general idea remains the same. The average web developer does not have to concern themselves with this.
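To make the two halves (reading and interacting) a bit more concrete, here is a minimal sketch of a platform-neutral accessibility tree with two kinds of consumers. Everything in it is hypothetical: `AXNode`, `walk`, `find_by_name`, and `activate` are illustrative names, not part of UIAutomation, IAccessible2, AT-SPI, or the macOS accessibility API, which each expose these ideas through their own interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Iterator, Optional


@dataclass
class AXNode:
    """One node in a hypothetical, platform-neutral accessibility tree."""
    role: str
    name: str = ""
    states: dict[str, bool] = field(default_factory=dict)
    children: list[AXNode] = field(default_factory=list)
    # Called when an assistive technology asks the application to "click" this node.
    default_action: Optional[Callable[[], None]] = None


def walk(node: AXNode) -> Iterator[AXNode]:
    """Read side: visit every node depth-first, as a screen reader might."""
    yield node
    for child in node.children:
        yield from walk(child)


def find_by_name(root: AXNode, spoken: str) -> Optional[AXNode]:
    """Voice-control style lookup: match a spoken phrase against actionable nodes' names."""
    target = spoken.strip().lower()
    for node in walk(root):
        if node.default_action is not None and node.name.lower() == target:
            return node
    return None


def activate(node: AXNode) -> bool:
    """Interact side: ask the application to perform the node's default action."""
    if node.default_action is None:
        return False
    node.default_action()
    return True


# Usage: a voice control user says "Click see my projects".
clicked: list[str] = []
link = AXNode("link", "See my projects", default_action=lambda: clicked.append("projects"))
root = AXNode("root", children=[AXNode("heading", "Hello World"), link])
match = find_by_name(root, "see my projects")
if match is not None:
    activate(match)
print(clicked)  # ['projects']
```

The same tree serves both consumers: a screen reader mostly uses the read side, while voice control leans on name lookup plus the action.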
What does an accessibility tree look like?
An accessibility tree is a normal tree data structure that you might explore in your computer science class. For example, in the context of the browser, suppose we have some HTML like the following:
<header>...</header>
<main>
  <h1>Hello World</h1>
  <img
    src="..."
    alt="..."
  />
  <p>
    This is a paragraph.
    <span> This is more text. </span>
  </p>
</main>
<footer>...</footer>
The browser might generate an accessibility tree similar to the following for this HTML:
- root
  - header
  - main
    - heading
    - image
    - paragraph
  - footer
Each of these nodes may have a plethora of accessibility-related information, but we can get to that later. Notice that the accessibility tree also isn't necessarily one-to-one with the DOM tree generated from the HTML. For example, it might not be justified to give that <span> element its own accessibility node, and so it's just absorbed into the parent paragraph node instead.
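Here is a minimal sketch of how a DOM tree might be collapsed into an accessibility tree like the one above. It assumes a hypothetical `Element` class and a tiny tag-to-role table that reuses the informal role names from the tree above; real browsers use the HTML-AAM mappings and far more rules. In this sketch, any tag without a role (such as <span>) simply gets no node, and its text is absorbed by the nearest ancestor node.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Element:
    """Hypothetical, minimal DOM element: a tag plus child elements or text."""
    tag: str
    children: list[Union[Element, str]] = field(default_factory=list)


@dataclass
class AXNode:
    role: str
    text: str = ""
    children: list[AXNode] = field(default_factory=list)


# Toy mapping that reuses the informal role names from the tree above.
ROLE_FOR_TAG = {"header": "header", "main": "main", "footer": "footer",
                "h1": "heading", "img": "image", "p": "paragraph"}


def build(element: Element, parent: AXNode) -> None:
    role = ROLE_FOR_TAG.get(element.tag)
    if role is None:
        target = parent  # No node of its own: its content is absorbed by the parent.
    else:
        target = AXNode(role)
        parent.children.append(target)
    for child in element.children:
        if isinstance(child, str):
            words = " ".join(child.split())  # Collapse whitespace like rendered text.
            if words:
                target.text = f"{target.text} {words}".strip()
        else:
            build(child, target)


def to_ax_tree(elements: list[Element]) -> AXNode:
    root = AXNode("root")
    for element in elements:
        build(element, root)
    return root


def dump(node: AXNode, depth: int = 0) -> list[str]:
    lines = ["  " * depth + node.role]
    for child in node.children:
        lines.extend(dump(child, depth + 1))
    return lines


# The HTML from above, with the image attributes omitted.
dom = [
    Element("header"),
    Element("main", [
        Element("h1", ["Hello World"]),
        Element("img"),
        Element("p", ["This is a paragraph.", Element("span", [" This is more text. "])]),
    ]),
    Element("footer"),
]
tree = to_ax_tree(dom)
print("\n".join(dump(tree)))
```

The printed outline matches the tree above, and the paragraph node ends up holding the <span>'s text as well as its own.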
From the above example, let's zoom in to the image node specifically. What kind of information might it have?
image
- Name
  - "Some alt text"
- Role
  - Image
- Name: The node will need an accessible name if the image is not decorative. This can let screen readers know what to read when encountering this image.
- Role: The node will need to have some kind of attribute marking what type of UI element it is. This lets assistive technologies know what kinds of actions they can take on this node, as well as what kind of information they can query on it. In this case, the node should be an image node.
If we instead take something stateful, like a checkbox:
<label for="check">Stay logged in</label>
<!-- Add any other relevant attributes to the input below -->
<input
  type="checkbox"
  id="check"
  checked
/>
The accessibility node for the checkbox might look something like this:
checkbox
- Name
  - "Stay logged in"
- Role
  - Checkbox
- State
  - IsChecked: true
- Name: Once again, the name lets the screen reader know what to announce when the checkbox is focused, and lets voice recognition software know what to click when the user says this checkbox's name.
- Role: Because of the role, assistive technology knows that it can query for the "checked" state of this node. It also knows that it can ask the browser to programmatically click the checkbox.
- State: Allows assistive technology to query whether this checkbox is checked or not. For example, screen readers might use this to decide what to announce.
This is the common trio of name, role, and value (as in WCAG success criterion 4.1.2, Name, Role, Value), which is important for making sure your website is accessible.
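As a rough illustration of how a screen reader might combine name, role, and state, here is a sketch of a focus announcement. The `announce` function and its output format are hypothetical; real screen readers word, order, and filter this information differently, and users can customize much of it.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class AXNode:
    name: str
    role: str
    states: Dict[str, bool] = field(default_factory=dict)


def announce(node: AXNode) -> str:
    """Build a hypothetical focus announcement from name, role, and state."""
    parts = [node.name, node.role]
    if "checked" in node.states:
        parts.append("checked" if node.states["checked"] else "not checked")
    if "expanded" in node.states:
        parts.append("expanded" if node.states["expanded"] else "collapsed")
    return ", ".join(part for part in parts if part)


checkbox = AXNode("Stay logged in", "checkbox", {"checked": True})
print(announce(checkbox))  # Stay logged in, checkbox, checked
```

If any of the three pieces is missing or wrong in the tree, there is nothing correct for the screen reader to say.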
As far as web developers are concerned, the name, role, value, and state of a node are the main parts of the accessibility tree that you need to worry about. However, there is a lot of other information calculated in the accessibility tree that you might not be aware of. For example, if we just restrict our attention to the UIAutomation accessibility API:
- Bounding boxes: Where is the element located on the screen?
- Locale: What language should I be interpreting this object with?
- FrameworkId: What is the source of this accessibility tree? Is this an accessibility tree coming from Chrome? Is this an accessibility tree coming from Microsoft PowerPoint?
Moving our attention to another accessibility API, IAccessible2, gives us more interesting information to look at. Both examples below are custom properties that don't seem to be formally part of the IAccessible2 spec:
- LayoutTable: For the purpose of semantics, is this a real or fake table?
- CSS Display: What is the CSS display for this node?
There is a lot of other information that I am leaving out. While this information isn't the most important to know (although it is good practice to specify locale with the lang attribute), it is useful to know that the accessibility tree can't be generated just by scanning the DOM: we need information from CSS as well.
Regarding CSS Display in an accessibility node
Finding this property surprised me. I double checked the source of this data in Chromium, and it does appear to be the computed CSS style for the display property.
I tried searching for similar logic in Firefox, but wasn't able to find anything interesting besides this comment.
I don't expect Safari to calculate this, since it seems to be logic specific to IAccessible2 and AT-SPI. As far as I can tell, neither of those accessibility APIs is used in Mac products.
What is this ever used for? No idea. I naively searched for it in the NVDA and Orca codebases, and found a few hits.
How can I view the accessibility tree?
You have two choices here, depending on the amount of detail that you need.
The first choice is to use the browser's built-in accessibility tree viewer. You don't see the exact information that is given to assistive technology, as the information needs to be expressed in terms of the platform accessibility APIs, but it's extremely rare that you need that level of detail.
Regarding Chromium's accessibility tree inspector
The accessibility tree you see in the Chromium developer tools is what is internally known as the Blink tree. In other words, it isn't some special intermediate representation built just for the developer tools; for all intents and purposes, it is a peek into the exact data that gets translated into the platform APIs.
This logic is handled in InspectorAccessibilityAgent.
The second choice is to use an external tool that can display the native accessibility tree to you. You have different tools depending on your platform.
- Dump Tree Utility
- Accessibility Insights for Windows (UIAutomation only)
- Accessibility Inspector on macOS
- Accerciser on Linux (AT-SPI)
These tools are a good way to play around with the native accessibility APIs if you want to take the time. I think these tools are rarely useful for web development, though.
What technologies use the accessibility API?
Remember that an accessibility API allows you to do two things:
- It allows you to programmatically read off the UI of an application through an exposed accessibility tree.
- It allows you to programmatically interact with the UI of an application through nodes on the exposed accessibility tree.
As previously discussed, this allows screen readers to know what to read out when interacting with a page, and allows voice recognition software to know how to respond to specific voice commands from the user. However, there are lots of other assistive and non-assistive technologies that use the accessibility API.
To find out which applications use the accessibility API, we can use the about://histograms page in a Chromium-based browser to determine whether the browser is calculating accessibility information (for performance reasons, the browser doesn't kick off accessibility-related code until it has to).
As a decent heuristic, if the histogram for HandleAXEvents has more than one data sample, we know that accessibility must have been turned on at some point while the Chromium browser was running. In other words, some application has been making calls to the accessibility API. You can use this heuristic on any platform Chromium runs on.
We can do better on the Windows platform, though. On Windows, Chromium has a WinAPIs histogram that lets us know how many times specific Windows accessibility APIs were called. The values of the histogram correspond with this collection of enums. So not only do we know if an application is using the accessibility API or not, but we also know precisely what kind of information it is asking for. While not all accessibility API calls are logged, this should still give us some interesting information!
Onscreen Keyboard
I was very surprised to see that the onscreen keyboard was making accessibility API calls to Chromium. I launched the onscreen keyboard, typed some random gibberish into Google (the string "fdtgdgejcd/cgj", to be precise), and got the following results:
- UMA_API_GET_ACC_FOCUS: 6 hits
- UMA_API_GET_ACC_PARENT: 10 hits
- UMA_API_GET_ACC_STATE: 7 hits
- UMA_API_GET_UNIQUE_ID: 8 hits
- UMA_API_GET_WINDOW_HANDLE: 8 hits
- UMA_API_IA2_GET_ATTRIBUTES: 1 hit
- UMA_API_QUERY_SERVICE: 108 hits
- UMA_API_ROLE: 4 hits
Windows Magnifier
Magnification software also makes good use of accessibility APIs. In this workflow, I turned on the Windows Magnifier and then opened the WCAG 2.1 guidelines in Chromium. I then zoomed in to 200%, then zoomed back out to 100%. Here are the results:
- UMA_API_ACC_LOCATION: 18 hits
- UMA_API_GET_ACC_CHILD: 41 hits
- UMA_API_GET_ACC_FOCUS: 37 hits
- UMA_API_GET_ACC_NAME: 2 hits
- UMA_API_GET_ACC_PARENT: 112 hits
- UMA_API_GET_ACC_ROLE: 25 hits
- UMA_API_GET_ACC_STATE: 101 hits
- UMA_API_GET_UNIQUE_ID: 186 hits
- UMA_API_GET_WINDOW_HANDLE: 23 hits
- UMA_API_IA2_GET_ATTRIBUTES: 29 hits
- UMA_API_QUERY_SERVICE: 708 hits
- UMA_API_ROLE: 61 hits
Power Automate
Power Automate is a tool that automates tasks in a UI. I set up a simple workflow that launches an instance of Chrome, navigates to a Google search results page for "test", and then clicks on the first link in the results. We get the following hits:
- UMA_API_GET_ACC_PARENT: 19 hits
- UMA_API_GET_ACC_STATE: 1 hit
- UMA_API_GET_UNIQUE_ID: 4 hits
- UMA_API_IA2_GET_ATTRIBUTES: 1 hit
- UMA_API_QUERY_SERVICE: 82 hits
- UMA_API_ROLE: 1 hit
Grammarly
Grammarly is a tool that attempts to improve your writing skills. The native application has a natural use case for the accessibility API: it needs to get text from an editable text area in some application, then programmatically insert different text back if the user accepts one of Grammarly's suggestions.
Note that in this test I explicitly used the native application version of Grammarly, not the browser extension. The browser extension will not need to use accessibility APIs since it can simply scrape the DOM.
In this workflow, I opened MDN's textarea article and typed the word "salaid" into the provided <textarea> element. I then waited for Grammarly to suggest the correction "salad", and accepted the suggestion. We get the following hits:
- UMA_API_ACC_LOCATION: 306 hits
- UMA_API_ADD_SELECTION: 1 hit
- UMA_API_GET_ACC_CHILD: 134 hits
- UMA_API_GET_ACC_CHILD_COUNT: 10 hits
- UMA_API_GET_ACC_NAME: 157 hits
- UMA_API_GET_ACC_PARENT: 871 hits
- UMA_API_GET_ACC_ROLE: 20 hits
- UMA_API_GET_ACC_STATE: 325 hits
- UMA_API_GET_ACC_VALUE: 50 hits
- UMA_API_GET_CARET_OFFSET: 12 hits
- UMA_API_GET_CHARACTER_EXTENTS: 64 hits
- UMA_API_GET_INDEX_IN_PARENT: 8 hits
- UMA_API_GET_N_SELECTIONS: 14 hits
- UMA_API_GET_SELECTION: 3 hits
- UMA_API_GET_STATES: 19 hits
- UMA_API_GET_TEXT: 34 hits
- UMA_API_GET_TOOLKIT_NAME: 22 hits
- UMA_API_GET_UNIQUE_ID: 408 hits
- UMA_API_GET_WINDOW_HANDLE: 142 hits
- UMA_API_IA2_GET_ATTRIBUTES: 364 hits
- UMA_API_QUERY_SERVICE: 6411 hits
- UMA_API_ROLE: 277 hits
- UMA_API_SET_SELECTION: 1 hit
On Grammarly's accessibility API usage
These results really surprised me: I didn't expect Grammarly to make such heavy use of the accessibility APIs. I even ran this workflow 3 times to make sure I wasn't missing anything, and confirmed that the WinAPIs histogram had zero entries before I launched Grammarly.
I wonder how many of these calls are redundant, although it's hard for me to say without knowing the reasons behind all of them.
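To compare the workflows above at a glance, the hit counts can be totalled per tool. The numbers below are copied from the results above (with the UMA_API_ prefix dropped), and `summarize` is just an illustrative helper. Since not every accessibility API call is logged, these totals undercount the real traffic.

```python
from typing import Dict, Tuple

# Hit counts copied from the WinAPIs results above, with the UMA_API_ prefix dropped.
RESULTS: Dict[str, Dict[str, int]] = {
    "Onscreen Keyboard": {
        "GET_ACC_FOCUS": 6, "GET_ACC_PARENT": 10, "GET_ACC_STATE": 7, "GET_UNIQUE_ID": 8,
        "GET_WINDOW_HANDLE": 8, "IA2_GET_ATTRIBUTES": 1, "QUERY_SERVICE": 108, "ROLE": 4,
    },
    "Windows Magnifier": {
        "ACC_LOCATION": 18, "GET_ACC_CHILD": 41, "GET_ACC_FOCUS": 37, "GET_ACC_NAME": 2,
        "GET_ACC_PARENT": 112, "GET_ACC_ROLE": 25, "GET_ACC_STATE": 101, "GET_UNIQUE_ID": 186,
        "GET_WINDOW_HANDLE": 23, "IA2_GET_ATTRIBUTES": 29, "QUERY_SERVICE": 708, "ROLE": 61,
    },
    "Power Automate": {
        "GET_ACC_PARENT": 19, "GET_ACC_STATE": 1, "GET_UNIQUE_ID": 4,
        "IA2_GET_ATTRIBUTES": 1, "QUERY_SERVICE": 82, "ROLE": 1,
    },
    "Grammarly": {
        "ACC_LOCATION": 306, "ADD_SELECTION": 1, "GET_ACC_CHILD": 134, "GET_ACC_CHILD_COUNT": 10,
        "GET_ACC_NAME": 157, "GET_ACC_PARENT": 871, "GET_ACC_ROLE": 20, "GET_ACC_STATE": 325,
        "GET_ACC_VALUE": 50, "GET_CARET_OFFSET": 12, "GET_CHARACTER_EXTENTS": 64,
        "GET_INDEX_IN_PARENT": 8, "GET_N_SELECTIONS": 14, "GET_SELECTION": 3, "GET_STATES": 19,
        "GET_TEXT": 34, "GET_TOOLKIT_NAME": 22, "GET_UNIQUE_ID": 408, "GET_WINDOW_HANDLE": 142,
        "IA2_GET_ATTRIBUTES": 364, "QUERY_SERVICE": 6411, "ROLE": 277, "SET_SELECTION": 1,
    },
}


def summarize(results: Dict[str, Dict[str, int]]) -> Dict[str, Tuple[int, float]]:
    """Map each tool to (total logged calls, fraction of them that were QUERY_SERVICE)."""
    summary = {}
    for tool, hits in results.items():
        total = sum(hits.values())
        summary[tool] = (total, hits.get("QUERY_SERVICE", 0) / total)
    return summary


for tool, (total, share) in summarize(RESULTS).items():
    print(f"{tool}: {total} calls, {share:.0%} QUERY_SERVICE")
# Onscreen Keyboard: 152 calls, 71% QUERY_SERVICE
# Windows Magnifier: 1343 calls, 53% QUERY_SERVICE
# Power Automate: 108 calls, 76% QUERY_SERVICE
# Grammarly: 9653 calls, 66% QUERY_SERVICE

# Calls that only Grammarly made in these runs, including the text and selection calls.
others = set().union(*(hits for tool, hits in RESULTS.items() if tool != "Grammarly"))
print(sorted(set(RESULTS["Grammarly"]) - others))
```

QUERY_SERVICE dominates every workflow, while calls like GET_TEXT and SET_SELECTION show up only for Grammarly, which fits its read-then-rewrite use case.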
Mobile Password Managers
OK, I don't have data here since this is clearly not on Windows. However, notice how you can give accessibility permissions to password managers on Android, and how they can start overlaying their own UI on password fields after you do. Visiting the histogram for HandleAXEvents on Android Chromium does confirm that accessibility mode is turned on, although I can't comment on what API calls they are making.
How does the browser generate the accessibility tree?
Generating and updating an information dense structure like the accessibility tree in a performant way can be a very interesting technical challenge. This is even more true when the source of data comes from dynamic HTML and CSS, which can be combined in a variety of ways.
While there are lots of things to keep in mind when generating the accessibility tree, we'll restrict our discussion to the basics: how do we generate the name and role for a given accessibility node?
Generating the name
Although I'm simplifying a bit, there are three main ways a name can be generated for a node:
- Name from content
- Name from labelling element
- Name from author
Name from content is a strategy where the name of a node is derived from its text content. For example, suppose we have the following markup:
<a href="/projects">See my projects</a>
For the link's accessibility node, browsers are smart enough to use the node's text content as its name. This gives screen readers the appropriate name to read, and also allows voice recognition software to click on this link if the user requests it.
link
- Name
  - "See my projects"
- Role
  - Link
Browsers adopt a similar strategy for something like a button:
<button type="button">Dark theme</button>
which would lead to generating an accessibility node that looks something like:
button
- Name
  - "Dark theme"
- Role
  - Button
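Name from content can be sketched as concatenating the text of an element's descendants, but only for roles that allow it. This is a simplified sketch under stated assumptions: `Element` and `NAME_FROM_CONTENT_TAGS` are hypothetical, and the real algorithm (the W3C Accessible Name and Description Computation) also handles hidden content, CSS-generated content, spacing around block-level elements, embedded controls, and much more.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Union


@dataclass
class Element:
    """Hypothetical, minimal DOM element: a tag plus child elements or text."""
    tag: str
    children: list[Union[Element, str]] = field(default_factory=list)


# Illustrative subset: tags whose implicit roles (link, button) allow name from content.
NAME_FROM_CONTENT_TAGS = {"a", "button"}


def text_content(element: Element) -> str:
    """Concatenate descendant text as-is (like inline elements), then collapse whitespace."""
    pieces = [c if isinstance(c, str) else text_content(c) for c in element.children]
    return " ".join("".join(pieces).split())


def name_from_content(element: Element) -> str:
    if element.tag not in NAME_FROM_CONTENT_TAGS:
        return ""  # For example, a <table> does not get its name from its cells.
    return text_content(element)


print(name_from_content(Element("a", ["See my projects"])))  # See my projects
print(name_from_content(Element("button", ["Dark ", Element("span", ["theme"])])))  # Dark theme
```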
However, note that this strategy can't be implemented in a sensible way for all HTMLElements. For example, could we generate a good name from content for even a simple table like the one below?
<table>
  <tr>
    <th>Countries</th>
    <th>Capitals</th>
    <th>Population</th>
    <th>Language</th>
  </tr>
  <tr>
    <td>USA</td>
    <td>Washington, D.C.</td>
    <td>309 million</td>
    <td>English</td>
  </tr>
  <tr>
    <td>Sweden</td>
    <td>Stockholm</td>
    <td>9 million</td>
    <td>Swedish</td>
  </tr>
</table>
Countries | Capitals | Population | Language |
---|---|---|---|
USA | Washington, D.C. | 309 million | English |
Sweden | Stockholm | 9 million | Swedish |
Name from labelling elements is the second strategy browsers use to generate a name for an accessibility node. Certain HTMLElements can source their name from a specific HTMLElement that serves as a natural label.
For example, going back to the table example from above, we can add a <caption> element to source the name from.
<table>
  <caption>
    Some title
  </caption>
  ...
  <!-- Same table content as before -->
  ...
</table>
The table can then have an accessibility node that looks something like this:
table
- Name
  - "Some title"
- Role
  - Table
An even more natural example is the pairing of an <input> and <label> element.
<label for="input">
  Some label
</label>
<!-- Add whatever other attributes you want to the input below -->
<input
  id="input"
  type="checkbox"
/>
which will generate an accessibility node that looks something like this:
checkbox
- Name
  - "Some label"
- Role
  - Checkbox
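Both labelling relationships in this section can be sketched as a lookup. This assumes a hypothetical `Element` class that stores an element's own text directly, and it only covers the two cases shown here: a <label> whose `for` attribute matches the input's `id`, and a <caption> that is the table's first child. Real browsers also handle labels that wrap their control, multiple labels, and more.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Optional


@dataclass
class Element:
    tag: str
    attrs: Dict[str, str] = field(default_factory=dict)
    text: str = ""  # Simplification: the element's own text, not its descendants'.
    children: List[Element] = field(default_factory=list)


def descendants(root: Element) -> Iterator[Element]:
    yield root
    for child in root.children:
        yield from descendants(child)


def name_from_labelling_element(element: Element, document: Element) -> Optional[str]:
    if element.tag == "input" and "id" in element.attrs:
        for candidate in descendants(document):
            if candidate.tag == "label" and candidate.attrs.get("for") == element.attrs["id"]:
                return " ".join(candidate.text.split())
    if element.tag == "table" and element.children and element.children[0].tag == "caption":
        return " ".join(element.children[0].text.split())
    return None  # No labelling element found.


checkbox = Element("input", {"id": "input", "type": "checkbox"})
table = Element("table", children=[Element("caption", text="\n  Some title\n")])
document = Element("body", children=[
    Element("label", {"for": "input"}, "\n  Some label\n"),
    checkbox,
    table,
])
print(name_from_labelling_element(checkbox, document))  # Some label
print(name_from_labelling_element(table, document))  # Some title
```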
Name from author is the third strategy browsers can use to generate a name. In this case, the browser relies on the web developer to manually supply an accessible name using the aria-label or aria-labelledby attributes. A name from author always overrides other naming strategies, assuming that the role supports name from author. For example, if we had some HTML as follows:
<a aria-label="Not a chance!" href="/projects">See my projects</a>
the accessibility node that is generated would have the name "Not a chance!":
link
- Name
  - "Not a chance!"
- Role
  - Link
Note that using ARIA in this case is extremely bad, as browsers can already generate a great name from content. Overriding the name in this way can also prevent voice recognition software from finding the link when a user says "Click see my projects", since the link now has a completely different accessible name. See the Label in Name criterion for a related discussion.
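The precedence between the three strategies can be sketched as a chain of fallbacks. This is a heavily simplified sketch, not the full Accessible Name and Description Computation: it assumes the caller has already computed the native label and the name from content (passing an empty string when the role doesn't allow name from content), `text_of_id` is a hypothetical lookup for aria-labelledby targets, and `NAMING_PROHIBITED` is only an illustrative subset of the ARIA 1.2 roles that prohibit naming.

```python
from typing import Callable, Dict, Optional

# Illustrative subset of ARIA 1.2 roles whose "Name From" is "prohibited".
NAMING_PROHIBITED = {"generic", "paragraph", "presentation"}


def compute_name(
    role: str,
    attrs: Dict[str, str],
    text_of_id: Callable[[str], Optional[str]],
    native_label: str = "",
    content: str = "",
) -> str:
    """Simplified accessible-name precedence; the real algorithm has many more steps."""
    if role in NAMING_PROHIBITED:
        return ""  # Author-supplied names are ignored on these roles.
    # 1. Name from author: aria-labelledby takes precedence over aria-label.
    referenced = (text_of_id(i) for i in attrs.get("aria-labelledby", "").split())
    labelled_by = " ".join(t.strip() for t in referenced if t and t.strip())
    if labelled_by:
        return labelled_by
    aria_label = attrs.get("aria-label", "").strip()
    if aria_label:
        return aria_label
    # 2. Name from a labelling element, such as <label> or <caption>.
    if native_label.strip():
        return native_label.strip()
    # 3. Name from content (callers pass "" when the role doesn't allow it).
    return content.strip()


def no_ids(_id: str) -> Optional[str]:
    return None  # Hypothetical lookup for a page with no aria-labelledby targets.


print(compute_name("link", {"aria-label": "Not a chance!"}, no_ids, content="See my projects"))
# Not a chance!
print(compute_name("link", {}, no_ids, content="See my projects"))
# See my projects
```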
Musings on name from content
Even though the ARIA spec says that elements with the row role must compute their name from content, Chromium only does this conditionally for performance reasons.
This codepen for conditional row name computations allows you to play around with this behavior. I have only tested this in Chromium, so I don't know whether other browsers make similar optimizations.
Musings on CSS contributions to name
Did you know that CSS pseudo-elements can also contribute to the accessible name of an element? I suggest trying the CSS pseudoname contribution codepen and using the accessibility inspector of your choice to see for yourself. This behavior is spec'd out, so you should see it in all major browsers.
Interestingly, Chromium does some additional work to avoid exposing CSS pseudo-content coming from the micro clearfix hack in the accessibility tree. I haven't had the time to make a codepen to test this, or to see what other browsers do. I'll probably make an edit when I do, though. 🙂
Generating the role
The role of an accessibility node is generally simple to calculate if it has a corresponding HTMLElement. We can first use the role given to us by the ARIA role attribute if it exists. Otherwise, we can just look at the HTML tag name.
For example, consider the following HTML:
<button>This is some text</button>
The <button> element would simply be translated into a node with the button role. Assistive technology knows that it can interact with this control as a button, and query for state relevant to buttons.
However, if we added a role attribute to it:
<button role="link">This is some text</button>
Then the <button> element would be translated into a node with the link role.
Note that you should generally not do this: if you really need a link, just use the <a> element.
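The two-step role calculation can be sketched as follows. `compute_role`, `KNOWN_ROLES`, and `IMPLICIT_ROLES` are hypothetical and deliberately tiny; the real implicit mappings live in the HTML-AAM specification, and browsers also check whether a role is allowed on a given element. One detail the sketch does reflect is that the role attribute may list several space-separated tokens, and the first recognized one is used.

```python
from typing import Dict, Optional

# Illustrative subsets only.
KNOWN_ROLES = {"button", "checkbox", "heading", "img", "link", "switch", "table"}
IMPLICIT_ROLES: Dict[str, str] = {"button": "button", "h1": "heading", "img": "img", "table": "table"}


def compute_role(tag: str, attrs: Dict[str, str]) -> Optional[str]:
    # 1. An explicit ARIA role wins: use the first recognized token.
    for token in attrs.get("role", "").split():
        if token in KNOWN_ROLES:
            return token
    # 2. Otherwise fall back to the tag's implicit role.
    if tag == "a":
        return "link" if "href" in attrs else "generic"  # <a> without href is not a link.
    return IMPLICIT_ROLES.get(tag)  # None for tags missing from this toy table.


print(compute_role("button", {}))  # button
print(compute_role("button", {"role": "link"}))  # link
```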
There is some additional complexity to consider, though. First, not all ARIA roles are valid on all HTMLElements.
Second, browsers don't always respect the semantics coming from an HTMLElement, although this is restricted to tables and lists as far as I'm aware:
- All major browsers implement interesting heuristics to determine if a table should be exposed as a table. This is done to compensate for bad HTML where the <table> element is used as a styling tool rather than to communicate table semantics.
- Safari specifically attempts to do something similar for lists. Other browsers don't seem to implement similar heuristics.
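To give a feel for what such a heuristic looks like, here is a toy version. It is NOT the logic of Chromium, Firefox, or Safari, whose heuristics weigh many more signals; `TableInfo` and `is_data_table` are hypothetical names, and the specific signals used (an explicit role, a caption, header cells, and a single row or column suggesting layout) are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class TableInfo:
    """Hypothetical summary of a <table> element's markup."""
    rows: int
    columns: int
    has_caption: bool = False
    header_cells: int = 0
    explicit_role: str = ""


def is_data_table(table: TableInfo) -> bool:
    """Toy heuristic: should this <table> be exposed with table semantics?"""
    if table.explicit_role in {"presentation", "none"}:
        return False  # The author explicitly removed table semantics.
    if table.explicit_role == "table" or table.has_caption or table.header_cells > 0:
        return True  # Strong signs that the author meant a real table.
    # A single row or column of cells is more likely being used for layout.
    return table.rows > 1 and table.columns > 1


print(is_data_table(TableInfo(rows=3, columns=4, header_cells=4)))  # True
print(is_data_table(TableInfo(rows=1, columns=3)))  # False
```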
Wrapup
If you take nothing else away: the accessibility API is an API that lets you programmatically read off and interact with an application. When generating the accessibility tree, the browser has to consider both HTML and CSS.
Edit History
- 9/17/2023: Clarified that aria-label only overrides the name on roles that support name from author. Per the spec, it should not do anything otherwise.