Testability and Validity of WCAG 2.0: The Expertise Effect

Brajnik, G; Yesilada, Yeliz; Harper, Simon

doi:10.1145/1878803.1878813

Web Content Accessibility Guidelines 2.0 (WCAG 2.0) require that success criteria be tested by human inspection. Further, a WCAG 2.0 success criterion is considered testable if 80% of knowledgeable inspectors agree on whether it has been met. In this paper we investigate the very core of WCAG 2.0: its ability to determine web content accessibility conformance. We conducted an empirical study to ascertain the testability of WCAG 2.0 success criteria when experts and non-experts evaluated four relatively complex web pages, and to identify the differences between the two groups. We also discuss the validity of the evaluations produced by these inspectors and examine how validity differs with expertise. In summary, our study, comprising 22 experts and 27 non-experts, shows that approximately 50% of success criteria fail to meet the 80% agreement threshold; experts produce 20% false positives and miss 32% of the true problems. Comparing experts with non-experts, we found that agreement among non-experts dropped by 6%, while their false positives reached 42% and their false negatives 49%. This suggests that in many cases WCAG 2.0 conformance cannot be tested by human inspection to a level at which at least 80% of knowledgeable human evaluators would agree on the conclusion. Why experts fail to meet the 80% threshold, and what can be done to help them reach it, are subjects for further investigation.
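The three quantities in the abstract (per-criterion agreement against the 80% threshold, false positives, and false negatives) can be illustrated with a minimal sketch. The abstract does not say how the study operationalizes them, so the definitions below are assumptions: agreement is taken as the share of inspectors who give the most common verdict on a criterion; the false-positive rate is the share of reported problems that are not true problems; the false-negative rate is the share of true problems an inspector missed. The function names and verdict labels are hypothetical, and the usage data is invented for illustration only.

```python
from collections import Counter

# The 80% agreement threshold for testability, as stated in the abstract.
AGREEMENT_THRESHOLD = 0.8


def agreement(verdicts):
    """Assumed definition: share of inspectors giving the most common verdict.

    verdicts: one label per inspector for a single success criterion,
    e.g. "pass" / "fail" (hypothetical labels).
    """
    if not verdicts:
        raise ValueError("at least one verdict is required")
    most_common_count = Counter(verdicts).most_common(1)[0][1]
    return most_common_count / len(verdicts)


def is_testable(verdicts, threshold=AGREEMENT_THRESHOLD):
    """A criterion counts as testable here if agreement reaches the threshold."""
    return agreement(verdicts) >= threshold


def error_rates(reported, true_problems):
    """Assumed false-positive and false-negative rates for one inspector.

    FP rate = reported problems that are not true problems / all reported problems
    FN rate = true problems that were not reported / all true problems
    """
    reported, true_problems = set(reported), set(true_problems)
    fp_rate = len(reported - true_problems) / len(reported) if reported else 0.0
    fn_rate = (
        len(true_problems - reported) / len(true_problems) if true_problems else 0.0
    )
    return fp_rate, fn_rate


if __name__ == "__main__":
    # Invented data: 7 of 10 inspectors say "fail", so agreement is 0.7,
    # below the 0.8 threshold.
    votes = ["fail"] * 7 + ["pass"] * 3
    print(agreement(votes), is_testable(votes))

    # Invented data: one inspector's reported violations vs. a reference set,
    # keyed by WCAG 2.0 success criterion number.
    fp, fn = error_rates({"1.1.1", "1.3.1", "2.4.4"},
                         {"1.1.1", "2.4.4", "1.4.3", "3.3.2"})
    print(fp, fn)
```

With these definitions, the abstract's figures would read as: of the problems experts reported, about 20% were not real, and of the real problems, about 32% went unreported. Other operationalizations (for example, computing rates per page or pooling all inspectors) would give different numbers.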