Why Standardized Tests Don’t Measure Educational Quality

The task for those developing standardized achievement tests is to create an assessment instrument that, with a handful of items, yields valid norm-referenced interpretations of a student’s status regarding a substantial chunk of content. Items that do the best job of discriminating among students are those answered correctly by roughly half the students. Developers avoid items that are answered correctly by too many or by too few students.
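One way to see why items answered correctly by about half the students spread students out the most is to look at the variance of a right/wrong item. The short Python sketch below is an illustration, not the procedure any test publisher actually follows. It assumes each item is scored 0 or 1, in which case its variance is p(1 - p), where p is the proportion of students answering correctly. The function name and the sample p values are made up for the example.

```python
def item_variance(p):
    """Variance of a 0/1-scored item answered correctly by a proportion p of students."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    return p * (1.0 - p)

# Illustrative proportions of students answering an item correctly.
p_values = [0.10, 0.30, 0.50, 0.70, 0.90]
variances = {p: item_variance(p) for p in p_values}

# The item with the largest variance separates students the most.
best_p = max(variances, key=variances.get)

for p, v in variances.items():
    print(f"p = {p:.2f}  variance = {v:.2f}")
print(f"largest variance at p = {best_p:.2f}")
```

The variance peaks at p = 0.50 and shrinks toward zero as an item becomes very easy or very hard, which is consistent with developers avoiding items that nearly everyone or nearly no one answers correctly.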

As a consequence of carefully sampling content and concentrating on items that discriminate optimally among students, these test creators have produced assessment tools that do a great job of providing relative comparisons of a student’s content mastery with that of students nationwide. If the national norm group is genuinely representative of the nation at large, educators and parents can make useful inferences about students.

One of the most useful of those inferences typically deals with students’ relative strengths and weaknesses across subject areas, such as when parents find that their daughter sparkles in mathematics but sinks in science. It’s also possible to identify students’ relative strengths and weaknesses within a given subject area if there are enough test items to do so. For instance, if a 45-item standardized test in mathematics allocates 15 items to basic computation, 15 items to geometry, and 15 items to algebra, it might be possible to get a rough idea of a student’s relative strengths and weaknesses in those three realms of mathematics. More often than not, however, these tests contain too few items to allow meaningful within-subject comparisons of students’ strengths and weaknesses.
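The 45-item example can be made concrete with a rough calculation. The Python sketch below assumes a deliberately simple model that is not part of the original argument: every item in a subscale is equally difficult for the student, and answers are independent, so a subscore behaves like a binomial count. The function names and the 0.6 mastery value are hypothetical.

```python
import math

def subscore_standard_error(n_items, true_proportion):
    """Standard error of a proportion-correct score under a simple binomial model."""
    return math.sqrt(true_proportion * (1.0 - true_proportion) / n_items)

def difference_standard_error(se_a, se_b):
    """Standard error of the difference between two independent subscores."""
    return math.sqrt(se_a ** 2 + se_b ** 2)

# Hypothetical student who has truly mastered 60% of both geometry and algebra.
n_items = 15
se_geometry = subscore_standard_error(n_items, 0.6)
se_algebra = subscore_standard_error(n_items, 0.6)
se_gap = difference_standard_error(se_geometry, se_algebra)

print(f"standard error of one 15-item subscore: {se_geometry * n_items:.1f} items")
print(f"standard error of the gap between two subscores: {se_gap * n_items:.1f} items")
```

Under these assumptions, the gap between two 15-item subscores has a standard error of about 2.7 items even when the student’s true mastery is identical in both areas, so a difference of two or three items between geometry and algebra may reflect nothing more than chance.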

A second kind of useful inference that can be based on standardized achievement tests involves a student’s growth over time in different subject areas. For example, let’s say that a child is given a standardized achievement test every third year. We see that the child’s percentile performances in most subjects are relatively similar at each testing, but that the child’s percentiles in mathematics appear to drop dramatically at each subsequent testing. That’s useful information.

But children at any grade level are likely to possess an enormous amount of knowledge and skills. The substantial size of the content domain that a standardized achievement test is supposed to represent poses genuine difficulties for the developers of such tests. If a test actually covered all the knowledge and skills in the domain, it would be far too long.

So standardized achievement tests often need to accomplish their measurement mission with a much smaller collection of test items than might otherwise be employed if testing time were not an issue. The way out of this assessment bind is for standardized achievement tests to sample the knowledge and/or skills in the content domain. Frequently, such tests try to do their assessment job with only 40 to 50 items in a subject field—sometimes fewer.
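What sampling a content domain means in practice can be sketched with a small simulation. The Python example below is purely illustrative. The domain of 1,000 skills, the student who has mastered 700 of them, and the 45-item test length are all invented numbers, and real tests are not built by drawing items at random.

```python
import random

def sampled_score(domain_mastery, n_items, rng):
    """Proportion correct on a test of n_items drawn at random from the domain."""
    sampled = rng.sample(domain_mastery, n_items)
    return sum(sampled) / n_items

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical domain: 1,000 skills, of which this student has mastered 700.
domain_mastery = [True] * 700 + [False] * 300
true_mastery = sum(domain_mastery) / len(domain_mastery)

# Build many different 45-item tests from the same domain.
scores = [sampled_score(domain_mastery, 45, rng) for _ in range(2000)]
mean_score = sum(scores) / len(scores)

print(f"true mastery: {true_mastery:.2f}")
print(f"mean sampled score: {mean_score:.2f}")
print(f"lowest and highest sampled score: {min(scores):.2f}, {max(scores):.2f}")
```

On average the 45-item samples land close to the student’s true mastery, but any single sample can miss it by a noticeable margin, which is the price of representing a large domain with a small number of items.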