Why Surprises on State Tests Are Not Inevitable

Prediction is a function of design. Surprise is a function of its absence.

An essay from SignalWorks

The Annual Ritual of Surprise

Every summer, district leaders sit down with state assessment results and react with some combination of relief, frustration, and confusion. Schools that seemed to be on track underperformed. Schools that looked uncertain quietly exceeded expectations. Subgroups moved in directions no one anticipated. The conversation that follows almost always treats these outcomes as surprises, the kind of unpredictable events that simply have to be absorbed and explained.

But results on a standardized assessment are rarely as unpredictable as they appear. They are the visible end of a long chain of design choices, instructional decisions, and feedback signals that the system either captured or ignored throughout the year. When the outcome surprises us, the honest interpretation is not that students performed unexpectedly. It is that our prediction systems failed.

What Prediction Requires

A system capable of predicting performance is a system that has done four things well. It has defined what successful performance actually looks like, in specific and observable terms. It has identified the cognitive demands those performances impose, including the kinds of reasoning, transfer, and stamina they require. It has built rehearsal opportunities throughout the year that approximate those demands under realistic conditions. And it has instrumented the year with checkpoints that produce honest signal rather than reassuring noise.

Most schools do parts of this. Few do all of it. The result is a feedback environment full of grades, completion rates, and benchmark scores that feel like prediction but rarely behave like it.

Why Benchmarks Often Mislead

Many districts rely on interim benchmark assessments and treat their results as predictive. Sometimes they are. Often they are not. A benchmark that is too easy creates false confidence. A benchmark that is poorly aligned to the actual cognitive demands of the year-end task creates false alarm. A benchmark administered under conditions students will not face on the real assessment (open-book, untimed, or with retakes allowed) measures something different from what the real performance will measure.

Prediction requires alignment between the rehearsal conditions and the performance conditions. Without that alignment, benchmark data becomes another source of noise that obscures the signal leaders actually need.

Prediction as Professional Responsibility

Treating prediction as a professional responsibility changes the posture of educators and leaders. Instead of waiting for outcomes and explaining them, the work becomes building systems that surface concerns early enough to act on them. It means designing checkpoints that ask students to perform under conditions resembling the eventual assessment. It means triangulating signals (student work, observation, performance tasks) rather than relying on any single metric. It means being willing to adjust mid-year on the basis of evidence rather than waiting until results force a reaction.

When schools take prediction seriously, surprises become rare. Not because performance is guaranteed, but because the system has been instrumented to see clearly enough that significant gaps no longer arrive without warning.

What Disappointment Usually Means

When results disappoint, the most useful question is not what went wrong with the students. It is what the system failed to see in time. That question moves leaders out of explanation and into design. It treats unpredicted outcomes as a signal that the architecture, not the children, needs attention.

Schools that adopt this stance stop being surprised. They develop the quiet confidence that comes from knowing, in November, how their students are likely to perform in May, and from knowing they still have time to change the answer.

Surprise is not the price of doing the work. It is the cost of designing systems that cannot see clearly enough to predict.

If people cannot independently and reliably apply what they have learned under real conditions, learning has not truly occurred.