The Test Was Already Broken
AI didn't ruin technical assessments. It revealed what they were already measuring.
The Smallness of the Test
A few days ago, I opened a technical assessment and immediately felt that old familiar irritation.
Not because it was hard.
Because it was small.
The kind of small that shows up as obscure JavaScript trivia. Not “can you build a reliable system?” Not “can you debug this real failure mode?” Not “can you explain your tradeoffs?” Just a carefully staged little puzzle asking whether you remember the exact shape of a language edge case under artificial conditions.
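Something like this, to make the genre concrete (a representative sketch, not the actual questions):

```js
// The genre: what does each line log?
console.log([] + []);            // "" (both arrays coerce to empty strings)
console.log([] + {});            // "[object Object]"
console.log(typeof null);        // "object" (a decades-old spec quirk)
console.log(0.1 + 0.2 === 0.3);  // false (IEEE 754 floats, not a JS bug)
```

Knowing these cold tells you almost nothing about whether someone can build, debug, or reason about a system.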
And I had this thought:
AI didn’t break this test.
The test was already broken.
AI just made it obvious.
Recall Is Not Competence
For a long time, technical assessments have confused recall with competence. They ask whether you can reproduce knowledge without tools, as if real engineers spend their days locked in a white room with no documentation, no tests, no logs, no teammates, no debugger, no internet, and no memory outside their skull.
But that is not engineering.
Engineering is not the performance of isolation. Engineering is the practice of using the available world to move a system toward correctness.
A bad test asks:
Can you remember this without help?
A better test asks:
Can you solve the problem with the tools engineers actually use?
Those are not the same question.
What the Calculator Changed
The calculator forced this same reckoning on lazy math testing. If the entire exam was arithmetic, the calculator looked like cheating. But the better teachers understood the shift. Once arithmetic became cheap, the question had to move up a level. Set up the problem. Choose the method. Interpret the result. Know when the answer is nonsense.
AI is doing the same thing to knowledge work.
If a technical test becomes useless the second someone has access to documentation, search, or an AI assistant, then the test was probably not measuring engineering judgment. It was measuring controlled deprivation.
That does not mean foundational knowledge no longer matters. It matters enormously. You need enough JavaScript to smell when this is wrong. You need enough async understanding to debug a race condition. You need enough type and runtime intuition to know when an answer is suspicious.
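To ground the async point: here is a hedged sketch of the kind of race I mean, the classic stale-response bug. `fetchResults` and `render` are hypothetical stand-ins for whatever your app actually does:

```js
let latestQuery = null;

async function search(query) {
  latestQuery = query;
  const results = await fetchResults(query); // hypothetical network call

  // The part intuition has to supply: by the time this response arrives,
  // a newer query may already be in flight. Without this guard, the slow,
  // stale response overwrites the fresh one.
  if (query !== latestQuery) return;

  render(results); // hypothetical UI update
}
```

Trivia recall does not catch that. Async intuition does.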
But the purpose of foundational knowledge is orientation, not theatrical memorization.
The Operator Stance
The real skill is not “can you hold everything in your head?”
The real skill is “can you operate?”
Can you define the problem? Can you find the relevant abstraction? Can you use the docs without drowning in them? Can you ask the AI for help without surrendering your judgment? Can you test the output? Can you explain why it works? Can you notice when it doesn’t?
That is what a modern engineering assessment should measure.
Give someone a repo. Give them the tools. Let them use AI, docs, tests, search, whatever they would actually use at work. Then watch how they move.
Do they scope the problem? Do they inspect before editing? Do they write tests? Do they catch hallucinations? Do they overbuild? Do they communicate tradeoffs? Do they leave the system cleaner than they found it?
That is signal.
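To make one of those concrete, here is what catching a hallucination can look like. Suppose the assistant offers a helper that reads fine at a glance. The function below is invented for illustration, and the test uses Node’s built-in runner:

```js
const test = require('node:test');
const assert = require('node:assert');

// Assistant-suggested helper (invented for illustration). It looks plausible,
// but the bounds are swapped: whenever min < max, it always returns max.
function clamp(value, min, max) {
  return Math.max(max, Math.min(min, value));
}

test('clamp keeps values inside the range', () => {
  assert.strictEqual(clamp(5, 0, 10), 5); // fails: returns 10
});
```

The candidate who runs this test finds the bug in seconds. The candidate who pastes and moves on ships it.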
The Test Now
The old test asked whether the engineer could perform intelligence without the world.
The new test should ask whether the engineer can stay intelligent while using the world.
Because AI did not make tool use optional. It made tool use central.
And that means the question is no longer:
Can this person work without help?
The question is:
Can this person use powerful help without losing themselves?
That is the operator stance.
Not memorization.
Judgment.
Not isolation.
Agency.
Not “did you already know the answer?”
“Can you get to the truth without lying to yourself about how you got there?”
That is the test now.
And honestly, it probably should have been the test all along.
Part of The Operator Series