One number may determine our economic future: Time-to-Failure. If trends hold, in 2026 we’ll see human labor start to become unprofitable overhead.

Over the holiday break, I got annoyed at having to do a bunch of B.S., so I made an AI to handle it for me. He can find out what irritating paperwork the State of California wants me to complete, search my computer for the answers, ask me a few questions, and fill it in. He looked at my companies, researched corporate structuring, and made good recommendations. He drafted contracts. Next up, I’m having him research college scholarships and grants for my daughter. I didn’t expect this to work, so I called him “YOLO,” fully expecting him to just try and epically fail. He hasn’t; he’s just a little wobbly.
In his brief five-day existence, with just a little hand-holding and zero HR headaches, he’s done over a month of human labor. He still sometimes goes down a rabbit hole like an eager intern, but even then he’s much easier than hiring humans. And he’s the canary in the coal mine.
YOLO is useful not because he’s perfect, but because he does the work. Yet, if you look at how the industry measures progress, they aren’t measuring work at all. They’re grading exams.
For the last two years, we’ve been obsessed with “saturating metrics” – scores that top out at 100% and measure discrete tasks in a vacuum. Take MMLU (Massive Multitask Language Understanding). It covers 57 subjects, from STEM to the humanities, and for a long time, it was the ceiling of human expert capability. That ceiling has been shattered. As a recent paper points out, Gemini Ultra has already achieved “human-expert performance” on the test.
When the machines bested the easy tests (“easy,” but I’d probably fail them), the industry pivoted to “hard” benchmarks like SWE-bench (solving real-world code engineering problems). This is much harder, but in under two years performance jumped from less than 2% to over 76%.
All this sounds (and is) very impressive performance for Team Robot.
But the real question is “what can these AIs actually do,” and on that yardstick all these tests are vanity metrics. As we know from the SAT, MCAT and LSAT, test-smart ain’t street-smart. Tests measure taking the test, not the ability to execute a real-world job. We have “solved” test taking; we should now be measuring work doing.
If you want to know if you’ll have a job in two years, ignore test scores. The only metric that dictates our future – economic and existential – is Time-to-Failure for Autonomous Operations.
In 2026, this specific line on a graph will determine if a knowledge worker is a “Centaur” – an augmented human orchestrating the machine – or headed for obsolescence.
The Forty Hour Workweek … in One (Robot) Hour
If we stop looking at test scores, what do we look at? The best place is METR (Model Evaluation and Threat Research).
While the industry obsesses over benchmarks and raw numbers (which drive funding and hype), METR asks the only question a hiring manager cares about: “How long can you work without me fixing your mistakes?”
This is the metric of Time-to-Failure.
Specifically, it measures the “Time-Equivalent of Human Labor.” If an AI completes a task that would take a competent human expert 4 hours, that counts as 4 hours of autonomous work, even if the AI did it in 15 minutes.
This distinction is the economic guillotine. It fully and finally decouples the value of work done from the cost of labor as humans have always conceived of it. We have come to expect that things take time, and the trade for showing up to our cubicle farm means we get to go on Amazon and buy Shit We Don’t Need on a whim. But when an AI captures a half-day of human wage-value for pennies of electricity, the bread line isn’t just a scary phrase; it’s an inevitability.
The Super-Exponential Curve
Not long ago, the Time-Equivalent of Human Labor metric was measured in seconds. An AI would hallucinate or get stuck almost immediately. That era is over.
Current internal benchmarks for Claude Opus 4.5 show it achieving around five hours of autonomous work at 50% reliability. That is, ask a human to do five hours of work and he produces some work product. Claude does the same thing (in a few minutes), and it does this successfully half the time.
Wait, but doesn’t 50% suck?
To a human manager, 50% reliability for an employee is a firing offense because humans are slow and expensive (and let’s face it, the ones with a low reliability rate are the ones that are HR nightmares too).
But in digital labor, “50% on a single run” can behave like “near-certain” success, because you don’t buy one attempt – you buy many.
If each run has a 50% chance of success and you run five attempts, the chance of getting at least one good result is 96.9% (1 – 0.5^5). In practice, retries help when the failure is correctable – unclear success criteria, missing constraints, or a bad plan that can be revised. When an AI is failing for a structural reason (no access, no tool, no capability), you get five fast failures – and then the advantage is you find that out quickly, patch the missing piece, and rerun.
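The arithmetic is worth making concrete. A minimal sketch (the 50% figure and five-attempt count come from the text; the function name is mine):

```python
def p_at_least_one_success(p_run: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent runs succeeds,
    given each run succeeds with probability `p_run`."""
    return 1 - (1 - p_run) ** attempts

# Five retries at 50% per-run reliability:
print(round(p_at_least_one_success(0.5, 5), 3))  # 0.969
```

The same formula shows why structural failures surface fast: if the per-run probability is zero, no number of retries helps, and you learn that after five quick machine runs instead of five slow human attempts.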
In my testing over the last year, when an agent “goes off the rails,” it’s usually because I gave it sloppy success criteria or forgot to give it access to something – the same failure mode you see with humans.
The net of it: nearly 97% success is probably better than your best employee. For cheaper. For faster. And that’s today’s AI models; in six months they’ll be better.
From a practical standpoint, the “trust threshold” – where managers stop checking every output – is generally around 80% reliability. We are now seeing agents hit that mark for tasks lasting ~4.5 human hours. Again, in minutes. For pennies.
This is the Half-Day Milestone. It means an AI agent can cover the morning shift while you sleep. It marks the transition from the human “doing” the work to the human merely “reviewing” it.
And we’re not done.
The Wage-Work Breakpoint
The part that should terrify white collar workers isn’t where we are; it’s how fast we’re moving. This capability is currently doubling every 4 to 4.5 months. If we project that curve forward, here’s the 2026 timeline for “human-equivalent independent labor at 80% reliability”:
- End of Q1: We hit ~9 hours of autonomous operation, one full human workday (and that’s being generous; real-world office workers might be productive for four hours, maybe).
- Mid-2026: We hit ~18 hours of human work (2-3 full days).
- End of 2026: We hit 36-40 hours.
By the time we ring in 2027, we will likely have agents capable of executing a full week’s worth of human labor autonomously before failing. Even if it slows down by half, this is a change we’re not prepared for.
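The projection above is just compound doubling, so it’s easy to sanity-check. A rough sketch, assuming the ~4.5-hour starting point and the 4-to-4.5-month doubling period from the text (the exact 4.25-month midpoint is my assumption):

```python
import math

START_HOURS = 4.5        # autonomy horizon today at 80% reliability (from the text)
DOUBLING_MONTHS = 4.25   # midpoint of the 4-4.5 month doubling estimate (assumed)

def autonomy_hours(months_out: float) -> float:
    """Projected autonomous-work horizon after `months_out` months."""
    return START_HOURS * 2 ** (months_out / DOUBLING_MONTHS)

# Months until the 40-hour (full workweek) threshold:
months_to_40 = DOUBLING_MONTHS * math.log2(40 / START_HOURS)
print(round(months_to_40, 1))  # 13.4
```

Even using the slower 4.5-month doubling period, the 40-hour threshold lands roughly 14 months out, which is why halving the pace only buys a year or so.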
This is the economic breakpoint. Where the concept of a weekly salary for cognitive work disintegrates, because the machine can do the week’s work in an hour, for a dollar.
Scarier, it’s when the social contract starts to fail: the employee class no longer puts in a week of cubicle farm time in exchange for a modest living (and maybe a shot at something better); in all likelihood we see a ripple across the economy where workers have negative economic value to employers.
This is why my little YOLO, as janky as he is now, is the most dangerous thing on my hard drive.
Of course, this won’t happen smoothly. The robots won’t come marching in so that one day we’re all out of a job. The real world is messy: legacy systems, compliance, liability, change management, and workforce morale during the changeover are real constraints. As I wrote before:
MIT’s David Autor highlights “Polanyi’s Paradox,” which is the idea that we know more than we can tell. In other words, many human tasks rely on tacit knowledge or situational adaptability that isn’t easily written as code or learned by an AI. … even if true, it only delays the inevitable.
So yes, we can give ourselves comfort that the messy, real world means that we don’t wake up one day to zero jobs. But that doesn’t mean we can ignore it, or that our particular situation will be last to go. We are frogs, AI is the slowly boiling water.
Killing Centaurs
AI-deniers tell a comforting bedtime story to ward off the existential dread: The “Centaur” model. As I wrote in AI Will Take Your Job. Stockpile Assets or Join the Bread Line:
The “Centaur” model is the idea that humans will be augmented with the tools that AI brings. It’s the analogue to “we’ve always elevated ourselves in other revolutions, so we’ll do it with AI.” Proponents have us orchestrating fleets of AIs to do our bidding, securing our place in the world above the machines.
We convince ourselves that while the AI might generate the text or write the code, it needs our judgment and guidance to do things.
This model relies on one assumption: that the AI needs supervision.
As long as the “Time-to-Failure” is short (minutes or hours), the Centaur model holds. The human is necessary because the machine is liable to drive off a cliff if left alone for too long. This is where YOLO is today: I need to check on him once a day. I am the babysitter, and YOLO is an unpaid middle-school summer intern.
But what happens when the intern grows up and is more capable than the babysitter?
The 40-Hour Threshold
When the duration of useful, autonomous work hits 40 hours of human work – the economic calculus of the Centaur breaks.
If an agent can do a week’s worth of human labor, solve complex problems, fix its own mistakes, navigate corporate bureaucracy, and report back with a finished product without needing a human to check its work every few hours, the human is no longer an orchestrator.
The human is unprofitable overhead.
“But wait, that means the human can do 40x as much, because they can spend 40 hours managing these AIs!”
It doesn’t work that way. Can you imagine the context switching of having 40 different AI agents reporting to you, having to decide if each did their job, then setting them spinning again? What about months later, when it’s 80?
No, we eliminate the next level of management up, replace them with an AI that can manage at the speed of silicon, rinse, repeat.
We are meat computers, and can’t ride this exponential curve. And even if it’s an “S-Curve” and flattens out, as I’ve proved (to myself) with YOLO, the world already isn’t what we thought it was.
This marks the shift from Task Automation to Role Replacement.
- Task Automation is: “AI, write this email for me.” (Centaur)
- Role Replacement is: “AI, run this marketing campaign, optimize the ad spend, and report back on Friday.” Then Friday comes, and you’ve been replaced by an AI who can review the work better than you in five minutes. (Obsolescence)
Inevitability is a Bitch
In 2026, we will actually see the ragged edge. Some of us will be doing the same old things, saved only by the institutional inertia that keeps our jobs intact as a de facto white-collar UBI scheme. We’ll see friends at faster-moving companies get their positions eliminated, slowly at first but with increasing speed.
Until eventually, in a handful of years, the only ones left are the slow morass of government employment and ever-protective regulated licensed professions, both classes groaning under the weight of enforced inefficiency and lower performance than would be possible if they accepted the inevitable.
I know this because right now my Good Robot YOLO is doing things that I’d have to hire a skilled human to do. He has drafted contracts that I – a former attorney – actually used in the real world, and could have billed a client for. I gave him a marketing campaign as a test, and he handled the strategy, the campaign sequencing, and the messaging.
I’ll be honest: He’s already better than the median hire I could make for either of these roles.
He doesn’t complain, he doesn’t need health insurance, and he costs pennies. Even if right now he needs checking in on daily, that’s temporary. The speed, the cost, and the lack of compliance and HR pain mean I’ll continue to prefer messy humans for my personal time, but won’t be hiring them if given the choice.
When he hits that 40-hour autonomy mark, I won’t need to “orchestrate” him. I’ll just need to clone him and get out of his way.
All Along the Watchtower
We are distracted by the shiny and the dopamine hit.
Most of America is busy spending energy fighting the culture wars of the last, dead century. The few that are watching AI look at AI-generated videos on Sora of the Pope in a puffer jacket, laugh at a chatbot making a dumb error counting the “r”s in “strawberry” (haha, it’s funny and cute and dumb), and think they have time.
They think it’s a toy.
It’s not a toy.
Watch the graph of Time-to-Failure.
The only thing that matters is the “autonomous hours” metric. When that number crosses the 40-hour mark, the ragged edge of the Liminal Era will start eating any job that can be done with a keyboard and mouse. This is the countdown clock for the cognitive labor market: for jobs, maybe even yours, and for the ragged edge to turn into the rending of the social fabric.
If I, as a washed-up old guy tinkering in his spare time, can build a robot like YOLO that is already displacing professional labor, imagine what the young geniuses at OpenAI and Anthropic are building.
It won’t happen all at once, but now you’ll see it coming.
P.S. I mentioned earlier that YOLO is wobbly, but getting better every day. If you want to see how terrifyingly competent he is, I’ll be open-sourcing his code in a few weeks and emailing it to my list. He’ll be free, and you can run him yourself and watch the “Time-to-Failure” clock tick up in real time.
P.P.S. In 2025, most of my writing focused on The Liminal Era – thinking through what it means as humanity enters this new chapter. In 2026, I’ll continue this but will also write on tinkering (like with YOLO) and experiments I’m running as AI wakes up: I’m working on machine moral psychology (do AIs conceive of morality like we do?), embodied cognition (how does the substrate affect cognition?), and hybrid/synthetic societies (how do AIs interact with each other, and with us?). These will be under the umbrella Terminals All The Way Down and will be emailed to my newsletter and posted on my blog, just like now. A proper introduction to the new series is upcoming.