Human in the End: Rethinking How We Interact with AI Agents

I’ve been building an agent runtime called Alan. It’s written in Rust, and the core idea is to model AI agents as Turing machines—a stateless program executes on a stateful tape, producing observable side effects.

But the deeper I got into the runtime design, the more I kept hitting the same question: where does the human fit in?

The industry answer is “Human-in-the-Loop.” Put a human at every critical decision point. Make them approve tool calls, review outputs, confirm actions. It’s safe. It’s responsible. And yet—a recent Anthropic study tells a different story: among Claude Code users, about 20% of new user sessions run with full auto-approve; by the time someone has used it 750+ times, that number crosses 40%. The longest agent sessions nearly doubled—from under 25 minutes to over 45—between October 2025 and January 2026.

People are voting with their feet. They trust agents more, and they don’t want to babysit every step.

This post is about why I think we need a different model—one I’m calling Human-in-the-End—and what it actually means for agent architecture.

The Loop That Doesn’t Scale

Human-in-the-Loop (HITL) is the consensus pattern for AI safety. The idea is straightforward: AI proposes, human disposes. Before the agent executes anything risky, it pauses and waits for approval.

In practice, this shows up in three flavors:

This works. For short, interactive sessions where a human is actively watching, it works well. The human catches hallucinations, prevents destructive actions, and provides course corrections in real time.

But here’s the thing: agents are getting more capable. They can plan multi-step tasks, reflect on their own outputs, retry failed approaches, and maintain context across long execution chains. The trajectory is clear—agents are moving from “answer my question” to “complete my project.”

And that’s where HITL falls apart.

When an agent runs for hours—researching, coding, testing, iterating—the approval model creates three compounding problems:

Approval fatigue. You click “approve” so many times that you stop reading what you’re approving. The safety mechanism becomes theater. David Farrell nailed this in The Unsupervised Agent Problem: “your attention has drifted… you rubber-stamp an action you should have scrutinized.” And Anthropic’s data backs it up: among the most experienced Claude Code users, over 40% of sessions run with full auto-approve.

Bandwidth ceiling. Every approval is a synchronous blocking call on a human brain. The agent’s throughput is capped by your attention span. OpenAI’s Harness team reported that they regularly see single Codex runs lasting over 6 hours—often while the engineers are asleep. You can’t step-by-step approve something that runs overnight.

Role mismatch. You’re forced to evaluate low-level details (should this file be written? should this command run?) when your actual value is in high-level judgment (is this the right approach? does this outcome meet the goal?). The Harness team figured this out: they progressively replaced human review with agent-to-agent review, keeping humans for moments that require real judgment. The result? Three engineers used Codex to ship roughly 1,500 PRs and nearly a million lines of code in five months. Not a single line written by hand.

The irony: the more capable the agent, the less sense it makes to supervise every step. You wouldn’t hire a senior engineer and then stand behind them approving every keystroke.

What “Human in the End” Actually Means

The phrase “Human-in-the-End” is deliberately provocative. It’s meant to contrast with HITL: a shift from “approve every step” to “just check the result.” Taken literally, though, it’s misleading.

Humans don’t vanish until the end. In practice, they show up at three stages:

What humans don’t participate in is execution—the routine, reversible, within-policy operations that make up 95% of an agent’s work.

So the real shift isn’t about where humans appear. It’s about what role they play:

                HITL                     HITE
Human role      Operator                 Owner
Focus           Execution details        Goals and outcomes
Intervention    Step-by-step approval    Exception-driven oversight
Agent model     Supervised tool          Delegated autonomous executor

The core formula: Human Defines → Agent Executes → Human Owns.

This isn’t just theory. OpenAI’s Harness team arrived at essentially the same model independently—their stated principle is “Humans steer, agents execute.” Humans set goals and constraints, agents handle the full pipeline from coding to testing to deployment, and only escalate when judgment is genuinely needed.

  1. Human Defines. Set the goal, budget, risk policy, and hard boundaries. Define the rules; don’t babysit the process.
  2. Agent Executes. The agent handles all operational work within the sandbox—planning, executing, reflecting, retrying. Normal flow runs silently.
  3. Human Owns. The human is accountable for the outcome. The system only escalates when it hits a policy boundary—budget exceeded, high-risk action triggered, anomaly detected. This is exception-driven human oversight.
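
To make the three steps concrete, here is a minimal Rust sketch of an exception-driven run loop. It is only an illustration under my own assumptions: Mandate, Step, Escalation, Outcome, and run are hypothetical names, not Alan's API, and the "work" is reduced to a cost and a risk flag. The point is the shape: normal steps run silently, and control returns to the human only when a boundary is hit.

```rust
/// What the human defines up front (hypothetical fields).
pub struct Mandate {
    pub goal: String,
    pub budget_cents: u64,
}

/// One unit of agent work, reduced to an estimated cost and a risk flag.
pub struct Step {
    pub name: &'static str,
    pub cost_cents: u64,
    pub high_risk: bool,
}

/// Reasons the agent hands control back before finishing.
#[derive(Debug, PartialEq)]
pub enum Escalation {
    BudgetExceeded { step: &'static str },
    HighRisk { step: &'static str },
}

#[derive(Debug, PartialEq)]
pub enum Outcome {
    /// Every step ran without interruption; the human reviews the result.
    Completed { spent_cents: u64 },
    /// The agent stopped before the named step and escalated.
    Escalated(Escalation),
}

/// Agent Executes: run steps in order, escalating only at a policy boundary.
pub fn run(mandate: &Mandate, steps: &[Step]) -> Outcome {
    let mut spent = 0u64;
    for step in steps {
        if step.high_risk {
            return Outcome::Escalated(Escalation::HighRisk { step: step.name });
        }
        if spent + step.cost_cents > mandate.budget_cents {
            return Outcome::Escalated(Escalation::BudgetExceeded { step: step.name });
        }
        spent += step.cost_cents;
    }
    Outcome::Completed { spent_cents: spent }
}
```

A run that stays inside the mandate returns Completed without ever involving the human. Anomaly detection, the third trigger mentioned above, is left out to keep the sketch short.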

Why Humans Never Fully Disappear

I want to be clear about this, because HITE is easy to misread as “fully autonomous AI, no humans needed.” That’s not my argument.

Humans stay irreplaceable for structural reasons, not technical ones:

Accountability. If an agent signs a bad contract or pushes broken code to production, a human bears the consequences. There must always be someone accountable—just not someone approving every step.

Strategic judgment. AI can optimize, but it can’t define purpose. “Should we enter this market?” isn’t an optimization problem. It’s a judgment call involving risk appetite, culture, and brand—things no probability distribution captures.

Unquantifiable value. Long-term relationships, trust networks, industry gut feel. These shape decisions in ways models can’t replicate.

HITE frees humans from execution oversight so they can focus on what genuinely requires human judgment: setting goals and owning outcomes.

From Philosophy to Architecture

Ideas are cheap. The hard part is encoding them into a runtime.

I’ve been working through this in Alan, which models agents as Turing machines: a stateless program (AgentConfig) executes on a stateful tape (Tape), producing durable side effects recorded in a rollout log. Three abstractions:

AgentConfig  →  "how to think"   (stateless: LLM + tools + policies)
Workspace    →  "who I am"       (persistent: persona + memory + skills)
Session      →  "what I'm doing" (bounded: tape + turns + rollout)
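
As a rough illustration, the three abstractions could be expressed in Rust along these lines. The field names, the Entry enum, and the stub step function are assumptions made for this sketch, not Alan's actual types. A real transition function would call the LLM; this one only records a thought so the example stays self-contained.

```rust
/// "How to think": immutable configuration with no per-run state.
/// All field names here are illustrative assumptions.
#[derive(Debug, Clone)]
pub struct AgentConfig {
    pub model: String,
    pub tools: Vec<String>,
    pub max_turns: usize,
}

/// "Who I am": state that outlives any single session.
#[derive(Debug, Default)]
pub struct Workspace {
    pub persona: String,
    pub memory: Vec<String>,
}

/// One cell on the tape: something the agent thought, did, or observed.
#[derive(Debug, Clone, PartialEq)]
pub enum Entry {
    Thought(String),
    Action(String),
    Observation(String),
}

/// "What I'm doing": a bounded run with its own tape.
#[derive(Debug, Default)]
pub struct Session {
    pub tape: Vec<Entry>,
    pub turns: usize,
}

/// The stateless program: it reads the config and workspace and writes
/// only to the session tape. Returns false once the turn limit is reached.
pub fn step(config: &AgentConfig, workspace: &Workspace, session: &mut Session) -> bool {
    if session.turns >= config.max_turns {
        return false;
    }
    session.turns += 1;
    session.tape.push(Entry::Thought(format!(
        "{} (turn {}) thinking with {}",
        workspace.persona, session.turns, config.model
    )));
    true
}
```

Because step takes the config by shared reference and mutates only the session, the same AgentConfig can drive any number of sessions without carrying state between them.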

Alan already has the foundation—stateless agent / stateful workspace / bounded session separation, plus rollback, replay, approval, and sandbox primitives. But turning HITE from a philosophy into runtime behavior requires four more things.

Commit Boundaries and Policy-as-Code

If HITE means anything concrete, it means this: most agent actions should flow freely, and human intervention should converge on commit boundaries—irreversible checkpoints where real-world consequences kick in.

Traditional approval flows bind intervention to specific tool calls. Too rigid. In an autonomous system, reading a file, running a local computation, trying an approach that might fail—all of these should be free. Human attention should be saved for the moments that actually matter: signing a contract, transferring funds, pushing to production.

These boundaries need to be defined as declarative Policy-as-Code:

In Turing machine terms: a commit boundary is a special symbol on the tape. When the read/write head hits it, the transition function yields to the human instead of deciding on its own. The human writes their decision, the machine resumes. This maps naturally to Alan’s existing Yield/Resume protocol.
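
Here is one way the idea might look in Rust, as a sketch rather than Alan's real policy format. ActionKind, Verdict, Policy, attempt, and resume are all hypothetical names; the actual Yield/Resume protocol is richer than this. The policy is plain data, and an action with no matching rule falls back to a default, which I set to the cautious choice in the usage note.

```rust
/// Kinds of action an agent might attempt (illustrative, not Alan's real set).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ActionKind {
    ReadFile,
    RunLocalCommand,
    PushToProduction,
    TransferFunds,
}

/// What the policy says about an action.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Verdict {
    /// Reversible and within policy: run without asking.
    Allow,
    /// Irreversible checkpoint: yield to the human.
    CommitBoundary,
}

/// A declarative policy: a list of rules plus a default, no control flow.
pub struct Policy {
    pub rules: Vec<(ActionKind, Verdict)>,
    /// Applied when no rule matches.
    pub default: Verdict,
}

impl Policy {
    /// First matching rule wins; otherwise the default applies.
    pub fn verdict(&self, kind: ActionKind) -> Verdict {
        self.rules
            .iter()
            .find(|(k, _)| *k == kind)
            .map(|(_, v)| *v)
            .unwrap_or(self.default)
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum HumanDecision {
    Approve,
    Reject,
}

#[derive(Debug, PartialEq, Eq)]
pub enum StepResult {
    Executed(ActionKind),
    /// The machine pauses here until `resume` is called.
    Yielded(ActionKind),
    Skipped(ActionKind),
}

/// The transition function: decide alone, or yield at a commit boundary.
pub fn attempt(policy: &Policy, kind: ActionKind) -> StepResult {
    match policy.verdict(kind) {
        Verdict::Allow => StepResult::Executed(kind),
        Verdict::CommitBoundary => StepResult::Yielded(kind),
    }
}

/// The human writes a decision and the machine continues.
pub fn resume(pending: ActionKind, decision: HumanDecision) -> StepResult {
    match decision {
        HumanDecision::Approve => StepResult::Executed(pending),
        HumanDecision::Reject => StepResult::Skipped(pending),
    }
}
```

With rules that allow ReadFile and RunLocalCommand and a default of CommitBoundary, reading a file runs freely, while TransferFunds yields even though no rule names it. Failing closed like this is a design choice of the sketch, not something the runtime mandates.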

Task/Job: Beyond the Context Window

Sessions are bounded by the LLM’s context window. Fine for a single conversation, but real agent tasks—“refactor this module,” “investigate and fix this production issue,” “research competitors and write a report”—span hours or days.

This needs a new layer above sessions:

The hard problem is continuity. When a new session picks up a task, it needs enough context to continue without re-deriving everything. This means structured handoff artifacts—not just “here’s the conversation history” but “here’s the current state, what’s been tried, what’s left.”

This layering maps to the HITE formula: Task carries Human Defines (goals and constraints), Run/Session carries Agent Executes (autonomous operation), and the Task’s final state delivers to Human Owns (outcome accountability).
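
A small Rust sketch of the layering, with everything in it assumed for illustration: Task, Handoff, and run_session are hypothetical names, and the context window limit is modeled crudely as the number of work items one session can finish. What it tries to show is the handoff artifact carrying current state, what has been tried, and what is left from one session to the next.

```rust
/// Structured handoff passed from one session to the next (hypothetical shape).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Handoff {
    pub current_state: String,
    pub tried: Vec<String>,
    pub remaining: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum TaskStatus {
    InProgress,
    Done,
}

/// A long-lived task that spans many context-bounded sessions.
pub struct Task {
    pub goal: String,
    pub handoff: Handoff,
    pub sessions_used: usize,
}

impl Task {
    /// Human Defines: a goal plus the initial list of work items.
    pub fn new(goal: &str, todo: &[&str]) -> Self {
        Task {
            goal: goal.to_string(),
            handoff: Handoff {
                current_state: "not started".to_string(),
                tried: Vec::new(),
                remaining: todo.iter().map(|s| s.to_string()).collect(),
            },
            sessions_used: 0,
        }
    }

    /// Agent Executes: one bounded session finishes at most `capacity`
    /// items, then updates the handoff for whichever session comes next.
    pub fn run_session(&mut self, capacity: usize) -> TaskStatus {
        let n = capacity.min(self.handoff.remaining.len());
        let done: Vec<String> = self.handoff.remaining.drain(..n).collect();
        if let Some(last) = done.last() {
            self.handoff.current_state = format!("finished: {}", last);
        }
        self.handoff.tried.extend(done);
        self.sessions_used += 1;
        if self.handoff.remaining.is_empty() {
            TaskStatus::Done
        } else {
            TaskStatus::InProgress
        }
    }
}
```

A new session needs only the Task's goal and its Handoff to continue; it never sees the previous session's conversation history.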

Checkpointed Reasoning as Trust Infrastructure

Alan already records every thought, action, and observation in a durable rollout log—what I call Checkpointed Reasoning. But for HITE to work—for humans to genuinely feel comfortable not watching every step—the rollout needs to evolve from a display log into a verifiable evidence chain:

Humans let go not because they blindly trust the agent, but because they can trace the complete decision chain at any time. That turns Checkpointed Reasoning from a “technical feature” into “trust infrastructure.”
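
One common way to make a log tamper-evident is to chain each record to the hash of the one before it. The Rust sketch below shows that technique under loud assumptions: Record, Rollout, and first_invalid are hypothetical names, and DefaultHasher from the standard library stands in for a cryptographic hash only to keep the example dependency-free. A real evidence chain would use something like SHA-256, and I am not claiming this is how Alan's rollout is stored.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One rollout entry, linked to its predecessor by hash.
#[derive(Debug, Clone)]
pub struct Record {
    pub kind: &'static str, // e.g. "thought", "action", "observation"
    pub body: String,
    pub prev_hash: u64,
    pub hash: u64,
}

/// Hash of a record's content together with the previous record's hash.
/// NOTE: DefaultHasher is NOT cryptographic; it is a stand-in for this sketch.
fn digest(prev_hash: u64, kind: &str, body: &str) -> u64 {
    let mut h = DefaultHasher::new();
    prev_hash.hash(&mut h);
    kind.hash(&mut h);
    body.hash(&mut h);
    h.finish()
}

#[derive(Debug, Default)]
pub struct Rollout {
    pub records: Vec<Record>,
}

impl Rollout {
    /// Append a record whose hash covers the previous record's hash.
    pub fn append(&mut self, kind: &'static str, body: &str) {
        let prev_hash = self.records.last().map(|r| r.hash).unwrap_or(0);
        let hash = digest(prev_hash, kind, body);
        self.records.push(Record {
            kind,
            body: body.to_string(),
            prev_hash,
            hash,
        });
    }

    /// Walk the chain from the start and return the index of the first
    /// record that fails verification, or None if the chain is intact.
    pub fn first_invalid(&self) -> Option<usize> {
        let mut prev = 0u64;
        for (i, r) in self.records.iter().enumerate() {
            if r.prev_hash != prev || r.hash != digest(prev, r.kind, &r.body) {
                return Some(i);
            }
            prev = r.hash;
        }
        None
    }
}
```

Editing any earlier record changes its digest, so verification fails at that record rather than passing silently. That is the property a reviewer relies on when auditing a run after the fact.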

Skills over Plugins: The Unix Philosophy

Under HITE, agents need to autonomously orchestrate many tools over long periods. The architecture of tool orchestration directly determines how controllable and maintainable the system is.

The current industry trend leans toward MCP (Model Context Protocol). But MCP’s design runs counter to the Unix philosophy—“do one thing well” and “communicate through text streams.” It introduces heavyweight client/server architecture, handshake protocols, and state management. To let an agent use a simple tool, you have to wrap it as an RPC server.

OpenAI learned this the hard way. When building the Codex App Server, they initially tried shipping Codex as an MCP server, but quickly found it “difficult to maintain MCP semantics” and pivoted to their own JSON-RPC protocol. When you need rich agent interactions—thread lifecycles, streaming progress, approval interrupts—MCP’s abstraction level isn’t enough.

Alan takes a different path: Skills over Plugins.

In this architecture, MCP and OpenAPI are just atomic executors among many. Skills are the orchestration layer—the logic controllers.

This solves real problems: context isolation (the model only loads the current skill, not every tool schema), composition determinism (skills define explicit data flow), auth decoupling (each tool manages its own authentication), and developer sovereignty (write a CLI + a SKILL.md, no servers required).
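
To illustrate the composition idea, here is a minimal Rust sketch in which a skill is an explicit pipeline over text-in, text-out executors. The Executor trait, the Skill struct, and the three toy executors are assumptions for this example, not Alan's interfaces. In practice an executor would wrap a CLI, an MCP call, or an OpenAPI request; the stand-ins below avoid spawning processes so the example stays self-contained.

```rust
/// An atomic executor: text in, text out (hypothetical interface).
pub trait Executor {
    fn name(&self) -> &str;
    fn run(&self, input: &str) -> Result<String, String>;
}

/// Stand-in executor: uppercases its input.
pub struct Upper;
impl Executor for Upper {
    fn name(&self) -> &str {
        "upper"
    }
    fn run(&self, input: &str) -> Result<String, String> {
        Ok(input.to_uppercase())
    }
}

/// Stand-in executor: counts whitespace-separated words.
pub struct WordCount;
impl Executor for WordCount {
    fn name(&self) -> &str {
        "wc"
    }
    fn run(&self, input: &str) -> Result<String, String> {
        Ok(input.split_whitespace().count().to_string())
    }
}

/// Stand-in executor: passes input through, failing if it is blank.
pub struct NonEmpty;
impl Executor for NonEmpty {
    fn name(&self) -> &str {
        "non-empty"
    }
    fn run(&self, input: &str) -> Result<String, String> {
        if input.trim().is_empty() {
            Err("empty input".to_string())
        } else {
            Ok(input.to_string())
        }
    }
}

/// A skill: an explicit, ordered data flow over executors.
pub struct Skill {
    pub name: String,
    pub steps: Vec<Box<dyn Executor>>,
}

impl Skill {
    /// Pipe the input through each step in order, like a Unix pipeline.
    /// Stops at the first failure and reports which step failed.
    pub fn run(&self, input: &str) -> Result<String, String> {
        let mut data = input.to_string();
        for step in &self.steps {
            data = step
                .run(&data)
                .map_err(|e| format!("{}: {}", step.name(), e))?;
        }
        Ok(data)
    }
}
```

The order of steps is fixed in the skill definition, which is what I mean by composition determinism: the model chooses a skill, and the skill decides how data moves between tools.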

Update (March 2, 2026)

Today I came across Eric Holmes’s post on Hacker News: MCP is dead. Long live the CLI. One line captures the core tradeoff well: “The best tools are the ones that work for both humans and machines.”

This aligns with the “Skills over Plugins” argument above. In Alan, Skills remain the orchestration layer over small CLI executors. MCP and OpenAPI still have value as adapters when needed, but CLI-first is the default path because it gives better composability, easier debugging, and cleaner permission boundaries.
