Skills, Not Just Tools — Rethinking Agent Capabilities
2026-01-03
The agent ecosystem is obsessed with tools. MCP servers, tool calling, function calling—every framework announcement centers on how many tools you can connect. But tools are the wrong abstraction for where we're headed. We should be talking about skills.
This isn't just semantics. It's the difference between building copilots and building agents that actually work autonomously.
Tools Were Right for Copilots
Tools made sense in 2024. A human sat in the loop, providing judgment. The agent called get_weather(city) or send_email(to, subject, body), and the human decided what to do with the result.
This works when:
- Tasks are single-step or short sequences
- A human validates each decision
- Context doesn't compound
Most "agents" today are still copilots. They suggest, you approve. They draft, you edit. The human provides the judgment layer. According to Stack Overflow's 2025 Developer Survey, 84% of developers now use AI tools, with 51% using them daily—the majority are using them as assistants, not autonomous agents.
But the ceiling is visible. In coding, we went from "AI autocompletes my line" to "AI writes most of the code while I review." Microsoft CEO Satya Nadella recently revealed that 20-30% of Microsoft's code is now AI-generated. At the extreme end, Y Combinator reported that 25% of their Winter 2025 founders had 95% of their code generated by AI.
The ratio keeps shifting. The human-in-the-loop is becoming human-on-the-loop, then human-out-of-the-loop.
When the human steps back, tools break down.
With a copilot, the human decides what's right. With an autonomous agent, the skills have to know what's right.
The Tool Mental Model Breaks at Autonomy
Tools are atomic—stateless functions that execute and return. search_web(query) doesn't know why you're searching or what counts as a good result. send_slack_message(channel, text) doesn't know if now is the right time to send it.
When a human orchestrates, this is fine. The human holds the context, the judgment, the domain knowledge.
When an agent orchestrates autonomously, it needs that judgment embedded. As IBM's research team puts it, a true AI agent is "an intelligent entity with reasoning and planning capabilities that can autonomously take action"—but those reasoning and planning capabilities are exactly what tools don't provide.
A tool can't tell you:
- When to use it vs. when to skip it
- What success looks like
- How to handle the edge case you didn't anticipate
- What to do when it fails
This is where the tool abstraction hits its ceiling.
What a Skill Actually Is
A skill is a packaged capability—not just a function, but the expertise to use it well.
Think about what it actually means to learn a craft. You can hand someone a hammer, a saw, and a tape measure—they now have tools. But watch them try to build a cabinet. They'll drive nails at the wrong angle. They'll cut boards without accounting for the blade width. They'll measure once, cut twice, and wonder why nothing fits.
Now compare that to someone who's learned carpentry. They have the same hammer, but they also know: which wood to use for which purpose, how humidity affects joints, when to use screws instead of nails, how to recover when a cut goes wrong. The tools are identical. The capability is worlds apart.
That's the gap between tools and skills. A tool is an action. A skill is the judgment to use it well.
Concrete example: "Email Outreach" as a skill
A tool approach gives you:
- send_email(to, subject, body)
- check_inbox()
- get_contact_info(name)
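In code, that tool-only surface is just bare, stateless functions. A minimal sketch (the signatures come from the list above; the bodies are hypothetical stubs):

```python
# Tool-only approach: stateless functions with no embedded judgment.
# (Hypothetical stubs; the real transport/API is out of scope here.)

def send_email(to: str, subject: str, body: str) -> dict:
    """Send one email. Knows nothing about timing, tone, or follow-up cadence."""
    ...

def check_inbox() -> list[dict]:
    """Return recent messages. Doesn't know which replies matter."""
    ...

def get_contact_info(name: str) -> dict:
    """Look up a contact. Doesn't know if a colleague emailed them last week."""
    ...
```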
An agent with these tools can technically send emails. But it doesn't know:
- How many follow-ups are appropriate before you're spamming
- That Tuesday morning converts better than Friday afternoon
- To check if this person was contacted by a colleague last week
- What "success" means—open rate? reply? meeting booked?
- How to adjust tone for a CEO vs. a developer
A skill packages all of this:
- The tools, yes
- Instructions: "Follow up twice, spaced 3-4 days apart. Stop if they reply or unsubscribe."
- Context: "Reference their recent funding round if applicable. Match formality to their LinkedIn tone."
- Data: Integration with CRM to check contact history
- Evaluation: "Success = reply or meeting. Failure = bounce or unsubscribe."
- Recovery: "If email bounces, try to find alternate contact via LinkedIn."
The skill doesn't just execute. It knows the job.
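One way to make the package concrete is a sketch like the following. The `Skill` shape and its field names are illustrative assumptions, not any real framework's API; it reuses the tool stubs from the sketch above:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Skill:
    """A capability packaged with the judgment to use it well. Illustrative only."""
    name: str
    tools: list[Callable]                       # the raw actions (send_email, ...)
    instructions: str                           # how and when to act
    context: str                                # domain knowledge the agent applies
    data_sources: list[str]                     # e.g. CRM for contact history
    success_criteria: Callable[[dict], bool]    # what "done well" means
    recovery: dict[str, str] = field(default_factory=dict)  # failure -> fallback

email_outreach = Skill(
    name="email_outreach",
    tools=[send_email, check_inbox, get_contact_info],
    instructions="Follow up twice, spaced 3-4 days apart. Stop on reply or unsubscribe.",
    context="Reference recent funding if applicable. Match formality to their LinkedIn tone.",
    data_sources=["crm"],  # check whether a colleague contacted them last week
    success_criteria=lambda outcome: bool(
        outcome.get("replied") or outcome.get("meeting_booked")
    ),
    recovery={"bounce": "find alternate contact via LinkedIn"},
)
```

The point isn't the data structure. It's that the judgment travels with the tools instead of living in a human's head.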
The Hierarchy
Tools don't disappear. They become components:
Tools are what agents call. Skills are what agents have.
Why MCP Isn't Enough
MCP moved the industry forward. It standardized how tools connect to agents—tools, prompts, and resources in one protocol. But MCP is infrastructure, not the full stack.
What's missing:
No evaluation. How do you know a skill works? MCP doesn't define testing, quality metrics, or reliability signals. You're trusting the tool author blindly. This matters: more developers now actively distrust AI tool accuracy (46%) than trust it (33%).
No composability. Real tasks require skills to chain—research feeds into outreach feeds into CRM updates. MCP doesn't specify how capabilities compose or hand off context.
No discovery. MCP assumes you know what you need. There's no marketplace, no trust layer, no way to find "the best skill for this job" across the ecosystem.
No adaptation. MCP resources are static. Skills need to adjust based on runtime context—what's worked so far, what just failed, what the agent learned mid-task.
MCP gives you the primitives. The skill layer is what's missing.
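To see what composability and handoff would have to mean in practice, here's a hedged sketch. Nothing in it is defined by MCP today; the skill names and the context contract are hypothetical:

```python
# Hypothetical handoff contract: each skill is a function from shared
# context to new facts. MCP standardizes the tool calls inside each step;
# this chaining layer is the gap.
Context = dict

def research_skill(ctx: Context) -> Context:
    # (stub) enrich the lead: role, recent funding, preferred tone
    return {"funding_round": "Series B", "tone": "formal"}

def outreach_skill(ctx: Context) -> Context:
    # (stub) draft and schedule email using what research learned
    return {"outcome": "reply", "replied": True}

def crm_update_skill(ctx: Context) -> Context:
    # (stub) record the outcome against the contact
    return {"crm_logged": True}

def run_pipeline(lead: str) -> Context:
    ctx: Context = {"lead": lead}
    for skill in (research_skill, outreach_skill, crm_update_skill):
        ctx |= skill(ctx)  # each step reads what came before, adds what it learned
    return ctx
```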
What Developers Should Build Toward
If you're building agents today:
Package capabilities, not functions. Don't ship a tool—ship the expertise to use it. Include instructions, success criteria, and failure modes.
Design for composition. Your skill will be one of many. Define clear inputs, outputs, and context handoffs.
Build in evaluation. How will someone know your skill works? Include test cases, expected behaviors, reliability metrics.
Assume no human. Design as if there's no one to ask "is this right?" The judgment has to be embedded.
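Pulled together, these four points suggest a skill should ship with something like a manifest. A sketch follows; the format is an assumption, not an existing standard:

```python
# Hypothetical manifest a skill could ship with: the composition contract
# plus the evaluation cases a consumer needs before trusting it unattended.
EMAIL_OUTREACH_MANIFEST = {
    "name": "email_outreach",
    "inputs": ["contact", "campaign_goal"],    # composition: what it needs
    "outputs": ["outcome", "next_action"],     # composition: what it hands off
    "success": "reply or meeting booked",
    "failure_modes": ["bounce", "unsubscribe"],
    "eval_cases": [                            # evaluation: how to verify it works
        {
            "given": {"contact": "known-bounce@example.com"},
            "expect": {
                "outcome": "bounce",
                "next_action": "find alternate contact via LinkedIn",
            },
        },
    ],
}
```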
Where This Goes
The industry will converge on skills. Not because it's a better word, but because autonomous agents require embedded judgment, and tools don't carry it.
We're not there yet. Stack Overflow's 2025 survey found that 52% of developers either don't use AI agents or stick to simpler tools, and 38% have no plans to adopt them. The gap isn't capability—it's trust. Developers don't trust tools that are "almost right."
The frameworks that win will make skills:
- Discoverable — Find the right capability for the job
- Composable — Chain skills into complex workflows
- Trustworthy — Know they work before you deploy them
Tool calling was the floor. Skills are what agents actually need.
If you're building in this space, I'd like to hear what you're working on—find me on Twitter or LinkedIn.