A few weeks ago I was on a call with the person who runs operations at a forty-person engineering firm. They had just wrapped a leadership offsite. The big takeaway, captured in a whiteboard photo she sent me, was: "Build AI agents to handle proposal intake."
I asked her what proposal intake actually looked like. She walked me through it. An email comes in from a client or a referral. Someone — usually her — reads it, decides whether it's worth pursuing, and assigns it to one of three project leads based on the type of work and who has bandwidth. The lead pulls the past project file that's closest to it, drafts a fee, and sends it back to her for review.
That is not a job for an agent.
That is a job for a checklist, a shared inbox, and maybe — at the high end — a single AI prompt that suggests which past project file is the closest match. Total build time: a couple of days. Cost: nearly nothing.
What was being sold to her at the offsite was the most expensive, most fragile, and least necessary version of the same idea.
This is the second of the two mistakes I see most often. The first, which I wrote about last month, is companies trying to put AI on top of an operation nobody has ever drawn. The second is companies reaching for the most complicated tool in the box when the simplest one would do.
What the people building the model recommend
The most cited engineering guide on building AI systems for business in 2026 is published by Anthropic — the company that makes Claude. The whole field references it. It is worth any owner's or operator's time to read it once.
The thing that surprised me when I first read it was the headline advice. Not from a vendor. Not from a consultant. From the people who build the model itself:
"When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all."
They go on to make a distinction that almost no one in the industry uses correctly. There is a difference between an agent and a workflow, and the difference matters.
A workflow is a sequence somebody wrote down. Step A, then step B, then step C. AI can do one of the steps — classify, summarize, extract — but a person designed the sequence and the sequence does not change.
An agent is a system where the AI decides the sequence on its own, in the moment, based on what it finds. You give it a goal and a set of tools. It picks what to do next.
Both are useful. Neither is better than the other in general. They have different costs, different risks, and different use cases. And — this is the part the offsite gurus skip over — most of the work a 50–200 person company wants AI to do is the workflow kind, not the agent kind.
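The distinction is easier to see in code than in a slide deck. Here is a minimal sketch in Python; `llm()` is a stand-in for any real model call, stubbed out so the sketch runs without an API key, and the prompts and tool names are invented for illustration:

```python
# A minimal sketch of the workflow/agent distinction.
# `llm` is a placeholder for a real model call -- stubbed here so the
# sketch runs without an API key.

def llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return f"[model output for: {prompt[:40]}]"

def workflow(email: str) -> str:
    """A workflow: a person wrote the sequence. Step A, then B, then C."""
    category = llm(f"Classify this request: {email}")   # step A
    summary = llm(f"Summarize the key asks: {email}")   # step B
    return llm(f"Draft a reply to a {category} request "
               f"covering {summary}")                   # step C

def agent(goal: str, tools: dict) -> str:
    """An agent: the model picks the next tool, in the moment."""
    context = goal
    for _ in range(10):                    # hard cap so it always stops
        choice = llm(f"Given '{context}', pick the next tool "
                     f"from {sorted(tools)} or say DONE")
        if choice not in tools:            # the model decided it is finished
            break
        context += "\n" + tools[choice](context)
    return llm(f"Write the final answer from: {context}")
```

The structural difference is the loop: in `workflow`, a person fixed the sequence in advance; in `agent`, the model chooses each next step at runtime, which is exactly what makes it both more flexible and harder to predict.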
MIT studied 300 enterprise AI deployments last year. Ninety-five percent produced no measurable result. The pattern the researchers named was that the systems being built couldn't learn — they didn't retain feedback, didn't adapt to context, didn't improve over time. The companies that succeeded did the opposite of going big. They picked one specific, high-value problem and built the simplest thing that could solve it.
The four levels
Almost every AI request that lands on my desk falls onto one of four levels of complexity. The rule is the same as the one Anthropic gives its own engineers: pick the lowest level that can do the job. Don't go higher unless you have to.
Level 1 — A person and a chat window
Most of what AI is doing inside small companies today, it is doing through ChatGPT or Claude in a browser tab. Drafting an email. Cleaning up a meeting transcript. Reformatting a scope. Pulling the highlights out of a long PDF. There is no system. No integration. There is a person with a problem in front of them, pasting context into a chat window and getting back something useful.
This is not a stepping stone to something more sophisticated. For most of what a person does in a day, this is the answer. A 2026 Harvard Business Review study found that employees frustrated with cumbersome corporate AI tools were quietly using their personal ChatGPT and Claude accounts on the side — and getting more value from them. Their case study at BBVA, the Spanish bank, found that giving 11,000 employees access to a secure AI environment and twenty minutes of training produced 4,800 internally-built tools and 2–5 hours of time saved per person, per week.
The honest read for most 50–200 person companies: if every senior person at your company had a paid ChatGPT or Claude account and twenty minutes of training on how to use it well, you would capture seventy percent of the value AI has to offer your business. That is roughly twenty dollars a person, per month.
Level 2 — A single AI step inside something you already do
The next level up is one specific place where AI does one specific thing inside an existing process. Not a system. A step.
An AI step that reads an incoming proposal request and suggests the closest past project. An AI step that reads a meeting transcript and pulls action items into a draft email. An AI step that reads a vendor invoice and pre-fills the line items into accounting for a person to review.
The thing that makes Level 2 work is that the surrounding workflow does not change. The person still owns the process. The AI does one piece of it that used to take fifteen minutes and now takes thirty seconds. If the AI gets it wrong, the person catches it, because the person was going to do that step anyway.
Most of what gets sold as "AI agents" to mid-sized companies is, on close inspection, a Level 2 build. That is not a criticism. Level 2 is where most of the value is. The criticism is that it is being priced and built like a Level 4 system.
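In practice, a Level 2 build can be as small as one function dropped into the existing process. A sketch, with a stubbed `llm()` and a hypothetical list of past project files (the project names are invented for illustration):

```python
# A Level 2 build in miniature: one AI step inside an existing process.
# The surrounding workflow is unchanged -- a person still reviews the pick.
# `llm` is a stub; `PAST_PROJECTS` is hypothetical example data.

PAST_PROJECTS = [
    "2024-017 Riverside warehouse structural review",
    "2025-003 Midtown office seismic retrofit",
    "2025-041 Harbor bridge inspection program",
]

def llm(prompt: str) -> str:
    """Stand-in for a real model call. A real build would hit an API here."""
    return PAST_PROJECTS[0]  # stubbed: always returns the first project

def suggest_closest_project(request_email: str) -> str:
    """The one AI step: fifteen minutes of file-digging becomes one call."""
    prompt = (
        "Which of these past projects is the closest match to this request?\n"
        + "\n".join(PAST_PROJECTS)
        + f"\n\nRequest:\n{request_email}\nAnswer with one project name."
    )
    suggestion = llm(prompt)
    # Guardrail: only ever return something from the known list, so the
    # reviewing person sees a real file name, never an invention.
    return suggestion if suggestion in PAST_PROJECTS else PAST_PROJECTS[0]
```

Note what is not here: no routing, no multi-step orchestration, no autonomy. The person who was already going to pick a reference project still makes the final call.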
Level 3 — A workflow with several AI steps in a row
Level 3 is what happens when you string several Level 2 pieces together into a sequence. Read the email. Classify the type of request. Pull the relevant past data. Draft the first response. Route it to the right person for review.
This is still a workflow. A person — usually you, with help — wrote down the steps. The AI fills in pieces of each step. If the input is unusual, the workflow still runs the same way. The person at the end catches what the AI missed.
Level 3 is where most real automation lives in 50–200 person companies. Proposal intake. New-client onboarding. Closeout document assembly. Weekly report generation. Workflows that have always existed; the AI just makes each step faster. Build cost is real but contained — a few thousand dollars and a few weeks, not a six-figure engagement.
The signal that you have a Level 3 problem and not a Level 4 problem: a senior person in your company could write down exactly what should happen at every step, in advance, before the work comes in.
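Sketched as code, a Level 3 build is just the Level 2 pieces in a fixed order, with routing and review kept deterministic. The `llm()` stub, the routing table, and the lead names are all assumptions for illustration:

```python
# A Level 3 workflow in miniature: the proposal-intake steps, written
# down in advance, with AI filling pieces of each one.
# `llm` is a stub; the routing table and leads are hypothetical.

def llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "structural"

ROUTING = {"structural": "Lead A", "civil": "Lead B", "inspection": "Lead C"}

def proposal_intake(email: str) -> dict:
    request_type = llm(f"Classify this request: {email}")   # AI step
    if request_type not in ROUTING:
        request_type = "structural"  # unusual input still runs the same path
    draft = llm(f"Draft a first response to: {email}")      # AI step
    return {
        "type": request_type,
        "draft": draft,
        "assigned_to": ROUTING[request_type],  # deterministic, not AI
        "status": "awaiting human review",     # a person catches misses
    }
```

The sequence never changes at runtime; that is what keeps a Level 3 build cheap to run and easy to debug.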
Level 4 — An actual agent
Level 4 is the real thing. The AI is given a goal and a set of tools, and it decides — in the moment, based on what it finds — what to do next. It might call one tool. It might call fifteen. The sequence is not predetermined because the sequence depends on what shows up.
The honest test, taken from the same Anthropic guide, is three questions. If the answer to any of them is no, you do not need an agent. You need a workflow.
- Is the work impossible to script in advance? If a senior person in your company could write down the exact steps before the work comes in, it is a workflow.
- Are the inputs unstructured and variable? Documents in different formats, free-text emails, situations that don't fit a pattern. If your inputs come on a fixed form, it is a workflow.
- Does the work require genuine judgment across multiple pieces of evidence? Not pattern-matching against rules. Real reasoning. If a junior person could do it from a checklist, it is a workflow.
Genuine Level 4 problems exist. A research task where the AI has to look in three different systems and decide which to look in next. A complex incoming claim where the route depends on a dozen variables. They are real. They are also rare in a 50–200 person company. Most of the time, what gets called an agent is really a Level 3 workflow with the work of writing it down skipped.
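For contrast, here is a genuine Level 4 loop under the same stub conventions: a goal, a set of tools, no fixed sequence, and every decision logged for a human reviewer. The tool names and the `llm()` stub are hypothetical:

```python
# A Level 4 agent in miniature: goal in, tool choices decided at runtime,
# every decision logged for the person who reviews it.
# `llm` and the tools are stubs; a real build would call live systems.

def llm(prompt: str) -> str:
    """Stand-in for a real model call."""
    return "DONE"

TOOLS = {
    "search_crm": lambda ctx: "crm records...",
    "search_files": lambda ctx: "project files...",
    "check_calendar": lambda ctx: "availability...",
}

def run_agent(goal: str, max_steps: int = 15) -> dict:
    context, decision_log = goal, []
    for _ in range(max_steps):       # cap: an agent must not run forever
        choice = llm(f"Goal: {goal}\nSo far: {context}\n"
                     f"Next tool from {sorted(TOOLS)}, or DONE?")
        decision_log.append(choice)  # the reviewer sees every choice
        if choice not in TOOLS:
            break                    # the model decided it is done
        context += "\n" + TOOLS[choice](context)
    return {"answer": llm(f"Final answer from: {context}"),
            "decisions": decision_log}   # shipped to human review
```

Everything that makes Level 4 expensive lives in that loop: the sequence is unknown until runtime, so the guardrails, the step cap, and the decision log are not optional extras. They are the product.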
Why the level matters more than the technology
Each level up costs roughly ten times more to build, ten times more to run, and ten times more to debug than the one below it. That is not a metaphor. That is the actual range you see in practice.
A Level 1 use case costs $20 a month and works tomorrow. A Level 2 build typically runs $3,000 to $8,000 and a few hundred dollars a month to run. A Level 3 workflow can be $15,000 to $40,000 to build and a few thousand a month. A real Level 4 agent, built properly with the guardrails it needs to be safe, is rarely below $50,000 to build and is genuinely hard to operate. Every Level 4 agent needs a person who reviews its decisions for the first several months. There is no skipping that.
The mistake the offsite guru makes is selling a Level 4 system to solve a Level 2 problem. The mistake the company makes is assuming the Level 4 system will be more reliable because it is more complex. It will not be. It will be less reliable. It will be more expensive. It will be harder to fix. And — this is the part that matters most — it will take longer to deliver any value.
The complexity of the tool should match the complexity of the problem. Not the ambition of the meeting it was decided in.
How to read your own list
Take whatever list of "AI ideas" came out of your last leadership meeting. There is one. Everyone has one right now.
For each item, ask three questions. Not in a workshop. In your head, in five minutes.
- Could the same outcome happen tomorrow if a person at your company got really good at ChatGPT? For a surprising number of items the answer is yes, and the cost is $20 a month.
- If not, what is the one specific moment where AI would help? Name the moment. Not "speed up proposals." The exact place. "Reading the incoming proposal request and finding the most relevant past project." That is a moment. It is a Level 2 build.
- Could a senior person write down every step that should happen, in order? If yes, it is a workflow. Build it as one. If genuinely no — if the steps depend on what gets found at each turn — then and only then is it an agent.
Most lists, when run through this filter, have one or two genuine Level 3 builds, a handful of Level 2 candidates, and one Level 4 idea that probably collapses to Level 3 once you try to draw it.
That is good news. The work is more tractable than the offsite made it sound. Level 2 and Level 3 builds are where real money gets saved at most companies — quietly, on the margin, every week — and they are buildable in months, not years.
The honest tension
Will the levels collapse into each other? Will Level 4 agents eventually be cheap enough and reliable enough that you do not have to think about any of this?
Probably, eventually. The cost of the underlying model has come down by an order of magnitude in the last two years and is still falling. The day will come when the right answer for many of these problems is a Level 4 agent because the cost difference no longer matters.
That day is not today. Today, the difference between a $400-a-month Level 2 build that solves the actual problem and a $4,000-a-month Level 4 agent that solves a slightly more general version of it is the difference between a project that pays for itself in three months and one that does not.
And here is the second part of the honest answer. Even when the levels do collapse, the companies that benefit most will be the ones that did the simple work first. Companies running Level 2 and Level 3 systems today are quietly drawing the operation as they go — naming the steps, watching where the work breaks, building the muscle of "we have a process for that, and the AI is one part of it." Companies waiting for the perfect agent will arrive at that future with no record of how the work has ever moved.
The order matters
Most companies do not need an AI agent yet. Most companies need three things, in this order:
- Every senior person fluent enough with ChatGPT or Claude to use it for an hour a day.
- Two or three Level 2 AI steps inside the workflows that quietly cost the most.
- One or two Level 3 workflows where the AI is doing several steps in sequence, with a person reviewing the output at the end.
That is most of the value AI has to offer a company of fifty to two hundred people in 2026. It is not glamorous. It does not photograph well on a slide. It will not get you on a podcast. It will, however, save real time and real money — and leave you with a company that runs better, not a company that owes its operations to a system nobody internal can fix when it breaks.
If you have a list of AI ideas from a leadership meeting and you would like an outside read on which level each of them really belongs on — and which one to start with — the Health Check is about ten minutes, free, no sales call required. If something it surfaces is worth a longer conversation, we can have one.
Sources referenced in this article
- Anthropic Engineering, Building Effective Agents — the original guide on workflows vs. agents and the principle of starting with the simplest solution.
- MIT NANDA Initiative, The GenAI Divide: State of AI in Business 2025 — the 95 percent figure on enterprise AI pilots, and the analysis of why most failed (systems that couldn't learn, retain feedback, or adapt to context).
- Alfaro et al., The Hidden Demand for AI Inside Your Company, Harvard Business Review, April 2026 — on shadow AI usage and the BBVA case study (11,000 active users, 4,800 internal tools, 2–5 hours saved per employee per week).
Related reading
- Sterity Insights, The Picture That Lives in Their Heads — the first piece in this series. Why most companies can't yet apply AI at any level: the operation hasn't been drawn.