How to build a team of AI agents without letting them run wild

Carla G. June 6, 2026 Updated June 6, 2026

I do not think agentic development is about replacing developers with a chatbot.

[Figure: hand-drawn technical diagram showing agentic development as a governed team workflow, with PM, developer, reviewer, QA, tools, tests, approvals, pull request, and website release.]

I think it is about something more practical: turning AI from a single helper into a controlled delivery system.

That means roles, tools, state, evidence, reviews, approvals, logs, and clear stopping points.

This matters because one AI assistant can help you move faster. A team of AI agents can help you move work through a process. But only if the process is designed properly.

The short version

Traditional AI coding help works like this:

Human asks → AI answers → human decides what to do next

Agentic development works more like this:

Goal → PM agent plans → worker agents act → reviewer agents check → evidence is recorded → human approves → release moves forward

The difference is not just speed. The difference is structure.

Visual model: assistant vs agentic workflow

Chatbot → Answer

Goal → Plan → Build → Review → Approve → Release

The mental model: agents are not magic, they are workers with contracts

A useful AI agent needs a contract.

By contract, I mean a clear agreement about what the agent can do, what it cannot do, what evidence it must produce, and when it must stop.

A good analogy is a construction site.

You would not let the electrician approve the whole building. You would not let a junior worker change the structural plan. You would not call the building complete because someone created a pile of materials.

Software agents need the same discipline.

Construction site	Agentic development
Architect	System designer or PM agent
Builder	Developer agent
Inspector	Reviewer or QA agent
Permit	Human approval gate
Blueprint	Plan, task file, architecture note
Inspection report	Evidence, tests, logs, review notes

Why companies are moving this way

Companies are not moving toward agents only because agents are fashionable.

They are moving this way because normal AI assistants have limits.

A chatbot waits for the next prompt. A workflow agent can keep a task moving.
A chatbot answers from context. A tool-using agent can inspect files, tickets, docs, logs, or test results.
A chatbot can forget the process. A stateful workflow can remember what phase the work is in.
A chatbot can sound confident. A governed agent must produce evidence.
A chatbot can help one person. A multi-agent system can support a team process.

This is why agentic development is likely to become normal in serious engineering teams. The future is not just “AI writes code”. The future is AI participating in the software delivery lifecycle: planning, implementation, review, testing, documentation, pull request preparation, release notes, and monitoring.

The core process

This is the process I would use for a real website, app, plugin, or internal tool.

1. Define the outcome.
2. Split the work into small tasks.
3. Assign each task to the right agent role.
4. Give each agent a narrow tool set.
5. Require evidence from every task.
6. Review the evidence separately.
7. Ask for human approval at risk points.
8. Only then move to the next step.

Controlled agent loop

Goal → Task → Worker → Evidence → Review → Approval

If review fails, the task loops back for repair. It does not move forward just because the agent says it is done.
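
A minimal sketch of that loop in plain Python, assuming each role is just a function. The names `run_task`, `Evidence`, and `Review` are hypothetical and not taken from any framework. The point it illustrates is that the loop only exits forward through a passing review, and escalates to a human after a fixed number of repair attempts instead of looping forever.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Evidence:
    """What a worker must hand over; these fields are illustrative."""
    files_changed: list[str] = field(default_factory=list)
    tests_passed: bool = False
    notes: str = ""


@dataclass
class Review:
    approved: bool
    reason: str


def run_task(
    task: str,
    worker: Callable[[str, str], Evidence],
    reviewer: Callable[[Evidence], Review],
    max_attempts: int = 3,
) -> tuple[str, Evidence]:
    """Run worker -> review, looping back with feedback until review passes.

    Returns ("awaiting_approval", evidence) when the reviewer accepts, or
    ("escalate_to_human", evidence) after max_attempts failed reviews.
    The worker saying "done" never moves the task forward by itself.
    """
    feedback = ""
    evidence = Evidence()
    for _ in range(max_attempts):
        evidence = worker(task, feedback)
        review = reviewer(evidence)
        if review.approved:
            # A passing review only unlocks the human approval step.
            return "awaiting_approval", evidence
        feedback = review.reason  # loop back with the reviewer's findings
    return "escalate_to_human", evidence


# Usage: the first attempt "forgets" to run tests, so review sends it back.
attempts: list[str] = []


def demo_worker(task: str, feedback: str) -> Evidence:
    attempts.append(feedback)
    return Evidence(files_changed=["nav.js"], tests_passed=len(attempts) > 1)


def demo_reviewer(evidence: Evidence) -> Review:
    if not evidence.tests_passed:
        return Review(False, "tests not run")
    return Review(True, "ok")


status, evidence = run_task("fix mobile nav", demo_worker, demo_reviewer)
# status == "awaiting_approval" after two attempts
```

Capping attempts is a design choice: a task that keeps failing review is a signal for a human, not for another retry.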

The roles in an AI agent team

Role	What it does	What it must not do
PM agent	Turns goals into tasks, tracks status, records risks, asks for approval.	Approve work with missing evidence.
Developer agent	Reads the task, edits scoped files, runs checks, writes a handoff.	Change unrelated files or deploy alone.
Reviewer agent	Checks correctness, scope, maintainability, security, and assumptions.	Rubber-stamp the developer agent’s work.
QA agent	Runs functional, responsive, regression, and browser checks.	Ignore failed or untested paths.
Accessibility agent	Checks keyboard, names, headings, forms, contrast, focus, and WCAG risks.	Claim accessibility without evidence.
Release agent	Prepares release notes, changelog, package, and deployment checklist.	Release without approval.
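
One way to make these contracts enforceable rather than aspirational is to express each role as data and check every requested action against it. This is a sketch under assumptions: the action names (`edit_file`, `deploy`, and so on) and the `ROLE_CONTRACTS` table are hypothetical and only mirror the table above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RoleContract:
    role: str
    allowed_actions: frozenset[str]


# Hypothetical action vocabulary mirroring the roles table.
# No role is allowed to "deploy" or "approve": those stay with a human.
ROLE_CONTRACTS = {
    "pm": RoleContract("pm", frozenset({"create_task", "update_status", "log_risk", "request_approval"})),
    "developer": RoleContract("developer", frozenset({"read_file", "edit_file", "run_tests", "write_handoff"})),
    "reviewer": RoleContract("reviewer", frozenset({"read_file", "read_diff", "write_review"})),
    "qa": RoleContract("qa", frozenset({"read_file", "run_tests", "run_browser_checks", "write_report"})),
    "release": RoleContract("release", frozenset({"write_release_notes", "build_package"})),
}


class ContractViolation(Exception):
    pass


def authorize(role: str, action: str) -> None:
    """Raise ContractViolation unless the role's contract allows the action.

    Unknown roles are denied by default.
    """
    contract = ROLE_CONTRACTS.get(role)
    if contract is None or action not in contract.allowed_actions:
        raise ContractViolation(f"{role!r} is not allowed to {action!r}")
```

For example, `authorize("developer", "edit_file")` returns silently, while `authorize("developer", "deploy")` raises `ContractViolation: 'developer' is not allowed to 'deploy'`.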

The framework landscape

Different frameworks exist because agentic systems have different problems.

Some tools are good at specialist handoffs. Some are good at state machines. Some are good at role-based teams. Some are built for enterprise governance. Some are better for coding workflows inside a repository.

Framework or approach	Why it exists	Difficulty	Best use
Manual agent chats	To learn the process before automation.	Easy	Small experiments and learning.
File-based project control	To make work visible and auditable.	Easy to medium	Codex, local repos, solo builders.
Codex / coding-agent workflow	To let agents work directly in a codebase.	Medium	Implementation, refactoring, tests, PR prep.
OpenAI Agents SDK	To build custom agents with tools, guardrails, sessions, tracing, and handoffs.	Medium	Custom agent apps and specialist routing.
LangGraph	To build stateful graph workflows with loops, memory, durable execution, and human approval.	Medium to hard	Strict workflows and long-running processes.
CrewAI	To model role-based crews, tasks, and event-driven flows.	Medium	Team-like multi-agent workflows.
Microsoft Agent Framework	To bring AutoGen and Semantic Kernel patterns into a more enterprise-shaped agent framework.	Hard	Microsoft, Azure, enterprise governance.
AutoGen	To explore multi-agent conversations and tool-using agent collaboration.	Medium to hard	Learning, research, existing AutoGen systems.

1. Manual agent chats

This is the beginner version. You open separate chats and give each chat a role.

The value is not speed. The value is learning the workflow.

Manual chats

Carla → PM → Developer → Reviewer → QA → Carla

Good for learning. Bad for scale because the human becomes the router.

2. File-based project control

This is the first serious step. Instead of trusting a chat transcript, you create files that store the project state.

project-control/
  PROJECT-STATUS.md
  TASK-BOARD.md
  RISK-LOG.md
  DECISION-LOG.md
  APPROVAL-LOG.md
  EVIDENCE-REGISTER.md
  tasks/
  handoffs/
  reviews/
  test-results/

Analogy: this is the project’s black box recorder. If something goes wrong, you can inspect what happened.
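
Creating these files by hand is quick, but a small script keeps every project identical. A sketch using only Python’s standard library; the function name and the one-line heading it writes into each file are assumptions, not a required format.

```python
from pathlib import Path

LOG_FILES = [
    "PROJECT-STATUS.md",
    "TASK-BOARD.md",
    "RISK-LOG.md",
    "DECISION-LOG.md",
    "APPROVAL-LOG.md",
    "EVIDENCE-REGISTER.md",
]
SUBFOLDERS = ["tasks", "handoffs", "reviews", "test-results"]


def scaffold_project_control(root: Path) -> Path:
    """Create project-control/ under root without overwriting existing files."""
    base = root / "project-control"
    base.mkdir(parents=True, exist_ok=True)
    for name in LOG_FILES:
        path = base / name
        if not path.exists():
            # Start each log with a heading derived from its file name,
            # e.g. "RISK-LOG.md" -> "# Risk Log".
            title = name.removesuffix(".md").replace("-", " ").title()
            path.write_text(f"# {title}\n\n", encoding="utf-8")
    for folder in SUBFOLDERS:
        (base / folder).mkdir(exist_ok=True)
    return base
```

Running `scaffold_project_control(Path("."))` twice is safe by design: existing logs are never overwritten, so the record cannot be wiped by re-running setup.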

3. Codex and coding-agent workflows

A coding agent workflow is different from a normal chatbot because it can inspect the repository, edit files, run commands, and report changed files.

This is where agentic development starts to feel real.

PM agent creates task.
Developer agent edits scoped files.
Developer agent runs tests.
Reviewer agent checks diff.
QA agent verifies behavior.
Human approves pull request.

Real example: a developer asks for a mobile navigation fix. The PM agent creates a task. The developer agent changes only the navigation module. The reviewer agent checks keyboard behavior, focus return, and unrelated file changes. The QA agent tests desktop and mobile. The human reviews the pull request.
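
The reviewer’s “unrelated file changes” check is easy to automate. A minimal sketch, assuming the task file declares its scope as glob patterns; the patterns and paths below are hypothetical. In a git repository you could feed it the output of `git diff --name-only main...HEAD`.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files: list[str], allowed_globs: list[str]) -> list[str]:
    """Return every changed path that matches none of the allowed patterns.

    Note: fnmatch's "*" also matches "/", so a folder pattern covers
    nested subfolders too.
    """
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in allowed_globs)
    ]


# Hypothetical scope for the mobile navigation task.
scope = ["src/components/navigation/*", "tests/navigation/*"]
changed = [
    "src/components/navigation/menu.js",
    "tests/navigation/menu.test.js",
    "src/styles/footer.css",
]
violations = out_of_scope(changed, scope)
# violations == ["src/styles/footer.css"], so the review fails on scope.
```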

4. OpenAI Agents SDK

The OpenAI Agents SDK is useful when you want to build a custom agent application instead of using only a chat UI.

The important concepts are agents, tools, guardrails, sessions, tracing, and handoffs.

A handoff means one agent can transfer the task to a more specialised agent.

OpenAI Agents SDK model

Triage agent → Developer agent
Triage agent → Reviewer agent
Tools + Guardrails + Tracing

Real example: build an internal support agent. A triage agent reads the request. If it is a billing question, it hands off to a billing agent. If it is a technical bug, it hands off to a technical agent. Guardrails stop the agent from exposing private data or taking unsafe actions.
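
The SDK provides its own agent, handoff, and guardrail primitives; the sketch below deliberately does not use them. It is a framework-free illustration of the same idea, assuming keyword routing in place of a model’s triage decision and a single regex-based output guardrail. Every name in it is hypothetical.

```python
import re
from typing import Callable

Handler = Callable[[str], str]


def billing_agent(request: str) -> str:
    return "Billing: I have opened a refund review for your invoice."


def technical_agent(request: str) -> str:
    return "Technical: I have logged the bug for the engineering queue."


def fallback_agent(request: str) -> str:
    return "Triage: a human will pick this up."


# Keyword routing stands in for a model's triage decision.
ROUTES: dict[str, Handler] = {
    "invoice": billing_agent,
    "refund": billing_agent,
    "error": technical_agent,
    "bug": technical_agent,
}

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def output_guardrail(reply: str) -> str:
    """Block replies that would leak an email address."""
    if EMAIL.search(reply):
        return "Blocked by guardrail: reply contained private data."
    return reply


def triage(request: str) -> str:
    """Hand the request off to the first matching specialist."""
    lowered = request.lower()
    for keyword, agent in ROUTES.items():
        if keyword in lowered:
            return output_guardrail(agent(request))
    return output_guardrail(fallback_agent(request))
```

The guardrail runs on every reply, whichever specialist produced it, so a new specialist cannot bypass it by accident.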

5. LangGraph

LangGraph exists because real agent workflows are rarely straight lines.

They branch, pause, retry, wait for humans, remember state, and continue later.

That is why LangGraph is useful for stateful graph workflows.

LangGraph model

Draft → Review pending → Human approval → Next phase
Fail → Remediation → Review pending

Real example: a compliance workflow. The agent drafts a report, a reviewer checks it, a human approves it, and if the review fails, the workflow loops back for remediation instead of moving forward.
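
LangGraph expresses this with nodes and edges, including conditional edges. Without depending on the library, the same rule can be sketched as an explicit transition table that rejects illegal jumps, such as going straight from a draft to the next phase. The state names come from the model above; the code itself, and the choice to let a human rejection also loop back to remediation, are assumptions.

```python
from enum import Enum


class Phase(Enum):
    DRAFT = "draft"
    REVIEW_PENDING = "review_pending"
    HUMAN_APPROVAL = "human_approval"
    REMEDIATION = "remediation"
    NEXT_PHASE = "next_phase"


# Allowed moves only; anything not listed is rejected.
TRANSITIONS: dict[Phase, set[Phase]] = {
    Phase.DRAFT: {Phase.REVIEW_PENDING},
    Phase.REVIEW_PENDING: {Phase.HUMAN_APPROVAL, Phase.REMEDIATION},
    # Assumption: a human rejection also loops back to remediation.
    Phase.HUMAN_APPROVAL: {Phase.NEXT_PHASE, Phase.REMEDIATION},
    Phase.REMEDIATION: {Phase.REVIEW_PENDING},
    Phase.NEXT_PHASE: set(),
}


def advance(current: Phase, target: Phase) -> Phase:
    """Move to target only if the table allows it."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"illegal transition {current.value} -> {target.value}")
    return target


# A failed review loops through remediation before it can be approved.
phase = Phase.DRAFT
for step in [Phase.REVIEW_PENDING, Phase.REMEDIATION, Phase.REVIEW_PENDING,
             Phase.HUMAN_APPROVAL, Phase.NEXT_PHASE]:
    phase = advance(phase, step)
```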

6. CrewAI

CrewAI uses the idea of crews and flows.

A crew is a group of agents with roles. A flow controls the sequence and state of the work.

This is easy to understand because it maps to how teams already work.

CrewAI model

Crew: PM, Researcher, Developer, QA

Flow: Plan → Research → Build → Test → Report

Real example: content production. One agent researches sources, one drafts, one checks facts, one prepares SEO metadata, and one creates the publishing checklist. The risk is that without evidence gates, the crew can produce a polished but weak article.
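
Here is a sketch of such an evidence gate for the content example, written as a plain sequential flow rather than with CrewAI’s own classes. Each step returns its output plus evidence, and the flow stops at the first step whose evidence check fails. The step names, evidence keys, and checks are hypothetical.

```python
from typing import Callable

# A step takes the previous output and returns (new output, evidence).
Step = Callable[[str], tuple[str, dict]]
Gate = Callable[[dict], bool]


def research(_: str) -> tuple[str, dict]:
    return "notes", {"sources": ["https://example.com/report"]}


def draft(notes: str) -> tuple[str, dict]:
    return f"article based on {notes}", {"word_count": 900}


def fact_check(article: str) -> tuple[str, dict]:
    # Simulates a checker that could not verify every claim.
    return article, {"claims_checked": 12, "claims_unverified": 2}


# Gates default to failing when evidence is missing (fail closed).
FLOW: list[tuple[str, Step, Gate]] = [
    ("research", research, lambda ev: len(ev.get("sources", [])) > 0),
    ("draft", draft, lambda ev: ev.get("word_count", 0) > 0),
    ("fact_check", fact_check, lambda ev: ev.get("claims_unverified", 1) == 0),
]


def run_flow(flow: list[tuple[str, Step, Gate]]) -> tuple[str, str]:
    """Run steps in order; return ("done", output) or ("blocked", step_name)."""
    output = ""
    for name, step, gate in flow:
        output, evidence = step(output)
        if not gate(evidence):
            return "blocked", name
    return "done", output


result = run_flow(FLOW)
# ("blocked", "fact_check"): the polished draft stops at the evidence gate.
```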

7. Microsoft Agent Framework

Microsoft Agent Framework is aimed at more enterprise-shaped agent development. It brings together ideas from Microsoft’s AutoGen and Semantic Kernel projects.

This is not where I would start for a solo website project. It makes more sense when a company needs typed workflows, enterprise integrations, observability, governance, and Microsoft ecosystem alignment.

Microsoft Agent Framework model

Enterprise app → Agent workflow → Middleware → Telemetry → Azure / Microsoft tools

Real example: a large company wants an internal agent that can read approved documentation, search tickets, call approved APIs, create draft work items, and log every tool call for audit.
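
The “log every tool call for audit” requirement is framework-agnostic. A sketch in plain Python, assuming an allowlist of approved tools and a JSON Lines audit file; the decorator, the tool names, and the log path are hypothetical and are not part of Microsoft Agent Framework’s API.

```python
import functools
import json
import time
from pathlib import Path
from typing import Any, Callable

# Hypothetical allowlist of approved tools.
APPROVED_TOOLS = {"search_tickets", "create_draft_work_item"}


def audited(log_path: Path) -> Callable:
    """Wrap a tool so every call, allowed or denied, is appended to an audit log."""
    def decorator(tool: Callable[..., Any]) -> Callable[..., Any]:
        @functools.wraps(tool)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            allowed = tool.__name__ in APPROVED_TOOLS
            record = {
                "ts": time.time(),
                "tool": tool.__name__,
                "args": repr(args),
                "kwargs": repr(kwargs),
                "allowed": allowed,
            }
            # Log before running, so denied and failing calls are recorded too.
            with log_path.open("a", encoding="utf-8") as log:
                log.write(json.dumps(record) + "\n")
            if not allowed:
                raise PermissionError(f"tool {tool.__name__!r} is not approved")
            return tool(*args, **kwargs)
        return wrapper
    return decorator
```

Usage would look like decorating `def search_tickets(query: str)` with `@audited(Path("audit.jsonl"))`; a tool outside the allowlist still gets a log line before it is refused.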

8. AutoGen

AutoGen helped popularise multi-agent conversations: agents talking to each other, using tools, and involving a human when needed.

It is still useful for understanding that history and for maintaining existing systems. For new Microsoft-oriented work, Microsoft is moving toward Agent Framework.

AutoGen model

Agent A ↔ Agent B ↔ Agent C

Human can interrupt. Tools can be used.
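
The conversational pattern itself is simple to sketch. This is not AutoGen’s API, just a framework-free round-robin loop, assuming each agent is a function from the transcript so far to its next message. It shows two controls: a human hook that can interrupt, and a hard turn limit so the agents cannot talk forever. The `TERMINATE` stop word is an assumption for this sketch.

```python
from typing import Callable, Optional

Agent = Callable[[list[str]], str]


def converse(
    agents: dict[str, Agent],
    opening: str,
    max_turns: int = 6,
    human_interrupt: Optional[Callable[[list[str]], Optional[str]]] = None,
    stop_word: str = "TERMINATE",
) -> list[str]:
    """Round-robin conversation that stops on stop_word, max_turns, or a human."""
    transcript = [f"user: {opening}"]
    names = list(agents)
    for turn in range(max_turns):
        # Give the human a chance to step in before every agent turn.
        if human_interrupt is not None:
            note = human_interrupt(transcript)
            if note is not None:
                transcript.append(f"human: {note}")
                break
        name = names[turn % len(names)]
        message = agents[name](transcript)
        transcript.append(f"{name}: {message}")
        if stop_word in message:
            break
    return transcript


# Usage: the coder ends the exchange on its first turn.
log = converse(
    {"planner": lambda t: "plan ready", "coder": lambda t: "code written TERMINATE"},
    "build a contact form",
)
```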

How to choose

If you need…	Start with…	Why
To learn the concept	Manual chats	You see every handoff clearly.
A serious solo project	File-based control + Codex	You get evidence and repo changes without heavy framework setup.
A custom agent app	OpenAI Agents SDK	Good for agents, tools, guardrails, handoffs, tracing.
A strict state machine	LangGraph	Good for loops, human approval, durable workflows.
A role-based agent team	CrewAI	Good mental model for crews, tasks, and flows.
Enterprise Microsoft integration	Microsoft Agent Framework	Better fit for Azure, enterprise patterns, governance.
Research or older multi-agent systems	AutoGen	Useful for understanding multi-agent conversation patterns.

The safety rule: no agent should have more power than the process can verify

This is the rule I would use in any real project.

No agent should have more freedom than the team can test, monitor, explain, and roll back.

If the agent can only draft text, the risk is low. If it can edit code, the risk is higher. If it can query private data, open pull requests, change production systems, or send messages externally, the risk is much higher.

Low risk:
Read public docs and summarise.

Medium risk:
Read project files and suggest changes.

High risk:
Edit files, create pull requests, call internal APIs.

Critical risk:
Touch production, customer data, payments, security systems, or deployment pipelines.
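
Those tiers translate directly into a policy check that runs before any tool call. A sketch, assuming a hypothetical action vocabulary and a hypothetical control per tier; unknown actions are treated as critical so the policy fails closed.

```python
from enum import IntEnum


class Risk(IntEnum):
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    CRITICAL = 4


# Hypothetical mapping from action names to the tiers above.
ACTION_RISK = {
    "read_public_docs": Risk.LOW,
    "read_project_files": Risk.MEDIUM,
    "suggest_change": Risk.MEDIUM,
    "edit_file": Risk.HIGH,
    "create_pull_request": Risk.HIGH,
    "call_internal_api": Risk.HIGH,
    "deploy_production": Risk.CRITICAL,
    "query_customer_data": Risk.CRITICAL,
}

# Assumed control for each tier; tune these to your own process.
REQUIRED_CONTROL = {
    Risk.LOW: "allow",
    Risk.MEDIUM: "allow and log",
    Risk.HIGH: "allow in scoped task, then separate review",
    Risk.CRITICAL: "block until a human approves",
}


def risk_of(action: str) -> Risk:
    # Unknown actions default to CRITICAL: fail closed, not open.
    return ACTION_RISK.get(action, Risk.CRITICAL)


def control_for(action: str) -> str:
    return REQUIRED_CONTROL[risk_of(action)]
```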

A practical starter architecture

If I were starting a real project today, I would begin with this simple architecture.

agent-system/
  project-control/
    PROJECT-STATUS.md
    TASK-BOARD.md
    RISK-LOG.md
    APPROVAL-LOG.md
    EVIDENCE-REGISTER.md
  tasks/
    DEV-001-fix-navigation.md
    QA-001-test-navigation.md
  handoffs/
    DEV-001-handoff.md
  reviews/
    REVIEW-001-navigation.md
  scripts/
    run-checks.sh
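
The tree lists `run-checks.sh`; to keep every example here in one language, this is a hedged Python stand-in for the same job. It runs each check command, records pass or fail, and writes a Markdown summary the reviewer can cite as evidence. The commands you pass in and the report file name are assumptions for your project.

```python
import subprocess
from datetime import datetime, timezone
from pathlib import Path


def run_checks(checks: dict[str, list[str]], out_dir: Path) -> bool:
    """Run each named command, write a Markdown report, return True if all passed."""
    out_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    lines = [f"# Check run {stamp}", ""]
    all_passed = True
    for name, command in checks.items():
        try:
            code = subprocess.run(command, capture_output=True, text=True).returncode
        except FileNotFoundError:
            code = 127  # command not found, the same code a shell would use
        passed = code == 0
        all_passed = all_passed and passed
        lines.append(f"- {name}: {'PASS' if passed else 'FAIL'} (exit {code})")
    (out_dir / f"checks-{stamp}.md").write_text("\n".join(lines) + "\n", encoding="utf-8")
    return all_passed
```

A call might look like `run_checks({"unit tests": ["npm", "test"]}, Path("test-results"))`, where the npm command is only an example. Every check runs even after a failure, so the report shows the full picture rather than the first error.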

Then I would use this prompt shape for every agent task:

Role:
You are the Developer Agent.

Task:
Fix the mobile navigation keyboard behavior.

Scope:
Only edit the navigation component and related tests.

Do not touch:
Homepage layout, footer, unrelated CSS, production settings.

Evidence required:
- files changed
- tests run
- browser checks
- known issues
- handoff note

Stop condition:
Stop after the handoff. Do not approve your own work.

Why agentic development is the future

Agentic development is the future because software work is not only writing code.

Real software work includes reading requirements, checking constraints, searching documentation, changing files, running tests, reviewing evidence, writing summaries, creating pull requests, checking accessibility, checking security, and explaining what changed.

Those are process tasks. Agents are useful when they can move through process tasks with tools and evidence.

The future is not “one AI does everything”. That is too vague and too risky.

The future is smaller specialised agents working inside governed systems.

Final checklist

Define the goal before choosing a framework.
Split the workflow into agent roles.
Give each agent a limited tool set.
Require evidence for every task.
Separate maker and reviewer.
Use human approval for risky steps.
Record decisions, risks, and failures.
Choose a framework only after the process is clear.