Article

journal / how-to-build-a-website-with-ai-agents-without-letting-them-run-wild

How to build a team of AI agents without letting them run wild

AIWeb Apps
Carla G. June 6, 2026 Updated June 6, 2026 9 mins read

I do not think agentic development is about replacing developers with a chatbot.

Hand-drawn technical diagram showing agentic development as a governed team workflow with PM, developer, reviewer, QA, tools, tests, approvals, pull request, and website release.

I think it is about something more practical: turning AI from a single helper into a controlled delivery system.

That means roles, tools, state, evidence, reviews, approvals, logs, and clear stopping points.

This matters because one AI assistant can help you move faster. A team of AI agents can help you move work through a process. But only if the process is designed properly.

Hand-drawn technical diagram showing agentic development as a governed team workflow with PM, developer, reviewer, QA, tools, tests, approvals, pull request, and website release.
Agentic development is not one assistant doing everything. It is a governed workflow with roles, tools, review, evidence, and release gates.

The short version

Traditional AI coding help works like this:

Human asks → AI answers → human decides what to do next

Agentic development works more like this:

Goal → PM agent plans → worker agents act → reviewer agents check → evidence is recorded → human approves → release moves forward

The difference is not just speed. The difference is structure.

Visual model: assistant vs agentic workflow

ChatbotAnswer

GoalPlanBuildReviewApproveRelease

The mental model: agents are not magic, they are workers with contracts

A useful AI agent needs a contract.

By contract, I mean a clear agreement about what the agent can do, what it cannot do, what evidence it must produce, and when it must stop.

A good analogy is a construction site.

You would not let the electrician approve the whole building. You would not let a junior worker change the structural plan. You would not call the building complete because someone created a pile of materials.

Software agents need the same discipline.

Construction siteAgentic development
ArchitectSystem designer or PM agent
BuilderDeveloper agent
InspectorReviewer or QA agent
PermitHuman approval gate
BlueprintPlan, task file, architecture note
Inspection reportEvidence, tests, logs, review notes

Why companies are moving this way

Companies are not moving toward agents only because agents are fashionable.

They are moving this way because normal AI assistants have limits.

  • A chatbot waits for the next prompt. A workflow agent can keep a task moving.
  • A chatbot answers from context. A tool-using agent can inspect files, tickets, docs, logs, or test results.
  • A chatbot can forget the process. A stateful workflow can remember what phase the work is in.
  • A chatbot can sound confident. A governed agent must produce evidence.
  • A chatbot can help one person. A multi-agent system can support a team process.

This is why agentic development is likely to become normal in serious engineering teams. The future is not just “AI writes code”. The future is AI participating in the software delivery lifecycle: planning, implementation, review, testing, documentation, pull request preparation, release notes, and monitoring.

The core process

This is the process I would use for a real website, app, plugin, or internal tool.

1. Define the outcome.
2. Split the work into small tasks.
3. Assign each task to the right agent role.
4. Give each agent a narrow tool set.
5. Require evidence from every task.
6. Review the evidence separately.
7. Ask for human approval at risk points.
8. Only then move to the next step.
Controlled agent loop

GoalTaskWorkerEvidenceReviewApproval

If review fails, the task loops back for repair. It does not move forward just because the agent says done.

The roles in an AI agent team

RoleWhat it doesWhat it must not do
PM agentTurns goals into tasks, tracks status, records risks, asks for approval.Approve missing evidence.
Developer agentReads the task, edits scoped files, runs checks, writes a handoff.Change unrelated files or deploy alone.
Reviewer agentChecks correctness, scope, maintainability, security, and assumptions.Rubber-stamp the developer agent.
QA agentRuns functional, responsive, regression, and browser checks.Ignore failed or untested paths.
Accessibility agentChecks keyboard, names, headings, forms, contrast, focus, and WCAG risks.Claim accessibility without evidence.
Release agentPrepares release notes, changelog, package, and deployment checklist.Release without approval.

The framework landscape

Different frameworks exist because agentic systems have different problems.

Some tools are good at specialist handoffs. Some are good at state machines. Some are good at role-based teams. Some are built for enterprise governance. Some are better for coding workflows inside a repository.

Framework or approachWhy it existsDifficultyBest use
Manual agent chatsTo learn the process before automation.EasySmall experiments and learning.
File-based project controlTo make work visible and auditable.Easy to mediumCodex, local repos, solo builders.
Codex / coding-agent workflowTo let agents work directly in a codebase.MediumImplementation, refactoring, tests, PR prep.
OpenAI Agents SDKTo build custom agents with tools, guardrails, sessions, tracing, and handoffs.MediumCustom agent apps and specialist routing.
LangGraphTo build stateful graph workflows with loops, memory, durable execution, and human approval.Medium to hardStrict workflows and long-running processes.
CrewAITo model role-based crews, tasks, and event-driven flows.MediumTeam-like multi-agent workflows.
Microsoft Agent FrameworkTo bring AutoGen and Semantic Kernel patterns into a more enterprise-shaped agent framework.HardMicrosoft, Azure, enterprise governance.
AutoGenTo explore multi-agent conversations and tool-using agent collaboration.Medium to hardLearning, research, existing AutoGen systems.

1. Manual agent chats

This is the beginner version. You open separate chats and give each chat a role.

The value is not speed. The value is learning the workflow.

Manual chats

CarlaPMDeveloperReviewerQACarla

Good for learning. Bad for scale because the human becomes the router.

2. File-based project control

This is the first serious step. Instead of trusting a chat transcript, you create files that store the project state.

project-control/
  PROJECT-STATUS.md
  TASK-BOARD.md
  RISK-LOG.md
  DECISION-LOG.md
  APPROVAL-LOG.md
  EVIDENCE-REGISTER.md
  tasks/
  handoffs/
  reviews/
  test-results/

Analogy: this is the project’s black box recorder. If something goes wrong, you can inspect what happened.

3. Codex and coding-agent workflows

A coding agent workflow is different from a normal chatbot because it can inspect the repository, edit files, run commands, and report changed files.

This is where agentic development starts to feel real.

PM agent creates task.
Developer agent edits scoped files.
Developer agent runs tests.
Reviewer agent checks diff.
QA agent verifies behavior.
Human approves pull request.

Real example: a developer asks for a mobile navigation fix. The PM agent creates a task. The developer agent changes only the navigation module. The reviewer agent checks keyboard behavior, focus return, and unrelated file changes. The QA agent tests desktop and mobile. The human reviews the pull request.

4. OpenAI Agents SDK

OpenAI Agents SDK is useful when you want to build a custom agent application instead of using only a chat UI.

The important concepts are agents, tools, guardrails, sessions, tracing, and handoffs.

A handoff means one agent can transfer the task to a more specialised agent.

OpenAI Agents SDK model

Triage agentDeveloper agent
Triage agentReviewer agent
Tools + Guardrails + Tracing

Real example: build an internal support agent. A triage agent reads the request. If it is a billing question, it hands off to a billing agent. If it is a technical bug, it hands off to a technical agent. Guardrails stop the agent from exposing private data or taking unsafe actions.

5. LangGraph

LangGraph exists because real agent workflows are rarely straight lines.

They branch, pause, retry, wait for humans, remember state, and continue later.

That is why LangGraph is useful for stateful graph workflows.

LangGraph model

DraftReview pendingHuman approvalNext phase
FailRemediationReview pending

Real example: a compliance workflow. The agent drafts a report, a reviewer checks it, a human approves it, and if the review fails, the workflow loops back for remediation instead of moving forward.

6. CrewAI

CrewAI uses the idea of crews and flows.

A crew is a group of agents with roles. A flow controls the sequence and state of the work.

This is easy to understand because it maps to how teams already work.

CrewAI model

Crew: PM Researcher Developer QA

Flow: Plan → Research → Build → Test → Report

Real example: content production. One agent researches sources, one drafts, one checks facts, one prepares SEO metadata, and one creates the publishing checklist. The risk is that without evidence gates, the crew can produce a polished but weak article.

7. Microsoft Agent Framework

Microsoft Agent Framework is aimed at more enterprise-shaped agent development. It brings together ideas from Microsoft’s AutoGen and Semantic Kernel direction.

This is not where I would start for a solo website project. It makes more sense when a company needs typed workflows, enterprise integrations, observability, governance, and Microsoft ecosystem alignment.

Microsoft Agent Framework model

Enterprise appAgent workflowMiddlewareTelemetryAzure / Microsoft tools

Real example: a large company wants an internal agent that can read approved documentation, search tickets, call approved APIs, create draft work items, and log every tool call for audit.

8. AutoGen

AutoGen helped popularise multi-agent conversations: agents talking to each other, using tools, and involving a human when needed.

It is still useful to understand the history and existing systems. For new Microsoft-oriented work, Microsoft is moving toward Agent Framework.

AutoGen model

Agent AAgent BAgent C

Human can interrupt Tools can be used

How to choose

If you need…Start with…Why
To learn the conceptManual chatsYou see every handoff clearly.
A serious solo projectFile-based control + CodexYou get evidence and repo changes without heavy framework setup.
A custom agent appOpenAI Agents SDKGood for agents, tools, guardrails, handoffs, tracing.
A strict state machineLangGraphGood for loops, human approval, durable workflows.
A role-based agent teamCrewAIGood mental model for crews, tasks, and flows.
Enterprise Microsoft integrationMicrosoft Agent FrameworkBetter fit for Azure, enterprise patterns, governance.
Research or older multi-agent systemsAutoGenUseful for understanding multi-agent conversation patterns.

The safety rule: no agent should have more power than the process can verify

This is the rule I would use in any real project.

No agent should have more freedom than the team can test, monitor, explain, and roll back.

If the agent can only draft text, the risk is low. If it can edit code, the risk is higher. If it can query private data, open pull requests, change production systems, or send messages externally, the risk is much higher.

Low risk:
Read public docs and summarise.

Medium risk:
Read project files and suggest changes.

High risk:
Edit files, create pull requests, call internal APIs.

Critical risk:
Touch production, customer data, payments, security systems, or deployment pipelines.

A practical starter architecture

If I were starting a real project today, I would begin with this simple architecture.

agent-system/
  project-control/
    PROJECT-STATUS.md
    TASK-BOARD.md
    RISK-LOG.md
    APPROVAL-LOG.md
    EVIDENCE-REGISTER.md
  tasks/
    DEV-001-fix-navigation.md
    QA-001-test-navigation.md
  handoffs/
    DEV-001-handoff.md
  reviews/
    REVIEW-001-navigation.md
  scripts/
    run-checks.sh

Then I would use this prompt shape for every agent task:

Role:
You are the Developer Agent.

Task:
Fix the mobile navigation keyboard behavior.

Scope:
Only edit the navigation component and related tests.

Do not touch:
Homepage layout, footer, unrelated CSS, production settings.

Evidence required:
- files changed
- tests run
- browser checks
- known issues
- handoff note

Stop condition:
Stop after the handoff. Do not approve your own work.

Why agentic development is the future

Agentic development is the future because software work is not only writing code.

Real software work includes reading requirements, checking constraints, searching documentation, changing files, running tests, reviewing evidence, writing summaries, creating pull requests, checking accessibility, checking security, and explaining what changed.

Those are process tasks. Agents are useful when they can move through process tasks with tools and evidence.

The future is not “one AI does everything”. That is too vague and too risky.

The future is smaller specialised agents working inside governed systems.

Final checklist

  • Define the goal before choosing a framework.
  • Split the workflow into agent roles.
  • Give each agent a limited tool set.
  • Require evidence for every task.
  • Separate maker and reviewer.
  • Use human approval for risky steps.
  • Record decisions, risks, and failures.
  • Choose a framework only after the process is clear.

Sources and further reading