Building Multi-Agent AI Systems Using Docker Agent
Introduction
Docker Agent is an open-source framework designed to facilitate the creation and execution of multi-agent AI systems. It allows developers to configure teams of AI agents using YAML files, eliminating the need for Python orchestration code. Each agent can deploy its unique model, whether through cloud APIs like OpenAI and Anthropic, or local models via Docker Model Runner. These agents can perform specialized tasks such as accessing the filesystem, conducting web searches, and integrating custom MCP tools. Docker Agent also enables the distribution of agents as OCI artifacts, which can be shared and deployed swiftly.
This tutorial provides a step-by-step guide to setting up a DigitalOcean Droplet with Docker Agent pre-installed. Using the Docker Agent 1-Click App from the DigitalOcean Marketplace, you will build a Bug Investigator: a multi-agent system whose agents collaborate to diagnose errors, research fixes, implement solutions, and generate tests. You'll also configure both cloud API and local model inference, and after completing the guide you can explore further Docker tutorials for Ubuntu 24.04.
Note: The Docker Agent 1-Click image includes version 1.9.10 under its previous name cagent. Commands in this tutorial use cagent as per the pre-installed version, but the YAML configurations and architecture remain consistent with newer versions.
Key Takeaways
- The Docker Agent 1-Click Droplet significantly reduces deployment time to under five minutes with a pre-installed Docker Agent on Ubuntu 24.04.
- Multi-agent architectures divide complex tasks among specialized agents, enabling efficient task handling. For instance, a Tester agent can automatically generate unit tests without explicit prompts.
- Docker Agent supports both cloud APIs and local models through Docker Model Runner for environments requiring data privacy.
- Local inference with models like Qwen3 8B can identify bugs accurately but is slower on CPU compared to cloud APIs. Consider using DigitalOcean GPU Droplets for improved performance.
- Agents can be shared and deployed as OCI artifacts through Docker Hub, facilitating team collaboration.
What You Will Build
Understanding Agentic AI Workflows
Traditional AI interactions typically involve simple request-response cycles. Agentic AI workflows, however, empower AI agents to act autonomously, such as reading files, performing web searches, and writing code. This capability allows agents to not only suggest solutions but also implement fixes.
In multi-agent systems, specialized agents work together on complex tasks, akin to a development team where each member has a specific role. This collaborative approach enhances problem-solving efficiency compared to a single-agent system.
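Docker Agent handles this coordination for you, but the underlying pattern is simple. Here is a minimal Python sketch of a root agent delegating to specialists — purely conceptual, with hypothetical class names, not Docker Agent's actual internals:

```python
# Conceptual sketch of root-agent delegation -- hypothetical classes,
# not Docker Agent's implementation.
class Agent:
    def __init__(self, name, handle):
        self.name = name
        self.handle = handle  # the agent's specialized skill


class RootAgent:
    def __init__(self, sub_agents):
        self.sub_agents = {a.name: a for a in sub_agents}

    def run(self, task):
        # The root agent routes the task to each specialist and
        # collects the results, like a team lead.
        return {name: agent.handle(task) for name, agent in self.sub_agents.items()}


team = RootAgent([
    Agent("researcher", lambda t: f"docs for: {t}"),
    Agent("fixer", lambda t: f"patch for: {t}"),
    Agent("tester", lambda t: f"tests for: {t}"),
])
results = team.run("ZeroDivisionError in calculate_average")
print(results["fixer"])  # patch for: ZeroDivisionError in calculate_average
```

In Docker Agent, this routing is declared in YAML (the `sub_agents` field you will see later) rather than written as code.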
A multi-agent system (MAS) involves multiple AI agents working in unison to perform tasks. Docker Agent manages the coordination of these agents using YAML-defined hierarchies.
The Bug Investigator Architecture
In this tutorial, you'll construct a multi-agent debugging system with four specialized agents:
┌─────────────────────────────────────────┐
│         BUG INVESTIGATOR (Root)         │
│         Analyzes & Coordinates          │
└───────────────┬─────────────────────────┘
                │
      ┌─────────┼─────────┐
      ▼         ▼         ▼
┌───────────┐ ┌───────┐ ┌───────────┐
│ RESEARCHER│ │ FIXER │ │ TESTER    │
│ Web Search│ │ Writes│ │ Validates │
│ Find Docs │ │ Code  │ │ & Tests   │
└───────────┘ └───────┘ └───────────┘
- Investigator (Root Agent): Analyzes errors, identifies root causes, and coordinates the other agents.
- Researcher: Searches for documentation and finds similar issues and solutions.
- Fixer: Writes corrected code with proper implementation.
- Tester: Generates test cases to validate fixes.
Each agent has its own model and toolset defined in YAML, eliminating the need for orchestration code.
Prerequisites
Before starting, ensure you have:
- A DigitalOcean account with available credits or a configured payment method.
- An API key from OpenAI or Anthropic for cloud-based inference.
- An SSH client or web browser access to the DigitalOcean console. Beginners can review SSH essentials for guidance.
- Basic familiarity with the command line, including Git, SSH, and shell commands.
Optional for local model inference:
- Plan to resize your Droplet to an 8GB RAM plan.
Step 1: Create the Droplet
Navigate to the DigitalOcean Marketplace Docker Agent page and click "Create Docker Agent Droplet." Begin with the $6/month tier (1GB RAM), suitable for cloud API agents since AI inference is processed on external servers.
Step 2: Choose Your Preferred Region
Select a datacenter region closest to your users or location to reduce latency.
Step 3: Choose the Marketplace Image
In the "Choose an image" section, select the "Marketplace" tab and search for "Docker Agent."
Step 4: Configure and Launch
Review the Droplet size, add your SSH key, and click "Create Droplet."
Step 5: Access and Verify
Option 1: Web Console (Easiest)
- Go to your Droplets page.
- Click on your Docker Agent Droplet.
- Click the Access tab on the left.
- Click "Launch Droplet Console."
This opens a browser-based terminal without needing an SSH key.
Option 2: SSH Access
If preferred, reset the root password or use your SSH key:
- Click on your Droplet.
- Go to the Access tab.
- Click "Reset Root Password."
- Check your email for the new password.
- SSH into your Droplet:
ssh root@YOUR_DROPLET_IP
Replace YOUR_DROPLET_IP with your Droplet’s actual IP address.
Step 6: Check the Docker Agent Version
Verify that Docker Agent (cagent) is pre-installed:
cagent version
Expected output:
cagent version v1.9.10
Commit: 1782337c60dadcb39643f7c9e1a9798ea784c7aa
Step 7: Clone the Bug Investigator Repository
git clone https://github.com/ajeetraina/bug-investigator-agent.git
cd bug-investigator-agent
Review the project structure:
tree
.
├── LICENSE
├── README.md
├── bug.txt
├── cagent-anthropic.yaml
├── cagent-local.yaml
├── cagent-openai.yaml
├── cagent.yaml
├── examples
│ └── bug-scenarios.md
├── scripts
│ └── deploy.sh
└── test-code
├── Dockerfile
├── app.py
├── deployment.yaml
├── index.js
└── main.go
4 directories, 14 files
The repository includes YAML configurations for three model providers: OpenAI, Anthropic, and local (Docker Model Runner). Each configuration targets a different inference backend.
Step 8: Configure and Run with Cloud APIs
Set the OpenAI API key:
export OPENAI_API_KEY=sk-proj-XXXXXXXXXXX
The repository includes cagent-openai.yaml configured for GPT-4o and GPT-4o-mini. Launch the agent:
cagent run ./cagent-openai.yaml
The cagent chat interface opens, ready to use the multi-agent Bug Investigator architecture.
Step 9: Test with a Python Bug
Paste the following buggy code into the agent chat:
I have this Python code that's crashing:
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)

result = calculate_average([])
print(result)
Error: ZeroDivisionError: division by zero
The agent processes this through its multi-agent workflow:
- Investigator analyzes the error and identifies an empty list as the root cause.
- Fixer implements a guard clause.
- Tester generates comprehensive test cases.
The agent creates a tests/ directory with test_calculate_average.py. After processing, verify the generated test file:
cat tests/test_calculate_average.py
Expected output:
import unittest

def calculate_average(numbers):
    if not numbers:
        return None
    return sum(numbers) / len(numbers)

class TestCalculateAverage(unittest.TestCase):
    def test_empty_list(self):
        self.assertIsNone(calculate_average([]))

    def test_single_element(self):
        self.assertEqual(calculate_average([5]), 5)

    def test_multiple_elements(self):
        self.assertEqual(calculate_average([3, 5, 7]), 5.0)

    def test_floats(self):
        self.assertAlmostEqual(calculate_average([1.5, 2.5, 3.5]), 2.5)

    def test_negative_numbers(self):
        self.assertEqual(calculate_average([-1, -2, -3]), -2.0)

if __name__ == '__main__':
    unittest.main()
The agent:
- Fixed the bug by adding `if not numbers: return None`.
- Created five test cases:
  - Empty list returns `None`.
  - Single element returns that element.
  - Multiple elements calculate the average.
  - Floats handle decimals correctly.
  - Negative numbers work correctly.
Step 10: Run the Tests
python3 tests/test_calculate_average.py
Expected output:
.....
----------------------------------------------------------------------
Ran 5 tests in 0.000s
OK
Here is a summary of what happened:
- Bug reported: ZeroDivisionError.
- Agent diagnosed: Empty list causes division by zero.
- Agent fixed: Added guard clause.
- Agent tested: Five comprehensive test cases.
- Tests pass: All five pass.
Comparing Cloud versus Local Model Performance
Before setting up local models, here's a comparison for debugging tasks:
| Metric | Cloud API (GPT-4o) | Local Model (Qwen3 8B) |
|--------------------------|-----------------------------|--------------------------------------|
| Response time | About 5 seconds | About 3 minutes (CPU) |
| Accuracy | Correct diagnosis and fix | Correct diagnosis and fix |
| Multi-agent coordination | Full support (all 4 agents) | Root agent only (simplified config) |
| Cost per request | $0.01 to $0.10 | $0.00 (Droplet cost only) |
| Data privacy | Code sent to external API | Code stays on your server |
| Minimum Droplet size | 1GB RAM ($6/month) | 8GB RAM ($48/month) |
Cloud APIs are fast and support the full multi-agent workflow. Local models preserve code privacy but require more resources and are slower on CPU. For increased speed, consider using GPU Droplets.
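The cost trade-off can be sketched with quick back-of-the-envelope arithmetic, using the midpoint of the per-request estimate above (actual API pricing varies with model and token usage):

```python
# Rough break-even estimate between cloud APIs and a local-model Droplet,
# using the tutorial's figures; real costs vary with token usage.
small_droplet = 6.0        # $/month, 1GB plan for cloud API agents
large_droplet = 48.0       # $/month, 8GB plan for local inference
cost_per_request = 0.05    # midpoint of the $0.01-$0.10 estimate

# Local inference pays off once monthly API charges exceed
# the extra Droplet cost.
extra_droplet_cost = large_droplet - small_droplet            # 42.0
break_even_requests = extra_droplet_cost / cost_per_request
print(round(break_even_requests))  # 840 debugging sessions per month
```

Below roughly 840 sessions per month (at these assumed rates), the $6/month Droplet with cloud APIs is the cheaper option.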
Configure Local Model Inference (Optional)
Cloud APIs are efficient but send code to external servers. For privacy-sensitive environments, local inference keeps everything on your Droplet.
Your current Droplet has 1GB of RAM, while local models need at least 4GB.
Resize the Droplet:
- Go to the DigitalOcean Console and click your Droplet.
- Power off the Droplet.
- Click "Resize" in the left menu.
- Select the s-2vcpu-8gb plan for reliable local model performance.
- Click "Resize Droplet."
Power the Droplet back on and verify the new memory:
free -m
Expected output:
               total        used        free      shared  buff/cache   available
Mem:            7941         512        7189           4         477        7429
Swap:              0           0           0
Step 11: Install Docker Model Runner
Docker Model Runner allows running AI models locally within the Docker ecosystem. Install it:
sudo apt-get update
sudo apt-get install -y docker-model-plugin
Verify the installation by listing available models:
docker model ls
Docker Model Runner downloads its runtime on first use. You will see output similar to:
latest: Pulling from docker/model-runner
...
Status: Downloaded newer image for docker/model-runner:latest
Creating model storage volume docker-model-runner-models...
Starting model runner container docker-model-runner...
MODEL NAME PARAMETERS QUANTIZATION ARCHITECTURE MODEL ID CREATED CONTEXT SIZE
The empty model list confirms that Docker Model Runner is installed and running. Next, you will pull a model.
Step 12: Pull a Model Optimized for Tool Calling
Model selection impacts agentic task performance. Based on evaluations, here is how popular models compare:
| Model | Tool Calling F1 Score | Recommendation |
|-----------|-----------------------|-------------------------------------|
| GPT-4 | 0.974 | Best overall (cloud) |
| Qwen3 8B | 0.919 | Best local option |
| Gemma3 4B | 0.733 | Insufficient for reliable tool use |
Pull Qwen3 8B:
docker model pull ai/qwen3:8B-Q4_K_M
Verify the download:
docker model ls
Expected output:
MODEL NAME PARAMETERS QUANTIZATION ARCHITECTURE MODEL ID CREATED CONTEXT SIZE
qwen3:8B-Q4_K_M 8.19 B IQ2_XXS/Q4_K_M qwen3 79fa56c07429 10 months ago 4.68 GiB
Step 13: Update the Local Model Configuration
The default cagent-local.yaml may reference a different model. Update it to use the Qwen3 model you pulled:
sed -i 's/ai\/gemma3:2B-Q4_0/ai\/qwen3:8B-Q4_K_M/g' cagent-local.yaml
Verify the change:
grep model cagent-local.yaml
Expected output:
# docker model pull ai/qwen3:8B-Q4_K_M
models:
  local-model:
    model: ai/qwen3:8B-Q4_K_M
    model: local-model
Review the full configuration:
version: "2"

models:
  local-model:
    provider: dmr
    model: ai/qwen3:8B-Q4_K_M
    max_tokens: 4096

agents:
  root:
    model: local-model
    description: Debugging assistant that helps fix code issues
    instruction: |
      You are a helpful debugging assistant. When a developer shares an error:

      1. **Analyze** the error message and stack trace
      2. **Identify** the root cause
      3. **Explain** what went wrong in simple terms
      4. **Provide** a working fix with code

      Be concise. Focus on actionable solutions.
    toolsets:
      - type: filesystem
      - type: think
      - type: todo
The local configuration uses a single root agent because local models on CPU work best with focused, single-agent tasks.
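Docker Model Runner also exposes an OpenAI-compatible API, so you can query the pulled model directly from scripts, independent of cagent. The sketch below assumes TCP access to the runner is enabled; the host, port, and path are assumptions that depend on your Model Runner setup:

```python
import json
import urllib.request

# Assumed Docker Model Runner endpoint -- host, port, and path depend on
# how TCP access is enabled in your setup; treat these as placeholders.
DMR_URL = "http://localhost:12434/engines/v1/chat/completions"


def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for the local model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,
    }


def ask_local_model(prompt: str) -> str:
    """POST the payload to the local runner and return the model's reply."""
    payload = build_request("ai/qwen3:8B-Q4_K_M", prompt)
    req = urllib.request.Request(
        DMR_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# On the Droplet, for example:
# print(ask_local_model("Why does an empty list crash calculate_average?"))
```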
Step 14: Run with Local Model
cagent run ./cagent-local.yaml
Try this JavaScript async bug:
I have this Node.js code that's not working correctly:
async function fetchUserData(userId) {
  const response = await fetch(`/api/users/${userId}`);
  const data = response.json();
  return data;
}

const user = fetchUserData(123);
console.log(user.name);
Error: Cannot read property 'name' of undefined
The local model correctly identifies the issue: the function fetchUserData is async, but it's not being awaited when called, leading to an unresolved Promise and undefined value.
The agent suggested these key changes:
- Added `await` before `response.json()`.
- Added error handling for network responses.
- Wrapped the call in an async IIFE with `try/catch`.
- Ensured proper async handling throughout.
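A corrected version along those lines might look like the following. Note that `fetch` is stubbed here so the example runs without a network; in real code you would use the global `fetch`:

```javascript
// Corrected pattern for the async bug, matching the agent's suggested changes.
// `fetch` is stubbed so the example is self-contained and runnable offline.
const fetch = async (url) => ({
  ok: true,
  json: async () => ({ name: "Ada" }),
});

async function fetchUserData(userId) {
  const response = await fetch(`/api/users/${userId}`);
  if (!response.ok) {
    throw new Error("Request failed");
  }
  const data = await response.json(); // await the Promise from json()
  return data;
}

// The caller must also await; wrap top-level code in an async IIFE.
(async () => {
  try {
    const user = await fetchUserData(123);
    console.log(user.name); // Ada
  } catch (err) {
    console.error(err.message);
  }
})();
```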
Cost: $0.00 (local model, no API charges)
Note: Local models on CPU work correctly but are slow. For production speed, consider using GPU Droplets or cloud APIs. For privacy-sensitive tasks where speed is not the top priority, CPU inference is a viable alternative.
Step 15: Push to Docker Hub
Share your agent configuration by publishing it as an OCI artifact:
docker login
cagent push ./cagent-openai.yaml docker.io/YOUR_DOCKERHUB_USERNAME/bug-investigator:latest
Replace YOUR_DOCKERHUB_USERNAME with your Docker Hub username. Follow the authentication flow when prompted.
Expected output:
Pushing agent ./cagent-openai.yaml to docker.io/YOUR_DOCKERHUB_USERNAME/bug-investigator:latest
Successfully pushed artifact to docker.io/YOUR_DOCKERHUB_USERNAME/bug-investigator:latest
Your Bug Investigator Agent is now live on Docker Hub. Anyone with Docker Agent installed can pull and run it:
cagent run docker.io/YOUR_DOCKERHUB_USERNAME/bug-investigator:latest
Understanding the YAML Configuration in Depth
The YAML-based configuration distinguishes Docker Agent from other frameworks that require orchestration code. Here's the structure of the OpenAI configuration used in this tutorial:
version: "2"

models:
  openai-main:
    provider: openai
    model: gpt-4o
    max_tokens: 4096
  openai-mini:
    provider: openai
    model: gpt-4o-mini
    max_tokens: 4096

agents:
  root:
    model: openai-main
    description: Bug investigator that analyzes errors and coordinates fixes
    instruction: |
      You are an expert bug investigator. When given an error:

      1. Analyze the error message and stack trace
      2. Identify the root cause
      3. Delegate to researcher for documentation lookup
      4. Delegate to fixer for code correction
      5. Delegate to tester for test generation
    sub_agents: [researcher, fixer, tester]
    toolsets:
      - type: filesystem
      - type: think

  researcher:
    model: openai-mini
    description: Searches documentation and finds solutions
    instruction: |
      Search for relevant documentation and similar issues.
      Provide links and context for the fix.
    toolsets:
      - type: mcp
        ref: docker:duckduckgo

  fixer:
    model: openai-main
    description: Writes corrected code
    instruction: |
      Write minimal, targeted fixes for diagnosed bugs.
      Include proper error handling.
    toolsets:
      - type: filesystem
      - type: shell

  tester:
    model: openai-mini
    description: Generates test cases
    instruction: |
      Generate comprehensive test cases for the fix.
      Cover edge cases, positive cases, and negative cases.
    toolsets:
      - type: filesystem
Key points in this configuration:
- Two model tiers: The root agent and fixer use `gpt-4o` for complex reasoning, while the researcher and tester use `gpt-4o-mini` to reduce costs.
- Sub-agent delegation: The `sub_agents` field on the root agent defines which agents it can delegate to.
- Toolsets: Each agent gets only the tools it needs. The researcher gets web search, the fixer gets filesystem and shell access, and the tester gets filesystem access to write test files.
- No orchestration code: The entire multi-agent workflow is defined declaratively in YAML.
Customizing Agent Behavior with MCP Tools
The Model Context Protocol (MCP) standard connects AI models to external tools and data sources. Docker Agent integrates with MCP servers, enhancing agent capabilities.
For example, add GitHub integration to your bug investigator to read issues and create pull requests:
toolsets:
  - type: mcp
    ref: docker:github
    config:
      env:
        GITHUB_TOKEN: ${GITHUB_TOKEN}
Or add a database tool for log queries:
toolsets:
  - type: mcp
    ref: docker:postgres
    config:
      env:
        DATABASE_URL: ${DATABASE_URL}
The Docker MCP Catalog provides pre-built MCP servers for integrations, including Slack, Jira, and various databases.
FAQs
1. What is a Multi-Agent AI System?
A multi-agent AI system consists of several AI agents working together to perform tasks. Each agent has a specialized role and tools, with a root agent coordinating tasks. In Docker Agent, this structure is defined in YAML, allowing the root agent to analyze, delegate, and manage tasks efficiently.
2. Is Docker Agent Suitable for Production Use?
Yes, for internal team tools and developer workflows. The Docker Agent 1-Click Droplet provides a stable environment for running agent teams. For customer-facing applications, add rate limiting, error handling, and human review of suggested fixes, since the agent can occasionally produce incorrect fixes.
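If you front the agent with a small API service, rate limiting can be as simple as a token bucket. The sketch below is illustrative only, not part of Docker Agent:

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter -- an illustrative sketch."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = capacity      # burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_sec=1, capacity=2)
print([bucket.allow() for _ in range(4)])  # bursts of 2, then throttled
```

Each debugging request would pass through `bucket.allow()` before being forwarded to the agent, protecting both your API budget and the Droplet.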
3. Can I Use Anthropic Claude Instead of OpenAI?
Yes, the bug-investigator-agent repository includes cagent-anthropic.yaml for Claude models. Set the ANTHROPIC_API_KEY environment variable and run the agent configuration.
4. How Does Docker Agent Compare to Other Agent Frameworks?
Docker Agent offers a YAML-first approach, while other frameworks like LangGraph or CrewAI require programming code. Docker Agent emphasizes simplicity and portability, making it easier to set up and share standard agent workflows.
5. What are the Ongoing Costs for Running This Setup?
The costs include:
- Droplet: $6/month for cloud API agents or $48/month for local model inference.
- OpenAI API: Approximately $0.01 to $0.10 per debugging session.
- Local models: $0 per request, only the Droplet cost.
For frequent use, local models on a larger Droplet can be cost-effective. For occasional use, the $6/month Droplet with cloud APIs is economical.
Conclusion
This tutorial demonstrated deploying a Docker Agent Droplet to build a multi-agent bug investigator, test it with real bugs, configure local model inference, and publish the agent to Docker Hub. The multi-agent architecture is adaptable to various workflows beyond debugging, utilizing the same YAML configuration for tasks like code reviews, documentation generation, and security scanning. Docker Agent's declarative approach ensures ease of version control and sharing, supported by reproducible infrastructure.
Next Steps
Consider these extensions after completing the tutorial:
- Add more specialized agents: Create a Security Analyzer or Performance Optimizer by expanding the YAML configuration.
- Integrate MCP tools: Add integrations for platforms like GitHub, Slack, or databases.
- Deploy as a service: Use `systemd` to run the agent continuously on your Droplet.
- Explore GPU Droplets: For quicker local inference, consider using GPU Droplets.
- Try the Anthropic backend: Utilize the provided configuration for Claude models.
Resources
- Source code: GitHub Repository
- Docker Hub: Bug Investigator
- Docker Agent documentation: Documentation