Building Multi-Agent AI Systems Using Docker Agent
Introduction
Docker Agent is an open-source framework designed to facilitate the creation and execution of multi-agent AI systems. It allows developers to configure teams of AI agents using YAML files, eliminating the need for Python orchestration code. Each agent can deploy its unique model, whether through cloud APIs like OpenAI and Anthropic, or local models via Docker Model Runner. These agents can perform specialized tasks such as accessing the filesystem, conducting web searches, and integrating custom MCP tools. Docker Agent also enables the distribution of agents as OCI artifacts, which can be shared and deployed swiftly.
This tutorial provides a step-by-step guide to setting up a DigitalOcean Droplet with Docker Agent pre-installed. Using the Docker Agent 1-Click App from the DigitalOcean Marketplace, you will build a Bug Investigator: a multi-agent system whose agents collaborate to diagnose errors, research fixes, implement solutions, and generate tests. You'll also configure both cloud API and local model inference, and after completing the guide you can explore further Docker tutorials for Ubuntu 24.04.
Note: The Docker Agent 1-Click image includes version 1.9.10 under its previous name cagent. Commands in this tutorial use cagent as per the pre-installed version, but the YAML configurations and architecture remain consistent with newer versions.
Key Takeaways
- The Docker Agent 1-Click Droplet significantly reduces deployment time to under five minutes with a pre-installed Docker Agent on Ubuntu 24.04.
- Multi-agent architectures divide complex tasks among specialized agents, enabling efficient task handling. For instance, a Tester agent can automatically generate unit tests without explicit prompts.
- Docker Agent supports both cloud APIs and local models through Docker Model Runner for environments requiring data privacy.
- Local inference with models like Qwen3 8B can identify bugs accurately but is slower on CPU compared to cloud APIs. Consider using DigitalOcean GPU Droplets for improved performance.
- Agents can be shared and deployed as OCI artifacts through Docker Hub, facilitating team collaboration.
What You Will Build
Understanding Agentic AI Workflows
Traditional AI interactions typically involve simple request-response cycles. Agentic AI workflows, however, empower AI agents to act autonomously, such as reading files, performing web searches, and writing code. This capability allows agents to not only suggest solutions but also implement fixes.
In multi-agent systems, specialized agents work together on complex tasks, akin to a development team where each member has a specific role. This collaborative approach enhances problem-solving efficiency compared to a single-agent system.
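Docker Agent handles this coordination for you, but the underlying pattern is simple. Here is a minimal Python sketch of a root agent delegating to specialists — purely conceptual, with hypothetical class names, not Docker Agent's actual internals:

```python
# Conceptual sketch of root-agent delegation -- hypothetical classes,
# not Docker Agent's implementation.
class Agent:
    def __init__(self, name, handle):
        self.name = name
        self.handle = handle  # the agent's specialized skill


class RootAgent:
    def __init__(self, sub_agents):
        self.sub_agents = {a.name: a for a in sub_agents}

    def run(self, task):
        # The root agent routes the task to each specialist and
        # collects the results, like a team lead.
        return {name: agent.handle(task) for name, agent in self.sub_agents.items()}


team = RootAgent([
    Agent("researcher", lambda t: f"docs for: {t}"),
    Agent("fixer", lambda t: f"patch for: {t}"),
    Agent("tester", lambda t: f"tests for: {t}"),
])
results = team.run("ZeroDivisionError in calculate_average")
print(results["fixer"])  # patch for: ZeroDivisionError in calculate_average
```

In Docker Agent, this routing is declared in YAML (the `sub_agents` field you will see later) rather than written as code.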
A multi-agent system (MAS) involves multiple AI agents working in unison to perform tasks. Docker Agent manages the coordination of these agents using YAML-defined hierarchies.
The Bug Investigator Architecture
In this tutorial, you'll construct a multi-agent debugging system with four specialized agents:
┌─────────────────────────────────────────┐
│         BUG INVESTIGATOR (Root)         │
│         Analyzes & Coordinates          │
└───────────────┬─────────────────────────┘
                │
      ┌─────────┼─────────┐
      ▼         ▼         ▼
┌───────────┐ ┌───────┐ ┌───────────┐
│ RESEARCHER│ │ FIXER │ │ TESTER    │
│ Web Search│ │ Writes│ │ Validates │
│ Find Docs │ │ Code  │ │ & Tests   │
└───────────┘ └───────┘ └───────────┘
- Investigator (Root Agent): Analyzes errors, identifies root causes, and coordinates the other agents.
- Researcher: Searches for documentation and finds similar issues and solutions.
- Fixer: Writes corrected code with proper implementation.
- Tester: Generates test cases to validate fixes.
Each agent has its own model and toolset defined in YAML, eliminating the need for orchestration code.
Prerequisites
Before starting, ensure you have:
- A DigitalOcean account with available credits or a configured payment method.
- An API key from OpenAI or Anthropic for cloud-based inference.
- An SSH client or web browser access to the DigitalOcean console. Beginners can review SSH essentials for guidance.
- Basic familiarity with the command line, including Git, SSH, and shell commands.
Optional for local model inference:
- Plan to resize your Droplet to an 8GB RAM plan.
Step 1: Create the Droplet
Navigate to the DigitalOcean Marketplace Docker Agent page and click "Create Docker Agent Droplet." Begin with the $6/month tier (1GB RAM), suitable for cloud API agents since AI inference is processed on external servers.
Step 2: Choose Your Preferred Region
Select a datacenter region closest to your users or location to reduce latency.
Step 3: Choose the Marketplace Image
In the "Choose an image" section, select the "Marketplace" tab and search for "Docker Agent."
Step 4: Configure and Launch
Review the Droplet size, add your SSH key, and click "Create Droplet."
Step 5: Access and Verify
Option 1: Web Console (Easiest)
- Go to your Droplets page.
- Click on your Docker Agent Droplet.
- Click the Access tab on the left.
- Click "Launch Droplet Console."
This opens a browser-based terminal without needing an SSH key.
Option 2: SSH Access
If preferred, reset the root password or use your SSH key:
- Click on your Droplet.
- Go to the Access tab.
- Click "Reset Root Password."
- Check your email for the new password.
- SSH into your Droplet:
ssh root@YOUR_DROPLET_IP
Replace YOUR_DROPLET_IP with your Droplet’s actual IP address.
Step 6: Check the Docker Agent Version
Verify that Docker Agent (cagent) is pre-installed:
cagent version
Expected output:
cagent version v1.9.10
Commit: 1782337c60dadcb39643f7c9e1a9798ea784c7aa
Step 7: Clone the Bug Investigator Repository
git clone https://github.com/ajeetraina/bug-investigator-agent.git
cd bug-investigator-agent
Review the project structure:
tree
.
├── LICENSE
├── README.md
├── bug.txt
├── cagent-anthropic.yaml
├── cagent-local.yaml
├── cagent-openai.yaml
├── cagent.yaml
├── examples
│ └── bug-scenarios.md
├── scripts
│ └── deploy.sh
└── test-code
├── Dockerfile
├── app.py
├── deployment.yaml
├── index.js
└── main.go
4 directories, 14 files
The repository includes YAML configurations for three model providers: OpenAI, Anthropic, and local (Docker Model Runner). Each configuration targets a different inference backend.
Step 8: Configure and Run with Cloud APIs
Set the OpenAI API key:
export OPENAI_API_KEY=sk-proj-XXXXXXXXXXX
The repository includes cagent-openai.yaml configured for GPT-4o and GPT-4o-mini. Launch the agent:
cagent run ./cagent-openai.yaml
The cagent chat interface opens, ready to use the multi-agent Bug Investigator architecture.
Step 9: Test with a Python Bug
Paste the following buggy code into the agent chat:
I have this Python code that's crashing:
def calculate_average(numbers):
    total = 0
    for num in numbers:
        total += num
    return total / len(numbers)

result = calculate_average([])
print(result)
Error: ZeroDivisionError: division by zero
The agent processes this through its multi-agent workflow:
- Investigator analyzes the error and identifies an empty list as the root cause.
- Fixer implements a guard clause.
- Tester generates comprehensive test cases.
The agent creates a tests/ directory with test_calculate_average.py. After processing, verify the generated test file:
cat tests/test_calculate_average.py
Expected output:
import unittest

def calculate_average(numbers):
    if not numbers:
        return None
    return sum(numbers) / len(numbers)

class TestCalculateAverage(unittest.TestCase):
    def test_empty_list(self):
        self.assertIsNone(calculate_average([]))

    def test_single_element(self):
        self.assertEqual(calculate_average([5]), 5)

    def test_multiple_elements(self):
        self.assertEqual(calculate_average([3, 5, 7]), 5.0)

    def test_floats(self):
        self.assertAlmostEqual(calculate_average([1.5, 2.5, 3.5]), 2.5)

    def test_negative_numbers(self):
        self.assertEqual(calculate_average([-1, -2, -3]), -2.0)

if __name__ == '__main__':
    unittest.main()
The agent:
- Fixed the bug by adding `if not numbers: return None`.
- Created five test cases:
  - Empty list returns `None`.
  - Single element returns that element.
  - Multiple elements calculate the average.
  - Floats handle decimals correctly.
  - Negative numbers work correctly.
Step 10: Run the Tests
python3 tests/test_calculate_average.py
Expected output:
.....
----------------------------------------------------------------------
Ran 5 tests in 0.000s
OK
Here is a summary of what happened:
- Bug reported: ZeroDivisionError.
- Agent diagnosed: Empty list causes division by zero.
- Agent fixed: Added guard clause.
- Agent tested: Five comprehensive test cases.
- Tests pass: All five pass.
Comparing Cloud versus Local Model Performance
Before setting up local models, here's a comparison for debugging tasks:
| Metric | Cloud API (GPT-4o) | Local Model (Qwen3 8B) |
|--------------------------|-----------------------------|--------------------------------------|
| Response time | About 5 seconds | About 3 minutes (CPU) |
| Accuracy | Correct diagnosis and fix | Correct diagnosis and fix |
| Multi-agent coordination | Full support (all 4 agents) | Root agent only (simplified config) |
| Cost per request | $0.01 to $0.10 | $0.00 (Droplet cost only) |
| Data privacy | Code sent to external API | Code stays on your server |
| Minimum Droplet size | 1GB RAM ($6/month) | 8GB RAM ($48/month) |
Cloud APIs are fast and support the full multi-agent workflow. Local models preserve code privacy but require more resources and are slower on CPU. For increased speed, consider using GPU Droplets.
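The cost trade-off can be sketched with quick back-of-the-envelope arithmetic, using the midpoint of the per-request estimate above (actual API pricing varies with model and token usage):

```python
# Rough break-even estimate between cloud APIs and a local-model Droplet,
# using the tutorial's figures; real costs vary with token usage.
small_droplet = 6.0        # $/month, 1GB plan for cloud API agents
large_droplet = 48.0       # $/month, 8GB plan for local inference
cost_per_request = 0.05    # midpoint of the $0.01-$0.10 estimate

# Local inference pays off once monthly API charges exceed
# the extra Droplet cost.
extra_droplet_cost = large_droplet - small_droplet            # 42.0
break_even_requests = extra_droplet_cost / cost_per_request
print(round(break_even_requests))  # 840 debugging sessions per month
```

Below roughly 840 sessions per month (at these assumed rates), the $6/month Droplet with cloud APIs is the cheaper option.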
Configure Local Model Inference (Optional)
Cloud APIs are efficient but send code to external servers. For privacy-sensitive environments, local inference keeps everything on your Droplet.
Your current Droplet has 1GB of RAM, while local models need at least 4GB.
Resize the Droplet:
- Go to the DigitalOcean Console and click your Droplet.
- Power off the Droplet.
- Click "Resize" in the left menu.
- Select the s-2vcpu-8gb plan for reliable local model performance.
- Click "Resize Droplet."
Power the Droplet back on and verify the new memory:
free -m
Expected output:
               total        used        free      shared  buff/cache   available
Mem:            7941         512        7189           4         477        7429
Swap:              0           0           0
Step 11: Install Docker Model Runner
Docker Model Runner allows running AI models locally within the Docker ecosystem. Install it:
sudo apt-get update
sudo apt-get install -y docker-model-plugin
Verify the installation by listing available models:
docker model ls
Docker Model Runner downloads its runtime on first use. You will see output similar to:
latest: Pulling from docker/model-runner
...
Status: Downloaded newer image for docker/model-runner:latest
Creating model storage volume docker-model-runner-models...
Starting model runner container docker-model-runner...
MODEL NAME PARAMETERS QUANTIZATION ARCHITECTURE MODEL ID CREATED CONTEXT SIZE
The empty model list confirms that Docker Model Runner is installed and running. Next, you will pull a model.
Step 12: Pull a Model Optimized for Tool Calling
Model selection impacts agentic task performance. Based on evaluations, here is how popular models compare:
| Model | Tool Calling F1 Score | Recommendation |
|-----------|-----------------------|-------------------------------------|
| GPT-4 | 0.974 | Best overall (cloud) |
| Qwen3 8B | 0.919 | Best local option |
| Gemma3 4B | 0.733 | Insufficient for reliable tool use |
Pull Qwen3 8B:
docker model pull ai/qwen3:8B-Q4_K_M
Verify the download:
docker model ls
Expected output:
MODEL NAME PARAMETERS QUANTIZATION ARCHITECTURE MODEL ID CREATED CONTEXT SIZE
qwen3:8B-Q4_K_M 8.19 B IQ2_XXS/Q4_K_M qwen3 79fa56c07429 10 months ago 4.68 GiB
Step 13: Update the Local Model Configuration
The default cagent-local.yaml may reference a different model. Update it to use the Qwen3 model you pulled:
sed -i 's/ai\/gemma3:2B-Q4_0/ai\/qwen3:8B-Q4_K_M/g' cagent-local.yaml
Verify the change:
grep model cagent-local.yaml
Expected output:
# docker model pull ai/qwen3:8B-Q4_K_M
models:
  local-model:
    model: ai/qwen3:8B-Q4_K_M
    model: local-model
Review the full configuration:
version: "2"

models:
  local-model:
    provider: dmr
    model: ai/qwen3:8B-Q4_K_M
    max_tokens: 4096

agents:
  root:
    model: local-model
    description: Debugging assistant that helps fix code issues
    instruction: |
      You are a helpful debugging assistant. When a developer shares an error:

      1. **Analyze** the error message and stack trace
      2. **Identify** the root cause
      3. **Explain** what went wrong in simple terms
      4. **Provide** a working fix with code

      Be concise. Focus on actionable solutions.
    toolsets:
      - type: filesystem
      - type: think
      - type: todo
The local configuration uses a single root agent because local models on CPU work best with focused, single-agent tasks.
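Docker Model Runner also exposes an OpenAI-compatible API, so you can query the pulled model directly from scripts, independent of cagent. The sketch below assumes TCP access to the runner is enabled; the host, port, and path are assumptions that depend on your Model Runner setup:

```python
import json
import urllib.request

# Assumed Docker Model Runner endpoint -- host, port, and path depend on
# how TCP access is enabled in your setup; treat these as placeholders.
DMR_URL = "http://localhost:12434/engines/v1/chat/completions"


def build_request(model: str, prompt: str) -> dict:
    """Build an OpenAI-style chat-completion payload for the local model."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 4096,
    }


def ask_local_model(prompt: str) -> str:
    """POST the payload to the local runner and return the model's reply."""
    payload = build_request("ai/qwen3:8B-Q4_K_M", prompt)
    req = urllib.request.Request(
        DMR_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]


# On the Droplet, for example:
# print(ask_local_model("Why does an empty list crash calculate_average?"))
```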
Step 14: Run with Local Model
cagent run ./cagent-local.yaml
Try this JavaScript async bug:
I have this Node.js code that's not working correctly:
async function fetchUserData(userId) {
  const response = await fetch(`/api/users/${userId}`);
  const data = response.json();
  return data;
}

const user = fetchUserData(123);
console.log(user.name);
Error: Cannot read property 'name' of undefined
The local model correctly identifies the issue: the function fetchUserData is async, but it's not being awaited when called, leading to an unresolved Promise and undefined value.
The agent suggested these key changes:
- Added `await` before `response.json()`.
- Added error handling for network responses.
- Wrapped the call in an async IIFE with `try/catch`.
- Ensured proper async handling throughout.
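A corrected version along those lines might look like the following. Note that `fetch` is stubbed here so the example runs without a network; in real code you would use the global `fetch`:

```javascript
// Corrected pattern for the async bug, matching the agent's suggested changes.
// `fetch` is stubbed so the example is self-contained and runnable offline.
const fetch = async (url) => ({
  ok: true,
  json: async () => ({ name: "Ada" }),
});

async function fetchUserData(userId) {
  const response = await fetch(`/api/users/${userId}`);
  if (!response.ok) {
    throw new Error("Request failed");
  }
  const data = await response.json(); // await the Promise from json()
  return data;
}

// The caller must also await; wrap top-level code in an async IIFE.
(async () => {
  try {
    const user = await fetchUserData(123);
    console.log(user.name); // Ada
  } catch (err) {
    console.error(err.message);
  }
})();
```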
Cost: $0.00 (local model, no API charges)
Note: Local models on CPU work correctly but are slow. For production speed, consider using GPU Droplets or cloud APIs. For privacy-sensitive tasks where speed is not the top priority, CPU inference is a viable alternative.
Step 15: Push to Docker Hub
Share your agent configuration by publishing it as an OCI artifact:
docker login
cagent push ./cagent-openai.yaml docker.io/YOUR_DOCKERHUB_USERNAME/bug-investigator:latest
Replace YOUR_DOCKERHUB_USERNAME with your Docker Hub username. Follow the authentication flow when prompted.
Expected output:
Pushing agent ./cagent-openai.yaml to docker.io/YOUR_DOCKERHUB_USERNAME/bug-investigator:latest
Successfully pushed artifact to docker.io/YOUR_DOCKERHUB_USERNAME/bug-investigator:latest
Your Bug Investigator Agent is now live on Docker Hub. Anyone with Docker Agent installed can pull and run it:
cagent run docker.io/YOUR_DOCKERHUB_USERNAME/bug-investigator:latest
Understanding the YAML Configuration in Depth
The YAML-based configuration distinguishes Docker Agent from other frameworks that require orchestration code. Here's the structure of the OpenAI configuration used in this tutorial:
version: "2"

models:
  openai-main:
    provider: openai
    model: gpt-4o
    max_tokens: 4096
  openai-mini:
    provider: openai
    model: gpt-4o-mini
    max_tokens: 4096

agents:
  root:
    model: openai-main
    description: Bug investigator that analyzes errors and coordinates fixes
    instruction: |
      You are an expert bug investigator. When given an error:

      1. Analyze the error message and stack trace
      2. Identify the root cause
      3. Delegate to researcher for documentation lookup
      4. Delegate to fixer for code correction
      5. Delegate to tester for test generation
    sub_agents: [researcher, fixer, tester]
    toolsets:
      - type: filesystem
      - type: think

  researcher:
    model: openai-mini
    description: Searches documentation and finds solutions
    instruction: |
      Search for relevant documentation and similar issues.
      Provide links and context for the fix.
    toolsets:
      - type: mcp
        ref: docker:duckduckgo

  fixer:
    model: openai-main
    description: Writes corrected code
    instruction: |
      Write minimal, targeted fixes for diagnosed bugs.
      Include proper error handling.
    toolsets:
      - type: filesystem
      - type: shell

  tester:
    model: openai-mini
    description: Generates test cases
    instruction: |
      Generate comprehensive test cases for the fix.
      Cover edge cases, positive cases, and negative cases.
    toolsets:
      - type: filesystem
Key points in this configuration:
- Two model tiers: The root agent and fixer use `gpt-4o` for complex reasoning, while the researcher and tester use `gpt-4o-mini` to reduce costs.
- Sub-agent delegation: The `sub_agents` field on the root agent defines which agents it can delegate to.
- Toolsets: Each agent gets only the tools it needs. The researcher gets web search, the fixer gets filesystem and shell access, and the tester gets filesystem access to write test files.
- No orchestration code: The entire multi-agent workflow is defined declaratively in YAML.
Customizing Agent Behavior with MCP Tools
The Model Context Protocol (MCP) standard connects AI models to external tools and data sources. Docker Agent integrates with MCP servers, enhancing agent capabilities.
For example, add GitHub integration to your bug investigator to read issues and create pull requests:
toolsets:
  - type: mcp
    ref: docker:github
    config:
      env:
        GITHUB_TOKEN: ${GITHUB_TOKEN}
Or add a database tool for log queries:
toolsets:
  - type: mcp
    ref: docker:postgres
    config:
      env:
        DATABASE_URL: ${DATABASE_URL}
The Docker MCP Catalog provides pre-built MCP servers for integrations, including Slack, Jira, and various databases.
FAQs
1. What is a Multi-Agent AI System?
A multi-agent AI system consists of several AI agents working together to perform tasks. Each agent has a specialized role and tools, with a root agent coordinating tasks. In Docker Agent, this structure is defined in YAML, allowing the root agent to analyze, delegate, and manage tasks efficiently.
2. Is Docker Agent Suitable for Production Use?
Yes, for internal team tools and developer workflows. The Docker Agent 1-Click Droplet provides a stable environment for running agent teams. For customer-facing applications, add rate limiting, error handling, and human review of suggested fixes, since the agent can occasionally produce incorrect fixes.
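If you front the agent with a small API service, rate limiting can be as simple as a token bucket. The sketch below is illustrative only, not part of Docker Agent:

```python
import time


class TokenBucket:
    """Minimal token-bucket rate limiter -- an illustrative sketch."""

    def __init__(self, rate_per_sec, capacity):
        self.rate = rate_per_sec      # tokens refilled per second
        self.capacity = capacity      # burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


bucket = TokenBucket(rate_per_sec=1, capacity=2)
print([bucket.allow() for _ in range(4)])  # bursts of 2, then throttled
```

Each debugging request would pass through `bucket.allow()` before being forwarded to the agent, protecting both your API budget and the Droplet.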
3. Can I Use Anthropic Claude Instead of OpenAI?
Yes, the bug-investigator-agent repository includes cagent-anthropic.yaml for Claude models. Set the ANTHROPIC_API_KEY environment variable and run the agent configuration.
4. How Does Docker Agent Compare to Other Agent Frameworks?
Docker Agent offers a YAML-first approach, while other frameworks like LangGraph or CrewAI require programming code. Docker Agent emphasizes simplicity and portability, making it easier to set up and share standard agent workflows.
5. What are the Ongoing Costs for Running This Setup?
The costs include:
- Droplet: $6/month for cloud API agents or $48/month for local model inference.
- OpenAI API: Approximately $0.01 to $0.10 per debugging session.
- Local models: $0 per request, only the Droplet cost.
For frequent use, local models on a larger Droplet can be cost-effective. For occasional use, the $6/month Droplet with cloud APIs is economical.
Conclusion
This tutorial demonstrated deploying a Docker Agent Droplet to build a multi-agent bug investigator, test it with real bugs, configure local model inference, and publish the agent to Docker Hub. The multi-agent architecture is adaptable to various workflows beyond debugging, utilizing the same YAML configuration for tasks like code reviews, documentation generation, and security scanning. Docker Agent's declarative approach ensures ease of version control and sharing, supported by reproducible infrastructure.
Next Steps
Consider these extensions after completing the tutorial:
- Add more specialized agents: Create a Security Analyzer or Performance Optimizer by expanding the YAML configuration.
- Integrate MCP tools: Add integrations for platforms like GitHub, Slack, or databases.
- Deploy as a service: Use `systemd` to run the agent continuously on your Droplet.
- Explore GPU Droplets: For quicker local inference, consider using GPU Droplets.
- Try the Anthropic backend: Utilize the provided configuration for Claude models.
Resources
- Source code: GitHub Repository
- Docker Hub: Bug Investigator
- Docker Agent documentation: Documentation