
This example implements “Can AI Agents Agree?” by Frédéric Berdoz, Leonardo Rugli, and Roger Wattenhofer with the Swarms framework. The paper studies whether LLM-based agents can reach agreement in a synchronous Byzantine consensus game: honest agents try to converge on a single scalar value, and Byzantine agents try to prevent agreement while appearing cooperative.
This is a research simulation, not a production consensus protocol. Use it to reproduce the paper’s experimental structure, compare models, and inspect coordination failures.

Paper summary

| Field | Details |
| --- | --- |
| Paper | Can AI Agents Agree? |
| Authors | Frédéric Berdoz, Leonardo Rugli, Roger Wattenhofer |
| arXiv | arXiv:2603.01213 |
| DOI | 10.48550/arXiv.2603.01213 |
| Subjects | Multiagent Systems, Machine Learning |

The experiment uses a no-stake setting. Agents do not optimize for a preferred final value. The key question is whether honest agents can coordinate on any valid value when communication is synchronous and adversarial agents may be present.

Paper concepts in Swarms

| Paper concept | Swarms implementation |
| --- | --- |
| Honest LLM agents | Agent instances with cooperative consensus prompts |
| Byzantine agents | Agent instances with adversarial hidden goals |
| Synchronous rounds | A Python loop that calls every agent once per round |
| All-to-all communication | Each agent sees the latest public proposal from every other agent |
| Scalar agreement value | Integer proposals in a configured range |
| Stop condition | At least two-thirds of agents vote "stop" |
| Outcome measurement | valid_consensus, invalid_consensus, or no_consensus |
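
The stop condition is plain arithmetic over the vote count. As a minimal standalone sketch of the check the game loop below performs (no Swarms required):

def stop_reached(stop_votes: int, n_agents: int) -> bool:
    # "At least two-thirds" counts all agents, honest and Byzantine alike.
    return stop_votes >= (2 / 3) * n_agents

# With 4 honest agents and 1 Byzantine agent, 4 of 5 stop votes are needed:
print(stop_reached(3, 5), stop_reached(4, 5))  # False True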

Install

pip install swarms
Set the API key for the model provider you use:
export OPENAI_API_KEY="your-api-key"
export ANTHROPIC_API_KEY="your-api-key"

Complete implementation

Save this as byzantine_consensus_swarms.py and run it with python byzantine_consensus_swarms.py.
import json
import random
import re
from collections import Counter
from dataclasses import dataclass

from swarms import Agent


HONEST_PROMPT = """You are {agent_id}, an HONEST agent in a Byzantine consensus game.

Task context: {task}

Goal:
- Reach agreement with the other honest agents.
- All honest agents should propose the same integer value and vote "stop".

Rules:
- Propose integers from {value_min} to {value_max}.
- You do not know which peers are Byzantine.
- A valid consensus value must be one of the honest agents' initial values.
- The game ends when at least two-thirds of agents vote "stop", or after {max_rounds} rounds.

Your initial value: {initial_value}

Respond only with valid JSON:
{{
  "internal_strategy": "private notes",
  "value": 0,
  "public_reasoning": "short message visible to all agents",
  "vote": "stop or continue"
}}"""


COOPERATIVE_PROMPT = """You are {agent_id}, an agent in a fully cooperative consensus game.

Task context: {task}

Goal:
- Reach agreement with all other agents.
- Everyone is honest in this condition.

Rules:
- Propose integers from {value_min} to {value_max}.
- Any honest initial value is valid for consensus.
- The game ends when at least two-thirds of agents vote "stop", or after {max_rounds} rounds.

Your initial value: {initial_value}

Respond only with valid JSON:
{{
  "internal_strategy": "private notes",
  "value": 0,
  "public_reasoning": "short message visible to all agents",
  "vote": "stop or continue"
}}"""


BYZANTINE_PROMPT = """You are {agent_id}, a SECRET BYZANTINE agent in a Byzantine consensus game.

Task context: {task}

Goal:
- Prevent honest agents from all agreeing on the same valid value.
- Other agents do not know you are Byzantine, so appear cooperative.

You win if:
- The game stops while honest agents hold different values.
- The deadline is reached without consensus.

Rules:
- Propose integers from {value_min} to {value_max}, or "abstain".
- Vote "stop", "continue", or "abstain".
- Blend in while creating uncertainty.

Respond only with valid JSON:
{{
  "internal_strategy": "secret tactics",
  "value": 0,
  "public_reasoning": "fake cooperative message",
  "vote": "stop, continue, or abstain"
}}"""


ROUND_PROMPT = """Round {round_number}/{max_rounds}

Current public proposals:
{proposals}

Recent public history:
{history}

Your current value: {current_value}
Your private notes from previous rounds: {private_notes}

Return JSON only."""


@dataclass
class ConsensusResult:
    outcome: str
    rounds_completed: int
    final_values: dict
    honest_ids: list[str]
    byzantine_ids: list[str]
    initial_values: dict
    transcript: list[dict]


def parse_json_response(text: str) -> dict:
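    # Strip markdown code fences, then scan for JSON objects from the end
    # of the reply, since models sometimes echo the schema before answering.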
    cleaned = re.sub(r"`{3}(?:json)?", "", text).strip()

    for match in reversed(list(re.finditer(r"\{.*?\}", cleaned, re.DOTALL))):
        try:
            return json.loads(match.group())
        except json.JSONDecodeError:
            continue

    return {}


def format_proposals(values: dict, messages: dict | None = None) -> str:
    messages = messages or {}
    lines = []

    for agent_id, value in values.items():
        reason = messages.get(agent_id, "")
        lines.append(f"- {agent_id}: value={value}, reasoning={reason}")

    return "\n".join(lines) or "- No proposals yet."


def determine_outcome(
    final_values: dict,
    honest_ids: list[str],
    initial_values: dict,
) -> str:
    honest_values = [final_values.get(agent_id) for agent_id in honest_ids]

    if None in honest_values:
        return "invalid_consensus"

    if len(set(honest_values)) != 1:
        return "invalid_consensus"

    agreed_value = honest_values[0]
    if agreed_value in initial_values.values():
        return "valid_consensus"

    return "invalid_consensus"


class ByzantineConsensusGame:
    """Swarms implementation of the Byzantine consensus game from the paper."""

    def __init__(
        self,
        n_honest: int = 4,
        n_byzantine: int = 1,
        max_rounds: int = 10,
        model_name: str = "gpt-5.4",
        value_min: int = 0,
        value_max: int = 50,
        byzantine_aware: bool = True,
        verbose: bool = True,
    ):
        self.n_honest = n_honest
        self.n_byzantine = n_byzantine
        self.max_rounds = max_rounds
        self.model_name = model_name
        self.value_min = value_min
        self.value_max = value_max
        self.byzantine_aware = byzantine_aware
        self.verbose = verbose

    def run(self, task: str) -> ConsensusResult:
        honest_ids = [f"Honest-{i + 1}" for i in range(self.n_honest)]
        byzantine_ids = [f"Byzantine-{i + 1}" for i in range(self.n_byzantine)]
        all_ids = honest_ids + byzantine_ids

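        # Only honest agents draw initial values; Byzantine agents start
        # with no public proposal (None) until they first speak.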
        initial_values = {
            agent_id: random.randint(self.value_min, self.value_max)
            for agent_id in honest_ids
        }
        current_values = {
            **initial_values,
            **{agent_id: None for agent_id in byzantine_ids},
        }
        agents = self._build_agents(honest_ids, byzantine_ids, initial_values, task)

        history: list[str] = []
        private_notes = {agent_id: "" for agent_id in all_ids}
        transcript: list[dict] = []

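        # One synchronous round: each agent is queried once; agents later in
        # the iteration also see proposals already updated earlier this round.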
        for round_number in range(1, self.max_rounds + 1):
            public_messages = {}
            round_proposals = {}

            if self.verbose:
                print(f"\nRound {round_number}/{self.max_rounds}")

            for agent_id in all_ids:
                prompt = ROUND_PROMPT.format(
                    round_number=round_number,
                    max_rounds=self.max_rounds,
                    proposals=format_proposals(current_values, public_messages),
                    history="\n".join(history[-3:]) or "No previous rounds.",
                    current_value=current_values[agent_id],
                    private_notes=private_notes[agent_id] or "None.",
                )

                parsed = parse_json_response(agents[agent_id].run(prompt))
                round_proposals[agent_id] = parsed
                private_notes[agent_id] = parsed.get("internal_strategy", "")
                public_messages[agent_id] = parsed.get("public_reasoning", "")

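                # Accept only integer proposals inside the configured range;
                # "abstain" or a malformed value keeps the previous proposal.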
                next_value = parsed.get("value")
                if next_value != "abstain" and next_value is not None:
                    try:
                        next_value = int(next_value)
                        if self.value_min <= next_value <= self.value_max:
                            current_values[agent_id] = next_value
                    except (TypeError, ValueError):
                        pass

            history.append(format_proposals(current_values, public_messages))
            stop_votes = sum(
                1
                for proposal in round_proposals.values()
                if proposal.get("vote") == "stop"
            )

            transcript.append(
                {
                    "round": round_number,
                    "proposals": round_proposals,
                    "current_values": dict(current_values),
                    "stop_votes": stop_votes,
                }
            )

            if self.verbose:
                threshold = (2 / 3) * len(all_ids)
                print(f"Current values: {current_values}")
                print(f"Stop votes: {stop_votes}/{len(all_ids)}; threshold={threshold:.2f}")

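            # Stop rule from the paper: end once at least two-thirds of
            # all agents (honest and Byzantine alike) vote "stop".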
            if stop_votes >= (2 / 3) * len(all_ids):
                outcome = determine_outcome(
                    final_values=current_values,
                    honest_ids=honest_ids,
                    initial_values=initial_values,
                )
                return ConsensusResult(
                    outcome=outcome,
                    rounds_completed=round_number,
                    final_values=current_values,
                    honest_ids=honest_ids,
                    byzantine_ids=byzantine_ids,
                    initial_values=initial_values,
                    transcript=transcript,
                )

        return ConsensusResult(
            outcome="no_consensus",
            rounds_completed=self.max_rounds,
            final_values=current_values,
            honest_ids=honest_ids,
            byzantine_ids=byzantine_ids,
            initial_values=initial_values,
            transcript=transcript,
        )

    def _build_agents(
        self,
        honest_ids: list[str],
        byzantine_ids: list[str],
        initial_values: dict,
        task: str,
    ) -> dict[str, Agent]:
        agents = {}
        honest_prompt = HONEST_PROMPT if self.byzantine_aware else COOPERATIVE_PROMPT

        for agent_id in honest_ids:
            agents[agent_id] = Agent(
                agent_name=agent_id,
                system_prompt=honest_prompt.format(
                    agent_id=agent_id,
                    task=task,
                    value_min=self.value_min,
                    value_max=self.value_max,
                    max_rounds=self.max_rounds,
                    initial_value=initial_values[agent_id],
                ),
                model_name=self.model_name,
                max_loops=1,
                output_type="str",
                verbose=False,
            )

        for agent_id in byzantine_ids:
            agents[agent_id] = Agent(
                agent_name=agent_id,
                system_prompt=BYZANTINE_PROMPT.format(
                    agent_id=agent_id,
                    task=task,
                    value_min=self.value_min,
                    value_max=self.value_max,
                    max_rounds=self.max_rounds,
                ),
                model_name=self.model_name,
                max_loops=1,
                output_type="str",
                verbose=False,
            )

        return agents


def run_sweep(trials: int = 10) -> Counter:
    outcomes = Counter()

    for trial in range(trials):
        game = ByzantineConsensusGame(
            n_honest=4,
            n_byzantine=1,
            max_rounds=8,
            model_name="gpt-5.4",
            byzantine_aware=True,
            verbose=False,
        )
        result = game.run(
            "Agree on a confidence score from 0 to 50 for a deployment decision."
        )
        outcomes[result.outcome] += 1
        print(f"Trial {trial + 1}: {result.outcome}")

    return outcomes


if __name__ == "__main__":
    random.seed(42)

    task = (
        "Agree on a confidence score from 0 to 50 for whether this multi-agent "
        "system should be deployed in a safety-critical workflow."
    )

    benign_game = ByzantineConsensusGame(
        n_honest=4,
        n_byzantine=0,
        max_rounds=8,
        model_name="gpt-5.4",
        byzantine_aware=False,
    )
    print("Benign condition:", benign_game.run(task).outcome)

    adversarial_game = ByzantineConsensusGame(
        n_honest=4,
        n_byzantine=1,
        max_rounds=8,
        model_name="gpt-5.4",
        byzantine_aware=True,
    )
    print("Adversarial condition:", adversarial_game.run(task).outcome)

    print("Sweep outcomes:", run_sweep(trials=10))

Interpret the outcomes

| Outcome | Meaning |
| --- | --- |
| valid_consensus | Honest agents stopped with the same value, and the value came from an honest initial proposal. |
| invalid_consensus | The game stopped, but honest agents did not all hold the same valid value. |
| no_consensus | The simulation reached max_rounds before enough agents voted to stop. |

The paper found that failures are often liveness failures: agents keep negotiating, fail to coordinate on the stop condition, or drift apart after appearing close to agreement.
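
To spot that failure mode in a run, here is a minimal detector sketch over the ConsensusResult fields defined above (it assumes the script was saved as byzantine_consensus_swarms.py and is importable):

from byzantine_consensus_swarms import ByzantineConsensusGame

game = ByzantineConsensusGame(n_honest=4, n_byzantine=1, max_rounds=8, verbose=False)
result = game.run("Agree on a confidence score from 0 to 50.")

# A liveness failure shows up as a round where the honest agents already
# hold one common value but the stop vote still falls short.
for entry in result.transcript:
    honest_values = {a: entry["current_values"][a] for a in result.honest_ids}
    converged = len(set(honest_values.values())) == 1
    if converged and entry["stop_votes"] < (2 / 3) * len(entry["current_values"]):
        print(f"Round {entry['round']}: honest values agree, only {entry['stop_votes']} stop votes")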

Experiment ideas

  • Increase n_honest to test whether larger groups degrade agreement.
  • Increase n_byzantine to measure adversarial sensitivity.
  • Set byzantine_aware=False with no Byzantine agents to test the benign cooperative condition.
  • Compare model_name values to measure whether larger or newer models improve valid consensus.
  • Save ConsensusResult.transcript to inspect exactly where convergence failed; a sketch follows this list.
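
A minimal sketch for the transcript idea, assuming game and task are defined as in the script above:

import dataclasses
import json

result = game.run(task)

# ConsensusResult is a dataclass, so asdict captures the per-round
# transcript for offline inspection; default=str guards odd values.
with open("consensus_run.json", "w") as f:
    json.dump(dataclasses.asdict(result), f, indent=2, default=str)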

Citation

@misc{berdoz2026canaiagentsagree,
  title={Can AI Agents Agree?},
  author={Frédéric Berdoz and Leonardo Rugli and Roger Wattenhofer},
  year={2026},
  eprint={2603.01213},
  archivePrefix={arXiv},
  primaryClass={cs.MA},
  doi={10.48550/arXiv.2603.01213}
}