GroupChat module.
Overview
The GroupChat module (swarms/structs/groupchat.py) implements an
asynchronous, self-selecting group conversation among autonomous language-model
agents. Unlike round-robin or speaker-selection schemes, where an orchestrator
picks who talks next, GroupChat has no global turn order. Every agent observes
every message, independently rates how much it wants to respond, and speaks only
when that desire clears a threshold. The result behaves less like a moderated
panel and more like an open room: several agents may react to the same remark at
once, and an agent with nothing to add simply stays quiet.
This document explains the architecture in depth, formalizes the system as a
concurrent actor model, and proves a sequence of properties: linearizability of
the shared transcript, guaranteed termination, bounded stop latency, and, most
interestingly, a branching-process characterization of conversation dynamics
that yields a sharp criticality threshold separating chats that die out from
chats that run to their hard cap. The final section provides a complete, runnable
program that exercises the full API.
The design goal is worth stating plainly. Most multi-agent chat frameworks impose
coordination from the outside: a controller computes a speaking order, or a
manager agent nominates the next speaker. That central coordinator is a
bottleneck, a single point of failure, and a source of artificial serialization.
GroupChat removes the coordinator entirely. Coordination becomes emergent: it
arises from many independent local decisions rather than one global schedule. The
mathematics below is what lets us reason about whether that emergent behavior is
stable.
Architecture
The actor model
GroupChat is a textbook actor system. Each agent is an actor with a private
mailbox, implemented as an asyncio.Queue. The room runs one _agent_loop
coroutine per agent, plus one _idle_monitor coroutine. There is no shared
mutable conversation pointer passed
between agents; instead, communication happens exclusively by message passing into
mailboxes. This is the defining property of the actor model, and it is what makes
the concurrency tractable. An actor never reaches into another actor’s state; it
only drops a message into a queue.
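The mailbox idea can be illustrated with a minimal sketch. This is not the module's code: the agent names and the `send` helper are hypothetical, and the only assumption carried over from the text is that each actor owns one asyncio.Queue and receives messages from every other actor.

```python
import asyncio


async def demo() -> dict[str, list[tuple[str, str]]]:
    # One private mailbox per actor (names are hypothetical).
    inboxes = {name: asyncio.Queue() for name in ("alice", "bob", "carol")}

    async def send(sender: str, text: str) -> None:
        # Message passing only: drop the message into every *other* mailbox.
        for name, inbox in inboxes.items():
            if name != sender:
                await inbox.put((sender, text))

    await send("alice", "hello")

    # Drain each mailbox to see who received what.
    received: dict[str, list[tuple[str, str]]] = {}
    for name, inbox in inboxes.items():
        received[name] = []
        while not inbox.empty():
            received[name].append(inbox.get_nowait())
    return received


received = asyncio.run(demo())
print(received)
```

The sender's own mailbox stays empty, which is the reason a room needs at least two agents.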
Shared bookkeeping (the transcript, a message counter, a last-activity timestamp,
a stop event, and a lock) lives in a single state dictionary.
A run, driven by _run_async, proceeds as follows. First, seed the conversation
with the user task through _broadcast with score=None (the seed is attributed
to the synthetic sender "User"). Next, spawn all agent loops and the idle
monitor as concurrent tasks. Then block on await state["stop"].wait(). When the
stop event fires, cancel every coroutine, gather them with
return_exceptions=True so cancellation does not raise, and return the formatted
transcript through history_output_formatter.
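The seed, spawn, wait, cancel, gather sequence can be sketched in isolation. The `worker` coroutine below is a hypothetical stand-in for an agent loop, not the module's _agent_loop; the point is only to show why gathering with return_exceptions=True lets cancellation complete without raising.

```python
import asyncio


async def worker(inbox: asyncio.Queue, stop: asyncio.Event) -> None:
    # Stand-in for an agent loop: consume one message, then signal stop.
    await inbox.get()
    stop.set()
    await asyncio.sleep(3600)  # keep running until cancelled


async def run_room() -> list:
    stop = asyncio.Event()
    inbox: asyncio.Queue = asyncio.Queue()
    await inbox.put(("User", "seed task"))                 # 1. seed
    tasks = [asyncio.create_task(worker(inbox, stop))      # 2. spawn
             for _ in range(3)]
    await stop.wait()                                      # 3. block until stop
    for task in tasks:                                     # 4. cancel everything
        task.cancel()
    # 5. cancelled tasks come back as CancelledError objects instead of raising
    return await asyncio.gather(*tasks, return_exceptions=True)


results = asyncio.run(run_room())
print([type(r).__name__ for r in results])
```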
The respond protocol
The central design problem is making a speaking decision machine-readable. A free-form “should I talk?” answer would require brittle natural-language parsing, and any parsing error would corrupt the control flow of the whole room. Instead, GroupChat forces every agent to emit a structured decision through a
function-calling schema, RESPOND_TOOL.
_ensure_respond_tool auto-injects this schema into any agent missing it
(controlled by the auto_equip flag) and rebuilds the agent’s LLM client so the
tool is bound at construction time.
The rebuild matters: setting tools_list_dictionary after the fact would
otherwise have no effect until the client is regenerated through llm_handling().
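For orientation, a function-calling schema of this kind might look like the sketch below. It is a hypothetical stand-in, not the module's RESPOND_TOOL: the OpenAI-style layout, the field names score and message, the 0 to 1 score range, and the description strings are all assumptions made for illustration.

```python
# Hypothetical stand-in for RESPOND_TOOL. Layout, field names, score range,
# and descriptions are assumptions, not the module's actual schema.
RESPOND_TOOL_SKETCH = {
    "type": "function",
    "function": {
        "name": "respond",
        "description": "Report how strongly you want to reply, and the reply itself.",
        "parameters": {
            "type": "object",
            "properties": {
                "score": {
                    "type": "number",
                    "minimum": 0,
                    "maximum": 1,
                    "description": "Desire to speak; values near 0 mean stay silent.",
                },
                "message": {
                    "type": "string",
                    "description": "The reply text; empty when staying silent.",
                },
            },
            "required": ["score", "message"],
        },
    },
}
```

Forcing the model to call a tool shaped like this turns the speaking decision into two typed fields, so the room never has to parse free-form text.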
The decision prompt, DECIDE_PROMPT, is deliberately biased toward silence. It
opens with “Silence is the default. Most messages do NOT warrant a reply,” and it
enumerates the few conditions that justify a high score (direct expertise, being
addressed by name, a factual error to correct, a concrete next step) against the
many that justify staying quiet (off-topic, redundant, mere agreement, piling on a
converging thread). This bias is not cosmetic. The branching-process analysis
below proves it is exactly what keeps the system stable.
Decision extraction
Provider outputs vary. Some return a single tool-call dictionary, some return a list of them, and occasionally a model emits malformed JSON in the arguments field. _extract_args normalizes all of these into a clean (score, message) pair,
defaulting to the silent decision on any malformed output
and clamping the score into its valid range.
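A normalizer in this spirit could be sketched as follows. It is not the module's _extract_args: the function name, the OpenAI-style nesting under "function" and "arguments", and the 0 to 1 score range are assumptions chosen for the example.

```python
import json
import math
from typing import Any

SILENT = (0.0, "")  # the "stay quiet" decision


def extract_args_sketch(output: Any) -> tuple[float, str]:
    """Normalize a tool-call output into (score, message); silent on any error."""
    try:
        # Accept either a single tool call or a list of them (take the first).
        call = output[0] if isinstance(output, list) else output
        args = call["function"]["arguments"]
        if isinstance(args, str):
            args = json.loads(args)  # may raise on malformed JSON
        score = float(args.get("score", 0.0))
        message = str(args.get("message", "")).strip()
    except (KeyError, IndexError, TypeError, ValueError, AttributeError):
        return SILENT
    if math.isnan(score):
        return SILENT
    # Clamp the score into the assumed [0, 1] range.
    return max(0.0, min(1.0, score)), message


print(extract_args_sketch({"function": {"arguments": '{"score": 1.7, "message": "hi"}'}}))
print(extract_args_sketch({"function": {"arguments": "{not json"}}))
```

Defaulting to silence on every failure path means a malformed provider response can make an agent quieter but never louder.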
Formal model
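Let the agent set be $A = \{a_1, \dots, a_n\}$ with $n \ge 2$. The minimum of two is enforced at construction, because each message is broadcast to “other” agents and a one-agent room would have no audience. The parameters are:
- threshold $\theta$ (threshold),
- hard message cap $M$ (max_loops),
- idle timeout $T$ seconds (idle_timeout),
- inbox poll period $\tau_q$ and monitor poll period $\tau_m$, both in seconds.
Each agent runs the loop defined in _agent_loop: it polls its inbox, snapshots the history under the lock, asks its model for a decision, and publishes only when the score exceeds $\theta$ and the message is non-empty.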
Correctness properties
Linearizability of the transcript
Theorem 1 (Serialized history). Every transcript append and every read of the full history occurs in a total order consistent with real time. No two appends interleave, and the message counter is never lost to a race.
Proof. Both the write path (_broadcast) and the read path (the history
snapshot in _agent_loop) acquire state["lock"].
Because asyncio runs all coroutines on a single event-loop thread, and the lock
is held across the read-modify-write of message_count together with the
conversation.add call, these critical sections are mutually exclusive and
execute atomically with respect to one another. The lock therefore induces a total
order on all transcript-mutating sections, which is the definition of linearizable
history. Crucially, the queue fan-out (inbox.put) happens after the lock is
released, so message delivery does not extend the critical section and waiting
agents proceed concurrently.
A subtle and intentional consequence: an agent’s history snapshot is a consistent
prefix of the transcript, but not necessarily the most recent one. Another agent
may publish between the moment a snapshot is taken and the moment the snapshotting
agent finishes its own decision. This is by design. It is exactly what allows two
agents to react to the same message genuinely concurrently, instead of forcing one
to wait for the other.
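The locking discipline can be sketched as below, under stated assumptions: the state keys (transcript, message_count, max_loops) and the function signature are illustrative, not the module's. The sketch shows the two properties the proof relies on: the counter and the transcript are updated together under the lock, and the queue fan-out happens only after the lock is released.

```python
import asyncio


async def broadcast(state: dict, inboxes: dict, sender: str, text: str) -> None:
    # Critical section: append and count together, so neither can be lost.
    async with state["lock"]:
        state["transcript"].append((sender, text))
        state["message_count"] += 1
        if state["message_count"] >= state["max_loops"]:
            state["stop"].set()
    # Fan-out happens after the lock is released.
    for name, inbox in inboxes.items():
        if name != sender:
            await inbox.put((sender, text))


async def demo() -> tuple[dict, dict]:
    state = {"lock": asyncio.Lock(), "stop": asyncio.Event(),
             "transcript": [], "message_count": 0, "max_loops": 5}
    inboxes = {name: asyncio.Queue() for name in "abc"}
    # Three agents publish concurrently.
    await asyncio.gather(*(broadcast(state, inboxes, n, f"msg from {n}") for n in "abc"))
    return state, inboxes


state, inboxes = asyncio.run(demo())
print(state["message_count"], [q.qsize() for q in inboxes.values()])
```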
Termination
Theorem 2 (Guaranteed termination). For any inputs, run halts in finite
time.
Proof. There are two independent stop guarantees, and either one suffices on its
own.
(i) Counter bound. Every published message increments message_count under the
lock, and the broadcast sets the stop event once the cap $M$ (max_loops) is
reached. Since the counter is monotonically non-decreasing and increments by
exactly one per publication, after at most $M$ publications the stop event is
set. Hence the total number of posted messages $N$ satisfies the hard bound
$$N \le M.$$
(ii) Idle bound. Suppose fewer than $M$ messages are ever posted. Then there is
a last publication at wall-clock time $t^*$, after which last_activity is never
updated again. The _idle_monitor wakes once per monitor poll period $\tau_m$
and fires when
$$t_{\text{now}} - \texttt{last\_activity} \ge T,$$
where $T$ is idle_timeout. This happens no later than $t^* + T + \tau_m$.
In either case the stop event is set, state["stop"].wait() returns, and the
coroutines are cancelled.
Corollary 2.1 (Stop latency). The conversation ends within $T + \tau_m$ seconds
of the final message, where $T$ is idle_timeout and $\tau_m$ is the monitor
poll period. Idle detection latency is thus bounded by one timeout plus one
monitor poll period.
Corollary 2.2 (No cancellation deadlock). Agent loops poll their inboxes with
a finite timeout and re-check the stop flag on every wakeup.
Because the inbox get is wrapped in wait_for with a timeout equal to the inbox
poll period $\tau_q$, no agent can be permanently parked on an empty queue.
Every idle agent observes the stop event within $\tau_q$ seconds, so the final
asyncio.gather over the cancelled tasks completes promptly.
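Both mechanisms, the polled inbox and the idle monitor, can be sketched together. The poll periods and timeout below are illustrative values chosen so the demo finishes quickly, not the module's constants, and the loop bodies are simplified stand-ins.

```python
import asyncio
import time


async def agent_loop(inbox: asyncio.Queue, stop: asyncio.Event, poll: float = 0.05) -> None:
    while not stop.is_set():
        try:
            await asyncio.wait_for(inbox.get(), timeout=poll)
        except asyncio.TimeoutError:
            continue  # nothing arrived; loop back and re-check the stop flag
        # (a real agent would decide whether to reply here)


async def idle_monitor(state: dict, idle_timeout: float = 0.2, poll: float = 0.05) -> None:
    while not state["stop"].is_set():
        await asyncio.sleep(poll)
        if time.monotonic() - state["last_activity"] >= idle_timeout:
            state["stop"].set()


async def demo() -> float:
    start = time.monotonic()
    state = {"stop": asyncio.Event(), "last_activity": start}
    inbox: asyncio.Queue = asyncio.Queue()
    tasks = [asyncio.create_task(agent_loop(inbox, state["stop"])),
             asyncio.create_task(idle_monitor(state))]
    await state["stop"].wait()
    elapsed = time.monotonic() - start
    await asyncio.gather(*tasks)  # both loops exit on their own once stop is set
    return elapsed


elapsed = asyncio.run(demo())
print(f"stopped after {elapsed:.2f}s of silence")
```

With no messages at all, the room stops after roughly one idle timeout plus at most one monitor poll, which is the bound in Corollary 2.1.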
Conversation dynamics as a branching process
The richest behavior of GroupChat is amplification. Each posted message is
delivered to $n-1$ agents, each of whom may reply, and each reply is itself
delivered to $n-1$ agents. This recursive fan-out is precisely a Galton-Watson
branching process, and analyzing it tells us when a chat fizzles versus when it
runs all the way to the cap.
The reply probability
Model each agent’s decision on a given inbound message as an independent Bernoulli trial. Let $p$ be the probability that a single agent publishes in response to a single message. Then the number of replies $X$ that one message provokes across the room is
$$X \sim \mathrm{Binomial}(n-1,\, p), \qquad R := \mathbb{E}[X] = (n-1)\,p.$$
We call $R$ the branching factor, the basic reproduction number of the conversation. It is the expected number of “child” messages each message produces.
The criticality threshold
Theorem 3 (Criticality). Treat the conversation, ignoring the cap $M$, as a Galton-Watson process seeded by the user task, with branching factor $R = (n-1)\,p$. Then:
- If $R < 1$ (subcritical), the conversation goes extinct with probability 1.
- If $R > 1$ (supercritical), it survives forever with positive probability.
- The critical per-agent reply probability is $p_c = 1/(n-1)$.
This threshold is the quantitative case for the silence bias in DECIDE_PROMPT. Consider
$n = 5$ agents: the critical reply probability is $p_c = 1/4$. If the prompt let
agents reply more than a quarter of the time on average, the room would be
supercritical. It would generate messages faster than they drain and would
often run into max_loops regardless of whether anything useful was being
said. By pushing the default decision toward silence, so that the realized $p$
sits well below $p_c$, the design keeps the system subcritical. Conversations
then terminate on substance, through the idle timeout, rather than on the
mechanical hard cap.
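The dichotomy can be checked with a small Monte Carlo sketch. It simulates the idealized model only (independent replies with probability p from each of the other n - 1 agents), not real agents, and the room size, cap, and trial count are illustrative assumptions.

```python
import random


def simulate_chat(n: int, p: float, cap: int, rng: random.Random) -> int:
    """Total messages (seed included) in one simulated chat, stopped at `cap`."""
    total = 1    # the seed message
    pending = 1  # messages whose replies have not been sampled yet
    while pending and total < cap:
        pending -= 1
        # Each of the n - 1 other agents replies independently with probability p.
        replies = sum(rng.random() < p for _ in range(n - 1))
        replies = min(replies, cap - total)
        total += replies
        pending += replies
    return total


def cap_hit_rate(n: int, p: float, cap: int = 200, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of simulated chats that end by hitting the cap."""
    rng = random.Random(seed)
    return sum(simulate_chat(n, p, cap, rng) >= cap for _ in range(trials)) / trials


# n = 5, so p_c = 1/4: one setting well below critical, one well above.
print("subcritical   p=0.05:", cap_hit_rate(5, 0.05))
print("supercritical p=0.50:", cap_hit_rate(5, 0.50))
```

Below the critical probability essentially no run reaches the cap, while above it most runs do.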
Expected conversation length
Theorem 4 (Expected message count, subcritical case). In the subcritical regime $R < 1$, ignoring the cap, the expected total number of messages $N$, including the seed, is
$$\mathbb{E}[N] = \frac{1}{1-R} = \frac{1}{1-(n-1)\,p}.$$
Proof. The total progeny of a Galton-Watson process started from a single individual satisfies $\mathbb{E}[N] = \sum_{k \ge 0} R^k = 1/(1-R)$ for $R < 1$, where the sum counts the root generation. The root here is the user task.
Reinstating the hard cap $M$, the realized expected length is
$$\mathbb{E}[\min(N, M)] \le \min\!\left(M,\ \frac{1}{1-R}\right),$$
so $M$ behaves as a safety valve that binds only when the room is near critical. For a concrete feel: when $p$ sits well below $p_c$, $R$ is small and the expected length is only a few messages, a brief exchange that ends by idle timeout. Just under critical, $R$ approaches 1 and the expectation $1/(1-R)$ balloons, the point where max_loops finally starts
doing real work.
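The closed form is easy to evaluate directly. The numbers below are illustrative assumptions (a five-agent room, two sample reply probabilities, and a cap of 12), picked to show one point deep in the subcritical regime and one just under critical.

```python
def expected_length(n: int, p: float) -> float:
    """Expected total messages 1 / (1 - R) with R = (n - 1) * p; requires R < 1."""
    R = (n - 1) * p
    if R >= 1:
        raise ValueError("formula only holds in the subcritical regime R < 1")
    return 1.0 / (1.0 - R)


n, cap = 5, 12  # illustrative room size and cap; p_c = 1/4 for n = 5
for p in (0.10, 0.24):
    R = (n - 1) * p
    length = expected_length(n, p)
    # The cap only matters when the uncapped expectation exceeds it.
    print(f"p={p:.2f}  R={R:.2f}  uncapped={length:.2f}  bound with cap={min(cap, length):.2f}")
```

At p = 0.10 the expectation is under two messages; at p = 0.24 it is about 25, more than twice the cap used here.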
Effect of the threshold and room size
The threshold $\theta$ controls $p$ monotonically. Let $q$ be the probability that an agent emits a non-empty message at all, and let $F$ be the conditional CDF of its score. Then
$$p(\theta) = q\,\bigl(1 - F(\theta)\bigr),$$
which is non-increasing in $\theta$. Raising threshold lowers $p$ and pushes the
system deeper into the subcritical (quieter, shorter) regime; lowering it does the
opposite. The criticality identity $(n-1)\,p(\theta) = 1$ makes the tradeoff
explicit, and it exposes a scaling law: adding agents raises the branching
factor linearly. To hold $R$ fixed as the room grows, the threshold must rise
so that $p(\theta) \propto 1/(n-1)$. A threshold tuned for three agents will be too
permissive for eight.
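The scaling law can be made concrete under a strong simplifying assumption: scores uniform on [0, 1], so that F(θ) = θ. Real score distributions will differ, and the values q = 0.9 and target branching factor 0.5 are hypothetical. The sketch only shows the direction and rough size of the effect.

```python
def reply_probability(theta: float, q: float = 0.9) -> float:
    """p(theta) = q * (1 - F(theta)), assuming uniform scores so F(theta) = theta."""
    return q * (1.0 - theta)


def threshold_for(n: int, target_R: float = 0.5, q: float = 0.9) -> float:
    """Threshold at which (n - 1) * p(theta) equals target_R, clamped to [0, 1]."""
    theta = 1.0 - target_R / ((n - 1) * q)
    return min(1.0, max(0.0, theta))


for n in (3, 5, 8):
    theta = threshold_for(n)
    R = (n - 1) * reply_probability(theta)
    print(f"n={n}  threshold={theta:.3f}  branching factor={R:.3f}")
```

Under these assumptions the threshold that holds the branching factor at 0.5 climbs from about 0.72 for three agents to about 0.92 for eight.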
Concurrency and latency
Each speaking decision is a blocking LLM call dispatched off the event loop.
Using asyncio.to_thread means up to $n$ decisions (one per agent) evaluate
concurrently rather than serially, so one slow model call cannot stall the room.
If a single decision takes wall-clock time $t_d$ and a message provokes $k$
responders, those responders’ decisions overlap, so one “generation” of the
conversation costs roughly $t_d$ rather than $k\,t_d$, provided the thread pool
has at least $k$ free workers. Python’s default executor caps at
min(32, os.cpu_count() + 4) threads,
so for very large rooms the effective concurrency is bounded by that pool and
generations begin to serialize.
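The overlap is easy to observe with a stand-in for the model call. In this sketch `slow_decision` is hypothetical and simply sleeps; it assumes Python 3.9 or later, where asyncio.to_thread is available.

```python
import asyncio
import time


def slow_decision(name: str) -> str:
    time.sleep(0.2)  # stand-in for a blocking LLM call
    return name


async def one_generation(names: list[str]) -> tuple[list[str], float]:
    start = time.perf_counter()
    # Each decision runs in a worker thread, so the sleeps overlap.
    results = await asyncio.gather(
        *(asyncio.to_thread(slow_decision, name) for name in names)
    )
    return list(results), time.perf_counter() - start


results, elapsed = asyncio.run(one_generation(["a", "b", "c", "d"]))
print(results, f"{elapsed:.2f}s")  # close to one call's duration, not four
```

Four 0.2-second decisions finish in roughly the time of one, as long as the default thread pool has enough free workers.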
End to end, the subcritical regime has expected branching depth on the order of a
small constant (since $R < 1$), so wall-clock time scales with conversation
depth times $t_d$, not total message count times $t_d$, where $t_d$ is the
duration of one decision call. The practical takeaway is
that latency is governed by how many rounds of back-and-forth occur, not by the
raw number of messages, because messages within a round overlap.
Practical implications
The theory yields concrete tuning guidance.
- Keep the room subcritical. Choose the threshold $\theta$ and room size $n$ so that the realized branching factor $R = (n-1)\,p(\theta)$ stays below one. This is the difference between a chat that converges and one that stops only because it exhausted max_loops.
- Scale the threshold with room size. Because $R$ grows linearly in $n$, a threshold that works for three agents will be too permissive for eight. Raise threshold as you add participants.
- Treat max_loops as a circuit breaker, not a length dial. In the intended subcritical regime it rarely binds. If your chats consistently hit it, your room is supercritical and the fix is a higher threshold, not a larger cap.
- Use idle_timeout to set tail latency. By Corollary 2.1, the chat lingers up to $T + \tau_m$ s after the final message, where $T$ is idle_timeout and $\tau_m$ is the monitor poll period. Lower $T$ for snappier termination, at the cost of possibly cutting off agents that are still composing a reply.
- Disable persistent memory on participants. Each agent runs many short decision calls; per-agent persistent_memory=False and max_loops=1 keep those calls cheap and stateless, leaving the transcript as the single source of shared context.
Complete worked example
The following program builds a four-agent room, runs a discussion, and inspects the result. It is fully runnable once an LLM API key is set in the environment.
What to expect when you run it
The seed task is posted as User and fanned out to all four inboxes. Each agent
independently asks its model, through the forced respond call, how strongly it
wants to speak. Agents whose score exceeds 0.6 and who produce a non-empty
message broadcast their reply, which then wakes every other inbox. Because the
room is subcritical (Theorem 3 with $n = 4$ and a selective threshold), the
discussion will typically run for a handful of substantive exchanges and then go
quiet. After idle_timeout seconds of silence, the idle monitor sets the stop
event (Corollary 2.1), all coroutines are cancelled, and run returns the
formatted transcript. If the room were tuned to be supercritical instead, the same
program would terminate by hitting the max_loops=12 cap rather than by going
idle.
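The flow described above can be reproduced without any LLM by stubbing the decision. The sketch below is a toy model, not the swarms API: the agent names, the `decide` stub, the state keys, and the shortened timings are all assumptions, while the threshold of 0.6 and the cap of 12 follow the values mentioned in the text.

```python
import asyncio
import time

THRESHOLD, MAX_LOOPS, IDLE_TIMEOUT, POLL = 0.6, 12, 0.3, 0.05


def decide(name: str, sender: str, text: str) -> tuple[float, str]:
    # Stub for the blocking LLM call: only "critic" speaks, and only to the user.
    if name == "critic" and sender == "User":
        return 0.9, f"one concern about '{text}'"
    return 0.1, ""


async def broadcast(state: dict, inboxes: dict, sender: str, text: str) -> None:
    async with state["lock"]:
        if state["stop"].is_set():
            return
        state["transcript"].append((sender, text))
        state["count"] += 1
        state["last_activity"] = time.monotonic()
        if state["count"] >= MAX_LOOPS:
            state["stop"].set()
    for name, inbox in inboxes.items():  # fan-out after the lock is released
        if name != sender:
            inbox.put_nowait((sender, text))


async def agent_loop(name: str, state: dict, inboxes: dict) -> None:
    while not state["stop"].is_set():
        try:
            sender, text = await asyncio.wait_for(inboxes[name].get(), timeout=POLL)
        except asyncio.TimeoutError:
            continue
        score, reply = await asyncio.to_thread(decide, name, sender, text)
        if score > THRESHOLD and reply:
            await broadcast(state, inboxes, name, reply)


async def idle_monitor(state: dict) -> None:
    while not state["stop"].is_set():
        await asyncio.sleep(POLL)
        if time.monotonic() - state["last_activity"] >= IDLE_TIMEOUT:
            state["stop"].set()


async def run(task: str, names: list[str]) -> list[tuple[str, str]]:
    state = {"lock": asyncio.Lock(), "stop": asyncio.Event(), "transcript": [],
             "count": 0, "last_activity": time.monotonic()}
    inboxes = {name: asyncio.Queue() for name in names}
    await broadcast(state, inboxes, "User", task)
    tasks = [asyncio.create_task(agent_loop(n, state, inboxes)) for n in names]
    tasks.append(asyncio.create_task(idle_monitor(state)))
    await state["stop"].wait()
    for t in tasks:
        t.cancel()
    await asyncio.gather(*tasks, return_exceptions=True)
    return state["transcript"]


transcript = asyncio.run(run("ship v2 on Friday?", ["planner", "critic", "engineer", "analyst"]))
for sender, text in transcript:
    print(f"{sender}: {text}")
```

Here exactly one agent clears the threshold, the others stay silent, and the run ends through the idle monitor rather than the cap.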
Summary
GroupChat is a single-event-loop actor system whose shared transcript is
linearizable (Theorem 1) and whose execution always terminates, bounded jointly by
a message cap and an idle timeout (Theorem 2, with bounded stop latency in
Corollary 2.1 and no cancellation deadlock in Corollary 2.2). Its emergent
dynamics are exactly those of a Galton-Watson branching process with branching
factor $R = (n-1)\,p$, which yields a sharp criticality threshold $p_c = 1/(n-1)$
(Theorem 3) and a closed-form expected length $1/(1-R)$ in the
stable regime (Theorem 4). The silence-biased decision prompt is best understood
not as a stylistic choice but as a control mechanism that holds the system below
criticality, which is what makes the chat end on substance rather than on a hard
limit.