

Web scraper agents combine an LLM with the scrape_and_format_sync tool from swarms-tools. The agent decides what to scrape, the tool handles HTML parsing and formatting, and the LLM produces structured output — JSON, markdown, free text, or whatever the system prompt requests.
| Capability | Description |
| --- | --- |
| Auto-navigation | Pull relevant content from web pages without writing selectors. |
| Structured parsing | Convert HTML into clean text, markdown, or JSON. |
| Dynamic content | Handles JS-rendered pages and dynamic elements. |
| Summarisation & analysis | The LLM produces summaries, comparisons, and analyses on top of the scraped content. |
| Batched scaling | Run many scrape jobs in parallel for comprehensive research. |

Step 1: Install

pip3 install -U swarms swarms-tools

Step 2: Set up environment

export OPENAI_API_KEY="..."
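Before running an agent, it can help to fail fast if the key is missing rather than hitting an authentication error mid-run. A minimal sketch — the `require_env` helper below is illustrative, not part of swarms:

```python
import os

def require_env(name: str) -> str:
    # Return the value of an environment variable, or raise a clear
    # error telling the user how to set it (hypothetical helper).
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"Set {name} before running the agent, e.g. export {name}=..."
        )
    return value
```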

Step 3: Build a single-site scraper agent

from swarms import Agent
from swarms_tools import scrape_and_format_sync

agent = Agent(
    agent_name="Web Scraper Agent",
    model_name="gpt-5.4",
    tools=[scrape_and_format_sync],
    dynamic_context_window=True,
    dynamic_temperature_enabled=True,
    max_loops=1,
    system_prompt=(
        "You are a web scraper agent. You are given a URL and you need to scrape "
        "the website and return the data in a structured format. The format type should be full"
    ),
)

out = agent.run(
    "Scrape the swarms.ai website and provide a full report of what the company does. "
    "The format type should be full."
)
print(out)
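Since the agent returns a single string, you may want to persist it for later review or diffing rather than only printing it. A small sketch — `save_report` is a hypothetical helper, not a swarms API:

```python
from pathlib import Path

def save_report(text: str, path: str = "scrape_report.md") -> Path:
    # Write the agent's structured output to disk (illustrative helper).
    p = Path(path)
    p.write_text(text, encoding="utf-8")
    return p
```

For example, `save_report(out)` after `agent.run(...)` keeps each scrape result as a markdown file.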

Step 4: Scale to multiple sites in parallel

batched_grid_agent_execution runs N agents on N tasks concurrently. Use it when you need to scrape several sites at once — for example, a competitive landscape report.
from swarms import Agent
from swarms_tools import scrape_and_format_sync
from swarms.structs.multi_agent_exec import batched_grid_agent_execution

agent = Agent(
    agent_name="Web Scraper Agent",
    model_name="gpt-5.4",
    tools=[scrape_and_format_sync],
    dynamic_context_window=True,
    dynamic_temperature_enabled=True,
    max_loops=1,
    system_prompt=(
        "You are a web scraper agent. You are given a URL and you need to scrape "
        "the website and return the data in a structured format. The format type should be full"
    ),
)

out = batched_grid_agent_execution(
    agents=[agent, agent],
    tasks=[
        "Scrape swarms.ai website and provide a full report of the company's mission, "
        "products, and team. The format type should be full.",
        "Scrape langchain.com website and provide a full report of the company's mission, "
        "products, and team. The format type should be full.",
    ],
)

print(out)
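Conceptually, `batched_grid_agent_execution` pairs agent *i* with task *i* and runs all pairs concurrently, returning results in input order. A simplified stdlib sketch of that pattern — `run_batched` is illustrative, not the library function:

```python
from concurrent.futures import ThreadPoolExecutor

def run_batched(agents, tasks):
    # Pair each agent with its task, run all pairs concurrently,
    # and return results in the same order as the inputs.
    assert len(agents) == len(tasks), "one agent per task"
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(a.run, t) for a, t in zip(agents, tasks)]
        return [f.result() for f in futures]
```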
You can pass distinct agents per task as well — useful when each site needs a different system prompt or model.

See also