Documentation Index
Fetch the complete documentation index at: https://docs.swarms.world/llms.txt
Use this file to discover all available pages before exploring further.
Web scraper agents combine an LLM with the scrape_and_format_sync tool from swarms-tools. The agent decides what to scrape, the tool handles HTML parsing and formatting, and the LLM produces structured output — JSON, markdown, free text, or whatever the system prompt requests.
| Capability | Description |
|---|---|
| Auto-navigation | Pull relevant content from web pages without writing selectors. |
| Structured parsing | Convert HTML into clean text/markdown/JSON. |
| Dynamic content | Handles JS-rendered pages and dynamic elements. |
| Summarisation & analysis | LLM produces summaries, comparisons, and analyses on top of the scraped content. |
| Batched scaling | Run many scrape jobs in parallel for comprehensive research. |
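To build intuition for the structured-parsing capability, here is a minimal self-contained sketch of what "convert HTML into clean text" involves. This is an illustration only, not the implementation of scrape_and_format_sync — the real tool handles fetching, parsing, and formatting itself:

```python
# Minimal sketch of the structured-parsing step: stripping HTML down to
# visible text. The real swarms-tools tool does this (and more) internally.
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.chunks.append(data.strip())


def html_to_text(html: str) -> str:
    """Return the visible text of an HTML fragment, one chunk per line."""
    parser = TextExtractor()
    parser.feed(html)
    return "\n".join(parser.chunks)


sample = "<html><body><h1>Swarms</h1><script>var x=1;</script><p>Multi-agent framework.</p></body></html>"
print(html_to_text(sample))  # Swarms / Multi-agent framework.
```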
Step 1: Install
```shell
pip3 install -U swarms swarms-tools
```
Step 2: Set up environment
```shell
export OPENAI_API_KEY="..."
```
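A missing key usually surfaces as a confusing error deep inside the LLM call. A small optional helper (not part of swarms; just a convenience sketch) fails fast with a clear message instead:

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"{name} is not set; export it before running the agent.")
    return value
```

Call `require_env("OPENAI_API_KEY")` once at startup, before constructing any agents.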
Step 3: Build a single-site scraper agent
```python
from swarms import Agent
from swarms_tools import scrape_and_format_sync

agent = Agent(
    agent_name="Web Scraper Agent",
    model_name="gpt-5.4",
    tools=[scrape_and_format_sync],
    dynamic_context_window=True,
    dynamic_temperature_enabled=True,
    max_loops=1,
    system_prompt=(
        "You are a web scraper agent. You are given a URL and you need to scrape "
        "the website and return the data in a structured format. The format type should be full"
    ),
)

out = agent.run(
    "Scrape swarms.ai website and provide a full report of what the company does. "
    "The format type should be full."
)
print(out)
```
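The agent's return value is typically a plain string. If your system prompt asks for JSON, it pays to parse defensively, since models sometimes wrap JSON in markdown fences or add surrounding prose. A hedged helper sketch (the exact output shape depends on your prompt and model):

```python
# Defensive JSON extraction from an agent response. Tries the whole string
# first, then falls back to the first {...} span it can find.
import json
import re


def parse_agent_json(raw: str):
    """Return the parsed JSON object from an agent response, or None."""
    try:
        return json.loads(raw)
    except (json.JSONDecodeError, TypeError):
        pass
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None
```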
Step 4: Scale to multiple sites in parallel
batched_grid_agent_execution runs N agents on N tasks concurrently. Use it when you need to scrape several sites at once — for example, a competitive landscape report.
```python
from swarms import Agent
from swarms_tools import scrape_and_format_sync
from swarms.structs.multi_agent_exec import batched_grid_agent_execution

agent = Agent(
    agent_name="Web Scraper Agent",
    model_name="gpt-5.4",
    tools=[scrape_and_format_sync],
    dynamic_context_window=True,
    dynamic_temperature_enabled=True,
    max_loops=1,
    system_prompt=(
        "You are a web scraper agent. You are given a URL and you need to scrape "
        "the website and return the data in a structured format. The format type should be full"
    ),
)

out = batched_grid_agent_execution(
    agents=[agent, agent],
    tasks=[
        "Scrape swarms.ai website and provide a full report of the company's mission, "
        "products, and team. The format type should be full.",
        "Scrape langchain.com website and provide a full report of the company's mission, "
        "products, and team. The format type should be full.",
    ],
)
print(out)
```
You can pass distinct agents per task as well — useful when each site needs a different system prompt or model.
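To make the pairing concrete, here is a self-contained stand-in for that pattern: agents[i] runs tasks[i], all concurrently, with results returned in order. The stub class below is hypothetical and only mimics the `.run(task)` interface of a real swarms Agent — in practice you would pass real Agent instances to batched_grid_agent_execution rather than roll your own executor:

```python
# Sketch of the agents[i]-runs-tasks[i] pairing, using a thread pool.
# StubAgent is a hypothetical stand-in; a real Agent calls an LLM in .run().
from concurrent.futures import ThreadPoolExecutor


class StubAgent:
    def __init__(self, name: str):
        self.name = name

    def run(self, task: str) -> str:
        return f"{self.name} handled: {task}"


def run_batched(agents, tasks):
    """Run agents[i] on tasks[i] concurrently; return results in task order."""
    assert len(agents) == len(tasks), "need exactly one task per agent"
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        futures = [pool.submit(a.run, t) for a, t in zip(agents, tasks)]
        return [f.result() for f in futures]


agents = [StubAgent("swarms-scraper"), StubAgent("langchain-scraper")]
tasks = ["Scrape swarms.ai", "Scrape langchain.com"]
print(run_batched(agents, tasks))
```

Giving each stub a different name mirrors giving each real agent its own system prompt or model for its assigned site.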
See also