Agent Output Types Examples with Vision Capabilities

This example demonstrates how to use different output types when working with Swarms agents, including vision-enabled agents that can analyze images. Each output type formats the agent's response in a specific way, making it easier to integrate with different parts of your application.

Prerequisites

  • Python 3.7+
  • OpenAI API key
  • Anthropic API key (optional, for Claude models)
  • Swarms library

Installation

pip3 install -U swarms

Environment Variables

WORKSPACE_DIR="agent_workspace"
OPENAI_API_KEY=""  # Required for OpenAI vision models (e.g., the gpt-4.1-mini model used below)
ANTHROPIC_API_KEY=""  # Optional, for Claude models
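
Before creating an agent, it can help to confirm that the variables above are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_env_vars` and the `REQUIRED_VARS` list are illustrative assumptions and are not part of Swarms.

```python
import os

# Variable names taken from the Environment Variables section above.
# Treating only OPENAI_API_KEY as required is an assumption for this sketch.
REQUIRED_VARS = ["OPENAI_API_KEY"]


def missing_env_vars(names, env=None):
    """Return the names that are unset or blank in the environment mapping."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]
```

Calling `missing_env_vars(REQUIRED_VARS)` at startup returns an empty list when everything is configured, so the result can be used directly in an `if` check that stops the script early with a readable message.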

Examples

Vision-Enabled Quality Control Agent

from swarms.structs import Agent
from swarms.prompts.logistics import (
    Quality_Control_Agent_Prompt,
)

# Path to the image to analyze (replace with your own file)
factory_image = "image.jpg"


# Quality control agent
quality_control_agent = Agent(
    agent_name="Quality Control Agent",
    agent_description="A quality control agent that analyzes images and provides a detailed report on the quality of the product in the image.",
    model_name="gpt-4.1-mini",
    system_prompt=Quality_Control_Agent_Prompt,
    multi_modal=True,
    max_loops=2,
    output_type="str-all-except-first",
)


response = quality_control_agent.run(
    task="what is in the image?",
    img=factory_image,
)

print(response)
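
The example above sets output_type="str-all-except-first". Going by the name, this appears to return the conversation history as a single string with the first message omitted. The sketch below illustrates that idea in plain Python under this assumption; the function, the message layout, and the placeholder contents are hypothetical and do not reproduce the Swarms implementation or its exact formatting.

```python
def str_all_except_first(messages):
    """Join every message after the first into one string (illustration only)."""
    return "\n\n".join(
        f"{message['role']}: {message['content']}" for message in messages[1:]
    )


# Hypothetical conversation history with placeholder contents.
history = [
    {"role": "System", "content": "<system prompt>"},
    {"role": "User", "content": "what is in the image?"},
    {"role": "Quality Control Agent", "content": "<agent reply, loop 1>"},
    {"role": "Quality Control Agent", "content": "<agent reply, loop 2>"},
]

formatted = str_all_except_first(history)
print(formatted)
```

With max_loops=2, a history like this would contain more than one agent reply, which is one reason to pick an output type that keeps everything after the opening message instead of only the final reply.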

Supported Image Formats

Vision-enabled agents support the following image formats:

Format     Description
JPEG/JPG   Standard image format with lossy compression
PNG        Lossless format supporting transparency
GIF        Supports animation (only the first frame is used)
WebP       Modern format with both lossy and lossless compression
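
The formats listed above can be checked before an image is passed to the agent. This is a minimal sketch that inspects only the file extension, not the file contents; `SUPPORTED_EXTENSIONS` and `has_supported_extension` are illustrative names, not Swarms APIs.

```python
from pathlib import Path

# Extensions corresponding to the formats listed above.
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def has_supported_extension(path):
    """Return True if the file name ends in a supported image extension."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS
```

Because the check is case-insensitive, `photo.WEBP` passes while `scan.tiff` does not. A renamed file with the wrong contents would still pass, so treat this as a quick filter, not as validation.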

Best Practices for Vision Tasks

Best Practice      Description
Image Quality      Ensure images are clear and well lit for optimal analysis
Image Size         Keep images under 20 MB and in a supported format
Task Specificity   Provide clear, specific instructions for image analysis
Model Selection    Use a vision-capable model (e.g., GPT-4V) for image tasks
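
The size guideline can be enforced with a small check before calling the agent. The sketch below assumes that 20 MB means 20 * 1024 * 1024 bytes; the constant and function names are hypothetical and not part of Swarms.

```python
import os

# Assumption: "20 MB" is interpreted as 20 * 1024 * 1024 bytes.
MAX_IMAGE_BYTES = 20 * 1024 * 1024


def is_within_size_limit(path, max_bytes=MAX_IMAGE_BYTES):
    """Return True if the file at `path` is strictly smaller than max_bytes."""
    return os.path.getsize(path) < max_bytes
```

`os.path.getsize` raises `OSError` if the file does not exist, so a missing image surfaces as an error here, before any API call is made.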