Vision Processing Examples

These examples demonstrate how to use vision-enabled agents in Swarms to analyze images and process visual information. You'll learn how to work with both OpenAI and Anthropic vision models for quality control, product analysis, and batch processing.

Prerequisites

  • Python 3.7+

  • OpenAI API key (for GPT-4V)

  • Anthropic API key (for Claude 3)

  • Swarms library

Installation

pip3 install -U swarms

Environment Variables

WORKSPACE_DIR="agent_workspace"
OPENAI_API_KEY=""  # Required for GPT-4V
ANTHROPIC_API_KEY=""  # Required for Claude 3
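A missing key usually surfaces only when the first API call fails, so it can help to check the variables up front. The following is a minimal sketch using only the standard library; `missing_env_vars` and `REQUIRED_VARS` are hypothetical names for illustration and are not part of Swarms.

```python
import os

# Names taken from the Environment Variables section above.
REQUIRED_VARS = ("OPENAI_API_KEY", "ANTHROPIC_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
    # WORKSPACE_DIR falls back to the value shown above when unset.
    print("Workspace:", os.environ.get("WORKSPACE_DIR", "agent_workspace"))
```

If only one provider is used, pass just that key name, for example `missing_env_vars(["OPENAI_API_KEY"])`.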

Working with Images

Supported Image Formats

Vision-enabled agents support various image formats:

| Format | Description |
|--------|-------------|
| JPEG/JPG | Standard image format with lossy compression |
| PNG | Lossless format supporting transparency |
| GIF | Animated format (only first frame used) |
| WebP | Modern format with both lossy and lossless compression |

Image Guidelines

  • Maximum file size: 20MB
  • Recommended resolution: At least 512x512 pixels
  • Image should be clear and well-lit
  • Avoid heavily compressed or blurry images
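The size and format guidelines above can be checked before an image is sent to a model. This is a rough sketch for local files only, using the standard library; `check_image` is a hypothetical helper, and treating 20MB as 20 × 1024 × 1024 bytes is an assumption. Checking resolution would need an imaging library and is left out here.

```python
import os

MAX_BYTES = 20 * 1024 * 1024  # 20MB limit from the guidelines (binary MB assumed)
SUPPORTED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".gif", ".webp"}


def check_image(path):
    """Return a list of problems found; an empty list means the file passed."""
    if not os.path.isfile(path):
        return [f"not a file: {path}"]
    problems = []
    ext = os.path.splitext(path)[1].lower()
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    size = os.path.getsize(path)
    if size > MAX_BYTES:
        problems.append(f"file is {size} bytes, over the {MAX_BYTES}-byte limit")
    return problems
```

Returning a list of problems rather than raising lets a caller report every issue for a file at once, or skip the file and continue.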

Examples

1. Quality Control with GPT-4V

from swarms.structs import Agent
from swarms.prompts.logistics import Quality_Control_Agent_Prompt

# Load your image
factory_image = "path/to/your/image.jpg"  # Local file path
# Or use a URL
# factory_image = "https://example.com/image.jpg"

# Initialize the quality control agent with an OpenAI vision-capable model
quality_control_agent = Agent(
    agent_name="Quality Control Agent",
    agent_description="A quality control agent that analyzes images and provides detailed quality reports.",
    model_name="gpt-4.1-mini",
    system_prompt=Quality_Control_Agent_Prompt,
    multi_modal=True,
    max_loops=1
)

# Run the analysis
response = quality_control_agent.run(
    task="Analyze this image and provide a detailed quality control report",
    img=factory_image
)

print(response)

2. Visual Analysis with Claude 3

from swarms.structs import Agent
from swarms.prompts.logistics import Visual_Analysis_Prompt

# Load your image
product_image = "path/to/your/product.jpg"

# Initialize visual analysis agent with Claude 3
visual_analyst = Agent(
    agent_name="Visual Analyst",
    agent_description="An agent that performs detailed visual analysis of products and scenes.",
    model_name="anthropic/claude-3-opus-20240229",
    system_prompt=Visual_Analysis_Prompt,
    multi_modal=True,
    max_loops=1
)

# Run the analysis
response = visual_analyst.run(
    task="Provide a comprehensive analysis of this product image",
    img=product_image
)

print(response)

3. Image Batch Processing

from swarms.structs import Agent
import os

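def process_image_batch(image_folder, agent, task="Analyze this image"):
    """Run the agent on every supported image in a folder.

    Returns a list of (file_name, response) tuples in file-name order.
    """
    # Extensions match the Supported Image Formats table
    supported = ('.png', '.jpg', '.jpeg', '.gif', '.webp')
    results = []
    # sorted() makes the processing order deterministic
    for image_file in sorted(os.listdir(image_folder)):
        if image_file.lower().endswith(supported):
            image_path = os.path.join(image_folder, image_file)
            response = agent.run(task=task, img=image_path)
            results.append((image_file, response))
    return results

# Example usage (reuses the visual_analyst agent from Example 2)
image_folder = "path/to/image/folder"
batch_results = process_image_batch(image_folder, visual_analyst)

for image_file, response in batch_results:
    print(f"{image_file}: {response}")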
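A batch loop like the one above stops at the first failed call. The sketch below shows one way to validate the path and retry a failed call with exponential backoff. `run_with_retry` is a hypothetical helper, not a Swarms API; it assumes only that the agent exposes `run(task=..., img=...)` as in the examples above. Which exceptions a provider raises is not specified here, so the sketch catches `Exception` broadly.

```python
import os
import time


def run_with_retry(agent, image_path, task="Analyze this image",
                   max_attempts=3, base_delay=1.0):
    """Call agent.run for one image, retrying failed calls with backoff."""
    if not os.path.isfile(image_path):
        raise FileNotFoundError(f"Image not found: {image_path}")
    for attempt in range(1, max_attempts + 1):
        try:
            return agent.run(task=task, img=image_path)
        except Exception as error:  # exception types vary by provider
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** (attempt - 1)
            print(f"Attempt {attempt} failed ({error}); retrying in {delay}s")
            time.sleep(delay)
```

Inside `process_image_batch`, the direct `agent.run(...)` call could be swapped for `run_with_retry(agent, image_path)`. The path check applies to local files only, so image URLs would need a different check.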

Best Practices

| Category | Best Practice | Description |
|----------|---------------|-------------|
| Image Preparation | Format Support | Ensure images are in supported formats (JPEG, PNG, GIF, WebP) |
| | Size & Quality | Optimize image size and quality for better processing |
| | Image Quality | Use clear, well-lit images for accurate analysis |
| Model Selection | GPT-4V Usage | Use for general vision tasks and detailed analysis |
| | Claude 3 Usage | Use for complex reasoning and longer outputs |
| | Batch Processing | Consider batch processing for multiple images |
| Error Handling | Path Validation | Always validate image paths before processing |
| | API Error Handling | Implement proper error handling for API calls |
| | Rate Monitoring | Monitor API rate limits and token usage |
| Performance Optimization | Result Caching | Cache results when processing the same images |
| | Batch Processing | Use batch processing for multiple images |
| | Parallel Processing | Implement parallel processing for large datasets |
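The caching and parallel processing practices can be combined in a few lines. The following is a sketch under stated assumptions, not a Swarms feature: `file_digest`, `analyze_cached`, and `analyze_parallel` are hypothetical helpers, the cache is a plain in-memory dictionary keyed by file contents and task, and sharing one agent instance across threads is assumed to be safe. Creating one agent per worker is the more cautious alternative if that assumption does not hold.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def file_digest(path):
    """Hash file contents so identical images share one cache entry."""
    with open(path, "rb") as handle:
        return hashlib.sha256(handle.read()).hexdigest()


def analyze_cached(agent, image_path, task, cache):
    """Return the cached response for this image and task, or compute it."""
    key = (file_digest(image_path), task)
    if key not in cache:
        cache[key] = agent.run(task=task, img=image_path)
    return cache[key]


def analyze_parallel(agent, image_paths, task="Analyze this image",
                     max_workers=4, cache=None):
    """Analyze images concurrently; results follow the order of image_paths."""
    cache = {} if cache is None else cache
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        responses = pool.map(
            lambda path: analyze_cached(agent, path, task, cache), image_paths
        )
        return list(zip(image_paths, responses))
```

Threads suit this workload because each call spends most of its time waiting on the network. Two threads that reach the same uncached image at the same moment may both call the API; the result is still correct, only duplicated. Keeping `max_workers` small also helps stay within provider rate limits.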