ChromaDB Documentation

The ChromaDB module stores and retrieves documents through the ChromaDB vector database. It adds documents to a local ChromaDB collection and queries that collection with a text query. It uses the ChromaDB client to create and manage the collection, and it exposes configuration options for the similarity metric, the number of results, and the token limit.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `metric` | `str` | `"cosine"` | The similarity metric to use for the collection. |
| `output_dir` | `str` | `"swarms"` | The name of the collection to store the results in. |
| `limit_tokens` | `Optional[int]` | `1000` | The maximum number of tokens to use for the query. |
| `n_results` | `int` | `1` | The number of results to retrieve. |
| `docs_folder` | `Optional[str]` | `None` | The folder containing documents to be added to the collection. |
| `verbose` | `bool` | `False` | Flag to enable verbose logging for debugging. |
| `*args` | `tuple` | `()` | Additional positional arguments. |
| `**kwargs` | `dict` | `{}` | Additional keyword arguments. |

Methods

| Method | Description |
|---|---|
| `__init__` | Initializes the ChromaDB instance with specified parameters. |
| `add` | Adds a document to the ChromaDB collection. |
| `query` | Queries documents from the ChromaDB collection based on the query text. |
| `traverse_directory` | Traverses the specified directory to add documents to the collection. |

Usage

from swarms_memory import ChromaDB

chromadb = ChromaDB(
    metric="cosine",
    output_dir="results",
    limit_tokens=1000,
    n_results=2,
    docs_folder="path/to/docs",
    verbose=True,
)

Adding Documents

The add method allows you to add a document to the ChromaDB collection. It generates a unique ID for each document and adds it to the collection.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `document` | `str` | - | The document to be added to the collection. |
| `*args` | `tuple` | `()` | Additional positional arguments. |
| `**kwargs` | `dict` | `{}` | Additional keyword arguments. |

Returns

| Type | Description |
|---|---|
| `str` | The ID of the added document. |

Example

result_id = chromadb.add(document="This is a sample document.")
print(f"Document ID: {result_id}")
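
The ID-generation step described above can be pictured roughly as follows. This is a minimal sketch, not the module's actual code: it assumes IDs are random UUIDs from `uuid.uuid4`, and `InMemoryCollection` and `add_document` are hypothetical names, with the former standing in for a ChromaDB collection so the example runs without any dependencies.

```python
import uuid


class InMemoryCollection:
    """Hypothetical stand-in for a ChromaDB collection (illustration only)."""

    def __init__(self):
        self.documents = {}

    def add(self, ids, documents):
        # Accept parallel lists of IDs and documents, one entry per document.
        for doc_id, doc in zip(ids, documents):
            self.documents[doc_id] = doc


def add_document(collection, document: str) -> str:
    """Store one document under a freshly generated ID and return that ID."""
    doc_id = str(uuid.uuid4())  # assumption: IDs are random UUIDs
    collection.add(ids=[doc_id], documents=[document])
    return doc_id


collection = InMemoryCollection()
first_id = add_document(collection, "This is a sample document.")
second_id = add_document(collection, "This is a sample document.")
```

Under this scheme, adding the same text twice produces two entries with different IDs, so deduplication would have to be handled by the caller.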

Querying Documents

The query method allows you to retrieve documents from the ChromaDB collection based on the provided query text.

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| `query_text` | `str` | - | The query string to search for. |
| `*args` | `tuple` | `()` | Additional positional arguments. |
| `**kwargs` | `dict` | `{}` | Additional keyword arguments. |

Returns

| Type | Description |
|---|---|
| `str` | The retrieved documents as a string. |

Example

query_text = "search term"
results = chromadb.query(query_text=query_text)
print(f"Retrieved Documents: {results}")
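
Because query returns a single string, the individual matches have to be combined at some point. The sketch below shows one way that could be done, assuming the raw result has the nested shape that Chroma's `collection.query` returns (one list of documents per query text). The helper name `join_documents` and the newline separator are assumptions for illustration, not the module's confirmed behavior.

```python
def join_documents(query_result: dict, separator: str = "\n") -> str:
    """Flatten the nested "documents" lists into a single string."""
    nested = query_result.get("documents") or []
    flat = [doc for per_query in nested for doc in per_query if doc]
    return separator.join(flat)


# Hand-written result in the shape of a one-query, two-result response.
sample_result = {
    "ids": [["id-1", "id-2"]],
    "documents": [["First matching document.", "Second matching document."]],
}

combined = join_documents(sample_result)
print(combined)
```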

Traversing Directory

The traverse_directory method walks every file in the directory given by docs_folder, including its subdirectories, and adds the contents of each file to the ChromaDB collection.

Example

chromadb.traverse_directory()
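
A recursive walk of this kind can be sketched with `os.walk`. The snippet is an assumption about the general approach, not the module's code: `collect_documents` is a hypothetical helper, files are read as UTF-8 text, and files that cannot be read or decoded are skipped.

```python
import os
import tempfile
from typing import Callable, List


def collect_documents(folder: str, add: Callable[[str], str]) -> List[str]:
    """Walk folder recursively, pass each file's text to add, return the IDs."""
    ids = []
    for root, _dirs, files in os.walk(folder):
        for name in sorted(files):
            path = os.path.join(root, name)
            try:
                with open(path, "r", encoding="utf-8") as handle:
                    text = handle.read()
            except (OSError, UnicodeDecodeError):
                continue  # skip binary or unreadable files
            ids.append(add(text))
    return ids


# Demo on a throwaway directory with one top-level and one nested file.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "nested"))
    samples = [("a.txt", "alpha"), (os.path.join("nested", "b.txt"), "beta")]
    for rel_path, text in samples:
        with open(os.path.join(tmp, rel_path), "w", encoding="utf-8") as handle:
            handle.write(text)

    seen = []
    # The lambda records each text and returns a made-up ID.
    ids = collect_documents(tmp, lambda text: seen.append(text) or f"doc-{len(seen)}")
```

With a ChromaDB instance, the callback could be `lambda text: chromadb.add(document=text)`.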

Additional Information and Tips

Verbose Logging

Enable the verbose flag during initialization to get detailed logs of the operations, which is useful for debugging.

chromadb = ChromaDB(verbose=True)

Handling Large Documents

When dealing with large documents, consider using the limit_tokens parameter to restrict the number of tokens processed in a single query.

chromadb = ChromaDB(limit_tokens=500)
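
To illustrate what a token cap does to a long text, here is a minimal sketch. It assumes tokens are whitespace-separated words, which is a simplification: the tokenizer the module actually uses is not stated in this documentation, and `truncate_to_tokens` is a hypothetical helper.

```python
def truncate_to_tokens(text: str, limit_tokens: int) -> str:
    """Keep at most limit_tokens whitespace-separated tokens of text."""
    if limit_tokens < 0:
        raise ValueError("limit_tokens must be non-negative")
    tokens = text.split()  # assumption: one token per whitespace-separated word
    return " ".join(tokens[:limit_tokens])


long_text = "one two three four five six"
short_text = truncate_to_tokens(long_text, limit_tokens=4)
print(short_text)
```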

Optimizing Query Performance

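Choose the similarity metric (metric parameter) that suits your embeddings and use case. Chroma's HNSW index accepts "cosine", "l2" (squared Euclidean distance), and "ip" (inner product); the example below assumes the metric value is passed through to Chroma unchanged.

chromadb = ChromaDB(metric="l2")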
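
To make the choice of metric concrete, the sketch below computes in plain Python the three distances that Chroma documents for its HNSW index: "cosine", "l2" (squared Euclidean), and "ip" (inner product). It is an illustration of the formulas, not Chroma's implementation, and the function names are made up for this example.

```python
import math
from typing import Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def squared_l2(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance: sum of squared coordinate differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def inner_product_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus the dot product; can be negative for long vectors."""
    return 1.0 - dot(a, b)


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """One minus cosine similarity; ignores vector length."""
    return 1.0 - dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a = [1.0, 0.0]
b = [2.0, 0.0]  # same direction as a, twice the length
c = [0.0, 1.0]  # orthogonal to a

for name, fn in [
    ("cosine", cosine_distance),
    ("l2", squared_l2),
    ("ip", inner_product_distance),
]:
    print(f"{name}: d(a, b)={fn(a, b)}, d(a, c)={fn(a, c)}")
```

Cosine distance treats a and b as identical because they point in the same direction, while the other two metrics are sensitive to vector length. That difference is usually what decides which metric fits a given embedding model.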

References and Resources

By following this documentation, users can effectively utilize the ChromaDB module for managing document storage and retrieval in their applications.