# HuggingfaceLLM Documentation
## Introduction
The `HuggingfaceLLM` class is designed for running inference with models from the Hugging Face Transformers library. This page documents the class's purpose, attributes, and methods, with usage examples.
## Purpose
The `HuggingfaceLLM` class serves the following purposes:
- Load pre-trained Hugging Face models and tokenizers.
- Generate text-based responses from the loaded model using a given prompt.
- Provide flexibility in device selection, quantization, and other configuration options.
## Class Definition
The `HuggingfaceLLM` class is defined as follows:
```python
class HuggingfaceLLM:
    def __init__(
        self,
        model_id: str,
        device: str = None,
        max_length: int = 20,
        quantize: bool = False,
        quantization_config: dict = None,
        verbose=False,
        distributed=False,
        decoding=False,
    ):
        # Attributes and initialization logic explained below
        pass

    def load_model(self):
        # Method to load the pre-trained model and tokenizer
        pass

    def run(self, prompt_text: str, max_length: int = None):
        # Method to generate text-based responses
        pass

    def __call__(self, prompt_text: str, max_length: int = None):
        # Alternate method for generating text-based responses
        pass
```
## Attributes
| Attribute | Description |
|---|---|
| `model_id` | The ID of the pre-trained model to be used. |
| `device` | The device on which the model runs (`'cuda'` for GPU or `'cpu'` for CPU). |
| `max_length` | The maximum length of the generated text. |
| `quantize` | A boolean indicating whether quantization should be used. |
| `quantization_config` | A dictionary of configuration options for quantization. |
| `verbose` | A boolean indicating whether verbose logs should be printed. |
| `logger` | An optional logger for logging messages (defaults to a basic logger). |
| `distributed` | A boolean indicating whether distributed processing should be used. |
| `decoding` | A boolean indicating whether to perform decoding during text generation. |
## Class Methods
### `__init__` Method

The `__init__` method initializes an instance of the `HuggingfaceLLM` class with the specified parameters and loads the pre-trained model and tokenizer. It accepts the following parameters:
- `model_id` (str): The ID of the pre-trained model to use.
- `device` (str, optional): The device to run the model on (`'cuda'` or `'cpu'`).
- `max_length` (int, optional): The maximum length of the generated text.
- `quantize` (bool, optional): Whether to use quantization.
- `quantization_config` (dict, optional): Configuration for quantization.
- `verbose` (bool, optional): Whether to print verbose logs.
- `logger` (logging.Logger, optional): The logger to use.
- `distributed` (bool, optional): Whether to use distributed processing.
- `decoding` (bool, optional): Whether to perform decoding during text generation.
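The initialization logic itself is internal to `swarm_models`, but the typical pattern for a constructor like this is to honor an explicit `device` argument and otherwise fall back to a GPU when one is available. The sketch below illustrates that assumed fallback; `resolve_device` is a hypothetical helper, not part of the library's API:

```python
import torch

# Hypothetical helper showing the assumed device-selection fallback:
# an explicit `device` argument wins; otherwise prefer CUDA when available.
def resolve_device(device: str = None) -> str:
    return device or ("cuda" if torch.cuda.is_available() else "cpu")

print(resolve_device())       # "cuda" on a GPU machine, otherwise "cpu"
print(resolve_device("cpu"))  # an explicit argument always wins
```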
### `load_model` Method

The `load_model` method loads the pre-trained model and tokenizer specified by `model_id`.
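The exact loading code is not shown here, but with the Transformers library this step conventionally uses the `AutoTokenizer` and `AutoModelForCausalLM` factory classes. The following is a minimal, standalone sketch of that convention, not the verified `load_model` body:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # any Hugging Face model ID
device = "cuda" if torch.cuda.is_available() else "cpu"

# Download (or read from the local cache) the tokenizer and model weights,
# then move the model to the chosen device.
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).to(device)
```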
### `run` and `__call__` Methods

Both the `run` and `__call__` methods generate a text-based response from a given prompt; `__call__` lets an instance be invoked directly like a function. They accept the following parameters:

- `prompt_text` (str): The text prompt to initiate text generation.
- `max_length` (int, optional): The maximum length of the generated text.
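Generation with a loaded Transformers model generally follows a tokenize-generate-decode cycle. The sketch below shows that cycle end to end, under the assumption that `run` works this way; it is illustrative, not the library's actual implementation:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

prompt_text = "Once upon a time"

# Encode the prompt, generate token IDs, then decode back to text.
inputs = tokenizer(prompt_text, return_tensors="pt").to(device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_length=50)
generated_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(generated_text)
```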
## Usage Examples
Here are three ways to use the `HuggingfaceLLM` class:
### Example 1: Basic Usage
```python
from swarm_models import HuggingfaceLLM

# Initialize the HuggingfaceLLM instance with a model ID
model_id = "NousResearch/Nous-Hermes-2-Vision-Alpha"
inference = HuggingfaceLLM(model_id=model_id)

# Generate text based on a prompt
prompt_text = "Once upon a time"
generated_text = inference(prompt_text)
print(generated_text)
```
### Example 2: Custom Configuration
```python
from swarm_models import HuggingfaceLLM

# Initialize with custom configuration
custom_config = {
    "quantize": True,
    "quantization_config": {"load_in_4bit": True},
    "verbose": True,
}
inference = HuggingfaceLLM(
    model_id="NousResearch/Nous-Hermes-2-Vision-Alpha", **custom_config
)

# Generate text based on a prompt
prompt_text = "Tell me a joke"
generated_text = inference(prompt_text)
print(generated_text)
```
### Example 3: Distributed Processing
```python
from swarm_models import HuggingfaceLLM

# Initialize for distributed processing
inference = HuggingfaceLLM(model_id="gpt2-medium", distributed=True)

# Generate text based on a prompt
prompt_text = "Translate the following sentence to French"
generated_text = inference(prompt_text)
print(generated_text)
```
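How `distributed=True` is realized internally is not specified in this documentation. One common mechanism in the Transformers ecosystem for spreading a model across the available devices is Accelerate's `device_map="auto"` option, shown below purely as an assumption about the kind of parallelism involved:

```python
# Sketch: multi-device placement via Accelerate's automatic device map.
# Requires the `accelerate` package. It is an assumption that
# HuggingfaceLLM's distributed mode works along these lines.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("gpt2-medium", device_map="auto")
print(model.hf_device_map)  # shows which submodule landed on which device
```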
## Additional Information
- The `HuggingfaceLLM` class provides the flexibility to load and use pre-trained models from the Hugging Face Transformers library.
- Quantization can be enabled to reduce model size and inference time (see the sketch after this list).
- Distributed processing can be used for parallelized inference.
- Verbose logging can help in debugging and understanding the text generation process.
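In recent versions of Transformers, 4-bit loading of the kind shown in Example 2 is typically expressed through a `BitsAndBytesConfig` object. Whether `HuggingfaceLLM` forwards its `quantization_config` dictionary this way is an assumption; the underlying mechanism looks like this:

```python
# Sketch: 4-bit quantized loading via bitsandbytes through Transformers.
# Requires the `bitsandbytes` package and a CUDA-capable GPU.
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

bnb_config = BitsAndBytesConfig(load_in_4bit=True)
model = AutoModelForCausalLM.from_pretrained(
    "gpt2", quantization_config=bnb_config
)
```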
## Conclusion
This documentation provides a comprehensive overview of the `HuggingfaceLLM` class: its attributes, methods, and usage examples. Developers can use the class to run text generation efficiently with pre-trained models from the Hugging Face Transformers library.