# Language Model Interface Documentation

## Table of Contents

- Introduction
- Abstract Language Model
    - Initialization
    - Attributes
    - Methods
- Implementation
- Usage Examples
- Additional Features
- Performance Metrics
- Logging and Checkpoints
- Resource Utilization Tracking
- Conclusion
## 1. Introduction

The Language Model Interface (`BaseLLM`) is a flexible and extensible framework for working with various language models. This documentation provides a comprehensive guide to the interface, its attributes, methods, and usage examples. Whether you're using a pre-trained language model or building your own, this interface can help streamline text generation, chat, summarization, and more.
## 2. Abstract Language Model

### Initialization

The `BaseLLM` class provides a common interface for language models. It can be initialized with various parameters to customize model behavior. The initialization parameters are:
| Parameter | Description | Default Value |
|---|---|---|
| `model_name` | The name of the language model to use. | None |
| `max_tokens` | The maximum number of tokens in the generated text. | None |
| `temperature` | The temperature parameter for controlling randomness in text generation. | None |
| `top_k` | The top-k parameter for filtering words in text generation. | None |
| `top_p` | The top-p parameter for filtering words in text generation. | None |
| `system_prompt` | A system-level prompt to set context for generation. | None |
| `beam_width` | The beam width for beam search. | None |
| `num_return_sequences` | The number of sequences to return in the output. | None |
| `seed` | The random seed for reproducibility. | None |
| `frequency_penalty` | The frequency penalty parameter for promoting word diversity. | None |
| `presence_penalty` | The presence penalty parameter for discouraging repetitions. | None |
| `stop_token` | A stop token to indicate the end of generated text. | None |
| `length_penalty` | The length penalty parameter for controlling the output length. | None |
| `role` | The role of the language model (e.g., assistant, user). | None |
| `max_length` | The maximum length of generated sequences. | None |
| `do_sample` | Whether to use sampling during text generation. | None |
| `early_stopping` | Whether to use early stopping during text generation. | None |
| `num_beams` | The number of beams to use in beam search. | None |
| `repition_penalty` | The repetition penalty parameter for discouraging repeated tokens. | None |
| `pad_token_id` | The token ID for padding. | None |
| `eos_token_id` | The token ID for the end of a sequence. | None |
| `bos_token_id` | The token ID for the beginning of a sequence. | None |
| `device` | The device to run the model on (e.g., 'cpu' or 'cuda'). | None |
### Attributes

- `model_name`: The name of the language model being used.
- `max_tokens`: The maximum number of tokens in generated text.
- `temperature`: The temperature parameter controlling randomness.
- `top_k`: The top-k parameter for word filtering.
- `top_p`: The top-p parameter for word filtering.
- `system_prompt`: A system-level prompt for context.
- `beam_width`: The beam width for beam search.
- `num_return_sequences`: The number of output sequences.
- `seed`: The random seed for reproducibility.
- `frequency_penalty`: The frequency penalty parameter.
- `presence_penalty`: The presence penalty parameter.
- `stop_token`: The stop token to indicate text end.
- `length_penalty`: The length penalty parameter.
- `role`: The role of the language model.
- `max_length`: The maximum length of generated sequences.
- `do_sample`: Whether to use sampling during generation.
- `early_stopping`: Whether to use early stopping.
- `num_beams`: The number of beams in beam search.
- `repition_penalty`: The repetition penalty parameter.
- `pad_token_id`: The token ID for padding.
- `eos_token_id`: The token ID for the end of a sequence.
- `bos_token_id`: The token ID for the beginning of a sequence.
- `device`: The device used for model execution.
- `history`: A list of conversation history.
### Methods

The `BaseLLM` class defines several methods for working with language models (a short sketch of the setters follows the list):

- `run(task: Optional[str] = None, *args, **kwargs) -> str`: Generate text using the language model. This method is abstract and must be implemented by subclasses.
- `arun(task: Optional[str] = None, *args, **kwargs)`: An asynchronous version of `run` for concurrent text generation.
- `batch_run(tasks: List[str], *args, **kwargs)`: Generate text for a batch of tasks.
- `abatch_run(tasks: List[str], *args, **kwargs)`: An asynchronous version of `batch_run` for concurrent batch generation.
- `chat(task: str, history: str = "") -> str`: Conduct a chat with the model, providing a conversation history.
- `__call__(task: str) -> str`: Call the model to generate text.
- `_tokens_per_second() -> float`: Calculate tokens generated per second.
- `_num_tokens(text: str) -> int`: Calculate the number of tokens in a text.
- `_time_for_generation(task: str) -> float`: Measure the time taken for text generation.
- `generate_summary(text: str) -> str`: Generate a summary of the provided text.
- `set_temperature(value: float)`: Set the temperature parameter.
- `set_max_tokens(value: int)`: Set the maximum number of tokens.
- `clear_history()`: Clear the conversation history.
- `enable_logging(log_file: str = "model.log")`: Initialize logging for the model.
- `log_event(message: str)`: Log an event.
- `save_checkpoint(checkpoint_dir: str = "checkpoints")`: Save the model state as a checkpoint.
- `load_checkpoint(checkpoint_path: str)`: Load the model state from a checkpoint.
- `toggle_creative_mode(enable: bool)`: Toggle creative mode for the model.
- `track_resource_utilization()`: Track and report resource utilization.
- `get_generation_time() -> float`: Get the time taken for text generation.
- `set_max_length(max_length: int)`: Set the maximum length of generated sequences.
- `set_model_name(model_name: str)`: Set the model name.
- `set_frequency_penalty(frequency_penalty: float)`: Set the frequency penalty parameter.
- `set_presence_penalty(presence_penalty: float)`: Set the presence penalty parameter.
- `set_stop_token(stop_token: str)`: Set the stop token.
- `set_length_penalty(length_penalty: float)`: Set the length penalty parameter.
- `set_role(role: str)`: Set the role of the model.
- `set_top_k(top_k: int)`: Set the top-k parameter.
- `set_top_p(top_p: float)`: Set the top-p parameter.
- `set_num_beams(num_beams: int)`: Set the number of beams.
- `set_do_sample(do_sample: bool)`: Set whether to use sampling.
- `set_early_stopping(early_stopping: bool)`: Set whether to use early stopping.
- `set_seed(seed: int)`: Set the random seed.
- `set_device(device: str)`: Set the device for model execution.
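The setters let you retune generation without rebuilding the model. Below is a minimal sketch, assuming the hypothetical `MyLLM` subclass introduced in the Section 3 sketch that follows:

```python
# Retune generation parameters between calls; MyLLM is the hypothetical
# concrete subclass of BaseLLM sketched in Section 3 below.
model = MyLLM(model_name="my_language_model")

model.set_temperature(0.9)   # higher temperature -> more varied output
model.set_top_p(0.95)        # nucleus-sampling threshold
model.set_max_tokens(100)    # cap the length of each generation
model.set_seed(42)           # fix the seed for reproducible sampling

print(model.run("Write a haiku about the sea."))

model.clear_history()        # start the next conversation fresh
```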
## 3. Implementation

The `BaseLLM` class serves as the base for implementing specific language models. Subclasses of `BaseLLM` should implement the `run` method to define how text is generated for a given task. This design allows flexibility in integrating different language models while maintaining a common interface.
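As a minimal sketch of such a subclass, the example below implements `run` with a toy echo backend; the class name `MyLLM` and its behavior are illustrative assumptions, not part of the library:

```python
from typing import Optional

from swarm_models import BaseLLM


class MyLLM(BaseLLM):
    """A hypothetical concrete subclass; a real one would call an actual model."""

    def run(self, task: Optional[str] = None, *args, **kwargs) -> str:
        # A real subclass would generate text here, e.g., via a local model
        # or a remote API. This toy version simply echoes the task.
        return f"[{self.model_name}] response to: {task}"


model = MyLLM(model_name="my_language_model", max_tokens=50)
print(model.run("Say hello."))
```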
## 4. Usage Examples

To demonstrate how to use the `BaseLLM` interface, let's create an example using a hypothetical language model. Because `BaseLLM` is abstract (its `run` method must be implemented by subclasses), we instantiate a concrete subclass, such as the `MyLLM` sketch from Section 3, configure its parameters, and generate text for a simple task.

```python
# MyLLM is the hypothetical concrete subclass sketched in Section 3;
# BaseLLM itself is abstract and cannot be instantiated directly.

# Create an instance of the language model
language_model = MyLLM(
    model_name="my_language_model",
    max_tokens=50,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    device="cuda",
)

# Generate text for a task
task = "Translate the following English text to French: 'Hello, world.'"
generated_text = language_model.run(task)

# Print the generated text
print(generated_text)
```

In this example, we've created an instance of a hypothetical language model, configured its parameters, and used the `run` method to generate text for a translation task.
## 5. Additional Features

The `BaseLLM` interface provides additional features for customization and control:

- `batch_run`: Generate text for a batch of tasks efficiently.
- `arun` and `abatch_run`: Asynchronous versions of `run` and `batch_run` for concurrent text generation.
- `chat`: Conduct a conversation with the model by providing a history of the conversation.
- `__call__`: Allow the model to be called directly to generate text.

These features enhance the flexibility and utility of the interface in various applications, including chatbots, language translation, and content generation, as shown in the sketch below.
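The following sketch exercises these features, again assuming the hypothetical `MyLLM` subclass from Section 3; treating `arun` as awaitable is an inference from its description as an asynchronous method:

```python
import asyncio

# `model` is an instance of the hypothetical MyLLM subclass from Section 3.
model = MyLLM(model_name="my_language_model")

# Batch generation: one call, several tasks.
summaries = model.batch_run(["Summarize document A.", "Summarize document B."])

# Calling the instance directly generates text for a single task.
print(model("What is a language model?"))

# Chat with an explicit conversation-history string.
reply = model.chat("And in one sentence?", history="User: What is an LLM?")

# Asynchronous generation for concurrent workloads.
async def main() -> None:
    text = await model.arun("Write a limerick about GPUs.")
    print(text)

asyncio.run(main())
```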
## 6. Performance Metrics

The `BaseLLM` class offers methods for tracking performance metrics:

- `_tokens_per_second`: Calculate tokens generated per second.
- `_num_tokens`: Calculate the number of tokens in a text.
- `_time_for_generation`: Measure the time taken for text generation.

These metrics help assess the efficiency and speed of text generation, enabling optimizations as needed.
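A minimal sketch of these helpers, assuming the hypothetical `MyLLM` subclass; note that they are internal (underscore-prefixed) methods, and their exact return semantics are assumptions:

```python
# Illustrative use of the internal performance helpers; their precise
# semantics may differ between implementations.
model = MyLLM(model_name="my_language_model")

seconds = model._time_for_generation("Explain tokenization briefly.")
n_tokens = model._num_tokens("Hello, world.")
tps = model._tokens_per_second()

print(f"last generation: {seconds:.2f}s, throughput: {tps:.1f} tokens/s")
print(f"'Hello, world.' is {n_tokens} tokens")
```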
## 7. Logging and Checkpoints

Logging and checkpointing are crucial for tracking model behavior and ensuring reproducibility:

- `enable_logging`: Initialize logging for the model.
- `log_event`: Log events and activities.
- `save_checkpoint`: Save the model state as a checkpoint.
- `load_checkpoint`: Load the model state from a checkpoint.

These capabilities aid in debugging, monitoring, and resuming model experiments.
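A sketch of a typical flow, assuming the hypothetical `MyLLM` subclass; the file names and checkpoint layout are placeholders, since the on-disk format is implementation-defined:

```python
# Typical logging-and-checkpointing flow; the restore path below is a
# placeholder for whatever file save_checkpoint actually produced.
model = MyLLM(model_name="my_language_model")

model.enable_logging(log_file="model.log")
model.log_event("starting experiment run")

model.save_checkpoint(checkpoint_dir="checkpoints")
# ... run more of the experiment, then restore the saved state ...
model.load_checkpoint("checkpoints/<saved-checkpoint-file>")
```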
## 8. Resource Utilization Tracking

The `track_resource_utilization` method is a placeholder for tracking and reporting resource utilization, such as CPU and memory usage. It can be customized to suit specific monitoring needs, as sketched below.
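One possible customization, sketched here as an assumption rather than documented library behavior, overrides the placeholder with a `psutil`-based implementation:

```python
import psutil  # third-party: pip install psutil


class MonitoredLLM(MyLLM):
    """Hypothetical subclass that fills in the resource-tracking placeholder."""

    def track_resource_utilization(self) -> dict:
        # Sample system CPU usage and this process's memory footprint.
        process = psutil.Process()
        stats = {
            "cpu_percent": psutil.cpu_percent(interval=0.1),
            "memory_mb": process.memory_info().rss / (1024 * 1024),
        }
        self.log_event(f"resource utilization: {stats}")
        return stats
```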
## 9. Conclusion

The Language Model Interface (`BaseLLM`) is a versatile framework for working with language models. Whether you're using pre-trained models or developing your own, this interface provides a consistent and extensible foundation. By following the provided guidelines and examples, you can integrate and customize language models for various natural language processing tasks.