Language Model Interface Documentation

Table of Contents

  1. Introduction
  2. Abstract Language Model
      • Initialization
      • Attributes
      • Methods
  3. Implementation
  4. Usage Examples
  5. Additional Features
  6. Performance Metrics
  7. Logging and Checkpoints
  8. Resource Utilization Tracking
  9. Conclusion

1. Introduction

The Language Model Interface (BaseLLM) is a flexible and extensible framework for working with various language models. This documentation provides a comprehensive guide to the interface, its attributes, methods, and usage examples. Whether you're using a pre-trained language model or building your own, this interface streamlines tasks such as text generation, chat, and summarization.

2. Abstract Language Model

Initialization

The BaseLLM class provides a common interface for language models. It can be initialized with various parameters to customize model behavior. Here are the initialization parameters:

Parameter | Description | Default Value
model_name | The name of the language model to use. | None
max_tokens | The maximum number of tokens in the generated text. | None
temperature | The temperature parameter for controlling randomness in text generation. | None
top_k | The top-k parameter for filtering words in text generation. | None
top_p | The top-p parameter for filtering words in text generation. | None
system_prompt | A system-level prompt to set context for generation. | None
beam_width | The beam width for beam search. | None
num_return_sequences | The number of sequences to return in the output. | None
seed | The random seed for reproducibility. | None
frequency_penalty | The frequency penalty parameter for promoting word diversity. | None
presence_penalty | The presence penalty parameter for discouraging repetitions. | None
stop_token | A stop token indicating the end of generated text. | None
length_penalty | The length penalty parameter for controlling output length. | None
role | The role of the language model (e.g., assistant or user). | None
max_length | The maximum length of generated sequences. | None
do_sample | Whether to use sampling during text generation. | None
early_stopping | Whether to use early stopping during text generation. | None
num_beams | The number of beams to use in beam search. | None
repetition_penalty | The repetition penalty parameter for discouraging repeated tokens. | None
pad_token_id | The token ID for padding. | None
eos_token_id | The token ID for the end of a sequence. | None
bos_token_id | The token ID for the beginning of a sequence. | None
device | The device to run the model on (e.g., 'cpu' or 'cuda'). | None

Attributes

  • model_name: The name of the language model being used.
  • max_tokens: The maximum number of tokens in generated text.
  • temperature: The temperature parameter controlling randomness.
  • top_k: The top-k parameter for word filtering.
  • top_p: The top-p parameter for word filtering.
  • system_prompt: A system-level prompt for context.
  • beam_width: The beam width for beam search.
  • num_return_sequences: The number of output sequences.
  • seed: The random seed for reproducibility.
  • frequency_penalty: The frequency penalty parameter.
  • presence_penalty: The presence penalty parameter.
  • stop_token: The stop token to indicate text end.
  • length_penalty: The length penalty parameter.
  • role: The role of the language model.
  • max_length: The maximum length of generated sequences.
  • do_sample: Whether to use sampling during generation.
  • early_stopping: Whether to use early stopping.
  • num_beams: The number of beams in beam search.
  • repetition_penalty: The repetition penalty parameter.
  • pad_token_id: The token ID for padding.
  • eos_token_id: The token ID for the end of a sequence.
  • bos_token_id: The token ID for the beginning of a sequence.
  • device: The device used for model execution.
  • history: A list of conversation history.

Methods

The BaseLLM class defines several methods for working with language models; a short usage sketch follows the list:

  • run(task: Optional[str] = None, *args, **kwargs) -> str: Generate text using the language model. This method is abstract and must be implemented by subclasses.

  • arun(task: Optional[str] = None, *args, **kwargs): An asynchronous version of run for concurrent text generation.

  • batch_run(tasks: List[str], *args, **kwargs): Generate text for a batch of tasks.

  • abatch_run(tasks: List[str], *args, **kwargs): An asynchronous version of batch_run for concurrent batch generation.

  • chat(task: str, history: str = "") -> str: Conduct a chat with the model, providing a conversation history.

  • __call__(task: str) -> str: Call the model to generate text.

  • _tokens_per_second() -> float: Calculate tokens generated per second.

  • _num_tokens(text: str) -> int: Calculate the number of tokens in a text.

  • _time_for_generation(task: str) -> float: Measure the time taken for text generation.

  • generate_summary(text: str) -> str: Generate a summary of the provided text.

  • set_temperature(value: float): Set the temperature parameter.

  • set_max_tokens(value: int): Set the maximum number of tokens.

  • clear_history(): Clear the conversation history.

  • enable_logging(log_file: str = "model.log"): Initialize logging for the model.

  • log_event(message: str): Log an event.

  • save_checkpoint(checkpoint_dir: str = "checkpoints"): Save the model state as a checkpoint.

  • load_checkpoint(checkpoint_path: str): Load the model state from a checkpoint.

  • toggle_creative_mode(enable: bool): Toggle creative mode for the model.

  • track_resource_utilization(): Track and report resource utilization.

  • `

get_generation_time() -> float`: Get the time taken for text generation.

  • set_max_length(max_length: int): Set the maximum length of generated sequences.

  • set_model_name(model_name: str): Set the model name.

  • set_frequency_penalty(frequency_penalty: float): Set the frequency penalty parameter.

  • set_presence_penalty(presence_penalty: float): Set the presence penalty parameter.

  • set_stop_token(stop_token: str): Set the stop token.

  • set_length_penalty(length_penalty: float): Set the length penalty parameter.

  • set_role(role: str): Set the role of the model.

  • set_top_k(top_k: int): Set the top-k parameter.

  • set_top_p(top_p: float): Set the top-p parameter.

  • set_num_beams(num_beams: int): Set the number of beams.

  • set_do_sample(do_sample: bool): Set whether to use sampling.

  • set_early_stopping(early_stopping: bool): Set whether to use early stopping.

  • set_seed(seed: int): Set the random seed.

  • set_device(device: str): Set the device for model execution.
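
As a quick illustration, here is a minimal usage sketch of the setter and utility methods above. It assumes a concrete BaseLLM subclass, such as the hypothetical MyLanguageModel sketched in Section 3 below:

# Assumes MyLanguageModel is a concrete BaseLLM subclass (see Section 3)
model = MyLanguageModel(model_name="my_language_model")

# Adjust generation parameters at runtime
model.set_temperature(0.9)
model.set_max_tokens(100)
model.set_seed(42)

# Conduct a short chat, then reset the conversation history
reply = model.chat("What is the capital of France?")
print(reply)
model.clear_history()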

3. Implementation

The BaseLLM class serves as the base for implementing specific language models. Subclasses of BaseLLM should implement the run method to define how text is generated for a given task. This design allows flexibility in integrating different language models while maintaining a common interface.
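
As a minimal sketch, a subclass might look like the following. The placeholder run body is purely illustrative; a real implementation would call an actual model:

from typing import Optional

from swarm_models import BaseLLM


class MyLanguageModel(BaseLLM):
    """A hypothetical BaseLLM subclass used throughout these examples."""

    def run(self, task: Optional[str] = None, *args, **kwargs) -> str:
        # A real implementation would invoke an underlying model here;
        # this placeholder simply echoes the task for illustration.
        return f"Generated response for: {task}"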

4. Usage Examples

To demonstrate how to use the BaseLLM interface, let's build on the hypothetical MyLanguageModel subclass sketched in the previous section. We'll initialize an instance of the model and generate text for a simple task.

# Use the hypothetical MyLanguageModel subclass sketched in Section 3;
# BaseLLM itself is abstract and cannot be instantiated directly.
language_model = MyLanguageModel(
    model_name="my_language_model",
    max_tokens=50,
    temperature=0.7,
    top_k=50,
    top_p=0.9,
    device="cuda",
)

# Generate text for a task
task = "Translate the following English text to French: 'Hello, world.'"
generated_text = language_model.run(task)

# Print the generated text
print(generated_text)

In this example, we've created an instance of our hypothetical language model, configured its parameters, and used the run method to generate text for a translation task.

5. Additional Features

The BaseLLM interface provides additional features for customization and control:

  • batch_run: Generate text for a batch of tasks efficiently.
  • arun and abatch_run: Asynchronous versions of run and batch_run for concurrent text generation.
  • chat: Conduct a conversation with the model by providing a history of the conversation.
  • __call__: Allow the model to be called directly to generate text.

These features enhance the flexibility and utility of the interface in various applications, including chatbots, language translation, and content generation.
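
For example, a batch of tasks can be processed in a single call, and the asynchronous variants can be awaited from an event loop. The sketch below again assumes the hypothetical MyLanguageModel subclass from Section 3, and that arun returns an awaitable coroutine:

import asyncio

model = MyLanguageModel(model_name="my_language_model")

# Synchronous batch generation
results = model.batch_run(["Summarize text A.", "Summarize text B."])

# Asynchronous generation for a single task
async def main():
    return await model.arun("Summarize text C.")

print(asyncio.run(main()))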

6. Performance Metrics

The BaseLLM class offers methods for tracking performance metrics:

  • _tokens_per_second: Calculate tokens generated per second.
  • _num_tokens: Calculate the number of tokens in a text.
  • _time_for_generation: Measure the time taken for text generation.

These metrics help assess the efficiency and speed of text generation, enabling optimizations as needed.
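
A sketch of how these helpers might be queried after a generation call follows; the leading underscores mark them as internal, so their exact behavior depends on the concrete implementation:

model = MyLanguageModel(model_name="my_language_model")

task = "Explain photosynthesis in one sentence."
output = model.run(task)

print(f"Tokens in output: {model._num_tokens(output)}")
print(f"Tokens per second: {model._tokens_per_second():.2f}")
print(f"Generation time (s): {model._time_for_generation(task)}")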

7. Logging and Checkpoints

Logging and checkpointing are crucial for tracking model behavior and ensuring reproducibility:

  • enable_logging: Initialize logging for the model.
  • log_event: Log events and activities.
  • save_checkpoint: Save the model state as a checkpoint.
  • load_checkpoint: Load the model state from a checkpoint.

These capabilities aid in debugging, monitoring, and resuming model experiments.
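
A minimal sketch of the logging and checkpoint workflow (the checkpoint filename is hypothetical; the actual name depends on how save_checkpoint writes its files):

model = MyLanguageModel(model_name="my_language_model")

# Write log output to a file and record an event
model.enable_logging(log_file="model.log")
model.log_event("Starting generation run")

# Save the current model state, then restore it later
model.save_checkpoint(checkpoint_dir="checkpoints")
model.load_checkpoint("checkpoints/checkpoint.pkl")  # hypothetical path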

8. Resource Utilization Tracking

The track_resource_utilization method is a placeholder for tracking and reporting resource utilization, such as CPU and memory usage. It can be customized to suit specific monitoring needs.
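
For instance, a subclass could override this method using the third-party psutil package (an assumption on our part; any monitoring library would work):

import psutil  # third-party: pip install psutil

class MonitoredModel(MyLanguageModel):
    def track_resource_utilization(self):
        # Report process-wide CPU and memory usage via the logging facility
        cpu = psutil.cpu_percent(interval=0.1)
        mem = psutil.virtual_memory().percent
        self.log_event(f"CPU: {cpu}% | Memory: {mem}%")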

9. Conclusion

The Language Model Interface (BaseLLM) is a versatile framework for working with language models. Whether you're using pre-trained models or developing your own, this interface provides a consistent and extensible foundation. By following the provided guidelines and examples, you can integrate and customize language models for various natural language processing tasks.