Optimization#
Base Classes and Data Structures#
GradComponent and LossComponent are subclasses of Component that distinguish gradient and loss components in the optimization process. Users can subclass them to implement their own customized components.

- Parameter is used by Optimizer, Trainer, and AdalComponent for auto-optimization.
- Base classes for AdalFlow optimizers, including Optimizer, TextOptimizer, and DemoOptimizer.
- Base class for autograd (gradient) components that can be called and backpropagated through.
- Base class for loss components that can be called and backpropagated through.
- All data types used by Parameter, Optimizer, AdalComponent, and Trainer.
Few Shot Optimizer#
- Adapted and optimized bootstrap few-shot optimizer.
Textual Gradient#
- Implementation of TextGrad: Automatic “Differentiation” via Text.
- String-based functions adapted from text-grad.
- Text-grad operations such as Sum and Aggregate.
- Text-grad optimizer and prompts.
Trainer and AdalComponent#
- AdalComponent provides an interface to compose different parts, such as eval_fn, train_step, loss_step, optimizers, backward engine, and teacher generator, to work with Trainer.
- Ready-to-use trainer for LLM task pipelines.
Overview#
- class Optimizer[source]#
Bases:
object
Base class for all optimizers.
- proposing: bool = False#
- params: Iterable[Parameter] | Iterable[Dict[str, Any]]#
- class RandomSampler(dataset: Sequence[T_co] | None = None, default_num_shots: int | None = None)[source]#
Bases: Sampler, Generic[T_co]
Simple random sampler to sample from the dataset.
- random_replace(shots: int, samples: List[Sample[T_co]], replace: bool | None = False) List[Sample[T_co]] [source]#
Randomly replace the given number of shots in the samples.
If replace is True, it will skip duplicate checks.
- class ClassSampler(dataset: Sequence[T_co], num_classes: int, get_data_key_fun: Callable, default_num_shots: int | None = None)[source]#
Bases: Sampler, Generic[T_co]
Sample from the dataset based on the class labels.
T_co can be any type of data, e.g., dict, list, etc., with get_data_key_fun used to extract the class label.
Example: Initialize

dataset = [{"coarse_label": i} for i in range(10)]
sampler = ClassSampler[Dict](dataset, num_classes=6, get_data_key_fun=lambda x: x["coarse_label"])
- random_replace(shots: int, samples: List[Sample], replace: bool | None = False, weights_per_class: List[float] | None = None) Sequence[Sample[T_co]] [source]#
Randomly select the given number of shots from the samples and replace them with other samples that have the same class index.
- class Sampler(*args, **kwargs)[source]#
Bases: Generic[T_co]
- dataset: Sequence[object] = None#
- class Parameter(*, id: str | None = None, data: ~optim.parameter.T = None, requires_opt: bool = True, role_desc: str = '', param_type: ~adalflow.optim.types.ParameterType = <ParameterType.NONE: none, ''>, name: str = None, gradient_prompt: str = None, raw_response: str = None, instruction_to_optimizer: str = None, instruction_to_backward_engine: str = None, score: float | None = None, eval_input: object = None, from_response_id: str | None = None, successor_map_fn: ~typing.Dict[str, ~typing.Callable] | None = None)[source]#
Bases: Generic[T]
A data container to represent a parameter used for optimization.
A parameter enforces a specific data type and can be updated in-place. When parameters are assigned as Component attributes, they are automatically added to the list of the component's parameters and will appear in the parameters() or named_parameters() methods.

Args:
End users only need to create the Parameter with four arguments and pass it to the prompt_kwargs of the Generator (see the sketch below).

- data (str): the data of the parameter.
- requires_opt (bool, optional): whether the parameter requires optimization. Default: True.
- role_desc (str, optional): description of the role this parameter plays in the prompt.
- param_type (ParameterType, optional): the parameter type, including ParameterType.PROMPT for instruction optimization and ParameterType.DEMOS for few-shot optimization.
- instruction_to_optimizer (str, optional): instruction to the optimizer. Default: None.
- instruction_to_backward_engine (str, optional): instruction to the backward engine. Default: None.

The parameter a user creates will be automatically assigned to its variable name/key in the prompt_kwargs for easy reading and debugging in the trace_graph.
- proposing: bool = False#
- input_args: Dict[str, Any] = None#
- full_response: object = None#
- backward_engine_disabled: bool = False#
- id: str = None#
- role_desc: str = ''#
- name: str = None#
- param_type: ParameterType#
- data: T = None#
- eval_input: object = None#
- from_response_id: str = None#
- successor_map_fn: Dict[str, Callable] = None#
- map_to_successor(successor: object) T [source]#
Apply the map function to the successor based on the successor’s id.
- add_successor_map_fn(successor: object, map_fn: Callable)[source]#
Add or update a map function for a specific successor using its id.
- trace_forward_pass(input_args: Dict[str, Any], full_response: object)[source]#
Trace the forward pass of the parameter.
- add_to_trace(trace: DataClass, is_teacher: bool = True)[source]#
Called by the generator.forward to add a trace to the parameter.
It is important to allow the trace to be updated, as this changes the sampling weight. If a sample's score increases as training goes on, it becomes less likely to be sampled, keeping the samples diverse; otherwise, the optimizer would keep sampling failed examples.
- add_score_to_trace(trace_id: str, score: float, is_teacher: bool = True)[source]#
Called by the generator.backward to add the eval score to the trace.
- propose_data(data: T, demos: List[DataClass] | None = None)[source]#
Used by the optimizer to propose new data while saving the previous data in case of a revert.
- step_data(include_demos: bool = False)[source]#
Use PyTorch’s optimizer syntax to finalize the update of the data.
- update_value(data: T)[source]#
Update the parameter’s value in-place, checking for type correctness.
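As a rough sketch of the propose/step cycle these methods support (using the task_desc parameter from the earlier sketch; the string values are illustrative only):

# The optimizer proposes a new value while keeping the previous one for a possible revert.
task_desc.propose_data("Answer the question step by step, then give the final answer.")

# ... evaluate the proposal on a validation batch ...

# If the proposal is accepted, finalize the update in PyTorch-style fashion.
task_desc.step_data()

# Direct in-place update with type checking, outside the propose/step cycle.
task_desc.update_value("Answer concisely.")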
- get_gradient_and_context_text() str [source]#
Aggregates and returns: 1. the gradients 2. the context text for which the gradients are computed
- get_short_value(n_words_offset: int = 10) str [source]#
Returns a short version of the value of the variable. We sometimes use it during optimization when we want to see the value of the variable but not the entire value, either to save tokens or to avoid repeating very long variables such as code or solutions to hard problems.
- Parameters:
n_words_offset (int) – The number of words to show from the beginning and the end of the value.
- static trace_graph(root: Parameter) Tuple[Set[Parameter], Set[Tuple[Parameter, Parameter]]] [source]#
- draw_graph(add_grads: bool = True, format: Literal['png', 'svg'] = 'png', rankdir: Literal['LR', 'TB'] = 'TB', filepath: str | None = None)[source]#
Draw the graph of the parameter and its gradients.
- Parameters:
add_grads (bool, optional) – Whether to add gradients to the graph. Defaults to True.
format (str, optional) – The format of the output file. Defaults to “png”.
rankdir (str, optional) – The direction of the graph. Defaults to “TB”.
filepath (str, optional) – The path to save the graph. Defaults to None.
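For instance, a small usage sketch (the output path is illustrative, and drawing requires graphviz to be installed):

# Visualize the parameter graph, including gradients, after a backward pass.
task_desc.draw_graph(add_grads=True, format="svg", rankdir="LR", filepath="./trace_graphs/task_desc")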
- class BackwardContext(backward_fn: Callable, backward_engine: BackwardEngine = None, *args, **kwargs)[source]#
Bases:
object
Represents a context for backward computation.
- Parameters:
backward_fn (callable) – The backward function to be called during backward computation.
args – Variable length argument list to be passed to the backward function.
kwargs – Arbitrary keyword arguments to be passed to the backward function.
- Variables:
backward_fn (callable) – The backward function to be called during backward computation.
fn_name (str) – The fully qualified name of the backward function.
args – Variable length argument list to be passed to the backward function.
kwargs – Arbitrary keyword arguments to be passed to the backward function.
- Method __call__(backward_engine: EngineLM) -> Any:
Calls the backward function with the given backward engine and returns the result.
- Method __repr__() -> str:
Returns a string representation of the BackwardContext object.
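A rough sketch of attaching and later invoking a BackwardContext; the backward function and the response/engine objects below are placeholders rather than library code:

def my_backward_fn(response, backward_engine, **kwargs):
    # Placeholder: use the backward engine to turn feedback on `response` into gradients.
    ...

# Extra keyword arguments are stored and forwarded to the backward function.
ctx = BackwardContext(backward_fn=my_backward_fn, response=response_param)

# During the backward pass, the context is called with the engine and returns the result.
result = ctx(backward_engine)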
- class BootstrapFewShot(params: List[Parameter], raw_shots: int | None = None, bootstrap_shots: int | None = None, dataset: List[DataClass] | None = None, weighted: bool = True, exclude_input_fields_from_bootstrap_demos: bool = False)[source]#
Bases:
DemoOptimizer
BootstrapFewShot performs few-shot sampling used in few-shot ICL.
It is used to optimize demo parameters, based on research from the AdalFlow team and the DSPy library.
- Compared with DSPy's version:
We add weighted sampling for both the raw and augmented demos, prioritizing samples that failed with the student model but succeeded in the augmented (teacher) demos, based on the evaluation scores backpropagated with the demo samples.
By default, we exclude the input fields from the augmented demos. Our research finds that using the reasoning demonstrations from the teacher model can in some cases be more effective than including both input and output samples, and is also more token efficient.
Reference: DSPy: Compiling declarative language model calls into state-of-the-art pipelines.
- config_shots(raw_shots: int, bootstrap_shots: int)[source]#
Initialize the samples for each parameter.
- property num_shots: int#
- sample(augmented_demos: Dict[str, DataClass], demos: Dict[str, DataClass], dataset: List[DataClass], raw_shots: int, bootstrap_shots: int, weighted: bool = True)[source]#
Performs weighted sampling; ensure the score is in the range [0, 1], where a higher score means better accuracy.
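A hedged configuration sketch for BootstrapFewShot, using only the documented constructor and config_shots (the demos parameter and train_dataset are placeholders, and adalflow is assumed to be imported as adal as in the earlier sketch):

# A demo parameter that the few-shot optimizer will populate.
demos = adal.Parameter(
    data=None,
    requires_opt=True,
    role_desc="Few-shot demonstrations for the task",
    param_type=adal.ParameterType.DEMOS,
)

demo_optimizer = BootstrapFewShot(
    params=[demos],
    raw_shots=1,
    bootstrap_shots=1,
    dataset=train_dataset,   # a List[DataClass] of training samples
    weighted=True,
)
# Re-initialize the shot counts per parameter if you need to change them later.
demo_optimizer.config_shots(raw_shots=1, bootstrap_shots=1)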
- class TGDOptimizer(params: Iterable[Parameter] | Iterable[Dict[str, Any]], model_client: ModelClient, model_kwargs: Dict[str, object] = {}, constraints: List[str] = None, optimizer_system_prompt: str = '\nYou are part of an optimization system that refines existing variable values based on feedback.\n\nYour task: Propose a new variable value in response to the feedback.\n1. Address the concerns raised in the feedback while preserving positive aspects.\n2. Observe past performance patterns when provided and to keep the good quality.\n3. Consider the variable in the context of its peers if provided.\n FYI:\n - If a peer will be optimized itself, do not overlap with its scope.\n - Otherwise, you can overlap if it is necessary to address the feedback.\n\nOutput:\nProvide only the new variable value between {{new_variable_start_tag}} and {{new_variable_end_tag}} tags.\n\nTips:\n1. Eliminate unnecessary words or phrases.\n2. Add new elements to address specific feedback.\n3. Be creative and present the variable differently.\n{% if instruction_to_optimizer %}\n4. {{instruction_to_optimizer}}\n{% endif %}\n', in_context_examples: List[str] = None, num_gradient_memory: int = 0, max_past_history: int = 3)[source]#
Bases:
TextOptimizer
Textual Gradient Descent (LLM-based) optimizer for text-based variables.
- proposing: bool = False#
- params_history: Dict[str, List[HistoryPrompt]] = {}#
- params: Iterable[Parameter] | Iterable[Dict[str, Any]]#
- constraints: List[str]#
- property constraint_text#
Returns a formatted string representation of the constraints.
- Returns:
A string containing the constraints in the format “Constraint {index}: {constraint}”.
- Return type:
str
- add_history(param_id: str, history: HistoryPrompt)[source]#
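A minimal construction sketch for TGDOptimizer; the model client, model kwargs, and constraint text are illustrative, and task_desc is the prompt Parameter from the earlier sketch:

from adalflow.components.model_client import OpenAIClient  # assumed import path

text_optimizer = TGDOptimizer(
    params=[task_desc],                  # prompt Parameters with requires_opt=True
    model_client=OpenAIClient(),         # LLM used to propose new prompt values
    model_kwargs={"model": "gpt-4o"},
    constraints=["Keep the instruction under 50 words."],
    max_past_history=3,
)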
- class EvalFnToTextLoss(eval_fn: Callable | BaseEvaluator, eval_fn_desc: str, backward_engine: BackwardEngine | None = None, model_client: ModelClient = None, model_kwargs: Dict[str, object] = None)[source]#
Bases:
LossComponent
Convert an evaluation function to a text loss.
The LossComponent takes an eval function and outputs a score (usually a float in the range [0, 1], where higher is better, unlike the loss function in model training).
In math:
score/loss = eval_fn(y_pred, y_gt)
The gradient/feedback, d(score)/d(y_pred), will be computed using a backward engine:

gradient_context = GradientContext(
    context=conversation_str,
    response_desc=response.role_desc,
    variable_desc=role_desc,
)
- Parameters:
eval_fn – The evaluation function that takes a pair of y and y_gt and returns a score.
eval_fn_desc – Description of the evaluation function.
backward_engine – The backward engine to use for the text prompt optimization.
model_client – The model client to use for the backward engine if backward_engine is not provided.
model_kwargs – The model kwargs to use for the backward engine if backward_engine is not provided.
- forward(kwargs: Dict[str, Parameter], response_desc: str = None, metadata: Dict[str, str] = None) Parameter [source]#
By default, this just wraps the call method.
- set_backward_engine(backward_engine: BackwardEngine = None, model_client: ModelClient = None, model_kwargs: Dict[str, object] = None)[source]#
- backward(response: Parameter, eval_fn_desc: str, kwargs: Dict[str, Parameter], backward_engine: BackwardEngine | None = None, metadata: Dict[str, str] = None)[source]#
Make sure to set backward_engine for text prompt optimization. It can be None if you are only doing demo optimization; in that case there are no gradients and only the score is backpropagated.
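For illustration, a hedged sketch that wraps an exact-match function into a text loss and calls it on two Parameters (the parameter names, model client import, and the final backward() call follow the general flow described above and are not verbatim library examples):

from adalflow.components.model_client import OpenAIClient  # assumed import path

def exact_match(y: str, y_gt: str) -> float:
    # Evaluation function returning a score in [0, 1]; higher is better.
    return 1.0 if y.strip() == y_gt.strip() else 0.0

loss_fn = EvalFnToTextLoss(
    eval_fn=exact_match,
    eval_fn_desc="1 if the prediction exactly matches the ground truth, else 0",
    model_client=OpenAIClient(),   # used to build the backward engine if none is given
    model_kwargs={"model": "gpt-4o"},
)

# y_pred is a Parameter produced by the task pipeline; y_gt wraps the ground-truth answer.
loss = loss_fn(kwargs={"y": y_pred, "y_gt": y_gt}, response_desc="exact-match score")
loss.backward()   # backpropagate textual feedback to upstream prompt Parameters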
- class LLMAsTextLoss(prompt_kwargs: Dict[str, str | Parameter], model_client: ModelClient, model_kwargs: Dict[str, object])[source]#
Bases:
LossComponent
Evaluate the final RAG response using an LLM judge.
The LLM judge will have:
- eval_system_prompt: The system prompt to evaluate the response.
- y_hat: The response to evaluate.
- Optional y: The correct response to compare against.
The loss will be a Parameter with the evaluation result and can be used to compute gradients. This loss uses the LLM/Generator as the computation/transformation operator, so its gradient is computed by the Generator's backward method.
- class Trainer(adaltask: AdalComponent, optimization_order: Literal['sequential', 'mix'] = 'sequential', strategy: Literal['random', 'constrained'] = 'constrained', max_steps: int = 1000, train_batch_size: int | None = 4, num_workers: int = 4, ckpt_path: str = None, batch_val_score_threshold: float | None = 1.0, max_error_samples: int | None = 4, max_correct_samples: int | None = 4, max_proposals_per_step: int = 5, train_loader: Any | None = None, train_dataset: Any | None = None, val_dataset: Any | None = None, test_dataset: Any | None = None, raw_shots: int | None = None, bootstrap_shots: int | None = None, weighted_sampling: bool = False, exclude_input_fields_from_bootstrap_demos: bool = False, debug: bool = False, save_traces: bool = False, *args, **kwargs)[source]#
Bases:
Component
Ready-to-use trainer for LLM task pipelines that optimizes all types of parameters.
Training set: can be used for passing the initial proposed prompt or for few-shot sampling. Validation set: used to select the final prompt or samples. Test set: used to evaluate the final prompt or samples.
- Parameters:
adaltask – AdalComponent: AdalComponent instance
strategy – Literal[“random”, “constrained”]: Strategy to use for the optimizer
max_steps – int: Maximum number of steps to run the optimizer
num_workers – int: Number of workers to use for parallel processing
ckpt_path – str: Path to save the checkpoint files, default to ~/.adalflow/ckpt.
batch_val_score_threshold – Optional[float]: Threshold for skipping a batch
max_error_samples – Optional[int]: Maximum number of error samples to keep
max_correct_samples – Optional[int]: Maximum number of correct samples to keep
max_proposals_per_step – int: Maximum number of proposals to generate per step
train_loader – Any: DataLoader instance for training
train_dataset – Any: Training dataset
val_dataset – Any: Validation dataset
test_dataset – Any: Test dataset
few_shots_config – Optional[FewShotConfig]: Few shot configuration
save_traces – bool: Save traces for synthetic data generation or debugging
debug – bool: Run the trainer in debug mode. If debug is True, the text-gradient graph will be saved under /ckpt/YourAdalComponentName/debug_text_grads for prompt parameters, and the demo graph will be saved under /ckpt/YourAdalComponentName/debug_demos for demo parameters.
Note
When you are in debug mode, you can use the get_logger API to show more detailed logs.
Example:
from adalflow.utils import get_logger
get_logger(level="DEBUG")
- optimizer: Optimizer = None#
- ckpt_file: str | None = None#
- optimization_order: Literal['sequential', 'mix'] = 'sequential'#
- strategy: Literal['random', 'constrained']#
- max_steps: int#
- ckpt_path: str | None = None#
- adaltask: AdalComponent#
- num_workers: int = 4#
- train_loader: Any#
- val_dataset = None#
- test_dataset = None#
- batch_val_score_threshold: float | None = 1.0#
- max_error_samples: int | None = 8#
- max_correct_samples: int | None = 8#
- max_proposals_per_step: int = 5#
- train_batch_size: int | None = 4#
- debug: bool = False#
- diagnose(dataset: Any, split: str = 'train')[source]#
Run an evaluation on the train set to track every error response and its raw response, using AdalComponent's default configure_callbacks.
- Parameters:
dataset (Any) – Dataset to evaluate.
split (str) – Split name; defaults to "train" and is also used as the directory name for saving the logs.
Example:
trainset, valset, testset = load_datasets(max_samples=10)
adaltask = TGDWithEvalFnLoss(
    task_model_config=llama3_model,
    backward_engine_model_config=llama3_model,
    optimizer_model_config=llama3_model,
)
trainer = Trainer(adaltask=adaltask)
diagnose = trainer.diagnose(dataset=trainset)
print(diagnose)
- debug_report(text_grad_debug_path: str | None = None, few_shot_demo_debug_path: str | None = None)[source]#
- fit(*, adaltask: AdalComponent | None = None, train_loader: Any | None = None, train_dataset: Any | None = None, val_dataset: Any | None = None, test_dataset: Any | None = None, debug: bool = False, save_traces: bool = False, raw_shots: int | None = None, bootstrap_shots: int | None = None, resume_from_ckpt: str | None = None)[source]#
train_loader: An iterable or collection of iterables specifying training samples.
- prep_ckpt_file_path(trainer_state: Dict[str, Any] = None)[source]#
Prepare the checkpoint root path: ~/.adalflow/ckpt/task_name/.
It also generates a unique checkpoint file name based on the strategy, max_steps, and a unique hash key. For multiple runs but with the same adalcomponent + trainer setup, the run number will be incremented.
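Putting the pieces together, a hedged end-to-end sketch using the documented constructor and fit(); YourAdalComponent and the datasets are placeholders:

adaltask = YourAdalComponent(...)        # an AdalComponent subclass you define

trainer = Trainer(
    adaltask=adaltask,
    strategy="constrained",
    max_steps=12,
    train_batch_size=4,
    raw_shots=1,
    bootstrap_shots=1,
)

trainer.fit(
    train_dataset=train_dataset,
    val_dataset=val_dataset,
    test_dataset=test_dataset,
)
# Checkpoints and the optimized prompts/demos are written under ckpt_path
# (~/.adalflow/ckpt/task_name/ by default).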
- class AdalComponent(task: Component, eval_fn: Callable | None = None, loss_fn: LossComponent | None = None, backward_engine: BackwardEngine | None = None, backward_engine_model_config: Dict | None = None, teacher_model_config: Dict | None = None, text_optimizer_model_config: Dict | None = None, *args, **kwargs)[source]#
Bases:
Component
Define a train, eval, and test step for a task pipeline.
This serves the following purposes:
1. Organize all parts for training a task pipeline in one place.
2. Help with debugging and testing before the actual training.
3. Add multi-threading support for training and evaluation.
- task: Component#
- eval_fn: Callable | None#
- loss_fn: LossComponent | None#
- backward_engine: BackwardEngine | None#
- prepare_task(sample: Any, *args, **kwargs) Tuple[Callable, Dict] [source]#
Tell Trainer how to call the task in both training and inference mode.
Return a task call and kwargs for one training sample.
If you just need to eval, ensure the Callable supports inference mode. If you also need to train, ensure the Callable supports training mode, which returns a Parameter and mainly calls forward on all subcomponents within the task.
Example:
def prepare_task(self, sample: Any, *args, **kwargs) -> Tuple[Callable, Dict]:
    return self.task, {"x": sample.x}
- prepare_loss(sample: Any, y_pred: Parameter, *args, **kwargs) Tuple[Callable, Dict] [source]#
Tell Trainer how to calculate the loss in the training mode.
Return a loss call and kwargs for one loss sample.
Ensure y_pred is a Parameter and that the real input used for y_gt and y_pred is eval_input. Make sure it is set up.
Example:
# "y" and "y_gt" are arguments needed
# by the eval_fn inside of the loss_fn if it is an EvalFnToTextLoss
def prepare_loss(self, sample: Example, pred: adal.Parameter) -> Tuple[Callable, Dict]:
    # prepare gt parameter
    y_gt = adal.Parameter(
        name="y_gt",
        data=sample.answer,
        eval_input=sample.answer,
        requires_opt=False,
    )
    # pred's full_response is the output of the task pipeline, which is a GeneratorOutput
    pred.eval_input = pred.full_response.data
    return self.loss_fn, {"kwargs": {"y": y_gt, "y_pred": pred}}
- prepare_eval(sample: Any, y_pred: Any, *args, **kwargs) float [source]#
Tell Trainer how to eval in inference mode. Return the eval_fn and kwargs for one evaluation sample.
Ensure the eval_fn is a callable that takes the predicted output and the ground truth output. Ensure the kwargs are setup correctly.
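Example (a hedged sketch in the same spirit as prepare_task and prepare_loss above; the Example dataclass with an answer field is an assumption, and following the prose above it returns the eval_fn and its kwargs):

def prepare_eval(self, sample: Example, y_pred: Any, *args, **kwargs):
    # y_pred is the task output in inference mode (e.g., a GeneratorOutput).
    y_label = y_pred.data if y_pred is not None and y_pred.data is not None else ""
    return self.eval_fn, {"y": y_label, "y_gt": sample.answer}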
- configure_optimizers(*args, **kwargs) List[Optimizer] [source]#
Note: When you use a text optimizer, ensure you call configure_backward_engine too.
- configure_backward_engine(*args, **kwargs)[source]#
Configure a backward engine for all generators in the task for bootstrapping examples.
- evaluate_samples(samples: Any, y_preds: List, metadata: Dict[str, Any] | None = None, num_workers: int = 2) EvaluationResult [source]#
Run evaluation on samples using parallel processing. Utilizes prepare_eval defined by the user.
Metadata is used for storing context that you can find from the generator input.
- Parameters:
samples (Any) – The input samples to evaluate.
y_preds (List) – The predicted outputs corresponding to each sample.
metadata (Optional[Dict[str, Any]]) – Optional metadata dictionary.
num_workers (int) – Number of worker threads for parallel processing.
- Returns:
An object containing the average score and per-item scores.
- Return type:
EvaluationResult
- pred_step(batch, batch_idx, num_workers: int = 2, running_eval: bool = False, min_score: float | None = None)[source]#
Applies to both train and eval mode.
If you require self.task.train() to be called before training, you can override this method as:
def train_step(self, batch, batch_idx, num_workers: int = 2) -> List:
    self.task.train()
    return super().train_step(batch, batch_idx, num_workers)
- validate_condition(steps: int, total_steps: int) bool [source]#
By default, the trainer validates at every step.
- validation_step(batch, batch_idx, num_workers: int = 2, minimum_score: float | None = None) EvaluationResult [source]#
If you require self.task.eval() to be called before validation, you can override this method as:
def validation_step(self, batch, batch_idx, num_workers: int = 2) -> List:
    self.task.eval()
    return super().validation_step(batch, batch_idx, num_workers)
- loss_step(batch, y_preds: List[Parameter], batch_idx, num_workers: int = 2) List[Parameter] [source]#
Calculate the loss for the batch.
- configure_teacher_generator()[source]#
Configure a teacher generator for all generators in the task for bootstrapping examples.
You can call configure_teacher_generator_helper to easily configure it by passing the model_client and model_kwargs.
- configure_teacher_generator_helper(model_client: ModelClient, model_kwargs: Dict[str, Any], template: str | None = None)[source]#
Configure a teacher generator for all generators in the task for bootstrapping examples.
- configure_backward_engine_helper(model_client: ModelClient, model_kwargs: Dict[str, Any], template: str | None = None)[source]#
Configure a backward engine for all generators in the task for bootstrapping examples.
- configure_callbacks(save_dir: str | None = 'traces', *args, **kwargs)[source]#
By default, we configure the failure generator callback. Users can override this method to add more callbacks.
- run_one_task_sample(sample: Any) Any [source]#
Run one training sample. Used for debugging and testing.
- run_one_loss_sample(sample: Any, y_pred: Any) Any [source]#
Run one loss sample. Used for debugging and testing.
- configure_demo_optimizer_helper() List[DemoOptimizer] [source]#
One demo optimizer can handle multiple demo parameters, but it will only have one dataset (the train set) configured by the Trainer.
If users want to use a different train set for each demo optimizer, they can configure it themselves.
- configure_text_optimizer_helper(model_client: ModelClient, model_kwargs: Dict[str, Any]) List[TextOptimizer] [source]#
One text optimizer can handle multiple text parameters.
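A hedged sketch of wiring both helpers inside configure_optimizers on your AdalComponent subclass (the model client and kwargs for the text optimizer are assumptions; OpenAIClient is assumed to be imported as in the earlier sketches):

def configure_optimizers(self) -> List[Optimizer]:
    text_optimizers = self.configure_text_optimizer_helper(
        model_client=OpenAIClient(),
        model_kwargs={"model": "gpt-4o"},
    )
    demo_optimizers = self.configure_demo_optimizer_helper()
    return text_optimizers + demo_optimizers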
- class DemoOptimizer(weighted: bool = True, dataset: Sequence[DataClass] = None, exclude_input_fields_from_bootstrap_demos: bool = False, *args, **kwargs)[source]#
Bases:
Optimizer
Base class for all demo optimizers.
Demo optimizers perform few-shot optimization: they sample raw examples from the train dataset or bootstrap examples from the model's output, and work with a sampler to generate new values for a given text prompt.
If bootstrapping is used, a teacher generator is required to generate the examples.
- dataset: Sequence[DataClass]#
- exclude_input_fields_from_bootstrap_demos: bool = False#
- class TextOptimizer(*args, **kwargs)[source]#
Bases:
Optimizer
Base class for all text optimizers.
Text optimizers use textual gradient descent, a variant of gradient descent that optimizes the text directly. They generate new values for a given text prompt. This includes:
- System prompt
- Output format
- Prompt template