tgd_optimizer

Text-grad optimizer and prompts. Also combines methods from the OPRO LLM optimizer.

With auto-diff gradients, it becomes possible to optimize any prompt parameter in a task pipeline.

Paper: https://arxiv.org/abs/2309.03409
Source code: https://github.com/google-deepmind/opro

Functions

gumbel_top_k(scores, k, *[, probs, seed, ...])

Gumbel Top-k sampling with balanced exploration-exploitation.

Classes

CustomizedXMLParser()

Custom XML parser for TGD optimizer output with reasoning, method, and proposed_variable fields.

HistoryPrompt(id, value, eval_score[, ...])

Instruction(text, score[, responses, gts])

Structured variable values for instructions.

TGDData(reasoning, method[, proposed_variable])

TGDOptimizer(params, model_client[, ...])

Textual Gradient Descent (LLM) optimizer for text-based variables.

TGDOptimizerTrace([api_kwargs, output])

class HistoryPrompt(id: str, value: str, eval_score: float, method: str = None, reasoning: str = None)[source]

Bases: DataClass

id: str
value: str
eval_score: float
method: str = None
reasoning: str = None
class Instruction(text: str, score: float, responses: List[str] | None = None, gts: List[str] | None = None)[source]

Bases: DataClass

Structured variable values for instructions. Can be used in the history of instructions.

text: str
score: float
responses: List[str] | None = None
gts: List[str] | None = None
class TGDData(reasoning: str, method: str, proposed_variable: str = None)[source]

Bases: DataClass

reasoning: str
method: str
proposed_variable: str = None
class TGDOptimizerTrace(api_kwargs: Dict[str, Any] = None, output: optim.text_grad.tgd_optimizer.TGDData = None)[source]

Bases: DataClass

api_kwargs: Dict[str, Any] = None
output: TGDData = None
class CustomizedXMLParser[source]

Bases: DataComponent

Custom XML parser for TGD optimizer output with reasoning, method, and proposed_variable fields.

get_output_format_str() -> str[source]
call(input: str) -> TGDData[source]

Parse the XML response and extract the three fields, returning TGDData directly.

extract_new_variable(text: str) -> str[source]
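
A minimal usage sketch, assuming the adalflow.optim.text_grad.tgd_optimizer import path suggested by the qualified names above, and an XML layout whose tag names match the three documented fields; the response text is hypothetical:

    from adalflow.optim.text_grad.tgd_optimizer import CustomizedXMLParser

    # Hypothetical optimizer-LLM response; tag names are assumed to match
    # the documented fields (reasoning, method, proposed_variable).
    llm_response = (
        "<reasoning>The feedback flags ambiguity in the output format.</reasoning>"
        "<method>Rephrase existing instruction</method>"
        "<proposed_variable>Answer with a single word: Yes or No.</proposed_variable>"
    )

    parser = CustomizedXMLParser()
    tgd_data = parser.call(llm_response)  # returns a TGDData instance
    print(tgd_data.reasoning, tgd_data.method, tgd_data.proposed_variable)
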
class TGDOptimizer(params: Iterable[Parameter] | Iterable[Dict[str, Any]], model_client: ModelClient, model_kwargs: Dict[str, object] = {}, constraints: List[str] = None, optimizer_system_prompt: str = 'You are an excellent prompt engineer tasked with instruction and demonstration tuning a compound LLM system.\nYour task is to refine a variable/prompt based on feedback from a batch of input data points.\n\nThe variable is either input or output of a functional component where the component schema will be provided.\nIf the same DataID has multiple gradients, it means this component/variable is called multiple times in the compound system(with a cycle) in the same order as it appears in the gradient list.\n\nYou Must edit the current variable with one of the following editing methods.\nYou can not rewrite everything all at once:\n\nYou have Four Editing Methods:\n1. ADD new elements(instruction) to address each specific feedback.\n2. ADD Examples (e.g., input-reasoning-answer) for tasks that require strong reasoning skills.\n3. Rephrase existing instruction(for more clarity), Replace existing sample with another, to address the feedback.\n4. DELETE unnecessary words to improve clarity.\n\nThese SIX prompting techniques can be a helpful direction.\n1. Set Context and Role: Establish a specific identity or domain expertise for the AI to guide style, knowledge, and constraints.\n2. Be Specific, Clear, and Grammarly correct: Clearly define instructions, desired format, and constraints to ensure accurate and relevant outputs with regards to the feedback.\n3. Illicit reasoning: "chain-of-thought" (e.g. "think step by step") helps the model reason better.\n4. Examples: Construct examples(e.g., input(optional)-reasoning(required)-answer) especially for tasks that require strong reasoning skills.\n5. Leverage Constraints and Formatting: Explicitly direct how the answer should be structured (e.g., bullet points, tables, or tone).\n6. Self-Consistency / Verification Prompts: Prompt the model to check its own logic for errors, inconsistencies, or missing details.\n\nYour final action/reasoning  = one of FOUR editing method + one of SIX prompting technique.\n\nYou must stick to these instructions:\n1. **MUST Resolve concerns raised in the feedback** while preserving the positive aspects of the original variable.\n2. **Observe past performance patterns** to retain good qualities in the variable and past failed ones to try things differently.\n3. **System Awareness**: When other system variables are given, ensure you understand how this variable works in the whole system.\n4. **Peer Awareness**: This variable works together with Peer variables, ensure you are aware of their roles and constraints.\n5. **Batch Awareness**: You are optimizing a batch of input data, ensure the change applys to the whole batch (except while using demonstration.)\n\n{{output_format_str}}\n\n{% if instruction_to_optimizer %}\n**Additional User Instructions**: {{instruction_to_optimizer}}\n{% endif %}\n', in_context_examples: List[str] = None, max_past_history: int = 3, max_failed_proposals: int = 5, steps_from_last_improvement: int = 0, one_parameter_at_a_time: bool = False)[source]

Bases: TextOptimizer

Textual Gradient Descent (LLM) optimizer for text-based variables.

proposing: bool = False
params_history: Dict[str, List[HistoryPrompt]] = {}
failed_proposals: Dict[str, List[HistoryPrompt]] = {}
current_tgd_output: Dict[str, TGDData | None] = {}
params: Iterable[Parameter] | Iterable[Dict[str, Any]]
constraints: List[str]
one_parameter_at_a_time: bool
property constraint_text

Returns a formatted string representation of the constraints.

Returns:

A string containing the constraints in the format “Constraint {index}: {constraint}”.

Return type:

str
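
A hypothetical illustration of the rendered format; the construction arguments are stand-ins, and the 1-based index is an assumption read off the template above:

    # `pipeline` and `client` are hypothetical; the constraints are illustrative.
    optimizer = TGDOptimizer(
        params=pipeline.parameters(),
        model_client=client,
        constraints=["Keep the instruction under 100 words.",
                     "Do not reveal the ground truth."],
    )
    print(optimizer.constraint_text)
    # Constraint 1: Keep the instruction under 100 words.   (index base assumed)
    # Constraint 2: Do not reveal the ground truth.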

increment_steps_from_last_improvement()[source]
reset_steps_from_last_improvement()[source]
add_score_to_params(val_score: float)[source]
add_score_to_current_param(param_id: str, param: Parameter, score: float)[source]
add_history(param_id: str, history: HistoryPrompt)[source]
render_history(param_id: str) -> List[str][source]

Render history for the optimizer prompt.

Selects the top max_past_history prompts by their average score across all evaluations (from the trainer's multi-minibatch tracking).

Returns:

List of YAML strings for the top prompts
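
A hypothetical sketch of recording and rendering history via the documented add_history and render_history signatures; the parameter id, prompt text, and score are made up:

    # Record one evaluated prompt version under a hypothetical parameter id.
    optimizer.add_history(
        "system_prompt",
        HistoryPrompt(
            id="v3",
            value="Think step by step, then answer Yes or No.",
            eval_score=0.74,
            method="Rephrase existing instruction",
            reasoning="Feedback flagged ambiguity in the expected output format.",
        ),
    )
    yaml_entries = optimizer.render_history("system_prompt")  # top prompts as YAML strings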

add_failed_proposal()[source]

Save a copy of the current value of the parameter in the failed proposals.

render_failed_proposals(param_id: str) -> List[str][source]
update_gradient_memory(param: Parameter)[source]
zero_grad()[source]

Clear all the gradients of the parameters.

set_target_param()[source]
propose()[source]

Propose a new value while keeping the previous value saved on the parameter.

revert()[source]

Revert to the previous value when the evaluation is worse.

step()[source]

Discard the previous value and keep the proposed value.
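
A minimal sketch of the propose / evaluate / accept-or-revert cycle using only the methods documented above. evaluate_pipeline, task_pipeline, client, and num_steps are hypothetical stand-ins, and the textual gradients are assumed to be populated by a backward pass that is not shown:

    optimizer = TGDOptimizer(params=task_pipeline.parameters(), model_client=client)

    best_score = evaluate_pipeline()      # hypothetical validation run
    for _ in range(num_steps):            # num_steps: assumed training budget
        optimizer.zero_grad()             # clear stale textual gradients
        # ... forward pass, loss, and backward happen here (not shown),
        # populating the textual gradients the proposal step reads.
        optimizer.propose()               # draft new values; old ones stay saved
        score = evaluate_pipeline()       # re-evaluate with the proposed values
        if score > best_score:
            optimizer.step()              # keep the proposal, discard the old value
            best_score = score
        else:
            optimizer.revert()            # proposal hurt performance; roll back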

gumbel_top_k(scores, k, *, probs=False, seed=None, temperature=1.0, noise_scale=1.0, counts=None, ucb_beta=0.0)[source]

Gumbel Top-k sampling with balanced exploration-exploitation.

Parameters:
  • scores – list/1D array. If probs=False, treated as logits; if probs=True, treated as probabilities.

  • k – number of indices to sample (k <= len(scores)).

  • probs – True if scores are probabilities.

  • seed – optional RNG seed.

  • temperature (float) – temperature scaling. T<1 amplifies differences; T>1 increases randomness.

  • noise_scale (float) – scale of Gumbel noise. 0 disables stochastic exploration.

  • counts (list/array or None) – evaluation counts n_i per item (for optional UCB bonus).

  • ucb_beta (float) – >0 to enable a lightweight UCB bonus: β * sqrt(log(N+1)/(n_i+1)).

Returns:

indices of the top-k (descending by perturbed score).

Return type:

List[int]
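
A minimal NumPy sketch of the sampling rule these parameters describe; the library's actual implementation may differ in details such as clipping and tie-breaking, and interpreting N as the total evaluation count is an assumption:

    import numpy as np

    def gumbel_top_k_sketch(scores, k, *, probs=False, seed=None,
                            temperature=1.0, noise_scale=1.0,
                            counts=None, ucb_beta=0.0):
        """Sketch of Gumbel Top-k selection following the parameters above."""
        rng = np.random.default_rng(seed)
        s = np.asarray(scores, dtype=float)
        # Probabilities are mapped to logits; raw scores are used as logits.
        logits = np.log(np.clip(s, 1e-12, None)) if probs else s
        logits = logits / temperature          # T<1 sharpens, T>1 flattens
        # Gumbel(0, 1) noise; noise_scale=0 makes selection deterministic.
        u = np.clip(rng.uniform(size=s.shape), 1e-12, 1 - 1e-12)
        perturbed = logits + noise_scale * (-np.log(-np.log(u)))
        if ucb_beta > 0 and counts is not None:
            n = np.asarray(counts, dtype=float)
            total = n.sum()                    # N assumed to be total evaluations
            perturbed += ucb_beta * np.sqrt(np.log(total + 1.0) / (n + 1.0))
        # Indices of the k largest perturbed scores, in descending order.
        return list(np.argsort(-perturbed)[:k])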

generate_top_k_scoring_function(batch_val: List[float], batch_val_acc_list: List[List[int]], window: int | None = None, k: int = 5, epsilon_within: float = 0.0, beta: float = 1.0) -> List[int][source]

Generate top-K indices using Gumbel-Max sampling.

This implements Softmax Acquisition via Gumbel-Max:
  • Add Gumbel noise to historical scores

  • Select top-K by perturbed scores

Based on: https://arxiv.org/pdf/2106.12059

Parameters:
  • batch_val – List of average validation scores (percentages) for each historical prompt. These are the average accuracies across multiple mini-batch evaluations.

  • batch_val_acc_list – List of lists containing individual success/fail records. Not used in this implementation but kept for compatibility.

  • window – Optional window size (not used, kept for compatibility)

  • k – Number of top prompts to select

  • epsilon_within – Epsilon for within-batch exploration (not used)

  • beta – Temperature parameter for Gumbel distribution (not used, default 1.0)

Returns:

List of indices (with size <= k) in descending order by Gumbel values
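
A hypothetical call following the documented signature, assuming it is invoked on a TGDOptimizer instance like the surrounding methods; the scores and success/fail records are made up:

    batch_val = [62.5, 71.0, 68.4, 74.2]                # average accuracies (%)
    batch_val_acc_list = [[1, 0, 1], [1, 1, 1],
                          [1, 0, 1], [1, 1, 0]]         # kept for compatibility
    top_indices = optimizer.generate_top_k_scoring_function(
        batch_val, batch_val_acc_list, k=2
    )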

top_k_selected_prompts(batch_val: List[float], batch_val_acc_list: List[List[int]], k: int | None = None)[source]

Select top-K prompts using Gumbel-Top-K sampling.

This is the main entry point for Gumbel-based prompt selection. It’s called during the optimization loop to probabilistically select promising historical prompts for refinement.

Parameters:
  • batch_val – List of average validation scores (percentages) for each historical prompt. The list index corresponds to the prompt iteration number.

  • batch_val_acc_list – List of lists containing individual success/fail records for each prompt.

  • k – Number of top prompts to select (defaults to self.max_past_history if available, otherwise 3)

Returns:

  • selected_prompts: List of selected prompt strings

  • selected_indices: List of selected prompt indices

  • selected_metadata: Optional metadata (None for base implementation)

Return type:

Tuple of (selected_prompts, selected_indices, selected_metadata)
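
A hypothetical call unpacking the documented return tuple; batch_val and batch_val_acc_list follow the shapes described above:

    prompts, indices, metadata = optimizer.top_k_selected_prompts(
        batch_val, batch_val_acc_list, k=3
    )
    # `metadata` is None in the base implementation, per the docs above.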

to_dict()[source]