Developer Notes#

Learn the why and the how-to (customize and integrate) behind each core part of the AdalFlow library. These are our most important tutorials; work through them before you build your use cases end to end.

LLM application is no different from a model training/evaluation workflow#

The AdalFlow library focuses on providing building blocks for developers to build and optimize the task pipeline. We have a clear Design Philosophy, which results in this Class Hierarchy.

Introduction#

Component is to LLM task pipelines what nn.Module is to PyTorch models. An LLM task pipeline in AdalFlow mainly consists of components, such as a Prompt, ModelClient, Generator, Retriever, Agent, or any other custom components. This pipeline can be Sequential or a Directed Acyclic Graph (DAG) of components. A Prompt will work with DataClass to ease data interaction with the LLM model. A Retriever will work with databases to retrieve context and overcome the hallucination and knowledge limitations of LLM, following the paradigm of Retrieval-Augmented Generation (RAG). An Agent will work with tools and an LLM planner for enhanced ability to reason, plan, and act on real-world tasks.

Additionally, what shines in AdalFlow is that all orchestrator components, like Retriever, Embedder, Generator, and Agent, are model-agnostic. You can easily make each component work with different models from different providers by switching out the ModelClient and its model_kwargs.
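The model-agnostic pattern can be sketched as follows. This is a minimal, hypothetical illustration (the `Fake*` clients and the simplified `Generator` below are stand-ins, not AdalFlow's real API): the orchestrator depends only on the `ModelClient` protocol, so switching providers is just a different client plus `model_kwargs`.

```python
# Minimal sketch of the model-agnostic orchestrator pattern.
# These classes are illustrative stand-ins, NOT AdalFlow's real API.

class ModelClient:
    """Protocol that every provider client implements."""
    def call(self, prompt: str, **model_kwargs) -> str:
        raise NotImplementedError

class FakeOpenAIClient(ModelClient):
    def call(self, prompt: str, **model_kwargs) -> str:
        return f"[openai:{model_kwargs.get('model')}] {prompt}"

class FakeGroqClient(ModelClient):
    def call(self, prompt: str, **model_kwargs) -> str:
        return f"[groq:{model_kwargs.get('model')}] {prompt}"

class Generator:
    """Orchestrator: only depends on the ModelClient protocol."""
    def __init__(self, model_client: ModelClient, model_kwargs: dict):
        self.model_client = model_client
        self.model_kwargs = model_kwargs

    def __call__(self, prompt: str) -> str:
        return self.model_client.call(prompt, **self.model_kwargs)

# Switching providers is one constructor argument:
gen = Generator(FakeOpenAIClient(), {"model": "gpt-4o-mini"})
print(gen("hello"))
gen = Generator(FakeGroqClient(), {"model": "llama3-8b"})
print(gen("hello"))
```

The orchestrator code never changes; only the client and its `model_kwargs` do.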

We will introduce the library starting from the core base classes, then move to the RAG essentials, and finally to the agent essentials. With these building blocks, we will further introduce optimization, where the optimizer uses building blocks such as the Generator for auto-prompting and the Retriever for dynamic few-shot in-context learning (ICL).

Building#

Base classes#

Code path: adalflow.core.

Component: The building block for task pipelines. It standardizes the interface of all components with call, acall, and __call__ methods, handles state serialization, nested components, and parameters for optimization. Components can be easily chained together via Sequential.

DataClass: The base class for data. It eases data interaction with LLMs for both prompt formatting and output parsing.
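The Component pattern described above can be sketched in a few lines. This is a toy illustration of the call/acall/__call__ interface and Sequential chaining, not AdalFlow's actual implementation:

```python
# Illustrative sketch of the Component/Sequential pattern (names mirror
# the docs, but this is NOT AdalFlow's actual implementation).
import asyncio

class Component:
    def call(self, *args, **kwargs):
        raise NotImplementedError

    async def acall(self, *args, **kwargs):
        # Default async path simply wraps the sync call.
        return self.call(*args, **kwargs)

    def __call__(self, *args, **kwargs):
        return self.call(*args, **kwargs)

class Sequential(Component):
    """Chain components: the output of one is the input of the next."""
    def __init__(self, *components: Component):
        self.components = components

    def call(self, x):
        for c in self.components:
            x = c(x)
        return x

class Upper(Component):
    def call(self, text: str) -> str:
        return text.upper()

class Exclaim(Component):
    def call(self, text: str) -> str:
        return text + "!"

pipeline = Sequential(Upper(), Exclaim())
print(pipeline("hello"))
print(asyncio.run(pipeline.acall("hello")))
```

Because every component exposes the same interface, any component (or whole pipeline) can slot into any position of another pipeline.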

RAG Essentials#

RAG components#

Code path: adalflow.core. For abstract classes:

Prompt: Built on jinja2, it programmatically and flexibly formats prompts as input to the generator.

ModelClient: The standard protocol for integrating LLMs, embedding models, ranking models, etc. into the respective orchestrator components, either via APIs or locally; this is what makes the components model-agnostic.

Generator: The orchestrator for LLM prediction. It streamlines three components: ModelClient, Prompt, and output_processors, and works with the optimizer for prompt optimization.

Parser: The interpreter of the LLM output; the component that parses the output string into structured data.

Embedder: The component that orchestrates a model client (embedding models in particular) and output processors.

Retriever: The base class for all retrievers, which retrieve relevant documents from a given database to add context to the generator.
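To make the retriever's role concrete, here is a toy sketch that scores documents by word overlap with the query and returns the top-k. Real Retriever subclasses use embeddings, BM25, rerankers, and databases; this function is only an illustration of the interface, not library code:

```python
# Toy retriever sketch: rank documents by word overlap with the query.
# Real retrievers use embeddings, BM25, etc.; this only shows the shape.

def retrieve(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    query_words = set(query.lower().split())

    def score(doc: str) -> int:
        return len(query_words & set(doc.lower().split()))

    ranked = sorted(documents, key=score, reverse=True)
    return ranked[:top_k]

docs = [
    "Paris is the capital of France",
    "The Eiffel Tower is in Paris",
    "Python is a programming language",
]
print(retrieve("capital of France", docs, top_k=1))
```

The retrieved chunks are then injected into the generator's prompt as context, completing the RAG loop.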

Data Pipeline and Storage#

Data processing covers transformation, pipelines, and storage. Code paths: adalflow.components.data_process, adalflow.core.db, and adalflow.database. Components operate on a sequence of Document and return a sequence of Document.

Text Splitter: Splits long text into smaller chunks that fit within the token limits of the embedder and generator, and ensures more relevant context when used in RAG.

Data (Database/Pipeline): Understanding data modeling, processing, and storage as a whole. We will build a chatbot with enhanced memory and memory retrieval (RAG) in this note.
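The sliding-window splitting idea can be sketched as below. This is a simplified stand-in that counts words rather than tokens (real splitters typically count tokens and handle separators more carefully):

```python
# Sketch of a sliding-window text splitter: fixed-size chunks with
# overlap, measured in words here for simplicity (real splitters
# usually count tokens).

def split_text(text: str, chunk_size: int = 5, overlap: int = 2) -> list[str]:
    words = text.split()
    if chunk_size <= overlap:
        raise ValueError("chunk_size must be greater than overlap")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunk = words[start:start + chunk_size]
        chunks.append(" ".join(chunk))
        if start + chunk_size >= len(words):
            break  # the last chunk already reached the end of the text
    return chunks

text = "one two three four five six seven eight"
print(split_text(text, chunk_size=4, overlap=1))
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side, at the cost of some duplicated storage.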

Agent Essentials#

The Agent in components.agent pairs an LLM planner with reasoning, planning, and tool use to interact with the environment and accomplish real-world tasks.

Function calls: Provide tools (function calls) to interact with the generator.

Agent: The ReactAgent.
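The tool-registry and dispatch loop at the heart of function calling can be sketched as follows. Here the "LLM planner" is faked with a hard-coded rule so the example runs offline; everything below (`fake_planner`, `run_agent`, the tool names) is hypothetical and only illustrates the mechanism, not AdalFlow's agent API:

```python
# Toy sketch of function calling: the planner is a stand-in for an LLM
# that emits a structured function call; the agent dispatches it.

def add(a: int, b: int) -> int:
    return a + b

def multiply(a: int, b: int) -> int:
    return a * b

TOOLS = {"add": add, "multiply": multiply}

def fake_planner(task: str) -> dict:
    """Stand-in for an LLM planner that emits a structured call."""
    if "sum" in task:
        return {"name": "add", "kwargs": {"a": 2, "b": 3}}
    return {"name": "multiply", "kwargs": {"a": 2, "b": 3}}

def run_agent(task: str):
    call = fake_planner(task)       # 1. plan: choose a tool and arguments
    tool = TOOLS[call["name"]]      # 2. look up the tool in the registry
    return tool(**call["kwargs"])   # 3. act: execute and return the result

print(run_agent("sum of 2 and 3"))
print(run_agent("product of 2 and 3"))
```

A real ReAct-style agent repeats this plan/act cycle, feeding each tool's observation back to the planner until the task is done.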

Optimization#

AdalFlow auto-optimization provides a powerful and unified framework to optimize every single part of the prompt, (1) the instruction, (2) the few-shot examples, and (3) the prompt template, for any task pipeline you have just built. We leverage SOTA prompt-optimization methods, from Dspy and Text-grad to ORPO, along with our own research, in the library.

Optimization requires at least one dataset, an evaluator, and a defined optimizer. In this section, we briefly cover the datasets and evaluation metrics supported in the library.

Evaluation#

You cannot optimize what you cannot measure. In this section, we provide a general guide to the evaluation datasets, metrics, and methods to productionize your LLM tasks and to publish your research.

LLM Evaluation: A quick guide to evaluation datasets, metrics, and methods.

Datasets: How to load and use the datasets in the library.
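As a concrete starting point, the simplest metric is exact-match accuracy over (prediction, target) pairs. The function below is a generic sketch, not the library's evaluator API (which adds fuzzy matching, LLM-as-judge, and more):

```python
# Minimal evaluator sketch: exact-match accuracy over prediction/target
# pairs. Library evaluators are richer (fuzzy match, LLM-as-judge, ...).

def exact_match_accuracy(preds: list[str], targets: list[str]) -> float:
    if len(preds) != len(targets):
        raise ValueError("preds and targets must have the same length")
    matches = sum(p.strip() == t.strip() for p, t in zip(preds, targets))
    return matches / len(targets)

preds = ["Paris", "42", "blue"]
targets = ["Paris", "41", "blue"]
print(exact_match_accuracy(preds, targets))
```

A scalar score like this is exactly what the trainer needs to decide whether a proposed prompt change improved the pipeline.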

Training#

Code path: adalflow.optim.

AdalFlow defines four important classes for auto-optimization: (1) Parameter, which plays a role similar to nn.Tensor's in PyTorch; (2) Optimizer, which proposes, reverts, and steps on parameters; (3) AdalComponent, which defines the training and validation steps; and (4) Trainer, which runs the training and validation steps on either data loaders or datasets.

We will first introduce these classes, from their design to important features each class provides.
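The propose/revert/step protocol on a text Parameter can be sketched as follows. The classes below are hypothetical toy stand-ins that only demonstrate the state management, not the library's Parameter or TextOptimizer implementations:

```python
# Sketch of the propose / revert / step protocol on a text parameter.
# Toy stand-ins only, NOT the library's Parameter/Optimizer classes.

class Parameter:
    def __init__(self, data: str):
        self.data = data      # current prompt text
        self.gradient = ""    # textual feedback from the "backward" pass

class ToyTextOptimizer:
    def __init__(self, param: Parameter):
        self.param = param
        self._backup = None

    def propose(self, new_text: str):
        """Tentatively apply a new prompt, remembering the old one."""
        self._backup = self.param.data
        self.param.data = new_text

    def revert(self):
        """Discard the proposal (e.g. validation score got worse)."""
        self.param.data = self._backup
        self._backup = None

    def step(self):
        """Accept the proposal (validation score improved)."""
        self._backup = None

p = Parameter("Answer the question.")
opt = ToyTextOptimizer(p)
opt.propose("Answer the question concisely, citing the context.")
# ...run validation here; suppose the score improved:
opt.step()
print(p.data)
```

This tentative-apply-then-commit-or-rollback loop is what lets the trainer explore prompt candidates safely against a validation set.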

Classes#

Note: Documentation is work in progress for this section.

parameter_: The Parameter class stores the text and textual gradients (feedback), manages states, and applies backpropagation in auto-diff.

optimizer_: The Optimizer defines a structure and manages the propose, revert, and step methods. We define two variants, DemoOptimizer and TextOptimizer, to cover prompt optimization and few-shot optimization.

few_shot_optimizer_: Subclassed from DemoOptimizer, the few-shot optimizer optimizes few-shot in-context learning.

auto_text_grad_: Subclassed from TextOptimizer, auto textual gradients for prompt optimization. It is the most capable and general optimizer in the library, optimizing instructions or generator output.

adalcomponent_: The interpreter between the task pipeline and the trainer, defining the train and validate steps, optimizers, evaluator, loss function, and backward engine.

trainer_: The Trainer takes the AdalComponent and runs the training and validation steps on either data loaders or datasets.

Logging & Tracing#

Code path: adalflow.utils and adalflow.tracing.

Logging: AdalFlow uses the native logging module as the first line of debugging tooling. We made the effort to help you set it up easily.

Tracing: We provide two tracing methods to help you develop and improve the Generator: (1) trace the history of changes (states) of the prompt during your development process; (2) trace all failed LLM predictions in a unified file for further improvement.
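Since the logging setup builds on Python's native logging module, the general stdlib pattern looks like the sketch below (the function name and logger name are illustrative, not the library's helper):

```python
# Plain-stdlib logging setup of the kind the library's helpers wrap.
# The function and logger names here are illustrative only.
import logging

def setup_logging(level: int = logging.DEBUG) -> logging.Logger:
    logger = logging.getLogger("adalflow_demo")
    logger.setLevel(level)
    if not logger.handlers:  # avoid duplicate handlers on re-run
        handler = logging.StreamHandler()
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(name)s %(levelname)s: %(message)s")
        )
        logger.addHandler(handler)
    return logger

log = setup_logging()
log.debug("generator called")
```

Setting the level per-logger (rather than on the root logger) keeps library debug output from drowning out your application's own logs.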