base_data_class

A base class that provides an easy way for data to interact with LLMs.

Functions

check_adal_dataclass(data_class)

Check if the provided class is a valid dataclass for the AdalFlow framework.

required_field()

A factory function to create a required field in a dataclass.

Classes

DataClass()

The base data class for all data types that interact with LLMs.

DataClassFormatType(value[, names, module, ...])

The format type for the DataClass schema.

DynamicDataClassFactory()

A factory that creates a dataclass instance from a dictionary.

Constants

ExcludeType

alias of List[str] | Dict[str, List[str]] | None

IncludeType

alias of List[str] | Dict[str, List[str]] | None

class DataClass

Bases: object

The base data class for all data types that interact with LLMs.

Only exclude optional fields in the exclude dictionary. A dictionary with required fields excluded cannot be loaded back with from_dict().

DataClass is designed to streamline the handling, serialization, and description of data within our applications, especially in LLM prompts. We handle this explicitly instead of relying on third-party libraries such as pydantic or marshmallow, for better transparency and to preserve the order of the fields when they are serialized.

How to create your own dataclass?

  1. Subclass DataClass, decorate the subclass with @dataclass, and define the fields with dataclasses.field().

  2. Use the metadata argument with a desc key to describe each field.

  3. Declare the fields in the order in which you want them serialized and described to LLMs.

  4. A field with a default value is considered optional. A field without a default value, or with default_factory=required_field(), is considered required.
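As an illustration of rule 4 using only the standard library, the sketch below classifies each field as required or optional. It is not AdalFlow's implementation: required_field_sketch() and is_required() are hypothetical names, and recognizing the required marker by the factory's __name__ is an assumption.

```python
from dataclasses import MISSING, Field, dataclass, field, fields


def required_field_sketch():
    """Hypothetical stand-in for required_field(): returns a factory that raises."""

    def required_field():
        raise TypeError("This field is required and was not provided.")

    return required_field


def is_required(f: Field) -> bool:
    # No default and no default_factory: the constructor demands a value.
    if f.default is MISSING and f.default_factory is MISSING:
        return True
    # Assumption: a factory named "required_field" marks the field as required.
    return getattr(f.default_factory, "__name__", "") == "required_field"


@dataclass
class Person:
    name: str = field(default=None, metadata={"desc": "The name of the person"})
    age: int = field(
        default_factory=required_field_sketch(),
        metadata={"desc": "The age of the person"},
    )


print({f.name: is_required(f) for f in fields(Person)})
# {'name': False, 'age': True}
```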

How to use it?

Describing:

DataClassFormatType categorizes the formats used to describe a DataClass, as input or output, in an LLM prompt.

  1. For describing the class (data structure):

A signature is more token-efficient than a schema. Also, because a schema is always a JSON string, describing the data structure as a schema can mislead the LLM when you want it to output YAML.

  • DataClassFormatType.SCHEMA: a more standard way to describe the data structure as JSON; to_schema_str() returns it as a string and to_schema() as a dict.

  • DataClassFormatType.SIGNATURE_JSON: emulates a JSON object with each field name as a key and its description as the value; to_json_signature() returns it as a string.

  • DataClassFormatType.SIGNATURE_YAML: emulates a YAML object with each field name as a key and its description as the value; to_yaml_signature() returns it as a string.

  2. For describing a class instance, which is helpful for few-shot examples in an LLM prompt:

  • DataClassFormatType.EXAMPLE_JSON: the JSON representation of the instance; to_json() returns it as a string.

  • DataClassFormatType.EXAMPLE_YAML: the YAML representation of the instance; to_yaml() returns it as a string.

Overall, two methods generate formatted output based on the format type and the class/instance context: the class method format_class_str() describes the class, and the instance method format_example_str() describes an instance.
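The two description modes can be sketched with plain dataclasses: signatures are derived from the class and its field metadata, examples from an instance's values. The FormatType enum copies the DataClassFormatType values documented below; describe(), the plain {field: desc} signature, and the flat YAML rendering are hypothetical simplifications, not the output of AdalFlow's own methods.

```python
import json
from dataclasses import asdict, dataclass, field, fields
from enum import Enum


class FormatType(Enum):
    # Same member values as DataClassFormatType on this page.
    SCHEMA = "schema"
    SIGNATURE_YAML = "signature_yaml"
    SIGNATURE_JSON = "signature_json"
    EXAMPLE_YAML = "example_yaml"
    EXAMPLE_JSON = "example_json"


def describe(obj, format_type: FormatType) -> str:
    """Hypothetical dispatcher: signatures describe the class, examples an instance."""
    if format_type in (FormatType.SIGNATURE_JSON, FormatType.SIGNATURE_YAML):
        cls = obj if isinstance(obj, type) else type(obj)
        data = {f.name: f.metadata.get("desc", "") for f in fields(cls)}
    elif format_type in (FormatType.EXAMPLE_JSON, FormatType.EXAMPLE_YAML):
        if isinstance(obj, type):
            raise ValueError("Examples need an instance, not a class.")
        data = asdict(obj)
    else:
        raise NotImplementedError("SCHEMA is left out of this sketch.")
    if format_type.value.endswith("_json"):
        return json.dumps(data, indent=4)  # dicts keep field declaration order
    # Simplified YAML for flat, scalar values only; real code would use a YAML library.
    return "\n".join(f"{key}: {value}" for key, value in data.items())


@dataclass
class MyOutputs:
    age: int = field(metadata={"desc": "The age of the person"})
    name: str = field(metadata={"desc": "The name of the person"})


print(describe(MyOutputs, FormatType.SIGNATURE_YAML))
# age: The age of the person
# name: The name of the person
print(describe(MyOutputs(age=25, name="John Doe"), FormatType.EXAMPLE_JSON))
# {
#     "age": 25,
#     "name": "John Doe"
# }
```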

Note:
  1. Avoid using Optional[Type] for field types, as the dataclass already distinguishes between optional and required fields by their default values.

  2. If you need to customize the behavior, subclass DataClass and override any method to fit your needs.

Loading data:

  • from_dict() is used to create a dataclass instance from a dictionary.

Refer to DataClass for more detailed instructions.

Examples:

# Define a dataclass
from adalflow.core import DataClass
from dataclasses import dataclass, field

@dataclass
class MyOutputs(DataClass):
    age: int = field(metadata={"desc": "The age of the person", "prefix": "Age:"})
    name: str = field(metadata={"desc": "The name of the person", "prefix": "Name:"})

# Create json signature
print(MyOutputs.to_json_signature())
# Output:
# {
#     "age": "The age of the person",
#     "name": "The name of the person"
# }
# Create yaml signature
print(MyOutputs.to_yaml_signature())
# Output:
# age: The age of the person
# name: The name of the person

# Create a dataclass instance
my_instance = MyOutputs(age=25, name="John Doe")
# Create json example
print(my_instance.to_json())
# Output:
# {
#     "age": 25,
#     "name": "John Doe"
# }
# Create yaml example
print(my_instance.to_yaml())
# Output:
# age: 25
# name: John Doe

classmethod get_task_desc() → str

Get the task description for the dataclass.

Returns:

The task description for the dataclass.

Return type:

str

classmethod set_task_desc(task_desc: str) → None

Set the task description for the dataclass.

Parameters:

task_desc (str) – The task description to set.

classmethod get_input_fields()

Return a list of all input fields.

classmethod set_input_fields(input_fields: List[str])

Set the input fields for the dataclass.

When a schema or an instance representation is created, the fields follow the order given by the input fields and then the output fields.

Parameters:

input_fields (List[str]) – The input fields to set.

classmethod get_output_fields()

Return a list of all output fields.

classmethod set_output_fields(output_fields: List[str])

Set the output fields for the dataclass.

When a schema or an instance representation is created, the fields follow the order given by the input fields and then the output fields.

Parameters:

output_fields (List[str]) – The output fields to set.
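A minimal sketch of the ordering rule shared by set_input_fields() and set_output_fields(), assuming input fields come first and output fields second. ordered_field_names() is a hypothetical helper, and placing unlisted fields at the end in declaration order is an assumption.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


def ordered_field_names(
    cls,
    input_fields: Optional[List[str]] = None,
    output_fields: Optional[List[str]] = None,
) -> List[str]:
    """Hypothetical helper: input fields first, then output fields, then the rest."""
    declared = [f.name for f in fields(cls)]
    listed = (input_fields or []) + (output_fields or [])
    ordered = [name for name in listed if name in declared]
    # Assumption: fields not listed in either group keep declaration order at the end.
    return ordered + [name for name in declared if name not in ordered]


@dataclass
class QA:
    answer: str
    question: str
    context: str


print(ordered_field_names(QA, input_fields=["question", "context"], output_fields=["answer"]))
# ['question', 'context', 'answer']
```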

to_dict(*, exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → Dict[str, Any]

Convert a dataclass object to a dictionary.

Supports nested dataclasses, lists, and dictionaries. Fields can be excluded separately for each dataclass type.

Use cases:

  • Decide what information is serialized to JSON or YAML for use in an LLM prompt.

  • Exclude sensitive information from the serialized output.

  • Serialize the dataclass instance to a dictionary for saving states.

Parameters:

  • exclude (ExcludeType, optional) – Fields to exclude, as a list of field names or a dictionary mapping each dataclass name to its excluded fields. Defaults to None.

  • include (IncludeType, optional) – Fields to include, in the same forms as exclude. Defaults to None.

Example:

from dataclasses import dataclass
from typing import List

from adalflow.core import DataClass

@dataclass
class TrecData(DataClass):
    question: str
    label: int

@dataclass
class TrecDataList(DataClass):
    data: List[TrecData]
    name: str

trec_data = TrecData(question="What is the capital of France?", label=0)
trec_data_list = TrecDataList(data=[trec_data], name="trec_data_list")

trec_data_list.to_dict(exclude={"TrecData": ["label"], "TrecDataList": ["name"]})

# Output:
# {'data': [{'question': 'What is the capital of France?'}]}
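The per-class exclude dictionary above can be read as a recursive walk: each dataclass instance looks up its own class name in exclude and drops those fields, then the walk continues into lists, dicts, and nested dataclasses. to_dict_sketch() is a hypothetical stand-in that handles only the dictionary form of exclude and ignores include.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, List, Optional


def to_dict_sketch(obj: Any, exclude: Optional[Dict[str, List[str]]] = None) -> Any:
    """Hypothetical serializer keyed by class name, e.g. {"TrecData": ["label"]}."""
    exclude = exclude or {}
    if is_dataclass(obj) and not isinstance(obj, type):
        skipped = set(exclude.get(type(obj).__name__, []))
        return {
            f.name: to_dict_sketch(getattr(obj, f.name), exclude)
            for f in fields(obj)  # declaration order is preserved
            if f.name not in skipped
        }
    if isinstance(obj, list):
        return [to_dict_sketch(item, exclude) for item in obj]
    if isinstance(obj, dict):
        return {key: to_dict_sketch(value, exclude) for key, value in obj.items()}
    return obj


@dataclass
class TrecData:
    question: str
    label: int


@dataclass
class TrecDataList:
    data: List[TrecData]
    name: str


trec_data_list = TrecDataList(
    data=[TrecData(question="What is the capital of France?", label=0)],
    name="trec_data_list",
)
print(to_dict_sketch(trec_data_list, exclude={"TrecData": ["label"], "TrecDataList": ["name"]}))
# {'data': [{'question': 'What is the capital of France?'}]}
```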

classmethod from_dict(data: Dict[str, Any]) → DataClass

Create a dataclass instance from a dictionary.

Supports nested dataclasses, lists, and dictionaries.

Example, reusing trec_data_list from the to_dict() example:

data_dict = trec_data_list.to_dict()
restored_data = TrecDataList.from_dict(data_dict)

assert str(restored_data.__dict__) == str(trec_data_list.__dict__)

Raises an error if any required field is missing, so do not pass a dictionary from which required fields were excluded.

Use cases:

  • Convert the JSON or YAML output of an LLM prediction into a dataclass instance.

  • Restore a dataclass instance from serialized output saved as state.
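Restoring nested dataclasses requires the field annotations: a List[TrecData] annotation tells the loader to turn each list item back into a TrecData. The sketch below does this with typing.get_type_hints(), get_origin(), and get_args(); from_dict_sketch() is a hypothetical helper that only understands dataclass, List[...], and Dict[str, ...] annotations. Because missing keys are simply not passed to the constructor, an absent required field surfaces as a TypeError.

```python
from dataclasses import asdict, dataclass, fields, is_dataclass
from typing import Any, Dict, List, get_args, get_origin, get_type_hints


def _convert(tp: Any, value: Any) -> Any:
    # Recurse according to the annotated type of the field.
    if isinstance(tp, type) and is_dataclass(tp):
        return from_dict_sketch(tp, value)
    if get_origin(tp) is list:
        (item_type,) = get_args(tp)
        return [_convert(item_type, item) for item in value]
    if get_origin(tp) is dict:
        _, value_type = get_args(tp)
        return {key: _convert(value_type, item) for key, item in value.items()}
    return value


def from_dict_sketch(cls: type, data: Dict[str, Any]) -> Any:
    """Hypothetical helper: build cls (and nested dataclasses) from a plain dict."""
    hints = get_type_hints(cls)
    kwargs = {f.name: _convert(hints[f.name], data[f.name]) for f in fields(cls) if f.name in data}
    # Missing required fields are left out, so cls(**kwargs) raises TypeError.
    return cls(**kwargs)


@dataclass
class TrecData:
    question: str
    label: int


@dataclass
class TrecDataList:
    data: List[TrecData]
    name: str


original = TrecDataList(data=[TrecData("What is the capital of France?", 0)], name="trec")
restored = from_dict_sketch(TrecDataList, asdict(original))
print(restored == original)  # True
```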

classmethod from_json(json_str: str) → DataClass

Create a dataclass instance from a JSON string.

Parameters:

json_str (str) – The JSON string to convert to a dataclass instance.

Example:

json_str = '{"question": "What is the capital of France?", "label": 0}'
trec_data = TrecData.from_json(json_str)

to_json_obj(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → Any

Convert the dataclass instance to a JSON object.

Built on to_dict(), using sort_keys=False so that the order of the fields is preserved, which can matter in an LLM prompt.

Parameters:

  • exclude (ExcludeType, optional) – Fields to exclude, as a list of field names or a dictionary mapping each dataclass name to its excluded fields. Defaults to None.

  • include (IncludeType, optional) – Fields to include, in the same forms as exclude. Defaults to None.

to_json(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → str

Convert the dataclass instance to a JSON string.

Built on to_dict(), using sort_keys=False so that the order of the fields is preserved, which can matter in an LLM prompt.

Parameters:

  • exclude (ExcludeType, optional) – Fields to exclude, as a list of field names or a dictionary mapping each dataclass name to its excluded fields. Defaults to None.

  • include (IncludeType, optional) – Fields to include, in the same forms as exclude. Defaults to None.

classmethod from_yaml(yaml_str: str) → DataClass

Create a dataclass instance from a YAML string.

Parameters:

yaml_str (str) – The YAML string to convert to a dataclass instance.

Example:

yaml_str = "question: What is the capital of France?\nlabel: 0"
trec_data = TrecData.from_yaml(yaml_str)

to_yaml_obj(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → Any

Convert the dataclass instance to a YAML object.

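Built on to_dict(), using sort_keys=False so that the order of the fields is preserved.

Parameters:

  • exclude (ExcludeType, optional) – Fields to exclude, as a list of field names or a dictionary mapping each dataclass name to its excluded fields. Defaults to None.

  • include (IncludeType, optional) – Fields to include, in the same forms as exclude. Defaults to None.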

to_yaml(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → str

Convert the dataclass instance to a YAML string.

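Built on to_dict(), using sort_keys=False so that the order of the fields is preserved.

Parameters:

  • exclude (ExcludeType, optional) – Fields to exclude, as a list of field names or a dictionary mapping each dataclass name to its excluded fields. Defaults to None.

  • include (IncludeType, optional) – Fields to include, in the same forms as exclude. Defaults to None.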

dict_to_yaml(data: Dict[str, Any]) → str

Convert a dictionary to a YAML string.

Parameters:

data (Dict[str, Any]) – The dictionary to convert to a YAML string.

Returns:

The YAML string representation of the dictionary.

Return type:

str

classmethod to_schema(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → Dict[str, Dict[str, Any]]

Generate a JSON schema, returned as a dict, that is more detailed than the signature.

classmethod to_schema_str(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → str

Generate a JSON schema, returned as a string, that is more detailed than the signature.
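To make the schema versus signature contrast concrete, the sketch below derives a schema-like dictionary that records each field's type and description plus a list of required fields. The layout (properties and required) is an assumption borrowed from JSON Schema conventions and is not guaranteed to match what to_schema() returns; schema_sketch() is hypothetical, and type names come from __name__, which suits only simple builtin types.

```python
import json
from dataclasses import MISSING, dataclass, field, fields
from typing import Any, Dict


def schema_sketch(cls: type) -> Dict[str, Any]:
    """Hypothetical schema builder: more detail than a {field: desc} signature."""
    properties: Dict[str, Dict[str, str]] = {}
    required = []
    for f in fields(cls):
        properties[f.name] = {
            "type": getattr(f.type, "__name__", str(f.type)),
            "desc": f.metadata.get("desc", ""),
        }
        # Simplified rule: required means no default and no default_factory.
        if f.default is MISSING and f.default_factory is MISSING:
            required.append(f.name)
    return {"properties": properties, "required": required}


@dataclass
class MyOutputs:
    age: int = field(metadata={"desc": "The age of the person"})
    name: str = field(default="", metadata={"desc": "The name of the person"})


print(json.dumps(schema_sketch(MyOutputs), indent=4))
```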

classmethod to_yaml_signature(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → str

Generate a YAML signature for the class from the desc key in each field's metadata.

Mostly used in an LLM prompt to describe the output data format.

classmethod to_json_signature(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → str

Generate a JSON signature (a JSON string) for the class from the desc key in each field's metadata.

Mostly used in an LLM prompt to describe the output data format.

Example:

from dataclasses import dataclass, field
from adalflow.core import DataClass

@dataclass
class MyOutputs(DataClass):
    age: int = field(metadata={"desc": "The age of the person", "prefix": "Age:"})
    name: str = field(metadata={"desc": "The name of the person", "prefix": "Name:"})

print(MyOutputs.to_json_signature())
# Output is a JSON string:
# {
#     "age": "The age of the person (int) (required)",
#     "name": "The name of the person (str) (required)"
# }
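The annotated strings in the example above, a description followed by the field type and its required status, can be reproduced from field metadata. json_signature_sketch() is a hypothetical stand-in for to_json_signature(); the "(optional)" label for fields with defaults is an assumption, since the example shows only required fields.

```python
import json
from dataclasses import MISSING, dataclass, field, fields


def json_signature_sketch(cls: type) -> str:
    """Hypothetical: {field: "<desc> (<type>) (required|optional)"} as a JSON string."""
    signature = {}
    for f in fields(cls):
        required = f.default is MISSING and f.default_factory is MISSING
        type_name = getattr(f.type, "__name__", str(f.type))
        status = "required" if required else "optional"
        signature[f.name] = f"{f.metadata.get('desc', '')} ({type_name}) ({status})"
    return json.dumps(signature, indent=4)  # insertion order follows declaration order


@dataclass
class MyOutputs:
    age: int = field(metadata={"desc": "The age of the person", "prefix": "Age:"})
    name: str = field(metadata={"desc": "The name of the person", "prefix": "Name:"})


print(json_signature_sketch(MyOutputs))
# {
#     "age": "The age of the person (int) (required)",
#     "name": "The name of the person (str) (required)"
# }
```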

classmethod to_dict_class(exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → Dict[str, Any]

A class method intended mainly for internal use in serialization.

Converts the dataclass to a dictionary, optionally excluding specified fields. Use it to save the state of the class during serialization; it is not advised for use in an LLM prompt.

classmethod format_class_str(format_type: DataClassFormatType, exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → str

Generate formatted output for the class, based on the format type.

Parameters:

format_type (DataClassFormatType) – Specifies the format and type (schema, signature, example).

Returns:

A string representing the formatted output.

Return type:

str

Examples:

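# Reuses MyOutputs from the DataClass example above
from adalflow.core.base_data_class import DataClassFormatType

print(MyOutputs.format_class_str(DataClassFormatType.SIGNATURE_JSON))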

format_example_str(format_type: DataClassFormatType, exclude: List[str] | Dict[str, List[str]] | None = None, include: List[str] | Dict[str, List[str]] | None = None) → str

Generate formatted output for the instance, based on the format type.

Parameters:

format_type (DataClassFormatType) – Specifies the format and type (schema, signature, example).

Returns:

A string representing the formatted output.

Return type:

str

class DataClassFormatType(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

The format type for the DataClass schema.

SCHEMA = 'schema'
SIGNATURE_YAML = 'signature_yaml'
SIGNATURE_JSON = 'signature_json'
EXAMPLE_YAML = 'example_yaml'
EXAMPLE_JSON = 'example_json'

required_field() → Callable[[], Any]

A factory function to create a required field in a dataclass. The returned callable raises a TypeError when invoked, indicating a required field was not provided.

Parameters:

This function takes no parameters.

Returns:

A callable that raises TypeError when called, indicating a missing required field.

Return type:

Callable[[], Any]

Example:

from dataclasses import dataclass, field
from adalflow.core.base_data_class import required_field, DataClass

@dataclass
class Person(DataClass):
    name: str = field(default=None)
    # A required field may follow an optional one because it uses a default_factory
    age: int = field(default_factory=required_field())
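The comment in the example refers to a standard-library rule: a dataclass field without a default may not follow one with a default. A default_factory that raises satisfies that rule yet still fails when the value is omitted. The sketch below shows this with a hypothetical stand-in, required_field_sketch(), so it runs without AdalFlow installed.

```python
from dataclasses import dataclass, field


def required_field_sketch():
    """Hypothetical stand-in: returns a factory that raises TypeError when invoked."""

    def required_field_error():
        raise TypeError("This field is required and was not provided.")

    return required_field_error


@dataclass
class Person:
    name: str = field(default=None)
    # A plain `age: int` here would fail at class creation with a TypeError,
    # because a non-default field cannot follow a field that has a default.
    age: int = field(default_factory=required_field_sketch())


print(Person(name="Ada", age=36))  # Person(name='Ada', age=36)
try:
    Person(name="Ada")  # __init__ calls the factory for the missing age
except TypeError as err:
    print(err)  # This field is required and was not provided.
```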

check_adal_dataclass(data_class: Type) → None

Check if the provided class is a valid dataclass for the AdalFlow framework.

Parameters:

data_class (Type) – The class to check.

class DynamicDataClassFactory

Bases: object

static from_dict(data: Dict[str, Any], base_class: Type = DataClass, class_name: str = 'DynamicDataClass') → DataClass

Create an instance of a dataclass from a dictionary. The dictionary should have the following structure:

{"field_name": field_value, …}

Parameters:
  • data (dict) – The dictionary with field names and values.

  • base_class (type) – The base class to inherit from (default: DataClass).

  • class_name (str) – The name of the generated dataclass (default: DynamicDataClass).

Returns:

An instance of the generated dataclass.

Return type:

DataClass
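A dataclass generated at runtime from a dictionary can be sketched with dataclasses.make_dataclass(). dynamic_from_dict() is hypothetical: inferring each field's type with type(value), defaulting every field to None, and using no base class when none is given are assumptions, not the documented behavior of DynamicDataClassFactory.from_dict().

```python
from dataclasses import field, fields, make_dataclass
from typing import Any, Dict, Optional


def dynamic_from_dict(
    data: Dict[str, Any],
    base_class: Optional[type] = None,
    class_name: str = "DynamicDataClass",
) -> Any:
    """Hypothetical sketch: generate a dataclass from dict keys, then instantiate it."""
    # One (name, type, Field) triple per key; types are inferred from the values.
    spec = [(name, type(value), field(default=None)) for name, value in data.items()]
    bases = (base_class,) if base_class is not None else ()
    cls = make_dataclass(class_name, spec, bases=bases)
    return cls(**data)


instance = dynamic_from_dict({"question": "What is the capital of France?", "label": 0})
print(type(instance).__name__, [f.name for f in fields(instance)])
# DynamicDataClass ['question', 'label']
```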