Parser and Structured Output
A parser interprets the LLM output, converting the raw text into Python objects.
Basic Parser
For basic data formats that do not need a dedicated data class, you can use the following parsers in the library.
Parser Class | Target Python Object | Description |
---|---|---|
BooleanParser | bool | Extracts the first boolean value from the text as bool. |
IntParser | int | Extracts the first integer value from the text as int. |
FloatParser | float | Extracts the first float value from the text as float. |
ListParser | list | Extracts ‘[]’ and parses the first list string from the text. Uses both json.loads and yaml.safe_load. |
JsonParser | dict | Extracts ‘[]’ and ‘{}’ and parses JSON strings from the text. It resorts to yaml.safe_load for robust parsing. |
YamlParser | dict | Extracts and parses YAML strings from the text. |
Here are some quick demonstrations:
BooleanParser
from adalflow.core.string_parser import BooleanParser
bool_str = "True"
bool_str_2 = "False"
bool_str_3 = "true"
bool_str_4 = "false"
bool_str_5 = "1" # will fail
bool_str_6 = "0" # will fail
bool_str_7 = "yes" # will fail
bool_str_8 = "no" # will fail
# it will all return True/False
parser = BooleanParser()
print(parser(bool_str))
print(parser(bool_str_2))
print(parser(bool_str_3))
print(parser(bool_str_4))
The printout will be:
True
False
True
False
BooleanParser does not work for ‘1’, ‘0’, ‘yes’, or ‘no’, as they are not standard boolean values.
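If your model tends to answer with ‘yes’/‘no’ or ‘1’/‘0’, one option is to normalize the text before it reaches the parser. The following is a minimal, standard-library-only sketch; normalize_bool_text and its alias table are hypothetical helpers for illustration and are not part of AdalFlow.

```python
# Hypothetical pre-processing helper (not part of AdalFlow): maps common
# non-standard boolean spellings to the "True"/"False" strings.
_BOOL_ALIASES = {
    "yes": "True",
    "1": "True",
    "no": "False",
    "0": "False",
}


def normalize_bool_text(text: str) -> str:
    """Return "True"/"False" for known aliases, otherwise the text unchanged."""
    key = text.strip().lower()
    return _BOOL_ALIASES.get(key, text)


for raw in ["yes", "No", " 1 ", "true"]:
    print(repr(raw), "->", repr(normalize_bool_text(raw)))
```

The normalized string can then be passed to the boolean parser as in the demonstration above.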
IntParser
from adalflow.core.string_parser import IntParser
int_str = "42"
int_str_2 = "42.0"
int_str_3 = "42.7"
int_str_4 = "the answer is 42.75"
# it will all return 42
parser = IntParser()
print(parser(int_str))
print(parser(int_str_2))
print(parser(int_str_3))
print(parser(int_str_4))
The printout will be:
42
42
42
42
IntParser returns the integer value of the first number in the string, even if that number is a float. The fractional part is dropped, so "42.7" becomes 42.
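To make the "first number, truncated" behavior concrete, here is a small sketch that reproduces the printout above using only the standard library. It is an assumption about how such extraction can be done, not AdalFlow's actual implementation; first_int and the regular expression are hypothetical.

```python
import re

# Matches an optional sign, digits, and an optional fractional part.
_NUMBER_RE = re.compile(r"[-+]?\d+(?:\.\d+)?")


def first_int(text: str) -> int:
    """Return the first number in the text, truncated toward zero."""
    match = _NUMBER_RE.search(text)
    if match is None:
        raise ValueError(f"No number found in: {text!r}")
    # Go through float first so that "42.7" is accepted, then truncate.
    return int(float(match.group()))


for s in ["42", "42.0", "42.7", "the answer is 42.75"]:
    print(first_int(s))  # prints 42 each time
```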
FloatParser
from adalflow.core.string_parser import FloatParser
float_str = "42.0"
float_str_2 = "42"
float_str_3 = "42.7"
float_str_4 = "the answer is 42.75"
# each call returns the first number in the string as a float
parser = FloatParser()
print(parser(float_str))
print(parser(float_str_2))
print(parser(float_str_3))
print(parser(float_str_4))
The printout will be:
42.0
42.0
42.7
42.75
FloatParser returns the float value of the first number in the string, even if that number is an integer.
ListParser
from adalflow.core.string_parser import ListParser
list_str = '["key", "value"]'
list_str_2 = 'prefix["key", 2]...'
list_str_3 = '[{"key": "value"}, {"key": "value"}]'
parser = ListParser()
print(parser(list_str))
print(parser(list_str_2))
print(parser(list_str_3))
The output will be:
['key', 'value']
['key', 2]
[{'key': 'value'}, {'key': 'value'}]
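The second example shows that text before and after the brackets is ignored. One way to picture this is "find the first balanced [...] span, then parse it". The sketch below illustrates that idea with json.loads only; it leaves out the yaml.safe_load fallback to stay within the standard library, does not handle brackets inside string values, and extract_first_list is a hypothetical name rather than the library's implementation.

```python
import json


def extract_first_list(text: str) -> list:
    """Find the first balanced [...] span in the text and parse it as JSON."""
    start = text.find("[")
    if start == -1:
        raise ValueError(f"No list found in: {text!r}")
    depth = 0
    for i in range(start, len(text)):
        if text[i] == "[":
            depth += 1
        elif text[i] == "]":
            depth -= 1
            if depth == 0:
                try:
                    return json.loads(text[start : i + 1])
                except json.JSONDecodeError as e:
                    raise ValueError(f"Invalid list string: {e}") from e
    raise ValueError(f"Unbalanced brackets in: {text!r}")


print(extract_first_list('prefix["key", 2]...'))
```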
JsonParser
Even though it can work on lists, it is better to only use it for dictionaries.
from adalflow.core.string_parser import JsonParser
dict_str = '{"key": "value"}'
nested_dict_str = (
'{"name": "John", "age": 30, "attributes": {"height": 180, "weight": 70}}'
)
list_str = '["key", 2]'
list_dict_str = '[{"key": "value"}, {"key": "value"}]'
parser = JsonParser()
print(parser)  # shows the parser's structure; omitted from the output below
print(parser(dict_str))
print(parser(nested_dict_str))
print(parser(list_str))
print(parser(list_dict_str))
The output will be:
{'key': 'value'}
{'name': 'John', 'age': 30, 'attributes': {'height': 180, 'weight': 70}}
['key', 2]
[{'key': 'value'}, {'key': 'value'}]
YamlParser
Although it works on almost all of the previous examples, it is best used for YAML-formatted dictionaries.
from adalflow.core.string_parser import YamlParser
yaml_dict_str = "key: value"
yaml_nested_dict_str = (
"name: John\nage: 30\nattributes:\n height: 180\n weight: 70"
)
yaml_list_str = "- key\n- value"
parser = YamlParser()
print(parser)  # shows the parser's structure; omitted from the output below
print(parser(yaml_dict_str))
print(parser(yaml_nested_dict_str))
print(parser(yaml_list_str))
The output will be:
{'key': 'value'}
{'name': 'John', 'age': 30, 'attributes': {'height': 180, 'weight': 70}}
['key', 'value']
Note
All parsers raise ValueError if parsing fails at any step. Developers should handle it accordingly.
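A common way to handle the error is to wrap the parser call and fall back to a default value. The sketch below works with any callable that raises ValueError; parse_or_default is a hypothetical helper, and strict_int is only a stand-in for a real parser so that the example runs without the library installed.

```python
from typing import Any, Callable, Optional


def parse_or_default(
    parser: Callable[[str], Any], text: str, default: Optional[Any] = None
) -> Any:
    """Call the parser and fall back to a default when it raises ValueError."""
    try:
        return parser(text)
    except ValueError as e:
        # Log or record the failure here; printing keeps the sketch simple.
        print(f"Parsing failed: {e}")
        return default


def strict_int(text: str) -> int:
    """Stand-in parser for this sketch; raises ValueError on bad input."""
    return int(text)


print(parse_or_default(strict_int, "42"))
print(parse_or_default(strict_int, "not a number", default=-1))
```

Whether to fall back to a default, retry the LLM call, or re-raise depends on the application; the wrapper only makes that choice explicit in one place.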
DataClassParser
For more complicated data structures, we can use DataClass to define them.
Its usage is much the same as the native dataclass from dataclasses.
Let’s define a sample data class:
from dataclasses import dataclass, field
from adalflow.core import DataClass

# No need to use Optional: a field with a default value is optional.
@dataclass
class SampleDataClass(DataClass):
    description: str = field(metadata={"desc": "A sample description"})
    category: str = field(metadata={"desc": "Category of the sample"})
    value: int = field(metadata={"desc": "A sample integer value"})
    status: str = field(metadata={"desc": "Status of the sample"})

    # input and output fields can work with DataClassParser
    __input_fields__ = [
        "description",
        "category",
    ]
    __output_fields__ = ["value", "status"]
We have three classes to work with structured data: DataClassParser, JsonOutputParser, and YamlOutputParser. DataClassParser is the easiest to use.
Now, let’s create a parser that uses SampleDataClass to parse the output JSON string back into a data class instance.
from adalflow.components.output_parsers import DataClassParser
parser = DataClassParser(data_class=SampleDataClass, return_data_class=True, format_type="json")
Let’s view the structure of the parser using print(parser).
The output will be:
DataClassParser(
data_class=SampleDataClass, format_type=json, return_data_class=True, input_fields=['description', 'category'], output_fields=['value', 'status']
(_output_processor): JsonParser()
(output_format_prompt): Prompt(
template: Your output should be formatted as a standard JSON instance with the following schema:
```
{{schema}}
```
-Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output!
-Use double quotes for the keys and string values.
-DO NOT mistaken the "properties" and "type" in the schema as the actual fields in the JSON output.
-Follow the JSON formatting conventions., prompt_variables: ['schema']
)
)
You can get the output and input format strings using the following methods:
print(parser.get_input_format_str())
print(parser.get_output_format_str())
The output for the output format string will be:
Your output should be formatted as a standard JSON instance with the following schema:
```
{
"value": " (int) (required)",
"status": " (str) (required)"
}
```
-Make sure to always enclose the JSON output in triple backticks (```). Please do not add anything other than valid JSON output!
-Use double quotes for the keys and string values.
-DO NOT mistaken the "properties" and "type" in the schema as the actual fields in the JSON output.
-Follow the JSON formatting conventions.
The input format string will be:
{
"description": " (str) (required)",
"category": " (str) (required)"
}
Convert a JSON string to a data class instance:
user_input = '{"description": "Parsed description", "category": "Sample Category", "value": 100, "status": "active"}'
parsed_instance = parser.call(user_input)
print(parsed_instance)
The output will be:
SampleDataClass(description='Parsed description', category='Sample Category', value=100, status='active')
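Conceptually, this step loads the JSON string and then builds the data class from the resulting dictionary. The sketch below shows that idea with the standard library only; SampleRecord is a hypothetical stand-in for SampleDataClass, and from_json_str is not an AdalFlow function. DataClassParser does more than this, so treat it as an illustration rather than the library's implementation.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class SampleRecord:  # hypothetical stand-in for SampleDataClass
    description: str
    category: str
    value: int
    status: str


def from_json_str(json_str: str) -> SampleRecord:
    """Parse a JSON object string and build a SampleRecord from known keys."""
    data = json.loads(json_str)
    if not isinstance(data, dict):
        raise ValueError("Expected a JSON object")
    known = {f.name for f in fields(SampleRecord)}
    # Ignore keys that are not fields of the dataclass.
    return SampleRecord(**{k: v for k, v in data.items() if k in known})


record = from_json_str(
    '{"description": "Parsed description", "category": "Sample Category", '
    '"value": 100, "status": "active"}'
)
print(record)
```

In this sketch a missing required key surfaces as a TypeError from the dataclass constructor, which a caller may want to convert to ValueError for consistency with the parsers above.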
Generate the examples string from a list of data class instances:
samples = [
SampleDataClass(
description="Sample description",
category="Sample category",
value=100,
status="active",
),
SampleDataClass(
description="Another description",
category="Another category",
value=200,
status="inactive",
),
]
examples_str = parser.get_examples_str(examples=samples)
print(examples_str)
The output will be:
examples_str:
{
"description": "Sample description",
"category": "Sample category",
"value": 100,
"status": "active"
}
__________
{
"description": "Another description",
"category": "Another category",
"value": 200,
"status": "inactive"
}
__________
You can check out Deep Dive Parser for more.