Ollama Integration

AdalFlow provides comprehensive support for Ollama, enabling you to run open-source LLMs locally without depending on external APIs. This integration supports both synchronous and asynchronous operations, including streaming responses.

Quick Start

Installation

# Install AdalFlow with Ollama support
pip install adalflow[ollama]

# Install Ollama (if not already installed)
curl -fsSL https://ollama.com/install.sh | sh

# Start Ollama server
ollama serve

# Pull a model (e.g., qwen2:0.5b, mistral, or gpt-oss)
ollama pull qwen2:0.5b

Basic Usage

from adalflow.components.model_client import OllamaClient
from adalflow.core import Generator

# Initialize the Generator with OllamaClient
generator = Generator(
    model_client=OllamaClient(host="http://localhost:11434"),
    model_kwargs={"model": "qwen2:0.5b"}
)

# Generate a response
response = generator({"input_str": "Hello, what can you do?"})
print(response.data)
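
The same Generator also exposes an asynchronous acall method. A minimal sketch, assuming the generator defined above:

import asyncio

async def main():
    # acall issues the same request without blocking the event loop
    response = await generator.acall(
        prompt_kwargs={"input_str": "Hello, what can you do?"}
    )
    print(response.data)

asyncio.run(main())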

Model Capabilities

Text Generation

Basic text generation with various open-source models:

# Configure with specific model parameters
generator = Generator(
    model_client=OllamaClient(),
    model_kwargs={
        "model": "qwen2:0.5b",
        "options": {
            "temperature": 0.7,
            "num_predict": 512,
            "top_k": 40,
            "top_p": 0.9,
            "repeat_penalty": 1.1,
            "num_ctx": 2048,
        }
    }
)

response = generator({"input_str": "Explain quantum computing"})
print(response.data)

Streaming Responses

Real-time streaming for a better user experience:

Synchronous Streaming:

# Enable streaming
stream_generator = Generator(
    model_client=OllamaClient(),
    model_kwargs={
        "model": "qwen2:0.5b",
        "stream": True
    }
)

output = stream_generator.call(
    prompt_kwargs={"input_str": "Tell me a story"}
)

# Access the raw streaming response
for chunk in output.raw_response:
    if "message" in chunk:
        print(chunk["message"]["content"], end='', flush=True)

Asynchronous Streaming:

import asyncio

async def stream_story():
    # Using async streaming with the same stream_generator defined above
    output = await stream_generator.acall(
        prompt_kwargs={"input_str": "Tell me a story"}
    )

    # Access the raw async streaming response
    async for chunk in output.raw_response:
        if "message" in chunk:
            print(chunk["message"]["content"], end="", flush=True)

asyncio.run(stream_story())

Chat vs Generate API

Ollama supports two APIs for text generation:

# Chat API (default) - uses conversation format
chat_generator = Generator(
    model_client=OllamaClient(),
    model_kwargs={"model": "qwen2:0.5b"}
)

# Generate API - uses raw prompt
generate_generator = Generator(
    model_client=OllamaClient(),
    model_kwargs={
        "model": "qwen2:0.5b",
        "generate": True  # Use generate API instead of chat
    }
)
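
Both generators are called the same way; only the underlying Ollama endpoint differs. A short sketch, assuming the two generators defined above:

prompt = {"input_str": "Summarize the plot of Hamlet in two sentences"}

# Chat API wraps the rendered prompt in a conversation message
chat_response = chat_generator(prompt)
print(chat_response.data)

# Generate API sends the rendered prompt as raw text
generate_response = generate_generator(prompt)
print(generate_response.data)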

Text Embeddings

Generate embeddings for semantic search and similarity:

from adalflow.core import Embedder

embedder = Embedder(
    model_client=OllamaClient(),
    model_kwargs={"model": "nomic-embed-text"}
)

# Single text embedding
text = "This is a sample text for embedding"
embedding = embedder(input=text)
print(f"Embedding dimension: {len(embedding.data[0].embedding)}")

Advanced Features

Reasoning Models (GPT-OSS)

Use OpenAI’s GPT-OSS models locally with Ollama:

# Pull GPT-OSS model
# Run in terminal: ollama pull gpt-oss:20b

reasoning_gen = Generator(
    model_client=OllamaClient(),
    model_kwargs={
        "model": "gpt-oss:20b",
        "options": {
            "temperature": 0.7,
            "num_predict": 1024,
        }
    }
)

response = reasoning_gen({
    "input_str": "Solve this problem step by step: ..."
})

# Access reasoning process if available
if response.thinking:
    print("Thinking:", response.thinking)
print("Answer:", response.data)

Model Options

Commonly used options, passed under the "options" key of model_kwargs:

Option           Default   Description
seed             0         Random seed for reproducible generation
num_predict      128       Maximum tokens to generate (-1 for infinite)
temperature      0.8       Creativity level (0.0-2.0)
top_k            40        Number of top tokens to consider
top_p            0.9       Cumulative probability cutoff
repeat_penalty   1.1       Penalty for repeated tokens
num_ctx          2048      Context window size
stop             []        Stop sequences (e.g., ["\n", "user:"])
mirostat         0         Mirostat sampling (0=disabled, 1/2=enabled)
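
Any of these options go under the "options" key of model_kwargs, as in the earlier examples. A minimal sketch, assuming the qwen2:0.5b model pulled in the Quick Start:

generator = Generator(
    model_client=OllamaClient(),
    model_kwargs={
        "model": "qwen2:0.5b",
        "options": {
            "seed": 42,            # make sampling reproducible
            "num_predict": 256,    # cap the response length
            "stop": ["\n\n"],      # stop at the first blank line
        },
    },
)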

Available Models

Popular models compatible with Ollama:

Model              Size       Best For
qwen2:0.5b         0.5B       Lightweight, fast inference, good for testing
llama3             8B         General purpose, balanced performance
mistral            7B         Fast inference, good for coding
mixtral            8x7B       High quality, mixture of experts
qwen2              0.5B-72B   Multilingual, various sizes
codellama          7B-34B     Code generation and understanding
gpt-oss            20B/120B   OpenAI's open-source reasoning model
nomic-embed-text   -          Text embeddings for semantic search

To see all available models, visit: https://ollama.com/library

Resources