postgres_retriever

Leverage a postgres database to store and retrieve documents.

Classes

DistanceToOperator(value[, names, module, ...])

Enum that maps each distance metric to its pgvector operator.

PostgresRetriever(embedder[, top_k, ...])

Use a postgres database to store and retrieve documents.

class DistanceToOperator(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: Enum

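Enum that maps each distance metric to its pgvector operator.

pgvector supports the following distance operators: L2 distance (<->), inner product (<#>), cosine distance (<=>), and L1 distance (<+>, added in pgvector 0.7.0). Note that <#> returns the negative inner product.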

L2 = '<->'
INNER_PRODUCT = '<#>'
COSINE = '<=>'
L1 = '<+>'
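
To make the four operators concrete, here is a minimal sketch that mirrors the enum values and pairs each one with a pure-Python reference computation. The names DistanceOperatorSketch, REFERENCE, and reference_distance are hypothetical and exist only for illustration; they are not part of the library, and the database computes these values itself through pgvector.

```python
import math
from enum import Enum
from typing import Callable, Dict, Sequence


class DistanceOperatorSketch(Enum):
    """Hypothetical stand-in with the same values as DistanceToOperator."""

    L2 = "<->"
    INNER_PRODUCT = "<#>"
    COSINE = "<=>"
    L1 = "<+>"


def _l2(a: Sequence[float], b: Sequence[float]) -> float:
    # Euclidean distance.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def _negative_inner_product(a: Sequence[float], b: Sequence[float]) -> float:
    # pgvector's <#> returns the NEGATIVE inner product.
    return -sum(x * y for x, y in zip(a, b))


def _cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # 1 - cosine similarity; assumes both vectors are nonzero.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def _l1(a: Sequence[float], b: Sequence[float]) -> float:
    # Manhattan (taxicab) distance.
    return sum(abs(x - y) for x, y in zip(a, b))


REFERENCE: Dict[DistanceOperatorSketch, Callable[..., float]] = {
    DistanceOperatorSketch.L2: _l2,
    DistanceOperatorSketch.INNER_PRODUCT: _negative_inner_product,
    DistanceOperatorSketch.COSINE: _cosine_distance,
    DistanceOperatorSketch.L1: _l1,
}


def reference_distance(
    op: DistanceOperatorSketch, a: Sequence[float], b: Sequence[float]
) -> float:
    """Compute in Python what the given operator computes in SQL."""
    return REFERENCE[op](a, b)
```

For all four operators a smaller value means a closer match, which is why the inner product is negated.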
class PostgresRetriever(embedder: Embedder, top_k: int | None = 1, database_url: str = None, table_name: str = 'document', distance_operator: DistanceToOperator = DistanceToOperator.INNER_PRODUCT)

Bases: Retriever[Any, str]

Use a postgres database to store and retrieve documents.

Users can follow this example to customize the prompt, or additionally ask it to output the score along with the indices.

Parameters:
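  • top_k (Optional[int], optional) – The number of top documents to fetch. Defaults to 1.

  • database_url (str) – The URL of the database to connect to. Defaults to postgresql://postgres:password@localhost:5432/vector_db.

  • table_name (str) – The name of the table that stores the documents. Defaults to 'document'.

  • distance_operator (DistanceToOperator) – The pgvector operator used to compare embeddings. Defaults to DistanceToOperator.INNER_PRODUCT.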
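
As a reading aid for the default database URL, the sketch below splits it into its components with the standard library. The helper name describe_database_url is hypothetical and is not part of the retriever; how the library itself consumes the URL is not shown on this page.

```python
from typing import Dict, Optional, Union
from urllib.parse import urlsplit

# The documented default connection URL.
DEFAULT_DATABASE_URL = "postgresql://postgres:password@localhost:5432/vector_db"


def describe_database_url(url: str) -> Dict[str, Optional[Union[str, int]]]:
    """Hypothetical helper: break a PostgreSQL URL into named parts."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port,
        # The path carries the database name, prefixed with "/".
        "database": parts.path.lstrip("/"),
    }


info = describe_database_url(DEFAULT_DATABASE_URL)
```

Read left to right, the default URL therefore names the user postgres, the password password, the host localhost, the port 5432, and the database vector_db.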

References: [1] pgvector extension: pgvector/pgvector

classmethod format_vector_search_query(table_name: str, vector_column: str, query_embedding: List[float], top_k: int, distance_operator: DistanceToOperator, sort_desc: bool = True) → str

Formats a SQL query string to select all columns from a table, order the results by the distance or similarity score to a provided embedding, and also return that score.

Parameters:
  • table_name (str) – The name of the table to query.

  • vector_column (str) – The name of the column containing the vector data.

  • query_embedding (list or str) – The embedding vector to compare against.

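  • top_k (int) – The number of top results to return.

  • distance_operator (DistanceToOperator) – The pgvector operator used to compute the score.

  • sort_desc (bool) – Whether to sort the results by score in descending order. Defaults to True.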

Returns:

A formatted SQL query string that includes the score.

Return type:

str
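
The kind of query described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: the function name sketch_vector_search_query is hypothetical, the operator is passed as its plain string value (for example '<=>'), and the table and column names are assumed to be trusted identifiers, since they are interpolated directly into the SQL text.

```python
from typing import List


def sketch_vector_search_query(
    table_name: str,
    vector_column: str,
    query_embedding: List[float],
    top_k: int,
    distance_operator: str,
    sort_desc: bool = True,
) -> str:
    """Hypothetical sketch: build a pgvector search query that returns a score."""
    # pgvector accepts vector literals of the form '[0.1,0.2,0.3]'.
    vector_literal = "[" + ",".join(str(float(x)) for x in query_embedding) + "]"
    direction = "DESC" if sort_desc else "ASC"
    return (
        f"SELECT *, ({vector_column} {distance_operator} '{vector_literal}') AS score "
        f"FROM {table_name} "
        f"ORDER BY score {direction} "
        f"LIMIT {int(top_k)}"
    )


example_query = sketch_vector_search_query(
    table_name="document",
    vector_column="vector",  # assumed column name, for illustration only
    query_embedding=[0.1, 0.2],
    top_k=3,
    distance_operator="<=>",
    sort_desc=False,
)
```

Because pgvector operators return distances, where smaller means closer, the sort direction decides whether the nearest or the farthest rows come first; the sketch simply follows the flag it is given.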

retrieve_by_sql(query: str) → List[str]

Retrieve documents from the postgres database.

call(input: str | Sequence[str], top_k: int | None = None, **kwargs) → List[RetrieverOutput]