lancedb_retriver

Classes

LanceDBRetriever(embedder, dimensions[, ...])

LanceDBRetriever is a retriever that leverages LanceDB to efficiently store and query document embeddings.

class LanceDBRetriever(embedder: Embedder, dimensions: int, db_uri: str = '/tmp/lancedb', top_k: int = 5, overwrite: bool = True)[source]

Bases: Retriever[Any, str]

LanceDBRetriever is a retriever that leverages LanceDB to efficiently store and query document embeddings.

Parameters:
  • embedder (Embedder) – An instance of the Embedder class used for computing embeddings.

  • dimensions (int) – The dimensionality of the embeddings used.

  • db_uri (str) – The URI of the LanceDB storage (default is “/tmp/lancedb”).

  • top_k (int) – The number of top results to retrieve for a given query (default is 5).

  • overwrite (bool) – If True, the existing table is overwritten; otherwise, new documents are appended (default is True).

This retriever supports adding documents with their embeddings to a LanceDB storage and retrieving relevant documents based on a given query.

More information on LanceDB is available in the GitHub repository (https://github.com/lancedb/lancedb) and in the documentation (https://lancedb.github.io/lancedb/).
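As a rough picture of the overwrite parameter, the sketch below uses a plain dictionary in place of a LanceDB database. The names Store and open_table are hypothetical, and the point at which the real class applies the flag (construction time or write time) is an assumption here, not something this page states.

```python
from typing import Any, Dict, List

# Hypothetical in-memory stand-in for a database holding named tables.
Store = Dict[str, List[Dict[str, Any]]]


def open_table(store: Store, name: str, overwrite: bool) -> List[Dict[str, Any]]:
    """Return the table called `name`, clearing it first when overwrite is True."""
    if overwrite or name not in store:
        store[name] = []  # start from an empty table
    return store[name]  # otherwise existing rows are kept and new rows append


store: Store = {"documents": [{"content": "old row"}]}

# overwrite=False: the old row survives and the new row is appended.
appended = open_table(store, "documents", overwrite=False)
appended.append({"content": "new row"})
rows_after_append = len(store["documents"])

# overwrite=True: the table is emptied before the new row is written.
fresh = open_table(store, "documents", overwrite=True)
fresh.append({"content": "only row"})
rows_after_overwrite = len(store["documents"])
```

With the default overwrite=True, pointing a second retriever at the same db_uri would, under this reading, discard previously stored documents.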

add_documents(documents: Sequence[Dict[str, Any]])[source]

Adds documents to the LanceDB storage, computing their embeddings with the provided Embedder.

Parameters:
  • documents (Sequence[Dict[str, Any]]) – A sequence of documents, each with a ‘content’ field containing text.
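The expected input shape can be illustrated with a small stand-in that does not touch LanceDB. The toy_embed function is a hypothetical placeholder for the Embedder, and the row layout (a "vector" column next to "content") is an assumption made for illustration only.

```python
from typing import Any, Dict, List, Sequence


def toy_embed(text: str, dimensions: int) -> List[float]:
    """Hypothetical embedder: counts characters into `dimensions` buckets."""
    vector = [0.0] * dimensions
    for ch in text:
        vector[ord(ch) % dimensions] += 1.0
    return vector


def build_rows(documents: Sequence[Dict[str, Any]], dimensions: int) -> List[Dict[str, Any]]:
    """Turn documents into rows that pair each 'content' text with its vector."""
    rows = []
    for doc in documents:
        if "content" not in doc:
            raise KeyError("each document needs a 'content' field")
        rows.append(
            {"vector": toy_embed(doc["content"], dimensions), "content": doc["content"]}
        )
    return rows


# Extra keys such as 'id' are allowed by the Dict[str, Any] type; only
# 'content' is read in this sketch.
documents = [
    {"content": "LanceDB stores vectors on disk.", "id": 1},
    {"content": "Retrievers return the closest documents.", "id": 2},
]
rows = build_rows(documents, dimensions=4)
```

Every vector has exactly `dimensions` entries, which mirrors the role of the dimensions parameter passed to the constructor.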

retrieve(query: str | List[str], top_k: int | None = None) → List[RetrieverOutput][source]

Retrieve top-k documents from LanceDB for a given query or queries.

Parameters:
  • query (Union[str, List[str]]) – A query string or a list of query strings.

  • top_k (Optional[int]) – The number of top documents to retrieve (if not specified, defaults to the instance’s top_k).

Returns:

A list of RetrieverOutput containing the indices and scores of the retrieved documents.

Return type:

List[RetrieverOutput]
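The calling convention of retrieve can be sketched with an in-memory stand-in. SketchRetriever, SketchOutput, toy_embed, and the cosine scoring are all hypothetical. The field names doc_indices and doc_scores, and the choice of one output per query, are assumptions rather than the documented layout of RetrieverOutput.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence, Union


@dataclass
class SketchOutput:
    """Hypothetical stand-in for RetrieverOutput; field names are assumed."""
    query: str
    doc_indices: List[int]
    doc_scores: List[float]


def toy_embed(text: str, dimensions: int = 8) -> List[float]:
    """Hypothetical embedder: counts characters into fixed-size buckets."""
    vector = [0.0] * dimensions
    for ch in text.lower():
        vector[ord(ch) % dimensions] += 1.0
    return vector


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class SketchRetriever:
    def __init__(self, top_k: int = 5) -> None:
        self.top_k = top_k
        self.vectors: List[List[float]] = []

    def add_documents(self, documents: Sequence[Dict[str, Any]]) -> None:
        for doc in documents:
            self.vectors.append(toy_embed(doc["content"]))

    def retrieve(
        self, query: Union[str, List[str]], top_k: Optional[int] = None
    ) -> List[SketchOutput]:
        # A single string is treated as a one-element list of queries.
        queries = [query] if isinstance(query, str) else list(query)
        # Fall back to the instance-level top_k when none is passed.
        k = self.top_k if top_k is None else top_k
        outputs = []
        for q in queries:
            q_vec = toy_embed(q)
            scored = sorted(
                ((cosine(q_vec, v), i) for i, v in enumerate(self.vectors)),
                key=lambda pair: pair[0],
                reverse=True,
            )[:k]
            outputs.append(
                SketchOutput(q, [i for _, i in scored], [s for s, _ in scored])
            )
        return outputs


retriever = SketchRetriever(top_k=2)
retriever.add_documents([{"content": "aaaa"}, {"content": "bbbb"}, {"content": "cccc"}])
single = retriever.retrieve("aaaa")                   # uses the instance top_k of 2
batch = retriever.retrieve(["aaaa", "bbbb"], top_k=1)  # per-call override
```

In this sketch, the list returned for a single string has one element, and a list of queries yields one output per query in the same order.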