lancedb_retriver
Classes
LanceDBRetriever | A retriever that leverages LanceDB to efficiently store and query document embeddings.
- class LanceDBRetriever(embedder: Embedder, dimensions: int, db_uri: str = '/tmp/lancedb', top_k: int = 5, overwrite: bool = True)
Bases: Retriever[Any, str]
LanceDBRetriever is a retriever that leverages LanceDB to efficiently store and query document embeddings.
- Parameters:
embedder (Embedder) – An instance of the Embedder class used for computing embeddings.
dimensions (int) – The dimensionality of the embeddings used.
db_uri (str) – The URI of the LanceDB storage (default is “/tmp/lancedb”).
top_k (int) – The number of top results to retrieve for a given query (default is 5).
overwrite (bool) – If True, the existing table is overwritten; otherwise, new documents are appended (default is True).
This retriever supports adding documents with their embeddings to a LanceDB storage and retrieving relevant documents based on a given query.
More information on LanceDB is available in the repository (https://github.com/lancedb/lancedb) and in the documentation (https://lancedb.github.io/lancedb/).
- add_documents(documents: Sequence[Dict[str, Any]])
Adds documents and computes their embeddings using the provided Embedder.
- Parameters:
documents (Sequence[Dict[str, Any]]) – A sequence of documents, each with a ‘content’ field containing text.
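As a rough illustration of the input shape that add_documents expects, the sketch below builds a small document list in which every entry carries a text 'content' field. The helper `check_documents` is hypothetical (it is not part of the library), and the final call is left commented out because it assumes an already constructed `retriever` instance.

```python
from typing import Any, Dict, List, Sequence


def check_documents(documents: Sequence[Dict[str, Any]]) -> List[str]:
    """Hypothetical helper: return each document's text, or raise if 'content' is missing."""
    texts = []
    for i, doc in enumerate(documents):
        content = doc.get("content")
        if not isinstance(content, str):
            raise ValueError(f"document {i} has no text 'content' field")
        texts.append(content)
    return texts


# Each document is a dict with a 'content' field containing text.
documents = [
    {"content": "First example passage."},
    {"content": "Second example passage."},
]
texts = check_documents(documents)

# Assuming `retriever` is an existing LanceDBRetriever instance:
# retriever.add_documents(documents)
```

Checking the shape up front is only a convenience here; whether the retriever itself validates the field is not stated in this reference.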
- retrieve(query: str | List[str], top_k: int | None = None) → List[RetrieverOutput]
Retrieves the top-k documents from LanceDB for a given query or list of queries.
- Parameters:
query (Union[str, List[str]]) – A query string or a list of query strings.
top_k (Optional[int]) – The number of top documents to retrieve (if not specified, defaults to the instance’s top_k).
- Returns:
A list of RetrieverOutput containing the indices and scores of the retrieved documents.
- Return type:
List[RetrieverOutput]
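The calling convention described above (a single query or a list of queries, a per-call top_k that falls back to the instance value, and one output per query holding indices and scores) can be sketched with a toy in-memory stand-in. Everything in this block is an assumption made for illustration: `ToyRetriever`, `ToyOutput`, its field names, and `toy_embed` are hypothetical, nothing here touches LanceDB, and cosine similarity over letter counts is used only as a placeholder scoring method.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence, Union


@dataclass
class ToyOutput:
    """Hypothetical stand-in for RetrieverOutput: one result per query."""
    query: str
    doc_indices: List[int]
    doc_scores: List[float]


def toy_embed(text: str, dimensions: int = 26) -> List[float]:
    """Toy embedder (assumption): unit-normalized character-count vector."""
    vec = [0.0] * dimensions
    for ch in text.lower():
        if ch.isalnum():
            vec[ord(ch) % dimensions] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class ToyRetriever:
    """In-memory sketch of the documented interface; not backed by LanceDB."""

    def __init__(self, dimensions: int = 26, top_k: int = 5):
        self.dimensions = dimensions
        self.top_k = top_k
        self._vectors: List[List[float]] = []

    def add_documents(self, documents: Sequence[Dict[str, Any]]) -> None:
        # Embed the 'content' field of every document and keep the vector.
        for doc in documents:
            self._vectors.append(toy_embed(doc["content"], self.dimensions))

    def retrieve(
        self, query: Union[str, List[str]], top_k: Optional[int] = None
    ) -> List[ToyOutput]:
        # Normalize a single query string into a one-element list.
        queries = [query] if isinstance(query, str) else list(query)
        # Fall back to the instance-level top_k when none is given.
        k = self.top_k if top_k is None else top_k
        outputs = []
        for q in queries:
            q_vec = toy_embed(q, self.dimensions)
            scored = [
                (sum(a * b for a, b in zip(q_vec, vec)), idx)
                for idx, vec in enumerate(self._vectors)
            ]
            # Highest similarity first; ties broken by document index.
            scored.sort(key=lambda pair: (-pair[0], pair[1]))
            top = scored[:k]
            outputs.append(
                ToyOutput(
                    query=q,
                    doc_indices=[idx for _, idx in top],
                    doc_scores=[score for score, _ in top],
                )
            )
        return outputs


retriever = ToyRetriever(top_k=2)
retriever.add_documents(
    [
        {"content": "apple banana"},
        {"content": "zebra quartz"},
        {"content": "cherry pie"},
    ]
)
single = retriever.retrieve("apple banana")            # one query, instance top_k
batch = retriever.retrieve(["zebra", "pie"], top_k=1)  # list of queries, per-call top_k
```

In this sketch the returned list always has one entry per query, so a single string still yields a one-element list. The real class may order or scale its scores differently, since that depends on the distance metric configured in LanceDB.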