Skip to content

before_rabbithole_stores_documents

Intervene before the Rabbit Hole starts the ingestion pipeline.

Allows modifying how the list of Document is inserted in the vector memory.

For example, this hook is a good point to summarize the incoming documents and save both original and summarized contents. An official plugin is available to test this procedure.

📄 Arguments

Name Type Description
docs List[Document] List of chunked Langchain Documents before being inserted in memory.
cat StrayCat Cheshire Cat instance, allows you to use the framework components.

â†Šī¸ Return

Type: List[Document]

List of Langchain Documents that will be stored in vector memory.

✍ Example

from cat.mad_hatter.decorators import hook

@hook  # default priority = 1
def before_rabbithole_stores_documents(docs, cat):
    # summarize group of 5 documents and add them along original ones
    summaries = []
    for n, i in enumerate(range(0, len(docs), 5)):
        # Get the text from groups of docs and join to string
        group = docs[i: i + 5]
        group = list(map(lambda d: d.page_content, group))
        text_to_summarize = "\n".join(group)

        # Summarize and add metadata
        summary = cat.llm(f"Provide a concide summary of the following: {group}")
        summary = Document(page_content=summary)
        summary.metadata["is_summary"] = True
        summaries.append(summary)

    return docs.extend(summaries)