`before_rabbithole_stores_documents`

Intervene before the Rabbit Hole stores the chunked documents in vector memory.

This hook lets you modify the list of `Document` objects before it is inserted into the vector memory.

For example, this hook is a good point to summarize the incoming documents and save both original and summarized contents. An official plugin is available to test this procedure.

| Name | Type | Description |
|------|------|-------------|
| `docs` | `List[Document]` | List of chunked Langchain Documents before being inserted in memory. |
| `cat` | `Cat` | Cheshire Cat instance, allows you to use the framework components. |

Type: List[Document]

Return value: the list of LangChain Documents that will be stored in vector memory.

from cat.mad_hatter.decorators import hook
from langchain_core.documents import Document


@hook  # default priority = 1
def before_rabbithole_stores_documents(docs, cat):
    # Summarize the documents in groups of 5 and keep the summaries alongside the originals
    summaries = []
    for i in range(0, len(docs), 5):
        # Get the text from each group of docs and join it into a single string
        group = docs[i: i + 5]
        text_to_summarize = "\n".join(d.page_content for d in group)

        # Summarize the group and flag the result as a summary in its metadata
        summary = cat.llm(f"Provide a concise summary of the following: {text_to_summarize}")
        summary = Document(page_content=summary)
        summary.metadata["is_summary"] = True
        summaries.append(summary)

    # list.extend() returns None, so extend in place and return the list itself
    docs.extend(summaries)
    return docs
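
The grouping logic can be tried outside the framework. The following is a minimal sketch, not part of the Cheshire Cat API: `Doc` is a hypothetical stand-in for a LangChain `Document`, `add_group_summaries` is an illustrative helper name, and `fake_summarize` is a stub that takes the place of `cat.llm`.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document (not the real class)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def add_group_summaries(
    docs: List[Doc],
    summarize: Callable[[str], str],
    group_size: int = 5,
) -> List[Doc]:
    """Return the original docs followed by one summary doc per group."""
    summaries = []
    for start in range(0, len(docs), group_size):
        group = docs[start:start + group_size]
        text = "\n".join(d.page_content for d in group)
        summaries.append(
            Doc(page_content=summarize(text), metadata={"is_summary": True})
        )
    # Build a new list so the caller's list is left untouched
    return docs + summaries


def fake_summarize(text: str) -> str:
    """Stub standing in for cat.llm: it only reports how many chunks it saw."""
    return f"summary of {len(text.splitlines())} chunks"


chunks = [Doc(page_content=f"chunk {i}") for i in range(12)]
result = add_group_summaries(chunks, fake_summarize)
print(len(result))
print([d.page_content for d in result if d.metadata.get("is_summary")])
```

Passing the summarizer in as a function keeps the batching testable without a running Cat instance. Inside a real hook, a call to `cat.llm` would take the place of the stub.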