
`before_rabbithole_splits_text`

Intervene before the uploaded document is split into chunks.

Allows editing the main Document(s) of the uploaded file before the RabbitHole recursively splits them into shorter ones. Note that `docs` is a list because a parser can output one or more Documents, each of which is split afterwards.
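To make the ordering concrete, here is a simplified, self-contained model of that flow: a parser yields one or more Documents, the hook sees the whole list, and only then is each Document split. This is not the Cat's actual code. `Document` is a minimal stand-in for LangChain's class, and `split` and `ingest` are hypothetical helpers, with a toy fixed-size splitter in place of the RabbitHole's recursive splitter. The sketch assumes, as with LangChain splitters, that each chunk gets a copy of its parent's metadata.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document (page_content + metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def split(doc: Document, chunk_size: int) -> List[Document]:
    """Hypothetical toy splitter: fixed-size character chunks.

    Each chunk receives a copy of the parent Document's metadata.
    """
    text = doc.page_content
    return [
        Document(text[i:i + chunk_size], dict(doc.metadata))
        for i in range(0, len(text), chunk_size)
    ]


def ingest(parsed: List[Document], hook: Callable, cat=None,
           chunk_size: int = 10) -> List[Document]:
    """Hypothetical model of the order of operations."""
    # The hook receives the full list of parsed Documents before any splitting.
    docs = hook(parsed, cat)
    # Each Document returned by the hook is then split into shorter chunks.
    chunks: List[Document] = []
    for doc in docs:
        chunks.extend(split(doc, chunk_size))
    return chunks


def upper_hook(docs: List[Document], cat) -> List[Document]:
    # Illustrative hook body: upper-case the whole text of every Document.
    for doc in docs:
        doc.page_content = doc.page_content.upper()
    return docs


# A parser might return one Document per page of the uploaded file.
pages = [Document("first page text", {"page": 1}),
         Document("second page", {"page": 2})]
chunks = ingest(pages, upper_hook, chunk_size=10)
for chunk in chunks:
    print(repr(chunk.page_content), chunk.metadata)
```

Because the hook runs once on the complete list, it can see and edit all of a file's Documents together, whereas anything done after splitting would only see individual chunks.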

For instance, the hook lets you change the text or edit and add metadata.
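A minimal sketch of both kinds of edit, assuming the standard `page_content` and `metadata` attributes of a LangChain Document. `Document` here is a local stand-in so the snippet runs without LangChain or the Cat installed, and the `@hook` decorator is omitted for the same reason. The metadata keys `part` and `cleaned` are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# In a plugin this would be decorated with @hook
# (from cat.mad_hatter.decorators import hook).
def before_rabbithole_splits_text(docs: List[Document], cat) -> List[Document]:
    for i, doc in enumerate(docs):
        # Edit the text: collapse runs of whitespace into single spaces.
        doc.page_content = " ".join(doc.page_content.split())
        # Add metadata (hypothetical keys) alongside what the parser set.
        doc.metadata["part"] = i
        doc.metadata["cleaned"] = True
    return docs


docs = [Document("Some   text\nwith  odd   spacing", {"source": "notes.txt"})]
out = before_rabbithole_splits_text(docs, cat=None)
print(out[0].page_content)  # Some text with odd spacing
print(out[0].metadata)      # {'source': 'notes.txt', 'part': 0, 'cleaned': True}
```

Metadata set at this stage is attached before splitting, so if the splitter copies metadata onto each chunk (as LangChain splitters do), every chunk of the Document carries it.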

| Name | Type | Description |
|------|------|-------------|
| `docs` | `List[Document]` | LangChain Documents produced by parsing the file uploaded to the RabbitHole. |
| `cat` | `Cat` | Cheshire Cat instance, which gives access to the framework components. |

Example of `docs`:

`docs = [Document(page_content="This is a very long document before being split", metadata={})]`

Returns:

| Type | Description |
|------|-------------|
| `List[Document]` | The edited LangChain Documents. |
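Since the return value is the list that moves on to splitting, a hook body does not have to mutate Documents in place; it can also return a different list. The sketch below drops Documents with no text, assuming the RabbitHole splits whatever list the hook returns. `Document` is again a local stand-in for LangChain's class and the `@hook` decorator is omitted so the snippet runs on its own. Note that in Python a function that edits `docs` but has no `return` statement returns `None`, not the edited list.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# In a plugin this would be decorated with @hook.
def before_rabbithole_splits_text(docs: List[Document], cat) -> List[Document]:
    # Build and return a new list, keeping only Documents with non-whitespace text.
    return [doc for doc in docs if doc.page_content.strip()]


docs = [Document("hello"), Document("   \n"), Document("")]
kept = before_rabbithole_splits_text(docs, cat=None)
print([d.page_content for d in kept])  # ['hello']
```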

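```python
from cat.mad_hatter.decorators import hook

@hook  # default priority = 1
def before_rabbithole_splits_text(docs, cat):
    # Replace every occurrence of "dog" with "cat" in each parsed Document
    for doc in docs:
        doc.page_content = doc.page_content.replace("dog", "cat")
    # Return the edited list so the RabbitHole can split it
    return docs
```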
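One detail worth knowing about the example: `str.replace` is a case-sensitive substring replacement, so it also rewrites "dogs" and "dogma" while leaving "Dog" alone. The standalone sketch below runs the same logic on a stand-in `Document` (decorator omitted) and compares it with a hypothetical whole-word variant using `re.sub` with `\b` word boundaries.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Minimal stand-in for LangChain's Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def substring_hook(docs: List[Document], cat) -> List[Document]:
    # Same logic as the example hook: plain, case-sensitive substring replace.
    for doc in docs:
        doc.page_content = doc.page_content.replace("dog", "cat")
    return docs


def whole_word_hook(docs: List[Document], cat) -> List[Document]:
    # Hypothetical variant: replace only the standalone lowercase word "dog".
    for doc in docs:
        doc.page_content = re.sub(r"\bdog\b", "cat", doc.page_content)
    return docs


text = "My dog chases a Dog. Hot dogs and dogma."
substring_result = substring_hook([Document(text)], None)[0].page_content
whole_word_result = whole_word_hook([Document(text)], None)[0].page_content
print(substring_result)   # My cat chases a Dog. Hot cats and catma.
print(whole_word_result)  # My cat chases a Dog. Hot dogs and dogma.
```

Which behavior is right depends on the plugin; since the hook sees the full text before splitting, a regex here also cannot miss a match that a chunk boundary would otherwise cut in half.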