`before_rabbithole_splits_text`
Intervene before the uploaded document is split into chunks.
Lets you edit the Document(s) parsed from the uploaded file before the RabbitHole recursively splits them into shorter ones.
Note that `docs` is a list because a parser can output one or more Documents, which are split afterward.
For instance, the hook lets you change the text or edit/add metadata.
📄 Arguments
| Name | Type | Description |
|---|---|---|
| `docs` | `List[Document]` | Langchain Documents resulting from parsing the file uploaded to the RabbitHole. |
| `cat` | `Cat` | Cheshire Cat instance, allows you to use the framework components. |

`docs` example:

    docs = [Document(page_content="This is a very long document before being split", metadata={})]

↩️ Return
Type: List[Document]
Edited Langchain Documents.
✍ Example

    from cat.mad_hatter.decorators import hook

    @hook  # default priority = 1
    def before_rabbithole_splits_text(docs, cat):
        for doc in docs:
            doc.page_content = doc.page_content.replace("dog", "cat")
        return docs
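
The example above changes the text. The other use mentioned earlier, editing or adding metadata, could look roughly like the sketch below. The helper `annotate_docs`, the metadata keys `edited_by_hook` and `char_count`, and the `FakeDocument` stand-in are all hypothetical names introduced for illustration; they are not part of the framework. The fallback `hook` decorator exists only so the snippet can run outside a Cheshire Cat plugin.

```python
from dataclasses import dataclass, field

try:
    from cat.mad_hatter.decorators import hook
except ImportError:
    # Assumption: outside a Cheshire Cat plugin the framework is unavailable,
    # so a no-op decorator stands in for the real one.
    def hook(func):
        return func


@dataclass
class FakeDocument:
    """Hypothetical stand-in for a Langchain Document, for local testing only."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def annotate_docs(docs):
    """Add illustrative metadata entries to every document in the list."""
    for doc in docs:
        # Both keys are made up for this sketch; use whatever your plugin needs.
        doc.metadata["edited_by_hook"] = True
        doc.metadata["char_count"] = len(doc.page_content)
    return docs


@hook  # default priority = 1
def before_rabbithole_splits_text(docs, cat):
    # Delegate to a plain function so the logic can be tested without the framework.
    return annotate_docs(docs)
```

Keeping the logic in a plain helper means it can be exercised with simple objects that expose `page_content` and `metadata`, without starting the Cat.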