rabbithole
Hooks to modify the RabbitHole's document ingestion.
Here is a collection of methods to hook into the RabbitHole execution pipeline.
These hooks let you intercept uploaded documents at different stages as they are ingested into memory.
after_rabbithole_splitted_text(chunks, cat)
Hook the `Document`s after they are split.
Allows editing the list of `Document`s right after the RabbitHole has chunked them into smaller ones.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `chunks` | `List[Document]` | List of Langchain `Document` chunks. | required |
| `cat` | `CheshireCat` | Cheshire Cat instance. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `chunks` | `List[Document]` | List of modified Langchain `Document` chunks to be stored in the declarative memory. |
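
A minimal sketch of how a plugin might use this hook to drop chunks that are too short to be useful. The `MIN_CHARS` threshold is a hypothetical value chosen for illustration. The import paths in the `try` branch are assumptions about a Cheshire Cat plugin environment, and the `except` branch only defines stand-ins so the snippet also runs outside the Cat.

```python
from dataclasses import dataclass, field

try:
    # Assumed import paths when running inside a Cheshire Cat plugin.
    from cat.mad_hatter.decorators import hook
    from langchain.docstore.document import Document
except ImportError:
    # Stand-ins so the sketch runs outside the Cat.
    def hook(func=None, **_kwargs):
        return func if callable(func) else (lambda f: f)

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

MIN_CHARS = 20  # hypothetical threshold, tune it for your documents


@hook
def after_rabbithole_splitted_text(chunks, cat):
    """Drop chunks whose text is shorter than MIN_CHARS characters."""
    return [c for c in chunks if len(c.page_content.strip()) >= MIN_CHARS]
```

The hook returns a new list, so the chunks that were filtered out never reach the later ingestion steps.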
Source code in cat/mad_hatter/core_plugin/hooks/rabbithole.py
after_rabbithole_stored_documents(source, stored_points, cat)
Hook the `Document`s after they are inserted in the vector memory.
Allows editing and enhancing the list of `Document`s after they are inserted in the vector memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `str` | Name of the ingested file/url. | required |
| `stored_points` | `List[PointStruct]` | List of Qdrant `PointStruct` just inserted into the db. | required |
| `cat` | `CheshireCat` | Cheshire Cat instance. | required |
Returns:
| Type | Description |
|---|---|
| `None` | |
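
As a sketch of what this hook could be used for, the snippet below keeps a running tally of how many points were stored per source. The `POINTS_PER_SOURCE` counter is a hypothetical module-level variable introduced only for this example. The `hook` import path is an assumption about the plugin environment, with a stand-in defined when it is not available.

```python
from collections import Counter

try:
    # Assumed import path when running inside a Cheshire Cat plugin.
    from cat.mad_hatter.decorators import hook
except ImportError:
    # Stand-in so the sketch runs outside the Cat.
    def hook(func=None, **_kwargs):
        return func if callable(func) else (lambda f: f)

# Hypothetical in-memory tally: source name -> number of stored points.
POINTS_PER_SOURCE = Counter()


@hook
def after_rabbithole_stored_documents(source, stored_points, cat):
    """Record how many points were stored for each ingested source."""
    POINTS_PER_SOURCE[source] += len(stored_points)
    # Nothing is returned: this hook only observes what was stored.
```

Because the hook returns `None`, it suits side effects such as logging or notifications rather than changes to the stored data.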
Source code in cat/mad_hatter/core_plugin/hooks/rabbithole.py
before_rabbithole_insert_memory(doc, cat)
Hook the `Document` before it is inserted in the vector memory.
Allows editing and enhancing a single `Document` before the RabbitHole adds it to the declarative vector memory.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `Document` | Langchain `Document` to be inserted in memory. | required |
| `cat` | `CheshireCat` | Cheshire Cat instance. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `doc` | `Document` | Langchain `Document` that is added to the declarative vector memory. |
Notes
The `Document` has two properties:
- `page_content`: the string with the text to save in memory;
- `metadata`: a dictionary with at least two keys:
    - `source`: where the text comes from;
    - `when`: timestamp to track when it's been uploaded.
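
The two properties described above can be edited in place before the `Document` is returned. The following is a minimal sketch, not the core implementation: it collapses runs of whitespace in `page_content` and adds a `char_count` metadata key, which is a hypothetical name used only here. The import paths are assumptions about the plugin environment, with stand-ins for running the snippet elsewhere.

```python
from dataclasses import dataclass, field

try:
    # Assumed import paths when running inside a Cheshire Cat plugin.
    from cat.mad_hatter.decorators import hook
    from langchain.docstore.document import Document
except ImportError:
    # Stand-ins so the sketch runs outside the Cat.
    def hook(func=None, **_kwargs):
        return func if callable(func) else (lambda f: f)

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


@hook
def before_rabbithole_insert_memory(doc, cat):
    """Normalize whitespace and annotate the document before storage."""
    # Collapse any run of spaces, tabs or newlines into a single space.
    doc.page_content = " ".join(doc.page_content.split())
    # "char_count" is a hypothetical extra key; "source" and "when" are untouched.
    doc.metadata["char_count"] = len(doc.page_content)
    return doc
```

Existing metadata keys such as `source` and `when` are left as they are, so anything that relies on them keeps working.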
Source code in cat/mad_hatter/core_plugin/hooks/rabbithole.py
before_rabbithole_splits_text(docs, cat)
Hook the `Document`s before they are split into chunks.
Allows editing the main `Document`(s) of the uploaded file before the RabbitHole recursively splits them into shorter ones. Note that this is a list because parsers can output one or more `Document`s, which are split afterwards.
For instance, the hook lets you change the text or edit/add metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `docs` | `List[Document]` | Langchain `Document`s to edit. | required |
| `cat` | `CheshireCat` | Cheshire Cat instance. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `docs` | `List[Document]` | Edited Langchain `Document`s. |
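
As an illustration of changing the text before splitting, the sketch below strips footer lines of the form "Page 3 of 10" from every parsed `Document`. The footer pattern is a hypothetical example of boilerplate, not something the RabbitHole produces itself. The import paths are assumptions about the plugin environment, with stand-ins for running the snippet elsewhere.

```python
import re
from dataclasses import dataclass, field

try:
    # Assumed import paths when running inside a Cheshire Cat plugin.
    from cat.mad_hatter.decorators import hook
    from langchain.docstore.document import Document
except ImportError:
    # Stand-ins so the sketch runs outside the Cat.
    def hook(func=None, **_kwargs):
        return func if callable(func) else (lambda f: f)

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)

# Hypothetical boilerplate: a whole line such as "Page 3 of 10".
PAGE_FOOTER = re.compile(r"^[ \t]*Page \d+ of \d+[ \t]*\n?", re.MULTILINE)


@hook
def before_rabbithole_splits_text(docs, cat):
    """Remove page footer lines from each document before it is split."""
    for doc in docs:
        doc.page_content = PAGE_FOOTER.sub("", doc.page_content)
    return docs
```

Cleaning the text at this stage means the boilerplate never ends up inside any chunk.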
Source code in cat/mad_hatter/core_plugin/hooks/rabbithole.py
before_rabbithole_stores_documents(docs, cat)
Hook into the memory insertion pipeline.
Allows modifying how the list of `Document`s is inserted in the vector memory.
For example, this hook is a good point to summarize the incoming documents and save both original and summarized contents. An official plugin is available to test this procedure.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `docs` | `List[Document]` | List of Langchain `Document`s to edit. | required |
| `cat` | `CheshireCat` | Cheshire Cat instance. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `docs` | `List[Document]` | List of edited Langchain documents. |
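
The idea of saving both original and summarized contents could look roughly like the sketch below. It is not the official plugin: `summarize` is a hypothetical placeholder that simply truncates the text where a real plugin would call a language model, and the `is_summary` metadata key is also an assumption. The import paths are assumptions about the plugin environment, with stand-ins for running the snippet elsewhere.

```python
from dataclasses import dataclass, field

try:
    # Assumed import paths when running inside a Cheshire Cat plugin.
    from cat.mad_hatter.decorators import hook
    from langchain.docstore.document import Document
except ImportError:
    # Stand-ins so the sketch runs outside the Cat.
    def hook(func=None, **_kwargs):
        return func if callable(func) else (lambda f: f)

    @dataclass
    class Document:
        page_content: str
        metadata: dict = field(default_factory=dict)


def summarize(text, max_chars=80):
    """Hypothetical placeholder: truncation standing in for a real summary."""
    return text[:max_chars]


@hook
def before_rabbithole_stores_documents(docs, cat):
    """Return the original documents followed by one summary per document."""
    summaries = [
        Document(
            page_content=summarize(d.page_content),
            # Copy the original metadata and flag the copy as a summary.
            metadata={**d.metadata, "is_summary": True},
        )
        for d in docs
    ]
    return docs + summaries
```

Flagging the summaries in their metadata keeps them distinguishable from the original chunks later on.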
Source code in cat/mad_hatter/core_plugin/hooks/rabbithole.py
rabbithole_instantiates_parsers(file_handlers, cat)
Hook the available parsers for ingesting files in the declarative memory.
Allows replacing or extending existing supported mime types and related parsers to customize the file ingestion.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `file_handlers` | `dict` | Keys are the supported mime types and values are the related parsers. | required |
| `cat` | `CheshireCat` | Cheshire Cat instance. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `file_handlers` | `dict` | Edited dictionary of supported mime types and related parsers. |
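
One way to extend the supported mime types is to point a new type at a parser that is already registered. The sketch below does this for `text/x-log`, an illustrative mime type chosen for this example. It assumes `text/plain` may be among the default keys and does nothing when it is not. The `hook` import path is an assumption about the plugin environment, with a stand-in defined when it is not available.

```python
try:
    # Assumed import path when running inside a Cheshire Cat plugin.
    from cat.mad_hatter.decorators import hook
except ImportError:
    # Stand-in so the sketch runs outside the Cat.
    def hook(func=None, **_kwargs):
        return func if callable(func) else (lambda f: f)

# Hypothetical aliases: new mime type -> already supported mime type.
ALIASES = {"text/x-log": "text/plain"}


@hook
def rabbithole_instantiates_parsers(file_handlers, cat):
    """Reuse existing parsers for additional mime types."""
    for new_type, known_type in ALIASES.items():
        if known_type in file_handlers:
            # setdefault leaves any parser already registered for new_type alone.
            file_handlers.setdefault(new_type, file_handlers[known_type])
    return file_handlers
```

To replace a parser instead of extending the mapping, assign a different parser instance to an existing key before returning the dictionary.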
Source code in cat/mad_hatter/core_plugin/hooks/rabbithole.py
rabbithole_instantiates_splitter(text_splitter, cat)
Hook the splitter used to split text into chunks.
Allows replacing the default text splitter to customize the splitting process.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `text_splitter` | `TextSplitter` | The text splitter used by default. | required |
| `cat` | `CheshireCat` | Cheshire Cat instance. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `text_splitter` | `TextSplitter` | An instance of a `TextSplitter` subclass. |
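
A minimal sketch of replacing the default splitter with a differently configured one. It assumes Langchain's `RecursiveCharacterTextSplitter` is importable from `langchain.text_splitter` in the plugin environment. The chunk sizes are hypothetical values, and the stand-in class in the `except` branch exists only so the snippet runs where the Cat or Langchain is not installed.

```python
try:
    # Assumed import paths when running inside a Cheshire Cat plugin.
    from cat.mad_hatter.decorators import hook
    from langchain.text_splitter import RecursiveCharacterTextSplitter
except ImportError:
    # Stand-ins so the sketch runs outside the Cat.
    def hook(func=None, **_kwargs):
        return func if callable(func) else (lambda f: f)

    class RecursiveCharacterTextSplitter:
        def __init__(self, chunk_size, chunk_overlap):
            self.chunk_size = chunk_size
            self.chunk_overlap = chunk_overlap

CHUNK_SIZE = 256    # hypothetical value
CHUNK_OVERLAP = 64  # hypothetical value, must stay below CHUNK_SIZE


@hook
def rabbithole_instantiates_splitter(text_splitter, cat):
    """Ignore the default splitter and return a custom one."""
    return RecursiveCharacterTextSplitter(
        chunk_size=CHUNK_SIZE,
        chunk_overlap=CHUNK_OVERLAP,
    )
```

Smaller chunks tend to give more precise retrieval at the cost of less context per chunk, so the right values depend on the documents being ingested.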