Optimize document parsing for Banzai ReferenceCache
Problem
Banzai processing relies on a chain of filters. Each filter receives the output of the previous filter, then processes and returns the modified content.
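The chaining described above can be illustrated with a minimal sketch. The filter classes and the run_pipeline helper are hypothetical stand-ins invented for illustration, not the actual Banzai classes or API; the sketch only shows each filter receiving the previous filter's output.

```ruby
# Hypothetical stand-ins for Banzai filters; these are not the real classes.
class UpcaseFilter
  def call(doc)
    doc.upcase
  end
end

class ExclaimFilter
  def call(doc)
    "#{doc}!"
  end
end

# Run the filters in order, feeding each one the previous filter's output.
def run_pipeline(filters, input)
  filters.reduce(input) { |doc, filter| filter.call(doc) }
end

result = run_pipeline([UpcaseFilter.new, ExclaimFilter.new], "hello")
```

Here result ends up as "HELLO!", because ExclaimFilter sees the already-upcased output of UpcaseFilter.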
We have multiple ReferenceFilters that include a ReferenceCache. The cache builds a collection of parent references by parsing the markdown document. This parsing can be time- and memory-intensive for large documents.
The biggest problem is that we repeatedly parse the document for each ReferenceFilter. That multiplies the overhead.
Proposal
- Avoid unnecessary document parsing calls
- Parse the document once and allow reference filters to access it
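One way the proposal could look is sketched below. This is an assumption-laden illustration, not the actual implementation: ParsedDocument, SketchReferenceFilter, and the regex-based parse are all hypothetical names and behavior chosen to show a single memoized parse shared by several filters.

```ruby
# Hypothetical sketch: none of these names are the real Banzai API.
class ParsedDocument
  attr_reader :parse_count

  def initialize(source)
    @source = source
    @parse_count = 0
  end

  # Parse lazily and memoize, so repeated calls reuse the first result.
  def references
    @references ||= parse
  end

  private

  # Stand-in for the expensive parse: collect tokens such as "#12" or "!34".
  def parse
    @parse_count += 1
    @source.scan(/[#!]\d+/)
  end
end

class SketchReferenceFilter
  def initialize(prefix)
    @prefix = prefix
  end

  # Each filter reads from the shared parsed document instead of re-parsing.
  def call(parsed)
    parsed.references.select { |ref| ref.start_with?(@prefix) }
  end
end

parsed = ParsedDocument.new("Fixes #12 and #7, see !34")
issues = SketchReferenceFilter.new("#").call(parsed)
merge_requests = SketchReferenceFilter.new("!").call(parsed)
```

In this sketch both filters share one ParsedDocument, so parsed.parse_count stays at 1 no matter how many filters consult it. Whether the shared object would live in the pipeline context or elsewhere is left open here.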
Edited by Sean Carroll