Plugins
Docling can be extended with third-party plugins, which add to the options available at several steps of the pipeline.
Plugins are loaded via the pluggy system, which lets third-party developers register new capabilities through a setuptools entry point.
The actual entry point definition varies depending on the packaging system you are using. Here are a few examples:
# pyproject.toml
[project.entry-points."docling"]
your_plugin_name = "your_package.module"
# pyproject.toml (Poetry)
[tool.poetry.plugins."docling"]
your_plugin_name = "your_package.module"
# setup.cfg
[options.entry_points]
docling =
    your_plugin_name = your_package.module
# setup.py
from setuptools import setup

setup(
    # ...,
    entry_points={
        "docling": [
            # name = module, with no inner quotes around the module path
            "your_plugin_name = your_package.module"
        ]
    }
)
your_plugin_name is the name you choose for your plugin. It must be unique across the broader Docling ecosystem.
your_package.module is the reference to the module in your package which is responsible for the plugin registration.
Plugin factories
OCR factory
The OCR factory lets plugins provide additional OCR engines to Docling users.
The content of your_package.module registers the OCR engines with code similar to:
# Factory registration
def ocr_engines():
    return {
        "ocr_engines": [
            YourOcrModel,
        ]
    }
where YourOcrModel must implement the BaseOcrModel and provide an options class derived from OcrOptions.
If you are looking for an example, the default Docling plugins are a good starting point.
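The shape of this registration can be tried out without Docling installed. The sketch below uses stand-in classes (StubOcrOptions and StubOcrModel) in place of the real OcrOptions and BaseOcrModel. It assumes that engines are looked up by a name carried on their options class, called kind here. The collect_engines helper and the options_type attribute are hypothetical and only illustrate how a host might index what ocr_engines() returns. It is not Docling's actual implementation.

```python
from dataclasses import dataclass
from typing import ClassVar, Dict, List, Type


# Stand-ins for illustration only: a real plugin would subclass Docling's
# OcrOptions and BaseOcrModel instead of these hypothetical classes.
@dataclass
class StubOcrOptions:
    kind: ClassVar[str] = "stub"  # assumed lookup name for the engine
    lang: List[str]


class StubOcrModel:
    options_type: ClassVar[Type[StubOcrOptions]] = StubOcrOptions

    def __init__(self, options: StubOcrOptions) -> None:
        self.options = options


@dataclass
class YourOcrOptions(StubOcrOptions):
    kind: ClassVar[str] = "your_ocr"


class YourOcrModel(StubOcrModel):
    options_type = YourOcrOptions


def ocr_engines() -> Dict[str, list]:
    """Factory registration, same shape as in the text above."""
    return {"ocr_engines": [YourOcrModel]}


def collect_engines(hook_results: List[Dict[str, list]]) -> Dict[str, type]:
    """Hypothetical host side: index engine classes by their options kind."""
    registry: Dict[str, type] = {}
    for result in hook_results:
        for engine_cls in result["ocr_engines"]:
            kind = engine_cls.options_type.kind
            if kind in registry:
                # Two plugins claiming the same name cannot both be selected.
                raise ValueError(f"duplicate OCR engine kind: {kind}")
            registry[kind] = engine_cls
    return registry


registry = collect_engines([ocr_engines()])
engine = registry["your_ocr"](YourOcrOptions(lang=["en"]))
```

Rejecting duplicate names in the collector mirrors the requirement that plugin names be unique. Whether Docling enforces this in the same way is an assumption.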
Third-party plugins
When a plugin is provided by a third-party package rather than by the main docling package, it has to be enabled explicitly via the allow_external_plugins option.
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

pipeline_options = PdfPipelineOptions()
pipeline_options.allow_external_plugins = True  # <-- enable the external plugins
pipeline_options.ocr_options = YourOptions()  # <-- an instance of your plugin's options class

doc_converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(
            pipeline_options=pipeline_options
        )
    }
)
Using the docling CLI
Similarly, when using the docling CLI, users have to enable external plugins before selecting the new one.
# Show the external plugins
docling --show-external-plugins
# Run docling with the new plugin
docling --allow-external-plugins --ocr-engine=NAME