fix: switch to pdftohtml for pdf to html conversions (#998)

* fix: switch to pdftohtml for pdf to html conversions

* build: include poppler-utils in dockerfile for pdftohtml
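
For reviewers unfamiliar with the tool, the following is a minimal sketch of how a `pdftohtml` call from poppler-utils could be assembled and run. It is not the code from this commit: the helper names `build_pdftohtml_command` and `convert`, and the choice of the `-s`, `-noframes`, and `-c` flags, are assumptions made for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_pdftohtml_command(pdf_path, out_dir, single_document=True):
    """Build an argument list for poppler's pdftohtml.

    Hypothetical helper for illustration; the flag selection is an
    assumption, not what this commit necessarily uses.
    """
    pdf = Path(pdf_path)
    cmd = ["pdftohtml"]
    if single_document:
        # -s: one HTML document for all pages; -noframes: no frameset page
        cmd += ["-s", "-noframes"]
    # -c: "complex" output mode
    cmd.append("-c")
    # pdftohtml takes the input PDF followed by an output base name
    cmd += [str(pdf), str(Path(out_dir) / pdf.stem)]
    return cmd


def convert(pdf_path, out_dir):
    """Run pdftohtml, failing early if poppler-utils is not installed."""
    if shutil.which("pdftohtml") is None:
        raise RuntimeError("pdftohtml not found; install poppler-utils")
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_pdftohtml_command(pdf_path, out_dir), check=True)
```

Keeping the command construction separate from the subprocess call lets the argument list be checked without the binary present, which is why the image needs `poppler-utils` only at run time.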
Eric 2024-03-29 17:02:33 -04:00 committed by GitHub
parent 27bbf7a513
commit dfb8c64f5a
GPG key ID: B5690EEEBB952194
37 changed files with 101 additions and 58 deletions


@@ -36,6 +36,8 @@ RUN echo "@testing https://dl-cdn.alpinelinux.org/alpine/edge/main" | tee -a /et
shadow \
# Doc conversion
libreoffice@testing \
+ # pdftohtml
+ poppler-utils \
# OCR MY PDF (unpaper for deskew and other advanced features)
ocrmypdf \
tesseract-ocr-data-eng \