From 50b921f318fa985b92bc97443658c0a26b9b795b Mon Sep 17 00:00:00 2001
From: trytomakeyouprivate <113100745+trytomakeyouprivate@users.noreply.github.com>
Date: Sun, 14 May 2023 17:11:04 +0000
Subject: [PATCH 1/3] Update LocalRunGuide.md

Added Fedora installation translations. Not sure about zlib-devel. Why are pip packages installed through apt on Ubuntu? Where is the reasoning?
---
 LocalRunGuide.md | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/LocalRunGuide.md b/LocalRunGuide.md
index 50bb6e9d..49617a91 100644
--- a/LocalRunGuide.md
+++ b/LocalRunGuide.md
@@ -43,6 +43,12 @@ sudo apt-get update
 sudo apt-get install -y git automake autoconf libtool libleptonica-dev pkg-config zlib1g-dev make g++ java-17-openjdk python3 python3-pip
 ```

+For Fedora-based systems use this command:
+
+```bash
+sudo dnf install -y git automake autoconf libtool leptonica-devel pkg-config zlib-devel make gcc-c++ java-17-openjdk python3 python3-pip
+```
+
 ### Step 2: Clone and Build jbig2enc (Only required for certain OCR functionality)

 ```bash
@@ -88,6 +94,13 @@ sudo apt-get install -y libreoffice-core libreoffice-common libreoffice-writer l
 pip3 install opencv-python-headless
 ```

+For Fedora:
+
+```bash
+sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf
+pip3 install uno opencv-python-headless unoconv pngquant
+```
+
 ### Step 4: Clone and Build Stirling-PDF

 ```bash

From d6deb52731c9ade339764a3a1107aaf0f0ba1580 Mon Sep 17 00:00:00 2001
From: trytomakeyouprivate <113100745+trytomakeyouprivate@users.noreply.github.com>
Date: Sun, 14 May 2023 17:56:50 +0000
Subject: [PATCH 2/3] Update LocalRunGuide.md

Added tesseract-osd, necessary in Fedora
---
 LocalRunGuide.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/LocalRunGuide.md b/LocalRunGuide.md
index 49617a91..9b9648c2 100644
--- a/LocalRunGuide.md
+++ b/LocalRunGuide.md
@@ -97,7 +97,7 @@ pip3 install opencv-python-headless
 For Fedora:

 ```bash
-sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf
+sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf tesseract-osd
 pip3 install uno opencv-python-headless unoconv pngquant
 ```


From 42cc0312004822e39067f47a9af7151b47bb4392 Mon Sep 17 00:00:00 2001
From: trytomakeyouprivate <113100745+trytomakeyouprivate@users.noreply.github.com>
Date: Sun, 14 May 2023 18:32:17 +0000
Subject: [PATCH 3/3] Update LocalRunGuide.md

Fixed one file that was not executable with chmod +x. Also added language-pack installation and viewing. Manually adding langpacks could also be useful, but I don't see the reason yet.
---
 LocalRunGuide.md | 52 ++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 42 insertions(+), 10 deletions(-)

diff --git a/LocalRunGuide.md b/LocalRunGuide.md
index 9b9648c2..b39b40da 100644
--- a/LocalRunGuide.md
+++ b/LocalRunGuide.md
@@ -52,11 +52,11 @@ sudo dnf install -y git automake autoconf libtool leptonica-devel pkg-config zli
 ### Step 2: Clone and Build jbig2enc (Only required for certain OCR functionality)

 ```bash
-git clone https:github.com/agl/jbig2enc
-cd jbig2enc
-./autogen.sh
-./configure
-make
+git clone https://github.com/agl/jbig2enc.git &&\
+cd jbig2enc &&\
+./autogen.sh &&\
+./configure &&\
+make &&\
 sudo make install
 ```

@@ -97,15 +97,16 @@ pip3 install opencv-python-headless
 For Fedora:

 ```bash
-sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf tesseract-osd
+sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf
 pip3 install uno opencv-python-headless unoconv pngquant
 ```

 ### Step 4: Clone and Build Stirling-PDF

 ```bash
-git clone https://github.com/Frooodle/Stirling-PDF.git
-cd Stirling-PDF
+git clone https://github.com/Frooodle/Stirling-PDF.git &&\
+cd Stirling-PDF &&\
+chmod +x ./gradlew &&\
 ./gradlew build
 ```

@@ -117,18 +118,49 @@ You can move this file to a desired location, for example, `/opt/Stirling-PDF/`.
 You must also move the Script folder within the Stirling-PDF repo that you have downloaded to this directory.
 This folder is required for the python scripts using OpenCV

+```bash
+sudo mkdir /opt/Stirling-PDF &&\
+sudo mv /build/libs/S-PDF-*.jar /opt/Stirling-PDF/ &&\
+sudo mv scripts /opt/Stirling-PDF/ &&\
+echo "Scripts installed."
+```
 ### Step 6: Other files

 #### OCR

-If you plan to use the OCR (Optical Character Recognition) functionality, you might need to install language packs for Tesseract if running none english scanning.
+If you plan to use the OCR (Optical Character Recognition) functionality, you might need to install language packs for Tesseract if running non-english scanning.

 ##### Installing Language Packs
-1. Download the desired language pack(s) by selecting the `.traineddata` file(s) for the language(s) you need.
+1. Download the desired language pack(s) by selecting the `.traineddata` file(s) for the language(s) you need. You can also use your repositories provided langpacks.
 2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tesseract-ocr/4.00/tessdata`

 Please view [OCRmyPDF install guide](https://ocrmypdf.readthedocs.io/en/latest/installation.html) for more info.
 **IMPORTANT:** DO NOT REMOVE EXISTING `eng.traineddata`, IT'S REQUIRED.
+Debian based systems, install languages with this command:
+```bash
+sudo apt update &&\
+# All languages
+# sudo apt install -y 'tesseract-ocr-*'
+
+# Find languages:
+apt search tesseract-ocr-
+
+# View installed languages:
+dpkg-query -W tesseract-ocr- | sed 's/tesseract-ocr-//g'
+```
+
+Fedora:
+
+```bash
+# All languages
+# sudo dnf install -y tesseract-langpack-*
+
+# Find languages:
+dnf search -C tesseract-langpack-
+
+# View installed languages:
+rpm -qa | grep tesseract-langpack | sed 's/tesseract-langpack-//g'
+```

 ### Step 7: Run Stirling-PDF