Merge branch 'main' into main

Anthony Stirling 2023-05-14 20:40:45 +01:00 committed by GitHub
commit b1f8324c21
GPG key ID: 4AEE18F83AFDEB23


@@ -43,14 +43,20 @@ sudo apt-get update
sudo apt-get install -y git automake autoconf libtool libleptonica-dev pkg-config zlib1g-dev make g++ java-17-openjdk python3 python3-pip
```
For Fedora-based systems, use this command:
```bash
sudo dnf install -y git automake autoconf libtool leptonica-devel pkg-config zlib-devel make gcc-c++ java-17-openjdk python3 python3-pip
```
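Because Step 1 has one command for Debian-based systems and another for Fedora-based ones, a script that automates it needs to know which package manager is present. The following is a minimal sketch of that check; the function name `detect_pkg_manager` is hypothetical and not part of Stirling-PDF.

```bash
# Print the package manager this guide's commands assume for the current
# system: "apt-get", "dnf", or "unknown" when neither is on PATH.
detect_pkg_manager() {
    if command -v apt-get >/dev/null 2>&1; then
        echo "apt-get"
    elif command -v dnf >/dev/null 2>&1; then
        echo "dnf"
    else
        echo "unknown"
    fi
}

# Example: report the result before choosing an install command.
echo "Detected package manager: $(detect_pkg_manager)"
```

The sketch only reports what it finds; it installs nothing, so it is safe to run before deciding which command from this step to use.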
### Step 2: Clone and Build jbig2enc (Only required for certain OCR functionality)
```bash
git clone https://github.com/agl/jbig2enc.git &&\
cd jbig2enc &&\
./autogen.sh &&\
./configure &&\
make &&\
sudo make install
```
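After `sudo make install`, it can be useful to confirm that the encoder is actually on the `PATH`. The sketch below assumes the installed binary is named `jbig2`; the helper `have_command` is a hypothetical name used only for illustration.

```bash
# Return 0 if the named command is on PATH, 1 otherwise.
have_command() {
    command -v "$1" >/dev/null 2>&1
}

# Assumption: jbig2enc installs its encoder under the name "jbig2".
if have_command jbig2; then
    echo "jbig2 found at: $(command -v jbig2)"
else
    echo "jbig2 not found; re-check the build output from Step 2." >&2
fi
```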
@@ -88,11 +94,19 @@ sudo apt-get install -y libreoffice-core libreoffice-common libreoffice-writer l
pip3 install opencv-python-headless
```
For Fedora:
```bash
sudo dnf install -y libreoffice-writer libreoffice-calc libreoffice-impress unpaper ocrmypdf
pip3 install uno opencv-python-headless unoconv pngquant
```
### Step 4: Clone and Build Stirling-PDF
```bash
git clone https://github.com/Frooodle/Stirling-PDF.git &&\
cd Stirling-PDF &&\
chmod +x ./gradlew &&\
./gradlew build
```
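If the Gradle build fails early, one thing worth checking is which Java version is active. The sketch below assumes the build expects the Java 17 toolchain installed in Step 1; `java_major` is a hypothetical helper that extracts the major version from the first line of `java -version` output.

```bash
# Print the Java major version found in a `java -version` banner line.
# Handles modern strings ("17.0.7") and legacy ones ("1.8.0_372").
# Returns 1 if no quoted version string is present.
java_major() {
    jm_ver=$(printf '%s\n' "$1" | sed -n 's/.*version "\([^"]*\)".*/\1/p')
    case "$jm_ver" in
        "")  return 1 ;;
        1.*) jm_ver=${jm_ver#1.}
             printf '%s\n' "${jm_ver%%.*}" ;;
        *)   printf '%s\n' "${jm_ver%%[.+-]*}" ;;
    esac
}

# Example (run manually; requires java on PATH):
#   banner=$(java -version 2>&1 | head -n 1)
#   [ "$(java_major "$banner")" -ge 17 ] || echo "Java 17 or newer expected" >&2
```

`java -version` writes to standard error, which is why the example redirects `2>&1` before reading the first line.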
@@ -104,18 +118,49 @@ You can move this file to a desired location, for example, `/opt/Stirling-PDF/`.
You must also move the `scripts` folder from the downloaded Stirling-PDF repository to this directory.
This folder is required for the Python scripts that use OpenCV.
```bash
# Run from the root of the cloned Stirling-PDF repository
sudo mkdir -p /opt/Stirling-PDF &&\
sudo mv ./build/libs/S-PDF-*.jar /opt/Stirling-PDF/ &&\
sudo mv scripts /opt/Stirling-PDF/ &&\
echo "Scripts installed."
```
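The same step can be wrapped in a function that checks for the jar before touching the destination, which makes it safer to re-run. This is a sketch under stated assumptions: `install_stirling` is a hypothetical name, it copies rather than moves so the build tree stays intact, and it expects the `S-PDF-*.jar` and `scripts` layout used in the commands above.

```bash
# Copy the built jar(s) and the scripts folder from a repository checkout
# into a destination directory. Safe to re-run; fails if no jar is found.
install_stirling() {
    repo_dir=$1
    dest_dir=$2
    # Expand the jar glob; if nothing matches, "$1" stays a literal pattern.
    set -- "$repo_dir"/build/libs/S-PDF-*.jar
    if [ ! -e "$1" ]; then
        echo "No S-PDF jar in $repo_dir/build/libs; run ./gradlew build first." >&2
        return 1
    fi
    mkdir -p "$dest_dir" &&
    cp "$@" "$dest_dir"/ &&
    cp -R "$repo_dir/scripts" "$dest_dir"/ &&
    echo "Scripts installed."
}

# Example (run as a user with write access to the destination):
#   install_stirling "$HOME/Stirling-PDF" /opt/Stirling-PDF
```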
### Step 6: Other files
#### OCR
If you plan to use the OCR (Optical Character Recognition) functionality, you may need to install Tesseract language packs to scan non-English documents.
##### Installing Language Packs
1. Download the desired language pack(s) by selecting the `.traineddata` file(s) for the language(s) you need. Alternatively, install the language packs provided by your distribution's repositories.
2. Place the `.traineddata` files in the Tesseract tessdata directory: `/usr/share/tesseract-ocr/4.00/tessdata`
See the [OCRmyPDF install guide](https://ocrmypdf.readthedocs.io/en/latest/installation.html) for more info.
**IMPORTANT:** DO NOT REMOVE EXISTING `eng.traineddata`, IT'S REQUIRED.
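After copying language files into the tessdata directory, a quick check can confirm that `eng.traineddata` is still present and show which languages are available. The helpers below are a sketch with hypothetical names (`has_english_data`, `list_tessdata_langs`); pass them whichever tessdata directory your installation uses.

```bash
# Return 0 if the given tessdata directory still contains eng.traineddata.
has_english_data() {
    [ -f "$1/eng.traineddata" ]
}

# Print the language codes found in a tessdata directory, one per line.
list_tessdata_langs() {
    for ltl_file in "$1"/*.traineddata; do
        # Skip the unexpanded pattern when the directory has no matches.
        [ -e "$ltl_file" ] || continue
        basename "$ltl_file" .traineddata
    done
}

# Example (run manually against the directory named in step 2):
#   has_english_data /usr/share/tesseract-ocr/4.00/tessdata || echo "eng.traineddata is missing" >&2
#   list_tessdata_langs /usr/share/tesseract-ocr/4.00/tessdata
```

`tesseract --list-langs` gives a similar listing from Tesseract's own point of view, which helps when the files are in place but not being picked up.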
On Debian-based systems, install language packs with these commands:
```bash
sudo apt update
# All languages
# sudo apt install -y 'tesseract-ocr-*'
# Find languages:
apt search tesseract-ocr-
# View installed languages:
dpkg-query -W 'tesseract-ocr-*' | sed 's/tesseract-ocr-//g'
```
On Fedora-based systems:
```bash
# All languages
# sudo dnf install -y 'tesseract-langpack-*'
# Find languages:
dnf search -C tesseract-langpack-
# View installed languages:
rpm -qa | grep tesseract-langpack | sed 's/tesseract-langpack-//g'
```
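The two distributions use different package name prefixes for the same language data (`tesseract-ocr-` on Debian-based systems, `tesseract-langpack-` on Fedora). A small helper can build the package name from a language code; `lang_package` is a hypothetical name, and this sketch simply appends the code to the prefix.

```bash
# Print the Tesseract language package name for a package manager and a
# language code, using the prefixes shown in the commands above.
lang_package() {
    case "$1" in
        apt|apt-get) echo "tesseract-ocr-$2" ;;
        dnf)         echo "tesseract-langpack-$2" ;;
        *)           echo "unsupported package manager: $1" >&2
                     return 1 ;;
    esac
}

# Example: print the German language pack name for each distribution family.
lang_package apt deu
lang_package dnf deu
```

Language codes that contain an underscore may be spelled differently in package names, so confirm the exact name with the search commands above before installing.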
### Step 7: Run Stirling-PDF