Set up self-hosted Document Processing
To set up self-hosted Document Processing, you need to:
- The trial is 40 days, and is readily available via the PaperCut MF Application Server under Options > Capture > Document Processing > Use Self-Hosted Document Processing (requires additional setup).
- After the self-hosted Document Processing trial period is finished, the solution requires the On-prem OCR & Document Processing Pack. For more information, contact your local Authorized Solution Center or reseller.
- The self-hosted Document Processing solution is available only for Windows.
Step 1: Determine where to install Document Processing
For smaller environments, it makes sense to install Document Processing alongside the Application Server. In medium to large environments, though, you can ensure optimum system and Application Server performance by setting up one or more dedicated Document Processing servers that the Application Server can contact.
See the table below for recommendations.
Environment size | Approx. scan jobs per day | Recommended processors* | Recommended installation location | Benefits |
---|---|---|---|---|
Small | 0 – 50 | 2 | Application Server | |
Medium | 50 – 200 | 3 | Start on a well-resourced Application Server. Monitor and plan for a separate server on an as-needed basis. | |
Large | 200+ | 4+ | One or more separate high-performing Document Processing servers | |
*Recommended available processors to use (to support parallel jobs).
Keep in mind that the more storage and processing power available, the better Document Processing performs—make as much available as you can. For any environment size, we recommend:
- at least 10 GB available disk space
- 512 MB available memory
- running a 64-bit edition of Microsoft Windows.
For information about:
- supported Windows versions, see System Requirements
- performance tuning of a standalone or co-located installation, see the Tuning Document Processing server performance section below.
Step 2: Install Document Processing
- Download and install both of the following:
- On the Document Processing server, run the file. The Setup Wizard is displayed.
- Follow the prompts during the install.
  - If you intend to scan documents to PDF, ensure that the GhostTrap component is selected for installation.
  - If you intend to scan to DOCX, ensure that the Pandoc component is selected for installation.

  On Windows servers, the installer configures the Windows Firewall.
- If you are using a firewall other than Windows Firewall, open port 9181 (inbound) to allow connections from the PaperCut MF Application Server.
- Repeat the process for each Document Processing server you wish to add.
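Before moving on to Step 3, it can be useful to confirm that port 9181 on each Document Processing server is reachable from the Application Server. The following is a minimal sketch of such a check, not a PaperCut tool: it assumes Python 3 is available on the Application Server, and the function name `can_connect` is hypothetical. It only tests that a TCP connection can be opened; it does not verify that Document Processing itself is responding correctly.

```python
import socket

# Port number taken from the firewall step above.
DOCUMENT_PROCESSING_PORT = 9181


def can_connect(host: str, port: int = DOCUMENT_PROCESSING_PORT,
                timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        # create_connection resolves the hostname and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, calling `can_connect("docproc01.example.com")` (a placeholder hostname) from the Application Server returns `False` if a firewall is blocking the inbound port or if the hostname cannot be resolved.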
Step 3: Configure the host location and available languages
1. In the PaperCut MF Admin web interface, do one of the following:
   - If you’re already on the Capture page, refresh the page.
   - Click Options > Capture. The Capture page is displayed.
2. In the Hosting area, select Use self-hosted Document Processing (requires additional setup).
3. In the Add Document Processing Server area, in Hostname, type the hostname or the IP address of the server where you installed Document Processing.
   NOTE: We recommend that you use the server hostname. Only use the server IP address if it’s static.
4. Click Add.
5. If you want to set up multiple Document Processing servers, click Add new Document Processing Server; then repeat steps 3 and 4.
   Each Document Processing server is listed on the Capture tab.
6. Click Apply.
7. Ensure that your scan actions have been configured with the desired Document Processing options enabled.
8. Run a test job for each configured Document Processing option and check the output files.
Step 4: Tuning Document Processing server performance
The approach to tuning a Document Processing server's performance depends on whether it's on a standalone system or co-located with other services.
By default, a Document Processing server processes two jobs in parallel, and they are processed with a normal CPU priority. As described below, you can change the default number of parallel jobs by modifying the configuration file at [ocr-server-path]/data/config/config.toml.
After making changes to the config file, you’ll need to restart the Windows service: PaperCut OCR Server.
Tuning for installation on a standalone system
For best performance when installing the Document Processing server on a standalone system, it's a good idea to maximize the number of jobs that can be processed in parallel.
The ideal number to use depends on many factors, such as the type and size of the documents being processed and the system architecture. A reasonable starting point is to use the total number of virtual CPUs (or cores times threads on a “bare metal” system) minus two.
Put another way, if you want to process four jobs in parallel and you're installing Document Processing on a virtual machine, give it six virtual CPUs.
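The "minus two" guideline can be written out as a short calculation. This is only an illustrative sketch in Python: the function names are hypothetical, and the floor of one job on very small systems is an assumption added here, not something the guideline states.

```python
import os
from typing import Optional

# The "minus two" from the guideline above.
RESERVED_CPUS = 2


def suggested_max_jobs(logical_cpus: Optional[int] = None) -> int:
    """Suggested starting value for MaxJobsInParallel: logical CPUs minus two.

    If no count is given, the count reported by the operating system is used.
    The minimum of 1 is an assumption made for this sketch.
    """
    if logical_cpus is None:
        logical_cpus = os.cpu_count() or 1
    return max(logical_cpus - RESERVED_CPUS, 1)


def cpus_needed(parallel_jobs: int) -> int:
    """Inverse of the guideline: CPUs to allocate for a target number of jobs."""
    return parallel_jobs + RESERVED_CPUS
```

With these definitions, `suggested_max_jobs(6)` gives 4 and `cpus_needed(4)` gives 6, matching the virtual machine example above. Treat the result as a starting point and adjust it after observing real workloads.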
To make this change:
- In the config.toml file, remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.
- Set the MaxJobsInParallel line to MaxJobsInParallel = 4
- Restart the Windows service: PaperCut OCR Server
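If you maintain several Document Processing servers, the config.toml edit described above could be scripted. The sketch below is one possible approach in Python, not a PaperCut utility. It assumes the option appears on a single line of the form `# MaxJobsInParallel = <number>` (commented or not); the exact layout of the shipped file is not confirmed here, so review the result before restarting the service. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches a MaxJobsInParallel assignment line, with or without a leading "#".
# Assumption: the option sits on one line and is followed by "=".
_OPTION_LINE = re.compile(
    r"^[ \t]*#?[ \t]*MaxJobsInParallel[ \t]*=.*$", re.MULTILINE
)


def set_max_jobs(config_text: str, jobs: int) -> str:
    """Return config_text with the MaxJobsInParallel line uncommented and set to `jobs`."""
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    new_text, count = _OPTION_LINE.subn(
        f"MaxJobsInParallel = {jobs}", config_text, count=1
    )
    if count == 0:
        raise ValueError("no MaxJobsInParallel line found")
    return new_text


def update_config_file(path: Path, jobs: int) -> None:
    """Rewrite the config file in place, keeping a .bak copy of the original."""
    original = path.read_text(encoding="utf-8")
    path.with_name(path.name + ".bak").write_text(original, encoding="utf-8")
    path.write_text(set_max_jobs(original, jobs), encoding="utf-8")
```

Pass `update_config_file` the full path to config.toml under your own [ocr-server-path] and the number of parallel jobs you want. The PaperCut OCR Server service still needs to be restarted afterwards for the change to take effect.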
Tuning for co-location with the Application Server
For medium to large environments we do not recommend this approach; see the table above. Document Processing’s heavy resource requirements can interfere with the normal operation of the Application Server.
If your system has additional available processors (beyond what the Application Server is using), you might want to consider increasing the number of jobs that are processed in parallel from the default of two.
To make this change:
- In the config.toml file, remove the # at the start of the MaxJobsInParallel line to uncomment the option and make it active.
- Set the MaxJobsInParallel line to MaxJobsInParallel = 3
- Restart the Windows service: PaperCut OCR Server