Why Compare Patterns in Documents?

The PDF Pattern Matcher is especially useful for document reconciliation. According to the University of Washington, "Reconciliation is the process of comparing transactions and activity to supporting documentation. Further, reconciliation involves resolving any discrepancies that may have been discovered." In other words, this tool compares patterns in one stack of documents to those in another, highlighting any discrepancies.

Use Cases

The following are common use cases for document reconciliation.

  • Invoices and bank statements: Imagine you have a folder of 50 PDF invoices and a PDF bank export. You can use a regex for invoice IDs, such as 6-digit numbers, to see which invoices have been paid and which are missing from the bank record.
  • Inventory reconciliation: Matching serial numbers from a warehouse report against PDFs containing the original purchases.
  • Employee audit: Ensuring that every employee ID listed in the monthly payroll PDF exists in the roster of active employees. This helps catch ghost employees or data entry errors.
  • Legal work: Lawyers often need to track specific references across large document sets.
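
To make the invoice example concrete, the following minimal Python sketch shows the general idea of pattern-based reconciliation. It is an illustration only, not the tool's actual implementation: it assumes the text has already been extracted from the PDFs, and the names INVOICE_ID, extract_ids, and reconcile, as well as the sample strings, are hypothetical.

```python
import re

# Assumed pattern: a standalone 6-digit invoice ID (hypothetical example).
INVOICE_ID = re.compile(r"\b\d{6}\b")


def extract_ids(texts):
    """Collect every pattern match found across a stack of document texts."""
    found = set()
    for text in texts:
        found.update(INVOICE_ID.findall(text))
    return found


def reconcile(reference_texts, comparison_texts):
    """Compare the matches of stack A (reference) with stack B (comparison)."""
    reference = extract_ids(reference_texts)
    comparison = extract_ids(comparison_texts)
    return {
        "matched": sorted(reference & comparison),
        "missing_in_comparison": sorted(reference - comparison),
        "missing_in_reference": sorted(comparison - reference),
    }


# Made-up sample data: two invoices and one bank export line.
invoices = ["Invoice 100234 for ACME", "Invoice 100235 for Globex"]
bank_export = ["2024-01-05 payment ref 100234 EUR 120.00"]

result = reconcile(invoices, bank_export)
print(result)
```

In this sketch, an ID that appears among the invoices but not in the bank export ends up under "missing_in_comparison", which corresponds to an invoice that has no matching bank record.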

Using the PDF Pattern Matcher

The tool provided on this site can be used to address the use cases mentioned above and any others related to document reconciliation.

  1. Start by selecting one of the provided regex patterns or entering your own. The tool will search your documents for this pattern. Check out Mozilla's Developer Resources to find out more about regular expressions.
  2. Next, upload your reference and comparison documents. If you can't classify your documents as either "reference" or "comparison," think of it as comparing document stack "A" with stack "B." You can upload as many documents as you like.
  3. If you have uploaded scanned documents from which you cannot select text, choose the OCR (Optical Character Recognition) mode. In this mode, an algorithm converts the uploaded content into machine-readable text, which makes pattern matching possible. Beware: OCR is slow, so use it only when necessary. See this IBM article for more details.
  4. Finally, press the "Compare" button and wait for the results.
  5. All done! Now you can download a CSV export of the results or view the table and look for mismatches in your documents.
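
If you want to post-process results yourself, the following Python sketch shows one way a mismatch table could be written as CSV. The column names (match, in_reference, in_comparison, status) and the function results_to_csv are assumptions made for illustration; the tool's actual export format may differ.

```python
import csv
import io


def results_to_csv(reference_ids, comparison_ids):
    """Build CSV text with one row per match and a mismatch flag.

    The column layout is hypothetical and chosen only for this example.
    """
    fieldnames = ["match", "in_reference", "in_comparison", "status"]
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for value in sorted(reference_ids | comparison_ids):
        in_ref = value in reference_ids
        in_cmp = value in comparison_ids
        writer.writerow({
            "match": value,
            "in_reference": in_ref,
            "in_comparison": in_cmp,
            "status": "ok" if in_ref and in_cmp else "mismatch",
        })
    return buffer.getvalue()


# Made-up sample data: one ID is present in both stacks, one only in the reference.
csv_text = results_to_csv({"100234", "100235"}, {"100234"})
print(csv_text)
```

Filtering such a file for rows whose status is "mismatch" gives the list of values that appear in only one of the two document stacks.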

What's Next

Thank you for using the voidrab PDF Pattern Matcher! If you have suggestions for improvements or would like to integrate this tool into your corporate environment, please feel free to contact us at contact@thomas-karner.com.
