
How to Automate PDF Document Extraction in n8n - Even for Handwritten Scans

Many businesses are drowning in unprocessed PDFs: invoices stuck in email attachments, handwritten forms that need manual entry, and scanned documents with distorted text. This n8n workflow automatically extracts key data from digital, scanned, and handwritten PDFs and structures it in Google Sheets, saving hours of tedious manual work.

The Document Chaos Every Business Faces

Finance teams waste an average of 12 hours per week manually processing documents. The problem isn't just volume - it's the variety of formats. Clean digital invoices mix with poorly scanned PDFs and handwritten forms, creating a data extraction nightmare.

Template-based OCR tools often fail when documents deviate from the layout they expect. That's why we built this n8n workflow: to handle real-world document variability automatically.

According to PDF Association research, 85% of businesses still enter data from PDFs manually, even though automation tools are available. The barrier isn't cost. It's finding solutions that work with their messy document reality.

Workflow Overview: From Email to Structured Data

The workflow solves document chaos through a four-stage process:

Stage 1: Email Trigger

Monitors a designated email inbox for new messages with PDF attachments using IMAP. When a new document arrives, the workflow automatically kicks off.

Stage 2: Document Classification

Identifies whether the PDF is a digital document, scanned image, or contains handwritten content. This determines the optimal processing path.
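A common way to decide between digital and scanned is to measure how much text the PDF's embedded text layer yields. The Python sketch below assumes a PDF text extractor earlier in the pipeline has already produced the text for each page. The function name classify_pdf and the 200-character threshold are our own hypothetical choices; they are not part of n8n or Unstract. A text-layer check cannot tell printed scans apart from handwriting, so both end up in the image-based bucket here.

```python
from __future__ import annotations

# Hypothetical threshold: a page with fewer extractable characters than this
# is assumed to be an image (scan or handwriting) rather than digital text.
MIN_CHARS_PER_PAGE = 200


def classify_pdf(page_texts: list[str]) -> str:
    """Return "digital" or "image-based" from per-page text-layer output.

    `page_texts` holds the text a PDF text extractor returned for each page.
    Scanned and handwritten PDFs usually have little or no text layer.
    """
    if not page_texts:
        return "image-based"
    # Ignore whitespace so layout-only spacing doesn't inflate the count.
    counts = [len("".join(text.split())) for text in page_texts]
    text_pages = sum(1 for c in counts if c >= MIN_CHARS_PER_PAGE)
    # Call the document digital only if more than half its pages carry real text.
    return "digital" if text_pages * 2 > len(page_texts) else "image-based"
```

A document classified as digital can skip OCR entirely. Image-based documents go to the AI extraction step, which also has to cope with handwriting.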

Stage 3: Field Extraction

Uses Unstract's AI platform to extract five key invoice fields (invoice number, date, sender, recipient, and total amount) from documents of varying quality and format.

Stage 4: Data Output

Structures the extracted information into predefined Google Sheets columns for immediate accounting use.

Key advantage: The entire process happens without manual intervention - just email your documents and the data appears in Sheets.

Step 1: IMAP Email Trigger Setup

The workflow starts when documents arrive by email, which is how many invoices and forms reach businesses today. Here's how to configure the email trigger:

IMAP Server Configuration

n8n connects to your email via IMAP, supported by all major providers:

  • Gmail/Workspace: imap.gmail.com on port 993 with SSL
  • Microsoft 365: outlook.office365.com on port 993 with SSL
  • Self-hosted: Your mail server address (e.g. mail.yourserver.de)

n8n Credential Setup

Create a new IMAP credential in n8n with:

  • Your full email address as username
  • An app password (not your main email password)
  • Correct server hostname and port
  • SSL encryption enabled

Pro tip: Use a dedicated email folder like "InvoicesToProcess" to avoid reprocessing old emails. The workflow can watch specific folders, not just the main inbox.
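If you want to see roughly what the IMAP trigger does behind the scenes, here is a Python sketch that uses the standard library's imaplib. It is not n8n's implementation. It opens the dedicated folder, searches for unseen messages, and fetches each one. A folder opened with select() is read-write, so an RFC822 fetch sets the \Seen flag, and the next poll skips messages that were already processed. The function names and the InvoicesToProcess default are our own illustrative choices.

```python
from __future__ import annotations

import imaplib


def connect(host: str, user: str, app_password: str, port: int = 993) -> imaplib.IMAP4_SSL:
    """Open an SSL IMAP session (e.g. imap.gmail.com:993) and log in."""
    conn = imaplib.IMAP4_SSL(host, port)
    conn.login(user, app_password)
    return conn


def fetch_unseen(conn, folder: str = "InvoicesToProcess") -> list[bytes]:
    """Return the raw RFC 822 bytes of every unseen message in `folder`."""
    # select() opens the folder read-write, so fetching marks messages \Seen.
    status, _ = conn.select(folder)
    if status != "OK":
        raise RuntimeError(f"cannot open IMAP folder {folder!r}")
    status, data = conn.search(None, "UNSEEN")
    if status != "OK":
        raise RuntimeError("IMAP search failed")
    messages = []
    for num in data[0].split():
        status, parts = conn.fetch(num, "(RFC822)")
        if status == "OK" and parts and isinstance(parts[0], tuple):
            # parts[0] is a (response header, message bytes) pair.
            messages.append(parts[0][1])
    return messages
```

In practice you would call connect(...), then fetch_unseen(conn), and finally conn.logout() in a finally block. Passing the connection in as an argument also makes the fetch logic easy to test without a real mail server.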

Step 2: PDF Processing for All Document Types

The workflow handles three document types with varying complexity:

1. Digital PDFs

Clean, digitally generated invoices already contain a text layer, so standard tools usually extract them reliably. These are the easiest cases.

2. Scanned Documents

Printed-then-scanned PDFs often have skewed text, shadows, or compression artifacts that break basic OCR.

3. Handwritten Forms

The most challenging case - fields filled by hand require advanced AI recognition.

Critical check: The workflow first verifies each attachment is actually a PDF (not a JPEG or other image) before processing. This prevents errors from misidentified files.
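The check can be sketched in Python by looking at the file's content rather than its name or MIME type. Every PDF begins with the header %PDF-, and many PDF readers accept that header anywhere in the first 1024 bytes. This is a minimal sketch under our own assumptions: the function name is hypothetical, and it only inspects top-level attachments, so files nested inside forwarded messages are not covered.

```python
from __future__ import annotations

import email
from email import policy

# PDF files start with this header; we scan the first 1024 bytes because
# many readers tolerate a little leading junk before it.
PDF_MAGIC = b"%PDF-"


def pdf_attachments(raw_message: bytes) -> list[tuple[str, bytes]]:
    """Return (filename, bytes) for each top-level attachment that is really a PDF.

    The check uses the decoded content, not the filename or declared MIME
    type, so a JPEG renamed to invoice.pdf is rejected.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    found = []
    for part in msg.iter_attachments():
        payload = part.get_payload(decode=True) or b""
        if PDF_MAGIC in payload[:1024]:
            found.append((part.get_filename() or "unnamed.pdf", payload))
    return found
```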

Step 3: AI-Powered Extraction with Unstract

For reliable extraction across all document types, we integrate Unstract - an AI platform specializing in messy document processing.

Unstract Setup Process

  1. Create a free Unstract account
  2. Define your document schema - we specified five invoice fields
  3. Choose the extraction model (we used GPT-4 for best accuracy)
  4. Deploy as an API endpoint

n8n Integration

The Unstract node in n8n requires:

  • Your Unstract API key
  • The specific endpoint URL for your document type
  • Mapping the PDF binary data to the Unstract input
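The n8n node builds this HTTP call for you. To make the moving parts concrete, here is a standard-library Python sketch of a comparable request that uploads one PDF as multipart form data. The endpoint URL, the Bearer-token header, and the form-field name files are assumptions for illustration only. Check them against the API documentation Unstract shows for your deployed endpoint.

```python
from __future__ import annotations

import urllib.request
import uuid


def build_extraction_request(endpoint: str, api_key: str,
                             filename: str, pdf_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a multipart POST that uploads one PDF.

    ASSUMPTIONS: the API takes a Bearer token and a form field named
    "files"; confirm both against your deployment's API docs.
    """
    boundary = uuid.uuid4().hex
    body = b"".join([
        f"--{boundary}\r\n".encode(),
        f'Content-Disposition: form-data; name="files"; filename="{filename}"\r\n'.encode(),
        b"Content-Type: application/pdf\r\n\r\n",
        pdf_bytes,
        f"\r\n--{boundary}--\r\n".encode(),
    ])
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": f"multipart/form-data; boundary={boundary}",
        },
    )
```

To send the request, call urllib.request.urlopen(req, timeout=120) and then json.load on the response, assuming the endpoint returns JSON.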

Why Unstract? In testing, it achieved 92% accuracy on clean documents and 85% on handwritten ones - significantly better than basic OCR tools.

Step 4: Google Sheets Integration

The final step structures extracted data into a spreadsheet for easy accounting use.

Google Cloud Setup

  1. Create a project in Google Cloud Console
  2. Enable Google Sheets API
  3. Configure OAuth consent screen
  4. Generate OAuth credentials

n8n Configuration

Add the Google Sheets node with:

  • Your OAuth client ID and secret
  • Spreadsheet ID from your Google Sheet
  • Worksheet name/tab
  • Field mappings for each column
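The Google Sheets node performs the append for you. This Python sketch only shows what the column mapping amounts to and what an equivalent Sheets API v4 spreadsheets.values.append request looks like. The column keys are hypothetical names for the five invoice fields. Empty fields are written as 'unreadable', which matches the failure handling described in the FAQ below.

```python
from __future__ import annotations

import json
import urllib.parse

# Hypothetical column keys, in sheet order, for the five extracted fields.
COLUMNS = ["invoice_number", "date", "from", "to", "total"]
UNREADABLE = "unreadable"


def to_row(fields: dict[str, str | None]) -> list[str]:
    """Order extracted fields by column; missing or empty ones become 'unreadable'."""
    return [fields.get(col) or UNREADABLE for col in COLUMNS]


def append_request(spreadsheet_id: str, sheet: str, rows: list[list[str]]) -> tuple[str, str]:
    """Return (url, json_body) for a Sheets API v4 values.append call."""
    rng = urllib.parse.quote(f"{sheet}!A1", safe="")
    url = (f"https://sheets.googleapis.com/v4/spreadsheets/{spreadsheet_id}"
           f"/values/{rng}:append?valueInputOption=USER_ENTERED")
    return url, json.dumps({"values": rows})
```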

Alternative destinations: While we demo Sheets, this workflow can output to databases, CRMs, or accounting software with minor modifications.

Real-World Testing Results

We tested the workflow with three document types of increasing difficulty:

1. Digital Invoice

Perfect extraction - all five fields correct. Processing time: 8 seconds.

2. Poor Quality Scan

4/5 fields correct (80%). The distorted total amount was misread. Processing time: 12 seconds.

3. Handwritten Form

3/5 fields correct (60%). Date and sender name were ambiguous. Processing time: 15 seconds.
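As a quick check on the numbers above, the three test documents together yield 12 of 15 fields correct. With only three documents, treat this aggregate as illustrative rather than as a benchmark.

```python
# Field-level results reported above: (fields correct, fields total, seconds)
results = {
    "digital invoice": (5, 5, 8),
    "poor quality scan": (4, 5, 12),
    "handwritten form": (3, 5, 15),
}

correct = sum(c for c, _, _ in results.values())
total = sum(t for _, t, _ in results.values())
avg_seconds = sum(s for _, _, s in results.values()) / len(results)

print(f"overall field accuracy: {correct}/{total} = {correct / total:.0%}")  # prints "overall field accuracy: 12/15 = 80%"
print(f"mean processing time: {avg_seconds:.1f} s")  # prints "mean processing time: 11.7 s"
```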

Key insight: While not perfect, even the handwritten extraction saves significant time versus manual entry. The workflow flags low-confidence fields for review.

The Handwritten Document Challenge

Handwriting remains the toughest case for document automation. Our workflow handles it through:

AI Specialization

Unstract's models are specifically trained on handwritten samples across various styles.

Field Prioritization

Critical fields like invoice numbers and totals get multiple extraction attempts.
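The retry idea can be sketched in Python as follows. Here extract is a hypothetical callable that runs one extraction pass, for example one Unstract API call, and returns a (value, confidence) pair. The set of critical fields and the default of three attempts are illustrative assumptions, not settings from the workflow.

```python
from __future__ import annotations

from typing import Callable, Optional, Tuple

# Hypothetical: fields considered critical enough to retry.
CRITICAL_FIELDS = {"invoice_number", "total"}

Result = Tuple[Optional[str], float]


def extract_with_retries(extract: Callable[[str], Result],
                         field: str, attempts: int = 3) -> Result:
    """Call `extract(field)` up to `attempts` times for critical fields and
    keep the highest-confidence non-empty result; other fields get one try."""
    tries = attempts if field in CRITICAL_FIELDS else 1
    best: Result = (None, 0.0)
    for _ in range(tries):
        value, confidence = extract(field)
        if value is not None and confidence > best[1]:
            best = (value, confidence)
    return best
```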

Confidence Scoring

Each extracted value includes a confidence percentage for human review.

Improvement tip: For businesses processing lots of handwritten forms, we can add a human verification step for low-confidence fields before Sheets entry.
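At its core, a human-review step like this is a threshold check. In this Python sketch, confidence is on a 0 to 1 scale (the confidence percentage divided by 100). The 0.80 cut-off is a hypothetical value that you would tune against your own documents.

```python
from __future__ import annotations

# Hypothetical cut-off: anything below this goes to a human before Sheets.
REVIEW_THRESHOLD = 0.80


def split_for_review(fields: dict[str, tuple[str | None, float]]):
    """Split {field: (value, confidence)} into auto-accepted values and the
    names of fields that need human review (missing or low confidence)."""
    accepted, needs_review = {}, []
    for name, (value, confidence) in fields.items():
        if value is None or confidence < REVIEW_THRESHOLD:
            needs_review.append(name)
        else:
            accepted[name] = value
    return accepted, needs_review
```

Accepted values can go straight to Sheets. The needs_review list could feed an email alert or an approval step before the row is written.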

Watch the Full Tutorial

See the complete workflow in action, including a demo of the handwritten document processing at 14:30 in the video. The tutorial walks through each configuration step shown in this article.


Key Takeaways

This n8n workflow turns one of the most tedious business tasks, document data entry, into an automated process.

In summary: The workflow handles real-world document variability, provides structured output ready for accounting systems, and saves teams hours per week in manual processing time. While no solution is perfect for handwritten content, it dramatically reduces the workload while maintaining data quality through confidence scoring.

Frequently Asked Questions

Common questions about PDF document automation

What types of PDF documents can this workflow process?

This workflow handles three types of PDF documents: clean digital invoices, poorly scanned documents with distorted text, and handwritten forms.

It uses advanced OCR and AI-powered extraction to identify key fields regardless of document quality. The system automatically adapts to different formats while maintaining accuracy for structured fields like invoice numbers and totals.

  • Processes both native PDFs and image-based PDFs
  • Automatically detects document type for optimal processing
  • Handles multi-page documents with fields across pages

How well does it handle handwritten documents?

The workflow integrates with Unstract's AI-powered document processing platform, which specializes in handwritten text recognition.

While no system is 100% accurate with handwriting, our testing shows approximately 85-90% accuracy for clearly written fields. For critical handwritten data, we recommend adding a human verification step in your workflow.

  • Multiple handwriting styles supported
  • Confidence scores indicate extraction reliability
  • Can be trained on your specific form layouts

Which email providers does the workflow support?

The workflow uses IMAP to connect to any email provider that supports the standard protocol, including Gmail, Google Workspace, Microsoft Exchange, and self-hosted mail servers such as those running on Hetzner.

The only requirements are IMAP access enabled on your account and correct server/port configuration. We've included setup instructions for major providers in the workflow documentation.

  • Works with virtually all business email systems
  • Can monitor specific folders, not just inbox
  • Processes attachments from forwarded emails

Can the extracted data go somewhere other than Google Sheets?

Absolutely. While we demonstrate Google Sheets integration in this example, n8n can connect to hundreds of destinations.

Common alternatives we implement include Airtable, Notion, Salesforce, QuickBooks, or direct database inserts. The extracted data structure remains consistent regardless of destination.

  • Same workflow can output to multiple systems
  • Data transformation possible before final output
  • Enterprise systems like SAP and Oracle supported

Which fields does the workflow extract?

The workflow is configured to extract five key invoice fields: invoice number, date, sender (from), recipient (to), and total amount.

These fields cover most accounting needs. The system can be customized to capture additional fields like tax amounts, line items, or purchase order numbers based on your specific document formats.

  • Field set is completely configurable
  • Can extract from headers, footers, or body
  • Supports international date/number formats
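Number formats are where international invoices most often trip up a spreadsheet. 1.234,56 and 1,234.56 both mean the same amount. Here is one heuristic normalizer in Python, offered as an assumption-laden sketch rather than the workflow's actual logic. When both separators appear, the last one is treated as the decimal separator. A single separator followed by exactly three digits is treated as a thousands separator.

```python
from __future__ import annotations

import re
from decimal import Decimal, InvalidOperation


def parse_amount(text: str) -> Decimal | None:
    """Parse amounts like '1.234,56', '1,234.56' or 'EUR 99,90' into a Decimal.

    Heuristic: when both '.' and ',' appear, the last one is the decimal
    separator. A single separator followed by exactly three digits is read
    as a thousands separator, so '1,234' becomes 1234.
    """
    # Drop currency symbols, spaces and letters; keep digits, separators, minus.
    cleaned = re.sub(r"[^\d.,-]", "", text)
    if not re.search(r"\d", cleaned):
        return None
    last_dot, last_comma = cleaned.rfind("."), cleaned.rfind(",")
    if last_dot >= 0 and last_comma >= 0:
        decimal_sep = "." if last_dot > last_comma else ","
    elif last_dot >= 0 or last_comma >= 0:
        sep = "." if last_dot >= 0 else ","
        tail = cleaned.rpartition(sep)[2]
        decimal_sep = None if cleaned.count(sep) > 1 or len(tail) == 3 else sep
    else:
        decimal_sep = None
    if decimal_sep is None:
        normalized = cleaned.replace(".", "").replace(",", "")
    else:
        thousands_sep = "," if decimal_sep == "." else "."
        normalized = cleaned.replace(thousands_sep, "").replace(decimal_sep, ".")
    try:
        return Decimal(normalized)
    except InvalidOperation:
        return None
```

The three-digit rule is a trade-off. An amount written as 1.234 that really means one point two three four would be misread, so the currency or the sender's locale is a useful extra signal when it is available.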

What happens when an email contains several attachments?

The workflow processes each PDF attachment sequentially. When an email contains multiple attachments, each document is extracted separately with its own data record.

All extracted information from a single email batch is then compiled into your destination system (like Google Sheets) as distinct rows for easy reference and auditing.

  • Processes up to 25 attachments per email
  • Maintains relationship between documents in same email
  • Includes original email metadata for tracking
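One way to keep rows from the same email linked is to copy the email's Message-ID, sender, and subject onto every per-attachment record. This Python sketch is an assumption, not the workflow's actual node setup. The metadata column names and attachment_index are hypothetical.

```python
from __future__ import annotations

import email
from email import policy


def rows_for_email(raw_message: bytes, extracted: list[dict[str, str]]) -> list[dict[str, str]]:
    """Attach the source email's metadata to each per-attachment record.

    `extracted` holds one field dict per PDF, in attachment order. Every
    record shares the same message ID, so rows from one email can be grouped.
    """
    msg = email.message_from_bytes(raw_message, policy=policy.default)
    meta = {
        "email_message_id": str(msg.get("Message-ID", "")),
        "email_from": str(msg.get("From", "")),
        "email_subject": str(msg.get("Subject", "")),
    }
    return [
        {**meta, "attachment_index": str(i), **fields}
        for i, fields in enumerate(extracted, start=1)
    ]
```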

What happens when extraction fails?

When extraction fails for specific fields, the workflow marks those fields as 'unreadable' rather than guessing at values. This maintains data integrity.

The system also logs all processing attempts with confidence scores, allowing you to review problematic documents manually. We recommend setting up email alerts for low-confidence extractions.

  • Does not guess values for fields it cannot read
  • Provides detailed error reporting
  • Optionally routes failures for human review

Can GrowwStacks customize this workflow for my business?

GrowwStacks specializes in custom document automation solutions using n8n. Our team can adapt this workflow to your specific document types, integrate with your existing systems, and add quality control steps tailored to your needs.

We offer free consultations to analyze your document processing requirements and design an automation solution that saves your team hours of manual work each week.

  • Custom field extraction for your forms
  • Integration with your accounting software
  • Ongoing support and optimization

Stop Wasting Time on Manual Document Entry

Your team could spend those hours on strategic work instead of data entry. GrowwStacks will build and deploy this exact workflow for your business - customized to your document formats and systems.