Somewhere in most companies, a person opens invoice PDFs one at a time and types numbers into a spreadsheet. It is slow, it is error-prone, and it is completely unnecessary in 2026.

Here is the workflow I build to kill that task.

The flow

  1. Gmail trigger watches for emails with attachments and "invoice" in the subject.
  2. Extract PDF text pulls the raw text out of the attachment.
  3. Claude extracts structured fields — vendor, invoice number, dates, currency, subtotal, tax, total, and every line item.
  4. A validation step checks whether subtotal plus tax actually equals the total.
  5. Anomalies get flagged to Finance in Slack for a human to check.
  6. Clean invoices append straight to your accounting sheet or tool.

Getting reliable extraction

The trick is forcing strict JSON and telling Claude exactly what to do with missing fields:

Extract invoice data. Return ONLY JSON:
{vendor, invoice_number, date, due_date,
currency, subtotal, tax, total,
line_items:[{desc, qty, unit_price, amount}]}.
Use null for missing fields.

The null for missing fields instruction matters. Without it, models invent values to fill the shape. With it, you get honest gaps you can flag.

The anomaly check is what makes this safe to trust. The workflow does simple math — does subtotal plus tax equal total within a cent? If not, a human looks. Everything else flows through untouched.

Where the data goes

The template logs to Google Sheets so it works out of the box, but the last node is a drop-in slot for QuickBooks, Xero, or NetSuite. Swap the node, keep the rest.

Why self-hosted matters here

Invoices contain sensitive financial data. Running this on your own n8n instance means vendor names, amounts, and bank details never pass through a third-party automation platform. That is often the difference between a workflow Finance will approve and one they will not.

Want the workflow, not just the article?

Import the full invoice-processing workflow and point the last node at your accounting tool.

Download the JSON