Every finance team knows the end-of-month scramble. Hundreds of PDFs from dozens of suppliers, each in a different format. Someone has to open each one, find the invoice number, match it to a PO, check the amounts, and flag anything that doesn't line up. In a busy operation this takes a full week every month. It's error-prone, it's slow, and it's one of the most automatable processes in the business.
The problem with manual invoice matching
The core issue is that supplier invoices come in every format imaginable. Some are well-structured PDFs with clear fields. Others are scanned paper documents. Others are email attachments with amounts buried in the text. Traditional OCR tools handle structured documents reasonably well but fail on anything non-standard. When 30% of your invoices are non-standard, that's 30% still requiring manual work.
The second problem is matching logic. Even when you extract the data correctly, you need to compare invoice line items against open POs, handle partial deliveries, catch duplicate invoices, and flag amount discrepancies above a threshold. This logic is complex enough that most teams just do it manually.
How AI document extraction works
Claude's vision API can read any PDF format - structured, unstructured, scanned, or mixed. You pass the invoice as an image or PDF and ask Claude to extract specific fields: invoice number, supplier name, date, line items, quantities, unit prices, totals, tax amounts, payment terms. Claude returns structured JSON regardless of what the original document looks like. Accuracy on real-world invoices is 97-99% on standard fields.
This is the key shift that makes invoice automation practical. You're not relying on rigid template matching that breaks when a supplier changes their PDF layout. You're using a model that reads the document the way a human would and extracts what you need.
The three-step pipeline
The system has three stages. First, ingest: invoices arrive via email attachment, a shared folder, or a supplier portal. An automation (Make or n8n) detects new documents and routes them to the extraction step. Second, extract: Claude reads each document and returns structured JSON with all relevant fields. The extracted data goes into a staging database. Third, match and flag: a matching script compares each invoice against open POs in your ERP or database. Matches above 98% confidence are auto-approved. Discrepancies get flagged with a Telegram or Slack alert showing exactly what doesn't match.
What the system handles automatically:
- Invoices that exactly match a PO - auto-approved with audit log
- Invoices within tolerance range (e.g. within 2% of PO amount) - auto-approved with note
- Duplicate invoice detection based on invoice number and supplier
- Invoices with no matching PO - flagged for manual review
- Amount discrepancies above threshold - flagged with delta highlighted
- Missing required fields - flagged for supplier follow-up
Tools used
The typical stack: Claude API for document extraction (vision endpoint), Python for matching logic, PostgreSQL for staging invoices and match results, Make or n8n for orchestrating the pipeline end-to-end. Your ERP connects via API (SAP, NetSuite, QuickBooks, and most mid-market ERPs have REST APIs). The automation runs on top of your ERP - no migration, no replacement, no renegotiating your enterprise license.
What to do with exceptions
The 2-3% of invoices that can't be auto-matched go into a review queue. The finance team member gets a Telegram message with the invoice, the PO it was matched against, and the specific discrepancy. One click to approve, one click to flag for supplier correction. The exception review that used to take two days takes two hours because the system has already done the matching work.
Real results
For clients running this system: auto-match rate lands at 97%+ within the first month as you tune the matching thresholds. Monthly close time drops from 5 days to 1 day for the AP component. Error rate on processed invoices drops close to zero because the system catches duplicates and discrepancies that humans miss when processing at speed.
What clients typically see after 90 days:
- 97%+ invoices processed without human involvement
- Monthly close: 5 days reduced to 1 day for AP
- Duplicate invoices caught: averages 1-3 per month that previously slipped through
- Staff time saved: 15-30 hours per month depending on invoice volume
- Build cost: $4,000-8,000 depending on ERP integration complexity
- Monthly running cost: $150-300 in API and tool fees
How to get started
The first step is auditing your current invoice volume and format mix. How many invoices per month? What percentage are PDFs vs scans vs structured exports? What does your current matching process look like step by step? That audit usually takes 30 minutes and surfaces exactly what to build. From there, a working pilot processing real invoices typically takes 2-3 weeks.
If your team is still doing invoice matching manually, a free audit call maps your current process and tells you exactly what an automated version would look like. Book at 2pizza.team/audit.
Free 30-min audit. We tell you what to automate first and what it would cost.