The legacy OCR pipeline was a brittle Tesseract chain hitting 71% accuracy on real-world invoices. Misreads cascaded into AP automation errors that customers had to reconcile by hand.
We built a hybrid pipeline: Textract for layout, a fine-tuned transformer for field extraction, and confidence-aware fallbacks to human-in-the-loop review. Every release went through a regression eval against a 4,000-invoice gold set.
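As a rough illustration of the two mechanisms named above, the sketch below routes low-confidence fields to human review and gates a release on gold-set field accuracy. It is a minimal sketch, not the production code: the names `FieldPrediction`, `route_invoice`, `field_accuracy`, and `passes_regression`, the 0.90 confidence threshold, the 0.005 tolerance, and the exact-match scoring rule are all assumptions made for the example.

```python
from dataclasses import dataclass

# Hypothetical threshold; the actual value used in production is not stated.
CONFIDENCE_THRESHOLD = 0.90


@dataclass
class FieldPrediction:
    name: str          # e.g. "total", "due_date"
    value: str         # extracted text for the field
    confidence: float  # model confidence in [0, 1]


def route_invoice(fields, threshold=CONFIDENCE_THRESHOLD):
    """Split predictions into auto-accepted values and fields sent to human review."""
    accepted = {f.name: f.value for f in fields if f.confidence >= threshold}
    needs_review = [f.name for f in fields if f.confidence < threshold]
    return accepted, needs_review


def field_accuracy(predicted, gold):
    """Fraction of gold fields whose predicted value matches exactly.

    Both arguments map invoice id -> {field name -> value}.
    Exact string match is an assumed scoring rule.
    """
    total = 0
    correct = 0
    for invoice_id, gold_fields in gold.items():
        predicted_fields = predicted.get(invoice_id, {})
        for name, gold_value in gold_fields.items():
            total += 1
            if predicted_fields.get(name) == gold_value:
                correct += 1
    return correct / total if total else 0.0


def passes_regression(candidate_accuracy, baseline_accuracy, tolerance=0.005):
    """Allow a release only if accuracy has not dropped by more than the tolerance."""
    return candidate_accuracy >= baseline_accuracy - tolerance
```

In this sketch a release candidate would be scored with `field_accuracy` over the gold set and compared to the previous release with `passes_regression`, while `route_invoice` decides per invoice which fields a person needs to check.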
Field accuracy reached 94% in production. Customer support tickets on AP errors dropped by 55%. Tooling is now testable and the team can iterate weekly.
Field accuracy: 94%
AP error tickets: −55%
Gold-set invoices: 4,000