Mortgage Services Firm

Mortgage Underwriting Process Automation

Intelligent Data Extraction Engine


The client ran an existing domain-specific business line on a traditional OCR-based solution that relied on predefined templates for data extraction. When the client sought to expand into a related domain and reuse that solution, the new domain’s digital documents posed the following challenges to the existing workflow:

  • The structures of the new domain’s digital documents varied so widely that predefined templates were impractical: nearly every document would have needed its own template, incurring hours of manual work for document identification
  • Verifying the extracted data required hundreds of manual verifiers, raising operational costs to the point that the business expansion was not viable


Without the business capability to automate the data extraction for the new domain’s digital documents, the client:

  • Could not expand its product line into the large related-domain market
  • Risked losing its current customer base to competitors
  • Would incur prohibitive operational costs if it tried to enter the new market


Using low-level OCR libraries, we architected and implemented a data extraction algorithm and built domain-aware intelligent automation that identifies and extracts data from the new domain’s documents regardless of structural variance. Our solution enabled the client to:

  • Expand into the targeted market with a cost-effective business solution
  • Own intellectual property that increased its market value and differentiated it from its competitors
  • Enter new domain markets using the solution’s plug-and-play model and learning capabilities
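
To illustrate the general difference between template-based and template-free extraction described above, here is a minimal Python sketch. It is not the client solution: the field names, the label aliases, and the `extract_fields` function are all hypothetical, and the input is assumed to be plain text lines already produced by an OCR step. The idea shown is label anchoring: a value is located by the label text next to it rather than by a fixed position on the page, so two differently structured documents yield the same result.

```python
# Hypothetical label aliases per field; a real system would curate or learn these per domain.
FIELD_ALIASES = {
    "borrower_name": ["borrower name", "applicant name"],
    "loan_amount": ["loan amount", "amount financed"],
}


def extract_fields(ocr_lines):
    """Locate each field by its label text instead of a fixed page position."""
    results = {}
    for line in ocr_lines:
        lowered = line.lower()
        for field, aliases in FIELD_ALIASES.items():
            if field in results:
                continue  # keep the first match for each field
            for alias in aliases:
                idx = lowered.find(alias)
                if idx == -1:
                    continue
                # Treat the rest of the line after the label as the value,
                # trimming common separators such as ':' and '-'.
                value = line[idx + len(alias):].strip(" :\t-")
                if value:
                    results[field] = value
                    break
    return results


# Two layouts with different ordering, labels, and separators.
layout_a = ["Borrower Name: Jane Doe", "Loan Amount: $250,000"]
layout_b = ["LOAN SUMMARY", "Amount Financed - $250,000", "Applicant Name: Jane Doe"]

print(extract_fields(layout_a))
print(extract_fields(layout_b))
```

A sketch like this only handles one field per line and exact label matches; coping with real structural variance would need spatial layout information, fuzzy matching, and a verification step.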


For a leading mortgage services provider, we designed and developed an intelligent end-to-end solution for data extraction from digital documents. With our hyperautomation solution, we established a workflow and business process that enabled the client to:

  • Read digital documents regardless of structural variation, saving at least 80 man-hours of work per variation
  • Gradually reduce dependency on hundreds of manual back-office verifiers for identifying documents and verifying extraction