DEV Community

Irvan Gerhana Septiyana

Building a Transaction Intelligence System: From MT950 Bank Statements to Automated Reconciliation

Why I Built It

Most AI demos focus on chatbots, copilots, or AI agents.

However, one of the largest automation opportunities inside enterprises is much less glamorous:

Financial reconciliation.

Every day, finance teams receive thousands of transactions from bank statements.

A transaction may look like this:

PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157

For a human accountant, the meaning is obvious.

For a machine, it's just text.

The challenge is transforming transaction narratives into structured business knowledge.

This article explains how I built a Transaction Intelligence System that converts raw MT950 bank statements into machine-readable entities that can be automatically reconciled against invoices, contracts, and customer records.


The Real Problem

Many people assume payment gateways solve reconciliation.

They don't.

Payment gateways solve payment collection.

Enterprise reconciliation requires answering different questions:

  • Which customer made the payment?
  • Which invoice is being settled?
  • Which contract governs the transaction?
  • Is this a partial payment?
  • Is the payment amount correct?

Those answers don't exist in the payment itself.

They exist in business context.


System Architecture

The architecture consists of multiple layers:

MT950 Statement
       ↓
Canonical Transformation
       ↓
Named Entity Recognition
       ↓
Entity Resolution
       ↓
Reconciliation Engine
       ↓
Automation API

Each layer solves a specific problem.


Step 1: Synthetic Enterprise Dataset Generation

One of the biggest challenges was obtaining training data.

Real enterprise financial data is typically unavailable due to privacy restrictions.

Instead, I generated synthetic datasets containing:

Customer Master

{
  "customer_id": "CUS-00002",
  "legal_name": "ALPHABRIDGE SOLUTIONS"
}

Contract Master

{
  "contract_id": "CNT-2024-587",
  "customer_id": "CUS-00002"
}

Invoice Master

{
  "invoice_number": "MFG-INV-000157",
  "contract_id": "CNT-2024-587"
}

MT950 Statements

PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157

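This created a complete ground-truth environment for training and evaluation: every narrative is generated from known master records, so the correct customer, contract, and invoice for each transaction are known in advance.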
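
A minimal sketch of how such linked records could be generated. The function name, the ID numbering scheme, and every payment prefix other than "PART PMT" are assumptions made for illustration, not the original generator.

```python
import random

# "PART PMT" comes from the article; the other prefixes are assumed.
PAYMENT_PREFIXES = ["PART PMT", "FULL PMT", "PMT"]


def make_dataset(names, seed=0):
    """Build linked customer, contract, invoice, and statement records."""
    rng = random.Random(seed)  # seeded so the dataset is reproducible
    customers, contracts, invoices, statements = [], [], [], []
    for i, name in enumerate(names, start=1):
        # Sequential IDs (an assumed scheme) keep the foreign keys easy to follow.
        customer_id = f"CUS-{i:05d}"
        contract_id = f"CNT-2024-{100 + i}"
        invoice_number = f"MFG-INV-{i:06d}"
        customers.append({"customer_id": customer_id, "legal_name": name})
        contracts.append({"contract_id": contract_id, "customer_id": customer_id})
        invoices.append({"invoice_number": invoice_number, "contract_id": contract_id})
        # The narrative is rendered from the same records, so its labels are known.
        prefix = rng.choice(PAYMENT_PREFIXES)
        statements.append({
            "narrative": f"{prefix} {name} {invoice_number}",
            "truth": {"customer_id": customer_id, "invoice_number": invoice_number},
        })
    return customers, contracts, invoices, statements
```

Keeping the expected answer next to each narrative is what later makes it possible to score extraction and reconciliation automatically.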


Step 2: Canonical Transformation

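Raw MT950 files are difficult to work with. Each entry is a positional :61: line in which the value date, debit/credit mark, amount, transaction type, and reference run together in a fixed order. Note that the standard MT950 layout defines no :86: field (it belongs to MT940), so the example below assumes a statement feed that supplies the narrative on a :86: line.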

A transaction:

:61:240226C3979,85NTRFNONREF
:86:PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157

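is transformed into a canonical structure, with the currency taken from the statement's balance fields (such as :60F:) because the :61: line does not carry it: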

{
  "transaction_id": "...",
  "currency": "EUR",
  "amount": 3979.85,
  "narrative": "PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157"
}

This becomes the standardized input for downstream processing.
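
The transformation can be sketched with one regular expression over the :61: line. This is a simplified reading of the field layout, not a full SWIFT parser. The function name, the "value_date" and "direction" keys, and the choice of Decimal over float are my assumptions, and the transaction ID is left out because the article does not show how it is built.

```python
import re
from datetime import datetime
from decimal import Decimal

# Simplified :61: layout: value date, optional entry date, debit/credit mark,
# optional funds code, amount with a decimal comma, transaction type, reference.
FIELD_61 = re.compile(
    r"^:61:(?P<value_date>\d{6})(?P<entry_date>\d{4})?"
    r"(?P<mark>RC|RD|C|D)(?P<funds_code>[A-Z])?"
    r"(?P<amount>\d+,\d*)"
    r"(?P<type>[NFS][A-Z0-9]{3})(?P<reference>.*)$"
)

DIRECTIONS = {
    "C": "credit",
    "D": "debit",
    "RC": "reversal of credit",
    "RD": "reversal of debit",
}


def to_canonical(line_61, line_86, currency):
    """Turn one :61:/:86: pair into a canonical transaction dict."""
    match = FIELD_61.match(line_61)
    if match is None:
        raise ValueError(f"unrecognized :61: line: {line_61!r}")
    value_date = datetime.strptime(match["value_date"], "%y%m%d").date()
    return {
        "value_date": value_date.isoformat(),
        "direction": DIRECTIONS[match["mark"]],
        "currency": currency,  # supplied by the caller, e.g. read from :60F:
        # Decimal avoids binary rounding on money; SWIFT uses a decimal comma.
        "amount": Decimal(match["amount"].replace(",", ".")),
        "narrative": line_86.removeprefix(":86:").strip(),
    }
```

Calling `to_canonical(":61:240226C3979,85NTRFNONREF", ":86:PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157", "EUR")` yields the amount, currency, and narrative shown in the canonical structure above.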


Step 3: Taxonomy Design

Before training a model, we must define what matters.

The taxonomy includes:

COMPANY
INVOICE
CONTRACT
PURCHASE_ORDER
PAYMENT_TYPE

Example:

PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157

becomes:

{
  "COMPANY": "ALPHABRIDGE SOLUTIONS",
  "INVOICE": "MFG-INV-000157",
  "PAYMENT_TYPE": "PART PMT"
}

This taxonomy becomes the language of the system.
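
One way to make the taxonomy enforceable is to encode it once and validate every extraction against it. The sketch below assumes an enum named `EntityType` and a helper named `validate_entities`; both names are hypothetical.

```python
from enum import Enum


class EntityType(str, Enum):
    """The five entity types listed in the taxonomy."""
    COMPANY = "COMPANY"
    INVOICE = "INVOICE"
    CONTRACT = "CONTRACT"
    PURCHASE_ORDER = "PURCHASE_ORDER"
    PAYMENT_TYPE = "PAYMENT_TYPE"


def validate_entities(entities):
    """Reject any label that is not part of the taxonomy."""
    allowed = {entity.value for entity in EntityType}
    unknown = sorted(set(entities) - allowed)
    if unknown:
        raise ValueError(f"labels outside the taxonomy: {unknown}")
    # Return the same mapping keyed by enum members instead of raw strings.
    return {EntityType(label): text for label, text in entities.items()}
```

With a single definition like this, the prelabeler, the annotation tool, and the model's label set cannot silently drift apart.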


Step 4: Automated Prelabeling

Manual annotation does not scale.

Instead, I built a prelabel engine using:

  • Regular expressions
  • Master data lookups
  • Heuristic rules

Example:

invoice_pattern = r"[A-Z]{3}-INV-\d+"

This automatically generates initial annotations before human review.

The result:

  • Faster annotation
  • Higher consistency
  • Reduced labeling cost
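
A minimal sketch of such a prelabel engine, combining the invoice pattern above with a master data lookup. The payment type pattern and the in-memory company list are assumptions. The output uses `text` and `label` keys with `[start, end, label]` spans, which is the JSONL shape I assume the annotation tool expects on import; check that against the version you run.

```python
import json
import re

INVOICE_PATTERN = re.compile(r"[A-Z]{3}-INV-\d+")
# Assumed payment phrases; only "PART PMT" appears in the article.
PAYMENT_TYPE_PATTERN = re.compile(r"\b(?:PART|FULL) PMT\b")
# Stand-in for a lookup against the customer master.
KNOWN_COMPANIES = ["ALPHABRIDGE SOLUTIONS"]


def prelabel(narrative):
    """Return the narrative with character-offset span annotations."""
    spans = []
    rules = (("INVOICE", INVOICE_PATTERN), ("PAYMENT_TYPE", PAYMENT_TYPE_PATTERN))
    for label, pattern in rules:
        for match in pattern.finditer(narrative):
            spans.append([match.start(), match.end(), label])
    for name in KNOWN_COMPANIES:
        start = narrative.find(name)
        if start != -1:
            spans.append([start, start + len(name), "COMPANY"])
    return {"text": narrative, "label": sorted(spans)}


def to_jsonl(narratives):
    """Serialize one prelabeled record per line."""
    return "\n".join(json.dumps(prelabel(text)) for text in narratives)
```

Reviewers then correct spans instead of drawing them from scratch, which is where the speed and consistency gains come from.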

Step 5: Doccano Annotation

Prelabeled data is imported into Doccano.

Human reviewers validate:

  • Company names
  • Invoice references
  • Contract identifiers
  • Purchase orders
  • Payment types

This creates the ground truth required for model training.


Step 6: Fine-Tuning a Financial NER Model

The training pipeline:

Doccano
    ↓
BIO Conversion
    ↓
IndoBERT
    ↓
Fine-Tuning

Target entities:

COMPANY
INVOICE
CONTRACT
PURCHASE_ORDER
PAYMENT_TYPE

The objective is not generic NER.

The objective is enterprise transaction understanding.
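
The BIO conversion step can be sketched as follows. It assumes whitespace tokenization and `[start, end, label]` character spans as input; the function name is hypothetical, and aligning these word-level tags to the model's subword tokens is a further step not shown here.

```python
import re


def spans_to_bio(text, spans):
    """Tag whitespace tokens with BIO labels from [start, end, label] spans."""
    tokens, tags = [], []
    for match in re.finditer(r"\S+", text):
        tag = "O"  # default: the token is outside every entity
        for start, end, label in spans:
            # A token belongs to a span only if it lies fully inside it.
            if start <= match.start() and match.end() <= end:
                prefix = "B-" if match.start() == start else "I-"
                tag = prefix + label
                break
        tokens.append(match.group())
        tags.append(tag)
    return tokens, tags
```

For the running example, "PART" and "PMT" become B-PAYMENT_TYPE and I-PAYMENT_TYPE, the two company tokens become B-COMPANY and I-COMPANY, and the invoice number becomes B-INVOICE.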


Step 7: Entity Resolution

Entity extraction alone is not enough.

For example:

ALPHABRIDGE

must resolve to:

{
  "customer_id": "CUS-00002",
  "legal_name": "ALPHABRIDGE SOLUTIONS"
}

The resolution engine uses:

Exact Matching

ALPHABRIDGE SOLUTIONS

Alias Matching

ALPHABRIDGE LTD

Fuzzy Matching

ALPHA BRIDGE

Embedding Similarity

Reserved for the more difficult cases that the three string-based strategies miss.
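
The first three strategies can be sketched as a cascade that stops at the first hit. This is an illustration under assumptions: the alias table, the 0.8 threshold, and the use of the standard library's `difflib` for fuzzy scoring are my choices, not necessarily what the original engine uses. Embedding similarity is left out and would slot in where the function returns `None`.

```python
from difflib import SequenceMatcher

CUSTOMERS = {"CUS-00002": "ALPHABRIDGE SOLUTIONS"}
ALIASES = {"ALPHABRIDGE LTD": "CUS-00002"}  # hypothetical alias table


def normalize(name):
    """Uppercase and collapse runs of whitespace."""
    return " ".join(name.upper().split())


def _result(customer_id, method):
    return {
        "customer_id": customer_id,
        "legal_name": CUSTOMERS[customer_id],
        "method": method,
    }


def resolve(mention, threshold=0.8):
    """Resolve a company mention: exact, then alias, then fuzzy."""
    key = normalize(mention)
    by_name = {legal: cid for cid, legal in CUSTOMERS.items()}
    if key in by_name:
        return _result(by_name[key], "exact")
    if key in ALIASES:
        return _result(ALIASES[key], "alias")
    # Fuzzy: compare with spaces removed so "ALPHA BRIDGE" lines up
    # with "ALPHABRIDGE", scoring against legal names and aliases.
    candidates = {**by_name, **ALIASES}
    compact = key.replace(" ", "")

    def score(name):
        return SequenceMatcher(None, compact, name.replace(" ", "")).ratio()

    best = max(candidates, key=score)
    if score(best) >= threshold:
        return _result(candidates[best], "fuzzy")
    return None  # hand off to embedding similarity or human review
```

Recording which method produced the match is useful downstream: a fuzzy hit can be routed to review while an exact hit is trusted.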


Step 8: Reconciliation Engine

Once entities are resolved:

{
  "customer_id": "CUS-00002",
  "invoice_number": "MFG-INV-000157"
}

the reconciliation engine validates:

  • Customer ownership
  • Contract relationships
  • Invoice existence
  • Amount consistency

Possible outcomes:

AUTO_RECONCILED
PARTIAL_MATCH
OVERPAYMENT
UNDERPAYMENT
REVIEW_REQUIRED
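
A sketch of how these checks could map onto the outcomes. The article does not define each status precisely, so the rules here are assumptions: PARTIAL_MATCH means the customer resolved but the invoice was not found, REVIEW_REQUIRED covers ownership conflicts, and the `amount_due` field and its value are hypothetical.

```python
from decimal import Decimal

CONTRACTS = {"CNT-2024-587": {"customer_id": "CUS-00002"}}
INVOICES = {
    "MFG-INV-000157": {
        "contract_id": "CNT-2024-587",
        "amount_due": Decimal("3979.85"),  # hypothetical field and value
    }
}


def reconcile(customer_id, invoice_number, amount):
    """Return one of the outcome codes for a resolved transaction."""
    invoice = INVOICES.get(invoice_number)
    if invoice is None:
        # Invoice existence check failed; assumed meaning of PARTIAL_MATCH.
        return "PARTIAL_MATCH" if customer_id else "REVIEW_REQUIRED"
    contract = CONTRACTS.get(invoice["contract_id"])
    # Contract relationship and customer ownership checks.
    if contract is None or contract["customer_id"] != customer_id:
        return "REVIEW_REQUIRED"
    # Amount consistency check.
    if amount > invoice["amount_due"]:
        return "OVERPAYMENT"
    if amount < invoice["amount_due"]:
        return "UNDERPAYMENT"
    return "AUTO_RECONCILED"
```

Ordering the checks from structural (does the invoice exist, does it belong to this customer) to numeric keeps an amount comparison from ever running against the wrong invoice.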

Step 9: API Layer

The final system exposes endpoints such as:

POST /reconcile/text

Input:

{
  "narrative": "PART PMT ALPHABRIDGE SOLUTIONS MFG-INV-000157"
}

Output:

{
  "customer_id": "CUS-00002",
  "invoice_number": "MFG-INV-000157",
  "status": "AUTO_RECONCILED"
}

This allows integration with:

  • ERP systems
  • Accounting platforms
  • Finance operations workflows
  • AI agents
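
The request handling behind an endpoint like POST /reconcile/text can be sketched without committing to a web framework. The handler name, the error messages, and the 400 and 422 status codes are assumptions, and `pipeline` stands for whatever callable chains extraction, resolution, and reconciliation.

```python
import json


def handle_reconcile_text(body, pipeline):
    """Return (status_code, json_body) for a POST /reconcile/text request."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return 400, json.dumps({"error": "body must be valid JSON"})
    narrative = payload.get("narrative") if isinstance(payload, dict) else None
    if not isinstance(narrative, str) or not narrative.strip():
        return 422, json.dumps({"error": "'narrative' must be a non-empty string"})
    # The pipeline returns a dict with customer_id, invoice_number, and status.
    return 200, json.dumps(pipeline(narrative))
```

Passing the pipeline in as an argument keeps the transport layer thin, so the same function can sit behind an HTTP route, a queue consumer, or a tool call from an agent.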

Lessons Learned

Building the model was not the hardest part.

The hardest parts were:

Data Quality

Poor data produces poor automation.

Taxonomy Design

The model only understands the concepts you define.

Canonical Data

Without canonical structures, downstream automation becomes fragile.

Entity Resolution

Extraction without resolution has limited business value.


Final Thoughts

Most enterprise automation projects focus on AI models.

In my experience, the real challenge is business understanding.

The architecture that matters most is:

Raw Data
↓
Canonical Data
↓
Taxonomy
↓
NER
↓
Resolution
↓
Decision Intelligence
↓
Automation

AI is only one layer in the stack.

The organizations that succeed with enterprise AI will be the ones that invest in data foundations, business taxonomies, and transaction intelligence before they invest in autonomous agents.

If you're building AI for enterprise operations, start with understanding before automation.
