The Map is Not the Territory: Navigating the Hidden Complexities of Payment Rails

payment Dec 10, 2024

A masterclass for senior engineers on the gap between payment system documentation and operational reality


This is the first in a series on payment systems engineering.
  1. The Map is Not the Territory: Why documentation and reality diverge
  2. From Debugging to Design: Mental models for understanding any payment flow
  3. Building for the 1%: Engineering exception handling as first-class work
  4. The FinTech Staff Engineer: Translating technical reality into business decisions

Each piece stands alone. Together, they form a framework for building reliable payment infrastructure at scale.


Introduction

Every payments engineer has had this moment: you read the documentation, implement the integration exactly as specified, test it thoroughly in UAT, and watch all your test transactions succeed. Then you deploy to production, where everything falls apart in ways you never anticipated.

The staging environment lied to you. Or we could just say it told you a partial truth. Test environments give you the documented behavior: clean responses and predictable timing. Production gives you the operational reality: batch delays, ambiguous statuses, retry storms and settlement files that arrive when they feel like it.

The documentation is the map. Production is the territory, and the two diverge in ways that will cost you.

The moment I understood this distinction came during development of the TryGrip App. We were building a unified transaction layer that routed payments across multiple Nigerian financial providers. For traditional bank connections, we used an aggregator that already supported nearly all of the banks in our initial offering, but for wallet providers and fintechs we had to integrate directly with their APIs. Each provider had different response formats and timing characteristics, with failure modes we hadn't anticipated.

Our UAT integrations worked flawlessly: every test transaction returned a clean response within milliseconds. Then we went live. One provider's documentation said "real-time settlement"; their actual settlement file arrived at 4am the next day. Another provider's API returned "success" for transactions that failed hours later via a batch reversal we hadn't built handling for. None of this showed up in testing because test environments don't simulate operational load or batch processing schedules, and they certainly don't replicate the downstream systems that only wake up in production.

The documentation wasn't wrong. It was just incomplete. It described the happy path, the system as designed, not the system as operated.

After years of building payment systems across African fintech—from multi-provider transaction routing to tax authority integrations—I've learned that the most valuable knowledge isn't in any API reference. It's in understanding the operational realities that documentation assumes you already know.


Part 1: The Authorization-Settlement Gap

What the Documentation Says

Most payment documentation presents a clean model:

  1. Customer initiates payment
  2. Authorization request sent
  3. Authorization approved/declined
  4. Settlement occurs
  5. Funds transferred

This model is dangerously incomplete.

What Actually Happens

Authorization and settlement are separate systems running on different timelines, operated by different entities with misaligned incentives.

Authorization is a real-time question: "Does this customer have the ability to pay this amount right now?"

Settlement is a batch process answer: "Here's what actually needs to move between institutions based on yesterday's (or last week's) authorizations".

The gap between these two creates every edge case you'll ever debug:

The Hold Amount Problem

When you authorize a ₦350,000 charge, the issuing bank places a "hold" on ₦350,000. But the hold amount might not equal the final settlement amount. Holds also expire, typically after 7-30 days depending on merchant category and card network (Visa and Mastercard have different rules). When a hold expires, the funds are released and might be spent elsewhere, so a later settlement can fail because the money is gone.

A note for those building in African markets: the hold/pre-auth model described here is primarily card-network behavior in mature markets. In Nigerian payment contexts, pre-authorizations are far less common and the patterns differ significantly.

Architectural Implication: Never treat authorization as payment confirmation. Design your system to handle settlement failures on previously authorized transactions.

// Bad: Treating authorization as confirmation
if (authorizationResponse.Approved) {
    await FulfillOrder(order);  // Dangerous
}

// Better: Authorization starts a process, not ends one
if (authorizationResponse.Approved) {
    await CreatePendingFulfillment(order, authorizationResponse.AuthCode);
    // Actual fulfillment triggered by settlement confirmation
}

The Reversal Timing Problem

Reversals exist in two forms that your system must handle differently:

Void (pre-settlement): Cancels the authorization before settlement. The hold is released and the outcome is relatively clean.

Refund (post-settlement): A completely new transaction in the opposite direction. Creates reconciliation complexity because you now have two transactions for one logical event.

The trap: Many APIs abstract these into a single "refund" endpoint. Internally, the two paths do different things on different timelines. Your reconciliation logic must understand both.
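One way to keep the two paths distinct in your own code, even when the provider hides them behind one endpoint, is to derive the reversal type from the settlement state you have recorded. This is a minimal sketch: `ReversalKind`, `ReversalPlanner`, and the `settledAt` parameter are hypothetical names for illustration, not any provider's API.

```csharp
using System;

public enum ReversalKind
{
    Void,   // Pre-settlement: cancels the authorization and releases the hold
    Refund  // Post-settlement: a new transaction in the opposite direction
}

public static class ReversalPlanner
{
    // settledAt is null while the original transaction has not settled.
    public static ReversalKind Choose(DateTimeOffset? settledAt) =>
        settledAt.HasValue ? ReversalKind.Refund : ReversalKind.Void;

    // A voided authorization should never reach a settlement file;
    // a refund shows up as a second entry next to the original sale.
    public static int ExpectedSettlementEntries(ReversalKind kind) =>
        kind == ReversalKind.Refund ? 2 : 0;
}
```

Recording which kind you expected gives reconciliation something concrete to check against when the settlement file arrives.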

The Chargeback Problem

Voids and refunds are merchant-initiated. Chargebacks are customer-initiated through their issuing banks, and they operate on completely different timelines and flows.

A cardholder can generally dispute a transaction up to 120 days after the transaction date (Visa) or 120 days from expected delivery (Mastercard). When they do, you enter an arbitration process that involves evidence submission deadlines, representment fees if you fight and lose, and provisional credits that appear and disappear from settlement files.

public enum TransactionDisputeStatus
{
    None,
    FirstChargeback,      // Customer disputed, funds provisionally debited
    Represented,          // You submitted evidence challenging the dispute
    SecondChargeback,     // Customer escalated after representment
    Arbitration,          // Card network is deciding
    Won,                  // Funds returned to you
    Lost                  // Funds permanently gone, plus fees
}

Chargebacks create reconciliation nightmares because the same transaction can appear multiple times in settlement files across months: the original sale, the chargeback debit, the representment credit, potentially another debit. Your reconciliation logic must track the full dispute lifecycle, not just individual settlement entries.
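To make "track the full lifecycle" concrete, here is a minimal sketch that groups settlement entries by the original transaction reference and nets them out. The `DisputeLedgerEntry` record, the `EntryType` values, and the use of minor units are assumptions for illustration; real settlement files carry far more fields.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum EntryType { Sale, ChargebackDebit, RepresentmentCredit }

// One settlement-file line, reduced to what this sketch needs (hypothetical shape).
public record DisputeLedgerEntry(string TransactionRef, EntryType Type, long AmountMinor);

public static class DisputeLedger
{
    // Net position per original transaction: sales and representment credits add,
    // chargeback debits subtract.
    public static Dictionary<string, long> NetByTransaction(IEnumerable<DisputeLedgerEntry> entries) =>
        entries
            .GroupBy(e => e.TransactionRef)
            .ToDictionary(
                g => g.Key,
                g => g.Sum(e => e.Type == EntryType.ChargebackDebit ? -e.AmountMinor : e.AmountMinor));
}
```

A sale followed by a chargeback, a representment credit, and a second chargeback nets to zero, even though four entries appeared in files spread across months.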


Part 2: The Intermediary Chain You Didn't Know Existed

The Documented Model

flowchart LR
    A[Your App] --> B[Payment Gateway] --> C[Card Network] --> D[Issuing Bank]

The Reality

flowchart TB
    A[Your App] --> B[Payment Gateway]
    B --> C[Payment Processor]
    C --> D[Acquiring Bank]
    D --> E[Card Network Switch]
    E --> F[Card Network]
    F --> G[Issuing Processor]
    G --> H[Issuing Bank]
    H --> I[Customer's Account]

Each intermediary adds latency and can fail independently. They maintain their own reconciliation files and take fees (often hidden in the spread). Some may also subtly transform the data as it passes through.

Why This Matters for Your Architecture

Timeout Cascades

If your gateway timeout is 30 seconds but the chain has 7 intermediaries, each with its own potential retry logic, a single slow issuer can cause your request to time out while the downstream request eventually succeeds. The result: money moves with no confirmation to your system, and a customer who thinks their payment failed gets charged anyway.

Solution: Implement idempotency keys that survive the entire chain, and design for "payment status unknown" as a first-class state.

public enum PaymentStatus
{
    Pending,
    Authorized,
    Declined,
    Settled,
    Failed,
    Unknown  // This is not an error state—it's a real state
}

public async Task<PaymentResult> ProcessPayment(PaymentRequest request)
{
    try
    {
        var result = await _gateway.Charge(request);
        return MapToPaymentResult(result);
    }
    catch (TimeoutException)
    {
        // Do NOT assume failure
        return new PaymentResult
        {
            Status = PaymentStatus.Unknown,
            RequiresReconciliation = true,
            IdempotencyKey = request.IdempotencyKey
        };
    }
}
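An "unknown" state is only useful if something eventually moves the payment out of it. The sketch below shows one possible decision rule for a background job that re-queries the provider by idempotency key. The `ProviderLookup` and `Resolution` enums and the grace-period idea are assumptions for illustration; the provider query itself is left out.

```csharp
using System;

public enum ProviderLookup { Succeeded, Failed, NotFound }

public enum Resolution { MarkAuthorized, MarkFailed, KeepUnknown, EscalateToReconciliation }

public static class UnknownPaymentResolver
{
    // lookup: result of asking the provider about the idempotency key (hypothetical call).
    // age:    time elapsed since the original request timed out.
    // grace:  how long to keep polling before handing the payment to reconciliation.
    public static Resolution Resolve(ProviderLookup lookup, TimeSpan age, TimeSpan grace) =>
        lookup switch
        {
            ProviderLookup.Succeeded => Resolution.MarkAuthorized,
            ProviderLookup.Failed => Resolution.MarkFailed,
            // No record yet: the request may still be in flight somewhere in the chain,
            // so "not found" is never treated as proof of failure.
            _ => age >= grace ? Resolution.EscalateToReconciliation : Resolution.KeepUnknown
        };
}
```

The design choice here is that a missing record escalates to settlement-file reconciliation rather than being marked failed, since the downstream request may have succeeded after your timeout.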

The Reconciliation File Reality

Settlement doesn't happen through APIs. It happens through batch files exchanged between institutions: merchants receive files from their acquirers at end of day, and the actual fund movement happens at T+1 or T+2. The formats vary wildly, from ISO 8583 and ISO 20022 to proprietary CSV and fixed-width files.

Your beautiful real-time API is actually a thin layer over a batch-file-driven backend. Design accordingly.
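As an illustration of the proprietary-CSV end of that spectrum, here is a minimal parsing sketch. The four-column layout (`reference,amount_minor,status,yyyy-MM-dd`) is invented for this example; every provider's file differs, so treat the field order and names as assumptions.

```csharp
using System;
using System.Globalization;

public readonly record struct SettlementLine(
    string Reference, long AmountMinor, string Status, DateOnly SettlementDate);

public static class SettlementCsv
{
    // Assumed layout (hypothetical): reference,amount_minor,status,yyyy-MM-dd
    public static bool TryParse(string line, out SettlementLine parsed)
    {
        parsed = default;
        var fields = line.Split(',');
        if (fields.Length != 4) return false;

        // Amounts are expected as whole minor units; "99.50" is rejected, not rounded.
        if (!long.TryParse(fields[1], NumberStyles.Integer, CultureInfo.InvariantCulture, out var amount))
            return false;

        if (!DateOnly.TryParseExact(fields[3].Trim(), "yyyy-MM-dd", CultureInfo.InvariantCulture,
                DateTimeStyles.None, out var date))
            return false;

        parsed = new SettlementLine(fields[0].Trim(), amount, fields[2].Trim(), date);
        return true;
    }
}
```

Returning `false` instead of throwing lets the caller quarantine malformed lines and keep processing the rest of the file.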

Part 3: ACH is Not What You Think

The Documentation Model

ACH (or local equivalents like BACS, SEPA, NIBSS) appears simple: submit a payment instruction and money moves. Done.

Operational Reality

ACH is a two-phase blind protocol with no real-time confirmation:

sequenceDiagram
    participant You
    participant YourBank as Your Bank
    participant ACH as ACH Operator
    participant RecBank as Receiving Bank
    participant Customer
    Note over You,Customer: Phase 1: Origination
    You->>YourBank: Submit payment file
    YourBank->>ACH: Forward to network
    ACH-->>You: File accepted ✓
    Note right of You: You have no idea<br/>if funds will arrive
    Note over You,Customer: Phase 2: Settlement (24-72 hours later)
    ACH->>RecBank: Deliver payment file
    alt Success
        RecBank->>Customer: Post to account
    else Return
        RecBank->>ACH: Return transaction
        ACH->>YourBank: Return notification
        YourBank->>You: Return file
    end

The catch: Returns can arrive up to 60 calendar days after settlement for unauthorized transactions, or within 2 business days of settlement (not origination) for insufficient funds. These are NACHA rules; BACS, SEPA and NIBSS each have their own return windows.

What This Means for Your System

You Cannot Confirm ACH Success in Real-Time

// This is a lie
public async Task<bool> SendAchPayment(AchRequest request)
{
    var result = await _achProvider.Send(request);
    return result.Success;  // This only means "we accepted your file"
}

// This is honest
public async Task<AchSubmissionResult> SendAchPayment(AchRequest request)
{
    var result = await _achProvider.Send(request);
    return new AchSubmissionResult
    {
        Submitted = result.Accepted,
        TraceNumber = result.TraceNumber,
        // AddBusinessDays is not part of .NET; it is assumed to be a custom extension method
        ExpectedSettlement = DateTime.UtcNow.AddBusinessDays(2),
        // Final status unknown until return window closes
        FinalStatusAvailable = DateTime.UtcNow.AddBusinessDays(5)
    };
}
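The `AddBusinessDays` call in the snippet above does not exist on `DateTime` in .NET. A minimal sketch of such an extension follows, assuming weekends are the only non-business days. That assumption is a simplification: real ACH schedules also skip bank holidays, so a production version would need a holiday calendar.

```csharp
using System;

public static class BusinessDayExtensions
{
    // Adds business days by skipping Saturdays and Sundays only.
    // Assumption: no holiday calendar is consulted.
    public static DateTime AddBusinessDays(this DateTime start, int businessDays)
    {
        if (businessDays < 0)
            throw new ArgumentOutOfRangeException(nameof(businessDays));

        var current = start;
        var remaining = businessDays;
        while (remaining > 0)
        {
            current = current.AddDays(1);
            if (current.DayOfWeek != DayOfWeek.Saturday && current.DayOfWeek != DayOfWeek.Sunday)
                remaining--;
        }
        return current;
    }
}
```

With this version, two business days after a Friday lands on the following Tuesday.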

Your Account Balance is a Projection, Not a Fact

If you're operating accounts that receive ACH credits, your posted balance includes credits that might still be returned, and your available balance should account for that return risk. Offering "instant" access to ACH funds is a risk decision, not a technical capability.
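A minimal sketch of that projection, under the most conservative policy: every credit still inside its return window is withheld in full. The `AchCredit` record and the `ReturnWindowCloses` field are hypothetical; a real risk policy might release part of the funds earlier based on customer history.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// A received ACH credit and the moment its return risk ends (hypothetical shape).
public readonly record struct AchCredit(long AmountMinor, DateTime ReturnWindowCloses);

public static class BalanceProjection
{
    // Available balance = posted balance minus credits that could still be returned.
    public static long AvailableMinor(long postedMinor, IEnumerable<AchCredit> credits, DateTime asOf)
    {
        var atRisk = credits
            .Where(c => c.ReturnWindowCloses > asOf)
            .Sum(c => c.AmountMinor);
        return postedMinor - atRisk;
    }
}
```

Moving from this policy toward "instant" availability means deliberately shrinking `atRisk`, which is exactly the risk decision described above.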


Part 4: SWIFT and the Correspondent Banking Maze

The Myth

SWIFT is the "international wire transfer network".

The Reality

SWIFT is a messaging network, not a payment network. It tells banks to move money. The actual movement happens through correspondent banking relationships that form an invisible mesh.

When you send £1,000 from a UK account to a Nigerian Naira account:

flowchart TB
    A[Your Bank - UK] -->|MT103 Customer Transfer| B[Correspondent Bank<br/>GBP Nostro]
    B -->|Internal ledger movement| B
    B -->|MT202 COV Cover Payment| C[Correspondent Bank<br/>NGN Nostro]
    C -->|Internal ledger movement| C
    C -->|MT103| D[Receiving Bank - Nigeria]
    D -->|Credit| E[Beneficiary Account]

Note: The originating bank sends an MT103 (customer credit transfer), but inter-correspondent legs often use MT202 COV (cover payments) or MT205. The same message type doesn't flow end-to-end.

Each hop takes time (hours to days) and deducts fees unpredictably. Any hop can reject the payment for compliance reasons or request additional information like beneficiary details or purpose codes.

Architectural Implications

Fee Estimation is Probabilistic

You cannot know the exact fee for a cross-border wire in advance. Correspondent banks deduct fees from the principal, and the correspondent chain itself is chosen at runtime by your bank. FX conversion might happen at different points along the chain, each with different spreads.

The charging option (SHA/BEN/OUR) fundamentally changes who absorbs these intermediary fees. With SHA (shared), both parties pay their local fees but intermediary deductions come from the principal. With BEN (beneficiary), all fees come from the transfer amount. With OUR (sender pays all), you're committing to cover fees you can't fully predict. This is a common source of "the amount that arrived doesn't match what we sent" confusion.

Design choice: You can quote a fee range or maximum fee upfront, or pass through the fees post-settlement. Either way, expect variance.
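To see how the charging option changes the arriving amount, here is a simplified model of the three options as described above. All names and the flat fee inputs are assumptions for illustration; in practice you rarely know the intermediary fees in advance, and actual deductions vary by bank and corridor.

```csharp
using System;

public enum ChargeOption { OUR, SHA, BEN }

public static class WireFees
{
    // Simplified model: fees are passed in as known values, which real corridors do not allow.
    public static decimal AmountReceived(
        ChargeOption option,
        decimal principal,
        decimal senderBankFee,
        decimal intermediaryFees,
        decimal beneficiaryBankFee) =>
        option switch
        {
            // Sender pays every fee separately; the principal arrives intact.
            ChargeOption.OUR => principal,
            // Sender pays its own bank separately; the rest comes out of the transfer.
            ChargeOption.SHA => principal - intermediaryFees - beneficiaryBankFee,
            // Every fee comes out of the transfer amount.
            ChargeOption.BEN => principal - senderBankFee - intermediaryFees - beneficiaryBankFee,
            _ => throw new ArgumentOutOfRangeException(nameof(option))
        };
}
```

Running the same function with a low and a high estimate for `intermediaryFees` is one way to produce the fee range you quote upfront.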

Status Tracking is Limited

SWIFT gpi (Global Payments Innovation) has improved tracking, but many corridors still operate as a black box: you know when you sent it, and you find out 1-5 days later whether it arrived or was returned. Build your customer experience around this reality.


Part 5: How These Realities Shape API Design

Understanding payment rail realities leads to specific API design decisions:

  1. Separate Initiation from Confirmation
// Initiation endpoint - returns immediately
POST /payments
{
    "amount": 10000,
    "currency": "NGN",
    "destination": {...}
}
Response: {
    "payment_id": "pay_123",
    "status": "pending",
    "status_url": "/payments/pay_123/status"
}

// Status endpoint - polled or webhooks
GET /payments/pay_123/status
Response: {
    "status": "settled",  // or "pending", "failed", "unknown"
    "settled_amount": 9950,  // May differ from initiated amount
    "fees_deducted": 50,
    "settlement_timestamp": "..."
}

  2. Design for Webhook Reliability

If real-time API responses can't be trusted, the natural question is how you reliably receive asynchronous status updates. Webhooks are the answer, but they come with their own map-vs-territory problems.

public class WebhookHandler
{
    public async Task<IActionResult> HandlePaymentWebhook(WebhookPayload payload)
    {
        // Idempotency: webhooks can be delivered multiple times
        if (await _webhookStore.HasProcessed(payload.WebhookId))
            return Ok(); // Acknowledge but don't reprocess

        // Ordering: webhooks may arrive out of order
        var payment = await _paymentStore.Get(payload.PaymentId);
        if (payment.LastEventTimestamp > payload.Timestamp)
            return Ok(); // Ignore stale event

        await ProcessWebhookEvent(payment, payload);
        await _webhookStore.MarkProcessed(payload.WebhookId);

        return Ok();
    }
}

Your webhook consumer must handle delivery failures (the provider retries, you get duplicates), ordering guarantees (or lack thereof), and the gap between webhook delivery and your acknowledgment. Treat webhook consumption as its own reconciliation problem.

  3. Design for Idempotency at Every Layer
// Client provides idempotency key
POST /payments
Headers: {
    "Idempotency-Key": "client-generated-uuid"
}

// Your system maintains idempotency through the entire chain
public async Task<PaymentResult> ProcessPayment(PaymentRequest request)
{
    // Claim-then-process: atomically claim the key before processing
    // This prevents race conditions when duplicate requests arrive simultaneously
    var claimResult = await _idempotencyStore.TryClaimAsync(request.IdempotencyKey);

    if (!claimResult.IsNew)
    {
        // Another request already claimed this key
        return claimResult.ExistingResult
            ?? await WaitForCompletion(request.IdempotencyKey);
    }

    try
    {
        // Generate downstream idempotency keys
        var downstreamKey = GenerateDownstreamKey(request.IdempotencyKey);

        // Process with downstream key
        var result = await _provider.Process(request, downstreamKey);

        // Store result for future idempotent requests
        await _idempotencyStore.SetResult(request.IdempotencyKey, result);

        return result;
    }
    catch (Exception)
    {
        // Release the claim on failure so retries can proceed
        await _idempotencyStore.ReleaseClaim(request.IdempotencyKey);
        throw;
    }
}
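The code above leaves `GenerateDownstreamKey` undefined. One possible shape is a deterministic hash, so that a retry after a timeout reuses the same downstream key instead of minting a new one. This is a sketch under assumptions: the extra `providerName` parameter and the SHA-256 scheme are my choices for illustration, not something the original method is known to do.

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

public static class IdempotencyKeys
{
    // Deterministic: the same client key and provider always yield the same downstream key,
    // so a retried request is recognized by the provider as a duplicate.
    // Including the provider name keeps keys distinct when one payment fans out to several providers.
    public static string GenerateDownstreamKey(string clientKey, string providerName)
    {
        var input = Encoding.UTF8.GetBytes($"{providerName}:{clientKey}");
        var hash = SHA256.HashData(input);
        return Convert.ToHexString(hash).ToLowerInvariant();
    }
}
```

A random downstream key per attempt would defeat the purpose: the provider would see each retry as a new payment.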
  4. Embrace Eventual Consistency
// Your payment aggregate should model real states
public class Payment
{
    public PaymentStatus Status { get; private set; }
    public Money InitiatedAmount { get; }
    public Money? SettledAmount { get; private set; }  // Nullable until settled
    public Money? FeesDeducted { get; private set; }

    public DateTimeOffset Initiated { get; }
    public DateTimeOffset? Settled { get; private set; }
    public DateTimeOffset? Reconciled { get; private set; }  // When we confirmed with source of truth

    // The amount the customer can rely on.
    // Keyed off the Reconciled timestamp: the PaymentStatus enum has no Reconciled member.
    public Money ConfirmedAmount => Reconciled.HasValue && SettledAmount.HasValue
        ? SettledAmount.Value
        : throw new InvalidOperationException("Payment not yet confirmed");
}

// Currency precision matters: different currencies have different decimal places
public readonly struct Money
{
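    public decimal Amount { get; }
    public string Currency { get; }

    // Get-only properties need a constructor to be assignable
    public Money(decimal amount, string currency)
    {
        Amount = amount;
        Currency = currency;
    }
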
    public int DecimalPlaces => Currency switch
    {
        "JPY" or "KRW" => 0,      // No decimals
        "KWD" or "BHD" => 3,      // Three decimals
        _ => 2                      // Most currencies
    };

    public long MinorUnits => (long)(Amount * (decimal)Math.Pow(10, DecimalPlaces));
}

The currency precision problem is a classic payments footgun. Store amounts as minor units (cents, not dollars) and derive display formatting from the currency code. Assuming two decimal places everywhere will eventually corrupt data.
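A minimal sketch of the minor-units approach, reusing the decimal-place table from the `Money` struct above. The `MinorUnits` class and its method names are hypothetical, and the three-entry currency table is illustrative rather than complete.

```csharp
using System;
using System.Globalization;

public static class MinorUnits
{
    public static int DecimalPlaces(string currency) => currency switch
    {
        "JPY" or "KRW" => 0,
        "KWD" or "BHD" => 3,
        _ => 2
    };

    // 10^places computed in decimal, avoiding a round trip through double.
    private static decimal Scale(string currency)
    {
        var scale = 1m;
        for (var i = 0; i < DecimalPlaces(currency); i++) scale *= 10m;
        return scale;
    }

    // Rejects amounts with more precision than the currency allows instead of silently truncating.
    public static long FromDecimal(decimal amount, string currency)
    {
        var scaled = amount * Scale(currency);
        if (scaled != decimal.Truncate(scaled))
            throw new ArgumentException($"{amount} has too many decimal places for {currency}");
        return (long)scaled;
    }

    // Display formatting is derived from the currency code, never assumed.
    public static string Format(long minorUnits, string currency) =>
        (minorUnits / Scale(currency)).ToString("F" + DecimalPlaces(currency), CultureInfo.InvariantCulture);
}
```

Under a blanket two-decimal assumption, 1500 minor units of KWD would display as 15.00 instead of 1.500, a tenfold error.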

  5. Build Reconciliation as a First-Class Concern
public interface IReconciliationService
{
    // Daily reconciliation against provider settlement files
    Task ReconcileSettlementFile(Stream settlementFile, DateOnly settlementDate);

    // Identify payments in "unknown" state that need investigation
    Task<IEnumerable<Payment>> GetPaymentsRequiringInvestigation();

    // Match our records against external source of truth
    Task<ReconciliationReport> GenerateReport(DateRange range);
}

The real complexity is the three-way match: your records, gateway records, and settlement files. Here's what that actually looks like:

public class ReconciliationResult
{
    // Happy path: all three sources agree
    public List<Payment> FullyReconciled { get; set; }

    // Settlement file shows transaction we have no record of
    // Action: investigate, may need to create retroactive record
    public List<SettlementEntry> UnmatchedInSettlement { get; set; }

    // We have a record, but it's missing from settlement
    // Action: check if it failed, was voided, or will appear tomorrow
    public List<Payment> MissingFromSettlement { get; set; }

    // Amounts don't match (common with FX, fees, partial captures)
    // Action: determine variance source, update records
    public List<(Payment Ours, SettlementEntry Theirs)> AmountMismatches { get; set; }

    // Gateway says success, settlement says failure (or vice versa)
    // Action: trust settlement, investigate gateway discrepancy
    public List<(Payment Ours, SettlementEntry Theirs)> StatusMismatches { get; set; }
}

Each category requires different handling. The "settlement file shows transaction we don't have" case is particularly painful because it might be a duplicate charge, a test transaction that hit production, or a genuine gap in your logging.
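The matching step itself can be sketched as a keyed comparison. The version below matches only two sources (your records and the settlement file) by reference; the gateway would be a third input compared the same way. `LedgerRecord`, `MatchResult`, and `Reconciler` are hypothetical names, and the sketch assumes references are unique within the settlement file, which duplicate charges would violate.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public readonly record struct LedgerRecord(string Reference, long AmountMinor);

public sealed class MatchResult
{
    public List<string> Matched { get; } = new();
    public List<string> AmountMismatches { get; } = new();
    public List<string> MissingFromSettlement { get; } = new();
    public List<string> UnmatchedInSettlement { get; } = new();
}

public static class Reconciler
{
    public static MatchResult Match(IEnumerable<LedgerRecord> ours, IEnumerable<LedgerRecord> settlement)
    {
        var result = new MatchResult();
        // Assumption: unique references; ToDictionary throws on duplicates.
        var theirs = settlement.ToDictionary(s => s.Reference);

        foreach (var record in ours)
        {
            // Remove each settlement entry as it is matched so leftovers can be reported.
            if (!theirs.Remove(record.Reference, out var entry))
                result.MissingFromSettlement.Add(record.Reference);
            else if (entry.AmountMinor != record.AmountMinor)
                result.AmountMismatches.Add(record.Reference);
            else
                result.Matched.Add(record.Reference);
        }

        // Whatever is left in the settlement file has no record on our side.
        result.UnmatchedInSettlement.AddRange(theirs.Keys);
        return result;
    }
}
```

Letting the duplicate-reference case throw is deliberate in this sketch: a repeated reference in a settlement file is itself a finding that deserves investigation rather than silent merging.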


Part 6: Practical Debugging Framework

When a payment fails in production, here's how we navigate the territory:

  1. Identify Where in the Chain the Failure Occurred
Authorization failed? → Issue is real-time, check gateway/processor logs
Settlement failed? → Issue is batch, check reconciliation files

  2. Check for State Mismatches
Your DB says "authorized" but customer says "declined"
→ Check if there was a subsequent void you didn't receive
→ Check if authorization expired before settlement

Also look for duplicate transactions with opposite outcomes; these are more common than you'd expect.

  3. Follow the Money, Not the Messages

API responses tell you what the system thinks happened. Settlement files tell you what actually happened. When they disagree, trust the settlement file.


Conclusion: Living with the Gap

The documentation for payment systems describes an idealized model: clean and deterministic. The reality is a distributed system with eventual consistency and multiple failure modes, where batch processes wear real-time API costumes.

The engineers who thrive in payments don't memorize API references. They understand that authorization is a question rather than an answer, and that "unknown" is a valid state rather than a bug. They design systems where initiation and confirmation are separate concerns and where reconciliation, not API responses, is the source of truth. And they ensure idempotency survives the entire intermediary chain.

The map helps you start the journey. Understanding the territory is what gets you there.


What payment rail realities have surprised you in production? I'd especially love to hear from engineers building in markets where the documentation gap is widest.
