Building for the 1%: Engineering Handling for Exceptions and Disputes
How to design payment systems where the happy path doesn't cripple the exception path
This is the third in a series on payment systems engineering.
- This Map is Not the Territory: Why documentation and reality diverge
- From Debugging to Design: Mental models for understanding any payment flow
- Building for the 1%: Engineering exception handling as first-class work
- The FinTech Staff Engineer: Translating technical reality into business decisions
Each piece stands alone. Together, they form a framework for building reliable payment infrastructure at scale.
Introduction: The Night That Changed Everything
Our monitoring system raised an alarm in the early hours of a Sunday morning: more than 40 payments had settled, but our database showed them as "Failed".
The reconciliation discrepancy rate had jumped from 0.02% to 4% in a single batch.
That night taught me painful lessons about exception handling.
Network reversals can arrive days after settlement, long after we've told the customer "payment successful" and fulfilled their order. Our "failed" status was irreversible: once a payment was marked failed, our system refused further state transitions. Both the settlement file and the provider's API said the payment had succeeded, but our database said it had failed. We had no way to reconcile the divergence without manual SQL.
Worse, our ledger couldn't handle negative entries. When we tried to reverse a failed-but-actually-succeeded transaction, our accounting system threw an exception. The money had moved but our records couldn't reflect it.
This post is about what we built to never have that Sunday again.
Here's what separates payment engineers from other backend engineers: you will spend 80% of your career on 5% of transactions. The happy path is table stakes while the exception path is where you earn your reputation.
If that doesn't excite you, build something else.
Part 1: The Exception Taxonomy
Before you can handle exceptions well, you need to categorize them. Not by error code but by who initiates and when it happens relative to settlement.
Category 1: Customer-Initiated Reversals
These are reversals where the customer (or their bank) initiates a change after a successful transaction.
| Type | Refund | Chargeback | Dispute |
|---|---|---|---|
| Initiated by | Merchant | Customer via bank | Formal claim with evidence |
| Customer says | "I want to return this" | "I don't recognize this" | Varies |
| Timeline | Immediate | Up to 120 days | Varies |
| Cost to you | Transaction fee | $15–100 + amount | Time + investigation |
| Risk level | Controllable | High | Medium |
Category 2: Merchant/System-Initiated Reversals
These are reversals you initiate due to business logic or system issues.
| Type | When | What Happens | Common Use Case |
|---|---|---|---|
| Void | Pre-settlement | Auth hold released | Customer changed mind |
| Cancel | Pre-capture | Auth hold released | Order cancelled |
| Partial Capture | At capture | Capture < auth amount | Tips, shipping adjustments |
| Partial Refund | Post-settlement | Return portion | Partial return |
| Adjustment | Post-settlement | New correcting transaction | Settlement fix |
| Credit | Any time | Add value | Goodwill gesture |
Category 3: Network/Provider-Initiated Events
These happen outside your control: the network or provider changes the state of a transaction.
| Type | What Happens | Your Response |
|---|---|---|
| Auth Expiry | Auth timed out, hold released | Re-authorize or cancel order |
| Network Reversal | Network corrects error after settlement | Update ledger, possibly contact customer |
| Settlement Adjustment | Provider changes settled amount | Reconcile and adjust records |
| Retrieval Request | Bank requests transaction info (pre-chargeback) | Provide evidence quickly |
| Fraud Alert | Network flags fraud, transaction reversed | Review, potentially block customer |
| Account Update | Card replaced/expired | Request updated payment method |
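The three tables above can be collapsed into a small lookup that classifies an event by initiator and by timing relative to settlement. This is a minimal sketch, not the article's implementation: `Initiator`, `SettlementPhase`, `ExceptionKind`, `ExceptionClass`, and `ExceptionTaxonomy` are hypothetical names, and only a subset of the table rows is covered.

```csharp
using System;

// Who set the event in motion (mirrors the three categories above).
public enum Initiator { Customer, Merchant, Network }

// When the event happens relative to settlement.
public enum SettlementPhase { PreSettlement, PostSettlement, Either }

// Hypothetical subset of the event types listed in the tables.
public enum ExceptionKind
{
    Chargeback, Void, PartialRefund, Credit,
    AuthExpiry, NetworkReversal, SettlementAdjustment
}

public readonly record struct ExceptionClass(Initiator Initiator, SettlementPhase Phase);

public static class ExceptionTaxonomy
{
    // Classify by initiator and timing, not by provider error code.
    public static ExceptionClass Classify(ExceptionKind kind) => kind switch
    {
        ExceptionKind.Chargeback => new ExceptionClass(Initiator.Customer, SettlementPhase.PostSettlement),
        ExceptionKind.Void => new ExceptionClass(Initiator.Merchant, SettlementPhase.PreSettlement),
        ExceptionKind.PartialRefund => new ExceptionClass(Initiator.Merchant, SettlementPhase.PostSettlement),
        ExceptionKind.Credit => new ExceptionClass(Initiator.Merchant, SettlementPhase.Either),
        ExceptionKind.AuthExpiry => new ExceptionClass(Initiator.Network, SettlementPhase.PreSettlement),
        ExceptionKind.NetworkReversal => new ExceptionClass(Initiator.Network, SettlementPhase.PostSettlement),
        ExceptionKind.SettlementAdjustment => new ExceptionClass(Initiator.Network, SettlementPhase.PostSettlement),
        _ => throw new ArgumentOutOfRangeException(nameof(kind), kind, "Unclassified exception kind")
    };
}
```

Routing on the pair (initiator, phase) rather than on raw error codes keeps handlers stable when a provider renames or adds codes.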
Part 2: Architecture for Exception Handling
The Anti-Pattern: Exception as Error
The most common mistake is treating exceptions like errors:
// The anti-pattern
public async Task<PaymentResult> ProcessPayment(PaymentRequest request)
{
try
{
var result = await _gateway.Charge(request);
return result;
}
catch (ChargebackException ex)
{
_logger.LogError(ex, "Chargeback occurred");
throw; // Now what? This isn't your error
}
}
Exceptions aren't errors. They are alternative transaction states that require their own handling workflows.
A chargeback isn't a bug. It's a business event with legal and financial consequences. Your system should handle it with the same rigor as a successful sale.
The Pattern: Exception as State Machine
Model your payment as a state machine where exceptions are valid state transitions:
public class Payment
{
public PaymentState State { get; private set; }
// Valid transitions
private static readonly Dictionary<PaymentState, PaymentState[]> ValidTransitions = new()
{
[PaymentState.Authorized] = new[]
{
PaymentState.Captured,
PaymentState.Voided,
PaymentState.Expired,
PaymentState.AuthorizationReversed // Network can reverse
},
[PaymentState.Captured] = new[]
{
PaymentState.Settled,
PaymentState.CaptureReversed // Network can reverse
},
[PaymentState.Settled] = new[]
{
PaymentState.Refunded,
PaymentState.PartiallyRefunded,
PaymentState.ChargebackInitiated,
PaymentState.SettlementAdjusted // Provider can adjust
},
[PaymentState.ChargebackInitiated] = new[]
{
PaymentState.ChargebackWon,
PaymentState.ChargebackLost,
PaymentState.ChargebackAccepted // We accept without fighting
},
// ... etc.
};
public void TransitionTo(PaymentState newState, string reason)
{
if (!CanTransitionTo(newState))
throw new InvalidStateTransitionException(State, newState);
var previousState = State;
State = newState;
AddDomainEvent(new PaymentStateChanged(Id, previousState, newState, reason));
}
}
The happy path runs Authorized -> Captured -> Settled -> Completed. Everything else is exception handling.
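The `Payment` class above calls `CanTransitionTo` without showing it. One possible shape, as a self-contained sketch with a reduced state set: `PaymentStateMachine` is a hypothetical name, `InvalidOperationException` stands in for the article's `InvalidStateTransitionException`, and a plain history list stands in for domain events.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum PaymentState
{
    Authorized, Captured, Settled, Voided, Expired,
    Refunded, ChargebackInitiated, ChargebackWon, ChargebackLost
}

public sealed class PaymentStateMachine
{
    private static readonly Dictionary<PaymentState, PaymentState[]> ValidTransitions = new()
    {
        [PaymentState.Authorized] = new[] { PaymentState.Captured, PaymentState.Voided, PaymentState.Expired },
        [PaymentState.Captured] = new[] { PaymentState.Settled },
        [PaymentState.Settled] = new[] { PaymentState.Refunded, PaymentState.ChargebackInitiated },
        [PaymentState.ChargebackInitiated] = new[] { PaymentState.ChargebackWon, PaymentState.ChargebackLost }
    };

    private readonly List<(PaymentState From, PaymentState To, string Reason)> _history = new();

    public PaymentState State { get; private set; } = PaymentState.Authorized;

    // Every transition is kept, so investigators can replay how the payment got here.
    public IReadOnlyList<(PaymentState From, PaymentState To, string Reason)> History => _history;

    // States without an entry in the table are terminal: nothing may follow them.
    public bool CanTransitionTo(PaymentState next) =>
        ValidTransitions.TryGetValue(State, out var allowed) && allowed.Contains(next);

    public void TransitionTo(PaymentState next, string reason)
    {
        if (!CanTransitionTo(next))
            throw new InvalidOperationException($"Cannot move from {State} to {next}");
        _history.Add((State, next, reason));
        State = next;
    }
}
```

Because the table is data, adding a late-arriving transition (for example a reversal out of a state previously treated as terminal) is a one-line change rather than a schema migration.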
public abstract record PaymentExceptionEvent
{
public Guid PaymentId { get; init; }
public DateTimeOffset OccurredAt { get; init; }
public string Source { get; init; } // "customer", "merchant", "network", "provider"
public Money OriginalAmount { get; init; }
}
public record ChargebackInitiated : PaymentExceptionEvent
{
public string ChargebackId { get; init; }
public string ReasonCode { get; init; }
public string ReasonDescription { get; init; }
public DateTimeOffset ResponseDeadline { get; init; }
public Money DisputedAmount { get; init; }
}
public record RefundRequested : PaymentExceptionEvent
{
public Money RefundAmount { get; init; }
public string RefundReason { get; init; }
public bool IsPartial => RefundAmount < OriginalAmount;
}
public record NetworkReversal : PaymentExceptionEvent
{
public string ReversalCode { get; init; }
public string NetworkReferenceId { get; init; }
public bool WasSettled { get; init; }
}
Part 3: Handling Specific Exceptions
Chargebacks: The Complete Flow
Chargebacks are the most complex exception. Here's the full handling architecture:
public class ChargebackHandler
{
public async Task HandleChargebackReceived(ChargebackReceivedEvent @event)
{
var payment = await _paymentRepository.Get(@event.PaymentId);
// 1. Record the chargeback
var chargeback = new Chargeback
{
PaymentId = payment.Id,
ChargebackId = @event.ChargebackId,
ReasonCode = @event.ReasonCode,
Amount = @event.DisputedAmount,
ReceivedAt = @event.OccurredAt,
ResponseDeadline = @event.ResponseDeadline,
Status = ChargebackStatus.Received
};
await _chargebackRepository.Save(chargeback);
// 2. Update payment state
payment.TransitionTo(PaymentState.ChargebackInitiated,
$"Chargeback {chargeback.ChargebackId} received");
// 3. Determine response strategy
var strategy = await _chargebackStrategyEngine.Determine(payment, chargeback);
// 4. Execute strategy
switch (strategy.Decision)
{
case ChargebackDecision.AutoAccept:
await AcceptChargeback(chargeback, strategy.Reason);
break;
case ChargebackDecision.AutoRepresent:
await InitiateRepresentment(chargeback, strategy.Evidence);
break;
case ChargebackDecision.ManualReview:
await QueueForManualReview(chargeback, strategy.Reason);
break;
}
// 5. Notify relevant parties
await _notificationService.NotifyChargebackReceived(payment, chargeback);
}
}
SENIOR ENGINEER'S LENS: Chargeback Economics
Chargebacks aren't just technical problems. They are business economics expressed in code:
// Why this matters to the business
public class ChargebackBusinessImpact
{
public decimal WinRateThreshold => 0.65m; // Below this, WIN fees apply
public decimal ResponseRateThreshold => 0.95m; // If you don't respond, automatic loss
public Money AverageProcessingCost => new Money(15, "USD"); // Per chargeback
public Money AverageRepresentmentCost => new Money(50, "USD"); // Staff time + docs
// Operational check: is there enough time left to gather evidence and clear review?
public bool CanMeetRepresentDeadline(Chargeback chargeback)
{
var timeUntilDeadline = chargeback.ResponseDeadline - DateTimeOffset.UtcNow;
return timeUntilDeadline > _evidenceGatheringTime + _reviewQueueTime; // both assumed to be TimeSpan fields
}
}
At one company, we automated evidence gathering for common chargeback reasons. For "customer did not recognize transaction," we automatically bundled order history, shipping confirmation, IP geolocation, and customer communication logs. What used to take 45 minutes per chargeback now took seconds. Our representment rate increased 40% because we stopped missing deadlines.
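The "should we fight?" question behind the cost figures above can be sketched as an expected-value check combined with the deadline check. This is an illustration under stated assumptions: `DisputeEconomics` and `RepresentmentPolicy` are hypothetical names, the win probability is assumed to come from your own history per reason code, and the per-chargeback processing cost is left out because it is incurred whether or not you respond.

```csharp
using System;

// Hypothetical inputs for a fight-or-accept decision.
public sealed record DisputeEconomics(
    decimal DisputedAmount,     // amount at stake, single currency assumed
    decimal WinProbability,     // estimated from past outcomes, between 0 and 1
    decimal RepresentmentCost,  // staff time + documents (the listing above uses 50 USD)
    TimeSpan TimeUntilDeadline,
    TimeSpan TimeNeeded);       // evidence gathering + review queue

public static class RepresentmentPolicy
{
    // Fight only when a response can arrive in time and the expected recovery
    // exceeds the cost of preparing it.
    public static bool ShouldRepresent(DisputeEconomics e)
    {
        if (e.WinProbability is < 0m or > 1m)
            throw new ArgumentOutOfRangeException(nameof(e), "WinProbability must be between 0 and 1");

        if (e.TimeUntilDeadline <= e.TimeNeeded)
            return false; // a late response is an automatic loss

        var expectedRecovery = e.WinProbability * e.DisputedAmount;
        return expectedRecovery > e.RepresentmentCost;
    }
}
```

With a 50 USD representment cost and a 50% win estimate, a 200 USD dispute is worth fighting (expected recovery 100 USD) while a 60 USD dispute is not (expected recovery 30 USD). Automating evidence gathering lowers both `RepresentmentCost` and `TimeNeeded`, which moves more disputes into the "fight" bucket.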
Network Reversals: The Silent Killer
Network reversals happen when the card network corrects a transaction, often days later without warning and after you've already shipped the product.
public class NetworkReversalHandler
{
public async Task HandleNetworkReversal(NetworkReversalEvent @event)
{
var payment = await _paymentRepository.GetByProviderReference(@event.NetworkReferenceId);
if (payment == null)
{
// Orphan reversal — we don't have the original transaction
await _alertService.RaiseOrphanReversalAlert(@event);
await _orphanReversalRepository.Save(@event);
return;
}
// Check if this reversal has already been processed
var existing = await _reversalRepository.GetByNetworkReference(@event.NetworkReferenceId);
if (existing != null)
{
_logger.LogWarning("Duplicate network reversal received: {Reference}",
@event.NetworkReferenceId);
return; // Idempotent
}
// Determine the type of reversal
var reversalType = ClassifyReversal(@event, payment);
switch (reversalType)
{
case ReversalType.DuplicateCorrection:
await HandleDuplicateCorrection(payment, @event);
break;
case ReversalType.LateDecline:
await HandleLateDecline(payment, @event);
break;
case ReversalType.FraudReversal:
await HandleFraudReversal(payment, @event);
break;
case ReversalType.TechnicalError:
await HandleTechnicalErrorReversal(payment, @event);
break;
}
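// Update financial records
await _ledgerService.RecordReversal(payment, @event);
// Persist the reversal so the duplicate check above recognizes a redelivery
// (assumes _reversalRepository exposes a Save method like the orphan repository)
await _reversalRepository.Save(@event);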
// Potentially contact customer
if (payment.WasDelivered)
{
await _customerService.NotifyPostDeliveryReversal(payment);
await _recoveryService.InitiateRecovery(payment);
}
}
}
Partial Captures: More Common Than You Think
Partial captures occur when you settle less than the authorized amount. Common examples:
- Hotels: Authorize full stay + incidentals, settle actual room charges
- Shipping: Authorize estimated shipping, settle actual cost
- Item Unavailable: Customer ordered 3 items, only 2 in stock
The complexity: Many payment providers treat partial capture as a separate transaction. Your reconciliation system must understand that one authorization can produce multiple settlements.
public class CaptureService
{
public async Task<CaptureResult> Capture(Guid paymentId, Money? amount = null)
{
var payment = await _paymentRepository.Get(paymentId);
// Determine capture amount
var captureAmount = amount ?? payment.AuthorizedAmount;
var isPartial = captureAmount < payment.AuthorizedAmount;
var uncapturedAmount = payment.AuthorizedAmount - captureAmount;
// Validate
if (captureAmount > payment.AuthorizedAmount)
throw new CaptureAmountExceedsAuthorizationException();
if (captureAmount <= Money.Zero)
throw new InvalidCaptureAmountException();
// Execute capture
var result = await _gateway.Capture(new CaptureRequest
{
PaymentId = payment.ProviderPaymentId,
Amount = captureAmount
});
if (result.Success)
{
payment.RecordCapture(captureAmount, isPartial);
if (isPartial)
{
// Release the uncaptured authorization
await _gateway.VoidPartial(payment.ProviderPaymentId, uncapturedAmount);
payment.AddEvent(new PartialCaptureCompleted
{
CapturedAmount = captureAmount,
ReleasedAmount = uncapturedAmount
});
}
}
return result;
}
}
Part 4: The Ledger Problem
Every exception impacts your financial records. You need a ledger that handles reversals cleanly and never deletes or overwrites an entry.
Double-Entry for Payments
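Two properties matter most here: every journal entry balances, and a reversal is a new entry rather than an edit. A minimal in-memory sketch of both follows. `AppendOnlyLedger`, `JournalEntry`, and `LedgerLine` are hypothetical names, amounts are in minor units of a single currency, and persistence is out of scope.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum EntryType { Debit, Credit }

// Amounts are positive minor units (cents); direction comes from EntryType.
public sealed record LedgerLine(string Account, long AmountMinor, EntryType Type);

public sealed record JournalEntry(
    Guid Id, string Description, IReadOnlyList<LedgerLine> Lines, Guid? Reverses = null);

public sealed class AppendOnlyLedger
{
    private readonly List<JournalEntry> _entries = new();

    public IReadOnlyList<JournalEntry> Entries => _entries;

    // Append an entry, rejecting it unless total debits equal total credits.
    public JournalEntry Post(string description, IReadOnlyList<LedgerLine> lines, Guid? reverses = null)
    {
        if (lines.Any(l => l.AmountMinor <= 0))
            throw new ArgumentException("Line amounts must be positive");

        long debits = lines.Where(l => l.Type == EntryType.Debit).Sum(l => l.AmountMinor);
        long credits = lines.Where(l => l.Type == EntryType.Credit).Sum(l => l.AmountMinor);
        if (debits != credits)
            throw new InvalidOperationException($"Unbalanced entry: debits {debits}, credits {credits}");

        var entry = new JournalEntry(Guid.NewGuid(), description, lines.ToList(), reverses);
        _entries.Add(entry);
        return entry;
    }

    // A reversal flips every line of the original and is appended as a new entry.
    // The original entry is never modified or removed.
    public JournalEntry Reverse(JournalEntry original, string reason)
    {
        var flipped = original.Lines
            .Select(l => l with { Type = l.Type == EntryType.Debit ? EntryType.Credit : EntryType.Debit })
            .ToList();
        return Post($"Reversal of {original.Id}: {reason}", flipped, original.Id);
    }

    // Debit-positive balance for one account across all entries.
    public long Balance(string account) =>
        _entries.SelectMany(e => e.Lines)
            .Where(l => l.Account == account)
            .Sum(l => l.Type == EntryType.Debit ? l.AmountMinor : -l.AmountMinor);
}
```

Because reversals are expressed by swapping debit and credit, the ledger never needs a negative amount, which is the limitation described in the introduction.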
public class PaymentLedger
{
public async Task RecordCapture(Payment payment)
{
await _ledger.Record(new LedgerEntry[]
{
// Customer paid us (or we captured their authorization)
new() { Account = AccountType.AccountsReceivable, Amount = payment.Amount, Type = EntryType.Debit },
// Revenue recognized
new() { Account = AccountType.Revenue, Amount = payment.Amount, Type = EntryType.Credit }
}, $"Capture for payment {payment.Id}");
}
public async Task RecordRefund(Payment payment, Money refundAmount)
{
await _ledger.Record(new LedgerEntry[]
{
// Reverse the receivable
new() { Account = AccountType.AccountsReceivable, Amount = refundAmount, Type = EntryType.Credit },
// Reverse the revenue
new() { Account = AccountType.Revenue, Amount = refundAmount, Type = EntryType.Debit }
}, $"Refund for payment {payment.Id}");
}
public async Task RecordChargeback(Payment payment, Chargeback chargeback)
{
await _ledger.Record(new LedgerEntry[]
{
// Book the disputed amount as a chargeback loss. Debiting Revenue for the same
// amount as well would leave debits exceeding credits by that amount.
new() { Account = AccountType.ChargebackLosses, Amount = chargeback.Amount, Type = EntryType.Debit },
// Record the fee
new() { Account = AccountType.ChargebackFees, Amount = chargeback.Fee, Type = EntryType.Debit },
// Cash outflow (balances the two debits above)
new() { Account = AccountType.Cash, Amount = chargeback.Amount + chargeback.Fee, Type = EntryType.Credit }
}, $"Chargeback {chargeback.Id} for payment {payment.Id}");
}
}
Never Delete, Only Append
// ❌ Never do this
payment.Amount = newAmount;
await _repository.Update(payment);
// ✅ Always do this
payment.RecordAdjustment(new AmountAdjustment
{
OriginalAmount = payment.Amount,
NewAmount = newAmount,
Reason = "Settlement adjustment from provider",
AdjustedAt = DateTimeOffset.UtcNow
});
await _repository.Update(payment);
// Original amount preserved in adjustment history
Part 5: Designing for Investigation
When exceptions occur, humans need to investigate. Design your system to support them.
The Investigation Dashboard Query Model
public class PaymentInvestigationView
{
public Guid PaymentId { get; set; }
public string CustomerReference { get; set; }
public string ProviderReference { get; set; }
public string NetworkReference { get; set; } // Multiple reference points
// Current state
public PaymentState CurrentState { get; set; }
// History for investigation
public List<StateTransition> StateHistory { get; set; }
public List<AmountChange> AmountHistory { get; set; }
public List<ExternalEvent> ExternalEvents { get; set; }
// Related entities
public List<RefundSummary> Refunds { get; set; }
public List<ChargebackSummary> Chargebacks { get; set; }
public List<DisputeSummary> Disputes { get; set; }
// Reconciliation status
public bool IsReconciled { get; set; }
public DateTimeOffset? LastReconciledAt { get; set; }
public List<ReconciliationDiscrepancy> Discrepancies { get; set; }
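// Investigation helpers
public DateTimeOffset CreatedAt { get; set; }
public Money Amount { get; set; } // Amount after recorded adjustments
public Money ChargebackLosses { get; set; } // Assumed to be populated by the read model
public TimeSpan AgeOfPayment => DateTimeOffset.UtcNow - CreatedAt;
public bool HasOpenException => Chargebacks.Any(c => c.IsOpen) || Disputes.Any(d => d.IsOpen);
public Money CurrentAmount => Amount;
// Enumerable.Sum has no overload for a custom Money type, so fold with Aggregate
public Money TotalRefunded => Refunds.Aggregate(Money.Zero, (total, r) => total + r.Amount);
public Money NetAmount => CurrentAmount - TotalRefunded - ChargebackLosses;
}
Part 6: The Exception SLAs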
Define explicit SLAs for exception handling. If you don't set deadlines, you will miss deadlines.
public static class ExceptionSLAs
{
public static readonly Dictionary<ExceptionType, TimeSpan> ResponseDeadlines = new()
{
// Customer-facing
[ExceptionType.RefundRequest] = TimeSpan.FromHours(4),
[ExceptionType.DisputeInquiry] = TimeSpan.FromHours(24),
// Chargeback (driven by network rules)
[ExceptionType.RetrievalRequest] = TimeSpan.FromDays(10),
[ExceptionType.ChargebackResponse] = TimeSpan.FromDays(7),
[ExceptionType.PreArbitrationResponse] = TimeSpan.FromDays(10),
// Internal operations
[ExceptionType.ReconciliationDiscrepancy] = TimeSpan.FromDays(1),
[ExceptionType.NetworkReversal] = TimeSpan.FromHours(4),
[ExceptionType.SettlementAdjustment] = TimeSpan.FromDays(1)
};
}
public class ExceptionSLAMonitor
{
public async Task<List<SLABreach>> GetBreaches()
{
var openExceptions = await _exceptionRepository.GetOpen();
return openExceptions
.Where(e => e.Age > ExceptionSLAs.ResponseDeadlines[e.Type])
.Select(e => new SLABreach
{
Exception = e,
SLA = ExceptionSLAs.ResponseDeadlines[e.Type],
BreachDuration = e.Age - ExceptionSLAs.ResponseDeadlines[e.Type]
})
.ToList();
}
}
Part 7: When This Breaks
When Network Reversals Break
The code above assumes the reversal arrives before we've refunded the customer. At a previous role, we once received a network reversal for a transaction we had already fully refunded at the customer's request. Our ledger showed a negative liability. Our customer service team had to explain to a merchant why we were taking money back 45 days after the sale, money we no longer had, because we'd returned it to the customer.
The technical fix: Idempotency that survives refunds. We now track the entire lifecycle of each network reference ID, including which refunds and reversals have been applied. The system must recognize that a reversal arriving after a full refund is not a new reversal to apply; the money has already been returned, so it is handled as a duplicate event.
The operational fix: A policy of "Never refund a transaction less than 7 days old." This gave the network reversal window time to close.
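"Idempotency that survives refunds" could look roughly like the sketch below: one lifecycle object per network reference that remembers every event ID and how much money is still outstanding. `NetworkReferenceLifecycle` and `ReversalOutcome` are hypothetical names, amounts are minor units, and whether a post-refund reversal should also be contested with the network is a business decision this sketch does not make.

```csharp
using System;
using System.Collections.Generic;

public enum ReversalOutcome { Applied, Duplicate, AlreadyReturned }

// Tracks money movement for one network reference so that a reversal is applied
// at most once and never on top of a refund that already returned the funds.
public sealed class NetworkReferenceLifecycle
{
    private readonly HashSet<string> _seenEventIds = new();

    public NetworkReferenceLifecycle(long capturedMinor) => CapturedMinor = capturedMinor;

    public long CapturedMinor { get; }
    public long RefundedMinor { get; private set; }
    public long ReversedMinor { get; private set; }

    // Amount still held that a refund or reversal could take back.
    public long OutstandingMinor => CapturedMinor - RefundedMinor - ReversedMinor;

    public void ApplyRefund(string eventId, long amountMinor)
    {
        if (_seenEventIds.Contains(eventId)) return; // replayed refund event
        if (amountMinor <= 0 || amountMinor > OutstandingMinor)
            throw new InvalidOperationException("Refund amount is not covered by the outstanding balance");
        _seenEventIds.Add(eventId);
        RefundedMinor += amountMinor;
    }

    public ReversalOutcome ApplyReversal(string eventId, long amountMinor)
    {
        if (_seenEventIds.Contains(eventId)) return ReversalOutcome.Duplicate;
        _seenEventIds.Add(eventId);

        if (OutstandingMinor == 0)
            return ReversalOutcome.AlreadyReturned; // remember the event, move no money

        // Never reverse more than is still outstanding.
        ReversedMinor += Math.Min(amountMinor, OutstandingMinor);
        return ReversalOutcome.Applied;
    }
}
```

The caller posts ledger entries only for `Applied`; `AlreadyReturned` is the case from the story above and is better routed to a human than to the ledger.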
When Partial Captures Break
We once had a merchant who consistently settled less than the authorized amount due to shipping adjustments. Our system correctly captured the partial amount, but our reconciliation system expected exactly one settlement per authorization, so it flagged every single transaction as a discrepancy: 5,000 alerts in one day. Finance stopped trusting the reconciliation dashboard.
The technical fix: Reconciliation must understand one-to-many relationships between authorizations and settlements. We added support for "settlement groups" linked to a single authorization.
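A settlement group can be as simple as "all settlements that share an authorization ID". The sketch below compares the group total with the authorized amount instead of expecting a single matching settlement. All type names are hypothetical, amounts are minor units, and settlements with no matching authorization are ignored here although a real reconciler would report them.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public sealed record Authorization(string AuthorizationId, long AuthorizedMinor);
public sealed record Settlement(string SettlementId, string AuthorizationId, long AmountMinor);

public enum ReconciliationStatus { Unsettled, PartiallySettled, FullySettled, OverSettled }

public static class SettlementGroupReconciler
{
    // Group settlements by authorization, then compare each group total
    // with the authorized amount.
    public static Dictionary<string, ReconciliationStatus> Reconcile(
        IEnumerable<Authorization> authorizations,
        IEnumerable<Settlement> settlements)
    {
        var settledByAuth = settlements
            .GroupBy(s => s.AuthorizationId)
            .ToDictionary(g => g.Key, g => g.Sum(s => s.AmountMinor));

        return authorizations.ToDictionary(a => a.AuthorizationId, a =>
        {
            long settled = settledByAuth.TryGetValue(a.AuthorizationId, out var total) ? total : 0;
            if (settled == 0) return ReconciliationStatus.Unsettled;
            if (settled < a.AuthorizedMinor) return ReconciliationStatus.PartiallySettled;
            return settled == a.AuthorizedMinor
                ? ReconciliationStatus.FullySettled
                : ReconciliationStatus.OverSettled;
        });
    }
}
```

Under this model a partial capture is a valid outcome rather than an alert; only `OverSettled` (and, after some grace period, `Unsettled`) needs human attention.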
The lesson: Your exception handling will have exceptions. Build for adaptation.
Conclusion: Build for the 1%
The systems that handle exceptions well share common traits. They use state machines rather than status fields; exceptions trigger transitions, not updates. They maintain append-only ledgers where every change is recorded and nothing is overwritten. They're event-driven, with exceptions emitting events that trigger workflows. And they're designed for investigation, because humans will inevitably need to understand what happened, working against explicit SLAs for every exception type.
The happy path is table stakes. The exception path is where you earn your reputation as a payments engineer.
Build for the 1%.