Architecture Decisions
Production architectures I've designed and led — real systems handling real traffic under real constraints.
These aren't textbook designs. They're the result of navigating tradeoffs between speed and correctness, convincing teams to invest in the right abstractions, and learning what actually matters when systems can't go down.
Embedded Insurance Platform
A production architecture handling high-throughput traffic with strict regulatory and financial correctness requirements.
The Context
Embedded insurance integrates coverage directly into partner transaction journeys — device protection during mobile checkout, loan protection bundled with credit disbursement, travel insurance in ticket booking. It's highly automated, API-driven, real-time, and partner-distributed across e-commerce, fintech, and NBFC platforms.
Building this platform meant solving a fundamentally different problem than traditional insurance. We needed a configurable product engine, a reliable issuance pipeline, a financially correct ledger, and a compliant reporting backbone — all without rewriting the core for each new partner or line of business.
Architecture Overview
The platform is organized into 6 bounded contexts:
| Domain | Responsibility |
|---|---|
| Product Configuration | Product, Plan, Cover, Benefit, SKU hierarchy |
| Pricing & Eligibility | Grid-based pricing, rule engine for eligibility |
| Policy & Issuance | Proposal → Underwriting → Policy → Certificate |
| Finance & Ledger | Double-entry accounting, float wallet |
| Claims | FNOL, assessment, approval, payment, reserve tracking |
| Partner & Overlay | Partner config, pricing/commission overrides |
Key Architecture Decisions & Why I Made Them
Decision 1: Immutable Product Versioning
The problem: Early on, we allowed in-place plan edits. One pricing change propagated to active policies in production, causing incorrect premiums, regulatory exposure, and significant cleanup effort.
The decision: Plans are immutable once published. Every change creates a new version; the old version is soft-deprecated but stays backward compatible for the policies already issued against it. Partner overlays (custom pricing, modified eligibility) live in a separate layer.
The tradeoff: More storage, more version management complexity. But we never had that issue again. In a regulated industry, immutability is cheaper than incorrectness.
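A minimal sketch of the versioning rule, assuming an in-memory registry (the class and method names are hypothetical, not the platform's actual API). Publishing always appends a new frozen version, issued policies resolve the exact version they were sold under, and partner overrides sit in a separate map so they never touch the plan itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanVersion:
    plan_id: str
    version: int
    premium: int  # base premium in minor currency units


class PlanRegistry:
    """Hypothetical registry: published versions are frozen and never edited."""

    def __init__(self) -> None:
        self._versions: dict[tuple[str, int], PlanVersion] = {}
        self._latest: dict[str, int] = {}
        # Partner overlays live beside the plans, never inside them.
        self._overlays: dict[tuple[str, str], int] = {}

    def publish(self, plan_id: str, premium: int) -> PlanVersion:
        # Any change is a new version; older versions become soft-deprecated.
        version = self._latest.get(plan_id, 0) + 1
        plan = PlanVersion(plan_id, version, premium)
        self._versions[(plan_id, version)] = plan
        self._latest[plan_id] = version
        return plan

    def get(self, plan_id: str, version: int) -> PlanVersion:
        # Issued policies keep resolving the exact version they were sold under.
        return self._versions[(plan_id, version)]

    def is_deprecated(self, plan: PlanVersion) -> bool:
        return plan.version < self._latest[plan.plan_id]

    def set_overlay(self, partner_id: str, plan_id: str, premium: int) -> None:
        self._overlays[(partner_id, plan_id)] = premium

    def quote(self, partner_id: str, plan_id: str) -> tuple[int, int]:
        # New sales use the latest version, with any partner override applied on top.
        version = self._latest[plan_id]
        plan = self.get(plan_id, version)
        return version, self._overlays.get((partner_id, plan_id), plan.premium)
```

Keying overlays by plan rather than by version is one possible choice; keying them by version would force partners to re-confirm custom pricing on every republish.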
Decision 2: Grid-Based Pricing Over Runtime Actuarial Models
The problem: Traditional insurance uses complex actuarial models at runtime. Embedded insurance operates at a scale and speed where runtime computation introduces latency and failure points.
The decision: Pre-computed pricing grids versioned and preloaded in memory, with partner-level overrides.
Why this matters: At high throughput, every millisecond of pricing computation counts. Grids give us deterministic, auditable pricing with no model evaluation on the request path, so pricing cannot fail because a model misbehaves at runtime. Grid versioning also simplified regulatory compliance: when regulators ask "what rate did this policy get?", we can point to the exact grid version.
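A sketch of a versioned grid lookup with partner overrides. It assumes a single pricing dimension (device price bands) and hypothetical grid and partner identifiers; real grids would have more axes, but the lookup stays a table read rather than a model evaluation.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class PricingGrid:
    version: str
    band_floors: tuple[int, ...]  # inclusive lower bound of each device-price band, ascending
    premiums: tuple[int, ...]     # premium per band, aligned with band_floors

    def lookup(self, device_price: int) -> int:
        # A pure table read: no actuarial model runs on the request path.
        idx = bisect_right(self.band_floors, device_price) - 1
        if idx < 0:
            raise ValueError(f"price {device_price} is below the lowest band")
        return self.premiums[idx]


# Hypothetical grids, loaded once at startup from versioned artifacts and kept in memory.
GRIDS = {
    "device-v7": PricingGrid("device-v7", (0, 10_000, 30_000), (499, 999, 1_999)),
}
# Hypothetical partner overrides: (partner, grid version) -> premiums per band.
OVERRIDES = {("partner-a", "device-v7"): (449, 949, 1_899)}


def price(partner: str, grid_version: str, device_price: int) -> tuple[str, int]:
    grid = GRIDS[grid_version]
    override = OVERRIDES.get((partner, grid_version))
    premiums = override if override is not None else grid.premiums
    premium = PricingGrid(grid.version, grid.band_floors, premiums).lookup(device_price)
    # Returning the version lets each policy record exactly which grid priced it.
    return grid.version, premium
```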
Decision 3: Double-Entry Immutable Ledger
The problem: This is where most naive insurance systems fail. When I inherited the financial pipeline, there were data inconsistencies between the ledger, finance systems, and compliance reports. Financial closures were slow.
The decision: Every financial movement creates balanced journal entries in an immutable, append-only ledger. Never mix business tables with ledger tables. Never update ledger rows — ever.
- Premium collection: Debit Partner Float → Credit Premium Receivable
- Allocation: Debit Premium Receivable → Credit Insurer Payable + Commission Payable
- Settlement: Debit Insurer Payable → Credit Bank Account
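A minimal sketch of the append-only, balanced-entry rule, replaying the three flows listed above with illustrative amounts (the account names and the 850/150 split are hypothetical). Balances are derived from entries rather than stored, an unbalanced entry is rejected before it is appended, and a correction would be a new, opposite entry rather than an edit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Posting:
    account: str
    debit: int = 0
    credit: int = 0


@dataclass(frozen=True)
class JournalEntry:
    entry_id: str
    postings: tuple[Posting, ...]


class Ledger:
    """Append-only journal: there is deliberately no update or delete method."""

    def __init__(self) -> None:
        self._entries: list[JournalEntry] = []

    def post(self, entry_id: str, *postings: Posting) -> JournalEntry:
        debits = sum(p.debit for p in postings)
        credits = sum(p.credit for p in postings)
        if debits == 0 or debits != credits:
            raise ValueError(f"entry {entry_id} is unbalanced: debits={debits}, credits={credits}")
        entry = JournalEntry(entry_id, tuple(postings))
        self._entries.append(entry)
        return entry

    def balance(self, account: str) -> int:
        # Derived by replaying entries (debits minus credits); nothing is stored or mutated.
        return sum(p.debit - p.credit for e in self._entries for p in e.postings if p.account == account)

    def trial_balance(self) -> int:
        # Always 0 when every entry is balanced.
        return sum(p.debit - p.credit for e in self._entries for p in e.postings)


ledger = Ledger()
# Premium collection
ledger.post("collect-1", Posting("partner_float", debit=1_000), Posting("premium_receivable", credit=1_000))
# Allocation (illustrative 850/150 split between insurer and commission)
ledger.post("alloc-1", Posting("premium_receivable", debit=1_000),
            Posting("insurer_payable", credit=850), Posting("commission_payable", credit=150))
# Settlement
ledger.post("settle-1", Posting("insurer_payable", debit=850), Posting("bank_account", credit=850))
```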
The organizational challenge: The engineering team initially wanted a simpler approach. An immutable ledger is harder to build, harder to debug, and requires more storage. I had to make the case that in a regulated financial system, correctness is non-negotiable — and that the alternative (a mutable financial record) was a ticking time bomb.
The result: Closure time reduced significantly. Reporting accuracy improved. The ledger became the single source of truth for regulatory and financial reporting across all partners.
Decision 4: Failure Handling as a First-Class Concern
The philosophy: In high-throughput insurance systems, every failure mode needs a recovery path, and every recovery path needs testing.
| Scenario | Strategy |
|---|---|
| Payment success, issuance failure | Compensation workflow → Retry → If irrecoverable, refund + reversal entry |
| Policy created, ledger fails | Mark pending, retry ledger. Never leave a policy without ledger entries |
| Ledger posted, certificate of insurance (COI) generation fails | Non-financial failure: retry independently, never reverse the ledger |
| Duplicate requests | Idempotency keys + unique proposal reference + state validation |
The lesson: Designing failure handling up front is what separates systems that work at demo scale from systems that work at production scale. At high volume, even a tiny failure rate without a recovery path is unacceptable.
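A sketch of the duplicate-request row of the table above, assuming an in-memory store. In production the idempotency record and the policy row would be written in one database transaction, with a unique constraint on the proposal reference; the dictionaries here only stand in for that, and the sketch ignores concurrent callers. All names are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass


class IdempotencyConflict(Exception):
    """The same idempotency key was reused with a different request body."""


@dataclass(frozen=True)
class StoredResponse:
    request_hash: str
    response: dict


class IssuanceService:
    def __init__(self) -> None:
        self._by_key: dict[str, StoredResponse] = {}
        # Stands in for a UNIQUE constraint on proposal_ref in the policy table.
        self._by_proposal: dict[str, str] = {}

    def issue(self, idempotency_key: str, request: dict) -> dict:
        request_hash = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        stored = self._by_key.get(idempotency_key)
        if stored is not None:
            if stored.request_hash != request_hash:
                raise IdempotencyConflict(idempotency_key)
            return stored.response  # replay: same answer, no second policy

        proposal_ref = request["proposal_ref"]
        if proposal_ref in self._by_proposal:
            # New key but same proposal: state validation returns the existing policy.
            response = {"status": "already_issued", "policy_id": self._by_proposal[proposal_ref]}
        else:
            policy_id = f"POL-{len(self._by_proposal) + 1}"
            self._by_proposal[proposal_ref] = policy_id
            response = {"status": "issued", "policy_id": policy_id}
        self._by_key[idempotency_key] = StoredResponse(request_hash, response)
        return response
```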
Decision 5: Reporting Architecture — Separate Reads from Writes
The decision: Operational DB → Event Stream → Reporting Warehouse → Regulatory Reports. Never run heavy reports on the transactional database. Reporting is eventually consistent — never block issuance for a reporting write.
Why: When regulators and finance teams need reports, they need them accurate and fast. But generating reports from a high-throughput database would degrade the very system generating the data. This separation gave us compliant reporting with audit trails without sacrificing platform performance.
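The source does not say how events reach the stream. One common way to guarantee that issuance never blocks on reporting is a transactional outbox, sketched here with SQLite, with a Python list standing in for the event stream and a dict for the warehouse (all table, event, and function names are hypothetical).

```python
import json
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE policy (id TEXT PRIMARY KEY, premium INTEGER NOT NULL);
CREATE TABLE outbox (
    seq INTEGER PRIMARY KEY AUTOINCREMENT,
    payload TEXT NOT NULL,
    published INTEGER NOT NULL DEFAULT 0
);
""")


def issue_policy(policy_id: str, premium: int) -> None:
    # The policy row and its event commit together; issuance never calls reporting.
    event = {"type": "PolicyIssued", "id": policy_id, "premium": premium}
    with db:
        db.execute("INSERT INTO policy (id, premium) VALUES (?, ?)", (policy_id, premium))
        db.execute("INSERT INTO outbox (payload) VALUES (?)", (json.dumps(event),))


def relay(stream: list) -> int:
    # Runs asynchronously in practice: ships unpublished events to the stream in order.
    rows = db.execute("SELECT seq, payload FROM outbox WHERE published = 0 ORDER BY seq").fetchall()
    with db:
        for seq, payload in rows:
            stream.append(json.loads(payload))
            db.execute("UPDATE outbox SET published = 1 WHERE seq = ?", (seq,))
    return len(rows)


def project(stream: list, warehouse: dict) -> None:
    # Idempotent upsert: replaying an event leaves the warehouse unchanged.
    for event in stream:
        if event["type"] == "PolicyIssued":
            warehouse[event["id"]] = event["premium"]
```

Because the relay may deliver an event more than once (for example if it crashes after appending but before committing), the projection is written as an idempotent upsert.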
Production Guarantees
- Daily reconciliation comparing ledger vs float vs settlement
- Event replay safety for disaster recovery
- Versioned pricing models with full audit trails
- Immutable ledger (append-only, no updates)
- Unique constraints on proposal references
- Idempotency keys on all mutating APIs
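A sketch of the daily three-way reconciliation, assuming each source can be reduced to per-partner totals for the same business day in the same currency unit (the function and field names are hypothetical). Any partner whose three figures disagree, including one missing from a source, is reported as a break for investigation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Break:
    partner: str
    ledger: int
    float_wallet: int
    settlement: int


def reconcile(ledger: dict[str, int], float_wallet: dict[str, int],
              settlement: dict[str, int]) -> list[Break]:
    """Three-way match of per-partner daily totals; returns every mismatch."""
    breaks = []
    for partner in sorted(set(ledger) | set(float_wallet) | set(settlement)):
        # A partner absent from one source counts as 0 there, which surfaces as a break.
        figures = (ledger.get(partner, 0), float_wallet.get(partner, 0), settlement.get(partner, 0))
        if len(set(figures)) != 1:
            breaks.append(Break(partner, *figures))
    return breaks
```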
Platform Results
- High-throughput traffic at peak with zero-downtime migrations
- Hundreds of dealers onboarded with real-time settlement
- Full regulatory compliance across all partners
- Zero regulatory non-compliances across audits
Unified Partner Integration Layer
How I eliminated fragmented partner APIs across business lines and built a platform nobody asked for.
The Problem I Saw
The company had grown fast. Each business line had its own partner integration logic — different authentication mechanisms, different API contracts, different onboarding flows. Every new partner meant rebuilding integration logic that already existed somewhere else in the org.
The cost wasn't just engineering hours. It was inconsistent partner experience, painful debugging, and an inability to launch new products quickly because every launch required re-integration with existing partners.
The Architecture Decision
I proposed and led a unified partner integration platform standardizing APIs, authentication, and onboarding across all business lines.
The key design decisions:
- Shared authentication and authorization layer — one authentication mechanism for all partner interactions, regardless of product line
- Unified policy issuance and claim service standards — common API contracts with product-specific extensions rather than product-specific APIs
- Reusable SDKs and unified API documentation — partners integrate once, access all product lines
- Multi-LOB coordination layer — routing and orchestration across business lines without coupling
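A sketch of the shape this layer implies: one authentication check and one request contract, with product-specific data carried in an `extensions` field and dispatched to a registered line-of-business handler. The source does not name the authentication mechanism; HMAC-SHA256 request signing is assumed here purely for illustration, and the partner secret, handler, and field names are hypothetical.

```python
import hashlib
import hmac
import json
from typing import Callable

Handler = Callable[[dict], dict]

# Hypothetical per-partner signing secrets, held by the shared auth layer.
PARTNER_SECRETS = {"partner-a": b"demo-secret"}
LOB_HANDLERS: dict[str, Handler] = {}


def register(lob: str) -> Callable[[Handler], Handler]:
    def wrap(fn: Handler) -> Handler:
        LOB_HANDLERS[lob] = fn
        return fn
    return wrap


def authenticate(partner_id: str, body: bytes, signature: str) -> bool:
    secret = PARTNER_SECRETS.get(partner_id)
    if secret is None:
        return False
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, signature)


def handle_issue(partner_id: str, body: bytes, signature: str) -> dict:
    # Same auth check and same envelope for every line of business.
    if not authenticate(partner_id, body, signature):
        return {"status": 401}
    request = json.loads(body)
    handler = LOB_HANDLERS.get(request.get("lob"))
    if handler is None:
        return {"status": 400, "error": "unsupported lob"}
    return {"status": 200, **handler(request)}


@register("device")
def issue_device(request: dict) -> dict:
    # Product-specific fields are read only from "extensions".
    return {"proposal_ref": request["proposal_ref"], "imei": request["extensions"]["imei"]}
```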
The Organizational Challenge
The technical design was the easy part. The hard part was getting multiple independent teams to agree on shared standards. Each team had their own integration patterns.
I chose to lead by influence rather than authority. I didn't have organizational control over the other teams. So I built the platform within my team first, proved the value, and let adoption happen through results:
- Started with my own team as the proving ground
- Demonstrated measurable improvements in integration speed
- Made the alternative — continuing with fragmented APIs — obviously worse
Results
- Engineering efficiency improved significantly by eliminating redundant integration logic
- New partner integration timelines reduced substantially
- Directly enabled rapid launches of multiple new product lines
Partner Onboarding Framework
How I turned a multi-day manual process into a self-service flow.
The Problem
Every new partner onboarding required multiple days of engineering time — credential setup, configuration, validation, QA checks, deployment approvals. With many active partners and a growing pipeline, the engineering team was becoming a bottleneck to business growth.
The Architecture Decision
Build a self-service onboarding platform that eliminates engineering involvement from routine partner setup:
- Self-serve APIs and UI workflows for configuration and credential management
- Automated validation and QA checks — no engineering review needed for standard onboarding
- Workflow automation integration for access and deployment approvals
- Built on top of the unified API layer for instant partner connectivity
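A sketch of the automated validation step, assuming partner configuration arrives as a dictionary and that a clean result means no engineering review. The individual checks, field names, product list, and the 0 to 30 percent commission bound are hypothetical examples, not the platform's actual rules.

```python
from typing import Callable, Optional

Check = Callable[[dict], Optional[str]]  # returns an error message, or None on success

KNOWN_PRODUCTS = {"device", "travel", "loan"}  # hypothetical catalogue
REQUIRED = {"partner_id", "webhook_url", "products"}


def check_required(cfg: dict) -> Optional[str]:
    missing = REQUIRED - set(cfg)
    return f"missing fields: {sorted(missing)}" if missing else None


def check_webhook(cfg: dict) -> Optional[str]:
    url = cfg.get("webhook_url")
    if url is not None and not url.startswith("https://"):
        return "webhook_url must use https"
    return None


def check_products(cfg: dict) -> Optional[str]:
    unknown = set(cfg.get("products", [])) - KNOWN_PRODUCTS
    return f"unknown products: {sorted(unknown)}" if unknown else None


def check_commission(cfg: dict) -> Optional[str]:
    pct = cfg.get("commission_pct", 0)
    return None if 0 <= pct <= 30 else "commission_pct outside 0-30"


CHECKS: list[Check] = [check_required, check_webhook, check_products, check_commission]


def onboard(cfg: dict) -> tuple[str, list[str]]:
    # Run every check so the partner sees all problems at once, not one per round trip.
    errors = [err for check in CHECKS if (err := check(cfg)) is not None]
    # Clean configs go straight to automated approval; only exceptions reach a human.
    return ("auto_approved" if not errors else "needs_review"), errors
```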
The Tradeoff
The pushback was real: "We only onboard a few partners a month. Why build a platform for that?"
The bet wasn't about current volume. It was about where the business was heading. This was an investment in organizational scalability — and it paid off when we needed to rapidly onboard partners for new product launches.
Results
- Onboarding time reduced from days to hours
- Many partners onboarded with near-zero engineering involvement
- Internal throughput improved significantly
- Framework reused across all business lines for partner setup
New Business Line: Life Insurance
Architecture decisions behind launching a new insurance vertical and a hybrid product.
The Problem
The company was a General Insurance (GI) provider entering Life Insurance (LI), a completely different regulatory domain, actuarial model, and compliance framework. After we established the initial product, the challenge deepened: could we combine GI and LI into a single hybrid product?
Key Architecture Decisions
Decision 1: Build for configurability, not speed. The team wanted to hardcode assumptions to ship faster. I pushed for building issuance, endorsement, and claims systems that could extend to future products. This slowed the initial launch — but subsequent products shipped faster because the foundations were right.
Decision 2: Cross-entity policy ownership model. For the hybrid product, we designed a flexible ownership model allowing either LOB (GI or LI) to lead issuance while maintaining independent claims control. This required dual-regulatory orchestration — premium apportioning rules that satisfied both regulatory frameworks simultaneously.
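Whatever the regulatory rules decide, premium apportioning must produce integer amounts that add back up to what the customer paid. A sketch using largest-remainder rounding on minor currency units, assuming the split is expressed as exact fractional weights supplied by the caller (the function name and entity keys are illustrative; the real weights came from the regulatory rules, not from this code).

```python
from fractions import Fraction


def apportion(total: int, weights: dict[str, Fraction]) -> dict[str, int]:
    """Split `total` (minor currency units) by exact weights; parts always sum to total."""
    if total < 0 or any(w < 0 for w in weights.values()) or sum(weights.values()) != 1:
        raise ValueError("need a non-negative total and non-negative weights summing to 1")
    exact = {entity: total * w for entity, w in weights.items()}
    shares = {entity: int(v) for entity, v in exact.items()}  # floor, as values are >= 0
    leftover = total - sum(shares.values())
    # Largest remainder: give the leftover units to the largest fractional parts
    # (ties broken by entity name so the result is deterministic).
    by_remainder = sorted(exact, key=lambda e: (exact[e] - shares[e], e), reverse=True)
    for entity in by_remainder[:leftover]:
        shares[entity] += 1
    return shares
```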
Decision 3: Compliance architecture. When new regulatory guidelines required separating covers with different durations under distinct master policies, we designed a flexible policy framework — issuing multiple policies linked under a virtual "shallow" policy while keeping partner integration unchanged. Zero partner-side API changes.
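A sketch of the shallow-policy idea, assuming covers are grouped by term length and each term maps to one master policy (the cover names, master policy identifiers, and numbering scheme are hypothetical). The partner keeps a single reference, while issuance fans out into one child policy per master policy.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Cover:
    name: str
    term_months: int


@dataclass(frozen=True)
class ChildPolicy:
    policy_no: str
    master_policy: str
    covers: tuple[Cover, ...]


@dataclass(frozen=True)
class ShallowPolicy:
    """Virtual parent: its reference is the only identifier the partner sees."""
    reference: str
    children: tuple[ChildPolicy, ...]


def issue(reference: str, covers: list[Cover], master_for_term: dict[int, str]) -> ShallowPolicy:
    # Covers with different durations must sit under different master policies.
    ordered = sorted(covers, key=lambda c: c.term_months)  # stable: keeps input order within a term
    children = tuple(
        ChildPolicy(f"{reference}-{i}", master_for_term[term], tuple(group))
        for i, (term, group) in enumerate(groupby(ordered, key=lambda c: c.term_months), start=1)
    )
    return ShallowPolicy(reference, children)
```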
The Organizational Lesson
Building in regulated environments taught me that technical competence alone isn't enough. Earning the trust of compliance, actuarial, and finance teams — all of whom had legitimate concerns about a tech team moving fast — was more important than the architecture itself. I invested weeks in cross-functional alignment before writing a line of code.
Results
- Successfully launched new Life Insurance vertical with zero regulatory non-compliances
- Delivered a first-of-its-kind hybrid GI + LI product
- Core systems reused for subsequent product launches
- Scalable compliance architecture for future regulatory updates
Payments & Checkout Platform (E-commerce)
Architecture decisions for systems handling high-throughput traffic during flash sales.
The Context
At a major e-commerce company, I led checkout, payments, and order management — platforms that could not go down during flash sales and high-traffic events. This was where I learned to think about scale, reliability, and the operational reality of mission-critical systems.
Key Architecture Decisions
Decision 1: Closed-Wallet System for Instant Payments
The problem: Refunds through external payment settlement took 5–7 days. In e-commerce, slow refunds destroy customer trust.
The decision: Built an internal wallet for instant payments and refunds without external settlement delays. Users could add money and check out instantly, and refunds were credited immediately to the wallet balance.
The tradeoff: Managing an internal float wallet introduced financial reconciliation complexity. But choosing instant refunds over multi-day bank reversals was a customer-experience decision that justified the engineering investment.
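A sketch of why the closed wallet makes refunds instant: a refund is just another entry in the user's transaction log, so there is no external settlement to wait for. It assumes amounts in minor currency units and a single-threaded caller (a real wallet would make the balance check and the debit atomic, for example under a row lock); all names are hypothetical.

```python
from dataclasses import dataclass, field


class InsufficientBalance(Exception):
    pass


@dataclass
class Wallet:
    user_id: str
    # Append-only log of (reference, kind, signed amount); the balance is derived from it.
    txns: list[tuple[str, str, int]] = field(default_factory=list)

    @property
    def balance(self) -> int:
        return sum(amount for _, _, amount in self.txns)

    def _total(self, ref: str, kind: str) -> int:
        return sum(a for r, k, a in self.txns if r == ref and k == kind)

    def add_money(self, ref: str, amount: int) -> None:
        self._require_positive(amount)
        self.txns.append((ref, "topup", amount))

    def pay(self, order_id: str, amount: int) -> None:
        self._require_positive(amount)
        if amount > self.balance:
            raise InsufficientBalance(order_id)
        self.txns.append((order_id, "payment", -amount))

    def refund(self, order_id: str, amount: int) -> None:
        # Instant: a wallet credit, with no external settlement round trip.
        self._require_positive(amount)
        paid = -self._total(order_id, "payment")
        if self._total(order_id, "refund") + amount > paid:
            raise ValueError(f"refund for {order_id} exceeds amount paid")
        self.txns.append((order_id, "refund", amount))

    @staticmethod
    def _require_positive(amount: int) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
```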
Decision 2: Multi-Provider Payment Gateway with Automatic Failover
The problem: Single payment provider = single point of failure during the highest-traffic moments.
The decision: Integrated multiple payment providers with:
- Automatic retry with provider rotation on failure
- Configurable routing rules (cost optimization, success rate)
- Health monitoring with circuit breakers
The result: Very high payment success rate through intelligent multi-provider routing.
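A sketch of provider rotation with per-provider circuit breakers, assuming each provider is wrapped as a callable that raises `ProviderError` on failure. The breaker threshold and cooldown, the success-rate routing key, and all names are hypothetical. One caution the sketch does not solve: rotating to another provider is only safe when the first attempt definitely did not charge the customer, so timeouts need a status check before any retry.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


class ProviderError(Exception):
    """Raised by a provider adapter on a definite failure (safe to retry elsewhere)."""


@dataclass
class CircuitBreaker:
    failure_threshold: int = 3
    cooldown_s: float = 30.0
    failures: int = 0
    opened_at: Optional[float] = None

    def allow(self, now: float) -> bool:
        # Closed, or open long enough that a trial request may go through (half-open).
        return self.opened_at is None or now - self.opened_at >= self.cooldown_s

    def record(self, ok: bool, now: float) -> None:
        if ok:
            self.failures, self.opened_at = 0, None
        else:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = now


@dataclass
class Provider:
    name: str
    charge: Callable[[str, int], str]  # returns the provider's transaction id
    success_rate: float = 1.0          # fed by health monitoring
    breaker: CircuitBreaker = field(default_factory=CircuitBreaker)


def pay(providers: list[Provider], order_id: str, amount: int,
        now: Callable[[], float] = time.monotonic) -> tuple[str, str]:
    # Routing rule here: best observed success rate first (cost could be another key).
    for provider in sorted(providers, key=lambda p: p.success_rate, reverse=True):
        t = now()
        if not provider.breaker.allow(t):
            continue  # circuit open: skip without calling the provider
        try:
            txn_id = provider.charge(order_id, amount)
        except ProviderError:
            provider.breaker.record(False, t)
            continue  # rotate to the next provider
        provider.breaker.record(True, t)
        return provider.name, txn_id
    raise ProviderError(f"no provider could charge {order_id}")
```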
Decision 3: Flash Sale Concurrency Patterns
The problem: During flash sales, thousands of users try to buy the same limited inventory simultaneously. Overselling = customer complaints, refund costs, and trust erosion.
The decisions:
- Pessimistic locking at the inventory layer to prevent oversells
- Time-limited cart holds to prevent inventory stockpiling
- Queue-based checkout during extreme traffic spikes
- Inventory holds with timeout — reserve stock during checkout, release on failure
The result: Zero inventory oversells during flash sales. High throughput sustained during sale events.
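A sketch of pessimistic locking combined with expiring holds. A per-SKU `threading.Lock` stands in for a database row lock (for example `SELECT ... FOR UPDATE` in PostgreSQL); the TTL, class, and method names are hypothetical. Stock is deducted when a hold is taken, returned when the hold is released or expires, and kept when checkout confirms.

```python
import threading
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class SkuState:
    available: int
    lock: threading.Lock = field(default_factory=threading.Lock)
    holds: dict = field(default_factory=dict)  # hold_id -> (qty, expires_at)


class Inventory:
    def __init__(self, stock: dict[str, int], hold_ttl_s: float = 600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._skus = {sku: SkuState(qty) for sku, qty in stock.items()}
        self._ttl = hold_ttl_s
        self._clock = clock

    @staticmethod
    def _expire(state: SkuState, now: float) -> None:
        # Timed-out holds give their stock back (caller must hold the lock).
        for hold_id, (qty, expires_at) in list(state.holds.items()):
            if expires_at <= now:
                del state.holds[hold_id]
                state.available += qty

    def reserve(self, sku: str, qty: int) -> Optional[str]:
        state = self._skus[sku]
        with state.lock:  # pessimistic: one checkout mutates this SKU at a time
            now = self._clock()
            self._expire(state, now)
            if state.available < qty:
                return None  # sold out: reject rather than oversell
            state.available -= qty
            hold_id = uuid.uuid4().hex
            state.holds[hold_id] = (qty, now + self._ttl)
            return hold_id

    def release(self, sku: str, hold_id: str) -> None:
        # Payment failed or cart abandoned: return the held stock immediately.
        state = self._skus[sku]
        with state.lock:
            held = state.holds.pop(hold_id, None)
            if held is not None:
                state.available += held[0]

    def confirm(self, sku: str, hold_id: str) -> bool:
        # Payment succeeded: drop the hold and keep the stock deducted.
        state = self._skus[sku]
        with state.lock:
            self._expire(state, self._clock())
            return state.holds.pop(hold_id, None) is not None
```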
Decision 4: HSM-Based Key Management
The problem: Payment gateway credentials stored in application config or environment variables = security vulnerability.
The decision: HSM-based encryption for all payment credentials. Credentials never stored in application config. Key rotation without service restart.
Why this matters: Not glamorous. Not visible. But foundational security decisions are the ones that prevent the headlines you never want to see.
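HSM APIs are vendor-specific, so this sketch only shows the application side of rotation without restart: credentials are fetched at runtime through a hypothetical `fetch` callable (standing in for the HSM-backed key service), cached briefly in memory, and refetched when the cache expires or the gateway rejects a key. The class names, TTL, and the 401 convention are assumptions.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Credential:
    key_id: str
    secret: bytes


class CredentialCache:
    """Holds gateway credentials in memory only; nothing is read from app config."""

    def __init__(self, fetch: Callable[[str], Credential], ttl_s: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch  # hypothetical: wraps the HSM-backed key service
        self._ttl = ttl_s
        self._clock = clock
        self._cache: dict[str, tuple[Credential, float]] = {}

    def get(self, gateway: str, force_refresh: bool = False) -> Credential:
        now = self._clock()
        cached = self._cache.get(gateway)
        if cached is None or force_refresh or now >= cached[1]:
            credential = self._fetch(gateway)
            self._cache[gateway] = (credential, now + self._ttl)
            return credential
        return cached[0]


def call_gateway(creds: CredentialCache, gateway: str,
                 send: Callable[[Credential], int]) -> int:
    # A 401 after rotation means our cached key is stale: refetch once and retry.
    status = send(creds.get(gateway))
    if status == 401:
        status = send(creds.get(gateway, force_refresh=True))
    return status
```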
Core Platform Modernization
Zero-downtime migration of high-throughput systems to a unified core platform.
The Problem
Multiple products ran on legacy stacks with frequent scalability issues. The migration to a modern platform was essential for unification and future product reuse — but it involved migrating systems handling high-throughput traffic at peak.
The Architecture Approach
- Phase-wise migration of issuance and claim modules with rollback capability at every stage
- Domain-driven design patterns for clean service boundaries
- Observability and metrics tracking for faster RCA and rollout safety
- Shadow traffic testing before cutting over production traffic
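A sketch of shadow testing plus a percentage rollout, assuming both platforms can be called as plain functions. Legacy keeps serving the response while the new platform's answer is compared; raising `rollout_pct` cuts traffic over, and setting it back to 0 is the rollback. The names are hypothetical, and shadow calls must run against side-effect-free paths so no real policy is issued twice.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable

Handler = Callable[[dict], dict]


@dataclass
class MigrationRouter:
    legacy: Handler
    modern: Handler
    rollout_pct: int = 0   # share of live traffic served by the new platform
    shadow: bool = True    # mirror legacy-served requests to the new platform
    mismatches: list = field(default_factory=list)

    @staticmethod
    def _bucket(key: str) -> int:
        # Stable hashing keeps a given partner/request key on the same side across calls.
        return int(hashlib.sha256(key.encode()).hexdigest(), 16) % 100

    def handle(self, key: str, request: dict) -> dict:
        if self._bucket(key) < self.rollout_pct:
            return self.modern(request)
        response = self.legacy(request)
        if self.shadow:
            try:
                shadow_response = self.modern(request)
                if shadow_response != response:
                    self.mismatches.append((key, response, shadow_response))
            except Exception as exc:  # shadow failures must never affect the live response
                self.mismatches.append((key, response, repr(exc)))
        return response
```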
The Leadership Challenge
The hardest part wasn't the technical migration — it was managing organizational risk. A failed migration at high volume would impact real customers, real partners, and real revenue. I had to balance the team's confidence with appropriate caution, ensure fallback plans were tested (not just documented), and maintain stakeholder trust throughout a multi-phase rollout.
Results
- Zero downtime during migration — including the highest-traffic partner
- Uptime and infrastructure efficiency improved significantly
- Unified platform across all business lines, enabling faster releases
- Platform ready for new product lines
What Ties These Decisions Together
Across all these systems, three principles have guided my architecture decisions:
- Correctness over convenience — Immutable ledgers, idempotency keys, and append-only logs are harder to build. But in regulated, high-throughput environments, shortcuts become liabilities.
- Build the platform before you need it — The unified integration layer, onboarding framework, and configurable product engine all required upfront investment that wasn't tied to immediate features. Every one of them paid for itself multiple times over.
- The organizational decision is harder than the technical one — Choosing the right architecture is important. But getting teams, stakeholders, and regulators aligned behind that choice is where leadership actually happens.