Schema-Driven Development

A methodology for building enterprise systems by modeling their complete semantics as interlocking, machine-readable schemas before writing application code. Ten schema layers. Cross-referenced. Agent-ready.

The Problem

Enterprise systems are complex. A typical ERP has hundreds of entities, thousands of state transitions, intricate permission models, regulatory constraints, and multi-service orchestration patterns. Traditionally, this complexity is captured in requirements documents that are written once and never updated, architecture diagrams that diverge from reality within weeks, user stories that capture fragments without showing the whole, and tribal knowledge that lives in engineers' heads.

None of these are machine-readable. None of them cross-reference each other. None of them can be validated for consistency. And critically, none of them can be consumed by an AI agent that needs to understand what the system is before it can help build it.

The Core Idea

Model the entire system as a set of interlocking, machine-readable semantic schemas (entities, behaviors, events, tools, ownership, permissions, compliance) so that any operator, human or AI, can consume the specification directly and reason about the system without reading code.

The idea is that the specification is not documentation about the system; it is the system's definition. Code implements the specification. If the code and the specification disagree, the default is to fix the code. But sometimes the code is right and the spec is wrong: production teaches you things the model didn't capture. The spec must be easy to update when reality wins.

This isn't a new concept (the OMG's Model Driven Architecture, MDA, said it in 2001), but applying it across 10 semantic layers simultaneously is where this approach tries to go further.

The Schema Layers

The approach proposes 10 primary schema layers and several derived artifacts. Each layer captures a distinct semantic dimension of the system, and every layer cross-references the others.

1. Ontology

Every domain entity: its attributes, valid states, emitted events, domain tags, and package membership.

{
  "name": "Order",
  "description": "Sales order with fulfillment lifecycle.",
  "attributes": {
    "required": ["order_id", "customer_id", "status"],
    "optional": ["priority", "shipping_method", "notes"]
  },
  "states": ["Draft", "Confirmed", "Picking", "Shipped", "Delivered", "Cancelled"],
  "events": ["OrderCreated", "OrderConfirmed", "OrderShipped", "OrderDelivered"],
  "tags": ["commercial"],
  "package": "core"
}

The ontology is the foundation. Every other schema references entities defined here. An entity's states must match its state machine. Its events must appear in the event catalog. Its package determines which capability group it belongs to.
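
The consistency rules above lend themselves to a mechanical check. The following is a minimal sketch, not part of any published tooling: `check_entity` is a hypothetical helper, the field names follow the JSON examples in this document, and the sample catalog deliberately omits `OrderDelivered` to show what a failure looks like.

```python
def check_entity(entity, state_machine, catalog_event_names):
    """Compare one ontology entity against its state machine and the event catalog."""
    problems = []
    declared = set(entity["states"])
    modeled = set(state_machine["states"])
    if declared != modeled:
        # Symmetric difference: states present on one side only.
        problems.append(f"state mismatch: {sorted(declared ^ modeled)}")
    for event in entity["events"]:
        if event not in catalog_event_names:
            problems.append(f"event not in catalog: {event}")
    return problems


order = {
    "name": "Order",
    "states": ["Draft", "Confirmed", "Picking", "Shipped", "Delivered", "Cancelled"],
    "events": ["OrderCreated", "OrderConfirmed", "OrderShipped", "OrderDelivered"],
}
order_machine = {
    "entity": "Order",
    "states": ["Draft", "Confirmed", "Picking", "Shipped", "Delivered", "Cancelled"],
}
# Illustrative catalog with OrderDelivered left out on purpose.
catalog = {"OrderCreated", "OrderConfirmed", "OrderShipped"}

problems = check_entity(order, order_machine, catalog)
print(problems)
```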

2. State Machines

The lifecycle of every stateful entity: states, transitions, guard conditions, and triggering events.

{
  "entity": "Order",
  "states": ["Draft", "Confirmed", "Picking", "Shipped", "Delivered", "Cancelled"],
  "transitions": [
    {
      "from": "Draft",
      "to": "Confirmed",
      "event": "OrderConfirmed",
      "guards": ["all_lines_valid", "credit_check_passed"]
    }
  ],
  "package": "core"
}

State machines formalize what the ontology declares. The guards field names preconditions that must be true before a transition fires.
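
To illustrate how guards gate a transition, here is a small sketch. The schema only names the guards; how they are evaluated is not specified, so `guard_results` (a mapping from guard name to a boolean) and the `next_state` function are assumptions made for this example.

```python
def next_state(machine, current, event, guard_results):
    """Return the target state, or None if no transition applies or a guard fails.

    guard_results is a hypothetical mapping of guard name -> bool. A guard that
    is missing from the mapping is treated as not satisfied.
    """
    for transition in machine["transitions"]:
        if transition["from"] != current or transition["event"] != event:
            continue
        guards = transition.get("guards", [])
        if all(guard_results.get(name, False) for name in guards):
            return transition["to"]
    return None


order_machine = {
    "entity": "Order",
    "transitions": [
        {
            "from": "Draft",
            "to": "Confirmed",
            "event": "OrderConfirmed",
            "guards": ["all_lines_valid", "credit_check_passed"],
        }
    ],
}

confirmed = next_state(
    order_machine, "Draft", "OrderConfirmed",
    {"all_lines_valid": True, "credit_check_passed": True},
)
blocked = next_state(
    order_machine, "Draft", "OrderConfirmed",
    {"all_lines_valid": True, "credit_check_passed": False},
)
print(confirmed, blocked)
```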

3. Event Catalog

Every domain event: who produces it, who consumes it, the payload schema, and the event envelope contract.

{
  "name": "OrderConfirmed",
  "version": "1.0",
  "producer": "OrderService",
  "consumers": ["FulfillmentService", "FinanceService", "AuditLog", "Analytics"],
  "payload": { "order_id": "string", "confirmed_at": "datetime" },
  "package": "core"
}

The event catalog is the nervous system. The envelope requires event_id, correlation_id, causation_id, actor_id, and tool_call_id, enabling full traceability from user action to system-wide effect.
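
The sketch below shows one way the envelope fields could be populated so that a chain of events stays traceable. The five required field names come from the text above; everything else is an assumption for illustration: the `make_envelope` and `missing_fields` helpers, the sample IDs, and the convention that a root event uses its own `event_id` as both correlation and causation ID.

```python
import uuid

ENVELOPE_FIELDS = ("event_id", "correlation_id", "causation_id", "actor_id", "tool_call_id")


def make_envelope(name, payload, actor_id, tool_call_id, cause=None):
    """Wrap a payload in an envelope; `cause` is the envelope that triggered it."""
    event_id = str(uuid.uuid4())
    return {
        "event_id": event_id,
        # Assumed convention: the whole chain shares the root event's correlation_id.
        "correlation_id": cause["correlation_id"] if cause else event_id,
        # Assumed convention: causation_id points at the direct parent event.
        "causation_id": cause["event_id"] if cause else event_id,
        "actor_id": actor_id,
        "tool_call_id": tool_call_id,
        "name": name,
        "payload": payload,
    }


def missing_fields(envelope):
    """List required envelope fields that are absent or empty."""
    return [field for field in ENVELOPE_FIELDS if not envelope.get(field)]


confirmed = make_envelope("OrderConfirmed", {"order_id": "o-1"}, "user-17", "call-001")
created = make_envelope(
    "FulfillmentOrderCreated", {"order_id": "o-1"}, "user-17", "call-001", cause=confirmed
)
print(missing_fields(created))
```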

4. Tool Registry

Every operation an agent or user can perform: API binding, safety classification, permissions, approval gates, emitted events, audit level.

{
  "id": "orders.order.confirm",
  "domain": "orders",
  "name": "Confirm Order",
  "kind": "write",
  "safety": "unsafe",
  "api": { "method": "POST", "path": "/orders/{id}/confirm" },
  "permissions": {
    "allow_roles": ["DistributorSales", "FinanceManager"],
    "policy_ids": ["order.confirm"]
  },
  "idempotency": "required",
  "requires_human_approval": false,
  "emits_events": ["OrderConfirmed"],
  "audit": { "level": "write" },
  "package": "core"
}

This is where the approach diverges from traditional API-first design. Tools are not just HTTP endpoints; they are semantic operations with identity, safety classification, permission binding, and side-effect declaration. An AI agent reads the tool registry to understand: what can I do, what will happen if I do it, and who needs to approve it?
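
As a rough illustration of those three questions, an agent could run a preflight check against a registry entry before invoking it. The `preflight` function and the shape of its result are hypothetical; only the tool fields come from the example above.

```python
def preflight(tool, role):
    """Summarize what a caller with `role` should know before invoking `tool`."""
    return {
        # What can I do? Only role membership is checked here, not policy_ids.
        "allowed": role in tool["permissions"]["allow_roles"],
        # What will happen if I do it?
        "will_emit": list(tool.get("emits_events", [])),
        "is_unsafe": tool.get("safety") == "unsafe",
        "needs_idempotency_key": tool.get("idempotency") == "required",
        # Who needs to approve it?
        "needs_human_approval": bool(tool.get("requires_human_approval", False)),
    }


confirm_order = {
    "id": "orders.order.confirm",
    "kind": "write",
    "safety": "unsafe",
    "permissions": {
        "allow_roles": ["DistributorSales", "FinanceManager"],
        "policy_ids": ["order.confirm"],
    },
    "idempotency": "required",
    "requires_human_approval": False,
    "emits_events": ["OrderConfirmed"],
}

sales_check = preflight(confirm_order, "DistributorSales")
# "WarehouseClerk" is an invented role used only to show a denial.
clerk_check = preflight(confirm_order, "WarehouseClerk")
print(sales_check["allowed"], clerk_check["allowed"])
```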

5. Workflows and Sagas

Cross-service orchestrations: triggers, steps, compensations, and the tools that implement each step.

{
  "id": "WF_ORDER_TO_CASH",
  "name": "Order to Cash",
  "start_triggers": [{ "event": "OrderConfirmed" }],
  "steps": [
    {
      "id": "create_fulfillment",
      "type": "command",
      "service": "fulfillment_service",
      "tools": ["fulfillment.order.create"],
      "emits": ["FulfillmentOrderCreated"]
    }
  ],
  "compensations": [
    { "step": "create_fulfillment", "compensate": "fulfillment.order.cancel" }
  ]
}

Workflows tie tools and events into coherent business processes. Compensations define rollback behavior for saga patterns.
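
A minimal sketch of how a runner might interpret the compensations list: execute steps in order and, if one fails, invoke the compensating tools for the already completed steps in reverse. The `run_workflow` function, the `invoke` callable, and the second step (`create_invoice` with tool `finance.invoice.create`) are assumptions added for illustration; a real saga engine would also need persistence, retries, and handling for compensations that fail.

```python
def run_workflow(workflow, invoke):
    """Run steps in order; on failure, compensate completed steps in reverse.

    `invoke(tool_id)` is a hypothetical callable that raises on failure.
    Returns True if every step completed, False if the workflow was rolled back.
    """
    compensations = {c["step"]: c["compensate"] for c in workflow.get("compensations", [])}
    completed = []
    for step in workflow["steps"]:
        try:
            for tool_id in step["tools"]:
                invoke(tool_id)
        except Exception:
            for step_id in reversed(completed):
                if step_id in compensations:
                    invoke(compensations[step_id])
            return False
        completed.append(step["id"])
    return True


workflow = {
    "id": "WF_ORDER_TO_CASH",
    "steps": [
        {"id": "create_fulfillment", "tools": ["fulfillment.order.create"]},
        # Hypothetical second step, not in the source example.
        {"id": "create_invoice", "tools": ["finance.invoice.create"]},
    ],
    "compensations": [
        {"step": "create_fulfillment", "compensate": "fulfillment.order.cancel"}
    ],
}

calls = []


def fake_invoke(tool_id):
    """Record the call, then simulate a failure in the invoicing tool."""
    calls.append(tool_id)
    if tool_id == "finance.invoice.create":
        raise RuntimeError("invoice service unavailable")


succeeded = run_workflow(workflow, fake_invoke)
print(succeeded, calls)
```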

6. Data Ownership

Which service is the system of record for which entities: what it owns, references, emits, and consumes.

{
  "service_id": "order_service",
  "package": "core",
  "system_of_record": true,
  "owns_entities": ["Order", "OrderLine", "OrderHold", "OrderAmendment"],
  "references_entities": ["Customer", "Product", "PriceAgreement"],
  "emits_events": ["OrderCreated", "OrderConfirmed", "OrderShipped"],
  "consumes_events": ["PaymentReceived", "FulfillmentCompleted"]
}

Every entity has exactly one authoritative owner. Services that need data from other domains consume events or query read models; they don't share databases.

7. C4 Architecture Model

System structure: people (actors), containers (services), dependencies, and deployment topology.

{
  "system": {
    "id": "banking_platform",
    "name": "Core Banking Platform",
    "description": "Agent-first banking and financial services platform."
  },
  "tiers": {
    "platform_engine": ["api-gateway", "identity", "workflow", "event-bus", "audit"],
    "core_banking": ["parties", "accounts", "ledger", "payments", "risk"],
    "vertical_packages": ["lending", "wealth-management", "trade-finance"]
  }
}

Container IDs correspond to service_id values in data ownership. Actor IDs correspond to roles in the roles and policies schema.

8. Roles and Policies

The RBAC/ABAC model: roles, org types, and policies that gate tool access.

{
  "id": "DistributorSales",
  "org_types": ["Distributor"],
  "description": "Quote, order, and customer management.",
  "package": "core"
}

Roles connect to tools via the permissions.allow_roles field in the tool registry. Every tool invocation is permission-checked against this model.

9. Compliance Rules

Regulatory constraints: applicability, validation logic, effective dates, and exceptions.

{
  "rule_id": "PCI_DSS_4_0",
  "source": "PCI_SSC",
  "effective_date": "2024-03-31",
  "applies_to": ["PaymentOrder", "CardTransaction"],
  "validations": ["transaction.encryption_standard == 'AES-256'", "transaction.tokenized == true"],
  "package": "payments"
}

Compliance rules are package-scoped. A bank enabling the payments package gets PCI DSS and AML/KYC rules. A wealth-management-only tenant never sees them.
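
Package scoping can be pictured as a filter over the rule set. The sketch below selects the rules that apply to an entity for a given tenant and date; `applicable_rules` is a hypothetical helper, and it only decides applicability. It does not evaluate the `validations` expressions, whose evaluation language the source does not specify.

```python
from datetime import date


def applicable_rules(rules, enabled_packages, entity, on_date):
    """Return rule_ids that are in scope for `entity` under the tenant's packages."""
    return [
        rule["rule_id"]
        for rule in rules
        if rule["package"] in enabled_packages
        and entity in rule["applies_to"]
        and date.fromisoformat(rule["effective_date"]) <= on_date
    ]


rules = [
    {
        "rule_id": "PCI_DSS_4_0",
        "effective_date": "2024-03-31",
        "applies_to": ["PaymentOrder", "CardTransaction"],
        "package": "payments",
    }
]

# The check date is arbitrary and chosen only for the example.
check_date = date(2025, 1, 1)
bank = applicable_rules(rules, {"core", "payments"}, "CardTransaction", check_date)
wealth_only = applicable_rules(rules, {"core", "wealth-management"}, "CardTransaction", check_date)
print(bank, wealth_only)
```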

10. Integration Contracts

External system connectors: protocols, inbound/outbound message mappings, entity updates, and emitted events.

{
  "id": "x12_edi",
  "name": "X12 EDI Connector",
  "modes": ["AS2", "SFTP", "VAN"],
  "inbound": [
    {
      "transaction": "855",
      "maps_to": "PurchaseOrderAcknowledgment",
      "emits_events": ["POAcknowledged"]
    }
  ]
}

Each inbound message maps to domain entities and emits domain events, maintaining the event-driven contract even for legacy protocols.
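
As a sketch of that mapping in use, a connector could look up an inbound transaction code and return the target entity and the events to emit. `route_inbound` is a hypothetical function; parsing the actual EDI document is out of scope here.

```python
def route_inbound(connector, transaction_code):
    """Return (entity_name, events_to_emit) for an inbound transaction code."""
    for mapping in connector["inbound"]:
        if mapping["transaction"] == transaction_code:
            return mapping["maps_to"], list(mapping["emits_events"])
    raise KeyError(f"no inbound mapping for transaction {transaction_code}")


x12_edi = {
    "id": "x12_edi",
    "inbound": [
        {
            "transaction": "855",
            "maps_to": "PurchaseOrderAcknowledgment",
            "emits_events": ["POAcknowledged"],
        }
    ],
}

entity_name, events = route_inbound(x12_edi, "855")
print(entity_name, events)
```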

Derived Artifacts

Several artifacts are generated from the primary schemas rather than authored directly:

Artifact	Generated From	Purpose
Actions Matrix	State machines + Tool registry	Joins transitions to implementing tools via event names. Reveals gaps.
DB Schema (SQL)	Ontology	DDL for the operational database. Entity attributes become columns.
OpenAPI Spec	Tool registry	HTTP API specification derived from tool definitions.
Graph Schema	Ontology relationships	Knowledge graph model for entity relationship queries.
Platform Spec	All schemas	Human-readable reference document from the complete schema set.

Primary schemas are authored; derived artifacts are generated. If a derived artifact is wrong, you fix the source schema and regenerate. This eliminates drift between documentation and reality.
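
To make the derivation concrete, here is a sketch of how the Actions Matrix could be generated by joining state machine transitions to tools on event name. The `actions_matrix` function and the row layout are assumptions, and the `OrderCancelled` transition is invented to show how a gap surfaces.

```python
def actions_matrix(state_machines, tools):
    """Join each transition to the tools that emit its triggering event."""
    tools_by_event = {}
    for tool in tools:
        for event in tool.get("emits_events", []):
            tools_by_event.setdefault(event, []).append(tool["id"])

    rows = []
    for machine in state_machines:
        for transition in machine["transitions"]:
            rows.append({
                "entity": machine["entity"],
                "from": transition["from"],
                "to": transition["to"],
                "event": transition["event"],
                # An empty list marks a transition with no implementing tool.
                "tools": tools_by_event.get(transition["event"], []),
            })
    return rows


order_machine = {
    "entity": "Order",
    "transitions": [
        {"from": "Draft", "to": "Confirmed", "event": "OrderConfirmed"},
        # Hypothetical transition with no tool behind it yet.
        {"from": "Confirmed", "to": "Cancelled", "event": "OrderCancelled"},
    ],
}
tools = [{"id": "orders.order.confirm", "emits_events": ["OrderConfirmed"]}]

matrix = actions_matrix([order_machine], tools)
gaps = [row for row in matrix if not row["tools"]]
print([row["event"] for row in gaps])
```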

The Package Dimension

Every element in every schema carries a package field. This single dimension cuts through the entire specification:

core (accounts, parties, ledger, audit): always enabled
  |
  |-- payments (PaymentOrder, CardTransaction, Settlement, Clearing)
  |-- lending (LoanApplication, CreditFacility, Collateral, Disbursement)
  |-- wealth-management (Portfolio, InvestmentOrder, AdvisoryMandate)
  |-- insurance (Policy, Claim, Underwriting, Premium)
  |-- trade-finance (LetterOfCredit, BillOfLading, TradeSettlement)
  |-- treasury (FXOrder, LiquidityPool, HedgingPosition)
  |-- compliance (KYCCheck, SARFiling, RegulatoryReport)
  |
  '-- islamic-finance (MurabahaContract, SukukIssuance)
        depends on: lending, treasury

When a package is disabled for a tenant, its entities are invisible, its tools are unavailable, its events are not emitted, its roles are not assignable, and its compliance rules are not enforced.

The package dimension is not a deployment concern bolted on later. It is a first-class dimension of the ontology itself, enabling the same platform to serve a retail bank and an investment firm from the same codebase.
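
A sketch of how a tenant's enabled packages could be resolved and applied as a filter over any schema layer. The source states that islamic-finance depends on lending and treasury; treating that as "enabling a package also enables its dependencies" is an assumption, as are the function names and the `Account` entity used to stand in for a core element.

```python
PACKAGE_DEPENDENCIES = {"islamic-finance": ["lending", "treasury"]}


def resolve_packages(requested, dependencies):
    """Return the full set of enabled packages, including core and dependencies."""
    enabled = {"core"}  # core is always enabled
    pending = list(requested)
    while pending:
        package = pending.pop()
        if package not in enabled:
            enabled.add(package)
            pending.extend(dependencies.get(package, []))
    return enabled


def visible(elements, enabled):
    """Keep only the schema elements whose package is enabled for the tenant."""
    return [element for element in elements if element["package"] in enabled]


entities = [
    {"name": "Account", "package": "core"},  # illustrative core entity
    {"name": "LoanApplication", "package": "lending"},
    {"name": "Portfolio", "package": "wealth-management"},
    {"name": "MurabahaContract", "package": "islamic-finance"},
]

enabled = resolve_packages(["islamic-finance"], PACKAGE_DEPENDENCIES)
names = [entity["name"] for entity in visible(entities, enabled)]
print(sorted(enabled), names)
```

The same `visible` filter would apply unchanged to tools, events, roles, and compliance rules, since each carries the same package field.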

Cross-Reference Integrity

The schemas reference one another, forming a closed semantic loop:

Ontology
  |-- declares states --> State Machines
  '-- declares events --> Event Catalog

State Machines
  |-- transitions emit events --> Event Catalog
  '-- transitions guarded by --> (business rules)

Event Catalog
  |-- events produced by --> Data Ownership (services)
  '-- events emitted by --> Tool Registry (tools)

Tool Registry
  |-- tools require --> Roles & Policies
  |-- tools emit events --> Event Catalog
  '-- tools used in --> Workflows

Workflows
  |-- triggered by --> Event Catalog
  '-- steps invoke --> Tool Registry

Data Ownership
  |-- services own --> Ontology (entities)
  '-- services map to --> C4 Model (containers)

This means you can ask questions like:

"Which tools can modify an Order?": query the tool registry for tools that emit Order events
"What happens when a PO is acknowledged?": follow the EDI 855 mapping to the event, then find all consumers
"What does enabling the lending package add?": filter every schema by package: "lending"

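The first two questions above can be sketched as plain lookups over the loaded schemas. The function names are hypothetical, and the catalog entry for `POAcknowledged` with its `ProcurementService` consumer is invented for the example; the source does not list that event's consumers.

```python
def tools_emitting_events_of(entity_name, ontology, tools):
    """Tools whose emitted events overlap the events declared by an entity."""
    entity = next(e for e in ontology if e["name"] == entity_name)
    events = set(entity["events"])
    return [t["id"] for t in tools if events & set(t.get("emits_events", []))]


def consumers_after_inbound(connector, transaction_code, catalog):
    """Follow an inbound mapping to its events, then to each event's consumers."""
    consumers_by_event = {}
    for mapping in connector["inbound"]:
        if mapping["transaction"] != transaction_code:
            continue
        for event_name in mapping["emits_events"]:
            entry = next((e for e in catalog if e["name"] == event_name), None)
            consumers_by_event[event_name] = entry["consumers"] if entry else []
    return consumers_by_event


ontology = [{"name": "Order", "events": ["OrderCreated", "OrderConfirmed"]}]
tools = [
    {"id": "orders.order.confirm", "emits_events": ["OrderConfirmed"]},
    {"id": "fulfillment.order.create", "emits_events": ["FulfillmentOrderCreated"]},
]
x12_edi = {
    "inbound": [{"transaction": "855", "emits_events": ["POAcknowledged"]}]
}
# Hypothetical catalog entry; the consumer name is an assumption.
catalog = [{"name": "POAcknowledged", "consumers": ["ProcurementService"]}]

order_tools = tools_emitting_events_of("Order", ontology, tools)
po_consumers = consumers_after_inbound(x12_edi, "855", catalog)
print(order_tools, po_consumers)
```
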
Validation

Because the schemas cross-reference one another, automated validators can check for:

Check	What it catches
Orphaned entities	Entities in the ontology with no owning service
Missing state machines	Entities that declare states but have no corresponding state machine
Dangling events	Events emitted by tools or state machines that don't exist in the catalog
Toolless transitions	State machine transitions with no implementing tool
Permission gaps	Tools with permission requirements that reference non-existent roles
Duplicate ownership	Entities claimed by multiple services as system of record
C4/ownership mismatch	Services in data ownership that don't exist in the C4 model
Uncovered events	Events in the catalog that no tool emits
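
Three of the checks in the table can be sketched in a few lines each. These functions are illustrative rather than a reference validator; the sample data deliberately contains one orphaned entity, one doubly owned entity (via an invented `billing_service`), and one tool that references an undefined role.

```python
from collections import defaultdict


def orphaned_entities(ontology, ownership):
    """Entities in the ontology that no service lists in owns_entities."""
    owned = {name for service in ownership for name in service["owns_entities"]}
    return sorted(e["name"] for e in ontology if e["name"] not in owned)


def duplicate_ownership(ownership):
    """Entities claimed by more than one system-of-record service."""
    owners = defaultdict(list)
    for service in ownership:
        if service.get("system_of_record"):
            for name in service["owns_entities"]:
                owners[name].append(service["service_id"])
    return {name: ids for name, ids in owners.items() if len(ids) > 1}


def permission_gaps(tools, roles):
    """(tool_id, role) pairs where the tool allows a role that is not defined."""
    known = {role["id"] for role in roles}
    return [
        (tool["id"], role)
        for tool in tools
        for role in tool["permissions"]["allow_roles"]
        if role not in known
    ]


ontology = [{"name": "Order"}, {"name": "OrderLine"}, {"name": "Customer"}]
ownership = [
    {"service_id": "order_service", "system_of_record": True,
     "owns_entities": ["Order", "OrderLine"]},
    # Hypothetical conflicting claim on Order.
    {"service_id": "billing_service", "system_of_record": True,
     "owns_entities": ["Order"]},
]
roles = [{"id": "DistributorSales"}]
tools = [{"id": "orders.order.confirm",
          "permissions": {"allow_roles": ["DistributorSales", "FinanceManager"]}}]

orphans = orphaned_entities(ontology, ownership)
duplicates = duplicate_ownership(ownership)
gaps = permission_gaps(tools, roles)
print(orphans, duplicates, gaps)
```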

Why This Matters for AI

An AI agent building the order_service doesn't need to read thousands of lines of existing code, reverse-engineer state machines from conditional logic, guess which events to emit, or infer service boundaries from import patterns.

Instead, it reads the ontology for what entities it manages, the state machines for exact lifecycles, the tool registry for operations with permissions and side effects, the event catalog for what to emit and listen for, data ownership for what it owns versus references, and workflows for what multi-step processes it participates in.

The specification is the agent's operating manual. It constrains and guides every implementation decision. The agent doesn't need to be creative about architecture; the architecture is defined. It needs to be precise about implementation.

Operating Principles

Specification is source of truth, usually. If the code and the specification disagree, fix the code. But sometimes the code is right and the spec is wrong: production teaches you things the model didn't capture. The spec must be easy to update when reality wins.
Machine-readable over human-readable. Structured JSON over prose. Humans can read JSON; machines cannot reliably validate prose.
Cross-reference everything. Every schema references every other schema. Isolated schemas are incomplete schemas.
Package dimension is first-class. Every element carries a package tag. Composability is not an afterthought.
Derive, don't duplicate. If an artifact can be generated from primary schemas, generate it. Never maintain two sources of truth.
Tools, not APIs. Model operations as semantic tools with safety, permissions, and side effects, not as bare HTTP endpoints.
Events are the integration contract. Services communicate through events. The event catalog is the contract between them.
Validate continuously. Run cross-reference checks automatically. Catch architectural errors before they become code.

Getting Started

Start with the ontology. List every entity, its attributes, states, and domain tags.
Add state machines. For every entity with states, define the transitions, guards, and triggering events.
Build the event catalog. Every event referenced by an entity or state machine gets a catalog entry with producer, consumers, and payload.
Define tools. Every user- or agent-facing operation becomes a tool with API binding, permissions, safety classification, and emitted events.
Map data ownership. Assign every entity to exactly one owning service.
Model the architecture. Define the C4 containers that correspond to your services.
Add roles and policies. Define who can invoke which tools.
Add workflows. Define cross-service processes as step sequences referencing tools and events.
Add compliance rules. Define regulatory constraints scoped to packages.
Add integration contracts. Define external system connectors with message mappings.
Generate derived artifacts. Run generators to produce the actions matrix, DB schema, OpenAPI spec, and full platform spec.
Validate. Run cross-reference checks. Fix gaps. Repeat.