8. On-Chain Data Anchoring and Storage Architecture

Efficient persistence of Baseline data on the FirmaChain mainnet is one of the protocol's most consequential engineering challenges. A single Verification Object with its evidence graph, feature vectors, and attestation records can exceed 500 KB in serialized form. Writing this volume to a Cosmos SDK blockchain for every evaluation, at the cadence evaluations are produced, is infeasible without a layered anchoring strategy.

8.1 Design Principles

  1. Anchor the minimum, store the maximum off-chain. The chain records cryptographic commitments (hashes) and lightweight metadata. The full VO payloads, evidence units, and evidence graphs reside in an off-chain data availability layer.

  2. Batch aggressively. Individual VO anchoring transactions are cost-prohibitive at high throughput. Multiple VOs are batched into a single Merkle root commitment per block window.

  3. Separate hot state from cold archive. Recent VOs (< 30 days) are indexed in hot PostgreSQL replicas for API serving. Historical VOs are compressed and migrated to an archival tier. The chain itself only ever stores fixed-size commitment records.

  4. Verification-first: the chain proves integrity; it does not store data. A validator or auditor can independently verify any VO by: (1) retrieving the full VO from the off-chain layer, (2) recomputing its content hash, and (3) confirming the hash is anchored on-chain at the claimed block height.

8.2 Anchoring Transaction Schema

Each anchoring transaction on FirmaChain commits a batch of VOs using a single Merkle root:

{
  "MsgAnchorBatch": {
    "sender":           "string           // Operator address (bech32)",
    "batchId":          "string           // Content-addressed batch identifier",
    "merkleRoot":       "bytes32          // Root of Merkle tree over VO content hashes",
    "voCount":          "uint32           // Number of VOs in this batch",
    "engineVersion":    "string           // Engine version used for this batch",
    "blockRangeFrom":   "uint64           // Earliest source-chain block referenced",
    "blockRangeTo":     "uint64           // Latest source-chain block referenced",
    "sourceChain":      "string           // 'solana' | 'ethereum' | 'base' | 'arbitrum'",
    "timestamp":        "int64            // Unix timestamp of batch creation"
  }
}

On-Chain State (per batch):

{
  "AnchorRecord": {
    "batchId":          "string",
    "merkleRoot":       "bytes32",
    "voCount":          "uint32",
    "engineVersion":    "string",
    "anchoredAt":       "int64            // FirmaChain block time",
    "blockHeight":      "uint64           // FirmaChain block height",
    "operator":         "string           // Anchoring operator address"
  }
}

Storage cost per batch: ~220 bytes of on-chain state, regardless of the number of VOs in the batch.

8.3 Merkle Batch Construction

VOs are accumulated in a rolling buffer and committed at fixed intervals or when the buffer reaches a size threshold.

Batching Parameters:

Parameter               Value
Maximum batch interval  60 seconds (configurable)
Maximum batch size      1,000 VOs
Minimum batch size      1 VO (no empty batches)
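
The flush condition implied by these parameters can be sketched as follows. This is a minimal illustration, not the reference implementation; the class and parameter names are assumptions, and the injectable clock exists only to make the sketch testable.

```python
import time

# Illustrative values from the batching parameters table above.
MAX_BATCH_INTERVAL_S = 60
MAX_BATCH_SIZE = 1000

class BatchBuffer:
    """Rolling buffer that flushes on a size threshold or a time interval."""

    def __init__(self, now=time.monotonic):
        self.vo_hashes = []       # accumulated VO content hashes
        self.now = now            # injectable clock for testing
        self.opened_at = now()    # when this batch window opened

    def add(self, vo_hash: bytes) -> bool:
        """Append a VO content hash; return True if the batch should flush."""
        self.vo_hashes.append(vo_hash)
        return self.should_flush()

    def should_flush(self) -> bool:
        if not self.vo_hashes:    # minimum batch size 1: never emit empty batches
            return False
        if len(self.vo_hashes) >= MAX_BATCH_SIZE:
            return True           # size threshold reached
        return self.now() - self.opened_at >= MAX_BATCH_INTERVAL_S
```

Whichever condition fires first triggers a flush, so a quiet period still anchors within one interval while a burst anchors as soon as the buffer fills.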

Construction Algorithm:

  1. Collect VO content hashes: H_i = voId_i for each VO in the batch (the voId is itself the VO's content hash, so no re-hashing is needed)
  2. Sort hashes lexicographically (deterministic ordering)
  3. Build a binary Merkle tree:
    • Leaf nodes: H_0, H_1, ..., H_n
    • If n is odd, duplicate the last leaf
    • Internal nodes: Keccak-256(left || right)
    • Root: the single remaining hash
  4. Compute batchId = Keccak-256(merkleRoot || engineVersion || timestamp)
  5. Submit MsgAnchorBatch to FirmaChain

Inclusion Proof:

For any VO in a batch, a Merkle inclusion proof consists of O(log n) sibling hashes. This allows any party to verify that a specific voId was part of an anchored batch without downloading the entire batch. The proof size for a 1,000-VO batch is ~10 sibling hashes (320 bytes).
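
The construction and proof steps above can be sketched in Python. Two caveats: `hashlib.sha3_256` stands in for Keccak-256 (which differs in its padding byte and is not in Python's standard library), and each proof entry carries a position bit alongside the sibling hash so the verifier knows concatenation order; both are assumptions of this sketch.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256; a real implementation would use a Keccak library.
    return hashlib.sha3_256(data).digest()

def build_tree(leaves):
    """Return all tree levels, leaves first (step 2: sort; step 3: hash pairs)."""
    level = sorted(leaves)                      # deterministic ordering
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]         # duplicate the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Inclusion proof: (siblingHash, nodeIsRightChild) per level, O(log n)."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2))
        index //= 2
    return proof

def verify(leaf, proof, root):
    """Walk the proof back to the root."""
    node = leaf
    for sibling, node_is_right in proof:
        node = h(sibling + node) if node_is_right else h(node + sibling)
    return node == root
```

For a 1,000-leaf batch the tree has eleven levels, so a proof holds exactly ten sibling hashes, matching the ~320-byte figure above.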

8.4 FirmaChain Module Design (x/baseline)

The Baseline anchoring logic is implemented as a custom Cosmos SDK module:

Module: x/baseline

Store Keys:

Store             Key → Value
AnchorStore       batchId → AnchorRecord
VOIndexStore      voId → batchId (reverse lookup)
OperatorStore     operator → AnchorRecord[] (operator history)
AttestationStore  voId → Attestation[]

Messages:

Message               Description
MsgAnchorBatch        Commit a batch of VO hashes (operator only)
MsgSubmitAttestation  Validator submits a replay attestation for a specific voId
MsgDisputeVO          Challenge an existing VO (requires stake deposit)
MsgResolveDispute     Protocol-level dispute resolution

Queries:

Query                           Returns
QueryAnchor(batchId)            AnchorRecord
QueryVOAnchor(voId)             { batchId, merkleProof, anchorRecord }
QueryAttestations(voId)         Attestation[]
QueryOperatorHistory(operator)  AnchorRecord[]

Gas Costs (target):

Operation             Gas
MsgAnchorBatch        ~50,000 gas (flat, independent of voCount)
MsgSubmitAttestation  ~30,000 gas
MsgDisputeVO          ~40,000 gas + dispute escrow (0.1 * minimumStake, see Section 7.5.2)

The VOIndexStore enables O(1) lookup from any voId to its anchoring batch. This index is the only per-VO on-chain state — at 96 bytes per entry (32-byte voId key + 64-byte batchId value), indexing 1 million VOs requires approximately 96 MB of state, manageable for a Cosmos SDK chain with state pruning.

8.5 Off-Chain Data Availability Layer

Full VO payloads, evidence units, and evidence graphs are stored off-chain with redundancy guarantees.

Primary Store: PostgreSQL (Hot Tier)

  • Read/write replicas with connection pooling
  • Tables: verification_objects, evidence_units, evidence_graphs, attestations
  • Retention: 90 days in hot tier
  • Indexed by: voId, claimId, subject, predicate, timestamp, qualification

Secondary Store: Object Storage (Warm Tier)

  • S3-compatible storage (AWS S3 / Cloudflare R2)
  • VOs serialized as BCE binary blobs, keyed by voId
  • Evidence graphs stored as adjacency lists in compressed binary format
  • Retention: 2 years
  • Accessed via content-addressed URI: baseline://vo/{voId}

Archival Store: Decentralized Storage (Cold Tier)

  • Arweave for permanent archival of VOs that have reached the weighted attestation threshold (attestationScore >= 3.0, see Section 7.5.3)
  • IPFS with Filecoin pinning for evidence units and raw provider responses
  • Content-addressed: CID derived from BCE-encoded payload
  • Retention: permanent (Arweave) / contract-duration (Filecoin)

Data Flow:

  1. VO created → written to PostgreSQL immediately
  2. OFFCHAIN_DECLARED evidence units promoted to IPFS/Filecoin immediately upon VO creation (these cannot be re-derived from any other source — see Section 3.3)
  3. OFFCHAIN_EPHEMERAL evidence units additionally submitted to Arweave within 7 days (permanent archival, since the original source may be deleted at any time)
  4. VO batched → Merkle root anchored on FirmaChain
  5. After anchoring → VO payload uploaded to S3 with batchId + merkleProof metadata
  6. ONCHAIN evidence units promoted to IPFS/Filecoin within 24h of VO anchoring (these can be re-derived from archive RPCs as fallback, so the urgency is lower)
  7. After reaching weighted attestation threshold (attestationScore >= 3.0, Section 7.5.3) → VO promoted to Arweave for permanent storage
  8. After 90 days → VO evicted from PostgreSQL hot tier (remains in S3 + Arweave/IPFS)

Evidence Serving API (for Validators)

Operators MUST expose an Evidence Serving API for validator evidence retrieval (Section 7.2, Step 2). This API is separate from the consumer-facing query API (Section 8.7).

Endpoints:

  • GET /v1/evidence/{evidenceId}: retrieve a single evidence unit by content hash. Response: BCE-encoded canonicalForm bytes.
  • POST /v1/evidence/batch: retrieve multiple evidence units. Response: array of {evidenceId, canonicalForm} pairs.
  • GET /v1/vo/{voId}/evidence-manifest: list all evidenceIds referenced by a VO. Response: array of evidenceId strings with evidence type and ref field.
  • GET /v1/evidence/{evidenceId}/provenance: retrieve the full provenance record (retrievalMethod, anchor, ref, rawResponse). Response: full evidence unit metadata (excluding canonicalForm bytes, for bandwidth).

Guarantees:

  • Availability: Evidence units MUST remain retrievable via this API for as long as the VO is in CURRENT or SUPPORTED engine version scope (minimum 12 months after engine version deprecation, per Section 5.5)
  • Integrity: Validators MUST verify Keccak-256(canonicalForm) == evidenceId for every response. The API does not provide trust — only data availability.
  • Rate limits: Minimum 1,000 evidence units per second per validator. Operators MAY require validator authentication (API key tied to on-chain validator identity).
  • Fallback declaration: If an operator can no longer serve a specific evidence unit (data loss, corruption), they MUST mark it as UNAVAILABLE in the API response rather than failing silently. Validators use this signal to fall back to decentralized storage or abort replay.
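
The integrity and fallback guarantees translate into a small client-side check on the validator. The response shape below is an assumption for illustration, and `hashlib.sha3_256` stands in for Keccak-256, which Python's standard library does not provide.

```python
import hashlib

def fetch_and_verify(response: dict) -> bytes:
    """Validate one Evidence Serving API response before trusting its bytes.

    `response` mimics an assumed payload shape: {"evidenceId": hex string,
    "canonicalForm": bytes, "status": optional "UNAVAILABLE" marker}.
    """
    if response.get("status") == "UNAVAILABLE":
        # Operator declared data loss: fall back to IPFS/Arweave or abort replay.
        raise LookupError("evidence unavailable from operator")
    body = response["canonicalForm"]
    # Integrity guarantee: hash(canonicalForm) must equal evidenceId.
    # sha3_256 is a stand-in for the spec's Keccak-256.
    if hashlib.sha3_256(body).hexdigest() != response["evidenceId"]:
        raise ValueError("evidenceId mismatch: response failed integrity check")
    return body
```

Because the check recomputes the content hash locally, the API can be fully untrusted: a tampered or corrupted response is rejected regardless of what the operator claims.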

8.6 State Compression Strategies

To minimize on-chain footprint and off-chain storage costs:

  1. VO Deduplication — Identical evaluations (same claim, same evidence, same engine version) produce identical voIds by design. The content-addressing scheme naturally deduplicates. If a VO with a given voId already exists, anchoring is skipped.

  2. Evidence Sharing — Multiple VOs for the same token within the same context window share evidence units. Evidence units are stored once and referenced by evidenceId. A single ACCOUNT_BALANCE snapshot may be referenced by supply_concentration, holder_distribution, and wallet_clustering VOs simultaneously.

  3. Incremental Evidence Graphs — When a token is re-evaluated (e.g., at a later block), only the graph delta is stored. The delta consists of:

    • New nodes added since the previous evaluation
    • New edges added
    • Updated node properties (balance changes)
    • Removed nodes (pruned)

    Full graph reconstruction = base graph + ordered sequence of deltas. Storage reduction: typically 60-80% for re-evaluations within a 24-hour window.

  4. Feature Vector Compression — The 26-feature vector (Section 5.4) is stored as a fixed-size binary record:

    • 26 features x 8 bytes (fixedpoint64) = 208 bytes per feature snapshot (see Appendix B, Section B.2 for encoding and scale factors)
    • Compared to JSON representation (~1.2 KB), this is a 5-6x reduction
    • Time-series feature history uses delta encoding: store the first snapshot in full, subsequent snapshots as deltas from the previous
    • Note: fixedpoint64 (integer-based) is used instead of IEEE 754 float64 to guarantee cross-implementation determinism — see Section 5.4 for per-feature scale factors
  5. Attestation Aggregation — Rather than storing individual attestation signatures on-chain, attestations are aggregated:

    • BLS signature aggregation (when supported): N signatures → 1 aggregate signature
    • Until BLS is available: attestation bitmap + sorted validator set → compact representation
    • On-chain record per VO: aggregate_sig (64 bytes) + validator_bitmap (ceil(N/8) bytes)
    • For 100 validators: 64 + 13 = 77 bytes per VO attestation record (vs. 6,400 bytes for individual Ed25519 signatures)
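
The delta reconstruction in strategy 3 can be sketched as below. The delta field names and the graph representation are assumptions; the spec defines only the delta contents (new nodes, new edges, updated properties, removed nodes).

```python
def apply_delta(graph: dict, delta: dict) -> dict:
    """Apply one delta to a graph of shape {"nodes": {id: props}, "edges": set}."""
    nodes = dict(graph["nodes"])
    edges = set(graph["edges"])
    for nid in delta.get("removed_nodes", []):      # pruned nodes
        nodes.pop(nid, None)
        edges = {e for e in edges if nid not in e}  # drop incident edges
    for nid, props in delta.get("new_nodes", {}).items():
        nodes[nid] = props
    for nid, props in delta.get("updated_nodes", {}).items():
        nodes[nid] = {**nodes[nid], **props}        # e.g. balance changes
    edges |= set(delta.get("new_edges", []))
    return {"nodes": nodes, "edges": edges}

def reconstruct(base: dict, deltas: list) -> dict:
    """Full graph = base graph + ordered sequence of deltas."""
    graph = base
    for delta in deltas:
        graph = apply_delta(graph, delta)
    return graph
```

Order matters: deltas are applied in evaluation order, so a later balance update overwrites an earlier one exactly as the re-evaluations occurred.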
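
The fixedpoint64 layout in strategy 4 is straightforward to express with fixed-width packing. This is a sketch: the per-feature scale factors live in Section 5.4 / Appendix B.2, so the uniform scale used here is a placeholder, and byte order is assumed big-endian.

```python
import struct

NUM_FEATURES = 26  # per Section 5.4

def encode_snapshot(features, scales):
    """Encode a feature vector as 26 big-endian int64 fixed-point values.

    Integer fixed-point (not IEEE 754 float64) keeps the encoding
    deterministic across implementations, as the spec requires.
    """
    assert len(features) == len(scales) == NUM_FEATURES
    fixed = [round(value * scale) for value, scale in zip(features, scales)]
    return struct.pack(">26q", *fixed)          # 26 x 8 bytes = 208 bytes

def encode_delta(prev: bytes, curr: bytes) -> bytes:
    """Delta-encode a snapshot against the previous one (time-series history)."""
    p = struct.unpack(">26q", prev)
    c = struct.unpack(">26q", curr)
    return struct.pack(">26q", *(b - a for a, b in zip(p, c)))
```

The packed record is exactly 208 bytes, matching the figure above, and deltas between near-identical snapshots are mostly zeros, which compresses well downstream.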
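
The bitmap representation in strategy 5 can be sketched as follows; the function name is illustrative, and the aggregate signature itself is out of scope here.

```python
def attestation_bitmap(validator_set, attesters):
    """Compact bitmap over a sorted validator set: bit i set iff validator i attested.

    Record size is ceil(N / 8) bytes; the 64-byte aggregate signature is
    stored alongside it but not shown here.
    """
    ordered = sorted(validator_set)              # canonical validator ordering
    bits = bytearray((len(ordered) + 7) // 8)
    for validator in attesters:
        i = ordered.index(validator)
        bits[i // 8] |= 1 << (i % 8)             # little-endian bit within byte
    return bytes(bits)
```

For 100 validators the bitmap is 13 bytes, so the full per-VO attestation record is 64 + 13 = 77 bytes, as computed above.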

8.7 Query and Indexing Architecture

Efficient querying of anchored data requires an indexing layer between the chain and the API:

FirmaChain Event Indexer:

  • Subscribes to FirmaChain block events via WebSocket (Tendermint RPC)
  • Indexes MsgAnchorBatch events into a PostgreSQL mirror table
  • Indexes MsgSubmitAttestation events for attestation counts
  • Provides sub-second lookup: voId → anchor proof → on-chain confirmation

Composite Query Path (for API consumers):

  1. Client requests VO by voId
  2. API server checks PostgreSQL hot tier → returns full VO if found
  3. If not in hot tier, fetches from S3 warm tier → caches in hot tier → returns
  4. On-chain anchor proof is attached to the response:
{
  "batchId":                "string",
  "merkleRoot":             "bytes32",
  "merkleProof":            "bytes32[]",
  "firmaChainBlockHeight":  "uint64",
  "firmaChainTxHash":       "string"
}
  5. Client can independently verify: recompute the voId from the VO content, walk the Merkle proof to the root, and confirm the root matches the on-chain anchor
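
Step 5 can be sketched as a client-side check. Two assumptions to note: `hashlib.sha3_256` stands in for Keccak-256 (absent from Python's stdlib), and each proof entry is modeled as a (siblingHash, siblingIsLeft) pair, since the bare bytes32[] in the response schema would need position information (e.g. a bitmask) carried alongside it to fix concatenation order.

```python
import hashlib

def keccak_standin(data: bytes) -> bytes:
    # sha3_256 stands in for Keccak-256 in this sketch.
    return hashlib.sha3_256(data).digest()

def verify_vo_anchor(vo_bytes: bytes, proof, merkle_root: bytes) -> bool:
    """Recompute the voId from VO content, then walk the proof to the root."""
    node = keccak_standin(vo_bytes)                      # recompute voId
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = keccak_standin(pair)                      # Keccak-256(left || right)
    return node == merkle_root                           # compare to on-chain anchor
```

Because the root comes from the on-chain AnchorRecord, a successful walk proves the VO was in the anchored batch without trusting the API server that returned it.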

Performance Guarantees:

Scenario                           Target
Hot-tier VO (95th percentile)      < 50 ms
Warm-tier VO (including S3 fetch)  < 500 ms
Full cryptographic verification    Supported without trusting the API server
FirmaChain state growth            ~100 MB per million VOs

8.8 Migration Path from Current Architecture

The reference implementation (pumpfun-monitor) currently stores all data in PostgreSQL without on-chain anchoring. The migration proceeds in phases.

Trust transparency: During Phases 1-2, VOs are produced without on-chain anchoring. Any consumer of the API during this period is trusting the operator entirely — there is no independent verification mechanism. This trust asymmetry is explicitly surfaced in API responses via the anchoringStatus and anchoringPhase fields (Section 6.1). Consumers MUST be aware that PRE_ANCHORING VOs carry operator-only trust.

Phase 1: Shadow Anchoring (current → +2 months)

  • Deploy x/baseline module on FirmaChain testnet
  • Run anchoring in parallel: write VOs to PostgreSQL AND compute Merkle batches
  • Do NOT submit batches on-chain yet; log batch metadata for validation
  • Validate: recompute batch roots from logged VOs, confirm determinism
  • API status: anchoringStatus: "PRE_ANCHORING", anchoringPhase: "SHADOW"

Phase 2: Testnet Anchoring (+2 → +4 months)

  • Submit MsgAnchorBatch transactions to FirmaChain testnet
  • Validators run replay protocol against testnet VOs
  • Stress test: measure gas costs, batch throughput, state growth
  • Deploy FirmaChain event indexer, validate query path end-to-end
  • API status: anchoringStatus: "PRE_ANCHORING", anchoringPhase: "TESTNET"
  • Testnet anchoring is informational — it does not constitute mainnet-verifiable anchoring

Phase 3: Mainnet Soft Launch (+4 → +6 months)

  • Deploy x/baseline module to FirmaChain mainnet
  • Begin anchoring production VOs with conservative batching (300-second intervals)
  • Monitor state growth, validator replay latency, gas economics
  • S3 warm tier and Arweave cold tier operational
  • API status: anchoringStatus: "PENDING" → "ANCHORED", anchoringPhase: "MAINNET"
  • First phase where independent verification is possible

Phase 4: Full Production (+6 months →)

  • Reduce batch interval to 60 seconds
  • Enable attestation aggregation on-chain
  • Open dispute resolution protocol
  • Historical backfill: anchor Merkle roots for pre-migration VOs (backfilled VOs receive anchoringStatus: "ANCHORED" with anchoringPhase: null and a note in baselineChain metadata indicating backfill)
  • API status: anchoringStatus: "PENDING" → "ANCHORED", anchoringPhase: null
