8. On-Chain Data Anchoring and Storage Architecture

Efficient persistence of Baseline data on the FirmaChain mainnet is one of the protocol's most consequential engineering challenges. A single Verification Object with its evidence graph, feature vectors, and attestation records can exceed 500 KB in serialized form. Writing this volume to a Cosmos SDK blockchain for every evaluation, at the cadence evaluations are produced, is infeasible without a layered anchoring strategy.

8.1 Design Principles

  1. Anchor the minimum, store the maximum off-chain. The chain records cryptographic commitments (hashes) and lightweight metadata. The full VO payloads, evidence units, and evidence graphs reside in an off-chain data availability layer.

  2. Batch aggressively. Individual VO anchoring transactions are cost-prohibitive at high throughput. Multiple VOs are batched into a single Merkle root commitment per block window.

  3. Separate hot state from cold archive. Recent VOs (< 30 days) are indexed in hot PostgreSQL replicas for API serving. Historical VOs are compressed and migrated to an archival tier. The chain itself only ever stores fixed-size commitment records.

  4. Verification-first: the chain proves integrity; it does not store data. A validator or auditor can independently verify any VO by: (1) retrieving the full VO from the off-chain layer, (2) recomputing its content hash, and (3) confirming the hash is anchored on-chain at the claimed block height.

8.2 Anchoring Transaction Schema

Each anchoring transaction on FirmaChain commits a batch of VOs using a single Merkle root:

{
  "MsgAnchorBatch": {
    "sender":           "string           // Operator address (bech32)",
    "batchId":          "string           // Content-addressed batch identifier",
    "merkleRoot":       "bytes32          // Root of Merkle tree over VO content hashes",
    "voCount":          "uint32           // Number of VOs in this batch",
    "engineVersion":    "string           // Engine version used for this batch",
    "blockRangeFrom":   "uint64           // Earliest source-chain block referenced",
    "blockRangeTo":     "uint64           // Latest source-chain block referenced",
    "sourceChain":      "string           // 'solana' | 'ethereum' | 'base' | 'arbitrum'",
    "timestamp":        "int64            // Unix timestamp of batch creation"
  }
}

On-Chain State (per batch):

{
  "AnchorRecord": {
    "batchId":          "string",
    "merkleRoot":       "bytes32",
    "voCount":          "uint32",
    "engineVersion":    "string",
    "anchoredAt":       "int64            // FirmaChain block time",
    "blockHeight":      "uint64           // FirmaChain block height",
    "operator":         "string           // Anchoring operator address"
  }
}

Storage cost per batch: ~220 bytes of on-chain state, regardless of the number of VOs in the batch.

8.3 Merkle Batch Construction

VOs are accumulated in a rolling buffer and committed at fixed intervals or when the buffer reaches a size threshold.

Batching Parameters:

Parameter               Value
Maximum batch interval  60 seconds (configurable)
Maximum batch size      1,000 VOs
Minimum batch size      1 VO (no empty batches)
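
The flush condition implied by these parameters can be sketched as follows. This is a minimal illustration, not the reference implementation; the class and parameter names are assumptions, and the injectable clock exists only to make the sketch testable.

```python
import time

# Illustrative values from the batching parameters table above.
MAX_BATCH_INTERVAL_S = 60
MAX_BATCH_SIZE = 1000

class BatchBuffer:
    """Rolling buffer that flushes on a size threshold or a time interval."""

    def __init__(self, now=time.monotonic):
        self.vo_hashes = []       # accumulated VO content hashes
        self.now = now            # injectable clock for testing
        self.opened_at = now()    # when this batch window opened

    def add(self, vo_hash: bytes) -> bool:
        """Append a VO content hash; return True if the batch should flush."""
        self.vo_hashes.append(vo_hash)
        return self.should_flush()

    def should_flush(self) -> bool:
        if not self.vo_hashes:    # minimum batch size 1: never emit empty batches
            return False
        if len(self.vo_hashes) >= MAX_BATCH_SIZE:
            return True           # size threshold reached
        return self.now() - self.opened_at >= MAX_BATCH_INTERVAL_S
```

Whichever condition fires first triggers a flush, so a quiet period still anchors within one interval while a burst anchors as soon as the buffer fills.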

Construction Algorithm:

  1. Collect VO content hashes: H_i = voId_i for each VO in the batch (the voId is itself the VO's content hash, so no re-hashing is needed)
  2. Sort hashes lexicographically (deterministic ordering)
  3. Build a binary Merkle tree:
    • Leaf nodes: H_0, H_1, ..., H_n
    • If n is odd, duplicate the last leaf
    • Internal nodes: Keccak-256(left || right)
    • Root: the single remaining hash
  4. Compute batchId = Keccak-256(merkleRoot || engineVersion || timestamp)
  5. Submit MsgAnchorBatch to FirmaChain

Inclusion Proof:

For any VO in a batch, a Merkle inclusion proof consists of O(log n) sibling hashes. This allows any party to verify that a specific voId was part of an anchored batch without downloading the entire batch. The proof size for a 1,000-VO batch is ~10 sibling hashes (320 bytes).
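
The construction and proof steps above can be sketched in Python. Two caveats: `hashlib.sha3_256` stands in for Keccak-256 (which differs in its padding byte and is not in Python's standard library), and each proof entry carries a position bit alongside the sibling hash so the verifier knows concatenation order; both are assumptions of this sketch.

```python
import hashlib

def h(data: bytes) -> bytes:
    # Stand-in for Keccak-256; a real implementation would use a Keccak library.
    return hashlib.sha3_256(data).digest()

def build_tree(leaves):
    """Return all tree levels, leaves first (step 2: sort; step 3: hash pairs)."""
    level = sorted(leaves)                      # deterministic ordering
    levels = [level]
    while len(level) > 1:
        if len(level) % 2 == 1:
            level = level + [level[-1]]         # duplicate the last node
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def prove(levels, index):
    """Inclusion proof: (siblingHash, nodeIsRightChild) per level, O(log n)."""
    proof = []
    for level in levels[:-1]:
        if len(level) % 2 == 1:
            level = level + [level[-1]]
        proof.append((level[index ^ 1], index % 2))
        index //= 2
    return proof

def verify(leaf, proof, root):
    """Walk the proof back to the root."""
    node = leaf
    for sibling, node_is_right in proof:
        node = h(sibling + node) if node_is_right else h(node + sibling)
    return node == root
```

For a 1,000-leaf batch the tree has eleven levels, so a proof holds exactly ten sibling hashes, matching the ~320-byte figure above.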

8.4 FirmaChain Module Design (x/baseline)

The Baseline anchoring logic is implemented as a custom Cosmos SDK module:

Module: x/baseline

Store Keys:

Store             Key → Value
AnchorStore       batchId → AnchorRecord
VOIndexStore      voId → batchId (reverse lookup)
OperatorStore     operator → AnchorRecord[] (operator history)
AttestationStore  voId → Attestation[]

Messages:

Message               Description
MsgAnchorBatch        Commit a batch of VO hashes (operator only)
MsgSubmitAttestation  Validator submits a replay attestation for a specific voId
MsgDisputeVO          Challenge an existing VO (requires stake deposit)
MsgResolveDispute     Protocol-level dispute resolution

Queries:

Query                           Returns
QueryAnchor(batchId)            AnchorRecord
QueryVOAnchor(voId)             { batchId, merkleProof, anchorRecord }
QueryAttestations(voId)         Attestation[]
QueryOperatorHistory(operator)  AnchorRecord[]

Gas Costs (target):

Operation             Gas
MsgAnchorBatch        ~50,000 gas (flat, independent of voCount)
MsgSubmitAttestation  ~30,000 gas
MsgDisputeVO          ~40,000 gas + dispute escrow (0.1 * minimumStake, see Section 7.5.2)

The VOIndexStore enables O(1) lookup from any voId to its anchoring batch. This index is the only per-VO on-chain state — at 96 bytes per entry (32-byte voId key + 64-byte batchId value), indexing 1 million VOs requires approximately 96 MB of state, manageable for a Cosmos SDK chain with state pruning.

8.5 Off-Chain Data Availability Layer

Full VO payloads, evidence units, and evidence graphs are stored off-chain with redundancy guarantees.

Primary Store: PostgreSQL (Hot Tier)

  • Read/write replicas with connection pooling
  • Tables: verification_objects, evidence_units, evidence_graphs, attestations
  • Retention: 90 days in hot tier
  • Indexed by: voId, claimId, subject, predicate, timestamp, qualification

Secondary Store: Object Storage (Warm Tier)

  • S3-compatible storage (AWS S3 / Cloudflare R2)
  • VOs serialized as BCE binary blobs, keyed by voId
  • Evidence graphs stored as adjacency lists in compressed binary format
  • Retention: 2 years
  • Accessed via content-addressed URI: baseline://vo/{voId}

Archival Store: Decentralized Storage (Cold Tier)

  • Arweave for permanent archival of VOs that have reached the weighted attestation threshold (attestationScore >= 3.0, see Section 7.5.3)
  • IPFS with Filecoin pinning for evidence units and raw provider responses
  • Content-addressed: CID derived from BCE-encoded payload
  • Retention: permanent (Arweave) / contract-duration (Filecoin)

Data Flow:

  1. VO created → written to PostgreSQL immediately
  2. OFFCHAIN_DECLARED evidence units promoted to IPFS/Filecoin immediately upon VO creation (these cannot be re-derived from any other source — see Section 3.3)
  3. OFFCHAIN_EPHEMERAL evidence units additionally submitted to Arweave within 7 days (permanent archival, since the original source may be deleted at any time)
  4. VO batched → Merkle root anchored on FirmaChain
  5. After anchoring → VO payload uploaded to S3 with batchId + merkleProof metadata
  6. ONCHAIN evidence units promoted to IPFS/Filecoin within 24h of VO anchoring (these can be re-derived from archive RPCs as fallback, so the urgency is lower)
  7. After reaching weighted attestation threshold (attestationScore >= 3.0, Section 7.5.3) → VO promoted to Arweave for permanent storage
  8. After 90 days → VO evicted from PostgreSQL hot tier (remains in S3 + Arweave/IPFS)

Evidence Serving API (for Validators)

Operators MUST expose an Evidence Serving API for validator evidence retrieval (Section 7.2, Step 2). This API is separate from the consumer-facing query API (Section 8.7).

Endpoints:

  • GET /v1/evidence/{evidenceId}: retrieve a single evidence unit by content hash. Response: BCE-encoded canonicalForm bytes.
  • POST /v1/evidence/batch: retrieve multiple evidence units. Response: array of {evidenceId, canonicalForm} pairs.
  • GET /v1/vo/{voId}/evidence-manifest: list all evidenceIds referenced by a VO. Response: array of evidenceId strings with evidence type and ref field.
  • GET /v1/evidence/{evidenceId}/provenance: retrieve the full provenance record (retrievalMethod, anchor, ref, rawResponse). Response: full evidence unit metadata (excluding canonicalForm bytes, for bandwidth).

Guarantees:

  • Availability: Evidence units MUST remain retrievable via this API for as long as the VO is in CURRENT or SUPPORTED engine version scope (minimum 12 months after engine version deprecation, per Section 5.5)
  • Integrity: Validators MUST verify Keccak-256(canonicalForm) == evidenceId for every response. The API does not provide trust — only data availability.
  • Rate limits: Minimum 1,000 evidence units per second per validator. Operators MAY require validator authentication (API key tied to on-chain validator identity).
  • Fallback declaration: If an operator can no longer serve a specific evidence unit (data loss, corruption), they MUST mark it as UNAVAILABLE in the API response rather than failing silently. Validators use this signal to fall back to decentralized storage or abort replay.
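
The integrity and fallback guarantees translate into a small client-side check on the validator. The response shape below is an assumption for illustration, and `hashlib.sha3_256` stands in for Keccak-256, which Python's standard library does not provide.

```python
import hashlib

def fetch_and_verify(response: dict) -> bytes:
    """Validate one Evidence Serving API response before trusting its bytes.

    `response` mimics an assumed payload shape: {"evidenceId": hex string,
    "canonicalForm": bytes, "status": optional "UNAVAILABLE" marker}.
    """
    if response.get("status") == "UNAVAILABLE":
        # Operator declared data loss: fall back to IPFS/Arweave or abort replay.
        raise LookupError("evidence unavailable from operator")
    body = response["canonicalForm"]
    # Integrity guarantee: hash(canonicalForm) must equal evidenceId.
    # sha3_256 is a stand-in for the spec's Keccak-256.
    if hashlib.sha3_256(body).hexdigest() != response["evidenceId"]:
        raise ValueError("evidenceId mismatch: response failed integrity check")
    return body
```

Because the check recomputes the content hash locally, the API can be fully untrusted: a tampered or corrupted response is rejected regardless of what the operator claims.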

8.6 State Compression Strategies

To minimize on-chain footprint and off-chain storage costs:

  1. VO Deduplication — Identical evaluations (same claim, same evidence, same engine version) produce identical voIds by design. The content-addressing scheme naturally deduplicates. If a VO with a given voId already exists, anchoring is skipped.

  2. Evidence Sharing — Multiple VOs for the same token within the same context window share evidence units. Evidence units are stored once and referenced by evidenceId. A single ACCOUNT_BALANCE snapshot may be referenced by supply_concentration, holder_distribution, and wallet_clustering VOs simultaneously.

  3. Incremental Evidence Graphs — When a token is re-evaluated (e.g., at a later block), only the graph delta is stored. The delta consists of:

    • New nodes added since the previous evaluation
    • New edges added
    • Updated node properties (balance changes)
    • Removed nodes (pruned)

    Full graph reconstruction = base graph + ordered sequence of deltas. Storage reduction: typically 60-80% for re-evaluations within a 24-hour window.

  4. Feature Vector Compression — The 26-feature vector (Section 5.4) is stored as a fixed-size binary record:

    • 26 features x 8 bytes (fixedpoint64) = 208 bytes per feature snapshot (see Appendix B, Section B.2 for encoding and scale factors)
    • Compared to JSON representation (~1.2 KB), this is a 5-6x reduction
    • Time-series feature history uses delta encoding: store the first snapshot in full, subsequent snapshots as deltas from the previous
    • Note: fixedpoint64 (integer-based) is used instead of IEEE 754 float64 to guarantee cross-implementation determinism — see Section 5.4 for per-feature scale factors
  5. Attestation Aggregation — Rather than storing individual attestation signatures on-chain, attestations are aggregated:

    • BLS signature aggregation (when supported): N signatures → 1 aggregate signature
    • Until BLS is available: attestation bitmap + sorted validator set → compact representation
    • On-chain record per VO: aggregate_sig (64 bytes) + validator_bitmap (ceil(N/8) bytes)
    • For 100 validators: 64 + 13 = 77 bytes per VO attestation record (vs. 6,400 bytes for individual Ed25519 signatures)
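
The delta reconstruction in strategy 3 can be sketched as below. The delta field names and the graph representation are assumptions; the spec defines only the delta contents (new nodes, new edges, updated properties, removed nodes).

```python
def apply_delta(graph: dict, delta: dict) -> dict:
    """Apply one delta to a graph of shape {"nodes": {id: props}, "edges": set}."""
    nodes = dict(graph["nodes"])
    edges = set(graph["edges"])
    for nid in delta.get("removed_nodes", []):      # pruned nodes
        nodes.pop(nid, None)
        edges = {e for e in edges if nid not in e}  # drop incident edges
    for nid, props in delta.get("new_nodes", {}).items():
        nodes[nid] = props
    for nid, props in delta.get("updated_nodes", {}).items():
        nodes[nid] = {**nodes[nid], **props}        # e.g. balance changes
    edges |= set(delta.get("new_edges", []))
    return {"nodes": nodes, "edges": edges}

def reconstruct(base: dict, deltas: list) -> dict:
    """Full graph = base graph + ordered sequence of deltas."""
    graph = base
    for delta in deltas:
        graph = apply_delta(graph, delta)
    return graph
```

Order matters: deltas are applied in evaluation order, so a later balance update overwrites an earlier one exactly as the re-evaluations occurred.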
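
The fixedpoint64 layout in strategy 4 is straightforward to express with fixed-width packing. This is a sketch: the per-feature scale factors live in Section 5.4 / Appendix B.2, so the uniform scale used here is a placeholder, and byte order is assumed big-endian.

```python
import struct

NUM_FEATURES = 26  # per Section 5.4

def encode_snapshot(features, scales):
    """Encode a feature vector as 26 big-endian int64 fixed-point values.

    Integer fixed-point (not IEEE 754 float64) keeps the encoding
    deterministic across implementations, as the spec requires.
    """
    assert len(features) == len(scales) == NUM_FEATURES
    fixed = [round(value * scale) for value, scale in zip(features, scales)]
    return struct.pack(">26q", *fixed)          # 26 x 8 bytes = 208 bytes

def encode_delta(prev: bytes, curr: bytes) -> bytes:
    """Delta-encode a snapshot against the previous one (time-series history)."""
    p = struct.unpack(">26q", prev)
    c = struct.unpack(">26q", curr)
    return struct.pack(">26q", *(b - a for a, b in zip(p, c)))
```

The packed record is exactly 208 bytes, matching the figure above, and deltas between near-identical snapshots are mostly zeros, which compresses well downstream.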
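
The bitmap representation in strategy 5 can be sketched as follows; the function name is illustrative, and the aggregate signature itself is out of scope here.

```python
def attestation_bitmap(validator_set, attesters):
    """Compact bitmap over a sorted validator set: bit i set iff validator i attested.

    Record size is ceil(N / 8) bytes; the 64-byte aggregate signature is
    stored alongside it but not shown here.
    """
    ordered = sorted(validator_set)              # canonical validator ordering
    bits = bytearray((len(ordered) + 7) // 8)
    for validator in attesters:
        i = ordered.index(validator)
        bits[i // 8] |= 1 << (i % 8)             # little-endian bit within byte
    return bytes(bits)
```

For 100 validators the bitmap is 13 bytes, so the full per-VO attestation record is 64 + 13 = 77 bytes, as computed above.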

8.7 Query and Indexing Architecture

Efficient querying of anchored data requires an indexing layer between the chain and the API:

FirmaChain Event Indexer:

  • Subscribes to FirmaChain block events via WebSocket (Tendermint RPC)
  • Indexes MsgAnchorBatch events into a PostgreSQL mirror table
  • Indexes MsgSubmitAttestation events for attestation counts
  • Provides sub-second lookup: voId → anchor proof → on-chain confirmation

Composite Query Path (for API consumers):

  1. Client requests VO by voId
  2. API server checks PostgreSQL hot tier → returns full VO if found
  3. If not in hot tier, fetches from S3 warm tier → caches in hot tier → returns
  4. On-chain anchor proof is attached to the response:
{
  "batchId":                "string",
  "merkleRoot":             "bytes32",
  "merkleProof":            "bytes32[]",
  "firmaChainBlockHeight":  "uint64",
  "firmaChainTxHash":       "string"
}
  5. Client can independently verify: recompute the voId from the VO content, walk the Merkle proof to the root, and confirm the root matches the on-chain anchor
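
Step 5 can be sketched as a client-side check. Two assumptions to note: `hashlib.sha3_256` stands in for Keccak-256 (absent from Python's stdlib), and each proof entry is modeled as a (siblingHash, siblingIsLeft) pair, since the bare bytes32[] in the response schema would need position information (e.g. a bitmask) carried alongside it to fix concatenation order.

```python
import hashlib

def keccak_standin(data: bytes) -> bytes:
    # sha3_256 stands in for Keccak-256 in this sketch.
    return hashlib.sha3_256(data).digest()

def verify_vo_anchor(vo_bytes: bytes, proof, merkle_root: bytes) -> bool:
    """Recompute the voId from VO content, then walk the proof to the root."""
    node = keccak_standin(vo_bytes)                      # recompute voId
    for sibling, sibling_is_left in proof:
        pair = sibling + node if sibling_is_left else node + sibling
        node = keccak_standin(pair)                      # Keccak-256(left || right)
    return node == merkle_root                           # compare to on-chain anchor
```

Because the root comes from the on-chain AnchorRecord, a successful walk proves the VO was in the anchored batch without trusting the API server that returned it.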

Performance Guarantees:

Scenario                           Target
Hot-tier VO (95th percentile)      < 50 ms
Warm-tier VO (including S3 fetch)  < 500 ms
Full cryptographic verification    Supported without trusting the API server
FirmaChain state growth            ~100 MB per million VOs

8.8 Migration Path from Current Architecture

The reference implementation (pumpfun-monitor) currently stores all data in PostgreSQL without on-chain anchoring. The migration proceeds in phases.

Trust transparency: During Phases 1-2, VOs are produced without on-chain anchoring. Any consumer of the API during this period is trusting the operator entirely — there is no independent verification mechanism. This trust asymmetry is explicitly surfaced in API responses via the anchoringStatus and anchoringPhase fields (Section 6.1). Consumers MUST be aware that PRE_ANCHORING VOs carry operator-only trust.

Phase 1: Shadow Anchoring (current → +2 months)

  • Deploy x/baseline module on FirmaChain testnet
  • Run anchoring in parallel: write VOs to PostgreSQL AND compute Merkle batches
  • Do NOT submit batches on-chain yet; log batch metadata for validation
  • Validate: recompute batch roots from logged VOs, confirm determinism
  • API status: anchoringStatus: "PRE_ANCHORING", anchoringPhase: "SHADOW"

Phase 2: Testnet Anchoring (+2 → +4 months)

  • Submit MsgAnchorBatch transactions to FirmaChain testnet
  • Validators run replay protocol against testnet VOs
  • Stress test: measure gas costs, batch throughput, state growth
  • Deploy FirmaChain event indexer, validate query path end-to-end
  • API status: anchoringStatus: "PRE_ANCHORING", anchoringPhase: "TESTNET"
  • Testnet anchoring is informational — it does not constitute mainnet-verifiable anchoring

Phase 3: Mainnet Soft Launch (+4 → +6 months)

  • Deploy x/baseline module to FirmaChain mainnet
  • Begin anchoring production VOs with conservative batching (300-second intervals)
  • Monitor state growth, validator replay latency, gas economics
  • S3 warm tier and Arweave cold tier operational
  • API status: anchoringStatus: "PENDING" → "ANCHORED", anchoringPhase: "MAINNET"
  • First phase where independent verification is possible

Phase 4: Full Production (+6 months →)

  • Reduce batch interval to 60 seconds
  • Enable attestation aggregation on-chain
  • Open dispute resolution protocol
  • Historical backfill: anchor Merkle roots for pre-migration VOs (backfilled VOs receive anchoringStatus: "ANCHORED" with anchoringPhase: null and a note in baselineChain metadata indicating backfill)
  • API status: anchoringStatus: "PENDING" → "ANCHORED", anchoringPhase: null
