11. Reference Implementation Notes
11.1 Architecture Overview
The reference implementation comprises four primary codebases:
Data Ingestion (pumpfun-monitor)
- Node.js/TypeScript real-time transaction monitor
- WebSocket subscriptions to 9+ Solana DEX programs
- Borsh/Anchor event parsing from transaction logs
- 26-feature extraction pipeline (TypeScript)
- AES-256-GCM encrypted NDJSON logging
- PostgreSQL persistence (events, pairs, tokens, ML scores)
- ML prediction integration via HTTP to Python FastAPI service
- Web dashboard (HTML5 + WebSocket) for live visualization
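The Borsh/Anchor parsing step can be sketched as follows. Anchor's emit! macro writes each event to the transaction log as a line beginning "Program data: " followed by base64 data. The first 8 bytes of that data are the event discriminator (the first 8 bytes of sha256("event:<EventName>")), and the Borsh-serialized fields follow. The event name TradeEvent and its field layout below are hypothetical and only illustrate the technique. A real decoder for each of the monitored DEX programs must follow that program's IDL.

```typescript
import { createHash } from "node:crypto";

// Hypothetical event layout for illustration only; real layouts come from each program's IDL.
interface TradeEvent {
  mint: Buffer;        // 32-byte public key
  solAmount: bigint;   // Borsh u64 (little-endian)
  tokenAmount: bigint; // Borsh u64 (little-endian)
  isBuy: boolean;      // Borsh bool (1 byte)
}

const LOG_PREFIX = "Program data: ";
const TRADE_EVENT_SIZE = 8 + 32 + 8 + 8 + 1; // discriminator + fields

// Anchor event discriminator: first 8 bytes of sha256("event:<EventName>").
function eventDiscriminator(name: string): Buffer {
  return createHash("sha256").update(`event:${name}`).digest().subarray(0, 8);
}

const TRADE_DISCRIMINATOR = eventDiscriminator("TradeEvent");

// Scan one transaction's log messages and decode every matching TradeEvent.
function parseTradeEvents(logs: string[]): TradeEvent[] {
  const events: TradeEvent[] = [];
  for (const line of logs) {
    if (!line.startsWith(LOG_PREFIX)) continue; // skip "Program log:" and other lines
    const data = Buffer.from(line.slice(LOG_PREFIX.length), "base64");
    if (data.length < TRADE_EVENT_SIZE) continue;
    if (!data.subarray(0, 8).equals(TRADE_DISCRIMINATOR)) continue; // a different event type
    let offset = 8;
    const mint = data.subarray(offset, offset + 32);
    offset += 32;
    const solAmount = data.readBigUInt64LE(offset);
    offset += 8;
    const tokenAmount = data.readBigUInt64LE(offset);
    offset += 8;
    const isBuy = data.readUInt8(offset) === 1;
    events.push({ mint, solAmount, tokenAmount, isBuy });
  }
  return events;
}
```

Matching on the discriminator first lets a single log scan skip unrelated events cheaply before any field decoding happens.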
Verification & Product API (baseline-api)
- Monorepo: @baseline/core, @baseline/api, @baseline/worker, @baseline/admin
- Fastify-based REST API with Swagger documentation
- PostgreSQL with read/write replica separation
- Background worker with browser-based scraping (Puppeteer)
- Twitter scraper for social content collection
- DexScreener scraper for market data
- JWT authentication (Google OAuth, Apple OAuth, wallet signing)
- PM2 process management (cluster mode for API, fork for worker)
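For the wallet-signing login path, the server-side check reduces to verifying an Ed25519 signature (the scheme Solana wallets use) over a server-issued challenge before a JWT is issued. The sketch below uses only Node's built-in crypto module. It assumes the client sends the raw 32-byte public key and the 64-byte signature, and it omits decoding a base58 wallet address into bytes. The challenge text, buildChallenge, and demoRoundTrip are hypothetical illustrations, not the baseline-api implementation.

```typescript
import { createPublicKey, generateKeyPairSync, sign, verify } from "node:crypto";

// DER prefix of an Ed25519 SubjectPublicKeyInfo; appending the raw 32-byte key yields a valid SPKI.
const ED25519_SPKI_PREFIX = Buffer.from("302a300506032b6570032100", "hex");

// Hypothetical challenge format; a real server would persist the nonce and expire it.
function buildChallenge(nonce: string, issuedAt: string): string {
  return `Sign in to Baseline\nNonce: ${nonce}\nIssued at: ${issuedAt}`;
}

// True if `signature` is a valid Ed25519 signature of `message` under `rawPublicKey`.
function verifyWalletSignature(message: string, rawPublicKey: Buffer, signature: Buffer): boolean {
  if (rawPublicKey.length !== 32 || signature.length !== 64) return false;
  const key = createPublicKey({
    key: Buffer.concat([ED25519_SPKI_PREFIX, rawPublicKey]),
    format: "der",
    type: "spki",
  });
  // Ed25519 has no separate digest algorithm, so the algorithm argument is null.
  return verify(null, Buffer.from(message, "utf8"), key, signature);
}

// Usage: simulate a wallet locally (in production the signature comes from the client).
function demoRoundTrip(): boolean {
  const { publicKey, privateKey } = generateKeyPairSync("ed25519");
  const raw = publicKey.export({ format: "der", type: "spki" }).subarray(ED25519_SPKI_PREFIX.length);
  const message = buildChallenge("abc123", "2024-01-01T00:00:00Z");
  const signature = sign(null, Buffer.from(message, "utf8"), privateKey);
  return verifyWalletSignature(message, raw, signature);
}
```

Binding the nonce into the signed text is what prevents a captured signature from being replayed for a later login.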
@baseline/core Public Interface:
The @baseline/core package is the typed SDK for integrators. It exports the protocol types for claims, evidence, Verification Objects (VOs), attestations, and errors, along with a client factory. The complete type definitions are specified in Appendix F.
| Module | Exports | Description |
|---|---|---|
| @baseline/core | createBaselineClient | Factory function returning a typed BaselineClient instance |
| @baseline/core/types | Claim, ClaimSubmission, Subject, Context, Scope, etc. | All claim-related types (Section 2) |
| @baseline/core/types | EvidenceUnit, EvidenceReference, EvidenceSourceType, etc. | All evidence types (Section 3) |
| @baseline/core/types | VerificationObject, QualificationType, LineStatus, etc. | All VO types (Section 6) |
| @baseline/core/types | Attestation, ValidatorInfo | Attestation types (Section 7) |
| @baseline/core/types | BaselineError, ErrorCode, ClaimValidationError | Error types |
| @baseline/core/types | PaginatedResponse, PaginationParams | Pagination generics |
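Appendix F holds the authoritative definitions of PaginatedResponse and PaginationParams. Purely to illustrate how a generic page type is usually consumed, the sketch below assumes a cursor-based shape ({ items, nextCursor }) and a caller-supplied fetchPage function. Both the field names and the collectAll helper are hypothetical and are not the package's actual API.

```typescript
// Assumed shapes for illustration only; the real types are specified in Appendix F.
interface PaginationParams {
  cursor?: string;
  limit: number;
}

interface PaginatedResponse<T> {
  items: T[];
  nextCursor?: string; // absent on the last page
}

// Drain every page of a paginated endpoint into a single array.
async function collectAll<T>(
  fetchPage: (params: PaginationParams) => Promise<PaginatedResponse<T>>,
  limit = 100,
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined;
  do {
    const page = await fetchPage({ cursor, limit });
    all.push(...page.items);
    cursor = page.nextCursor;
  } while (cursor !== undefined);
  return all;
}
```

Keeping the page type generic lets one helper serve claims, evidence, and VO listings alike.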
Product Frontend (baseline-web)
- React 18 + Vite + TailwindCSS
- TypeScript types aligned with Verification Object schema
- Social feed rendering, coin scoring dashboard, bot arena
- Baseline Feed (AI-verified aggregated insights)
- Watchlist management with auth context
- Radix UI primitives, Recharts for data visualization, Framer Motion
AI Agent Layer (Ghost / clawd)
- Claude-powered AI analyst agent ("Ghost")
- Ghost Scoring System v1 (manual) and v2 (automated, 12 categories)
- ML/AI Framework: XGBoost regression, GNN entity resolution, LSTM autoencoders, LLM contract interpretation
- Trading bots (paper trading + mainnet via Jupiter)
- KOL scraping and quality scoring
- PostgreSQL database with coins, social content, trading history
- Cron-scheduled automated analysis and reporting
- Memory persistence via markdown files (daily notes, long-term memory)
11.2 ML Model Stack
| Model Type | Framework | Use Case | Inference Latency |
|---|---|---|---|
| CatBoost (default) | catboost | Pump/dump prediction | < 10ms |
| CatBoost Deep | catboost | Enhanced regularization | < 10ms |
| XGBoost | xgboost | Alternative GBDT | < 10ms |
| Ensemble | catboost + xgboost | Soft voting (best precision) | < 20ms |
| TCN | PyTorch | Temporal event sequences | < 50ms |
| LightGBM | lightgbm | Sybil detection | < 10ms |
| GNN (GAT) | PyTorch Geometric | Wallet clustering | < 500ms |
| LSTM Autoencoder | PyTorch | Distribution anomaly | < 50ms |
| FinBERT | transformers | Social sentiment | < 200ms |
| LLM (Llama/Mistral) | transformers | Contract interpretation | 2-5s |
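Soft voting, as used by the Ensemble row, averages the positive-class probabilities produced by the member models instead of counting their hard labels. The sketch below assumes each member (for example CatBoost and XGBoost) already returns a pump probability in [0, 1]. The optional per-model weights are a hypothetical extension; equal weights reproduce a plain average.

```typescript
// Weighted soft vote over per-model positive-class probabilities.
function softVote(probabilities: number[], weights?: number[]): number {
  if (probabilities.length === 0) throw new Error("no model outputs");
  const w = weights ?? probabilities.map(() => 1);
  if (w.length !== probabilities.length) throw new Error("weights/probabilities length mismatch");
  let weighted = 0;
  let total = 0;
  for (let i = 0; i < probabilities.length; i++) {
    weighted += w[i] * probabilities[i];
    total += w[i];
  }
  if (total <= 0) throw new Error("weights must sum to a positive value");
  return weighted / total;
}

// Hard decision: compare the averaged probability against the tuned threshold.
function ensemblePredict(probabilities: number[], threshold: number, weights?: number[]): boolean {
  return softVote(probabilities, weights) >= threshold;
}
```

Averaging probabilities keeps each model's confidence in the vote, which is why soft voting tends to be steadier than majority voting when only two models are combined.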
Training Pipeline:
- Collect data via WebSocket monitor (24-48 hours)
- Build dataset: decrypt logs → compute features → label tokens
- Train: model selection, hyperparameter optimization (Optuna), SMOTE for imbalance
- Evaluate: temporal backtest (80/20 chronological split)
- Deploy: FastAPI inference server with hot model reload
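The temporal backtest differs from a random split. Samples are ordered by time, the earliest 80% train the model, and the latest 20% evaluate it, so no information from the future leaks into training. The sketch below assumes each labeled sample carries a numeric timestamp. The Sample shape is hypothetical.

```typescript
// Hypothetical labeled sample; `features` stands in for the extracted feature vector.
interface Sample {
  timestamp: number; // e.g. event time in ms since epoch
  features: number[];
  label: 0 | 1;
}

// Chronological split: the earliest `trainFraction` of samples train, the remainder test.
function temporalSplit(samples: Sample[], trainFraction = 0.8): { train: Sample[]; test: Sample[] } {
  if (trainFraction <= 0 || trainFraction >= 1) throw new Error("trainFraction must be in (0, 1)");
  const sorted = [...samples].sort((a, b) => a.timestamp - b.timestamp); // copy; input left untouched
  const cut = Math.floor(sorted.length * trainFraction);
  return { train: sorted.slice(0, cut), test: sorted.slice(cut) };
}
```

Any resampling such as SMOTE belongs after the split and on the training slice only. Otherwise synthetic points derived from test-period samples would inflate the backtest.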
Threshold Optimization:
| Strategy | Description |
|---|---|
| F0.5 (default) | Favors precision while still accounting for recall (β = 0.5) |
| F1 | Weights precision and recall equally (β = 1) |
| F2 | Favors recall (β = 2); catches more events at the cost of more false positives |
| Precision | Maximizes precision (strict) |
| Recall | Maximizes recall subject to precision >= 30% (comprehensive) |
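The F-score strategies choose the decision threshold that maximizes F_β = (1 + β²)·P·R / (β²·P + R) on held-out data. A β below 1 weights precision more heavily, and a β above 1 weights recall more heavily. The sketch below sweeps every distinct predicted score as a candidate threshold. It also shows the Recall strategy as a constrained search that uses the 30% precision floor from the table. The function names are hypothetical.

```typescript
interface Metrics {
  precision: number;
  recall: number;
}

// Precision and recall when predicting positive for score >= threshold.
function metricsAt(scores: number[], labels: number[], threshold: number): Metrics {
  let tp = 0, fp = 0, fn = 0;
  for (let i = 0; i < scores.length; i++) {
    const predicted = scores[i] >= threshold;
    if (predicted && labels[i] === 1) tp++;
    else if (predicted) fp++;
    else if (labels[i] === 1) fn++;
  }
  return {
    precision: tp + fp === 0 ? 0 : tp / (tp + fp),
    recall: tp + fn === 0 ? 0 : tp / (tp + fn),
  };
}

function fBeta({ precision, recall }: Metrics, beta: number): number {
  const b2 = beta * beta;
  const denom = b2 * precision + recall;
  return denom === 0 ? 0 : ((1 + b2) * precision * recall) / denom;
}

// Candidate thresholds: the distinct scores in ascending order.
function candidates(scores: number[]): number[] {
  return Array.from(new Set(scores)).sort((a, b) => a - b);
}

// F-beta strategies: keep the threshold with the highest F-beta (first one wins ties).
function bestFBetaThreshold(scores: number[], labels: number[], beta = 0.5): number {
  let best = Infinity; // predicts nothing if there are no scores
  let bestF = -1;
  for (const t of candidates(scores)) {
    const f = fBeta(metricsAt(scores, labels, t), beta);
    if (f > bestF) {
      bestF = f;
      best = t;
    }
  }
  return best;
}

// Recall strategy: recall never rises as the threshold rises, so the lowest threshold
// that meets the precision floor gives the maximum feasible recall.
function maxRecallThreshold(scores: number[], labels: number[], minPrecision = 0.3): number | undefined {
  for (const t of candidates(scores)) {
    if (metricsAt(scores, labels, t).precision >= minPrecision) return t;
  }
  return undefined;
}
```

Tuning the threshold on the chronological test slice instead of the training data keeps the chosen operating point honest.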
11.3 Database Architecture
Primary Database: PostgreSQL
Operational Tables
| Table | Description |
|---|---|
| coins | Token registry (contract address, ticker, metadata) |
| coin_social | Social media links per coin |
| coin_social_content | Scraped social posts (JSONB info field) |
| coin_onchain | On-chain snapshots (rich list, snipers, etc.) |
| coin_history | Audit trail of coin status changes (trigger-based) |
| bot_report | AI-generated analysis reports |
| bot_report_history | Archived report versions (trigger-based) |
Trading Tables
| Table | Description |
|---|---|
| trading_account | Virtual/real trading accounts with balance |
| trading_position | Current token holdings per account |
| trading_history | Buy/sell trade log |
| trading_account_pnl_history | Portfolio snapshots (per-trade and periodic) |
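Together, the trading tables imply a simple accounting loop. Each fill is appended to trading_history and updates the holding in trading_position, and the account's equity can be snapshotted into trading_account_pnl_history. The sketch below is an in-memory version of that loop for a paper-trading account. It assumes average-cost accounting because the cost-basis method is not specified here, and its class and field names are hypothetical. Amounts are plain numbers for brevity, whereas a production ledger would more likely use integer base units or a decimal type.

```typescript
type Side = "buy" | "sell";

// Hypothetical shapes loosely mirroring trading_position and trading_history rows.
interface Position {
  quantity: number;
  avgCost: number; // average entry price per token
}

interface Trade {
  mint: string;
  side: Side;
  quantity: number;
  price: number; // quote currency per token
}

class PaperAccount {
  cash: number;
  realizedPnl = 0;
  readonly positions = new Map<string, Position>();
  readonly history: Trade[] = [];

  constructor(startingCash: number) {
    this.cash = startingCash;
  }

  // Apply one fill: update the position, cash balance, realized PnL, and trade log.
  apply(trade: Trade): void {
    if (trade.quantity <= 0 || trade.price <= 0) throw new Error("quantity and price must be positive");
    const pos = this.positions.get(trade.mint) ?? { quantity: 0, avgCost: 0 };
    const notional = trade.quantity * trade.price;
    if (trade.side === "buy") {
      if (notional > this.cash) throw new Error("insufficient balance");
      // Average cost becomes the quantity-weighted mean of the old holding and the new lot.
      const newQuantity = pos.quantity + trade.quantity;
      pos.avgCost = (pos.quantity * pos.avgCost + notional) / newQuantity;
      pos.quantity = newQuantity;
      this.cash -= notional;
    } else {
      if (trade.quantity > pos.quantity) throw new Error("cannot sell more than is held");
      // Selling leaves average cost unchanged and realizes the gain or loss against it.
      this.realizedPnl += (trade.price - pos.avgCost) * trade.quantity;
      pos.quantity -= trade.quantity;
      this.cash += notional;
    }
    if (pos.quantity === 0) this.positions.delete(trade.mint);
    else this.positions.set(trade.mint, pos);
    this.history.push(trade);
  }

  // Mark-to-market equity: the figure a PnL snapshot would record (unmarked tokens at cost).
  equity(marks: Map<string, number>): number {
    let value = this.cash;
    this.positions.forEach((pos, mint) => {
      value += pos.quantity * (marks.get(mint) ?? pos.avgCost);
    });
    return value;
  }
}
```

Because every validation check runs before any state is mutated, a rejected fill leaves the account and its history unchanged.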
Monitoring Tables
| Table | Description |
|---|---|
| DEXTRACKER_EVENTS | Raw DEX trade events |
| DEXTRACKER_PAIRS | Aggregated pair statistics |
| DEXTRACKER_TOKENS | Token creation events |
| DEXTRACKER_GLOBAL_STATS | System-wide statistics |
| DEXTRACKER_ML_SCORES | ML prediction results |
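DEXTRACKER_PAIRS holds aggregates derived from the raw rows in DEXTRACKER_EVENTS. The exact columns are not listed in this section, so the sketch below assumes a hypothetical event shape. From it, the code computes a few typical per-pair statistics: trade count, buy/sell split, SOL volume, and last price. The same rollup could equally be expressed as a GROUP BY query in PostgreSQL.

```typescript
// Hypothetical shape of a DEXTRACKER_EVENTS row.
interface DexEvent {
  pair: string;        // pool / pair address
  isBuy: boolean;
  solAmount: number;   // SOL side of the trade
  tokenAmount: number; // token side of the trade
  timestamp: number;   // ms since epoch
}

// Hypothetical subset of DEXTRACKER_PAIRS columns.
interface PairStats {
  pair: string;
  trades: number;
  buys: number;
  sells: number;
  volumeSol: number;
  lastPrice: number; // SOL per token from the most recent trade
  lastTimestamp: number;
}

function aggregatePairs(events: DexEvent[]): Map<string, PairStats> {
  const stats = new Map<string, PairStats>();
  for (const e of events) {
    const s = stats.get(e.pair) ?? {
      pair: e.pair, trades: 0, buys: 0, sells: 0, volumeSol: 0, lastPrice: 0, lastTimestamp: -Infinity,
    };
    s.trades += 1;
    if (e.isBuy) s.buys += 1;
    else s.sells += 1;
    s.volumeSol += e.solAmount;
    // Events can arrive out of order, so only a trade at least as recent updates the last price.
    if (e.timestamp >= s.lastTimestamp && e.tokenAmount > 0) {
      s.lastPrice = e.solAmount / e.tokenAmount;
      s.lastTimestamp = e.timestamp;
    }
    stats.set(e.pair, s);
  }
  return stats;
}
```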
Cron Tables
| Table | Description |
|---|---|
| openclaw_cron_runs | Cron job execution history |