InferNode: A Decentralized AI Inference Marketplace on Solana
We present InferNode, a peer-to-peer marketplace for AI inference computation built on the Solana blockchain. InferNode enables buyers to submit inference jobs (text generation, summarization, embeddings, and classification) and receive results from an open network of GPU providers, with payment settled automatically by Anchor smart contracts. Providers register stake, advertise model endpoints, and earn per completed job without custodial intermediaries. The protocol is designed for trustless payment settlement through a commit-reveal result verification scheme, provider slashing for misbehavior, and deterministic on-chain fee distribution. InferNode is live on Solana devnet as of June 2025.
Introduction
Artificial intelligence inference — the act of running a trained model to produce output — has become a fundamental compute primitive. Every chatbot response, every document summary, every code suggestion is an inference request. Yet the market for inference compute remains highly centralized: a handful of API providers control access, set opaque prices, and create single points of failure for applications that depend on them.
Meanwhile, vast quantities of GPU compute sit idle. Developers with gaming rigs, researchers with university allocations, and operators of self-hosted model endpoints have excess capacity with no efficient market to sell it into.
InferNode addresses both sides of this imbalance. Buyers get transparent, competitive pricing for AI inference without API keys, monthly seats, or vendor lock-in. Providers get a permissionless market to monetize idle compute, with automatic, trustless payouts settled on Solana.
Solana is well suited as the settlement layer: sub-second finality, transaction costs of fractions of a cent, and a mature smart-contract ecosystem via the Anchor framework make it practical to settle per-job payments that would be economically irrational on slower or more expensive chains.
Problem Statement
2.1 Centralization of Inference APIs
The dominant model for AI inference is the hosted API: a single company runs the model, sets the price, and controls access. This creates several failure modes:
- Opaque and volatile pricing with no market mechanism to drive costs down
- Vendor lock-in through proprietary request formats and authentication schemes
- Single-provider outages propagate to all dependent applications
- Geographic restrictions limit access in regulated or underserved markets
- Closed models preclude auditability of outputs
2.2 Idle GPU Capacity
Estimates suggest that consumer and prosumer GPU hardware runs at less than 20% average utilization globally. Research institutions, development teams, and enthusiasts collectively hold substantial inference capacity (Llama 3.1 8B quantized to 4 bits needs only about 6 GB of VRAM and runs comfortably on a gaming-class GPU), but there is no standardized, low-friction way to sell that excess capacity.
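The VRAM figure follows from simple arithmetic: weight memory is roughly parameters × bits per weight ÷ 8 bytes. The sketch below assumes about 8 billion parameters. It ignores the few extra bits per weight that practical 4-bit formats spend on scaling factors, and it covers weights only. KV cache and runtime overhead come on top, which is why roughly 4 GB of 4-bit weights ends up needing about 6 GB of VRAM in practice.

```typescript
// Rough weight-memory estimate in decimal gigabytes (1 GB = 1e9 bytes).
// Covers model weights only; KV cache, activations and runtime overhead
// are extra and are not modeled here.
function weightGB(params: number, bitsPerWeight: number): number {
  const bytes = (params * bitsPerWeight) / 8;
  return bytes / 1e9;
}

const LLAMA_8B_PARAMS = 8e9; // rounded; the actual count is slightly above 8B

// 16-bit weights need ~16 GB, beyond most gaming GPUs; 4-bit weights need
// ~4 GB, leaving headroom for the KV cache on a 6-8 GB card.
const fp16 = weightGB(LLAMA_8B_PARAMS, 16);
const q4 = weightGB(LLAMA_8B_PARAMS, 4);
console.log(`fp16: ${fp16} GB, 4-bit: ${q4} GB`);
```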
2.3 Payment Friction
Many existing peer-to-peer GPU rental markets rely on fiat payment processors, requiring KYC, minimum balance thresholds, and settlement periods measured in days. These frictions are acceptable when renting a GPU for a week, but they make per-inference micropayments economically impractical. By contrast, a Solana transaction costs a base fee of ~0.000005 SOL (under $0.001 at $130/SOL), making per-job settlement of even sub-cent jobs viable.
Protocol Design
3.1 Architecture Overview
InferNode is a hybrid system: trust-critical state (payment, provider registration, slashing) is on-chain; latency-critical work (job dispatch, inference execution, result delivery) is off-chain. Only the result hash is submitted on-chain, so results can be verified without storing raw model outputs on the blockchain.
┌──────────────────┐    submit job + pay    ┌──────────────────────┐
│ Buyer (web)      │ ─────────────────────▶ │ Anchor Escrow PDA    │
│ wallet · prompt  │                        │ amount · expires_at  │
└────────┬─────────┘                        └──────────┬───────────┘
         │                                             │
         │ job metadata                                │ assign · release
         ▼                                             ▼
┌──────────────────┐   dispatch via queue   ┌──────────────────────┐
│ Backend (API)    │ ─────────────────────▶ │ Worker CLI (provider)│
│ postgres·redis   │ ◀───────────────────── │ ollama / vllm / api  │
└────────┬─────────┘     result + hash      └──────────────────────┘
         │
         ▼
   buyer sees result · provider receives payout
3.2 On-Chain Program (Anchor)
The InferNode Anchor program exposes the following instructions:
// Registry
initialize_registry(treasury: Pubkey)
register_provider(stake_amount: u64)
deactivate_provider()

// Jobs
create_job(job_id_hash: [u8; 32], amount: u64, protocol_fee_bps: u16, expires_at: i64)
assign_provider(job_id_hash: [u8; 32], provider: Pubkey)
submit_result_hash(job_id_hash: [u8; 32], result_hash: [u8; 32])
release_payment(job_id_hash: [u8; 32])

// Dispute resolution
refund_job(job_id_hash: [u8; 32])              // on timeout
slash_provider(provider: Pubkey, amount: u64)  // on proven fault
3.3 Job Lifecycle
A job passes through these states, each gated by on-chain conditions:
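One way to model the lifecycle is as a small state machine inferred from the instruction set in 3.2. This is a sketch, not the on-chain program. Several rules are assumptions:

- the state names;
- the requirement that assignment and result submission happen before expires_at;
- the requirement that release_payment waits for the 10-minute dispute window from 3.4 to close;
- the restriction of refund_job to jobs that never reached ResultSubmitted.

```typescript
// Hypothetical state names; the Anchor program's actual enum may differ.
type JobState =
  | "Funded"          // after create_job: escrow PDA holds the buyer's payment
  | "Assigned"        // after assign_provider
  | "ResultSubmitted" // after submit_result_hash; dispute window running
  | "Released"        // after release_payment (terminal)
  | "Refunded"        // after refund_job (terminal)
  | "Slashed";        // after slash_provider on proven fault (terminal)

type Instruction =
  | "assign_provider"
  | "submit_result_hash"
  | "release_payment"
  | "refund_job"
  | "slash_provider";

interface Job {
  state: JobState;
  expiresAt: number;          // unix seconds, mirrors create_job's expires_at
  resultSubmittedAt?: number; // unix seconds, set by submit_result_hash
}

const DISPUTE_WINDOW_SECS = 10 * 60; // 10-minute window from 3.4

// Returns the next job state, or throws if the instruction is not allowed.
function transition(job: Job, ix: Instruction, now: number): Job {
  const s = job.state;
  const beforeExpiry = now < job.expiresAt;
  const disputeOpen =
    s === "ResultSubmitted" &&
    job.resultSubmittedAt !== undefined &&
    now < job.resultSubmittedAt + DISPUTE_WINDOW_SECS;

  if (ix === "assign_provider" && s === "Funded" && beforeExpiry) {
    return { ...job, state: "Assigned" };
  }
  if (ix === "submit_result_hash" && s === "Assigned" && beforeExpiry) {
    return { ...job, state: "ResultSubmitted", resultSubmittedAt: now };
  }
  if (ix === "release_payment" && s === "ResultSubmitted" && !disputeOpen) {
    return { ...job, state: "Released" };
  }
  if (ix === "slash_provider" && disputeOpen) {
    return { ...job, state: "Slashed" };
  }
  // Timeout path: after expiry the buyer reclaims funds without the dispatcher.
  if (ix === "refund_job" && (s === "Funded" || s === "Assigned") && !beforeExpiry) {
    return { ...job, state: "Refunded" };
  }
  throw new Error(`${ix} not allowed in state ${s} at t=${now}`);
}
```

Encoding the rules as one pure function keeps every escrow exit path (release, refund, slash) visible in a single place. That is a useful property to mirror when reviewing the real Anchor constraints.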
3.4 Result Verification
Full on-chain storage of model outputs is not feasible given output sizes and cost. Instead, providers submit a SHA-256 hash of the raw output. This commits them to one specific result, which anyone holding the output can check against the on-chain value. During a 10-minute dispute window, any party may challenge a result by presenting the correct output, whose hash differs from the committed one. If the challenge is upheld, the smart contract slashes the dishonest party's stake. Disputes are expected to be rare: providers are pseudonymous, but their stake is at risk, creating strong economic incentives for honest behavior.
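Because the worker is a Node.js daemon, the commitment step can be sketched with Node's built-in crypto module. This is a minimal sketch, assuming the provider hashes the raw output as UTF-8 text with no extra framing; the document does not specify the exact encoding. It shows only what a SHA-256 commitment lets a buyer check, not the dispute procedure itself.

```typescript
import { createHash } from "crypto";

// Provider side: commit to the output by hashing its bytes with SHA-256.
// Assumption: the output is hashed as UTF-8 text with no extra framing.
function resultHash(output: string): Buffer {
  return createHash("sha256").update(output, "utf8").digest();
}

// Buyer side: check that the output delivered over HTTPS hashes to the
// 32-byte result_hash the provider recorded with submit_result_hash.
function matchesCommitment(delivered: string, onChainHash: Uint8Array): boolean {
  const h = resultHash(delivered);
  return h.length === onChainHash.length && h.equals(onChainHash);
}

const committed = resultHash("The capital of France is Paris.");
console.log(matchesCommitment("The capital of France is Paris.", committed)); // true
console.log(matchesCommitment("The capital of France is Lyon.", committed));  // false
```

A matching hash shows only that the buyer received exactly what the provider committed to. Whether that output is a correct inference is what the dispute window is for.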
Payment Model
4.1 Pricing Formula
Job cost is computed deterministically before escrow funding, giving buyers a guaranteed maximum cost:
price           = baseFee + (estimatedTokens / 1000) × pricePerKTokens
protocol_fee    = price × protocolFeePct      // default 5%
provider_payout = price − protocol_fee
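The formula translates directly into integer arithmetic on lamports, which is how an on-chain program would have to compute it. This sketch rests on several assumptions:

- the protocol fee is taken in basis points, to match create_job's protocol_fee_bps argument;
- the rounding directions are choices made here;
- the inputs are illustrative, not published network defaults. The sketch reads the `--price 0.00048` value from the CLI example in 5.3 as SOL per 1K tokens and adds a 0.00012 SOL base fee, picked only so that a 1,000-token job lands on the ~0.0006 SOL figure quoted in 4.3.

```typescript
const LAMPORTS_PER_SOL = 1000000000;

interface Quote {
  price: number;          // total escrowed by create_job (lamports)
  protocolFee: number;    // sent to the treasury (lamports)
  providerPayout: number; // released to the provider (lamports)
}

// Integer-lamport version of the pricing formula. The fee is expressed in
// basis points to match create_job's protocol_fee_bps (500 bps = 5%).
function quote(
  baseFee: number,
  estimatedTokens: number,
  pricePerKTokens: number,
  protocolFeeBps: number = 500
): Quote {
  // Round the token component up so the escrow is never under-funded.
  const tokenCost = Math.ceil((estimatedTokens * pricePerKTokens) / 1000);
  const price = baseFee + tokenCost;
  // Round the fee down; any remainder stays with the provider.
  const protocolFee = Math.floor((price * protocolFeeBps) / 10000);
  return { price, protocolFee, providerPayout: price - protocolFee };
}

// Illustrative inputs (assumptions): 120,000 lamports base fee (0.00012 SOL)
// and 480,000 lamports (0.00048 SOL) per 1K tokens.
const q = quote(120000, 1000, 480000);
const priceSol = q.price / LAMPORTS_PER_SOL;
console.log(q); // { price: 600000, protocolFee: 30000, providerPayout: 570000 }
```

Working in integer lamports rather than fractional SOL avoids floating-point drift between the quoted price and the amount actually escrowed.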
Current network defaults (provider-adjustable per model):
4.2 Escrow Design
Each job creates a unique Program Derived Address (PDA) as its escrow. The PDA is seeded by the job ID hash, ensuring no two jobs share an escrow. Funds can only leave the PDA via three instructions: release_payment (success), refund_job (timeout/dispute), or slash_provider (proven fault). There is no admin key that can drain escrows unilaterally.
4.3 Micropayment Viability
A typical 1,000-token job costs ~0.0006 SOL (~$0.08 at $130/SOL). A Solana transaction to settle that payment costs ~0.000005 SOL (~$0.00065), the base fee of 5,000 lamports per signature. The settlement overhead is therefore under 1% of job value (about 0.83%), low enough to make per-inference micropayments economically viable.
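The overhead arithmetic can be checked directly. This minimal sketch counts only the 5,000-lamport base fee per signature and ignores priority fees and any rent-exempt deposit for the escrow account. The four-transaction lifecycle figure is hypothetical: it assumes each job instruction from 3.2 is sent as its own single-signature transaction.

```typescript
const BASE_FEE_LAMPORTS = 5000; // Solana base fee per signature (0.000005 SOL)

// Settlement overhead as a percentage of job value. Ignores priority fees
// and any rent-exempt deposit for the escrow account.
function overheadPct(jobLamports: number, txCount: number, signaturesPerTx: number = 1): number {
  const feeLamports = txCount * signaturesPerTx * BASE_FEE_LAMPORTS;
  return (feeLamports / jobLamports) * 100;
}

const JOB_LAMPORTS = 600000; // ~0.0006 SOL, the typical 1,000-token job

// One settlement transaction: about 0.83% of job value.
const single = overheadPct(JOB_LAMPORTS, 1);
// Hypothetical full lifecycle with one transaction each for create_job,
// assign_provider, submit_result_hash and release_payment: about 3.3%.
const lifecycle = overheadPct(JOB_LAMPORTS, 4);
console.log(single.toFixed(2), lifecycle.toFixed(2)); // 0.83 3.33
```

Because the fee is fixed per signature, the overhead percentage scales inversely with job value: halving the job price doubles it.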
Provider Network
5.1 Registration & Staking
Any operator with a compatible inference endpoint can register as a provider by staking a minimum of 1 SOL. Stake is locked on-chain and serves as a security deposit that can be slashed in the event of proven misbehavior. Providers with higher stake are prioritized in job dispatch, creating a natural incentive for serious operators to over-collateralize.
5.2 Supported Engine Types
5.3 Worker CLI
Providers interact with the network via the infernode-worker CLI, a Node.js daemon that handles polling, inference routing, result submission, and on-chain interactions. After installation, setup takes four commands:
npm i -g infernode-worker
infernode provider init
infernode provider set-endpoint --url http://localhost:11434 --mode ollama
infernode provider register --stake 5 --model llama3.1:8b --price 0.00048
infernode worker start
5.4 Reputation System
Provider reputation is computed from on-chain data: completed jobs, failed jobs, timeout rate, and stake level. Reputation scores affect job dispatch priority but are not stored on-chain to avoid update costs — they are computed off-chain by the dispatcher and used to break ties when multiple providers match a job's requirements.
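The dispatch rule described in 5.1 and this section can be sketched as a sort over eligible providers: stake first, with reputation as the tie-breaker. The reputation formula here is hypothetical, because the real dispatcher's scoring is not specified. It is a smoothed success rate over completed, failed, and timed-out jobs, with one virtual success and one virtual failure so that new providers start at 0.5 rather than at 0 or 1.

```typescript
const MIN_STAKE_LAMPORTS = 1000000000; // 1 SOL minimum stake from 5.1

interface Provider {
  id: string;
  model: string;         // advertised model, e.g. "llama3.1:8b"
  stakeLamports: number;
  completed: number;
  failed: number;
  timedOut: number;
}

// Hypothetical score: smoothed success rate in (0, 1).
function reputation(p: Provider): number {
  const total = p.completed + p.failed + p.timedOut;
  return (p.completed + 1) / (total + 2);
}

// Pick a provider for a job: filter by model and minimum stake, rank by
// stake, and break stake ties with the off-chain reputation score.
function pickProvider(providers: Provider[], model: string): Provider | undefined {
  const eligible = providers.filter(
    (p) => p.model === model && p.stakeLamports >= MIN_STAKE_LAMPORTS
  );
  eligible.sort(
    (a, b) => b.stakeLamports - a.stakeLamports || reputation(b) - reputation(a)
  );
  return eligible[0];
}
```

Because stake is compared before reputation, a heavily staked provider with a poor record still outranks a lightly staked one with a perfect record. Bucketing stake into tiers before comparing reputation is one way to soften that, if desired.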
Security & Trust
6.1 Threat Model
InferNode operates in an adversarial environment where providers may attempt to submit false results, buyers may attempt to claim refunds on valid jobs, and the off-chain dispatcher may be unavailable. The protocol is designed to be safe under all three conditions.
6.2 Provider Dishonesty
A provider that submits a fabricated result hash risks slashing when a dispute is raised and the honest result pre-image is produced. As long as a non-negligible fraction of fabricated results are disputed, the expected value of cheating is negative for any provider with significant stake: the slashed amount far exceeds the payout from a single fraudulent job.
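The incentive argument can be made concrete with a simple expected-value model that compares the payout kept when fraud goes unnoticed with the stake lost when it is proven. Two parameters are hypothetical, not protocol values: the probability p that a fabricated result is disputed and proven within the dispute window, and the choice to slash the provider's full 1 SOL minimum stake. Cheating pays on average only if p is below payout / (payout + slash).

```typescript
// Expected gain, in lamports, from submitting one fabricated result.
// p: probability the fraud is disputed and proven in time (hypothetical).
// If undetected the provider keeps the payout; if detected it is slashed.
function cheatingEV(payout: number, slash: number, p: number): number {
  return (1 - p) * payout - p * slash;
}

// Detection probability at which cheating stops paying on average.
function breakEvenDetection(payout: number, slash: number): number {
  return payout / (payout + slash);
}

const payout = 570000;    // provider_payout for a ~0.0006 SOL job at a 5% fee
const slash = 1000000000; // hypothetical: the full 1 SOL minimum stake

console.log(breakEvenDetection(payout, slash)); // ≈ 0.00057, about 1 in 1,750
console.log(cheatingEV(payout, slash, 0.001));  // negative at 1-in-1,000 detection
```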
6.3 Dispatcher Failure
If the off-chain dispatcher is unavailable, funded jobs are not lost. The escrow PDA records an expires_at timestamp. After expiry, the buyer may call refund_job directly, bypassing the dispatcher entirely. This ensures buyer funds are never permanently locked even if InferNode infrastructure is fully offline.
6.4 Model Output Confidentiality
Raw model outputs are not stored on-chain; only the SHA-256 hash is submitted. Buyers receive outputs via HTTPS from the backend API. Confidentiality of input prompts from providers is a known limitation of the current design: providers necessarily see input text to run inference. A future integration based on input prompt encryption could address this at the cost of significantly higher latency. ZK proofs of inference alone would not, since the provider generating the proof still sees the inputs.
6.5 Sybil Resistance
The stake requirement creates an economic barrier to sybil attacks: every provider identity locks at least the 1 SOL minimum stake. An attacker who registers many low-stake providers gains low dispatch priority for each of them and risks losing the stake of any provider that misbehaves. High-priority positions require meaningful capital commitment.
Roadmap
Devnet Launch
- Anchor program deployed on Solana devnet
- Worker CLI: Ollama & OpenAI-compatible endpoints
- Web app: job submission, dashboard, provider portal
- Tasks: text generation, summarization, embeddings, and classification
Mainnet Preparation
- Independent security audit of Anchor program
- vLLM engine support for high-throughput providers
- Result dispute mechanism and slash implementation
- On-chain anchoring of provider reputation scores
Mainnet & Ecosystem
- Mainnet deployment with graduated stake limits
- SDK: TypeScript and Python client libraries
- Streaming inference results via WebSocket
- Multi-modal tasks: image generation, speech-to-text
Trustless Verification
- Optimistic rollup of result hashes for batch settlement
- ZK-proof of inference for deterministic models
- Input prompt encryption for provider-blind execution
- Decentralized governance of protocol fee parameters
Conclusion
InferNode demonstrates that trustless, per-inference micropayments are practical today using Solana as a settlement layer and Anchor as the smart-contract framework. The hybrid architecture — on-chain payment, off-chain computation — achieves the latency required for real-time AI workloads while preserving the trust guarantees that make decentralized markets work.
The fundamental thesis is simple: AI compute is a commodity, and commodities work best in open, competitive markets. InferNode provides the infrastructure for that market to exist on Solana.
We invite GPU operators, AI developers, and protocol contributors to join the devnet, run a worker, and help build the open inference layer for the decentralized web.
This document describes a system under active development. All parameters are subject to change before mainnet launch.