InferNode: A Decentralized AI Inference Marketplace on Solana

Abstract

We present InferNode, a peer-to-peer marketplace for AI inference computation built on the Solana blockchain. InferNode enables buyers to submit inference jobs (text generation, summarization, embeddings, and classification) and receive results from an open network of GPU providers, with payment settled automatically via Anchor smart contracts. Providers register stake, advertise model endpoints, and earn per completed job without custodial intermediaries. The protocol achieves trustless payment settlement through a commit-reveal result verification scheme, provider slashing for misbehavior, and deterministic on-chain fee distribution. InferNode is live on Solana devnet as of June 2025.

1 Introduction

Artificial intelligence inference — the act of running a trained model to produce output — has become a fundamental compute primitive. Every chatbot response, every document summary, every code suggestion is an inference request. Yet the market for inference compute remains highly centralized: a handful of API providers control access, set opaque prices, and create single points of failure for applications that depend on them.

Meanwhile, vast quantities of GPU compute sit idle. Developers with gaming rigs, researchers with university allocations, and operators of self-hosted model endpoints have excess capacity with no efficient market to sell it into.

InferNode addresses both sides of this imbalance. Buyers get transparent, competitive pricing for AI inference without API keys, monthly seats, or vendor lock-in. Providers get a permissionless market to monetize idle compute, with automatic, trustless payouts settled on Solana.

Solana is well suited as the settlement layer: slot times of roughly 400 ms, transaction fees of a fraction of a cent, and a mature smart-contract ecosystem via the Anchor framework make it practical to settle per-job payments that would be uneconomical on slower or more expensive chains.

2 Problem Statement

2.1 Centralization of Inference APIs

The dominant model for AI inference is the hosted API: a single company runs the model, sets the price, and controls access. This creates several failure modes:

Opaque and volatile pricing with no market mechanism to drive costs down
Vendor lock-in through proprietary request formats and authentication schemes
Single-provider outages propagate to all dependent applications
Geographic restrictions limit access in regulated or underserved markets
Closed models preclude auditability of outputs

2.2 Idle GPU Capacity

Estimates suggest that consumer and prosumer GPU hardware runs at less than 20% average utilization globally. Research institutions, development teams, and enthusiasts collectively hold substantial inference capacity (a 4-bit quantized Llama 3.1 8B needs only about 6 GB of VRAM and runs comfortably on a gaming-class GPU), but there is no standardized, low-friction way to sell excess capacity.

2.3 Payment Friction

Existing peer-to-peer GPU rental markets rely on fiat payment processors, requiring KYC, minimum balance thresholds, and settlement periods measured in days. These frictions are acceptable for renting a GPU for a week, but make per-inference micropayments economically impractical. A Solana transaction has a base fee of 0.000005 SOL (about $0.00065 at $130/SOL), which makes per-job settlement viable even for jobs worth only a few cents.

3 Protocol Design

3.1 Architecture Overview

InferNode is a hybrid system: trust-critical state (payment, provider registration, slashing) is on-chain; latency-critical work (job dispatch, inference execution, result delivery) is off-chain. Only a hash of the result is submitted on-chain, so results can be verified without storing raw model outputs on the blockchain.

ASCII

┌──────────────────┐    submit job + pay     ┌──────────────────────┐
│   Buyer (web)    │ ─────────────────────▶  │  Anchor Escrow PDA   │
│ wallet · prompt  │                         │  amount · expires_at │
└────────┬─────────┘                         └──────────┬───────────┘
         │                                              │
         │ job metadata                                 │ assign · release
         ▼                                              ▼
┌──────────────────┐    dispatch via queue   ┌──────────────────────┐
│  Backend (API)   │ ───────────────────────▶│ Worker CLI (provider)│
│  postgres·redis  │ ◀───────────────────────│  ollama / vllm / api │
└────────┬─────────┘   result + hash         └──────────────────────┘
         │
         ▼
   buyer sees result · provider receives payout

3.2 On-Chain Program (Anchor)

The InferNode Anchor program exposes the following instructions:

Rust

// Registry
initialize_registry(treasury: Pubkey)
register_provider(stake_amount: u64)
deactivate_provider()

// Jobs
create_job(job_id_hash: [u8; 32], amount: u64, protocol_fee_bps: u16, expires_at: i64)
assign_provider(job_id_hash: [u8; 32], provider: Pubkey)
submit_result_hash(job_id_hash: [u8; 32], result_hash: [u8; 32])
release_payment(job_id_hash: [u8; 32])

// Dispute resolution
refund_job(job_id_hash: [u8; 32])      // on timeout
slash_provider(provider: Pubkey, amount: u64)  // on proven fault

3.3 Job Lifecycle

A job passes through these states, each gated by on-chain conditions:

State               Description
PENDING_PAYMENT     Job created; escrow PDA exists; buyer must fund within 60s
FUNDED              Escrow holds full amount; dispatcher searching for provider
ASSIGNED            Provider locked in; worker polling has started
RESULT_SUBMITTED    Provider submitted result hash; dispute window open (10 min)
COMPLETED           Payment released to provider; protocol fee sent to treasury
REFUNDED            Expired or disputed; full amount returned to buyer
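The lifecycle above can be read as a small state machine. The following is a minimal off-chain sketch, not the Anchor program itself: the `JobEvent` names are hypothetical, and the assumption that only FUNDED and ASSIGNED jobs can expire into a refund is ours.

```rust
/// Job states, taken from the lifecycle described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    PendingPayment,
    Funded,
    Assigned,
    ResultSubmitted,
    Completed,
    Refunded,
}

/// Events that advance a job. These names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobEvent {
    EscrowFunded,
    ProviderAssigned,
    ResultHashSubmitted,
    DisputeWindowElapsed, // no successful challenge: payment is released
    DisputeUpheld,        // challenge succeeded: buyer is refunded
    Expired,              // expires_at passed: buyer is refunded
}

/// Returns the next state, or None if `event` is not valid in `state`.
pub fn transition(state: JobState, event: JobEvent) -> Option<JobState> {
    use JobEvent::*;
    use JobState::*;
    match (state, event) {
        (PendingPayment, EscrowFunded) => Some(Funded),
        (Funded, ProviderAssigned) => Some(Assigned),
        (Assigned, ResultHashSubmitted) => Some(ResultSubmitted),
        (ResultSubmitted, DisputeWindowElapsed) => Some(Completed),
        (ResultSubmitted, DisputeUpheld) => Some(Refunded),
        // Assumption: a funded job without a submitted result may expire.
        (Funded | Assigned, Expired) => Some(Refunded),
        // Terminal states and out-of-order events are rejected.
        _ => None,
    }
}
```

Rejecting every pair not listed keeps COMPLETED and REFUNDED terminal, so a job cannot be both paid out and refunded.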

3.4 Result Verification

Full on-chain storage of model outputs is not feasible given output sizes and cost. Instead, providers submit a SHA-256 hash of the raw output. During a 10-minute dispute window, any party may challenge a result by submitting the delivered output as a pre-image: if its SHA-256 hash differs from the hash the provider committed on-chain, the challenge succeeds. The smart contract then slashes the dishonest party's stake. Disputes are expected to be rare: providers are pseudonymous, but their stake is at risk, which creates a strong economic incentive for honest behavior.
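One way to picture the dispute check is as a comparison of two 32-byte hashes inside a time window. The sketch below is an off-chain model under stated assumptions: `resolve_dispute` and `DisputeOutcome` are hypothetical names, the hashing step itself is omitted (the standard library has no SHA-256), and how the contract establishes that the revealed output is the one actually delivered is not specified here.

```rust
/// Dispute window from the text: 10 minutes, expressed in seconds.
pub const DISPUTE_WINDOW_SECS: i64 = 10 * 60;

#[derive(Debug, PartialEq, Eq)]
pub enum DisputeOutcome {
    /// The challenge arrived outside the dispute window and is ignored.
    WindowClosed,
    /// The revealed output hashes to the committed value; the result stands.
    Rejected,
    /// The revealed output hashes to a different value; the provider is at fault.
    Upheld,
}

/// `committed` is the hash passed to submit_result_hash. `revealed` is the
/// SHA-256 of the output presented by the challenger, computed elsewhere.
pub fn resolve_dispute(
    committed: [u8; 32],
    revealed: [u8; 32],
    submitted_at: i64,
    challenged_at: i64,
) -> DisputeOutcome {
    let elapsed = challenged_at - submitted_at;
    if elapsed < 0 || elapsed > DISPUTE_WINDOW_SECS {
        DisputeOutcome::WindowClosed
    } else if committed == revealed {
        DisputeOutcome::Rejected
    } else {
        DisputeOutcome::Upheld
    }
}
```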

4 Payment Model

4.1 Pricing Formula

Job cost is computed deterministically before escrow funding, giving buyers a guaranteed maximum cost:

formula

price           = baseFee + (estimatedTokens / 1000) × pricePerKTokens
protocol_fee    = price × protocolFeePct          // default 5%
provider_payout = price − protocol_fee

Current network defaults (provider-adjustable per model):

Parameter        Default       Notes
Base fee         0.0001 SOL    Flat per-job overhead
Per 1K tokens    0.0005 SOL    Provider-set; shown at submission
Protocol fee     5%            Goes to InferNode treasury
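The pricing formula and defaults above can be worked through in integer lamports (1 SOL = 1,000,000,000 lamports). This is a sketch, assuming the fee is expressed in basis points as in `create_job` and that integer division rounds down; `quote` and `Quote` are hypothetical names.

```rust
/// Network defaults from the text, converted to lamports.
pub const BASE_FEE: u64 = 100_000; // 0.0001 SOL
pub const PRICE_PER_K_TOKENS: u64 = 500_000; // 0.0005 SOL
pub const PROTOCOL_FEE_BPS: u64 = 500; // 5% = 500 basis points

#[derive(Debug, PartialEq, Eq)]
pub struct Quote {
    pub price: u64,
    pub protocol_fee: u64,
    pub provider_payout: u64,
}

/// Computes the job price and its split between treasury and provider.
pub fn quote(estimated_tokens: u64, base_fee: u64, price_per_k: u64, fee_bps: u64) -> Quote {
    // Multiply before dividing so token counts below 1,000 are not truncated to zero.
    let price = base_fee + estimated_tokens * price_per_k / 1_000;
    let protocol_fee = price * fee_bps / 10_000;
    Quote {
        price,
        protocol_fee,
        provider_payout: price - protocol_fee,
    }
}
```

With the defaults, a 1,000-token job is quoted at 600,000 lamports (0.0006 SOL), of which 30,000 go to the treasury and 570,000 to the provider.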

4.2 Escrow Design

Each job creates a unique Program Derived Address (PDA) as its escrow. The PDA is seeded by the job ID hash, ensuring no two jobs share an escrow. Funds can only leave the PDA via three instructions: release_payment (success), refund_job (timeout/dispute), or slash_provider (proven fault). There is no admin key that can drain escrows unilaterally.

4.3 Micropayment Viability

A typical 1,000-token job costs ~0.0006 SOL (~$0.08 at $130/SOL). A Solana transaction to settle that payment costs ~0.000005 SOL (~$0.00065). The settlement overhead is therefore roughly 0.8% of job value per transaction, and stays in the low single-digit percent even across the several transactions in a job's lifecycle (create, assign, submit, release), making true per-inference micropayments economically viable.
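The overhead can be checked directly from the two figures in this section. The helper below is a sketch: `overhead_pct` is a hypothetical name, and the transaction count per job is an assumption passed in by the caller.

```rust
/// Fee for one settlement transaction, in lamports (0.000005 SOL).
pub const TX_FEE: u64 = 5_000;

/// Settlement overhead as a percentage of the job price.
pub fn overhead_pct(job_price_lamports: u64, tx_count: u64) -> f64 {
    (TX_FEE * tx_count) as f64 / job_price_lamports as f64 * 100.0
}
```

For a 0.0006 SOL job (600,000 lamports), one transaction is about 0.83% of job value, and four transactions are about 3.33%.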

5 Provider Network

5.1 Registration & Staking

Any operator with a compatible inference endpoint can register as a provider by staking a minimum of 1 SOL. Stake is locked on-chain and serves as a security deposit that can be slashed in the event of proven misbehavior. Providers with higher stake are prioritized in job dispatch, creating a natural incentive for serious operators to over-collateralize.

5.2 Supported Engine Types

Ollama

Local model runner. Worker sends requests to `/api/generate`. Supports llama3.1, mistral, qwen, deepseek-coder, nomic-embed-text, and any GGUF-compatible model.

vLLM

High-throughput server. OpenAI-compatible `/v1/completions` endpoint. Ideal for datacenter-grade providers running 70B+ parameter models.

OpenAI-compatible

Any endpoint implementing the OpenAI Chat Completions or Embeddings API. Covers Together AI, Groq, Fireworks, self-hosted llama.cpp, etc.
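The three engine types differ mainly in which HTTP path the worker targets. A minimal sketch of that mapping follows; the `Engine` enum and `endpoint` helper are hypothetical, and `/v1/chat/completions` is the conventional Chat Completions path, which a given provider may serve under a different prefix.

```rust
/// Engine types listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Engine {
    Ollama,
    Vllm,
    OpenAiCompatible,
}

impl Engine {
    /// Request path for text generation on each engine type.
    pub fn path(self) -> &'static str {
        match self {
            Engine::Ollama => "/api/generate",
            Engine::Vllm => "/v1/completions",
            // Assumption: the provider uses the conventional path.
            Engine::OpenAiCompatible => "/v1/chat/completions",
        }
    }
}

/// Joins a provider's base URL with the engine's path, tolerating a trailing slash.
pub fn endpoint(base_url: &str, engine: Engine) -> String {
    format!("{}{}", base_url.trim_end_matches('/'), engine.path())
}
```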

5.3 Worker CLI

Providers interact with the network via the infernode-worker CLI, a Node.js daemon that handles polling, inference routing, result submission, and on-chain interactions. Setup requires five commands:

bash

npm i -g infernode-worker
infernode provider init
infernode provider set-endpoint --url http://localhost:11434 --mode ollama
infernode provider register --stake 5 --model llama3.1:8b --price 0.00048
infernode worker start

5.4 Reputation System

Provider reputation is computed from on-chain data: completed jobs, failed jobs, timeout rate, and stake level. Reputation scores affect job dispatch priority but are not stored on-chain to avoid update costs — they are computed off-chain by the dispatcher and used to break ties when multiple providers match a job's requirements.
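The dispatch ordering described in this section (stake first, reputation as tie-breaker) could look like the sketch below. The scoring formula is a placeholder of our own, since the text does not define one, and the `Provider` fields are hypothetical.

```rust
/// Minimum stake from the registration rules: 1 SOL, in lamports.
pub const MIN_STAKE_LAMPORTS: u64 = 1_000_000_000;

#[derive(Debug, Clone, PartialEq)]
pub struct Provider {
    pub name: String,
    pub stake_lamports: u64,
    pub completed: u32,
    pub failed: u32,
    pub timed_out: u32,
}

impl Provider {
    /// Placeholder reputation score: share of jobs completed, in [0, 1].
    pub fn reputation(&self) -> f64 {
        let total = self.completed + self.failed + self.timed_out;
        if total == 0 {
            0.0
        } else {
            self.completed as f64 / total as f64
        }
    }
}

/// Drops under-staked providers, then orders by stake (descending),
/// using reputation (descending) to break ties.
pub fn rank(mut providers: Vec<Provider>) -> Vec<Provider> {
    providers.retain(|p| p.stake_lamports >= MIN_STAKE_LAMPORTS);
    providers.sort_by(|a, b| {
        b.stake_lamports
            .cmp(&a.stake_lamports)
            .then(b.reputation().total_cmp(&a.reputation()))
    });
    providers
}
```

Because the score is derived from counts that are already on-chain, any dispatcher can recompute it, which is what allows the score itself to stay off-chain.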

6 Security & Trust

6.1 Threat Model

InferNode operates in an adversarial environment where providers may attempt to submit false results, buyers may attempt to claim refunds on valid jobs, and the off-chain dispatcher may be unavailable. The protocol is designed to be safe under all three conditions.

6.2 Provider Dishonesty

A provider that submits a fabricated result hash risks slashing when a dispute is raised and the honest result pre-image is produced. The expected value of cheating is negative for any provider with significant stake: the slashed amount far exceeds the payout from a single fraudulent job.
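The expected-value argument can be made concrete with a simple model. It assumes, for illustration only, that a caught provider forfeits the job payout and loses `slash`, and that an uncaught provider keeps the payout at no cost; `p_caught` is the probability that a fraudulent result is disputed and proven.

```rust
/// Expected value of submitting one fabricated result, in SOL.
pub fn cheating_ev(payout: f64, slash: f64, p_caught: f64) -> f64 {
    (1.0 - p_caught) * payout - p_caught * slash
}

/// Detection probability above which cheating has negative expected value.
pub fn break_even_detection(payout: f64, slash: f64) -> f64 {
    payout / (payout + slash)
}
```

With a payout of 0.00057 SOL and the 1 SOL minimum stake fully slashed, the break-even detection probability is about 0.057%: cheating loses money in expectation as soon as more than roughly one fraudulent job in 1,750 is caught.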

6.3 Dispatcher Failure

If the off-chain dispatcher is unavailable, funded jobs are not lost. The escrow PDA records an expires_at timestamp. After expiry, the buyer may call refund_job directly, bypassing the dispatcher entirely. This ensures buyer funds are never permanently locked even if InferNode infrastructure is fully offline.
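The refund path needs only the escrow's own fields and the current time, which is why it works without the dispatcher. Below is an off-chain model of the checks, not the Anchor instruction: the `Escrow` layout, the `settled` flag, and the error names are hypothetical.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum RefundError {
    NotBuyer,
    AlreadySettled,
    NotExpired,
}

/// Simplified model of the escrow PDA's state.
#[derive(Debug)]
pub struct Escrow {
    pub buyer: [u8; 32],
    pub amount: u64,
    pub expires_at: i64,
    pub settled: bool,
}

/// Returns the refunded amount in lamports if the caller is the buyer,
/// the escrow is still open, and expires_at has passed.
pub fn refund_job(escrow: &mut Escrow, caller: [u8; 32], now: i64) -> Result<u64, RefundError> {
    if caller != escrow.buyer {
        return Err(RefundError::NotBuyer);
    }
    if escrow.settled {
        return Err(RefundError::AlreadySettled);
    }
    if now < escrow.expires_at {
        return Err(RefundError::NotExpired);
    }
    escrow.settled = true;
    // Empty the escrow so the same funds cannot be refunded twice.
    Ok(std::mem::take(&mut escrow.amount))
}
```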

6.4 Model Output Confidentiality

Raw model outputs are not stored on-chain. Only the SHA-256 hash is submitted. Buyers receive outputs via HTTPS from the backend API. Confidentiality of input prompts from providers is a known limitation of the current design: providers necessarily see input text to run inference. A future ZK-inference integration could address this at the cost of significantly higher proof generation latency.

6.5 Sybil Resistance

The stake requirement creates an economic barrier to sybil attacks. An attacker who registers many low-stake providers gains low dispatch priority and risks losing all stake if any provider misbehaves. High-priority positions require meaningful capital commitment.

7 Roadmap

Phase 1 (LIVE)

Devnet Launch

Anchor program deployed on Solana devnet
Worker CLI: Ollama & OpenAI-compatible endpoints
Web app: job submission, dashboard, provider portal
text-generation, summarization, embedding, classification tasks

Phase 2 (IN PROGRESS)

Mainnet Preparation

Independent security audit of Anchor program
vLLM engine support for high-throughput providers
Result dispute mechanism and slash implementation
On-chain anchoring of provider reputation scores

Phase 3 (PLANNED)

Mainnet & Ecosystem

Mainnet deployment with graduated stake limits
SDK: TypeScript and Python client libraries
Streaming inference results via WebSocket
Multi-modal tasks: image generation, speech-to-text

Phase 4 (RESEARCH)

Trustless Verification

Optimistic rollup of result hashes for batch settlement
ZK-proof of inference for deterministic models
Input prompt encryption for provider-blind execution
Decentralized governance of protocol fee parameters

8 Conclusion

InferNode demonstrates that trustless, per-inference micropayments are practical today using Solana as a settlement layer and Anchor as the smart-contract framework. The hybrid architecture — on-chain payment, off-chain computation — achieves the latency required for real-time AI workloads while preserving the trust guarantees that make decentralized markets work.

The fundamental thesis is simple: AI compute is a commodity, and commodities work best in open, competitive markets. InferNode provides the infrastructure for that market to exist on Solana.

We invite GPU operators, AI developers, and protocol contributors to join the devnet, run a worker, and help build the open inference layer for the decentralized web.


InferNode Whitepaper v0.1 — Draft for public review — June 2025
This document describes a system under active development. All parameters are subject to change before mainnet launch.