Benchmarks — EvoCortexAI

EARLY INDICATORS (PRE-SEED REVIEW ONLY)

Measured Saturn-Node-01 runs on M4-class hardware.

The page below renders from benchmark-data.json. Simulations and projections remain isolated from measured runs and are labelled separately.

Throughput (output tokens/sec)

Baselines

Efficiency (tokens/sec/W)

Key measured runs

Run	Tokens/sec	Avg wall power (W)	t/s per W	Evidence tier

Mesh scaling projections

Tier C simulations/projections only. These are not measured Saturn-Node-01 results and must not be presented as real benchmark outcomes.

METHODOLOGY

How we measure.

Every published number will meet these requirements. If a number does not come from a compliant run, it will not appear on this page.

Token accounting

Prompt and completion tokens come directly from the inference engine's usage response field. Character or word counts are not used.
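As a minimal sketch of the rule above, the snippet below reads token counts only from a response's usage field and refuses to fall back to counting characters or words. The field names prompt_tokens and completion_tokens are an assumption based on the common chat-completions response shape; the source does not spell out the exact keys.

```python
def token_counts(response: dict) -> tuple[int, int]:
    """Return (prompt_tokens, completion_tokens) taken from the usage field only."""
    usage = response.get("usage")
    if not isinstance(usage, dict):
        # No fallback to character or word counts: the request is simply non-compliant.
        raise ValueError("response has no usage field")
    try:
        # Key names are assumed, not confirmed by the source.
        return int(usage["prompt_tokens"]), int(usage["completion_tokens"])
    except KeyError as missing:
        raise ValueError(f"usage field lacks {missing}") from None


# Placeholder response for illustration, not a real measurement.
example = {
    "choices": [{"message": {"content": "hello world"}}],
    "usage": {"prompt_tokens": 12, "completion_tokens": 2},
}
prompt_tokens, completion_tokens = token_counts(example)
```

Raising on a missing usage field, rather than estimating, keeps estimated numbers out of the result file.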

Power measurement

Public numbers use whole-node wall power (wall_power_W) from an external clamp meter or smart plug. Apple SMC package power is recorded separately as an internal reference and is clearly labelled as such if shown. A run without a wall meter is marked smc_only and is not published.
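The publication rule for power data can be sketched as a small classifier. The labels wall_power_W and smc_only come from the text above; the function names and the choice to reject a run with no power data at all are assumptions made for illustration.

```python
from typing import Optional


def power_label(wall_power_w: Optional[float], smc_package_w: Optional[float]) -> str:
    """Label a run by its power source, preferring the external wall meter."""
    if wall_power_w is not None:
        return "wall_power_W"
    if smc_package_w is not None:
        return "smc_only"
    # Assumption: a run with no power reading of any kind is rejected outright.
    raise ValueError("run has no power measurement")


def is_publishable(wall_power_w: Optional[float], smc_package_w: Optional[float]) -> bool:
    """Only runs with a wall-meter reading qualify for public numbers."""
    return power_label(wall_power_w, smc_package_w) == "wall_power_W"
```

An SMC reading recorded alongside a wall reading does not change the label; it stays an internal reference.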

Fixed conditions

Every run records: hardware model and chip, OS version, runtime version, model identifier and quantization, prompt suite name and version, batch size, warm-up request count, power source, ambient temperature if available, and git SHA of Saturn-Control and Saturn-Node. Any change to these conditions produces a distinct run.
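One way to enforce "any change produces a distinct run" is to derive a run key from the recorded conditions. The sketch below hashes a canonical JSON form of the conditions; the function name run_key, the key names, and the 12-character truncation are all hypothetical and are not part of the published schema.

```python
import hashlib
import json


def run_key(conditions: dict) -> str:
    """Derive a stable identifier from the recorded conditions.

    Sorting keys makes the result independent of field order, so two runs
    share a key only if every recorded condition is identical.
    """
    canonical = json.dumps(conditions, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Placeholder values only; none of these describe a real run.
base = {
    "hardwareModel": "<hardware model>",
    "chip": "<chip>",
    "osVersion": "<os version>",
    "modelID": "<model-id-and-quantization>",
    "promptSuite": "<suite-name-and-version>",
    "batchSize": 1,
    "warmupRequests": 0,
    "saturnControlSHA": "<sha>",
}
changed = dict(base, batchSize=2)

same_run = run_key(base) == run_key(dict(reversed(list(base.items()))))
distinct_run = run_key(base) != run_key(changed)
```

Here same_run is true because only the field order differs, and distinct_run is true because the batch size changed.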

Sustained throughput, not peak

Published figures are mean sustained values over a fixed prompt suite. Peak figures are not published. Standard deviation is reported alongside the mean when the sample size allows it.
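A minimal sketch of the summary step, assuming each prompt in the suite yields one throughput sample. The source does not state the sample size at which standard deviation is reported; the threshold of two used here is only the mathematical minimum for a sample standard deviation.

```python
import statistics
from typing import Optional, Sequence


def summarize(samples: Sequence[float], min_for_stdev: int = 2) -> dict:
    """Return the mean over all samples, plus the stdev when there are enough of them."""
    if not samples:
        raise ValueError("no samples to summarize")
    stdev: Optional[float] = None
    if len(samples) >= max(2, min_for_stdev):
        stdev = statistics.stdev(samples)  # sample (n - 1) standard deviation
    return {"mean": statistics.fmean(samples), "stdev": stdev, "n": len(samples)}


# Arbitrary values chosen to make the arithmetic easy to check; not measurements.
summary = summarize([10.0, 12.0, 14.0])
single = summarize([10.0])
```

The maximum sample is never returned, which matches the rule that peak figures are not published.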

What we do not claim

We do not compare against third-party systems unless the comparison uses identical methodology on identical hardware. We do not extrapolate from one model or workload to another. We do not publish estimated or adjusted numbers.

TOOLING

How measurements are collected.

A benchmark harness (saturn-bench) is in development as a SwiftPM executable. It dispatches each prompt in a fixed suite to Saturn-Control's /v1/chat/completions endpoint, records per-request timing and token usage, and writes a versioned JSON result file. Wall-power sampling is recorded separately by the operator during the run.
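The dispatch loop described above can be sketched as follows. This is a Python stand-in written for illustration, not the saturn-bench harness itself, which is described as a SwiftPM executable still in development. The send callable (which would wrap the POST to /v1/chat/completions), the record field names, and the usage key names are all assumptions.

```python
import time
from typing import Callable, Iterable


def run_suite(
    prompts: Iterable[str],
    send: Callable[[str], dict],
    clock: Callable[[], float] = time.perf_counter,
) -> list[dict]:
    """Send each prompt in order and record per-request timing and token usage."""
    records = []
    for index, prompt in enumerate(prompts):
        start = clock()
        response = send(prompt)  # hypothetical wrapper around the HTTP request
        elapsed = clock() - start
        usage = response["usage"]  # token counts come from the engine, never from text length
        records.append({
            "promptIndex": index,
            "elapsedSeconds": elapsed,
            "promptTokens": usage["prompt_tokens"],
            "completionTokens": usage["completion_tokens"],
        })
    return records


def fake_send(prompt: str) -> dict:
    """Stand-in transport so the sketch runs without a live node."""
    return {"usage": {"prompt_tokens": len(prompt.split()), "completion_tokens": 4}}


records = run_suite(["first prompt", "second prompt here"], fake_send)
```

Passing the transport and the clock in as parameters keeps the loop testable without a running node. Wall power is deliberately absent from these records, since the operator samples it separately during the run.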

RESULT FILE SCHEMA (illustrative — not a real measurement)

{
  "schemaVersion": 1,
  "status": "dry_run",
  "hardware": { "model": "<hardware model>", "chip": "<chip>", "ramGB": null },
  "software": { "modelID": "<model-id-and-quantization>", "saturnControlSHA": "<sha>" },
  "promptSuite": { "name": "<suite-name-and-version>", "promptCount": null },
  "conditions": { "powerSource": "wall_power_W", "batchSize": 1 },
  "results": {
    "tokensPerSecond": null,
    "meanWatts": null,
    "joulesPerToken": null,
    "tokensPerWattHour": null
  },
  "notes": "Pending first compliant run."
}
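The three efficiency fields in the results object follow from tokensPerSecond and meanWatts by unit arithmetic alone: a watt is a joule per second, and a watt-hour is 3600 joules. The sketch below assumes the fields are defined this way, which the source does not state explicitly. The name tokensPerSecondPerWatt is hypothetical and corresponds to the "t/s per W" table column.

```python
def derived_metrics(tokens_per_second: float, mean_watts: float) -> dict:
    """Compute efficiency figures from sustained throughput and mean wall power."""
    if tokens_per_second <= 0 or mean_watts <= 0:
        raise ValueError("throughput and power must be positive")
    return {
        # (tokens/s) / (J/s) = tokens per joule
        "tokensPerSecondPerWatt": tokens_per_second / mean_watts,
        # (J/s) / (tokens/s) = joules per token
        "joulesPerToken": mean_watts / tokens_per_second,
        # tokens per joule * 3600 J per Wh
        "tokensPerWattHour": 3600.0 * tokens_per_second / mean_watts,
    }


# Arbitrary inputs for checking the arithmetic; not measured values.
metrics = derived_metrics(tokens_per_second=20.0, mean_watts=10.0)
```

With these inputs, metrics holds 2.0 tokens/sec per watt, 0.5 joules per token, and 7200.0 tokens per watt-hour.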
