
Science & methodology · how the grade is made

The method behind every graded claim.

Veriqa, built by Quantum Nexus, is the decision layer — not the quantum stack. This page is for the skeptics: the physicist who wants to see the rule, and the security leader who needs to know exactly what a score does and does not mean. No black boxes. Every step is stated, and every limitation is named.

01 · The core thesis

An advantage claim with no baseline is not gradeable.

"Quantum advantage" is a comparative statement. It only means something against something else. A claim that omits the benchmark, the problem size, or the classical comparator has not been weakened — it has not been made. There is nothing to grade, so Veriqa does not pretend to grade it. It says so, in writing, and caps the verdict.

A named benchmark

Advantage on what task? Sampling, factoring, optimisation, simulation — each has its own classical state of the art. Without a named benchmark, "faster" has no referent and the claim cannot be scored.

A stated problem size

Advantage at what scale? Results on toy instances rarely survive the constant-factor overheads and error-correction costs that come with useful sizes. A problem size is what separates a demonstration from a deployable result.

A classical comparator

Faster than what? The honest comparison is against the best known classical method on the same task at the same scale — not a strawman. Missing the comparator is the single most common defect we see.

The rule, stated plainly. If a core advantage claim is missing the benchmark, the problem size, or the classical comparator, it is flagged needs-baseline. Its technical-maturity contribution is capped, and a Proceed verdict is blocked in code. The report still ships, but only as a Monitor or Require further diligence call, never as a green light.
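An illustrative sketch of that rule in Python, assuming each advantage claim has already been extracted into three optional fields. The names `AdvantageClaim`, `baseline_gate` and `allowed_verdicts` are hypothetical, not Veriqa's engine code. The point it shows is that the gate removes Proceed from the set of permitted verdicts rather than nudging a score.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class Verdict(Enum):
    PROCEED = "Proceed"
    MONITOR = "Monitor"
    REQUIRE_FURTHER_DILIGENCE = "Require further diligence"


@dataclass
class AdvantageClaim:
    text: str
    benchmark: Optional[str] = None             # which task the advantage is claimed on
    problem_size: Optional[str] = None          # at what scale
    classical_comparator: Optional[str] = None  # faster than what
    flags: Set[str] = field(default_factory=set)


def baseline_gate(claim: AdvantageClaim) -> AdvantageClaim:
    """Flag the claim needs-baseline if any of the three required fields is empty."""
    required = (claim.benchmark, claim.problem_size, claim.classical_comparator)
    if not all(required):
        claim.flags.add("needs-baseline")
    return claim


def allowed_verdicts(core_claims: List[AdvantageClaim]) -> Set[Verdict]:
    """Proceed is dropped from the permitted set if any core claim is needs-baseline."""
    verdicts = set(Verdict)
    if any("needs-baseline" in c.flags for c in core_claims):
        verdicts.discard(Verdict.PROCEED)
    return verdicts
```

Because the check is a set removal, no combination of strong scores elsewhere can re-admit Proceed while a needs-baseline flag stands.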
02 · The pipeline, end to end

From a raw claim to a gated verdict.

Every input runs through the same five steps in the same order. The classical-baseline gate sits in the middle on purpose: nothing reaches a Proceed verdict without passing it, and the reviewer gate at the end is enforced by software, not by goodwill.

  1. Claim extraction

    Each distinct claim is pulled from the source and bound to a source ID — separating what is asserted from how confidently it is phrased.

  2. Classical-baseline gate

    Every advantage claim is checked for a named benchmark, a problem size and a classical comparator. Missing any → needs-baseline.

  3. Three scores

    Technical maturity, commercial urgency and hype / overclaim risk are computed from features of the graded evidence.

  4. Evidence table

    Every sub-claim is written out with its source and a confidence in 0–1, so the verdict can be traced line by line.

  5. Reviewer gate

    The report is held in draft until an internal reviewer approves it. The gate is software-enforced; a report cannot be released around it.
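A minimal sketch of the fixed order and the release gate, assuming a report is a simple record with a status field. `Report`, `run_stages`, `approve`, `release` and `ReleaseBlocked` are hypothetical names; the four automated stages are placeholders that only record that they ran.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Steps 1-4 are automated and always run in this order; step 5 is the human reviewer gate.
AUTOMATED_STAGES = (
    "claim_extraction",
    "classical_baseline_gate",
    "three_scores",
    "evidence_table",
)


class ReleaseBlocked(Exception):
    """Raised when anything tries to release a report the reviewer has not approved."""


@dataclass
class Report:
    subject: str
    stages_run: List[str] = field(default_factory=list)
    status: str = "draft"
    approved_by: Optional[str] = None


def run_stages(report: Report) -> Report:
    # Placeholder stages: a real engine would transform the report at each step.
    for stage in AUTOMATED_STAGES:
        report.stages_run.append(stage)
    return report


def approve(report: Report, reviewer: str) -> None:
    """Reviewer gate: only a draft that went through every stage can be approved."""
    if tuple(report.stages_run) != AUTOMATED_STAGES:
        raise ValueError("cannot approve: pipeline did not run to completion")
    report.approved_by = reviewer
    report.status = "approved"


def release(report: Report) -> Report:
    """The only way out of draft; there is deliberately no override parameter."""
    if report.status != "approved" or report.approved_by is None:
        raise ReleaseBlocked(f"{report.subject!r} is a draft awaiting reviewer approval")
    report.status = "released"
    return report
```

The design choice is that `release` takes no flag that could skip the check, so "released without approval" is not a reachable state through this interface.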

Deterministic core, optional enrichment. The pipeline runs on an offline rules engine: the same inputs produce the same graded evidence and the same gate decisions. An optional AI-enrichment layer can summarise and surface context, but it never lifts a needs-baseline flag, uncaps a maturity score, or unlocks a blocked verdict.
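One way to express "enrichment never lifts a flag" in code, sketched under the assumption that the rules engine's output is an immutable record. `GatedResult`, `EnrichedReport` and `enrich` are hypothetical; `summarise` stands in for whatever AI-enrichment call is used and is only allowed to return text.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class GatedResult:
    """Output of the deterministic rules engine; frozen so later steps cannot mutate it."""
    needs_baseline: bool
    maturity_capped: bool
    proceed_blocked: bool


@dataclass(frozen=True)
class EnrichedReport:
    gated: GatedResult             # passed through untouched
    summary: Optional[str] = None  # the only field enrichment may fill


def enrich(gated: GatedResult, summarise: Callable[[GatedResult], str]) -> EnrichedReport:
    """The enrichment step can read the gated result, but its output only lands in `summary`."""
    return EnrichedReport(gated=gated, summary=str(summarise(gated)))


if __name__ == "__main__":
    gated = GatedResult(needs_baseline=True, maturity_capped=True, proceed_blocked=True)
    report = enrich(gated, lambda g: "Routing speed-up claimed; no classical comparator cited.")
    try:
        report.gated.proceed_blocked = False  # an enrichment step trying to unlock Proceed
    except FrozenInstanceError:
        print("gate fields are read-only")
```

In Python a frozen dataclass is a guard against accidental mutation, not a security boundary; a determined caller could still bypass it, so a production engine would also re-check gate fields at release time.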
03 · How the three scores are computed

A transparent, rule-based heuristic — not a model.

Each score is a deterministic function of features extracted from the text: evidence density, the balance of hedging versus superlative language, and the presence of benchmarks, baselines, dates and named entities. The rules are fixed and inspectable. To be clear about what this is not: it is not a sentiment score, and it is not a statistically validated or machine-learned model. It makes no claim to predictive accuracy. It is a consistent way to read what a source actually supports.

Technical maturity

Evidence versus assertion. Raised by a named benchmark, a stated problem size, a present classical comparator, and reproducible or peer-reviewed results. Lowered by toy-instance demonstrations, missing baselines, and claims marked unverified. Capped when any core claim is needs-baseline.

Commercial urgency

Time pressure on the decision. Raised by an external forcing function — a regulatory deadline, the harvest-now-decrypt-later exposure window, or short asset shelf life. Lowered by speculative timelines with no clock attached. Urgency is about when, never about whether the science holds.

Hype / overclaim risk

How far the framing outruns the evidence. Raised by unbenchmarked "advantage" language, superlatives without sources, and missing comparators. Lowered by precise scope, stated limitations, and claims that cite their evidence with confidence levels.

The text features, concretely. The engine counts the things that distinguish evidence from narrative: how many claims carry a source (evidence density), the ratio of hedged statements ("preliminary", "on a small instance") to unqualified superlatives ("breakthrough", "unmatched"), and whether the structural markers of a real result — a benchmark name, a problem size, a classical baseline, a date, a named entity — are present at all.
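An illustrative sketch of that feature extraction, assuming claims arrive as (text, source ID or `None`) pairs from step 1. The word lists and regular expressions below are hypothetical, deliberately crude proxies; named-entity detection is omitted because it needs more than a regex.

```python
import re
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple

HEDGES = ("preliminary", "on a small instance", "early results", "suggests")
SUPERLATIVES = ("breakthrough", "unmatched", "unprecedented", "revolutionary")

# Structural markers of a real result (hypothetical patterns).
MARKERS = {
    "benchmark": re.compile(r"\bbenchmark\b", re.I),
    "problem_size": re.compile(r"\b\d[\d,]*\s+(qubits|variables|nodes|cities)\b", re.I),
    "baseline": re.compile(r"\b(versus|compared (with|to)|classical baseline)\b", re.I),
    "date": re.compile(r"\b(19|20)\d{2}\b"),
}


@dataclass
class TextFeatures:
    n_claims: int
    n_sourced: int
    hedges: int
    superlatives: int
    markers: Dict[str, bool]

    @property
    def evidence_density(self) -> float:
        """Share of claims that carry a source."""
        return self.n_sourced / self.n_claims if self.n_claims else 0.0


def count_terms(text: str, terms: Iterable[str]) -> int:
    lowered = text.lower()
    return sum(len(re.findall(r"\b" + re.escape(t) + r"\b", lowered)) for t in terms)


def extract_features(claims: List[Tuple[str, Optional[str]]]) -> TextFeatures:
    text = " ".join(t for t, _ in claims)
    return TextFeatures(
        n_claims=len(claims),
        n_sourced=sum(1 for _, src in claims if src),
        hedges=count_terms(text, HEDGES),
        superlatives=count_terms(text, SUPERLATIVES),
        markers={name: bool(rx.search(text)) for name, rx in MARKERS.items()},
    )
```

Every output is a count or a yes/no, which is what lets the rules that follow be stated as plain thresholds rather than fitted weights.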

Each feature maps to a rule, not a weight learned from data. A missing baseline does not nudge a probability; it triggers a stated cap. A superlative with no source does not shift a sentiment dial; it adds to a counted overclaim signal. The rules live in the engine and are applied identically to every input.
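A sketch of how stated rules, rather than learned weights, could turn those features into the three scores. The 0 to 10 scale, the starting values, every delta and the cap of 3 are hypothetical numbers chosen for illustration, not Veriqa's published rules. Each rule that fires records its reason, so every axis comes back with its explanation.

```python
from dataclasses import dataclass
from typing import List, Tuple

MATURITY_CAP = 3  # hypothetical cap applied when any core claim is needs-baseline


@dataclass(frozen=True)
class Evidence:
    benchmark: bool
    problem_size: bool
    comparator: bool
    peer_reviewed: bool
    toy_instance: bool
    unverified_claims: int
    needs_baseline: bool
    unsourced_superlatives: int
    stated_limitations: bool
    forcing_function: bool      # regulatory deadline, harvest-now-decrypt-later window, shelf life
    speculative_timeline: bool  # a timeline with no clock attached


def apply_rules(base: int, rules) -> Tuple[int, List[str]]:
    """Apply (fired, delta, reason) rules in order, clamp to 0-10, and keep the reasons."""
    score, reasons = base, []
    for fired, delta, reason in rules:
        if fired:
            score += delta
            reasons.append(f"{delta:+d} {reason}")
    return max(0, min(10, score)), reasons


def technical_maturity(e: Evidence) -> Tuple[int, List[str]]:
    score, reasons = apply_rules(5, [
        (e.benchmark, +1, "named benchmark"),
        (e.problem_size, +1, "stated problem size"),
        (e.comparator, +1, "classical comparator present"),
        (e.peer_reviewed, +2, "peer-reviewed or reproducible result"),
        (e.toy_instance, -2, "toy-instance demonstration"),
        (e.unverified_claims > 0, -1, "unverified claims"),
    ])
    if e.needs_baseline and score > MATURITY_CAP:
        score = MATURITY_CAP
        reasons.append(f"capped at {MATURITY_CAP}: core claim is needs-baseline")
    return score, reasons


def overclaim_risk(e: Evidence) -> Tuple[int, List[str]]:
    return apply_rules(2, [
        (e.needs_baseline, +3, "unbenchmarked advantage language"),
        (e.unsourced_superlatives > 0, min(e.unsourced_superlatives, 3), "unsourced superlatives"),
        (not e.comparator, +2, "missing comparator"),
        (e.stated_limitations, -2, "stated limitations"),
    ])


def commercial_urgency(e: Evidence) -> Tuple[int, List[str]]:
    return apply_rules(3, [
        (e.forcing_function, +4, "external forcing function"),
        (e.speculative_timeline, -2, "speculative timeline, no clock attached"),
    ])
```

Note that the cap is applied after the additive rules: a needs-baseline claim cannot score its way past it, however many other signals are positive.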

Why a heuristic and not a model. A statistical model implies a validation set, a measured error rate, and a claim to accuracy we are not in a position to make. A transparent rule does not. We would rather ship a rule you can read and dispute than a number you have to trust.

What the scores are not

  • Not a sentiment score — tone is irrelevant; evidence is not.
  • Not a learned model — no training set, no fitted weights.
  • Not a probability of being right — it grades support, not truth.
  • Not a single number — three axes, each with its reason.
  • Not hidden — every rule is stated and inspectable.

Any score or example value shown on this page is Illustrative — generated to explain the approach, not drawn from a real subject. Decision-support and general information only — not investment, financial, legal, tax, security, or medical advice, and not an offer or solicitation of any security.

04 · Evidence & confidence

Every line carries a source and a confidence.

The evidence table is the report; the verdict is a function of what is in it. Each sub-claim is written out with the source it rests on and a confidence value in 0–1. The definition matters, so we are precise about it.

Confidence is how strongly the cited source supports that specific sub-claim — nothing more. A confidence of 0.9 means the source backs the sub-claim well. It is not a probability that the underlying claim is true, not a forecast, and not a bet on an outcome.

It degrades with the source, not with optimism. An undated whitepaper, a press release, or a second-hand summary carries lower confidence than a peer-reviewed result with a method section — because the support is weaker, regardless of how the claim is phrased.

Uncitable claims do not get the benefit of the doubt. If a factual claim cannot be tied to a source, it is marked unverified and cannot raise the verdict. Absence of evidence is recorded as absence — it is never silently scored as support.

The whole table exports as structured JSON conforming to a published schema, so the source ID and confidence on every line travel with the report into your diligence pipeline, data room, or risk register.
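A sketch of one evidence row and its export, assuming a simple flat record. The field names and the `{"evidence": [...]}` envelope are hypothetical and are not the published schema; the sketch only shows the two rules stated above: confidence stays within 0 to 1, and a claim with no source is marked unverified and cannot raise the verdict.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List, Optional


@dataclass
class EvidenceRow:
    sub_claim: str
    source_id: Optional[str]  # None when the claim cannot be tied to a source
    source_note: str
    confidence: float         # how well the source supports THIS sub-claim; not P(true)
    flags: List[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must lie in [0, 1]")
        # Absence of evidence is recorded as absence, never as support.
        if self.source_id is None and "unverified" not in self.flags:
            self.flags.append("unverified")

    @property
    def can_raise_verdict(self) -> bool:
        """Only sourced rows may push the verdict up; other flags are handled by the gate."""
        return "unverified" not in self.flags


def export_table(rows: List[EvidenceRow]) -> str:
    """Serialise the table so source IDs and confidences travel with the report."""
    return json.dumps({"evidence": [asdict(r) for r in rows]}, indent=2)


if __name__ == "__main__":
    row = EvidenceRow("solver beats classical optimisation on routing", "SRC-04",
                      "vendor whitepaper, undated", 0.2, ["needs-baseline"])
    print(export_table([row]))
```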

Illustrative evidence row

  • Sub-claim: solver beats classical optimisation on routing (Illustrative)
  • Source: SRC-04, vendor whitepaper, undated
  • Confidence: 0.2 (weak: no benchmark, no comparator)
  • Flag: needs-baseline
  • Effect: maturity capped · Proceed blocked
05 · Standards & frameworks we map to

Public standards, used honestly.

Where an external standard exists, we map to it rather than invent our own vocabulary. Two anchor the work: the finalised post-quantum cryptography standards, and a disciplined use of technology-readiness levels. The third anchor is our own non-negotiable — the benchmark, problem-size and classical-baseline discipline that runs through everything above.

Post-quantum cryptography → NIST FIPS 203 / 204 / 205. PQC migration is mapped against the finalised standards: FIPS 203 (ML-KEM, key establishment), FIPS 204 (ML-DSA, signatures) and FIPS 205 (SLH-DSA, signatures). A cryptographic inventory is graded against these named standards, then prioritised by exposure and data shelf life — not by vendor enthusiasm.
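An illustrative sketch of grading an inventory against those standards, assuming each entry records the algorithm, what it is used for, how exposed the system is, and how long the protected data must remain safe. Only the use-to-standard mapping follows the standards named above (quantum-vulnerable key establishment such as RSA or ECDH moves to ML-KEM; RSA, ECDSA or EdDSA signatures move to ML-DSA or SLH-DSA). The exposure tiers and the sort order are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Which finalised NIST standard each quantum-vulnerable use migrates to.
TARGET_STANDARD = {
    "key-establishment": "FIPS 203 (ML-KEM)",
    "signature": "FIPS 204 (ML-DSA) or FIPS 205 (SLH-DSA)",
}

# Hypothetical exposure tiers: higher means more reachable by an adversary.
EXPOSURE_RANK = {"internet-facing": 2, "partner": 1, "internal": 0}


@dataclass
class InventoryItem:
    system: str
    algorithm: str          # e.g. "ECDH P-256", "RSA-2048"
    use: str                # "key-establishment" or "signature"
    exposure: str           # key into EXPOSURE_RANK
    shelf_life_years: int   # how long the protected data must stay secure


def migration_plan(items: List[InventoryItem]) -> List[Tuple[str, str, str]]:
    """Order by exposure, then by data shelf life, most urgent first."""
    ranked = sorted(
        items,
        key=lambda i: (EXPOSURE_RANK[i.exposure], i.shelf_life_years),
        reverse=True,
    )
    return [(i.system, i.algorithm, TARGET_STANDARD[i.use]) for i in ranked]
```

The ordering uses only exposure and shelf life as inputs; nothing about the vendor or product enters the sort key.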

Quantum readiness → TRL, used honestly. Technology-readiness levels are a useful shorthand for how far a result sits from deployment, and an easy thing to inflate. We pin a TRL to the evidence that justifies it, and we say when a claimed level is unsupported. A lab demonstration is a lab demonstration; we do not let it be relabelled as a fielded system.
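A sketch of pinning a claimed TRL to its evidence, assuming the strongest piece of evidence can be classified into one of a few kinds. The ceilings loosely follow the common nine-level TRL scale (lab validation around TRL 4, prototype in a relevant environment around TRL 6, proven in operation at TRL 9), but the exact mapping and the `pinned_trl` name are hypothetical.

```python
from typing import Tuple

# Hypothetical ceiling: the highest TRL each kind of evidence can justify.
EVIDENCE_CEILING = {
    "concept": 2,
    "lab-demonstration": 4,
    "relevant-environment-prototype": 6,
    "operational-prototype": 7,
    "fielded-system": 9,
}


def pinned_trl(claimed: int, evidence: str) -> Tuple[int, str]:
    """Return the TRL the evidence supports, saying so when the claim exceeds it."""
    if not 1 <= claimed <= 9:
        raise ValueError("TRL runs from 1 to 9")
    ceiling = EVIDENCE_CEILING[evidence]
    if claimed > ceiling:
        return ceiling, f"claimed TRL {claimed} unsupported; evidence supports at most TRL {ceiling}"
    return claimed, "claimed level supported by evidence"
```

A lab demonstration claimed as TRL 8 comes back as TRL 4 with the reason attached, which is the "we say when a claimed level is unsupported" behaviour in code form.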

The baseline discipline → our own standard. No claim of advantage stands without a named benchmark, a stated problem size, and a classical comparator. This is the rule that the rest of the method enforces — written down, applied uniformly, and checked in code.

What we map to

  • FIPS 203 — ML-KEM, key establishment
  • FIPS 204 — ML-DSA, digital signatures
  • FIPS 205 — SLH-DSA, hash-based signatures
  • TRL — readiness pinned to evidence
  • Baseline rule — benchmark · size · comparator
06 · Scope lock

What we deliberately do not do.

A neutral referee loses its value the moment it has its own bet to defend. So the boundary is explicit, and it is a feature, not a limitation. Veriqa is the decision layer — it reads and grades claims. It does not produce quantum results of its own.

  • No circuit design or execution. Veriqa does not design, compile, or run quantum circuits. It grades claims about them.
  • No quantum hardware. We operate no quantum processors and integrate with none. There is no device whose results we are incentivised to flatter.
  • No claim of quantum advantage. Veriqa never asserts that it itself achieves quantum advantage. It evaluates the advantage claims that others make.
  • No stake in the modality. We hold no simulator, chemistry, or hardware position, so no verdict quietly steers you toward a product of ours.
  • No silent expansion. Autonomous quantum execution is out of scope. If that ever changes, it becomes a separate, clearly scoped product, never a quiet broadening of what a Veriqa report claims.
  • No outcome promised. A verdict grades the strength of the evidence and names what would change it. It is decision-support, not a directive.
On the reviewer gate, precisely. Every customer-facing report is held in draft until an internal reviewer approves it, and the software enforces that gate. Any reviewer we engage must meet a written credential standard: a graduate degree in quantum information, physics, or applied cryptography, or equivalent hands-on grading experience. We do not represent that an independent outside expert reviews every report, and approval is not a guarantee of any outcome. For high-stakes decisions we recommend independent expert review on your side as well.

Public verdict vocabulary is Proceed / Monitor / Require further diligence. Decision-support and general information only — not investment, financial, legal, tax, security, or medical advice, and not an offer or solicitation of any security. The full disclaimer appears in the footer.

Read the rule, then test it on your own claim.

Bring a paper, a vendor roadmap, or a cryptographic inventory. Run it through the baseline gate, the three scores and the evidence table — and see exactly where the grade comes from.