GPU Decoding

Note

The NVIDIA GPU Decoder capability is currently available only to partners and collaborators, and only on the Helios-1 target. Please email QCSupport@quantinuum.com for more information.

Quantinuum Helios includes an integration with an NVIDIA Grace Hopper box via NVQLink [1]. As the industry progresses towards fault tolerance, a major challenge is control and orchestration: quantum processors often need extremely fast, real-time classical compute support (for control signals, error correction, and decoding). NVQLink addresses this gap by offering a high-throughput, low-latency interconnect between quantum hardware and accelerated classical compute. This interconnect is required to achieve massively parallel decoding for Quantum Error Correction (QEC).

The NVIDIA Grace Hopper box currently provides one black-box decoding algorithm, Belief Propagation - Order Statistics Decoding (BP-OSD). It works in two stages:

  1. The BP part is an iterative procedure that propagates local qubit information. A standalone BP decoder is sufficient for simple decoding.

  2. The OSD part is required for more complicated syndromes. It performs matrix factorizations to rank the most likely errors, which requires substantial classical compute resources, hence the use of the Grace Hopper box.

In the overall workflow, the user writes a Guppy program with an open-source Guppy extension for GPU decoding. This extension provides bindings that call the BP-OSD method externally to submit syndrome measurement results asynchronously and retrieve corrections. In addition to the Guppy program, a corresponding set of decoder matrices must be submitted using a DecoderConfig instance.

Client-Side Tools

Guppy GPU Library

The guppy-gpu extension provides a Guppy implementation of the NVIDIA CUDA-Q QEC Realtime Decoding API. It includes:

  • A Decoder class that wraps GPU-accelerated decoding functionality: enqueueing syndromes, retrieving corrections, and resetting decoder state for quantum error correction codes. The class uses the gpu and gpu_module decorators from the guppy-gpu package to enable GPU execution of QEC algorithms. All syndrome and correction data is represented as bit-packed unsigned 64-bit (uint64) integers.

  • Utility functions for packing and unpacking boolean arrays to/from integers for efficient data transfer to GPU decoders.
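The idea behind bit-packing can be illustrated in plain Python. The sketch below is not the guppy-gpu implementation: pack_bits and unpack_bits are hypothetical helper names, and the bit order (element i stored in bit i, least significant bit first) is an assumption made for illustration only.

```python
from typing import List


def pack_bits(bits: List[bool]) -> int:
    """Pack booleans into one integer; element i becomes bit i (assumed LSB-first)."""
    if len(bits) > 64:
        raise ValueError("at most 64 bits fit in a uint64")
    value = 0
    for i, bit in enumerate(bits):
        if bit:
            value |= 1 << i
    return value


def unpack_bits(value: int, size: int) -> List[bool]:
    """Recover the first `size` booleans from a packed integer."""
    return [bool((value >> i) & 1) for i in range(size)]


# Four syndrome bits travel as a single integer and round-trip unchanged.
syndrome = [True, False, True, True]
packed = pack_bits(syndrome)
assert packed == 0b1101
assert unpack_bits(packed, len(syndrome)) == syndrome
```

Packing keeps each transfer to the decoder at a fixed, small size regardless of how the syndrome bits were produced.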

The guppy-gpu repository is available at https://github.com/CQCL/guppy-gpu. The package can be installed with pip install git+https://github.com/CQCL/guppy-gpu.git.

The user imports the Decoder class and the pack_int and unpack_int functions from the guppy_gpu.cudaq_qec module.

The Decoder class provides the following instance methods:

  • Decoder.enqueue_syndromes(self: "Decoder", decoder_id: int, syndrome_size: int, syndrome: int, tag: int): A void function to enqueue a syndrome for decoding.

  • Decoder.reset_decoder(self: "Decoder", decoder_id: int): A void function to reset the decoder. Clears any queued syndromes and resets corrections to 0.

  • Decoder.get_corrections(self: "Decoder", decoder_id: int, return_size: int, reset: int): An integer-valued function to get the corrections from the decoder.

from guppy_gpu.cudaq_qec import Decoder, pack_int, unpack_int
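The call order implied by the three methods (reset, enqueue, retrieve) can be sketched with a pure-Python stand-in. MockDecoder below is hypothetical: it only mirrors the method names and argument lists given above, and it decodes a 3-qubit repetition code with a lookup table instead of BP-OSD. The way it treats the tag and reset arguments, and the XOR accumulation of corrections, are assumptions, not documented behaviour of guppy-gpu.

```python
from collections import defaultdict


class MockDecoder:
    """Hypothetical stand-in, not the guppy-gpu Decoder."""

    # Syndrome bit 0 = q0 xor q1, bit 1 = q1 xor q2; values are packed corrections.
    _TABLE = {0b00: 0b000, 0b01: 0b001, 0b11: 0b010, 0b10: 0b100}

    def __init__(self) -> None:
        self._queues = defaultdict(list)
        self._corrections = defaultdict(int)

    def enqueue_syndromes(self, decoder_id: int, syndrome_size: int,
                          syndrome: int, tag: int) -> None:
        # `tag` is accepted but ignored in this mock (assumption).
        mask = (1 << syndrome_size) - 1
        self._queues[decoder_id].append(syndrome & mask)

    def get_corrections(self, decoder_id: int, return_size: int, reset: int) -> int:
        # Decode every queued syndrome and accumulate corrections by XOR.
        for syndrome in self._queues[decoder_id]:
            self._corrections[decoder_id] ^= self._TABLE[syndrome]
        self._queues[decoder_id].clear()
        result = self._corrections[decoder_id] & ((1 << return_size) - 1)
        if reset:  # assumed meaning: clear the stored corrections after reading
            self._corrections[decoder_id] = 0
        return result

    def reset_decoder(self, decoder_id: int) -> None:
        # Clears queued syndromes and resets corrections to 0.
        self._queues[decoder_id].clear()
        self._corrections[decoder_id] = 0


decoder = MockDecoder()
decoder.reset_decoder(0)
decoder.enqueue_syndromes(0, 2, 0b11, 0)   # both checks fired
correction = decoder.get_corrections(0, 3, 1)
assert correction == 0b010                 # flip the middle qubit
```

In a real program these calls would be made from Guppy through the guppy-gpu bindings; the mock only shows how packed syndromes go in and packed corrections come out.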

Nexus Decoder Config

The DecoderConfig used by Nexus requires a YAML-formatted collection of decoder matrices [2], consisting of the following:

  1. Detector Matrix

  2. Error Vector

  3. Syndrome Measurement Matrix

  4. Parity Check Matrix (PCM)

  5. Observable Matrix

The decoder matrices must be defined in sparse format.
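Judging from the example configuration in this section, the sparse format appears to list, for each matrix row, the column indices of its nonzero entries followed by a -1 terminator. The following sketch converts between a dense binary matrix and that layout under this assumption; dense_to_sparse and sparse_to_dense are hypothetical helper names, not part of qnexus or guppy-gpu.

```python
from typing import List


def dense_to_sparse(matrix: List[List[int]]) -> List[int]:
    """Flatten a binary matrix into column indices, ending each row with -1."""
    sparse: List[int] = []
    for row in matrix:
        sparse.extend(col for col, entry in enumerate(row) if entry)
        sparse.append(-1)
    return sparse


def sparse_to_dense(sparse: List[int], num_cols: int) -> List[List[int]]:
    """Rebuild the binary matrix from the -1-terminated index list."""
    rows: List[List[int]] = []
    current = [0] * num_cols
    for index in sparse:
        if index == -1:
            rows.append(current)
            current = [0] * num_cols
        else:
            current[index] = 1
    return rows


# Parity checks of a 3-bit repetition code: q0 xor q1 and q1 xor q2.
H = [[1, 1, 0],
     [0, 1, 1]]
H_sparse = dense_to_sparse(H)
assert H_sparse == [0, 1, -1, 1, 2, -1]
assert sparse_to_dense(H_sparse, num_cols=3) == H
```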

import qnexus as qnx

project = qnx.projects.get_or_create(name="decoder-project")
qnx.context.set_active_project(project)

The following parameters must be specified in YAML format.

  • id: Identifier for the decoder instance as an integer.

  • type: String defining the decoder function to use. The value, nv-qldpc-decoder, corresponds to the BP-OSD implementation.

  • block_size: Number of data qubits (integer).

  • syndrome_size: Number of syndrome measurement results (integer).

  • num_syndromes_per_round: Number of syndromes per round.

  • H_sparse: Parity Check Matrix

  • O_sparse: Observable Matrix

  • D_sparse: Detector Matrix

  • decoder_custom_args:

    • use_sparsity: true

    • error_rate_vec: Error vector

    • max_iterations: Maximum number of BP-OSD iterations.

    • use_osd: Toggles the OSD method (Boolean). true enables OSD.

    • osd_method: OSD method (integer).

    • osd_order: OSD order (integer).
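Before uploading a configuration, it can help to check that the parameters agree with each other. The sketch below runs a few consistency checks on a single decoder entry held as a Python dict. The checks themselves are assumptions inferred from the parameter descriptions (H_sparse has syndrome_size rows, its column indices lie below block_size, and error_rate_vec has block_size entries); they are not documented requirements of Nexus, and split_rows and check_decoder_entry are hypothetical helpers.

```python
from typing import Dict, List


def split_rows(sparse: List[int]) -> List[List[int]]:
    """Split a -1-terminated index list into one list of column indices per row."""
    rows: List[List[int]] = []
    current: List[int] = []
    for index in sparse:
        if index == -1:
            rows.append(current)
            current = []
        else:
            current.append(index)
    if current:
        raise ValueError("sparse list must end with -1")
    return rows


def check_decoder_entry(entry: Dict) -> List[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems: List[str] = []
    block_size = entry["block_size"]
    h_rows = split_rows(entry["H_sparse"])
    if len(h_rows) != entry["syndrome_size"]:
        problems.append("H_sparse row count differs from syndrome_size")
    if any(not 0 <= col < block_size for row in h_rows for col in row):
        problems.append("H_sparse column index outside [0, block_size)")
    rates = entry["decoder_custom_args"]["error_rate_vec"]
    if len(rates) != block_size:
        problems.append("error_rate_vec length differs from block_size")
    if any(not 0.0 <= p <= 1.0 for p in rates):
        problems.append("error_rate_vec entry is not a probability")
    return problems


# Toy entry for a 3-bit repetition code (illustrative values only).
toy_entry = {
    "id": 0,
    "type": "nv-qldpc-decoder",
    "block_size": 3,
    "syndrome_size": 2,
    "H_sparse": [0, 1, -1, 1, 2, -1],
    "decoder_custom_args": {"error_rate_vec": [0.01, 0.01, 0.01]},
}
assert check_decoder_entry(toy_entry) == []
```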

import qnexus as qnx

decoder_matrices = """decoders:
- id: 0
  type: nv-qldpc-decoder
  block_size: 30
  syndrome_size: 11
  num_syndromes_per_round: 1
  H_sparse: [0, 1, 2, 3, 4, -1, 0, 5, 6, 7, 8, -1, 1, 5, 9, 10, 11, -1, 9, 12, 13, 14, 15, -1, 6, 16, 17, 18, 19, -1, 2, 20, 21, 22, 23, -1, 3, 7, 16, 20, 24, -1, 8, 10, 13, 17, 25, -1, 4, 11, 14,
        21, 26, -1, 15, 18, 25, 27, 28, -1, 19, 22, 24, 27, 29, -1]
  O_sparse: [6, 15, 22, -1, 9, 18, 23, -1, 0, 13, 19, -1, 11, 20, 28, -1, 3, 10, 26, -1, 2, 4, 8, 9, 15, 22, 24, -1, 4, 8, 24, -1, 2, 6, 7, 11, 18, 23, 25, -1]
  D_sparse: [5, 21, 15, 20, 25, -1, 4, 14, 1, 11, 5, -1, 26, 12, 18, 25, 4, -1, 28, 7, 19, 12, 24, -1, 1, 8, 2, 3, 6, -1, 15, 29, 23, 9, 16, -1, 6, 11, 21, 16, 27, -1, 19, 26, 14, 8, 22, -1, 29, 0,
        24, 18, 20, -1, 22, 2, 17, 10, 7, -1, 27, 9, 13, 17, 3, -1]
  decoder_custom_args:
    use_sparsity: true
    error_rate_vec: [0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005,
        0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005, 0.0005]
    max_iterations: 200
    use_osd: true
    osd_method: 3
    osd_order: 50"""

decoder_ref = qnx.gpu_decoder_configs.upload(
    gpu_decoder_config=decoder_matrices,
    name="test",
    description="decoder matrices for test program",
)
qnx.gpu_decoder_configs.get_all().df()
   name  description                        created                           modified                          project          id
0  test  decoder matrices for test program  2025-11-05 13:58:09.240168+00:00  2025-11-05 13:58:09.253837+00:00  decoder-project  d0d3e916-0355-47df-99b7-379219fe9cd0

Submission

The reference to the GPU decoder config can be passed when the job is executed. A code sample is provided for submission. Currently, only Helios-1 supports GPU decoding.

References