Quantum circuit simulators on HPC systems play a key role in algorithm implementation, algorithm design, quantum education, and related research areas. In this work, we use cuQuantum to accelerate quantum computing workflows on an accelerated system that supports 64-bit floating point (FP64) and TensorFloat-32 (TF32) arithmetic. Commonly used quantum algorithms and circuits, namely Shor's algorithm, the Quantum Fourier Transform (QFT), and the Sycamore circuit, are implemented on the HPC-AI system and further accelerated using cuQuantum. For a given number of qubits, the performance of the GPU-enabled circuits scales linearly on 2, 4, and 8 A100 GPUs. The GPU-enabled performance obtained on the PARAM Siddhi AI system is compared with CPU-only runs and with runs on previous-generation Volta architecture (V100) GPUs. For the Shor, QFT, and Sycamore circuits, the speedup of an eight-A100 GPU run relative to a CPU-only run is ~143x, ~115x, and 104x for 30, 32, and 32 qubits, respectively. Similarly, the speedup of a four-V100 GPU run relative to a CPU-only run is ~43x, ~29x, and 24x for 30, 32, and 32 qubits, respectively. Owing to their higher compute capability and larger memory, four A100 GPUs outperform four V100 GPUs by 1.5x to 2.2x across all three algorithms.
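To make the simulated workload concrete, the following is a minimal CPU-only sketch, assuming a dense state-vector representation with little-endian qubit ordering (qubit 0 is the least significant bit of the amplitude index). It applies the textbook QFT circuit (Hadamards, controlled phase rotations, and a final qubit reversal) in plain NumPy and reports the memory a dense state vector needs. It does not use the cuQuantum API; the helper names `apply_1q`, `apply_cphase`, `apply_swap`, `qft`, and `state_vector_gib` are hypothetical and exist only for illustration.

```python
import numpy as np

# Illustrative CPU reference only (hypothetical helpers, not the cuQuantum API).
# Assumes little-endian ordering: qubit q is bit q of the amplitude index.

def apply_1q(psi, gate, q, n):
    """Apply a 2x2 gate to qubit q by viewing psi as (high bits, qubit q, low bits)."""
    psi3 = psi.reshape(2 ** (n - q - 1), 2, 2 ** q)
    return np.einsum("ab,ibj->iaj", gate, psi3).reshape(-1)

def apply_cphase(psi, theta, c, t):
    """Multiply amplitudes whose bits c and t are both 1 by exp(i*theta)."""
    idx = np.arange(psi.size)
    both = ((idx >> c) & 1) & ((idx >> t) & 1)
    return np.where(both == 1, psi * np.exp(1j * theta), psi)

def apply_swap(psi, a, b, n):
    """Swap qubits a and b; in the reshaped tensor, axis n-1-q holds qubit q."""
    tensor = psi.reshape([2] * n)
    return np.swapaxes(tensor, n - 1 - a, n - 1 - b).reshape(-1)

def qft(psi, n):
    """Textbook QFT circuit: H plus controlled phases per qubit, then reverse qubits."""
    h = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
    for j in reversed(range(n)):
        psi = apply_1q(psi, h, j, n)
        for k in reversed(range(j)):
            psi = apply_cphase(psi, np.pi / 2 ** (j - k), k, j)
    for i in range(n // 2):
        psi = apply_swap(psi, i, n - 1 - i, n)
    return psi

def state_vector_gib(n, bytes_per_amplitude=16):
    """Dense state-vector size in GiB (16 bytes = complex128, 8 bytes = complex64)."""
    return 2 ** n * bytes_per_amplitude / 2 ** 30

if __name__ == "__main__":
    n = 4
    psi = np.zeros(2 ** n, dtype=complex)
    psi[3] = 1.0  # basis state |3>
    # The QFT of a state vector equals NumPy's orthonormal inverse DFT.
    print(np.allclose(qft(psi, n), np.fft.ifft(psi, norm="ortho")))  # True
    print(state_vector_gib(32))  # 64.0
```

Because the dense state vector doubles with every added qubit, a 32-qubit complex128 state needs 64 GiB (16 GiB at 30 qubits), which is one plausible reason simulations at these sizes are split across several GPUs; the precision and representation actually used in this work may differ from these assumptions.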
Multi-GPU Enabled Quantum Computing on HPC-AI System