Quantum ML

Overview

This guide provides a complete, actionable roadmap to replicate the results from Kung et al. (arXiv:2503.12099), which presents a machine learning approach for automatic characterization of fluxonium superconducting qubit parameters (E_J, E_C, E_L) using a Swin Transformer V2 model trained via deep transfer learning. The paper reports ~95.6% average accuracy across all three energy parameters. The authors mention that source code will be available on GitHub after publication, but since it may not yet be released, this guide reconstructs every detail needed for a from-scratch replication.


Phase 1: Environment Setup

Hardware Requirements

  • A GPU with at least 8 GB VRAM (NVIDIA RTX 3060 or better recommended). Swin Transformer V2 Tiny has ~28M parameters and is relatively lightweight.[^1]
  • Sufficient CPU/RAM for generating 15,000+ spectrum simulations via QuTiP.

Software Dependencies

Install the following Python packages:

  • PyTorch (≥1.12) with CUDA support
  • torchvision – provides pre-built Swin Transformer V2 models (swin_v2_t, swin_v2_b)[^2][^1]
  • timm (PyTorch Image Models) – alternative source for swinv2_tiny_window8_256 and other variants[^3]
  • QuTiP – Quantum Toolbox in Python for Hamiltonian diagonalization and spectrum computation[^4]
  • scqubits – optional but helpful for fluxonium simulation and validation[^5][^6]
  • prodigyopt – the Prodigy optimizer (pip install prodigyopt)[^7][^8]
  • scipy – for find_peaks_cwt peak detection[^9]
  • numpy, matplotlib, PIL/Pillow

pip install torch torchvision timm qutip scqubits prodigyopt scipy numpy matplotlib pillow

Phase 2: Understanding the Fluxonium Hamiltonian

The Model Hamiltonian

The fluxonium qubit Hamiltonian is:

H = 4 * E_C * n^2 - E_J * cos(phi + phi_ext) + 0.5 * E_L * phi^2

where:

  • E_C = charging energy (capacitance)
  • E_J = Josephson energy
  • E_L = inductive energy
  • phi = phase operator across inductance
  • n = displacement charge operator
  • phi_ext = external magnetic flux (varied over one flux quantum period)
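Phase 3 below notes that the Hamiltonian matrix can be built directly with a large basis cutoff. A minimal NumPy/SciPy sketch of that construction in the harmonic-oscillator basis is given here (QuTiP's operator algebra produces the same matrices; the function name and the E_J = 0 sanity check are illustrative, not from the paper). Energies are in GHz and phi_ext is in radians:

```python
import numpy as np
from scipy.linalg import expm, eigh

def fluxonium_hamiltonian(EC, EJ, EL, phi_ext, cutoff=110):
    """Fluxonium Hamiltonian matrix in the harmonic-oscillator basis.

    Uses the oscillator defined by the E_C and E_L terms; phi_ext is the
    external flux in radians (2*pi corresponds to one flux quantum).
    """
    # Annihilation operator and zero-point fluctuations of phase and charge
    a = np.diag(np.sqrt(np.arange(1, cutoff)), k=1)
    phi_zpf = (8.0 * EC / EL) ** 0.25 / np.sqrt(2.0)
    n_zpf = (EL / (8.0 * EC)) ** 0.25 / np.sqrt(2.0)
    phi = phi_zpf * (a + a.T)
    n = 1j * n_zpf * (a.T - a)

    # cos(phi + phi_ext) from the unitary exp(i*phi), shifted by the scalar flux
    d = expm(1j * phi) * np.exp(1j * phi_ext)
    cos_op = 0.5 * (d + d.conj().T)

    return 4.0 * EC * (n @ n) - EJ * cos_op + 0.5 * EL * (phi @ phi)

# Sanity check: with E_J = 0 the spectrum is harmonic with spacing sqrt(8*EC*EL)
H = fluxonium_hamiltonian(EC=1.0, EJ=0.0, EL=1.0, phi_ext=0.0)
evals = eigh(H, eigvals_only=True)
```

The E_J = 0 limit gives a quick correctness check, since the level spacing sqrt(8 * E_C * E_L) is known analytically.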

Parameter Ranges

The training data spans these experimentally relevant ranges:

| Parameter | Range (GHz) | Span |
|---|---|---|
| E_C | 0.5 – 3.0 | 2.5 GHz |
| E_L | 0.1 – 2.0 | 1.9 GHz |
| E_J | 2.0 – 10.0 | 8.0 GHz |

Transitions Considered

The energy transitions used are: 0→1, 0→2, 0→3, 0→4, 0→5, 1→2, and 1→3, all within the frequency window of 4.0–8.0 GHz.


Phase 3: Generating Training Data

This is the most computationally intensive phase. There are two distinct datasets to generate.

Dataset 1: Pure Spectrum Dataset (N = 15,392)

This dataset contains only the bare transition energies (no coupling/readout effects), making it fast to compute.

For each parameter combination (E_C, E_L, E_J):

  1. Sample parameters randomly or on a grid within the ranges above. The paper uses 15,392 unique combinations.
  2. Sweep phi_ext with 256 points per flux period (0 to 2π).
  3. Diagonalize the Hamiltonian at each flux point using QuTiP. Use scqubits.Fluxonium or build the Hamiltonian matrix directly in QuTiP with a sufficiently large cutoff (typically 110 states).[^5]
  4. Compute transition energies between all relevant level pairs (0-1, 0-2, ..., 1-3).
  5. Filter transitions to retain only those within 4.0–8.0 GHz.
  6. Render as an image: Plot each valid transition point as a black dot on a 2D image (x-axis = phi_ext, y-axis = frequency in GHz). The image serves as input to the Swin Transformer.

Example code sketch for a single spectrum:

import scqubits as scq
import numpy as np

def generate_pure_spectrum(EC, EL, EJ, n_flux=256, cutoff=110):
    """Return (flux, freq) points for transitions inside the 4-8 GHz window."""
    fluxonium = scq.Fluxonium(EJ=EJ, EC=EC, EL=EL, flux=0.0, cutoff=cutoff)
    flux_vals = np.linspace(0.0, 1.0, n_flux)  # in units of Phi_0
    
    transitions = [(0,1), (0,2), (0,3), (0,4), (0,5), (1,2), (1,3)]
    spectrum_points = []
    
    for flux in flux_vals:
        fluxonium.flux = flux
        evals = fluxonium.eigenvals(evals_count=6)
        for (i, j) in transitions:
            if j < len(evals):
                freq = evals[j] - evals[i]
                if 4.0 <= freq <= 8.0:
                    spectrum_points.append((flux, freq))
    
    return spectrum_points

Image generation: Convert each spectrum into a fixed-resolution image (e.g., 256×256 pixels). The Swin Transformer V2 Tiny expects 256×256 input. Plot flux on the x-axis and frequency on the y-axis, with black dots on a white background. Save as PNG or convert directly to a tensor.[^1]
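A minimal rasterizer along these lines is sketched below (the dot size and the exact pixel mapping are implementation choices, not specified by the paper):

```python
import numpy as np

def spectrum_to_image(points, size=256, f_min=4.0, f_max=8.0):
    """Render (flux, freq) points as black dots on a white size x size canvas.

    flux is assumed in [0, 1] (units of Phi_0) and freq in GHz. Row 0 is the
    top of the image, so high frequencies map to small row indices.
    """
    img = np.full((size, size), 255, dtype=np.uint8)
    for flux, freq in points:
        if not (f_min <= freq <= f_max and 0.0 <= flux <= 1.0):
            continue  # outside the plotted window
        x = int(round(flux * (size - 1)))
        y = int(round((f_max - freq) / (f_max - f_min) * (size - 1)))
        img[max(y - 1, 0):y + 2, max(x - 1, 0):x + 2] = 0  # ~3 px dot

    return img

img = spectrum_to_image([(0.5, 6.0)])
```

Dots a few pixels wide keep thin transition branches visible after the Swin patch embedding downsamples the image.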

Dataset 2: Dispersive Readout Dataset (N = 469)

This dataset simulates a more realistic measurement scenario including dispersive readout effects:

  1. Readout resonator at 6.00 GHz with linewidth 7 MHz and coupling strength g = 100 MHz.
  2. Compute the dispersive shift for each transition using second-order perturbation theory.
  3. Calculate voltage change in readout response caused by dispersive shift for a saturation drive at every transition and flux value.
  4. Threshold: Exclude data points where readout voltage change < 10% of maximum magnitude at readout resonance.
  5. Render as image similarly to the pure spectrum, but now transition points carry varying intensities based on signal magnitude.

This computation is >100× slower per spectrum than the pure dataset, which is why only 469 samples are used. The dispersive readout dataset is critical for the transfer learning step.
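For step 2, a commonly used second-order perturbation-theory expression for the resonator pull is the following (this is a standard circuit-QED convention, including counter-rotating terms; the paper's exact definition may differ, and the coupling matrix element g_ij written here via the charge operator is an assumption). With the qubit in level i:

chi_i = Σ_{j≠i} |g_ij|^2 * 2*omega_ji / (omega_ji^2 - omega_r^2),    g_ij = g * <i| n |j>

where omega_ji is the j↔i transition frequency and omega_r the readout frequency (6.00 GHz above). The dispersive shift of the i→j transition is then chi_ij = chi_j - chi_i.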


Phase 4: Model Architecture β€” Swin Transformer V2

Model Selection

The paper uses Swin Transformer V2, chosen for its lightweight architecture compared to ResNet and DenseNet alternatives. The exact variant isn't specified, but the Swin V2 Tiny model is the most practical choice:[^10][^11]

| Property | Swin V2 Tiny |
|---|---|
| Parameters | ~28.3M[^1] |
| Input resolution | 256 × 256 |
| GFLOPs | 5.94[^1] |
| Embed dim | 96 |
| Depths | [2, 2, 6, 2] |
| Num heads | [3, 6, 12, 24] |
| Window size | 8 |

Loading the Model

import torchvision.models as models
import torch.nn as nn

# Load pretrained Swin V2 Tiny (ImageNet weights)
model = models.swin_v2_t(weights=models.Swin_V2_T_Weights.IMAGENET1K_V1)

# Modify the classification head for regression (3 outputs: EC, EL, EJ)
model.head = nn.Linear(model.head.in_features, 3)

Alternatively, using timm:

import timm

model = timm.create_model('swinv2_tiny_window8_256', pretrained=True, num_classes=3)

Input Preprocessing

The spectrum images should be converted to 3-channel (RGB) tensors of size 256×256. Apply standard ImageNet normalization (mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]) since the model is pretrained on ImageNet.[^2][^1]
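In NumPy terms, the preprocessing amounts to the following sketch (an equivalent torchvision transforms pipeline works too; the function name is illustrative):

```python
import numpy as np

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_model_input(img_u8):
    """Convert a (256, 256) uint8 grayscale spectrum image to a normalized
    (3, 256, 256) float32 array matching ImageNet preprocessing."""
    x = img_u8.astype(np.float32) / 255.0            # scale to [0, 1]
    x = np.repeat(x[None, :, :], 3, axis=0)          # replicate to 3 channels
    return (x - IMAGENET_MEAN[:, None, None]) / IMAGENET_STD[:, None, None]
```

Replicating the grayscale channel three times keeps the pretrained RGB patch-embedding weights usable without modification.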


Phase 5: Two-Stage Transfer Learning Training

This is the core methodological contribution. The training proceeds in two stages.

Stage 1: Pre-train on Pure Spectrum Dataset

  • Dataset: 15,392 pure spectrum images

  • Labels: Corresponding [E_C, E_L, E_J] vectors (continuous values)

  • Loss function: Mean Squared Error (MSE):

    Loss = (1/N) * Σ_i (F_NN(S_E^i) - E^i)^2

  • Optimizer: Prodigy with default lr=1.0. Prodigy is parameter-free and adaptively estimates the learning rate.[^8][^7]

from prodigyopt import Prodigy

optimizer = Prodigy(model.parameters(), lr=1.0, weight_decay=0.01)

  • Training details: Train until convergence. Use a validation split (~10–15%) from the pure dataset to monitor overfitting. The paper does not specify exact epoch counts, so train until validation loss plateaus (likely 50–200 epochs depending on batch size).
  • Batch size: Not explicitly stated; start with 32 or 64.
  • Scheduler: Cosine annealing is recommended with Prodigy.[^7]

scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=total_iterations)

Stage 2: Fine-tune on Dispersive Readout Dataset

  • Dataset: 469 dispersive readout spectrum images
  • Initialization: Load all weights from Stage 1
  • Loss function: Same MSE loss
  • Optimizer: Prodigy (reinitialize for the new stage)
  • Training: Fine-tune the entire model on the smaller, more realistic dataset. This transfer learning step is critical: the pure spectrum pre-training provides a strong initialization, and the dispersive dataset aligns the model with experimental conditions.
  • Caution: With only 469 samples, overfitting is a risk. Use aggressive data augmentation (random horizontal flips, small rotations, slight noise injection) and early stopping.

Phase 6: Evaluation and Validation

Test Dataset

Generate 512 test spectra with non-repetitive parameter combinations distinct from training data, within the same parameter ranges.

Accuracy Metric

The paper defines accuracy per parameter as:

Acc(E_ν) = (1/N_test) * Σ_i (1 - |E_ν^i - E_ν^{true,i}| / R(E_ν^{test}))

where R(E_ν^{test}) is the training range (2.5 GHz for E_C, 1.9 GHz for E_L, 8.0 GHz for E_J). This differs from standard classification accuracy: it measures how close predictions are relative to the parameter range.
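The metric is straightforward to implement (the spans follow the training ranges above; the function name is illustrative):

```python
import numpy as np

SPANS = {"EC": 2.5, "EL": 1.9, "EJ": 8.0}  # training ranges R(E_nu) in GHz

def range_accuracy(pred, true, span):
    """Paper-style accuracy: 1 minus the mean absolute error over the range."""
    pred, true = np.asarray(pred, float), np.asarray(true, float)
    return float(np.mean(1.0 - np.abs(pred - true) / span))

# e.g. a constant 0.125 GHz error on E_C gives 1 - 0.125/2.5 = 0.95
```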

Target Accuracies

| Parameter | Target Accuracy | Implied Average Deviation |
|---|---|---|
| E_C | 94.5% | 0.125 GHz |
| E_L | 97.1% | 0.095 GHz |
| E_J | 95.3% | 0.4 GHz |
| Overall | 95.6% | – |

These are the benchmarks from the paper.

Error and Cost Metrics

The combined error function:

Error = 1 - (1/3) * Σ_ν Acc(E_ν),  ν = C, L, J

The cost function measures spectral fit quality:

Cost = (1/N) * Σ_i (f(phi_i) - f_i)^2

where f(phi_i) is the transition frequency calculated from the predicted parameters.


Phase 7: Automatic Fitting Pipeline (End-to-End)

Once the ML model is trained, the full automatic characterization pipeline works as follows:

Step 1: Preprocess Experimental Data

  • Apply a band-pass filter: keep data points with signal magnitude > 2.5 standard deviations above background average and < 20% of maximum measured magnitude.
  • Use scipy.signal.find_peaks_cwt to detect transition spectrum peaks at magnitude extrema.[^9]
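A sketch of the peak-extraction step for one flux column is given below (the widths and the 2.5-sigma threshold follow the text above; the function name is illustrative):

```python
import numpy as np
from scipy.signal import find_peaks_cwt

def extract_peaks(magnitude, freqs, widths=(2, 3, 4, 5)):
    """Find transition peaks along the frequency axis at fixed flux.

    magnitude: 1D array of readout signal magnitude vs frequency.
    freqs: frequency axis in GHz, same length as magnitude.
    Returns frequencies of CWT-detected peaks that also pass a
    2.5-standard-deviation threshold above the background average.
    """
    idx = find_peaks_cwt(magnitude, widths=np.asarray(widths))
    bg = np.mean(magnitude)
    sigma = np.std(magnitude)
    return [freqs[i] for i in idx if magnitude[i] > bg + 2.5 * sigma]
```

In practice the background statistics should be estimated from off-resonant regions rather than the whole trace; the global mean here is a simplification.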

Step 2: ML Initial Guess

  • Feed the preprocessed spectrum image into the trained Swin Transformer V2 model.
  • Obtain initial guesses: E_C^0, E_L^0, E_J^0.

Step 3: Transition Identification

  • Simulate a spectrum using the ML-predicted parameters.
  • Label each experimental data point by associating it with the nearest simulated transition, provided the nearest transition is within 0.3 GHz.
  • Exclude points that are far from any simulated transition or fall within regions where multiple transitions overlap within 0.3 GHz.
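The labeling rule above can be sketched as follows (the callable-per-transition interface is an assumption; wrap your Hamiltonian solver accordingly):

```python
def label_points(points, sim_transitions, tol=0.3):
    """Assign each (flux, freq) data point to the nearest simulated transition.

    sim_transitions: dict mapping transition label -> callable freq(flux).
    Returns a list of (flux, freq, label). Points farther than tol (GHz) from
    every transition, or with two or more transitions within tol (ambiguous
    overlap regions), are dropped.
    """
    labeled = []
    for flux, freq in points:
        dists = {lab: abs(f(flux) - freq) for lab, f in sim_transitions.items()}
        within_tol = [d for d in dists.values() if d <= tol]
        if len(within_tol) != 1:
            continue  # unassigned, or ambiguous between overlapping transitions
        best = min(dists, key=dists.get)
        labeled.append((flux, freq, best))
    return labeled
```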

Step 4: Least-Squares Fitting

  • Use the ML predictions as initial guesses for a least-squares fit (e.g., scipy.optimize.least_squares or scipy.optimize.curve_fit).
  • Fit the labeled data points to the fluxonium Hamiltonian model.
  • Constrain fitting to 5 iterations as in the paper's benchmarks.
  • Output final refined values of E_C, E_L, E_J.
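The refinement step can be sketched with scipy.optimize.least_squares (the transition_freq callable is a hypothetical interface around your spectrum simulator; note that max_nfev caps function evaluations, which is the closest available analogue of the paper's 5-iteration budget):

```python
import numpy as np
from scipy.optimize import least_squares

def fit_parameters(labeled_points, transition_freq, x0, max_nfev=5):
    """Refine (EC, EL, EJ) from labeled spectrum points via least squares.

    labeled_points: list of (flux, freq, label) from the labeling step.
    transition_freq: callable (params, flux, label) -> simulated frequency.
    x0: ML initial guess [EC, EL, EJ].
    """
    def residuals(params):
        return np.array([transition_freq(params, flux, lab) - freq
                         for flux, freq, lab in labeled_points])

    return least_squares(residuals, x0, max_nfev=max_nfev)
```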

Phase 8: Reproducing Key Results

Result 1: Prediction Accuracy (Figure 4)

  • Run inference on 512 test spectra.
  • Plot predicted vs. true values for each of E_C, E_L, E_J.
  • Compute average accuracy using the custom metric. Target: ~95.6% overall.

Result 2: Error and Cost Landscapes (Figures 5–6)

  • Choose a test case, e.g., (E_C=1.28, E_J=6.50, E_L=0.70) GHz.
  • Generate a 2D grid of initial parameter guesses.
  • For each initial guess, run 5 fitting iterations and compute Error and Cost.
  • Plot heatmaps showing that the ML prediction falls in the darkest (lowest error/cost) region.

Result 3: ML vs. Random Initial Guess (Table 1)

  • For 60 parameter sets, compare:
    • 512 random initial guesses → 5 fitting iterations → average Error and Cost
    • ML initial guess → 5 fitting iterations → Error and Cost

| Method | Avg Error | Std Error | Avg Cost | Std Cost |
|---|---|---|---|---|
| Random initial values | 0.218 | 0.098 | 0.146 | 0.130 |
| ML prediction | 0.037 | 0.088 | 0.024 | 0.083 |

The ML approach should yield nearly one order of magnitude improvement.

Result 4: Real Experimental Data (Figure 7)

  • If access to real fluxonium measurement data is available, apply the full pipeline.
  • The paper demonstrates successful characterization with only partial spectra (4.0–5.9 GHz instead of 4.0–8.0 GHz) and even with half-period symmetrized data.

Phase 9: Practical Tips and Troubleshooting

Data Generation Optimization

  • Parallelization: Use Python's multiprocessing to generate spectra in parallel. Each spectrum is independent.
  • Caching: Save computed eigenvalues to disk (HDF5 or NumPy arrays) so you don't recompute if training is restarted.
  • scqubits cutoff: Use cutoff=110 for the fluxonium Hilbert space. Lower cutoffs may miss higher transitions; higher cutoffs waste computation time.[^5]

Image Representation

  • The paper plots spectra as black dots on a white background. Ensure consistent resolution (256×256) and normalization.
  • Consider using a fixed pixel grid: map phi_ext ∈ [0, 2π] to x ∈ [0, 255] and frequency ∈ [4.0, 8.0] GHz to y ∈ [0, 255].
  • Each dot should be at least 1–2 pixels wide for visibility.

Training Stability

  • Prodigy with lr=1.0 is recommended. If training is unstable, reduce d_coef to 0.5.[^7]
  • For the fine-tuning stage (469 samples), consider freezing early layers of the Swin Transformer and only fine-tuning the later layers and the regression head.
  • Monitor for overfitting by tracking validation loss closely in Stage 2.

Label Normalization

  • Normalize target values to [0, 1] by dividing by the parameter range (e.g., E_C_normalized = (E_C - 0.5) / 2.5). This helps MSE loss treat all three parameters equally.[^16]
  • At inference time, denormalize predictions back to physical units.
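The normalization and its inverse can be sketched as (bounds follow the training ranges in Phase 2):

```python
import numpy as np

# (min, span) per parameter in GHz, matching the training ranges
PARAM_BOUNDS = {"EC": (0.5, 2.5), "EL": (0.1, 1.9), "EJ": (2.0, 8.0)}

_LO = np.array([b[0] for b in PARAM_BOUNDS.values()])
_SPAN = np.array([b[1] for b in PARAM_BOUNDS.values()])

def normalize(labels):
    """Map [EC, EL, EJ] in GHz to [0, 1] per parameter."""
    return (np.asarray(labels, float) - _LO) / _SPAN

def denormalize(y):
    """Map normalized predictions back to GHz."""
    return np.asarray(y, float) * _SPAN + _LO
```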

Complete Replication Checklist

| Step | Task | Status |
|---|---|---|
| 1 | Install all dependencies (PyTorch, QuTiP, scqubits, prodigyopt, timm) | ☐ |
| 2 | Implement fluxonium Hamiltonian spectrum generator | ☐ |
| 3 | Generate 15,392 pure spectrum images + labels | ☐ |
| 4 | Generate 469 dispersive readout spectrum images + labels | ☐ |
| 5 | Generate 512 test spectrum images + labels | ☐ |
| 6 | Set up Swin V2 Tiny model with 3-output regression head | ☐ |
| 7 | Stage 1: Train on pure spectrum dataset with Prodigy optimizer | ☐ |
| 8 | Stage 2: Fine-tune on dispersive readout dataset | ☐ |
| 9 | Evaluate on test set (target ~95.6% accuracy) | ☐ |
| 10 | Implement automatic fitting pipeline (filter → ML → label → fit) | ☐ |
| 11 | Reproduce Error/Cost comparison (Table 1) | ☐ |
| 12 | (Optional) Apply to real experimental data | ☐ |

Key References and Resources

  • Paper: arXiv:2503.12099 – Kung et al., "Automatic Characterization of Fluxonium Superconducting Qubits Parameters with Deep Transfer Learning"
  • Swin Transformer V2: Liu et al., CVPR 2022 – architecture details and pretrained weights[^10]
  • Prodigy Optimizer: Mishchenko & Defazio, arXiv:2306.06101 – parameter-free adaptive optimizer[^8]
  • scqubits: Koch et al., Quantum 5, 583 (2021) – Python package for superconducting qubit simulation[^6]
  • QuTiP: Quantum Toolbox in Python – used for Hamiltonian diagonalization[^4]
  • torchvision SwinV2: Official PyTorch implementation with ImageNet-pretrained weights[^1]

References

  1. swin_v2_t – Torchvision main documentation
  2. swin_v2_b – Torchvision main documentation – Constructs a swin_v2_base architecture from Swin Transformer V2: Scaling Up Capacity and Resolution...
  3. Loading a pre-trained SwinV2 transformer and modifying the architecture · huggingface pytorch-image-models · Discussion #1843 – I am trying to create a SwinV2 transformer model by loading pretrained weights and later modifying s...
  4. Accelerate Qubit Research with NVIDIA cuQuantum Integrations in ... – The outputs of scQubits can also easily serve as inputs for analog quantum dynamics simulations usin...
  5. Fluxonium Qubit – scqubits Documentation – An instance of the fluxonium qubit is created as follows: fluxonium = scqubits.Fluxonium(EJ = 8.9, E...
  6. Scqubits: a Python package for superconducting qubits – scqubits is an open-source Python package for simulating and analyzing superconducting ci...
  7. prodigyopt – An Adam-like optimizer for neural networks with adaptive estimation of learning rate
  8. The Prodigy optimizer and its variants for training neural networks – konstmish/prodigy
  9. find_peaks_cwt – SciPy v1.17.0 Manual
  10. Swin Transformer V2: Scaling Up Capacity and Resolution – We present techniques for scaling Swin Transformer [35] up to 3 billion parameters and making it cap...
  11. Swin Transformer V2: Advancing Computer Vision with Scalable ... – Architecture & Functionality: Swin Transformer V2 retains the hierarchical structure of its predece...
  12. SwinCNet leveraging Swin Transformer V2 and CNN for precise color correction and detail enhancement in underwater image restoration – Underwater image restoration confronts three major challenges: color distortion, contrast degradatio...
  13. Retinal vessel segmentation using a swin transformer-based encoder-decoder architecture
  14. DUSFormer: Dual-Swin Transformer V2 Aggregate Network for Polyp Segmentation – The convolutional neural network method has certain limitations in medical image segmentation. As a ...
  15. Leveraging Swin Transformer for Local-to-Global Weakly Supervised Semantic Segmentation – ...a 0.98% mAP higher localization accuracy, outperforming state-of-the-art models. It also yields c...
  16. An Image Denoising Method Based on Swin Transformer V2 and U-Net Architecture – To address the issue of image degradation caused by noise during image acquisition and transmission,...
