new

Get trending papers in your email inbox!

Subscribe

Daily Papers

byAK and the research community

Apr 20

Synthetic Generation and Latent Projection Denoising of Rim Lesions in Multiple Sclerosis

Quantitative susceptibility maps from magnetic resonance images can provide both prognostic and diagnostic information in multiple sclerosis, a neurodegenerative disease characterized by the formation of lesions in white matter brain tissue. In particular, susceptibility maps provide adequate contrast to distinguish between "rim" lesions, surrounded by deposited paramagnetic iron, and "non-rim" lesion types. These paramagnetic rim lesions (PRLs) are an emerging biomarker in multiple sclerosis. Much effort has been devoted to both detection and segmentation of such lesions to monitor longitudinal change. As paramagnetic rim lesions are rare, addressing this problem requires confronting the class imbalance between rim and non-rim lesions. We produce synthetic quantitative susceptibility maps of paramagnetic rim lesions and show that inclusion of such synthetic data improves classifier performance and provide a multi-channel extension to generate accompanying contrasts and probabilistic segmentation maps. We exploit the projection capability of our trained generative network to demonstrate a novel denoising approach that allows us to train on ambiguous rim cases and substantially increase the minority class. We show that both synthetic lesion synthesis and our proposed rim lesion label denoising method best approximate the unseen rim lesion distribution and improve detection in a clinically interpretable manner. We release our code and generated data at https://github.com/agr78/PRLx-GAN upon publication.

  • 9 authors
·
May 29, 2025

Assessment of Data Consistency through Cascades of Independently Recurrent Inference Machines for fast and robust accelerated MRI reconstruction

Machine Learning methods can learn how to reconstruct Magnetic Resonance Images and thereby accelerate acquisition, which is of paramount importance to the clinical workflow. Physics-informed networks incorporate the forward model of accelerated MRI reconstruction in the learning process. With increasing network complexity, robustness is not ensured when reconstructing data unseen during training. We aim to embed data consistency (DC) in deep networks while balancing the degree of network complexity. While doing so, we will assess whether either explicit or implicit enforcement of DC in varying network architectures is preferred to optimize performance. We propose a scheme called Cascades of Independently Recurrent Inference Machines (CIRIM) to assess DC through unrolled optimization. Herein we assess DC both implicitly by gradient descent and explicitly by a designed term. Extensive comparison of the CIRIM to CS as well as to other methods is performed: the E2EVN, CascadeNet, KIKINet, LPDNet, RIM, IRIM, and UNet. Models were trained and evaluated on T1-weighted and FLAIR contrast brain data, and T2-weighted knee data. Both 1D and 2D undersampling patterns were evaluated. Robustness was tested by reconstructing 7.5x prospectively undersampled 3D FLAIR MRI data of Multiple Sclerosis (MS) patients with white matter lesions. The CIRIM performed best when implicitly enforcing DC, while the E2EVN required an explicit DC formulation. In reconstructing MS patient data, prospectively acquired with a sampling pattern unseen during model training, the CIRIM maintained lesion contrast while efficiently denoising the images. The CIRIM showed highly promising generalization capabilities maintaining a very fair trade-off between reconstructed image quality and fast reconstruction times, which is crucial in the clinical workflow.

  • 5 authors
·
Nov 30, 2021

Is a PET all you need? A multi-modal study for Alzheimer's disease using 3D CNNs

Alzheimer's Disease (AD) is the most common form of dementia and often difficult to diagnose due to the multifactorial etiology of dementia. Recent works on neuroimaging-based computer-aided diagnosis with deep neural networks (DNNs) showed that fusing structural magnetic resonance images (sMRI) and fluorodeoxyglucose positron emission tomography (FDG-PET) leads to improved accuracy in a study population of healthy controls and subjects with AD. However, this result conflicts with the established clinical knowledge that FDG-PET better captures AD-specific pathologies than sMRI. Therefore, we propose a framework for the systematic evaluation of multi-modal DNNs and critically re-evaluate single- and multi-modal DNNs based on FDG-PET and sMRI for binary healthy vs. AD, and three-way healthy/mild cognitive impairment/AD classification. Our experiments demonstrate that a single-modality network using FDG-PET performs better than MRI (accuracy 0.91 vs 0.87) and does not show improvement when combined. This conforms with the established clinical knowledge on AD biomarkers, but raises questions about the true benefit of multi-modal DNNs. We argue that future work on multi-modal fusion should systematically assess the contribution of individual modalities following our proposed evaluation framework. Finally, we encourage the community to go beyond healthy vs. AD classification and focus on differential diagnosis of dementia, where fusing multi-modal image information conforms with a clinical need.

  • 6 authors
·
Jul 5, 2022

A skeletonization algorithm for gradient-based optimization

The skeleton of a digital image is a compact representation of its topology, geometry, and scale. It has utility in many computer vision applications, such as image description, segmentation, and registration. However, skeletonization has only seen limited use in contemporary deep learning solutions. Most existing skeletonization algorithms are not differentiable, making it impossible to integrate them with gradient-based optimization. Compatible algorithms based on morphological operations and neural networks have been proposed, but their results often deviate from the geometry and topology of the true medial axis. This work introduces the first three-dimensional skeletonization algorithm that is both compatible with gradient-based optimization and preserves an object's topology. Our method is exclusively based on matrix additions and multiplications, convolutional operations, basic non-linear functions, and sampling from a uniform probability distribution, allowing it to be easily implemented in any major deep learning library. In benchmarking experiments, we prove the advantages of our skeletonization algorithm compared to non-differentiable, morphological, and neural-network-based baselines. Finally, we demonstrate the utility of our algorithm by integrating it with two medical image processing applications that use gradient-based optimization: deep-learning-based blood vessel segmentation, and multimodal registration of the mandible in computed tomography and magnetic resonance images.

  • 9 authors
·
Sep 5, 2023

Automated SSIM Regression for Detection and Quantification of Motion Artefacts in Brain MR Images

Motion artefacts in magnetic resonance brain images can have a strong impact on diagnostic confidence. The assessment of MR image quality is fundamental before proceeding with the clinical diagnosis. Motion artefacts can alter the delineation of structures such as the brain, lesions or tumours and may require a repeat scan. Otherwise, an inaccurate (e.g. correct pathology but wrong severity) or incorrect diagnosis (e.g. wrong pathology) may occur. "Image quality assessment" as a fast, automated step right after scanning can assist in deciding if the acquired images are diagnostically sufficient. An automated image quality assessment based on the structural similarity index (SSIM) regression through a residual neural network is proposed in this work. Additionally, a classification into different groups - by subdividing with SSIM ranges - is evaluated. Importantly, this method predicts SSIM values of an input image in the absence of a reference ground truth image. The networks were able to detect motion artefacts, and the best performance for the regression and classification task has always been achieved with ResNet-18 with contrast augmentation. The mean and standard deviation of residuals' distribution were mu=-0.0009 and sigma=0.0139, respectively. Whilst for the classification task in 3, 5 and 10 classes, the best accuracies were 97, 95 and 89\%, respectively. The results show that the proposed method could be a tool for supporting neuro-radiologists and radiographers in evaluating image quality quickly.

  • 7 authors
·
Jun 14, 2022

MulModSeg: Enhancing Unpaired Multi-Modal Medical Image Segmentation with Modality-Conditioned Text Embedding and Alternating Training

In the diverse field of medical imaging, automatic segmentation has numerous applications and must handle a wide variety of input domains, such as different types of Computed Tomography (CT) scans and Magnetic Resonance (MR) images. This heterogeneity challenges automatic segmentation algorithms to maintain consistent performance across different modalities due to the requirement for spatially aligned and paired images. Typically, segmentation models are trained using a single modality, which limits their ability to generalize to other types of input data without employing transfer learning techniques. Additionally, leveraging complementary information from different modalities to enhance segmentation precision often necessitates substantial modifications to popular encoder-decoder designs, such as introducing multiple branched encoding or decoding paths for each modality. In this work, we propose a simple Multi-Modal Segmentation (MulModSeg) strategy to enhance medical image segmentation across multiple modalities, specifically CT and MR. It incorporates two key designs: a modality-conditioned text embedding framework via a frozen text encoder that adds modality awareness to existing segmentation frameworks without significant structural modifications or computational overhead, and an alternating training procedure that facilitates the integration of essential features from unpaired CT and MR inputs. Through extensive experiments with both Fully Convolutional Network and Transformer-based backbones, MulModSeg consistently outperforms previous methods in segmenting abdominal multi-organ and cardiac substructures for both CT and MR modalities. The code is available in this {https://github.com/ChengyinLee/MulModSeg_2024{link}}.

  • 8 authors
·
Nov 23, 2024

Brain Tumor Detection and Classification based on Hybrid Ensemble Classifier

To improve patient survival and treatment outcomes, early diagnosis of brain tumors is an essential task. It is a difficult task to evaluate the magnetic resonance imaging (MRI) images manually. Thus, there is a need for digital methods for tumor diagnosis with better accuracy. However, it is still a very challenging task in assessing their shape, volume, boundaries, tumor detection, size, segmentation, and classification. In this proposed work, we propose a hybrid ensemble method using Random Forest (RF), K-Nearest Neighbour, and Decision Tree (DT) (KNN-RF-DT) based on Majority Voting Method. It aims to calculate the area of the tumor region and classify brain tumors as benign and malignant. In the beginning, segmentation is done by using Otsu's Threshold method. Feature Extraction is done by using Stationary Wavelet Transform (SWT), Principle Component Analysis (PCA), and Gray Level Co-occurrence Matrix (GLCM), which gives thirteen features for classification. The classification is done by hybrid ensemble classifier (KNN-RF-DT) based on the Majority Voting method. Overall it aimed at improving the performance by traditional classifiers instead of going to deep learning. Traditional classifiers have an advantage over deep learning algorithms because they require small datasets for training and have low computational time complexity, low cost to the users, and can be easily adopted by less skilled people. Overall, our proposed method is tested upon dataset of 2556 images, which are used in 85:15 for training and testing respectively and gives good accuracy of 97.305%.

  • 2 authors
·
Jan 1, 2021

TorchEsegeta: Framework for Interpretability and Explainability of Image-based Deep Learning Models

Clinicians are often very sceptical about applying automatic image processing approaches, especially deep learning based methods, in practice. One main reason for this is the black-box nature of these approaches and the inherent problem of missing insights of the automatically derived decisions. In order to increase trust in these methods, this paper presents approaches that help to interpret and explain the results of deep learning algorithms by depicting the anatomical areas which influence the decision of the algorithm most. Moreover, this research presents a unified framework, TorchEsegeta, for applying various interpretability and explainability techniques for deep learning models and generate visual interpretations and explanations for clinicians to corroborate their clinical findings. In addition, this will aid in gaining confidence in such methods. The framework builds on existing interpretability and explainability techniques that are currently focusing on classification models, extending them to segmentation tasks. In addition, these methods have been adapted to 3D models for volumetric analysis. The proposed framework provides methods to quantitatively compare visual explanations using infidelity and sensitivity metrics. This framework can be used by data scientists to perform post-hoc interpretations and explanations of their models, develop more explainable tools and present the findings to clinicians to increase their faith in such models. The proposed framework was evaluated based on a use case scenario of vessel segmentation models trained on Time-of-fight (TOF) Magnetic Resonance Angiogram (MRA) images of the human brain. Quantitative and qualitative results of a comparative study of different models and interpretability methods are presented. Furthermore, this paper provides an extensive overview of several existing interpretability and explainability methods.

  • 10 authors
·
Oct 15, 2021

SPOCKMIP: Segmentation of Vessels in MRAs with Enhanced Continuity using Maximum Intensity Projection as Loss

Identification of vessel structures of different sizes in biomedical images is crucial in the diagnosis of many neurodegenerative diseases. However, the sparsity of good-quality annotations of such images makes the task of vessel segmentation challenging. Deep learning offers an efficient way to segment vessels of different sizes by learning their high-level feature representations and the spatial continuity of such features across dimensions. Semi-supervised patch-based approaches have been effective in identifying small vessels of one to two voxels in diameter. This study focuses on improving the segmentation quality by considering the spatial correlation of the features using the Maximum Intensity Projection~(MIP) as an additional loss criterion. Two methods are proposed with the incorporation of MIPs of label segmentation on the single~(z-axis) and multiple perceivable axes of the 3D volume. The proposed MIP-based methods produce segmentations with improved vessel continuity, which is evident in visual examinations of ROIs. Patch-based training is improved by introducing an additional loss term, MIP loss, to penalise the predicted discontinuity of vessels. A training set of 14 volumes is selected from the StudyForrest dataset comprising of 18 7-Tesla 3D Time-of-Flight~(ToF) Magnetic Resonance Angiography (MRA) images. The generalisation performance of the method is evaluated using the other unseen volumes in the dataset. It is observed that the proposed method with multi-axes MIP loss produces better quality segmentations with a median Dice of 80.245 pm 0.129. Also, the method with single-axis MIP loss produces segmentations with a median Dice of 79.749 pm 0.109. Furthermore, a visual comparison of the ROIs in the predicted segmentation reveals a significant improvement in the continuity of the vessels when MIP loss is incorporated into training.

  • 8 authors
·
Jul 11, 2024

Segmentation variability and radiomics stability for predicting Triple-Negative Breast Cancer subtype using Magnetic Resonance Imaging

Most papers caution against using predictive models for disease stratification based on unselected radiomic features, as these features are affected by contouring variability. Instead, they advocate for the use of the Intraclass Correlation Coefficient (ICC) as a measure of stability for feature selection. However, the direct effect of segmentation variability on the predictive models is rarely studied. This study investigates the impact of segmentation variability on feature stability and predictive performance in radiomics-based prediction of Triple-Negative Breast Cancer (TNBC) subtype using Magnetic Resonance Imaging. A total of 244 images from the Duke dataset were used, with segmentation variability introduced through modifications of manual segmentations. For each mask, explainable radiomic features were selected using the Shapley Additive exPlanations method and used to train logistic regression models. Feature stability across segmentations was assessed via ICC, Pearson's correlation, and reliability scores quantifying the relationship between feature stability and segmentation variability. Results indicate that segmentation accuracy does not significantly impact predictive performance. While incorporating peritumoral information may reduce feature reproducibility, it does not diminish feature predictive capability. Moreover, feature selection in predictive models is not inherently tied to feature stability with respect to segmentation, suggesting that an overreliance on ICC or reliability scores for feature selection might exclude valuable predictive features.

  • 7 authors
·
Apr 2, 2025

Symbrain: A large-scale dataset of MRI images for neonatal brain symmetry analysis

This paper presents an annotated dataset of brain MRI images designed to advance the field of brain symmetry study. Magnetic resonance imaging (MRI) has gained interest in analyzing brain symmetry in neonatal infants, and challenges remain due to the vast size differences between fetal and adult brains. Classification methods for brain structural MRI use scales and visual cues to assess hemisphere symmetry, which can help diagnose neonatal patients by comparing hemispheres and anatomical regions of interest in the brain. Using the Developing Human Connectome Project dataset, this work presents a dataset comprising cerebral images extracted as slices across selected portions of interest for clinical evaluation . All the extracted images are annotated with the brain's midline. All the extracted images are annotated with the brain's midline. From the assumption that a decrease in symmetry is directly related to possible clinical pathologies, the dataset can contribute to a more precise diagnosis because it can be used to train deep learning model application in neonatal cerebral MRI anomaly detection from postnatal infant scans thanks to computer vision. Such models learn to identify and classify anomalies by identifying potential asymmetrical patterns in medical MRI images. Furthermore, this dataset can contribute to the research and development of methods using the relative symmetry of the two brain hemispheres for crucial diagnosis and treatment planning.

  • 5 authors
·
Jan 22, 2024

Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR Images

Cardiac Magnetic Resonance (CMR) imaging serves as the gold-standard for evaluating cardiac morphology and function. Typically, a multi-view CMR stack, covering short-axis (SA) and 2/3/4-chamber long-axis (LA) views, is acquired for a thorough cardiac assessment. However, efficiently streamlining the complex, high-dimensional 3D+T CMR data and distilling compact, coherent representation remains a challenge. In this work, we introduce a whole-heart self-supervised learning framework that utilizes masked imaging modeling to automatically uncover the correlations between spatial and temporal patches throughout the cardiac stacks. This process facilitates the generation of meaningful and well-clustered heart representations without relying on the traditionally required, and often costly, labeled data. The learned heart representation can be directly used for various downstream tasks. Furthermore, our method demonstrates remarkable robustness, ensuring consistent representations even when certain CMR planes are missing/flawed. We train our model on 14,000 unlabeled CMR data from UK BioBank and evaluate it on 1,000 annotated data. The proposed method demonstrates superior performance to baselines in tasks that demand comprehensive 3D+T cardiac information, e.g. cardiac phenotype (ejection fraction and ventricle volume) prediction and multi-plane/multi-frame CMR segmentation, highlighting its effectiveness in extracting comprehensive cardiac features that are both anatomically and pathologically relevant.

  • 6 authors
·
Jun 1, 2024

Brain Captioning: Decoding human brain activity into images and text

Every day, the human brain processes an immense volume of visual information, relying on intricate neural mechanisms to perceive and interpret these stimuli. Recent breakthroughs in functional magnetic resonance imaging (fMRI) have enabled scientists to extract visual information from human brain activity patterns. In this study, we present an innovative method for decoding brain activity into meaningful images and captions, with a specific focus on brain captioning due to its enhanced flexibility as compared to brain decoding into images. Our approach takes advantage of cutting-edge image captioning models and incorporates a unique image reconstruction pipeline that utilizes latent diffusion models and depth estimation. We utilized the Natural Scenes Dataset, a comprehensive fMRI dataset from eight subjects who viewed images from the COCO dataset. We employed the Generative Image-to-text Transformer (GIT) as our backbone for captioning and propose a new image reconstruction pipeline based on latent diffusion models. The method involves training regularized linear regression models between brain activity and extracted features. Additionally, we incorporated depth maps from the ControlNet model to further guide the reconstruction process. We evaluate our methods using quantitative metrics for both generated captions and images. Our brain captioning approach outperforms existing methods, while our image reconstruction pipeline generates plausible images with improved spatial relationships. In conclusion, we demonstrate significant progress in brain decoding, showcasing the enormous potential of integrating vision and language to better understand human cognition. Our approach provides a flexible platform for future research, with potential applications in various fields, including neural art, style transfer, and portable devices.

  • 5 authors
·
May 19, 2023

Classification of Brain Tumours in MR Images using Deep Spatiospatial Models

A brain tumour is a mass or cluster of abnormal cells in the brain, which has the possibility of becoming life-threatening because of its ability to invade neighbouring tissues and also form metastases. An accurate diagnosis is essential for successful treatment planning and magnetic resonance imaging is the principal imaging modality for diagnostic of brain tumours and their extent. Deep Learning methods in computer vision applications have shown significant improvement in recent years, most of which can be credited to the fact that a sizeable amount of data is available to train models on, and the improvements in the model architectures yielding better approximations in a supervised setting. Classifying tumours using such deep learning methods has made significant progress with the availability of open datasets with reliable annotations. Typically those methods are either 3D models, which use 3D volumetric MRIs or even 2D models considering each slice separately. However, by treating the slice spatial dimension separately, spatiotemporal models can be employed as spatiospatial models for this task. These models have the capabilities of learning specific spatial and temporal relationship, while reducing computational costs. This paper uses two spatiotemporal models, ResNet (2+1)D and ResNet Mixed Convolution, to classify different types of brain tumours. It was observed that both these models performed superior to the pure 3D convolutional model, ResNet18. Furthermore, it was also observed that pre-training the models on a different, even unrelated dataset before training them for the task of tumour classification improves the performance. Finally, Pre-trained ResNet Mixed Convolution was observed to be the best model in these experiments, achieving a macro F1-score of 0.93 and a test accuracy of 96.98\%, while at the same time being the model with the least computational cost.

  • 4 authors
·
May 28, 2021

A multi-path 2.5 dimensional convolutional neural network system for segmenting stroke lesions in brain MRI images

Automatic identification of brain lesions from magnetic resonance imaging (MRI) scans of stroke survivors would be a useful aid in patient diagnosis and treatment planning. We propose a multi-modal multi-path convolutional neural network system for automating stroke lesion segmentation. Our system has nine end-to-end UNets that take as input 2-dimensional (2D) slices and examines all three planes with three different normalizations. Outputs from these nine total paths are concatenated into a 3D volume that is then passed to a 3D convolutional neural network to output a final lesion mask. We trained and tested our method on datasets from three sources: Medical College of Wisconsin (MCW), Kessler Foundation (KF), and the publicly available Anatomical Tracings of Lesions After Stroke (ATLAS) dataset. Cross-study validation results (with independent training and validation datasets) were obtained to compare with previous methods based on naive Bayes, random forests, and three recently published convolutional neural networks. Model performance was quantified in terms of the Dice coefficient. Training on the KF and MCW images and testing on the ATLAS images yielded a mean Dice coefficient of 0.54. This was reliably better than the next best previous model, UNet, at 0.47. Reversing the train and test datasets yields a mean Dice of 0.47 on KF and MCW images, whereas the next best UNet reaches 0.45. With all three datasets combined, the current system compared to previous methods also attained a reliably higher cross-validation accuracy. It also achieved high Dice values for many smaller lesions that existing methods have difficulty identifying. Overall, our system is a clear improvement over previous methods for automating stroke lesion segmentation, bringing us an important step closer to the inter-rater accuracy level of human experts.

  • 7 authors
·
May 26, 2019

Recurrent Variational Network: A Deep Learning Inverse Problem Solver applied to the task of Accelerated MRI Reconstruction

Magnetic Resonance Imaging can produce detailed images of the anatomy and physiology of the human body that can assist doctors in diagnosing and treating pathologies such as tumours. However, MRI suffers from very long acquisition times that make it susceptible to patient motion artifacts and limit its potential to deliver dynamic treatments. Conventional approaches such as Parallel Imaging and Compressed Sensing allow for an increase in MRI acquisition speed by reconstructing MR images from sub-sampled MRI data acquired using multiple receiver coils. Recent advancements in Deep Learning combined with Parallel Imaging and Compressed Sensing techniques have the potential to produce high-fidelity reconstructions from highly accelerated MRI data. In this work we present a novel Deep Learning-based Inverse Problem solver applied to the task of Accelerated MRI Reconstruction, called the Recurrent Variational Network (RecurrentVarNet), by exploiting the properties of Convolutional Recurrent Neural Networks and unrolled algorithms for solving Inverse Problems. The RecurrentVarNet consists of multiple recurrent blocks, each responsible for one iteration of the unrolled variational optimization scheme for solving the inverse problem of multi-coil Accelerated MRI Reconstruction. Contrary to traditional approaches, the optimization steps are performed in the observation domain (k-space) instead of the image domain. Each block of the RecurrentVarNet refines the observed k-space and comprises a data consistency term and a recurrent unit which takes as input a learned hidden state and the prediction of the previous block. Our proposed method achieves new state of the art qualitative and quantitative reconstruction results on 5-fold and 10-fold accelerated data from a public multi-coil brain dataset, outperforming previous conventional and deep learning-based approaches.

  • 4 authors
·
Nov 18, 2021

PI-RADS v2 Compliant Automated Segmentation of Prostate Zones Using co-training Motivated Multi-task Dual-Path CNN

The detailed images produced by Magnetic Resonance Imaging (MRI) provide life-critical information for the diagnosis and treatment of prostate cancer. To provide standardized acquisition, interpretation and usage of the complex MRI images, the PI-RADS v2 guideline was proposed. An automated segmentation following the guideline facilitates consistent and precise lesion detection, staging and treatment. The guideline recommends a division of the prostate into four zones, PZ (peripheral zone), TZ (transition zone), DPU (distal prostatic urethra) and AFS (anterior fibromuscular stroma). Not every zone shares a boundary with the others and is present in every slice. Further, the representations captured by a single model might not suffice for all zones. This motivated us to design a dual-branch convolutional neural network (CNN), where each branch captures the representations of the connected zones separately. Further, the representations from different branches act complementary to each other at the second stage of training, where they are fine-tuned through an unsupervised loss. The loss penalises the difference in predictions from the two branches for the same class. We also incorporate multi-task learning in our framework to further improve the segmentation accuracy. The proposed approach improves the segmentation accuracy of the baseline (mean absolute symmetric distance) by 7.56%, 11.00%, 58.43% and 19.67% for PZ, TZ, DPU and AFS zones respectively.

  • 3 authors
·
Sep 22, 2023

PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices

Brain tumor detection in multiplane Magnetic Resonance Imaging (MRI) slices is a challenging task due to the various appearances and relationships in the structure of the multiplane images. In this paper, we propose a new You Only Look Once (YOLO)-based detection model that incorporates Pretrained Knowledge (PK), called PK-YOLO, to improve the performance for brain tumor detection in multiplane MRI slices. To our best knowledge, PK-YOLO is the first pretrained knowledge guided YOLO-based object detector. The main components of the new method are a pretrained pure lightweight convolutional neural network-based backbone via sparse masked modeling, a YOLO architecture with the pretrained backbone, and a regression loss function for improving small object detection. The pretrained backbone allows for feature transferability of object queries on individual plane MRI slices into the model encoders, and the learned domain knowledge base can improve in-domain detection. The improved loss function can further boost detection performance on small-size brain tumors in multiplanar two-dimensional MRI slices. Experimental results show that the proposed PK-YOLO achieves competitive performance on the multiplanar MRI brain tumor detection datasets compared to state-of-the-art YOLO-like and DETR-like object detectors. The code is available at https://github.com/mkang315/PK-YOLO.

  • 4 authors
·
Oct 29, 2024

Experimental quantum adversarial learning with programmable superconducting qubits

Quantum computing promises to enhance machine learning and artificial intelligence. Different quantum algorithms have been proposed to improve a wide spectrum of machine learning tasks. Yet, recent theoretical works show that, similar to traditional classifiers based on deep classical neural networks, quantum classifiers would suffer from the vulnerability problem: adding tiny carefully-crafted perturbations to the legitimate original data samples would facilitate incorrect predictions at a notably high confidence level. This will pose serious problems for future quantum machine learning applications in safety and security-critical scenarios. Here, we report the first experimental demonstration of quantum adversarial learning with programmable superconducting qubits. We train quantum classifiers, which are built upon variational quantum circuits consisting of ten transmon qubits featuring average lifetimes of 150 mus, and average fidelities of simultaneous single- and two-qubit gates above 99.94% and 99.4% respectively, with both real-life images (e.g., medical magnetic resonance imaging scans) and quantum data. We demonstrate that these well-trained classifiers (with testing accuracy up to 99%) can be practically deceived by small adversarial perturbations, whereas an adversarial training process would significantly enhance their robustness to such perturbations. Our results reveal experimentally a crucial vulnerability aspect of quantum learning systems under adversarial scenarios and demonstrate an effective defense strategy against adversarial attacks, which provide a valuable guide for quantum artificial intelligence applications with both near-term and future quantum devices.

  • 24 authors
·
Apr 4, 2022

Multi-Coil MRI Reconstruction Challenge -- Assessing Brain MRI Reconstruction Models and their Generalizability to Varying Coil Configurations

Deep-learning-based brain magnetic resonance imaging (MRI) reconstruction methods have the potential to accelerate the MRI acquisition process. Nevertheless, the scientific community lacks appropriate benchmarks to assess MRI reconstruction quality of high-resolution brain images, and evaluate how these proposed algorithms will behave in the presence of small, but expected data distribution shifts. The Multi-Coil Magnetic Resonance Image (MC-MRI) Reconstruction Challenge provides a benchmark that aims at addressing these issues, using a large dataset of high-resolution, three-dimensional, T1-weighted MRI scans. The challenge has two primary goals: 1) to compare different MRI reconstruction models on this dataset and 2) to assess the generalizability of these models to data acquired with a different number of receiver coils. In this paper, we describe the challenge experimental design, and summarize the results of a set of baseline and state of the art brain MRI reconstruction models. We provide relevant comparative information on the current MRI reconstruction state-of-the-art and highlight the challenges of obtaining generalizable models that are required prior to broader clinical adoption. The MC-MRI benchmark data, evaluation code and current challenge leaderboard are publicly available. They provide an objective performance assessment for future developments in the field of brain MRI reconstruction.

  • 23 authors
·
Nov 9, 2020

Robust Depth Linear Error Decomposition with Double Total Variation and Nuclear Norm for Dynamic MRI Reconstruction

Compressed Sensing (CS) significantly speeds up Magnetic Resonance Image (MRI) processing and achieves accurate MRI reconstruction from under-sampled k-space data. According to the current research, there are still several problems with dynamic MRI k-space reconstruction based on CS. 1) There are differences between the Fourier domain and the Image domain, and the differences between MRI processing of different domains need to be considered. 2) As three-dimensional data, dynamic MRI has its spatial-temporal characteristics, which need to calculate the difference and consistency of surface textures while preserving structural integrity and uniqueness. 3) Dynamic MRI reconstruction is time-consuming and computationally resource-dependent. In this paper, we propose a novel robust low-rank dynamic MRI reconstruction optimization model via highly under-sampled and Discrete Fourier Transform (DFT) called the Robust Depth Linear Error Decomposition Model (RDLEDM). Our method mainly includes linear decomposition, double Total Variation (TV), and double Nuclear Norm (NN) regularizations. By adding linear image domain error analysis, the noise is reduced after under-sampled and DFT processing, and the anti-interference ability of the algorithm is enhanced. Double TV and NN regularizations can utilize both spatial-temporal characteristics and explore the complementary relationship between different dimensions in dynamic MRI sequences. In addition, Due to the non-smoothness and non-convexity of TV and NN terms, it is difficult to optimize the unified objective model. To address this issue, we utilize a fast algorithm by solving a primal-dual form of the original problem. Compared with five state-of-the-art methods, extensive experiments on dynamic MRI data demonstrate the superior performance of the proposed method in terms of both reconstruction accuracy and time complexity.

  • 3 authors
·
Oct 23, 2023

ISLES 2022: A multi-center magnetic resonance imaging stroke lesion segmentation dataset

Magnetic resonance imaging (MRI) is a central modality for stroke imaging. It is used upon patient admission to make treatment decisions such as selecting patients for intravenous thrombolysis or endovascular therapy. MRI is later used in the duration of hospital stay to predict outcome by visualizing infarct core size and location. Furthermore, it may be used to characterize stroke etiology, e.g. differentiation between (cardio)-embolic and non-embolic stroke. Computer based automated medical image processing is increasingly finding its way into clinical routine. Previous iterations of the Ischemic Stroke Lesion Segmentation (ISLES) challenge have aided in the generation of identifying benchmark methods for acute and sub-acute ischemic stroke lesion segmentation. Here we introduce an expert-annotated, multicenter MRI dataset for segmentation of acute to subacute stroke lesions. This dataset comprises 400 multi-vendor MRI cases with high variability in stroke lesion size, quantity and location. It is split into a training dataset of n=250 and a test dataset of n=150. All training data will be made publicly available. The test dataset will be used for model validation only and will not be released to the public. This dataset serves as the foundation of the ISLES 2022 challenge with the goal of finding algorithmic methods to enable the development and benchmarking of robust and accurate segmentation algorithms for ischemic stroke.

  • 25 authors
·
Jun 14, 2022

ATOMMIC: An Advanced Toolbox for Multitask Medical Imaging Consistency to facilitate Artificial Intelligence applications from acquisition to analysis in Magnetic Resonance Imaging

AI is revolutionizing MRI along the acquisition and processing chain. Advanced AI frameworks have been developed to apply AI in various successive tasks, such as image reconstruction, quantitative parameter map estimation, and image segmentation. Existing frameworks are often designed to perform tasks independently or are focused on specific models or datasets, limiting generalization. We introduce ATOMMIC, an open-source toolbox that streamlines AI applications for accelerated MRI reconstruction and analysis. ATOMMIC implements several tasks using DL networks and enables MultiTask Learning (MTL) to perform related tasks integrated, targeting generalization in the MRI domain. We first review the current state of AI frameworks for MRI through a comprehensive literature search and by parsing 12,479 GitHub repositories. We benchmark 25 DL models on eight publicly available datasets to present distinct applications of ATOMMIC on accelerated MRI reconstruction, image segmentation, quantitative parameter map estimation, and joint accelerated MRI reconstruction and image segmentation utilizing MTL. Our findings demonstrate that ATOMMIC is the only MTL framework with harmonized complex-valued and real-valued data support. Evaluations on single tasks show that physics-based models, which enforce data consistency by leveraging the physical properties of MRI, outperform other models in reconstructing highly accelerated acquisitions. Physics-based models that produce high reconstruction quality can accurately estimate quantitative parameter maps. When high-performing reconstruction models are combined with robust segmentation networks utilizing MTL, performance is improved in both tasks. ATOMMIC facilitates MRI reconstruction and analysis by standardizing workflows, enhancing data interoperability, integrating unique features like MTL, and effectively benchmarking DL models.

  • 4 authors
·
Apr 30, 2024

Conditional Denoising Diffusion Model-Based Robust MR Image Reconstruction from Highly Undersampled Data

Magnetic Resonance Imaging (MRI) is a critical tool in modern medical diagnostics, yet its prolonged acquisition time remains a critical limitation, especially in time-sensitive clinical scenarios. While undersampling strategies can accelerate image acquisition, they often result in image artifacts and degraded quality. Recent diffusion models have shown promise for reconstructing high-fidelity images from undersampled data by learning powerful image priors; however, most existing approaches either (i) rely on unsupervised score functions without paired supervision or (ii) apply data consistency only as a post-processing step. In this work, we introduce a conditional denoising diffusion framework with iterative data-consistency correction, which differs from prior methods by embedding the measurement model directly into every reverse diffusion step and training the model on paired undersampled-ground truth data. This hybrid design bridges generative flexibility with explicit enforcement of MRI physics. Experiments on the fastMRI dataset demonstrate that our framework consistently outperforms recent state-of-the-art deep learning and diffusion-based methods in SSIM, PSNR, and LPIPS, with LPIPS capturing perceptual improvements more faithfully. These results demonstrate that integrating conditional supervision with iterative consistency updates yields substantial improvements in both pixel-level fidelity and perceptual realism, establishing a principled and practical advance toward robust, accelerated MRI reconstruction.

  • 6 authors
·
Oct 7, 2025

SKM-TEA: A Dataset for Accelerated MRI Reconstruction with Dense Image Labels for Quantitative Clinical Evaluation

Magnetic resonance imaging (MRI) is a cornerstone of modern medical imaging. However, long image acquisition times, the need for qualitative expert analysis, and the lack of (and difficulty extracting) quantitative indicators that are sensitive to tissue health have curtailed widespread clinical and research studies. While recent machine learning methods for MRI reconstruction and analysis have shown promise for reducing this burden, these techniques are primarily validated with imperfect image quality metrics, which are discordant with clinically-relevant measures that ultimately hamper clinical deployment and clinician trust. To mitigate this challenge, we present the Stanford Knee MRI with Multi-Task Evaluation (SKM-TEA) dataset, a collection of quantitative knee MRI (qMRI) scans that enables end-to-end, clinically-relevant evaluation of MRI reconstruction and analysis tools. This 1.6TB dataset consists of raw-data measurements of ~25,000 slices (155 patients) of anonymized patient MRI scans, the corresponding scanner-generated DICOM images, manual segmentations of four tissues, and bounding box annotations for sixteen clinically relevant pathologies. We provide a framework for using qMRI parameter maps, along with image reconstructions and dense image labels, for measuring the quality of qMRI biomarker estimates extracted from MRI reconstruction, segmentation, and detection techniques. Finally, we use this framework to benchmark state-of-the-art baselines on this dataset. We hope our SKM-TEA dataset and code can enable a broad spectrum of research for modular image reconstruction and image analysis in a clinically informed manner. Dataset access, code, and benchmarks are available at https://github.com/StanfordMIMI/skm-tea.

  • 12 authors
·
Mar 13, 2022

Can Natural Image Autoencoders Compactly Tokenize fMRI Volumes for Long-Range Dynamics Modeling?

Modeling long-range spatiotemporal dynamics in functional Magnetic Resonance Imaging (fMRI) remains a key challenge due to the high dimensionality of the four-dimensional signals. Prior voxel-based models, although demonstrating excellent performance and interpretation capabilities, are constrained by prohibitive memory demands and thus can only capture limited temporal windows. To address this, we propose TABLeT (Two-dimensionally Autoencoded Brain Latent Transformer), a novel approach that tokenizes fMRI volumes using a pre-trained 2D natural image autoencoder. Each 3D fMRI volume is compressed into a compact set of continuous tokens, enabling long-sequence modeling with a simple Transformer encoder with limited VRAM. Across large-scale benchmarks including the UK-Biobank (UKB), Human Connectome Project (HCP), and ADHD-200 datasets, TABLeT outperforms existing models in multiple tasks, while demonstrating substantial gains in computational and memory efficiency over the state-of-the-art voxel-based method given the same input. Furthermore, we develop a self-supervised masked token modeling approach to pre-train TABLeT, which improves the model's performance for various downstream tasks. Our findings suggest a promising approach for scalable and interpretable spatiotemporal modeling of brain activity. Our code is available at https://github.com/beotborry/TABLeT.

SynthStrip: Skull-Stripping for Any Brain Image

The removal of non-brain signal from magnetic resonance imaging (MRI) data, known as skull-stripping, is an integral component of many neuroimage analysis streams. Despite their abundance, popular classical skull-stripping methods are usually tailored to images with specific acquisition properties, namely near-isotropic resolution and T1-weighted (T1w) MRI contrast, which are prevalent in research settings. As a result, existing tools tend to adapt poorly to other image types, such as stacks of thick slices acquired with fast spin-echo (FSE) MRI that are common in the clinic. While learning-based approaches for brain extraction have gained traction in recent years, these methods face a similar burden, as they are only effective for image types seen during the training procedure. To achieve robust skull-stripping across a landscape of imaging protocols, we introduce SynthStrip, a rapid, learning-based brain-extraction tool. By leveraging anatomical segmentations to generate an entirely synthetic training dataset with anatomies, intensity distributions, and artifacts that far exceed the realistic range of medical images, SynthStrip learns to successfully generalize to a variety of real acquired brain images, removing the need for training data with target contrasts. We demonstrate the efficacy of SynthStrip for a diverse set of image acquisitions and resolutions across subject populations, ranging from newborn to adult. We show substantial improvements in accuracy over popular skull-stripping baselines -- all with a single trained model. Our method and labeled evaluation data are available at https://w3id.org/synthstrip.

  • 5 authors
·
Mar 18, 2022

Contrastive learning of global and local features for medical image segmentation with limited annotations

A key requirement for the success of supervised deep learning is a large labeled dataset - a condition that is difficult to meet in medical image analysis. Self-supervised learning (SSL) can help in this regard by providing a strategy to pre-train a neural network with unlabeled data, followed by fine-tuning for a downstream task with limited annotations. Contrastive learning, a particular variant of SSL, is a powerful technique for learning image-level representations. In this work, we propose strategies for extending the contrastive learning framework for segmentation of volumetric medical images in the semi-supervised setting with limited annotations, by leveraging domain-specific and problem-specific cues. Specifically, we propose (1) novel contrasting strategies that leverage structural similarity across volumetric medical images (domain-specific cue) and (2) a local version of the contrastive loss to learn distinctive representations of local regions that are useful for per-pixel segmentation (problem-specific cue). We carry out an extensive evaluation on three Magnetic Resonance Imaging (MRI) datasets. In the limited annotation setting, the proposed method yields substantial improvements compared to other self-supervision and semi-supervised learning techniques. When combined with a simple data augmentation technique, the proposed method reaches within 8% of benchmark performance using only two labeled MRI volumes for training, corresponding to only 4% (for ACDC) of the training data used to train the benchmark. The code is made public at https://github.com/krishnabits001/domain_specific_cl.

  • 4 authors
·
Jun 18, 2020

pyMEAL: A Multi-Encoder Augmentation-Aware Learning for Robust and Generalizable Medical Image Translation

Medical imaging is critical for diagnostics, but clinical adoption of advanced AI-driven imaging faces challenges due to patient variability, image artifacts, and limited model generalization. While deep learning has transformed image analysis, 3D medical imaging still suffers from data scarcity and inconsistencies due to acquisition protocols, scanner differences, and patient motion. Traditional augmentation uses a single pipeline for all transformations, disregarding the unique traits of each augmentation and struggling with large data volumes. To address these challenges, we propose a Multi-encoder Augmentation-Aware Learning (MEAL) framework that leverages four distinct augmentation variants processed through dedicated encoders. Three fusion strategies such as concatenation (CC), fusion layer (FL), and adaptive controller block (BD) are integrated to build multi-encoder models that combine augmentation-specific features before decoding. MEAL-BD uniquely preserves augmentation-aware representations, enabling robust, protocol-invariant feature learning. As demonstrated in a Computed Tomography (CT)-to-T1-weighted Magnetic Resonance Imaging (MRI) translation study, MEAL-BD consistently achieved the best performance on both unseen- and predefined-test data. On both geometric transformations (like rotations and flips) and non-augmented inputs, MEAL-BD outperformed other competing methods, achieving higher mean peak signal-to-noise ratio (PSNR) and structural similarity index measure (SSIM) scores. These results establish MEAL as a reliable framework for preserving structural fidelity and generalizing across clinically relevant variability. By reframing augmentation as a source of diverse, generalizable features, MEAL supports robust, protocol-invariant learning, advancing clinically reliable medical imaging solutions.

  • 6 authors
·
May 30, 2025

SAM2-SGP: Enhancing SAM2 for Medical Image Segmentation via Support-Set Guided Prompting

Although new vision foundation models such as Segment Anything Model 2 (SAM2) have significantly enhanced zero-shot image segmentation capabilities, reliance on human-provided prompts poses significant challenges in adapting SAM2 to medical image segmentation tasks. Moreover, SAM2's performance in medical image segmentation was limited by the domain shift issue, since it was originally trained on natural images and videos. To address these challenges, we proposed SAM2 with support-set guided prompting (SAM2-SGP), a framework that eliminated the need for manual prompts. The proposed model leveraged the memory mechanism of SAM2 to generate pseudo-masks using image-mask pairs from a support set via a Pseudo-mask Generation (PMG) module. We further introduced a novel Pseudo-mask Attention (PMA) module, which used these pseudo-masks to automatically generate bounding boxes and enhance localized feature extraction by guiding attention to relevant areas. Furthermore, a low-rank adaptation (LoRA) strategy was adopted to mitigate the domain shift issue. The proposed framework was evaluated on both 2D and 3D datasets across multiple medical imaging modalities, including fundus photography, X-ray, computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and ultrasound. The results demonstrated a significant performance improvement over state-of-the-art models, such as nnUNet and SwinUNet, as well as foundation models, such as SAM2 and MedSAM2, underscoring the effectiveness of the proposed approach. Our code is publicly available at https://github.com/astlian9/SAM_Support.

  • 4 authors
·
Jun 24, 2025

UNet++: Redesigning Skip Connections to Exploit Multiscale Features in Image Segmentation

The state-of-the-art models for medical image segmentation are variants of U-Net and fully convolutional networks (FCN). Despite their success, these models have two limitations: (1) their optimal depth is apriori unknown, requiring extensive architecture search or inefficient ensemble of models of varying depths; and (2) their skip connections impose an unnecessarily restrictive fusion scheme, forcing aggregation only at the same-scale feature maps of the encoder and decoder sub-networks. To overcome these two limitations, we propose UNet++, a new neural architecture for semantic and instance segmentation, by (1) alleviating the unknown network depth with an efficient ensemble of U-Nets of varying depths, which partially share an encoder and co-learn simultaneously using deep supervision; (2) redesigning skip connections to aggregate features of varying semantic scales at the decoder sub-networks, leading to a highly flexible feature fusion scheme; and (3) devising a pruning scheme to accelerate the inference speed of UNet++. We have evaluated UNet++ using six different medical image segmentation datasets, covering multiple imaging modalities such as computed tomography (CT), magnetic resonance imaging (MRI), and electron microscopy (EM), and demonstrating that (1) UNet++ consistently outperforms the baseline models for the task of semantic segmentation across different datasets and backbone architectures; (2) UNet++ enhances segmentation quality of varying-size objects -- an improvement over the fixed-depth U-Net; (3) Mask RCNN++ (Mask R-CNN with UNet++ design) outperforms the original Mask R-CNN for the task of instance segmentation; and (4) pruned UNet++ models achieve significant speedup while showing only modest performance degradation. Our implementation and pre-trained models are available at https://github.com/MrGiovanni/UNetPlusPlus.

  • 4 authors
·
Dec 10, 2019

Benchmarking the CoW with the TopCoW Challenge: Topology-Aware Anatomical Segmentation of the Circle of Willis for CTA and MRA

The Circle of Willis (CoW) is an important network of arteries connecting major circulations of the brain. Its vascular architecture is believed to affect the risk, severity, and clinical outcome of serious neurovascular diseases. However, characterizing the highly variable CoW anatomy is still a manual and time-consuming expert task. The CoW is usually imaged by two non-invasive angiographic imaging modalities, magnetic resonance angiography (MRA) and computed tomography angiography (CTA), but there exist limited datasets with annotations on CoW anatomy, especially for CTA. Therefore, we organized the TopCoW challenge with the release of an annotated CoW dataset. The TopCoW dataset is the first public dataset with voxel-level annotations for 13 CoW vessel components, enabled by virtual reality technology. It is also the first large dataset using 200 pairs of MRA and CTA from the same patients. As part of the benchmark, we invited submissions worldwide and attracted over 250 registered participants from six continents. The submissions were evaluated on both internal and external test datasets of 226 scans from over five centers. The top performing teams achieved over 90% Dice scores at segmenting the CoW components, over 80% F1 scores at detecting key CoW components, and over 70% balanced accuracy at classifying CoW variants for nearly all test sets. The best algorithms also showed clinical potential in classifying fetal-type posterior cerebral artery and locating aneurysms with CoW anatomy. TopCoW demonstrated the utility and versatility of CoW segmentation algorithms for a wide range of downstream clinical applications with explainability. The annotated datasets and best performing algorithms have been released as public Zenodo records to foster further methodological development and clinical tool building.

  • 113 authors
·
Dec 29, 2023

Joint Liver and Hepatic Lesion Segmentation in MRI using a Hybrid CNN with Transformer Layers

Deep learning-based segmentation of the liver and hepatic lesions therein steadily gains relevance in clinical practice due to the increasing incidence of liver cancer each year. Whereas various network variants with overall promising results in the field of medical image segmentation have been successfully developed over the last years, almost all of them struggle with the challenge of accurately segmenting hepatic lesions in magnetic resonance imaging (MRI). This led to the idea of combining elements of convolutional and transformer-based architectures to overcome the existing limitations. This work presents a hybrid network called SWTR-Unet, consisting of a pretrained ResNet, transformer blocks as well as a common Unet-style decoder path. This network was primarily applied to single-modality non-contrast-enhanced liver MRI and additionally to the publicly available computed tomography (CT) data of the liver tumor segmentation (LiTS) challenge to verify the applicability on other modalities. For a broader evaluation, multiple state-of-the-art networks were implemented and applied, ensuring a direct comparability. Furthermore, correlation analysis and an ablation study were carried out, to investigate various influencing factors on the segmentation accuracy of the presented method. With Dice scores of averaged 98+-2% for liver and 81+-28% lesion segmentation on the MRI dataset and 97+-2% and 79+-25%, respectively on the CT dataset, the proposed SWTR-Unet proved to be a precise approach for liver and hepatic lesion segmentation with state-of-the-art results for MRI and competing accuracy in CT imaging. The achieved segmentation accuracy was found to be on par with manually performed expert segmentations as indicated by inter-observer variabilities for liver lesion segmentation. In conclusion, the presented method could save valuable time and resources in clinical practice.

  • 7 authors
·
Jan 26, 2022

Decipher-MR: A Vision-Language Foundation Model for 3D MRI Representations

Magnetic Resonance Imaging (MRI) is a critical medical imaging modality in clinical diagnosis and research, yet its complexity and heterogeneity pose challenges for automated analysis, particularly in scalable and generalizable machine learning applications. While foundation models have revolutionized natural language and vision tasks, their application to MRI remains limited due to data scarcity and narrow anatomical focus. In this work, we present Decipher-MR, a 3D MRI-specific vision-language foundation model trained on a large-scale dataset comprising 200,000 MRI series from over 22,000 studies spanning diverse anatomical regions, sequences, and pathologies. Decipher-MR integrates self-supervised vision learning with report-guided text supervision to build robust, generalizable representations, enabling effective adaptation across broad applications. To enable robust and diverse clinical tasks with minimal computational overhead, Decipher-MR supports a modular design that enables tuning of lightweight, task-specific decoders attached to a frozen pretrained encoder. Following this setting, we evaluate Decipher-MR across diverse benchmarks including disease classification, demographic prediction, anatomical localization, and cross-modal retrieval, demonstrating consistent performance gains over existing foundation models and task-specific approaches. Our results establish Decipher-MR as a scalable and versatile foundation for MRI-based AI, facilitating efficient development across clinical and research domains.

  • 14 authors
·
Sep 25, 2025

MRI Super-Resolution with Deep Learning: A Comprehensive Survey

High-resolution (HR) magnetic resonance imaging (MRI) is crucial for many clinical and research applications. However, achieving it remains costly and constrained by technical trade-offs and experimental limitations. Super-resolution (SR) presents a promising computational approach to overcome these challenges by generating HR images from more affordable low-resolution (LR) scans, potentially improving diagnostic accuracy and efficiency without requiring additional hardware. This survey reviews recent advances in MRI SR techniques, with a focus on deep learning (DL) approaches. It examines DL-based MRI SR methods from the perspectives of computer vision, computational imaging, inverse problems, and MR physics, covering theoretical foundations, architectural designs, learning strategies, benchmark datasets, and performance metrics. We propose a systematic taxonomy to categorize these methods and present an in-depth study of both established and emerging SR techniques applicable to MRI, considering unique challenges in clinical and research contexts. We also highlight open challenges and directions that the community needs to address. Additionally, we provide a collection of essential open-access resources, tools, and tutorials, available on our GitHub: https://github.com/mkhateri/Awesome-MRI-Super-Resolution. IEEE keywords: MRI, Super-Resolution, Deep Learning, Computational Imaging, Inverse Problem, Survey.

Harvard Harvard University
·
Nov 20, 2025 2

A Tutorial on MRI Reconstruction: From Modern Methods to Clinical Implications

MRI is an indispensable clinical tool, offering a rich variety of tissue contrasts to support broad diagnostic and research applications. Clinical exams routinely acquire multiple structural sequences that provide complementary information for differential diagnosis, while research protocols often incorporate advanced functional, diffusion, spectroscopic, and relaxometry sequences to capture multidimensional insights into tissue structure and composition. However, these capabilities come at the cost of prolonged scan times, which reduce patient throughput, increase susceptibility to motion artifacts, and may require trade-offs in image quality or diagnostic scope. Over the last two decades, advances in image reconstruction algorithms--alongside improvements in hardware and pulse sequence design--have made it possible to accelerate acquisitions while preserving diagnostic quality. Central to this progress is the ability to incorporate prior information to regularize the solutions to the reconstruction problem. In this tutorial, we overview the basics of MRI reconstruction and highlight state-of-the-art approaches, beginning with classical methods that rely on explicit hand-crafted priors, and then turning to deep learning methods that leverage a combination of learned and crafted priors to further push the performance envelope. We also explore the translational aspects and eventual clinical implications of these methods. We conclude by discussing future directions to address remaining challenges in MRI reconstruction. The tutorial is accompanied by a Python toolbox (https://github.com/tutorial-MRI-recon/tutorial) to demonstrate select methods discussed in the article.

  • 7 authors
·
Jul 22, 2025

Multimodal Masked Autoencoder Pre-training for 3D MRI-Based Brain Tumor Analysis with Missing Modalities

Multimodal magnetic resonance imaging (MRI) constitutes the first line of investigation for clinicians in the care of brain tumors, providing crucial insights for surgery planning, treatment monitoring, and biomarker identification. Pre-training on large datasets have been shown to help models learn transferable representations and adapt with minimal labeled data. This behavior is especially valuable in medical imaging, where annotations are often scarce. However, applying this paradigm to multimodal medical data introduces a challenge: most existing approaches assume that all imaging modalities are available during both pre-training and fine-tuning. In practice, missing modalities often occur due to acquisition issues, specialist unavailability, or specific experimental designs on small in-house datasets. Consequently, a common approach involves training a separate model for each desired modality combination, making the process both resource-intensive and impractical for clinical use. Therefore, we introduce BM-MAE, a masked image modeling pre-training strategy tailored for multimodal MRI data. The same pre-trained model seamlessly adapts to any combination of available modalities, extracting rich representations that capture both intra- and inter-modal information. This allows fine-tuning on any subset of modalities without requiring architectural changes, while still benefiting from a model pre-trained on the full set of modalities. Extensive experiments show that the proposed pre-training strategy outperforms or remains competitive with baselines that require separate pre-training for each modality subset, while substantially surpassing training from scratch on several downstream tasks. Additionally, it can quickly and efficiently reconstruct missing modalities, highlighting its practical value. Code and trained models are available at: https://github.com/Lucas-rbnt/BM-MAE

  • 3 authors
·
May 1, 2025

DDoS-UNet: Incorporating temporal information using Dynamic Dual-channel UNet for enhancing super-resolution of dynamic MRI

Magnetic resonance imaging (MRI) provides high spatial resolution and excellent soft-tissue contrast without using harmful ionising radiation. Dynamic MRI is an essential tool for interventions to visualise movements or changes of the target organ. However, such MRI acquisition with high temporal resolution suffers from limited spatial resolution - also known as the spatio-temporal trade-off of dynamic MRI. Several approaches, including deep learning based super-resolution approaches, have been proposed to mitigate this trade-off. Nevertheless, such an approach typically aims to super-resolve each time-point separately, treating them as individual volumes. This research addresses the problem by creating a deep learning model which attempts to learn both spatial and temporal relationships. A modified 3D UNet model, DDoS-UNet, is proposed - which takes the low-resolution volume of the current time-point along with a prior image volume. Initially, the network is supplied with a static high-resolution planning scan as the prior image along with the low-resolution input to super-resolve the first time-point. Then it continues step-wise by using the super-resolved time-points as the prior image while super-resolving the subsequent time-points. The model performance was tested with 3D dynamic data that was undersampled to different in-plane levels. The proposed network achieved an average SSIM value of 0.951pm0.017 while reconstructing the lowest resolution data (i.e. only 4\% of the k-space acquired) - which could result in a theoretical acceleration factor of 25. The proposed approach can be used to reduce the required scan-time while achieving high spatial resolution.

  • 5 authors
·
Feb 10, 2022

The Imaging Database for Epilepsy And Surgery (IDEAS)

Magnetic resonance imaging (MRI) is a crucial tool to identify brain abnormalities in a wide range of neurological disorders. In focal epilepsy MRI is used to identify structural cerebral abnormalities. For covert lesions, machine learning and artificial intelligence algorithms may improve lesion detection if abnormalities are not evident on visual inspection. The success of this approach depends on the volume and quality of training data. Herein, we release an open-source dataset of preprocessed MRI scans from 442 individuals with drug-refractory focal epilepsy who had neurosurgical resections, and detailed demographic information. The MRI scan data includes the preoperative 3D T1 and where available 3D FLAIR, as well as a manually inspected complete surface reconstruction and volumetric parcellations. Demographic information includes age, sex, age of onset of epilepsy, location of surgery, histopathology of resected specimen, occurrence and frequency of focal seizures with and without impairment of awareness, focal to bilateral tonic-clonic seizures, number of anti-seizure medications (ASMs) at time of surgery, and a total of 1764 patient years of post-surgical follow up. Crucially, we also include resection masks delineated from post-surgical imaging. To demonstrate the veracity of our data, we successfully replicated previous studies showing long-term outcomes of seizure freedom in the range of around 50%. Our imaging data replicates findings of group level atrophy in patients compared to controls. Resection locations in the cohort were predominantly in the temporal and frontal lobes. We envisage our dataset, shared openly with the community, will catalyse the development and application of computational methods in clinical neurology.

  • 15 authors
·
Jun 10, 2024

Benchmarking Vanilla GAN, DCGAN, and WGAN Architectures for MRI Reconstruction: A Quantitative Analysis

Magnetic Resonance Imaging (MRI) is a crucial imaging modality for viewing internal body structures. This research work analyses the performance of popular GAN models for accurate and precise MRI reconstruction by enhancing image quality and improving diagnostic accuracy. Three GAN architectures considered in this study are Vanilla GAN, Deep Convolutional GAN (DCGAN), and Wasserstein GAN (WGAN). They were trained and evaluated using knee, brain, and cardiac MRI datasets to assess their generalizability across body regions. While the Vanilla GAN operates on the fundamentals of the adversarial network setup, DCGAN advances image synthesis by securing the convolutional layers, giving a superior appearance to the prevalent spatial features. Training instability is resolved in WGAN through the Wasserstein distance to minimize an unstable regime, therefore, ensuring stable convergence and high-quality images. The GAN models were trained and tested using 1000 MR images of an anonymized knee, 805 images of Heart, 90 images of Brain MRI dataset. The Structural Similarity Index (SSIM) for Vanilla GAN is 0.84, DCGAN is 0.97, and WGAN is 0.99. The Peak Signal to Noise Ratio (PSNR) for Vanilla GAN is 26, DCGAN is 49.3, and WGAN is 43.5. The results were further statistically validated. This study shows that DCGAN and WGAN-based frameworks are promising in MR image reconstruction because of good image quality and superior accuracy. With the first cross-organ benchmark of baseline GANs under a common preprocessing pipeline, this work provides a reproducible benchmark for future hybrid GANs and clinical MRI applications.

  • 5 authors
·
Jan 30

ReconResNet: Regularised Residual Learning for MR Image Reconstruction of Undersampled Cartesian and Radial Data

MRI is an inherently slow process, which leads to long scan time for high-resolution imaging. The speed of acquisition can be increased by ignoring parts of the data (undersampling). Consequently, this leads to the degradation of image quality, such as loss of resolution or introduction of image artefacts. This work aims to reconstruct highly undersampled Cartesian or radial MR acquisitions, with better resolution and with less to no artefact compared to conventional techniques like compressed sensing. In recent times, deep learning has emerged as a very important area of research and has shown immense potential in solving inverse problems, e.g. MR image reconstruction. In this paper, a deep learning based MR image reconstruction framework is proposed, which includes a modified regularised version of ResNet as the network backbone to remove artefacts from the undersampled image, followed by data consistency steps that fusions the network output with the data already available from undersampled k-space in order to further improve reconstruction quality. The performance of this framework for various undersampling patterns has also been tested, and it has been observed that the framework is robust to deal with various sampling patterns, even when mixed together while training, and results in very high quality reconstruction, in terms of high SSIM (highest being 0.990pm0.006 for acceleration factor of 3.5), while being compared with the fully sampled reconstruction. It has been shown that the proposed framework can successfully reconstruct even for an acceleration factor of 20 for Cartesian (0.968pm0.005) and 17 for radially (0.962pm0.012) sampled data. Furthermore, it has been shown that the framework preserves brain pathology during reconstruction while being trained on healthy subjects.

  • 7 authors
·
Mar 16, 2021

Surface-based parcellation and vertex-wise analysis of ultra high-resolution ex vivo 7 tesla MRI in Alzheimer's disease and related dementias

Magnetic resonance imaging (MRI) is the standard modality to understand human brain structure and function in vivo (antemortem). Decades of research in human neuroimaging has led to the widespread development of methods and tools to provide automated volume-based segmentations and surface-based parcellations which help localize brain functions to specialized anatomical regions. Recently ex vivo (postmortem) imaging of the brain has opened-up avenues to study brain structure at sub-millimeter ultra high-resolution revealing details not possible to observe with in vivo MRI. Unfortunately, there has been limited methodological development in ex vivo MRI primarily due to lack of datasets and limited centers with such imaging resources. Therefore, in this work, we present one-of-its-kind dataset of 82 ex vivo T2w whole brain hemispheres MRI at 0.3 mm isotropic resolution spanning Alzheimer's disease and related dementias. We adapted and developed a fast and easy-to-use automated surface-based pipeline to parcellate, for the first time, ultra high-resolution ex vivo brain tissue at the native subject space resolution using the Desikan-Killiany-Tourville (DKT) brain atlas. This allows us to perform vertex-wise analysis in the template space and thereby link morphometry measures with pathology measurements derived from histology. We will open-source our dataset docker container, Jupyter notebooks for ready-to-use out-of-the-box set of tools and command line options to advance ex vivo MRI clinical brain imaging research on the project webpage.

  • 23 authors
·
Mar 28, 2024

M3Ret: Unleashing Zero-shot Multimodal Medical Image Retrieval via Self-Supervision

Medical image retrieval is essential for clinical decision-making and translational research, relying on discriminative visual representations. Yet, current methods remain fragmented, relying on separate architectures and training strategies for 2D, 3D, and video-based medical data. This modality-specific design hampers scalability and inhibits the development of unified representations. To enable unified learning, we curate a large-scale hybrid-modality dataset comprising 867,653 medical imaging samples, including 2D X-rays and ultrasounds, RGB endoscopy videos, and 3D CT scans. Leveraging this dataset, we train M3Ret, a unified visual encoder without any modality-specific customization. It successfully learns transferable representations using both generative (MAE) and contrastive (SimDINO) self-supervised learning (SSL) paradigms. Our approach sets a new state-of-the-art in zero-shot image-to-image retrieval across all individual modalities, surpassing strong baselines such as DINOv3 and the text-supervised BMC-CLIP. More remarkably, strong cross-modal alignment emerges without paired data, and the model generalizes to unseen MRI tasks, despite never observing MRI during pretraining, demonstrating the generalizability of purely visual self-supervision to unseen modalities. Comprehensive analyses further validate the scalability of our framework across model and data sizes. These findings deliver a promising signal to the medical imaging community, positioning M3Ret as a step toward foundation models for visual SSL in multimodal medical image understanding.

  • 8 authors
·
Sep 1, 2025 1

Learning Modality-Aware Representations: Adaptive Group-wise Interaction Network for Multimodal MRI Synthesis

Multimodal MR image synthesis aims to generate missing modality images by effectively fusing and mapping from a subset of available MRI modalities. Most existing methods adopt an image-to-image translation paradigm, treating multiple modalities as input channels. However, these approaches often yield sub-optimal results due to the inherent difficulty in achieving precise feature- or semantic-level alignment across modalities. To address these challenges, we propose an Adaptive Group-wise Interaction Network (AGI-Net) that explicitly models both inter-modality and intra-modality relationships for multimodal MR image synthesis. Specifically, feature channels are first partitioned into predefined groups, after which an adaptive rolling mechanism is applied to conventional convolutional kernels to better capture feature and semantic correspondences between different modalities. In parallel, a cross-group attention module is introduced to enable effective feature fusion across groups, thereby enhancing the network's representational capacity. We validate the proposed AGI-Net on the publicly available IXI and BraTS2023 datasets. Experimental results demonstrate that AGI-Net achieves state-of-the-art performance in multimodal MR image synthesis tasks, confirming the effectiveness of its modality-aware interaction design. We release the relevant code at: https://github.com/zunzhumu/Adaptive-Group-wise-Interaction-Network-for-Multimodal-MRI-Synthesis.git.

  • 9 authors
·
Nov 21, 2024

CineMA: A Foundation Model for Cine Cardiac MRI

Cardiac magnetic resonance (CMR) is a key investigation in clinical cardiovascular medicine and has been used extensively in population research. However, extracting clinically important measurements such as ejection fraction for diagnosing cardiovascular diseases remains time-consuming and subjective. We developed CineMA, a foundation AI model automating these tasks with limited labels. CineMA is a self-supervised autoencoder model trained on 74,916 cine CMR studies to reconstruct images from masked inputs. After fine-tuning, it was evaluated across eight datasets on 23 tasks from four categories: ventricle and myocardium segmentation, left and right ventricle ejection fraction calculation, disease detection and classification, and landmark localisation. CineMA is the first foundation model for cine CMR to match or outperform convolutional neural networks (CNNs). CineMA demonstrated greater label efficiency than CNNs, achieving comparable or better performance with fewer annotations. This reduces the burden of clinician labelling and supports replacing task-specific training with fine-tuning foundation models in future cardiac imaging applications. Models and code for pre-training and fine-tuning are available at https://github.com/mathpluscode/CineMA, democratising access to high-performance models that otherwise require substantial computational resources, promoting reproducibility and accelerating clinical translation.

  • 9 authors
·
May 31, 2025

SCOPE-MRI: Bankart Lesion Detection as a Case Study in Data Curation and Deep Learning for Challenging Diagnoses

Deep learning has shown strong performance in musculoskeletal imaging, but prior work has largely targeted conditions where diagnosis is relatively straightforward. More challenging problems remain underexplored, such as detecting Bankart lesions (anterior-inferior glenoid labral tears) on standard MRIs. These lesions are difficult to diagnose due to subtle imaging features, often necessitating invasive MRI arthrograms (MRAs). We introduce ScopeMRI, the first publicly available, expert-annotated dataset for shoulder pathologies, and present a deep learning framework for Bankart lesion detection on both standard MRIs and MRAs. ScopeMRI contains shoulder MRIs from patients who underwent arthroscopy, providing ground-truth labels from intraoperative findings, the diagnostic gold standard. Separate models were trained for MRIs and MRAs using CNN- and transformer-based architectures, with predictions ensembled across multiple imaging planes. Our models achieved radiologist-level performance, with accuracy on standard MRIs surpassing radiologists interpreting MRAs. External validation on independent hospital data demonstrated initial generalizability across imaging protocols. By releasing ScopeMRI and a modular codebase for training and evaluation, we aim to accelerate research in musculoskeletal imaging and foster development of datasets and models that address clinically challenging diagnostic tasks.

  • 7 authors
·
Apr 29, 2025

TrackRAD2025 challenge dataset: Real-time tumor tracking for MRI-guided radiotherapy

Purpose: Magnetic resonance imaging (MRI) to visualize anatomical motion is becoming increasingly important when treating cancer patients with radiotherapy. Hybrid MRI-linear accelerator (MRI-linac) systems allow real-time motion management during irradiation. This paper presents a multi-institutional real-time MRI time series dataset from different MRI-linac vendors. The dataset is designed to support developing and evaluating real-time tumor localization (tracking) algorithms for MRI-guided radiotherapy within the TrackRAD2025 challenge (https://trackrad2025.grand-challenge.org/). Acquisition and validation methods: The dataset consists of sagittal 2D cine MRIs in 585 patients from six centers (3 Dutch, 1 German, 1 Australian, and 1 Chinese). Tumors in the thorax, abdomen, and pelvis acquired on two commercially available MRI-linacs (0.35 T and 1.5 T) were included. For 108 cases, irradiation targets or tracking surrogates were manually segmented on each temporal frame. The dataset was randomly split into a public training set of 527 cases (477 unlabeled and 50 labeled) and a private testing set of 58 cases (all labeled). Data Format and Usage Notes: The data is publicly available under the TrackRAD2025 collection: https://doi.org/10.57967/hf/4539. Both the images and segmentations for each patient are available in metadata format. Potential Applications: This novel clinical dataset will enable the development and evaluation of real-time tumor localization algorithms for MRI-guided radiotherapy. By enabling more accurate motion management and adaptive treatment strategies, this dataset has the potential to advance the field of radiotherapy significantly.

  • 28 authors
·
Mar 24, 2025

Brain-ID: Learning Contrast-agnostic Anatomical Representations for Brain Imaging

Recent learning-based approaches have made astonishing advances in calibrated medical imaging like computerized tomography (CT), yet they struggle to generalize in uncalibrated modalities -- notably magnetic resonance (MR) imaging, where performance is highly sensitive to the differences in MR contrast, resolution, and orientation. This prevents broad applicability to diverse real-world clinical protocols. We introduce Brain-ID, an anatomical representation learning model for brain imaging. With the proposed "mild-to-severe" intra-subject generation, Brain-ID is robust to the subject-specific brain anatomy regardless of the appearance of acquired images (e.g., contrast, deformation, resolution, artifacts). Trained entirely on synthetic data, Brain-ID readily adapts to various downstream tasks through only one layer. We present new metrics to validate the intra- and inter-subject robustness of Brain-ID features, and evaluate their performance on four downstream applications, covering contrast-independent (anatomy reconstruction/contrast synthesis, brain segmentation), and contrast-dependent (super-resolution, bias field estimation) tasks. Extensive experiments on six public datasets demonstrate that Brain-ID achieves state-of-the-art performance in all tasks on different MRI modalities and CT, and more importantly, preserves its performance on low-resolution and small datasets. Code is available at https://github.com/peirong26/Brain-ID.

  • 5 authors
·
Nov 28, 2023

Cross-modality (CT-MRI) prior augmented deep learning for robust lung tumor segmentation from small MR datasets

Lack of large expert annotated MR datasets makes training deep learning models difficult. Therefore, a cross-modality (MR-CT) deep learning segmentation approach that augments training data using pseudo MR images produced by transforming expert-segmented CT images was developed. Eighty-One T2-weighted MRI scans from 28 patients with non-small cell lung cancers were analyzed. Cross-modality prior encoding the transformation of CT to pseudo MR images resembling T2w MRI was learned as a generative adversarial deep learning model. This model augmented training data arising from 6 expert-segmented T2w MR patient scans with 377 pseudo MRI from non-small cell lung cancer CT patient scans with obtained from the Cancer Imaging Archive. A two-dimensional Unet implemented with batch normalization was trained to segment the tumors from T2w MRI. This method was benchmarked against (a) standard data augmentation and two state-of-the art cross-modality pseudo MR-based augmentation and (b) two segmentation networks. Segmentation accuracy was computed using Dice similarity coefficient (DSC), Hausdroff distance metrics, and volume ratio. The proposed approach produced the lowest statistical variability in the intensity distribution between pseudo and T2w MR images measured as Kullback-Leibler divergence of 0.069. This method produced the highest segmentation accuracy with a DSC of 0.75 and the lowest Hausdroff distance on the test dataset. This approach produced highly similar estimations of tumor growth as an expert (P = 0.37). A novel deep learning MR segmentation was developed that overcomes the limitation of learning robust models from small datasets by leveraging learned cross-modality priors to augment training. The results show the feasibility of the approach and the corresponding improvement over the state-of-the-art methods.

  • 7 authors
·
Jan 31, 2019

GBT-SAM: Adapting a Foundational Deep Learning Model for Generalizable Brain Tumor Segmentation via Efficient Integration of Multi-Parametric MRI Data

Gliomas are aggressive brain tumors that require accurate imaging-based diagnosis, with segmentation playing a critical role in evaluating morphology and treatment decisions. Manual delineation of gliomas is time-consuming and prone to variability, motivating the use of deep learning to improve consistency and alleviate clinical workload. However, existing methods often fail to fully exploit the information available in multi-parametric MRI (mp-MRI), particularly inter-slice contextual features, and typically require considerable computational resources while lacking robustness across tumor type variations. We present GBT-SAM, a parameter-efficient deep learning framework that adapts the Segment Anything Model (SAM), a large-scale vision model, to volumetric mp-MRI data. GBT-SAM reduces input complexity by selecting fewer than 2.6\% of slices per scan while incorporating all four MRI modalities, preserving essential tumor-related information with minimal cost. Furthermore, our model is trained by a two-step fine-tuning strategy that incorporates a depth-aware module to capture inter-slice correlations and lightweight adaptation layers, resulting in just 6.5M trainable parameters, which is the lowest among SAM-based approaches. GBT-SAM achieves a Dice Score of 93.54 on the BraTS Adult Glioma dataset and demonstrates robust performance on Meningioma, Pediatric Glioma, and Sub-Saharan Glioma datasets. These results highlight GBT-SAM's potential as a computationally efficient and domain-robust framework for brain tumor segmentation using mp-MRI. Our code and models are available at https://github.com/vpulab/med-sam-brain .

  • 5 authors
·
Mar 6, 2025

Fine-tuning deep learning model parameters for improved super-resolution of dynamic MRI with prior-knowledge

Dynamic imaging is a beneficial tool for interventions to assess physiological changes. Nonetheless during dynamic MRI, while achieving a high temporal resolution, the spatial resolution is compromised. To overcome this spatio-temporal trade-off, this research presents a super-resolution (SR) MRI reconstruction with prior knowledge based fine-tuning to maximise spatial information while reducing the required scan-time for dynamic MRIs. An U-Net based network with perceptual loss is trained on a benchmark dataset and fine-tuned using one subject-specific static high resolution MRI as prior knowledge to obtain high resolution dynamic images during the inference stage. 3D dynamic data for three subjects were acquired with different parameters to test the generalisation capabilities of the network. The method was tested for different levels of in-plane undersampling for dynamic MRI. The reconstructed dynamic SR results after fine-tuning showed higher similarity with the high resolution ground-truth, while quantitatively achieving statistically significant improvement. The average SSIM of the lowest resolution experimented during this research (6.25~\% of the k-space) before and after fine-tuning were 0.939 pm 0.008 and 0.957 pm 0.006 respectively. This could theoretically result in an acceleration factor of 16, which can potentially be acquired in less than half a second. The proposed approach shows that the super-resolution MRI reconstruction with prior-information can alleviate the spatio-temporal trade-off in dynamic MRI, even for high acceleration factors.

  • 6 authors
·
Feb 4, 2021

Reconstructing unseen modalities and pathology with an efficient Recurrent Inference Machine

Objective: To allow efficient learning using the Recurrent Inference Machine (RIM) for image reconstruction whereas not being strictly dependent on the training data distribution so that unseen modalities and pathologies are still accurately recovered. Methods: Theoretically, the RIM learns to solve the inverse problem of accelerated-MRI reconstruction whereas being robust to variable imaging conditions. The efficiency and generalization capabilities with different training datasets were studied, as well as recurrent network units with decreasing complexity: the Gated Recurrent Unit (GRU), the Minimal Gated Unit (MGU), and the Independently Recurrent Neural Network (IndRNN), to reduce inference times. Validation was performed against Compressed Sensing (CS) and further assessed based on data unseen during training. A pathology study was conducted by reconstructing simulated white matter lesions and prospectively undersampled data of a Multiple Sclerosis patient. Results: Training on a single modality of 3T T_1-weighted brain data appeared sufficient to also reconstruct 7T T_{2}^*-weighted brain and 3T T_2-weighted knee data. The IndRNN is an efficient recurrent unit, reducing inference time by 68\% compared to CS, whereas maintaining performance. The RIM was able to reconstruct lesions unseen during training more accurately than CS when trained on T_2-weighted knee data. Training on T_1-weighted brain data and on combined data slightly enhanced the signal compared to CS. Conclusion: The RIM is efficient when decreasing its complexity, which reduces the inference time, whereas still being able to reconstruct data and pathology that was unseen during training.

  • 7 authors
·
Dec 14, 2020

CuNeRF: Cube-Based Neural Radiance Field for Zero-Shot Medical Image Arbitrary-Scale Super Resolution

Medical image arbitrary-scale super-resolution (MIASSR) has recently gained widespread attention, aiming to super sample medical volumes at arbitrary scales via a single model. However, existing MIASSR methods face two major limitations: (i) reliance on high-resolution (HR) volumes and (ii) limited generalization ability, which restricts their application in various scenarios. To overcome these limitations, we propose Cube-based Neural Radiance Field (CuNeRF), a zero-shot MIASSR framework that can yield medical images at arbitrary scales and viewpoints in a continuous domain. Unlike existing MIASSR methods that fit the mapping between low-resolution (LR) and HR volumes, CuNeRF focuses on building a coordinate-intensity continuous representation from LR volumes without the need for HR references. This is achieved by the proposed differentiable modules: including cube-based sampling, isotropic volume rendering, and cube-based hierarchical rendering. Through extensive experiments on magnetic resource imaging (MRI) and computed tomography (CT) modalities, we demonstrate that CuNeRF outperforms state-of-the-art MIASSR methods. CuNeRF yields better visual verisimilitude and reduces aliasing artifacts at various upsampling factors. Moreover, our CuNeRF does not need any LR-HR training pairs, which is more flexible and easier to be used than others. Our code will be publicly available soon.

  • 4 authors
·
Mar 28, 2023

MapGlue: Multimodal Remote Sensing Image Matching

Multimodal remote sensing image (MRSI) matching is pivotal for cross-modal fusion, localization, and object detection, but it faces severe challenges due to geometric, radiometric, and viewpoint discrepancies across imaging modalities. Existing unimodal datasets lack scale and diversity, limiting deep learning solutions. This paper proposes MapGlue, a universal MRSI matching framework, and MapData, a large-scale multimodal dataset addressing these gaps. Our contributions are twofold. MapData, a globally diverse dataset spanning 233 sampling points, offers original images (7,000x5,000 to 20,000x15,000 pixels). After rigorous cleaning, it provides 121,781 aligned electronic map-visible image pairs (512x512 pixels) with hybrid manual-automated ground truth, addressing the scarcity of scalable multimodal benchmarks. MapGlue integrates semantic context with a dual graph-guided mechanism to extract cross-modal invariant features. This structure enables global-to-local interaction, enhancing descriptor robustness against modality-specific distortions. Extensive evaluations on MapData and five public datasets demonstrate MapGlue's superiority in matching accuracy under complex conditions, outperforming state-of-the-art methods. Notably, MapGlue generalizes effectively to unseen modalities without retraining, highlighting its adaptability. This work addresses longstanding challenges in MRSI matching by combining scalable dataset construction with a robust, semantics-driven framework. Furthermore, MapGlue shows strong generalization capabilities on other modality matching tasks for which it was not specifically trained. The dataset and code are available at https://github.com/PeihaoWu/MapGlue.

  • 7 authors
·
Mar 20, 2025

A Novel Center-based Deep Contrastive Metric Learning Method for the Detection of Polymicrogyria in Pediatric Brain MRI

Polymicrogyria (PMG) is a disorder of cortical organization mainly seen in children, which can be associated with seizures, developmental delay and motor weakness. PMG is typically diagnosed on magnetic resonance imaging (MRI) but some cases can be challenging to detect even for experienced radiologists. In this study, we create an open pediatric MRI dataset (PPMR) with PMG and controls from the Children's Hospital of Eastern Ontario (CHEO), Ottawa, Canada. The differences between PMG MRIs and control MRIs are subtle and the true distribution of the features of the disease is unknown. This makes automatic detection of cases of potential PMG in MRI difficult. We propose an anomaly detection method based on a novel center-based deep contrastive metric learning loss function (cDCM) which enables the automatic detection of cases of potential PMG. Additionally, based on our proposed loss function, we customize a deep learning model structure that integrates dilated convolution, squeeze-and-excitation blocks and feature fusion for our PPMR dataset. Despite working with a small and imbalanced dataset our method achieves 92.01% recall at 55.04% precision. This will facilitate a computer aided tool for radiologists to select potential PMG MRIs. To the best of our knowledge, this research is the first to apply machine learning techniques to identify PMG from MRI only.

  • 3 authors
·
Nov 22, 2022

Generating Synthetic Computed Tomography for Radiotherapy: SynthRAD2023 Challenge Report

Radiation therapy plays a crucial role in cancer treatment, necessitating precise delivery of radiation to tumors while sparing healthy tissues over multiple days. Computed tomography (CT) is integral for treatment planning, offering electron density data crucial for accurate dose calculations. However, accurately representing patient anatomy is challenging, especially in adaptive radiotherapy, where CT is not acquired daily. Magnetic resonance imaging (MRI) provides superior soft-tissue contrast. Still, it lacks electron density information while cone beam CT (CBCT) lacks direct electron density calibration and is mainly used for patient positioning. Adopting MRI-only or CBCT-based adaptive radiotherapy eliminates the need for CT planning but presents challenges. Synthetic CT (sCT) generation techniques aim to address these challenges by using image synthesis to bridge the gap between MRI, CBCT, and CT. The SynthRAD2023 challenge was organized to compare synthetic CT generation methods using multi-center ground truth data from 1080 patients, divided into two tasks: 1) MRI-to-CT and 2) CBCT-to-CT. The evaluation included image similarity and dose-based metrics from proton and photon plans. The challenge attracted significant participation, with 617 registrations and 22/17 valid submissions for tasks 1/2. Top-performing teams achieved high structural similarity indices (>0.87/0.90) and gamma pass rates for photon (>98.1%/99.0%) and proton (>99.0%/97.3%) plans. However, no significant correlation was found between image similarity metrics and dose accuracy, emphasizing the need for dose evaluation when assessing the clinical applicability of sCT. SynthRAD2023 facilitated the investigation and benchmarking of sCT generation techniques, providing insights for developing MRI-only and CBCT-based adaptive radiotherapy.

  • 59 authors
·
Mar 13, 2024

MRMR: A Realistic and Expert-Level Multidisciplinary Benchmark for Reasoning-Intensive Multimodal Retrieval

We introduce MRMR, the first expert-level multidisciplinary multimodal retrieval benchmark requiring intensive reasoning. MRMR contains 1,502 queries spanning 23 domains, with positive documents carefully verified by human experts. Compared to prior benchmarks, MRMR introduces three key advancements. First, it challenges retrieval systems across diverse areas of expertise, enabling fine-grained model comparison across domains. Second, queries are reasoning-intensive, with images requiring deeper interpretation such as diagnosing microscopic slides. We further introduce Contradiction Retrieval, a novel task requiring models to identify conflicting concepts. Finally, queries and documents are constructed as image-text interleaved sequences. Unlike earlier benchmarks restricted to single images or unimodal documents, MRMR offers a realistic setting with multi-image queries and mixed-modality corpus documents. We conduct an extensive evaluation of 4 categories of multimodal retrieval systems and 14 frontier models on MRMR. The text embedding model Qwen3-Embedding with LLM-generated image captions achieves the highest performance, highlighting substantial room for improving multimodal retrieval models. Although latest multimodal models such as Ops-MM-Embedding perform competitively on expert-domain queries, they fall short on reasoning-intensive tasks. We believe that MRMR paves the way for advancing multimodal retrieval in more realistic and challenging scenarios.

  • 8 authors
·
Oct 10, 2025 2

Invisible Attributes, Visible Biases: Exploring Demographic Shortcuts in MRI-based Alzheimer's Disease Classification

Magnetic resonance imaging (MRI) is the gold standard for brain imaging. Deep learning (DL) algorithms have been proposed to aid in the diagnosis of diseases such as Alzheimer's disease (AD) from MRI scans. However, DL algorithms can suffer from shortcut learning, in which spurious features, not directly related to the output label, are used for prediction. When these features are related to protected attributes, they can lead to performance bias against underrepresented protected groups, such as those defined by race and sex. In this work, we explore the potential for shortcut learning and demographic bias in DL based AD diagnosis from MRI. We first investigate if DL algorithms can identify race or sex from 3D brain MRI scans to establish the presence or otherwise of race and sex based distributional shifts. Next, we investigate whether training set imbalance by race or sex can cause a drop in model performance, indicating shortcut learning and bias. Finally, we conduct a quantitative and qualitative analysis of feature attributions in different brain regions for both the protected attribute and AD classification tasks. Through these experiments, and using multiple datasets and DL models (ResNet and SwinTransformer), we demonstrate the existence of both race and sex based shortcut learning and bias in DL based AD classification. Our work lays the foundation for fairer DL diagnostic tools in brain MRI. The code is provided at https://github.com/acharaakshit/ShortMR

  • 4 authors
·
Sep 11, 2025

MRSegmentator: Robust Multi-Modality Segmentation of 40 Classes in MRI and CT Sequences

Purpose: To introduce a deep learning model capable of multi-organ segmentation in MRI scans, offering a solution to the current limitations in MRI analysis due to challenges in resolution, standardized intensity values, and variability in sequences. Materials and Methods: he model was trained on 1,200 manually annotated MRI scans from the UK Biobank, 221 in-house MRI scans and 1228 CT scans, leveraging cross-modality transfer learning from CT segmentation models. A human-in-the-loop annotation workflow was employed to efficiently create high-quality segmentations. The model's performance was evaluated on NAKO and the AMOS22 dataset containing 600 and 60 MRI examinations. Dice Similarity Coefficient (DSC) and Hausdorff Distance (HD) was used to assess segmentation accuracy. The model will be open sourced. Results: The model showcased high accuracy in segmenting well-defined organs, achieving Dice Similarity Coefficient (DSC) scores of 0.97 for the right and left lungs, and 0.95 for the heart. It also demonstrated robustness in organs like the liver (DSC: 0.96) and kidneys (DSC: 0.95 left, 0.95 right), which present more variability. However, segmentation of smaller and complex structures such as the portal and splenic veins (DSC: 0.54) and adrenal glands (DSC: 0.65 left, 0.61 right) revealed the need for further model optimization. Conclusion: The proposed model is a robust, tool for accurate segmentation of 40 anatomical structures in MRI and CT images. By leveraging cross-modality learning and interactive annotation, the model achieves strong performance and generalizability across diverse datasets, making it a valuable resource for researchers and clinicians. It is open source and can be downloaded from https://github.com/hhaentze/MRSegmentator.

  • 11 authors
·
May 10, 2024

Cross-Shaped Windows Transformer with Self-supervised Pretraining for Clinically Significant Prostate Cancer Detection in Bi-parametric MRI

Multiparametric magnetic resonance imaging (mpMRI) has demonstrated promising results in prostate cancer (PCa) detection using deep convolutional neural networks (CNNs). Recently, transformers have achieved competitive performance compared to CNNs in computer vision. Large-scale transformers need abundant annotated data for training, which are difficult to obtain in medical imaging. Self-supervised learning can effectively leverage unlabeled data to extract useful semantic representations without annotation and its associated costs. This can improve model performance on downstream tasks with limited labelled data and increase generalizability. We introduce a novel end-to-end Cross-Shaped windows (CSwin) transformer UNet model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bi-parametric MR imaging (bpMRI) and demonstrate the effectiveness of our proposed self-supervised pre-training framework. Using a large prostate bpMRI dataset with 1500 patients, we first pre-train CSwin transformer using multi-task self-supervised learning to improve data-efficiency and network generalizability. We then finetuned using lesion annotations to perform csPCa detection. Five-fold cross validation shows that self-supervised CSwin UNet achieves 0.888 AUC and 0.545 Average Precision (AP), significantly outperforming four state-of-the-art models (Swin UNETR, DynUNet, Attention UNet, UNet). Using a separate bpMRI dataset with 158 patients, we evaluated our model robustness to external hold-out data. Self-supervised CSwin UNet achieves 0.79 AUC and 0.45 AP, still outperforming all other comparable methods and demonstrating generalization to a dataset shift.

  • 11 authors
·
Apr 30, 2023

BlindHarmony: "Blind" Harmonization for MR Images via Flow model

In MRI, images of the same contrast (e.g., T_1) from the same subject can exhibit noticeable differences when acquired using different hardware, sequences, or scan parameters. These differences in images create a domain gap that needs to be bridged by a step called image harmonization, to process the images successfully using conventional or deep learning-based image analysis (e.g., segmentation). Several methods, including deep learning-based approaches, have been proposed to achieve image harmonization. However, they often require datasets from multiple domains for deep learning training and may still be unsuccessful when applied to images from unseen domains. To address this limitation, we propose a novel concept called `Blind Harmonization', which utilizes only target domain data for training but still has the capability to harmonize images from unseen domains. For the implementation of blind harmonization, we developed BlindHarmony using an unconditional flow model trained on target domain data. The harmonized image is optimized to have a correlation with the input source domain image while ensuring that the latent vector of the flow model is close to the center of the Gaussian distribution. BlindHarmony was evaluated on both simulated and real datasets and compared to conventional methods. BlindHarmony demonstrated noticeable performance on both datasets, highlighting its potential for future use in clinical settings. The source code is available at: https://github.com/SNU-LIST/BlindHarmony

  • 4 authors
·
May 18, 2023

A for-loop is all you need. For solving the inverse problem in the case of personalized tumor growth modeling

Solving the inverse problem is the key step in evaluating the capacity of a physical model to describe real phenomena. In medical image computing, it aligns with the classical theme of image-based model personalization. Traditionally, a solution to the problem is obtained by performing either sampling or variational inference based methods. Both approaches aim to identify a set of free physical model parameters that results in a simulation best matching an empirical observation. When applied to brain tumor modeling, one of the instances of image-based model personalization in medical image computing, the overarching drawback of the methods is the time complexity for finding such a set. In a clinical setting with limited time between imaging and diagnosis or even intervention, this time complexity may prove critical. As the history of quantitative science is the history of compression, we align in this paper with the historical tendency and propose a method compressing complex traditional strategies for solving an inverse problem into a simple database query task. We evaluated different ways of performing the database query task assessing the trade-off between accuracy and execution time. On the exemplary task of brain tumor growth modeling, we prove that the proposed method achieves one order speed-up compared to existing approaches for solving the inverse problem. The resulting compute time offers critical means for relying on more complex and, hence, realistic models, for integrating image preprocessing and inverse modeling even deeper, or for implementing the current model into a clinical workflow.

  • 15 authors
·
May 9, 2022

A Unified Model for Compressed Sensing MRI Across Undersampling Patterns

Compressed Sensing MRI reconstructs images of the body's internal anatomy from undersampled measurements, thereby reducing scan time. Recently, deep learning has shown great potential for reconstructing high-fidelity images from highly undersampled measurements. However, one needs to train multiple models for different undersampling patterns and desired output image resolutions, since most networks operate on a fixed discretization. Such approaches are highly impractical in clinical settings, where undersampling patterns and image resolutions are frequently changed to accommodate different real-time imaging and diagnostic requirements. We propose a unified MRI reconstruction model robust to various measurement undersampling patterns and image resolutions. Our approach uses neural operators, a discretization-agnostic architecture applied in both image and measurement spaces, to capture local and global features. Empirically, our model improves SSIM by 11% and PSNR by 4 dB over a state-of-the-art CNN (End-to-End VarNet), with 600times faster inference than diffusion methods. The resolution-agnostic design also enables zero-shot super-resolution and extended field-of-view reconstruction, offering a versatile and efficient solution for clinical MR imaging. Our unified model offers a versatile solution for MRI, adapting seamlessly to various measurement undersampling and imaging resolutions, making it highly effective for flexible and reliable clinical imaging. Our code is available at https://armeet.ca/nomri.

  • 7 authors
·
Oct 5, 2024 1

RoentGen: Vision-Language Foundation Model for Chest X-ray Generation

Multimodal models trained on large natural image-text pair datasets have exhibited astounding abilities in generating high-quality images. Medical imaging data is fundamentally different to natural images, and the language used to succinctly capture relevant details in medical data uses a different, narrow but semantically rich, domain-specific vocabulary. Not surprisingly, multi-modal models trained on natural image-text pairs do not tend to generalize well to the medical domain. Developing generative imaging models faithfully representing medical concepts while providing compositional diversity could mitigate the existing paucity of high-quality, annotated medical imaging datasets. In this work, we develop a strategy to overcome the large natural-medical distributional shift by adapting a pre-trained latent diffusion model on a corpus of publicly available chest x-rays (CXR) and their corresponding radiology (text) reports. We investigate the model's ability to generate high-fidelity, diverse synthetic CXR conditioned on text prompts. We assess the model outputs quantitatively using image quality metrics, and evaluate image quality and text-image alignment by human domain experts. We present evidence that the resulting model (RoentGen) is able to create visually convincing, diverse synthetic CXR images, and that the output can be controlled to a new extent by using free-form text prompts including radiology-specific language. Fine-tuning this model on a fixed training set and using it as a data augmentation method, we measure a 5% improvement of a classifier trained jointly on synthetic and real images, and a 3% improvement when trained on a larger but purely synthetic training set. Finally, we observe that this fine-tuning distills in-domain knowledge in the text-encoder and can improve its representation capabilities of certain diseases like pneumothorax by 25%.

  • 10 authors
·
Nov 23, 2022

MedMNIST-C: Comprehensive benchmark and improved classifier robustness by simulating realistic image corruptions

The integration of neural-network-based systems into clinical practice is limited by challenges related to domain generalization and robustness. The computer vision community established benchmarks such as ImageNet-C as a fundamental prerequisite to measure progress towards those challenges. Similar datasets are largely absent in the medical imaging community which lacks a comprehensive benchmark that spans across imaging modalities and applications. To address this gap, we create and open-source MedMNIST-C, a benchmark dataset based on the MedMNIST+ collection covering 12 datasets and 9 imaging modalities. We simulate task and modality-specific image corruptions of varying severity to comprehensively evaluate the robustness of established algorithms against real-world artifacts and distribution shifts. We further provide quantitative evidence that our simple-to-use artificial corruptions allow for highly performant, lightweight data augmentation to enhance model robustness. Unlike traditional, generic augmentation strategies, our approach leverages domain knowledge, exhibiting significantly higher robustness when compared to widely adopted methods. By introducing MedMNIST-C and open-sourcing the corresponding library allowing for targeted data augmentations, we contribute to the development of increasingly robust methods tailored to the challenges of medical imaging. The code is available at https://github.com/francescodisalvo05/medmnistc-api .

  • 3 authors
·
Jun 25, 2024

MICDIR: Multi-scale Inverse-consistent Deformable Image Registration using UNetMSS with Self-Constructing Graph Latent

Image registration is the process of bringing different images into a common coordinate system - a technique widely used in various applications of computer vision, such as remote sensing, image retrieval, and, most commonly, medical imaging. Deep learning based techniques have been applied successfully to tackle various complex medical image processing problems, including medical image registration. Over the years, several image registration techniques have been proposed using deep learning. Deformable image registration techniques such as Voxelmorph have been successful in capturing finer changes and providing smoother deformations. However, Voxelmorph, as well as ICNet and FIRE, do not explicitly encode global dependencies (i.e. the overall anatomical view of the supplied image) and, therefore, cannot track large deformations. In order to tackle the aforementioned problems, this paper extends the Voxelmorph approach in three different ways. To improve the performance in case of small as well as large deformations, supervision of the model at different resolutions has been integrated using a multi-scale UNet. To support the network to learn and encode the minute structural co-relations of the given image-pairs, a self-constructing graph network (SCGNet) has been used as the latent of the multi-scale UNet - which can improve the learning process of the model and help the model to generalise better. And finally, to make the deformations inverse-consistent, cycle consistency loss has been employed. On the task of registration of brain MRIs, the proposed method achieved significant improvements over ANTs and VoxelMorph, obtaining a Dice score of 0.8013 \pm 0.0243 for intramodal and 0.6211 \pm 0.0309 for intermodal, while VoxelMorph achieved 0.7747 \pm 0.0260 and 0.6071 \pm 0.0510, respectively

  • 8 authors
·
Mar 8, 2022

The state-of-the-art in Cardiac MRI Reconstruction: Results of the CMRxRecon Challenge in MICCAI 2023

Cardiac MRI, crucial for evaluating heart structure and function, faces limitations like slow imaging and motion artifacts. Undersampling reconstruction, especially data-driven algorithms, has emerged as a promising solution to accelerate scans and enhance imaging performance using highly under-sampled data. Nevertheless, the scarcity of publicly available cardiac k-space datasets and evaluation platform hinder the development of data-driven reconstruction algorithms. To address this issue, we organized the Cardiac MRI Reconstruction Challenge (CMRxRecon) in 2023, in collaboration with the 26th International Conference on MICCAI. CMRxRecon presented an extensive k-space dataset comprising cine and mapping raw data, accompanied by detailed annotations of cardiac anatomical structures. With overwhelming participation, the challenge attracted more than 285 teams and over 600 participants. Among them, 22 teams successfully submitted Docker containers for the testing phase, with 7 teams submitted for both cine and mapping tasks. All teams use deep learning based approaches, indicating that deep learning has predominately become a promising solution for the problem. The first-place winner of both tasks utilizes the E2E-VarNet architecture as backbones. In contrast, U-Net is still the most popular backbone for both multi-coil and single-coil reconstructions. This paper provides a comprehensive overview of the challenge design, presents a summary of the submitted results, reviews the employed methods, and offers an in-depth discussion that aims to inspire future advancements in cardiac MRI reconstruction models. The summary emphasizes the effective strategies observed in Cardiac MRI reconstruction, including backbone architecture, loss function, pre-processing techniques, physical modeling, and model complexity, thereby providing valuable insights for further developments in this field.

  • 48 authors
·
Apr 1, 2024

Latent Interpolation Learning Using Diffusion Models for Cardiac Volume Reconstruction

Cardiac Magnetic Resonance (CMR) imaging is a critical tool for diagnosing and managing cardiovascular disease, yet its utility is often limited by the sparse acquisition of 2D short-axis slices, resulting in incomplete volumetric information. Accurate 3D reconstruction from these sparse slices is essential for comprehensive cardiac assessment, but existing methods face challenges, including reliance on predefined interpolation schemes (e.g., linear or spherical), computational inefficiency, and dependence on additional semantic inputs such as segmentation labels or motion data. To address these limitations, we propose a novel Cardiac Latent Interpolation Diffusion (CaLID) framework that introduces three key innovations. First, we present a data-driven interpolation scheme based on diffusion models, which can capture complex, non-linear relationships between sparse slices and improves reconstruction accuracy. Second, we design a computationally efficient method that operates in the latent space and speeds up 3D whole-heart upsampling time by a factor of 24, reducing computational overhead compared to previous methods. Third, with only sparse 2D CMR images as input, our method achieves SOTA performance against baseline methods, eliminating the need for auxiliary input such as morphological guidance, thus simplifying workflows. We further extend our method to 2D+T data, enabling the effective modeling of spatiotemporal dynamics and ensuring temporal coherence. Extensive volumetric evaluations and downstream segmentation tasks demonstrate that CaLID achieves superior reconstruction quality and efficiency. By addressing the fundamental limitations of existing approaches, our framework advances the state of the art for spatio and spatiotemporal whole-heart reconstruction, offering a robust and clinically practical solution for cardiovascular imaging.

  • 11 authors
·
Aug 19, 2025