Update README.md

README.md +81 -93

@@ -8,6 +8,7 @@ tags:
   - respiratory-sounds
   - cardiac-sounds
   - auscultation
   - representation-learning
   - cross-modal-alignment
   - audio-language-alignment
@@ -16,74 +17,104 @@ tags:
   - pytorch
 pipeline_tag: feature-extraction
 library_name: pytorch
 ---
-# AcuLa: Audio–Clinical Understanding via Language Alignment
-This repository provides a checkpoint associated with the paper **“Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding.”**
-AcuLa is a lightweight post-training alignment framework for improving medical audio representations. The method aligns a pretrained audio encoder with clinical-language representations from a language model, encouraging audio embeddings to capture clinically meaningful semantic structure while preserving fine-grained acoustic information.
-The checkpoint is intended for research use, especially feature extraction, representation analysis, and downstream evaluation on cardio-respiratory audio tasks.
 ---
-![Screenshot 2026-04-27 17.30.20](https://cdn-uploads.huggingface.co/production/uploads/6506cb686ba49887d312cfa2/cdaYDrKIqssBeueIs_esA.png)
-## Model Overview
-| Field | Description |
 |---|---|
-| Method name | **AcuLa**: Audio–Clinical Understanding via Language Alignment |
-| Paper title | **Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding** |
-| Model type | Post-trained / aligned medical audio encoder |
-| Primary function | Audio representation learning and feature extraction |
-| Input modality | Medical audio |
-| Target domains | Respiratory sounds, cough sounds, breathing sounds, exhalation sounds, heart sounds |
-| Training paradigm | Audio-language representation alignment with self-supervised audio preservation |
-| Main use cases | Feature extraction, linear probing, transfer learning, retrieval-style analysis |
-| Framework | PyTorch |
 ---
-## Intended Applications
-This checkpoint is designed for research on medical audio understanding and clinically informed audio representation learning.
-| Application | Description |
-|---|---|
-| Feature extraction | Extract frozen embeddings from cardio-respiratory audio |
-| Linear probing | Train lightweight downstream classifiers or regressors |
-| Transfer learning | Adapt the aligned encoder to task-specific medical audio datasets |
-| Representation analysis | Study semantic organization in audio embedding spaces |
-| Audio-text retrieval | Explore similarity between medical audio and clinical text representations |
-| Benchmarking | Compare audio-language alignment methods and pretrained audio encoders |
 ---
-## Method Summary
-AcuLa follows a teacher-student alignment strategy. A pretrained language model provides clinical-language representations, while a pretrained audio encoder is adapted to better reflect those semantic structures.
-| Component | Role |
-|---|---|
-| Audio encoder | Encodes medical audio into acoustic representations |
-| Language model | Provides clinical-language semantic representations |
-| Audio projection head | Maps audio features into a shared representation space |
-| Language projection head | Maps language features into the same shared space |
-| Alignment objective | Encourages audio and language representations to share similar geometry |
-| Self-supervised objective | Preserves detailed acoustic modeling ability during alignment |
-The total training objective combines semantic alignment and acoustic preservation:
-    L_total = lambda_align * L_align + lambda_ssm * L_ssm
-where `L_align` denotes the audio-language alignment loss and `L_ssm` denotes the self-supervised audio modeling loss.
 ---
 ## Training Data
 AcuLa was trained using paired medical audio and clinical reports generated from structured metadata. The alignment corpus contains cardio-respiratory audio from multiple public datasets.
@@ -102,46 +133,6 @@ The paper reports more than 100,000 paired audio-report samples for alignment.
 ---
-## Clinical Report Generation
-The paired text reports were generated from structured metadata associated with each audio recording. This provides scalable semantic supervision for audio-language alignment.
-| Metadata type | Examples |
-|---|---|
-| Recording information | Dataset, modality, recording condition |
-| Diagnostic labels | COVID-19, COPD, smoker status, murmur, symptomatic status |
-| Acoustic annotations | Crackles, wheezes, murmurs, normal findings |
-| Physiological information | Lung-function-related information when available |
-| Subject metadata | Demographic information when available |
-The generated reports are used to guide representation learning and provide clinically meaningful textual context for the audio recordings.
----
-## Input Format
-The expected input is medical audio. A typical preprocessing pipeline follows the alignment setup used in the paper.
-| Step | Setting |
-|---|---|
-| Sampling rate | 16 kHz |
-| Segment length | Fixed-length segments, commonly around 8 seconds |
-| Audio representation | Log-mel spectrogram |
-| Number of mel bins | 64 |
-| Padding/truncation | Applied as needed |
-| Training augmentation | Optional |
-Possible training augmentations include:
-| Augmentation | Purpose |
-|---|---|
-| Volume adjustment | Robustness to loudness variation |
-| Normalization | Reduced recording-level amplitude variation |
-| Low-pass filtering | Robustness to frequency-response differences |
-| High-pass filtering | Robustness to recording-condition differences |
----
 ## Downstream Evaluation
 The paper evaluates AcuLa on 18 cardio-respiratory tasks.
@@ -172,6 +163,17 @@ Please refer to the paper for full task-by-task results and experimental details
 ---
 ## Limitations
@@ -186,20 +188,6 @@ Please refer to the paper for full task-by-task results and experimental details
 ---
-## Ethical Considerations
-Medical audio research involves sensitive data and potential real-world implications. Users should evaluate models carefully before applying them beyond research settings.
-| Consideration | Description |
-|---|---|
-| Privacy | Medical audio data may contain sensitive information |
-| Consent | Data should be collected and used with appropriate consent |
-| Fairness | Performance should be evaluated across relevant demographic groups |
-| Robustness | Models should be tested across devices, environments, and recording conditions |
-| Expert review | Clinical interpretation should involve domain experts |
----
 ## Citation
 Please cite the paper if you use this checkpoint:

   - respiratory-sounds
   - cardiac-sounds
   - auscultation
+  - cardiopulmonary
   - representation-learning
   - cross-modal-alignment
   - audio-language-alignment
   - pytorch
 pipeline_tag: feature-extraction
 library_name: pytorch
+arxiv: 2512.04847
 ---
+# AcuLa
+AcuLa (**Audio–Clinical Understanding via Language Alignment**) is a post-training alignment framework for medical audio understanding. It improves pretrained audio encoders by aligning their representations with clinical-language representations from a language model, allowing the audio encoder to capture richer clinical semantics while preserving fine-grained acoustic information.
+This repository provides the checkpoint for AcuLa. The accompanying code is available at:
+**GitHub:** https://github.com/janine714/AcuLA
+This work is described in the paper **“Language Models as Semantic Teachers: Post-Training Alignment for Medical Audio Understanding.”**
+![Screenshot 2026-04-27 17.30.20](https://cdn-uploads.huggingface.co/production/uploads/6506cb686ba49887d312cfa2/KzktEnDsOrNqoBY-BMxuJ.png)
 ---
+## Intended Use
+AcuLa is designed for research on clinically informed medical audio representation learning.
+It can be used for:
+| Task | Description |
 |---|---|
+| Feature extraction | Extract embeddings from cardio-respiratory audio |
+| Linear probing | Train lightweight classifiers or regressors on frozen embeddings |
+| Transfer learning | Adapt the aligned encoder to downstream medical audio datasets |
+| Respiratory analysis | Study cough, breath, exhalation, and lung sound representations |
+| Cardiac audio analysis | Study heart sound representations |
+| Audio-text retrieval | Retrieve semantically related clinical reports or audio samples |
+| Representation analysis | Analyze how clinical semantics are reflected in audio embeddings |
+AcuLa was evaluated on 18 downstream cardio-respiratory tasks, including respiratory condition inference, lung function estimation, and cardiac condition inference.
+> This checkpoint is intended for research use.
 ---
+## Installation
+Clone the GitHub repository:
+    git clone https://github.com/janine714/AcuLA
+    cd AcuLA
+Install the required dependencies:
+    pip install -r requirements.txt
+If you use OPERA-family encoders, please make sure the required OPERA dependencies and checkpoints are available in your environment.
 ---
+## How to Use
+The checkpoint can be loaded together with the AcuLa codebase.
+First, clone the repository and enter the project directory:
+    git clone https://github.com/janine714/AcuLA
+    cd AcuLA
+Then load the checkpoint:
+    import torch
+    from audio_encoder import initialize_pretrained_model
+    checkpoint_path = "path/to/acula.pt"
+    audio_model = initialize_pretrained_model(pretrain="operaGT")
+    ckpt = torch.load(checkpoint_path, map_location="cpu")
+    if "audio_model_state_dict" in ckpt:
+        state_dict = ckpt["audio_model_state_dict"]
+    elif "state_dict" in ckpt:
+        state_dict = ckpt["state_dict"]
+    else:
+        state_dict = ckpt
+    audio_model.load_state_dict(state_dict, strict=False)
+    audio_model.eval()
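The key-selection logic in the loading snippet can be factored into a small helper and checked without the real checkpoint. The following is a minimal sketch, assuming the checkpoint is either a plain state dict or a dict that wraps one under `audio_model_state_dict` or `state_dict` (the two keys the snippet checks). The name `select_state_dict` is hypothetical and is not part of the AcuLa codebase.

```python
from typing import Any, Mapping


def select_state_dict(ckpt: Mapping[str, Any]) -> Mapping[str, Any]:
    """Return the audio-encoder weights from a loaded checkpoint.

    Hypothetical helper; it mirrors the key order used in the README snippet.
    """
    for key in ("audio_model_state_dict", "state_dict"):
        if key in ckpt:
            return ckpt[key]
    # No wrapper key found: assume the checkpoint is itself a state dict.
    return ckpt


# Toy checkpoints with placeholder values instead of real tensors.
wrapped = {"audio_model_state_dict": {"encoder.weight": 1}, "epoch": 3}
bare = {"encoder.weight": 1}
print(sorted(select_state_dict(wrapped)))
print(sorted(select_state_dict(bare)))
```

Because the README snippet loads with `strict=False`, mismatched keys are skipped silently. PyTorch's `load_state_dict` returns an object with `missing_keys` and `unexpected_keys` fields, so printing that return value is a cheap way to confirm the weights were actually applied.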
+Extract audio features:
+    import torch
+    with torch.no_grad():
+        features = audio_model.forward_feature(audio_input)
+The variable `audio_input` should follow the preprocessing format expected by the selected audio encoder.
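The earlier revision of this README listed 16 kHz audio in fixed-length segments of around 8 seconds, with padding or truncation applied as needed. The following is a minimal sketch of that length-fixing step only, assuming a mono waveform and zero-padding at the end. The helper name `fix_length` and the exact 8-second length are illustrative assumptions, and the log-mel conversion expected by the encoder is not shown.

```python
from typing import List, Sequence

SAMPLE_RATE = 16_000   # Hz, from the earlier README's input-format table
SEGMENT_SECONDS = 8    # "commonly around 8 seconds"; exact value assumed


def fix_length(waveform: Sequence[float],
               sample_rate: int = SAMPLE_RATE,
               seconds: float = SEGMENT_SECONDS) -> List[float]:
    """Zero-pad or truncate a mono waveform to a fixed number of samples."""
    target = int(sample_rate * seconds)
    clipped = list(waveform[:target])                # truncate if too long
    clipped.extend([0.0] * (target - len(clipped)))  # pad if too short
    return clipped


short_clip = fix_length([0.1] * 1_000)
long_clip = fix_length([0.1] * 200_000)
print(len(short_clip), len(long_clip))  # prints: 128000 128000
```

Whether the selected encoder expects end-padding, centre-padding, or repeated audio is not stated here, so the padding strategy should be checked against the AcuLa codebase before use.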
 ---
 ## Training Data
 AcuLa was trained using paired medical audio and clinical reports generated from structured metadata. The alignment corpus contains cardio-respiratory audio from multiple public datasets.
 ---
 ## Downstream Evaluation
 The paper evaluates AcuLa on 18 cardio-respiratory tasks.
 ---
+## Code
+The implementation is available at:
+    https://github.com/janine714/AcuLA
+Repository setup:
+    git clone https://github.com/janine714/AcuLA
+    cd AcuLA
 ## Limitations
 ---
 ## Citation
 Please cite the paper if you use this checkpoint: