🗣️ Vietnamese Pronunciation Classifier — Multiclass (12 classes)

Classifies Vietnamese speech audio by regional accent (Northern/Bắc, Central/Trung, Southern/Nam) and by specific pronunciation error.

📋 Classes (12)

| ID | Label | Description |
|---|---|---|
| 0 | `bac_correct` | ✓ Northern: correct pronunciation |
| 1 | `bac_L_to_N` | ✗ Northern: L → N error |
| 2 | `trung_correct` | ✓ Central: correct pronunciation |
| 3 | `trung_S_to_X` | ✗ Central: S → X error |
| 4 | `trung_X_to_S` | ✗ Central: X → S error |
| 5 | `trung_TR_to_CH` | ✗ Central: TR → CH error |
| 6 | `trung_CH_to_TR` | ✗ Central: CH → TR error |
| 7 | `nam_correct` | ✓ Southern: correct pronunciation |
| 8 | `nam_D_to_R` | ✗ Southern: D → R error |
| 9 | `nam_R_to_D` | ✗ Southern: R → D error |
| 10 | `nam_G_to_D` | ✗ Southern: GI → D error |
| 11 | `nam_V_to_D` | ✗ Southern: V → D error |
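The label names follow a regular pattern: a region prefix (`bac`, `trung`, `nam`), then either `correct` or `<A>_to_<B>`. A minimal sketch that decodes them, which could serve as a fallback when `label_info` from `config.json` isn't at hand. `LABELS`, `REGIONS`, `LabelInfo`, and `parse_label` are hypothetical names, not part of the released files; the sketch assumes list index equals class ID as in the table, and keeps the letters exactly as written in the label (so `nam_G_to_D` decodes to `("G", "D")`, which the table describes as GI → D).

```python
from typing import NamedTuple, Optional, Tuple

# Hypothetical: class labels in ID order, copied from the table above
LABELS = [
    "bac_correct", "bac_L_to_N",
    "trung_correct", "trung_S_to_X", "trung_X_to_S", "trung_TR_to_CH", "trung_CH_to_TR",
    "nam_correct", "nam_D_to_R", "nam_R_to_D", "nam_G_to_D", "nam_V_to_D",
]

REGIONS = {"bac": "Northern", "trung": "Central", "nam": "Southern"}


class LabelInfo(NamedTuple):
    region: str
    is_correct: bool
    error: Optional[Tuple[str, str]]  # (left, right) letters of "<A>_to_<B>", or None


def parse_label(label: str) -> LabelInfo:
    """Split a label like 'trung_TR_to_CH' into region, correctness, and error pair."""
    region_key, rest = label.split("_", 1)
    region = REGIONS[region_key]
    if rest == "correct":
        return LabelInfo(region, True, None)
    left, right = rest.split("_to_")
    return LabelInfo(region, False, (left, right))
```

After inference, `parse_label(LABELS[class_id])` gives the same region/correctness split without reading `config.json`.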

🏗️ Architecture

Backbone: nguyenvulebinh/wav2vec2-large-vi-vlsp2020
Head: mean pooling over time → Linear(1024, 256) → ReLU → Linear(256, 12)
Audio: 16 kHz mono, fixed 500 ms window (8,000 samples)
Export: ONNX (opset 14) for cross-platform inference
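The head is small enough to spell out. Below is a NumPy sketch of its forward pass from backbone hidden states to 12 logits, plus the number of frames a 500 ms clip yields. Assumptions, not confirmed by this card: the backbone keeps the default wav2vec2 convolutional feature-encoder layout (kernels 10, 3, 3, 3, 3, 2, 2; strides 5, 2, 2, 2, 2, 2, 2), pooling is a plain mean over the last hidden layer's frames, and the random weights merely stand in for the trained ones in `best_model.pt`. `num_frames` and `head_forward` are illustrative names.

```python
import numpy as np

# Assumed wav2vec2 conv feature encoder: (kernel, stride) per layer, default config
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def num_frames(num_samples: int) -> int:
    """Frames produced by the conv encoder for a clip of num_samples samples."""
    length = num_samples
    for kernel, stride in CONV_LAYERS:
        length = (length - kernel) // stride + 1
    return length


def head_forward(hidden, w1, b1, w2, b2):
    """Mean pooling -> Linear(1024, 256) -> ReLU -> Linear(256, 12).

    hidden: (batch, frames, 1024) backbone hidden states.
    Weights use (in, out) layout; PyTorch nn.Linear stores (out, in), so
    transpose checkpoint weights before using them here.
    """
    pooled = hidden.mean(axis=1)             # (batch, 1024)
    h = np.maximum(pooled @ w1 + b1, 0.0)    # (batch, 256), ReLU
    return h @ w2 + b2                       # (batch, 12) logits


rng = np.random.default_rng(0)
frames = num_frames(8000)  # 500 ms at 16 kHz
hidden = rng.standard_normal((1, frames, 1024)).astype(np.float32)
w1, b1 = rng.standard_normal((1024, 256)) * 0.02, np.zeros(256)  # stand-in weights
w2, b2 = rng.standard_normal((256, 12)) * 0.02, np.zeros(12)
logits = head_forward(hidden, w1, b1, w2, b2)
print(frames, logits.shape)  # 24 (1, 12)
```

Under these assumptions, each clip is summarized from about 24 frame vectors spaced 320 samples (20 ms) apart, which is why mean pooling over such a short window stays cheap.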

🚀 Usage (ONNX Runtime)

```python
import json

import librosa
import numpy as np
import onnxruntime as ort

# Load the ONNX model and the label metadata
sess = ort.InferenceSession("phoneme_classifier_multiclass.onnx")
with open("config.json", encoding="utf-8") as f:
    config = json.load(f)

# Load audio as 16 kHz mono, then crop or zero-pad to exactly 500 ms (8,000 samples)
audio, _ = librosa.load("your_audio.wav", sr=16000, mono=True)
audio = audio[:8000]
if len(audio) < 8000:
    audio = np.pad(audio, (0, 8000 - len(audio)))

# Inference: input shape (1, 8000), output logits shape (1, 12)
logits = sess.run(None, {"input_values": audio.reshape(1, -1).astype(np.float32)})[0]
class_id = int(np.argmax(logits, axis=-1)[0])

# Map the class ID back to its label and metadata
label_names = {int(v): k for k, v in config["label_map"].items()}
label = label_names[class_id]
info = config["label_info"][label]

print(f"Region: {info['region']}")
print(f"Correct: {info['is_correct']}")
print(f"Error: {info['error']}")
print(f"Description: {info['vi']}")
```
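`argmax` alone hides how close the runner-up classes were. A minimal sketch that turns the logits into softmax probabilities and returns the top-k labels, which can help flag low-confidence clips. `softmax` and `top_k` are hypothetical helpers; the model's calibration is not documented here, so treat the probabilities as relative scores rather than true likelihoods.

```python
from typing import List, Tuple

import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def top_k(logits: np.ndarray, labels: List[str], k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs for a single clip."""
    probs = softmax(np.asarray(logits, dtype=np.float64).reshape(-1))
    order = np.argsort(probs)[::-1][:k]
    return [(labels[i], float(probs[i])) for i in order]
```

With the variables from the usage code, `top_k(logits, [label_names[i] for i in range(len(label_names))])` lists the three strongest candidates.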

📊 Training Data

Northern (Miền Bắc): L/N errors
Central (Miền Trung): S/X and TR/CH errors
Southern (Miền Nam): D/R/GI/V errors

📁 Files

phoneme_classifier_multiclass.onnx — ONNX model
best_model.pt — PyTorch checkpoint
config.json — Label map + metadata
preprocessor_config.json — Wav2Vec2 feature extractor config
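`preprocessor_config.json` records the Wav2Vec2 feature-extractor settings, including `sampling_rate` and `do_normalize`. The usage code feeds raw samples; whether the exported graph expects normalized input is not stated in this card, so check that flag. A sketch, assuming the no-attention-mask path of `Wav2Vec2FeatureExtractor`, which scales each clip to zero mean and unit variance with `(x - mean) / sqrt(var + 1e-7)`. `prepare_input` is a hypothetical helper, not part of the released files.

```python
import numpy as np


def prepare_input(audio: np.ndarray, sr: int, cfg: dict, num_samples: int = 8000) -> np.ndarray:
    """Crop or zero-pad to num_samples, then normalize if cfg asks for it.

    cfg is the parsed preprocessor_config.json. Returns float32, shape (1, num_samples).
    """
    if sr != cfg.get("sampling_rate", sr):
        raise ValueError(f"expected {cfg['sampling_rate']} Hz audio, got {sr} Hz")
    x = np.asarray(audio, dtype=np.float32)[:num_samples]
    if len(x) < num_samples:
        x = np.pad(x, (0, num_samples - len(x)))
    if cfg.get("do_normalize", False):
        # Zero-mean / unit-variance over the whole (padded) window
        x = (x - x.mean()) / np.sqrt(x.var() + 1e-7)
    return x.reshape(1, -1).astype(np.float32)
```

To use it, parse `preprocessor_config.json` with `json.load` and pass `prepare_input(audio, 16000, cfg)` as the `input_values` feed in place of the manual crop and pad.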

Model repository: Bao2311/speak-journey-multiclass-onx