# GPU database — v0.1.
#
# DATA PROVENANCE:
# Numeric specs (memory_gb, nvlink_bandwidth_gbps, fp16_tflops, fp8/fp4_support)
# come from public vendor datasheets and commonly-cited benchmarks. Each entry
# records its source in `spec_source` so users can audit.
#
# Conventions:
# - memory_gb: per-card HBM / GDDR in GB (vendor nominal)
# - nvlink_bandwidth_gbps: aggregate NVLink (or equivalent, e.g. xGMI/HCCS)
#   bandwidth in GB/s. 0 if the GPU has no high-bandwidth interconnect
#   (e.g. consumer Ada removed NVLink).
# - fp16_tflops: peak dense FP16/BF16 with Tensor Cores; vendor-cited figure.
# - fp8_support / fp4_support: whether the GPU has NATIVE Tensor Core
#   acceleration for that precision. Software emulation does NOT count.
#
# To add a new GPU: append an entry with all required fields + spec_source.
# See docs/architecture-guide.md "How to add a new GPU".
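#
# A minimal sketch of a new entry (illustrative: the id, numbers, and source
# below are placeholders, not a real product):
#
#   - id: NEWGPU-80G
#     aliases: [NEWGPU, NEWGPU-SXM]
#     memory_gb: 80                # vendor nominal
#     nvlink_bandwidth_gbps: 600   # GB/s; 0 if PCIe-only
#     memory_bandwidth_gbps: 2000
#     fp16_tflops: 500             # dense Tensor-Core peak
#     fp8_support: false           # native acceleration only
#     fp4_support: false
#     spec_source: "Vendor datasheet (vendor.example/newgpu)"
#     notes_en: "One-line summary."
#     notes_zh: "一句话摘要。"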
schema_version: 1
gpus:
  # ========================================================================
  # NVIDIA Blackwell (2024+) — native FP4
  # ========================================================================
  - id: B200
    aliases: [B200-SXM, B200-192G]
    memory_gb: 192
    nvlink_bandwidth_gbps: 1800
    memory_bandwidth_gbps: 8000
    fp16_tflops: 2250
    fp8_support: true
    fp4_support: true
    spec_source: "NVIDIA Blackwell architecture overview (nvidia.com/blackwell)"
    notes_en: "Blackwell flagship. Native FP4 Tensor Cores. First GPU that accelerates DeepSeek-V4-Flash-style FP4 models at the hardware level."
    notes_zh: "Blackwell 旗舰。原生 FP4 Tensor Core,首款在硬件层加速 DeepSeek-V4-Flash 类 FP4 模型的 GPU。"
  - id: GB200
    aliases: [Grace-Blackwell, GB200-per-GPU]
    memory_gb: 192
    nvlink_bandwidth_gbps: 1800
    memory_bandwidth_gbps: 8000
    fp16_tflops: 2250
    fp8_support: true
    fp4_support: true
    spec_source: "NVIDIA GB200 Superchip datasheet 2024 — per-GPU view. Each GB200 = 2x B200 + Grace CPU. Per B200: 192 GB HBM3e, 8 TB/s, 2250 TFLOPS dense FP16 (4500 with sparsity). Grace CPU adds up to 480 GB LPDDR5X accessible via NVLink-C2C."
    notes_en: "Grace Blackwell superchip — 2 B200 GPUs + Grace CPU on one module. Per-GPU specs here match B200, but each GB200 module offers 384 GB HBM3e total (192+192) plus coherent access to 480 GB of Grace CPU LPDDR5X. Native FP4. Only deployable in NVL4/NVL72 rack-scale systems with liquid cooling. Per-GPU TDP 1200W."
    notes_zh: "Grace Blackwell 超级芯片 — 双 B200 GPU + Grace CPU 融合。此处展示单 GPU 视角规格,与 B200 基本一致。每块 GB200 模组合计 384 GB HBM3e(双卡),并通过 NVLink-C2C 一致访问 480 GB Grace CPU 的 LPDDR5x。原生 FP4。仅在 NVL4 / NVL72 液冷机架系统中部署。单 GPU TDP 1200W。"
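  # Rough capacity arithmetic (illustrative, weights only): a 405B-parameter
  # dense model at FP4 needs ~405e9 * 0.5 bytes ≈ 203 GB, already more than
  # one 192 GB card before KV cache and activations. The same weights are
  # ~405 GB at FP8 and ~810 GB at FP16.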
  # ========================================================================
  # NVIDIA Hopper (2022+)
  # ========================================================================
  - id: H100
    aliases: [H100-SXM5, H100-80G, H100-SXM]
    memory_gb: 80
    nvlink_bandwidth_gbps: 900
    memory_bandwidth_gbps: 3350
    fp16_tflops: 989
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA H100 datasheet (nvidia.com/h100)"
    notes_en: "Hopper flagship. Full NVLink bandwidth."
    notes_zh: "Hopper 架构旗舰,完整 NVLink 带宽。"
  - id: H800
    aliases: [H800-SXM5, H800-80G]
    memory_gb: 80
    nvlink_bandwidth_gbps: 400
    memory_bandwidth_gbps: 3350
    fp16_tflops: 989
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA H800 compliance variant — NVLink reduced from H100 per US export controls"
    notes_en: "China-regulated H100 variant. NVLink bandwidth reduced (400 vs 900 GB/s). Same HBM and compute as H100."
    notes_zh: "H100 的中国合规版本。NVLink 带宽削减(400 vs 900 GB/s),HBM 容量和算力与 H100 相同。"
  - id: H200
    aliases: [H200-SXM, H200-141G]
    memory_gb: 141
    nvlink_bandwidth_gbps: 900
    memory_bandwidth_gbps: 4800
    fp16_tflops: 989
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA H200 datasheet (nvidia.com/h200)"
    notes_en: "Hopper with HBM3e. 141 GB per GPU."
    notes_zh: "搭载 HBM3e 的 Hopper,单卡 141 GB。"
  - id: GH200
    aliases: [Grace-Hopper, GH200-144G, GH200-96G]
    memory_gb: 144
    nvlink_bandwidth_gbps: 900
    memory_bandwidth_gbps: 4800
    fp16_tflops: 989
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA GH200 Grace Hopper datasheet 2023 (144 GB HBM3e variant; dense FP16 = 989 TFLOPS, sparsity doubles it)"
    notes_en: "Grace Hopper superchip — Hopper GPU + Grace CPU on one module. 144 GB HBM3e (a 96 GB HBM3 variant also exists). NVLink-C2C provides 900 GB/s of coherent CPU<->GPU bandwidth. TDP programmable 450-1000W. Ideal for models that spill beyond single-GPU memory, because the GPU can access the CPU's LPDDR coherently."
    notes_zh: "Grace Hopper 超级芯片 — Hopper GPU + Grace CPU 融合模组。144 GB HBM3e(另有 96 GB HBM3 版本)。NVLink-C2C 让 CPU/GPU 共享统一内存空间,900 GB/s 双向。TDP 可编程 450-1000W。模型单卡显存装不下时,可一致地访问 CPU 的 LPDDR。"
  - id: H20
    aliases: [H20-96G, H20-SXM]
    memory_gb: 96
    nvlink_bandwidth_gbps: 900
    memory_bandwidth_gbps: 4000
    fp16_tflops: 148
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA H20 — released 2024 as China-compliant successor to H800. Compute heavily reduced (~15% of H100); memory bandwidth and HBM3e preserved."
    notes_en: "China-compliant Hopper for the post-Oct-2023 export rules. Compute is ~15% of H100 (148 vs 989 TFLOPS), but HBM3e memory bandwidth is preserved. Good for memory-bound LLM inference, poor for training."
    notes_zh: "2023 年 10 月出口管制后的中国合规 Hopper。算力仅为 H100 的约 15%(148 vs 989 TFLOPS),但 HBM3e 显存带宽保留。推理(显存带宽受限)尚可,训练基本不实用。"
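  # Why the H20 suits memory-bound inference (illustrative back-of-envelope,
  # single stream, batch 1): decoding one token reads every weight once, so a
  # 70B model at FP8 (~70 GB) has a bandwidth floor of 70 / 4000 GB/s
  # ≈ 17.5 ms/token (~57 tok/s), while the compute side (~2 FLOPs/param
  # ≈ 140 GFLOP) takes only ~140e9 / 148e12 ≈ 0.95 ms at peak FP16.
  # Bandwidth, not compute, is the binding constraint.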
  # ========================================================================
  # NVIDIA Ada Lovelace — FP8 yes, NVLink no
  # ========================================================================
  - id: L40S
    aliases: [L40-S, L40S-48G]
    memory_gb: 48
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 864
    fp16_tflops: 362
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA L40S datasheet 2023"
    notes_en: "Ada datacenter. 48 GB GDDR6. No NVLink — multi-GPU setups rely on PCIe. Cost-effective for small/medium model inference."
    notes_zh: "Ada 架构数据中心卡,48 GB GDDR6。无 NVLink,多卡需走 PCIe。中小模型推理性价比高。"
  - id: L40
    aliases: [L40-48G]
    memory_gb: 48
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 864
    fp16_tflops: 181
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA L40 datasheet 2022"
    notes_en: "Ada datacenter predecessor to L40S. Same 48 GB, half the compute. Widely deployed in enterprise clouds."
    notes_zh: "L40S 的前代,Ada 架构数据中心卡。同为 48 GB,算力减半。企业私有云部署量较大。"
  - id: L4
    aliases: [L4-24G]
    memory_gb: 24
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 300
    fp16_tflops: 121
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA L4 datasheet 2023"
    notes_en: "Low-profile Ada, 24 GB GDDR6. Common in low-concurrency inference / transcoding. No NVLink."
    notes_zh: "低功耗 Ada,24 GB GDDR6。常用于低并发推理和转码场景。无 NVLink。"
  - id: RTX6000-Ada
    aliases: [RTX-6000-Ada, RTX6000Ada, L6000]
    memory_gb: 48
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 960
    fp16_tflops: 365
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA RTX 6000 Ada datasheet 2022"
    notes_en: "Ada Pro workstation card. 48 GB, similar to L40S but aimed at workstations. FP8 yes, no NVLink."
    notes_zh: "Ada Pro 工作站卡。48 GB,规格接近 L40S 但面向工作站。支持 FP8,无 NVLink。"
  - id: RTX4090
    aliases: ["4090", RTX-4090]
    memory_gb: 24
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 1008
    fp16_tflops: 165
    fp8_support: true
    fp4_support: false
    spec_source: "NVIDIA RTX 4090 datasheet 2022"
    notes_en: "Consumer Ada. No NVLink. Large models need multi-GPU via PCIe (slower)."
    notes_zh: "消费级 Ada 架构,无 NVLink。大模型多卡只能走 PCIe(明显更慢)。"
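  # Scale of the "PCIe (slower)" penalty (illustrative): an RTX 4090 rides
  # PCIe Gen4 x16 at ~32 GB/s per direction, vs 900 GB/s aggregate NVLink on
  # an H100, roughly a 28x gap for tensor-parallel traffic; hence
  # nvlink_bandwidth_gbps: 0 on all the Ada entries above.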
  # ========================================================================
  # NVIDIA Ampere (2020+)
  # ========================================================================
  - id: A100-80G
    aliases: [A100-80, A100-SXM-80G]
    memory_gb: 80
    nvlink_bandwidth_gbps: 600
    memory_bandwidth_gbps: 2039
    fp16_tflops: 312
    fp8_support: false
    fp4_support: false
    spec_source: "NVIDIA A100 datasheet 2020"
    notes_en: "Ampere. No native FP8. Still widely deployed."
    notes_zh: "Ampere 架构。不原生支持 FP8,但部署量仍然非常大。"
  - id: A100-40G
    aliases: [A100-40, A100-SXM-40G]
    memory_gb: 40
    nvlink_bandwidth_gbps: 600
    memory_bandwidth_gbps: 1555
    fp16_tflops: 312
    fp8_support: false
    fp4_support: false
    spec_source: "NVIDIA A100 40GB datasheet 2020"
    notes_en: "Ampere 40 GB variant. Smaller HBM limits large-model single-node deployments."
    notes_zh: "Ampere 的 40 GB 版本,显存较小,大模型单机部署受限。"
  - id: A40
    aliases: [A40-48G]
    memory_gb: 48
    nvlink_bandwidth_gbps: 112
    memory_bandwidth_gbps: 696
    fp16_tflops: 150
    fp8_support: false
    fp4_support: false
    spec_source: "NVIDIA A40 datasheet 2020"
    notes_en: "Ampere workstation. 48 GB with NVLink bridge (limited bandwidth). No FP8."
    notes_zh: "Ampere 工作站卡,48 GB + NVLink 桥接(带宽较低)。不支持 FP8。"
  - id: A10
    aliases: [A10-24G]
    memory_gb: 24
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 600
    fp16_tflops: 125
    fp8_support: false
    fp4_support: false
    spec_source: "NVIDIA A10 datasheet 2021"
    notes_en: "Ampere inference card. 24 GB GDDR6. Widely used for low-cost inference in enterprise clouds."
    notes_zh: "Ampere 推理卡,24 GB GDDR6。企业云低成本推理常用配置。"
  - id: A10G
    aliases: [A10G-24G]
    memory_gb: 24
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 600
    fp16_tflops: 125
    fp8_support: false
    fp4_support: false
    spec_source: "NVIDIA A10G — AWS-specific variant of A10, g5 instances"
    notes_en: "AWS-specific A10 variant. Same silicon as A10, deployed in g5 EC2 instances. No NVLink."
    notes_zh: "AWS 定制版 A10,用于 g5 EC2 实例。核心规格与 A10 相同,无 NVLink。"
  # ========================================================================
  # NVIDIA Volta / Turing (older, still deployed)
  # ========================================================================
  - id: V100-SXM2-32G
    aliases: [V100, V100-32G, V100-SXM2]
    memory_gb: 32
    nvlink_bandwidth_gbps: 300
    memory_bandwidth_gbps: 900
    fp16_tflops: 125
    fp8_support: false
    fp4_support: false
    spec_source: "NVIDIA V100 SXM2 datasheet 2017"
    notes_en: "Volta. No FP8. Still deployed in many existing clusters — works for smaller models, tight for 70B+."
    notes_zh: "Volta 架构。不支持 FP8,但仍在大量老集群中服役。小模型够用,70B+ 紧张。"
  - id: V100-PCIe-32G
    aliases: [V100-PCIe, V100-PCI]
    memory_gb: 32
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 900
    fp16_tflops: 112
    fp8_support: false
    fp4_support: false
    spec_source: "NVIDIA V100 PCIe datasheet 2017 — PCIe variant of V100, no NVLink."
    notes_en: "PCIe version of V100. No NVLink, lower clocks than SXM2. Common in older servers."
    notes_zh: "V100 的 PCIe 版本,无 NVLink,主频稍低。老服务器常见配置。"
  - id: T4
    aliases: [T4-16G]
    memory_gb: 16
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 320
    fp16_tflops: 65
    fp8_support: false
    fp4_support: false
    spec_source: "NVIDIA T4 datasheet 2018"
    notes_en: "Turing inference card. 16 GB, no NVLink, no FP8. Common as the cheapest cloud GPU option."
    notes_zh: "Turing 推理卡。16 GB,无 NVLink,无 FP8。各云厂商最便宜的 GPU 选项之一。"
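  # Fit check (illustrative): a 7B model at FP16 is ~14 GB of weights, leaving
  # ~2 GB of the T4's 16 GB for KV cache and runtime, workable only at small
  # batch and context. INT8 (~7 GB of weights) is the usual T4 choice.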
  # ========================================================================
  # AMD (ROCm, xGMI instead of NVLink)
  # ========================================================================
  - id: MI325X
    aliases: [MI325X-256G, AMD-MI325X]
    memory_gb: 256
    nvlink_bandwidth_gbps: 896
    memory_bandwidth_gbps: 6000
    fp16_tflops: 1307
    fp8_support: true
    fp4_support: false
    spec_source: "AMD Instinct MI325X datasheet 2024 — 256 GB HBM3E, 6 TB/s bandwidth, 1000W TDP, CDNA 3."
    notes_en: "AMD flagship 2024. 256 GB HBM3E (largest single-card memory in the v0.1 database). Upgraded MI300X with faster HBM3E and more capacity. Dense FP16 1307 TFLOPS, FP8 2615 TFLOPS. 1000W TDP, OAM format. ROCm software stack."
    notes_zh: "AMD 2024 年旗舰。256 GB HBM3E(v0.1 数据库中单卡最大)。MI300X 升级版,HBM3E 更快、容量更大。Dense FP16 1307 TFLOPS,FP8 2615 TFLOPS。1000W TDP,OAM 形态。需要 ROCm 软件栈。"
  - id: MI300X
    aliases: [MI300X-192G, AMD-MI300X]
    memory_gb: 192
    nvlink_bandwidth_gbps: 896
    memory_bandwidth_gbps: 5300
    fp16_tflops: 1307
    fp8_support: true
    fp4_support: false
    spec_source: "AMD Instinct MI300X datasheet 2023-12"
    notes_en: "AMD flagship 2023. 192 GB HBM3. xGMI 896 GB/s (NVLink-class). Software stack: ROCm + vLLM. Support for new models (DeepSeek V4 etc.) typically lags NVIDIA by weeks."
    notes_zh: "AMD 2023 年旗舰。192 GB HBM3。xGMI 互联 896 GB/s(类 NVLink)。需要 ROCm + vLLM 栈。新模型支持通常比 NVIDIA 晚几周。"
  - id: MI250X
    aliases: [MI250X-128G, AMD-MI250X]
    memory_gb: 128
    nvlink_bandwidth_gbps: 800
    memory_bandwidth_gbps: 3280
    fp16_tflops: 383
    fp8_support: false
    fp4_support: false
    spec_source: "AMD Instinct MI250X datasheet 2022"
    notes_en: "AMD previous-gen. 128 GB HBM2e. No FP8. Deployed in some HPC clusters (Frontier)."
    notes_zh: "AMD 上代数据中心卡。128 GB HBM2e,不支持 FP8。少数 HPC 集群(如 Frontier 超算)有部署。"
  - id: MI210
    aliases: [MI210-64G, AMD-MI210]
    memory_gb: 64
    nvlink_bandwidth_gbps: 300
    memory_bandwidth_gbps: 1600
    fp16_tflops: 181
    fp8_support: false
    fp4_support: false
    spec_source: "AMD Instinct MI210 datasheet 2022 — CDNA 2, single-die version of MI250. 64 GB HBM2e."
    notes_en: "AMD CDNA 2 single-die. 64 GB HBM2e, 1.6 TB/s. No FP8 (CDNA 2 limitation). Common as entry-level AMD datacenter card."
    notes_zh: "AMD CDNA 2 单 die 版本,64 GB HBM2e,1.6 TB/s 带宽。不支持 FP8(CDNA 2 限制)。AMD 入门数据中心卡常见配置。"
  # ========================================================================
  # Intel Habana Gaudi
  # ========================================================================
  - id: Gaudi3
    aliases: [Gaudi-3, Habana-Gaudi3]
    memory_gb: 128
    nvlink_bandwidth_gbps: 1200
    memory_bandwidth_gbps: 3700
    fp16_tflops: 1835
    fp8_support: true
    fp4_support: false
    spec_source: "Intel Gaudi 3 datasheet 2024"
    notes_en: "Intel Habana Gaudi 3. 128 GB HBM2e. FP8 support. Software stack: SynapseAI (not CUDA). vLLM support via Intel fork."
    notes_zh: "Intel Habana Gaudi 3。128 GB HBM2e,支持 FP8。软件栈为 SynapseAI(非 CUDA)。vLLM 需走 Intel 分支。"
  - id: Gaudi2
    aliases: [Gaudi-2, Habana-Gaudi2]
    memory_gb: 96
    nvlink_bandwidth_gbps: 600  # 24x100GbE = 600 GB/s bidirectional, same convention as Gaudi 3's 1200
    memory_bandwidth_gbps: 2450
    fp16_tflops: 432
    fp8_support: true
    fp4_support: false
    spec_source: "Intel Gaudi 2 datasheet 2022"
    notes_en: "Intel Habana Gaudi 2. 96 GB HBM2e with 24x100GbE on-board (used for scale-out). FP8 support."
    notes_zh: "Intel Habana Gaudi 2。96 GB HBM2e,板载 24 个 100GbE(用于横向扩展)。支持 FP8。"
  # ========================================================================
  # Huawei Ascend
  # ========================================================================
  # The 910B "series" is actually a set of sub-variants (B1/B2/B3/B4) with
  # different compute tiers and memory sizes. `910B` as a plain id resolves
  # to 910B3 (the most common training configuration).
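  # Illustrative lookup (resolver behavior, not part of the schema):
  #   "910B"        -> entry id: 910B3   (via its aliases list below)
  #   "Ascend-910B" -> entry id: 910B3
  #   "910B1"       -> entry id: 910B1   (exact id match)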
| - id: "910A" | |
| aliases: [Ascend-910A] | |
| memory_gb: 32 | |
| nvlink_bandwidth_gbps: 400 | |
| memory_bandwidth_gbps: 1200 | |
| fp16_tflops: 256 | |
| fp8_support: false | |
| fp4_support: false | |
| spec_source: "Ascend 910 (1st gen) — 7nm, 32 GB HBM. Community-compiled spec." | |
| notes_en: "Huawei Ascend 910 (1st gen, 2019). Predecessor to 910B. Still deployed in many older clusters. HCCS interconnect." | |
| notes_zh: "华为昇腾 910 第一代(2019 年),910B 的前身。很多老集群仍在使用。HCCS 互联。" | |
| - id: "910B1" | |
| aliases: [Ascend-910B1] | |
| memory_gb: 64 | |
| nvlink_bandwidth_gbps: 400 | |
| memory_bandwidth_gbps: 1600 | |
| fp16_tflops: 414 | |
| fp8_support: false | |
| fp4_support: false | |
| spec_source: "Ascend 910B1 — training variant, Atlas 800T A2. Commonly cited as top-tier 910B sub-variant; TSMC 7nm process." | |
| notes_en: "Top-tier 910B training variant. 64 GB HBM2, 414 TFLOPS FP16. Used in Atlas 800T A2 training servers. No native FP8." | |
| notes_zh: "910B 系列顶配训练版本。64 GB HBM2,FP16 算力 414 TFLOPS。搭载于 Atlas 800T A2 训练服务器。不原生支持 FP8。" | |
| - id: "910B2" | |
| aliases: [Ascend-910B2] | |
| memory_gb: 64 | |
| nvlink_bandwidth_gbps: 400 | |
| memory_bandwidth_gbps: 1600 | |
| fp16_tflops: 376 | |
| fp8_support: false | |
| fp4_support: false | |
| spec_source: "Ascend 910B2 — training variant, commonly cited as standard 910B training configuration." | |
| notes_en: "Standard 910B training variant. 64 GB HBM2, 376 TFLOPS FP16. General-purpose training server baseline." | |
| notes_zh: "910B 常规训练版本。64 GB HBM2,FP16 算力 376 TFLOPS。通用训练服务器标准配置。" | |
| - id: "910B3" | |
| aliases: [Ascend-910B3, "910B", Ascend-910B] | |
| memory_gb: 64 | |
| nvlink_bandwidth_gbps: 400 | |
| memory_bandwidth_gbps: 1600 | |
| fp16_tflops: 313 | |
| fp8_support: false | |
| fp4_support: false | |
| spec_source: "Ascend 910B3 — training variant, SMIC-produced per industry reports. (aliased as bare `910B` for convenience)" | |
| notes_en: "910B3 training variant, 313 TFLOPS FP16. Believed to be SMIC-produced (vs TSMC for B1/B2). The `910B` bare name resolves here since B3 is the most commonly referenced." | |
| notes_zh: "910B3 训练版本,FP16 算力 313 TFLOPS。业界普遍认为由中芯国际生产(B1/B2 据传为台积电)。裸写 `910B` 时默认解析到此条目(最常被引用)。" | |
| - id: "910B4" | |
| aliases: [Ascend-910B4] | |
| memory_gb: 32 | |
| nvlink_bandwidth_gbps: 400 | |
| memory_bandwidth_gbps: 1600 | |
| fp16_tflops: 280 | |
| fp8_support: false | |
| fp4_support: false | |
| spec_source: "Ascend 910B4 — inference variant, 32 GB HBM (half of B1/B2/B3). Atlas 800I A2 inference server." | |
| notes_en: "910B4 is the inference-oriented 910B variant. 32 GB HBM (half of training variants), 280 TFLOPS FP16. Deployed in Atlas 800I A2 inference servers." | |
| notes_zh: "910B4 是 910B 系列的推理版本。32 GB HBM(训练版本的一半),FP16 算力 280 TFLOPS。搭载于 Atlas 800I A2 推理服务器。" | |
| - id: "910C" | |
| aliases: [Ascend-910C] | |
| memory_gb: 64 | |
| nvlink_bandwidth_gbps: 400 | |
| memory_bandwidth_gbps: 3200 | |
| fp16_tflops: 780 | |
| fp8_support: false | |
| fp4_support: false | |
| spec_source: "Huawei Ascend 910C — launched 2024, commonly cited specs pending official datasheet" | |
| notes_en: "Huawei Ascend 910C (2024). Roughly 2x compute vs 910B at similar memory. FP8 support status unclear — check CANN version notes. Software ecosystem matures but still behind NVIDIA." | |
| notes_zh: "华为昇腾 910C(2024 年)。算力大约是 910B 的两倍,显存相当。FP8 支持情况需看 CANN 版本。软件生态持续完善但仍落后于 NVIDIA。" | |
  - id: Atlas-300I-Duo
    aliases: [Atlas300IDuo, 300I-Duo]
    memory_gb: 48
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 204
    fp16_tflops: 140
    fp8_support: false
    fp4_support: false
    spec_source: "Huawei Atlas 300I Duo inference card — 2x Ascend 310P per card. 140 TFLOPS FP16 per card, 48 GB LPDDR4X."
    notes_en: "Huawei Atlas 300I Duo inference card: 2x Ascend 310P with combined 48 GB LPDDR4X (96 GB variant available). 280 TOPS INT8. LPDDR4X gives 204 GB/s total bandwidth — much lower than HBM-based cards. PCIe-only, no NVLink. Best for cost-sensitive inference."
    notes_zh: "华为 Atlas 300I Duo 推理卡:双 Ascend 310P,合计 48 GB LPDDR4X(另有 96 GB 版本)。INT8 280 TOPS。显存是 LPDDR4X,带宽 204 GB/s,远低于 HBM 卡。仅 PCIe,无 NVLink。主要面向成本敏感的推理场景。"
  # ========================================================================
  # Chinese domestic AI accelerators (non-NVIDIA / non-AMD)
  # ========================================================================
  - id: MXC500
    aliases: [MetaX-MXC500, XiYun-C500, 曦云C500]
    memory_gb: 64
    nvlink_bandwidth_gbps: 800
    memory_bandwidth_gbps: 1800
    fp16_tflops: 240
    fp8_support: false
    fp4_support: false
    spec_source: "MetaX 沐曦 MXC500 / 曦云 C500 (PCIe variant, 350W). OAM variant has 280 TFLOPS FP16 @ 450W. 64 GB HBM2e, 1.8 TB/s memory bandwidth, MetaXLink interconnect."
    notes_en: "MetaX (沐曦) MXC500. 7nm, CUDA-compatible via the MXMACA stack. PCIe variant: 240 TFLOPS FP16, 350W. OAM variant: 280 TFLOPS FP16, 450W. Targets A100-class workloads. No native FP8."
    notes_zh: "沐曦曦云 C500。7nm 工艺,通过 MXMACA 软件栈兼容 CUDA。PCIe 版本 FP16 240 TFLOPS / 350W,OAM 版本 280 TFLOPS / 450W。对标 A100 场景。不原生支持 FP8。"
  - id: MXC550
    aliases: [MetaX-MXC550, XiYun-C550, 曦云C550]
    memory_gb: 64
    nvlink_bandwidth_gbps: 896
    memory_bandwidth_gbps: 1600
    fp16_tflops: 240
    fp8_support: false
    fp4_support: false
    spec_source: "MetaX 沐曦 MXC550 / 曦云 C550 (OAM, 2024). Partial specs from third-party comparison docs; full datasheet TBD. 8-card fabric bandwidth 896 GB/s."
    notes_en: "MetaX (沐曦) MXC550 — 2024 OAM-format flagship. Supports OAM 1.5 + 2.0. 8-card fabric bandwidth 896 GB/s. Full specs pending official datasheet — figures here are from third-party comparison articles."
    notes_zh: "沐曦曦云 C550 — 2024 年 OAM 形态旗舰。支持 OAM 1.5 + 2.0 规范。八卡全互联带宽 896 GB/s。完整规格待官方数据表披露,此处数字来自第三方对比资料。"
  - id: Kunlun-P800
    aliases: [KunlunXin-P800, 昆仑芯P800, Kunlun-Gen3]
    memory_gb: 96
    nvlink_bandwidth_gbps: 400
    memory_bandwidth_gbps: 2000
    fp16_tflops: 345
    fp8_support: true
    fp4_support: false
    spec_source: "KunlunXin P800 (3rd gen, 2024). 96 GB HBM3 (largest among Chinese domestic AI chips). Baidu Cloud uses P800 for first-party inference. Specs partially inferred from public Baidu announcements; official datasheet limited distribution."
    notes_en: "Baidu KunlunXin P800 — 3rd gen, 2024. 96 GB HBM3. Reported to support 8-bit inference and MoE optimizations. Baidu's internal clusters run Kunlun P800 at 10k+ card scale. Figures here are from public Baidu materials; the official spec sheet is not fully public."
    notes_zh: "百度昆仑芯 P800 — 第三代,2024 年。96 GB HBM3(国产 AI 芯片中显存最大之一)。报告支持 8bit 推理和 MoE 优化。百度内部 1 万卡以上规模部署。数字来自百度公开资料,完整规格表未完全披露。"
  - id: Kunlun-R200
    aliases: [KunlunXin-R200, 昆仑芯R200, Kunlun-Gen2]
    memory_gb: 32
    nvlink_bandwidth_gbps: 200
    memory_bandwidth_gbps: 512
    fp16_tflops: 128
    fp8_support: false
    fp4_support: false
    spec_source: "KunlunXin R200 (2nd gen, 2021). 7nm XPU architecture. FP16 128 TFLOPS / INT8 256 TOPS."
    notes_en: "Baidu KunlunXin R200 — 2nd gen, 7nm. FP16 128 TFLOPS, INT8 256 TOPS. XPU architecture. PCIe 4.0 + XCCL interconnect. No FP8."
    notes_zh: "百度昆仑芯 R200 — 第二代,7nm XPU 架构。FP16 128 TFLOPS,INT8 256 TOPS。PCIe 4.0 + 昆仑芯互联 XCCL。无 FP8。"
  - id: BR100
    aliases: [Biren-BR100, 壁仞BR100, 壁砺100]
    memory_gb: 64
    nvlink_bandwidth_gbps: 512
    memory_bandwidth_gbps: 1640
    fp16_tflops: 1024
    fp8_support: false
    fp4_support: false
    spec_source: "Biren 壁仞 BR100 (OAM, 550W). 7nm chiplet, 77B transistors. BF16/FP16 1024 TFLOPS, INT8 2048 TOPS, 64 GB HBM2e 1.64 TB/s. BLINK 512 GB/s 8-card fabric."
    notes_en: "Biren BR100 (壁仞) — 2022 flagship. OAM format, 550W. 1024 TFLOPS BF16/FP16 (PFLOPS class), 64 GB HBM2e. BLINK interconnect 512 GB/s (8-card fabric). No FP8. US export-restricted since 2022 — production status uncertain."
    notes_zh: "壁仞 BR100 — 2022 年旗舰 OAM 卡,550W。BF16/FP16 1024 TFLOPS(PFLOPS 级),64 GB HBM2e。BLINK 互联 512 GB/s(8 卡全互联)。无 FP8。2022 年被美国出口管制,后续量产状态不明。"
  - id: BR104
    aliases: [Biren-BR104, 壁仞BR104, 壁砺104]
    memory_gb: 32
    nvlink_bandwidth_gbps: 128
    memory_bandwidth_gbps: 820
    fp16_tflops: 512
    fp8_support: false
    fp4_support: false
    spec_source: "Biren 壁仞 BR104 (PCIe, 300W). Single-die version of BR100 with halved specs. BF16/FP16 512 TFLOPS, 32 GB HBM2e. Won MLPerf Inference ResNet50 and BERT single-card top-1 in its class."
    notes_en: "Biren BR104 — PCIe single-die version of BR100. 300W, 512 TFLOPS BF16/FP16, 32 GB HBM2e. Won MLPerf Inference BERT (1.58x A100 in server mode). No FP8. Export-restricted."
    notes_zh: "壁仞 BR104 — BR100 的单 die PCIe 版本。300W,BF16/FP16 512 TFLOPS,32 GB HBM2e。MLPerf Inference BERT 测试 server 模式性能达 A100 的 1.58 倍。无 FP8。已被出口管制。"
  - id: BI-V100
    aliases: [Iluvatar-BI-V100, 天数天垓100, TianGai-100]
    memory_gb: 32
    nvlink_bandwidth_gbps: 64
    memory_bandwidth_gbps: 1200
    fp16_tflops: 147
    fp8_support: false
    fp4_support: false
    spec_source: "Iluvatar CoreX 天数智芯 BI-V100 (天垓100). 7nm, SIMT, 24B transistors, 2.5D CoWoS packaging. FP16 147 TFLOPS / INT8 295 TOPS. 32 GB HBM2, 1.2 TB/s bandwidth. PCIe 4.0 x16, 250W TDP."
    notes_en: "Iluvatar (天数智芯) BI-V100 — training/general-purpose. 7nm SIMT architecture, 32 GB HBM2, 1.2 TB/s memory bandwidth. FP16 147 TFLOPS, INT8 295 TOPS. 250W TDP. Interconnect bandwidth per card is modest (~64 GB/s shared)."
    notes_zh: "天数智芯 BI-V100(天垓100)— 训练/通用 GPU。7nm SIMT 架构,32 GB HBM2,1.2 TB/s 显存带宽。FP16 147 TFLOPS,INT8 295 TOPS。250W TDP。单卡互联带宽 ~64 GB/s,相对较低。"
  - id: MR-V100
    aliases: [Iluvatar-MR-V100, 天数智铠100, ZhiKai-100]
    memory_gb: 32
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 1200
    fp16_tflops: 100
    fp8_support: false
    fp4_support: false
    spec_source: "Iluvatar CoreX 天数智芯 智铠100 (MR-V100) 2022. Inference card, 32 GB HBM2E, ~200 TFLOPS aggregated mixed-precision (BF16/FP16), 128-channel 1080p video decode, 150W TDP."
    notes_en: "Iluvatar inference card (智铠100). 32 GB HBM2E. 150W TDP. Primarily inference-focused — mixed-precision aggregated throughput ~200 TFLOPS."
    notes_zh: "天数智芯智铠100 推理卡。32 GB HBM2E,150W TDP。主要面向推理场景,混合精度聚合算力约 200 TFLOPS。"
  - id: MLU370-X8
    aliases: [Cambricon-MLU370-X8, 寒武纪MLU370-X8, 思元370-X8]
    memory_gb: 48
    nvlink_bandwidth_gbps: 200
    memory_bandwidth_gbps: 614
    fp16_tflops: 48
    fp8_support: false
    fp4_support: false
    spec_source: "Cambricon 寒武纪 MLU370-X8 (dual MLU370 chiplet, 250W). 48 GB LPDDR5, INT8 256 TOPS, FP32 24 TFLOPS (FP16 ~48 TFLOPS estimated, official figure not given). MLU-Link 200 GB/s."
    notes_en: "Cambricon (寒武纪) MLU370-X8 — dual-chip package, 250W. 48 GB LPDDR5 (not HBM), INT8 256 TOPS, FP32 24 TFLOPS. MLU-Link 200 GB/s for 8-card setups. LPDDR5 means lower memory bandwidth than HBM cards."
    notes_zh: "寒武纪 MLU370-X8 — 双芯粒封装,250W。48 GB LPDDR5(非 HBM),INT8 256 TOPS,FP32 24 TFLOPS。MLU-Link 200 GB/s,支持 8 卡部署。LPDDR5 意味着显存带宽低于 HBM 卡。"
  - id: MLU590
    aliases: [Cambricon-MLU590, 寒武纪MLU590, 思元590]
    memory_gb: 80
    nvlink_bandwidth_gbps: 372
    memory_bandwidth_gbps: 2000
    fp16_tflops: 314
    fp8_support: false
    fp4_support: false
    spec_source: "Cambricon 寒武纪 思元590 (MLU590) — 7nm, MLUv02/MLUarch05. 80 GB HBM (likely HBM2e given the 2 TB/s bandwidth), FP16 314 TFLOPS, FP32 80 TFLOPS, MLU-Link 372 GB/s. Reportedly used in Baidu's ERNIE (文心一言) project."
    notes_en: "Cambricon (寒武纪) MLU590 — flagship AI training chip. 80 GB HBM, 2 TB/s memory bandwidth. FP16 314 TFLOPS (dense), comparable to NVIDIA A100-level FP16 compute. MLU-Link 372 GB/s 8-card fabric. No FP8. Production volume and ecosystem still maturing."
    notes_zh: "寒武纪思元590 — 旗舰 AI 训练芯片。80 GB HBM,2 TB/s 显存带宽。FP16 314 TFLOPS(dense),综合性能约为 A100 级别。MLU-Link 372 GB/s 八卡互联。无 FP8。量产规模和生态仍在成熟。"
  - id: Hygon-K100-AI
    aliases: [K100-AI, 海光K100AI, DCU-K100-AI]
    memory_gb: 64
    nvlink_bandwidth_gbps: 184
    memory_bandwidth_gbps: 896
    fp16_tflops: 192
    fp8_support: false
    fp4_support: false
    spec_source: "Hygon 海光 K100 AI — DCU architecture (GPGPU+AI hybrid), 64 GB HBM, 896 GB/s memory bandwidth, 350W TDP. FP16 192 TFLOPS dense (some sources cite 256 TFLOPS; values vary). xGMI 184 GB/s."
    notes_en: "Hygon (海光) K100 AI — DCU series. 64 GB HBM, 896 GB/s bandwidth. FP16 192 TFLOPS (industry reports vary 100-256 TFLOPS depending on compute unit/mode). ROCm-compatible, can leverage the AMD software ecosystem. Positioned against the A800 for the Chinese market. 350W TDP."
    notes_zh: "海光 K100 AI — DCU 系列。64 GB HBM,896 GB/s 带宽。FP16 192 TFLOPS(公开资料数字因计算单元和精度模式不同有 100-256 TFLOPS 差异)。兼容 ROCm,可复用 AMD 软件生态。面向国产 A800 替代场景。350W TDP。"
  - id: Hygon-Z100
    aliases: [Z100, 海光Z100, DCU-Z100, 深算二号]
    memory_gb: 32
    nvlink_bandwidth_gbps: 184
    memory_bandwidth_gbps: 1000
    fp16_tflops: 180
    fp8_support: false
    fp4_support: false
    spec_source: "Hygon 海光 DCU Z100 (深算二号) — 32 GB HBM2, 1 TB/s bandwidth, 8192 compute cores, FP32 90 TFLOPS, FP16 ~180 TFLOPS (2x FP32), FP64 10.8 TFLOPS. xGMI 184 GB/s. Performance reported as 80-90% of A100. 350W TDP."
    notes_en: "Hygon (海光) DCU Z100 / 深算二号. 32 GB HBM2, 1 TB/s bandwidth, 8192 compute units. FP16 180 TFLOPS, FP32 90 TFLOPS, FP64 10.8 TFLOPS. 350W. Performance cited at 80-90% of A100. ROCm stack, PCIe Gen4 + xGMI multi-card."
    notes_zh: "海光 DCU Z100(深算二号)。32 GB HBM2,1 TB/s 带宽,8192 计算单元。FP16 180 TFLOPS,FP32 90 TFLOPS,FP64 10.8 TFLOPS。350W。综合性能约为 A100 的 80-90%。基于 ROCm 栈,PCIe Gen4 + xGMI 多卡互联。"
  - id: MTT-S4000
    aliases: [MooreThreads-S4000, 摩尔线程S4000, MTT-S4000-48G]
    memory_gb: 48
    nvlink_bandwidth_gbps: 240
    memory_bandwidth_gbps: 768
    fp16_tflops: 100
    fp8_support: false
    fp4_support: false
    spec_source: "Moore Threads MTT S4000 datasheet 2023 — 3rd-gen MUSA (曲院). 48 GB GDDR6, 768 GB/s bandwidth. FP16/BF16 100 TFLOPS, INT8 200 TOPS. MTLink 1.0 240 GB/s."
    notes_en: "Moore Threads (摩尔线程) S4000 — domestic AI training card. 48 GB GDDR6 (not HBM), 768 GB/s. FP16/BF16 100 TFLOPS. MTLink 1.0 240 GB/s. CUDA compatibility via MUSA translation."
    notes_zh: "摩尔线程 S4000 — 国产训推加速卡。48 GB GDDR6(非 HBM),768 GB/s 带宽。FP16/BF16 100 TFLOPS。MTLink 1.0 互联 240 GB/s。通过 MUSA 兼容 CUDA 生态。"
  - id: MTT-S3000
    aliases: [MooreThreads-S3000, 摩尔线程S3000]
    memory_gb: 32
    nvlink_bandwidth_gbps: 0
    memory_bandwidth_gbps: 448
    fp16_tflops: 30
    fp8_support: false
    fp4_support: false
    spec_source: "Moore Threads MTT S3000 — MUSA 春晓 architecture. 32 GB GDDR6, 448 GB/s. FP32 ~15.2 TFLOPS inferred from S4000 comparison (S4000 is 64%+ higher); FP16 ~30 TFLOPS estimate (datasheet not fully public)."
    notes_en: "Moore Threads (摩尔线程) S3000 — predecessor to S4000. 32 GB GDDR6, 448 GB/s. FP16 specs not fully published; estimated ~30 TFLOPS based on S4000 comparison. Multi-purpose server GPU, also supports rendering."
    notes_zh: "摩尔线程 S3000 — S4000 的前代。32 GB GDDR6,448 GB/s。FP16 官方未完全披露,基于 S4000 对比推算约 30 TFLOPS。通用服务器 GPU,兼顾渲染场景。"