javirk1 committed on
Commit
f6e319a
·
verified ·
1 Parent(s): edf1672

Upload folder using huggingface_hub

Files changed (5)
  1. README.md +132 -0
  2. bias.md +6 -0
  3. explainability.md +35 -0
  4. privacy.md +33 -0
  5. safety.md +12 -0
README.md ADDED
@@ -0,0 +1,132 @@
1
+ # Telesurgery Neural Tokenizer v1.0 Overview
2
+
3
+ ## Description:
4
+ Telesurgery Neural Tokenizer processes surgical scenario inputs by tokenizing frames using a distilled frame autoencoder, optimized for low-latency applications like telesurgery video streaming.
5
+
6
+ _This model is available for commercial use._
7
+
8
+ ### License/Terms of Use:
9
+ NVIDIA Open Model License
10
+
11
+ ### Deployment Geography:
12
+ Global
13
+
14
+ ### Use Case:
15
+ Primarily intended for surgical robotics researchers, healthcare AI developers, academic institutions, or companies exploring neural codecs for telesurgery applications, particularly where low-latency video streaming is critical.
16
+
17
+ ## Model Architecture:
18
+ **Architecture Type:** Convolutional Neural Network with Residual and Attention Blocks (based on Wan2.1 with 2D Convolutions)
19
+ **Network Architecture:** Telesurgery Neural Tokenizer (Custom Architecture, 1GB VRAM Requirement, Optimized for NVIDIA GPUs)
20
+
21
+ **This model was distilled from Wan2.1.**
22
+ **Number of model parameters:** 12.6M
23
+
24
+ ## Input:
25
+ **Input Type(s):** Image
26
+ **Input Format(s):** Red, Green, Blue (RGB)
27
+ **Input Parameters:** Two-Dimensional (2D)
28
+ **Other Properties Related to Input:** Image Resolution: 536x960, 720x1280 or 1080x1920; Image Range: [-1, 1]
29
+
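The input range above can be illustrated with a minimal sketch. It assumes frames arrive as 8-bit RGB values (0 to 255) and uses the common `x / 127.5 - 1` mapping; the card does not state the exact preprocessing, so the function names and the mapping itself are illustrative assumptions.

```python
# Hypothetical helpers: the model card only states the target range [-1, 1]
# and the supported resolutions, not how frames are preprocessed.
SUPPORTED_RESOLUTIONS = {(536, 960), (720, 1280), (1080, 1920)}  # (height, width)


def normalize_pixel(value: int) -> float:
    """Map an 8-bit channel value (0..255) to the [-1, 1] range."""
    if not 0 <= value <= 255:
        raise ValueError(f"expected an 8-bit value, got {value}")
    return value / 127.5 - 1.0


def denormalize_pixel(value: float) -> int:
    """Map a value in [-1, 1] back to 0..255, clipping out-of-range inputs."""
    clipped = max(-1.0, min(1.0, value))
    return round((clipped + 1.0) * 127.5)


def is_supported_resolution(height: int, width: int) -> bool:
    """Check a frame size against the input resolutions listed in the card."""
    return (height, width) in SUPPORTED_RESOLUTIONS
```

For example, `normalize_pixel(0)` gives the lower bound of the range and `normalize_pixel(255)` the upper bound.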
30
+ ## Output:
31
+ **Output Type(s):** Embeddings
32
+ **Output Format:** PyTorch Tensor
33
+ **Output Parameters:** Three-Dimensional (3D)
34
+ **Other Properties Related to Output:** Embeddings format: 2x(H/8)x(W/8) (With `H` and `W` being Height and Width of the original image).
35
+
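As a worked example of the embedding format above, the sketch below computes the 2x(H/8)x(W/8) shape for a given frame size. The function names are hypothetical, and the ratio it reports counts tensor elements only; it says nothing about the bitrate after quantization.

```python
def latent_shape(height: int, width: int, channels: int = 2, stride: int = 8):
    """Return the (C, H/8, W/8) embedding shape described in the card."""
    if height % stride or width % stride:
        raise ValueError("height and width must be divisible by the stride")
    return (channels, height // stride, width // stride)


def element_reduction(height: int, width: int) -> float:
    """Ratio of RGB input elements (3*H*W) to embedding elements."""
    c, h, w = latent_shape(height, width)
    return (3 * height * width) / (c * h * w)


for size in [(536, 960), (720, 1280), (1080, 1920)]:
    print(size, "->", latent_shape(*size))
```

All three listed input resolutions are divisible by 8, so each maps to an integer-sized embedding with 96 times fewer elements than the RGB frame.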
36
+ **Output Type(s):** Image
37
+ **Output Format:** Red, Green, Blue (RGB)
38
+ **Output Parameters:** Two-Dimensional (2D)
39
+ **Other Properties Related to Output:** Minimum Resolution: 480x848, Maximum Resolution: 536x960, Image Range: [-1, 1]
40
+
41
+ Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems (NVIDIA GPUs or equivalent GPU-accelerated hardware). By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
42
+
43
+ ## Software Integration:
44
+ **Runtime Engine(s):**
45
+ * TensorRT
46
+
47
+ **Supported Hardware Microarchitecture Compatibility:**
48
+ * NVIDIA Ampere
49
+ * NVIDIA Blackwell
50
+ * NVIDIA Hopper
51
+
52
+ **Supported Operating System(s):**
53
+ * Linux
54
+
55
+ The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
56
+
57
+
58
+ ## Model Version(s):
59
+ v1.0
60
+
61
+ The Telesurgery Neural Tokenizer can be integrated into an AI system via ONNX or TensorRT runtime engines, supporting NVIDIA Ampere, Blackwell, and Hopper microarchitectures, and Linux-based operating systems. It accepts 2D RGB image frames (numeric vectors) at specific resolutions (536x960, 720x1280 or 1080x1920) for low-latency video streaming in telesurgery scenarios.
62
+
63
+ ## Training, Testing, and Evaluation Datasets:
64
+
65
+ ## Training Dataset:
66
+ **Link:** In-house surgical data (laparoscopic surgeries)
67
+
68
+ **Data Modality:**
69
+ * Image
70
+
71
+ **Image Training Data Size:**
72
+ * Less than a Million Images
73
+
74
+ **Text Training Data Size:**
75
+ * Less than a Billion Tokens
76
+
77
+ **Video Training Data Size:**
78
+ * 10,000 to 1 Million Hours
79
+
80
+ **Non-Audio, Image, Text Training Data Size:**
81
+ * Approximately 536x960 to 1080x1920 pixels (RGB images)
82
+
83
+ **Data Collection Method by dataset:**
84
+ * Human
85
+
86
+ **Labeling Method by dataset:**
87
+ * Human
88
+
89
+ **Properties (Quantity, Dataset Descriptions, Sensor(s)):** Training set consists of 5765 (5-minute) video items for laparoscopic surgeries. **Modality**: Video (Image sequences). **Content Nature**: In-house surgical data.
90
+
91
+ ## Testing Dataset:
92
+ **Link:** In-house surgical data (laparoscopic surgeries)
93
+
94
+ **Data Collection Method by dataset:**
95
+ * Human
96
+
97
+ **Labeling Method by dataset:**
98
+ * Human
99
+
100
+ **Properties (Quantity, Dataset Descriptions, Sensor(s)):** Testing set consists of 1224 (5-minute) video items for laparoscopic surgeries. **Modality**: Video (Image sequences). **Content Nature**: In-house surgical data.
101
+
102
+ ## Evaluation Dataset:
103
+ **Link:** In-house surgical dataset (laparoscopic surgeries)
104
+ (Internal Only: Not To Be Published)
105
+ **Benchmark Score:** v1.0: PSNR=34.61±3.18, SSIM=0.961±0.026, LPIPS=0.105±0.026
106
+
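For readers unfamiliar with the PSNR figure above, here is a minimal sketch of the standard definition, PSNR = 10 * log10(MAX^2 / MSE). The card does not say how the metric was computed; the `data_range=2.0` default assumes images in [-1, 1], and the flat-list interface is purely illustrative.

```python
import math


def psnr(reference, reconstruction, data_range: float = 2.0) -> float:
    """Peak signal-to-noise ratio in dB for two equal-length pixel sequences.

    data_range is the span of valid pixel values (2.0 for the [-1, 1]
    range used by this model; an assumption about the evaluation setup).
    """
    if not reference or len(reference) != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals
    return 10 * math.log10(data_range ** 2 / mse)
```

PSNR is unchanged by a linear rescaling of the pixels as long as `data_range` is rescaled with them, so the same images evaluated in 0..255 with `data_range=255` give the same value.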
107
+ **Data Collection Method by dataset:**
108
+ * Human
109
+
110
+ **Labeling Method by dataset:**
111
+ * Human
112
+
113
+ **Properties (Quantity, Dataset Descriptions, Sensor(s)):** Evaluation set consists of 1220 (5-minute) video items for laparoscopic surgeries. **Modality**: Video (Image sequences). **Content Nature**: In-house surgical data.
114
+
115
+ ## Inference:
116
+ **Acceleration Engine:** TensorRT
117
+
118
+ **Test Hardware:**
119
+ * A100
120
+ * A6000
121
+ * RTX 6000 ADA
122
+
123
+ ## Ethical Considerations:
124
+ NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
125
+
126
+ Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.
127
+
128
+ For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
129
+
130
+ Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.
131
+
132
+ Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).
bias.md ADDED
@@ -0,0 +1,6 @@
1
+ # Bias Subcard
2
+ ## Participation considerations from adversely impacted groups (protected classes) in model design and testing:
3
+ None
4
+
5
+ ## Measures taken to mitigate against unwanted bias:
6
+ None
explainability.md ADDED
@@ -0,0 +1,35 @@
1
+ # Explainability Subcard
2
+ ## Intended Domain
3
+ Image compression and reconstruction
4
+
5
+ ## Model Type
6
+ Convolutional with quantization
7
+
8
+ ## Intended Users
9
+ Surgeons, Telemedicine Professionals, Medical Robotics Engineers
10
+
11
+ ## Output
12
+ Types: Image. Formats: Red, Green, Blue (RGB)
13
+
14
+ ## Describe how the model works:
15
+ Type: Convolutional Neural Network with Residual and Attention Blocks (distilled from Wan2.1 with 2D Convolutions).
16
+ It works as an autoencoder and quantizes the latent space.
17
+
18
+ ## Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:
19
+ None
20
+
21
+ ## Technical Limitations & Mitigation:
22
+ This model may struggle when confronted with high spatial frequency data, such as text in the images. As the model has not been explicitly evaluated across a substantial and diverse population, its performance may vary and should be evaluated by qualified experts for use in clinical settings.
23
+ **Mitigation:** For scenarios where accurate text reconstruction is critical, it is recommended to preprocess images to exclude textual content, use supplementary OCR models to extract and transmit text separately, or increase the image resolution if hardware constraints allow.
24
+
25
+ ## Verified to have met prescribed NVIDIA quality standards:
26
+ Yes
27
+
28
+ ## Performance Metrics:
29
+ PSNR 34.61, LPIPS 0.10, 0.5 bpp, latency
30
+
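To put the 0.5 bpp figure above in context, the sketch below converts it into a per-frame payload and a stream bitrate. It assumes "bpp" means bits per pixel of the input frame, and the frame rate is a hypothetical parameter: the card does not state the frame rate used for streaming.

```python
def bits_per_frame(height: int, width: int, bpp: float = 0.5) -> float:
    """Compressed size of one frame in bits, given bits per input pixel."""
    return bpp * height * width


def bitrate_mbps(height: int, width: int, fps: float, bpp: float = 0.5) -> float:
    """Stream bitrate in megabits per second for a given frame rate."""
    return bits_per_frame(height, width, bpp) * fps / 1_000_000


# Example: one 536x960 frame, then a hypothetical 30 fps stream.
print(bits_per_frame(536, 960) / 8, "bytes per frame")
print(bitrate_mbps(536, 960, fps=30), "Mbit/s")
```

Under these assumptions a 536x960 frame takes about 32 kB, and a 30 fps stream needs roughly 7.7 Mbit/s.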
31
+ ## Potential Known Risks:
32
+ This model may inaccurately reproduce text, making it illegible.
33
+
34
+ ## Licensing:
35
+ NVIDIA Open Model License.
privacy.md ADDED
@@ -0,0 +1,33 @@
1
+ # Privacy Subcard
2
+ ## Generatable or reverse engineerable personal data?
3
+ No
4
+
5
+ ## Personal data used to create this model?
6
+ Yes
7
+
8
+ ## How often is dataset reviewed?
9
+ Dataset is initially reviewed upon addition, and subsequent reviews are conducted as needed or upon request for changes.
10
+
11
+ ## Is a mechanism in place to honor data subject right of access or deletion of personal data?
12
+ Yes
13
+
14
+ ## If personal data was collected for the development of the model, was it collected directly by NVIDIA?
15
+ No
16
+
17
+ ## If personal data was collected for the development of this AI model, was it minimized to only what was required?
18
+ Yes
19
+
20
+ ## Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model?
21
+ No
22
+
23
+ ## Is there provenance for all datasets used in training?
24
+ Yes
25
+
26
+ ## Does data labeling (annotation, metadata) comply with privacy laws?
27
+ Yes
28
+
29
+ ## Is data compliant with data subject requests for data correction or removal, if such a request was made?
30
+ Yes
31
+
32
+ ## Applicable Privacy Policy
33
+ https://www.nvidia.com/en-us/about-nvidia/privacy-policy/
safety.md ADDED
@@ -0,0 +1,12 @@
1
+ # Safety & Security Subcard
2
+ ## Model Application Field(s):
3
+ Healthcare
4
+
5
+ ## Describe the life critical impact (if present).
6
+ This model performs image compression; it is not intended for diagnostic purposes. Additional testing and evaluation are recommended prior to use in clinical settings and non-experimental downstream applications.
7
+
8
+ ## Use Case Restrictions:
9
+ Abide by [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
10
+
11
+ ## Model and dataset restrictions:
12
+ The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.