Upload folder using huggingface_hub

Files changed (5)

README.md +132 -0
bias.md +6 -0
explainability.md +35 -0
privacy.md +33 -0
safety.md +12 -0

README.md ADDED

	@@ -0,0 +1,132 @@

+# Telesurgery Neural Tokenizer v1.0 Overview
+## Description:
+Telesurgery Neural Tokenizer encodes surgical video frames into compact latent embeddings (tokens) with a distilled frame autoencoder and decodes them back into RGB frames. It is optimized for low-latency applications such as telesurgery video streaming.
+_This model is available for commercial use._
+### License/Terms of Use:
+NVIDIA Open Model License
+### Deployment Geography:
+Global
+### Use Case:
+Primarily intended for surgical robotics researchers, healthcare AI developers, academic institutions, and companies exploring neural codecs for telesurgery applications, particularly where low-latency video streaming is critical.
+## Model Architecture:
+**Architecture Type:** Convolutional Neural Network with Residual and Attention Blocks (based on Wan2.1 with 2D Convolutions)
+**Network Architecture:** Telesurgery Neural Tokenizer (Custom Architecture, 1GB VRAM Requirement, Optimized for NVIDIA GPUs)
+**This model was distilled from Wan2.1.**
+**Number of model parameters:** 12.6M
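As a rough sanity check against the stated 1GB VRAM requirement, the weight footprint of a 12.6M-parameter model can be computed for common precisions. The sketch below counts weights only; the card does not state which precision the TensorRT engine uses, and activations, workspace, and runtime overhead are not included.

```python
PARAMS = 12_600_000  # from the card: 12.6M parameters


def weight_megabytes(n_params: int, bytes_per_param: int) -> float:
    """Storage for the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return n_params * bytes_per_param / 1e6


for name, nbytes in [("fp32", 4), ("fp16", 2), ("int8", 1)]:
    print(f"{name}: {weight_megabytes(PARAMS, nbytes):.1f} MB")
# fp32: 50.4 MB
# fp16: 25.2 MB
# int8: 12.6 MB
```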
+## Input:
+**Input Type(s):** Image
+**Input Format(s):** Red, Green, Blue (RGB)
+**Input Parameters:** Two-Dimensional (2D)
+**Other Properties Related to Input:** Image Resolution: 536x960, 720x1280 or 1080x1920; Image Range: [-1, 1]
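The card specifies inputs in the range [-1, 1], while camera frames are usually 8-bit RGB. A minimal sketch of the usual affine mapping, assuming 8-bit input; the helper name `to_model_range` is hypothetical, and a real pipeline would apply the same formula element-wise to a GPU tensor rather than a Python list.

```python
# Hypothetical helper (not part of the released model): scales 8-bit RGB
# channel values from [0, 255] into the documented input range [-1, 1].

def to_model_range(pixels: list[int]) -> list[float]:
    """Apply x / 127.5 - 1 to each 8-bit value, rejecting out-of-range input."""
    scaled = []
    for v in pixels:
        if not 0 <= v <= 255:
            raise ValueError(f"expected an 8-bit value, got {v}")
        scaled.append(v / 127.5 - 1.0)
    return scaled


# The two ends of the 8-bit range map to the two ends of [-1, 1].
print(to_model_range([0, 255]))  # [-1.0, 1.0]
```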
+## Output:
+**Output Type(s):** Embeddings
+**Output Format:** PyTorch Tensor
+**Output Parameters:** Three-Dimensional (3D)
+**Other Properties Related to Output:** Embedding shape: 2x(H/8)x(W/8), where `H` and `W` are the height and width of the input image.
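The embedding shape rule above can be checked for each supported input resolution. The sketch below (helper names are hypothetical) computes 2x(H/8)x(W/8) and the ratio of raw RGB values to embedding values. The divisibility check is an assumption added for safety; all three documented input resolutions are divisible by 8.

```python
LATENT_CHANNELS = 2     # from the card: embeddings are 2x(H/8)x(W/8)
SPATIAL_DOWNSAMPLE = 8  # each spatial dimension shrinks by a factor of 8


def embedding_shape(height: int, width: int) -> tuple[int, int, int]:
    """Return (channels, H/8, W/8) for an input frame of the given size."""
    if height % SPATIAL_DOWNSAMPLE or width % SPATIAL_DOWNSAMPLE:
        raise ValueError("height and width must be divisible by 8")
    return (LATENT_CHANNELS, height // SPATIAL_DOWNSAMPLE, width // SPATIAL_DOWNSAMPLE)


def value_reduction(height: int, width: int) -> float:
    """Ratio of RGB input values (3*H*W) to embedding values."""
    c, h, w = embedding_shape(height, width)
    return (3 * height * width) / (c * h * w)


for res in [(536, 960), (720, 1280), (1080, 1920)]:
    print(res, embedding_shape(*res), value_reduction(*res))
# (536, 960) (2, 67, 120) 96.0
# (720, 1280) (2, 90, 160) 96.0
# (1080, 1920) (2, 135, 240) 96.0
```

The ratio counts values, not bits: 3*H*W RGB values against 2*(H/8)*(W/8) embedding values is 96:1 at every resolution, and the actual compressed size also depends on how many bits each quantized embedding value occupies.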
+**Output Type(s):** Image
+**Output Format:** Red, Green, Blue (RGB)
+**Output Parameters:** Two-Dimensional (2D)
+**Other Properties Related to Output:** Minimum Resolution: 480x848, Maximum Resolution: 536x960, Image Range: [-1, 1]
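Reconstructed frames in [-1, 1] usually need converting back to 8-bit RGB for display. A minimal sketch, assuming the inverse of the input mapping; the clamping step guards against small overshoots and is an assumption, not a documented property of the decoder. The helper name `to_uint8` is hypothetical.

```python
def to_uint8(values: list[float]) -> list[int]:
    """Map decoder output in [-1, 1] to 8-bit values in [0, 255]."""
    out = []
    for v in values:
        v = max(-1.0, min(1.0, v))          # clamp to the documented range
        out.append(round((v + 1.0) * 127.5))  # [-1, 1] -> [0, 255]
    return out


print(to_uint8([-1.0, 0.0, 1.0, 1.2]))  # [0, 128, 255, 255]
```

Python's `round` uses round-half-to-even, so the exact midpoint 0.0 maps to 128.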
+Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g., GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.
+## Software Integration:
+**Runtime Engine(s):**
+* TensorRT
+**Supported Hardware Microarchitecture Compatibility:**
+* NVIDIA Ampere
+* NVIDIA Blackwell
+* NVIDIA Hopper
+**Supported Operating System(s):**
+* Linux
+The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.
+## Model Version(s):
+v1.0
+The Telesurgery Neural Tokenizer can be integrated into an AI system via the ONNX or TensorRT runtime engines on NVIDIA Ampere, Blackwell, and Hopper microarchitectures running Linux. It accepts 2D RGB image frames at specific resolutions (536x960, 720x1280, or 1080x1920), with pixel values in [-1, 1], for low-latency video streaming in telesurgery scenarios.
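Because only specific input resolutions are supported, an integration may want to reject other frames before they reach the runtime engine. A hypothetical guard is sketched below; the channels-last (H, W, 3) layout is assumed for illustration, so check the actual engine's input binding before relying on it.

```python
# Resolutions (height, width) listed on the card.
SUPPORTED_RESOLUTIONS = {(536, 960), (720, 1280), (1080, 1920)}


def check_frame_shape(shape: tuple[int, ...]) -> None:
    """Raise ValueError unless shape is (H, W, 3) at a supported resolution."""
    if len(shape) != 3 or shape[2] != 3:
        raise ValueError(f"expected an (H, W, 3) RGB frame, got {shape}")
    if (shape[0], shape[1]) not in SUPPORTED_RESOLUTIONS:
        raise ValueError(f"unsupported resolution {shape[0]}x{shape[1]}")


check_frame_shape((720, 1280, 3))  # passes silently
```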
+## Training, Testing, and Evaluation Datasets:
+### Training Dataset:
+**Link:** In-house surgical data (laparoscopic surgeries)
+**Data Modality:**
+* Image
+**Image Training Data Size:**
+* Less than a Million Images
+**Text Training Data Size:**
+* Less than a Billion Tokens
+**Video Training Data Size:**
+* Less than 10,000 Hours
+**Training Image Resolution:**
+* Approximately 536x960 to 1080x1920 pixels (RGB images)
+**Data Collection Method by dataset:**
+* Human
+**Labeling Method by dataset:**
+* Human
+**Properties (Quantity, Dataset Descriptions, Sensor(s)):** Training set consists of 5765 (5-minute) video items for laparoscopic surgeries. **Modality**: Video (Image sequences). **Content Nature**: In-house surgical data.
+### Testing Dataset:
+**Link:** In-house surgical data (laparoscopic surgeries)
+**Data Collection Method by dataset:**
+* Human
+**Labeling Method by dataset:**
+* Human
+**Properties (Quantity, Dataset Descriptions, Sensor(s)):** Testing set consists of 1224 (5-minute) video items for laparoscopic surgeries. **Modality**: Video (Image sequences). **Content Nature**: In-house surgical data.
+### Evaluation Dataset:
+**Link:** In-house surgical dataset (laparoscopic surgeries)
+(Internal only; not publicly released)
+**Benchmark Score:** v1.0: PSNR=34.61±3.18, SSIM=0.961±0.026, LPIPS=0.105±0.026
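PSNR, the first benchmark metric, compares a reconstruction with its reference as 10*log10(R^2 / MSE), where R is the data range of the pixel values. The card does not state which data range or aggregation across frames was used for the reported scores, so the sketch below leaves `data_range` as a parameter (2.0 for values in [-1, 1], 255 for 8-bit values). It is a generic definition, not necessarily the exact evaluation code used here.

```python
import math


def psnr(reference: list[float], reconstruction: list[float], data_range: float) -> float:
    """Peak signal-to-noise ratio in dB: 10 * log10(data_range**2 / MSE)."""
    if len(reference) != len(reconstruction) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    mse = sum((a - b) ** 2 for a, b in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical signals have unbounded PSNR
    return 10 * math.log10(data_range ** 2 / mse)


# An error of one 8-bit level everywhere gives 20*log10(255), about 48.13 dB.
print(psnr([0.0, 0.0], [1.0, 1.0], data_range=255.0))
```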
+**Data Collection Method by dataset:**
+* Human
+**Labeling Method by dataset:**
+* Human
+**Properties (Quantity, Dataset Descriptions, Sensor(s)):** Evaluation set consists of 1220 (5-minute) video items for laparoscopic surgeries. **Modality**: Video (Image sequences). **Content Nature**: In-house surgical data.
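The split sizes listed for training, testing, and evaluation translate into hours of video as follows, assuming every item is exactly 5 minutes long:

```python
CLIP_MINUTES = 5  # each video item is described as 5 minutes long
SPLITS = {"train": 5765, "test": 1224, "eval": 1220}  # item counts from the card


def hours(n_items: int, minutes_per_item: int = CLIP_MINUTES) -> float:
    """Total duration in hours for n_items clips of minutes_per_item each."""
    return n_items * minutes_per_item / 60


for name, n in SPLITS.items():
    print(f"{name}: {hours(n):.1f} h")
print(f"total: {hours(sum(SPLITS.values())):.1f} h")
# train: 480.4 h
# test: 102.0 h
# eval: 101.7 h
# total: 684.1 h
```

This gives roughly 480 hours of training video and roughly 684 hours across all three splits.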
+## Inference:
+**Acceleration Engine:** TensorRT
+**Test Hardware:**
+* A100
+* RTX A6000
+* RTX 6000 Ada Generation
+## Ethical Considerations:
+NVIDIA believes Trustworthy AI is a shared responsibility, and we have established policies and practices to enable development for a wide array of AI applications. When downloaded or used in accordance with our terms of service, developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.
+Please make sure you have proper rights and permissions for all input image and video content. If an image or video includes people, personal health information, or intellectual property, the output image or video will not blur or maintain the proportions of the subjects it contains.
+For more detailed information on ethical considerations for this model, please see the Model Card++ Bias, Explainability, Safety & Security, and Privacy Subcards.
+Users are responsible for model inputs and outputs. Users are responsible for ensuring safe integration of this model, including implementing guardrails as well as other safety mechanisms, prior to deployment.
+Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns [here](https://app.intigriti.com/programs/nvidia/nvidiavdp/detail).

bias.md ADDED

	@@ -0,0 +1,6 @@

+# Bias Subcard
+## Participation considerations from adversely impacted groups (protected classes) in model design and testing:
+None
+## Measures taken to mitigate against unwanted bias:
+None

explainability.md ADDED

	@@ -0,0 +1,35 @@

+# Explainability Subcard
+## Intended Domain
+Image compression and reconstruction
+## Model Type
+Convolutional with quantization
+## Intended Users
+Surgeons, Telemedicine Professionals, Medical Robotics Engineers
+## Output
+Types: Image. Formats: Red, Green, Blue (RGB)
+## Describe how the model works:
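+Type: Convolutional Neural Network with Residual and Attention Blocks (distilled from Wan2.1 with 2D Convolutions).
+It works as an autoencoder that quantizes its latent space: the encoder compresses each frame into latent embeddings, and the decoder reconstructs the RGB frame from them.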
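The card does not specify how the latent space is quantized. For readers unfamiliar with latent quantization, the sketch below shows a generic uniform scalar quantizer; the scheme, the [-1, 1] clipping range, and the bit depth are all illustrative assumptions, not this model's method.

```python
def quantize(x: float, bits: int, lo: float = -1.0, hi: float = 1.0) -> int:
    """Map x to the nearest of 2**bits evenly spaced levels over [lo, hi]."""
    levels = 2 ** bits - 1
    x = max(lo, min(hi, x))  # clip to the representable range
    return round((x - lo) / (hi - lo) * levels)


def dequantize(index: int, bits: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Map a level index back to its value in [lo, hi]."""
    levels = 2 ** bits - 1
    return lo + index / levels * (hi - lo)


q = quantize(0.3, bits=4)
print(q, dequantize(q, bits=4))
```

Each index fits in `bits` bits, which is what turns a continuous latent into a fixed-size bitstream; for in-range values, round-to-nearest keeps the reconstruction error within half a quantization step.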
+## Name the adversely impacted groups this has been tested to deliver comparable outcomes regardless of:
+None
+## Technical Limitations & Mitigation:
+This model may struggle with high-spatial-frequency content, such as text in images. Because the model has not been explicitly evaluated across a substantial and diverse population, its performance may vary, and it should be evaluated by qualified experts before use in clinical settings.
+**Mitigation:** For scenarios where accurate text reconstruction is critical, it is recommended to preprocess images to exclude textual content, use supplementary OCR models to extract and transmit text separately, or increase the image resolution if hardware constraints allow.
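The first mitigation, excluding textual content before encoding, can be as simple as overwriting a known text region such as a fixed on-screen overlay. A minimal sketch, assuming the box coordinates are known in advance and frames are lists of rows of RGB triples in [-1, 1]; the function name `mask_region` is hypothetical.

```python
Pixel = tuple[float, float, float]


def mask_region(frame: list[list[Pixel]], top: int, left: int,
                height: int, width: int,
                fill: Pixel = (-1.0, -1.0, -1.0)) -> list[list[Pixel]]:
    """Return a copy of `frame` with the given box set to `fill` (default: black)."""
    masked = [list(row) for row in frame]  # copy rows so the input is untouched
    for r in range(top, min(top + height, len(masked))):
        for c in range(left, min(left + width, len(masked[r]))):
            masked[r][c] = fill
    return masked


# 2x3 mid-grey frame with a 1x2 "overlay" box in the top-left corner.
grey = (0.0, 0.0, 0.0)
frame = [[grey] * 3 for _ in range(2)]
print(mask_region(frame, top=0, left=0, height=1, width=2))
```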
+## Verified to have met prescribed NVIDIA quality standards:
+Yes
+## Performance Metrics:
+PSNR (34.61 dB), SSIM (0.961), LPIPS (0.10), bitrate (0.5 bits per pixel), and latency.
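The 0.5 bpp (bits per pixel) figure can be turned into a rough bandwidth estimate. The sketch below assumes a 536x960 frame and a 30 fps stream; the card states neither the resolution at which bpp was measured nor a frame rate.

```python
def stream_bitrate(height: int, width: int, bpp: float, fps: float) -> float:
    """Bits per second for a stream at the given bits-per-pixel and frame rate."""
    return height * width * bpp * fps


bits_per_frame = 536 * 960 * 0.5
print(bits_per_frame)                           # 257280.0
print(stream_bitrate(536, 960, 0.5, 30) / 1e6)  # 7.7184 (Mbit/s, assuming 30 fps)
```

For comparison, uncompressed 8-bit RGB is 24 bpp, so 0.5 bpp corresponds to a 48x reduction in bits per frame.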
+## Potential Known Risks:
+This model may inaccurately reproduce text, making it illegible.
+## Licensing:
+NVIDIA Open Model License.

privacy.md ADDED

	@@ -0,0 +1,33 @@

+# Privacy Subcard
+## Generatable or reverse engineerable personal data?
+No
+## Personal data used to create this model?
+Yes
+## How often is dataset reviewed?
+Dataset is initially reviewed upon addition, and subsequent reviews are conducted as needed or upon request for changes.
+## Is a mechanism in place to honor data subject right of access or deletion of personal data?
+Yes
+## If personal data was collected for the development of the model, was it collected directly by NVIDIA?
+No
+## If personal data was collected for the development of this AI model, was it minimized to only what was required?
+Yes
+## Was data from user interactions with the AI model (e.g. user input and prompts) used to train the model?
+No
+## Is there provenance for all datasets used in training?
+Yes
+## Does data labeling (annotation, metadata) comply with privacy laws?
+Yes
+## Is data compliant with data subject requests for data correction or removal, if such a request was made?
+Yes
+## Applicable Privacy Policy
+https://www.nvidia.com/en-us/about-nvidia/privacy-policy/

safety.md ADDED

	@@ -0,0 +1,12 @@

+# Safety & Security Subcard
+## Model Application Field(s):
+Healthcare
+## Describe the life critical impact (if present).
+This model performs image compression; it is not intended for diagnostic purposes. Additional testing and evaluation are recommended prior to use in clinical settings and non-experimental downstream applications.
+## Use Case Restrictions:
+Abide by [NVIDIA Open Model License](https://www.nvidia.com/en-us/agreements/enterprise-software/nvidia-open-model-license/)
+## Model and dataset restrictions:
+The principle of least privilege (PoLP) is applied, limiting access for dataset generation and model development. Restrictions enforce dataset access during training, and dataset license constraints are adhered to.