Upload TRT model_l40s_fp16.plan for Nvidia L40S (batch_size=1024)

#7
Robust Intelligence org

---- Resolved TRT Profile ----
MIN_BATCH=1
OPT_BATCH=1024
MAX_BATCH=1024
MIN_SEQ_LEN=1
OPT_SEQ_LEN=512
MAX_SEQ_LEN=512
WORKSPACE_SIZE=24696061952
BUILDER_OPTIMIZATION_LEVEL=3
PRECISION=fp16
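
The resolved values above can be read back and sanity-checked with a few lines of Python. This is only a minimal sketch, not part of the upload: `PROFILE_TEXT` and `parse_profile` are hypothetical names introduced here, and the text is copied from the block above.

```python
# Hypothetical helper: parse the KEY=VALUE profile dump shown above.
PROFILE_TEXT = """\
MIN_BATCH=1
OPT_BATCH=1024
MAX_BATCH=1024
MIN_SEQ_LEN=1
OPT_SEQ_LEN=512
MAX_SEQ_LEN=512
WORKSPACE_SIZE=24696061952
BUILDER_OPTIMIZATION_LEVEL=3
PRECISION=fp16
"""


def parse_profile(text):
    """Return a dict of the KEY=VALUE lines, with integer values converted."""
    profile = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        key, value = line.split("=", 1)
        profile[key] = int(value) if value.isdigit() else value
    return profile


profile = parse_profile(PROFILE_TEXT)

# The workspace limit is given in bytes; 24696061952 bytes is exactly 23 GiB.
workspace_gib = profile["WORKSPACE_SIZE"] / 2**30
print(f"workspace limit: {workspace_gib:.1f} GiB")

# A profile is only consistent if min <= opt <= max on every dynamic axis.
assert profile["MIN_BATCH"] <= profile["OPT_BATCH"] <= profile["MAX_BATCH"]
assert profile["MIN_SEQ_LEN"] <= profile["OPT_SEQ_LEN"] <= profile["MAX_SEQ_LEN"]
```

Here the optimum and maximum coincide on both axes, so the engine is tuned for the largest shape it accepts.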

==== TensorRT Engine ====
Name: Unnamed Network 0 | Explicit Batch Engine

---- 2 Engine Input(s) ----
{input_ids [dtype=int64, shape=(-1, -1)],
attention_mask [dtype=int64, shape=(-1, -1)]}

---- 1 Engine Output(s) ----
{logits [dtype=float32, shape=(-1, 12)]}

---- Memory ----
Device Memory: 22011999744 bytes
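
As a rough aid for capacity planning, the byte counts in this summary can be converted to GiB and compared. The sketch below only does arithmetic on the two numbers reported above; the variable names are hypothetical, and it assumes the reported device memory is the scratch memory an execution context needs, which excludes the weights and the input/output buffers.

```python
# Values copied from the summary above (bytes).
DEVICE_MEMORY_BYTES = 22011999744    # "Device Memory" reported for the engine
WORKSPACE_SIZE_BYTES = 24696061952   # WORKSPACE_SIZE from the resolved profile

GIB = 2**30

device_gib = DEVICE_MEMORY_BYTES / GIB
workspace_gib = WORKSPACE_SIZE_BYTES / GIB

print(f"device memory:   {device_gib:.2f} GiB")
print(f"workspace limit: {workspace_gib:.2f} GiB")

# The reported device memory stays below the configured workspace limit.
headroom_gib = workspace_gib - device_gib
print(f"headroom:        {headroom_gib:.2f} GiB")
```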

---- 1 Profile(s) (3 Tensor(s) Each) ----

  • Profile: 0
    Tensor: input_ids (Input), Index: 0 | Shapes: min=(1, 1), opt=(1024, 512), max=(1024, 512)
    Tensor: attention_mask (Input), Index: 1 | Shapes: min=(1, 1), opt=(1024, 512), max=(1024, 512)
    Tensor: logits (Output), Index: 2 | Shape: (-1, 12)
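
Since both inputs share the same bounds, a caller can check a request shape against Profile 0 before running inference. The following is a minimal sketch under that assumption: `fits_profile` and `io_bytes` are hypothetical helpers, not part of TensorRT, and the bounds and dtypes are copied from the summary above.

```python
# Bounds copied from Profile 0 above, as (batch, seq_len).
MIN_SHAPE = (1, 1)
MAX_SHAPE = (1024, 512)
NUM_LABELS = 12  # second dimension of the logits output


def fits_profile(shape, min_shape=MIN_SHAPE, max_shape=MAX_SHAPE):
    """Return True if every dimension lies within the profile's [min, max]."""
    if len(shape) != len(min_shape):
        return False
    return all(lo <= dim <= hi for dim, lo, hi in zip(shape, min_shape, max_shape))


def io_bytes(batch, seq_len):
    """Bytes needed for the two int64 inputs and the float32 logits output."""
    input_bytes = 2 * batch * seq_len * 8   # input_ids + attention_mask, int64
    output_bytes = batch * NUM_LABELS * 4   # logits, float32
    return input_bytes, output_bytes


print(fits_profile((8, 128)))      # inside the profile
print(fits_profile((1025, 512)))   # batch exceeds MAX_BATCH
print(io_bytes(1024, 512))         # buffer sizes at the maximum shape
```

Requests larger than the maximum would need to be split into chunks of at most 1024 sequences, and sequences longer than 512 tokens truncated, before they reach the engine.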

---- 453 Layer(s) ----

Cannot merge
This branch has merge conflicts in the following files:
  • model_l40s_fp16.plan
  • trt_engine_layer_summary_l40s_fp16.txt
