EfficientNetV2B0 RGB + CNN Depth Carbohydrate Regression

This model predicts dish-level carbohydrate content from paired overhead RGB and overhead depth images in the Nutrition5K dataset.

Architecture

  • Dual-input multimodal regression model
  • One pretrained EfficientNetV2B0 branch for overhead RGB images
  • One from-scratch CNN branch for overhead depth images
  • Global average pooling on both branches
  • Feature fusion through concatenation
  • Fully connected regression head
  • Final dense layer with linear activation for carbohydrate prediction
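The architecture listed above can be sketched with the Keras functional API. This is a minimal illustration, not the released training code: the 224x224 input size, the single-channel depth input, the three-block depth CNN, the 128-unit hidden layer, and the function name build_model are all assumptions that the card does not specify.

```python
import tensorflow as tf
from tensorflow.keras import layers


def build_model(image_size=(224, 224), weights="imagenet"):
    """Sketch of a dual-input RGB + depth regressor (sizes are assumed)."""
    rgb_input = layers.Input(shape=(*image_size, 3), name="rgb_input")
    depth_input = layers.Input(shape=(*image_size, 1), name="depth_input")

    # RGB branch: EfficientNetV2B0 without its classification head.
    # With Keras defaults the backbone rescales raw [0, 255] pixels itself.
    backbone = tf.keras.applications.EfficientNetV2B0(
        include_top=False, weights=weights, input_shape=(*image_size, 3)
    )
    backbone.trainable = False  # frozen for the initial training stage
    rgb_features = backbone(rgb_input, training=False)
    rgb_features = layers.GlobalAveragePooling2D()(rgb_features)

    # Depth branch: small CNN trained from scratch (layer sizes are assumed).
    x = depth_input
    for filters in (32, 64, 128):
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    depth_features = layers.GlobalAveragePooling2D()(x)

    # Fusion by concatenation, then a fully connected regression head.
    fused = layers.Concatenate()([rgb_features, depth_features])
    hidden = layers.Dense(128, activation="relu")(fused)
    output = layers.Dense(1, activation="linear", name="total_carb")(hidden)

    return tf.keras.Model(inputs=[rgb_input, depth_input], outputs=output)
```

Calling build_model() downloads the ImageNet weights on first use; passing weights=None builds the same graph with random initialization, which is convenient for checking shapes offline.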

Backbone setup

  • The RGB branch's EfficientNetV2B0 is initialized with ImageNet-pretrained weights
  • The RGB backbone is frozen during the initial training stage
  • The depth branch is trained from scratch
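The freezing step described above might look like the following in Keras. The card only states that the RGB backbone is frozen during the initial stage; the later unfreezing helper, its name, and the choice of 20 layers are hypothetical and shown only to illustrate what a second stage could involve.

```python
import tensorflow as tf


def make_frozen_rgb_backbone(weights="imagenet", input_shape=(224, 224, 3)):
    """Create the RGB backbone with all of its weights frozen."""
    backbone = tf.keras.applications.EfficientNetV2B0(
        include_top=False, weights=weights, input_shape=input_shape
    )
    backbone.trainable = False  # no backbone weight is updated in stage one
    return backbone


def unfreeze_last_layers(backbone, n_layers=20):
    """Hypothetical later stage: make only the last n_layers trainable."""
    backbone.trainable = True
    for layer in backbone.layers[:-n_layers]:
        layer.trainable = False
    for layer in backbone.layers[-n_layers:]:
        # Keeping BatchNormalization frozen is a common fine-tuning practice.
        if isinstance(layer, tf.keras.layers.BatchNormalization):
            layer.trainable = False
    return backbone
```

After changing any trainable flag, the full model has to be compiled again for the change to take effect in training.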

Input modalities

  • rgb_input: overhead RGB image
  • depth_input: overhead depth image
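Because the model has two named inputs, a batch can be passed as a dictionary keyed by those names. The sketch below only assembles such a dictionary with NumPy; the helper name make_batch, the float32 dtype, and the single-channel depth layout are assumptions, and no value scaling is applied because the card does not describe the preprocessing.

```python
import numpy as np


def make_batch(rgb_images, depth_images):
    """Pack paired RGB and depth arrays into a dict keyed by input name."""
    rgb = np.asarray(rgb_images, dtype=np.float32)
    depth = np.asarray(depth_images, dtype=np.float32)

    # Accept depth maps of shape (batch, H, W) and add a channel axis.
    if depth.ndim == 3:
        depth = depth[..., np.newaxis]

    if rgb.ndim != 4 or rgb.shape[-1] != 3:
        raise ValueError("rgb_images must have shape (batch, H, W, 3)")
    if depth.ndim != 4 or depth.shape[-1] != 1:
        raise ValueError("depth_images must have shape (batch, H, W[, 1])")
    if rgb.shape[0] != depth.shape[0]:
        raise ValueError("RGB and depth batches must have the same size")

    return {"rgb_input": rgb, "depth_input": depth}
```

A Keras model whose input layers carry these names accepts the returned dictionary directly, for example model.predict(make_batch(rgb, depth)).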

Target

  • total_carb: total carbohydrate content of the dish