EfficientNetV2B0 RGB + CNN Depth Carbohydrate Regression
This model predicts dish-level carbohydrate content from paired overhead RGB and overhead depth images from the Nutrition5K dataset.
Architecture
- Dual-input multimodal regression model
- One pretrained EfficientNetV2B0 branch for overhead RGB images
- One from-scratch CNN branch for overhead depth images
- Global average pooling on both branches
- Feature fusion through concatenation
- Fully connected regression head
- Final dense layer with linear activation for carbohydrate prediction
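The architecture above can be sketched in Keras as follows. This is a minimal illustration, not the trained model's actual code. The input resolution (224), the depth CNN's filter counts, the head width (256), and the dropout rate are all assumptions. Only the overall layout comes from this card: an EfficientNetV2B0 RGB branch, a scratch depth CNN, global average pooling on both branches, concatenation, and a linear output.

```python
from tensorflow import keras


def build_carb_regressor(image_size=224, weights="imagenet"):
    """Dual-input RGB + depth regressor (illustrative sketch; layer sizes are assumptions)."""
    # RGB branch: EfficientNetV2B0 without its classification head.
    # Keras EfficientNetV2 models rescale raw [0, 255] pixels internally
    # (include_preprocessing=True by default).
    rgb_input = keras.Input(shape=(image_size, image_size, 3), name="rgb_input")
    backbone = keras.applications.EfficientNetV2B0(
        include_top=False,
        weights=weights,
        input_shape=(image_size, image_size, 3),
    )
    backbone.trainable = False  # frozen for the initial training stage
    x_rgb = backbone(rgb_input, training=False)  # keep BatchNorm in inference mode
    x_rgb = keras.layers.GlobalAveragePooling2D(name="rgb_gap")(x_rgb)

    # Depth branch: small CNN trained from scratch (filter counts are hypothetical).
    depth_input = keras.Input(shape=(image_size, image_size, 1), name="depth_input")
    x_d = depth_input
    for filters in (32, 64, 128):
        x_d = keras.layers.Conv2D(filters, 3, padding="same", activation="relu")(x_d)
        x_d = keras.layers.MaxPooling2D()(x_d)
    x_d = keras.layers.GlobalAveragePooling2D(name="depth_gap")(x_d)

    # Feature fusion by concatenation, then a fully connected regression head.
    fused = keras.layers.Concatenate(name="fusion")([x_rgb, x_d])
    h = keras.layers.Dense(256, activation="relu")(fused)
    h = keras.layers.Dropout(0.3)(h)
    output = keras.layers.Dense(1, activation="linear", name="total_carb")(h)

    # Dict inputs let callers pass {"rgb_input": ..., "depth_input": ...} by name.
    return keras.Model(
        inputs={"rgb_input": rgb_input, "depth_input": depth_input},
        outputs=output,
        name="carb_regressor",
    )
```

Passing `weights=None` skips the ImageNet download, which is handy for quick shape checks. Calling the backbone with `training=False` keeps its BatchNorm statistics fixed even if the backbone is later unfrozen.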
Backbone setup
- EfficientNetV2B0 is initialized with ImageNet pretrained weights for RGB
- The RGB backbone is frozen during the initial training stage
- The depth branch is trained from scratch
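The card says only that the RGB backbone is frozen during the initial stage. The sketch below shows one common way to set up such a schedule in Keras. The second, unfrozen fine-tuning stage, the learning rates, and the MAE loss are hypothetical choices and are not confirmed by this card.

```python
from tensorflow import keras


def compile_stage(model, backbone, stage, learning_rate=None):
    """Configure one stage of a frozen-then-fine-tuned schedule (hypothetical).

    stage 1: RGB backbone frozen; only the depth branch and head are updated.
    stage 2: backbone unfrozen for fine-tuning at a lower learning rate (assumption).
    """
    if stage == 1:
        backbone.trainable = False
        default_lr = 1e-3
    elif stage == 2:
        backbone.trainable = True
        default_lr = 1e-5
    else:
        raise ValueError("stage must be 1 or 2")
    lr = default_lr if learning_rate is None else learning_rate
    # A change to `trainable` only takes effect in training after (re)compiling.
    model.compile(
        optimizer=keras.optimizers.Adam(learning_rate=lr),
        loss="mae",  # assumption: the card does not state the training loss
        metrics=["mae"],
    )
    return model
```

Recompiling after each toggle matters because Keras fixes the set of trainable weights when the model is compiled.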
Input modalities
- rgb_input: overhead RGB image
- depth_input: overhead depth image
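The following sketch turns one RGB/depth pair into a batch keyed by the two input names. The 224-pixel size, the depth normalisation constant (`depth_max`, which falls back to the per-image maximum), and nearest-neighbour resizing for depth are assumptions. The card does not describe its preprocessing.

```python
import numpy as np
import tensorflow as tf


def prepare_inputs(rgb, depth, image_size=224, depth_max=None):
    """Build the model's named inputs from one RGB/depth pair (illustrative sketch).

    rgb:       (H, W, 3) array of pixels in [0, 255]
    depth:     (H, W) or (H, W, 1) array of raw depth values
    depth_max: hypothetical normalisation constant; per-image max if None
    """
    # RGB stays in [0, 255]; the EfficientNetV2 backbone rescales internally.
    rgb = tf.image.resize(np.asarray(rgb, dtype=np.float32), (image_size, image_size))

    depth = np.asarray(depth, dtype=np.float32)
    if depth.ndim == 2:
        depth = depth[..., np.newaxis]  # add the single channel axis
    scale = depth_max if depth_max is not None else max(float(depth.max()), 1.0)
    # Nearest-neighbour resizing avoids blending depths across object edges.
    depth = tf.image.resize(depth / scale, (image_size, image_size), method="nearest")

    # Add a batch dimension and key each tensor by its input-layer name.
    return {"rgb_input": rgb[tf.newaxis], "depth_input": depth[tf.newaxis]}
```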
Target
total_carb (dish-level carbohydrate content, predicted as a single scalar per dish)
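For training, each example pairs the two named inputs with the `total_carb` target. The `tf.data` sketch below shows that pairing. It assumes images are already preprocessed into equal-sized arrays. The batch size and the shuffling choice are illustrative.

```python
import tensorflow as tf


def make_dataset(rgb_images, depth_images, total_carb, batch_size=32, shuffle=True):
    """Pair {"rgb_input", "depth_input"} features with the total_carb target (sketch)."""
    features = {"rgb_input": rgb_images, "depth_input": depth_images}
    # One scalar regression target per dish, shaped (N, 1) to match the Dense(1) output.
    targets = tf.reshape(tf.cast(total_carb, tf.float32), (-1, 1))
    ds = tf.data.Dataset.from_tensor_slices((features, targets))
    if shuffle:
        ds = ds.shuffle(buffer_size=int(targets.shape[0]))
    return ds.batch(batch_size).prefetch(tf.data.AUTOTUNE)
```

Because the feature dict uses the same keys as the model's input layers, Keras can route each tensor to the matching branch by name.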