| Field | Response |
| --- | --- |
| Participation considerations from adversely impacted groups (protected classes) in model design and testing | The SOMA native shape PCA was fitted on a combination of SizeUSA (a large-scale U.S. anthropometric survey) and TripleGangers (303 individuals, commercially purchased). SizeUSA covers a broad range of age, sex, and BMI groups, but both datasets reflect body shapes predominantly from North American/Western populations and may under-represent shapes common in other geographic regions (e.g., South/Southeast Asia, Sub-Saharan Africa). SOMA mitigates this in two ways: by adding data from GarmentMeasurements, which includes some European subjects, and by supporting the ANNY backend, which derives body shapes from anthropometric measurements rather than 3D scan data. This enables representation of human body shapes from infants to elders without inheriting the demographic biases of scan collection. |
| Measures taken to mitigate against unwanted bias | (1) Multi-backend design: SOMA's unified framework supports six identity backends. The ANNY backend is explicitly constructed from anthropometric phenotypes (age, height, weight, body composition) rather than scans, avoiding the demographic sampling biases that affect scan-collected datasets. Developers requiring globally diverse or age-spanning body shape representation are encouraged to use ANNY. (2) Shape space coverage: The SOMA-shape PCA backend samples from the full statistical range of the SizeUSA, TripleGangers, and GarmentMeasurements scan distribution; no demographic subgroup is excluded. (3) Evaluation coverage: Quantitative benchmarks sample 100 random identities per backend, spanning the extremes of the shape space, so that evaluation is not biased toward mean/average bodies. |
| Bias Metric (If Measured) | No formal demographic bias metric has been measured for the SOMA-shape backend against external demographic benchmarks. |
| Which characteristic (feature) shows the greatest difference in performance? | Not applicable. |
| Representation in training data | SizeUSA and TripleGangers (303 individuals) together represent predominantly the U.S./Western population and do not represent all global demographic groups exhaustively or proportionally. For instance, East Asian, South Asian, and Sub-Saharan African body proportions may be under-represented. For applications requiring global diversity, we recommend using the ANNY identity backend or fine-tuning the SOMA native shape PCA with supplementary scan data representative of the target population. |
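
The evaluation coverage measure above describes sampling 100 random identities per backend across the extremes of the shape space. The following is a minimal sketch of one way such sampling could be done for a PCA shape space. It is not SOMA's actual evaluation code. The function names (`sample_identities`, `coverage_report`), the per-component standard deviations, and the choice of a uniform draw within ±3 standard deviations are all assumptions made for illustration.

```python
import numpy as np


def sample_identities(num_identities, component_std, max_sigma=3.0, seed=0):
    """Draw PCA shape coefficients uniformly within +/- max_sigma std devs.

    A uniform draw (an assumption of this sketch) puts as much mass near the
    extremes as near the mean, unlike a Gaussian draw, which concentrates
    samples around the average body.
    """
    rng = np.random.default_rng(seed)
    component_std = np.asarray(component_std, dtype=np.float64)
    # Sample in units of standard deviations, then scale each component.
    z = rng.uniform(-max_sigma, max_sigma,
                    size=(num_identities, component_std.size))
    return z * component_std


def coverage_report(coeffs, component_std):
    """Summarize how far the sampled identities reach from the mean shape."""
    z = coeffs / np.asarray(component_std, dtype=np.float64)
    return {
        "min_sigma": float(z.min()),
        "max_sigma": float(z.max()),
        # Share of identities with at least one component beyond 2 sigma.
        "frac_beyond_2_sigma": float(np.mean(np.abs(z).max(axis=1) > 2.0)),
    }


# Hypothetical per-component standard deviations of a 4-component shape PCA.
component_std = np.array([2.0, 1.0, 0.5, 0.25])
coeffs = sample_identities(100, component_std)
report = coverage_report(coeffs, component_std)
print(coeffs.shape, report)
```

A report like this only describes coverage of the statistical shape space. It does not measure demographic coverage, which depends on who was scanned in the underlying datasets.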