imranali291/genderize

imranali291 committed on Jan 31, 2025

Commit 9c34372 · verified · 1 Parent(s): 820a199

Update README.md

Files changed (1)

README.md +72 -3


@@ -1,3 +1,72 @@
----
-license: mit
----

+---
+license: mit
+language:
+- en
+base_model:
+- google-bert/bert-base-cased
+tags:
+- genderization
+- text-classification
+- prediction
+---
+# Gender Classification by Name
+### Model Details
+- **Model Name**: Genderize
+- **Developed By**: Imran Ali
+- **Model Type**: Text Classification
+- **Language**: English
+- **License**: MIT
+### Description
+This model classifies gender based on the input name. It uses a pre-trained BERT model as the base and has been fine-tuned on a dataset of names and their associated genders.
+### Training Details
+- **Training Data**: Dataset of names and genders (e.g., Dannel gender-name dataset)
+- **Training Procedure**: Fine-tuned a pre-trained BERT model with a classification head
+- **Training Hyperparameters**:
+  - Batch size: 8
+  - Gradient accumulation steps: 1
+  - Learning rate: 2e-5
+  - Total steps: 20,005
+  - Number of trainable parameters: 109,483,778 (approximately 109.5M)
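As a rough sanity check on the hyperparameters listed above, the number of training examples processed follows from total steps × batch size × gradient accumulation steps. The sketch below is plain arithmetic. The function name `examples_processed` is hypothetical, and because the card states neither the epoch count nor the dataset size, the result is an upper bound on examples seen during training, not the size of the dataset.

```python
def examples_processed(total_steps: int, batch_size: int, grad_accum_steps: int = 1) -> int:
    """Upper bound on training examples seen, assuming every batch is full."""
    # One optimizer step consumes batch_size * grad_accum_steps examples.
    effective_batch = batch_size * grad_accum_steps
    return total_steps * effective_batch


# Values taken from the hyperparameter list: 20,005 steps, batch size 8, accumulation 1.
print(examples_processed(20_005, 8, 1))  # 160040
```

If training ran for several epochs, the dataset itself would be correspondingly smaller than this figure.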
+### Evaluation
+- **Testing Data**: Split from the training dataset
+- **Metrics**: Accuracy, Precision, Recall, F1 Score
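The card names the metrics but reports no values. For readers who want to compute them on their own held-out split, here is a minimal sketch of accuracy, precision, recall, and F1 for a two-class problem. The function name `binary_metrics`, the choice of "F" as the positive class, and the toy labels are all assumptions for illustration. None of the numbers are results from this model.

```python
def binary_metrics(y_true, y_pred, positive="F"):
    """Accuracy, precision, recall and F1 with `positive` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)

    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Toy labels for illustration only, not outputs of the model.
y_true = ["F", "M", "F", "M", "F"]
y_pred = ["F", "M", "M", "M", "F"]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision and recall depend on which class is treated as positive, so it may be worth reporting them for both labels.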
+### Uses
+- **Direct Use**: Classifying the gender of a given name
+- **Downstream Use**: Enhancing applications that require gender identification based on names (e.g., personalized marketing, user profiling)
+- **Out-of-Scope Use**: Using the model for purposes other than gender classification without proper validation
+### Bias, Risks, and Limitations
+- **Bias**: The model may reflect biases present in the training data. It is important to validate its performance across diverse datasets.
+- **Risks**: Misclassification can occur, especially for names that are unisex or less common.
+- **Limitations**: The model's accuracy may vary depending on the cultural and linguistic context of the names.
+### Recommendations
+- Users should be aware of the potential biases and limitations of the model.
+- Further validation is recommended for specific use cases and datasets.
+### How to Get Started with the Model
+```python
+import torch
+from transformers import AutoModelForSequenceClassification, AutoTokenizer
+# Load the model and tokenizer from the Hub
+model_name = "imranali291/genderize"
+model = AutoModelForSequenceClassification.from_pretrained(model_name)
+tokenizer = AutoTokenizer.from_pretrained(model_name)
+model.eval()  # disable dropout for inference
+# Example inference function
+def predict_gender(name):
+    inputs = tokenizer(name, return_tensors="pt", padding=True, truncation=True, max_length=32)
+    with torch.no_grad():  # no gradients are needed for inference
+        outputs = model(**inputs)
+    predicted_label = outputs.logits.argmax(dim=-1).item()
+    # Assumption: the label names are stored in the model config (id2label).
+    # If they are not, this returns generic names such as "LABEL_0".
+    return model.config.id2label[predicted_label]
+print(predict_gender("Alex"))  # Example output: 'M' (if the config stores that label)
+print(predict_gender("Maria"))  # Example output: 'F' (if the config stores that label)
+```
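The inference function above takes the argmax of the logits, which always returns a label, even for the unisex or uncommon names mentioned under Risks. One possible mitigation, sketched here under stated assumptions, is to convert the logits to probabilities with a softmax and return "uncertain" when the top probability falls below a threshold. The function name `label_with_confidence`, the label order `("F", "M")`, and the 0.75 threshold are hypothetical. The real label order has to be read from the model config, and the threshold needs tuning on validation data.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_with_confidence(logits, labels=("F", "M"), threshold=0.75):
    """Return (label, probability), or ("uncertain", probability) below the threshold."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    if probs[best] < threshold:
        return "uncertain", probs[best]
    return labels[best], probs[best]


# Made-up logits for illustration: one clear case and one near tie.
print(label_with_confidence([2.0, -1.0]))
print(label_with_confidence([0.1, 0.0]))
```

With the model loaded, the logits for a single name could be passed in as `outputs.logits[0].tolist()`.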