Deepdive404 RealFalconsAI committed
Commit a2fd078 · 0 Parent(s)

Duplicate from Falconsai/nsfw_image_detection

Co-authored-by: Falcons.ai <RealFalconsAI@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,200 @@
+ ---
+ license: apache-2.0
+ pipeline_tag: image-classification
+ ---
+ # Model Card: Fine-Tuned Vision Transformer (ViT) for NSFW Image Classification
+ 
+ ## Model Description
+ 
+ The **Fine-Tuned Vision Transformer (ViT)** is a transformer encoder architecture, similar to BERT, adapted for image classification. The base checkpoint, "google/vit-base-patch16-224-in21k", was pre-trained in a supervised manner on the ImageNet-21k dataset, with images resized to a resolution of 224x224 pixels, making it suitable for a wide range of image recognition tasks.
+ 
+ The model was fine-tuned with a batch size of 16, a setting that balances computational efficiency with the model's ability to learn from a diverse array of images in each update.
+ 
+ A learning rate of 5e-5 was used for fine-tuning. The learning rate controls how strongly the model's parameters are adjusted at each training step; 5e-5 strikes a balance between rapid convergence and stable optimization, so the model learns quickly without overshooting as it refines its capabilities over training.
+ 
+ Training was performed on a proprietary dataset of 80,000 images with a substantial degree of variability, curated into two distinct classes: "normal" and "nsfw". This diversity allowed the model to learn the nuanced visual patterns needed to differentiate safe from explicit content.
+ 
+ The objective of this training process was to give the model a robust understanding of the visual cues relevant to NSFW image classification, producing a model suited to content safety and moderation while maintaining high accuracy and reliability. An illustrative sketch of such a fine-tuning setup follows.
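+ 
+ The sketch below is a minimal illustration of a fine-tune with the hyperparameters stated above (batch size 16, learning rate 5e-5) using the `transformers` `Trainer`. It is not the original training script: the dataset is proprietary, so `<your_dataset>` and the `image`/`label` column names are placeholders, and the epoch count is an assumption.
+ 
+ ```python
+ # Illustrative fine-tuning sketch; dataset name and columns are placeholders.
+ from datasets import load_dataset
+ from transformers import (AutoModelForImageClassification, Trainer,
+                           TrainingArguments, ViTImageProcessor)
+ 
+ processor = ViTImageProcessor.from_pretrained("google/vit-base-patch16-224-in21k")
+ model = AutoModelForImageClassification.from_pretrained(
+     "google/vit-base-patch16-224-in21k",
+     num_labels=2,
+     id2label={0: "normal", 1: "nsfw"},
+     label2id={"normal": 0, "nsfw": 1},
+ )
+ 
+ dataset = load_dataset("<your_dataset>")  # placeholder: ~80k images labeled 0/1
+ 
+ def preprocess(batch):
+     # Resize/normalize the images; keep integer labels for the Trainer.
+     inputs = processor(images=batch["image"], return_tensors="pt")
+     inputs["labels"] = batch["label"]
+     return inputs
+ 
+ dataset = dataset.with_transform(preprocess)
+ 
+ args = TrainingArguments(
+     output_dir="vit-nsfw",
+     per_device_train_batch_size=16,  # batch size reported in this card
+     learning_rate=5e-5,              # learning rate reported in this card
+     num_train_epochs=3,              # assumption: epoch count is not documented
+     remove_unused_columns=False,     # keep the raw "image" column for the transform
+ )
+ 
+ trainer = Trainer(model=model, args=args, train_dataset=dataset["train"])
+ trainer.train()
+ ```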
+ ## Intended Uses & Limitations
+ 
+ ### Intended Uses
+ - **NSFW Image Classification**: The primary intended use of this model is the classification of NSFW (Not Safe For Work) images. It has been fine-tuned for this purpose, making it suitable for filtering explicit or inappropriate content in a variety of applications.
+ 
+ ### How to use
+ Here is how to use this model to classify an image into one of two classes (normal, nsfw):
+ 
+ ```python
+ # Use a pipeline as a high-level helper
+ from PIL import Image
+ from transformers import pipeline
+ 
+ img = Image.open("<path_to_image_file>")
+ classifier = pipeline("image-classification", model="Falconsai/nsfw_image_detection")
+ classifier(img)
+ ```
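+ 
+ The pipeline returns a list of label/score dictionaries, one entry per class. The scores below are illustrative only; actual values depend on the input image:
+ 
+ ```python
+ [{'label': 'normal', 'score': 0.9998}, {'label': 'nsfw', 'score': 0.0002}]
+ ```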
+ 
+ <hr>
+ 
+ ```python
+ # Load the model directly
+ import torch
+ from PIL import Image
+ from transformers import AutoModelForImageClassification, ViTImageProcessor
+ 
+ img = Image.open("<path_to_image_file>")
+ model = AutoModelForImageClassification.from_pretrained("Falconsai/nsfw_image_detection")
+ processor = ViTImageProcessor.from_pretrained("Falconsai/nsfw_image_detection")
+ 
+ with torch.no_grad():
+     inputs = processor(images=img, return_tensors="pt")
+     outputs = model(**inputs)
+     logits = outputs.logits
+ 
+ predicted_label = logits.argmax(-1).item()
+ print(model.config.id2label[predicted_label])
+ ```
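+ 
+ If per-class probabilities are needed rather than just the top label, the logits can be passed through a softmax. This small extension of the snippet above is illustrative and not part of the original card:
+ 
+ ```python
+ # Convert the logits from the previous snippet into per-class probabilities.
+ probs = torch.softmax(logits, dim=-1)[0]
+ for idx, p in enumerate(probs.tolist()):
+     print(f"{model.config.id2label[idx]}: {p:.4f}")
+ ```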
+ 
+ <hr>
+ Run the YOLOv9 version:
+ 
+ ```python
+ import os
+ import json
+ import matplotlib.pyplot as plt
+ from PIL import Image
+ import numpy as np
+ import onnxruntime as ort
+ 
+ # Predict using the YOLOv9 model
+ def predict_with_yolov9(image_path, model_path, labels_path, input_size):
+     """
+     Run inference using the converted YOLOv9 model on a single image.
+ 
+     Args:
+         image_path (str): Path to the input image file.
+         model_path (str): Path to the ONNX model file.
+         labels_path (str): Path to the JSON file containing class labels.
+         input_size (tuple): The expected input size (height, width) for the model.
+ 
+     Returns:
+         str: The predicted class label.
+         PIL.Image.Image: The original loaded image.
+     """
+     def load_json(file_path):
+         with open(file_path, "r") as f:
+             return json.load(f)
+ 
+     # Load labels
+     labels = load_json(labels_path)
+ 
+     # Preprocess image: resize, scale to [0, 1], and reorder to [C, H, W]
+     original_image = Image.open(image_path).convert("RGB")
+     image_resized = original_image.resize(input_size, Image.Resampling.BILINEAR)
+     image_np = np.array(image_resized, dtype=np.float32) / 255.0
+     image_np = np.transpose(image_np, (2, 0, 1))  # [C, H, W]
+     input_tensor = np.expand_dims(image_np, axis=0).astype(np.float32)
+ 
+     # Load the YOLOv9 model
+     session = ort.InferenceSession(model_path)
+     input_name = session.get_inputs()[0].name
+     output_name = session.get_outputs()[0].name  # Assuming classification output
+ 
+     # Run inference
+     outputs = session.run([output_name], {input_name: input_tensor})
+     predictions = outputs[0]
+ 
+     # Postprocess predictions (assuming classification output)
+     # Adapt this section if your model output is different (e.g., detection boxes)
+     predicted_index = np.argmax(predictions)
+     predicted_label = labels[str(predicted_index)]  # Assumes labels are keyed by string indices
+ 
+     return predicted_label, original_image
+ 
+ # Display prediction for a single image
+ def display_single_prediction(image_path, model_path, labels_path, input_size):
+     """
+     Predicts the class for a single image and displays the image with its prediction.
+ 
+     Args:
+         image_path (str): Path to the input image file.
+         model_path (str): Path to the ONNX model file.
+         labels_path (str): Path to the JSON file containing class labels.
+         input_size (tuple): The expected input size (height, width) for the model.
+     """
+     try:
+         # Run prediction
+         prediction, img = predict_with_yolov9(image_path, model_path, labels_path, input_size)
+ 
+         # Display image and prediction
+         fig, ax = plt.subplots(1, 1, figsize=(8, 8))  # Create a single plot
+         ax.imshow(img)
+         ax.set_title(f"Prediction: {prediction}", fontsize=14)
+         ax.axis("off")  # Hide axes ticks and labels
+ 
+         plt.tight_layout()
+         plt.show()
+ 
+     except FileNotFoundError:
+         print(f"Error: Image file not found at {image_path}")
+     except Exception as e:
+         print(f"An error occurred: {e}")
+ 
+ 
+ # --- Main Execution ---
+ 
+ # Paths and parameters - MODIFY THESE
+ single_image_path = "path/to/your/single_image.jpg"  # <--- actual path to your image file
+ model_path = "path/to/your/yolov9_model.onnx"        # <--- actual path to your ONNX model
+ labels_path = "path/to/your/labels.json"             # <--- actual path to your labels JSON file
+ input_size = (224, 224)  # Standard input size; adjust if your model differs
+ 
+ # Check that the image file exists before proceeding
+ if os.path.exists(single_image_path):
+     # Run prediction and display for the single image
+     display_single_prediction(single_image_path, model_path, labels_path, input_size)
+ else:
+     print(f"Error: The specified image file does not exist: {single_image_path}")
+ ```
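+ 
+ Note: the `labels.json` added in this commit (shown further down) maps `"0"` to `"normal"` and `"1"` to `"nsfw"`, which matches the string-keyed lookup in `predict_with_yolov9` above.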
+ 
+ <hr>
+ 
+ ### Limitations
+ - **Specialized Task Fine-Tuning**: While the model is adept at NSFW image classification, its performance may vary when applied to other tasks.
+ - Users interested in employing this model for different tasks should explore fine-tuned versions available in the model hub for optimal results.
+ 
+ ## Training Data
+ 
+ The model's training data is a proprietary dataset of approximately 80,000 images with a significant amount of variability, spanning two distinct classes: "normal" and "nsfw". Training on this data aimed to equip the model to distinguish safe from explicit content effectively.
+ 
+ ### Training Stats
+ ```
+ eval_loss: 0.07463177293539047
+ eval_accuracy: 0.980375
+ eval_runtime: 304.9846
+ eval_samples_per_second: 52.462
+ eval_steps_per_second: 3.279
+ ```
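+ 
+ An `eval_accuracy` of 0.980375 corresponds to roughly 98.0% accuracy on the held-out evaluation split.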
+ 
+ <hr>
+ 
+ **Note:** It is essential to use this model responsibly and ethically, adhering to content guidelines and applicable regulations when deploying it in real-world applications, particularly those involving potentially sensitive content.
+ 
+ For more details on model fine-tuning and usage, please refer to the model's documentation and the model hub.
+ 
+ ## References
+ 
+ - [Hugging Face Model Hub](https://huggingface.co/models)
+ - [Vision Transformer (ViT) Paper](https://arxiv.org/abs/2010.11929)
+ - [ImageNet-21k Dataset](http://www.image-net.org/)
+ 
+ **Disclaimer:** The model's performance may be influenced by the quality and representativeness of the data it was fine-tuned on. Users are encouraged to assess the model's suitability for their specific applications and datasets.
config.json ADDED
@@ -0,0 +1,32 @@
+ {
+   "_name_or_path": "Falconsai/nsfw_image_detection",
+   "architectures": [
+     "ViTForImageClassification"
+   ],
+   "attention_probs_dropout_prob": 0.0,
+   "encoder_stride": 16,
+   "hidden_act": "gelu",
+   "hidden_dropout_prob": 0.0,
+   "hidden_size": 768,
+   "id2label": {
+     "0": "normal",
+     "1": "nsfw"
+   },
+   "image_size": 224,
+   "initializer_range": 0.02,
+   "intermediate_size": 3072,
+   "label2id": {
+     "normal": "0",
+     "nsfw": "1"
+   },
+   "layer_norm_eps": 1e-12,
+   "model_type": "vit",
+   "num_attention_heads": 12,
+   "num_channels": 3,
+   "num_hidden_layers": 12,
+   "patch_size": 16,
+   "problem_type": "single_label_classification",
+   "qkv_bias": true,
+   "torch_dtype": "float32",
+   "transformers_version": "4.31.0"
+ }
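
For a quick sanity check, the label mapping and patch geometry in this config can be read back from Python with `AutoConfig` (an illustrative snippet, not part of the committed files):

```python
# Illustrative: inspect the label mapping and geometry from config.json.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("Falconsai/nsfw_image_detection")
print(config.id2label)                       # {0: 'normal', 1: 'nsfw'}
print(config.image_size, config.patch_size)  # 224 16
```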
falconsai_yolov9_nsfw_model_quantized.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ad6659b81050e97d04224e9d02ecf7c6c4a34fccc1d1f40bcc00e0f22379319c
+ size 87132571
labels.json ADDED
@@ -0,0 +1,4 @@
+ {
+   "0": "normal",
+   "1": "nsfw"
+ }
model.safetensors ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:97b2ce64ec146884b37f98ee7944ca4891aa72f6827dc0cb10684a1cbecd5830
+ size 343223968
optimizer.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:02ff26c6fe3d23991889a373ee79cdb2ea615b3a04b65b1b5cf6985365edf414
+ size 686518917
preprocessor_config.json ADDED
@@ -0,0 +1,22 @@
+ {
+   "do_normalize": true,
+   "do_rescale": true,
+   "do_resize": true,
+   "image_mean": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "image_processor_type": "ViTImageProcessor",
+   "image_std": [
+     0.5,
+     0.5,
+     0.5
+   ],
+   "resample": 2,
+   "rescale_factor": 0.00392156862745098,
+   "size": {
+     "height": 224,
+     "width": 224
+   }
+ }
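
This config pins down the preprocessing: resize to 224x224 with bilinear resampling (`resample: 2`), rescale pixel values by 1/255 (`rescale_factor`), then normalize each channel with mean 0.5 and std 0.5. A minimal NumPy sketch of the equivalent transform, for illustration only (not part of the committed files):

```python
# Equivalent of preprocessor_config.json, sketched with PIL + NumPy.
import numpy as np
from PIL import Image

def preprocess(path):
    img = Image.open(path).convert("RGB").resize((224, 224), Image.BILINEAR)  # do_resize, resample=2
    x = np.asarray(img, dtype=np.float32) * 0.00392156862745098  # do_rescale (1/255)
    x = (x - 0.5) / 0.5                                          # do_normalize, mean/std 0.5
    return np.transpose(x, (2, 0, 1))[None]                      # [1, C, H, W]
```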
pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:2a6b06faec569ad6b3873b7040020ab170be9d78d1cd1099a9d331b568aed12d
+ size 343268717