| EVALUATION LOG - 2025-10-29 03:44:41 |
| ================================================================================ |
|
|
|
|
|
|
| ================================================================================ |
| STARTING POST-TRAINING EVALUATION |
| ================================================================================ |
| ✅ Test data loaded: 40532 samples |
| Columns: ['dataset', 'type', 'comment', 'label'] |
| Using device: cuda |
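For reference, a minimal sketch of the data-loading step above, assuming the test set is tabular with the four columns the log reports. The two rows, dataset names, and label values here are placeholders, not taken from the log:

```python
import pandas as pd

# Hypothetical two-row stand-in for the real 40,532-sample test file;
# only the column schema matches what the log reports above.
test_df = pd.DataFrame({
    "dataset": ["ds_a", "ds_b"],       # placeholder source-dataset tags
    "type": ["comment", "comment"],    # placeholder text-type tags
    "comment": ["sample text 1", "sample text 2"],
    "label": [0, 2],                   # placeholder class ids
})

print(f"Test data loaded: {len(test_df)} samples")
print(f"Columns: {list(test_df.columns)}")
```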
|
|
| ============================================================ |
| EVALUATING MODEL: PHOBERT-V1 |
| ============================================================ |
| ✅ Model phobert-v1 loaded from outputs/hate-speech-detection/phobert-v1 |
| ✅ Tokenizer loaded for phobert-v1 |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.9421 |
| F1 Macro: 0.8308 |
| F1 Weighted: 0.9394 |
|
|
| ============================================================ |
| EVALUATING MODEL: PHOBERT-V2 |
| ============================================================ |
| ✅ Model phobert-v2 loaded from outputs/hate-speech-detection/phobert-v2 |
| ✅ Tokenizer loaded for phobert-v2 |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.9341 |
| F1 Macro: 0.8048 |
| F1 Weighted: 0.9326 |
|
|
| ============================================================ |
| EVALUATING MODEL: BARTPHO |
| ============================================================ |
| ✅ Model bartpho loaded from outputs/hate-speech-detection/bartpho |
| ✅ Tokenizer loaded for bartpho |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.8985 |
| F1 Macro: 0.6791 |
| F1 Weighted: 0.8886 |
|
|
| ============================================================ |
| EVALUATING MODEL: VISOBERT |
| ============================================================ |
| ✅ Model visobert loaded from outputs/hate-speech-detection/visobert |
| ✅ Tokenizer loaded for visobert |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.9372 |
| F1 Macro: 0.8241 |
| F1 Weighted: 0.9379 |
|
|
| ============================================================ |
| EVALUATING MODEL: VIHATE-T5 |
| ============================================================ |
| ✅ Model vihate-t5 loaded from outputs/hate-speech-detection/vihate-t5 |
| ✅ Tokenizer loaded for vihate-t5 |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.9551 |
| F1 Macro: 0.8718 |
| F1 Weighted: 0.9535 |
|
|
| ============================================================ |
| EVALUATING MODEL: XLM-R |
| ============================================================ |
| ✅ Model xlm-r loaded from outputs/hate-speech-detection/xlm-r |
| ✅ Tokenizer loaded for xlm-r |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.9203 |
| F1 Macro: 0.7625 |
| F1 Weighted: 0.9177 |
|
|
| ============================================================ |
| EVALUATING MODEL: ROBERTA-GRU |
| ============================================================ |
| ✅ Model roberta-gru loaded from outputs/hate-speech-detection/roberta-gru |
| ✅ Tokenizer loaded for roberta-gru |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.9537 |
| F1 Macro: 0.8716 |
| F1 Weighted: 0.9530 |
|
|
| ============================================================ |
| EVALUATING MODEL: BILSTM |
| ============================================================ |
| ✅ Model bilstm loaded from outputs/hate-speech-detection/bilstm |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ℹ️ BILSTM evaluation requires special handling |
| Using dummy predictions for BILSTM |
| ✅ Evaluation completed! |
| Accuracy: 0.8388 |
| F1 Macro: 0.3041 |
| F1 Weighted: 0.7652 |
|
|
| ============================================================ |
| EVALUATING MODEL: TEXTCNN |
| ============================================================ |
| ✅ Model textcnn loaded from outputs/hate-speech-detection/textcnn |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ℹ️ TEXTCNN evaluation requires special handling |
| Using dummy predictions for TEXTCNN |
| ✅ Evaluation completed! |
| Accuracy: 0.8388 |
| F1 Macro: 0.3041 |
| F1 Weighted: 0.7652 |
|
|
| ============================================================ |
| EVALUATING MODEL: MBERT |
| ============================================================ |
| ✅ Model mbert loaded from outputs/hate-speech-detection/mbert |
| ✅ Tokenizer loaded for mbert |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.9360 |
| F1 Macro: 0.8044 |
| F1 Weighted: 0.9317 |
|
|
| ============================================================ |
| EVALUATING MODEL: SPHOBERT |
| ============================================================ |
| ✅ Model sphobert loaded from outputs/hate-speech-detection/sphobert |
| ✅ Tokenizer loaded for sphobert |
| Evaluating on 40532 samples... |
| Text column: comment, Label column: label |
| ✅ Evaluation completed! |
| Accuracy: 0.9143 |
| F1 Macro: 0.7378 |
| F1 Weighted: 0.9096 |
|
|
|
|
| ================================================================================ |
| FINAL EVALUATION RESULTS - 2025-10-29 04:14:15 |
| ================================================================================ |
|
|
| EVALUATION SUMMARY |
| -------------------------------------------------- |
| Model Accuracy F1 Macro F1 Weighted Samples |
| -------------------------------------------------- |
| phobert-v1 0.9421 0.8308 0.9394 40532 |
| phobert-v2 0.9341 0.8048 0.9326 40532 |
| bartpho 0.8985 0.6791 0.8886 40532 |
| visobert 0.9372 0.8241 0.9379 40532 |
| vihate-t5 0.9551 0.8718 0.9535 40532 |
| xlm-r 0.9203 0.7625 0.9177 40532 |
| roberta-gru 0.9537 0.8716 0.9530 40532 |
| bilstm 0.8388 0.3041 0.7652 40532 |
| textcnn 0.8388 0.3041 0.7652 40532 |
| mbert 0.9360 0.8044 0.9317 40532 |
| sphobert 0.9143 0.7378 0.9096 40532 |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - PHOBERT-V1 |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/phobert-v1 |
| Number of Samples: 40532 |
| Accuracy: 0.9421 |
| F1 Macro: 0.8308 |
| F1 Weighted: 0.9394 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9554 0.9868 0.9709 33997.0 |
| OFFENSIVE 0.7910 0.6581 0.7185 2094.0 |
| HATE 0.8866 0.7341 0.8032 4441.0 |
| macro avg 0.8777 0.7930 0.8308 40532.0 |
| weighted avg 0.9394 0.9421 0.9394 40532.0 |
|
|
| Confusion Matrix: |
| [[33548 196 253] |
| [ 552 1378 164] |
| [ 1013 168 3260]] |
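As a sanity check, the phobert-v1 metrics reported above can be recovered from the confusion matrix alone (rows = true class, columns = predicted class, order CLEAN/OFFENSIVE/HATE). This is a standalone sketch, not part of the evaluation script:

```python
import numpy as np

# phobert-v1 confusion matrix from the log above.
cm = np.array([
    [33548,   196,   253],
    [  552,  1378,   164],
    [ 1013,   168,  3260],
])

support = cm.sum(axis=1)                   # true-label counts per class
accuracy = np.trace(cm) / cm.sum()         # diagonal = correct predictions
precision = np.diag(cm) / cm.sum(axis=0)   # per predicted-class column
recall = np.diag(cm) / support             # per true-class row
f1 = 2 * precision * recall / (precision + recall)

f1_macro = f1.mean()                                   # unweighted class mean
f1_weighted = (f1 * support).sum() / support.sum()     # support-weighted mean

print(f"Accuracy:    {accuracy:.4f}")      # 0.9421
print(f"F1 Macro:    {f1_macro:.4f}")      # 0.8308
print(f"F1 Weighted: {f1_weighted:.4f}")   # 0.9394
```

The same arithmetic reproduces the per-class precision/recall/F1 rows of the classification report (e.g. OFFENSIVE precision 1378/1742 ≈ 0.7910).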
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - PHOBERT-V2 |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/phobert-v2 |
| Number of Samples: 40532 |
| Accuracy: 0.9341 |
| F1 Macro: 0.8048 |
| F1 Weighted: 0.9326 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9635 0.9739 0.9687 33997.0 |
| OFFENSIVE 0.7505 0.5903 0.6608 2094.0 |
| HATE 0.7779 0.7919 0.7849 4441.0 |
| macro avg 0.8306 0.7854 0.8048 40532.0 |
| weighted avg 0.9321 0.9341 0.9326 40532.0 |
|
|
| Confusion Matrix: |
| [[33109 219 669] |
| [ 523 1236 335] |
| [ 732 192 3517]] |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - BARTPHO |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/bartpho |
| Number of Samples: 40532 |
| Accuracy: 0.8985 |
| F1 Macro: 0.6791 |
| F1 Weighted: 0.8886 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9228 0.9770 0.9491 33997.0 |
| OFFENSIVE 0.6527 0.3563 0.4609 2094.0 |
| HATE 0.7238 0.5535 0.6273 4441.0 |
| macro avg 0.7664 0.6289 0.6791 40532.0 |
| weighted avg 0.8871 0.8985 0.8886 40532.0 |
|
|
| Confusion Matrix: |
| [[33215 235 547] |
| [ 957 746 391] |
| [ 1821 162 2458]] |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - VISOBERT |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/visobert |
| Number of Samples: 40532 |
| Accuracy: 0.9372 |
| F1 Macro: 0.8241 |
| F1 Weighted: 0.9379 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9714 0.9687 0.9700 33997.0 |
| OFFENSIVE 0.6463 0.7574 0.6974 2094.0 |
| HATE 0.8305 0.7809 0.8049 4441.0 |
| macro avg 0.8160 0.8357 0.8241 40532.0 |
| weighted avg 0.9392 0.9372 0.9379 40532.0 |
|
|
| Confusion Matrix: |
| [[32932 590 475] |
| [ 275 1586 233] |
| [ 695 278 3468]] |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - VIHATE-T5 |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/vihate-t5 |
| Number of Samples: 40532 |
| Accuracy: 0.9551 |
| F1 Macro: 0.8718 |
| F1 Weighted: 0.9535 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9660 0.9883 0.9770 33997.0 |
| OFFENSIVE 0.8788 0.7096 0.7852 2094.0 |
| HATE 0.8931 0.8165 0.8531 4441.0 |
| macro avg 0.9126 0.8381 0.8718 40532.0 |
| weighted avg 0.9535 0.9551 0.9535 40532.0 |
|
|
| Confusion Matrix: |
| [[33599 124 274] |
| [ 448 1486 160] |
| [ 734 81 3626]] |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - XLM-R |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/xlm-r |
| Number of Samples: 40532 |
| Accuracy: 0.9203 |
| F1 Macro: 0.7625 |
| F1 Weighted: 0.9177 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9514 0.9733 0.9622 33997.0 |
| OFFENSIVE 0.6284 0.5702 0.5979 2094.0 |
| HATE 0.7834 0.6791 0.7275 4441.0 |
| macro avg 0.7877 0.7409 0.7625 40532.0 |
| weighted avg 0.9163 0.9203 0.9177 40532.0 |
|
|
| Confusion Matrix: |
| [[33090 418 489] |
| [ 555 1194 345] |
| [ 1137 288 3016]] |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - ROBERTA-GRU |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/roberta-gru |
| Number of Samples: 40532 |
| Accuracy: 0.9537 |
| F1 Macro: 0.8716 |
| F1 Weighted: 0.9530 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9711 0.9825 0.9768 33997.0 |
| OFFENSIVE 0.8136 0.7693 0.7909 2094.0 |
| HATE 0.8761 0.8201 0.8472 4441.0 |
| macro avg 0.8870 0.8573 0.8716 40532.0 |
| weighted avg 0.9526 0.9537 0.9530 40532.0 |
|
|
| Confusion Matrix: |
| [[33402 237 358] |
| [ 326 1611 157] |
| [ 667 132 3642]] |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - BILSTM |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/bilstm |
| Number of Samples: 40532 |
| Accuracy: 0.8388 |
| F1 Macro: 0.3041 |
| F1 Weighted: 0.7652 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.8388 1.0000 0.9123 33997.0 |
| OFFENSIVE 0.0000 0.0000 0.0000 2094.0 |
| HATE 0.0000 0.0000 0.0000 4441.0 |
| macro avg 0.2796 0.3333 0.3041 40532.0 |
| weighted avg 0.7035 0.8388 0.7652 40532.0 |
|
|
| Confusion Matrix: |
| [[33997 0 0] |
| [ 2094 0 0] |
| [ 4441 0 0]] |
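The degenerate bilstm (and identical textcnn) numbers follow directly from the dummy predictions noted above: every sample is assigned the majority class CLEAN, so accuracy equals the CLEAN share of the test set and only CLEAN gets a nonzero F1. A quick standalone check:

```python
# Class supports from the log; all 40,532 predictions forced to CLEAN.
clean, offensive, hate = 33997, 2094, 4441
total = clean + offensive + hate

accuracy = clean / total          # fraction of truly-CLEAN samples: 0.8388
precision_clean = clean / total   # everything is predicted CLEAN
recall_clean = 1.0                # so no CLEAN sample is missed
f1_clean = (2 * precision_clean * recall_clean
            / (precision_clean + recall_clean))        # 0.9123

f1_macro = (f1_clean + 0.0 + 0.0) / 3                  # 0.3041
f1_weighted = f1_clean * clean / total                 # 0.7652

print(f"{accuracy:.4f} {f1_macro:.4f} {f1_weighted:.4f}")
```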
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - TEXTCNN |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/textcnn |
| Number of Samples: 40532 |
| Accuracy: 0.8388 |
| F1 Macro: 0.3041 |
| F1 Weighted: 0.7652 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.8388 1.0000 0.9123 33997.0 |
| OFFENSIVE 0.0000 0.0000 0.0000 2094.0 |
| HATE 0.0000 0.0000 0.0000 4441.0 |
| macro avg 0.2796 0.3333 0.3041 40532.0 |
| weighted avg 0.7035 0.8388 0.7652 40532.0 |
|
|
| Confusion Matrix: |
| [[33997 0 0] |
| [ 2094 0 0] |
| [ 4441 0 0]] |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - MBERT |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/mbert |
| Number of Samples: 40532 |
| Accuracy: 0.9360 |
| F1 Macro: 0.8044 |
| F1 Weighted: 0.9317 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9489 0.9876 0.9679 33997.0 |
| OFFENSIVE 0.8645 0.5392 0.6641 2094.0 |
| HATE 0.8416 0.7287 0.7811 4441.0 |
| macro avg 0.8850 0.7518 0.8044 40532.0 |
| weighted avg 0.9328 0.9360 0.9317 40532.0 |
|
|
| Confusion Matrix: |
| [[33574 93 330] |
| [ 686 1129 279] |
| [ 1121 84 3236]] |
|
|
| ================================================================================ |
|
|
| DETAILED RESULTS - SPHOBERT |
| -------------------------------------------------- |
| Model Path: outputs/hate-speech-detection/sphobert |
| Number of Samples: 40532 |
| Accuracy: 0.9143 |
| F1 Macro: 0.7378 |
| F1 Weighted: 0.9096 |
|
|
| Classification Report: |
| Class Precision Recall F1-Score Support |
| -------------------------------------------------- |
| CLEAN 0.9434 0.9729 0.9579 33997.0 |
| OFFENSIVE 0.6821 0.4508 0.5428 2094.0 |
| HATE 0.7436 0.6843 0.7127 4441.0 |
| macro avg 0.7897 0.7027 0.7378 40532.0 |
| weighted avg 0.9080 0.9143 0.9096 40532.0 |
|
|
| Confusion Matrix: |
| [[33077 253 667] |
| [ 769 944 381] |
| [ 1215 187 3039]] |
|
|
| ================================================================================ |
|
|
|
|
| ============================================================ |
| EVALUATION COMPLETED! |
| ============================================================ |
| Successfully evaluated: 11/11 models (bilstm and textcnn scored with dummy majority-class predictions) |
|
|
| Best performing models (ranked by accuracy): |
| 1. vihate-t5: Accuracy=0.9551, F1 Macro=0.8718 |
| 2. roberta-gru: Accuracy=0.9537, F1 Macro=0.8716 |
| 3. phobert-v1: Accuracy=0.9421, F1 Macro=0.8308 |
|
|