# NER Benchmark Results

- **Model:** Minibase-NER-Standard
- **Dataset:** ner_benchmark_dataset.jsonl
- **Sample Size:** 100 samples
- **Date:** 2025-10-07T13:41:36.866891

## Overall Performance

| Metric | Score | Description |
|--------|-------|-------------|
| F1 Score | 0.951 | Overall entity-level NER performance (harmonic mean of precision and recall) |
| Precision | 0.915 | Fraction of predicted entities that are correct |
| Recall | 1.000 | Fraction of gold-standard entities that the model found |
| Average Latency | 323.3 ms | Mean response time per sample |
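The metric definitions in the table can be checked with a few lines of Python. This is a minimal sketch assuming entity-level counts of true positives (`tp`), false positives (`fp`), and false negatives (`fn`); the function names are illustrative and not part of any Minibase tooling. The harmonic mean of the aggregate precision (0.915) and recall (1.000) reported above works out to about 0.956, slightly above the reported F1 of 0.951, which suggests the F1 was aggregated differently (for example, averaged over samples). The report does not say which.

```python
def f1_from_pr(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Entity-level precision, recall, and F1 from raw counts."""
    # Precision: share of predicted entities that match a gold entity.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of gold entities that were predicted.
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Harmonic mean of the aggregate scores in the table above.
print(round(f1_from_pr(0.915, 1.000), 3))  # 0.956
```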

## Entity Type Performance

| Entity Type | Accuracy | Correct / Total |
|-------------|----------|-----------------|
| PERSON | 1.000 | 100/100 |
| ORG | 1.000 | 100/100 |
| LOC | 0.660 | 66/100 |
| MISC | 1.000 | 34/34 |
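Each Accuracy value above is the correct count divided by the total for that entity type. A minimal sketch, assuming the counts are held in a plain dictionary (the `counts` structure and `per_type_accuracy` helper are hypothetical, not an output format of the benchmark):

```python
# Correct/total counts per entity type, copied from the table above.
counts = {
    "PERSON": (100, 100),
    "ORG": (100, 100),
    "LOC": (66, 100),
    "MISC": (34, 34),
}


def per_type_accuracy(counts: dict[str, tuple[int, int]]) -> dict[str, float]:
    """Return correct/total for each entity type, skipping types with no examples."""
    return {
        label: correct / total
        for label, (correct, total) in counts.items()
        if total > 0
    }


for label, acc in per_type_accuracy(counts).items():
    print(f"{label:<7} {acc:.3f}")
```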

## Key Improvements

- **BIO Tagging**: The model labels entities using the BIO (Beginning-Inside-Outside) tagging scheme.
- **Multiple Entity Types**: Supports PERSON, ORG, LOC, and MISC entities (PERSON appears as `PER` in the model's JSON output).
- **Entity-Level Evaluation**: Metrics are calculated at the entity level rather than the token level.
- **Comprehensive Coverage**: The evaluation spans text from several different domains.
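To make the BIO scheme concrete, the sketch below decodes a sequence of BIO tags into the grouped entity dictionary used in the example outputs. It assumes whitespace-separated tokens and tags of the form `B-PER` / `I-PER` / `O`; the `bio_to_entities` function and the tag strings are illustrative assumptions, not Minibase-NER's actual interface. A stray `I-` tag with no matching `B-` is treated as the start of a new entity, a common lenient convention.

```python
from typing import Optional


def bio_to_entities(tokens: list[str], tags: list[str]) -> dict[str, list[str]]:
    """Group BIO-tagged tokens into {entity_type: [entity text, ...]}."""
    entities: dict[str, list[str]] = {}
    current_type: Optional[str] = None
    current_tokens: list[str] = []

    def flush() -> None:
        # Close the entity being built, if any, and reset the buffer.
        nonlocal current_type, current_tokens
        if current_type is not None:
            entities.setdefault(current_type, []).append(" ".join(current_tokens))
        current_type, current_tokens = None, []

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, label = tag.partition("-")
        if prefix == "B" or label != current_type:
            # "B-" always starts a new entity; an "I-" of a different type does too.
            flush()
            current_type = label
        current_tokens.append(token)
    flush()
    return entities


tokens = "John Smith works at Google in New York".split()
tags = ["B-PER", "I-PER", "O", "O", "B-ORG", "O", "B-LOC", "I-LOC"]
print(bio_to_entities(tokens, tags))
# {'PER': ['John Smith'], 'ORG': ['Google'], 'LOC': ['New York']}
```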

## Example Results

### Example 1

- **Input:** John Smith works at Google in New York and uses Python programming language....
- **Predicted:** `{ "PER": ["John Smith"], "ORG": ["Google"], "LOC": ["New York"], "MISC": ["Python"] }`...
- **F1 Score:** 0.857
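Entity-level F1 for a single example can be computed by comparing (type, text) pairs between the prediction and the gold annotation. The gold labels for Example 1 are not shown in this report; the `gold` dictionary below is a hypothetical annotation that omits `Python`, chosen only because it is one combination consistent with the reported 0.857 (3 of 4 predictions correct, all 3 gold entities found). Exact string matching on entity text is also an assumption about how the benchmark scores matches.

```python
import json


def entity_set(entities: dict[str, list[str]]) -> set[tuple[str, str]]:
    """Flatten {type: [text, ...]} into a set of (type, text) pairs."""
    return {(label, text) for label, texts in entities.items() for text in texts}


def example_f1(predicted: dict[str, list[str]], gold: dict[str, list[str]]) -> float:
    """Entity-level F1 with exact (type, text) matching."""
    pred, ref = entity_set(predicted), entity_set(gold)
    if not pred and not ref:
        return 1.0  # nothing to find and nothing predicted (a common convention)
    tp = len(pred & ref)
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)


# Prediction for Example 1, as printed in the report.
predicted = json.loads(
    '{ "PER": ["John Smith"], "ORG": ["Google"], '
    '"LOC": ["New York"], "MISC": ["Python"] }'
)
# Hypothetical gold annotation (not taken from the dataset).
gold = {"PER": ["John Smith"], "ORG": ["Google"], "LOC": ["New York"], "MISC": []}

print(round(example_f1(predicted, gold), 3))  # 0.857
```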

### Example 2

- **Input:** Microsoft Corporation announced that Satya Nadella will visit London next week....
- **Predicted:** `{ "PER": ["Satya Nadella"], "ORG": ["Microsoft Corporation"], "LOC": ["London"], "MISC": [] }`...
- **F1 Score:** 1.000

### Example 3

- **Input:** The University of Cambridge is located in the United Kingdom and was founded by King Henry III....
- **Predicted:** `{ "PER": ["King Henry III"], "ORG": ["University of Cambridge"], "LOC": ["United Kingdom"], "MISC": [] }`...
- **F1 Score:** 1.000
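Each example carries its own F1 score, while the overall table reports a single F1. The report does not state how per-example scores are combined; the sketch below contrasts two common options: macro-averaging (the mean of per-example F1) and micro-averaging (pooling counts across examples before computing F1). The `(tp, fp, fn)` counts are hypothetical values consistent with the three F1 scores shown (Example 1 as 3 true positives and 1 false positive; Examples 2 and 3 as three gold entities each, all found) and are not taken from the dataset.

```python
def f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; algebraically equal to 2*tp / (2*tp + fp + fn)."""
    denom = 2 * tp + fp + fn
    # With no gold and no predicted entities, treat the score as perfect.
    return 2 * tp / denom if denom else 1.0


# Hypothetical (tp, fp, fn) counts for Examples 1-3.
per_example = [(3, 1, 0), (3, 0, 0), (3, 0, 0)]

# Macro: score each example, then average the scores.
macro_f1 = sum(f1(*c) for c in per_example) / len(per_example)

# Micro: pool the counts across examples, then score once.
tp, fp, fn = (sum(col) for col in zip(*per_example))
micro_f1 = f1(tp, fp, fn)

print(f"macro F1 = {macro_f1:.3f}, micro F1 = {micro_f1:.3f}")
# macro F1 = 0.952, micro F1 = 0.947
```

Macro-averaging weights every example equally, while micro-averaging weights every entity equally, so examples with many entities pull the micro score harder.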