Upload MODEL_CARD.md with huggingface_hub
MODEL_CARD.md +2 -13
MODEL_CARD.md CHANGED
@@ -1,4 +1,4 @@
-# Boolean Search Query
+# Boolean Search Query Model
 
 This model is fine-tuned to convert natural language queries into boolean search expressions, optimized for academic and research database searching.
 
@@ -61,7 +61,7 @@ Fine-tuned: "artificial intelligence" AND (ethics OR regulation OR policy) # Pr
 
 The model was trained on a curated dataset of natural language queries paired with their correct boolean translations. Dataset characteristics:
 
-- Size:
+- Size: 135 examples
 - Format: Natural query → Boolean expression pairs
 - Source: Manually curated academic search examples
 - Validation: Expert-reviewed for accuracy
@@ -69,9 +69,6 @@ The model was trained on a curated dataset of natural language queries paired wi
 ## Training Process
 
 - **Method**: LoRA fine-tuning
-- **Epochs**: 6
-- **Learning Rate**: 5e-5 with cosine scheduling
-- **Batch Size**: 16 (4 per device × 4 gradient accumulation steps)
 - **Hardware**: NVIDIA GeForce RTX 4070 Ti SUPER
 
 ## How to Use
@@ -150,14 +147,6 @@ result = tokenizer.decode(outputs[0], skip_special_tokens=True)
 print(result) # "climate change" AND "renewable energy"
 ```
 
-## Evaluation Results
-
-Our test suite demonstrates consistent improvements over the base model in key areas:
-1. Meta-term removal accuracy: 100%
-2. Proper multi-word term quoting: 95%
-3. Logical grouping accuracy: 98%
-4. Minimal formatting adherence: 97%
-
 ## Citation
 
 If you use this model in your research, please cite: