---
tags:
- setfit
- sentence-transformers
- text-classification
- geospatial
- spatial-queries
widget:
- text: hotel in geneva airport
- text: what payroll deduction is mpp
- text: weather in erlanger ky
- text: what is the coordinates of point p
- text: what's the weather in roseburg
metrics:
- accuracy
- f1
pipeline_tag: text-classification
library_name: setfit
inference: true
base_model: BAAI/bge-small-en-v1.5
license: mit
---
# Geospatial (Web Search) Query Detector

A binary [SetFit](https://github.com/huggingface/setfit) classifier that distinguishes geospatial
from non-geospatial web search queries. It was trained on 1,200 gold-labelled
[MS MARCO](https://microsoft.github.io/msmarco/) web search queries, weakly labelled with Llama 3.1 and then manually verified. See the COSIT 2026 paper preprint: https://arxiv.org/abs/2605.11336

The evaluation model, trained on 200 samples (105 non-spatial, 95 spatial), achieves **F1 = 0.931**
on a held-out test set of 800 samples (421 non-spatial, 379 spatial). The deployed model was trained on the full 1,200 samples.
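
As a reminder of what the F1 figure measures, here is a minimal pure-Python sketch of F1 for the positive (geospatial) class. It assumes F1 is reported for label `1`; the card does not state which averaging was used, and the toy labels below are invented for illustration rather than taken from the test set.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """F1 for the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Invented toy labels (1 = geospatial, 0 = non-geospatial)
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
toy_f1 = f1_score_binary(y_true, y_pred)
print(round(toy_f1, 3))
```

On the toy data there are two true positives, one false positive, and one false negative, so precision and recall are both 2/3.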
## What counts as a geospatial query?

Following [Mai et al. (2021)](https://agile-giss.copernicus.org/articles/2/8/2021/) and
[Kefalidis et al. (2024)](https://www.sciencedirect.com/science/article/pii/S1569843224005594),
a query is geospatial if answering it requires qualitative or quantitative geographic
knowledge of Earth-bound features.

This is usually the case if the query involves:

- A geographic entity (a named place on Earth: city, country, river, POI, address)
- A geographic concept (a place type: city, lake, mountain, park, building)
- A spatial relation (near, within, north of, between, borders, crosses, distance)

Non-geospatial queries include anatomical, microscopic, astronomical, fictional, or abstract
'where' questions, as well as queries that need no geographic knowledge.
## Model details

- **Sentence Transformer body:** [BAAI/bge-small-en-v1.5](https://huggingface.co/BAAI/bge-small-en-v1.5)
- **Classification head:** LogisticRegression
- **Training data:** 1,200 gold-labelled MS MARCO queries (632 non-spatial, 568 spatial), sampled via K-means centroids across the embedding space of all 1M+ queries so that the sample is representative
- **Labels:** `1` = geospatial, `0` = non-geospatial
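
The card does not describe how the K-means sampling was implemented. The sketch below shows one plausible reading, as an assumption: cluster the embeddings, then keep the query nearest to each centroid. It uses a small pure-Python Lloyd's algorithm and invented 2-D points in place of real query embeddings; the function names are hypothetical, and a real pipeline would more likely run a library implementation over the actual embeddings.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[i] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids


def sample_near_centroids(points, k):
    """Return the index of the point closest to each of the k centroids."""
    centroids = kmeans(points, k)
    return [
        min(range(len(points)), key=lambda j: math.dist(points[j], c))
        for c in centroids
    ]


# Invented 2-D stand-ins for query embeddings: two well-separated groups
embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0), (5.0, 5.3)]
picked = sample_near_centroids(embeddings, k=2)
print(picked)
```

Picking one representative per cluster spreads the labelling budget across the whole embedding space instead of concentrating it on the most common query types.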
## Usage

```python
from setfit import SetFitModel

# Download the classifier from the Hugging Face Hub
model = SetFitModel.from_pretrained("ilyankou/is-geospatial-query")

# Labels: 1 = geospatial, 0 = non-geospatial
preds = model([
    "nearest hospital",
    "far from the truth",
    "close to my heart",
    "flood risk in this area",
])
# => [1, 0, 0, 1]
```
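
A common follow-up is to route only the geospatial queries to a downstream component. The sketch below assumes a hypothetical helper, `keep_geospatial`, that accepts any callable returning one label per query. So that it runs without downloading the model, it uses a keyword-based `stub_classifier`, which is a stand-in and not how the real classifier works.

```python
def keep_geospatial(queries, classify):
    """Return the queries that `classify` labels 1 (geospatial)."""
    preds = classify(queries)
    # int() also accepts scalar tensor elements, not only plain Python ints
    return [q for q, label in zip(queries, preds) if int(label) == 1]


def stub_classifier(queries):
    # Hypothetical stand-in for the real model, for illustration only
    return [1 if "hospital" in q or "area" in q else 0 for q in queries]


queries = [
    "nearest hospital",
    "far from the truth",
    "close to my heart",
    "flood risk in this area",
]
geo_queries = keep_geospatial(queries, stub_classifier)
print(geo_queries)
```

With the real model loaded as in the usage example, the call would be `keep_geospatial(queries, model)`.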
## Training

Weak labels were generated by running Llama 3.1 five times per query at temperature 0.3
and were then manually verified. For evaluation, the SetFit model was trained for 3 epochs with batch size 64
and learning rate 2e-5 on 200 samples (95 positive, 105 negative). It was then
retrained on the full gold dataset (1,200 samples) for production inference.
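
The card does not say how the five Llama 3.1 runs per query were combined. One common approach, shown here purely as an assumption, is a majority vote, with queries where the runs disagree flagged for closer manual review. The votes below are invented for illustration and are not actual model outputs.

```python
from collections import Counter


def majority_label(votes):
    """Return the most common label among an odd number of binary votes."""
    if len(votes) % 2 == 0:
        raise ValueError("use an odd number of votes to avoid ties")
    label, _count = Counter(votes).most_common(1)[0]
    return label


def agreement(votes):
    """Fraction of runs that agree with the majority label."""
    return Counter(votes).most_common(1)[0][1] / len(votes)


# Invented votes from five hypothetical runs per query (1 = geospatial)
runs = {
    "hotel in geneva airport": [1, 1, 1, 1, 1],
    "what is the coordinates of point p": [0, 1, 0, 0, 1],
}
weak_labels = {q: majority_label(v) for q, v in runs.items()}
needs_review = [q for q, v in runs.items() if agreement(v) < 1.0]
print(weak_labels, needs_review)
```

Sampling at a non-zero temperature makes the runs differ on borderline queries, so the agreement rate is a cheap signal for where manual verification matters most.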