🚕 NYC Taxi Fare Regression (PyTorch ANN)

A small PyTorch tabular regression model that predicts NYC taxi fare (USD) from pickup time (hour, AM/PM, weekday), pickup/dropoff coordinates, and passenger count.

This Hugging Face repo stores the trained model weights + preprocessing schema used by a Streamlit inference app.

Training → Model → Inference

What's in this repo

  • artifacts/model_state.pt: PyTorch model state dict (ANN with embeddings)
  • artifacts/schema.json: feature schema, category lists, embedding sizes, cont mean/std
  • artifacts/metrics.json: validation metrics (RMSE/MAE)
  • artifacts/sample_rows.csv: small sample used by the Streamlit UI "load a row"
  • NYCTaxiFares.csv β€” dataset file used in training

Inputs

The model uses:

Categorical (embedded)

  • Hour (0–23)
  • AMorPM (am / pm)
  • Weekday (Mon…Sun)

Continuous

  • pickup_latitude, pickup_longitude
  • dropoff_latitude, dropoff_longitude
  • passenger_count (1–6)
  • dist_km (computed using haversine distance)
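
For reference, the haversine distance behind dist_km can be sketched as below. This is a minimal illustration, not the notebook's exact code: the function name haversine_km and the Earth radius of 6371 km are assumptions, and the example coordinates are only illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the training code may use a slightly different value


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Illustrative trip: roughly Times Square to JFK airport
dist_km = haversine_km(40.7580, -73.9855, 40.6413, -73.7781)
```

If the radius constant differs from the one used in training, dist_km will be scaled slightly differently from what the model saw, so it is worth checking against the app's implementation.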

Preprocessing (same as training / Streamlit app)

  • dist_km is computed from pickup/dropoff lat/lon using the haversine formula.
  • Continuous features are standardized using cont_mean / cont_std stored in schema.json.
  • Categorical values are converted to integer codes using the fixed category lists in schema.json (cat_categories).
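
The standardization and category-code steps above could look roughly like this. It is a sketch under assumptions: the key names (cat_cols, cont_cols, cat_categories, cont_mean, cont_std) come from schema.json as used in the quickstart, but the toy values, the shape of cat_categories (a dict of lists), the category order, and the helper name encode_row are all hypothetical.

```python
# Toy stand-in for schema.json. Real values, column order, and category order
# must come from the downloaded file; everything below is made up.
toy_schema = {
    "cat_cols": ["Hour", "AMorPM", "Weekday"],
    "cont_cols": ["passenger_count", "dist_km"],  # shortened for the example
    "cat_categories": {
        "Hour": list(range(24)),
        "AMorPM": ["am", "pm"],
        "Weekday": ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"],
    },
    "cont_mean": [1.5, 3.0],
    "cont_std": [1.0, 2.0],
}


def encode_row(row, schema):
    """Return (integer category codes, standardized continuous values) for one trip."""
    # Code = position of the value in the fixed category list for that column.
    cat_codes = [schema["cat_categories"][col].index(row[col]) for col in schema["cat_cols"]]
    # Standardize each continuous feature with the stored mean and std.
    conts = [
        (row[col] - mean) / std
        for col, mean, std in zip(schema["cont_cols"], schema["cont_mean"], schema["cont_std"])
    ]
    return cat_codes, conts


row = {"Hour": 14, "AMorPM": "pm", "Weekday": "Wed", "passenger_count": 2, "dist_km": 3.0}
cat_codes, conts = encode_row(row, toy_schema)
```

Looking codes up in the stored lists, rather than re-deriving them from new data, keeps the integer codes aligned with the embedding rows learned in training.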

Training notes (from notebook):

  • Hour / AMorPM / Weekday were derived from pickup_datetime after a ~4 hour timezone shift (stored as timezone_shift_hours=4 in schema.json).
  • Basic trimming was applied during training (fare, passenger_count, distance ranges) to reduce outliers.
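
A possible way to derive the three categorical features from pickup_datetime is sketched below. Two things are assumed here and not confirmed by this card: that pickup_datetime is in UTC, and that the shift is applied by subtracting 4 hours. The helper name time_features and the weekday labels are also hypothetical.

```python
from datetime import datetime, timedelta

TIMEZONE_SHIFT_HOURS = 4  # same value as timezone_shift_hours in schema.json
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # assumed labels


def time_features(pickup_datetime):
    """Derive Hour, AMorPM and Weekday after shifting to (approximate) local time."""
    # Assumption: input is UTC and the shift is a subtraction.
    local = pickup_datetime - timedelta(hours=TIMEZONE_SHIFT_HOURS)
    return {
        "Hour": local.hour,
        "AMorPM": "am" if local.hour < 12 else "pm",
        "Weekday": WEEKDAYS[local.weekday()],  # datetime.weekday(): Monday == 0
    }


features = time_features(datetime(2010, 4, 19, 8, 17, 56))
```

A fixed 4-hour offset ignores daylight saving changes, which is presumably why the shift is described as approximate.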

Output

A single float: predicted taxi fare (USD).

Metrics

From artifacts/metrics.json (validation split):

  • RMSE: 2.8648
  • MAE: 1.4056
  • Rows used (after cleaning): 119,602
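
For readers comparing their own predictions against these numbers, RMSE and MAE follow the usual definitions. The snippet below is a plain-Python sketch with made-up fares; it does not reproduce the validation split or the values above.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: square root of the mean squared difference."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error: mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up fares in USD, for illustration only
actual = [10.0, 12.0, 8.0]
predicted = [11.0, 10.0, 8.0]
example_rmse = rmse(actual, predicted)
example_mae = mae(actual, predicted)
```

RMSE weights large misses more heavily than MAE, so an RMSE about twice the MAE, as reported here, suggests a minority of trips with comparatively large errors.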

Quickstart (load model + schema)

import json
import torch
import numpy as np
from huggingface_hub import hf_hub_download

REPO_ID = "ash001/nyc-taxi-fare-regression-ann"

schema_path = hf_hub_download(REPO_ID, "artifacts/schema.json")
state_path  = hf_hub_download(REPO_ID, "artifacts/model_state.pt")

with open(schema_path, "r", encoding="utf-8") as f:
    schema = json.load(f)

cat_cols = schema["cat_cols"]
cont_cols = schema["cont_cols"]
cat_categories = schema["cat_categories"]
cont_mean = np.array(schema["cont_mean"], dtype=np.float32)
cont_std  = np.array(schema["cont_std"], dtype=np.float32)

# Define the same model class used in your inference app, then:
# model.load_state_dict(torch.load(state_path, map_location="cpu"))
# model.eval()

For an end-to-end prediction example (feature building, distance, and standardization), see the inference code in the Streamlit app.
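
The quickstart leaves the model class to the reader. As a rough idea of what an "ANN with embeddings" for this kind of tabular data might look like, here is a hypothetical sketch. The class name TabularANN, the attribute names, hidden sizes, dropout rate, and embedding sizes are all assumptions; load_state_dict will only succeed if the class matches the one used in training exactly, so treat the app's own class as the source of truth.

```python
import torch
import torch.nn as nn


class TabularANN(nn.Module):
    """Hypothetical embeddings + MLP regressor; not the repo's confirmed architecture."""

    def __init__(self, emb_sizes, n_cont, hidden=(200, 100), p=0.4):
        super().__init__()
        # One embedding table per categorical column: (num_categories, emb_dim)
        self.embeds = nn.ModuleList([nn.Embedding(n, d) for n, d in emb_sizes])
        self.bn_cont = nn.BatchNorm1d(n_cont)
        layers = []
        n_in = sum(d for _, d in emb_sizes) + n_cont
        for h in hidden:
            layers += [nn.Linear(n_in, h), nn.ReLU(), nn.BatchNorm1d(h), nn.Dropout(p)]
            n_in = h
        layers.append(nn.Linear(n_in, 1))  # single output: predicted fare
        self.layers = nn.Sequential(*layers)

    def forward(self, x_cat, x_cont):
        # x_cat: (batch, n_cat) integer codes; x_cont: (batch, n_cont) standardized floats
        emb = torch.cat([e(x_cat[:, i]) for i, e in enumerate(self.embeds)], dim=1)
        x = torch.cat([emb, self.bn_cont(x_cont)], dim=1)
        return self.layers(x).squeeze(1)


# Assumed embedding sizes for Hour, AMorPM, Weekday; the real ones are in schema.json.
emb_sizes = [(24, 12), (2, 1), (7, 4)]
model = TabularANN(emb_sizes, n_cont=6)
model.eval()  # eval mode so BatchNorm and Dropout behave deterministically

with torch.no_grad():
    x_cat = torch.tensor([[4, 0, 1]], dtype=torch.long)  # one row of category codes
    x_cont = torch.zeros(1, 6)                           # one row of standardized features
    pred = model(x_cat, x_cont)                          # untrained weights: value is meaningless
```

With the matching class in place, the commented load_state_dict and eval lines from the quickstart complete the setup.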


license: apache-2.0
