--- library_name: sklearn tags: - postgresql - sql - query-cache - plan-cache - redis - database - tabular-classification - tabular-regression pipeline_tag: tabular-classification license: mit --- # pg_plan_cache Models Three machine learning models for the **pg_plan_cache** PostgreSQL extension — a query execution plan cache backed by Redis. ## Models ### 1. SQL Cache Advisor - **Task:** Classification (high / medium / low) - **Algorithm:** Random Forest (200 trees) - **Purpose:** Predicts whether caching a query's execution plan will be beneficial ### 2. Cache TTL Recommender - **Task:** Regression (seconds) - **Algorithm:** Gradient Boosting - **Purpose:** Recommends optimal cache TTL based on query characteristics ### 3. Query Complexity Estimator - **Task:** Regression (1-100 score) - **Algorithm:** Gradient Boosting - **Purpose:** Estimates query complexity to prioritize caching resources ## Features All models use 28 structural features extracted from raw SQL text: | Feature | Description | |---------|------------| | `query_length` | Character count | | `query_type` | SELECT=0, INSERT=1, UPDATE=2, DELETE=3 | | `num_tables` | Tables referenced | | `num_joins` | JOIN clause count | | `num_conditions` | AND/OR conditions | | `num_aggregates` | Aggregate function count | | `num_subqueries` | Subquery count | | `has_window_func` | Window functions present | | `has_cte` | Common Table Expressions | | `nesting_depth` | Max parenthesis depth | | ... | 18 more features | ## Usage ```python from predict import predict, format_prediction result = predict("SELECT u.name, COUNT(o.id) FROM users u JOIN orders o ON u.id = o.user_id GROUP BY u.name") print(format_prediction(result)) # Cache Benefit: HIGH # Recommended TTL: 4200s (1h 10m) # Complexity: 62/100 (complex) ``` ## Training Trained on 8,000 synthetic SQL queries across 18 complexity tiers: - Simple SELECTs, filtered queries, ORDER BY - Single and multi-table JOINs - Aggregations with GROUP BY / HAVING - Subqueries, correlated subqueries, EXISTS - CTEs, window functions, UNION - Complex analytics queries - INSERT / UPDATE / DELETE (non-cacheable) ```bash pip install -r requirements.txt python train.py ``` ## About pg_plan_cache pg_plan_cache is a PostgreSQL extension that caches query execution plans in Redis. It hooks into the PostgreSQL planner, normalizes queries, computes SHA-256 hashes, and stores serialized plans with configurable TTL and automatic schema-change invalidation.