Snowflake/Arctic-Text2SQL-R1-7B Fine-tuned for NL2SQL++ v8

This model is a fine-tuned version of Snowflake/Arctic-Text2SQL-R1-7B on the NL2SQL++ v8 dataset with code-with-thought reasoning.

Model Details

  • Base Model: Snowflake/Arctic-Text2SQL-R1-7B
  • Task: Text-to-SQL generation
  • Dataset: NL2SQL++ v8 with code-with-thought reasoning
  • Fine-tuning Method: LoRA (Low-Rank Adaptation) with Unsloth
  • Quantization: 16-bit merged weights
  • Training Dataset Size: (20886, 1) examples
  • Validation Dataset Size: (1066, 1) examples

Training Configuration

  • output_dir: trainer_output
  • overwrite_output_dir: False
  • do_train: False
  • do_eval: True
  • do_predict: False
  • eval_strategy: IntervalStrategy.STEPS
  • prediction_loss_only: False
  • per_device_train_batch_size: 8
  • per_device_eval_batch_size: 8
  • per_gpu_train_batch_size: None
  • per_gpu_eval_batch_size: None
  • gradient_accumulation_steps: 8
  • eval_accumulation_steps: 10
  • eval_delay: 0
  • torch_empty_cache_steps: None
  • learning_rate: 0.0001
  • weight_decay: 0.01
  • adam_beta1: 0.9
  • adam_beta2: 0.999
  • adam_epsilon: 1e-08
  • max_grad_norm: 1.0
  • num_train_epochs: 3.0
  • max_steps: -1
  • lr_scheduler_type: SchedulerType.COSINE
  • lr_scheduler_kwargs: None
  • warmup_ratio: 0.1
  • warmup_steps: 0
  • log_level: passive
  • log_level_replica: warning
  • log_on_each_node: True
  • logging_dir: trainer_output/runs/Apr02_09-38-14_ip-172-31-32-148.ap-southeast-2.compute.internal
  • logging_strategy: IntervalStrategy.STEPS
  • logging_first_step: False
  • logging_steps: 0.004
  • logging_nan_inf_filter: True
  • save_strategy: SaveStrategy.BEST
  • save_steps: 0.04
  • save_total_limit: 2
  • save_safetensors: True
  • save_on_each_node: False
  • save_only_model: False
  • restore_callback_states_from_checkpoint: False
  • no_cuda: False
  • use_cpu: False
  • use_mps_device: False
  • seed: 3407
  • data_seed: None
  • jit_mode_eval: False
  • bf16: True
  • fp16: False
  • fp16_opt_level: O1
  • half_precision_backend: auto
  • bf16_full_eval: False
  • fp16_full_eval: False
  • tf32: None
  • local_rank: 0
  • ddp_backend: None
  • tpu_num_cores: None
  • tpu_metrics_debug: False
  • debug: []
  • dataloader_drop_last: False
  • eval_steps: 0.04
  • dataloader_num_workers: 0
  • dataloader_prefetch_factor: None
  • past_index: -1
  • run_name: None
  • disable_tqdm: False
  • remove_unused_columns: True
  • label_names: None
  • load_best_model_at_end: True
  • metric_for_best_model: eval_loss
  • greater_is_better: False
  • ignore_data_skip: False
  • fsdp: []
  • fsdp_min_num_params: 0
  • fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
  • fsdp_transformer_layer_cls_to_wrap: None
  • accelerator_config: AcceleratorConfig(split_batches=False, dispatch_batches=None, even_batches=True, use_seedable_sampler=True, non_blocking=False, gradient_accumulation_kwargs=None, use_configured_state=False)
  • parallelism_config: None
  • deepspeed: None
  • label_smoothing_factor: 0.0
  • optim: OptimizerNames.ADAMW_TORCH_FUSED
  • optim_args: None
  • adafactor: False
  • group_by_length: False
  • length_column_name: length
  • report_to: ['wandb']
  • project: huggingface
  • trackio_space_id: trackio
  • ddp_find_unused_parameters: None
  • ddp_bucket_cap_mb: None
  • ddp_broadcast_buffers: None
  • dataloader_pin_memory: True
  • dataloader_persistent_workers: False
  • skip_memory_metrics: True
  • use_legacy_prediction_loop: False
  • push_to_hub: False
  • resume_from_checkpoint: None
  • hub_model_id: None
  • hub_strategy: HubStrategy.EVERY_SAVE
  • hub_token: None
  • hub_private_repo: None
  • hub_always_push: False
  • hub_revision: None
  • gradient_checkpointing: True
  • gradient_checkpointing_kwargs: None
  • include_inputs_for_metrics: False
  • include_for_metrics: []
  • eval_do_concat_batches: True
  • fp16_backend: auto
  • push_to_hub_model_id: None
  • push_to_hub_organization: None
  • push_to_hub_token: None
  • _n_gpu: 1
  • mp_parameters:
  • auto_find_batch_size: False
  • full_determinism: False
  • torchdynamo: None
  • ray_scope: last
  • ddp_timeout: 1800
  • torch_compile: False
  • torch_compile_backend: None
  • torch_compile_mode: None
  • include_tokens_per_second: False
  • include_num_input_tokens_seen: no
  • neftune_noise_alpha: None
  • optim_target_modules: None
  • batch_eval_metrics: False
  • eval_on_start: False
  • use_liger_kernel: False
  • liger_kernel_config: None
  • eval_use_gather_object: False
  • average_tokens_across_devices: True
  • model_init_kwargs: None
  • chat_template_path: None
  • dataset_text_field: text
  • dataset_kwargs: None
  • dataset_num_proc: None
  • eos_token: None
  • pad_token: None
  • max_length: 1024
  • packing: False
  • packing_strategy: bfd
  • padding_free: False
  • pad_to_multiple_of: None
  • eval_packing: None
  • completion_only_loss: None
  • assistant_only_loss: False
  • loss_type: nll
  • activation_offloading: False
  • vllm_sampling_params: None
  • unsloth_num_chunks: -1
  • unsloth_logit_chunk_multiplier: None
  • unsloth_grpo_mini_batch: None
  • max_seq_length: 17000
  • model_name: Snowflake/Arctic-Text2SQL-R1-7B
  • train_batch_size: 8
  • val_batch_size: 1
  • num_epochs: 3
  • lora_r: 64
  • lora_alpha: 128

Train Dataset Example

<|im_start|>system
You are Qwen, created by Alibaba Cloud. You are a helpful assistant.<|im_end|>
<|im_start|>user

You are an expert in SQL++ query generation. Given a document schema and a natural language query, generate a valid SQL++ query.

CORE RULES:
1. Column names with spaces or special characters (e.g., "Enrollment (K-12)") must be enclosed in backticks.
2. SUBSTR uses 0-based indexing: SUBSTR(str, 0, 4) gets the first 4 characters.
3. Prioritize evidence over natural language if they conflict.
4. Only use provided keyspaces and existing fields.
5. Never use SELECT *; be explicit about columns.
6. Never use CAST() function because it's not supported in SQL++.

SELECT REQUIREMENTS:
- Main SELECT must NOT use RAW; return only standard expressions.
- RAW is allowed only in SELECT subqueries.
- Scalar subqueries must use IN (SELECT RAW ...) or = (SELECT RAW ...)[0].
- Always append [0] to SELECT RAW when used as a scalar.
- NEVER use [0] with aggregate functions (MAX, MIN, AVG, etc.).

JOIN & QUERY STRUCTURE:
- Use JOINs only when required; never use correlated subqueries.
- Never self-JOIN unless explicitly needed.
- Don't repeat table aliases; don't name aliases after buckets/tables.
- No GROUP BY for single keyspace queries.

FIELD & NULL HANDLING:
- Every field in final SELECT or WHERE must have IS NOT NULL in WHERE clause unless explicitly checking for NULL.
- If an object is requested without specific attributes, return its id.

AGGREGATION & CALCULATIONS:
- Forbidden functions: DIVIDE(), SUBTRACT(), YEAR(), ARRAY_AGG().
- Use DISTINCT when unique entities are requested.
- PERCENTAGE RULE: numerator and denominator must come from the SAME row set. Never compute denominator in a separate subquery. Define one base set (FROM/JOIN/WHERE), compute denominator from it, and derive numerator by adding conditions in aggregate expressions (CASE WHEN).

FORMATTING:
- No colon after FROM.
- Return ONLY what is asked.

Bucket Name: training_bucket
Scope Name: Advertising_Attribution_and_ROI_Analytics_scope

Document Schema:
{"`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`attribution_results`": {"Flavor": "", "properties": {"advertiser_id": {"samples": ["advertiser_149fbd20-8708-4124-..."], "type": "string"}, "attribution_model_id": {"samples": ["attr_model_6ca6427a-0123-4a31-..."], "type": "string"}, "computed_at": {"samples": ["2025-10-25T00:18:54Z"], "type": "string"}, "conversion_id": {"samples": ["conversion_3d0ec24c-bc6a-4bac-..."], "type": "string"}, "created_at": {"samples": ["2025-10-25T00:18:59Z"], "type": "string"}, "credits": {"items": {}, "samples": [[{"ad_group_id": null, "campaign_id": null, "channel_id": "channel_fd1d3391-152f-4e92-b9cf-b3cc0afe1362", "creative_id": null, "credited_conversions": 1, "credited_revenue": 11.63, "currency": "USD", "occurred_at": "2025-10-25T16:59:35Z", "touchpoint_id": "touchpoint_912bd90c-ecc2-4989-82d7-615889ddb438", "touchpoint_type": "ad_click", "weight": 1}]], "type": "array"}, "explain": {"properties": {"dropped_touchpoints": {"items": {}, "samples": [[{"reason": "outside_rules", "touchpoint_id": "touchpoint_4a40ba11-66b6-46c6-af40-c66794c952a9"}]], "type": "array"}, "notes": {"samples": [["Model applied with lookback an..."]], "type": "string"}}, "type": "object"}, "person_id": {"samples": [["person_311faa53-8c2c-4fe6-bd2e..."]], "type": "string"}, "totals": {"properties": {"currency": {"samples": [["AUD"]], "type": "string"}, "total_credited_conversions": {"samples": [[1]], "type": "number"}, "total_credited_revenue": {"samples": [[11.63]], "type": "number"}}, "type": "object"}, "window": {"properties": {"from": {"samples": ["2025-09-10T00:17:54Z"], "type": "string"}, "to": {"samples": ["2025-10-25T00:17:54Z"], "type": "string"}}, "type": "object"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`attribution_models`": {"Flavor": "", "properties": {"advertiser_id": {"samples": ["advertiser_40b38653-2ba0-4f82-..."], "type": "string"}, "created_at": {"samples": ["2025-03-03T19:53:28Z"], "type": "string"}, "is_default": {"samples": [false], "type": "boolean"}, "lookback_windows": {"properties": {"click_days": {"samples": [7], "type": "number"}, "view_days": {"samples": [1], "type": "number"}}, "type": "object"}, "model_type": {"samples": ["first_click"], "type": "string"}, "name": {"samples": ["First Click v1"], "type": "string"}, "rules": {"properties": {"dedupe_within_minutes": {"samples": [[10]], "type": "number"}, "exclude_channels": {"items": {"type": "string"}, "samples": [["affiliate"]], "type": "array"}, "include_channels": {"items": {"type": "string"}, "samples": [["affiliate"]], "type": "array"}}, "type": "object"}, "updated_at": {"samples": [["2025-02-16T17:35:41Z"]], "type": "string"}, "version": {"samples": [1], "type": "number"}, "weights": {"items": {"properties": {"touchpoint_type": {"samples": ["ad_click"], "type": "string"}, "weight": {"samples": [1], "type": "number"}}, "type": "object"}, "samples": [[{"touchpoint_type": "ad_click", "weight": 0.5}]], "type": "array"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`costs`": {"Flavor": "", "properties": {"ad_group_id": {"samples": [["ad_group_1d81992b-7ae7-45bc-bb..."]], "type": "string"}, "advertiser_id": {"samples": ["advertiser_2f6e754e-c2f0-4ae5-..."], "type": "string"}, "campaign_id": {"samples": [["campaign_794729c9-e191-4ca1-83..."]], "type": "string"}, "channel_id": {"samples": ["channel_17aea452-560e-403a-a2b..."], "type": "string"}, "clicks": {"samples": [[198]], "type": "number"}, "creative_id": {"samples": [["creative_2c8d15d9-6146-430d-a0..."]], "type": "string"}, "currency": {"samples": ["AUD"], "type": "string"}, "date": {"samples": ["2025-11-22"], "type": "string"}, "fees": {"properties": {"agency_fees": {"samples": [[25.24]], "type": "number"}, "platform_fees": {"samples": [[0.34]], "type": "number"}}, "type": "object"}, "granularity": {"samples": ["daily"], "type": "string"}, "impressions": {"samples": [[4085]], "type": "number"}, "ingested_at": {"samples": ["2025-11-22T20:05:14Z"], "type": "string"}, "source": {"samples": ["api"], "type": "string"}, "spend": {"samples": [622.44], "type": "number"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`conversions`": {"Flavor": "", "properties": {"advertiser_id": {"samples": ["advertiser_0060c7cf-3cec-4368-..."], "type": "string"}, "conversion_type": {"samples": ["lead"], "type": "string"}, "created_at": {"samples": ["2025-10-29T15:48:17Z"], "type": "string"}, "dedupe_key": {"samples": [["dd_02c3ee0b2ce74aa9a3"]], "type": "string"}, "event_source": {"samples": ["offline_import"], "type": "string"}, "external_event_id": {"samples": [["ext_034fdd40765440b98b66ac5435..."]], "type": "string"}, "occurred_at": {"samples": ["2025-10-29T14:04:17Z"], "type": "string"}, "order": {"properties": {"currency": {"samples": ["AUD"], "type": "string"}, "discount": {"samples": [[16.41]], "type": "number"}, "items": {"items": {}, "samples": [[{"name": "Down", "product_id": null, "quantity": 1, "revenue": 236.31, "sku": "SKU-68405", "unit_price": 236.31}]], "type": "array"}, "order_id": {"samples": [["ord_0ad1403de4df"]], "type": "string"}, "shipping": {"samples": [[0]], "type": "number"}, "subtotal": {"samples": [[441.25]], "type": "number"}, "tax": {"samples": [[20.04]], "type": "number"}, "total": {"samples": [1.92], "type": "number"}}, "type": "object"}, "person_id": {"samples": [["person_2b90965e-911c-44ee-9ebd..."]], "type": "string"}, "updated_at": {"samples": [["2025-11-08T15:48:17Z"]], "type": "string"}, "value": {"properties": {"currency": {"samples": [["AUD"]], "type": "string"}, "lifetime_value": {"samples": [[496.03]], "type": "number"}, "margin": {"samples": [[117.16]], "type": "number"}, "revenue": {"samples": [[3.85]], "type": "number"}}, "type": "object"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`touchpoints`": {"Flavor": "", "properties": {"ad_group_id": {"samples": [["ad_group_44136a98-ce0c-44f2-b5..."]], "type": "string"}, "advertiser_id": {"samples": ["advertiser_0053becd-41b4-4242-..."], "type": "string"}, "attribution_context": {"properties": {"page_url": {"samples": [["http://leon-roman.com/"]], "type": "string"}, "utm": {"properties": {"campaign": {"samples": [["performance"]], "type": "string"}, "content": {"samples": [["cta_b"]], "type": "string"}, "medium": {"samples": [["affiliate"]], "type": "string"}, "source": {"samples": [["facebook"]], "type": "string"}, "term": {"samples": [["plant"]], "type": "string"}}, "type": "object"}}, "type": "object"}, "campaign_id": {"samples": [["campaign_0ce273e4-c4ca-46cb-b2..."]], "type": "string"}, "channel_id": {"samples": [["channel_7194e13b-3b0a-4ab1-b58..."]], "type": "string"}, "click": {"properties": {"fbclid": {"samples": [["972c095bb74b40f0b9037643f4da19..."]], "type": "string"}, "gclid": {"samples": [["Cj0KCQ145f911493a5418c81"]], "type": "string"}, "landing_page_url": {"samples": [["http://www.dunn-rodriguez.com/"]], "type": "string"}, "msclkid": {"samples": [["1df604af63cc4dfb908a4ed85bfbf2..."]], "type": "string"}, "referrer": {"samples": [["http://bridges.biz/"]], "type": "string"}}, "type": "object"}, "created_at": {"samples": ["2025-10-07T21:29:48Z"], "type": "string"}, "creative_id": {"samples": [["creative_88551e0c-e427-4f1f-90..."]], "type": "string"}, "device": {"properties": {"device_type": {"samples": [["desktop"]], "type": "string"}, "ip": {"samples": [["163.224.62.40"]], "type": "string"}, "user_agent": {"samples": [["Opera/9.32.(Windows 95; ku-TR)..."]], "type": "string"}}, "type": "object"}, "event_id": {"samples": [["evt_1a35a2d67694406e"]], "type": "string"}, "geo": {"properties": {"city": {"samples": [["Davidburgh"]], "type": "string"}, "country": {"samples": [["AU"]], "type": "string"}, "region": {"samples": [["AB"]], "type": "string"}}, "type": "object"}, "occurred_at": {"samples": ["2025-10-07T20:41:15Z"], "type": "string"}, "person_id": {"samples": ["person_2fbaa531-d818-4730-ba1b..."], "type": "string"}, "session_id": {"samples": [["sess_d271df81fa654fc3"]], "type": "string"}, "touchpoint_type": {"samples": ["ad_click"], "type": "string"}, "tracking_link_id": {"samples": [["tracking_link_1f4bafa7-4899-4b..."]], "type": "string"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`identities`": {"Flavor": "", "properties": {"advertiser_id": {"samples": ["advertiser_a4f05d8e-b8d6-4675-..."], "type": "string"}, "consent": {"properties": {"ccpa": {"samples": [[false]], "type": "boolean"}, "consent_string": {"samples": [["20e9041e6c424ecb885d2289793ff6..."]], "type": "string"}, "consent_updated_at": {"samples": [["2025-08-02T19:39:54Z"]], "type": "string"}, "gdpr": {"samples": [[false]], "type": "boolean"}}, "type": "object"}, "created_at": {"samples": ["2025-08-24T06:09:45Z"], "type": "string"}, "expires_at": {"samples": [["2025-11-20T04:36:19Z"]], "type": "string"}, "first_seen_at": {"samples": ["2025-08-24T06:09:45Z"], "type": "string"}, "identity_type": {"samples": ["cookie"], "type": "string"}, "identity_value": {"samples": ["1192976633160"], "type": "string"}, "is_primary": {"samples": [false], "type": "boolean"}, "last_seen_at": {"samples": ["2025-09-05T03:18:46Z"], "type": "string"}, "person_id": {"samples": ["person_2daed520-58e7-41ff-ae3f..."], "type": "string"}, "source": {"samples": [["crm"]], "type": "string"}, "updated_at": {"samples": [["2025-10-06T06:09:45Z"]], "type": "string"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`tracking_links`": {"Flavor": "", "properties": {"ad_group_id": {"samples": [["ad_group_71ddff1f-24e5-4e31-a3..."]], "type": "string"}, "advertiser_id": {"samples": ["advertiser_0c752fef-5332-40b3-..."], "type": "string"}, "campaign_id": {"samples": [["campaign_72c1fb44-646b-47c0-91..."]], "type": "string"}, "channel_id": {"samples": [["channel_9f04a2ca-45cd-4c8e-884..."]], "type": "string"}, "created_at": {"samples": ["2025-02-20T18:16:37Z"], "type": "string"}, "creative_id": {"samples": [["creative_02dd0b0a-99e2-4c2a-80..."]], "type": "string"}, "destination_url": {"samples": ["https://meyer.info/movie-stage"], "type": "string"}, "params": {"items": {"properties": {"key": {"samples": ["ref"], "type": "string"}, "value": {"samples": ["cause"], "type": "string"}}, "type": "object"}, "samples": [[{"key": "ref", "value": "above"}]], "type": "array"}, "short_url": {"samples": [["https://ex.am/3ab8a25e"]], "type": "string"}, "updated_at": {"samples": [["2025-04-16T18:16:37Z"]], "type": "string"}, "utm": {"properties": {"campaign": {"samples": [["another-wear-team"]], "type": "string"}, "content": {"samples": [["cta_b"]], "type": "string"}, "medium": {"samples": [["affiliate"]], "type": "string"}, "source": {"samples": [["bing"]], "type": "string"}, "term": {"samples": [["appear"]], "type": "string"}}, "type": "object"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`creatives`": {"Flavor": "", "properties": {"ad_group_id": {"samples": [["ad_group_72481571-e8e4-4c91-84..."]], "type": "string"}, "advertiser_id": {"samples": ["advertiser_0299d530-c60d-43c5-..."], "type": "string"}, "approval": {"properties": {"reject_reason": {"samples": [["disallowed_content"]], "type": "string"}, "review_status": {"samples": ["approved"], "type": "string"}, "reviewed_at": {"samples": [["2025-03-14T14:15:16Z"]], "type": "string"}}, "type": "object"}, "assets": {"items": {"Flavor": "`asset_type` = \"image\"", "properties": {"asset_type": {"samples": ["image"], "type": "string"}, "height": {"samples": [[90]], "type": "number"}, "text": {"samples": [], "type": "string"}, "url": {"samples": [["https://dummyimage.com/300x250"]], "type": "string"}, "width": {"samples": [[300]], "type": "number"}}, "type": "object"}, "samples": [[{"asset_type": "headline", "height": null, "text": "Various community soldier tonight more region difference.", "url": null, "width": null}]], "type": "array"}, "campaign_id": {"samples": [["campaign_0eb3a753-bdf1-493c-87..."]], "type": "string"}, "created_at": {"samples": ["2025-03-10T14:15:16Z"], "type": "string"}, "format": {"samples": ["carousel"], "type": "string"}, "landing_page": {"properties": {"final_url": {"samples": ["https://edwards-lindsey.com/hi..."], "type": "string"}, "tracking_template": {"samples": [["https://edwards-lindsey.com/hi..."]], "type": "string"}, "url": {"samples": [["https://edwards-lindsey.com/hi..."]], "type": "string"}}, "type": "object"}, "name": {"samples": ["Administration Video Creative"], "type": "string"}, "status": {"samples": ["active"], "type": "string"}, "updated_at": {"samples": [["2025-06-19T14:15:16Z"]], "type": "string"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`ad_groups`": {"Flavor": "", "properties": {"advertiser_id": {"samples": ["advertiser_7e25326b-8a75-4b67-..."], "type": "string"}, "bid": {"properties": {"amount": {"samples": [[4.89]], "type": "number"}, "currency": {"samples": [["AUD"]], "type": "string"}, "strategy": {"samples": [["cpc"]], "type": "string"}}, "type": "object"}, "campaign_id": {"samples": ["campaign_197ef716-143d-4c02-87..."], "type": "string"}, "created_at": {"samples": ["2024-12-23T07:37:40Z"], "type": "string"}, "name": {"samples": ["Activity Garden Ad Group"], "type": "string"}, "status": {"samples": ["active"], "type": "string"}, "targeting": {"properties": {"audience_segment_ids": {"items": {"type": "string"}, "samples": [["seg_001"]], "type": "array"}, "devices": {"items": {"type": "string"}, "samples": [["desktop"]], "type": "array"}, "geo": {"items": {"type": "string"}, "samples": [["CA"]], "type": "array"}}, "type": "object"}, "updated_at": {"samples": [["2025-02-01T07:37:40Z"]], "type": "string"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`campaigns`": {"Flavor": "", "properties": {"advertiser_id": {"samples": ["advertiser_0648e9cf-bbfe-4208-..."], "type": "string"}, "budget": {"properties": {"amount": {"samples": [[4149.4]], "type": "number"}, "currency": {"samples": [["AUD"]], "type": "string"}, "period": {"samples": [["daily"]], "type": "string"}}, "type": "object"}, "channel_id": {"samples": ["channel_2f163789-e8a5-4402-9c0..."], "type": "string"}, "created_at": {"samples": ["2024-12-29T14:01:06Z"], "type": "string"}, "end_date": {"samples": [["2025-07-19"]], "type": "string"}, "name": {"samples": ["Monitored optimizing extranet"], "type": "string"}, "objective": {"samples": [["awareness"]], "type": "string"}, "start_date": {"samples": [["2025-06-25"]], "type": "string"}, "status": {"samples": ["active"], "type": "string"}, "tags": {"items": {"type": "string"}, "samples": [["cost"]], "type": "array"}, "updated_at": {"samples": [["2025-02-16T13:17:50Z"]], "type": "string"}, "utm_defaults": {"properties": {"campaign": {"samples": [["always_on"]], "type": "string"}, "content": {"samples": [["carousel"]], "type": "string"}, "medium": {"samples": [["affiliate"]], "type": "string"}, "source": {"samples": [["affiliate"]], "type": "string"}, "term": {"samples": [["finish"]], "type": "string"}}, "type": "object"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`channels`": {"Flavor": "", "properties": {"advertiser_id": {"samples": ["advertiser_65d17b98-5a16-4d87-..."], "type": "string"}, "created_at": {"samples": ["2024-01-24T07:11:14Z"], "type": "string"}, "name": {"samples": ["Display - dv360"], "type": "string"}, "provider": {"samples": [["dv360"]], "type": "string"}, "settings": {"properties": {"api_account_id": {"samples": [["acct_242141"]], "type": "string"}, "cost_import_mode": {"samples": [["api"]], "type": "string"}, "integration_id": {"samples": [["int_35f39b80ca"]], "type": "string"}}, "type": "object"}, "status": {"samples": ["active"], "type": "string"}, "type": {"samples": ["affiliate"], "type": "string"}, "updated_at": {"samples": [["2024-11-13T09:05:59Z"]], "type": "string"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`users`": {"Flavor": "", "properties": {"advertiser_id": {"samples": ["advertiser_00000000-0000-0000-..."], "type": "string"}, "created_at": {"samples": ["2024-08-31T07:30:41Z"], "type": "string"}, "email": {"samples": ["bkelly@example.com"], "type": "string"}, "full_name": {"samples": ["Catherine Martin"], "type": "string"}, "last_login_at": {"samples": [["2025-10-26T11:13:23Z"]], "type": "string"}, "preferences": {"properties": {"default_attribution_model_id": {"samples": [["attr_model_00000000-0000-0000-..."]], "type": "string"}, "locale": {"samples": [["de-DE"]], "type": "string"}, "timezone": {"samples": [["America/Los_Angeles"]], "type": "string"}}, "type": "object"}, "role": {"samples": ["admin"], "type": "string"}, "status": {"samples": ["active"], "type": "string"}, "updated_at": {"samples": [["2024-11-29T07:30:41Z"]], "type": "string"}}, "type": "object"}, "`training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`advertisers`": {"Flavor": "", "properties": {"billing": {"properties": {"billing_address": {"properties": {"city": {"samples": [["Bethton"]], "type": "string"}, "country": {"samples": [["AU"]], "type": "string"}, "line1": {"samples": [["0596 Ortiz Streets Suite 612"]], "type": "string"}, "line2": {"samples": [["Apt. 254"]], "type": "string"}, "postal_code": {"samples": [["05987"]], "type": "string"}, "region": {"samples": [["KS"]], "type": "string"}}, "type": "object"}, "billing_contact_email": {"samples": [["deborahdiaz@mendez.com"]], "type": "string"}, "tax_id": {"samples": [["EU-346669959"]], "type": "string"}}, "type": "object"}, "created_at": {"samples": ["2024-04-09T08:53:26Z"], "type": "string"}, "default_currency": {"samples": ["AUD"], "type": "string"}, "industry": {"samples": [["Education"]], "type": "string"}, "legal_name": {"samples": [["Bartlett-Murphy LLC"]], "type": "string"}, "name": {"samples": ["Adams-Huff"], "type": "string"}, "status": {"samples": ["active"], "type": "string"}, "timezone": {"samples": ["America/Los_Angeles"], "type": "string"}, "updated_at": {"samples": [["2024-07-14T23:25:32Z"]], "type": "string"}}, "type": "object"}}

Natural Language Query:
For each advertiser, identify the top 5 destination URLs (from tracking links) that drove the most attributed revenue in December 2025, and show which campaigns and creatives were used to send traffic there. Add the corresponding spend from costs for those campaigns/channels in the same period, and include cases where the destination URL has attributed revenue but no direct spend records to investigate data gaps.

SQL++ Query:
<|im_end|>
<|im_start|>assistant
<think>
1. Understand the question
- Core intent: ranking plus aggregation.
- We need, for each advertiser, the top 5 destination URLs that generated the highest attributed revenue during December 2025.
- Entities involved: advertisers, attribution results, credit rows inside attribution results, touchpoints, tracking links, campaigns, creatives, and costs.
- Temporal references:
  - attributed revenue should be based on credit occurrences in December 2025
  - spend should be based on cost records dated in December 2025
- Quantifiers:
  - “top 5” implies per-advertiser ranking
  - “for each advertiser” implies partitioned results by advertiser
- Implicit requirements:
  - destination URL comes from tracking links, not directly from attribution results
  - campaigns and creatives used to send traffic there should be collected, likely as arrays because multiple campaign/creative IDs can contribute to the same URL
  - include URLs that have revenue even if there is no spend, so spend lookup must not eliminate such rows

2. Identify relevant collections
- Primary source for attributed revenue: `training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`attribution_results`
- Secondary collections:
  - `training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`touchpoints` to connect credited touchpoint IDs to tracking links
  - `training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`tracking_links` to get `destination_url`
  - `training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`costs` to calculate spend for the same period
- Metadata access is required because the credited touchpoint ID and tracking link ID are document IDs, so joins should use `META(alias).id`

3. Map fields to collections
- From attribution results:
  - `advertiser_id`
  - `credits` array
- From each credit object in `credits`:
  - `touchpoint_id`
  - `campaign_id`
  - `creative_id`
  - `channel_id`
  - `credited_revenue`
  - `occurred_at`
- From touchpoints:
  - document ID via `META(tp).id`
  - `tracking_link_id`
- From tracking links:
  - document ID via `META(tl).id`
  - `destination_url`
- From costs:
  - `advertiser_id`
  - `campaign_id`
  - `channel_id`
  - `date`
  - `spend`
  - `fees.agency_fees`
  - `fees.platform_fees`

4. Analyze arrays and nesting
- `attribution_results.credits` is an array of objects, so it must be expanded with `UNNEST`
- No other array needs to be unnested for the main logic
- Campaign and creative outputs should be accumulated as arrays, so `ARRAY_AGG(DISTINCT ...)` is appropriate
- The spend lookup can use campaign membership against the aggregated campaign array

5. Decide join / nest strategy
- Start from attribution results and `UNNEST` credits because revenue lives at the credit-row level
- Join credit rows to touchpoints using credited `touchpoint_id` against `META(tp).id`
- Join touchpoints to tracking links using `tp.tracking_link_id` against `META(tl).id`
- For spend, an outer-style preservation is needed conceptually because URLs with revenue but no spend must remain. A good way is to compute spend separately and use a scalar correlated subquery in projection, which naturally returns null when no matching spend exists instead of filtering the row out
- CTEs are useful because the logic has clear stages:
  - December credit extraction
  - revenue aggregation by URL
  - ranking per advertiser
  - spend aggregation by campaign/channel

6. Determine filters and aggregations
- December 2025 credit filter:
  - filter credit `occurred_at` from `2025-12-01T00:00:00Z` inclusive to `2026-01-01T00:00:00Z` exclusive
  - using the credit timestamp is correct because the question asks what drove attributed revenue in December
- Revenue aggregation:
  - group by advertiser and destination URL
  - sum `credited_revenue`, defaulting missing/null revenue to 0 with `IFMISSINGORNULL`
  - collect distinct campaign IDs and creative IDs contributing to that URL
- Ranking:
  - use `ROW_NUMBER()` partitioned by advertiser and ordered by attributed revenue descending
  - filter to top 5 per advertiser
- Spend aggregation:
  - group costs by advertiser, campaign, and channel for December 2025 dates
  - sum spend plus agency and platform fees, defaulting missing fees to 0
- Spend matching back to ranked URLs:
  - use correlated subquery over aggregated spend data
  - match advertiser and campaigns associated with the URL
  - sum the spend rows for those related campaigns
- Important nuance:
  - although spend was aggregated by campaign and channel, the final spend lookup can sum over matching campaigns for the advertiser
  - this preserves URLs without spend because absence of matches yields null rather than dropping the revenue row

7. Decide output shape
- Return only fields requested or implied:
  - advertiser_id
  - destination_url
  - attributed_revenue
  - campaign_ids
  - creative_ids
  - spend_with_fees_for_related_campaigns
- Use meaningful aliases for computed values
- Sort final results by advertiser and attributed revenue descending so each advertiser’s top URLs appear in ranked order
- No DISTINCT needed in the outer select because grouping/ranking already define one row per advertiser+URL result

8. Validate SQL++ constraints
- After aliasing, every field reference should use the correct alias (`ar`, `cr`, `tp`, `tl`, etc.)
- `META(tp).id` and `META(tl).id` must be used because aliases are defined and joins are against document IDs
- Grouping rules:
  - revenue-by-URL stage groups by advertiser and destination URL, and all other selected fields there are aggregates
  - spend aggregation groups by advertiser, campaign, channel, and selected non-aggregates match that grouping
- `UNNEST` should reference the parent alias correctly: the credits array belongs to `ar`
- String literals for dates and timestamps should use double quotes
- `date` in costs is a field name that may need identifier escaping when referenced
- Window function ranking should be computed after revenue aggregation, not before

9. Overall construction plan
- First derive December credit-level records from attribution results by unnesting credits and keeping advertiser, touchpoint, campaign, creative, channel, and revenue
- Next join those rows to touchpoints and tracking links to obtain destination URLs, then aggregate revenue and collect campaign/creative IDs per advertiser+URL
- Then rank URLs within each advertiser by attributed revenue and keep the top 5
- Separately aggregate December costs into spend-with-fees by advertiser+campaign+channel
- Finally project the ranked URL rows and attach spend via a correlated sum over the spend CTE, matching advertiser and related campaigns so rows with missing spend are still included
</think>

```sql++
WITH dec_credits AS ( SELECT ar.advertiser_id, cr.touchpoint_id, cr.campaign_id, cr.creative_id, cr.channel_id, IFMISSINGORNULL(cr.credited_revenue,0) AS credited_revenue FROM `training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`attribution_results` AS ar UNNEST ar.credits AS cr WHERE cr.occurred_at >= "2025-12-01T00:00:00Z" AND cr.occurred_at < "2026-01-01T00:00:00Z" ), rev_by_url AS ( SELECT dc.advertiser_id, tl.destination_url, SUM(dc.credited_revenue) AS attributed_revenue, ARRAY_AGG(DISTINCT dc.campaign_id) AS campaign_ids, ARRAY_AGG(DISTINCT dc.creative_id) AS creative_ids FROM dec_credits AS dc INNER JOIN `training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`touchpoints` AS tp ON META(tp).id = dc.touchpoint_id INNER JOIN `training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`tracking_links` AS tl ON META(tl).id = tp.tracking_link_id WHERE tl.destination_url IS NOT NULL GROUP BY dc.advertiser_id, tl.destination_url ), ranked AS ( SELECT r.*, ROW_NUMBER() OVER (PARTITION BY r.advertiser_id ORDER BY r.attributed_revenue DESC) AS rn FROM rev_by_url AS r ), spend_by_campaign_channel AS ( SELECT c.advertiser_id, c.campaign_id, c.channel_id, SUM(c.spend + IFMISSINGORNULL(c.fees.agency_fees,0) + IFMISSINGORNULL(c.fees.platform_fees,0)) AS spend_with_fees FROM `training_bucket`.`Advertising_Attribution_and_ROI_Analytics_scope`.`costs` AS c WHERE c.`date` BETWEEN "2025-12-01" AND "2025-12-31" GROUP BY c.advertiser_id, c.campaign_id, c.channel_id ) SELECT rk.advertiser_id, rk.destination_url, rk.attributed_revenue, rk.campaign_ids, rk.creative_ids, (SELECT RAW SUM(s.spend_with_fees) FROM spend_by_campaign_channel AS s WHERE s.advertiser_id = rk.advertiser_id AND s.campaign_id IN rk.campaign_ids)[0] AS spend_with_fees_for_related_campaigns FROM ranked AS rk WHERE rk.rn <= 5 ORDER BY rk.advertiser_id, rk.attributed_revenue DESC

<|im_end|>


Downloads last month
87
Safetensors
Model size
8B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for jastorj/snowflake_arctic_text2sql_r1_7b-nl2sqlpp-16bit-v5.6-cw-17K

Base model

Qwen/Qwen2.5-7B
Adapter
(7)
this model