| { | |
| "task_id": "hard_05", | |
| "version": "1.0.0", | |
| "created_at": "2026-03-11", | |
| "metadata": { | |
| "domain": "credit_card_optimization", | |
| "difficulty": "hard", | |
| "task_number": 5, | |
| "complexity_hint": { | |
| "max_tokens": 8000, | |
| "expected_output": "complex strategy with contingency plans, stochastic elements, detailed EV" | |
| }, | |
| "requires_human_review": true | |
| }, | |
| "prompt": { | |
| "system": "", | |
| "user": "You are a financial optimization agent tasked with designing the optimal 36-month credit card strategy for a household.\n\nYour goal is to maximize the household\u2019s net expected financial value over the next 3 years.\n\nNet value includes:\n\n* signup bonuses \n* ongoing rewards \n* travel redemption value \n* statement credits that will realistically be used\n\nNet value must subtract:\n\n* annual fees \n* opportunity costs of locking out future bonuses \n* interest if spend requirements exceed realistic spending capacity\n\n**Household Profile:**\n\nPerson A\n\n* Age 38 \n* Credit score: 780 \n* Income: $450k \n* Currently holds 7 cards\n\nPerson B\n\n* Age 36 \n* Credit score: 720 \n* Income: $120k \n* Currently holds 4 cards\n\n**Combined monthly spend:**\n\n| category | monthly |\n| ----- | ----- |\n| airfare / hotels | $4,200 |\n| dining | $1,600 |\n| groceries | $1,200 |\n| rideshare | $600 |\n| business ads / SaaS | $2,500 |\n| utilities | $900 |\n| other | $2,500 |\n\nTotal monthly spend: $13,500\n\n**Current Cards:**\n\nPerson A\n\n* Amex Platinum (3 years old) \n* Amex Gold (2 years old) \n* Chase Sapphire Preferred (45 months old) \n* Capital One Venture X (1 year old) \n* Chase Ink Preferred (7mo old) \n* Citi Double Cash (4 mo old) \n* Marriott Bonvoy Boundless (18mo old)\n\nPerson B\n\n* Chase Freedom Unlimited (10 years old) \n* Amex Blue Business Plus (8 years old) \n* Capital One Venture (6 years old) \n* Citi Premier (2.5 years old)\n\n**Constraints**\n\n* Household may have no more than 10 cards total at any time. \n* Household annual fee budget must remain below $1,200 per year. \n* Household wants to minimize complexity: \n* No more than 3 active reward ecosystems. \n* Approval probability must remain above 55% for every new card application. \n* No more than 3 card applications within any rolling 90-day window. \n* All spend used to meet signup bonuses must remain feasible given actual household spending. \n* Household prefers transferable point ecosystems, but cashback is allowed if EV is at least 10% higher.\n\n**Travel Goals**\n\nEvery year for the next 3 years:\n\n* 2 business class flights NYC \u2192 Tokyo (ANA flight) \n* 2 economy flights NYC \u2192 Paris (KLM flight) \n* several domestic flights (agnostic toward American/Delta/United)\n\nFlights should be booked using points whenever EV is higher than paying cash.\n\nKeep in mind the dataset contains:\n\n* airline alliances \n* transfer partners \n* redemption price distributions \n* transfer ratios\n\n**Offer Uncertainty**\n\nHistorical data shows some cards periodically have elevated signup bonuses.\n\nExample distributions:\n\n| Card | Elevated bonus probability |\n| ----- | ----- |\n| Amex Platinum | 35% chance of 150k annually |\n| Chase Sapphire Reserve | 20% chance of 80k |\n| Venture X | 15% chance of 100k |\n\nYou may choose to delay card applications if expected value improves.\n\n**Credit Score Dynamics**\n\nOpening a card temporarily:\n\n* reduces credit score by 5 points immediately, but recovers in 3 months, and in 6 months it increases score by 5 points because of decreased utilization from more access to credit\n\nClosing a card may:\n\n* reduce available credit \n* affect approval probability\n\n**Manufactured Spend Opportunity**\n\nPerson A has access to up to $8k/mo of manufactured spend, however some issuers flag high MS \n(over $3k/mo). Surpassing that risks issuer shutdown.\n\n**Task**\n\nProduce the optimal 36-month strategy including:\n\n* which cards each person should open and when \n* which cards should be downgraded or cancelled \n* how spending should be allocated across cards \n* which reward ecosystems should be prioritized \n* which travel redemptions should be used \n* whether any card applications should be delayed to wait for elevated bonuses\n\nYour strategy must include:\n\n* a month-by-month timeline and spending plan by card \n* expected reward value for each decision \n* final expected net value after 36 months \n* a backup suggestion for each application in case it gets rejected", | |
| "knowledge_base_ref": "knowledge_base.md", | |
| "kb_filter": [ | |
| "Chase Ink Business Unlimited", | |
| "Chase Ink Business Preferred", | |
| "Chase Ink Business Premier", | |
| "Capital One Venture X", | |
| "Capital One Venture X Business", | |
| "Amex Business Gold", | |
| "Amex Business Platinum", | |
| "Capital One Business Spark 2X Cash", | |
| "Citi Strata", | |
| "Citi Strata Premier", | |
| "BofA Premium Rewards", | |
| "BofA Premium Rewards Elite", | |
| "Chase World of Hyatt", | |
| "Chase United Explorer", | |
| "Chase United Gateway", | |
| "Atmos Rewards Summit", | |
| "Amex Delta SkyMiles Gold", | |
| "Amex Delta SkyMiles Platinum", | |
| "Amex Delta SkyMiles Reserve", | |
| "Amex Green", | |
| "American Express Platinum", | |
| "American Express Gold", | |
| "Chase Sapphire Preferred", | |
| "Chase Freedom Unlimited", | |
| "Amex Blue Business Plus", | |
| "Capital One Venture", | |
| "Citi Double Cash", | |
| "Chase Marriott Bonvoy Boundless", | |
| "Chase Sapphire Reserve", | |
| "Chase Business Sapphire Reserve", | |
| "Capital One Business Spark 2X Miles", | |
| "Amex Blue Business Cash" | |
| ], | |
| "system_prompt_ref": "system_prompt_template.md" | |
| }, | |
| "scoring": { | |
| "dimensions": { | |
| "constraint_compliance": { | |
| "weight": 0.3, | |
| "type": "automated", | |
| "description": "Hard rule checks: velocity limits, eligibility, user constraints", | |
| "checks": { | |
| "velocity_rules": null, | |
| "eligibility_rules": null, | |
| "user_constraints": null, | |
| "expected_cards": [ | |
| "Chase Ink Business Unlimited", | |
| "Chase Ink Business Preferred", | |
| "Chase Ink Business Premier", | |
| "Capital One Venture X", | |
| "Capital One Venture X Business", | |
| "Amex Business Gold", | |
| "Amex Business Platinum", | |
| "Capital One Business Spark 2X Cash", | |
| "Citi Strata", | |
| "Citi Strata Premier", | |
| "BofA Premium Rewards", | |
| "BofA Premium Rewards Elite", | |
| "Chase World of Hyatt", | |
| "Chase United Explorer", | |
| "Chase United Gateway", | |
| "Atmos Rewards Summit", | |
| "Amex Delta SkyMiles Gold", | |
| "Amex Delta SkyMiles Platinum", | |
| "Amex Delta SkyMiles Reserve", | |
| "Amex Green", | |
| "American Express Platinum", | |
| "American Express Gold" | |
| ], | |
| "expected_housing_option": null, | |
| "key_constraints_flags": [ | |
| "dual_person_household", | |
| "10_card_limit", | |
| "1200_annual_fee_budget", | |
| "3_ecosystems_max", | |
| "90_day_velocity", | |
| "manufactured_spend", | |
| "elevated_offer_probability", | |
| "travel_redemption_goals", | |
| "36_month_horizon" | |
| ] | |
| }, | |
| "hard_constraint": false | |
| }, | |
| "ev_accuracy": { | |
| "weight": 0.4, | |
| "type": "automated", | |
| "description": "EV calculation accuracy vs. reference solution", | |
| "reference": { | |
| "reference_ev_usd": 40693.0, | |
| "ev_tolerance_pct": 0.05 | |
| } | |
| }, | |
| "reasoning_quality": { | |
| "weight": 0.2, | |
| "type": "human", | |
| "description": "Quality of tradeoff articulation and strategic reasoning (0-3 scale)", | |
| "rubric": { | |
| "0": "No reasoning or incorrect reasoning", | |
| "1": "Surface-level reasoning, misses key tradeoffs", | |
| "2": "Correct tradeoffs identified with clear justification", | |
| "3": "Expert-level nuance including edge cases and constraint interactions" | |
| }, | |
| "score": null | |
| }, | |
| "constraint_prioritization": { | |
| "weight": 0.1, | |
| "type": "human", | |
| "description": "Correct handling of ambiguity and conflicting constraints", | |
| "score": null | |
| } | |
| }, | |
| "passing_threshold": 0.6, | |
| "hard_constraint_failure_zeroes_dimension": true | |
| }, | |
| "reference_solution": { | |
| "_status": "EXPERT_REVIEWED", | |
| "recommended_cards": [ | |
| "Chase Ink Business Unlimited", | |
| "Chase Ink Business Preferred", | |
| "Chase Ink Business Premier", | |
| "Capital One Venture X", | |
| "Capital One Venture X Business", | |
| "Amex Business Gold", | |
| "Amex Business Platinum", | |
| "Capital One Business Spark 2X Cash", | |
| "Citi Strata", | |
| "Citi Strata Premier", | |
| "BofA Premium Rewards", | |
| "BofA Premium Rewards Elite", | |
| "Chase World of Hyatt", | |
| "Chase United Explorer", | |
| "Chase United Gateway", | |
| "Atmos Rewards Summit", | |
| "Amex Delta SkyMiles Gold", | |
| "Amex Delta SkyMiles Platinum", | |
| "Amex Delta SkyMiles Reserve", | |
| "Amex Green", | |
| "American Express Platinum", | |
| "American Express Gold" | |
| ], | |
| "total_ev_usd": 40693.0, | |
| "ev_breakdown": { | |
| "signup_bonuses_usd": 25003.0, | |
| "ongoing_rewards_usd": 20794.0, | |
| "credits_usd": 0.0, | |
| "annual_fees_usd": -5104.0, | |
| "other_usd": 0.0 | |
| }, | |
| "housing_option": null, | |
| "key_constraints_flags": [ | |
| "dual_person_household", | |
| "10_card_limit", | |
| "1200_annual_fee_budget", | |
| "3_ecosystems_max", | |
| "90_day_velocity", | |
| "manufactured_spend", | |
| "elevated_offer_probability", | |
| "travel_redemption_goals", | |
| "36_month_horizon" | |
| ], | |
| "expert_notes": "36-month dual-person strategy. Person A (business-eligible, MS access): $17,331 EV. Person B: $23,362 EV. Combined: $40,693. Each person opens 1 card per 90 days (12 cards each over 36 months). Person A: Chase business cards first, then Capital One/Amex business, then Citi/BofA personal. Person B: Chase personal first (Hyatt/United), then airline/bank cards, then Amex personal. Cancel all cards at 1 year to avoid the second annual fee. Non-bonus spend goes on Venture X. Person A's MS of $8k/mo adds $288k of spend over 36 months. Keep MS under $3k/mo per issuer to avoid shutdown flags. Elevated bonus probabilities are factored into the EVs for Person B's Amex Platinum (104.5k expected) and Venture X (78.75k expected)." | |
| } | |
| } | |
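
The reference figures above can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only, not the benchmark's actual scorer: the helper names `within_tolerance` and `expected_bonus` are hypothetical, `ev_tolerance_pct` is assumed to be a fraction (0.05 read as plus or minus 5%), and the base bonuses of 80k and 75k are assumptions back-solved from the expected values quoted in `expert_notes`; they do not appear in the file.

```python
import math

# Values copied from the task file.
ev_breakdown = {
    "signup_bonuses_usd": 25003.0,
    "ongoing_rewards_usd": 20794.0,
    "credits_usd": 0.0,
    "annual_fees_usd": -5104.0,
    "other_usd": 0.0,
}
reference_ev_usd = 40693.0
ev_tolerance_pct = 0.05
dimension_weights = {
    "constraint_compliance": 0.3,
    "ev_accuracy": 0.4,
    "reasoning_quality": 0.2,
    "constraint_prioritization": 0.1,
}
per_person_ev = {"A": 17331.0, "B": 23362.0}  # from expert_notes


def within_tolerance(candidate_ev, reference=reference_ev_usd,
                     tol=ev_tolerance_pct):
    """Hypothetical check. ASSUMPTION: tol is a fraction of the reference."""
    return abs(candidate_ev - reference) <= tol * abs(reference)


def expected_bonus(p_elevated, elevated, base):
    """Expected bonus when the elevated offer appears with probability p."""
    return p_elevated * elevated + (1.0 - p_elevated) * base


# The breakdown and the per-person split should both sum to the reference EV.
breakdown_total = sum(ev_breakdown.values())
person_total = sum(per_person_ev.values())

# The four scoring weights should sum to 1.
weight_total = sum(dimension_weights.values())

# ASSUMPTION: base offers of 80k (Amex Platinum) and 75k (Venture X),
# inferred from the expected values in expert_notes, not stated in the file.
amex_plat_expected = expected_bonus(0.35, 150_000, 80_000)
venture_x_expected = expected_bonus(0.15, 100_000, 75_000)

# Person A's manufactured spend over the horizon, as quoted in expert_notes.
ms_total = 8_000 * 36

print(breakdown_total, person_total, weight_total)
print(amex_plat_expected, venture_x_expected, ms_total)
```

Under these assumptions the breakdown and the per-person split both reproduce `total_ev_usd`, and the back-solved base offers reproduce the 104.5k and 78.75k figures. The checks cover only internal arithmetic; they say nothing about whether the plan satisfies the card-count or annual-fee constraints in the prompt.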