Case Study Analysis: City-Level Precision for Global AI Optimization — From Hypothesis to Measured ROI

Short summary: A global advertiser moved from country-level AI experimentation to city-level targeting and validation. The result: a measurable uplift in incremental conversions, clearer ROI attribution, and a rethought experimentation design that validated — and then iteratively improved — the AI optimization theory. This case study walks through background, challenge, approach, implementation, results, lessons, and step-by-step application guidance.

1. Background and context

Company: Global consumer subscription brand (finance/streaming hybrid). Scope: 45 markets across EMEA, APAC, and Americas. Core marketing channels: paid search, programmatic display, and platform-owned discovery feeds. Data foundation: server-side event tracking, CRM revenue matching, and a daily ingestion into a single customer data warehouse. Prior experimentation: platform A/B testing at the country level, model-based budget allocation from an ML vendor. Objective: increase incremental subscriptions while preserving CAC targets and expanding profitable reach.

Key assumptions the team started with:

    - Country-level granularity is sufficient for the AI optimization engine.
    - Lift measured by platform-reported conversions approximates true incremental revenue.
    - Model recommendations are transportable across cities within a country.

2. The challenge faced

The vendor's AI optimization engine suggested reallocating media spend across country buckets based on predicted ROI. Early rollouts showed uneven results: some countries improved CPA, others degraded. The team (https://faii.ai/content-action-engine/) suspected that aggregating to the country level masked important local variation. Two practical problems emerged:

- Attribution confusion: platform conversions rose in some cases, but CRM-attributed revenue did not match, suggesting non-incremental conversions or poor deduplication across channels.
- Optimization theory risk: the AI's objective assumed homogeneous response within countries, which led to over-optimization on a few high-volume cities while starving lower-volume but high-ROI micro-markets.

Two hypotheses were formed:

    - Hypothesis A: City-level heterogeneity (behavior, supply, pricing, competition) significantly affects marginal conversion rates; country-level optimization misallocates budget.
    - Hypothesis B: Platform-reported uplift is inflated by attribution leakage; proper incremental measurement via holdouts will show a different ROI.

3. Approach taken

Frameworks used:

    - ROI framework: incremental revenue per dollar spent (net incremental revenue / media spend) and payback period on CAC changes.
    - Attribution models: parallel use of multi-touch attribution for funnel insights and an experimental incremental approach (randomized geo holdouts at the city level plus synthetic-control modeling for markets where randomization was impractical).
    - Validation protocol: "test, validate, generalize". Run localized experiments, validate with external outcome data (CRM revenue and retention), then retrain allocation policies incorporating city-level priors.
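To make the ROI arithmetic concrete, here is a minimal sketch of the two quantities named above. The example figures are illustrative, and the payback definition (months of subscription revenue needed to recover CAC) is an assumption rather than the team's exact formula.

```python
# Minimal sketch of the ROI framework (illustrative only).
# Assumption: "net incremental revenue" comes from a holdout comparison, and
# payback is counted in months of average revenue per incremental subscriber.

def incremental_roi(incremental_revenue: float, media_spend: float) -> float:
    """Net incremental revenue generated per dollar of media spend."""
    return incremental_revenue / media_spend

def cac_payback_months(cac: float, monthly_revenue_per_sub: float) -> float:
    """Months of subscription revenue needed to pay back the acquisition cost."""
    return cac / monthly_revenue_per_sub

# Example: $220k incremental revenue on $100k spend -> $2.20 per $1,
# and a $48 CAC paid back by $12/month subscribers in 4 months.
print(incremental_roi(220_000, 100_000))   # 2.2
print(cac_payback_months(48, 12))          # 4.0
```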

Advanced techniques selected:

    - Hierarchical Bayesian uplift modeling to share statistical strength across cities while allowing city-specific effects.
    - Uplift (heterogeneous treatment effect) models to predict the incremental impact of impressions/ads per city cohort.
    - Synthetic control for cities where pure randomization could not be implemented because of operational constraints.
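As a rough illustration of the hierarchical Bayesian uplift idea (not the team's actual model), the sketch below partially pools per-city treatment effects toward a shared mean using PyMC. The dataframe and its column names (`city_idx`, `treated`, `exposures`, `conversions`) are hypothetical.

```python
# Hierarchical Bayesian uplift sketch: city effects shrink toward a pooled mean.
# Assumes pymc >= 5 and a pandas dataframe with the hypothetical columns below.
import pymc as pm

def fit_city_uplift(df, n_cities: int):
    city_idx = df["city_idx"].to_numpy()
    treated = df["treated"].to_numpy()

    with pm.Model():
        # Pooled baseline and treatment effect on the log-odds scale
        mu_base = pm.Normal("mu_base", mu=-4.0, sigma=1.0)
        mu_lift = pm.Normal("mu_lift", mu=0.0, sigma=0.5)
        sigma_base = pm.HalfNormal("sigma_base", sigma=1.0)
        sigma_lift = pm.HalfNormal("sigma_lift", sigma=0.5)

        # City-specific parameters, partially pooled toward the shared means
        base_c = pm.Normal("base_c", mu=mu_base, sigma=sigma_base, shape=n_cities)
        lift_c = pm.Normal("lift_c", mu=mu_lift, sigma=sigma_lift, shape=n_cities)

        # Conversion probability depends on whether the city was treated
        logit_p = base_c[city_idx] + lift_c[city_idx] * treated
        pm.Binomial("obs", n=df["exposures"].to_numpy(),
                    p=pm.math.sigmoid(logit_p),
                    observed=df["conversions"].to_numpy())

        idata = pm.sample(1000, tune=1000, target_accept=0.9)
    return idata  # the posterior of lift_c gives per-city incremental effects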

4. Implementation process

Step-by-step execution:

1. Data partitioning and tagging: every impression and click was tagged with city-level geo plus an experiment cell, and server-side deduplication matched CRM subscriptions back to city-level exposures.
2. Baseline model: a hierarchical Bayesian model was trained on the prior 180 days of data to estimate city-level baseline conversion rate, seasonality, and price sensitivity. This established priors rather than cold starts.
3. Experiment design: 120 cities were chosen across regions (a mix of large metros, secondary cities, and smaller towns). In each country, cities were randomly assigned to treatment or control bands for a 6-week test (a minimal assignment sketch follows this list). Treatment = AI-optimized reallocation (higher bids, different creatives, or increased frequency); control = historical budget allocation held constant.
4. Instrumenting incremental measurement: a blend of direct randomized uplift (city holdouts) plus synthetic control for non-randomized regions. KPI: incremental paid subscriptions attributable to treatment during the test window plus 30-day retention revenue.
5. Parallel analytic checks: platform-reported conversions, raw CRM matches, and last-click dedupe were cross-checked, and diagnostic dashboards surfaced discrepancies immediately.
6. Model retraining: the uplift model was updated weekly with fresh experiment data; hierarchical priors shrank city estimates toward regional means when sample sizes were small.
7. Policy rollout: once uplift per city stabilized (posterior probability of positive uplift > 90% and ROI above target), the policy moved from test to production and the allocation engine received updated bids and budgets at city granularity.
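Here is a minimal sketch of that blocked assignment step, using only the standard library; the country codes, city lists, and seed are hypothetical.

```python
# Blocked city randomization sketch (illustrative): cities are shuffled within
# each country block and split so every country contributes to both cells.
import random

def assign_cells(cities_by_country: dict[str, list[str]], seed: int = 42) -> dict[str, str]:
    rng = random.Random(seed)
    assignment = {}
    for country, cities in cities_by_country.items():
        shuffled = cities[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for city in shuffled[:half]:
            assignment[city] = "treatment"
        for city in shuffled[half:]:
            assignment[city] = "control"
    return assignment

# Example usage with hypothetical cities
cells = assign_cells({"PT": ["Lisbon", "Porto"], "ID": ["Jakarta", "Surabaya", "Bandung"]})
```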

Practical implementation notes:

    - Server-side tagging eliminated client-side duplicate conversions and ad-blocker noise, reducing false positives by roughly 12% (diagnostic check).
    - To avoid cannibalization across channels, test cells controlled budgets across paid search and display simultaneously in each city.
    - For low-sample cities, the Bayesian hierarchy prevented overfitting and extreme bid swings.
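The dedupe-and-match step might look something like this pandas sketch; the tables and column names (`event_id`, `user_id`, `subscription_id`, etc.) are hypothetical, not the team's schema.

```python
# Sketch of the server-side dedupe + CRM match step (illustrative).
import pandas as pd

def dedupe_and_match(events: pd.DataFrame, crm: pd.DataFrame) -> pd.DataFrame:
    # Keep one record per event_id (drops client-side duplicate fires)
    clean = events.sort_values("ts").drop_duplicates(subset="event_id", keep="first")
    # Join CRM subscriptions back to the city of the user's exposures
    matched = clean.merge(crm, on="user_id", how="inner")
    # One row per subscription so revenue is not double-counted across exposures
    return matched.drop_duplicates(subset="subscription_id", keep="first")
```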

5. Results and metrics

High-level outcomes after 12 weeks:

    - Incremental subscriptions attributable to the AI city-level policy: +17.6% vs. the country-level baseline allocation.
    - Average platform-reported CPA dropped 9% in treated cities, but CRM-validated (incremental) CPA dropped only 5%, indicating that part of the platform-reported lift was non-incremental.
    - Net incremental revenue per dollar: $2.20 of incremental revenue per $1 spent in treated cities vs. $1.55 per $1 in country-optimized controls.
    - Top-line revenue increase (quarter-on-quarter) attributable to the program: +4.9% across all markets where the city-level policy was rolled out.

Example city-level results (selected cities):

| City | Incremental Subs (%) | CPA Change (CRM-validated) | Revenue per $1 |
|---|---|---|---|
| London | +12.4% | -7% | $2.05 |
| Lisbon | +28.3% | -18% | $3.10 |
| São Paulo | +6.8% | -1% | $1.45 |
| Jakarta | +33.9% | -22% | $3.90 |

Attribution model comparison (summary):


| Model | Incremental Subs | Bias Risk |
|---|---|---|
| Platform last-click | +26% (overestimate) | High (double-counting, cross-device) |
| Multi-touch heuristic | +18% | Moderate |
| Randomized geo holdout + CRM match | +17.6% | Low (experimental) |
| Synthetic control for constrained markets | +16–19% | Moderate-low (model dependency) |

Key observation: city-level uplift estimates were consistent across experimental and synthetic-control methods within confidence intervals, whereas platform-reported figures diverged frequently.


6. Lessons learned

Lesson 1 — Global coverage needs city-level precision.

    - Analogy: country-level targeting is like watering a garden with one sprinkler for the whole yard. You'll overwater the lawn and underwater the flower beds. City-level targeting is turning the hose to each bed as needed.
    - The data show heterogeneous elasticities: some cities returned 2–3x the revenue for the same marginal spend compared with other cities in the same country.

Lesson 2 — Validate AI optimization theory with randomized experiments.

    - AI models generate counter-intuitive allocations; treat their output as hypotheses, not final truth.
    - Randomized geo holdouts are the simplest clean test for incremental effect; where randomization isn't possible, use synthetic control with rigorous pre-treatment fit checks (a minimal sketch follows below).
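For the synthetic-control path, here is a minimal sketch of the weight fit plus a pre-treatment fit check. It uses the standard constrained least-squares formulation with numpy and scipy; the inputs and the acceptance rule are assumptions, not the team's exact implementation.

```python
# Minimal synthetic-control sketch with a pre-treatment fit check (illustrative).
# treated_pre: treated city's pre-treatment KPI series (shape T)
# donors_pre: candidate control cities' series (shape T x J)
import numpy as np
from scipy.optimize import minimize

def fit_synthetic_control(treated_pre: np.ndarray, donors_pre: np.ndarray):
    n_donors = donors_pre.shape[1]
    x0 = np.full(n_donors, 1.0 / n_donors)

    def loss(w):
        return np.sum((treated_pre - donors_pre @ w) ** 2)

    # Weights are non-negative and sum to 1 (standard synthetic-control constraints)
    res = minimize(loss, x0, method="SLSQP",
                   bounds=[(0.0, 1.0)] * n_donors,
                   constraints=[{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}])
    weights = res.x
    rmse = np.sqrt(loss(weights) / len(treated_pre))  # pre-treatment fit check
    return weights, rmse  # reject the donor pool if rmse is large vs. the KPI scale
```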

Lesson 3 — Use hierarchical and uplift models to balance learning speed and stability.

    - Hierarchical Bayesian models prevented noisy city-level swings while capturing local idiosyncrasies.
    - Uplift models provided directional granularity for creative and bid changes, not just budget allocation.

Lesson 4 — Attribution: measure incrementality, don't trust platform-reported conversion counts alone.

    - Platform metrics are useful for operational debugging but are biased for ROI decisions.
    - Combine experimental measurement with multi-touch attribution and MMM for channel-level reconciliation.

Lesson 5 — Operational readiness matters: tagging, server-side dedupe, and experiment mapping are prerequisites.

7. How to apply these lessons

Practical checklist and execution blueprint for marketing teams ready to move from country-level to city-level AI experimentation:

Data readiness
    - Ensure server-side event tracking and CRM join keys exist at the user or household level.
    - Implement consistent city-level geo enrichment (lat/lon cluster to city-polygon mapping).
Experiment design
    - Randomize at the city level where possible; use blocks per country to control balance (metro vs. non-metro).
    - Set test windows long enough to capture funnel lag (6–8 weeks plus a 30-day revenue lookback).
Modeling approach
    - Start with hierarchical Bayesian models to estimate priors for cities with low sample size.
    - Train uplift models to estimate incremental impact and segment responses by user cohort and time of week.
Attribution & ROI calculation
    - Primary KPI: incremental revenue per dollar (experimental uplift). Secondary KPIs: CRM-validated CPA and retention-adjusted LTV uplift.
    - Use hybrid attribution: experimental measurement as the gold standard, with multi-touch and MMM for operational allocation and channel strategy.
Operationalizing the policy
    - Translate uplift predictions into bid multipliers and creative rotations at the city level.
    - Cap adjustments to prevent budget shock (e.g., a +/- 25% per-week ramp).
    - Automate safe rollouts: move cities to production when posterior uplift probability exceeds 90% and the ROI threshold is met.
Monitoring and guardrails
    - Dashboard daily: city-level spend, incremental subs, CRM-validated CPA, and bid-change history. Include anomaly detection for abrupt shifts (a minimal sketch follows this checklist).
    - Re-run experiments seasonally or after major product changes; product-market fit at the city level can drift quickly (competitor promos, local events).
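As a starting point for that anomaly-detection guardrail, here is a minimal sketch using a trailing z-score on daily CRM-validated CPA; the window and threshold are illustrative assumptions, not the values used in the case study.

```python
# Minimal anomaly-detection sketch for the monitoring guardrail (illustrative).
# Flags a day as anomalous when CPA deviates from the trailing mean by more
# than z_threshold standard deviations.
import numpy as np

def flag_cpa_anomalies(daily_cpa: np.ndarray, window: int = 14, z_threshold: float = 3.0):
    flags = []
    for t in range(window, len(daily_cpa)):
        history = daily_cpa[t - window:t]
        mu, sigma = history.mean(), history.std()
        if sigma > 0 and abs(daily_cpa[t] - mu) / sigma > z_threshold:
            flags.append(t)  # day index worth a manual look or an automatic bid freeze
    return flags
```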

Concrete example (practical knobs):

    - If the uplift model predicts +30% incremental subs in City X with a predicted ROI of $3 per $1, set the bid multiplier to +20% initially, cap the spend increase at +15% per week, and schedule a mid-week creative swap to test message variants.
    - If a city has a low sample size but a high predicted uplift, shrink the multiplier toward the regional mean by 40% (hierarchical shrinkage) until the posterior tightens.
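The knobs above could be wired together roughly as in the sketch below; the uplift-to-bid scaling factor, sample-size threshold, shrinkage, and caps are hypothetical example values, not the team's production settings.

```python
# Sketch of the "practical knobs" (illustrative; all thresholds are example values).
# Converts a predicted uplift into a bid multiplier, shrinks low-sample cities toward
# the regional mean, and caps the week-over-week change to avoid budget shock.

def bid_multiplier(pred_uplift: float, regional_mean_uplift: float,
                   n_observations: int, prev_multiplier: float = 1.0,
                   min_sample: int = 500, shrink: float = 0.4,
                   weekly_cap: float = 0.15) -> float:
    # Hierarchical-style shrinkage: pull the estimate toward the regional mean
    # by `shrink` (40% in the example above) when the city's sample is small.
    uplift = pred_uplift
    if n_observations < min_sample:
        uplift = (1 - shrink) * pred_uplift + shrink * regional_mean_uplift

    # Map predicted uplift to a bid change (the 0.66 scaling is a made-up knob:
    # +30% predicted uplift -> roughly a +20% initial bid).
    target = 1.0 + uplift * 0.66
    # Ramp cap: never move more than +/- weekly_cap from last week's multiplier.
    low, high = prev_multiplier * (1 - weekly_cap), prev_multiplier * (1 + weekly_cap)
    return max(low, min(high, target))

# Example: City X, +30% predicted uplift, ample sample -> capped at +15% this week
print(bid_multiplier(0.30, 0.12, n_observations=5000))  # 1.15
```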

Attribution decision tree (quick guide)

    1. Can you randomize at the city level? Yes → randomized geo holdout + CRM match is your ground truth.
    2. If not, can you find a comparable unaffected city cluster? Yes → synthetic control with pre-treatment fit checks.
    3. If neither, use multi-touch plus MMM and treat decisions as higher-risk; move conservatively and validate with an experiment as soon as possible.
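For teams that want the same logic inside a pipeline, the decision tree can be encoded as a tiny helper (purely illustrative):

```python
# The attribution decision tree above, as a small helper function.
def choose_attribution_method(can_randomize: bool, has_comparable_cluster: bool) -> str:
    if can_randomize:
        return "randomized geo holdout + CRM match (ground truth)"
    if has_comparable_cluster:
        return "synthetic control with pre-treatment fit checks"
    return "multi-touch + MMM; treat decisions as higher-risk and validate when possible"
```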

Closing — a balanced metaphor

Think of optimization like navigating a global archipelago in fog. Country-level maps are nautical charts; useful but coarse. City-level precision is the sonar that reveals reefs, currents, and trade winds under each island. AI can be the autopilot, but you need sonar (experiments and incremental measurement) to validate the autopilot isn’t steering you into shallow water.

Final pragmatic note: the biggest leverage wasn't the algorithm itself but the experimental discipline around it: city-level randomization, hierarchical modeling, and CRM-grounded incrementality. If you want to scale AI-driven allocation globally, build the city-level instrumentation first, then let the models exploit the signal.


For a one-page checklist you can operationalize tomorrow:

    1. Implement server-side tagging + city polygons.
    2. Design randomized city holdouts in each country for at least 6 weeks.
    3. Train hierarchical uplift models and use shrinkage for low-sample cities.
    4. Validate with CRM-matched incrementality and reconcile with platform metrics.
    5. Roll out with staged caps and monitoring dashboards.

Want the sample SQL for city-level aggregation, or the Bayesian model priors we used for shrinkage? Tell me which one and I’ll share the code snippets and model templates next.