Scale-Aware Recommendation Engine (Walmart)

PySpark, Google Cloud Platform, TensorFlow, BERT, SQL, Ranking Algorithms, MLOps

Business Impact

+10% CTR, +25% Recall@K

Scale

Billions of transactions analyzed

The Challenge

Walmart needed to optimize cross-selling across a massive catalog with billions of historical transactions. The legacy systems were rule-based, did not scale, and could not capture semantic relationships between products.

The Architecture

Designed a two-tower recommendation architecture that uses BERT embeddings for product and user representations. Pipeline: Data Lake (BigQuery) → Feature Engineering (PySpark on Dataproc) → BERT Embedding Layer → Ranking Algorithm (XGBoost + Business Rules) → Serving Infrastructure (Vertex AI Endpoints). Candidate generation runs as an offline batch job, while personalized ranking is scored in real time.
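To make the two-stage flow concrete, here is a minimal sketch of offline candidate generation followed by online re-ranking. It is an illustration under stated assumptions, not the production code: the embeddings are tiny hand-written vectors standing in for BERT outputs, a simple additive boost stands in for the XGBoost ranker, and the names `generate_candidates`, `rerank`, `in_stock`, and `boost` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def generate_candidates(
    user_vec: Vector, item_vecs: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Offline stage: keep the k items whose embeddings are closest to the user's."""
    scored = [(item_id, cosine(user_vec, vec)) for item_id, vec in item_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def rerank(
    candidates: List[Tuple[str, float]],
    in_stock: Dict[str, bool],
    boost: Dict[str, float],
) -> List[str]:
    """Online stage: apply hypothetical business rules to the candidate list.

    Out-of-stock items are dropped, then a per-item boost is added to the
    similarity score. A learned ranker would replace the additive boost.
    """
    kept = [
        (item_id, score + boost.get(item_id, 0.0))
        for item_id, score in candidates
        if in_stock.get(item_id, False)
    ]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in kept]


# Toy data: 2-dimensional vectors stand in for real embedding outputs.
user = [1.0, 0.0]
items = {"tent": [0.9, 0.1], "stove": [0.7, 0.7], "socks": [0.0, 1.0]}

candidates = generate_candidates(user, items, k=2)
ranked = rerank(
    candidates,
    in_stock={"tent": True, "stove": True},
    boost={"stove": 0.5},
)
```

Splitting the work this way keeps the expensive similarity search over the full catalog in the batch job, so the real-time path only has to score a short candidate list.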

System Architecture Diagram

graph LR
    A[Data Lake<br/>BigQuery] --> B[Feature Engineering<br/>PySpark/Dataproc]
    B --> C[BERT Embedding<br/>Layer]
    C --> D[Ranking Algorithm<br/>XGBoost + Rules]
    D --> E[Serving Infrastructure<br/>Vertex AI Endpoints]
    E --> F[Walmart.com<br/>Personalization]

    G[A/B Testing<br/>Framework] -.->|Metrics| E
    H[Retraining<br/>Pipeline] -.->|Daily| C

    style A fill:#0066ff,stroke:#0052cc,stroke-width:2px,color:#fff
    style B fill:#4C9AFF,stroke:#0066ff,stroke-width:2px,color:#fff
    style C fill:#0066ff,stroke:#0052cc,stroke-width:2px,color:#fff
    style D fill:#4C9AFF,stroke:#0066ff,stroke-width:2px,color:#fff
    style E fill:#0066ff,stroke:#0052cc,stroke-width:2px,color:#fff
    style F fill:#4C9AFF,stroke:#0066ff,stroke-width:2px,color:#fff
    style G fill:#666,stroke:#444,stroke-width:1px,color:#fff
    style H fill:#666,stroke:#444,stroke-width:1px,color:#fff

The Impact

Achieved a 10% CTR lift and a 25% improvement in Recall@K. The system processes billions of product interactions daily and serves personalized recommendations to millions of customers. It is deployed on GCP with automated retraining pipelines and an A/B testing framework.
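For readers unfamiliar with the metrics, the sketch below shows one common way to compute Recall@K and a relative lift. It assumes the reported figures are relative improvements over a baseline, which the case study does not state explicitly. The function names and sample values are illustrative only and are not the project's data.

```python
from typing import Sequence, Set


def recall_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the relevant items that appear in the top-k recommendations."""
    if not relevant:
        return 0.0
    hits = len(set(recommended[:k]) & relevant)
    return hits / len(relevant)


def relative_lift(baseline: float, treatment: float) -> float:
    """Relative change of treatment over baseline, e.g. 0.10 means +10%."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (treatment - baseline) / baseline


# Illustrative values only.
recs = ["a", "b", "c", "d"]
purchased = {"b", "d", "e"}

recall_top3 = recall_at_k(recs, purchased, k=3)  # only "b" is in the top 3
recall_top4 = recall_at_k(recs, purchased, k=4)  # "b" and "d" are in the top 4
ctr_lift = relative_lift(baseline=0.020, treatment=0.022)
```

In an A/B test, `baseline` would be the control group's metric and `treatment` the metric for users served by the new model.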

Interested in similar solutions?

Let's discuss how we can build scalable ML systems for your business challenges.
