Designing a Revenue-Aligned Product Matching System at Scale
Tech Lead, Senior Data Scientist | Fetch Rewards
Revenue-First Thinking · Diagnostic Analysis · Hybrid ML Architecture
6M+ daily active users · 11M+ receipts/day · hundreds of millions of line items
The Business Problem
With 6M+ daily active users and 11M+ receipts processed every day, Fetch's system extracted hundreds of millions of line items from receipts. Each line item needed to be matched to a known product in an internal taxonomy to power offer eligibility, brand attribution, partner billing, and rewards calculation. This matching pipeline was a core revenue driver, directly connecting user purchases to 600+ brand partners.
At the time, a significant portion of these items were not being automatically matched, requiring heavy manual review and limiting the value of receipt data at scale. Improving automated product assignment was a strategic priority.
But early discussions framed the problem as: "We need higher match accuracy."
Before building ML models, I pushed to clarify the underlying problem and establish the actual goal. I asked:
- What does "higher accuracy" mean in terms of business outcomes?
- Which product categories and retailers matter most?
- What is the revenue impact of unmatched items?
- What are the constraints of the current product catalog?
This reframed the initiative from a modeling problem to a product systems problem.
Diagnostic Deep Dive: Understanding Failure
Instead of immediately iterating on ML models, I conducted a structured failure analysis.
Sampling Approach
I drew a stratified sample of 50,000 recent unmatched line items, balanced across merchants, frequency bands, OCR quality, and categories.
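The sampling idea can be sketched with pandas' grouped sampling. The DataFrame, column names, and stratum keys below are illustrative stand-ins, not the actual Fetch schema:

```python
import pandas as pd

# Hypothetical unmatched line items; merchant/category stand in for the
# real strata (merchant, frequency band, OCR quality, category).
items = pd.DataFrame({
    "merchant": ["A", "A", "B", "B", "C", "C"] * 100,
    "category": ["grocery", "pharmacy"] * 300,
    "raw_text": ["unmatched item text"] * 600,
})

# Proportional stratified sample: draw the same fraction from each
# stratum so no single merchant or category dominates the review set.
frac = 0.1
sample = items.groupby(["merchant", "category"]).sample(frac=frac, random_state=0)
```

`GroupBy.sample` keeps each stratum's share of the sample proportional to its share of the population; for priority strata one could instead sample a fixed `n` per group.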
Key Questions
- Are failures due to algorithm limitations or catalog gaps?
- Are unmatched items concentrated in specific categories?
- Do they represent high-revenue products?
- What is the realistic performance ceiling?
The Pareto Insight (80/20 Rule)
The distribution followed a classic 80/20 pattern. A small percentage of SKUs drove the vast majority of revenue-relevant volume. Meanwhile, a large volume of unmatched items belonged to:
- Produce and commodity grocery
- Pharmacy prescriptions
- Gas station receipts
- Restaurant toppings and substitutions
These categories were not partner-priority segments, carried lower revenue impact, and distorted raw match rate metrics.
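A concentration check of this kind reduces to a cumulative-share computation over SKU-level revenue. A minimal sketch, using a synthetic heavy-tailed revenue distribution in place of real SKU data:

```python
import numpy as np
import pandas as pd

# Synthetic SKU-level revenue with a heavy tail (stand-in for real data).
rng = np.random.default_rng(0)
revenue = pd.Series(rng.pareto(1.5, size=1000))

# Sort SKUs by revenue, accumulate, and find what fraction of SKUs
# covers 80% of total revenue-relevant volume.
share = revenue.sort_values(ascending=False).cumsum() / revenue.sum()
top_frac = (share < 0.8).sum() / len(share)
```

With a heavy-tailed distribution, `top_frac` comes out well under half: a small slice of SKUs carries most of the revenue, which is exactly the 80/20 shape described above.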
Among the revenue-relevant unmatched items, failures were driven by:
- OCR errors: scanned paper receipts introduced misspellings, merged words, and missing characters, degrading text-based matching
- Entity resolution: abbreviated product names, inconsistent brand naming, partial UPC/SKU codes, and varying unit formats across retailers made direct matching unreliable
This led to a pivotal shift: optimize for revenue-weighted match accuracy, not raw match rate.
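The difference between the two metrics is easy to make concrete. A toy example with invented `matched` flags and `revenue_weight` values (both hypothetical fields):

```python
import pandas as pd

# Toy line items: matched flag plus a revenue relevance weight.
df = pd.DataFrame({
    "matched": [True, True, False, False],
    "revenue_weight": [10.0, 5.0, 1.0, 0.5],
})

# Raw match rate treats every line item equally.
raw_rate = df["matched"].mean()  # 0.50

# Revenue-weighted match rate weights each item by its business value.
weighted_rate = (
    (df["matched"] * df["revenue_weight"]).sum() / df["revenue_weight"].sum()
)
```

Here the same matches yield a 50% raw rate but a ~91% revenue-weighted rate, because the misses are low-value items; optimizing the weighted metric points effort at the items that matter.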
Measuring the System Ceiling
Before committing to aggressive targets, I designed a human benchmark study: I provided unmatched items to trained annotators, restricted them to the existing internal catalog, and measured the maximum achievable match rate.
This established a realistic upper bound under current catalog conditions.
Insight: performance is constrained not just by modeling, but by catalog completeness. This prevented unrealistic commitments and informed roadmap prioritization.
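Turning a benchmark study into a defensible ceiling is a matter of estimating a proportion with uncertainty. A sketch with invented counts (`n` annotated items, `k` matchable under the current catalog) and a standard normal-approximation interval:

```python
import math

# Hypothetical benchmark result: of n sampled unmatched items,
# annotators could match k using only the existing catalog.
n, k = 5000, 3100
ceiling = k / n  # point estimate of the achievable match rate

# 95% normal-approximation confidence interval on the ceiling.
se = math.sqrt(ceiling * (1 - ceiling) / n)
lo, hi = ceiling - 1.96 * se, ceiling + 1.96 * se
```

Quoting the interval rather than the point estimate keeps roadmap commitments honest about sampling noise.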
Strategic Reframing
The solution became a two-track strategy:
Track 1: Raise the Ceiling
- Identify high-frequency, revenue-relevant catalog gaps
- Partner with catalog and data integrity teams
- Prioritize enrichment for high-impact brands
- Map low-impact items to generic taxonomy buckets, avoiding unnecessary expansion of low-value segments
Track 2: Close the Gap to Ceiling
Redesigned the matching system as a hybrid retrieval architecture:
- Entity normalization layer (OCR correction, brand mapping, unit standardization)
- Multi-stage candidate generation: deterministic UPC matching, fuzzy matching, TF-IDF similarity, Sentence-BERT embeddings, approximate nearest neighbor search
- Transformer-based reranking
- Confidence calibration to balance precision and recall
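The staged structure can be sketched with a tiny in-memory catalog and stdlib fuzzy matching; the catalog, product names, and function signatures below are all illustrative, and the real TF-IDF, Sentence-BERT, and ANN stages would slot in after the fuzzy stage in the same cascade:

```python
from difflib import SequenceMatcher
from typing import Optional

# Toy catalog keyed by UPC (illustrative entries, not real data).
CATALOG = {
    "012345678905": "ACME COLA 12OZ",
    "098765432109": "ACME DIET COLA 12OZ",
}

def normalize(text: str) -> str:
    # Entity normalization layer: in practice this also handles OCR
    # correction, brand mapping, and unit standardization.
    return " ".join(text.upper().split())

def candidates(line_item: str, upc: Optional[str] = None, top_k: int = 5):
    # Stage 1: deterministic UPC match short-circuits the pipeline.
    if upc and upc in CATALOG:
        return [(CATALOG[upc], 1.0)]
    # Stage 2: fuzzy match against normalized catalog names; later stages
    # (TF-IDF, embeddings + ANN, transformer reranking) would refine this.
    query = normalize(line_item)
    scored = [
        (name, SequenceMatcher(None, query, normalize(name)).ratio())
        for name in CATALOG.values()
    ]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

The cascade design keeps cheap, high-precision stages first so expensive semantic stages only run on the residual traffic.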
Precision vs Coverage - A Deliberate Tradeoff
The system needed to maximize coverage while maintaining strict precision. False positives were not just an accuracy issue; they directly caused incorrect brand attribution and billing errors, creating revenue risk for both Fetch and its partners. I implemented:
- Recall@10 optimization at the retrieval stage
- Calibrated confidence thresholds
- Precision-coverage tradeoff curves
- Shadow deployment before automation
This ensured safe incremental rollout.
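Threshold selection from a precision-coverage curve can be sketched as follows; the scores, labels, and the 0.95 precision target are invented for illustration:

```python
import numpy as np

# Toy calibrated confidence scores with ground-truth correctness labels.
scores = np.array([0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4])
correct = np.array([1, 1, 1, 1, 1, 0, 1, 0])

def precision_coverage(threshold):
    auto = scores >= threshold                 # items matched automatically
    coverage = auto.mean()                     # share of traffic automated
    precision = correct[auto].mean() if auto.any() else 1.0
    return precision, coverage

# Pick the lowest threshold that still meets the precision target,
# i.e. maximize coverage subject to the precision constraint.
sla = 0.95
chosen = min(t for t in set(scores) if precision_coverage(t)[0] >= sla)
```

Items scoring below `chosen` fall through to manual review, which is how coverage gains stay bounded by the precision constraint during rollout.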
From Problem Reframing to Technical Execution
The analysis above redefined the problem from "improve match accuracy" to "maximize revenue-weighted coverage under strict precision constraints." This reframing shaped every downstream technical decision: the retrieval architecture, the ranking strategy, the confidence calibration, and the evaluation framework.
The technical implementation involved a hybrid retrieval and ranking system combining deterministic matching, fuzzy search, semantic embeddings (Sentence-BERT, FAISS), transformer-based reranking, and calibrated confidence thresholds. For the full ML architecture and evaluation details, see the Technical Deep-Dive.
Results & Impact
- Coverage: 45% improvement in automated product assignment
- Attribution: significantly improved partner attribution reliability
- Operations: ~30% reduction in manual review workload
- Scale: scalable architecture supporting millions of line items from 11M+ daily receipts, covering over $400M GMV and paying out $500K+ in rewards every day