Designing a Revenue-Aligned Product Matching System at Scale
Tech Lead, Senior Data Scientist | Fetch Rewards
Revenue-First Thinking · Diagnostic Analysis · Hybrid ML Architecture
6M+ daily active users · 11M+ receipts/day · hundreds of millions of line items
The Business Problem
With 6M+ daily active users and 11M+ receipts processed every day, Fetch's system extracted hundreds of millions of line items from receipts. Each line item needed to be matched to a known product in an internal taxonomy to power offer eligibility, brand attribution, partner billing, and rewards calculation. This matching pipeline was a core revenue driver, directly connecting user purchases to 600+ brand partners.
At the time, a significant portion of these items were not being automatically matched, requiring heavy manual review and limiting the value of receipt data at scale. Improving automated product assignment was a strategic priority.
But early discussions framed the problem as: "We need higher match accuracy."
Before building ML models, I pushed to clarify the underlying problem and establish the actual goal. I asked:
- What does "higher accuracy" mean in terms of business outcomes?
- Which product categories and retailers matter most?
- What is the revenue impact of unmatched items?
- What are the constraints of the current product catalog?
This reframed the initiative from a modeling problem to a product systems problem.
Diagnostic Deep Dive: Understanding Failure
Instead of immediately iterating on ML models, I conducted a structured failure analysis.
Sampling Approach
I drew a stratified sample of 50,000 recent unmatched line items, balanced across merchants, frequency bands, OCR quality, and categories.
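The sampling idea can be sketched with pandas' grouped sampling. The DataFrame, column names, and stratum keys below are illustrative stand-ins, not the actual Fetch schema:

```python
import pandas as pd

# Hypothetical unmatched line items; merchant/category stand in for the
# real strata (merchant, frequency band, OCR quality, category).
items = pd.DataFrame({
    "merchant": ["A", "A", "B", "B", "C", "C"] * 100,
    "category": ["grocery", "pharmacy"] * 300,
    "raw_text": ["unmatched item text"] * 600,
})

# Proportional stratified sample: draw the same fraction from each
# stratum so no single merchant or category dominates the review set.
frac = 0.1
sample = items.groupby(["merchant", "category"]).sample(frac=frac, random_state=0)
```

`GroupBy.sample` keeps each stratum's share of the sample proportional to its share of the population; for priority strata one could instead sample a fixed `n` per group.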
Key Questions
- Are failures due to algorithm limitations or catalog gaps?
- Are unmatched items concentrated in specific categories?
- Do they represent high-revenue products?
- What is the realistic performance ceiling?
The Pareto Insight (80/20 Rule)
The distribution followed a classic 80/20 pattern. A small percentage of SKUs drove the vast majority of revenue-relevant volume. Meanwhile, a large volume of unmatched items belonged to:
- Produce and commodity grocery
- Pharmacy prescriptions
- Gas station receipts
- Restaurant toppings and substitutions
These categories were not partner-priority segments, carried lower revenue impact, and distorted raw match rate metrics.
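A concentration check of this kind reduces to a cumulative-share computation over SKU-level revenue. A minimal sketch, using a synthetic heavy-tailed revenue distribution in place of real SKU data:

```python
import numpy as np
import pandas as pd

# Synthetic SKU-level revenue with a heavy tail (stand-in for real data).
rng = np.random.default_rng(0)
revenue = pd.Series(rng.pareto(1.5, size=1000))

# Sort SKUs by revenue, accumulate, and find what fraction of SKUs
# covers 80% of total revenue-relevant volume.
share = revenue.sort_values(ascending=False).cumsum() / revenue.sum()
top_frac = (share < 0.8).sum() / len(share)
```

With a heavy-tailed distribution, `top_frac` comes out well under half: a small slice of SKUs carries most of the revenue, which is exactly the 80/20 shape described above.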
Among the revenue-relevant unmatched items, failures were driven by:
- OCR errors: scanned paper receipts introduced misspellings, merged words, and missing characters, degrading text-based matching
- Entity resolution: abbreviated product names, inconsistent brand naming, partial UPC/SKU codes, and varying unit formats across retailers made direct matching unreliable
This led to a pivotal shift: optimize for revenue-weighted match accuracy, not raw match rate.
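The difference between the two metrics is easy to make concrete. A toy example with invented `matched` flags and `revenue_weight` values (both hypothetical fields):

```python
import pandas as pd

# Toy line items: matched flag plus a revenue relevance weight.
df = pd.DataFrame({
    "matched": [True, True, False, False],
    "revenue_weight": [10.0, 5.0, 1.0, 0.5],
})

# Raw match rate treats every line item equally.
raw_rate = df["matched"].mean()  # 0.50

# Revenue-weighted match rate weights each item by its business value.
weighted_rate = (
    (df["matched"] * df["revenue_weight"]).sum() / df["revenue_weight"].sum()
)
```

Here the same matches yield a 50% raw rate but a ~91% revenue-weighted rate, because the misses are low-value items; optimizing the weighted metric points effort at the items that matter.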
Measuring the System Ceiling
Before committing to aggressive targets, I designed a human benchmark study: I provided unmatched items to trained annotators, restricted them to the existing internal catalog, and measured the maximum achievable match rate.
This established a realistic upper bound under current catalog conditions.
Insight: performance is constrained not just by modeling, but by catalog completeness. This prevented unrealistic commitments and informed roadmap prioritization.
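Turning a benchmark study into a defensible ceiling is a matter of estimating a proportion with uncertainty. A sketch with invented counts (`n` annotated items, `k` matchable under the current catalog) and a standard normal-approximation interval:

```python
import math

# Hypothetical benchmark result: of n sampled unmatched items,
# annotators could match k using only the existing catalog.
n, k = 5000, 3100
ceiling = k / n  # point estimate of the achievable match rate

# 95% normal-approximation confidence interval on the ceiling.
se = math.sqrt(ceiling * (1 - ceiling) / n)
lo, hi = ceiling - 1.96 * se, ceiling + 1.96 * se
```

Quoting the interval rather than the point estimate keeps roadmap commitments honest about sampling noise.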
Strategic Reframing
The solution became a two-track strategy:
Track 1: Raise the Ceiling
- Identify high-frequency, revenue-relevant catalog gaps
- Partner with catalog and data integrity teams
- Prioritize enrichment for high-impact brands
- Map low-impact items to generic taxonomy buckets, avoiding unnecessary expansion of low-value segments
Track 2: Close the Gap to Ceiling
Redesigned the matching system as a hybrid retrieval architecture:
- Entity normalization layer (OCR correction, brand mapping, unit standardization)
- Multi-stage candidate generation: deterministic UPC matching, fuzzy matching, TF-IDF similarity, Sentence-BERT embeddings, approximate nearest neighbor search
- Transformer-based reranking
- Confidence calibration to balance precision and recall
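The staged structure can be sketched with a tiny in-memory catalog and stdlib fuzzy matching; the catalog, product names, and function signatures below are all illustrative, and the real TF-IDF, Sentence-BERT, and ANN stages would slot in after the fuzzy stage in the same cascade:

```python
from difflib import SequenceMatcher
from typing import Optional

# Toy catalog keyed by UPC (illustrative entries, not real data).
CATALOG = {
    "012345678905": "ACME COLA 12OZ",
    "098765432109": "ACME DIET COLA 12OZ",
}

def normalize(text: str) -> str:
    # Entity normalization layer: in practice this also handles OCR
    # correction, brand mapping, and unit standardization.
    return " ".join(text.upper().split())

def candidates(line_item: str, upc: Optional[str] = None, top_k: int = 5):
    # Stage 1: deterministic UPC match short-circuits the pipeline.
    if upc and upc in CATALOG:
        return [(CATALOG[upc], 1.0)]
    # Stage 2: fuzzy match against normalized catalog names; later stages
    # (TF-IDF, embeddings + ANN, transformer reranking) would refine this.
    query = normalize(line_item)
    scored = [
        (name, SequenceMatcher(None, query, normalize(name)).ratio())
        for name in CATALOG.values()
    ]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:top_k]
```

The cascade design keeps cheap, high-precision stages first so expensive semantic stages only run on the residual traffic.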
Precision vs Coverage - A Deliberate Tradeoff
The system needed to maximize coverage while maintaining strict precision. False positives were not just an accuracy issue; they directly caused incorrect brand attribution and billing errors, creating revenue risk for both Fetch and its partners. I implemented:
- Recall@10 optimization at the retrieval stage
- Calibrated confidence thresholds
- Precision-coverage tradeoff curves
- Shadow deployment before automation
This ensured safe incremental rollout.
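Threshold selection from a precision-coverage curve can be sketched as follows; the scores, labels, and the 0.95 precision target are invented for illustration:

```python
import numpy as np

# Toy calibrated confidence scores with ground-truth correctness labels.
scores = np.array([0.95, 0.9, 0.85, 0.8, 0.7, 0.6, 0.5, 0.4])
correct = np.array([1, 1, 1, 1, 1, 0, 1, 0])

def precision_coverage(threshold):
    auto = scores >= threshold                 # items matched automatically
    coverage = auto.mean()                     # share of traffic automated
    precision = correct[auto].mean() if auto.any() else 1.0
    return precision, coverage

# Pick the lowest threshold that still meets the precision target,
# i.e. maximize coverage subject to the precision constraint.
sla = 0.95
chosen = min(t for t in set(scores) if precision_coverage(t)[0] >= sla)
```

Items scoring below `chosen` fall through to manual review, which is how coverage gains stay bounded by the precision constraint during rollout.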
From Problem Reframing to Technical Execution
The analysis above redefined the problem from "improve match accuracy" to "maximize revenue-weighted coverage under strict precision constraints." This reframing shaped every downstream technical decision: the retrieval architecture, the ranking strategy, the confidence calibration, and the evaluation framework.
The technical implementation involved a hybrid retrieval and ranking system combining deterministic matching, fuzzy search, semantic embeddings (Sentence-BERT, FAISS), transformer-based reranking, and calibrated confidence thresholds. For the full ML architecture and evaluation details, see the Technical Deep-Dive.
Results & Impact
- Coverage: 45% improvement in automated product assignment
- Attribution: significantly improved partner attribution reliability
- Operations: ~30% reduction in manual review workload
- Scale: scalable architecture supporting millions of line items from 11M+ daily receipts, covering over $400M GMV and paying out $500K+ in rewards every day