The promise of creative testing in catalog ads on Meta is attractive. Companies like Marpipe, Socioh, and Hunch heavily promote in-depth creative testing, luring marketers with detailed insights and multivariate optimization. Unfortunately, much of what’s being marketed doesn’t quite match reality.
Myth: You can easily test multiple creatives for a single product simultaneously in catalog ads
Reality: Meta doesn’t offer creative-level metrics for catalog ads. You can’t directly compare multiple creative variations of the same product within one feed simultaneously. Attempts to differentiate variations—such as using different overlays or discounts on product variants—run into practical limitations, including the inability to use distinct UTM parameters or gather granular data.
Myth: Sequential testing offers reliable data on creative performance
Reality: Sequential testing (e.g., showing a % discount creative one week, then switching to a $ discount the next week) is vulnerable to external factors and algorithmic variance. Changes in market conditions, competitor activity, or even minor algorithmic fluctuations can dramatically influence results, muddying any insights you might collect.


The age-old marketing question: when highlighting a discount, is it better to show percent off or dollars off?
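To see how easily a sequential comparison gets confounded, here is a quick simulation. The numbers are purely hypothetical (they are not from any real account or from Meta): the “% off” creative genuinely converts better, but because its test week happens to land on softer demand, the week-over-week readout crowns “$ off” the winner.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical "true" conversion rates: the % off creative is genuinely better.
true_cvr = {"percent_off": 0.030, "dollar_off": 0.027}

# Week-to-week demand drift (seasonality, competitor promos, algorithm shifts).
# Week 1 happens to be a soft week; week 2 happens to be a strong one.
demand_multiplier = {"week1_percent_off": 0.90, "week2_dollar_off": 1.10}

clicks_per_week = 20_000

# Week 1: % off creative runs under soft demand.
conv_week1 = rng.binomial(
    clicks_per_week, true_cvr["percent_off"] * demand_multiplier["week1_percent_off"]
)
# Week 2: $ off creative runs under strong demand.
conv_week2 = rng.binomial(
    clicks_per_week, true_cvr["dollar_off"] * demand_multiplier["week2_dollar_off"]
)

print(f"Week 1 (% off): CVR = {conv_week1 / clicks_per_week:.3%}")
print(f"Week 2 ($ off): CVR = {conv_week2 / clicks_per_week:.3%}")
# The sequential comparison picks "$ off" as the winner even though "% off"
# converts better under identical conditions; the demand drift did the deciding.
```

Nothing in the week-over-week data tells you whether the gap came from the creative or from the calendar, which is exactly the problem.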
Myth: Multiple feeds solve the problem by enabling clean A/B testing
Reality: While you can indeed use multiple catalog feeds to run A/B tests, this introduces complexity and algorithmic variance issues. Even seemingly straightforward A/A tests can yield dramatically different results in the short term due to Meta’s algorithmic influence. Achieving statistical significance and reliable insights can require a boatload of spend and extended testing periods, sharply increasing your time and financial investment.
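To put a rough number on “a boatload of spend,” here is a back-of-the-envelope sample-size calculation using the standard two-proportion formula. The baseline conversion rate, lift, and CPC below are illustrative assumptions, not figures from Meta or any vendor.

```python
from scipy.stats import norm

# Illustrative assumptions (not figures from Meta or any vendor):
baseline_cvr = 0.02        # 2% conversion rate on the control feed
relative_lift = 0.10       # we want to detect a 10% relative improvement
alpha, power = 0.05, 0.80  # conventional 95% confidence, 80% power
cpc = 1.00                 # assumed $1 cost per click

p1 = baseline_cvr
p2 = baseline_cvr * (1 + relative_lift)

# Standard two-proportion z-test sample size per variant.
z_alpha = norm.ppf(1 - alpha / 2)
z_beta = norm.ppf(power)
n_per_variant = (
    (z_alpha + z_beta) ** 2
    * (p1 * (1 - p1) + p2 * (1 - p2))
    / (p1 - p2) ** 2
)

print(f"Clicks needed per variant: {n_per_variant:,.0f}")
print(f"Estimated spend for a two-feed test: ${2 * n_per_variant * cpc:,.0f}")
# With these assumptions: roughly 80,000 clicks per feed and around $160,000 of
# spend, before accounting for the extra variance Meta's delivery introduces.
```

Even under generous assumptions, a clean multi-feed test is a six-figure exercise; in practice, delivery differences between feeds push the requirement higher still.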
Our Recommended Approach: Standalone Creative Testing
Instead of relying on flawed or overly complex catalog testing, use a simpler, more effective strategy: test creatives separately from your catalog feed. By designing your creatives with a template intended for your catalog but exporting them as individual images for standalone testing in Waterbucket, you gain precise control and clear metrics. This allows you to:
Clearly attribute performance data directly to each creative.
Sidestep algorithmic complexity and external confounding factors.
Quickly identify winning creatives based on actionable metrics.
Once you’ve identified the highest-performing creative through standalone testing, applying it to your catalog feed becomes a straightforward, data-driven step.

Why Do Misleading Claims Persist?
Perhaps the reason these testing claims are everywhere is that enhanced creative catalog ads genuinely perform exceptionally well. When marketers see a 10% lift in performance from regular enhanced creative, and often a 40% boost using BNPL ads from Waterbucket, compared to standard catalog ads, it’s easy to assume this effectiveness is directly related to testing. In reality, it’s the enhanced creative itself, optimized and impactful, that’s driving these impressive results, not necessarily the overly complicated or misleading testing methodologies.