This framework covers how to screen CPG innovation pipelines from 10-15 concepts to 3-4 winners using consumer evidence. Most CPG teams enter their annual innovation cycle with more concepts than capacity, and the unspoken default is committee-led selection: brand directors advocate for their favorites, R&D pushes the technically interesting ideas, and the concepts that survive are the ones with internal champions rather than consumer pull. This framework replaces that politics with a sequenced consumer-evidence pass that anyone in the organization can defend. For the full innovation research methodology, see Product Innovation Research Template for CPG. For the complete concept testing guide, see Concept Testing for CPG.
The four stages build on each other. Stage 1 triages the entire pipeline at low cost so that no concept advances on a hunch. Stage 2 spends real evaluation budget only on the survivors. Stage 3 gives marginal concepts a second chance with targeted refinement. Stage 4 lifts out of individual-concept scoring to portfolio composition, which is where the actual development-slot decision happens. The economics matter here: running the full sequence costs less than a single traditional agency concept test, while producing evidence on 10-15 ideas instead of one. Teams that adopt this framework typically discover that two or three of their internal favorites are killed in Stage 1, freeing the development pipeline for ideas that internal champions had overlooked.
The Four-Stage Screening Process
Stage 1: Quick Screen (30-50 interviews per concept)
Objective: Rapid go/kill assessment on the full pipeline.
Method: AI-moderated interviews with 30-50 verified category purchasers per concept (monadic design).
Questions (compressed, 15-minute interview):
- “Tell me your initial reaction to this concept.”
- “Does this solve a real problem for you? What problem?”
- “What is the biggest concern or hesitation you have?”
- “How is this different from what is already available?”
- “If this were on the shelf, would you stop and pick it up?”
Timeline: 24 hours per batch of 3-5 concepts. Full pipeline in 1-2 weeks.
Cost: $600-$1,000 per concept. $6,000-$15,000 for a 10-15 concept pipeline.
Go/No-Go Criteria:
| Metric | Advance | Refine | Kill |
|---|---|---|---|
| Spontaneous appeal (% positive) | >60% | 40-60% | <40% |
| Problem-solution fit (% real problem) | >50% | 30-50% | <30% |
| Barrier severity (% dealbreaker) | <30% | 30-50% | >50% |
| Differentiation (% articulate difference) | >40% | 20-40% | <20% |
Expected outcome: 10-15 concepts triage to 5-7 that advance, 3-5 that go to refine, and 3-5 that are killed.
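The thresholds in the table can be applied mechanically once each concept's percentages are in hand. The sketch below is one possible reading, not a prescribed rule: the table does not say how to combine the four metrics, so the "worst band wins" logic is an assumption, as are the names `Stage1Result` and `triage`. Values that sit exactly on a boundary (for example, 60% appeal) are treated as Refine.

```python
from dataclasses import dataclass


@dataclass
class Stage1Result:
    appeal: float           # % with a positive spontaneous reaction
    problem_fit: float      # % who say the concept solves a real problem
    dealbreaker: float      # % who cite a dealbreaker barrier
    differentiation: float  # % who can articulate a difference


def band_higher_is_better(value, advance_above, kill_below):
    """Band a metric where a larger percentage is a better result."""
    if value > advance_above:
        return "advance"
    if value < kill_below:
        return "kill"
    return "refine"


def band_lower_is_better(value, advance_below, kill_above):
    """Band a metric where a smaller percentage is a better result."""
    if value < advance_below:
        return "advance"
    if value > kill_above:
        return "kill"
    return "refine"


def triage(result):
    bands = [
        band_higher_is_better(result.appeal, 60, 40),
        band_higher_is_better(result.problem_fit, 50, 30),
        band_lower_is_better(result.dealbreaker, 30, 50),
        band_higher_is_better(result.differentiation, 40, 20),
    ]
    # ASSUMPTION: the weakest metric decides the outcome.
    if "kill" in bands:
        return "kill"
    if "refine" in bands:
        return "refine"
    return "advance"
```

With this rule, a concept at 65% appeal, 55% problem fit, 35% dealbreaker, and 45% differentiation goes to Refine because of the barrier metric alone. A team that prefers a majority-of-metrics rule would change only the last few lines of `triage`.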
The 15-minute interview length is deliberate. At Stage 1, the team is making a binary call (does this concept deserve full evaluation, or not), and that decision rarely benefits from more interview time. Longer interviews at this stage often produce false positives: respondents talk themselves into liking concepts they would never buy because the conversation invites elaboration. The five compressed questions force the respondent to react fast, the way they would react in a shelf moment. Stage 1 is also where the AI moderator’s consistency pays the biggest dividend. Across 30-50 interviews per concept on 10-15 concepts, that is 300-750 conversations evaluated against identical probing logic, a level of consistency no human moderator panel could match.
Stage 2: Deep Evaluation (100 interviews per surviving concept)
Objective: Full concept evaluation of the 5-7 survivors, using the complete concept testing discussion guide.
Method: 100 verified category purchasers per concept, 30-minute AI-moderated interviews.
Timeline: 24 hours per concept.
Cost: $2,000 per concept. $10,000-$14,000 for 5-7 concepts.
Assessment dimensions:
- Motivation hierarchy (laddering from attribute to value)
- Price-value perception
- Competitive displacement potential
- Barrier addressability
- Repurchase likelihood indicators
Expected outcome: 5-7 concepts triage to 3-4 with strong consumer evidence for advancement.
The motivation hierarchy is where Stage 2 earns its budget. Stage 1 tells you whether a concept appeals on first reaction; Stage 2 tells you whether that appeal is rooted in something durable. The five-to-seven-level laddering — from attribute, to functional benefit, to emotional benefit, to identity, to underlying value — separates concepts that win on novelty (short-lived) from concepts that win on a value connection (durable). When 100 respondents ladder to the same underlying value across a concept, you have a defensible advancement case. When the ladders fragment across unrelated values, the concept is more polarizing than the Stage 1 scores suggested, and the team should weigh portfolio fit carefully before slot allocation.
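One rough way to see whether ladders converge or fragment is to measure how concentrated the terminal values are. The sketch below assumes each respondent's ladder has already been coded to a single underlying-value label upstream; the function name `value_concentration` and the example labels are hypothetical, and the framework does not state what share counts as convergence.

```python
from collections import Counter


def value_concentration(terminal_values):
    """Return (most common underlying value, its share of respondents)."""
    if not terminal_values:
        raise ValueError("need at least one coded ladder")
    counts = Counter(terminal_values)
    top_value, top_count = counts.most_common(1)[0]
    return top_value, top_count / len(terminal_values)


# Hypothetical coding of 100 ladders for one concept.
coded = ["family health"] * 62 + ["convenience"] * 23 + ["self-image"] * 15
top_value, share = value_concentration(coded)
```

A high share for one value points toward the converging case described above, while a flat distribution across unrelated values points toward the fragmented case. Where to draw the line is a judgment call for the team.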
Stage 3: Refinement Testing (50-100 interviews per refined concept)
Objective: Test modified versions of concepts that showed potential but had addressable barriers.
Method: 50-100 interviews testing the refined concept versus the original.
Timeline: 24 hours.
Cost: $1,000-$2,000 per concept.
Key question: Did the refinement address the barrier without weakening the core appeal?
Stage 3 is the most under-used part of the framework. Most teams treat the refine bucket as a junk drawer (concepts that did not quite clear the bar to advance but felt too promising to kill) and then never actually re-screen them. The discipline of running the refined-versus-original side-by-side test, even at half the sample size, separates concepts where a small wording or claim adjustment unlocks the appeal from concepts where the underlying idea was the problem all along. Common refinement targets include simplifying the value proposition, shifting the occasion claim, addressing the most-cited barrier directly in the concept copy, or repositioning against a different competitive frame. Each of those changes is testable in 24 hours at $1,000-$2,000.
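The Stage 3 key question can be written as a simple comparison of the two test cells. This is a minimal sketch under stated assumptions: the dictionary keys, the function name `refinement_verdict`, and the 5-point tolerance for appeal loss are all hypothetical and not part of the framework. At 50-100 interviews per cell, differences of a few points are within noise, so the result should be read as a prompt for discussion rather than a verdict.

```python
def refinement_verdict(original, refined, max_appeal_drop=5.0):
    """Compare a refined concept with its original.

    Both arguments are dicts holding 'appeal' and 'dealbreaker'
    percentages. max_appeal_drop is an ASSUMED tolerance, in points.
    """
    barrier_change = refined["dealbreaker"] - original["dealbreaker"]
    appeal_change = refined["appeal"] - original["appeal"]

    if barrier_change >= 0:
        return "barrier not addressed"
    if appeal_change < -max_appeal_drop:
        return "barrier addressed, appeal weakened"
    return "barrier addressed, appeal held"
```

For example, an original at 55% appeal and 42% dealbreaker against a refined version at 58% and 24% returns "barrier addressed, appeal held".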
Stage 4: Portfolio Decision
Objective: Select the 3-4 concepts for full development investment.
Inputs: Consumer evidence from Stages 1-3, plus business feasibility data (margin, supply chain, distribution, cannibalization risk).
Stage 4 is where the team explicitly stops asking “which concept won?” and starts asking “which portfolio of 3-4 concepts wins?” Those are different questions. A concept that ranks third on absolute consumer evidence may belong in the development slate because it opens a segment the top two ignore, or because its margin profile balances a high-risk top-ranked concept. Conversely, two top-ranked concepts that target identical occasions and demographics may cannibalize each other on launch, and one should be deferred. The consumer evidence from Stages 1-3 narrows the candidate set; Stage 4 is where business judgment finishes the job.
Concept Scoring Matrix
For each concept that reaches Stage 2, score on these dimensions:
| Dimension | Weight | Score (1-5) | Weighted Score |
|---|---|---|---|
| Consumer appeal strength | 25% | ||
| Problem-solution fit | 20% | ||
| Motivation depth (value connection) | 15% | ||
| Competitive differentiation | 15% | ||
| Barrier addressability | 10% | ||
| Price-value acceptance | 10% | ||
| Repurchase indicators | 5% | ||
| Total | 100% | | /5.00 |
Score interpretation:
- 4.0+: Strong advance. High confidence in consumer demand.
- 3.0-3.9: Conditional advance. Strong in some areas but has gaps to address.
- 2.0-2.9: Requires significant refinement. Re-screen after modification.
- <2.0: Kill. Consumer evidence does not support advancement.
The weighting itself is a strategic statement. A brand premiumizing its portfolio should weight competitive differentiation and price-value acceptance more heavily than a value-tier extension; a brand defending market share against a new entrant should weight problem-solution fit and consumer appeal strength. The mistake to avoid is keeping default weights across every cycle: the matrix should reflect the specific strategic question this round of innovation is meant to answer. Document the weighting rationale alongside the scores so that future cycles can compare results against intent rather than against shifting goalposts.
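The matrix arithmetic is a weighted sum of the seven dimension scores. The sketch below uses the default weights from the table; the dictionary keys and function names are illustrative assumptions. The interpretation bands leave totals between 3.9 and 4.0 unassigned, so the code assumes anything from 3.0 up to (but not including) 4.0 is a conditional advance.

```python
DEFAULT_WEIGHTS = {
    "consumer_appeal": 0.25,
    "problem_solution_fit": 0.20,
    "motivation_depth": 0.15,
    "competitive_differentiation": 0.15,
    "barrier_addressability": 0.10,
    "price_value_acceptance": 0.10,
    "repurchase_indicators": 0.05,
}


def weighted_total(scores, weights=DEFAULT_WEIGHTS):
    """Return the weighted score out of 5.00 for one concept."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 100%")
    if set(scores) != set(weights):
        raise ValueError("scores and weights must cover the same dimensions")
    if any(not 1 <= s <= 5 for s in scores.values()):
        raise ValueError("each score must be between 1 and 5")
    return round(sum(weights[k] * scores[k] for k in weights), 2)


def interpret(total):
    """Map a weighted total to the interpretation bands."""
    if total >= 4.0:
        return "strong advance"
    if total >= 3.0:
        return "conditional advance"
    if total >= 2.0:
        return "refine and re-screen"
    return "kill"
```

A concept scored 4, 4, 3, 5, 3, 4, 3 across the dimensions in table order totals 3.85, a conditional advance. Passing a different `weights` dictionary for a given cycle keeps the weighting choice explicit and easy to document next to the scores.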
Portfolio-Level Prioritization
After individual scoring, assess the portfolio:
- Coverage: Do the 3-4 winners address different consumer segments or occasions? A portfolio of concepts that all target the same segment creates cannibalization risk.
- Risk balance: Does the portfolio include both incremental (low risk, moderate upside) and breakthrough (higher risk, high upside) concepts?
- Cross-concept patterns: What themes emerged across concepts? If consumers consistently value a specific attribute across multiple concepts, that is a category-level insight that should inform all future innovation.
The Intelligence Hub surfaces these cross-concept patterns automatically when all screening data is stored in the same system.
Portfolio composition is also where margin and operational reality enter the decision explicitly. A concept that scored 4.2 on consumer evidence but requires a new supply chain and carries a 30% margin profile may rank below a 3.6-scoring concept with existing supply and a 55% margin. Stage 4 forces the team to make that tradeoff transparently, with consumer evidence and business feasibility on the same page rather than in separate slide decks. The output is a development slate the CFO can defend to the board and the CMO can defend to the brand teams whose concepts did not advance.
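The coverage and risk-balance questions can be run as quick checks on any proposed slate before the margin and feasibility discussion. The sketch below is illustrative only: the `Concept` fields, the two risk labels, and the function name `review_slate` are assumptions, and it flags issues for discussion rather than choosing the slate.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    name: str
    segment: str  # primary consumer segment or occasion
    risk: str     # "incremental" or "breakthrough"


def review_slate(slate):
    """Flag coverage and risk-balance issues in a proposed slate."""
    segment_counts = Counter(c.segment for c in slate)
    overlapping = sorted(s for s, n in segment_counts.items() if n > 1)
    risks = {c.risk for c in slate}
    return {
        "size_ok": 3 <= len(slate) <= 4,
        "overlapping_segments": overlapping,
        "risk_balanced": {"incremental", "breakthrough"} <= risks,
    }


# Hypothetical slate: two concepts target the same occasion.
slate = [
    Concept("A", "weekday breakfast", "incremental"),
    Concept("B", "weekday breakfast", "incremental"),
    Concept("C", "post-workout", "breakthrough"),
]
report = review_slate(slate)
```

Here `report["overlapping_segments"]` contains "weekday breakfast", which is the cannibalization signal that would prompt the team to consider deferring one of the two concepts.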
Total Pipeline Screening Cost
| Stage | Per Concept | Concepts | Total |
|---|---|---|---|
| Stage 1: Quick screen | $600-$1,000 | 10-15 | $6,000-$15,000 |
| Stage 2: Deep evaluation | $2,000 | 5-7 | $10,000-$14,000 |
| Stage 3: Refinement testing | $1,000-$2,000 | 2-3 | $2,000-$6,000 |
| Total | | | $18,000-$35,000 |
Compare to traditional agency screening of the same pipeline: $250,000-$750,000 over 6-12 months. The headline number understates the operational impact: the AI-moderated framework also unlocks a parallel-fielding model in which all 10-15 concepts go to panel simultaneously, so the team has a comparable view of every concept on the same day rather than carrying early concepts in memory for three months while later concepts field. That single change — parallel rather than sequential evaluation — is often more valuable than the cost savings, because it eliminates the recency bias that quietly shapes most committee decisions.
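The pipeline total is the sum of each stage's per-concept cost multiplied by its concept count, taken at the low and high ends. The sketch below reproduces that arithmetic from the table; the names `STAGES`, `stage_range`, and `pipeline_range` are illustrative.

```python
# (stage, (low, high) cost per concept in USD, (low, high) concept count)
STAGES = [
    ("Stage 1: Quick screen", (600, 1000), (10, 15)),
    ("Stage 2: Deep evaluation", (2000, 2000), (5, 7)),
    ("Stage 3: Refinement testing", (1000, 2000), (2, 3)),
]


def stage_range(cost, concepts):
    """Return the (low, high) total for one stage."""
    return cost[0] * concepts[0], cost[1] * concepts[1]


def pipeline_range(stages=STAGES):
    """Return the (low, high) total across all stages."""
    lows, highs = zip(*(stage_range(c, n) for _, c, n in stages))
    return sum(lows), sum(highs)


print(pipeline_range())  # (18000, 35000)
```

Swapping in a team's own concept counts gives a budget range for a pipeline of a different size.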
How Does This Compare to Traditional Agency Screening?
The cost gap is the headline number, but the operational differences are what change the innovation cycle. Traditional agency screening is sequenced because each concept is a discrete engagement: recruit, schedule, moderate, transcribe, analyze, report. Running 15 concepts in parallel through a single agency is logistically impossible without doubling fees, so teams batch concepts and screen 3-4 at a time over 8-12 weeks. That sequencing forces early concepts to compete against later concepts in memory rather than on evidence — and forces development decisions before the full pipeline has been evaluated. Running all 15 concepts simultaneously on User Intuition’s 4M+ panel, in 24 hours per batch, produces a side-by-side evaluation no agency can offer.
| Dimension | AI-moderated screening | Traditional agency screening |
|---|---|---|
| Cost (15-concept pipeline) | $18,000-$35,000 | $250,000-$750,000 |
| Timeline (full pipeline) | 1-2 weeks | 6-12 months |
| Parallel concepts | All 15 simultaneous | 3-4 at a time, sequenced |
| Interviews per concept (Stage 1) | 30-50 | 8-12 |
| Moderator consistency | Identical AI logic across every interview | Variable across human moderators and sessions |
| Knowledge persistence | Searchable Intelligence Hub | Static report on a shared drive |
| Iteration speed | Re-screen refined concept in 24h | Re-engage agency, 4-8 week cycle |
| Per-concept cost (Stage 1) | $600-$1,000 | $25,000-$50,000 |
For the complete concept testing guide, see Concept Testing for CPG. Related guides (concept screening before full testing, concept test sample size, AI-moderated interviews vs. focus groups for CPG) cover the screening, sizing, and methodology questions this framework assumes are already settled.
What Goes Wrong When Teams Skip Stage 1?
The single most common failure mode in CPG innovation is collapsing Stages 1-3 into a single agency engagement that evaluates 4-5 concepts in moderate depth. The economic argument seems sound — fewer engagements, less coordination — but the strategic cost is large. When a team commits to deep evaluation on five concepts pre-selected by committee, they have already made the most important decision (which five) on the weakest evidence (internal preference). The Stage 1 quick screen exists specifically to reverse that order: let consumer evidence select the five, then commit deep evaluation budget to them.
Skipping Stage 1 also masks an asymmetry that matters at the portfolio level. In any 10-15 concept pipeline, two to three concepts will produce evidence so weak that they should never have reached deep evaluation. Without a quick screen, those concepts still consume full-evaluation budget — typically 30-40% of total spend — and crowd out the marginal concepts in Stages 2 and 3 that could have benefited from refinement. Teams that report disappointing innovation hit rates often have a Stage 1 problem, not a launch problem.
A CPG innovation pipeline is a portfolio decision dressed up as a sequence of concept decisions, and the framework that wins is the one that respects that. Stage 1 exists because committee selection is faster than evidence selection, and every concept that survives committee selection without quick-screen evidence is a bet placed against the market on the basis of internal politics. Stage 2 exists because appeal alone is not durability, and the laddering depth that separates novelty wins from value wins is where pre-launch confidence is earned. Stage 3 exists because most concepts are not killed by their core idea — they are killed by a specific barrier that targeted refinement could address. Stage 4 exists because portfolio composition is the actual development decision, and ranking individual concepts on absolute appeal often produces a slate that cannibalizes itself on launch. Run the four stages in sequence and the politics fade behind the evidence.
Running the Four Stages on User Intuition
The framework only works if every concept in the pipeline can be fielded at once and re-fielded in days. Sequence the screening and the recency bias the framework exists to defeat creeps right back in. User Intuition makes the parallel model the default: all 10-15 concepts go to panel simultaneously as separate monadic studies, each drawing verified category purchasers, with batches returning in 24 hours so the team sees every concept side-by-side on the same day rather than carrying early ideas in memory for a quarter. For product innovation screening specifically, the capability that changes the decision is iteration speed at Stage 3: a refined concept can be tested against its original within two days, so the refine bucket stops being a junk drawer and becomes a real second-chance gate. Because every screening interview is moderated by the same AI logic, with identical probing across 300-750 Stage 1 conversations, the go/kill thresholds compare cleanly, and all of it persists in one hub where cross-concept patterns surface as category-level insight for the next cycle. To see how a full pipeline screen is structured before you commit a development slate, book a demo and review a worked four-stage example.
What Should the Output of Stage 4 Look Like?
The Stage 4 deliverable is not a ranked list. It is a development slate — typically 3-4 concepts — with a written rationale that ties each slot to specific consumer evidence, portfolio role, and business case. Each entry should answer four questions: what consumer problem does this concept solve, which segment is it for, how does it complement the other concepts in the slate rather than overlap them, and what is the launch risk profile (incremental, moderate, breakthrough). Teams that produce that deliverable have built a defensible plan; teams that produce a ranked list with no portfolio logic have built a list of favorites with consumer-evidence cover.
The framework also creates an audit trail. Six months after launch, when results come in, the team can look back at Stage 1 and Stage 2 evidence for each concept and ask which signals predicted launch performance and which did not. That feedback loop is how the framework improves over time — the Intelligence Hub surfaces the patterns automatically when all screening data is stored in the same system, and the team’s next pipeline is screened against the lessons of the last one. Over three or four innovation cycles, the framework starts producing not just better individual decisions but a sharper internal model of what wins in the category.
For the full CPG innovation research framework, see Product Innovation Research Template for CPG. For agency-specific discussion-guide patterns, see agency concept testing discussion guide template. To screen your innovation pipeline with verified purchasers, launch a study or book a demo.