
How to Design an AI Interview Discussion Guide

By Kevin, Founder & CEO

The discussion guide is the single highest-leverage input in an AI interview study. A good guide unlocks 5-7 levels of probing depth. A bad guide constrains the AI to survey-level responses regardless of the platform’s capability. The difference between an excellent guide and a mediocre one is rarely the topics themselves — it is whether the structure invites depth or forecloses on it.

On modern AI-moderated interview platforms like User Intuition, the moderator has the capacity to ladder, probe, and follow unexpected threads at a depth that exceeds most human moderators. But that capacity only activates if the guide gives it room. This reference covers the design principles, structural framework, and templates for discussion guides that get the most out of AI-moderated interviews, drawing on patterns we have seen across thousands of studies.

Core Design Principles


1. Start Behavioral, Not Attitudinal

Wrong: “How do you feel about our onboarding process?”

Right: “Walk me through your first week using the product. What happened?”

Behavioral questions ground the conversation in specific experiences. The AI moderator can then probe the emotional and attitudinal layers — but it needs a concrete foundation to ladder from. Attitudinal openings invite generic responses (“it was fine,” “pretty good”) that the AI then has to break open with secondary probes. Behavioral openings deliver concrete material in the first answer, which the AI uses immediately for laddering depth.

The behavioral framing also has a second-order effect: it changes the participant’s mental mode. Asked an attitudinal question, participants slip into evaluation mode and produce summary judgments. Asked a behavioral question, they shift into recall mode and reconstruct events — and recall mode is where the data actually lives.

2. Limit Topics to Enable Depth

A 30-minute AI interview supports 3-5 topics with genuine laddering depth. Each topic needs 5-10 minutes for the AI to reach levels 5-7. Attempting to cover 10 topics produces 3-minute exchanges that never get past level 2 — which is the depth a well-designed survey can reach in 5 minutes for a fraction of the cost.

The math is simple. A high-fidelity probe sequence is: opening question (level 1), clarifying probe (level 2), specific-instance probe (level 3), motivational probe (level 4), counterfactual probe (level 5), reframe probe (level 6), implication probe (level 7). Each step takes roughly a minute of dialogue. Seven levels in seven minutes. A topic that gets less than five minutes of conversational space cannot reach the levels where insight differentiates from observation.
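The arithmetic above can be made concrete with a short sketch. It assumes, as the paragraph does, roughly one minute of dialogue per probe level and an even split of interview time across topics; the function names and constants are illustrative and not part of any platform API.

```python
# Illustrative sketch of the depth-budget arithmetic described above.
# Assumption (from the text): each probe level takes roughly one minute.
MINUTES_PER_LEVEL = 1.0
MAX_LEVEL = 7  # opening question (level 1) through implication probe (level 7)


def reachable_level(topic_minutes: float) -> int:
    """Upper bound on the probe level a topic can reach in the time given."""
    return min(MAX_LEVEL, int(topic_minutes // MINUTES_PER_LEVEL))


def minutes_per_topic(interview_minutes: float, topic_count: int) -> float:
    """Even split of interview time across topics."""
    if topic_count <= 0:
        raise ValueError("topic_count must be positive")
    return interview_minutes / topic_count


for topics in (3, 5, 10):
    per_topic = minutes_per_topic(30, topics)
    print(topics, "topics:", per_topic, "min each, level", reachable_level(per_topic), "at best")
```

Under these assumptions a 30-minute interview with three topics can reach level 7 on each, while ten topics leave three minutes apiece and cap out at level 3 at best. That figure is an upper bound; transitions and slower answers eat into it in practice, which is consistent with the shallower depth described above for ten-topic guides.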

3. Design Opening Questions as Launching Pads

The AI will generate its own follow-up probes. Your opening questions should create conditions for exploration:

  • Open-ended — no yes/no answers possible
  • Specific — grounded in a moment, decision, or experience
  • Non-leading — genuinely curious, not confirmatory
  • Cognitively single — one idea per question, not three stacked together
  • Time-anchored — when possible, “last week” or “the most recent time” rather than “in general”

The best opening questions feel almost too simple to a researcher trained on dense survey instruments. “Walk me through what happened” is a complete and excellent opening for most topics. The AI handles the structure that more elaborate openings try to bake in.

4. Include Permission-Giving Language

Participants give deeper, more honest responses when they feel safe. Include prompts like:

  • “There are no wrong answers — I’m genuinely interested in your experience”
  • “Feel free to share anything, including criticism”
  • “If something didn’t work, that’s the most useful thing you can tell me”

This language is particularly important in studies where social-desirability bias is a known risk — pricing perception, churn diagnosis, competitive evaluation. Participants in these studies often soften their actual views to be polite. Explicit permission to be critical, repeated at the start of relevant topics, measurably increases the volume of disconfirming evidence the study surfaces. For more on how moderator framing shapes response quality, see our moderator bias guide.

How Does an AI-Moderated Guide Differ From a Human-Moderated Guide?


The two guides serve different operational realities. A human moderator can improvise, read body language in real time, and shift the entire guide on the fly based on what they hear in the first five minutes. An AI moderator is precise, consistent, and tireless — but it works from the framing you give it. The guide is the primary lever you have to direct AI behavior across thousands of interviews.

The table below summarizes the practical differences.

| Design Dimension | Human-Moderated Guide | AI-Moderated Guide |
| --- | --- | --- |
| Topic count | 5-8 typical, moderator adjusts live | 3-5 hard ceiling for depth |
| Opening question | Brief; moderator clarifies on the fly | Precise; the AI follows the framing you give |
| Probe scripting | Optional; moderator improvises | Prohibited; AI handles probes dynamically |
| Must-hit questions | 5-10 typical | 2-3 maximum |
| Permission language | Conveyed through tone | Must be written into the guide explicitly |
| Hypothetical framing | Sometimes useful; moderator can redirect | Avoid; produces speculative data |
| Conditional branches | “If X, ask Y” common | Unnecessary; AI branches on its own |
| Cross-interview consistency | Variable across moderators | Identical across interviews |
| Length | 45-60 min standard | 25-35 min ideal |
| Cost driver | Moderator time | Sample size |

The biggest design shift teams make when moving from human to AI moderation is removing the elaborate conditional-branch scripting that compensated for moderator inconsistency. The AI doesn’t need a flowchart of “if the participant mentions X, ask Y.” It probes the right thread without prompting, and over-scripting actively prevents it from doing so. This is also why voice, video, and chat modalities all work from the same guide structure — the modality affects participant experience, but the guide architecture is shared.

Template: Churn Discussion Guide


Study objective: Understand the decision architecture behind customer churn — not just that they left, but the sequence of events, emotions, and alternatives that drove the decision.

Opening framing: “I’m interested in your experience and the decisions you made. There are no wrong answers, and the most useful thing you can tell me is what actually happened — including the parts that didn’t go well.”

Topic 1: The Decision Journey (10 min)

  • “Walk me through the journey from when you first started considering leaving to when you made the decision.”

Topic 2: Attempted Resolution (8 min)

  • “Before you decided to leave, what did you try to make it work? What happened with those attempts?”

Topic 3: The Alternative (7 min)

  • “What were you hoping would be different about the alternative you chose?”

Topic 4: Retrospective (5 min)

  • “Looking back, was there a point where you felt the relationship was still salvageable?”

The structural logic: a churn guide should open with the timeline of the decision rather than asking the participant to evaluate the product directly. Direct product evaluation triggers defensiveness (“you’re asking me to justify leaving”) and surfaces post-hoc rationalization rather than the actual decision sequence. Timeline openings produce narrative responses that the AI can probe for the inflection points where the decision actually moved.
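One way to keep a template like this honest is to transcribe it as data and check it against the budgets stated earlier in this guide (25-35 minutes total, 3-5 topics, at least five minutes per topic). The sketch below is a minimal illustration; the `Topic` class and `check_guide` function are hypothetical names, not a platform feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topic:
    name: str
    minutes: int
    opening_question: str


# The churn template above, transcribed as data.
CHURN_GUIDE = [
    Topic("The Decision Journey", 10,
          "Walk me through the journey from when you first started "
          "considering leaving to when you made the decision."),
    Topic("Attempted Resolution", 8,
          "Before you decided to leave, what did you try to make it work? "
          "What happened with those attempts?"),
    Topic("The Alternative", 7,
          "What were you hoping would be different about the alternative "
          "you chose?"),
    Topic("Retrospective", 5,
          "Looking back, was there a point where you felt the relationship "
          "was still salvageable?"),
]


def check_guide(topics, total_range=(25, 35), topic_range=(3, 5), min_topic_minutes=5):
    """Return a list of budget problems; an empty list means the guide fits."""
    problems = []
    total = sum(t.minutes for t in topics)
    if not total_range[0] <= total <= total_range[1]:
        problems.append(f"total time {total} min is outside {total_range[0]}-{total_range[1]} min")
    if not topic_range[0] <= len(topics) <= topic_range[1]:
        problems.append(f"{len(topics)} topics is outside {topic_range[0]}-{topic_range[1]}")
    for t in topics:
        if t.minutes < min_topic_minutes:
            problems.append(f"'{t.name}' has {t.minutes} min, below {min_topic_minutes}")
    return problems


print(check_guide(CHURN_GUIDE))  # prints [] because the template fits every budget
```

The same check applies unchanged to the win-loss and concept-testing templates, since both also allocate 30 minutes across four topics.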

Template: Win-Loss Discussion Guide


Topic 1: Trigger and Context (10 min)

  • “Take me back to the beginning — what triggered the search for a solution?”

Topic 2: Evaluation Process (8 min)

  • “Walk me through how you narrowed from many options to your final two or three.”

Topic 3: Decision Factors (7 min)

  • “What was the single most important factor in your final decision, and why did it matter so much?”

Topic 4: Retrospective (5 min)

  • “Looking back, how does the reality compare to what you expected?”

The structural logic: win-loss guides need explicit questions about the alternatives evaluated and the criteria used. These don’t emerge naturally unless prompted, because participants reconstruct the decision around their final choice rather than around the option space they were actually considering. Without a direct prompt for the evaluation process, the data skews toward justification of the winner rather than understanding of the deliberation.

Template: Concept Testing Discussion Guide


Topic 1: Immediate Reaction (8 min)

  • “When you first saw this, what was your gut reaction — before you started thinking analytically?”

Topic 2: Relevance and Fit (8 min)

  • “Who in your organization would this be most useful for, and why them specifically?”

Topic 3: Barriers and Concerns (8 min)

  • “What would need to be true about your situation for you to actually purchase this?”

Topic 4: Competitive Frame (6 min)

  • “What does this remind you of from past experience — positively or negatively?”

The structural logic: concept testing requires showing or describing the concept before any evaluation. Then separate believability probes (does this seem real and credible?) from desirability probes (would this be valuable if it were real?). The two predict different things and respond to different interventions. A concept can be desirable but not believable, or believable but not desirable, and the implications for product or message development are completely different.

What Are the Most Common Mistakes in AI Discussion Guide Design?


Overscripting probes. Don’t write “If participant mentions X, ask Y.” The AI handles this dynamically and often pursues better threads than a pre-scripted guide anticipates. Pre-scripted probes also create rigidity: the AI tends to ask the scripted question even when the participant has already addressed it implicitly, which makes the interview feel transactional rather than exploratory.

Too many must-hit questions. Mandatory questions reduce the AI’s ability to follow unexpected insights. Limit must-hits to 2-3 per study. Each additional must-hit consumes roughly two minutes of interview time that could otherwise have been spent on emergent threads.

Hypothetical framing. “Would you consider…” produces hypothetical answers. “When was the last time you…” produces experiential data. Hypothetical framing is a habit researchers absorb from survey design, where it works because surveys aren’t trying to surface depth. In qualitative interviews it short-circuits the entire mechanism.

Stacked questions. “Tell me about your onboarding experience, especially the first week, and what you wish had been different.” Three questions in one. The participant picks one to answer and the others get lost. Each question in the guide should contain exactly one ask.

Leading framing. “Most customers find our pricing fair — what’s your experience?” The first clause anchors the response. Even participants who would have said otherwise will soften their answer. Leading questions narrow the response space and prevent the AI from surfacing perspectives that contradict team assumptions.

Vague topic boundaries. Topics need clear scope so the AI knows when to transition. “Talk to me about your overall experience” is not a topic — it is the entire interview. Specific, scoped topics (“the onboarding process,” “the moment you decided to renew”) give the AI clean transition points.
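Several of these mistakes leave surface traces in the wording, so a rough first-pass check can be automated before a human review. The sketch below is a heuristic only: the phrase lists and the `lint_question` name are assumptions made for illustration, and simple pattern matching will miss subtle cases and over-flag some acceptable questions.

```python
import re

# Heuristic patterns only; these phrase lists are illustrative assumptions,
# not an exhaustive or validated taxonomy.
HYPOTHETICAL = re.compile(r"\b(would you|could you imagine|if you were)\b", re.I)
LEADING = re.compile(
    r"\b(most (customers|people|users)|don't you think|wouldn't you agree)\b", re.I
)
YES_NO_OPENER = re.compile(r"^(do|does|did|is|are|was|were|have|has|would|could|can)\b", re.I)
STACKED_JOIN = re.compile(r",?\s+and (what|how|why)\b", re.I)


def lint_question(question: str) -> list[str]:
    """Flag likely guide-design mistakes in a single opening question."""
    text = question.strip().strip("“”\"'")
    flags = []
    if HYPOTHETICAL.search(text):
        flags.append("hypothetical framing")
    if LEADING.search(text):
        flags.append("leading framing")
    if YES_NO_OPENER.match(text):
        flags.append("yes/no opener")
    # More than one question mark, or an "and what/how/why" join, suggests stacking.
    if text.count("?") > 1 or STACKED_JOIN.search(text):
        flags.append("stacked question")
    return flags


print(lint_question("Would you consider switching providers?"))
print(lint_question("Walk me through your first week using the product. What happened?"))
```

Treat each flag as a prompt to reread the question rather than as a verdict. A narrative opening followed by a short second question, for example, will trip the stacking check even when the two parts work as a single ask.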

How Does User Intuition Handle Discussion Guide Execution?


User Intuition’s AI moderator uses the discussion guide as a launching structure rather than a fixed script. It follows the respondent’s thread when they surface something significant, probes for specificity when answers are vague, and returns to uncovered guide topics as the conversation warrants. The output is a structured transcript with thematic tags mapped to the guide’s topic areas, which makes cross-interview synthesis faster because findings are already organized by research objective rather than requiring manual coding from unstructured transcripts.

The discussion guide is the contract between the researcher and the AI moderator. Everything that happens in the interview flows downstream from how the guide is written. A guide that gives the AI room to probe will produce levels 5-7 of conversational depth across hundreds of interviews simultaneously, in any of 50+ languages, with the same calibration in every session. A guide that overscripts will produce shallow data at scale, which is worse than shallow data at small scale, because the shallow data is now backed by enough sample size to feel authoritative.

Discussion guide design is the place where qualitative research craft most directly meets AI capability. Get the guide right and the platform does its best work. Get it wrong and the platform amplifies the wrong instructions consistently across every interview. The investment in guide design pays back across the entire study and across every future study that uses the same structural template.

For complete methodology behind AI interview depth, see the pillar guide and laddering methodology deep-dive. For how AI interviews compare with surveys at the methodological level, see our AI interview vs. survey guide. For sample size planning based on the guide structure, see our qualitative research sample size guide.

Iterating the Guide After Pilot Interviews


A discussion guide is not a static document. The strongest research practice we have observed is treating the guide as a hypothesis about how to surface insight and then revising it after a pilot wave of 5-10 interviews. The pilot review should answer three questions: Did the AI consistently reach levels 5-7 on each topic? Did any topic over-run or under-run its time budget? Did emergent threads surface that should become explicit topics in the next wave?
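The first two pilot questions (depth reached and time-budget fit) lend themselves to a simple tabulation; the third, spotting emergent threads, still needs a researcher reading the transcripts. The sketch below assumes a hypothetical record shape, `TopicRun`, with one row per topic per pilot interview, and an assumed 25% tolerance for what counts as over- or under-running. Neither comes from a real export format.

```python
from dataclasses import dataclass
from statistics import median


@dataclass(frozen=True)
class TopicRun:
    """One topic within one pilot interview (hypothetical record shape)."""
    topic: str
    depth_reached: int       # deepest probe level observed, 1-7
    minutes_used: float
    minutes_budgeted: float


def review_pilot(runs, target_depth=5, tolerance=0.25):
    """Summarize depth and time-budget fit per topic across pilot interviews."""
    by_topic = {}
    for run in runs:
        by_topic.setdefault(run.topic, []).append(run)

    report = {}
    for topic, topic_runs in by_topic.items():
        budget = topic_runs[0].minutes_budgeted
        typical_minutes = median(r.minutes_used for r in topic_runs)
        # Share of pilot interviews that reached the target depth on this topic.
        hit_rate = sum(r.depth_reached >= target_depth for r in topic_runs) / len(topic_runs)
        if typical_minutes > budget * (1 + tolerance):
            timing = "over-run"
        elif typical_minutes < budget * (1 - tolerance):
            timing = "under-run"
        else:
            timing = "on budget"
        report[topic] = {"depth_hit_rate": hit_rate, "timing": timing}
    return report
```

A topic with a low depth hit rate or a consistent over-run is a candidate for a narrower opening question; a consistent under-run suggests compressing the slot or deepening the opening.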

Pilot revisions typically fall into a handful of patterns. Topics that consistently under-run get either compressed into fewer slots or expanded with a deeper opening question. Topics that over-run usually have an opening question that’s too broad — narrowing the framing or splitting the topic into two cleaner sub-topics tightens the dialogue. Threads that emerged across multiple pilot interviews and weren’t covered by the original guide become candidate topics for the main wave, often replacing a topic that’s producing thin data. This iterative approach is one of the operational advantages AI moderation makes practical — pilot waves cost $100-$200 rather than $7,500-$15,000, so iterating is genuinely affordable.

The pilot review is also when the must-hit list gets finalized. After reviewing how the AI handled the 5-10 pilot interviews, it becomes clear which specific questions need guaranteed coverage and which the AI surfaces naturally without being prompted. Most must-hits that look essential at the planning stage turn out to be redundant in practice. Cutting the must-hit list from 5-6 down to 2-3 after the pilot is the most common single revision.

Guide Structure for Cross-Segment Studies


When a study spans multiple segments — enterprise vs. SMB, US vs. UK, current customer vs. churned — the guide should be identical across segments, with one exception: the opening framing can be lightly adjusted for context (e.g., “as someone who recently switched away from our product” for churned-customer interviews). Beyond that, the topics, questions, time allocations, and probe scaffolding should be identical.

The reason is comparability. Cross-segment analysis depends on the segments having been asked the same questions in the same order with the same framing. If the enterprise guide has six topics and the SMB guide has four, cross-segment patterns become unreadable — the differences in the findings could be real or could be artifacts of the different guides. Identical guides plus segment-defined recruitment is the cleanest path to comparability.
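If segment guides are maintained as separate documents, a quick equality check before launch can catch drift. The sketch below is an assumption-laden illustration: it represents each guide as a list of (topic name, minutes, opening question) tuples and compares every segment against the first one listed. The `guide_mismatches` name and the tuple layout are hypothetical.

```python
from typing import Dict, List, Tuple

TopicSpec = Tuple[str, int, str]  # (topic name, minutes, opening question)


def guide_mismatches(guides_by_segment: Dict[str, List[TopicSpec]]) -> Dict[str, List[str]]:
    """Compare every segment's guide to the first segment's guide.

    Returns a mapping of segment name to a list of differences; an empty
    mapping means all guides match in topics, order, timing, and wording.
    """
    segments = list(guides_by_segment)
    if not segments:
        return {}
    reference_name = segments[0]
    reference = guides_by_segment[reference_name]
    mismatches: Dict[str, List[str]] = {}
    for name in segments[1:]:
        guide = guides_by_segment[name]
        diffs = []
        if len(guide) != len(reference):
            diffs.append(f"{len(guide)} topics vs {len(reference)} in {reference_name}")
        # Position-by-position comparison also catches reordered topics.
        for position, (ours, theirs) in enumerate(zip(guide, reference), start=1):
            if ours != theirs:
                diffs.append(f"topic {position} differs from {reference_name}")
        if diffs:
            mismatches[name] = diffs
    return mismatches
```

The opening framing is deliberately left out of the comparison, since that is the one element the guide allows to vary by segment.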

The one operational decision worth making explicitly is whether segments share a single study or run as parallel studies. Sharing one study means the AI’s calibration is identical and the synthesis runs natively. Parallel studies allow segment-specific opening framing but require post-hoc reconciliation across studies. For most cross-segment work, a single study with identical guide and segment-tagged recruitment is the right default.

Note from the User Intuition Team

Your research informs million-dollar decisions — we built User Intuition so you never have to choose between rigor and affordability. We price at $20/interview not because the research is worth less, but because we want to enable you to run studies continuously, not once a year. Ongoing research compounds into a competitive moat that episodic studies can never build.


Frequently Asked Questions

AI discussion guides require more precise opening questions because the AI cannot ask spontaneous clarifying questions the way a skilled human moderator might. The opening question needs to surface enough behavioral specificity on its own to give the AI a clear thread to follow. Guides should limit topics to 3-5 maximum - more than that spreads the conversation too thin for the AI to probe deeply on any single area - and each topic should open with a behavioral prompt (what happened) rather than an attitudinal one (how do you feel about).

The most common mistake is designing for breadth rather than depth - writing 10-15 questions covering every aspect of the customer experience rather than 3-5 topic anchors with room for the AI to follow the respondent's thread. A second common mistake is writing leading questions that suggest the desired answer, which narrows the response space and prevents the AI from surfacing perspectives that contradict the team's assumptions. A third mistake is using attitudinal openings ('how satisfied are you with X?') that invite ratings rather than behavioral openings that invite stories.

Churn guides should open with the timeline of the customer's decision to leave rather than asking them to evaluate the product directly, which reduces defensiveness and surfaces the actual sequence of events. Win-loss guides need explicit questions about alternatives evaluated and evaluation criteria, which don't emerge naturally unless specifically prompted. Concept testing guides require showing or describing the concept before any evaluation questions, then separating believability probes from desirability probes because they predict different things and respond to different interventions.

User Intuition's AI moderator uses the discussion guide as a launching structure rather than a fixed script - it follows the respondent's thread when they surface something significant, probes for specificity when answers are vague, and returns to uncovered guide topics as the conversation warrants. Output is a structured transcript with thematic tags mapped to the guide's topic areas, which makes cross-interview synthesis faster because findings are already organized by research objective rather than requiring manual coding from unstructured transcripts.