Reference calls are the most widely accepted form of customer evidence in private equity due diligence. They are also, by any rigorous standard, indefensible.
The mechanics are familiar to every deal team. The target company provides a list of 3-5 customers. An associate or operating partner schedules 15-20 minute calls. The customers — selected by the very management team seeking a premium valuation — deliver polished, enthusiastic endorsements. The deal team checks the box. Customer diligence: complete.
This is not diligence. It is theater with a script written by the seller.
The reference call model persists not because it produces reliable evidence, but because it is easy, fast, cheap, and familiar. It fits neatly into existing workflows. It generates quotable soundbites for investment committee memos. And it almost never surfaces the kind of disconfirming evidence that might complicate a thesis or slow a process.
This post dissects why reference calls fail as diligence instruments, quantifies the cost of that failure, and lays out the alternative — one that delivers genuinely independent customer evidence without adding weeks to your timeline.
For the full framework on customer research across the PE deal lifecycle, see the complete guide to customer research for private equity.
The Selection Bias Problem: 0.25% of the Truth
Start with the arithmetic.
A typical mid-market SaaS company has 1,500 to 3,000 customers. The target provides 5 references. That is a 0.17% to 0.33% sample — call it 0.25% to keep the math clean.
No other form of diligence operates on 0.25% of the available data. Financial diligence examines 100% of the revenue. Legal diligence reviews 100% of the material contracts. Technical diligence assesses the entire codebase. But customer diligence — the exercise that is supposed to validate whether real human beings will continue paying for the product — relies on a fraction of a percent of the customer base.
That alone should give pause. But the problem is worse than sample size.
The 0.25% is not randomly selected. It is curated by the party with the strongest possible incentive to present favorable results. The target company’s management team — the same people negotiating for a higher multiple — hand-picks which customers the deal team will speak with.
Think about what this means in practice. No CEO provides a reference from:
- The enterprise account that did not renew last quarter
- The customer who just signed a pilot with a competitor
- The mid-market segment where NPS has dropped 15 points in the last year
- The account where the champion left and the new decision-maker is running an RFP
- The customer who renewed only because switching costs made leaving impractical
Management selects their happiest, most articulate, most loyal advocates. Often these are customers with personal relationships to the CEO or sales leadership. Sometimes they are customers who have received concessions — preferential pricing, dedicated support, custom features — that the broader base does not enjoy and that will not scale post-acquisition.
The result is a sample that is both absurdly small and systematically biased in a single direction. It is the equivalent of evaluating a restaurant by asking the chef’s mother what she thinks of the food.
The Data: 30-40% Satisfaction Inflation
Selection bias is not an abstract concern. It is measurable.
When you compare satisfaction metrics from reference calls against independently recruited interviews for the same company, the gap is consistent and large. Reference call satisfaction scores run 30-40% higher than those from independent customer research.
This is not a subtle discrepancy. A 30-40% inflation in satisfaction is the difference between a company that customers love and a company that customers tolerate. It is the difference between a thesis that holds and a thesis that collapses 18 months into the hold period.
Here is what the gap looks like across common diligence metrics:
Net Promoter Score. Reference calls for a given company might suggest an NPS of 45 — solidly in “excellent” territory. Independent interviews with 50+ customers from the same base typically reveal an NPS closer to 25. Still positive, but a fundamentally different story about customer loyalty and advocacy.
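For readers less familiar with the mechanics behind these scores, the NPS arithmetic is simple: the percentage of promoters (ratings of 9-10) minus the percentage of detractors (ratings of 0-6), with passives (7-8) counted in the denominator but not the numerator. A minimal sketch in Python:

```python
def nps(ratings: list[int]) -> float:
    """Net Promoter Score: % promoters (9-10) minus % detractors (0-6).

    Passives (7-8) dilute the score but do not add or subtract directly.
    """
    promoters = sum(r >= 9 for r in ratings)
    detractors = sum(r <= 6 for r in ratings)
    return 100 * (promoters - detractors) / len(ratings)

# Hypothetical sample: 5 promoters, 3 passives, 2 detractors -> NPS of 30
print(nps([10, 10, 9, 9, 9, 8, 8, 7, 5, 4]))  # 30.0
```

The structure of the formula is why curation distorts it so badly: excluding even a handful of detractors from a 5-person reference list can swing the score by dozens of points.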
Renewal intent. Reference customers almost universally report high renewal likelihood — 90%+ say they plan to renew. Independent interviews consistently show renewal intent 15-25 percentage points lower, with meaningful segments expressing conditional renewal (“depends on pricing,” “evaluating alternatives,” “need to see improvement in X”).
Competitive vulnerability. In reference calls, customers rarely mention competitors in a serious context. In independent interviews, 30-50% of customers report active awareness of or evaluation of alternatives. The reference call model structurally excludes the customers most likely to switch.
Willingness to pay. Reference customers — often the most loyal and longest-tenured — tend to express higher price tolerance. Independent interviews reveal pricing sensitivity across newer, smaller, or less-engaged segments that reference calls never surface.
The pattern is consistent regardless of industry, company size, or product category. Reference calls produce a systematically rosier picture because they are designed to. That is their function. The question is whether you want your diligence to confirm the seller’s narrative or test it.
For real-world examples of what happens when customer evidence is wrong, see commercial due diligence failures that cost PE firms millions.
Why Do Smart Deal Teams Still Rely on Reference Calls?
If reference calls are so obviously flawed, why do sophisticated investors continue to use them?
The answer is not ignorance. Most deal professionals understand, at least intuitively, that 5 curated conversations are not a representative sample. They continue using reference calls for structural and behavioral reasons that are worth examining honestly.
They are easy
Reference calls require no recruitment infrastructure, no panel access, no research methodology. The target provides a list. An associate makes calls. The information flows through existing workflows without requiring new vendors, new processes, or new approvals.
They are fast
Scheduling 5 calls takes days, not weeks. In a competitive process where speed is currency, the appeal of a customer diligence workstream that can close in a week is powerful. The perceived alternative — a multi-week consulting engagement — feels incompatible with deal timelines.
They are cheap
Five phone calls cost nothing beyond internal team time. Commissioning independent customer research from a traditional consulting firm might cost $50,000-$150,000 and take 4-8 weeks. For a deal team under fee pressure, the cost-benefit calculation has historically favored reference calls.
They are familiar
Reference calls have been the standard for decades. Every associate knows how to run them. Every investment committee expects to see them in the deck. Proposing an alternative means challenging institutional convention, which carries career risk that maintaining the status quo does not.
They produce comfortable answers
This is the most honest reason, and the least discussed. Reference calls almost always confirm the thesis. They generate quotable praise that reads well in IC memos. They rarely produce the kind of disconfirming evidence that forces difficult conversations about valuation, deal structure, or walk-away decisions.
For a deal team that has already spent months on a process, invested significant resources, and developed conviction around a thesis, reference calls function as confirmation bias with an institutional veneer.
None of these reasons are illegitimate on their own terms. They reflect rational responses to real constraints — time pressure, budget limitations, process momentum. The problem is that they prioritize process convenience over diligence rigor at exactly the moment when rigor matters most.
The Math: What Satisfaction Inflation Costs Your Bid
The gap between reference call data and independent customer evidence is not an academic concern. It translates directly into financial assumptions that drive your bid price.
Walk through a simplified but illustrative example.
The setup: You are evaluating a $200M revenue SaaS company. Your model assumes 90% gross revenue retention (GRR) based partly on customer sentiment from reference calls. The deal is priced at 12x EBITDA.
What reference calls told you: NPS of 45. Renewal intent above 90%. Minimal competitive vulnerability. Strong willingness to pay. Your retention model projects stable, compounding revenue.
What independent research reveals: NPS of 25. Renewal intent closer to 75% with meaningful at-risk segments. 35% of customers actively aware of competitors. Pricing pressure in the mid-market segment that represents 40% of revenue.
The retention impact: If actual GRR is 82% instead of 90%, the difference compounds aggressively. Over a 5-year hold:
- At 90% GRR: $200M in Year 1 revenue retains $131M by Year 5 (before new bookings)
- At 82% GRR: $200M in Year 1 revenue retains roughly $90M by Year 5
That is a roughly $41M gap in retained revenue by Year 5 — from an 8-point difference in gross retention that reference calls masked.
The valuation impact: If exit multiples hold constant at 12x and your retained revenue base is roughly $41M lower than projected, the exit value erosion is measured in hundreds of millions. Factor in the compounding effect on expansion revenue (happy customers expand; at-risk customers do not), and the total value destruction from inaccurate customer evidence grows further.
The bid price implication: If you bid 12x on a thesis that assumes 90% GRR, and the real number is 82%, you overpaid. The premium you gave the seller was not for quality of business. It was for quality of curation.
This is not a worst-case scenario. It is the median outcome when deal teams rely on 5 curated conversations instead of 50+ independent interviews. The 30-40% satisfaction inflation documented across hundreds of deals translates into retention assumptions that are reliably too optimistic by 5-10 percentage points — enough to meaningfully alter the economics of a leveraged transaction.
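The compounding arithmetic above is easy to reproduce and worth sanity-checking in your own model. A quick sketch (Python purely for illustration; four renewal cycles take Year 1 revenue to Year 5):

```python
def retained_revenue(base: float, grr: float, years: int) -> float:
    """Revenue retained from the Year 1 base after (years - 1) renewal cycles
    at a constant gross revenue retention (GRR) rate."""
    return base * grr ** (years - 1)

for grr in (0.90, 0.82):
    kept = retained_revenue(200.0, grr, years=5)
    print(f"GRR {grr:.0%}: ${kept:.1f}M retained by Year 5")
# GRR 90%: $131.2M retained by Year 5
# GRR 82%: $90.4M retained by Year 5
```

Because retention compounds every year of the hold, a single-digit error in the GRR assumption grows into a double-digit percentage error in the Year 5 revenue base.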
The Alternative: Independent Recruitment at Scale
The solution to reference call bias is not better reference calls. It is a fundamentally different approach to sourcing customer evidence.
Independent customer diligence operates on three principles that reference calls violate:
1. Independent recruitment
Customers are sourced from a 4M+ research panel — not from a list provided by the target. Multi-layer screening verifies actual product usage, employment at the target’s customer companies, and decision-making authority. The target company has no involvement in participant selection, no knowledge of which customers participate, and no opportunity to coach responses.
This is the single most important difference. When management cannot curate the sample, the selection bias that inflates reference call scores disappears. You hear from the full spectrum of the customer experience — the enthusiasts, the satisfied-but-passive, the frustrated, and the actively-evaluating-alternatives.
2. Statistical significance
Fifty or more interviews provide enough volume to segment by meaningful dimensions: customer size (enterprise vs. mid-market vs. SMB), tenure (new vs. long-standing), industry vertical, use case, and satisfaction level. You can identify whether a retention problem is concentrated in a specific segment or distributed across the base. You can distinguish between a company with one unhappy cohort and a company with a systemic issue.
With 5 reference calls, segmentation is impossible. One enterprise customer and one SMB customer do not constitute a segment analysis. They constitute two anecdotes.
3. Structured methodology
Each interview runs 30+ minutes with 5-7 levels of probing depth. AI moderation follows the customer’s responses in real time, pursuing the threads that matter rather than running through a checklist. When a customer says “we renewed, but…” the conversation goes deeper into the “but.” When a customer mentions evaluating a competitor, the interview explores why, what triggered the evaluation, and how far along it has progressed.
Reference calls, by contrast, tend toward surface-level Q&A. The format — 15 minutes, phone call, associate-led — is not designed for depth. And candor suffers from the social dynamics: the customer knows that the company under discussion is the one that arranged the call.
For the specific questions that drive these interviews, see 50 customer due diligence questions for PE.
Speed: 72 Hours, Not 6 Weeks
The historical objection to independent customer research was timeline. Traditional consulting firms take 4-8 weeks to recruit 15-20 customers, conduct interviews, and deliver a report. In a competitive auction, that timeline is a non-starter.
AI-moderated interviews eliminate this constraint.
Recruitment: Independent recruitment from a 4M+ panel takes 24-48 hours. Screening is automated, with multi-layer verification running in parallel across hundreds of potential participants. There is no cold outreach, no LinkedIn InMail chains, no waiting for callbacks.
Interviews: AI-moderated conversations run asynchronously — participants complete them on their own schedule, at their own pace. There is no calendar coordination, no timezone juggling, no rescheduling. Fifty interviews can run simultaneously rather than sequentially.
Synthesis: AI-driven analysis produces structured outputs — thematic clustering, sentiment scoring, segment-level breakdowns, direct comparison against investment thesis assumptions — within hours of interview completion. There is no two-week analyst sprint to produce a PowerPoint.
Total timeline: 48-72 hours from engagement to synthesized findings. That is faster than most deal teams can schedule and complete 5 reference calls.
The speed objection is not just outdated. It is inverted. Independent customer diligence is now faster than the reference call model it replaces.
How to Transition: A Practical Framework
Moving from reference calls to independent customer diligence does not require a wholesale process overhaul. It requires three adjustments.
Step 1: Reframe the ask
Stop requesting “customer references” from the target. Instead, request a complete customer list (or a representative subset) and permission to recruit independently. Frame it as standard diligence practice — because it should be.
Some deal teams worry that requesting independent access will signal distrust or create friction with the seller. In practice, the opposite is true. A target that resists independent customer research is telling you something important about what that research might find.
Step 2: Run both in parallel (initially)
For your first 2-3 deals, run reference calls alongside independent interviews. This serves two purposes. First, it lets you calibrate the satisfaction gap for your specific deal context — is it 30%? 40%? Higher? Second, it builds an internal evidence base for retiring reference calls entirely.
The comparison is consistently striking. Deal teams who see the gap between curated and independent customer evidence once rarely go back to relying on reference calls alone.
Step 3: Integrate findings into the financial model
Independent customer evidence should feed directly into your retention assumptions, pricing power projections, and competitive vulnerability assessment. Build a standard template that maps customer research outputs to specific line items in the financial model. When NPS is 25 instead of 45, what does that mean for your Year 3 GRR assumption? When 35% of customers are evaluating competitors, what does that imply for the churn rate you are modeling?
The goal is not to add another qualitative section to the IC memo. It is to make customer evidence quantitatively rigorous enough to adjust the bid price.
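One way to make that mapping concrete is a small translation function that haircuts the modeled GRR for each evidence gap. Every coefficient below is an illustrative placeholder to be calibrated against your own portfolio's history, not an empirical estimate:

```python
def adjusted_grr(modeled_grr: float, nps_shortfall: float, share_evaluating: float) -> float:
    """Haircut a modeled gross revenue retention assumption using customer evidence.

    modeled_grr:      GRR assumed in the base-case model (e.g. 0.90)
    nps_shortfall:    modeled NPS minus the NPS found in independent interviews
    share_evaluating: fraction of interviewed customers actively evaluating competitors

    The weights (0.002 per NPS point, 0.10 per unit of competitive evaluation)
    are placeholders for illustration only.
    """
    haircut = 0.002 * max(nps_shortfall, 0.0) + 0.10 * share_evaluating
    return max(modeled_grr - haircut, 0.0)

# Hypothetical inputs: modeled NPS 45 vs. 25 observed, 35% evaluating competitors
print(f"Adjusted GRR: {adjusted_grr(0.90, 45 - 25, 0.35):.1%}")
```

The point is not these particular weights; it is that a standing, written-down function forces the team to state — before the next deal — exactly how customer evidence will move the numbers, instead of negotiating the adjustment ad hoc inside the IC room.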
What Does Good Customer Diligence Actually Look Like?
To make the contrast concrete, here is what a commercial due diligence customer research workstream looks like when it is done properly:
Day 1-2: Independent recruitment from a 4M+ panel. Multi-layer screening identifies 60-80 verified customers across key segments (enterprise, mid-market, SMB; tenured and new; various industry verticals). No involvement from the target company.
Day 2-3: AI-moderated interviews. Each conversation runs 30+ minutes with 5-7 levels of probing depth. Participants complete interviews asynchronously. 50+ conversations run in parallel.
Day 3-4: Automated synthesis. Thematic analysis clusters findings around investment thesis dimensions: retention risk, competitive positioning, pricing power, product satisfaction, expansion potential. Segment-level breakdowns reveal whether patterns are uniform or concentrated.
Day 4: Delivery. Structured findings mapped to thesis assumptions, with specific implications for the financial model. Not a 60-page deck. A focused evidence package designed to inform bid decisions.
Cost: $20 per interview. For 50 interviews, that is $1,000 — a rounding error on a deal where the bid price is measured in hundreds of millions.
Compare that to 5 reference calls that cost nothing and tell you nothing you could not have predicted from the management presentation.
The Uncomfortable Question
Every deal team that relies on reference calls should ask themselves a simple question: If you were selling your own company, would you let the buyer choose which customers to interview?
Of course not. You would select your best advocates, brief them on what to expect, and present a curated narrative that supports the highest possible valuation. That is rational behavior. It is also exactly what every target company does when they provide a reference list.
The reference call model does not just fail to surface risk. It structurally prevents risk from surfacing. The customers most likely to reveal churn risk, competitive vulnerability, pricing pressure, or product dissatisfaction are precisely the customers that management will never include on a reference list.
For a discipline that prides itself on rigorous analysis, the persistence of reference calls as a primary source of customer evidence is a remarkable blind spot. The tools to do better exist. The cost is negligible. The timeline is shorter. The only thing standing between your deal team and genuinely independent customer evidence is the willingness to stop accepting curated narratives as diligence.
Five customers hand-picked by the seller is not a sample. It is a performance. And in 2026, there is no reason to keep applauding.