Call Center Quality Assurance: Building a QA Program That Drives Revenue

Jordan Rogers

Most QA programs measure the wrong things for the wrong reasons

Quality assurance in call centers has an identity problem. In most operations, QA exists to check compliance boxes: Did the agent use the greeting script? Did they read the disclosure? Did they offer the survey? These are not unimportant. But when QA scoring revolves around procedural compliance, you end up with agents who follow every script step perfectly while the customer hangs up frustrated and never comes back.

The financial stakes are real. Bain & Company research has shown that a 5% improvement in customer retention can increase profitability by 25% to 95%. The call center is where retention is won or lost on a daily basis. Every interaction either strengthens the customer relationship or weakens it, and QA is the mechanism that tells you which is happening and why.

Yet most QA programs evaluate 2% to 5% of interactions through manual sampling. That means 95% to 98% of your customer conversations are invisible to quality management. You are making coaching decisions, identifying training gaps, and assessing agent performance based on a slice of actual work far too small to be representative.

This guide covers how to build a QA program that connects to revenue outcomes: the scoring framework, the calibration process, the coaching model, the technology decisions, and the metrics that tell you whether your program is actually working. If you are running or building call center operations, QA is the diagnostic engine that makes every other improvement possible.


What call center QA actually is (and is not)

Call center quality assurance is the systematic evaluation of customer interactions against defined standards, with the purpose of improving agent performance, customer experience, and business outcomes.

What it is not: a compliance audit. QA programs that focus exclusively on script adherence and procedural checkboxes miss the point. The purpose of QA is to answer two questions: Are our customers getting the experience that retains them and grows their value? And where specifically do our agents need development to deliver that experience consistently?

The three functions of an effective QA program

Measurement. QA provides the data you need to understand interaction quality at scale. Without it, you are relying on anecdotal feedback, escalation patterns, and lagging indicators like churn to tell you how your team is performing.

Diagnosis. QA scores do not just tell you that quality is high or low. A well-designed rubric tells you exactly where quality breaks down: opening, needs assessment, resolution approach, communication clarity, or closing. That diagnostic specificity is what makes coaching actionable.

Improvement. The entire point of QA is to drive behavior change through targeted coaching. If your QA program produces scores but does not change how agents handle calls, it is an expensive reporting exercise.

QA staffing benchmarks

The industry standard ratio is one QA analyst per 20 to 25 agents. At that ratio, a QA analyst can evaluate approximately 8 to 12 interactions per day with sufficient depth for meaningful scoring and feedback documentation.

For operations under 25 agents, QA is often handled by team leads or senior agents who dedicate a portion of their time to evaluations. This works at small scale but creates a conflict of interest when supervisors are evaluating their own team members. As you scale past 50 agents, a dedicated QA function becomes essential.


Building a QA scoring rubric that connects to revenue

The rubric is the foundation of your QA program. A bad rubric produces bad data, which produces bad coaching, which produces no improvement. Getting this right matters more than the technology you use.

Structure: categories, criteria, and weighting

An effective QA rubric has four to six categories, each containing three to five specific, observable criteria. Every criterion should be scorable as a clear yes/no or on a defined scale with behavioral anchors.

Here is a framework that connects to revenue outcomes:

Opening and rapport (10% to 15% of total score). Did the agent establish rapport efficiently? Did they set expectations for the call? Did they demonstrate awareness of the customer's history and context? This category matters because first impressions determine whether the customer engages or checks out.

Needs assessment and discovery (20% to 25%). Did the agent ask diagnostic questions before jumping to a solution? Did they uncover the root issue rather than treating the surface symptom? Did they identify related needs or opportunities? This is where upsell and cross-sell potential surfaces, and where first contact resolution is won or lost.

Resolution and accuracy (25% to 30%). Did the agent provide accurate information? Did they resolve the issue completely? Did they take the right action in the system? This is the highest-weighted category because it directly determines whether the customer calls back (costing you money) or leaves satisfied (protecting revenue).

Communication quality (15% to 20%). Was the agent clear and concise? Did they avoid jargon? Did they confirm the customer understood the resolution? Did they manage pace and tone appropriately? Communication quality is the difference between a technically correct interaction and one that actually leaves the customer feeling confident.

Closing and next steps (10% to 15%). Did the agent summarize what was done? Did they set expectations for any follow-up? Did they offer additional assistance? A strong close reinforces resolution and reduces repeat contacts.

Compliance and required elements (5% to 10%). Did the agent meet legal and regulatory requirements? This category exists for necessary disclosures and required steps, but it should never dominate the rubric. If compliance items represent more than 15% of your total score, your rubric is measuring process adherence, not quality.

Weighting for revenue impact

The weighting above is deliberate. Resolution accuracy and needs assessment together represent roughly half the total score because those categories drive the metrics that connect directly to revenue: first contact resolution, customer satisfaction, repeat contact rate, and expansion opportunity identification.

SQM Group research has shown that every 1% improvement in FCR correlates with a 1% improvement in customer satisfaction. And our analysis of call center KPIs shows that CSAT is the leading indicator for retention, which is where call center performance compounds into P&L impact.

Auto-fail criteria

Certain behaviors should result in an automatic failure regardless of the overall score. These are non-negotiable:

  • Providing materially incorrect information that could cause financial or legal harm
  • Failing to complete a legally required disclosure
  • Unprofessional behavior (rudeness, inappropriate language)
  • Unauthorized disclosure of customer information

Auto-fail criteria should be rare and reserved for genuinely consequential behaviors. If your auto-fail list has more than five items, you are probably using it as a crutch for criteria that should be weighted normally.
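
To make the mechanics concrete, here is a minimal sketch in Python of how category scores roll up into a weighted total with an auto-fail override applied last. The weights are midpoints of the ranges above; the data structures and function names are illustrative, not a prescribed schema.

```python
# Minimal sketch of weighted rubric scoring with an auto-fail override.
# Weights are midpoints of the ranges above and sum to 1.0; resolution
# plus needs assessment carry roughly half the total score.
CATEGORY_WEIGHTS = {
    "opening_and_rapport": 0.125,
    "needs_assessment": 0.225,
    "resolution_and_accuracy": 0.275,
    "communication_quality": 0.175,
    "closing_and_next_steps": 0.125,
    "compliance": 0.075,
}


def score_interaction(category_scores: dict[str, float], auto_fail: bool) -> float:
    """Return a 0-100 QA score from per-category scores (each 0-100)."""
    if auto_fail:
        return 0.0  # auto-fail criteria override the weighted total
    return sum(
        CATEGORY_WEIGHTS[category] * score
        for category, score in category_scores.items()
    )


example = {
    "opening_and_rapport": 90,
    "needs_assessment": 60,        # jumped to a solution without discovery
    "resolution_and_accuracy": 80,
    "communication_quality": 85,
    "closing_and_next_steps": 70,
    "compliance": 100,
}
print(round(score_interaction(example, auto_fail=False), 1))  # ~77.9
```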


Calibration: the process that makes QA data trustworthy

A QA rubric is only as good as the consistency with which evaluators apply it. Calibration is the process of aligning QA analysts, supervisors, and leadership on how to score interactions using the rubric.

How calibration works

In a calibration session, three to five evaluators independently score the same interaction, then compare and discuss their scores. Where scores differ, the group discusses the criteria, identifies the source of disagreement, and aligns on the correct interpretation.

Calibration cadence

Run calibration sessions weekly during the first month of a new rubric launch, then biweekly as alignment stabilizes. Monthly calibrations are the minimum for ongoing maintenance. If inter-rater reliability (the level of agreement between different evaluators scoring the same interaction) drops below 85%, increase calibration frequency until it recovers.
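
As a rough illustration, here is a minimal sketch of one common way to operationalize that threshold: pairwise percent agreement on yes/no criteria across evaluators who scored the same interaction. The evaluator and criterion names are made up, and correlation-based measures are a reasonable alternative.

```python
from itertools import combinations

# Minimal sketch: pairwise percent agreement on yes/no criteria across
# evaluators who scored the same interaction. Evaluator and criterion
# names are illustrative.
scores = {
    "evaluator_a": {"rapport": 1, "discovery": 1, "accuracy": 1, "closing": 0},
    "evaluator_b": {"rapport": 1, "discovery": 0, "accuracy": 1, "closing": 0},
    "evaluator_c": {"rapport": 1, "discovery": 1, "accuracy": 1, "closing": 0},
}


def pairwise_agreement(scores: dict[str, dict[str, int]]) -> float:
    """Share of criterion-level judgments on which each evaluator pair agrees."""
    matches, total = 0, 0
    for (_, a), (_, b) in combinations(scores.items(), 2):
        for criterion in a:
            matches += a[criterion] == b[criterion]
            total += 1
    return matches / total


print(f"{pairwise_agreement(scores):.0%}")  # 83% -> below the 85% target
```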

Common calibration pitfalls

Scoring based on outcome rather than behavior. If the customer seemed happy at the end of the call, evaluators tend to score the interaction higher even if the agent's technique was poor. Score what the agent did, not how the call turned out.

Halo effect. Strong agents get the benefit of the doubt on ambiguous criteria. Struggling agents get penalized. Calibration is where you catch and correct this bias.

Rubric drift. Over time, evaluators develop personal interpretations that diverge from the original standard. Without regular calibration, the rubric means something slightly different to each person applying it. Three months of unchecked drift can make your QA data unreliable.


From QA scores to coaching that changes behavior

QA data without coaching is just measurement. The coaching process is where quality improvement actually happens.

The coaching framework

Frequency. One coaching session per agent per week is the target for development-focused coaching. For agents meeting quality targets, biweekly is sufficient. For agents on performance improvement plans, daily touchpoints may be necessary.

Structure. Each coaching session should focus on one to two specific behaviors identified from QA evaluations. Trying to address five problems in a single session produces zero improvement on any of them.

Evidence-based. Every coaching conversation should reference specific interactions. "Your needs assessment could be stronger" is not coaching. "On the call with the customer on Tuesday at 2:15 PM, you jumped to a solution before asking about their account configuration, which led to a 12-minute call that could have been resolved in 6 minutes" is coaching.
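
A minimal sketch of how that selection of one or two focus behaviors might work, assuming each evaluation records which criteria the agent missed (the records and criterion names below are hypothetical):

```python
from collections import defaultdict

# Minimal sketch: choose one or two coaching focus behaviors by finding the
# criteria an agent misses most often across recent evaluations, keeping the
# specific calls as evidence. Records and criterion names are hypothetical.
evaluations = [
    {"call_id": "c1001", "misses": ["needs_assessment", "closing_summary"]},
    {"call_id": "c1002", "misses": ["needs_assessment"]},
    {"call_id": "c1003", "misses": []},
    {"call_id": "c1004", "misses": ["needs_assessment", "confirm_understanding"]},
]

missed_calls: dict[str, list[str]] = defaultdict(list)
for evaluation in evaluations:
    for criterion in evaluation["misses"]:
        missed_calls[criterion].append(evaluation["call_id"])

# The one or two most-missed criteria become the session's focus.
focus = sorted(missed_calls.items(), key=lambda item: len(item[1]), reverse=True)[:2]
for criterion, call_ids in focus:
    print(f"{criterion}: missed on {len(call_ids)} calls ({', '.join(call_ids)})")
```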

Connecting QA coaching to workforce management

QA insights should inform workforce management decisions. If QA data reveals that a segment of agents consistently struggles with a specific call type, that insight should feed into scheduling (ensure those call types are not disproportionately routed to those agents until they are developed) and training (build targeted skill development for the gap).

Similarly, if QA analysis shows quality degrades during specific intervals (late shifts, high-volume periods), that is a WFM signal about staffing adequacy and agent fatigue, not just a quality problem.


Traditional sampling vs. AI-powered QA

The most significant shift in call center QA over the past two years is the move from manual sampling to AI-powered evaluation of 100% of interactions.

The sampling problem

Manual QA teams typically evaluate 5 to 10 interactions per agent per month. For a 50-agent center handling 200 calls per agent per month, that is a 2.5% to 5% sample rate. You are making performance judgments based on a handful of calls that may or may not be representative.

Statistical sampling works when the sample is truly random. In practice, QA analysts tend to select recent calls, calls of a certain length, or calls where a customer complaint triggered the review. This introduces selection bias that makes the data less reliable than the sample size alone would suggest.
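
If you are staying with manual sampling, the minimum fix is to draw the sample programmatically rather than letting evaluators pick. A minimal sketch, assuming you can export call IDs per agent from your recording platform (the data structure below is an assumed format):

```python
import random

# Minimal sketch: a truly random, per-agent monthly sample of calls to
# evaluate, instead of evaluator-selected recent, short, or complaint-flagged
# calls. The calls_by_agent structure is an assumed export format.
calls_by_agent = {
    "agent_017": [f"call_{i}" for i in range(200)],
    "agent_042": [f"call_{i}" for i in range(180)],
}

SAMPLE_PER_AGENT = 8  # within the typical 5 to 10 evaluations per agent per month

monthly_sample = {
    agent: random.sample(calls, min(SAMPLE_PER_AGENT, len(calls)))
    for agent, calls in calls_by_agent.items()
}
print({agent: sample[:3] for agent, sample in monthly_sample.items()})
```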

What AI-powered QA changes

AI-powered QA tools evaluate every interaction against your rubric. The shift from 3% coverage to 100% coverage changes what QA can accomplish:

Pattern identification at scale. AI can identify that 30% of your team struggles with a specific objection, or that calls transferred from a particular IVR path have 2x the handle time. Manual sampling would take months to surface these patterns.

Consistency. AI applies the same scoring criteria identically to every interaction. There is no evaluator bias, no halo effect, no Friday afternoon fatigue affecting scores.

Real-time flagging. AI can flag interactions that need immediate attention (compliance risk, customer escalation, exceptional performance) in near real-time rather than days or weeks after the interaction.

Coaching at scale. When every call is scored, coaching conversations can reference patterns across dozens of interactions rather than cherry-picked examples. "In 47 of your last 200 calls, you skipped the needs assessment step" is a fundamentally different coaching input than "on this one call I reviewed, you skipped the needs assessment."
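
A minimal sketch of what that pattern identification looks like once every call carries a score, assuming your QA tooling can export one record per call with the criteria that failed (the records and the 25% threshold below are hypothetical):

```python
from collections import defaultdict

# Minimal sketch of pattern identification across 100% of scored interactions:
# what share of the team misses a given criterion often enough to need coaching.
# The scored_calls rows and the 25% threshold are hypothetical.
scored_calls = [
    {"agent": "agent_017", "failed_criteria": ["price_objection_handling"]},
    {"agent": "agent_017", "failed_criteria": []},
    {"agent": "agent_042", "failed_criteria": ["price_objection_handling"]},
    {"agent": "agent_042", "failed_criteria": ["price_objection_handling"]},
    {"agent": "agent_063", "failed_criteria": []},
    # ...in practice, every call the team handled this month
]

CRITERION = "price_objection_handling"
MISS_RATE_THRESHOLD = 0.25  # flag agents who miss it on more than 25% of calls

calls_per_agent: dict[str, int] = defaultdict(int)
misses_per_agent: dict[str, int] = defaultdict(int)
for call in scored_calls:
    calls_per_agent[call["agent"]] += 1
    misses_per_agent[call["agent"]] += CRITERION in call["failed_criteria"]

struggling = [
    agent for agent, total in calls_per_agent.items()
    if misses_per_agent[agent] / total > MISS_RATE_THRESHOLD
]
print(f"{len(struggling) / len(calls_per_agent):.0%} of the team struggles with {CRITERION}")
```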

AI does not replace human QA

AI-powered QA handles the scoring and pattern identification. Human QA analysts shift their focus to calibration, coaching, rubric design, and the nuanced evaluation of complex interactions where context matters. The best implementations use AI for coverage and consistency, and humans for judgment and development.


QA metrics that prove your program works

A QA program needs its own performance metrics to demonstrate value.

QA score trends. Track average QA scores by team, tenure cohort, and call type over time. The trend matters more than the absolute number. If scores are not improving quarter over quarter, your coaching process is not working.

Correlation to business outcomes. Track the relationship between QA scores and the metrics that matter: CSAT, FCR, repeat contact rate, and retention. If high QA scores do not correlate with better business outcomes, your rubric is measuring the wrong things. This analysis is the feedback loop that keeps your QA program connected to revenue. For the full framework on which business metrics to track, see our call center KPIs guide.
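
A minimal sketch of that correlation check, assuming you can pull paired per-agent averages for QA score and CSAT (the values below are made up, and Pearson correlation is one reasonable choice of measure):

```python
from statistics import correlation  # Python 3.10+

# Minimal sketch: do agents' average QA scores move with their CSAT?
# The paired per-agent averages below are made-up illustrative values.
qa_scores = [72, 81, 88, 90, 95, 68, 84]           # average QA score per agent
csat_scores = [3.9, 4.1, 4.4, 4.6, 4.7, 3.7, 4.3]  # average CSAT per agent (1-5)

r = correlation(qa_scores, csat_scores)
print(f"Pearson r = {r:.2f}")  # a value near 0 suggests the rubric measures the wrong things
```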

Calibration consistency. Measure inter-rater reliability monthly. Target 85% or higher agreement between evaluators. Below that threshold, your QA data is not reliable enough to base coaching and performance decisions on.

Coaching completion rate. Track whether coaching sessions are actually happening at the planned frequency. QA programs often break down not because the rubric is wrong, but because managers deprioritize coaching when operational demands spike.

Time from evaluation to coaching. The faster an agent receives feedback after the evaluated interaction, the more impactful the coaching. Best practice is within 48 hours. If the average gap exceeds a week, the feedback loses context and impact.
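
A minimal sketch of tracking those last two metrics together, assuming you can join evaluation timestamps to coaching-session timestamps (the records below are hypothetical):

```python
from datetime import datetime
from statistics import median

# Minimal sketch: coaching completion rate plus the evaluation-to-coaching gap,
# measured against the 48-hour best practice. Records are hypothetical; a
# coached_at of None means the planned session never happened.
records = [
    {"evaluated_at": datetime(2024, 3, 4, 10), "coached_at": datetime(2024, 3, 5, 15)},
    {"evaluated_at": datetime(2024, 3, 4, 11), "coached_at": datetime(2024, 3, 8, 9)},
    {"evaluated_at": datetime(2024, 3, 5, 14), "coached_at": None},
]

completed = [r for r in records if r["coached_at"] is not None]
gaps_hours = [
    (r["coached_at"] - r["evaluated_at"]).total_seconds() / 3600 for r in completed
]

print(f"Coaching completion rate: {len(completed) / len(records):.0%}")
print(f"Median gap: {median(gaps_hours):.0f}h, "
      f"within 48h: {sum(g <= 48 for g in gaps_hours) / len(completed):.0%}")
```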


Common QA program mistakes

Over-engineering the rubric. A rubric with 40 criteria and 10 categories produces evaluator fatigue and inconsistent scoring. Keep it to 15 to 20 criteria across four to six categories. You can always add specificity later. You cannot easily simplify a rubric that evaluators have already learned to game.

Scoring without coaching. QA scores that go into a spreadsheet and never become coaching conversations are a waste of analyst time. If you do not have the coaching capacity to act on QA data, reduce your evaluation volume and invest the saved time in coaching.

Treating QA as punitive. When agents perceive QA as a gotcha mechanism, they disengage. The best programs position QA as a development tool, not a disciplinary one. Share scoring criteria openly, let agents listen to their own calls, and celebrate improvement.

Ignoring the connection to customer outcomes. If your QA program cannot demonstrate a correlation between higher scores and better customer outcomes (satisfaction, retention, resolution rates), it is not measuring the right things. Revisit your rubric.


Where to go from here

If you are building or rebuilding a QA program, start with the rubric. Get that right, calibrate it thoroughly, and build the coaching cadence before investing in technology. The best QA software in the world cannot compensate for a rubric that measures compliance instead of quality or a coaching process that does not exist.

For the broader strategic context on how QA fits into call center performance, start with our comprehensive guide to call center operations. For the metrics framework that QA should connect to, see our call center KPIs guide. And for the AI-powered QA tools that are changing what is possible, see our AI in call centers playbook.

Quality assurance is not a checkbox. It is the diagnostic engine that tells you where your operation is strong, where it is breaking, and exactly what to fix. Build it to measure what matters to your customers and your revenue, and it becomes the highest-leverage investment in your call center.


RevenueTools is building purpose-built operations software for revenue teams. See what we are working on.
