Most health scores are glorified gut feelings with a color label
Here's a test. Go into your CS platform right now and pull up the distribution of customer health scores. If more than 80% of your accounts are green, your health score isn't measuring health. It's measuring CSM optimism.
This is the most common failure mode in customer health scoring. The model exists, it has inputs, it produces a number or a color, but it doesn't predict anything. When you backtest it against actual churn outcomes, the correlation is weak or nonexistent. Red accounts renew. Green accounts churn. The score becomes background noise that the CS team ignores, and leadership loses confidence in the entire framework.
The problem isn't that health scoring is fundamentally flawed. It's that most implementations skip the hard work of building a model grounded in data, calibrated against real outcomes, and connected to operational workflows that turn the score into action.
This post walks through how to build a health scoring model that actually works. Not the vendor demo version with twenty pre-built signals and a pretty dashboard. The operator version, built from your data, calibrated against your churn history, and designed to tell your CS team not just that an account is at risk but what to do about it.
What makes a good health score
Before building anything, define what "good" looks like. A health score that doesn't meet these four criteria is not worth the effort of maintaining.
Predictive
The score must correlate with actual renewal and churn outcomes. This is the only criterion that ultimately matters. If accounts scored green churn at the same rate as accounts scored yellow, the score has no predictive value and should be rebuilt.
Predictive doesn't mean perfect. No model will predict every churn event, especially those driven by external factors like acquisitions, budget cuts, or executive turnover. But a well-built model should show a clear, statistically significant difference in churn rates across score tiers. Green accounts should churn at 2-5%. Yellow at 8-15%. Red at 20-40%. If the differences aren't that stark, the model needs calibration.
Actionable
A score that tells you an account is "at risk" without indicating why is only marginally useful. An actionable health score decomposes into its component signals so the CSM can see that the account is yellow because product adoption dropped 40% last month and the executive sponsor hasn't logged in for 60 days. That decomposition turns the score from a label into a diagnostic.
Objective
The score should be driven by data, not CSM judgment. This is where most health scoring implementations break down. CSMs override the score based on their "feel" for the account, which reintroduces the subjectivity the model was designed to eliminate. There's a place for CSM input (more on this below), but the base score must be calculated from measurable signals.
Dynamic
Health scores must update automatically as signals change. A score that refreshes monthly is too slow. By the time you see the decline, the account has been disengaged for weeks. The best implementations update daily or in near real-time, pulling from product usage data, support systems, and CRM activity feeds.
The input categories
A robust health scoring model draws from four categories of signals. Each category captures a different dimension of customer health, and over-weighting any single category creates blind spots.
Product usage signals
Product usage is typically the strongest predictor of retention. Customers who use the product regularly and broadly are unlikely to churn. Customers whose usage is declining or concentrated in a single feature are at elevated risk.
Key signals to consider:
- Login frequency and trends. Not just the absolute number of logins, but the direction. An account that went from 50 logins per week to 15 logins per week is more concerning than an account that consistently logs in 20 times per week. Trend matters more than snapshot.
- Feature adoption breadth and depth. Breadth measures how many features the customer uses. Depth measures how intensively they use core features. A customer using three of twelve features superficially is less healthy than a customer using five features deeply. Define "core features" for your product and track adoption against that list.
- DAU/MAU ratio. Daily active users divided by monthly active users. This measures engagement intensity. A DAU/MAU of 50% means half of all monthly users are active on any given day, which indicates strong habitual usage. Below 20% suggests the product is used infrequently or by a small subset of the user base. A minimal calculation sketch follows this list.
- Time in product. Total session duration per user per week. Useful as a secondary signal, but be careful: more time in product isn't always better. For some products, high session times might indicate usability problems rather than deep engagement. Context matters.
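Here's a minimal sketch of the DAU/MAU calculation for a single account, assuming you can export login events with an account ID, user ID, and date. The column names, the toy data, and the 30-day window are assumptions to adapt to your own schema, not a prescribed format:

```python
from datetime import date, timedelta
import pandas as pd

def dau_mau(events: pd.DataFrame, account_id: str, as_of: date) -> float:
    """Rough DAU/MAU for one account: average daily active users over the
    trailing 30 days divided by distinct active users in that same window.
    Assumes `events` has columns: account_id, user_id, event_date."""
    window_start = as_of - timedelta(days=30)
    recent = events[
        (events["account_id"] == account_id)
        & (events["event_date"] > window_start)
        & (events["event_date"] <= as_of)
    ]
    if recent.empty:
        return 0.0
    mau = recent["user_id"].nunique()
    # Average distinct users per day across the full window (days with no
    # activity count as zero).
    daily_actives = recent.groupby("event_date")["user_id"].nunique()
    dau = daily_actives.sum() / 30
    return dau / mau

# Tiny illustrative example (real data would have far more events):
events = pd.DataFrame({
    "account_id": ["acme"] * 4,
    "user_id": ["u1", "u2", "u1", "u3"],
    "event_date": [date(2024, 6, 3), date(2024, 6, 3),
                   date(2024, 6, 10), date(2024, 6, 20)],
})
print(dau_mau(events, "acme", as_of=date(2024, 6, 30)))
```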
Engagement signals
Engagement signals capture the relationship between the customer and your team. They're harder to automate than usage data, but they provide critical context that product signals alone can't capture.
Key signals to consider:
- Executive sponsor engagement. Is the economic buyer or executive sponsor still engaged with your team? If the person who championed the purchase has gone silent, that's one of the strongest churn predictors available. Track last contact date, meeting attendance, and communication frequency.
- Support ticket volume and sentiment. A spike in support tickets can signal product friction. But the absence of tickets isn't necessarily positive; it might mean the customer has stopped trying to make the product work. Sentiment analysis on ticket content (frustrated vs. routine inquiries) adds a valuable layer.
- Meeting frequency with CSM. Customers who consistently attend QBRs, check-ins, and training sessions are engaged. Customers who cancel or no-show repeatedly are disengaging. Track attendance rates, not just scheduled meetings.
- Response time to communications. How quickly does the customer respond to CSM emails and meeting requests? A customer who used to respond within hours and now takes a week is showing a clear behavioral change.
Financial signals
Financial signals capture the economic dimension of the relationship. They're often overlooked in health scoring models that focus exclusively on usage and engagement.
Key signals to consider:
- Contract value trend. Is the customer's spend growing, flat, or shrinking over successive contract periods? A customer who renewed at a lower value last cycle is showing contraction behavior that may escalate to churn.
- Payment history. Late payments and payment disputes are correlated with churn. They may indicate budget pressure, internal dissatisfaction, or organizational deprioritization of the product.
- Expansion conversations in progress. An account actively evaluating an upsell or additional product is unlikely to churn in the near term. The presence of active expansion conversations is a positive health indicator.
- Renewal date proximity. Risk increases as the renewal date approaches, especially for accounts without early renewal commitment. Weight this signal higher in the 90 days before renewal.
Outcome signals
Outcome signals measure whether the customer is achieving the goals they bought the product to accomplish. This is the hardest category to measure systematically but arguably the most important.
Key signals to consider:
- Achievement of stated goals. During onboarding or the sales process, the customer defined success criteria. Are they hitting those milestones? This requires tracking goals in your CS platform and updating progress regularly.
- ROI documentation. Has the customer quantified and documented the return on their investment? Customers who can articulate concrete ROI are significantly less likely to churn. They have internal justification for renewal.
- Reference and advocacy willingness. Customers willing to serve as references, write case studies, or speak at events are deeply engaged. This signal is binary (willing or not) but highly indicative.
Building the model: step by step
Step 1: Identify your churn drivers
Before selecting inputs for the model, analyze your historical churn data. Pull every account that churned or significantly contracted in the last 12-24 months and look for patterns.
Questions to answer:
- What did usage look like in the 90 days before churn? Was there a consistent decline pattern?
- Were there engagement warning signs (missed QBRs, unanswered emails, executive sponsor departure)?
- What was the average health score at the time of churn? If most churned accounts were scored green, your current model is broken.
- Were there financial signals (late payments, contraction at last renewal)?
- Did the customer achieve their stated goals?
This analysis produces a ranked list of churn drivers specific to your business. Product usage decline might be the top predictor for one company while executive sponsor disengagement is the top predictor for another. Don't assume; let your data tell you.
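To make this concrete, here's a rough sketch of the pattern analysis, assuming you can assemble one row per historical account with a churn flag and a handful of boolean warning signals observed in the window before renewal. The column names and data are illustrative placeholders, not your schema:

```python
import pandas as pd

# Hypothetical backfill: one row per account, with the outcome and a few
# boolean warning signals observed in the 90 days before renewal.
history = pd.DataFrame({
    "churned":            [1, 1, 1, 0, 0, 0, 0, 1],
    "usage_decline_30d":  [1, 1, 1, 0, 1, 0, 0, 1],
    "sponsor_disengaged": [1, 0, 1, 0, 0, 0, 0, 1],
    "nps_decline":        [0, 1, 0, 1, 0, 0, 1, 0],
})

signals = [c for c in history.columns if c != "churned"]
churned = history[history["churned"] == 1]
retained = history[history["churned"] == 0]

# Rank signals by how much more common they are among churned accounts.
prevalence = pd.DataFrame({
    "in_churned": churned[signals].mean(),
    "in_retained": retained[signals].mean(),
})
prevalence["lift"] = prevalence["in_churned"] - prevalence["in_retained"]
print(prevalence.sort_values("lift", ascending=False))
```

The signals that show the biggest gap between churned and retained accounts are the ones that deserve the most weight in the model.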
Step 2: Select 8-12 input signals
Resist the temptation to include every possible signal. Models with 50+ inputs are impossible to interpret, difficult to maintain, and no more predictive than simpler models. The sweet spot is 8-12 signals that collectively cover all four input categories.
A typical starting configuration:
- Login frequency trend (30-day vs. 90-day average)
- Feature adoption breadth (percentage of core features used)
- DAU/MAU ratio
- Executive sponsor last contact date
- Support ticket trend (volume and sentiment)
- CSM meeting attendance rate
- Contract value trend
- Renewal date proximity
- Goal achievement status
- NPS/CSAT score (most recent)
Fewer inputs, well-chosen and well-weighted, will outperform a comprehensive model that nobody understands.
Step 3: Define scoring weights
Not all inputs are equally predictive. Weight them based on your churn driver analysis from Step 1. A common starting distribution:
- Product usage signals: 40-50%. Usage is typically the strongest predictor across B2B SaaS companies. If customers aren't using the product, nothing else matters much.
- Engagement signals: 25-30%. Relationship health is the second strongest predictor. It captures context that usage data alone misses.
- Financial signals: 15-20%. Financial indicators add a valuable economic dimension, especially close to renewal.
- Outcome signals: 10-15%. Outcome signals are powerful but harder to measure consistently, so they typically receive lower initial weight until your tracking matures.
These are starting points. Your churn analysis may produce a different distribution. The point is that the weights should be evidence-based, not arbitrary.
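Here's a minimal sketch of how the weighting might translate into a composite 0-100 score, assuming each signal has already been normalized to a 0-1 value where 1 means healthy. The signal names, weights, and example values are placeholders, not a recommended configuration:

```python
# Category weights from the churn driver analysis (illustrative, sums to 1.0).
CATEGORY_WEIGHTS = {
    "usage": 0.45,
    "engagement": 0.30,
    "financial": 0.15,
    "outcome": 0.10,
}

# Each signal: (category, weight within its category). Within-category
# weights sum to 1.0 so each category keeps its intended share.
SIGNALS = {
    "login_trend":        ("usage", 0.40),
    "feature_breadth":    ("usage", 0.35),
    "dau_mau":            ("usage", 0.25),
    "sponsor_engagement": ("engagement", 0.50),
    "meeting_attendance": ("engagement", 0.50),
    "contract_trend":     ("financial", 0.60),
    "payment_history":    ("financial", 0.40),
    "goal_achievement":   ("outcome", 1.00),
}

def health_score(normalized: dict[str, float]) -> float:
    """Composite 0-100 score from signal values already normalized to 0-1,
    where 1 means healthy. Missing signals contribute nothing, which
    silently lowers the score; handle missing data deliberately in a
    real implementation."""
    score = 0.0
    for signal, (category, within_weight) in SIGNALS.items():
        value = normalized.get(signal, 0.0)
        score += value * within_weight * CATEGORY_WEIGHTS[category]
    return round(score * 100, 1)

print(health_score({
    "login_trend": 0.4, "feature_breadth": 0.7, "dau_mau": 0.5,
    "sponsor_engagement": 0.2, "meeting_attendance": 0.8,
    "contract_trend": 1.0, "payment_history": 1.0, "goal_achievement": 0.6,
}))  # roughly 60: a yellow account under the example thresholds in Step 4
```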
Step 4: Set thresholds
Define what the score means in operational terms. Two common approaches:
Color-based (green/yellow/red). Simple, intuitive, and easy to communicate. Green means "healthy, continue current engagement." Yellow means "at risk, proactive intervention needed." Red means "critical, immediate action required." Set the boundaries based on where you see natural breaks in your historical data.
Numeric scale (0-100). More granular, better for trend analysis, but requires more interpretation. A common mapping: 80-100 is green, 50-79 is yellow, below 50 is red.
Whichever approach you choose, the thresholds should produce a distribution that matches your known reality. If your annual gross retention is 88%, roughly 12% of your ARR should be in red/yellow zones at any given time. If your model shows 5% at risk when you're losing 12% to churn, the thresholds are too lenient.
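As a sketch, the numeric-to-color mapping and the distribution sanity check might look like this. The 80/50 cut points mirror the example mapping above; your own breaks should come from where you see separation in your historical data, and the account values here are made up:

```python
from collections import Counter

def tier(score: float) -> str:
    """Map a 0-100 score to a tier using the example cut points above."""
    if score >= 80:
        return "green"
    if score >= 50:
        return "yellow"
    return "red"

# Sanity check: the share of ARR sitting in yellow/red should roughly
# match what you actually lose to churn and contraction each year.
accounts = [  # (score, ARR), illustrative values only
    (91, 120_000), (85, 80_000), (77, 60_000), (66, 40_000),
    (52, 30_000), (43, 50_000), (88, 200_000), (95, 150_000),
]
arr_by_tier = Counter()
for score, arr in accounts:
    arr_by_tier[tier(score)] += arr

total = sum(arr for _, arr in accounts)
at_risk_share = (arr_by_tier["yellow"] + arr_by_tier["red"]) / total
print(f"ARR at risk: {at_risk_share:.0%}")  # compare to actual gross churn
```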
Step 5: Backtest against historical data
This is the step that separates predictive models from wishful thinking. Apply your scoring model retroactively to accounts from the past 12-24 months and compare the scores to actual outcomes.
What you're looking for:
- Separation. Is there a clear, measurable difference in churn rates between green, yellow, and red accounts? If green accounts churned at 5%, yellow at 12%, and red at 30%, your model has good separation.
- Coverage. Did the model flag accounts that ultimately churned? If 60% of churned accounts were scored green at the time of churn, the model is missing critical signals.
- False positive rate. What percentage of red accounts actually renewed? Some false positives are expected (intervention may have saved the account), but if 80% of red accounts renew, your model is too sensitive.
Adjust weights and thresholds based on the backtest results. This is an iterative process; expect to go through 3-5 rounds of adjustment before the model performs well.
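A rough sketch of the backtest itself, assuming you can score each historical account as it looked about 90 days before renewal and join that to the actual outcome. The data here is illustrative; the separation, coverage, and false positive checks are just group-bys over the real thing:

```python
import pandas as pd

# One row per historical renewal: the tier the model would have assigned
# 90 days out, and whether the account actually churned.
backtest = pd.DataFrame({
    "tier":    ["green", "green", "green", "yellow", "yellow", "red", "red", "red"],
    "churned": [0,        0,       1,       0,        1,        1,     0,     1],
})

# Separation: churn rate per tier should step up sharply from green to red.
print(backtest.groupby("tier")["churned"].mean())

# Coverage: what share of churned accounts were sitting in green?
churned = backtest[backtest["churned"] == 1]
missed = (churned["tier"] == "green").mean()
print(f"Churned accounts the model scored green: {missed:.0%}")

# False positives: what share of red accounts actually renewed?
red = backtest[backtest["tier"] == "red"]
print(f"Red accounts that renewed: {(red['churned'] == 0).mean():.0%}")
```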
Step 6: Calibrate quarterly
A health scoring model is not a set-it-and-forget-it artifact. Customer behavior changes, product updates shift usage patterns, and the signals that predicted churn last year may not predict churn next year.
Build a quarterly calibration process:
- Compare health scores at the time of renewal to actual renewal outcomes
- Identify any churned accounts that the model missed (false negatives) and any flagged accounts that renewed successfully (false positives)
- Adjust weights, thresholds, or input signals based on the analysis
- Document changes and communicate them to the CS team
From score to action
A health score that exists only on a dashboard is a vanity metric. The score becomes valuable when it triggers specific, documented workflows.
Red accounts: immediate intervention playbook
When an account drops into red, the following should happen automatically within 24-48 hours:
- Alert the assigned CSM and their manager
- Create a task for the CSM to conduct a diagnostic review (usage data, engagement history, open support tickets)
- Schedule an executive-level check-in if the executive sponsor hasn't been contacted in 30+ days
- Flag the account in the renewal forecast as at-risk
- If the account is high-value (top 20% by ARR), escalate to CS leadership for a save plan
The playbook should define specific actions based on the reason the account is red. An account that's red due to usage decline needs a different intervention than one that's red due to executive sponsor departure.
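One hedged sketch of how that routing might work: send the red alert to a playbook based on the signal dragging the score down the most. The playbook names and signal keys are placeholders for whatever your own model and playbook library use:

```python
# Map the dominant negative signal to a playbook (names are illustrative).
PLAYBOOKS = {
    "login_trend":        "usage recovery: re-onboarding and training plan",
    "sponsor_engagement": "sponsor loss: executive re-engagement sequence",
    "support_sentiment":  "friction: escalated support review with product",
    "contract_trend":     "commercial: value review ahead of renewal",
}

def pick_playbook(signal_contributions: dict[str, float]) -> str:
    """signal_contributions: how many points each signal subtracted from
    the composite score. Route to the playbook for the worst offender."""
    worst = max(signal_contributions, key=signal_contributions.get)
    return PLAYBOOKS.get(worst, "general: CSM diagnostic review")

print(pick_playbook({
    "login_trend": 18.0,        # usage decline is the biggest drag
    "sponsor_engagement": 6.0,
    "support_sentiment": 2.0,
}))
```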
Yellow accounts: proactive outreach cadence
Yellow accounts need proactive engagement before they slip into red:
- Increase CSM touchpoint frequency (from quarterly to monthly, or monthly to bi-weekly)
- Conduct a success review focused on goal achievement and ROI documentation
- Identify and address the specific signals driving the yellow score
- Introduce additional stakeholders on the customer side to reduce single-threaded risk
- Share relevant best practices, training resources, or product features that address the gap
Green accounts: expansion opportunity identification
Green accounts represent your expansion pipeline. The health score should trigger growth-oriented workflows:
- Identify upsell and cross-sell opportunities based on usage patterns (features they'd benefit from but haven't adopted)
- Schedule strategic business reviews focused on future goals and growth plans
- Engage the account in advocacy programs (references, case studies, advisory boards)
- Monitor for signals that green is trending toward yellow so you can intervene early
The critical principle: the score is only useful if it triggers a workflow. If your CS team sees the scores but doesn't have documented playbooks for each tier, you've built measurement without action, and measurement without action changes nothing.
Common health scoring mistakes
Making it too complex
A model with 50+ inputs, custom weighting for each customer segment, and a machine learning layer that nobody on the team can explain is not better than a 10-input model that the CS team understands and trusts. Complexity reduces transparency, makes calibration difficult, and erodes team confidence in the score.
Start simple. Validate that the simple model works. Add complexity only when you have evidence that additional signals improve predictive accuracy.
Letting CSMs override the score without documentation
CSM overrides are a common request. "I know this account is healthy; the score is wrong." Sometimes the CSM is right. But undocumented overrides defeat the purpose of an objective scoring model.
If you allow overrides, require documentation: which signal does the CSM disagree with, what evidence supports the override, and when should the override expire? Track override accuracy over time. If CSMs who override the score are right 80% of the time, the model needs better inputs. If they're right 40% of the time, the overrides are introducing noise.
Not calibrating against actual outcomes
This is the most consequential mistake. Teams build the model, deploy it, and never validate whether it works. A year later, someone asks, "Does our health score predict churn?" and nobody can answer.
Build the calibration process before you launch the model. Schedule the first calibration review for 90 days after launch, and run it quarterly thereafter. If you skip calibration, the model will drift out of alignment with reality, the CS team will lose trust in the scores, and you'll end up right back where you started.
Weighting all inputs equally
If every input carries the same weight, your model assumes that a decline in login frequency is exactly as concerning as a decline in NPS score. That's almost never true. Equal weighting is a shortcut that produces mediocre accuracy.
Use your churn analysis to determine weights. If product usage decline is present in 80% of churned accounts but NPS decline is present in only 30%, usage should carry significantly more weight.
Ignoring the time dimension
Many models treat signals as point-in-time snapshots. Current login count. Current NPS. Current ticket volume. But trends are more predictive than snapshots. An account with 100 logins this month is healthy if last month was also 100. The same account is concerning if last month was 300. Build trend-based inputs (30-day vs. 90-day averages, month-over-month change rates) into the model wherever possible.
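A minimal sketch of turning a snapshot into a trend input: compare the trailing 30-day average to the trailing 90-day baseline and feed the ratio into the model. The daily-counts input format is an assumption:

```python
def trend_ratio(daily_counts: list[int]) -> float:
    """daily_counts: one entry per day, oldest first, at least 90 days.
    Returns recent activity relative to the longer baseline:
    ~1.0 is stable, well below 1.0 is a decline, above 1.0 is growth."""
    if len(daily_counts) < 90:
        raise ValueError("need at least 90 days of history")
    baseline = sum(daily_counts[-90:]) / 90
    recent = sum(daily_counts[-30:]) / 30
    return recent / baseline if baseline else 0.0

# An account that was steady and then dropped sharply reads well under 1.0:
history = [10] * 60 + [3] * 30   # 90 days: healthy, then a sharp drop
print(round(trend_ratio(history), 2))  # well below 1.0, so flag the trend
```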
For the complete CS operations framework that ties health scoring to metrics, playbooks, and technology, see the customer success operations guide. For the metrics that health scores should ultimately connect to, see the CS Ops metrics and NRR guide.
Build a health score that earns trust
The goal isn't a perfect model. It's a model that's meaningfully better than gut feel, that the CS team trusts enough to act on, and that improves over time through systematic calibration.
Start with your churn data. Identify what actually predicted churn in your business. Select 8-12 signals that cover usage, engagement, financial health, and outcomes. Weight them based on evidence. Set thresholds that produce a realistic risk distribution. Backtest against history. Calibrate quarterly. And most importantly, connect every score tier to a specific workflow so the score drives action, not just awareness.
The companies that get health scoring right don't just reduce churn. They shift their entire CS motion from reactive to proactive, catching risks months before renewal and converting healthy accounts into expansion opportunities. That operational shift is what separates CS teams that protect revenue from CS teams that grow it.
At RevenueTools, we're building the data and workflow infrastructure that makes health scoring operationally useful. Not just a score on a dashboard, but a signal connected to the playbooks, alerts, and processes that turn insight into action. If you're building or rebuilding your health scoring model, we'd like to help.