3 Mistakes to Avoid With Software Engineer Performance Metrics

In April 2020, McKinsey surveyed 500 leaders about the ways “technology transformations” were impacting their companies. The results were shocking.

The survey asked about ten technology-led changes designed to increase revenue or reduce costs, from enhancing IT infrastructure to scaling data and analytics. Respondents said that changes to a company’s tech talent created the most value. Yet they also said talent-strategy changes were the least likely to occur over the next two years. How could that be?

One factor could be that transforming tech talent depends on accurately defining “success” through effective performance measurement. Indeed, measuring performance consistently, accurately, and fairly is a tough task for most organizations.

For over 100 years, industrial-organizational psychologists have studied the notoriously difficult challenge of measuring job performance and have even invented a name for it: “the criterion problem.”

The criterion problem is the difficulty of finding performance measures that are uncontaminated by job-irrelevant factors yet capture a job’s full performance spectrum. For example, the number of code commits is a “contaminated” performance metric because outside forces, such as team commit conventions or a project’s phase, impact it. It’s also a “deficient” metric because it misses the quality of those commits, an essential facet of engineer performance.

So, while there are no simple solutions to this 100-year-old problem, especially for complex, constantly evolving jobs like software engineering, there are common measurement mistakes engineering leaders can avoid.

Here are three mistakes to avoid and what to do about them, according to the latest science:

Mistake 1: Using One Global, Subjective Rating

Job performance, according to modern scientific thinking, is a combination of productivity (i.e., task performance), teamwork (i.e., contextual performance), and bad behaviors such as absenteeism, time theft, toxicity, or negligence (i.e., counterproductive work behaviors or CWBs).

Because of this inherent complexity, many leaders collect global, subjective ratings when measuring performance: for instance, a net promoter score (NPS)-type rating where supervisors rate the likelihood they would rehire a new employee after the first 90 days.

Generic, easy-to-gather metrics can sometimes be better than no metrics at all. Still, the best-case scenario for using a global rating is that it provides a vague impression of a worker’s combined task, contextual, and CWB performance behaviors, making insights or interventions difficult, if not impossible. The worst-case scenario is that global, subjective ratings measure something completely unrelated to an employee’s effectiveness, such as their similarity to the rater.

Instead of using one global rating, combine supervisor ratings with team-level outcomes (see below), HR metrics (e.g., engagement survey responses and tenure), or even peer ratings (e.g., 15Five’s performance management and Ray Dalio’s Dot Collector) measured at the individual level. These combined ratings offer a more robust index of individual task, contextual, and CWB performance behaviors.
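As an illustration, here is a minimal sketch of one way to combine several signals into a single index by standardizing each one and applying weights. The metric names, scales, and weights are hypothetical; in practice, you would derive them from your own validation work:

```python
# A minimal sketch of combining several performance signals into one
# composite index. All names, scales, and weights are hypothetical.
from statistics import mean, stdev

def z_scores(values):
    """Standardize raw scores so differently scaled metrics can be combined."""
    mu, sigma = mean(values), stdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical signals for five engineers, each on its own scale.
supervisor_ratings = [4.2, 3.8, 4.6, 3.1, 4.0]  # 1-5 rating scale
peer_ratings = [4.0, 4.1, 4.4, 3.5, 3.9]        # 1-5 rating scale
engagement_scores = [72, 65, 80, 58, 70]        # 0-100 survey scale

standardized = {
    "supervisor": z_scores(supervisor_ratings),
    "peer": z_scores(peer_ratings),
    "engagement": z_scores(engagement_scores),
}

# Hypothetical weights reflecting how much each signal should count.
weights = {"supervisor": 0.5, "peer": 0.3, "engagement": 0.2}

composite = [
    round(sum(weights[k] * standardized[k][i] for k in weights), 2)
    for i in range(len(supervisor_ratings))
]
print(composite)  # one combined index per engineer
```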

Mistake 2: Measuring KPIs or Outcomes at the Wrong Level

For engineering organizations that want more objective performance metrics, leaders often turn to a handful of engineering KPIs such as page load times, endpoint latencies, the number of critical bugs fixed, uptime, and deployment lead time. At face value, these KPIs make sense: they seem like highly job-relevant, quantifiable measures of an engineer’s contribution.

But like sales numbers, individual KPIs are often contaminated by forces outside the employee’s control, such as team interdependencies, vendor problems, or security threats. Engineering KPIs may also be deficient in measuring an individual engineer’s contextual contributions, such as strong collaboration, communication, and documentation.

Instead of fixating on KPIs to determine an individual’s outputs or behaviors, consider measuring outcomes at a team level. Outcomes are the consequences of collective individual behaviors that benefit the organization, such as increased user net-promoter scores for a product or feature.

Assuming you have identified measurable outcomes that represent competitive advantages, measuring team-level outcomes can sometimes reward and incentivize behaviors better than individual-level measurements. If you do choose team-level outcomes, make sure rewards are tied to the team’s performance and that those outcomes are the ‘real’ monitored goals.
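As a rough sketch of what this can look like, the snippet below scores teams against a shared outcome goal. The team names, NPS figures, and goal threshold are hypothetical:

```python
# A minimal sketch of evaluating outcomes at the team level.
# Team names, NPS figures, and the goal threshold are hypothetical.
team_outcomes = {
    "checkout-team": {"nps_before": 31, "nps_after": 39},
    "search-team": {"nps_before": 44, "nps_after": 45},
}
GOAL_DELTA = 5  # shared, 'real' monitored goal: raise product NPS by 5 points

for team, o in team_outcomes.items():
    delta = o["nps_after"] - o["nps_before"]
    # Rewards key off the team's shared outcome, not individual KPIs.
    print(f"{team}: NPS change {delta:+d}, goal met: {delta >= GOAL_DELTA}")
```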

Mistake 3: Evaluating Engineers’ Performance Once a Year

Research also tells us that job performance isn’t static but highly variable, making it necessary to measure performance across time. But the status quo in engineering management is to periodically review KPIs, along with an annual performance review.

Rather than looking at moment-in-time KPIs or ratings, monitor trends and change rates in individual and team metrics. This requires more frequent measurement and more sophisticated data gathering, but you often get the most actionable signals by observing dynamic effects across time.
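As a simple illustration, here is a minimal sketch of reading a change rate from weekly measurements instead of a single snapshot, using an ordinary least-squares slope. The metric and numbers are hypothetical:

```python
# A minimal sketch of monitoring a trend rather than a point-in-time value.
# The metric (weekly deployment lead time) and the numbers are hypothetical.

def trend_slope(values):
    """Least-squares slope of values against 0..n-1 (change per period)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

# One team's weekly deployment lead times (hours) over eight weeks.
lead_times = [52, 49, 47, 48, 44, 41, 40, 38]
slope = trend_slope(lead_times)
print(f"Lead time changing by {slope:+.1f} hours/week")  # negative = improving
```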

For organizations with fewer resources, ask supervisors to rate trends and changes in their direct reports’ performance regularly, rather than averaging performance over the past six months.

To be clear, there are no silver bullets or gold standards for engineer performance metrics. The best advice is to tailor metrics to your own job and business context.

Finding ways to evaluate engineering performance more accurately and fairly can have a big impact on employee engagement, talent retention, and overall company performance. Implementing regular evaluations and check-ins with your engineers and identifying the right metrics for your organization can help you make the most of the talent you have.

Neil Morelli is the Chief I-O Psychologist at Codility.