Redefining Success: Expanding the Metrics of AI Progress

By Raffi Krikorian

Today's AI progress is tracked through well-understood metrics, such as the capital raised by AI start-ups or the number of users of a particular AI product. These data points are reassuring: we know them well from decades of other digital success stories, and we can track them with precision.

But if the old adage is true and "what gets measured gets managed," we should consider a distinct set of metrics. Defining AI's success beyond technological breakthroughs and commercial gains can help our society evaluate and steer AI's broader impact. If we track AI's societal impact, ethical considerations, and long-term effects, we can manage those outcomes as rigorously as user growth or valuations.

Metrics for societal impact could include the number of lives improved or saved through AI-powered healthcare interventions, the reduction in carbon emissions achieved by AI-driven energy optimization systems, or the decrease in poverty rates facilitated by AI-supported financial inclusion initiatives. Negative effects can be measured too, including the number of jobs lost to automation or the number of deepfake videos published and circulated. Evaluating success or failure should involve quantifying tangible outcomes and the degree to which AI technologies contribute to societal progress or regression.

These metrics should be given cultural and economic weight, because doing so can alter the incentive system for these technologies. If governments and markets care about societal impact in the same way that, today, investors care about sustainability scores, AI technology will begin to mature and adapt. Today’s incentives — user growth, money raised, time on site — drive certain behaviors. Change the incentives, and you change the behaviors. 

For instance, an AI-powered healthcare innovation that improves patient outcomes, reduces healthcare costs, and increases accessibility should be prized as much as a generative AI application that reaches unheard-of levels of user growth. Investors looking to combine high returns with socially conscious investing should ask AI start-ups for numbers that speak to their social impact, not just their return on invested capital.

We should also evaluate AI's ethical imprint by studying how it performs against the principles outlined in the White House's Blueprint for an AI Bill of Rights, which Technically Optimistic guest Suresh Venkatasubramanian co-authored. Those measurements can then be used to shape how AI systems are developed and deployed. Success metrics can assess how well AI technologies adhere to ethical principles and respect fundamental human rights. Transparency and explainability, for instance, are quantifiable, and they are rigorously studied by international think tanks and advocacy groups. We can use those same methodologies to measure AI transparency in a given country or company, including thoughtful evaluations of algorithmic decision-making and of auditing and accountability systems.

Metrics for fairness and bias mitigation can also improve AI's ethical impact. It isn't difficult to come up with an index that tracks bias in AI algorithms, measured through demographic parity or equalized odds. Another easily quantifiable ethical measure: privacy protection and data security. Even the simplest metrics, like the number of data breaches or the level of self-reported user trust, can operate like contemporary "Net Promoter Scores" for a given piece of AI consumer technology.
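To make the two fairness measures named above concrete, here is a minimal sketch of how they might be computed for a binary classifier. It is an illustration under stated assumptions, not a standard implementation: the function names and the toy data are hypothetical, predictions and labels are assumed to be 0 or 1, and each "gap" is simply the largest difference between any two groups.

```python
from typing import Hashable, Sequence


def _rate(values: Sequence[int]) -> float:
    """Share of 1s in a non-empty list of 0/1 values."""
    return sum(values) / len(values)


def demographic_parity_gap(preds: Sequence[int], groups: Sequence[Hashable]) -> float:
    """Largest between-group difference in the rate of positive predictions."""
    rates = []
    for g in set(groups):
        members = [p for p, grp in zip(preds, groups) if grp == g]
        rates.append(_rate(members))
    return max(rates) - min(rates)


def equalized_odds_gap(
    labels: Sequence[int], preds: Sequence[int], groups: Sequence[Hashable]
) -> float:
    """Largest between-group difference in true-positive or false-positive rate."""
    gaps = []
    for outcome in (1, 0):  # outcome 1 gives TPR, outcome 0 gives FPR
        rates = []
        for g in set(groups):
            selected = [
                p for y, p, grp in zip(labels, preds, groups)
                if grp == g and y == outcome
            ]
            if selected:  # skip groups with no examples of this outcome
                rates.append(_rate(selected))
        if rates:
            gaps.append(max(rates) - min(rates))
    return max(gaps)


# Hypothetical toy data: eight decisions split across two groups.
labels = [1, 1, 0, 0, 1, 1, 0, 0]
preds = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]

dp_gap = demographic_parity_gap(preds, groups)
eo_gap = equalized_odds_gap(labels, preds, groups)
print(f"demographic parity gap: {dp_gap:.2f}")
print(f"equalized odds gap: {eo_gap:.2f}")
```

A gap of zero would mean the groups are treated identically on that measure. An index of the kind described above could track such gaps over time or across products, though which groups and thresholds matter is a policy choice that the code does not settle.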

These metrics needn't be short-term. In fact, for the AI revolution to unfold in a way that benefits us all, we must also measure its impact over the long haul. We can measure and evaluate AI's long-term effects on employment, socioeconomic equity, and the environment, including metrics like the number of new job opportunities created through AI technologies or the number of workers upskilled in industries affected by automation. An AI-driven manufacturing process that leads to job losses and exacerbates socioeconomic inequalities should not be deemed a complete success, despite its near-term boost to efficiency.

We can also study "softer" metrics tied to education and information. It isn't a stretch to imagine "an AI Census": a wide-ranging public survey that quantifies public trust in AI systems and tracks the adoption of AI technologies in critical domains such as healthcare, transportation, or education. AI literacy can also be measured, both through public surveys and through testing of school-age children.

Finally, and perhaps most importantly, we can measure processes, not just outcomes. If we seek to usher in AI's responsible deployment, we can track data about how technologies are deployed. We can, for example, evaluate the diversity of employee or board representation at AI companies. We can also evaluate the impact of a given technology launch on affected communities, or the number and variety of advocacy groups involved in a company's AI decision-making. Third-party organizations could create AI scorecards, tracking metrics like stakeholder engagement and the diversity of data sources. Congress could even establish a national index that pulls together a range of data sources into a single cohesive measure.

Even the most bare-bones monitoring and evaluation can powerfully influence AI's outcomes. Simple metrics like these can help us assess the unintended consequences and ethical issues that arise from AI systems after deployment, as well as how promptly those issues are resolved. Companies would then vie to increase their "Responsible AI score" in the same way they compete today for better NPS ratings or faster user growth.

In every area of life, how we measure our successes drives the successes themselves. Metrics that encompass AI’s societal impact, ethical considerations, long-term implications, and responsible deployment can help shape everything from public attitudes to regulatory frameworks. As we redefine these metrics, we can nudge AI technologies in the direction of better outcomes for a broader number of people over a longer time horizon.
