Data and Technology

How Decision-Makers Can Catch Gen AI's Bad Advice

AI can sound confident even when it’s wrong. But simple alerts could help businesses discern when to trust an algorithm's recommendations and when to question them, research by Kris Johnson Ferreira shows.

An ice cream shop that asks an AI chatbot to predict winter sales based on summer results may soon find itself with a freezer full of unsold treats.

That’s not because algorithms always get forecasts wrong. It’s because artificial intelligence often provides confident-sounding answers even when it lacks the information it needs to give reliable responses.

For that reason, businesses should recognize that AI may miss the mark and question its recommendations—or walk away from them—when its results don’t add up, says Harvard Business School’s Kris Ferreira.

“AI can’t do everything, right? We want people to be able to choose when to adhere to an algorithm and when not to,” she says.

Alerts help users evaluate AI’s advice

As businesses increasingly rely on AI for decisions like predicting customer demand, Ferreira cautions against trusting it too much. She offers guidance for designing human-AI collaboration so that employees are better able to spot and correct bad advice from algorithms.

She finds that a simple AI design improvement makes a big difference: Adding short, well-timed alerts to chatbot responses helped users understand when they could trust AI predictions and when they needed to adjust or discard them. Specifically, letting people know when AI was responding with authority in familiar territory—and warning them when it wasn’t—cut user errors by nearly half.

Ferreira is hopeful that building in alerts will better equip both managers and their employees with the healthy skepticism they need when evaluating AI’s recommendations, so people can ultimately make better decisions.

“The idea,” Ferreira says, “is to help managers learn to use the AI tool when it’s likely to give a good recommendation—maybe even better than their own. Or, in the alternative, to trust their gut, to use their intuition and knowledge when the AI tool likely gets it wrong.”

Ferreira, the Edgerley Family Associate Professor of Business Administration, and HBS doctoral candidate Matthew DosSantos DiSorbo cowrote the paper, “Warnings and Endorsements: Improving Human–AI Collaboration in the Presence of Outliers,” published in the November–December 2025 issue of Manufacturing & Service Operations Management, with Jordan Tong, professor at the University of Wisconsin-Madison, and Maya Balakrishnan, assistant professor at the University of Texas at Dallas.

Unusual football play sparks an idea

The seed for the study idea came to DiSorbo from a passion unrelated to business: NFL football analytics. During a game in the 2024 season, a wide receiver threw a touchdown pass—to the quarterback.

The play was so out of the ordinary, DiSorbo recalls, that it completely confused an analytics algorithm. AI systems are trained on historical data, and nothing in this algorithm's training had prepared it for a play like that.

“I realized that outliers that don’t exist in the training set can really wreak havoc on an algorithm in a way that wouldn’t happen with a human,” DiSorbo says. “That was my inspiration, and hopefully a sign that this idea is relevant in many contexts, not just demand forecasting.”

People miscalibrate their trust in AI

When AI first emerged, algorithm aversion caused many users to approach its output with skepticism. Since then, in many organizations, the pendulum has swung the other way, toward unbridled enthusiasm. But in some cases, people now accept AI’s recommendations too willingly and fail to second-guess algorithms even when something doesn’t sound quite right, the researchers say.

The team designed a series of online experiments to explore how users perform when asked to predict demand for a product using only an AI-generated forecast and a few basic product details. Participants saw AI demand predictions trained on one of three types of data:

  • Representative or familiar training data (called “inliers”)

  • Non-representative or unusual data (called “outliers”)

  • Both familiar and unusual data (inliers and outliers)

In the example of the ice-cream shop, an algorithm may learn that sales increase on warm days (inlier information)—but may still struggle to predict how the shop would perform in winter (outlier information).

“A really good training set on representative data will lead to AI predictions that are high quality,” DiSorbo says. “If that training set becomes outdated, corrupted, or flawed, then the AI predictions will break down.”
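The gist of this inlier/outlier distinction can be illustrated with a minimal sketch (this is not the paper's implementation, and the temperature data and threshold logic below are made up for illustration): flag a query as familiar when it falls inside the range of the training data, echoing the study's "endorsements," and as unfamiliar when it falls outside, echoing its "warnings."

```python
# Illustrative sketch of endorsement/warning alerts for an ice-cream
# demand model trained only on summer temperatures. The data and the
# simple range check are assumptions for illustration, not the
# researchers' method.

def train_range(temps):
    """Record the span of temperatures seen during training."""
    return min(temps), max(temps)

def alert_for(temp, lo, hi):
    """Endorse predictions on familiar inputs, warn on outliers."""
    if lo <= temp <= hi:
        return "endorsement: input resembles the training data"
    return "warning: input falls outside the training data"

# Summer-only training set (degrees Celsius)
summer_temps = [22, 25, 28, 31, 33]
lo, hi = train_range(summer_temps)

print(alert_for(27, lo, hi))  # a familiar summer day
print(alert_for(2, lo, hi))   # a winter day the model never saw
```

A production system would use a richer measure of distance from the training distribution than a simple min–max range, but the user-facing idea is the same: tell people which regime the model is operating in.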

Participants did the worst when they had to deal with both familiar and unusual data at the same time, since they overcorrected for familiar patterns and undercorrected for unusual ones. Their wildly inaccurate predictions were:

  • 143% worse than people focused only on unusual data.

  • 176% worse than those who only looked at familiar data.

These users fell victim to what the authors call “naive adjusting behavior.” Without clear signals about when the model was operating outside its comfort zone, they put too much weight on the algorithm’s predictions, failing to discount its advice when they should have.

Alerts reduce human errors

The researchers then introduced “endorsements”—cues to let users know the data was familiar and would likely create accurate answers—as well as “warnings”—signals that the algorithm might not be familiar with the information, so its answers might not be reliable.

The alerts were highly effective. They reduced errors by:

  • 28% using endorsements alone

  • 35% using warnings alone

  • 49% using a combination of endorsements and warnings

How AI alerts can help businesses

While the study focused on demand forecasting, Ferreira and DiSorbo say that simple design improvements, such as adding warnings and endorsements, could make algorithms more accurate for companies across industries. For instance:

  • Hospitals that use AI to analyze data to predict readmission rates might set warnings for unusual patient conditions.

  • Banks that algorithmically forecast loan defaults could use endorsements to affirm that credit scores are within an expected range.

  • UX designers at AI platforms could apply simple endorsements or warnings to guide user behavior.

Other potential business applications include forecasting supplier delivery timelines, customer churn, employee turnover, manufacturing quality, and conversion rates on discount offers.

“It could be used across any industry and any function,” Ferreira says. “The salient aspect is making a prediction about something uncertain—the future.”

DosSantos DiSorbo, Matthew, Kris Ferreira, Maya Balakrishnan, and Jordan Tong. "Warnings and Endorsements: Improving Human-AI Collaboration in the Presence of Outliers." Manufacturing & Service Operations Management 27, no. 6 (November–December 2025): 1814–1831.
