Generative AI has been perceived as so revolutionary that some global experts have deemed it the “steam engine of the fourth industrial revolution.” Despite such grand analogies, we’re just beginning to learn how gen AI affects workforce skills and productivity.
A notable early study showed that gen AI made call-center employees more productive, with novice workers enjoying the greatest gains, but it remained unclear how gen AI might affect more complex intellectual work.
A new study from Harvard Business School suggests that gen AI lifts productivity even for highly skilled consultants by helping with tasks like product ideation and report drafting. More surprisingly, gen AI can also lower productivity, specifically when it is asked to synthesize mixed data sources into strategic recommendations.
The research team—including faculty from HBS, UPenn, and MIT—labels these uneven results “the jagged technological frontier.” AI has annexed some tasks, while others remain beyond its grasp. The challenge for businesses is to “navigate” this fluid frontier.
The research team asked 758 consultants at the prestigious firm Boston Consulting Group to complete tests in one of three conditions: without AI, with OpenAI’s GPT-4, or with GPT-4 plus advice on prompt engineering. The team ran two experiments: the first gave consultants tasks inside the frontier, meaning tasks that AI tends to do well, and the second gave tasks outside it.
In the first experiment, consultants had to bring a new product (a new shoe) to market, starting with idea generation and ending with a report outline. The experiment tested creativity, analytical thinking, writing proficiency, and persuasiveness.
Consultants tended to do better with help from AI, and lesser-skilled consultants improved the most—a result consistent with other research. Notably too, consultants did best when they had access to both AI and advice on prompt engineering.
Human evaluators rated the quality of consultants’ output 38% higher when consultants worked with GPT-4 alone and 42.5% higher when they also received prompt-engineering advice designed to increase their familiarity with AI. Consultants were also able to complete more tasks within a 30-minute window: on average, those with GPT-4 completed 12.2% more tasks than the control group.
On other tasks, AI harmed performance. In the second experiment—designed around tasks outside the frontier—consultants who had access to AI performed worse than humans working alone.
In this experiment, consultants were given a business case and tasked with giving “actionable strategic recommendations to a hypothetical company”—a test which BCG uses during interviews. Consultants had to base their recommendations on quantitative data in a spreadsheet and qualitative data from interviews with “company insiders.”
Consultants’ answers were evaluated for “correctness.” Without AI, consultants answered correctly 84.5% of the time. Among consultants who used AI alone, that number dropped to 60%. Consultants who also had prompt advice performed slightly better, at 70%.
It is not entirely clear why AI harmed performance in this second experiment, but the model seems to have struggled to synthesize two data sources that added subtlety to the task. The research team explains that the quantitative data “was designed to seem to be comprehensive,” but “a careful review of the interview notes revealed crucial details.”
The report also raises new questions about why some employees use AI more productively than others. The researchers found that employees succeed most when they act like “Centaurs” or “Cyborgs.”
Centaurs delegate some tasks to AI and others to human intelligence in discrete chunks. Just as the Centaur’s body is segmented—human intelligence above, equine power below—so is this method for using AI. In contrast, Cyborgs turn to AI whenever it helps, weaving AI into an entire workflow. Why both strategies are effective is not clear and awaits further research.
While the study suggests AI is quickly becoming an indispensable tool in even complex intellectual work, this welcome result—from a productivity perspective—comes with risk.
“An immediate danger … is that people will stop delegating work inside the frontier to junior workers, creating long-term training deficits,” warns the report.
AI may pose risks for creativity too. While AI tends to augment creativity for individuals, it appears to reduce creativity across groups. In the first experiment, AI helped individual consultants generate more and better product ideas, but their ideas tended to converge. Ideas were more diverse among the group of consultants who didn’t use AI.
Research on AI is evolving rapidly, but amid a crowded field, this study provides noteworthy guidance for mapping an AI strategy that cuts through the hype and focuses on results.

