GenAI is Learning on the Job

Studies at MIT and elsewhere seek hard data to support the market hype. Early results show progress, but a long way to go.

MIT IDE
MIT Initiative on the Digital Economy


Image: Thomson-Reuters

By Peter Krass

Industry claims and media reports describe AI performing a wide range of human tasks, in some cases better than humans, but rigorous research to test these claims remains a work in progress.

Early results show that generative artificial intelligence based on large language models (LLMs) can boost productivity for narrowly defined tasks. For example, several studies by MIT researchers indicate that AI can help people attain specific goals such as dispelling belief in conspiracy theories, automating tasks with computer vision and making fairer decisions.

GenAI is also being used to help with writing tasks. At a recent seminar hosted by the MIT Initiative on the Digital Economy (IDE), Mina Lee, a postdoc at Microsoft Research, presented research on using ChatGPT and a custom text editor called CoAuthor to help writers generate text.

Lee and her colleagues found that when AI and humans worked iteratively together on writing projects, spelling and grammar mistakes dropped 14% while “vocabulary diversity” rose 4%.

Broad Tasks, Big Impact

Now researchers are asking about broader work areas such as investment advice and helping entrepreneurs. That’s where the results are more nuanced. High-quality and industry-specific training data for AI applications is crucial.

A team led by Andrew Lo, a professor of finance at MIT Sloan, studied whether AI could replace humans offering financial advice. Their initial findings show that GenAI can provide good financial advice, but only if a supplemental module containing finance-specific knowledge is added.

The outcome of work like this is critical to organizations, including Pirelli, Slack and Mayo Clinic, that are experimenting with AI tools that yield performance gains. In turn, corporate interest has sent the sales and stock prices of AI suppliers, notably Nvidia and Supermicro, soaring to record levels. Goldman Sachs Economic Research predicts that global investments in AI will approach $200 billion as soon as next year.

Testing the Theories

But in the field, the devil really is in the details. One team of researchers recently explored whether AI could help the business efforts of hundreds of Kenyan entrepreneurs.

The research team was led by David Holtz, an assistant professor of management, entrepreneurship and innovation at the University of California, Berkeley’s Haas School of Business. He explained that he and four colleagues chose Kenya because getting business advice is especially challenging in developing countries. For that reason, the benefits of AI-generated advice could be especially large.

Holtz and his colleagues drilled down to learn whether GenAI advice could help business owners, even when the entrepreneurs were distracted or focused on the wrong tasks. The researchers also wanted to know how AI-driven performance gains — if any — would compare with those generated by other types of advice.

Creating a GenAI Mentor

To start, the researchers recruited some 640 Kenyan small-business owners, then randomly assigned each person to either a treatment or control group. Those in the treatment group were given a GenAI mentor, while those in the control group were given a PDF containing the text of a standard business guide.
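The random assignment described above can be sketched in a few lines of Python. This is only an illustration, not the researchers' actual procedure: the participant IDs, the fixed seed and the equal-sized split are all assumptions, since the article says only that each owner was randomly placed in one of the two groups.

```python
import random


def assign_groups(participant_ids, seed=0):
    """Randomly split participants into treatment and control groups.

    Assumption: an equal-sized split. The article does not state the
    actual group sizes used in the study.
    """
    rng = random.Random(seed)        # fixed seed so the split is reproducible
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)            # random order removes any selection bias
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "control": shuffled[half:]}


# Hypothetical IDs standing in for the roughly 640 business owners
groups = assign_groups(range(640), seed=42)
print(len(groups["treatment"]), len(groups["control"]))  # 320 320
```

Because chance alone decides who gets the AI mentor, later differences between the two groups can be attributed to the mentor rather than to who chose to use it.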

The GenAI mentor was based on ChatGPT, OpenAI’s LLM-based chatbot. It interacted with the entrepreneurs via WhatsApp, a messaging app used widely in Kenya.

To improve the AI’s performance, the research team also conducted what’s known as “prompt engineering.” This included instructing the AI to provide three to five pieces of actionable advice for each question asked by a human. The AI was also fine-tuned to tailor its answers to the kind of business in question, and to allow the human users to drill down on an answer for more details.

For example, the owner of a dairy products business asked the following question and received the answer below it from the AI mentor:

Human question: I sell milk, both wholesale and retail, but now I want to add something to the business to maximize my profit range. I don’t know what will blend well with milk.

AI answer: Tea and coffee. Cereal and porridge. Bread and pastries. Sweeteners. Other dairy products.

And here’s a question posed by the owner of a fast-food restaurant operating near a busy bus terminal:

Human question: How can I overcome the competitive business environment and be ahead of the rest of the pack?

AI answer: Differentiate your menu. Improve customer service. Offer a loyalty program. Increase your community involvement. Emphasize cleanliness and hygiene.
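To make the prompt engineering described earlier more concrete, here is a minimal sketch of how such instructions might be assembled. The function name and the wording of the prompt are hypothetical; the article does not publish the study's actual prompt, and only the three behaviors it mentions (three to five pieces of actionable advice, tailoring to the business type, and drill-down on request) are reflected here.

```python
def build_mentor_prompt(business_type):
    """Return an instruction string for a hypothetical AI business mentor.

    The wording is illustrative only; it is not the prompt used in the study.
    """
    return (
        "You are a business mentor for a small-business owner in Kenya. "
        f"The owner runs this kind of business: {business_type}. "
        "For each question, give three to five pieces of actionable advice "
        "tailored to that business. "
        "If the owner asks for more detail on any item, expand on it."
    )


# Example: the dairy business from the first question above
prompt = build_mentor_prompt("dairy products (wholesale and retail milk)")
print(prompt)
```

A string like this would typically be sent to the model ahead of each user question, which is why the answers above come back as short lists of concrete suggestions.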

While the AI advice was plausible, the study’s results were sharply at odds with prior research. Previous work showed that AI helps lower-performing workers increase their work quality. But the Kenyan experiment found just the opposite. Overall, the intervention had no statistically significant impact on the entrepreneurs’ business performance.

At the same time, among the high-performing Kenyan entrepreneurs, those using the AI mentor raised their business performance by an average of 15%. By contrast, the low-performing entrepreneurs using the AI mentor actually lowered their business performance by an average of 8%.
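A rough back-of-the-envelope calculation shows how gains in one group and losses in another can blend into a small overall effect. The sketch below is an assumption-laden illustration, not the study's analysis: the shares of high performers are hypothetical (the article does not report them), and a simple weighted average treats both groups as having comparable baseline performance.

```python
def blended_effect(share_high, effect_high=0.15, effect_low=-0.08):
    """Weighted average of two subgroup effects.

    effect_high and effect_low are the averages quoted in the article;
    share_high (the fraction of high performers) is a hypothetical input.
    """
    if not 0.0 <= share_high <= 1.0:
        raise ValueError("share_high must be between 0 and 1")
    return share_high * effect_high + (1.0 - share_high) * effect_low


# Try a few hypothetical mixes of high and low performers
for share in (0.25, 0.35, 0.50):
    effect = blended_effect(share)
    print(f"{share:.0%} high performers -> blended effect {effect:+.1%}")
```

Under these assumptions the blended figure stays within a few percentage points of zero, which is consistent with an overall result that is hard to distinguish statistically from no effect.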

Prompts Matter: Advising the Advisor

One possible explanation could come from the quality, or lack thereof, of the entrepreneurs’ prompts for the AI. Were their questions formulated well? Did they provide all the necessary details? Did the low performers ask different types of questions? These and other related questions will no doubt be investigated by research in the future.

Holtz and his colleagues also found differences in the challenge levels. Higher-performing entrepreneurs tended to ask the AI mentor about less challenging business tasks. By contrast, lower-performing entrepreneurs tended to ask about business tasks that were more challenging and less specific.

[Chart: entrepreneurs’ business performance level on the horizontal X axis (farther to the right is higher) versus the challenge level of their questions on the vertical Y axis (higher is more challenging).]

Yet another possible issue, Holtz explained, is time management. Perhaps the low-performing entrepreneurs received good advice from the AI mentor, but were unable to implement it. After all, there are costs to implementing advice, both in time and capital.

Given these uncertainties, what does the experiment teach us about the state of AI advisors? “AI advice does actually matter,” Holtz said.

Entrepreneurs could be helped by using the right GenAI tool in the right way, he said. But that perfect formula for AI-human interaction is still a missing link between AI advancements and business success.

Peter Krass is a contributing writer and editor to the MIT IDE.
