Artificial intelligence has advanced rapidly in recent years, raising expectations across the investment industry for meaningful gains in research efficiency, reporting, and risk management. Yet emerging academic and industry research offers a more sober view of this fast-moving technology.
Recent findings point to persistent reliability gaps, the continued need for human judgment and oversight, and limits on near-term value creation, suggesting that AI’s impact may be more measured than early enthusiasm implied. For investors, the message is clear: AI remains a powerful long-term opportunity, but one best realized through disciplined, evidence-driven adoption rather than early-stage exuberance.
This post is the third installment of a quarterly reflection on the latest developments in AI for investment management professionals. Drawing on insights from investment specialists, academics, and regulators who contribute to the bi-monthly newsletter Augmented Intelligence in Investment Management, it builds on earlier articles that explored AI’s promise and pitfalls, as well as techniques for managing its risks. This installment moves toward a more pragmatic understanding of AI’s potential.
A close review of recent papers reveals three common themes that may temper the industry’s optimism.
1. The Reliability Challenge
Despite impressive advances, AI’s reliability remains a primary barrier to deployment in high-stakes financial environments. A recent analysis by NewsGuard (2025) documents a sharp rise in false or misleading statements from leading AI chatbots, with error rates climbing from roughly 10% to nearly 60%.
This rise in “hallucinations” is not merely a statistical anomaly. An OpenAI study (2025) finds that hallucinations are often a structural byproduct of how models are trained and evaluated: current benchmarks reward confident answers over calibrated expressions of uncertainty, which incentivizes plausible but incorrect statements.
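To see why accuracy-only grading can reward confident guessing, consider a stylized scoring rule. The short Python sketch below is an illustration under assumed scoring values, not the method or code used in the OpenAI study: a correct answer earns 1 point, “I don’t know” earns 0, and a wrong answer loses a hypothetical penalty (`wrong_penalty`). The function names and the 0.75 confidence threshold are illustrative assumptions.

```python
def expected_score(p_correct: float, wrong_penalty: float) -> float:
    """Expected score from answering, given the probability of being right.

    Assumed scoring: correct = +1, wrong = -wrong_penalty.
    Abstaining ("I don't know") always scores 0.
    """
    return p_correct * 1.0 - (1.0 - p_correct) * wrong_penalty


def best_action(p_correct: float, wrong_penalty: float) -> str:
    """Return 'answer' if answering beats abstaining in expectation, else 'abstain'."""
    return "answer" if expected_score(p_correct, wrong_penalty) > 0 else "abstain"


# Accuracy-only grading: a wrong answer costs nothing,
# so even a 10%-confidence guess beats abstaining.
print(best_action(0.10, wrong_penalty=0.0))  # answer

# Hypothetical penalized grading with confidence threshold t = 0.75:
# a wrong answer costs t / (1 - t) = 3 points, so answering pays only when p > t.
t = 0.75
penalty = t / (1 - t)
print(best_action(0.10, penalty))  # abstain
print(best_action(0.90, penalty))  # answer
```

Under accuracy-only grading, answering never scores worse than abstaining, so a model tuned to such a benchmark learns to guess. Penalizing wrong answers by t / (1 − t) moves the break-even point to confidence t, which is one way a scoring rule can reward calibrated uncertainty instead.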
Concerns also extend to ethical alignment. In a financial decision-making simulation inspired by the governance failures at cryptocurrency exchange FTX, Biancotti et al. (2024) show that several leading models have a substantial probability of recommending ethically or legally questionable actions when facing trade-offs between personal gain and regulatory compliance. For investment professionals, whose work depends on precision, transparency, and accountability, these studies collectively indicate that AI is not yet reliable enough to operate autonomously in many regulated financial workflows.
2. Premium on Human Judgment
A second theme in the research is that AI appears to augment rather than replace human expertise and may even increase the importance of high-quality human oversight.
Neuroscience research from MIT (Kosmyna et al., 2025) finds that participants who used LLMs for an essay-writing task exhibited reduced brain activity in regions associated with memory retrieval, creativity, and executive reasoning. Although AI may accelerate initial analyses, heavy reliance on these systems may dull the cognitive capabilities that underpin robust investment judgment.
AI adoption also does not diminish the need for human presence in client-facing contexts. Yang et al. (2025) show that clients perceive AI-generated investment advice as significantly more trustworthy when accompanied by a human advisor, even when the human adds no analytical value. Similarly, Le et al. (2025) find that customer satisfaction improves when human–AI collaboration is made explicit rather than concealed.
Automation remains limited as well. In large-scale task benchmarking, Xu et al. (2024) observe that advanced AI agents autonomously complete only about 30% of complex, multi-step tasks. A separate study by Tomlinson et al. (2025), analyzing more than 200,000 Copilot interactions, shows that in roughly 40% of cases model actions diverge meaningfully from user intent.
Taken together, these findings suggest that investment firms should view AI as a tool that augments humans rather than replaces them, with a continuing need to fact-check machine-generated output. Such ongoing, structured oversight erodes some of the net value the machine adds and raises complexity and costs, particularly because AI output often appears plausible even when it is incorrect. The literature also highlights the importance of organizational policies that guard against cognitive deskilling.
3. Structural and Economic Constraints
Finally, macroeconomic constraints temper expectations. Acemoglu (2024) suggests that even under optimistic assumptions, aggregate productivity gains from AI over the next decade are likely to be modest. Much of the early evidence comes from “easy-to-learn” tasks, whereas harder, context-dependent tasks offer more limited scope for automation.
Regulation adds further friction. Foucault et al. (2025) and Prenio (2025) note that AI adoption in financial intermediation introduces new concentration risks, infrastructure dependencies, and supervisory challenges, prompting regulators to proceed cautiously. That caution raises compliance costs and may slow industry-wide adoption. Together, these structural factors indicate that AI’s impact may be more incremental and less disruptive than commonly assumed.
Monitoring AI Advancements
AI’s promise is real, but its impact will hinge on how thoughtfully and responsibly the industry integrates it. AI will play a central role in the industry’s future, but its trajectory will likely be more complex, and more dependent on effective human stewardship, than early expectations suggested.
References
Acemoglu, D. The Simple Macroeconomics of AI. National Bureau of Economic Research, Working Paper 32487, May 2024.
Biancotti et al. Chat Bankman-Fried: An Exploration of LLM Alignment in Finance. arXiv, 2024.
Foucault, T., L. Gambacorta, W. Jiang, and X. Vives. Barcelona 7: Artificial Intelligence in Finance. CEPR Press, Paris & London, 2025.
Kosmyna et al. Your Brain on ChatGPT: Accumulation of Cognitive Debt when Using an AI Assistant for Essay Writing Task. MIT Media Lab, June 2025.
Le et al. The Future of Work: Understanding the Effectiveness of Collaboration Between Human and Digital Employees in Service. Journal of Service Research, vol. 28(1), pp. 186-205, 2025.
NewsGuard. Chatbots Spread Falsehoods 35% of the Time. September 2025.
Prenio, J. Starting with the Basics: A Stocktake of Gen AI Applications in Supervision. BIS, June 2025.
Tomlinson et al. Working with AI: Measuring the Applicability of Generative AI to Occupations. Microsoft Research, 2025.
Xu et al. TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks. arXiv, December 2024.
Yang et al. My Advisor, Her AI and Me: Evidence from a Field Experiment on Human-AI Collaboration and Investment Decisions. arXiv, June 2025.
