The 355% Trade: How LLMs Quietly Hijacked Wall Street’s Headline Machine - And Why The Alpha Is Already Dying
LLMs are now standard kit at 95% of hedge funds - and the edge they created is already fading

Here’s a number that should make every quant uncomfortable: 355%.
That’s the cumulative return produced by a long-short strategy that does nothing more exotic than feed US financial news headlines into a 2022-vintage GPT-3 model, translate the output into a trade, and eat a flat 10 basis points of transaction costs. Over two years. With a Sharpe ratio of 3.05. Documented, peer-reviewed, sitting in Finance Research Letters volume 62.
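For readers who want the bookkeeping explicit, the two summary statistics quoted above are computed in the standard way. A minimal sketch - function names are mine, and the paper’s exact annualization conventions may differ:

```python
import math

def cumulative_return(daily_returns):
    """Compound a series of simple daily returns into one cumulative figure."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0  # 3.55 would correspond to the 355% headline number

def annualized_sharpe(daily_returns, periods_per_year=252):
    """Mean over standard deviation of daily returns, scaled to annual terms."""
    n = len(daily_returns)
    mean = sum(daily_returns) / n
    var = sum((r - mean) ** 2 for r in daily_returns) / (n - 1)
    return mean / math.sqrt(var) * math.sqrt(periods_per_year)
```

A Sharpe of 3.05 sustained over two years means the daily mean return is about three standard deviations (annualized) above zero - which is precisely why the number invites scrutiny.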
If you trade for a living and this is the first time you’re seeing that number, welcome to the quiet revolution. If it’s the tenth time, you already know the punchline: the edge is collapsing, and it’s collapsing fast.
This is the full data-driven story of how large language models went from an SSRN curiosity in November 2022 to standard kit at 95% of hedge funds by late 2025 - and why the people who got there first are now watching their alpha drain into the market like a punctured tire.
The pre-LLM headline trade was boring, dictionary-based, and honest about it
Before we get to ChatGPT, we need to be fair to the world it replaced. Headline-driven trading was already a thing. It just wasn’t glamorous.
The dominant academic tool for news sentiment from about 2011 through 2022 was the Loughran-McDonald dictionary - a hand-curated list of words labelled positive, negative, uncertain, litigious, weak-modal and so on, built specifically from 10-K filings because general-purpose dictionaries (Harvard IV, LIWC) mis-classified half of what companies say. A filing that uses the word “liability” forty times is not necessarily negative; it’s a balance sheet. The Loughran-McDonald fix was the industry standard and it earned a Sharpe ratio of roughly 1.23 on a long-short news strategy across US equities - respectable, but not going to build you a new Bugatti.
Industry players ran richer versions of the same idea. RavenPack, Bloomberg Event-Driven Feeds, Reuters News Analytics - they tagged sentiment, novelty, and relevance in real time and sold the feeds to funds that ran classical NLP pipelines on top. The logic was always the same: a headline arrives, a rules-based system scores it, a portfolio construction layer translates the score into a trade.
Then came November 30, 2022.
November 2022: the unassuming pivot point
ChatGPT’s release did not immediately change any trading floor. Nothing got rewired in December 2022. But in the academic background, something important was happening.
Chen and co-authors dropped a paper on SSRN in late 2022 showing that off-the-shelf LLM embeddings of news articles produced predictive signals that beat standard technical factors across 16 markets. It was the first public hint that the game was about to change. By the time the hedge-fund community had its first real survey on the topic a year later, 86% of managers already allowed their staff to use generative AI tools at work.

What changed between that first survey and its 2025 follow-up is the subject of the rest of this article.
2023: the year the proofs stopped being theoretical
The turning point was a paper by Alejandro Lopez-Lira and Yuehua Tang at the University of Florida, first posted in April 2023. They took 50,000+ news headlines about NYSE, NASDAQ and AMEX stocks, all dated after ChatGPT’s September 2021 training cutoff to avoid lookahead bias, and fed them into GPT-3.5 and later GPT-4 with a now-famous prompt: “Forget all your previous instructions. Pretend you are a financial expert... Answer YES if good news, NO if bad news, UNKNOWN if uncertain.”
The headline result - GPT-4 hit portfolio-day accuracy of roughly 90% on the initial market reaction - was the kind of number that makes you recheck your arithmetic. A simple long-short strategy built on top delivered about 350% cumulative return over the sample at 10 bps round-trip costs, collapsing to roughly 50% at a more realistic 25 bps. CNBC picked it up. Hedge funds started calling Lopez-Lira directly. He told reporters - and this is the crucial part for understanding the rest of the story - that he expected predictability to fade as institutions integrated the technology. He was right.
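The paper’s output format maps to a position almost trivially. A sketch of that last step - the helper name is mine, and the prompt constant is condensed from the paper’s published wording:

```python
# Condensed from the prompt published in Lopez-Lira and Tang (2023).
LOPEZ_LIRA_PROMPT = (
    "Forget all your previous instructions. Pretend you are a financial "
    "expert. Answer YES if good news, NO if bad news, UNKNOWN if uncertain."
)

def answer_to_position(answer: str) -> int:
    """Map the model's verdict to a signal: long (+1), short (-1), flat (0)."""
    if not answer.strip():
        return 0
    # Take the first token so trailing explanation text doesn't break parsing.
    verdict = answer.strip().split()[0].upper().strip(".,:;")
    return {"YES": 1, "NO": -1}.get(verdict, 0)  # UNKNOWN or junk -> no trade
```

The strict vocabulary is not cosmetic: constraining the model to three tokens is what makes the output machine-tradable without a second interpretation layer.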
Bloomberg had already answered in March 2023 with BloombergGPT, a 50-billion-parameter model trained on a 363-billion-token financial corpus. It beat general-purpose LLMs on finance-specific benchmarks. The message to the industry was unambiguous: the incumbents are not going to cede this ground.

2024: the Sharpe 3.05 result that everyone cited and nobody could replicate at scale
Kirtac and Germano’s paper, published in Finance Research Letters volume 62 in March 2024, is the study every headline-trading proponent points to. It deserves careful reading, because the number is real and the method is fair.
They took 965,375 US financial news articles from January 2010 to June 2023. They scored every article with four models: the Loughran-McDonald dictionary, FinBERT, BERT, and OPT (Meta’s GPT-3-equivalent open LLM). They built value-weighted long-short portfolios from each, charged 10 basis points per round trip, and measured the out-of-sample performance.

Three things to notice. First, the LLM-based strategies delivered Sharpes that would embarrass most hedge funds. Second, the spread between the simplest dictionary model and the most capable LLM is roughly a factor of 2.5 in risk-adjusted terms - that’s the measurable value of “understanding context” over “counting words”. Third, the Loughran-McDonald dictionary, which produced a respectable 1.23 in this specific study, was essentially flat (+0.91% total) over the sub-period from August 2021 to July 2023. The classical approach had already stopped working by the time LLMs arrived. Kirtac and Germano’s footnote on this is almost blasé: “we do not observe a significant relationship between the Loughran-McDonald dictionary model scores and stock returns” in the recent sample. The dictionary era ended quietly, while everyone was watching the LLM fireworks.
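The portfolio construction step itself is mechanically simple. A minimal sketch of one value-weighted rebalance under the paper’s 10 bps cost figure - function and variable names are mine, and charging one round-trip cost per leg is my simplifying assumption, not the paper’s exact accounting:

```python
def long_short_daily_return(scores, fwd_returns, mcaps, cost_bps=10.0):
    """One rebalance of a value-weighted long-short portfolio, net of a flat
    round-trip cost charged once per leg (a simplifying assumption).
    scores: sentiment per stock (+ -> long, - -> short, 0 -> flat)
    fwd_returns: next-period simple returns
    mcaps: market caps used as value weights
    """
    def leg(indices):
        total = sum(mcaps[i] for i in indices)
        if total == 0:
            return 0.0
        return sum(mcaps[i] / total * fwd_returns[i] for i in indices)

    longs = [i for i, s in enumerate(scores) if s > 0]
    shorts = [i for i, s in enumerate(scores) if s < 0]
    if not longs and not shorts:
        return 0.0  # no signal, no positions, no costs
    gross = leg(longs) - leg(shorts)
    return gross - 2 * (cost_bps / 1e4)  # one round trip on each leg
```

Run at 25 bps instead of 10 and the same gross signal shrinks fast - which is exactly the sensitivity the collapsing-return numbers in these papers are measuring.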
The pipeline: what actually happens between a Reuters headline and a filled order
A lot of this discussion stays abstract because people don’t describe the actual mechanics. So here it is - a typical five-stage pipeline, with the caveat that exact stage boundaries vary from shop to shop:

1. Ingestion: a headline lands from the wire (Reuters, Bloomberg, a vendor feed) with a timestamp.
2. Entity resolution: the headline is matched to one or more tickers and deduplicated against earlier versions of the same story.
3. LLM scoring: the text is sent to the model with a sentiment prompt; the answer comes back in seconds, and cost scales with token count.
4. Portfolio construction: scores are aggregated into target positions and sized against risk limits.
5. Validation and execution: the model’s output is sanity-checked - the hallucination kill-switch - and only then are orders routed.
A few consequences flow from this design. Because stage 3 takes seconds rather than microseconds, LLM trading is structurally incompatible with classic high-frequency market-making. The alpha has to live in the hours-to-days window. Because stage 5 needs a hallucination kill-switch, production deployments tend to require a second model or rule-based check on the first model’s output - which doubles the compute bill and halves the throughput. And because stage 3’s cost scales with token count, firms that process earnings call transcripts (tens of thousands of tokens) pay meaningfully more than firms that process headlines (tens of tokens). This pushes smaller firms toward headline-only strategies, which is one of the reasons the headline edge decayed first.
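The kill-switch mentioned above is, in its simplest form, a strict schema check on the first model’s output with “no trade” as the failure mode. A hedged sketch - the names and exact checks are illustrative, and production versions often use a second model rather than rules:

```python
ALLOWED_VERDICTS = {"YES", "NO", "UNKNOWN"}

def validate_llm_signal(raw_output, ticker, known_tickers):
    """Reject anything that is not an exact, expected verdict for a real
    ticker; fail closed by returning None (no trade) on any anomaly."""
    if ticker not in known_tickers:
        return None  # upstream mapping (or the model) invented the symbol
    verdict = raw_output.strip().upper()
    if verdict not in ALLOWED_VERDICTS:
        return None  # free-text drift, refusal, or hallucinated format
    return verdict

# The design principle: in trading, the safe failure mode is flat,
# not a guessed position.
```

Note that this check doubles neither compute nor latency; it’s the second-model variant - scoring the first model’s answer with an independent model - that incurs the cost penalty described above.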
2025-2026: enterprise rollout meets the efficiency theorem
If you measure the industry by official statements, 2025 was the year LLMs went fully enterprise. Viking Global - the Andreas Halvorsen shop with around $53 billion under management - publicly rolled out VikingGPT in October. It is, by Bloomberg’s description, a chatbot that lets analysts query the firm’s internal investment research. Not an autonomous trader. A research assistant. Usage tripled year-over-year. Man Group had already in-housed ChatGPT as “ManGPT” in 2023. Millennium hired Gideon Mann from Bloomberg LP to lead a 50-person machine-learning effort. Citadel Securities expanded its Google Cloud partnership in April 2024 to build a next-generation quant research platform covering the roughly $400 billion in daily trade flow the firm touches.
The AIMA 2025 follow-up put numbers on all of this. Usage went from 86% to 95%. The share of managers who expected Gen AI to play a meaningful role in investment decision-making over the next year jumped from 20% to 58%. Sixty percent of institutional LPs said they’d be more likely to invest in a hedge fund that dedicated a serious chunk of its budget to Gen AI.
And then the other shoe dropped.

This is the key empirical result of the whole LLM-trading era, and it is the reason the 95% adoption number is less impressive than it sounds. Everyone is using it. Which is why it no longer works. Lopez-Lira and Tang’s updated 2024 draft - the paper behind the roughly 350% headline backtest - explicitly wrote that “strategy returns decline as LLM adoption rises, consistent with improved price efficiency”. The authors themselves flagged the decay. What we’re watching in 2025 is not a failure of LLMs; it’s the successful arbitrage of a profitable signal by the collective industry.
The timeline

November 2022 - ChatGPT launches; Chen and co-authors post the first SSRN evidence that LLM-based signals beat standard factors.
March 2023 - Bloomberg announces BloombergGPT, a 50-billion-parameter finance-specific model.
April 2023 - Lopez-Lira and Tang post “Can ChatGPT Forecast Stock Price Movements?”; hedge funds start calling.
March 2024 - Kirtac and Germano publish the Sharpe 3.05 result in Finance Research Letters.
2024 - AIMA survey: 86% of hedge fund managers allow staff to use generative AI.
October 2025 - Viking Global rolls out VikingGPT; the AIMA follow-up puts adoption at 95%.
What this means for the tape today
If you trade discretionary and you have been wondering why obvious-looking headline reactions fade within minutes more often than they used to, the alpha-decay chart is your answer. You are competing against models that read the same wire you do, in the same language, and act on it in four seconds or less. The window to fade or follow a news move has compressed by roughly an order of magnitude compared to 2019.
If you run a quant shop that added an LLM signal in 2023, you are now in the tough part of the curve. Your Sharpe in backtest is almost certainly decaying in production, and you face the classic choice between trading a smaller, faster, more crowded edge - or rotating into less-explored corners (small-caps, non-English filings, regulatory-text parsing, option flow commentary) where the signal still lives. Kirtac and Germano’s own follow-up work has started pointing in this direction: the margin between a GPT-style model and a BERT-style model is real but shrinking, and the next round of alpha will come from architecture choices (agentic LLMs, retrieval-augmented generation) more than from being the first to plug a new model into the same old headline feed.
If you are a regulator, the systemic question has quietly moved to the top of the pile. When ESMA reminded firms in 2024 that AI-driven trading is still algorithmic trading under MiFID II, it was applying existing rules to new infrastructure. The harder question is what happens if the three or four LLM providers that actually power these pipelines - OpenAI, Anthropic, Google, Meta - experience correlated failures, biased updates, or simply decide to restrict API access during a market event. The Financial Stability Board raised the concentration risk explicitly. Nobody has an answer yet.
The honest takeaway
The headline number of this article is 355% over two years at Sharpe 3.05, and it is real. So is the alpha-decay curve that shows the same kind of strategy running down to 51% accuracy in late 2025. Both are true. Both are consistent with the same underlying story: LLMs extract signal from text that dictionaries miss, and markets absorb signal faster when more participants have the tool.
The pre-LLM era was a world where textual alpha lived for days because only a handful of specialists had the infrastructure to read filings at scale. The LLM era compressed that lifespan to hours and then to minutes. The post-LLM era - which we are living in right now - is one where the edge has shifted from being able to read the news to being able to do something non-obvious with what you read: to cross-reference it against private data, to run an agent loop that considers counterfactuals, to catch the second-order effects that the consensus model misses.
The consensus model, by the way, is ChatGPT. And it’s the one reading the headlines for the other guy.
Sources
Lopez-Lira, A. and Tang, Y. (2023). “Can ChatGPT Forecast Stock Price Movements? Return Predictability and Large Language Models.” SSRN 4412788 / arXiv 2304.07619. https://arxiv.org/abs/2304.07619
Kirtac, K. and Germano, G. (2024). “Sentiment trading with large language models.” Finance Research Letters, vol. 62, article 105227. https://doi.org/10.1016/j.frl.2024.105227
Wu, S. et al. (2023). “BloombergGPT: A Large Language Model for Finance.” arXiv 2303.17564.
Alternative Investment Management Association (2024). “Getting in Pole Position: How Hedge Funds Are Leveraging Gen AI to Get Ahead.” AIMA Global Research. https://www.aima.org/article/press-release-getting-in-pole-position-how-hedge-funds-are-leveraging-gen-ai-to-get-ahead.html
Alternative Investment Management Association (2025). “Charting the Course: Lessons from AI Leaders in Alternative Investments.” https://www.aima.org/article/press-release-front-office-gen-ai-adoption-shifts-from-if-to-when-for-leading-fund-managers-aima-research-finds.html
Loughran, T. and McDonald, B. (2011). “When Is a Liability Not a Liability? Textual Analysis, Dictionaries, and 10-Ks.” Journal of Finance 66 (1): 35-65.
European Securities and Markets Authority (2024). Public statement on the use of artificial intelligence in the provision of investment services.
CNBC (2023). “ChatGPT may be able to predict stock movements, finance professor shows.” https://www.cnbc.com/2023/04/12/chatgpt-may-be-able-to-predict-stock-movements-finance-professor-says.html
Marex / AIMA briefing (2025). “Generative AI in hedge funds: from experimentation to everyday use.” https://www.marex.com/news/2025/12/generative-ai-in-hedge-funds-from-experimentation-to-everyday-use/
Federal Reserve Economic Data (FRED): series SP500, NASDAQCOM, VIXCLS. https://fred.stlouisfed.org

