What 2026 Looks Like to Me
I stopped using ChatGPT a few months ago. Not for any dramatic reason, but for something far simpler (at least on the surface). The voice changed after GPT-5 shipped, and I just couldn’t “vibe” with it anymore. Something about our conversations shifted, and after a few weeks of trying to adjust I just… stopped. I switched back to Claude for most things, kept Gemini around for research, and moved on.
What surprised me was how quickly it happened. I’d been all in on ChatGPT for months. I’d built custom GPTs, wired it into my workflows, and even developed what felt like a working relationship with it. But when GPT-5 changed, so did all of my interactions. I realized the “relationship” I subconsciously thought I was building had been one-sided all along, because there was nobody on the other side. I knew this intellectually, but that doesn’t change how it felt.
I have never really evaluated AI based on reasoning capability. Nor have I ever gone looking for AI scoresheets to see which model was winning that week. I responded to how it felt to work with the AI, and that feeling ultimately boils down to trust. The kind of trust that requires “someone” to be accountable when things change.
That’s why I think most leaderboard debates miss the point entirely.
The LLM Leaderboard Doesn’t Matter
Gemini, Claude, and ChatGPT have been jockeying hard for top position on the leaderboard over the last month. They all continue to get dramatically better and faster, but the questions are getting louder: do they actually drive value? Is the ROI there? I’m not qualified to answer that, but people who are say that a reckoning is coming soon.
Stanford’s Human-Centered AI Institute is calling 2026 the year of AI evaluation. The question shifts from “Can AI do this?” to “How well, at what cost, and for whom?” James Landay, HAI’s co-director, predicts we’ll hear more companies admit that AI hasn’t delivered productivity increases outside certain target areas. If he’s right, we’ll hear about a lot of failed AI projects. Maybe the speculation around AI is ebbing, or, as Gartner famously put it, we’re heading into the “trough of disillusionment.”
The Ceiling for LLMs is Real
When GPT-5 finally shipped in August, the response was so underwhelming that people online started calling it “Gary Marcus Day.” Marcus has been arguing for years that LLMs have fundamental limits, and the GPT-5 launch fit that thesis. Three years of development, billions of dollars, and the result was what Marcus called “moderate quantitative improvement” that still fails in the same qualitative ways as its predecessors.
I’m not an AI researcher, but I’ve been watching this closely because it matters for how I think about the AI tools I rely on heavily. What keeps catching my eye is that the Wall Street investment thesis assumes these models keep getting smarter at judgment. If that assumption proves wrong (if we’re hitting real architectural limits, not just temporary setbacks), then a lot of organizational bets are about to look suspect.
The Hard Stuff Was Never the Software
Nate Jones has been talking about software decoupling into “substrate” (the back-office systems of record, data models, and compliance infrastructure that actually hold value) and “pixels” (the interfaces that can increasingly be generated on demand by AI). His argument is that if your moat is your interface, you’re exposed. Agents will start routing around your UI. SaaS companies are already feeling this squeeze, which changes how they measure engagement and strains traditional pricing models.
For those of us in service-based industries, this shows up differently. Our moat was never the interface. The hard stuff in telecom, distribution, logistics, and other operational businesses has always been in the service itself: field operations, regulatory complexity, domain expertise, the actual service delivery that the digital product supports. AI will improve the software layer, but that was never where the value lived anyway.
The services companies that show real ROI next year will be the ones who understood what AI actually does well (pattern recognition, drafting, synthesis, coordination) versus what it doesn’t do at all (judgment, verification, accountability) and designed their human/AI workflows accordingly. That organizational work has a long ramp, because changing human process and mindset takes years, not months. The companies seeing real value from AI deployments are doing that unglamorous internal work right now.
I wrote about this in “The Partnership Matrix”, noting that the winners are the ones who design partnerships where human judgment and AI capability reinforce each other. That still holds true, but the 2026 version adds a sharp edge. When things go wrong (and they will), who is going to answer for it? The organizations that skipped the partnership design work will face hard questions about failed investments with no one to point to but themselves.
2026 Brings Clarity to AI Value
Azeem Azhar frames 2026 as a reckoning year. The sheer scale of AI investment means companies can’t keep saying “we’re investing in AI” forever. The bill comes due. Either you show ROI or you explain why you haven’t. The “we’re building for the future” answer stops working when the future arrives and the productivity gains aren’t there.
For leaders and teams building products in operational businesses, that’s actually helpful. The questions move toward whether you’re helping your organization capture real value from these tools, while keeping humans in the loop for the judgment and accountability that those tools fundamentally cannot provide.
The work of figuring out how to collaborate with systems that can make decisions all day long but can never answer for them remains ours. It’s been the thread running through everything I’ve written about AI this year, and it’s the thread I expect to keep pulling on in 2026.