I’d been running a weekly synthesizer workflow for a few months. Every Sunday, it reads through all my meeting transcripts from the past week and spits out a summary of insights, patterns, and things I should pay attention to.
Good idea. But I’d just been running it on Claude because… that’s what I’d been using. I’d never actually tested whether it was the right choice.
Then one week, I got curious.
The Test
I ran the exact same workflow three times. Same prompt, same input data — a full week of meeting transcripts. The only variable was the model:
- Claude Sonnet 4.5 (cost: 214 credits)
- ChatGPT 5.2 (cost: 217 credits)
- Gemini 3.0 (cost: 274 credits)
Then I read all three outputs against what I actually knew about my week.
What Happened
ChatGPT came in last. Not slightly behind — genuinely last. It skipped things that had clearly happened. Missed patterns that were right there in the text. I don’t know if it was the context window, the summarization behavior, or something else, but the output was noticeably thinner than the other two.
Claude was the runner-up. Solid, complete, nothing obviously missing. But nothing surprising either. It gave me back a clean version of what I already knew.
Gemini 3.0 won. And it wasn’t really close.
Gemini surfaced things I would have missed on my own. Connections between meetings that happened days apart. A theme that had been building for about six weeks that I hadn’t consciously noticed. Specific phrasing patterns from client calls that kept repeating.
That’s what I was building this workflow to catch. And only one model actually delivered it.
Why This Matters More Than You Think
Here’s the thing. I’ve been teaching AI productivity for a few years now, and one of the patterns I see constantly is what I call model loyalty. People start with one AI tool, get comfortable, and just… stay there.
It’s like finding a restaurant you like and never eating anywhere else.
The truth is, each model has different strengths. I think of it this way:
- ChatGPT is the daily driver. The reliable workhorse. Great for general tasks, quick answers, and anything where you need something done fast.
- Claude is the speedboat. Fast, precise, excellent at technical reasoning, writing, and nuanced analysis.
- Gemini is the long-context specialist, backed by Google’s infrastructure. It handles long documents and large volumes of text better than the others, and it surfaces patterns across big datasets in a way that genuinely surprises me.
This is what I call being multi-tool native — you don’t pick one AI and stay loyal to it. You match the tool to the job.
The best AI users I know all do this. They route work to wherever it’ll get done best. Technical reasoning goes to Claude. Image analysis goes to Gemini. Quick daily-driver tasks go to ChatGPT. Recurring workflows that need integrations go to Lindy.
The Wrong Assumption
When I started testing, I assumed ChatGPT would do fine. It’s the most popular, the most talked about, the one most people default to. I figured it would at least be competitive.
I was wrong.
And this is the bigger lesson: most people’s assumptions about which AI is best are based on reputation, not testing. They go with the one they heard about first, or the one that feels most familiar, or the one their tech-savvy friend recommended two years ago.
That’s a terrible way to build a workflow.
The models are improving and changing constantly. The right call last year might not be the right call today. What works for one task category might fail on another.
The only way to know is to test.
How to Run Your Own Test
You don’t need to do anything complicated here. Pick one task you use AI for every week. Something recurring. Something where you have a sense of what good output looks like.
Then run it through two different models side by side this week.
That’s it. Just compare the outputs. Don’t add any other variables.
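If you’d rather script the comparison than paste into two chat windows, here’s a minimal sketch using the OpenAI and Anthropic Python SDKs. The model IDs, file names, and prompt are placeholders, not my actual workflow; swap in whichever two models you’re testing and however your source material is stored.

```python
# compare_models.py -- send the identical prompt to two models, save both outputs.
# Assumes OPENAI_API_KEY and ANTHROPIC_API_KEY are set in your environment.
from pathlib import Path

import anthropic
from openai import OpenAI

PROMPT_TEMPLATE = """Read the meeting transcripts below and summarize the
insights, recurring patterns, and anything I should pay attention to.

{transcripts}"""


def main() -> None:
    # Placeholder input file: whatever recurring material you feed the task.
    transcripts = Path("transcripts_this_week.txt").read_text()
    prompt = PROMPT_TEMPLATE.format(transcripts=transcripts)

    # Same prompt, model A.
    gpt = OpenAI().chat.completions.create(
        model="gpt-4o",  # placeholder model ID
        messages=[{"role": "user", "content": prompt}],
    )
    Path("output_a.md").write_text(gpt.choices[0].message.content)

    # Same prompt, model B.
    claude = anthropic.Anthropic().messages.create(
        model="claude-sonnet-4-5",  # placeholder model ID
        max_tokens=4000,
        messages=[{"role": "user", "content": prompt}],
    )
    Path("output_b.md").write_text(claude.content[0].text)

    print("Wrote output_a.md and output_b.md. Read them side by side.")


if __name__ == "__main__":
    main()
```

The script is deliberately dumb on purpose: the prompt, the input, and the output format are all fixed, so the only variable left is the model.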
You’ll get a useful data point. And you’ll probably discover something that surprises you — the way my Gemini result surprised me.
From there, you can start routing intentionally. When you have a long document to synthesize, you know which model to reach for. When you need precise technical writing, you know where to go.
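Once you’ve run a few of these tests, the routing itself can be as simple as a lookup table. Here’s a toy sketch; the task categories and model labels are made up for illustration, and the winners should be whatever your own tests said:

```python
# A toy router: map task categories to whichever model won your own tests.
# Category names and model labels here are illustrative, not a real API.
ROUTES = {
    "long_document_synthesis": "gemini",   # won my transcript test
    "technical_writing": "claude",         # precise, nuanced prose
    "quick_general_task": "chatgpt",       # the daily driver
}


def pick_model(task_category: str) -> str:
    """Return the model for a task, falling back to the daily driver."""
    return ROUTES.get(task_category, "chatgpt")


print(pick_model("long_document_synthesis"))  # -> gemini
```

The table matters less than the habit: every recurring task gets an explicit home, and the home changes when a new test says it should.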
That’s how you actually get better results from AI… not by finding the one best model, but by knowing when to use each one.
One More Thing
I still run that weekly synthesizer on Gemini. Every Sunday, it reads the week’s transcripts, and every Sunday I read something I didn’t already know.
I’m spending 274 credits instead of 214. The extra 60 credits cost maybe $0.10.
Best $0.10 I spend all week.
Want to learn how to build workflows like this? The Productivity Academy has workshops that cover AI agent design, tool selection, and getting real results from automation. Check it out.
