I’m always on the lookout for the next big thing in AI, and I think I just found it. I was watching a video from an AI professional who broke down Anthropic’s latest release, and honestly, my mind is blown.
Even though ChatGPT gets way more traffic (we’re talking 3.1 billion visits a month vs. Claude’s 70 million!), the channel behind the video has been saying for a while that Claude’s models are top-tier. Now Anthropic has dropped something new that could change the game.
✨ The New Sheriff in Town: Claude 3.5 Sonnet
The big news is the release of Claude 3.5 Sonnet, an upgraded version of what was already one of the best models out there. The expert in the video points out that, on paper, the new model beats competitors like GPT-4o and Gemini 1.5 Pro on benchmarks.
Of course, benchmarks are one thing, but real-world tests are another. The YouTuber put its reasoning to the test with some classic riddles, and the results were impressive.
📌 Reasoning Test Results:
- Strawberry Riddle: Correctly identified the three ‘r’s.
- Two Fathers, Two Sons: Nailed the classic grandfather/father/son logic.
- 25 Horses Puzzle: This is a tough one! The model correctly worked out that the minimum number of races is seven, and it showed its step-by-step reasoning along the way.
It wasn’t perfect, though. Just like other models, it failed to correctly count the number of words in its own response. Still, the reasoning power is seriously strong.
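If you want to sanity-check the seven-race answer from the 25 horses puzzle yourself, here’s a minimal sketch in Python. I’m assuming the usual version of the puzzle, since the video recap doesn’t spell it out: 25 horses, at most five per race, no stopwatch, and the goal is to find the three fastest. The function name and the random speeds are made up for illustration.

```python
import random


def top_three_in_seven_races(speeds):
    """Return (ids of the three fastest horses, number of races used).

    Assumes the usual puzzle setup: 25 horses, at most five per race,
    and each race reveals only the finishing order. Higher speed wins.
    """
    assert len(speeds) == 25
    races = 0

    def race(horses):
        # A race only tells us the order, never the times.
        nonlocal races
        assert len(horses) <= 5
        races += 1
        return sorted(horses, key=lambda h: speeds[h], reverse=True)

    # Races 1-5: split the field into five groups of five and rank each group.
    groups = [race(list(range(i, i + 5))) for i in range(0, 25, 5)]

    # Race 6: race the five group winners to rank the groups themselves.
    winners = race([g[0] for g in groups])
    by_winner = {g[0]: g for g in groups}
    a, b, c = (by_winner[w] for w in winners[:3])

    # a[0] is the fastest overall. Only these five can still be 2nd or 3rd.
    candidates = [a[1], a[2], b[0], b[1], c[0]]

    # Race 7: the top two finishers take 2nd and 3rd place overall.
    final = race(candidates)
    return [a[0], final[0], final[1]], races


# Made-up distinct speeds, just to exercise the strategy.
speeds = random.sample(range(1000), 25)
print(top_three_in_seven_races(speeds))
```

This only shows that seven races are enough. Proving you can’t do it in six takes a separate argument that the code doesn’t attempt.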
💻 A Mixed Bag on Coding
Next, the creator tested its coding skills. The results here were a bit hit-or-miss.
- The Fail: The YouTuber asked it to code a game of Checkers. After three different prompts and a few fixes, the game logic was still broken. The creator noted he usually gets a better result from GPT-4o on the first try.
- The Win: He then asked for a game of Tetris as a standalone app, and it worked perfectly right out of the gate!
So, while it’s clearly a capable coder, it might need a little hand-holding on more complex logic.
🤖 The Real Game-Changer: “Computer Use”
This is the part that got me really excited. The creator shared a demo of a new beta feature called “Computer Use.” This isn’t just a chatbot; it’s an AI agent that can actually use your computer.
The demo showed Claude navigating between a spreadsheet and a CRM, pulling information, and filling out a web form completely on its own. It sees the screen, moves the cursor, and types just like a person would. This is the kind of automation we’ve been promised!
The feature is still in beta for developers, but I think it’s a massive leap forward for AI agents.
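To make the “sees the screen, moves the cursor, types” idea a bit more concrete, here’s a toy sketch of that kind of observe, decide, act loop in Python. To be clear, this is not Anthropic’s API and it’s not how the demo was built. FakeDesktop, choose_action and run_agent are hypothetical names I made up, the “screen” is plain data instead of pixels, and a hard-coded rule stands in for the model.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for a real screen: one spreadsheet row and a web form."""
    spreadsheet: dict
    form: dict = field(default_factory=dict)
    focused: Optional[str] = None

    def screenshot(self):
        # A real agent would get pixels; here the "screen" is just data.
        return {
            "spreadsheet": dict(self.spreadsheet),
            "form": dict(self.form),
            "focused": self.focused,
        }

    def click(self, field_name):
        # Clicking a form field gives it keyboard focus.
        self.focused = field_name

    def type_text(self, text):
        if self.focused is None:
            raise RuntimeError("nothing is focused")
        self.form[self.focused] = text


def choose_action(screen):
    """Hypothetical policy standing in for the model.

    Finds the first form field that does not match the spreadsheet yet,
    then clicks it if it is not focused, or types the value if it is.
    """
    for name, value in screen["spreadsheet"].items():
        if screen["form"].get(name) != value:
            if screen["focused"] != name:
                return ("click", name)
            return ("type", value)
    return ("done", None)


def run_agent(desktop, max_steps=20):
    """Observe, decide, act, and repeat until the policy reports it is done."""
    for step in range(max_steps):
        action, arg = choose_action(desktop.screenshot())
        if action == "done":
            return step  # number of actions actually taken
        if action == "click":
            desktop.click(arg)
        elif action == "type":
            desktop.type_text(arg)
    raise RuntimeError("step budget exhausted")


# Made-up sample data: copy one spreadsheet row into the empty form.
desktop = FakeDesktop({"name": "Ada", "email": "ada@example.com"})
steps = run_agent(desktop)
print(steps, desktop.form)
```

The step budget is there so a confused policy can’t loop forever, which seems like a sensible guard for any agent that acts on its own.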
This new Claude 3.5 Sonnet is definitely a beast. For the full breakdown, and to see the model succeed and fail in real time, you have to watch the original video from the creator!