AI Agents Need To Come With An Emergency Button

Some of you may remember the newsletter I sent out a month ago about a startup in China that had released a new AI agent to rave reviews.

Following that, the startup, Monica, gave me early access to Manus AI, along with plenty of free credits to play around with the agent.

Now that I have first-hand experience, I can confirm that it really raises the bar on the AI front.

Its ability to draw up the checklist needed to take an objective to completion, and then work through it one task after another, is breathtaking.

The one big task I chose for Manus AI was a high-level design for a game in which the player lives through Indian history by building and defending cities.

Manus had clear instructions: we would brainstorm to come up with the perfect gameplay, but then code only a minimum viable game.

Instead, it tried to achieve the “objective” by any means within a single run, and we ended up with a landing page for the game that it deployed on its own, rather than a playable MVP.

If you are thinking I could just ask it to refine its work and get back to what we were trying to do: uh oh, bad luck, we were all out of credits.

So I ended up blowing all of the fancy free credits Monica gave me, just so we could all learn this valuable lesson: we need a red button on AI.

AI agents are all the rage. And as somebody who is building multiple AI agents of my own, I am very, very optimistic about them.

But we need to acknowledge the downside.

AI agents, even top-of-the-line ones, are not yet ready to perform complex tasks autonomously.

This needs to be on the label.

Users shouldn’t have to blow hundreds (or even thousands) of dollars before they realize this.

This isn’t unique to Manus AI, though. I face a similar problem from time to time with Cursor, my go-to AI agent for coding.

Sometimes it introduces unnecessary components into your code, or even new libraries entirely different from the ones you are already using.

Take your eyes off what the agent is doing for a second, and you may just end up with a disaster of epic proportions.

Now, to be fair, Cursor does have a “Reject All” button. But this only works if you review the code immediately.

Often, though, you only notice the issue after doing more work on the code, and by then a git rollback is the only option left.

None of this is meant to dissuade you from trying out AI agents (they really are the future). It is just something I needed to say out loud, especially since I have contributed to the hype around them.

AI agents are already a viable option for coding and deploying things like landing pages and classic games end to end, and for producing in-depth research on any given topic.

I just want that button within reach on my desk: one that, when hit, stops whatever task the AI is executing and rolls the whole thing back.
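To make the idea a little more concrete, here is a minimal Python sketch of what such a button could look like. It assumes an agent whose work can be split into discrete steps and whose entire state fits in a copyable dict; `RedButtonRunner`, `press_red_button`, and the example steps are hypothetical names for illustration, not how Manus or Cursor actually work.

```python
import copy
import threading
from typing import Callable, Dict, List

# Hypothetical agent state: file names mapped to file contents.
State = Dict[str, str]
Step = Callable[[State], None]


class RedButtonRunner:
    """Runs agent steps in order; pressing the red button stops the run
    and restores the state captured before the task began."""

    def __init__(self, state: State) -> None:
        self.state = state
        self._stop = threading.Event()  # thread-safe flag, settable from a UI thread

    def press_red_button(self) -> None:
        self._stop.set()

    def run(self, steps: List[Step]) -> bool:
        # Snapshot the whole state once, so a stop undoes the entire task.
        checkpoint = copy.deepcopy(self.state)
        for step in steps:
            # The flag is only checked between steps; a running step finishes first.
            if self._stop.is_set():
                break
            step(self.state)
        if self._stop.is_set():
            # Roll back in place so callers holding the same dict see the restore.
            self.state.clear()
            self.state.update(checkpoint)
            return False
        return True


# Usage: the user hits the button partway through the agent's plan.
project: State = {"index.html": "<h1>Game MVP</h1>"}
runner = RedButtonRunner(project)


def add_landing_page(state: State) -> None:
    state["landing.html"] = "<h1>Play now!</h1>"


def user_hits_button(state: State) -> None:
    runner.press_red_button()


finished = runner.run([add_landing_page, user_hits_button, add_landing_page])
print(finished, project)  # False {'index.html': '<h1>Game MVP</h1>'}
```

The design choice here is the simplest one possible: snapshot everything before the task starts and restore it wholesale when the button is pressed. A real agent that writes files, deploys sites, or calls paid APIs would need a sturdier checkpoint (a git commit, for instance), and some actions, like spent credits or a live deployment, cannot be undone by restoring local state at all.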

This post is an excerpt from the 12th edition of the Artificially Boosted newsletter.
