The agents are coming, and they represent a fundamental shift in the role artificial intelligence plays in businesses, governments, and our lives.
The biggest news in agentic AI happened this month when we learned that OpenAI’s agent, Operator, is expected to launch in January.
OpenAI Operator will function as a personal assistant that can take multi-step actions on its own. We can expect Operator to be put to work writing code, booking travel, and managing daily schedules. It will do all this by using the applications already installed on your PC and by using cloud services.
It joins Anthropic, which recently unveiled a feature for its AI models called “Computer Use.” This allows Claude 3.5 Sonnet to perform complex tasks on computers autonomously. The AI can now move the mouse, click on specific areas, and type commands to complete intricate tasks without constant human intervention.
We don’t know exactly how these tools will work or even whether they’ll work. Both are in what you might call “beta” — aimed mainly at developers and early adopters.
But what they represent is the coming age of agentic AI.
A great way to understand agents is to compare them with something we’ve all used before: AI chatbots like ChatGPT.
Existing, popular LLM-based chatbots are designed around the assumption that the user wants, expects, and will receive text output — words and numbers. No matter what the user types into the prompt, the tool responds with text. The chatbot tries to make that output useful, of course. But no matter what, it’s designed for text in, text out.
Agentic AI is different. An agent doesn’t dive straight into the training data to find words to string together. Instead, it stops to understand the user’s objective and breaks it into the component parts needed to achieve that goal. It plans. And then it executes that plan, usually by reaching out and using other software and cloud services. An AI agent typically combines three capabilities:
1. Reasoning: At the core of an AI agent is an LLM responsible for planning and reasoning. The LLM breaks down complex problems, creates plans to solve them, and gives reasons for each step of the process.
2. Acting: AI agents have the ability to interact with external programs. These software tools can include web searches, database queries, calculators, code execution, or other AI models. The LLM determines when and how to use these tools to solve problems.
3. Memory Access: Agents can access a “memory” of what has happened before, which includes both the internal logs of the agent’s thought process and the history of conversations with users. This allows for more personalized and context-aware interactions.
“Reasoning” and “acting” (often implemented using the ReAct, or Reasoning and Acting, framework) are key differences between AI chatbots and AI agents. But what’s really different is the “acting” part.
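To make that concrete, here is a minimal sketch of a reason-act loop in Python. Everything in it is illustrative: call_llm is a hypothetical stand-in for a real model API, and the two tools are toys, not real services.

```python
# Minimal sketch of a ReAct-style agent loop. call_llm is a hypothetical
# stand-in for a real model API; search_web and calculator are toy tools.

def search_web(query: str) -> str:
    """Toy 'tool': a real agent would call a search API here."""
    return f"(search results for: {query})"

def calculator(expression: str) -> str:
    """Toy 'tool': evaluate simple arithmetic like '2+2'."""
    return str(eval(expression, {"__builtins__": {}}))  # demo only

TOOLS = {"search_web": search_web, "calculator": calculator}

def call_llm(history: list[str]) -> dict:
    """Hypothetical LLM call. A real model would read the history and
    decide on the next thought, the next action, or a final answer."""
    if not any(line.startswith("Observation") for line in history):
        return {"thought": "I need current information first.",
                "action": "search_web", "input": "upcoming concert dates"}
    return {"thought": "I have what I need.",
            "final_answer": "Here is your plan."}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [f"Goal: {goal}"]            # 3. Memory: the agent's working log
    for _ in range(max_steps):
        step = call_llm(history)           # 1. Reasoning: plan the next step
        history.append(f"Thought: {step['thought']}")
        if "final_answer" in step:
            return step["final_answer"]
        tool = TOOLS[step["action"]]       # 2. Acting: invoke an external tool
        observation = tool(step["input"])
        history.append(f"Observation: {observation}")
    return "Gave up after too many steps."

print(run_agent("Book tickets for the concert I saw advertised."))
```

The loop is the whole trick: the model proposes a step, the program actually performs it, and the result is fed back into the model’s memory before it reasons again.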
Do you see the paradigm shift?
Since the dawn of computing, the users of software were human beings. With agents, for the first time ever, software is also a user of software.
Because agents can access software tools, they’re more useful, modular, and adaptable. Instead of training an LLM from scratch, or cobbling together some automation process, you can instead provide the tools the agent needs and just let the LLM figure out how to achieve the task at hand.
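In practice, “providing the tools” often just means describing them to the model in a structured way. The sketch below shows one such format; it is illustrative rather than any specific vendor’s API, and the tool names (book_travel, add_calendar_event) are made up for the example.

```python
# Sketch of how tools might be described to an agent's LLM. The schema
# format is illustrative, not a real vendor API; the names are invented.

book_travel_tool = {
    "name": "book_travel",
    "description": "Book a flight or train ticket for the user.",
    "parameters": {
        "destination": {"type": "string", "description": "City to travel to"},
        "date": {"type": "string", "description": "Departure date, YYYY-MM-DD"},
    },
}

calendar_tool = {
    "name": "add_calendar_event",
    "description": "Add an event to the user's calendar.",
    "parameters": {
        "title": {"type": "string"},
        "start": {"type": "string", "description": "ISO 8601 start time"},
    },
}

# The agent's prompt simply lists the available tools;
# the LLM decides which one to call, and with what arguments.
AGENT_TOOLS = [book_travel_tool, calendar_tool]
```

This is why agents are modular: adding a capability means adding a description plus a function, not retraining the model.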
They’re also designed to handle complex problem-solving and work more autonomously.
When futurists and technology prognosticators talk about the likely impact of AI over the next decade, they’re mostly talking about agents.
Agents will also give rise to new jobs, roles, and specialties related to managing, training, and monitoring agentic systems. They will add another specialty to cybersecurity, too, since defenders will need agents of their own to counter attackers who are also using agents.
AI smart glasses and AI agents, meanwhile, were made for each other. Using streaming video from the glasses’ camera as part of the multimodal input (along with sound, spoken interaction, and more), AI agents will constantly work for the user through simple spoken requests.
One trivial and perfectly predictable example: You see a sign advertising a concert, look directly at it (letting the camera in your glasses capture that information), and tell your agent you’d like to attend. The agent will book the tickets, add the event to your calendar, invite your spouse, hire a babysitter, and arrange for a self-driving car to pick you up and drop you off.
The key takeaway here is that while agentic AI sounds like futuristic sci-fi, it’s happening in a big way starting next year.
SOURCE: Elgan, Mike. “AI agents are unlike any technology ever.” Computerworld, 22 Nov. 2024. https://www.computerworld.com/article/3608973/ai-agents-are-unlike-any-technology-ever.html