From agentic systems to zero-shot prompting, generative AI can feel like a new language. Here are the terms CIOs need to know.
As abruptly as generative AI burst onto the scene, so too has the new language that’s come with it. A complete list of AI-related vocabulary would be thousands of entries long, but for the sake of urgent relevance, these are the terms heard most among CIOs, analysts, consultants, and other business executives.
Agentic systems
An agent is an AI model or software program capable of autonomous decisions or actions. When multiple agents work together in pursuit of a single goal, they can plan, delegate, research, and execute tasks until the goal is reached. And when some or all of these agents are powered by gen AI, the results can significantly surpass what can be accomplished with a simple prompt-and-response approach. Gen AI-powered agentic systems are relatively new, however, and it can be difficult for an enterprise to build its own, and even more difficult to ensure the safety and security of these systems.
“Agents and agentic AI is obviously an area of enormous investment for VCs and startups,” says Gartner analyst Arun Chandrasekaran. “And we’ll perhaps see more agent frameworks evolve and mature in 2025.”
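In rough terms, the core of an agentic system is a loop in which a model plans the next action, delegates it to a tool, and folds the result back into its context until the goal is reached. The sketch below illustrates only that loop; call_llm and the search_docs tool are hypothetical placeholders, not any particular vendor’s framework.

```python
# A minimal sketch of an agentic loop, not any vendor's framework.
# call_llm() is a hypothetical stand-in for whatever gen AI API you use;
# TOOLS maps the actions the model can delegate to plain Python functions.

def call_llm(prompt: str) -> str:
    """Placeholder for a real model call (e.g., a hosted chat completion API)."""
    raise NotImplementedError

def search_docs(query: str) -> str:
    return f"Top results for: {query}"   # placeholder tool

TOOLS = {"search_docs": search_docs}

def run_agent(goal: str, max_steps: int = 5) -> str:
    history = [f"Goal: {goal}"]
    for _ in range(max_steps):
        # Ask the model to plan its next action given everything so far.
        plan = call_llm("\n".join(history) + "\nNext action (tool:arg) or FINISH:<answer>")
        if plan.startswith("FINISH:"):
            return plan.removeprefix("FINISH:").strip()
        tool_name, _, arg = plan.partition(":")
        result = TOOLS.get(tool_name.strip(), lambda a: "unknown tool")(arg.strip())
        history.append(f"Action: {plan}\nObservation: {result}")
    return "Stopped after max_steps without reaching the goal."
```

Multi-agent systems layer several such loops on top of one another, with agents handing sub-goals to each other rather than relying on a single planner.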
Alignment
AI alignment refers to a set of values that models are trained to uphold, such as safety or courtesy. But not all companies share the same values, and not all AI vendors make it clear exactly which values they’re building into their platforms.
“It’s an issue, and it’s not easy to solve,” says JJ Lopez Murphy, head of data science and AI at Globant. “There’s only so much you can do with a prompt if the model has been heavily trained to go against your interests.”
Black box
A model whose internal mechanisms aren’t clearly understood and whose inner processes are concealed, making it difficult to tell how it comes up with its answers. This is a significant problem for enterprises today, especially with commercial models.
Context window
The number of tokens a model can process in a given prompt. A token is, on average, three-quarters of a word. Large context windows allow models to analyze long pieces of text or code, or provide more detailed answers. They also allow enterprises to provide more examples or guidelines in the prompt, embed contextual information, or ask follow-up questions.
At press time, the maximum context window for OpenAI’s ChatGPT is 128,000 tokens, which translates to about 96,000 words or nearly 400 pages of text. Anthropic released an enterprise plan for its Claude model in early September with a 500,000 token window, and Google announced a 2 million token limit for its Gemini 1.5 Pro model in June, which translates to about 1.5 million words or 6,000 pages of text.
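One practical consequence is simply counting: before sending a long document, it helps to know how many tokens it will consume. A minimal sketch using the open-source tiktoken tokenizer is shown below; the 128,000-token limit is just the ChatGPT figure cited above, and the right limit and tokenizer vary by model.

```python
# A rough sketch of checking whether a prompt fits a model's context window,
# using the open-source tiktoken library. The limit below is an assumed
# figure for illustration; it differs by model and vendor.
import tiktoken

CONTEXT_WINDOW = 128_000  # assumed limit for illustration

def fits_in_context(prompt: str, limit: int = CONTEXT_WINDOW) -> bool:
    enc = tiktoken.get_encoding("cl100k_base")   # a common GPT-style tokenizer
    n_tokens = len(enc.encode(prompt))
    print(f"{n_tokens} tokens (~{int(n_tokens * 0.75)} words)")
    return n_tokens <= limit

fits_in_context("Summarize the attached product manual in three bullet points.")
```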
Distillation
The process of compressing a large model into a smaller one that’s as accurate as possible for a particular use case.
“Using models that have been distilled or pruned during training can provide a similar level of performance, with fewer computational resources required during inference,” says Ryan Gross, senior director of data and applications at Caylent, a cloud consultancy. That means they use less memory and can answer questions faster and cheaper.
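At its core, one common distillation technique trains the small “student” model to reproduce the output distribution of the large “teacher” model. The sketch below shows that idea in PyTorch with tiny placeholder models and random data; it is a conceptual illustration, not a recipe for distilling a production LLM.

```python
# A bare-bones sketch of the core idea behind distillation: train a small
# "student" network to mimic a larger "teacher" network's output distribution.
# The tiny linear models and random data are placeholders, not a real use case.
import torch
import torch.nn.functional as F

teacher = torch.nn.Linear(32, 10)   # stands in for a large pre-trained model
student = torch.nn.Linear(32, 10)   # the smaller model we actually deploy
optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 2.0                   # softens the teacher's distribution

for _ in range(100):
    x = torch.randn(64, 32)                          # placeholder batch of inputs
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x) / temperature, dim=-1)
    student_log_probs = F.log_softmax(student(x) / temperature, dim=-1)
    # KL divergence pushes the student's distribution toward the teacher's.
    loss = F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```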
Embeddings
Ways to represent text, images, or other data so similar objects can be located near each other. This is typically done using vectors in multi-dimensional space, where each dimension reflects a particular property about the data. They’re typically stored in a vector database and used in conjunction with retrieval augmented generation (RAG) to improve the accuracy and timeliness of AI responses.
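The mechanics are straightforward: each item becomes a vector, and similarity is measured by how closely two vectors point in the same direction. The example below uses made-up three-dimensional vectors to keep the idea visible; real embedding models produce hundreds or thousands of dimensions.

```python
# A minimal illustration of embeddings: texts become vectors, and similar
# items end up close together. These 3-dimensional vectors are invented;
# real embedding models produce much higher-dimensional ones.
import numpy as np

embeddings = {
    "invoice overdue": np.array([0.9, 0.1, 0.0]),
    "payment late":    np.array([0.8, 0.2, 0.1]),
    "team offsite":    np.array([0.1, 0.9, 0.3]),
}

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

query = embeddings["invoice overdue"]
for text, vec in embeddings.items():
    print(f"{text}: {cosine_similarity(query, vec):.2f}")  # highest score = most similar
```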
Fine-tuning
The process of further training a pre-trained model on a specific dataset to adapt it for particular tasks. Companies typically start with either a commercial or open-source model and then fine-tune it on their own data to improve accuracy, avoiding the need to create their own foundation model from scratch. “Training is most expensive,” says Andy Thurai, VP and principal analyst at Constellation Research. “Fine tuning is second most expensive.”
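Conceptually, fine-tuning is just more training: you start from weights someone else already paid to learn and continue at a small learning rate on your own data. The sketch below expresses that idea in plain PyTorch with placeholder models and random data; real LLM fine-tuning normally goes through a training library and often parameter-efficient methods rather than a hand-rolled loop like this.

```python
# A conceptual sketch of fine-tuning: keep training a model that already has
# learned weights, at a small learning rate, on your own domain data.
# The linear model, checkpoint path, and random data are all placeholders.
import torch

pretrained = torch.nn.Linear(128, 128)        # stands in for a pre-trained model
# pretrained.load_state_dict(torch.load("base_weights.pt"))  # hypothetical checkpoint

optimizer = torch.optim.AdamW(pretrained.parameters(), lr=1e-5)  # small LR preserves prior learning
loss_fn = torch.nn.MSELoss()

for step in range(200):
    inputs = torch.randn(16, 128)              # placeholder for your company's data
    targets = torch.randn(16, 128)
    loss = loss_fn(pretrained(inputs), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```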
Foundation models
Large gen AI models typically trained on vast data sets. The most common examples include LLMs like ChatGPT and image models like DALL-E 2. Individual enterprises typically don’t train their own foundation models. Instead, they use a commercially available or open-source one, and then customize or fine-tune it for their own needs. Foundation models can also be used as is, without additional fine-tuning, with RAG and prompt engineering.
Grounding
Since gen AI models don’t actually remember their training data, only the patterns they learned from it, the accuracy of responses can vary dramatically. This can be a significant problem for enterprise use cases, as AI models can give responses that appear correct but are entirely wrong. Grounding helps reduce this problem by providing the AI with the data it needs. For example, a user asking an AI how to use a particular product might paste the contents of the product manual into the prompt.
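In code, grounding can be as simple as assembling the prompt so the source material travels with the question. The sketch below assumes a hypothetical send_to_model call and an invented manual excerpt.

```python
# A minimal sketch of grounding: the relevant source material is pasted into
# the prompt so the model answers from supplied facts rather than from memory.
# send_to_model() is hypothetical; swap in whatever chat API you actually use.

def send_to_model(prompt: str) -> str:
    raise NotImplementedError  # placeholder for a real API call

manual_excerpt = "To reset the device, hold the power button for 10 seconds."
question = "How do I reset the device?"

grounded_prompt = (
    "Answer the question using only the product manual excerpt below.\n\n"
    f"Manual excerpt:\n{manual_excerpt}\n\n"
    f"Question: {question}"
)
# answer = send_to_model(grounded_prompt)
```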
Human in the loop
For many use cases, gen AI isn’t accurate, comprehensive, or safe enough to use without human oversight. A human in the loop approach involves a person reviewing the AI outputs before they’re used. “I’m a big advocate of making sure the human reviews everything the large language model produces — code, content, pictures — no matter what,” says Iragavarapu.
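In a workflow, that review step can be an explicit gate: nothing the model produces moves forward until a person signs off. The sketch below is one simple way to express that, with generate_draft standing in for a real model call.

```python
# A minimal sketch of a human-in-the-loop gate: nothing the model produces
# is used until a person explicitly approves it. generate_draft() is a
# hypothetical stand-in for a real gen AI call.
from typing import Optional

def generate_draft(request: str) -> str:
    raise NotImplementedError  # placeholder for a real model call

def review_and_publish(request: str) -> Optional[str]:
    draft = generate_draft(request)
    print("--- Draft for review ---\n" + draft)
    verdict = input("Approve this output? (y/n) ").strip().lower()
    return draft if verdict == "y" else None  # a rejected draft never ships
```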
Prompt
The input given to a gen AI model, or the question sent from a user to a chatbot. In addition to a question, prompts can also include background information that would be helpful in answering the question, safety guidelines about how the question should be answered, and examples of answers to use as models.
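Those pieces are often assembled into a structured message list rather than one long string. The sketch below uses the common system/user chat convention; the Acme product and the example Q&A are invented for illustration.

```python
# A sketch of the pieces a prompt can carry beyond the question itself:
# background context, answering guidelines, and an example to imitate.
# The system/user message structure is a common chat-API convention,
# not specific to any one vendor; "Acme" is a hypothetical product.

messages = [
    {
        "role": "system",
        "content": "You are a support assistant for Acme billing software. "   # background
                   "Answer politely, cite the manual section, and never guess.",  # guidelines
    },
    {
        "role": "user",
        "content": "Example Q: How do I export invoices?\n"
                   "Example A: Go to Reports > Export (see section 4.2).\n\n"   # example answer
                   "Now answer: How do I void an invoice?",                      # the actual question
    },
]
```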
Responsible AI
Development and deployment of AI systems with consideration of ethics, bias, privacy, security, compliance, and social impacts. Responsible AI can help increase trust on the part of customers, employees, and other users and stakeholders, as well as help companies avoid public embarrassment and stay ahead of regulations.
PwC’s responsible AI lead Ilana Golbin Blumenfeld recommends that enterprises start by defining the responsible AI principles that will guide the development and deployment of their AI systems. These could include fairness, transparency, privacy, accountability, and inclusivity. She also recommends that companies maintain human oversight and accountability. “Design AI systems to augment human decision-making, rather than replace it entirely,” she says.
Small language model
The best-known gen AI models, like OpenAI’s ChatGPT or Anthropic’s Claude, are LLMs, with tens or hundreds of billions of parameters. By comparison, small language models typically have 7 or 8 billion parameters and can offer significant benefits for particular use cases. “Smaller models generally cost less to run but may offer reduced accuracy or capability,” says Caylent’s Gross. But choosing the right model size for the specific task can optimize costs without compromising performance too much, he adds.
Zero-shot prompting
A gen AI use case in which the user doesn’t provide any examples of how they want the LLM to respond; it’s the simplest way of using a gen AI chatbot. “With zero-shot, anyone can get in front of a gen AI tool and do something of value to the business,” says Sheldon Monteiro, chief product officer at Publicis Sapient. “Like a developer going in and saying, ‘Help me write code.’”
Other common zero-shot prompt examples include general knowledge questions or requests to summarize a piece of text. By comparison, few-shot prompting requires the user to provide examples to guide the AI. For example, a user looking for a sales letter might provide instances of previous sales letters so the AI can do a better job matching the company’s style and format.
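Side by side, the difference is simply whether the prompt carries examples. The two strings below are invented sales-letter prompts; both would be sent to the model the same way.

```python
# Zero-shot vs. few-shot prompting, side by side. Both prompts are invented
# for illustration; only the presence of examples makes the second "few-shot."

zero_shot_prompt = "Write a sales letter introducing our new analytics dashboard."

few_shot_prompt = """Write a sales letter introducing our new analytics dashboard.

Here are two of our previous sales letters, to match in tone and format:

Example 1:
Dear valued customer, ...

Example 2:
Hi there, ...
"""
```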
Retrieval augmented generation (RAG)
Retrieval augmented generation (RAG) is a way to improve accuracy, security, and timeliness by adding context to a prompt. For example, an application that uses gen AI to write marketing letters can pull relevant customer information from a database, allowing the AI to have access to the most recent data. In addition, it allows a company to avoid training or fine-tuning the AI model on the actual customer data, which could be a security or privacy violation.
But RAG has downsides. First, there’s the added complexity of collecting the relevant information and moving it into vector databases. Then there’s the security overhead to ensure the information is only accessed by authorized users or processes. And there’s the added cost of the inference itself: pricing is typically based on the number of tokens, and every piece of retrieved context adds tokens to the prompt.
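Put together, the basic RAG flow is retrieve, augment, generate. The sketch below shows those three steps with hypothetical embed and generate functions and a two-document “database”; a production system would use a real embedding model, an LLM API, and a vector database rather than a Python list.

```python
# A stripped-down sketch of the RAG flow: embed the question, retrieve the
# closest stored document, and prepend it to the prompt. embed() and
# generate() are hypothetical stand-ins for a real embedding model and LLM.
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError   # placeholder for a real embedding model

def generate(prompt: str) -> str:
    raise NotImplementedError   # placeholder for a real LLM call

documents = ["Customer A renewed in March.", "Customer B churned last quarter."]

def answer(question: str) -> str:
    q_vec = embed(question)
    doc_vecs = [embed(d) for d in documents]
    scores = [float(np.dot(q_vec, v) / (np.linalg.norm(q_vec) * np.linalg.norm(v)))
              for v in doc_vecs]
    best_doc = documents[int(np.argmax(scores))]               # retrieval step
    prompt = f"Context: {best_doc}\n\nQuestion: {question}"    # augmentation step
    return generate(prompt)                                    # generation step
```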