
AI glossary — 36 terms explained

Plain-English definitions of the AI terms you'll meet across tools and models — from tokens and context windows to RAG, agents and diffusion.

By Benchquill Editorial Team · Updated June 2026

AI agent

An AI system that can plan and take multi-step actions toward a goal — using tools, calling APIs and making decisions, not just answering a single prompt.

AI Overviews

Google's AI-generated answers at the top of search results. They link to their sources, and because searchers can get an answer without clicking through, they change how much traffic websites receive.

API

An Application Programming Interface: the way developers integrate an AI model into their own apps, usually paying per use (billed in tokens).

Chain-of-thought

A prompting technique, or a built-in model ability, in which the model reasons step by step before answering. It often improves performance on complex problems.

Context window

The maximum amount of text (in tokens) a model can consider at once, including your prompt and its response. Larger windows handle longer documents.
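As a rough illustration of the budget described above, the sketch below checks whether a prompt plus the space reserved for the response fits in a window. It assumes the 0.75-words-per-token rule of thumb from the Token entry; `estimate_tokens` and `fits_in_window` are hypothetical helper names, and a real application would count tokens with the model's own tokenizer.

```python
def estimate_tokens(text):
    """Rough token estimate, assuming about 0.75 English words per token."""
    words = len(text.split())
    return round(words / 0.75)


def fits_in_window(prompt, max_output_tokens, context_window):
    """True if the prompt plus the reserved response space fits in the window."""
    return estimate_tokens(prompt) + max_output_tokens <= context_window


prompt = "Summarize the attached contract in three bullet points"
ok = fits_in_window(prompt, max_output_tokens=500, context_window=8000)
```

The key point is that the response counts against the same limit as the prompt, so a long document leaves less room for the answer.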

Diffusion model

The type of generative model behind most image generators. It starts from random noise and removes that noise step by step until an image matching your prompt emerges.

Embedding

A numerical vector representation of text (or other data) that captures meaning, enabling semantic search and similarity comparisons.
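A similarity comparison between embeddings is commonly done with cosine similarity. The sketch below uses made-up three-dimensional vectors purely for illustration; real embeddings come from an embedding model and typically have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical embeddings, invented for this example.
cat = [0.9, 0.1, 0.0]
kitten = [0.8, 0.2, 0.1]
invoice = [0.0, 0.1, 0.9]

related = cosine_similarity(cat, kitten)     # close in meaning, high score
unrelated = cosine_similarity(cat, invoice)  # far apart in meaning, low score
```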

Fine-tuning

Further training a base model on your own data so it specializes in a task, tone or domain.

Freemium

A pricing model offering a free tier with limits, plus paid plans that unlock more usage, features or quality.

Function / tool calling

A model's ability to call external tools or APIs in a structured way, letting it fetch data or perform actions reliably.
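In practice the model emits a structured request and the application runs the matching function. The sketch below assumes the request arrives as a JSON object with `name` and `arguments` fields; that shape, the `get_weather` tool and the `run_tool_call` helper are all hypothetical, and each provider defines its own format.

```python
import json


def get_weather(city):
    """Stand-in for a real weather API call; returns canned data."""
    return {"city": city, "forecast": "sunny"}


# Registry of tools the application is willing to run on the model's behalf.
TOOLS = {"get_weather": get_weather}


def run_tool_call(raw_call):
    """Parse a JSON tool call and dispatch it to the registered function."""
    call = json.loads(raw_call)
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"Unknown tool: {name}")
    return TOOLS[name](**call["arguments"])


# Hypothetical model output requesting a tool call.
model_output = '{"name": "get_weather", "arguments": {"city": "Oslo"}}'
result = run_tool_call(model_output)
```

The result would normally be sent back to the model so it can phrase the final answer.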

GEO (Generative Engine Optimization)

Optimizing content to be cited by AI answer engines like ChatGPT, Perplexity and Gemini — the AI-era complement to SEO.

Hallucination

When an AI model produces confident but false or made-up information. Always verify important facts in AI output against a trusted source.

Inference

The process of running a trained model to generate output. Inference cost and speed (latency) are key practical factors when choosing a model.

Large Language Model (LLM)

An AI model trained on massive text data to understand and generate human-like language. LLMs like GPT, Claude and Gemini power most modern AI tools.

Latency

The time a model takes to respond. Low latency matters for real-time chat and interactive apps.

MCP (Model Context Protocol)

An open standard that lets AI assistants connect to external tools, data sources and apps through a common interface.

Mixture of Experts (MoE)

An architecture that splits the model into specialized "expert" sub-networks and activates only a few of them for each token, giving large-model quality at lower compute cost per token.
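The routing idea can be sketched in a few lines. This is a simplified illustration, not a real layer: the experts are plain functions on a number, the gate scores are given rather than learned, and `moe_forward` is a hypothetical name. Real MoE layers route vectors with a learned router.

```python
def moe_forward(x, experts, gate_scores, k=2):
    """Run only the k highest-scoring experts and blend their outputs."""
    # Pick the indices of the k experts with the largest gate scores.
    top = sorted(range(len(experts)), key=lambda i: gate_scores[i], reverse=True)[:k]
    # Renormalize the selected scores so the blend weights sum to 1.
    total = sum(gate_scores[i] for i in top)
    output = sum(gate_scores[i] / total * experts[i](x) for i in top)
    return output, top


# Four toy experts; only two of them run for any given input.
experts = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3, lambda x: x * 10]
gate_scores = [0.05, 0.6, 0.2, 0.15]  # assumed non-negative router scores

output, used = moe_forward(4, experts, gate_scores, k=2)
```

The saving comes from the experts that are skipped: their parameters exist in the model but cost no compute for this input.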

Multimodal

Describes an AI model that handles more than one type of input or output, for example text plus images, audio or video.

Open weights

Models whose trained parameters are publicly released, letting you run or self-host them. Examples include Llama and DeepSeek.

Parameters

The internal values a model learns during training. More parameters can mean more capability, but also higher cost and slower inference.

Prompt

The instruction or input you give an AI model. Clear, specific prompts produce better, more reliable output.

Prompt caching

Reusing previously processed prompt tokens to cut cost and latency on repeated context — many APIs discount cached input heavily.

Prompt engineering

The practice of crafting and refining prompts to get the best results from an AI model, including examples, formatting and step-by-step instructions.

Quantization

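Compressing a model by storing its weights at lower numerical precision (for example 8-bit integers instead of 16-bit floats), which cuts memory use and makes it cheaper and faster to run, usually with only a small loss in quality.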
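One simple scheme, shown here as a sketch, maps each weight to an 8-bit integer using a single shared scale factor. This symmetric per-tensor approach is only one of several in use, and `quantize_int8` and `dequantize` are hypothetical helper names.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    largest = max(abs(w) for w in weights)
    scale = largest / 127 if largest else 1.0  # avoid dividing by zero
    quantized = [round(w / scale) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.03, 0.8]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each restored weight differs from the original by at most half the scale factor, which is the precision traded away for the smaller storage size.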

RAG (Retrieval-Augmented Generation)

A technique that retrieves relevant documents at query time and adds them to the LLM's prompt, so answers are grounded in your data rather than only in the model's training data.
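The retrieve-then-prompt flow can be sketched as follows. To stay self-contained, this example ranks documents by simple word overlap; that is a stand-in, since production systems usually rank by embedding similarity. The documents, `retrieve` and `build_rag_prompt` are all made up for illustration.

```python
def retrieve(query, documents, k=1):
    """Return the k documents sharing the most words with the query."""
    query_words = set(query.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(query_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_rag_prompt(query, documents, k=1):
    """Place the retrieved documents ahead of the question in the prompt."""
    context = "\n".join(retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


documents = [
    "Refunds are processed within 14 days.",
    "Our office is closed on public holidays.",
]
prompt = build_rag_prompt("How long do refunds take?", documents)
```

The resulting prompt is what gets sent to the LLM; the model itself is unchanged.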

Reasoning model

A model tuned to 'think' through problems with extended internal steps, trading speed and cost for higher accuracy on hard tasks.

Temperature

A setting that controls randomness in AI output. Low temperature makes output more focused and predictable (a temperature of 0 is close to deterministic); high temperature makes it more creative and varied.
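Under the hood, temperature typically divides the model's raw scores (logits) before they are converted into probabilities. The sketch below uses invented logits for three candidate tokens to show the effect.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.0]  # hypothetical scores for three candidate tokens

focused = softmax_with_temperature(logits, 0.2)  # top token dominates
varied = softmax_with_temperature(logits, 2.0)   # probabilities flatten out
```

At low temperature almost all the probability lands on the highest-scoring token, so sampling rarely picks anything else. At high temperature the alternatives get a realistic chance of being chosen.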

Text-to-image

AI that generates images from a written description. Tools like Midjourney and Adobe Firefly are popular examples.

Text-to-video

AI that generates video clips from text prompts or images. A fast-moving category led by tools like Runway, Kling and Sora.

Token

The unit of text an AI model processes. One token is roughly 0.75 words in English. API pricing is usually quoted per million tokens, often with separate rates for input and output.
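The per-million-token pricing translates into a simple calculation. The prices below are placeholders chosen for the arithmetic, not any provider's actual rates, and `request_cost` is a hypothetical helper.

```python
def request_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one request, with prices given per million tokens."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


# Hypothetical rates: 3.00 per million input tokens, 15.00 per million output tokens.
cost = request_cost(
    input_tokens=2_000,
    output_tokens=500,
    input_price=3.00,
    output_price=15.00,
)
# (2,000 * 3.00 + 500 * 15.00) / 1,000,000 = 0.0135
```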

Transcription

Automatically converting speech (from meetings, calls or video) into text. Often paired with AI summaries and action items.

Transformer

The neural network architecture behind modern LLMs, using an 'attention' mechanism to weigh the importance of different parts of the input.
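The attention step can be sketched for a single query. This is a toy version of scaled dot-product attention in plain Python with two-dimensional vectors; real transformers run it over large matrices, with learned projections and many attention heads in parallel.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query, keys, values):
    """Weigh each value by how well its key matches the query."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(dim) for key in keys
    ]
    weights = softmax(scores)
    output = [
        sum(w * value[i] for w, value in zip(weights, values))
        for i in range(len(values[0]))
    ]
    return output, weights


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0]]       # the first key matches the query
values = [[10.0, 0.0], [0.0, 10.0]]

output, weights = attention(query, keys, values)
```

The first input receives the larger weight because its key lines up with the query, so the output is pulled toward the first value.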

TTS (Text-to-Speech)

AI that converts written text into natural-sounding spoken audio, used for voiceovers, narration and accessibility.

Vector database

A database optimized for storing and searching embeddings, commonly used to power RAG and semantic search in AI apps.
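The core operation is "store vectors, then return the nearest ones to a query". The in-memory class below is a hypothetical stand-in that compares the query against every stored vector. Real vector databases add persistence, metadata filtering and approximate indexes so that search stays fast across millions of items.

```python
import math


class TinyVectorStore:
    """Brute-force, in-memory stand-in for a vector database."""

    def __init__(self):
        self.items = []  # list of (item_id, vector) pairs

    def add(self, item_id, vector):
        self.items.append((item_id, vector))

    def search(self, query, k=1):
        """Return the ids of the k stored vectors most similar to the query."""
        def cosine(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            return dot / (math.hypot(*a) * math.hypot(*b))

        ranked = sorted(
            self.items, key=lambda item: cosine(query, item[1]), reverse=True
        )
        return [item_id for item_id, _ in ranked[:k]]


# Made-up embeddings for three documents.
store = TinyVectorStore()
store.add("refund-policy", [0.9, 0.1, 0.0])
store.add("office-hours", [0.1, 0.9, 0.1])
store.add("shipping", [0.7, 0.2, 0.3])

nearest = store.search([1.0, 0.0, 0.0], k=2)
```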

Voice cloning

Creating a synthetic copy of a specific voice from a sample, then generating new speech in that voice. Use it ethically and with consent.

Zero-shot / few-shot

Zero-shot is asking a model to do a task with no examples; few-shot includes a handful of examples in the prompt to improve accuracy.
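The difference is easiest to see in the prompt text itself. The sketch below builds both variants for a sentiment-labeling task; the task wording, the examples and the `make_prompt` helper are invented for illustration.

```python
def make_prompt(task, text, examples=()):
    """Build a prompt; with no examples it is zero-shot, otherwise few-shot."""
    parts = [task]
    for example_text, label in examples:
        parts.append(f"Text: {example_text}\nLabel: {label}")
    # The final item is left unlabeled for the model to complete.
    parts.append(f"Text: {text}\nLabel:")
    return "\n\n".join(parts)


task = "Classify the sentiment of each text as positive or negative."

zero_shot = make_prompt(task, "Great battery life")
few_shot = make_prompt(
    task,
    "Great battery life",
    examples=[("I love it", "positive"), ("Broke in a day", "negative")],
)
```

Both prompts end with the same unlabeled item. The few-shot version simply shows the model two worked examples first.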
