Introduction
This AI glossary is a curated cheat sheet designed to support your journey into the world of Artificial Intelligence. Whether you're an AI engineer seeking to deepen your knowledge base or an AI-Curious leader trying to make sense of the latest buzzwords you see online, this is your go-to resource.
Each entry provides a clear, detailed definition and a practical example to help you understand how these concepts are applied in real-world scenarios. By familiarizing yourself with these terms, you'll gain a comprehensive understanding of AI, enabling you to engage in informed discussions, follow industry developments, and apply your new-found AI knowledge effectively in your professional or academic life.
Dive in, explore, and begin your transformation from AI-Curious to AI-Comfortable, mastering the language of AI.

-
AI (Artificial Intelligence):
- Definition: The field of computer science focused on creating systems capable of performing tasks that typically require human intelligence. This includes learning, reasoning, problem-solving, perception, understanding natural language, and interaction.
- Example: AI encompasses technologies like machine learning, which allows systems to learn from data; natural language processing, used in chatbots and virtual assistants to understand and respond to text or voice input; computer vision, seen in facial recognition systems used for security purposes; and robotics, like autonomous vehicles that navigate and drive themselves.
-
ANN (Artificial Neural Network):
- Definition: A computational model inspired by the way neural networks in the human brain process information. ANNs consist of layers of interconnected nodes (neurons) that work together to recognize patterns and make decisions from complex data. Each connection has a weight that adjusts as learning proceeds.
- Example: ANNs are used in image recognition systems to identify objects in photos, such as distinguishing cats from dogs, and in speech recognition systems to transcribe spoken words into text, enabling technologies like virtual assistants to understand spoken commands.
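- Code sketch: The layered structure described above can be illustrated with a tiny forward pass. This is a minimal sketch, not a trained model: the weights and biases are hand-picked for illustration, and the helper names (`dense`, `relu`, `sigmoid`) are our own.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: weighted sum per neuron, then activation."""
    return [
        activation(sum(w * x for w, x in zip(neuron_weights, inputs)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]

# Hand-picked (not learned) weights: 2 inputs, 2 hidden neurons, 1 output.
hidden_weights = [[1.0, -1.0], [0.5, 0.5]]
hidden_biases = [0.0, 0.0]
output_weights = [[1.0, 2.0]]
output_biases = [-3.0]

inputs = [1.0, 2.0]
hidden = dense(inputs, hidden_weights, hidden_biases, relu)
output = dense(hidden, output_weights, output_biases, sigmoid)
print(hidden, output)
```

During training, a learning algorithm would adjust the weights and biases; here they stay fixed so the arithmetic is easy to follow.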
-
BERT (Bidirectional Encoder Representations from Transformers):
- Definition: A transformer-based model designed to understand the context of each word in a sentence by looking at the words that come both before and after it. This bidirectional approach lets BERT represent a word based on its full surrounding text, which improves accuracy on language understanding tasks.
- Example: BERT is used in tasks like question answering, where the model needs to understand the context of the question to provide a relevant answer. It is also used in sentiment analysis, where it determines whether a piece of text expresses positive, negative, or neutral sentiments by understanding the context in which words are used.
- Types of BERT:
- ALBERT (A Lite BERT):
- Definition: A variant of BERT designed to be lighter while maintaining accuracy. It reduces the number of parameters, mainly by sharing weights across layers and factorizing the embedding matrix, which lowers memory use without a large loss in performance.
- Example: ALBERT suits tasks like text classification or question answering where memory and resource usage are critical.
- DistilBERT (Distilled BERT):
- Definition: A smaller, faster, cheaper version of BERT, trained with knowledge distillation, that retains about 97% of BERT's language understanding performance. It is optimized to run efficiently on less powerful hardware.
- Example: DistilBERT is ideal for applications like real-time text analysis in mobile apps or web services.
-
BLEU (Bilingual Evaluation Understudy):
- Definition: A metric for evaluating machine-translated text by measuring how many of its n-grams (short word sequences) also appear in one or more human reference translations, with a penalty for output that is too short. Higher BLEU scores indicate closer overlap with the human-generated text.
- Example: BLEU scores are used to compare translation systems like Google Translate. For instance, a high BLEU score on a translated document indicates that the AI's wording is close to what a human translator produced. It does not by itself guarantee that the translation is accurate or fluent.
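- Code sketch: The n-gram overlap at the heart of BLEU can be sketched as a clipped ("modified") precision. This is a simplified illustration with made-up sentences and our own function names. Full BLEU additionally combines the precisions for n = 1 to 4 with a geometric mean and applies a brevity penalty, both of which are omitted here.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: each candidate n-gram counts at most
    as often as it appears in the reference."""
    cand_counts = Counter(ngrams(candidate, n))
    ref_counts = Counter(ngrams(reference, n))
    overlap = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
    total = sum(cand_counts.values())
    return overlap / total if total else 0.0

reference = "the cat is on the mat".split()   # human translation
candidate = "the cat sat on the mat".split()  # machine translation

p1 = modified_precision(candidate, reference, 1)  # 5 of 6 unigrams match
p2 = modified_precision(candidate, reference, 2)  # 3 of 5 bigrams match
print(p1, p2)
```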
-
Boosting:
- Definition: An ensemble technique in machine learning that trains several weak learners in sequence and combines their outputs to improve performance. Each new learner focuses on correcting the errors made by the previous ones, for example by giving more weight to misclassified data points.
- Example: In fraud detection systems, boosting can be used to improve the detection of fraudulent transactions by iteratively focusing on transactions that were previously misclassified.
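- Code sketch: The reweighting of misclassified points can be illustrated with a single round of AdaBoost, one well-known boosting algorithm. The labels and predictions below are made up, and the weak learner itself is not shown; only the weight update is.

```python
import math

labels      = [1, 1, -1, -1, 1]
predictions = [1, 1, -1, -1, -1]   # the weak learner gets the last point wrong
weights = [1 / len(labels)] * len(labels)  # start with equal weights

# Weighted error of the weak learner and its vote strength (alpha).
error = sum(w for w, y, p in zip(weights, labels, predictions) if y != p)
alpha = 0.5 * math.log((1 - error) / error)

# Increase weights of misclassified points, decrease the rest, renormalize.
reweighted = [w * math.exp(-alpha * y * p)
              for w, y, p in zip(weights, labels, predictions)]
total = sum(reweighted)
new_weights = [w / total for w in reweighted]
print(alpha, new_weights)
```

After this round the single misclassified point carries half of the total weight, so the next weak learner is pushed to get it right.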
-
CBOW (Continuous Bag of Words):
- Definition: A method used in Word2Vec to predict a word based on its context within a sentence. It works by averaging the vectors of the surrounding words and using that average to predict the target word, which lets the model learn relationships between words.
- Example: CBOW is used to create word embeddings, which help in understanding the meaning of words based on their surrounding words in applications like language translation. For example, in a sentence like "The cat sat on the ___," CBOW can predict the missing word "mat" by considering the context provided by the surrounding words.
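- Code sketch: The two ingredients described above, collecting the surrounding words and averaging their vectors, can be sketched as follows. The window size, the two-dimensional toy vectors, and the function names are illustrative assumptions. A real model would learn the vectors and feed the average into a prediction layer.

```python
def cbow_pairs(tokens, window=2):
    """Build (context_words, target_word) training pairs."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs

def average(vectors):
    """Element-wise mean of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]

tokens = "the cat sat on the mat".split()
pairs = cbow_pairs(tokens, window=2)
context, target = pairs[2]  # context around the word "sat"

# Toy vectors, hand-written for illustration only.
toy_vectors = {"the": [1.0, 0.0], "cat": [0.0, 1.0], "on": [1.0, 1.0]}
context_vector = average([toy_vectors[word] for word in context])
print(context, target, context_vector)
```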
-
CNN (Convolutional Neural Network):
- Definition: A class of deep neural networks commonly used to analyze visual imagery. CNNs use convolutional layers to automatically and adaptively learn spatial hierarchies of features, making them particularly effective for image and video processing.
- Example: CNNs are used in facial recognition systems to identify individuals by analyzing the features of their faces, such as in security systems. They can also be used in medical imaging to detect abnormalities in X-rays or MRIs, aiding in the diagnosis of diseases.
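- Code sketch: The operation inside a convolutional layer can be sketched by sliding a small filter over an image. This is a minimal illustration without padding, stride, or learning. The image and the vertical-edge filter are hand-written, whereas a CNN would learn its filter values from data.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image and sum the element-wise products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = sum(image[r + i][c + j] * kernel[i][j]
                        for i in range(kh) for j in range(kw))
            row.append(total)
        output.append(row)
    return output

# Dark left half (0), bright right half (1).
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Responds where brightness increases from left to right.
vertical_edge = [
    [-1, 1],
    [-1, 1],
]
feature_map = conv2d(image, vertical_edge)
print(feature_map)
```

The feature map is non-zero only in the column where the dark and bright regions meet, which is the kind of local pattern early CNN layers pick up.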
-
DDP (Distributed Data Parallel):
- Definition: A method for speeding up training by running a copy of the model on each of several GPUs or machines. Each copy processes a different slice of every batch, and the gradients are averaged across copies so that all of them stay in sync. This allows for faster and more scalable model training.
- Example: DDP is particularly useful for training complex models on massive datasets in fields like natural language processing and computer vision, as long as the model fits in the memory of a single device.
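- Code sketch: The core idea of data parallelism, each worker computing gradients on its own slice of the data and the results being averaged, can be simulated on a single machine. This sketch uses a one-parameter model and plain Python lists; it does not use any real distributed library, and the averaging line merely stands in for the all-reduce step.

```python
def gradient(w, batch):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
shards = [data[0:2], data[2:4]]  # one equally sized shard per simulated worker

w = 0.0
local_grads = [gradient(w, shard) for shard in shards]  # computed "in parallel"
synced_grad = sum(local_grads) / len(local_grads)       # stands in for all-reduce
full_grad = gradient(w, data)                           # single-device reference
print(local_grads, synced_grad, full_grad)
```

With equally sized shards, the averaged gradient matches the gradient a single device would compute on the whole batch, which is why every model copy ends up with the same update.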
-
DL (Deep Learning):
- Definition: A subset of machine learning that uses neural networks with many layers to learn from data. Deep learning models excel in handling large and complex datasets, enabling them to perform tasks like image and speech recognition with high accuracy.
- Example: Deep learning is used in autonomous driving technology to recognize traffic signs, pedestrians, and other vehicles, helping cars make safe driving decisions. It is also used in healthcare to analyze medical images and detect diseases like cancer with high accuracy.
-
ELMo (Embeddings from Language Models):
- Definition: A type of word representation that captures contextual information about words in a text, considering the words before and after. This dynamic approach allows ELMo to provide richer, more accurate word embeddings.
- Example: ELMo embeddings improve the performance of NLP tasks like sentiment analysis, where understanding the context of words is crucial for determining the sentiment expressed. For instance, ELMo can differentiate between the sentiment in sentences like "I love this" and "I don't love this" by understanding the context.
-
Embedding:
- Definition: A learned numerical representation of text, typically a vector, in which words with similar meanings have similar representations. This helps in understanding word relationships and semantic similarities.
- Example: Word embeddings like Word2Vec are used to map words into vectors of real numbers, allowing computers to understand and process human language more effectively in tasks like search engines, where the model needs to understand the relevance of words to improve search results.
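- Code sketch: "Similar representations" is usually measured with cosine similarity between vectors. The three-dimensional vectors below are hand-written for illustration and do not come from a trained model; real embeddings typically have hundreds of dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy embeddings, invented for this example.
embeddings = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

sim_cat_dog = cosine_similarity(embeddings["cat"], embeddings["dog"])
sim_cat_car = cosine_similarity(embeddings["cat"], embeddings["car"])
print(sim_cat_dog, sim_cat_car)
```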
-
Ensemble Learning:
- Definition: The process of combining the predictions of multiple models to produce a better overall model, for example to improve the performance of a classifier. This technique enhances accuracy and robustness by leveraging the strengths of different models.
- Example: In recommendation systems, ensemble learning can combine the predictions of different algorithms to suggest products or content more accurately to users.
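- Code sketch: One simple way to combine models is majority voting. The three "models" below are just hard-coded prediction lists, used as a stand-in for real classifiers.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]

# Predictions of three hypothetical models on the same three inputs.
model_outputs = [
    ["spam", "ham", "spam"],   # model A
    ["spam", "spam", "ham"],   # model B
    ["ham", "ham", "spam"],    # model C
]

# zip(*...) groups the predictions per input, so each input gets one vote each.
ensemble = [majority_vote(votes) for votes in zip(*model_outputs)]
print(ensemble)
```

Each model makes a different mistake, but the vote still recovers a consistent answer for every input.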
-
Epoch:
- Definition: One complete pass through the entire training dataset during the learning process. Multiple epochs help the model learn better by repeatedly adjusting its parameters.
- Example: Training a neural network on image data might involve running through the dataset 50 times (50 epochs) to achieve high accuracy.
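- Code sketch: The relationship between epochs, batches, and parameter updates can be shown with a bare training loop. The dataset is a list of placeholder numbers and the update step is only counted, not performed.

```python
dataset = list(range(10))  # 10 placeholder training examples
batch_size = 4
num_epochs = 3

updates = 0
samples_seen = 0
for epoch in range(num_epochs):
    # One epoch: every example is visited exactly once.
    for start in range(0, len(dataset), batch_size):
        batch = dataset[start:start + batch_size]
        samples_seen += len(batch)
        updates += 1  # a real loop would adjust the model's parameters here

print(updates, samples_seen)
```

Ten examples in batches of four give three batches per epoch (the last one smaller), so three epochs produce nine updates.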
-
Fine-Tuning:
- Definition: The process of further training a pre-trained model on a smaller, task-specific dataset so that it adapts to that task, often significantly improving performance on it.
- Example: Fine-tuning is used to adapt a general language model to perform well on legal documents by training it further on legal text, making the model more accurate in understanding and generating legal language.
-
FLAN (Fine-tuned Language Net):
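- Definition: A family of language models from Google that are fine-tuned on a large collection of tasks phrased as natural-language instructions (instruction tuning). This improves the model's ability to follow instructions accurately, including on tasks it was not explicitly trained on.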
- Example: FLAN models are used in conversational AI systems to provide more accurate and contextually relevant responses. For example, a FLAN model can improve customer service interactions by understanding and responding to customer queries more effectively.
-
FSDP (Fully Sharded Data Parallel):
- Definition: A parallelization technique that shards the model's parameters, gradients, and optimizer states across multiple devices so that no single device has to hold the full model. It optimizes the use of computational resources by distributing both memory and workload.
- Example: FSDP is used to train very large models by distributing the model's parameters and computation across multiple GPUs or machines. This approach is often seen in large-scale AI research, such as training state-of-the-art natural language models.
-
Hyperparameter:
- Definition: A parameter set before the learning process begins that controls the learning process itself.
- Example: Hyperparameters like the learning rate, batch size, and the number of epochs determine how quickly and accurately a model learns. Adjusting these settings is like tuning the controls of a machine to get optimal performance.
-
Inference:
- Definition: The process of making predictions using a trained machine learning model. Inference applies the learned patterns from training to new, unseen data.
- Example: Inference is used in real-time applications like speech recognition and image classification to provide immediate predictions. For instance, virtual assistants use inference to understand and respond to voice commands.
-
Kernel:
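- Definition: A function used in SVMs and other algorithms that measures the similarity of two data points as if they had been mapped into a higher-dimensional space, without computing that mapping explicitly. Kernels help in handling non-linear data because classes that cannot be separated in the original space may become easily separable in the higher-dimensional one.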
- Example: A kernel can help distinguish between different types of handwritten digits by transforming the data into a space where they are more clearly separated.
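- Code sketch: The link between a kernel and a higher-dimensional space can be checked numerically. For two-dimensional inputs, the degree-2 polynomial kernel (x·y)² gives the same value as a dot product after an explicit mapping into three dimensions. The function names are our own.

```python
import math

def poly_kernel(x, y):
    """Degree-2 polynomial kernel: computed in the original 2-D space."""
    return sum(a * b for a, b in zip(x, y)) ** 2

def feature_map(x):
    """Explicit mapping of a 2-D point into the matching 3-D space."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2]

x, y = [1.0, 2.0], [3.0, 4.0]
implicit = poly_kernel(x, y)
explicit = sum(a * b for a, b in zip(feature_map(x), feature_map(y)))
print(implicit, explicit)
```

Both routes give 121 (up to floating-point rounding), but the kernel never builds the higher-dimensional vectors, which matters when that space is very large.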
-
LLM (Large Language Model):
- Definition: A type of artificial intelligence model designed to understand and generate human language, trained on vast amounts of text data. LLMs can perform various language tasks, such as translation, summarization, and text generation.
- Example: GPT-4 is an example of a large language model that can generate coherent and contextually relevant text, answer questions, and even engage in conversations by predicting the next word in a sequence based on patterns it has learned during training.
-
Learning Rate:
- Definition: A hyperparameter that controls how much to change the model in response to the estimated error during training. It influences the speed and stability of the learning process. A small learning rate might make the training process slow but steady, reducing the risk of overshooting the optimal solution. Conversely, a large learning rate might speed up learning but can lead to instability or cause the model to settle on a suboptimal solution.
- Example: An analogy: the learning rate is like the size of the steps you take while climbing a hill. A high learning rate means big steps that might overshoot the peak, while a low learning rate means small steps that take longer but are more precise.
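- Code sketch: The effect of step size can be seen on the simplest possible problem, minimizing f(x) = x² with gradient descent (here we walk downhill to a minimum rather than up to a peak, but the step-size trade-off is the same). The three learning rates are arbitrary values chosen for illustration.

```python
def gradient_descent(lr, steps=20, start=5.0):
    """Minimize f(x) = x**2, whose gradient is 2 * x."""
    x = start
    for _ in range(steps):
        x -= lr * 2 * x
    return x

too_small = gradient_descent(lr=0.01)  # moves toward 0, but slowly
moderate = gradient_descent(lr=0.1)    # gets close to the minimum at 0
too_large = gradient_descent(lr=1.1)   # overshoots more on every step

print(too_small, moderate, too_large)
```

After the same 20 steps, the moderate rate is closest to the minimum, the small rate is still far away, and the large rate has moved further from the minimum than where it started.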
-
LoRA (Low-Rank Adaptation):
- Definition: A technique to reduce the computational cost of adapting large models by freezing the original weights and learning the weight updates as the product of two small low-rank matrices. LoRA is a type of Parameter-Efficient Fine-Tuning (PEFT).
- Example: LoRA can be used to efficiently fine-tune a large language model for a new application, making it easier to deploy AI in various industries, such as customizing a model for industrial automation.
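- Code sketch: The low-rank idea can be sketched with small matrices: the frozen weight matrix W is adjusted by the product of two thin matrices B and A. The matrix values and layer sizes are illustrative assumptions, and the scaling factor that LoRA implementations usually apply to the update is left out for simplicity.

```python
def matmul(a, b):
    """Plain matrix product of two lists-of-lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_param_counts(d_out, d_in, rank):
    """Trainable parameters: full update vs. low-rank update."""
    full = d_out * d_in
    low_rank = rank * (d_out + d_in)
    return full, low_rank

W = [[1.0, 0.0, 0.0],      # frozen pre-trained weights (3 x 3)
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]  # trainable, 3 x 1 (rank 1)
A = [[0.5, 0.0, 1.0]]      # trainable, 1 x 3 (rank 1)

delta = matmul(B, A)       # low-rank weight update
W_adapted = [[w + d for w, d in zip(w_row, d_row)]
             for w_row, d_row in zip(W, delta)]

full, low_rank = lora_param_counts(4096, 4096, rank=8)
print(W_adapted, full, low_rank)
```

For a hypothetical 4096 x 4096 layer with rank 8, the low-rank update trains 65,536 values instead of about 16.8 million.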
-
ML (Machine Learning):
- Definition: A subset of AI focused on developing algorithms that allow computers to learn from data and make predictions or decisions. ML models improve their performance as they are exposed to more data.
- Example: Machine learning is used in recommendation systems to suggest products or content based on users' past behaviors, enhancing user experience and engagement. For instance, streaming services use ML to recommend shows based on viewing history.
-
NLP (Natural Language Processing):
- Definition: A field of AI that focuses on the interaction between computers and humans through natural language. NLP enables machines to understand, interpret, and generate human language.
- Example: NLP is used in applications like chatbots, which can understand and respond to user queries; sentiment analysis, which gauges emotions in text; and language translation, which converts text from one language to another.
-
Normalization:
- Definition: The process of rescaling data to a common scale, for example mapping each feature to the range 0 to 1 or scaling each sample to have unit norm, to ensure consistency and improve model performance. Normalization helps in stabilizing and speeding up the training process.
- Example: Normalization is used in data preprocessing to ensure that different features contribute equally to the model's predictions, similar to leveling the playing field in a game. For example, in a dataset with varying scales, normalization ensures that no single feature dominates the learning process.
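- Code sketch: Two common forms of normalization are min-max scaling of a feature and scaling a sample to unit length. The age and income values are made up to show features on very different scales.

```python
import math

def min_max_scale(values):
    """Rescale a feature so its smallest value is 0 and its largest is 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def unit_norm(vector):
    """Rescale a sample so its Euclidean length is 1."""
    length = math.sqrt(sum(v * v for v in vector))
    return [v / length for v in vector]

ages = [20, 30, 40]
incomes = [20000, 50000, 80000]

scaled_ages = min_max_scale(ages)
scaled_incomes = min_max_scale(incomes)
unit = unit_norm([3.0, 4.0])
print(scaled_ages, scaled_incomes, unit)
```

After scaling, both features cover the same 0 to 1 range, so income no longer dominates simply because its raw numbers are larger.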
-
Overfitting:
- Definition: When a model learns the training data too well, including its noise and outliers, and performs poorly on new, unseen data. Overfitting often indicates that the model is too complex for the amount of training data and does not generalize.
- Example: Overfitting occurs when a model captures noise in the training data as if it were a true pattern, leading to poor generalization. For instance, a model trained to recognize cats may perform perfectly on the training set but fail to recognize new images of cats.
-
PEFT (Parameter-Efficient Fine-Tuning):
- Definition: A set of methods designed to fine-tune AI models by updating only a small subset of the model's parameters. PEFT aims to make the fine-tuning process more efficient and less resource-intensive.
- Example: PEFT can be applied to fine-tune a model for a specific task by adjusting only a few layers or parameters instead of the entire model. An analogy: it is like tailoring a suit. Instead of making a new suit from scratch, you make small adjustments to an existing suit so that it fits perfectly, saving time and resources.
-
Pretraining:
- Definition: The initial phase of training an AI model on a large and diverse dataset to learn general patterns and features. Pretraining helps the model develop a broad understanding before fine-tuning it for specific tasks.
- Example: Training a language model on a wide range of internet text to develop a broad understanding of language before specializing it for tasks like legal document analysis. Pretraining enables the model to be versatile and adaptable to various domains.
-
Prompt Engineering:
- Definition: The process of designing and refining input prompts to guide AI models, especially language models, to generate desired outputs. Effective prompts can improve the relevance and accuracy of the model's responses.
- Example: Crafting specific questions or instructions to get the most relevant and accurate responses from an AI chatbot. For instance, providing detailed and clear prompts helps the chatbot understand user queries better and provide more precise answers.
-
Q-Learning:
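- Definition: A model-free reinforcement learning algorithm that seeks to find the best action to take given the current state. Q-Learning uses an action-value function (the Q-function) to estimate the long-term reward of taking each action in each state.
- Example: Q-Learning is used in game AI to learn optimal strategies for playing games. For example, it can help an AI learn the best moves in a simple game such as tic-tac-toe or a grid-world maze by exploring different actions and updating its value estimates based on the outcomes. Larger games like chess have too many states for a lookup table and require approximating the Q-function, for instance with a neural network.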
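- Code sketch: The heart of Q-Learning is a single update rule applied after every action. The sketch below performs one such update on a tiny hand-written Q-table; the state names, reward, learning rate (alpha), and discount factor (gamma) are illustrative assumptions.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    """Move Q(state, action) toward reward + discounted best future value."""
    best_next = max(q_table[next_state].values())
    target = reward + gamma * best_next
    q_table[state][action] += alpha * (target - q_table[state][action])
    return q_table[state][action]

# Estimated long-term reward for each action in each state.
q_table = {
    "start": {"left": 0.0, "right": 0.0},
    "near_goal": {"left": 0.0, "right": 2.0},
}

# The agent moved right from "start", received reward 1, and reached "near_goal".
new_value = q_update(q_table, "start", "right", reward=1.0, next_state="near_goal")
print(new_value)
```

The target is 1 + 0.9 x 2 = 2.8, and with alpha = 0.5 the old estimate of 0 moves halfway there, to 1.4. Repeating this over many steps is how the table converges toward good action values.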
-
QLoRA (Quantized Low Rank Adaptation):
- Definition: A fine-tuning method that combines quantization with low-rank adaptation: the pre-trained model's weights are frozen and stored in low precision (4-bit), and small LoRA adapters are trained on top. QLoRA greatly reduces the memory needed for fine-tuning while largely maintaining performance, making it suitable for resource-constrained hardware.
- Example: This technique is particularly useful for fine-tuning a large language model on a single GPU instead of a multi-GPU cluster. It is like compressing a large, detailed map into a compact, foldable version and writing your updates on a few sticky notes: you retain the essential information while making it easier to store and use, without sacrificing much accuracy.
-
RL (Reinforcement Learning):
- Definition: A type of machine learning where an agent learns by interacting with its environment. The agent takes actions to maximize cumulative rewards over time, adjusting its strategy based on feedback.
- Example: RL is used in game-playing AI, where the agent learns strategies and improves its performance by playing the game and receiving feedback. For instance, RL can be used to train an AI to play games like Go or Dota 2 at a superhuman level.
-
RAG (Retrieval-Augmented Generation):
- Definition: A method that combines the capabilities of retrieval-based systems and generative models. RAG retrieves relevant documents or pieces of information from a large corpus and uses this retrieved data to generate more accurate and contextually relevant responses.
- Example: When asked a question, a RAG model first searches a database for relevant information and then generates a detailed answer based on the retrieved information, improving the accuracy and relevance of the response compared to using a generative model alone. This approach is often used in applications such as answering FAQs about company policies.
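- Code sketch: The retrieve-then-generate flow can be sketched in two steps. This is a toy illustration: retrieval is done by counting shared words, whereas real systems usually compare embeddings, and the final call to a generative model is omitted. The policy documents are invented for this example.

```python
def retrieve(question, documents, top_k=1):
    """Rank documents by how many words they share with the question."""
    q_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(q_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, passages):
    """Combine the retrieved passages and the question into one prompt."""
    context = "\n".join(passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# Made-up company policy snippets.
documents = [
    "Employees receive 25 days of paid vacation per year.",
    "The office is closed on public holidays.",
    "Expense reports are due by the fifth of each month.",
]

question = "How many days of paid vacation do employees get?"
passages = retrieve(question, documents)
prompt = build_prompt(question, passages)  # this would be sent to the generator
print(prompt)
```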
-
RLHF (Reinforcement Learning from Human Feedback):
- Definition: A technique that uses feedback from humans, typically rankings of model outputs, to train a reward model that then guides the learning process of an RL agent. Human feedback helps steer the agent towards behavior that people actually prefer.
- Example: RLHF is used to train AI systems that interact with humans, such as chat-based personal assistants. For example, human raters compare pairs of responses, and the model learns to produce the kind of responses that people rate as more helpful and accurate.
-
ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
- Definition: A set of metrics for evaluating automatic summarization and machine translation. ROUGE measures the overlap of n-grams between the generated summary and reference summary to assess quality. For instance, a high ROUGE score indicates that the generated summary captures the main points of the reference text accurately.
- Example: An analogy: ROUGE is like grading a student's summary of a book by checking how many key points from the book are correctly mentioned in the summary, ensuring it captures the main ideas.
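- Code sketch: The simplest variant, ROUGE-1, counts overlapping single words. The sketch below computes its recall and precision for a made-up summary and reference. Other variants (such as ROUGE-2 and ROUGE-L) and details like stemming are not covered.

```python
from collections import Counter

def rouge_1(summary, reference):
    """Unigram overlap between a generated summary and a reference."""
    sum_counts = Counter(summary.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Each reference word is matched at most as often as it occurs in the summary.
    overlap = sum(min(count, sum_counts[word]) for word, count in ref_counts.items())
    recall = overlap / sum(ref_counts.values())
    precision = overlap / sum(sum_counts.values())
    return recall, precision

recall, precision = rouge_1("the cat sat", "the cat sat on the mat")
print(recall, precision)
```

The summary covers three of the six reference words (recall 0.5), and every word it uses appears in the reference (precision 1.0).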
-
Skip-gram:
- Definition: A method used in Word2Vec to predict the context words from a given target word. Skip-gram learns word representations by maximizing the likelihood of surrounding words.
- Example: Skip-gram is used to learn word embeddings for NLP tasks by predicting surrounding words in a sentence. For example, given the target word "sat" in the sentence "The cat sat on the mat," Skip-gram is trained to predict nearby context words such as "cat" and "on."
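- Code sketch: Skip-gram's training data consists of (target word, context word) pairs, the reverse direction of CBOW. The sketch below only generates these pairs; the window size and function name are illustrative choices, and the model that learns from the pairs is not shown.

```python
def skipgram_pairs(tokens, window=1):
    """Build (target_word, context_word) training pairs."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((target, tokens[j]))
    return pairs

tokens = "the cat sat on the mat".split()
pairs = skipgram_pairs(tokens, window=1)
sat_pairs = [pair for pair in pairs if pair[0] == "sat"]
print(sat_pairs)
```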
-
Stable Diffusion:
- Definition: A type of generative model designed for creating detailed images from text prompts by iteratively refining the image. It belongs to a class of models known as diffusion models, which generate images by progressively denoising a randomly sampled noise vector.
- Example: Given a text prompt like "a fantasy landscape with castles and dragons," Stable Diffusion can produce a high-quality, detailed image that matches this description by starting from random noise and iteratively improving the image's coherence and alignment with the prompt. This model has been used in various applications, including digital art, game design, and content creation.
-
Transformer:
- Definition: A deep learning model architecture that relies on self-attention mechanisms to process sequential data. Transformers are the foundation of many state-of-the-art NLP models.
- Example: Transformers are used in models like BERT and GPT to achieve high performance in language understanding and generation tasks. For instance, GPT-3, a transformer model, can generate human-like text, answer questions, and perform a variety of language tasks with high accuracy.
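- Code sketch: Self-attention lets every position in a sequence take a weighted average over all positions, with weights based on similarity. The sketch below implements scaled dot-product attention in plain Python. It is a simplification: real transformers first pass the inputs through learned query, key, and value projections and use several attention heads, and the input vectors here are hand-written.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs

# Two token vectors; used directly as queries, keys, and values.
tokens = [[1.0, 0.0], [0.0, 1.0]]
outputs = self_attention(tokens, tokens, tokens)
print(outputs)
```

Each output is a blend of both token vectors, weighted toward the token that is most similar to the query (here, the token itself).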
-
ULMFiT (Universal Language Model Fine-Tuning):
- Definition: A transfer learning method for NLP that involves fine-tuning a pre-trained language model on a specific task. ULMFiT enables quick adaptation to new datasets with minimal training.
- Example: ULMFiT is used to quickly adapt a general language model to a specific dataset or task, improving performance on text classification tasks. For example, fine-tuning a language model on a medical text dataset can improve its accuracy in classifying medical documents.
-
Word2Vec:
- Definition: A group of related models used to produce word embeddings by predicting the context words from a given target word or vice versa. Word2Vec captures semantic relationships between words.
- Example: Word2Vec is used in natural language processing to learn high-quality word vectors that capture semantic meanings. For instance, Word2Vec can understand that "king" and "queen" are related words and represent them with similar vectors, aiding in tasks like word similarity and analogy detection.