What Is an AI Model, Really?

Your math-free intro to AI - for business and product professionals.

Welcome to the very first edition of In & Out AI! I'm genuinely thrilled to have you here as we embark on this journey to demystify the world of artificial intelligence. We'll break down complex AI concepts using intuition instead of equations, giving you the technical insights usually reserved for degree holders.

This week, we're cracking open the (not-so-metaphorical) black box of AI to reveal its surprisingly simple core: a whole lot of numbers, all tangled up in a very specific way to perform calculations. Kind of like a spreadsheet, but with A LOT MORE numbers. Stick with us, and you'll learn the two fundamental ingredients that power everything from ChatGPT's witty banter to those eerily realistic deepfakes flooding your newsfeed.

Want to learn about AI but frustrated by materials that are either too generic or buried in math? Subscribe now for our always-free newsletter, designed specifically to help business and product professionals like you truly understand how AI works.

AI Model = Connected Numbers

You hear about "AI models" all the time - AI this, AI that - but what really is an "AI model"? Is it like an app on your phone? A website you visit? When you chat with something like ChatGPT, where do those answers actually come from? Is the AI in your Wi-Fi router, secretly keeping notes on that extra-large pepperoni pizza you definitely didn't order from Papa John's at 1 AM?

It feels like the term "AI model" is everywhere, but solid explanations are surprisingly rare. That lack of clarity, stirred together with intense media hype, can easily breed confusion and even anxiety.

However, once you peel back the layers of complexity, today's AI models fundamentally consist of just two simple things: a huge collection of numbers, and the mathematical calculations connecting them together.

And we're talking ordinary numbers here – nothing more complex than the $2.90 you paid for your subway ride today. Meanwhile, simple operations like addition and multiplication make up the bulk of the calculations these models perform.
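If you're curious what "numbers connected by calculations" actually looks like, here's a deliberately tiny sketch in Python - purely illustrative, with made-up values (real models have billions of parameters, not three):

```python
# A toy "model" made of three ordinary numbers, purely for illustration.
# Real models work the same way in spirit - just with billions of numbers.
parameters = [0.8, -1.2, 0.5]

def tiny_model(inputs):
    # Multiply each input by one of the model's numbers, then add it all
    # up - nothing fancier than grade-school arithmetic.
    total = 0.0
    for x, p in zip(inputs, parameters):
        total += x * p
    return total

print(tiny_model([2.9, 1.0, 3.0]))  # a few numbers in, one number out
```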

So, why can't these models just run on your home computer? It boils down to sheer scale. We're talking billions (often hundreds of billions) of those simple numbers and connections. That's why these vast sets of "connected numbers" usually reside in specialized, super-powerful computers housed in data centers. When you use an app like ChatGPT, your device is simply connecting to these data centers over the internet.

So, an AI model isn't some mystical entity. At its heart, it's a (very, very large) set of numbers linked by math operations, running on a powerful computer somewhere else. Hopefully, this picture of AI models feels a bit less intimidating. Now, let's start unpacking those "numbers" and "connections" a little further.

Numbers - the Core Ingredient

Now that we've landed on the big idea - AI models are basically just a massive pile of numbers linked by mathematical connections - let's zoom in on those numbers first. What makes them special compared to other numbers? How do they relate to things we understand, like text and images? And how on earth does someone figure out the right numbers and connections to use?

Billions of Tuning Knobs

A good intuition to start with is to think of the numbers inside an AI model as its internal "settings" or billions of tiny tuning knobs. As an input goes through the model, it gets adjusted slightly at each tuning knob to eventually arrive at the output. In reality, each input is a group of numbers, and they are "adjusted" via calculations with the numbers stored in the AI model.

In AI jargon, these crucial numbers are often called parameters or weights. As mentioned in the previous section, they aren't inherently complex – each one is just a value, like 1.34 or -0.52. It's the sheer quantity of these parameters, all working together, that allows AI like ChatGPT or image generators to complete seemingly magical tasks.
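Here's the tuning-knob idea as a toy sketch (the knob values are made up): the same input produces different outputs depending on how the knob - the parameter - happens to be set:

```python
# One "tuning knob" (parameter), set to different values. The input stays
# the same; turning the knob changes the output.
def one_knob_model(x, knob):
    return x * knob

for knob in [1.34, -0.52, 0.0]:
    print(f"knob={knob} -> output={one_knob_model(10, knob)}")
```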

Representing the World as Numbers

Now you might be wondering: "If AI only understands numbers, how can I ask it a question in English, or send it a picture?" That's why we need "translators" for both the input and the output. They turn text, images, and audio into numbers and vice versa. This is a crucial step for all AI models, as it translates “human language” (both textual and visual) into “model language”. We'll explore how these "translators" work in detail in future posts. For now, knowing they exist and what they do is enough.

“Translators” convert texts and images into a bunch of numbers.
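To make the idea concrete, here's a toy "translator" that maps each character of text to a number and back. It's nothing like the real thing, but it captures the job description:

```python
# A toy "translator": turn each character of text into a number and back.
# Real translators are far more sophisticated, but the job is the same -
# text in, numbers out, and the reverse on the way back.
def text_to_numbers(text):
    return [ord(ch) for ch in text]

def numbers_to_text(numbers):
    return "".join(chr(n) for n in numbers)

nums = text_to_numbers("Hi")
print(nums)                   # [72, 105]
print(numbers_to_text(nums))  # Hi
```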

Finding the Right Numbers

So, the model is packed with billions of these number parameters. But who dials them in? They aren't just random guesses; nor are they manually determined by humans. This is where the "learning" happens, through the training process.

There are many different methods to train an AI model. The most common way is by showing it massive amounts of example input-output pairs. The AI model adjusts its parameters through a sophisticated form of trial-and-error to ensure that when given an input from the examples, its output is as close to the corresponding example output as possible. This trial-and-error process is formally called optimization. And the algorithm used during the optimization process is called an optimizer.
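Here's what that trial-and-error rhythm can look like, boiled down to a one-parameter toy model (the example pairs and step size are made up, and real optimizers are much smarter than this nudge-and-check loop):

```python
# A bare-bones training loop for a one-parameter model (output = w * input).
examples = [(1, 2), (2, 4), (3, 6)]  # input-output pairs to learn from

def total_error(w):
    # How far the model's outputs are from the example outputs, overall.
    return sum(abs(x * w - y) for x, y in examples)

w = 0.0       # the "knob" starts at an arbitrary position
step = 0.1    # how far we nudge the knob each time

for _ in range(100):
    # Trial-and-error: try nudging the knob both ways, keep what helps.
    if total_error(w + step) < total_error(w):
        w += step
    elif total_error(w - step) < total_error(w):
        w -= step

print(w)  # ends up near 2.0 - the value that best fits the examples
```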

Think of the training process as tuning an old analog radio:

  1. The invisible radio waves carrying different stations are like the inputs you give an AI. And the sound coming out of the speaker is the AI's output.

  2. When you first switch the radio on, you likely just get static noise (an untrained AI model giving random or nonsensical outputs).

  3. You slowly turn the tuning knob (the optimization process gradually adjusts the model's internal parameters / numbers).

  4. Every moment, new radio waves enter the radio and new sounds come out. That's analogous to the large amounts of training data being fed to the AI model.

  5. You listen carefully – is the signal getting clearer? Are you hearing music instead of static? (The system checks if the AI's output is getting closer to the correct examples seen in the training data).

  6. You keep tweaking that knob, guided by whether the sound quality improves, until – voilà! – your favorite station comes through loud and clear. The goal is to get the knob to exactly the right position so it reliably plays your station whenever it picks up those specific waves.

It's the same goal for training AI models using optimization: To get the model's internal "knobs" (parameters) tuned to just the right values so that every time it receives a particular input, it consistently produces the desirable output.

Training a model is like tuning an analog radio. The optimizer decides which way to turn the knob based on how the sound has been changing.

Now, here's a critical point about the analogy and training data: Notice that your radio doesn't store the radio waves it receives, nor does it keep a recording of every song it plays. It simply processes the incoming waves based on how its knob is currently set. It's the same idea with AI! Contrary to a common misconception, AI models don't "store" all the raw training data inside themselves. Instead, they learn from the data, adjusting their parameters (tuning their knobs) based on the patterns and information within that data. The data shapes the final parameter values, but the individual examples aren't usually memorized inside the finished model.

The "How" Is a Black Box (For Now)

How do we figure out which way, and by how much, to nudge each of those billions of knobs during optimization? That's the difficult part, where complex mathematical formulas and techniques come in. It's the engine inside the "black box" of how AI learns.

We'll definitely peek under the hood at how this works in future posts. But for today, the key takeaway is this: The specific, crucial numbers (parameters) inside an AI model aren't programmed by humans one-by-one. They are learned through this iterative tuning process called training, driven by optimization. These learned numbers effectively store the patterns and knowledge the AI gained from the data.

Connecting the Numbers

Model Architecture

So our AI model is now packed with billions of carefully tuned parameters. But a giant list of numbers just sitting there, even perfectly tuned ones, won't amount to anything. Those numbers need a structure to hold them together. They need to be organized and connected in a specific, meaningful way for the magic to happen.

This is where the model's architecture comes into play. It dictates how all those billions of parameters interact with each other and with the input data. Unlike the parameter values, which are learned during training, the architecture is designed by human AI researchers beforehand.

A model’s architecture organizes and connects the parameters.

A Flow of Simple Math

More concretely, the model's architecture can be thought of as a structured flow of simple math operations. Picture this flow not always as a single stream, but often involving pathways that split, perform different calculations in parallel, and then merge back together. It dictates exactly how input gets transformed into the final output. You don't need to worry about the specific formulas, just grasp the idea of this structured mathematical journey from input to output.
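As a toy sketch of such a flow (with made-up operations), here's an "architecture" where the input splits into two parallel paths that later merge back together:

```python
# A made-up "architecture": a fixed flow where the input splits into two
# parallel paths, each does its own simple math, and the results merge.
def path_a(x):
    return x * 0.5 + 1.0   # one branch of the flow

def path_b(x):
    return x * -2.0        # a parallel branch

def tiny_architecture(x):
    # The architecture is this wiring: split, compute in parallel, merge.
    return path_a(x) + path_b(x)

print(tiny_architecture(3.0))
```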

Neural Networks

So what kind of architecture gets the most attention today? The dominant architectural style underpinning most modern AI, including large language and image generation models, is the neural network. "Neural network" is a broad term representing a family of architectures sharing core principles, rather than a single specific design.

The term originates from a LOOSE analogy to biological neural networks (Please don't picture a tiny digital brain whirring away like yours - real brains are infinitely more complex and work very differently). Think of "neural network" more as a helpful naming convention for this specific style of layered, interconnected mathematical structure, rather than a direct biological replica. The key takeaway is the concept of interconnected units processing information in sequence.

As I just mentioned, a key characteristic of neural networks is a layered structure. The outputs from one layer become the inputs for the next. Within each layer, numerous streams of calculations are often performed in parallel. Each stream combines the inputs it received with its own internal parameters. This allows the network to progressively refine the input information towards the output step-by-step.
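Here's a miniature, purely illustrative version with two layers and made-up parameters:

```python
# A miniature two-layer "network". The outputs of layer 1 become the
# inputs of layer 2. (Real networks also pass each layer's output through
# a simple nonlinear function, omitted here.)
def layer(inputs, weights):
    # Each row of weights is one parallel "stream": it combines all the
    # inputs with its own parameters and produces one number.
    return [sum(x * w for x, w in zip(inputs, row)) for row in weights]

layer1_weights = [[0.5, -1.0], [2.0, 0.3]]   # 2 streams, 2 inputs each
layer2_weights = [[1.0, 1.0]]                # 1 stream combining layer 1

hidden = layer([3.0, 1.0], layer1_weights)   # first layer's output
output = layer(hidden, layer2_weights)       # becomes second layer's input
print(output)
```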

Because neural networks almost always stack many layers, this approach is also referred to as deep learning. The "deep" simply refers to the significant number of layers - the depth - in the network's architecture.

Neural networks have a layered structure.

Architecture Matters

Given that virtually all models behind the recent AI breakthroughs belong to the neural network family, the specific design of the architecture is critical. Details like how the layers are stacked and the precise pattern of connections within each layer heavily influence the model's capabilities. Different architectural variations are better suited for different jobs. Designing clever and efficient architectures is arguably the most active area of AI research right now.

Real-Life Implications of “Connected Numbers”

Now that you know numbers and connections are the core of a modern AI model, you can already use this to make more sense of AI's role in the real world.

The Cost

As mentioned earlier, state-of-the-art AI models often include over a hundred billion parameters, and new models keep getting bigger. AI labs have a constant appetite for bigger models because empirical findings have shown that larger models tend to be more capable.

Training frontier models like GPT-4o and Grok 3 is estimated to require over 10,000,000,000,000,000,000,000,000 (that's 25 zeros - not a typo) calculations. Though each individual calculation is simple enough for a fifth grader, the sheer number of them drives huge financial costs - in semiconductor design & manufacturing, energy generation & transmission, construction, and more. That's why companies invested hundreds of billions of dollars in AI last year and will surely invest more in the years ahead. Models have reached a scale that would have been truly unfathomable even a few years ago, which is why AI development has incurred such jaw-dropping costs.

Drastically increasing CapEx in 2024 due to AI. Source: Generative Value

The Black Box

While we know how parameters are tuned via optimization, no one understands precisely why a particular set of numbers causes the AI to behave exactly as it does, or what specific real-world concept each parameter might represent. For example, a model can help you curate a playlist for an upcoming party, but it's impossible to pinpoint which parameters are responsible for picking the songs. It's not like there's a parameter representing Taylor Swift, another for Future, and another for Kendrick Lamar. Every output of an AI model requires all the parameters working together, in ways that are incomprehensible to humans. That's why AI models are often referred to as black boxes.

This directly connects to the vital and often heated public discussions about AI safety, ethics, and trustworthiness. If we can't fully trace the internal "thoughts" of AI models, how can we reliably guarantee an AI system is truthful? How can we prevent it from generating harmful misinformation or making critical errors in high-stakes situations like medical diagnosis or autonomous driving? This inherent lack of full transparency is a major reason why ensuring AI "alignment" with human values is such a huge area of ongoing research.

There are many proven methods to enhance AI alignment and safety, and researchers are always coming up with new ways to solve this puzzle. We'll definitely dig deeper into this topic in the future.

In a Nutshell: What Is AI

Modern AI, at its heart, is just a colossal collection of numbers (parameters) connected by math operations (architecture). The real mind-bending complexity arises not from some futuristic magic, but from performing this kind of basic math at an absolutely epic scale. Billions of these parameters, intricately linked, create the AI you see in the headlines.

Next Up: How AI Learns

Earlier in this post, we mentioned that a model's parameters are tuned by an optimizer during training. Next, we'll pull back the curtain on this crucial learning phase.

Glossary

  • Data center: Specialized buildings housing powerful computers and infrastructure that run large AI models. Your interactions with AI are typically processed in these facilities.

  • Parameter: Internal numbers within an AI model, adjusted during its learning phase to store knowledge gained from data.

  • Weight: Numerical value, often used interchangeably with parameter, defining the strength or importance of connections between processing units in an AI model.

  • Optimization: The mathematical process of systematically adjusting a model's parameters during training to improve its output accuracy.

  • Optimizer: A specific algorithm or method used to perform optimization, determining how a model's parameters change to enhance performance based on training data.

  • Training: The entire process where an AI model learns to perform a task by processing many examples from a dataset, continuously adjusting its parameters via an optimizer.

  • Architecture: The fundamental design or blueprint of an AI model, specifying how its internal components are structured and interconnected.

  • Deep learning: A subfield of machine learning that uses artificial neural networks with multiple layers (deep structures) to learn complex patterns from large amounts of data.

  • Neural networks: AI models loosely inspired by biological neural networks, composed of interconnected processing units or "neurons" organized in layers that transform input data to produce an output.

  • Layer: A distinct stage or group of processing units within a neural network that performs specific computations on the data as it passes through the model.

  • Black box: A term describing AI models whose internal decision-making processes are complex and not easily understood or interpretable by humans, even if their outputs are accurate.
