From Full Stack to AI: Core Fundamentals of Generative AI

Coming from a full stack background, AI can feel confusing at first. There are new words, new math, and new ways of thinking. But at its core, generative AI is still software: it takes input, processes it step by step, and gives output. In this chapter, I am writing what I wish I had when I started: a simple, practical explanation of how Large Language Models work under the hood.

1. Understanding Large Language Models (LLMs)

What is an LLM?

LLM stands for Large Language Model. It is a program trained to understand and generate human language. You give it text as input, and it predicts what text should come next.

Think of it like autocomplete on steroids. Instead of suggesting the next word from the last few words, it predicts the next token using the entire input so far and the patterns it learned during training.

How does an LLM work at a high level?

Text is converted into tokens
Tokens are converted into numbers
A neural network processes those numbers
The model predicts the next token
Tokens are converted back into text

This loop runs again and again, adding one token at a time, until the model emits a stop token or reaches a length limit.
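The loop can be sketched in a few lines of Python. This is a toy, not a real model: the `NEXT_TOKEN` table and the `generate` function are made up for illustration, and the "model" here only looks at the last token, while a real LLM looks at the whole input and scores every token in its vocabulary.

```python
# Toy stand-in for a trained model: for each token, the single most likely next token.
# Assumption: a real LLM computes a probability for every token in its vocabulary.
NEXT_TOKEN = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "down",
    "down": "<end>",
}

def generate(prompt_tokens, max_new_tokens=10):
    """Append one predicted token per iteration until <end> or the length limit."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        next_token = NEXT_TOKEN.get(tokens[-1], "<end>")
        if next_token == "<end>":
            break  # the stop token ends the response
        tokens.append(next_token)
    return tokens

print(generate(["<start>", "the"]))
# ['<start>', 'the', 'cat', 'sat', 'down']
```

The important part is the shape of the loop: predict one token, append it, and feed the longer sequence back in.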

2. Deep Dive into the GPT Architecture

What does GPT mean?

GPT stands for:

Generative: It creates new text
Pre-trained: It is trained on large text data before you use it
Transformer: It is built using the transformer architecture

Popular GPT-style models

These models all follow the same core idea:

GPT by OpenAI
Gemini by Google
Claude by Anthropic
Mistral by Mistral AI

They differ in size, training data, and optimizations, but they share the same transformer-based foundation.

Why transformers matter

Transformers allow models to understand context. They look at all words in a sentence at the same time instead of one by one. This is why modern AI feels much smarter than older models.

3. How LLMs Work Under the Hood

Why do LLMs need GPUs?

LLMs perform a massive number of matrix calculations. CPUs are not designed for this level of parallel math. GPUs are.

When training or running large models:

Billions of parameters are read for every token (and updated during training)
Large matrices are multiplied
Everything must happen fast

That is why GPUs are essential for LLMs.
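To get a feel for the scale, here is a rough sketch that counts the work in a single matrix multiply. The sizes are hypothetical and not taken from any specific model, and `matmul` and `matmul_flops` are helper names I made up for this example.

```python
def matmul(a, b):
    """Naive matrix multiply. Every output cell is an independent dot product,
    which is why this kind of work spreads well across many GPU cores."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

def matmul_flops(m, k, n):
    """Approximate operation count for (m x k) times (k x n):
    each of the m*n outputs needs about k multiplications and k additions."""
    return 2 * m * k * n

print(matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]

# Hypothetical sizes: 1,024 tokens passing through one 4,096 x 4,096 weight matrix.
flops = matmul_flops(1024, 4096, 4096)
print(f"{flops:,} operations for one matrix multiply")
# 34,359,738,368 operations for one matrix multiply
```

A real model does many of these multiplies per layer, across many layers, for every token it generates.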

4. Fundamentals of Tokenization in NLP

What is tokenization?

Tokenization is the process of breaking text into smaller units called tokens. Tokens are not always words. Sometimes they are parts of words.

Example:
Text: "Hey There!"
Tokens might be:

"Hey"
"There"
"!"
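To see how a token can be part of a word, here is a small sketch that splits text using a greedy longest-match rule. The vocabulary is invented for this example, and this is a simplification: real tokenizers (for example the byte pair encoding used by tiktoken) learn their pieces and merge rules from data.

```python
# Hypothetical vocabulary of known pieces, chosen only for this example.
VOCAB = {"token", "ization", "un", "believ", "able", "!", " "}

def split_into_subwords(text, vocab):
    """Greedy longest-match: at each position take the longest piece found in vocab."""
    pieces = []
    i = 0
    while i < len(text):
        for end in range(len(text), i, -1):
            if text[i:end] in vocab:
                pieces.append(text[i:end])
                i = end
                break
        else:
            # No known piece starts here: fall back to a single character.
            pieces.append(text[i])
            i += 1
    return pieces

print(split_into_subwords("tokenization", VOCAB))   # ['token', 'ization']
print(split_into_subwords("unbelievable!", VOCAB))  # ['un', 'believ', 'able', '!']
```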

Encode and decode

Encoding converts text into numbers
Decoding converts numbers back into text

This is how models understand language.
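Encoding and decoding are just two lookups in opposite directions. Here is a minimal sketch with a hypothetical three-token vocabulary. The IDs are arbitrary and do not match any real tokenizer.

```python
# Hypothetical vocabulary: the IDs are arbitrary and chosen for this example.
TOKEN_TO_ID = {"Hey": 0, " There": 1, "!": 2}
ID_TO_TOKEN = {token_id: token for token, token_id in TOKEN_TO_ID.items()}

def encode(tokens):
    """Map each token to its integer ID."""
    return [TOKEN_TO_ID[t] for t in tokens]

def decode(ids):
    """Map IDs back to tokens and join them into text."""
    return "".join(ID_TO_TOKEN[i] for i in ids)

ids = encode(["Hey", " There", "!"])
print(ids)          # [0, 1, 2]
print(decode(ids))  # Hey There!
```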

5. Implementing a Custom Tokenizer in Python

Python example using tiktoken

import tiktoken

# Load the tokenizer that matches the target model
enc = tiktoken.encoding_for_model("gpt-4o")
text = "Hey There! My name is Payal Kumari"

# Encode: text -> list of integer token IDs
tokens = enc.encode(text)
print("Tokens:", tokens)

# Tokens: [25216, 3274, 0, 3673, 1308, 382, 11961, 280, 81689, 1683]

# Decode: token IDs -> original text
decoded = enc.decode(tokens)
print("Decoded", decoded)

Output

Tokens: [25216, 3274, 0, 3673, 1308, 382, 11961, 280, 81689, 1683]
Decoded Hey There! My name is Payal Kumari

Explanation

Each number represents a token. The model never sees text directly. It only works with numbers. Decoding converts those numbers back into readable text.

6. The Transformer Breakthrough: Google’s Attention Paper

Transformers were introduced in the 2017 paper "Attention Is All You Need" by researchers at Google.

Core idea

Instead of processing words one by one, the model pays attention to all words at the same time.

This allows it to understand:

Long sentences
Relationships between words
Context across paragraphs
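The core operation from the paper is scaled dot-product attention: every token scores itself against every other token, and the scores become weights for mixing information. Below is a minimal pure-Python sketch. The input vectors are hypothetical, and I pass the same vectors as queries, keys, and values to keep it short, whereas a real transformer first multiplies the input by learned weight matrices.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Score this token against every token in the sequence.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        # The output is a weighted average of all value vectors.
        outputs.append([
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ])
    return outputs

# Hypothetical 2-dimensional vectors for a 3-token sequence.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
for row in attention(x, x, x):
    print([round(v, 3) for v in row])
```

Each output row is a blend of all three input rows, which is what "paying attention to all words at the same time" means in practice.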

7. Deep Diving into Vector Embeddings

What are embeddings?

Embeddings are numeric representations of words or sentences. Similar meanings result in similar vectors.

Example:

"cat" is closer to "dog" than to "car"
"Paris" is closer to "France" than to "banana"

This allows models to understand meaning, not just text.
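"Closer" is usually measured with cosine similarity, which compares the direction of two vectors. Here is a small sketch. The three vectors are hand-made for illustration, since real embeddings come from a trained model and have hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a, b):
    """1.0 means same direction, 0.0 means unrelated directions."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hand-made 3-dimensional vectors, invented for this example.
EMBEDDINGS = {
    "cat": [0.9, 0.8, 0.1],
    "dog": [0.8, 0.9, 0.2],
    "car": [0.1, 0.2, 0.9],
}

cat_dog = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["dog"])
cat_car = cosine_similarity(EMBEDDINGS["cat"], EMBEDDINGS["car"])
print(f"cat vs dog: {cat_dog:.2f}")
print(f"cat vs car: {cat_car:.2f}")
print("cat is closer to dog:", cat_dog > cat_car)  # True
```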

Real-world use

Search engines
Recommendation systems
Semantic search

8. Role of Positional Encodings in Transformers

Why position matters

Transformers do not understand word order by default, because attention treats the input as a set of tokens with no built-in sequence. Positional encoding adds information about where each token appears in the sequence.

Example:

"Dog bites man"
"Man bites dog"

Same words, different meaning. Positional encoding solves this.
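Here is a sketch of the sinusoidal positional encoding described in the original transformer paper, applied to a shortened version of that example. The word vectors are hypothetical, and many newer models use other position schemes, so treat this as one possible approach.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal encoding: sin on even indices, cos on odd indices."""
    encoding = []
    for i in range(dim):
        angle = position / (10000 ** ((i - i % 2) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

# Hypothetical 4-dimensional word embeddings, invented for this example.
EMBEDDINGS = {
    "dog": [1.0, 0.0, 0.0, 0.0],
    "bites": [0.0, 1.0, 0.0, 0.0],
    "man": [0.0, 0.0, 1.0, 0.0],
}

def embed(words, use_positions):
    """Look up each word vector and optionally add its position encoding."""
    vectors = []
    for position, word in enumerate(words):
        vector = EMBEDDINGS[word]
        if use_positions:
            pe = positional_encoding(position, len(vector))
            vector = [v + p for v, p in zip(vector, pe)]
        vectors.append(vector)
    return vectors

a = ["dog", "bites", "man"]
b = ["man", "bites", "dog"]
# Without positions, both sentences are the same set of vectors, just reordered.
print(sorted(embed(a, False)) == sorted(embed(b, False)))  # True
# With positions added, the two sets of vectors differ.
print(sorted(embed(a, True)) == sorted(embed(b, True)))    # False
```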

9. Understanding Multi-Head Attention for Rich Context

What is multi-head attention?

Instead of one attention mechanism, transformers run several in parallel, called heads.

Each head can learn to focus on a different relationship, for example:

Grammar
Meaning
Long-distance connections

This helps the model understand language more deeply.
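Mechanically, each head works on its own slice of the vector, and the results are joined back together. The sketch below shows only that split, attend, concatenate flow. It is a simplification under my own assumptions: real transformers apply learned projection matrices per head and a final output projection, which I leave out, and the input vectors are hypothetical.

```python
import math

def softmax(scores):
    highest = max(scores)
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors."""
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in keys]
        weights = softmax(scores)
        outputs.append([sum(w * v[d] for w, v in zip(weights, values))
                        for d in range(len(values[0]))])
    return outputs

def multi_head_attention(vectors, num_heads):
    """Split each vector into num_heads slices, attend per slice, then concatenate."""
    dim = len(vectors[0])
    assert dim % num_heads == 0, "dimension must divide evenly across heads"
    head_dim = dim // num_heads
    head_outputs = []
    for h in range(num_heads):
        start = h * head_dim
        head_slice = [v[start:start + head_dim] for v in vectors]
        head_outputs.append(attention(head_slice, head_slice, head_slice))
    # Concatenate the heads back into one vector per token.
    return [
        [x for head in head_outputs for x in head[t]]
        for t in range(len(vectors))
    ]

# Hypothetical 4-dimensional vectors for a 3-token sequence.
x = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5], [1.0, 1.0, 0.0, 1.0]]
out = multi_head_attention(x, num_heads=2)
print(len(out), len(out[0]))  # 3 4
```

Because each head computes its own attention weights, one head can weight nearby tokens heavily while another weights distant ones.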

Closing Thoughts

Moving from full stack development to AI is not about memorizing everything at once. It is about understanding the flow. Tokens to numbers. Numbers through transformers. Predictions back to text. Build step by step, and the concepts will start clicking.

Documenting my Full Stack → AI journey, step by step.

By Payal Kumari
