When you need a flexible schema for custom fields, user preferences, or dynamic attributes, PostgreSQL gives you two powerful types: json and jsonb.
At first glance, they seem similar: both store JSON data. But under the hood, they're fundamentally different in how they store, process, and query that data. Understanding these differences is crucial for building performant applications.
JSON: Text Storage with Validation
The json type is the simpler one. It stores your input exactly as text, pretty much like a string that PostgreSQL knows is valid JSON.
How JSON Storage Works
CREATE TABLE events (
id SERIAL PRIMARY KEY,
data JSON
);
INSERT INTO events (data) VALUES
('{"user": "alice", "action": "login", "timestamp": 1699564800}');
When you insert this data, PostgreSQL:
- Validates that it's valid JSON syntax
- Stores it as plain text exactly as you provided it
- Preserves formatting including whitespace and key order
-- Both of the following are stored byte-for-byte, exactly as written
{"user": "alice", "action": "login", "timestamp": 1699564800}
{
"user" : "alice",
"action": "login"
}
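One way to see this preservation in psql is to cast a literal with redundant keys. This sketch uses an illustrative literal, not data from the example table:

```sql
-- json keeps the text verbatim: key order, whitespace,
-- and even duplicate keys all survive
SELECT '{"b": 1, "a": 2, "a": 3}'::json;
-- returns the literal exactly as typed: {"b": 1, "a": 2, "a": 3}
```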
The Performance Cost
Every time you query or extract something from a json column, PostgreSQL has to re-parse the stored text. Consider:
SELECT data->>'user' FROM events WHERE id = 1;
What happens internally:
- Read the text from disk
- Parse the entire JSON string
- Navigate to the requested key
- Extract the value
- Return the result
This parsing happens every single time you access the data. For a single query, it's negligible. But for thousands of queries or complex filters, it adds up quickly.
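As a rough illustration of how the cost compounds (assuming the events table above): each json operator in a query parses the stored text independently, so a query touching several keys pays the parse multiple times per row.

```sql
-- With the json type, each of these three ->> operators
-- re-parses the entire stored text for every matching row
SELECT data->>'user',
       data->>'timestamp'
FROM events
WHERE data->>'action' = 'login';
```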
JSONB: Binary Format for Performance
Then came jsonb, the binary version of JSON. Instead of keeping raw text, PostgreSQL parses your JSON once, converts it into a structured binary format, and stores it that way.
How JSONB Storage Works
CREATE TABLE events (
id SERIAL PRIMARY KEY,
data JSONB
);
INSERT INTO events (data) VALUES
('{"user": "alice", "action": "login", "timestamp": 1699564800}');
When you insert JSONB data, PostgreSQL:
- Parses the JSON text
- Normalizes the structure (removes whitespace, sorts keys)
- Converts to binary format
- Stores the pre-processed binary tree
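The normalization step is easy to observe in psql with an illustrative literal:

```sql
-- jsonb normalizes on input: whitespace is dropped, keys are
-- sorted, and for duplicate keys only the last value is kept
SELECT '{"b": 1,  "a": 2, "a": 3}'::jsonb;
-- → {"a": 3, "b": 1}
```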
TOAST: Handling Large JSON Documents
Now, there's a catch when storing large JSON or JSONB documents. PostgreSQL stores each row in fixed-size pages, typically 8KB. What happens when your JSON document doesn't fit?
The Problem
-- Large JSON document (50KB)
INSERT INTO events (data) VALUES ('{
"user": "alice",
"history": [/* 10,000 events */],
"metadata": {/* large nested object */}
}');
This document far exceeds the 8KB page size, so it can't be stored inline. PostgreSQL first tries to compress the value; if the row still doesn't fit, the value is moved out of line.
TOAST: The Oversized Attribute Storage Technique
TOAST is a system PostgreSQL uses to store large column values that exceed a page's capacity.
How TOAST works:
Small document (<2KB):
┌─────────────────┐
│ Main Table │
│ [id] [data] │ ← Data stored inline
└─────────────────┘
Large document (>2KB):
┌─────────────────┐ ┌──────────────────┐
│ Main Table │ │ TOAST Table │
│ [id] [pointer] │────────→│ [compressed │
└─────────────────┘ │ chunks] │
└──────────────────┘
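You can peek at this from SQL (assuming the events table above): pg_column_size reports the on-disk size of a value after compression, and pg_class exposes the TOAST table backing a relation.

```sql
-- On-disk size of each stored document (post-compression)
SELECT id, pg_column_size(data) AS stored_bytes FROM events;

-- Name of the TOAST table backing events, if one exists
SELECT reltoastrelid::regclass
FROM pg_class
WHERE relname = 'events';
```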
GIN Indexes: Making JSONB Searchable at Scale
Once your JSONB data is stored efficiently, the next big question is: how do you search inside it efficiently?
The Problem Without Indexes
-- Find all users named "alice"
SELECT * FROM events WHERE data->>'user' = 'alice';
-- Without index: Sequential scan
-- Scans every row, parses JSONB, checks condition
Enter GIN Indexes
GIN (Generalized Inverted Index) is what makes JSONB fast for containment and existence queries at scale.
CREATE INDEX idx_events_data ON events USING GIN(data);
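A caveat worth knowing: a plain GIN index serves jsonb's containment and existence operators (@>, ?, ?|, ?&), not ->> extraction, so the earlier equality filter is usually rewritten with @> to benefit from it. A sketch against the events table above (idx_events_user is an illustrative name):

```sql
-- Containment: rows whose data contains this sub-document (uses GIN)
SELECT * FROM events WHERE data @> '{"user": "alice"}';

-- Key existence: rows with a top-level "user" key (uses GIN)
SELECT * FROM events WHERE data ? 'user';

-- ->> equality is not served by GIN; for that,
-- create a B-tree expression index instead
CREATE INDEX idx_events_user ON events ((data->>'user'));
```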
How GIN Indexes Work
Instead of indexing entire documents, GIN breaks each JSONB value into its smallest searchable components keys, values, and even nested paths.
Example document:
{
"user": "alice",
"age": 25,
"tags": ["vip", "premium"]
}
GIN index entries:
Key-Value Pairs:
"user" → "alice" → [row_id_1, row_id_5, row_id_12]
"age" → 25 → [row_id_1, row_id_8]
"tags" → "vip" → [row_id_1, row_id_3]
"tags" → "premium"→ [row_id_1, row_id_7]
It's similar to how Elasticsearch indexes documents: every searchable element gets its own index entry pointing to the rows that contain it.
When to Use GIN Indexes
Use GIN when:
- Read-heavy workloads (analytics, reporting)
- Complex JSONB queries with filters
- Containment checks are common
- Query performance matters more than write speed
Skip GIN when:
- Write-heavy workloads (high-frequency inserts)
- Simple key extraction only
- Storage space is constrained
- You rarely query inside JSONB
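If index size is the concern but you still need containment queries, the jsonb_path_ops operator class is a middle ground: a smaller, faster GIN variant that supports only the @> operator. The index name here is illustrative:

```sql
-- Smaller GIN index: stores hashes of paths and values,
-- supports only the @> containment operator
CREATE INDEX idx_events_data_path
    ON events USING GIN (data jsonb_path_ops);
```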
Conclusion
The choice between JSON and JSONB comes down to intent and usage patterns.
JSON is ideal when you only store and retrieve data: fast inserts, minimal processing, exact format preservation, and no need for indexing.
JSONB is built for when you query inside your JSON: filters, searches, indexes, and performance at scale.
The key insights:
- JSONB's binary format eliminates repeated parsing overhead
- TOAST handles large documents transparently with compression
- GIN indexes make JSONB searchable like a document database
- Write overhead is small compared to read performance gains
- Choose JSONB by default unless you have specific reasons not to
If your JSON is just passive data that you store and retrieve whole, go with JSON. But if you want PostgreSQL to understand and search that data like a structured document, JSONB is the way to go.
PostgreSQL's JSONB turns your relational database into a hybrid system: structured where you need it, flexible where you want it, and performant throughout.