Database Indexing 101: How to Make Queries 10x Faster (With Real Examples and When Not to Index)

October 13, 2025 at 06:18 PM | Est. read time: 14 min

By Bianca Vaillants

Sales Development Representative and excited about connecting people

When your application starts feeling sluggish, the culprit is often the database. As data grows and queries get more complex, performance bottlenecks show up quickly—especially under peak load. The most reliable, low-risk way to accelerate SQL queries is database indexing. Think of indexes like a book’s index: instead of scanning every page, you jump straight to the right spot.

In this practical guide, you’ll learn what database indexing is, how it improves query performance, when to use it (and when not to), and how to implement indexes the right way—complete with real SQL examples, explain plans, and optimization tips.

Along the way, you’ll also see how indexing plays with modern analytics patterns such as columnar storage and sharding, so you know exactly which lever to pull for each performance challenge.

What Is a Database Index?

A database index is a data structure that lets the query optimizer locate rows quickly without scanning the entire table. Most relational databases default to B-tree indexes because they support fast equality and range lookups, sorting, and joins. Others include:

  • Hash indexes: Superfast equality lookups (=), not ideal for ranges (>, <) or ORDER BY.
  • GIN/GiST (PostgreSQL): Great for arrays, full-text search, spatial and JSONB queries.
  • BRIN (PostgreSQL): Lightweight index for massive, naturally ordered tables (e.g., time-series).
  • Bitmap (Oracle): Efficient for low-cardinality columns in analytics workloads.

Key terms to know:

  • Selectivity: How well an index filters (high selectivity = fewer rows match).
  • Cardinality: Number of distinct values in a column.
  • Covering index: An index that contains all the columns needed by a query—enabling index-only scans.

Types of Indexes You’ll Use Most

  • Primary (often clustered): Uniquely identifies rows. In engines like InnoDB (MySQL), the primary key defines the physical order of rows.
  • Secondary (nonclustered): Additional indexes to speed up lookups, joins, sorting, and grouping.
  • Unique: Enforces uniqueness; also helps the optimizer prune result sets.
  • Composite (multi-column): Order matters; match your most selective and most-used predicate first.
  • Partial/Filtered: Index a subset of rows (e.g., only active orders).
  • Functional/Expression: Index the result of an expression (e.g., lower(email)).
  • Full-text/Trigram: For text search and partial matches when LIKE '%term%' breaks sargability.

How Indexing Improves Query Performance

Indexes speed up:

  • WHERE filters: Avoid full table scans.
  • JOINs: Rapid lookups on join keys.
  • ORDER BY: If the query order matches index order, sorting can be skipped.
  • GROUP BY/DISTINCT: Index can help pre-aggregate or avoid sorts.
  • Index-only scans: When all referenced columns are in the index.

They do introduce write overhead: each INSERT/UPDATE/DELETE must maintain all indexes. The art is building just enough of the right indexes, not “indexing everything.”

When You Should Add an Index

Create indexes when you see:

1) High-volume reads on specific columns

  • Example: Frequent searches by customer_id, email, or order_date.

2) Frequent filtering in WHERE clauses

  • Example: WHERE status = 'OPEN' AND order_date >= CURRENT_DATE - INTERVAL '7 days'

3) Frequent ORDER BY on the same columns

  • Combine an index with the sort order used most.

4) JOINs across large tables

  • Index both sides of a join on the key columns.

5) Unique constraints (business rules)

  • Enforce data integrity and speed lookups.

6) Foreign keys

  • Index FK columns to speed updates/deletes and referential checks.

7) Large tables (millions+ rows)

  • Avoid table scans as much as possible.

8) Common aggregations (GROUP BY)

  • Especially when grouped by a limited set of columns.
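The aggregation case is easy to see in a minimal sqlite3 sketch (names are made up for illustration): with an index on the grouping column, the engine reads rows already in group order instead of building a temporary structure to group them.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER)")

def plan(query):
    return " | ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + query))

q = "SELECT customer_id, COUNT(*) FROM orders GROUP BY customer_id"

before = plan(q)  # table scan, then a temp B-tree to form the groups
con.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(q)   # index scan delivers rows already grouped

print(before)
print(after)
```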

When Not to Index

  • Very small tables (in-memory scans are cheap).
  • Low-selectivity columns (e.g., boolean is_active), unless used with other selective predicates or as part of a composite index.
  • Extremely write-heavy tables with low read ratios.
  • Columns frequently updated with large values (e.g., long text) or volatile fields.
  • Duplicate or overlapping indexes (maintain cost without benefits).

How Indexes Work Under the Hood (The Short Version)

  • B-tree indexes store keys in sorted order with pointers to table rows (or to row versions).
  • The optimizer uses table and index statistics to estimate row counts and choose plans.
  • If your query is “sargable” (search ARGument ABLE)—i.e., uses the column in an index-friendly way—your DB can jump straight to matching ranges.
  • If you wrap indexed columns in non-sargable functions (e.g., WHERE LOWER(email) = 'x'), you may force scans—unless you create a functional index on that expression.
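Here is a quick sketch of that trap, again using sqlite3 for portability (SQLite has supported expression indexes since 3.9; the names are illustrative). Wrapping the column in lower() forces a scan until an index on the same expression exists.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT)")

def plan(query):
    return " | ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + query))

q = "SELECT id FROM customers WHERE lower(email) = 'ana@example.com'"

no_index = plan(q)  # the wrapped column defeats any plain index: full scan
con.execute("CREATE INDEX idx_customers_lower_email ON customers (lower(email))")
with_index = plan(q)  # the expression index makes the predicate sargable again

print(no_index)
print(with_index)
```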

Vendor-Specific Nuances to Remember

  • PostgreSQL: Rich index types (B-tree, GIN, GiST, BRIN, SP-GiST), partial/functional indexes, INCLUDE columns for covering indexes, powerful JSONB indexing.
  • MySQL (InnoDB): Primary key is clustered; choose it carefully. Prefix indexes for large varchars; composite indexes are essential for complex predicates.
  • SQL Server: Clustered vs. nonclustered indexes; INCLUDE columns; filtered indexes; Query Store to monitor performance.
  • Columnar databases (e.g., ClickHouse) rely on data skipping and columnar compression more than traditional secondary indexes. For analytics at scale, columnar storage plus vectorized execution can be a better lever than traditional row-store indexing. Learn more in Optimizing query performance with columnar storage and vectorized execution and see how this plays out in practice with ClickHouse: real-time analytics columnar database.

Practical Indexing Examples (SQL You Can Paste)

Let’s use a simple e-commerce schema.

Tables:

  • customers(id PK, email UNIQUE, created_at)
  • orders(id PK, customer_id FK, status, order_date, total_amount)
  • order_items(id PK, order_id FK, product_id, quantity, unit_price)

1) Speed up customer lookups by email

PostgreSQL or MySQL:

```sql
CREATE UNIQUE INDEX idx_customers_email ON customers (email);
```

Tip: If you often search case-insensitively:

```sql
-- PostgreSQL
CREATE INDEX idx_customers_lower_email ON customers ((lower(email)));
```

2) Accelerate recent orders dashboard

Common query:

```sql
SELECT *
FROM orders
WHERE status = 'PAID'
  AND order_date >= current_date - INTERVAL '7 days'
ORDER BY order_date DESC
LIMIT 50;
```

Index:

```sql
-- Order columns match WHERE then ORDER BY
CREATE INDEX idx_orders_status_date ON orders (status, order_date DESC);
```

This helps filtering and avoids a sort when possible.

3) Make JOINs faster on foreign keys

```sql
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_items_order_id ON order_items (order_id);
```

These dramatically reduce join costs between orders, customers, and items.

4) Cover a reporting query (index-only scan)

Query:

```sql
SELECT customer_id, count(*) AS cnt, sum(total_amount) AS revenue
FROM orders
WHERE order_date >= '2024-01-01'
GROUP BY customer_id;
```

Index (SQL Server style with INCLUDE; PostgreSQL also supports INCLUDE):

```sql
-- SQL Server
CREATE INDEX ix_orders_customer_date ON orders (order_date, customer_id)
INCLUDE (total_amount);
```

This can enable an index-only scan if the engine can satisfy the query without touching the table heap.
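You can observe the index-only behavior in a sqlite3 sketch. SQLite has no INCLUDE clause, so as an adaptation all needed columns go into the index key; when every referenced column is in the index, the plan reports a covering index and the table itself is never read.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY, customer_id INTEGER,
    order_date TEXT, total_amount REAL)""")
# No INCLUDE in SQLite: fold the payload column into the key instead
con.execute("""CREATE INDEX ix_orders_customer_date
               ON orders (order_date, customer_id, total_amount)""")

def plan(query):
    return " | ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + query))

q = """SELECT customer_id, count(*) AS cnt, sum(total_amount) AS revenue
       FROM orders WHERE order_date >= '2024-01-01' GROUP BY customer_id"""

covering = plan(q)  # the plan names a COVERING INDEX: no table access
print(covering)
```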

5) Partial (filtered) index for hot data

If 95% of queries touch only PAID orders:

```sql
-- PostgreSQL
CREATE INDEX idx_orders_paid_recent
ON orders (order_date DESC)
WHERE status = 'PAID';
```

Smaller index, faster maintenance, better cache locality.
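SQLite also supports partial indexes, so the behavior is easy to verify in a self-contained sketch (illustrative names): the index serves queries whose WHERE clause implies the index predicate, and is ignored for rows outside it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, order_date TEXT)")
con.execute("""CREATE INDEX idx_orders_paid_recent
               ON orders (order_date DESC) WHERE status = 'PAID'""")

def plan(query):
    return " | ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + query))

# Matches the index predicate: the partial index is used
paid = plan("SELECT * FROM orders WHERE status = 'PAID' AND order_date >= '2024-06-01'")
# Does not match it: falls back to a table scan
other = plan("SELECT * FROM orders WHERE status = 'OPEN' AND order_date >= '2024-06-01'")

print(paid)
print(other)
```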

6) JSONB indexing for metadata

```sql
-- PostgreSQL
CREATE INDEX idx_orders_meta ON orders USING GIN (metadata jsonb_path_ops);

-- Now queries like:
--   WHERE metadata @> '{"channel":"mobile"}'
-- can use the index.
```

7) LIKE queries that still use an index

  • LIKE 'foo%' can use a standard B-tree index.
  • LIKE '%foo%' generally cannot (consider trigram or full-text):

```sql
-- PostgreSQL trigram example
CREATE EXTENSION IF NOT EXISTS pg_trgm;
CREATE INDEX idx_customers_email_trgm ON customers USING GIN (email gin_trgm_ops);
```

What Happens If You Don’t Index?

Let’s say you run this on a 20M-row orders table:

```sql
SELECT *
FROM orders
WHERE customer_id = 12345
ORDER BY order_date DESC
LIMIT 20;
```

Without an index on (customer_id, order_date), the database will scan a huge portion of the table, then sort, then limit. With the right composite index, you jump straight to the rows for that customer in the correct order—often turning seconds into milliseconds.

Measuring the Difference (EXPLAIN, Logs, and Traces)

  • Use EXPLAIN/EXPLAIN ANALYZE (PostgreSQL, MySQL) to see plans; in SQL Server, inspect the estimated/actual execution plan (via SSMS or SET SHOWPLAN_XML ON).
  • Track slow query logs (MySQL), pg_stat_statements (PostgreSQL), or Query Store (SQL Server).
  • Instrument your services with tracing (e.g., OpenTelemetry) to correlate database spans with app latency and catch N+1 query patterns.

A Step-by-Step Indexing Playbook

1) Capture reality

  • Identify top slow queries by P95/P99 latency and total time consumed.
  • Aggregate by normalized SQL (ignore value literals) to find the heavy hitters.

2) Make queries sargable

  • Apply functions and arithmetic to literals and parameters, not to the indexed column.
  • Avoid calculations on indexed columns in WHERE: use generated columns or functional indexes instead.

3) Propose targeted indexes

  • Start with equality predicates and join keys.
  • For composite indexes, order columns by selectivity and how they’re used in WHERE and ORDER BY.

4) Test on a staging dataset

  • Compare EXPLAIN plans and timings before/after.
  • Check write overhead after adding indexes.

5) Monitor and iterate

  • Watch CPU, buffer cache hits, lock contention, and bloat.
  • Drop unused or redundant indexes (many engines expose usage stats).

6) Plan for scale

  • For very large datasets and analytical queries, indexing is just one lever. You may need columnar storage, materialized views, or even sharding. If you’re exploring real-time analytics, see how columnar engines accelerate scans in ClickHouse: real-time analytics columnar database. When sheer growth outpaces a single node, consider database sharding.

Advanced Tips That Pay Off

  • Composite index order matters: for WHERE last_name = ? AND created_at >= ? ORDER BY created_at DESC, prefer (last_name, created_at DESC) to cover both the filter and the sort.
  • INCLUDE columns (covering): keep key columns minimal; “include” non-key columns to satisfy SELECT without extra lookups.
  • Descending indexes for time-series: ORDER BY created_at DESC often benefits from a DESC index.
  • Partitioning: partition large tables by date or tenant. Use local indexes per partition; consider global indexes if the engine supports them and your access patterns need it.
  • Fillfactor and fragmentation: in write-heavy tables, tune fillfactor (PostgreSQL) and defragment/rebuild periodically to avoid page splits and bloat.
  • Statistics: keep stats updated (ANALYZE). Bad stats lead to bad plans—even with good indexes.
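The first tip, that composite column order matters, can be sketched with sqlite3 (hypothetical table and names): an index on (last_name, created_at) serves a leading-column equality plus sort, but cannot seek when the predicate touches only the second column.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, last_name TEXT, created_at TEXT)")
con.execute("CREATE INDEX idx_people_last_created ON people (last_name, created_at)")

def plan(query):
    return " | ".join(r[3] for r in con.execute("EXPLAIN QUERY PLAN " + query))

# Leading-column equality: index seek, and the sort comes for free
p1 = plan("SELECT * FROM people WHERE last_name = 'Ng' ORDER BY created_at DESC")
# Predicate only on the second column: the composite index cannot seek
p2 = plan("SELECT * FROM people WHERE created_at >= '2024-01-01'")

print(p1)
print(p2)
```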

Monitoring and Continuous Optimization

  • Baseline latency (P50/P95/P99) per endpoint and per query.
  • Set alerts for query regressions (e.g., 2x latency increase).
  • Track index usage and size growth; measure the write penalty after each new index.
  • Use AI-assisted observability or anomaly detection to flag N+1 patterns, missing-index symptoms (e.g., sudden scan spikes), and cardinality estimation issues.

Putting It All Together: A Realistic Example

Before:

  • Dashboard endpoint latency P95: 2.4s
  • Top query: filter by status + date, sort by date, join to customers for email
  • Plan: Seq Scan on orders, Hash Join to customers, Sort, Limit

After adding indexes:

```sql
CREATE INDEX idx_orders_status_date ON orders (status, order_date DESC);
CREATE INDEX idx_orders_customer_id ON orders (customer_id);
CREATE INDEX idx_customers_email ON customers (email);
```

Result:

  • Dashboard endpoint P95: 180ms
  • Plan: Index Scan on idx_orders_status_date, Nested Loop to customers via PK, no Sort needed
  • Write overhead: +6–10% on orders inserts (acceptable for our read-heavy workload)

Conclusion

Smart indexing is one of the highest-ROI performance improvements you can make. Start by targeting your most expensive queries, make them sargable, add the smallest possible set of well-ordered composite indexes, and measure the impact. When your workload shifts to large-scale analytics, complement indexing with columnar storage and vectorized execution—start with this primer on optimizing query performance with columnar storage and vectorized execution.

Start now: profile your slowest queries, run EXPLAIN, add targeted indexes, and monitor P95/P99 over a week. You’ll likely see anywhere from double-digit percentage gains to 10x speedups with minimal risk.

FAQ

Do indexes slow down writes?

Yes. Every insert/update/delete must maintain each index. This is why you should avoid redundant or rarely used indexes and prefer partial indexes where applicable.

Should I always index foreign keys?

Almost always, yes. Indexing FKs speeds up joins and avoids full scans during deletes/updates in the parent table.

How many indexes are “too many”?

There’s no magic number. Track read gains vs. write penalties and index size. Drop unused or overlapping indexes regularly.

What’s the best order for composite indexes?

Generally: most selective and most frequently filtered column first, then columns used for range or sorting. Align the index’s sort order with your ORDER BY when possible.

Do indexes help with LIKE?

  • LIKE 'foo%' can use a normal index.
  • LIKE '%foo%' cannot—consider trigram, full-text, or search-specific indexes.

When should I consider columnar databases or sharding?

  • Columnar is great for analytics-heavy, scan-heavy workloads with large datasets.
  • Sharding helps when a single node can’t hold or serve your data at acceptable latency. Learn more about the trade-offs in database sharding explained.

Do I need to rebuild indexes?

Sometimes. On engines prone to fragmentation or bloat, periodic maintenance (REINDEX, VACUUM FULL, or engine-specific rebuilds) can restore performance. Always measure before and after.
