Databases are the backbone of almost every modern application, from small websites to massive enterprise systems. However, as data grows, so does the challenge of retrieving it efficiently. That’s where database indexing comes into play.
An index is a data structure that improves the speed of data retrieval operations at the cost of additional storage and maintenance overhead. Think of it like an index in a book—it helps you quickly find what you’re looking for without scanning every page.
But not all indexes are the same. Different databases offer different types of indexes, each optimized for specific query patterns.
1. Primary Index (Clustered Index)
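A primary index is the index built on a table's primary key, and it ensures that each row has a unique identifier. In engines such as MySQL's InnoDB and (by default) SQL Server, it is also the clustered index, which determines the physical order of rows in storage. PostgreSQL, by contrast, stores rows in an unordered heap, so its primary key index is not clustered.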
How It Works
- When a clustered index is created on a column (usually the primary key), the database sorts and stores rows physically based on that column’s values.
- There can only be one clustered index per table because data can only be physically stored in one order.
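The idea of rows stored in key order can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the `ClusteredTable` class and its sample data are hypothetical, and a real engine stores rows in pages inside a tree rather than in one flat list.

```python
import bisect

# Hypothetical in-memory "table" whose rows are kept physically sorted by key,
# mimicking how a clustered index orders storage.
class ClusteredTable:
    def __init__(self):
        self.keys = []   # sorted primary-key values
        self.rows = []   # rows stored in the same order as self.keys

    def insert(self, key, row):
        # Find the slot that keeps keys sorted; inserting mid-list shifts
        # every later row, which is the cost the clustered layout pays.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            raise ValueError(f"duplicate key {key}")
        self.keys.insert(pos, key)
        self.rows.insert(pos, row)

    def range_scan(self, low, high):
        # Rows with low <= key <= high are contiguous, so one slice returns them.
        start = bisect.bisect_left(self.keys, low)
        end = bisect.bisect_right(self.keys, high)
        return self.rows[start:end]

table = ClusteredTable()
for employee_id, name in [(30, "Cara"), (10, "Ann"), (20, "Bob"), (40, "Dan")]:
    table.insert(employee_id, name)

print(table.keys)                # [10, 20, 30, 40]
print(table.range_scan(15, 35))  # ['Bob', 'Cara']
```

Because matching rows sit next to each other, the range scan is a single contiguous read, which is why there can be only one such ordering per table.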
Example
CREATE TABLE employees (
    employee_id INT PRIMARY KEY,
    name VARCHAR(100),
    department VARCHAR(50)
);
Here, employee_id is the primary key. In an engine that clusters on the primary key (InnoDB, for example), rows are stored in order of employee ID.
When to Use
- When your queries often use range-based searches on the key (e.g., all records whose ID falls between two values).
- When data retrieval benefits from sorted storage (e.g., retrieving the latest transactions).
Drawbacks
- Slower insertions and updates: Since data is stored physically in order, inserting a new record in the middle of the sequence can be costly.
2. Secondary Index (Non-Clustered Index)
A secondary index, or non-clustered index, provides a way to retrieve data quickly without altering the physical storage order.
How It Works
- Unlike clustered indexes, non-clustered indexes keep the indexed values in a separate sorted structure, each with a pointer to the actual row, rather than reordering the table itself.
- A table can have multiple non-clustered indexes.
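A minimal Python sketch of the pointer idea follows. It assumes a list position can stand in for a row pointer, and it uses a plain dictionary for the index, whereas a real engine would typically use a B-Tree. The helper name `build_secondary_index` and the sample rows are hypothetical.

```python
from collections import defaultdict

# Rows live in "heap" order (insertion order); their list position stands in
# for the row pointer a real engine would store.
rows = [
    {"employee_id": 3, "name": "Cara", "department": "Sales"},
    {"employee_id": 1, "name": "Ann", "department": "IT"},
    {"employee_id": 2, "name": "Bob", "department": "IT"},
    {"employee_id": 4, "name": "Ann", "department": "HR"},
]

def build_secondary_index(rows, column):
    # Map each column value to the positions of the rows that contain it.
    index = defaultdict(list)
    for position, row in enumerate(rows):
        index[row[column]].append(position)
    return index

name_index = build_secondary_index(rows, "name")
dept_index = build_secondary_index(rows, "department")  # a second index on the same table

# Look up by name: follow the pointers instead of scanning every row.
matches = [rows[p] for p in name_index["Ann"]]
print([r["employee_id"] for r in matches])  # [1, 4]
```

The table itself never changes order, which is why any number of such indexes can coexist, each at the cost of extra storage and upkeep.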
Example
CREATE INDEX idx_employee_name ON employees(name);
This index helps queries that filter by name to run faster.
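One way to check that an index is actually used is to ask the database for its query plan. The sketch below uses SQLite through Python's standard `sqlite3` module purely as a convenient stand-in (the article does not name SQLite), with hypothetical sample rows. The exact plan wording varies by SQLite version, so the code only prints it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees ("
    "employee_id INTEGER PRIMARY KEY, name TEXT, department TEXT)"
)
conn.executemany(
    "INSERT INTO employees (name, department) VALUES (?, ?)",
    [("Ann", "IT"), ("Bob", "IT"), ("Cara", "Sales")],
)

def plan(sql, params=()):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

query = "SELECT * FROM employees WHERE name = ?"
plan_before = plan(query, ("Bob",))   # no index on name yet: full table scan

conn.execute("CREATE INDEX idx_employee_name ON employees(name)")
plan_after = plan(query, ("Bob",))    # the planner can now use the new index

print("before:", plan_before)
print("after: ", plan_after)
```

After the index is created, the plan should mention idx_employee_name instead of a scan of the whole table. Other databases expose the same information through their own EXPLAIN variants.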
When to Use
- When queries often search by columns other than the primary key.
- When the table is read far more often than it is written.
Drawbacks
- Takes extra space since it stores a copy of the indexed column values along with the row pointers.
- Overhead in updates and inserts as index structures need to be maintained.
3. Unique Index
A unique index ensures that all values in a column (or a set of columns) remain distinct. It is automatically created when a unique constraint is applied to a column.
How It Works
- It works similarly to a primary index but does not enforce physical storage order.
- It rejects duplicate values.
Example
CREATE UNIQUE INDEX idx_unique_email ON users(email);
This ensures that no two users can have the same email.
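The rejection of duplicates can be observed directly. The following sketch again uses SQLite via Python's `sqlite3` module as an assumed stand-in database; the table definition and email value are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
conn.execute("CREATE UNIQUE INDEX idx_unique_email ON users(email)")

conn.execute("INSERT INTO users (email) VALUES (?)", ("john@example.com",))

try:
    # Second insert with the same email violates the unique index.
    conn.execute("INSERT INTO users (email) VALUES (?)", ("john@example.com",))
    duplicate_rejected = False
except sqlite3.IntegrityError as exc:
    duplicate_rejected = True
    print("rejected:", exc)

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(duplicate_rejected, count)  # True 1
```

The database raises an error on the second insert, so the application should be ready to catch it rather than rely on a separate "does this email exist?" query.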
When to Use
- When columns should not have duplicate values, such as email, phone number, or username.
Drawbacks
- Overhead on inserts and updates: Every new value needs to be checked for uniqueness before insertion.
4. Bitmap Index
A bitmap index is optimized for columns with low cardinality, meaning columns that have only a few unique values (e.g., gender, status, yes/no flags).
How It Works
- It keeps one bitmap (an array of 0s and 1s) per distinct value, with one bit per row: the bit is 1 if that row holds the value and 0 otherwise.
- Queries can be executed with bitwise operations, making them very efficient.
Example
If a status column has three values (Active, Inactive, Pending), a bitmap index might look like this:
ID | Active | Inactive | Pending |
---|---|---|---|
1 | 1 | 0 | 0 |
2 | 0 | 1 | 0 |
3 | 0 | 0 | 1 |
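
The table above can be modeled with Python integers used as bitmaps. This is a rough sketch: the gender column is a hypothetical second column added to show a combined filter, and real bitmap indexes also compress their bitmaps.

```python
# Row IDs 1..3 map to bit positions 0..2 (bit 0 = row ID 1).
status = {1: "Active", 2: "Inactive", 3: "Pending"}
gender = {1: "M", 2: "M", 3: "F"}  # hypothetical second column

def build_bitmaps(column):
    # One integer bitmap per distinct value; set the bit for each matching row.
    bitmaps = {}
    for row_id, value in column.items():
        bitmaps[value] = bitmaps.get(value, 0) | (1 << (row_id - 1))
    return bitmaps

def matching_ids(bitmap):
    # Convert the set bits of a bitmap back into row IDs.
    ids = []
    row_id = 1
    while bitmap:
        if bitmap & 1:
            ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return ids

status_bm = build_bitmaps(status)
gender_bm = build_bitmaps(gender)

# WHERE gender = 'M' AND status = 'Active' becomes a single bitwise AND.
both = gender_bm["M"] & status_bm["Active"]
print(matching_ids(both))  # [1]

# WHERE status = 'Active' OR status = 'Pending' is a bitwise OR.
print(matching_ids(status_bm["Active"] | status_bm["Pending"]))  # [1, 3]
```

Each extra filter costs one more bitwise operation over the bitmaps, which is why multi-condition queries on low-cardinality columns suit this structure.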
When to Use
- When indexing columns with few distinct values.
- When performing complex queries with multiple filters (e.g., WHERE gender = 'M' AND status = 'Active').
Drawbacks
- Not good for high-cardinality columns (e.g., customer names).
- Overhead in frequent updates since bitmaps must be recalculated.
5. B-Tree Index
The B-Tree index is the most commonly used index structure in databases like MySQL and PostgreSQL, and it is what a plain CREATE INDEX statement creates by default in both.
How It Works
- It organizes data into a balanced tree structure, where each node contains a sorted list of keys and pointers to child nodes.
- It provides logarithmic search time (O(log n)), making it efficient for range queries and exact matches.
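To make the node layout concrete, here is a small sketch of a lookup in a hand-built tree. It assumes the classic B-Tree layout, where keys also live in internal nodes (many databases use the B+ tree variant, which keeps all entries in the leaves). The `Node` class and key values are hypothetical, and insertion with node splitting is left out.

```python
import bisect
from dataclasses import dataclass, field

@dataclass
class Node:
    keys: list                                      # sorted keys in this node
    children: list = field(default_factory=list)   # empty for leaf nodes

def search(node, key, visited=1):
    """Return (found, nodes_visited) for an exact-match lookup."""
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return True, visited
    if not node.children:          # reached a leaf without finding the key
        return False, visited
    # Child i holds the keys that sort between keys[i-1] and keys[i].
    return search(node.children[i], key, visited + 1)

def in_order(node):
    """Yield every key in sorted order."""
    if not node.children:
        yield from node.keys
        return
    for i, key in enumerate(node.keys):
        yield from in_order(node.children[i])
        yield key
    yield from in_order(node.children[-1])

# Hand-built two-level tree; a real B-Tree reaches this shape through splits.
root = Node(
    keys=[20, 40],
    children=[Node([5, 10]), Node([25, 30]), Node([45, 50, 60])],
)

print(search(root, 30))   # (True, 2)
print(search(root, 35))   # (False, 2)
# For brevity this filter walks every key; a real range scan would descend
# to the first matching key and stop after the last one.
print([k for k in in_order(root) if 10 <= k <= 45])  # [10, 20, 25, 30, 40, 45]
```

Each lookup touches one node per level, so the number of nodes visited grows with the height of the tree rather than with the number of rows.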
Example
CREATE INDEX idx_order_date ON orders(order_date);
This makes queries like “Find all orders from last month” much faster.
When to Use
- When performing range queries (e.g., BETWEEN, <, >) or queries that sort or group on the indexed column (ORDER BY, GROUP BY).
- When indexing high-cardinality columns (e.g., timestamps, IDs).
Drawbacks
- Consumes more storage than simpler indexes.
- More overhead in inserts and deletes due to tree balancing operations.
6. Hash Index
A hash index is an index structure that maps keys to fixed-size hash values, making exact match queries extremely fast.
How It Works
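- Uses a hash function to convert each value into a fixed-size hash value that selects a bucket. Hash values are not guaranteed to be unique, so two different keys can land in the same bucket.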
- Instead of storing ordered data, it stores a key-value mapping.
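A toy version of this mapping is sketched below. It assumes a fixed number of buckets with chaining for collisions and uses `zlib.crc32` as a stand-in hash function. The `HashIndex` class and the email values are hypothetical, not any engine's actual implementation.

```python
import zlib

class HashIndex:
    """Toy hash index: fixed bucket array with chaining for collisions."""

    def __init__(self, num_buckets=4):
        self.buckets = [[] for _ in range(num_buckets)]

    def _bucket(self, key):
        # crc32 is a stand-in hash function; distinct keys can share a bucket.
        return self.buckets[zlib.crc32(key.encode("utf-8")) % len(self.buckets)]

    def put(self, key, row_id):
        self._bucket(key).append((key, row_id))

    def get(self, key):
        # Compare the full key inside the bucket to skip colliding entries.
        return [row_id for k, row_id in self._bucket(key) if k == key]

index = HashIndex(num_buckets=4)
emails = ["john@example.com", "ann@example.com", "bob@example.com",
          "cara@example.com", "dan@example.com"]
for row_id, email in enumerate(emails, start=1):
    index.put(email, row_id)

print(index.get("john@example.com"))    # [1]
print(index.get("nobody@example.com"))  # []
# Five keys in four buckets: at least one bucket must hold a collision.
print(max(len(b) for b in index.buckets) >= 2)  # True
```

Nothing in the bucket layout reflects the order of the keys, which is why this structure answers equality lookups quickly but cannot serve a range scan.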
Example
CREATE INDEX idx_hash_email ON users(email) USING HASH;
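This is useful when searching for a specific email. The statement above uses MySQL syntax, where hash indexes are supported by the MEMORY engine but not by InnoDB; in PostgreSQL the equivalent is CREATE INDEX idx_hash_email ON users USING HASH (email);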
When to Use
- When performing exact match lookups (e.g., WHERE email = 'john@example.com').
- When you don’t need range queries (hash indexes are bad for BETWEEN or LIKE).
Drawbacks
- Not suitable for range queries or sorting operations.
- Risk of hash collisions, which can degrade performance.
7. Full-Text Index
A full-text index is designed for searching text-based content efficiently.
How It Works
- It tokenizes and stores words in an inverted index format.
- Allows fast search operations on large text fields.
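The inverted index idea can be sketched as a mapping from each word to the set of documents that contain it. The sketch assumes a very simple tokenizer and a search that requires every query word to be present. Real full-text engines add stop words, stemming, and relevance ranking. All names and sample articles here are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Lowercase and split on non-alphanumeric characters.
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    # Map each token to the set of document IDs that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    # Return IDs of documents containing every query token.
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

articles = {
    1: "Database indexing speeds up reads.",
    2: "Indexing a book is different from indexing a database.",
    3: "Caching is another way to speed up a database.",
}
index = build_inverted_index(articles)

print(sorted(search(index, "database indexing")))  # [1, 2]
print(sorted(search(index, "caching")))            # [3]
```

A search only touches the entries for the query words, so its cost depends on how many documents contain those words rather than on the total amount of text stored.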
Example
CREATE FULLTEXT INDEX idx_article_content ON articles(content);
This is MySQL syntax, and it enables efficient text searches like:
SELECT * FROM articles WHERE MATCH(content) AGAINST('database indexing');
When to Use
- When implementing search functionality in applications.
- When working with large text fields like blog posts or descriptions.
Drawbacks
- Consumes more storage because the inverted index keeps an entry for each distinct word along with the rows that contain it.
- Not ideal for small datasets where simple LIKE queries are enough.
Conclusion
Indexing is one of the most powerful ways to improve database performance, but choosing the right type of index is crucial.
Index Type | Best For |
---|---|
Clustered | Sorting & range queries |
Non-clustered | Searching by other columns |
Unique | Enforcing uniqueness |
Bitmap | Low-cardinality columns |
B-Tree | General-purpose queries |
Hash | Exact match queries |
Full-Text | Text searches |