Scaling PostgreSQL for Multi-Tenant SaaS: From Millions to Billions of Rows

Every transaction, audit log, and activity compounds across tenants. The question was never if we’d hit scale problems. It was only ever when, and whether we’d be prepared. Here’s how we evolved our PostgreSQL architecture to handle it.


THE PROBLEM

When Growth Starts Hurting

In the early stages of Ankpal ERP, a single database works just fine. But as usage increases, certain pain points surface fast:

  • Tables balloon from millions to hundreds of millions of rows
  • Indexes become large and inefficient
  • Queries that once ran instantly begin degrading
  • Maintenance tasks like VACUUM become expensive operations
  • Backups and migrations take significantly longer

A query as simple as this starts to show its limits:

sql

SELECT *

FROM activity

WHERE tenant_id = $1;

At this point, the problem isn’t just storage. It’s performance at scale.


CORE INSIGHT

Access Patterns Drive Design

One thing became very clear in building our accounting SaaS: most queries are tenant-scoped. Whether it’s fetching transactions, logs, or activity, almost every query filters by tenant_id.

Instead of designing a generic database structure, we aligned everything with how data is actually accessed. This single insight shaped our entire architecture.


SECURITY

A Note on Row-Level Security (RLS)

Tenant isolation isn’t just an architectural concern. It’s a security one too.

PostgreSQL offers native Row-Level Security (RLS), which enforces access policies at the database level. With RLS enabled on a table, PostgreSQL automatically filters rows based on the current session context, ensuring a tenant can never accidentally access another tenant’s data, even if application-layer logic has a bug.

Our current approach: At Ankpal, we currently enforce tenant isolation at the application layer through code. It works well and gives us full control over query routing and filtering. That said, we’re evaluating native PostgreSQL RLS as an additional enforcement layer, since defense in depth is always the right instinct when dealing with multi-tenant financial data.

If you’re building a similar system, both approaches are valid. Code-level isolation is easier to reason about during early development. Database-level RLS adds a strong safety net as your team and codebase grow.


OUR EVOLUTION

Three Phases of Scaling

Phase 1: Single Database with Sharding in Mind

We started with a single primary database, which is where most SaaS products begin. From day one though, we built the codebase to be shard-ready, knowing we’d eventually need to distribute data across multiple databases. The primary DB does double duty here: it holds all tenant data in the early stages, and it also maintains the tenant directory, a mapping that tells the application which shard each tenant lives on as we scale out.

We ultimately chose a range-based sharding approach, grouping tenants into fixed ID ranges. For example, one shard would handle tenants 1–1000, the next 1001–2000, and so on. This made shard routing predictable and simple, while keeping operational overhead low. So the primary DB always knows exactly where to send each request

Designing the system to be shard-ready from the beginning gave us the flexibility to scale incrementally, without requiring a major rewrite when growth demanded it.

Phase 2: Per-Tenant Schemas

Within each shard, high-value or high-volume tenants get their own dedicated schema (tenant_10, tenant_11, etc.). Full isolation, easier per-tenant migrations, no cross-tenant noise.

Phase 3: Shared Tenant Schema Smaller or newer tenants share a single schema (shared_tenants). All their data lives together in one schema, differentiated only by tenant_id. This keeps resource usage efficient for tenants that don’t yet need dedicated isolation.

Phase 4: Table Partitioning

Across both dedicated and shared schemas, the high-growth tables (activity logs, audit trails, transaction records) are hash-partitioned by tenant_id. PostgreSQL prunes irrelevant partitions automatically, keeping indexes lean and queries fast.

Partitioning solves this by splitting large tables into smaller physical pieces while still letting the application query them as a single logical table. We chose hash partitioning by tenant_id:

When a query includes WHERE tenant_id = 17, PostgreSQL identifies the exact partition, skips all others, and scans only the relevant slice. The result: fewer disk reads, smaller per-partition indexes, and faster execution.


THE RESULT

Our Final Architecture

We arrived at a layered model:

Application → Primary DB → Shards → Per-Tenant Schema or Shared Schema → Partitioned Tables

Each layer solves a different dimension of the problem. Sharding handles horizontal capacity. Schemas enforce tenant isolation. Partitioning keeps individual tables fast and maintainable. It’s not a single solution but a combination of strategies working together.


KEY TAKEAWAYS

  • Shard when a single database can no longer handle the load, not before. Premature sharding adds operational complexity without benefit.
  • Partition when specific tables become too large. Identify the few tables driving the most growth and target them directly.
  • Layer your isolation strategy. Code-level enforcement is a great start. Native PostgreSQL RLS adds a second line of defense for multi-tenant financial data.
  • Measure everything: query plans, index sizes, and maintenance overhead all tell you when it’s time to evolve.

Closing Thought

Scaling a database isn’t about jumping to the most complex solution early. It’s about evolving your architecture as your system grows. What worked at millions of rows didn’t work at billions. But by aligning our design with real-world usage patterns, we built a system that scales both technically and operationally.

And the most important lesson?

Store your data the way you query it.

Leave a Reply

Your email address will not be published. Required fields are marked *