SpectreDev | High-Performance Systems Engineering Alternative: SpectreDev

// PUBLISHED30.05.26

// TIME10 MINS

// TAGS

#DATABASE#NOSQL#POSTGRES#STARTUP

// AUTHOR

Spectre Command

he wrong database choice doesn't hurt immediately. That's what makes it dangerous. For the first six months, almost any database works. Your dataset is small, your queries are simple, your team knows the codebase well enough to work around rough edges. Then you hit 500,000 records. Then a million. Queries that ran in 40ms now run in 4 seconds. You add indexes. It helps for a while. Then it doesn't.

At that point, you're not making a database decision anymore. You're making a migration decision under pressure, with a live product, and a team that has other things to ship.

The RDBMS vs NoSQL question is worth getting right early. Not because the answer is obvious, but because the consequences of getting it wrong are slow, expensive, and guaranteed.

// EXECUTIVE SUMMARY

>A relational database (PostgreSQL) enforces consistency at the data layer. That is a feature, not a constraint.
>If your data requires JOINs or ACID transactions, you need a relational database.
>NoSQL is not one thing. Document stores (MongoDB) are different from Key-Value stores (Redis) and Column-Family stores (Cassandra).
>Most startups choose MongoDB for flexibility, only to migrate back to PostgreSQL three years later when relationships get complex.

What "Relational" Actually Means in Practice

A relational database PostgreSQL, MySQL, SQLite stores data in tables with defined columns and enforces relationships between them. When you say a user has many orders, and each order has many line_items, a relational database makes that structure explicit and enforces it at the data layer.

The result: you can't insert an order that references a user that doesn't exist. You can't delete a user without deciding what happens to their orders first. The database is a second layer of correctness on top of your application code.

That's the feature, not the limitation.

Foreign keys, joins, transactions these aren't legacy concepts. They're the reason that a decade-old PostgreSQL schema at a fintech can still be queried reliably, because the database has been enforcing its own consistency rules the entire time.

The trade-off: schema changes are deliberate. Adding a column to a table with 50 million rows requires a migration. In most cases that's a non-event with modern tooling but it's not zero effort. You can't just start storing a new shape of data without thinking about it.

What "NoSQL" Actually Covers

NoSQL is not a single thing. It's a family of databases that don't use the relational model, and they differ from each other more than they differ from PostgreSQL in some cases.

The NoSQL Family

Document Stores (MongoDB, Firestore) — For variable schemas

Key-Value Stores (Redis, DynamoDB) — For extreme speed & caching

Column-Family Stores (Cassandra) — For massive write-heavy pipelines

Graph Databases (Neo4j) — For complex relationship traversal

The four types you'll encounter:

Document databases (MongoDB, Firestore, CouchDB) store data as JSON-like documents. No fixed schema two documents in the same collection can have completely different fields. Useful when your data genuinely doesn't have a consistent shape, or when you're building something fast and the schema is still changing daily. The trade-off: relationships between documents are your problem. The database won't stop you from storing a reference to a document that doesn't exist.

Key-value stores (Redis, DynamoDB in its simplest form) store a value at a key, nothing more. Extremely fast. No query language. You get the value if you know the key; if you don't know the exact key, retrieval becomes complicated. Redis is the most common cache layer in production systems for exactly this reason simple, fast, purpose-built.

Column-family stores (Cassandra, HBase) are optimised for write-heavy workloads at massive scale. Cassandra, which powers parts of Apple's infrastructure and Tokopedia's event pipelines, is designed to write millions of records per second across distributed nodes. The query model is deliberately constrained you define your access patterns upfront, and the schema is designed around those patterns. Flexible querying is not the goal.

Graph databases (Neo4j, Amazon Neptune) model data as nodes and edges entities and the relationships between them. If your core product is about traversing relationships (fraud detection networks, social graphs, recommendation engines), a graph database does this naturally. For everything else, it's overhead.

Most startups will never touch column-family or graph databases in their first three years. The real decision is between relational and document.

The Actual Decision Framework

The question isn't "which is better." The question is: which fits your data model and your access patterns?

Start with PostgreSQL if:

Your data has clear relationships that you want the database to enforce. Orders belong to users. Transactions belong to accounts. Inventory items belong to products. If you find yourself drawing entity-relationship diagrams and the arrows make sense, you're in relational territory.

Your queries need to join across multiple entities. "Show me all orders in the last 30 days for users who signed up in Jakarta and have a verified email" that's a join. PostgreSQL handles it cleanly.

You need transactions. A payment system where money leaves one account and must arrive in another atomically, or not at all needs ACID transactions. PostgreSQL guarantees them. MongoDB has multi-document transactions now, but they're slower and the mental model is harder.

Your schema is relatively stable. If you know what your data looks like and it's not changing shape every sprint, a relational schema is a feature, not a constraint.

Consider a document database if:

Your data genuinely has variable structure. A CMS where each content type has different fields. A product catalog where different product categories have completely different attributes. Here, forcing everything into rigid columns creates more pain than it solves.

You're in very early exploration. If you don't know what shape your data will take yet and need to iterate quickly, a document database lets you change shape without migrations. This buys time but plan to revisit the decision once the shape stabilises.

You're storing events or logs at scale. Time-series event data where each event is self-contained and you're not querying relationships document or specialised time-series databases are a better fit here than normalised relational tables.

The Part Most Teams Get Wrong

They choose based on ecosystem trends rather than their actual data model.

"Everyone's using MongoDB" was the refrain around 2013–2018. A lot of startups built on MongoDB during that era and then spent 2019–2022 migrating back to PostgreSQL, because it turned out their data was relational all along. The flexibility of a document database solved a problem they didn't actually have, while creating a problem they didn't expect: enforcing data consistency in application code is harder than letting the database do it.

The other common mistake: starting on PostgreSQL and switching to MongoDB when query performance degrades. Nine times out of ten, the performance problem is missing indexes or inefficient queries not the fundamental database choice. Adding a compound index often solves what a three-month migration was supposed to fix.

Before changing databases, always answer: have we indexed the columns we're querying? Have we identified the ten slowest queries and looked at their execution plans? Have we profiled whether the bottleneck is the database at all, or the application logic around it?

The [→ database sharding guide] covers what happens when you've exhausted vertical scaling on your primary database which is a different problem from picking the wrong database type, and it happens much later.

Real-World Example: Traveloka's Data Layer Decision

Traveloka, Indonesia's travel platform, runs a system where different product domains have genuinely different data requirements. Their flight inventory structured, relational, with fixed schemas dictated by airline GDS systems runs on relational databases. Their hotel content thousands of properties each with different attributes, room types, policies, and media uses a mix of document and relational storage depending on the access pattern.

The takeaway isn't "use both." It's that Traveloka made the decision per domain, based on that domain's actual data shape and query patterns. They didn't pick one database for the whole company and force every data model to fit it.

For most startups, you have one domain and one data model in the beginning. Pick the one that fits that model. The time to evaluate a second database is when a new domain with genuinely different requirements appears not when the first one gets slow.

If your data is relational, PostgreSQL will handle more scale than you expect before you need to think about alternatives. Gojek's early order management ran on PostgreSQL. Tokopedia's product catalog ran on MySQL for years before they hit the scale that required more exotic solutions. The boring choice held up longer than their engineers anticipated.

The most expensive database mistake isn't picking the wrong engine. It's making the choice based on what's trendy rather than what matches your actual data and access patterns. Three years later, that decision is either invisible or it's the story your CTO tells at every team retrospective.

FAQ

Q: Can I use PostgreSQL as a document database by storing JSON columns?

A: Yes, and it works well for specific cases. PostgreSQL's jsonb column type stores JSON with indexing support and query operators. If you have one or two fields where the structure is genuinely variable but the rest of the table is relational, jsonb is a clean solution. It's not a replacement for a document database in use cases where the entire record is unstructured but for hybrid cases it avoids the need for a second database.

Q: What about DynamoDB? Our AWS architect is recommending it.

A: DynamoDB is excellent for specific access patterns: high-volume, predictable, key-based lookups at massive scale. It is painful for ad-hoc queries, joins, and evolving access patterns because the data model must be designed around the queries upfront. Most startups don't have the scale that justifies DynamoDB's constraints. If someone recommends it, ask them to define your primary access patterns and show how the key design supports them. If they can't, it's premature.

Q: We started on MongoDB and now we're having performance issues. Should we migrate to PostgreSQL?

A: Not yet. First, profile your queries. Look at which operations are slow and why. MongoDB has a query profiler use it. If the problem is missing indexes, add them. If the problem is that you're doing queries that require joining documents across collections, then you either restructure your schema or you have a genuine mismatch between document and relational models. Migration is a last resort, not a first response.

Q: How does this choice affect how fast we can hire?

A: PostgreSQL and MySQL have a much larger pool of engineers who know them deeply. MongoDB has good coverage too. The gap matters most for specialised optimisation work finding an engineer who can tune a complex PostgreSQL query plan is easier than finding one who can do the same for Cassandra. For early-stage hiring, this factor is real but rarely the deciding one.

Q: Is it possible to run both in the same product?

A: Yes. It's called polyglot persistence and it's common at scale. One service uses PostgreSQL for transactional data; another uses Redis for session storage; a third uses a document store for user-generated content. The operational overhead of running multiple databases is real more things to monitor, more failure modes, more backup strategies. Justify each addition with a specific requirement, not general flexibility.

The database decision compounds. A good choice made early becomes invisible infrastructure that just works. A bad one becomes the story your engineers tell every time someone asks why a feature took three months instead of three weeks.

When in doubt, start relational. PostgreSQL has surprised a lot of engineers with how far it scales before it becomes the constraint. If you're unsure whether your current data model is working for or against you, an architecture review surfaces that faster than any amount of internal debate.

Internal Reference Logs: