Database optimization is crucial for ensuring the performance and scalability of your data-intensive applications. This guide provides a step-by-step approach to tackle common performance challenges, including:
1. **Identify and analyze queries**: Analyze the queries with the greatest impact on performance and identify those with the highest latency, throughput, and resource utilization. This step shows which queries are causing performance issues and where to prioritize optimization effort.
2. **Indexing and partitioning**: Align the index structure with the queries' execution plans and identify the most performant indexes. Partitioning tables based on read patterns, such as date ranges or user-defined columns, can improve query performance and reduce index scan time.
3. **Partitioning and sharding**: Partition tables and shard data across multiple databases or regions to distribute the workload across instances, alleviate read skew, and improve query performance while minimizing contention.
4. **Prefer range-sargable filters over functions**: Write range conditions directly against indexed columns; they cost less CPU and memory than wrapping those columns in function calls.
5. **Design for read-through/write-through queries**: Ensure that reads and writes follow a consistent caching strategy; read-through loads data on a cache miss, while write-through updates the cache on every write (see the caching section below).
Why Database Optimization Matters
Every application lives or dies by the speed and reliability of its data layer. When the database drifts from optimal to merely “fine,” user experience erodes, infrastructure costs climb, and development slows under the weight of mysterious performance regressions. Database Optimization is the disciplined practice of designing schemas, queries, indexes, and infrastructure so your system remains fast, predictable, and affordable as it grows.
Optimization isn’t a one-time sprint. It’s a lifecycle: measure, understand, change, verify, and repeat. This article lays out a practical end-to-end approach—covering OLTP and analytics, relational and NoSQL—so you can pick the right techniques at the right time.
A Field Guide to Workloads
Before tuning anything, classify the workload. The shape of your reads and writes will determine which Database Optimization techniques apply.
- OLTP (Online Transaction Processing): Short, highly concurrent reads/writes, strict latency and ACID rules. Think carts, orders, logins.
- OLAP (Analytics/BI): Long-running scans, aggregations, complex joins on large volumes. Throughput matters more than per-query latency.
- Mixed/HTAP: Hybrid workloads, often with read replicas or separate warehouses to reduce interference.
- Time-Series/Log Data: Appends and range reads dominated by time filtering; skew toward hot recent partitions.
- Key-Value/Cache-like: Simple get/put with very high QPS and minimal joins.
Understanding this helps you choose table layout, indexing, partitioning, and infrastructure.
Performance Fundamentals: What to Measure
You can’t optimize what you don’t measure. Anchor your Database Optimization process around these signals:
- Latency: p50/p95/p99 response times for key queries (read and write).
- Throughput: Queries/sec, transactions/sec, batch jobs/hour.
- Concurrency: Active sessions, lock waits, queue depth.
- Resource Utilization: CPU, memory, disk IOPS, throughput (MB/s), network, and cache hit ratios.
- Plan Stability: Query plan hash, plan changes over time.
- Errors/Time-outs: Deadlocks, lock timeouts, replication lag.
A good workflow:
- Pick a small set of critical user journeys and the queries behind them.
- Baseline latencies at realistic load.
- Profile the worst offenders (top N by cumulative time).
- Change one thing at a time; re-measure.
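To make the "top N by cumulative time" step concrete, here is a minimal sketch for Postgres, assuming the pg_stat_statements extension is installed (column names are from Postgres 13+; older versions expose total_time instead of total_exec_time):

```sql
-- Top 10 statements by cumulative execution time
-- (requires: CREATE EXTENSION pg_stat_statements;)
SELECT
  queryid,
  calls,
  round(total_exec_time::numeric, 1) AS total_ms,
  round(mean_exec_time::numeric, 2)  AS mean_ms,
  rows,
  left(query, 80)                    AS query_preview
FROM pg_stat_statements
ORDER BY total_exec_time DESC
LIMIT 10;
```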
The Optimization Lifecycle (Step-by-Step)
- Discover: Capture slow query logs, statement statistics, and top blocking sessions.
- Profile: EXPLAIN/EXPLAIN ANALYZE to see join order, index usage, and row estimates.
- Hypothesize: Choose the cheapest likely win (e.g., an index, query rewrite).
- Change: Apply in non-prod first. For schema/index changes, schedule during low traffic.
- Verify: Compare before/after latency distributions, plans, and resource use.
- Harden: Add tests, monitoring alerts, and documentation.
- Repeat: Prioritize the next bottleneck—optimization shifts hotspots elsewhere.
Schema Design: The Bedrock of Speed
Good schema design prevents many performance problems.
Data Modeling Principles
- Model around access patterns: Design tables for how the app queries them, not an abstract “perfect” model.
- Normalize early, denormalize selectively: Third Normal Form reduces anomalies and saves space; denormalize for critical read paths when joins dominate.
- Choose precise data types: Use the narrowest type that fits. Smaller rows = more rows per page = fewer I/O operations.
- Default to immutable IDs: Surrogate keys (e.g., `BIGINT`, UUIDv4, or ULID) keep joins cheap. For time-ordered inserts, consider monotonic IDs (e.g., ULID) to reduce page splits.
- Optimize text: Prefer enums or reference tables for low-cardinality strings. Store large blobs out of hot rows; use separate tables or object storage.
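A short DDL sketch of the principles above, in Postgres syntax (the table and column names are illustrative, not from a specific application):

```sql
-- Narrow, precise types; enum for a low-cardinality status; UTC timestamps
CREATE TYPE order_status AS ENUM ('OPEN', 'PAID', 'SHIPPED', 'CANCELLED');

CREATE TABLE orders (
  id          BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  customer_id BIGINT NOT NULL,
  status      order_status NOT NULL DEFAULT 'OPEN',
  total_cents BIGINT NOT NULL CHECK (total_cents >= 0),  -- exact integer math, no floats
  created_at  TIMESTAMPTZ NOT NULL DEFAULT now()
);
```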
Designing for Joins
- Foreign keys: Keep referential integrity but consider batching writes to reduce FK overhead.
- Composite primary keys: Useful for natural partitioning (e.g., `(tenant_id, order_id)`).
- Covering columns: Position frequently used columns in the same table or index to reduce lookups.
Indexing That Works (and When It Doesn’t)
Indexes are your primary lever for Database Optimization—but they’re not free. They accelerate reads and slow writes.
Index Types (Relational)
| Index Type | Best For | Notes |
|---|---|---|
| B-Tree | Equality and range on ordered columns | Default in most engines; supports multi-column |
| Hash | Equality lookups | Limited use; engine-specific behavior |
| GIN/GiST (Postgres) | Full-text, JSONB, arrays, geospatial | Powerful for semi-structured data |
| BRIN (Postgres) | Very large, naturally ordered tables | Tiny indexes; good for time-series ranges |
| Bitmap (DW engines) | Low-cardinality columns in analytics | Often in columnar warehouses |
Composite Indexes
- Order columns by selectivity and filter use. A good rule: equality conditions first, then ranges, then sorting/grouping keys.
- Make sure predicates match the index prefix. An index on `(a, b)` won't help for `WHERE b = ?` unless the engine supports index skip-scan (and even then, less efficient).
- Covering indexes (include all referenced columns) can turn random I/O into pure index scans, as sketched below.
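A minimal sketch of the prefix rule and a covering index, in Postgres syntax (`INCLUDE` requires Postgres 11+; the names are illustrative):

```sql
-- Equality column first, then the range column, then covered payload columns
CREATE INDEX idx_orders_cust_created
  ON orders (customer_id, created_at)
  INCLUDE (status, total_cents);  -- enables index-only scans for these columns

-- Served by the index prefix:
--   WHERE customer_id = 42 AND created_at >= '2025-01-01'
-- Not served (no leading customer_id predicate):
--   WHERE created_at >= '2025-01-01'
```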
Partial/Filtered Indexes
- Index only rows that matter, e.g., `WHERE status = 'ACTIVE'` (example below).
- Great for sparse data and hot subsets.
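A partial-index sketch in Postgres syntax (table and column names are illustrative):

```sql
-- Only ACTIVE rows are indexed, so the index stays small and hot
CREATE INDEX idx_users_active_email
  ON users (email)
  WHERE status = 'ACTIVE';

-- The planner uses it for queries whose predicate implies the filter:
--   SELECT id FROM users WHERE status = 'ACTIVE' AND email = :email;
```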
When Indexes Hurt
- Write-heavy tables pay the cost of maintaining each index.
- Over-indexing inflates storage and slows bulk loads.
- Non-selective indexes (e.g., boolean columns) add overhead with little benefit unless paired with filtering or partial indexes.
Query Optimization: Turning Intent into Speed
Readability vs. Performance
Start with clear SQL, then optimize:
- Explicit joins > subselects when possible; optimizers often do fine either way, but explicit joins make intent clear.
- Remove unnecessary columns: `SELECT *` drags extra I/O and network cost.
- Eliminate non-sargable predicates: e.g., `WHERE DATE(created_at) = '2025-09-07'` prevents index use; rewrite it as a range: `WHERE created_at >= '2025-09-07' AND created_at < '2025-09-08'`.
Typical Transformations
- Keyset pagination beats `OFFSET`/`LIMIT` for deep pages:
```sql
-- Bad for large offsets
SELECT id, name FROM users ORDER BY id LIMIT 50 OFFSET 100000;

-- Better: keyset pagination
SELECT id, name FROM users
WHERE id > :last_seen_id
ORDER BY id
LIMIT 50;
```
- Avoid the N+1 anti-pattern: Fetch related rows in one query or use `IN`/joins with appropriate indexes (see the sketch after this list).
- Pre-aggregate: Materialize daily totals if dashboards continually compute them.
- Use appropriate JOINs: Inner join by default; left joins only when necessary; avoid cross joins unless intentional.
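To illustrate the N+1 fix, a sketch with hypothetical table names: instead of querying child rows once per parent, gather the parent keys and fetch all children in one round trip.

```sql
-- N+1 anti-pattern (issued once per order from application code):
--   SELECT * FROM order_items WHERE order_id = ?;

-- Set-based alternative: one query for the whole page of orders
SELECT oi.order_id, oi.qty, oi.price
FROM order_items oi
WHERE oi.order_id IN (101, 102, 103, 104);  -- ids gathered from the parent query
```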
Example: From 2.3s to 140ms
Before
```sql
SELECT o.id, o.created_at, c.name, SUM(oi.qty * oi.price) AS total
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
LEFT JOIN customers c ON c.id = o.customer_id
WHERE DATE(o.created_at) = CURRENT_DATE
GROUP BY o.id, o.created_at, c.name
ORDER BY o.created_at DESC
LIMIT 100;
```
Issues
- Non-sargable date filter.
- No covering index for `(created_at, id)`.
- Sorting forces a filesort / temp table.
After
```sql
-- Index: CREATE INDEX idx_orders_created_id ON orders (created_at DESC, id);
-- Index: CREATE INDEX idx_oi_order ON order_items (order_id);
SELECT o.id, o.created_at, c.name, sums.total
FROM (
  SELECT o.id, o.created_at, o.customer_id
  FROM orders o
  WHERE o.created_at >= CURRENT_DATE
    AND o.created_at < CURRENT_DATE + INTERVAL '1 day'
  ORDER BY o.created_at DESC, o.id DESC
  LIMIT 100
) o
JOIN LATERAL (
  SELECT SUM(oi.qty * oi.price) AS total
  FROM order_items oi
  WHERE oi.order_id = o.id
) AS sums ON true
LEFT JOIN customers c ON c.id = o.customer_id
ORDER BY o.created_at DESC, o.id DESC;
```
Result: Sargable range filter, efficient index order, limited working set before aggregation. Latency drop: ~2.3s → ~140ms under the same load.
(Use engine-appropriate syntax: `JOIN LATERAL` in Postgres; in MySQL, a derived table with a correlated subquery or a join with aggregation; in SQL Server, `APPLY`.)
Understanding the Optimizer: EXPLAIN and Statistics
Modern optimizers are cost-based. They estimate row counts and pick the cheapest plan. Database Optimization lives or dies by statistics quality.
How to Read Plans (Conceptually)
- Access path: index seek vs. full scan.
- Join order and type: nested loop, hash join, merge join.
- Row estimates: how many rows the optimizer believes flow through each node.
- Sort/aggregate nodes: memory usage, spill risk.
- Parallelism: degree of parallelism, repartitioning costs.
Common Stat Pitfalls
- Stale stats: After large data changes, run `ANALYZE` (Postgres, sketched below) or update statistics (SQL Server). MySQL maintains InnoDB stats; persistent stats and `innodb_stats_auto_recalc` matter.
- Skewed data: Histograms, extended stats, or multi-column stats reduce misestimates.
- Param sniffing/plan freezing: Caching a plan for atypical parameter values can hurt typical requests. Strategies: plan guides/hints, `OPTIMIZE FOR` hints (SQL Server), query-level literals, or recompile when needed.
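In Postgres, refreshing stats and adding extended statistics are both one-liners; a sketch with an illustrative table (`CREATE STATISTICS` requires Postgres 10+):

```sql
-- Refresh planner statistics after a bulk load or large delete
ANALYZE orders;

-- Tell the planner that two columns are correlated, reducing
-- cardinality misestimates on combined predicates
CREATE STATISTICS stat_orders_cust_status (dependencies)
  ON customer_id, status FROM orders;
ANALYZE orders;  -- populate the new extended statistics
```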
Transaction Design, Concurrency, and Locking
Latency isn’t just I/O; it’s also waits.
- Isolation levels: Higher isolation (e.g., Serializable) prevents anomalies but increases contention. Use Read Committed or Repeatable Read unless business rules demand more.
- Lock scope: Touch fewer rows to hold fewer locks. Use narrow indexes that pinpoint exactly the rows you need.
- Short transactions: Keep them brief; do not hold locks while making network calls or waiting on user input.
- Deadlocks: Order operations consistently across code paths; retry transient deadlocks automatically.
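To make the lock-ordering advice concrete, a minimal sketch of a transfer transaction that always locks rows in ascending id order (Postgres syntax; the table name and the :a/:b/:amount parameters are illustrative):

```sql
BEGIN;
-- Lock both rows in a consistent (ascending id) order to avoid deadlocks
SELECT 1 FROM accounts WHERE id = LEAST(:a, :b)    FOR UPDATE;
SELECT 1 FROM accounts WHERE id = GREATEST(:a, :b) FOR UPDATE;

UPDATE accounts SET balance = balance - :amount WHERE id = :a;
UPDATE accounts SET balance = balance + :amount WHERE id = :b;
COMMIT;  -- keep the transaction short; no network calls in between
```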
Physical Layout: Partitioning and Sharding
Partitioning (Within a Single Database)
- Range partitioning: Ideal for time-series; prune old partitions easily.
- List/Hash partitioning: Balance load across partitions to avoid hot spots.
- Benefits: Partition pruning, faster maintenance (reindex, vacuum, archiving), improved cache locality for hot partitions.
- Pitfalls: Poor key selection causes hot partitions; cross-partition queries can degrade if not pruned.
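A minimal range-partitioning sketch of the above in Postgres (11+ for the primary key; names illustrative):

```sql
CREATE TABLE events (
  id         BIGSERIAL,
  created_at TIMESTAMPTZ NOT NULL,
  payload    JSONB,
  PRIMARY KEY (id, created_at)   -- the partition key must be part of the PK
) PARTITION BY RANGE (created_at);

CREATE TABLE events_2025_09 PARTITION OF events
  FOR VALUES FROM ('2025-09-01') TO ('2025-10-01');

-- Time-filtered queries prune to matching partitions; old months can be
-- detached and archived cheaply:
--   ALTER TABLE events DETACH PARTITION events_2025_09;
```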
Sharding (Across Multiple Databases)
- When: Vertical scaling is exhausted; single node can’t hold data or traffic.
- Keys: Hash on user/tenant; avoid sequential keys that create hot shards.
- Cross-shard queries: Keep them rare; use an aggregator service or pre-computed rollups.
- Operational overhead: Schema changes, multi-shard transactions, and reporting all get harder.
Caching: The Unsung Hero
Caching is often the highest-ROI Database Optimization.
- Application cache: Store serialized objects or query results in an in-memory store. Invalidate by key on writes.
- Read-through/Write-through: Balance freshness and complexity. Read-through loads on cache miss; write-through updates cache on write.
- Materialized views: Periodically refresh pre-aggregated results. Great for dashboards.
- Database buffers: Size them correctly; beyond a point, application-level cache provides better returns than over-sizing DB RAM.
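The materialized-view approach mentioned above, sketched in Postgres (names are illustrative; `REFRESH ... CONCURRENTLY` requires a unique index on the view):

```sql
CREATE MATERIALIZED VIEW daily_revenue AS
SELECT date_trunc('day', created_at) AS day,
       SUM(total_cents)              AS revenue_cents
FROM orders
GROUP BY 1;

CREATE UNIQUE INDEX idx_daily_revenue_day ON daily_revenue (day);

-- Run on a schedule; CONCURRENTLY avoids blocking readers during refresh
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_revenue;
```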
Memory and Storage Tuning
Memory
- PostgreSQL: `shared_buffers` (typically 25–40% of RAM), `work_mem` (per operation), `effective_cache_size` (estimate of OS cache), `maintenance_work_mem` (for index builds), `autovacuum_*` (to control bloat). See the sketch below.
- MySQL/InnoDB: `innodb_buffer_pool_size` (primary cache), `innodb_buffer_pool_instances` (for large pools), `innodb_flush_log_at_trx_commit` (1 for durability; 2 or 0 for speed, with tradeoffs), `innodb_redo_log_capacity` (or the older `innodb_log_file_size`/`innodb_log_files_in_group` combination), `innodb_thread_concurrency` (usually leave at default).
- SQL Server: Set max server memory to avoid OS pressure; tune `cost threshold for parallelism` and `max degree of parallelism` (MAXDOP) to balance parallel queries.
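On Postgres, these can be set without hand-editing postgresql.conf; the values below are placeholders showing the mechanism, not sizing recommendations:

```sql
-- Requires superuser; writes to postgresql.auto.conf
ALTER SYSTEM SET shared_buffers = '8GB';          -- needs a restart
ALTER SYSTEM SET effective_cache_size = '24GB';   -- reload is enough
ALTER SYSTEM SET work_mem = '64MB';               -- per sort/hash operation

SELECT pg_reload_conf();  -- apply reloadable settings
```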
Storage
- IOPS vs. Throughput: OLTP loves IOPS (small random I/O); OLAP needs sequential throughput. Match your disk class to workload.
- File systems: Align block sizes; consider `noatime` on Linux for fewer metadata writes. Use modern I/O schedulers; avoid swapping.
- Compression: Row/page compression (SQL Server), TOAST compression settings such as `default_toast_compression` (Postgres), InnoDB page compression. Evaluate CPU tradeoffs.
- Temp space: Sorts and hash joins spill; give temp volumes fast storage and headroom.
Maintenance Routines That Prevent Drift
- Vacuum/Autovacuum (Postgres): Reclaims dead tuples and maintains visibility. Tune thresholds for high-churn tables; prevent table bloat.
- Reindexing: Periodically rebuild heavily updated indexes to control fragmentation (engine-dependent).
- Analyze/Statistics: Refresh after bulk loads or large deletes.
- Partition rotation: Detach/drop old partitions; archive cold data.
- Slow query log: Keep it on; rotate frequently; mine it for regressions.
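For high-churn tables, per-table autovacuum settings are a common first lever; a Postgres sketch with illustrative thresholds:

```sql
-- Vacuum and analyze this hot table more aggressively than the default
ALTER TABLE orders SET (
  autovacuum_vacuum_scale_factor  = 0.02,  -- vacuum after ~2% dead rows
  autovacuum_analyze_scale_factor = 0.01   -- re-analyze after ~1% churn
);
```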
ORM Realities: Convenience Without the Drag
Object-relational mappers speed development but can hide performance traps.
- N+1 queries: Prefer eager loading or explicit joins.
- Chatty writes: Batch inserts/updates where consistency allows.
- Inefficient filters: Ensure ORM emits sargable WHERE clauses.
- Pagination helpers: Choose keyset pagination options if your ORM supports them.
- Migrations: Review generated SQL for indexes and data types.
Security and Performance: Not a Zero-Sum Game
- Row-level security: Use carefully; test plan impacts. Consider denormalized security tables or pre-computed ACL mappings for hot paths.
- Encryption: At rest is cheap with hardware acceleration; in transit is negligible. Application-level encryption can limit indexability—plan for that.
- Least privilege: Helps avoid accidental table scans from ad-hoc queries.
Cloud vs. Bare Metal: What Changes
- Managed databases: Faster to operate, easier HA, but instance classes and storage choices still matter.
- Burst credits & noisy neighbors: Watch performance variability; pin to provisioned IOPS for critical workloads.
- Autoscaling: Great for read replicas; still measure cost vs. query tuning savings.
- Network egress: Cross-AZ/region latency impacts join strategies and microservice designs.
Case Study 1: Checkout Latency Slashed
Context: An e-commerce startup saw p95 checkout latency at 1.8s during sales.
Findings
- Orders table: 600M rows, index on `id` only.
- Query filtered by `status = 'OPEN'` and `created_at >= NOW() - INTERVAL '7 days'`, with `ORDER BY created_at DESC`.
Actions
- Created composite index `(status, created_at DESC, id)` to align with filter + sort (DDL sketched below).
- Rewrote paging to keyset pagination.
- Added partial index for hot rows: `(status, created_at) WHERE status = 'OPEN'`.
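In Postgres DDL, those two indexes look roughly like this (names illustrative):

```sql
CREATE INDEX idx_orders_status_created
  ON orders (status, created_at DESC, id);

CREATE INDEX idx_orders_open_recent
  ON orders (status, created_at)
  WHERE status = 'OPEN';
```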
Results
- p95 dropped to 210ms, p99 to 480ms. Instances scaled down one tier, cutting monthly costs ~30%.
Case Study 2: Analytics Without the Pain
Context: Marketing analytics queries ran for 30–45 minutes on the primary OLTP database.
Findings
- Large fact table with heavy joins; nightly ETL produced data skew.
- Queries scanned months of data for daily dashboards.
Actions
- Implemented daily materialized views summarizing metrics per campaign and day.
- Moved BI workloads to a columnar data store; primary DB kept for operational reads/writes.
- Range partitioned fact tables by month; pruned queries to last 7 days for dashboards.
Results
- Dashboard queries now ~3–8 seconds on the warehouse; OLTP unaffected.
Choosing the Right Data Type and Collation
- Integers vs. bigints: Use `INT` until you near its limit; `BIGINT` doubles that column's storage.
- Timestamps: Store in UTC with timezone-aware types; avoid string timestamps.
- Booleans and enums: Save space and speed comparisons.
- Collations: Sorting and LIKE behavior depend on collation; case-insensitive collations can be slower. For Postgres, `citext` is convenient, but design its indexes deliberately.
Handling Large Tables and Hot Keys
- Skewed access: Identify hot keys (e.g., a celebrity user). Add read-through cache or replicate their data into a hot partition.
- Write amplification: For append-only logs, use heap/append-optimized tables and BRIN indexes (Postgres; sketched below) or partition by time.
- Batching writes: Use COPY/BULK INSERT for large loads; disable/re-enable non-essential indexes during one-time backfills.
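The BRIN index is a one-liner; because it stores one summary per block range rather than one entry per row, it stays tiny even on very large tables (Postgres; names illustrative):

```sql
-- Effective when rows are inserted in roughly created_at order
CREATE INDEX idx_logs_created_brin
  ON logs USING BRIN (created_at);
```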
Practical SQL Patterns (With Examples)
1) Turning Anti-Patterns into Wins
Avoid functions on indexed columns
```sql
-- Anti-pattern
SELECT * FROM events WHERE DATE(timestamp) = CURRENT_DATE;

-- Optimized
SELECT * FROM events
WHERE timestamp >= CURRENT_DATE
  AND timestamp < CURRENT_DATE + INTERVAL '1 day';
```
Use EXISTS over IN for correlated checks
```sql
-- Often slower if the subquery returns many rows
SELECT * FROM users WHERE id IN (SELECT user_id FROM orders WHERE total > 100);

-- Usually better
SELECT u.*
FROM users u
WHERE EXISTS (
  SELECT 1 FROM orders o
  WHERE o.user_id = u.id AND o.total > 100
);
```
Prefer UNION ALL to UNION when duplicates aren’t possible
```sql
-- UNION enforces DISTINCT
SELECT ... FROM a
UNION ALL
SELECT ... FROM b;
```
2) Index-Aligned Sorting
```sql
-- Create an index that matches filter + order
CREATE INDEX idx_posts_published ON posts (published_at DESC, id);

-- Query
SELECT id, title, published_at
FROM posts
WHERE published_at >= now() - interval '30 days'
ORDER BY published_at DESC, id
LIMIT 50;
```
3) Keyset Pagination Template
```sql
-- First page
SELECT * FROM items ORDER BY id LIMIT 50;

-- Next page (ORDER BY must match the keyset column)
SELECT * FROM items WHERE id > :last ORDER BY id LIMIT 50;
```
4) Diagnosing With EXPLAIN
```sql
EXPLAIN ANALYZE
SELECT p.id, p.name
FROM products p
JOIN product_tags pt ON pt.product_id = p.id
WHERE pt.tag = 'sale'
ORDER BY p.id
LIMIT 100;
```
What to check
- Are you using an index on `product_tags(tag, product_id)`?
- Is the join a nested loop with an index seek or a hash join with a large build?
- Any "disk spill" or "sort overflow" indicators? Increase `work_mem`/temp space or adjust the query.
Checklists You’ll Actually Use
Pre-Deployment Performance Checklist
- [ ] Slow query log enabled in staging and prod
- [ ] Representative dataset or data generator available
- [ ] Baselines recorded (p50/p95/p99 for key endpoints)
- [ ] New/changed queries explained and reviewed
- [ ] Index impact (read vs. write) evaluated
- [ ] Migration plan with back-out option
- [ ] Alert thresholds defined (latency, errors, lock waits)
Ongoing Operations Checklist
- [ ] Weekly top-N slow query review
- [ ] Autovacuum/maintenance jobs green
- [ ] Stats freshness within acceptable windows
- [ ] Replication lag monitored
- [ ] Storage headroom > 20% on hot volumes
- [ ] Plan stability checks for critical queries
- [ ] Quarterly disaster recovery test (backups, point-in-time restore)
Cost Control Checklist
- [ ] Cache hit ratio and eviction trends reviewed
- [ ] Read replicas utilized for analytics/exports
- [ ] Materialized views for expensive dashboards
- [ ] Right-sized instance/storage tiers
- [ ] Archiving policy for cold data
Table: Symptoms to Root Causes (and Fixes)
| Symptom | Likely Causes | First Fixes |
|---|---|---|
| Sudden latency spike | Plan change, stale stats, hot key | Refresh stats, pin/hint plan, add cache for hot key |
| High CPU | Table scans, over-parallelism, JSON parsing | Add selective indexes, adjust MAXDOP/work_mem, pre-parse |
| High I/O wait | Random reads from large tables | Covering index, partitioning, data archiving |
| Lock timeouts | Long transactions, inconsistent lock order | Shorten transactions, retry logic, consistent ordering |
| Replication lag | Heavy writes, large transactions | Break up batches, tune WAL/redo, add replica resources |
| Temp spills | Sort/aggregate exceeds memory | Increase work memory for this query, pre-aggregate |
| Fragmentation/bloat | Frequent updates/deletes | Vacuum/reindex, use fillfactor, consider append-only design |
Normalization vs. Denormalization: Making the Call
- Normalize for correctness and size: Eliminates redundant storage and update anomalies; improves cache density.
- Denormalize for read performance: Duplicate small, stable attributes to avoid hot joins (e.g., `customer_tier` on orders).
- Guardrails: Use triggers or job pipelines to keep denormalized columns synced (a trigger sketch follows). Document the authoritative source.
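A guardrail sketch in Postgres: a trigger that propagates `customers.tier` into a denormalized `orders.customer_tier` column (names are illustrative; at high write rates a batch pipeline is often the better choice):

```sql
CREATE OR REPLACE FUNCTION sync_customer_tier() RETURNS trigger AS $$
BEGIN
  UPDATE orders
  SET customer_tier = NEW.tier
  WHERE customer_id = NEW.id;
  RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_sync_customer_tier
AFTER UPDATE OF tier ON customers
FOR EACH ROW EXECUTE FUNCTION sync_customer_tier();
```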
Observability for Databases: Seeing the Whole Picture
- Red, Yellow, Green dashboards: Simple outcome views for latency and errors.
- Query fingerprints: Group similar statements to identify systemic issues.
- Top blockers: Real-time view of sessions waiting on locks or I/O.
- SLOs: Define target p95 latencies for critical endpoints; alert on error budgets rather than single spikes.
- Release markers: Correlate deploys with plan changes and latency shifts.
Read Scaling and High Availability
- Read replicas: Offload analytics and expensive reads. Use read-your-writes strategies for user flows that need fresh data.
- Synchronous vs. asynchronous: Sync replicas protect from data loss but add write latency. Use for financial ledgers; async for general workloads.
- Failover drills: Practice promotions; verify application retries and DNS/connection pooling behavior.
Backups Without Performance Regret
- Physical vs. logical: Physical backups are faster; logical backups more portable.
- Backup windows: Use replicas for heavy backups; throttle I/O to reduce impact.
- Point-in-time recovery: Keep enough WAL/redo logs; test restore regularly.
- Compression and encryption: Balance CPU and storage; prioritize restore speed for RTOs.
Governance: Change Management That Scales
- Migration reviews: Treat schema/index changes like code. Include rollback plans.
- Feature flags: Roll out queries gradually; observe impact.
- Ownership: Each critical table and index should have an owner and an alert channel.
When to Choose NoSQL (and How to Optimize It)
- Key-value stores: Optimize for access patterns; use TTLs; watch hot keys and cluster slot distributions.
- Document stores: Design documents to serve reads in a single fetch; index fields that drive queries; avoid unbounded arrays.
- Column-family stores: Model for wide-row patterns; choose partition keys to spread load.
- Consistency models: Understand eventual consistency; add idempotency to writes.
The Human Side: Culture of Performance
- Make performance a product feature: Put p95 targets in your roadmap.
- Create a guild: Cross-functional group that meets monthly to review top incidents.
- Celebrate removals: Deleting unused indexes and queries is as valuable as adding new ones.
Putting It All Together: A Practical Playbook
- Inventory: List top 20 queries by cumulative time over the last week.
- Triage: Identify three with the worst ROI (slow and frequent).
- Explain: Capture plans at current stats; record cardinalities and chosen indexes.
- Hypothesize: Why is it slow? (Scan, join order, sort, spill, locks.)
- Cheap wins first: Add the smallest index or rewrite that reduces scanned rows by 90%+.
- Benchmark: Use a replay or synthetic load with realistic data distributions.
- Ship cautiously: Deploy with a feature flag or staggered rollout.
- Watch: Compare p50/p95/p99, CPU, I/O, and error rates before/after.
- Document: Keep a “perf diary” so future engineers avoid regression.
- Repeat: Optimization is a loop, not a ladder.
Advanced Topics Worth Your Time
- Extended Statistics: Multi-column correlation stats (Postgres) to fix cardinality misestimates.
- Query Hints (with care): Force join order or index only when the optimizer is consistently wrong; remove hints when the root cause is fixed.
- Adaptive Query Processing: Some engines adjust memory grants and join strategies at runtime—stay current and test.
- Row vs. Columnar: Consider columnar extensions or external warehouses for heavy analytics.
- Application Patterns: Debounce chatty updates, batch writes, and design idempotent operations.
A “Before vs. After” Walkthrough (End-to-End)
Scenario: A SaaS app lists a customer’s 10 most recent tickets with status and last message preview. The endpoint is slow at peak.
Baseline
- p95: 820ms
- Query pattern: filter by `customer_id`, order by `updated_at DESC`, join `ticket_messages` for preview text.
Schema Snippet
```sql
CREATE TABLE tickets (
  id BIGSERIAL PRIMARY KEY,
  customer_id BIGINT NOT NULL,
  status TEXT NOT NULL,
  updated_at TIMESTAMPTZ NOT NULL
);

CREATE TABLE ticket_messages (
  id BIGSERIAL PRIMARY KEY,
  ticket_id BIGINT NOT NULL REFERENCES tickets(id),
  created_at TIMESTAMPTZ NOT NULL,
  body TEXT NOT NULL
);
```
Problems
- Missing composite index on `(customer_id, updated_at DESC)`.
- Joining all messages and then filtering to the latest in the app.
Fixes
```sql
-- Composite index for filter + sort
CREATE INDEX idx_tickets_customer_updated
  ON tickets (customer_id, updated_at DESC);

-- Cover messages with an index and fetch only the latest per ticket
CREATE INDEX idx_msg_ticket_created ON ticket_messages (ticket_id, created_at DESC);
```
Rewritten Query
```sql
WITH latest AS (
  SELECT DISTINCT ON (m.ticket_id)
         m.ticket_id, m.body, m.created_at
  FROM ticket_messages m
  ORDER BY m.ticket_id, m.created_at DESC
)
SELECT t.id, t.status, t.updated_at, l.body AS last_message
FROM tickets t
JOIN latest l ON l.ticket_id = t.id
WHERE t.customer_id = :cid
ORDER BY t.updated_at DESC
LIMIT 10;
```
Outcome
- The tickets list now uses index-aligned sorting and fetches exactly one message per ticket.
- p95 improved to 120ms, p99 to 260ms, with CPU reduced ~40% on the primary.
The Benefits: What Great Optimization Buys You
- Happy users: Faster response times improve conversions and retention.
- Lower costs: Right-sized instances, fewer replicas, and less over-provisioning.
- Operational calm: Predictable performance and fewer pages for on-call engineers.
- Scalability headroom: Schema and index strategy supports growth without last-minute rearchitecture.
- Business agility: Teams ship features faster when the database isn’t a bottleneck.
Best Practices You Can Adopt Today
- Prefer range-sargable filters over functions on indexed columns.
- Design indexes to match filters and sort order.
- Measure with production-like data and workloads; synthetic, but realistic.
- Keep stats fresh; watch for plan instability after large data changes.
- Paginate with keysets for infinite scroll and activity feeds.
- Use partial/filtered indexes for hot subsets of data.
- Partition time-series tables and prune aggressively.
- Cache what you can—application cache, materialized views, result caches.
- Tune memory for the workload (buffer pool/shared buffers, work_mem).
- Automate maintenance: vacuum, analyze, reindex, and backups.
- Treat schema changes like code, with reviews and rollbacks.
- Document and share learnings to make performance a team habit.
Final Thoughts
Database Optimization is not a bag of tricks or a checklist you run once. It's a mindset anchored in measurement, clarity of intent, and respect for the costs of data movement. Start with the queries that matter most to your users, make the smallest changes that deliver the biggest wins, and enshrine those wins in your tooling and culture. Over time, you'll build a system that's not only fast but resilient, able to grow without surprises as the business thrives.