What Is Load Balancing? A Detailed Guide from Basics to Advanced
Learn what load balancing is, how it works, key algorithms, how to deploy with Nginx/HAProxy, and how to optimize costs. A detailed, practical guide.

Author: Trung Vũ Hoàng
1. What is Load Balancing? Why do businesses need it?
Have you ever seen a website slow down or “go down” during an ad campaign? That’s when load balancing proves its value. Load balancing is a technique for distributing traffic across multiple backend servers, helping systems stay stable, scalable, and performance-optimized.
Instead of sending every request to a single server, a load balancer sits at the reverse proxy layer (the entry point) and routes traffic to the healthiest server. This reduces overload risk, prevents downtime, and improves response speed. Even with 99.9% uptime, a business can still lose about 8.76 hours per year; with a solid load-balancing architecture, you can aim for 99.99% (~52.56 minutes/year).
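Those downtime figures follow directly from the uptime percentage; a quick sketch of the arithmetic (plain Python, nothing vendor-specific):

```python
# Downtime implied by an availability target over one year.
HOURS_PER_YEAR = 365 * 24  # 8760

def downtime_hours(availability: float) -> float:
    """Yearly downtime, in hours, allowed at a given availability (0..1)."""
    return (1 - availability) * HOURS_PER_YEAR

print(round(downtime_hours(0.999), 2))        # "three nines", in hours/year
print(round(downtime_hours(0.9999) * 60, 2))  # "four nines", in minutes/year
```

Each extra "nine" cuts the allowed downtime by a factor of ten, which is why the jump from 99.9% to 99.99% is an architecture decision, not a tuning tweak.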
According to Google, 53% of mobile users will leave if a page takes >3 seconds to load. Amazon once shared that just 100ms of extra delay can reduce revenue by about 1%. Load balancing removes “bottlenecks,” keeping experiences fast and stable—especially when you run Facebook Ads, Google Ads, or fast-growing SEO campaigns.
High availability: if one node fails, traffic shifts to another node.
Better performance: spreads load evenly, reducing sudden latency spikes.
Flexible scaling: add or remove servers as needed.
Protection: hides backends and helps block some application-layer attacks.
Takeaway: If your website is a sales channel, load balancing is “insurance” that helps you avoid losing revenue to downtime.
2. How does a Load Balancer work? L4 vs L7, health checks, and sticky sessions
A load balancer works like a “dispatch station.” Each incoming request/connection is analyzed and then routed to an appropriate backend. There are two main layers:
L4 (Transport): routes based on TCP/UDP, IP, port. Fast and lightweight, with minimal awareness of application content.
L7 (Application): routes by HTTP headers, URL, cookies, JWT, host. Flexible and supports smarter routing rules.
Health checks are the “heartbeat” of load balancing. The LB sends probe requests (HTTP 200, TCP handshake, gRPC health) to determine which backends are healthy. If a node fails or responds slowly, the LB temporarily removes it from the pool.
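As an illustration only (the injected `probe` callable and the consecutive-failure threshold are assumptions of this sketch, not any particular LB's behavior), one health-check round might look like:

```python
from typing import Callable, Dict, List

def health_check_round(fail_counts: Dict[str, int],
                       probe: Callable[[str], bool],
                       max_fails: int = 3) -> List[str]:
    """Probe every backend once and return those still considered healthy.

    `probe` stands in for the real check (HTTP GET expecting 200, a TCP
    handshake, a gRPC health call). A node leaves the pool only after
    `max_fails` consecutive failures, so one slow response doesn't cause
    the pool to flap.
    """
    healthy = []
    for addr in fail_counts:
        if probe(addr):
            fail_counts[addr] = 0      # success resets the failure streak
        else:
            fail_counts[addr] += 1
        if fail_counts[addr] < max_fails:
            healthy.append(addr)
    return healthy
```

Real load balancers run this on a timer (every few seconds) and usually require a few consecutive successes before re-admitting a node as well.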
Sticky sessions (session persistence) keep a user “pinned” to a specific server for a period of time. They’re useful for carts and logins if your app still stores state in RAM. However, sticky sessions can cause imbalance when heavy users concentrate on one node.
That’s why the current trend is stateless application servers (no session stored on the app server) with sessions stored in Redis or a database, combined with caching (CDN/Reverse proxy cache). In that setup, the LB can route more flexibly with less risk of skewed load.
SSL/TLS termination: decrypt HTTPS at the LB to reduce backend load.
Connection pooling: reuse connections to reduce TCP overhead.
Rate limiting: limit abnormal requests to reduce application-layer DDoS risk.
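Of these three, rate limiting is the easiest to picture as code. Below is a toy token bucket of the kind LBs apply per client IP; the rate and capacity are illustrative, and this is a sketch of the technique, not any product's implementation:

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity` requests, refilling `rate` tokens/second."""

    def __init__(self, rate: float, capacity: float, now=time.monotonic):
        self.rate, self.capacity, self.now = rate, capacity, now
        self.tokens = capacity   # start full, so an initial burst is allowed
        self.last = now()

    def allow(self) -> bool:
        t = self.now()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1     # spend one token on this request
            return True
        return False             # bucket empty: reject (e.g. HTTP 429)
```

The clock is injectable (`now=`) purely so the behavior can be tested deterministically.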
Takeaway: Choose L4 for speed and L7 for flexibility. Prioritize stateless design for effective load balancing.
3. Traffic distribution algorithms: how to choose the right one
The algorithm decides “who serves” a request. Each algorithm fits a specific context:
Round Robin: rotates in order. Simple and suitable when nodes are similar.
Weighted Round Robin: assigns higher weights to stronger servers (better CPU/RAM). Common for non-uniform infrastructure.
Least Connections: picks the server with the fewest active connections. Great for long-lived connections (WebSocket, HTTP/2).
Least Response Time: prioritizes the fastest-responding server. Suitable when services have varying latency.
IP Hash / Consistent Hashing: maps users by IP/key. Useful for sticky behavior or microservices that need consistent routing.
Random / Two Random Choices: pick two nodes at random and route to the less loaded one (the “power of two choices”). Simple, and it helps avoid skew during synchronized bursts.
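Two of these are short enough to sketch in a few lines of Python. Note the weighted round-robin below is the naive expand-and-cycle version, which serves weights in bursts; real implementations such as Nginx use a “smooth” variant that interleaves:

```python
import itertools
from typing import Dict, Iterator

def weighted_round_robin(weights: Dict[str, int]) -> Iterator[str]:
    """Cycle through backends in proportion to their weights (naive version)."""
    expanded = [name for name, w in weights.items() for _ in range(w)]
    return itertools.cycle(expanded)

def least_connections(active: Dict[str, int]) -> str:
    """Pick the backend currently holding the fewest active connections."""
    return min(active, key=active.get)
```

For example, weights `{"big": 2, "small": 1}` yield the repeating sequence big, big, small; `least_connections` just needs a live count of open connections per backend.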
Quick selection guide:
Static/uniform websites: Round Robin, Weighted RR.
API/Realtime: Least Connections or Least Response Time.
Need sticky/user-based routing: IP Hash or cookie-based routing.
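Consistent hashing deserves its own sketch, since it is what makes user-based routing survive pool changes: a key always lands on the same node, and removing a node remaps only that node's keys. The virtual-node count and MD5 hash here are illustrative choices:

```python
import bisect
import hashlib
from typing import Iterable

class HashRing:
    """A consistent-hash ring with virtual nodes for smoother key distribution."""

    def __init__(self, nodes: Iterable[str], vnodes: int = 100):
        # Each physical node appears `vnodes` times on the ring.
        self.ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self.hashes = [h for h, _ in self.ring]

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        """Walk clockwise from the key's hash to the first virtual node."""
        i = bisect.bisect(self.hashes, self._hash(key)) % len(self.hashes)
        return self.ring[i][1]
```

In practice the key is whatever you need to be sticky: client IP, user ID, or a session cookie.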
Don’t forget protective thresholds like max connections, max queue, and read/write timeouts. Correct configuration helps prevent a “waterfall” of queued requests that drives up P95/P99 latency.
Takeaway: Start with Weighted RR, then measure and switch to the best algorithm for your real workload.
4. Types of Load Balancers: hardware, software, cloud
The market has three main categories, each with pros and cons:
Hardware appliance (F5, Citrix ADC): high performance, rich features (WAF, strong SSL offload). High upfront cost and operational complexity.
Software (Nginx, HAProxy, Envoy): flexible, cost-effective, easy to automate (IaC). Requires a skilled operations team.
Cloud-managed (AWS ALB/NLB, GCP Load Balancing, Azure Front Door): high availability, autoscaling, pay-as-you-go. Vendor lock-in and costs rise with traffic.
There are also API Gateways (Kong, Tyk, Apigee) at L7 for API management (auth, quotas, rate limiting, transformation) and Service Meshes (Istio/Linkerd) for microservices, built on proxies like Envoy.
Combining with a CDN (Cloudflare, Akamai, Fastly) at the “network edge” can significantly reduce origin load. CDN + L7 LB is a strong pairing for content-heavy sites and high-traffic Digital Marketing campaigns.
Takeaway: SMEs should start with Nginx/HAProxy or the LB of their current cloud provider, then upgrade as needs grow.
5. Deploying Load Balancing with Nginx/HAProxy: process and key considerations
5.1 A 6-step process
1) Assess traffic: current/peak RPS, payload, static vs dynamic ratio.
2) Choose architecture: L4/L7, single LB or HA (active-active), whether to combine with a CDN.
3) Standardize backends: unify app versions, enable a health endpoint (/healthz).
4) Configure the LB: algorithm, timeouts, max conn, SSL/TLS, HTTP/2.
5) Load testing: k6/Locust/JMeter. Measure P50/P95 latency, error rate, throughput.
6) Rolling rollout: canary/blue-green, fast rollback.
5.2 Sample configurations and best practices
Nginx: upstream with least_conn or ip_hash, proxy_read_timeout 30–60s, keepalive 64–128.
HAProxy: balance leastconn, option httpchk, tune.bufsize, maxconn aligned with machine size.
SSL/TLS: use HTTP/2 and TLS 1.2+, enable OCSP stapling, and roll out HSTS carefully, since it is hard to undo once browsers cache the policy.
Logging/Observability: enable access logs, expose Prometheus metrics, build Grafana dashboards.
Zero-downtime: safe config reloads, drain connections before removing a node.
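Pulling the Nginx directives above into one place, a minimal sketch (the IPs, weights, and timeouts are placeholders to adapt, not a tested production config):

```nginx
upstream app_backends {
    least_conn;                          # or ip_hash for sticky-by-IP
    server 10.0.0.11:8080 weight=3 max_fails=3 fail_timeout=30s;
    server 10.0.0.12:8080 weight=1 max_fails=3 fail_timeout=30s;
    keepalive 64;                        # pooled connections to the backends
}

server {
    listen 443 ssl http2;
    server_name example.com;

    location / {
        proxy_pass http://app_backends;
        proxy_http_version 1.1;
        proxy_set_header Connection "";  # required for upstream keepalive
        proxy_read_timeout 30s;
    }
}
```

A real config also needs `ssl_certificate`/`ssl_certificate_key` plus the logging and health-endpoint settings discussed above.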
When you design an e-commerce website, standardize health checks, session storage, and the CI/CD pipeline from the start to make future scaling easier.
Takeaway: A clear process + load testing makes the first deployment smooth and safe.
6. Designing sessions, data, and cache for effective load balancing
6.1 Sticky session vs stateless
Sticky session is simple but increases the risk of uneven load. When one node gets many “heavy” users, latency rises. A more sustainable solution is stateless: store sessions/tokens in Redis or issue JWT so any node can handle a request.
Externalize sessions: store them in Redis with a sensible TTL and keep payloads small.
File uploads: use object storage (S3/GCS) instead of local disk.
Stateful features: split them into separate services accessed via API.
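A stateless session can be as small as a signed token that any backend can verify. The sketch below is JWT-like but uses only the standard library; the secret and payload are made up for illustration, and a real deployment should use a vetted JWT library with key rotation:

```python
import base64
import hashlib
import hmac
import json

SECRET = b"replace-me"  # assumption: in practice, loaded from a secrets manager

def issue_token(payload: dict) -> str:
    """Sign a session payload so any backend can trust it without shared RAM."""
    body = base64.urlsafe_b64encode(json.dumps(payload, sort_keys=True).encode())
    sig = hmac.new(SECRET, body, hashlib.sha256).hexdigest().encode()
    return (body + b"." + sig).decode()

def verify_token(token: str):
    """Return the payload if the signature checks out, else None."""
    body, sig = token.rsplit(".", 1)
    expected = hmac.new(SECRET, body.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        return None
    return json.loads(base64.urlsafe_b64decode(body))
```

Because the state travels with the request, the LB is free to send each request to any healthy node.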
6.2 Multi-layer caching
Browser cache: Cache-Control, ETag.
CDN cache: can absorb 60–95% of requests before they reach the origin when there’s lots of static content.
Reverse proxy cache: Nginx/HAProxy caching for read-heavy APIs.
Application cache: Redis/Memcached for slow queries.
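The application-cache layer is essentially “remember the answer for a while.” A dict-backed TTL cache shows the idea in miniature; Redis/Memcached add eviction policies, sharing across processes, and persistence:

```python
import time

class TTLCache:
    """In-process cache where entries expire `ttl` seconds after being set."""

    def __init__(self, ttl: float, now=time.monotonic):
        self.ttl, self.now = ttl, now
        self.store = {}                    # key -> (value, stored_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry is None or self.now() - entry[1] > self.ttl:
            self.store.pop(key, None)      # expired: drop it, report a miss
            return None
        return entry[0]

    def set(self, key, value):
        self.store[key] = (value, self.now())
```

On a miss the caller runs the slow query and calls `set` to repopulate; the injectable clock is only there to make the expiry behavior testable.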
For databases, deploy read replicas and use a connection pool. With heavy write workloads, consider sharding or queues (Kafka/RabbitMQ) to “flatten” traffic peaks.
Takeaway: Remove state from app servers and leverage caching and replicas to maximize what load balancing can do.
7. Monitoring, autoscaling, and security
7.1 Monitoring & autoscaling
Core metrics: RPS, P95/P99 latency, error rate (5xx/4xx), CPU/RAM, open connections, queue length.
Alerts: threshold-based and anomaly-based.
Autoscaling: based on CPU/RPS/queue. Scale out as load rises, scale in as it falls.
Set up a Horizontal Pod Autoscaler (Kubernetes) or an autoscaling group (AWS/GCP/Azure). Don’t forget a PodDisruptionBudget and graceful shutdown to avoid abrupt disconnects.
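The scaling rule Kubernetes documents for the HPA is simple target tracking: desired = ceil(current × metric / target), clamped to configured bounds. In Python (the bounds here are illustrative defaults):

```python
import math

def desired_replicas(current: int, metric: float, target: float,
                     min_replicas: int = 2, max_replicas: int = 20) -> int:
    """Target-tracking autoscaling: scale so per-replica load approaches target.

    `metric` and `target` share a unit (e.g. % CPU, or RPS per replica).
    """
    desired = math.ceil(current * metric / target)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas running at 90% CPU against a 60% target scale out to 6; the same 4 replicas at 30% scale back in toward the floor.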
7.2 Security
WAF: block SQLi/XSS, OWASP Top 10.
DDoS protection: rate limiting, CDN, Anycast, scrubbing centers.
Internal mTLS: protect traffic between the LB and backends.
Segmentation: separate public/private networks, minimize open ports, least-privilege principle.
Implement TLS termination at the LB, and re-encrypt to the backend when needed. Encrypt at rest for caches and use secure secret management.
Takeaway: Strong observability and layered security are the non-negotiable duo of load balancing.
8. Cost, ROI, and a case study in Vietnam
8.1 Cost and ROI estimates
Downtime cost: Many reports estimate downtime can cost as much as $5,600 per minute for large enterprises. For SMEs, losing just 30–60 minutes during peak hours can wipe out an entire day’s Ads budget.
Infrastructure cost: 2–3 mid-range backends + 1 LB VM/cloud instance + a basic CDN often costs less than the revenue lost from 1–2 site outages per quarter.
Marketing performance: Speed improvements help boost CVR and ROI for SEO/Ads.
8.2 Case study
A mid-sized e-commerce marketplace in Ho Chi Minh City hit peak traffic of ~8,000 RPS during flash sales. They moved from a single server to: Cloud LB (L7) + 4 backends + Redis sessions + CDN. Results: page load time decreased by ~35%, 5xx errors dropped from 3.2% to 0.4%, and peak-hour revenue increased by 18% in the first two weeks.
For a service business (booking/registration), after adopting HAProxy + static caching via a CDN, the early-campaign drop-off rate noticeably decreased because the site responded faster and stayed stable.
Takeaway: A right-sized load-balancing architecture often “pays for itself” quickly through revenue saved from downtime.
9. Quick comparison: Nginx, HAProxy, and cloud load balancers
The table below helps you pick a suitable starting option:
| Criteria | Nginx | HAProxy | Cloud LB |
|---|---|---|---|
| Performance | High, flexible L7 | Very high, optimized for L4/L7 | Autoscaling, stable |
| Features | Proxy, cache, SSL, HTTP/2 | Strong health checks, observability | Global, Anycast, built-in WAF |
| Operations | Easy, lots of documentation | Requires tuning experience | Managed, minimal effort |
| Cost | Low (OSS) | Low–medium (OSS) | Pay per usage |
| Use case | Common L7 web/apps | High throughput, realtime | Global, multi-region |
Takeaway: Start with Nginx/HAProxy if your DevOps team is ready; choose a Cloud LB when you need rapid, multi-region scaling.
10. Summary and recommendations
Load balancing is the foundation of web systems that are stable, fast, and scalable. Start by measuring load, choosing an L7 LB with Weighted RR, moving sessions to Redis, adding a CDN, and monitoring P95/P99. Then optimize over time with autoscaling, WAF, and zero-downtime deployments.
Prioritize stateless architecture + multi-layer caching.
Monitor closely: RPS, latency, error rate.
Load test before major campaigns.
If you need infrastructure architecture consulting aligned with business goals and marketing performance, contact our team at Hoang Trung Digital. We help you design a stable system so ROI from SEO/Ads isn’t wasted due to downtime.