- Goroutines start with a 2KB stack, compared to 1MB for OS threads.
- Use worker pools to cap concurrency and prevent CPU starvation.
- Zero-copy networking libraries are essential for high-throughput WebSocket servers.
The Single-Threaded Event Loop is a brilliant architecture for I/O-bound microservices. It is a catastrophe for stateful, long-lived connections at scale.
When building real-time infrastructure (signaling servers, market data feeds), we observed Node.js failing at the 10k concurrent connection mark due to GC pauses and context-switching overhead.
We migrated the core WebSocket layer to Go. This is the architectural post-mortem.
The Memory Penalty
In V8 (Node.js), every WebSocket connection is a heap-allocated object. At 100k connections the heap balloons, and the garbage collector pauses the world for hundreds of milliseconds. In a trading environment, a 200ms pause is unacceptable.
Go handles concurrency differently. It uses goroutines: lightweight threads scheduled by the Go runtime, not the OS. Each one starts with a small stack (about 2KB) that grows only when needed.
The Worker Pool Pattern
Spawning a goroutine per request is cheap, but not free. To handle 100k concurrent connections without exhausting system resources, we implement a strict worker pool pattern that caps the number of jobs being processed at any one time.
```go
package main

import "sync"

// Job is any unit of work a worker can execute.
type Job interface {
	Process()
}

// WorkerPool runs jobs on a fixed number of goroutines.
type WorkerPool struct {
	maxWorkers int
	jobQueue   chan Job
	wg         sync.WaitGroup
}

// NewWorkerPool creates the pool and starts its workers.
func NewWorkerPool(maxWorkers int) *WorkerPool {
	pool := &WorkerPool{
		maxWorkers: maxWorkers,
		jobQueue:   make(chan Job), // unbuffered: a send blocks until a worker is free
	}
	// Initialize the fixed number of workers immediately
	pool.start()
	return pool
}

// start launches maxWorkers goroutines that drain jobQueue until it is closed.
func (wp *WorkerPool) start() {
	for i := 0; i < wp.maxWorkers; i++ {
		wp.wg.Add(1)
		go func() {
			defer wp.wg.Done()
			for job := range wp.jobQueue {
				job.Process()
			}
		}()
	}
}
```
Make sure your jobQueue is buffered. Once the buffer fills up, a plain send blocks the producer, and that backpressure can cascade upstream to your API gateway. Wrap the send in a select with a default or timeout case so the producer can shed or retry work instead of stalling.
Zero-Copy Upgrades
Standard Go net/http starts a new goroutine for every connection. For WebSockets, we use gobwas/ws or gnet (event-loop networking for Go) to perform "zero-copy" upgrades. This lets us read the frame header without allocating memory for the payload until we have determined the routing logic.
Benchmark Results (AWS c5.large)
| Metric | Node.js (ws) | Go (Goroutines) | Diff |
| :--- | :--- | :--- | :--- |
| Idle Memory (10k Conns) | 600MB | 85MB | -85% |
| CPU (Message Broadcast) | 85% | 15% | -70% |
| Max Concurrents | ~18k | ~150k | 8x |
Conclusion
Node.js is for orchestration; Go is for calculation and concurrency. By moving the "Live" layer to Go, we reduced our server footprint by 60% while increasing headroom for traffic spikes.