If you ask any mid-senior Go dev what makes goroutines ’lightweight’, you’ll get the standard reply:
They start with 2 KB of stack instead of 1 MB like OS threads.
They’re not wrong. But they’re not thinking deeply enough.
Go’s stack model isn’t just a small preallocated buffer; it’s a live, evolving region of memory that resizes at runtime, grows when needed, and (rarely) shrinks. It’s also bounded. Not infinite. Bounded by design.
And none of this is your typical day-to-day developer concern. Until it is.
Let’s start simple. Go’s runtime gives each new goroutine 2 KB of stack. That’s tiny. But Go doesn’t panic when you blow past it - it grows the stack dynamically, by allocating a new region (typically doubling the size) and copying the old stack frames over.
This is a silent, behind-the-scenes act of memory juggling that can happen dozens or hundreds of times per process, with no visibility unless you go looking.
Here’s the kicker: each goroutine has a hard upper stack limit, and it’s easy to miss - it’s mentioned only in the docs for runtime/debug.SetMaxStack, where few people ever look.
The default hard cap per goroutine? 1 GB of stack on 64-bit systems (250 MB on 32-bit).
Hit it, and the runtime kills the process immediately:
```
runtime: goroutine stack exceeds 1000000000-byte limit
fatal error: stack overflow
```
That’s not a soft fail. That’s a crash.
And it’s easier to hit than you think if you’re writing recursive algorithms, parsing deeply nested data, or spawning goroutines in hot paths that grow quickly under concurrency.
Stack Growth Lifecycle
Every time a goroutine’s stack runs out of space, the runtime silently doubles the allocation and copies everything over. This is the full lifecycle from creation to regrowth.
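You can actually watch the stack move. The sketch below (my own illustration, not from the runtime docs) records the address of a stack-allocated local at each recursion depth; between growth events, consecutive frame addresses step down by a constant frame size, and each time the runtime relocates the stack, the address jumps to an entirely new region:

```go
package main

import (
	"fmt"
	"unsafe"
)

// descend records the address of a stack-allocated local at each depth.
func descend(n int, addrs *[]uintptr) {
	var frame [512]byte // padding so the 2 KB initial stack fills quickly
	frame[0] = byte(n)
	// Converting straight to uintptr keeps frame on the stack
	// (escape analysis stops tracking the pointer at the conversion).
	*addrs = append(*addrs, uintptr(unsafe.Pointer(&frame[0])))
	if n > 0 {
		descend(n-1, addrs)
	}
}

// countMoves returns how many times consecutive frame addresses jumped by
// far more than one frame size - i.e. how often the stack was relocated.
func countMoves(addrs []uintptr) int {
	moves := 0
	for i := 1; i < len(addrs); i++ {
		delta := int64(addrs[i]) - int64(addrs[i-1])
		if delta < -4096 || delta > 4096 { // a normal frame step is ~0.5 KB
			moves++
		}
	}
	return moves
}

func main() {
	done := make(chan []uintptr)
	go func() { // fresh goroutine, so we start at the 2 KB minimum
		var addrs []uintptr
		descend(256, &addrs)
		done <- addrs
	}()
	addrs := <-done
	fmt.Printf("recorded %d frames, stack relocated %d times\n",
		len(addrs), countMoves(addrs))
}
```

Pushing ~140 KB of frames from a 2 KB starting stack forces several doublings, so you should see the relocation count come back well above zero.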
Let’s prove it.
Try this:
```go
package main

func deep(n int) {
	var buf [1024]byte // 1 KB per frame
	buf[0] = byte(n)
	if n > 0 {
		deep(n - 1)
	}
}

func main() {
	deep(4096 * 4096) // Push for 1 GB stack with 16 million calls
}
```
You’ll crash. Every call burns at least 1 KB of stack, so roughly a million frames is enough to blow past the 1 GB limit; the 16 million calls requested never get close to finishing.
```
> go run main.go
runtime: goroutine stack exceeds 1000000000-byte limit
runtime: sp=0x140201603a0 stack=[0x14020160000, 0x14040160000]
fatal error: stack overflow

runtime stack:
runtime.throw({0x104f9c923?, 0x100000000?})
... and more fluff ...
```
And there’s no recovering from it. A stack overflow is a fatal error, not a panic, so recover() can’t catch it; the runtime tears down the whole process, not just the offending goroutine.
Here’s something most people don’t know: stack growth triggers memory copy operations. Every time your goroutine blows past its stack limit, the runtime:
- Allocates a new larger stack
- Copies the existing stack to the new one
- Updates stack pointers and metadata
- Continues execution like nothing happened
This is not free. It introduces latency and can increase garbage collection overhead - because stacks contain pointers, and the Go GC must scan every live goroutine stack frame for reachable objects.
The more your stacks grow, the more work your GC has to do.
Every live stack frame that holds a pointer is a GC root, and every one of them gets scanned.
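The pointer-update step from the list above is observable too. In this sketch (names and sizes are mine), a pointer to a local keeps dereferencing correctly after the stack is copied, while the raw address, saved as a plain integer that the runtime cannot rewrite, typically comes back different:

```go
package main

import (
	"fmt"
	"unsafe"
)

// growBy pushes roughly n KB of frames onto the stack, forcing growth.
func growBy(n int) {
	if n == 0 {
		return
	}
	var pad [1024]byte
	pad[0] = byte(n)
	growBy(n - 1)
}

// survivesCopy reports whether a pointer to a local still reads correctly
// after the stack has been grown (and therefore copied), and whether the
// frame's raw address changed in the process.
func survivesCopy() (valueOK, relocated bool) {
	result := make(chan [2]bool)
	go func() { // fresh goroutine: starts at the 2 KB minimum
		x := 42
		p := &x
		before := uintptr(unsafe.Pointer(&x)) // raw integer: NOT updated by the runtime
		growBy(512)                           // force several grow-and-copy events
		after := uintptr(unsafe.Pointer(&x))  // real pointers like p were rewritten
		result <- [2]bool{*p == 42, before != after}
	}()
	r := <-result
	return r[0], r[1]
}

func main() {
	ok, moved := survivesCopy()
	fmt.Println("pointer still valid:", ok)
	fmt.Println("frame relocated:", moved)
}
```

That’s the whole contract of contiguous stacks: your pointers stay valid, but only because the runtime quietly rewrites every one of them on each copy.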
Stack Size Progression
Stacks double on each growth event, from the initial 2 KB up to the ~1 GB ceiling. The GC can shrink stacks back down, but only under specific conditions.
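One consequence of the doubling schedule is that the number of growth events is tiny and fixed. A quick back-of-the-envelope computation (using the 1,000,000,000-byte figure from the runtime’s own error message):

```go
package main

import "fmt"

// doublingsToExceed returns how many doubling events it takes for a stack
// that starts at `start` bytes to exceed `limit` bytes.
func doublingsToExceed(start, limit int) int {
	size, n := start, 0
	for size <= limit {
		size *= 2
		n++
	}
	return n
}

func main() {
	// 2 KB initial stack, 1,000,000,000-byte ceiling.
	fmt.Println("doublings until the limit:", doublingsToExceed(2*1024, 1000000000))
}
```

Nineteen doublings take you from 2 KB past the ceiling, which is why a goroutine can pay the copy cost only a handful of times before it either stabilizes or dies.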
A goroutine with a 2 KB stack is cheap.
A goroutine that grows to 512 KB, holds references to large objects, and lives long enough to survive multiple GC cycles? That’s not cheap anymore. That’s stealth memory overhead.
Let’s look at this example:
```go
package main

import (
	"time"
)

func holdMemory(n int) {
	var data [128 * 1024]byte // 128 KB
	data[0] = 1
	time.Sleep(10 * time.Second) // Keep goroutine alive
}

func main() {
	for i := 0; i < 1000; i++ {
		go holdMemory(i)
	}
	time.Sleep(30 * time.Second)
}
```
You just spawned 1000 goroutines, each holding at least 128 KB on stack. That’s 128 MB of live stack memory not counted in your heap, but scanned by GC. And it only gets worse under load.
Now the part nobody talks about: stack shrinking.
Yes, Go does shrink goroutine stacks, but only during garbage collection, and only if:
- The goroutine is idle
- The stack is mostly unused
- The shrink won’t cause immediate regrowth
In other words: don’t count on it. Go is conservative with stack shrinkage. This means a burst of high-memory goroutines can bloat your memory profile long after the work is done, unless the GC kicks in and decides to do housecleaning - which it might not.
Want to observe it?
Not directly. There is no public runtime metric for per-goroutine stack usage; the closest thing is the aggregate runtime.MemStats.StackInuse. You can’t pprof it per goroutine without attaching custom logic, and you can’t even tell whether a goroutine’s stack has been shrunk unless you dig into an execution trace.
Contiguous Stack Copy
When a stack outgrows its current allocation, Go allocates a new contiguous block at double the size, copies all frames, and adjusts every internal pointer. The old stack is then freed.
Want to go deeper? Try this:
```go
package main

import (
	"fmt"
	"runtime"
)

func recurse(n int) {
	var buf [1024]byte
	buf[0] = byte(n)
	if n > 0 {
		recurse(n - 1)
	}
}

func main() {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	fmt.Println("Before:", m.StackInuse)

	recurse(1000)

	runtime.ReadMemStats(&m)
	fmt.Println("After:", m.StackInuse)
}
```
It prints stack usage in bytes before and after a deep recursion.
```
> go run main.go
Before: 262144
After: 294912
```
The stack memory stays allocated after the recursion returns; nothing explicitly frees it.
Let’s hit one more unexplored angle: stack growth can trigger GC pressure even without heap allocations.
If you think your service has no memory leak because you aren’t allocating on the heap, you’re missing the point. A runaway stack holds pointers. Those pointers get scanned. That means GC is invoked more often, or takes longer, even if you aren’t growing the heap.
This is how your 5 ms p99 turns into 100 ms: not from bad code, but from unseen stack behavior.
There’s almost no tuning knob for stack size.
No config.
The only lever is runtime/debug.SetMaxStack, which adjusts the hard ceiling; there’s no flag for initial stack size, growth policy, or shrink behavior.
The only way to manage it is through code discipline:
- Avoid recursive goroutines unless they terminate quickly
- Don’t hold large structs or pointers deep in call graphs
- Be aware of implicit stack use via function calls
- Never assume goroutines are ‘free’. Inspect their memory impact
- Use GODEBUG settings such as efence=1 to make memory misuse crash fast during testing
You don’t need to memorize internals. But you do need to understand the consequences.
Most devs don’t talk about this. Few even know it.
But now you do.
And if you write systems that scale, this will hit you eventually.
Better to learn it now while your stack’s still small.