
Reducing Dynamic Dispatch Overhead in Rust
What dynamic dispatch costs in Rust
When you call a method through a trait object, Rust must resolve the target implementation at runtime using a vtable. That means the compiler cannot fully specialize the call site the way it can with generics.
In practical terms, dynamic dispatch can introduce:
- An indirect function call
- Reduced inlining opportunities
- Less optimization across the call boundary
- Potential branch prediction misses in tight loops
For most application code, this is fine. But if the call happens millions of times per second, the overhead can become measurable.
Static dispatch vs dynamic dispatch
| Approach | Example | Cost profile | Best use case |
|---|---|---|---|
| Static dispatch | fn run<T: Worker>(w: T) | Monomorphized, inlineable, usually fastest | Hot paths, known types |
| Dynamic dispatch | fn run(w: &dyn Worker) | Runtime vtable lookup, indirect call | Plugin systems, heterogeneous collections |
| Enum dispatch | enum WorkerKind { A(A), B(B) } | Compile-time branching, often fast | Small fixed set of implementations |
The key question is not “Is trait object dispatch slow?” but “Is the dynamic flexibility worth the runtime cost in this specific path?”
When trait objects are the right choice
Trait objects are often the cleanest solution when you need one or more of these:
- Heterogeneous collections of behavior
- Runtime selection of implementations
- Plugin or extension architectures
- Reduced code size compared to heavy monomorphization
- API boundaries where concrete types should remain hidden
For example, a logging subsystem may accept Box<dyn LogSink> because the overhead is tiny compared to I/O. A parser running in a tight loop over millions of tokens is a different story.
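A minimal sketch of that logging case, assuming a hypothetical LogSink trait and MemorySink type (not from any real library): the subsystem stores a Box<dyn LogSink>, and the per-call vtable lookup is negligible next to the I/O a real sink would do.

```rust
// Hypothetical logging trait; the dynamic dispatch cost per call is tiny
// compared to the file or socket write a real sink would perform.
trait LogSink {
    fn write_line(&mut self, line: &str);
}

// An in-memory sink, standing in for a file or network sink.
struct MemorySink {
    lines: Vec<String>,
}

impl LogSink for MemorySink {
    fn write_line(&mut self, line: &str) {
        self.lines.push(line.to_string());
    }
}

// The logger erases the concrete sink type behind Box<dyn LogSink>,
// so callers can plug in any implementation at runtime.
struct Logger {
    sink: Box<dyn LogSink>,
}

impl Logger {
    fn log(&mut self, msg: &str) {
        self.sink.write_line(msg);
    }
}
```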
A simple example
```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
struct Lz4Compressor;

impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec() // placeholder: a real implementation would run zstd
    }
}

impl Compressor for Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec() // placeholder: a real implementation would run lz4
    }
}

fn process(compressor: &dyn Compressor, data: &[u8]) -> Vec<u8> {
    compressor.compress(data)
}
```

This is readable and flexible. If process is called occasionally, the overhead is irrelevant. If it is called in a hot loop, you should consider alternatives.
Prefer generics when the caller can choose the type
If the implementation type is known at compile time, use generics. This lets the compiler inline the method call and optimize across boundaries.
```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

fn process<C: Compressor>(compressor: &C, data: &[u8]) -> Vec<u8> {
    compressor.compress(data)
}
```

This version is usually faster because:
- The compiler knows the concrete type at each call site
- The method can be inlined
- Dead code can be removed more aggressively
Rule of thumb
Use generics when:
- The caller chooses the type
- Performance matters
- You do not need to store different implementations in one collection
Use trait objects when:
- The callee must accept multiple unrelated implementations at runtime
- You need type erasure
- You are crossing an API boundary
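The two signatures can be put side by side, reusing the Worker trait from the table above (the Fast type and method body are hypothetical stand-ins):

```rust
trait Worker {
    fn work(&self) -> u32;
}

// Hypothetical concrete implementation used for illustration.
struct Fast;
impl Worker for Fast {
    fn work(&self) -> u32 { 1 }
}

// Static dispatch: monomorphized per concrete type, the call can be inlined.
fn run_static<T: Worker>(w: &T) -> u32 {
    w.work()
}

// Dynamic dispatch: one compiled body, the call goes through the vtable.
fn run_dyn(w: &dyn Worker) -> u32 {
    w.work()
}
```

Both produce the same result; they differ only in how the call is resolved.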
Replace trait objects with enums when the set is small and fixed
If you only have a few implementations, an enum can be faster than a trait object and still keep the code ergonomic.
```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
struct Lz4Compressor;

impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec() // placeholder: a real implementation would run zstd
    }
}

impl Compressor for Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec() // placeholder: a real implementation would run lz4
    }
}

enum CompressorKind {
    Zstd(ZstdCompressor),
    Lz4(Lz4Compressor),
}

impl Compressor for CompressorKind {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        match self {
            CompressorKind::Zstd(c) => c.compress(input),
            CompressorKind::Lz4(c) => c.compress(input),
        }
    }
}
```

This avoids vtable dispatch. The compiler can often optimize the match very well, especially when the enum is used in a hot loop.
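In use, the variant can still be chosen at runtime. A self-contained sketch (definitions restated as inherent methods so the block compiles alone; the from_name constructor is hypothetical, e.g. fed from a config string):

```rust
struct ZstdCompressor;
struct Lz4Compressor;

impl ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> { input.to_vec() } // stub
}
impl Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> { input.to_vec() } // stub
}

enum CompressorKind {
    Zstd(ZstdCompressor),
    Lz4(Lz4Compressor),
}

impl CompressorKind {
    // Dispatch via match: a predictable branch instead of a vtable load.
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        match self {
            CompressorKind::Zstd(c) => c.compress(input),
            CompressorKind::Lz4(c) => c.compress(input),
        }
    }
}

// Hypothetical runtime selection, e.g. from a configuration value.
fn from_name(name: &str) -> Option<CompressorKind> {
    match name {
        "zstd" => Some(CompressorKind::Zstd(ZstdCompressor)),
        "lz4" => Some(CompressorKind::Lz4(Lz4Compressor)),
        _ => None,
    }
}
```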
When enums are a good fit
- A small, known set of variants
- Performance-sensitive code
- You want exhaustive handling at compile time
- You can tolerate adding a new variant by editing the enum
Reduce repeated dynamic dispatch in hot loops
The biggest performance mistake is not using a trait object once; it is using it repeatedly inside a tight loop when the implementation does not change.
Bad pattern
```rust
fn run_all(tasks: &[Box<dyn Compressor>], input: &[u8]) {
    for task in tasks {
        let _ = task.compress(input);
    }
}
```

If tasks is large and compress is small, the dispatch overhead can become noticeable.
Better pattern: group by implementation
If possible, reorganize work so each implementation handles a batch of inputs.
```rust
fn run_batch<C: Compressor>(compressor: &C, inputs: &[Vec<u8>]) {
    for input in inputs {
        let _ = compressor.compress(input);
    }
}
```

This improves locality and gives the compiler more room to optimize.
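A self-contained version of that batching idea (trait and stub restated so the block compiles alone; here run_batch collects its results so the effect is observable):

```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> { input.to_vec() } // stub
}

// Monomorphized per compressor type: the inner call can be inlined,
// and the whole batch runs with zero dynamic dispatch.
fn run_batch<C: Compressor>(compressor: &C, inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    inputs.iter().map(|input| compressor.compress(input)).collect()
}
```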
Another option: move dispatch outside the loop
If the implementation is selected once, dispatch once and then call a specialized function.
```rust
fn process_with_zstd(input: &[u8]) -> Vec<u8> {
    input.to_vec() // placeholder: a real implementation would run zstd
}

fn process_with_lz4(input: &[u8]) -> Vec<u8> {
    input.to_vec() // placeholder: a real implementation would run lz4
}

enum Mode {
    Zstd,
    Lz4,
}

fn process(mode: Mode, input: &[u8]) -> Vec<u8> {
    match mode {
        Mode::Zstd => process_with_zstd(input),
        Mode::Lz4 => process_with_lz4(input),
    }
}
```

This pattern is especially useful when the selected path is reused many times.
Be careful with Box<dyn Trait> in data structures
Storing trait objects in collections is convenient, but it adds both dispatch overhead and pointer indirection. Each element is typically heap-allocated when boxed, which can hurt cache locality.
Common tradeoff
- Vec<Box<dyn Trait>>: flexible, but scattered allocations and indirect calls
- Vec<Enum>: compact, cache-friendly, and often faster
- Vec<T>: fastest when all elements share the same type
If you need polymorphism in a collection, ask whether the collection is performance-critical. If it is, an enum often performs better.
Example comparison
| Structure | Pros | Cons |
|---|---|---|
| Vec<Box<dyn Trait>> | Flexible, extensible | Heap allocation per item, indirect calls |
| Vec<Enum> | Compact, fast dispatch via match | Fixed set of variants |
| Vec<T> | Best locality and optimization | Single concrete type only |
A good compromise is to keep trait objects at the edges of your system and use concrete types internally.
Use trait object boundaries intentionally
A practical performance strategy is to isolate dynamic dispatch at coarse-grained boundaries rather than fine-grained inner loops.
Good boundary placement
- Configuration loading
- Dependency injection
- Top-level orchestration
- Plugin registration
- Request routing
Poor boundary placement
- Per-element processing in a tight loop
- Inner numeric kernels
- Token-by-token parsing
- Per-byte transformations
The more frequently a call is executed, the more important it is to make it statically dispatchable.
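One way to sketch that boundary placement (the Codec trait, Identity type, and orchestrate function are hypothetical): the trait object is called once per batch at the coarse boundary, while the hot per-element work happens in a concrete, inlineable function inside.

```rust
// Coarse-grained boundary: one virtual call covers a whole batch.
trait Codec {
    fn encode_all(&self, inputs: &[Vec<u8>]) -> Vec<Vec<u8>>;
}

struct Identity;

// Hot inner kernel: concrete and fully inlineable.
fn encode_batch(inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    inputs.iter().cloned().collect() // stub transformation
}

impl Codec for Identity {
    fn encode_all(&self, inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
        encode_batch(inputs) // the dyn call is amortized over the batch
    }
}

// The orchestrator pays one vtable lookup per batch, not per element.
fn orchestrate(codec: &dyn Codec, inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    codec.encode_all(inputs)
}
```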
Measure before and after
Dynamic dispatch overhead is real, but it is not always the bottleneck. In many cases, memory allocation, cache misses, or algorithmic complexity dominate.
Use benchmarking to validate your assumptions. Compare:
- A trait object version
- A generic version
- An enum-based version
If the difference is small, keep the clearer design. If the hot path is affected, refactor the boundary.
What to look for
- Reduced CPU cycles per operation
- Improved branch prediction
- Better inlining in generated code
- Lower instruction count
- Reduced heap traffic if boxing is removed
A microbenchmark that isolates the dispatch cost can be useful, but also test the real workload. Sometimes the dispatch cost disappears once the function does meaningful work.
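As a rough sketch of such a microbenchmark using only the standard library (the Step trait and AddOne type are hypothetical; std::hint::black_box keeps the optimizer from deleting the loops): for serious measurement, a harness such as Criterion is a better choice than raw Instant timing.

```rust
use std::hint::black_box;
use std::time::Instant;

trait Step {
    fn apply(&self, x: u64) -> u64;
}

struct AddOne;
impl Step for AddOne {
    fn apply(&self, x: u64) -> u64 {
        x.wrapping_add(1)
    }
}

// Dynamic dispatch: one indirect call per iteration.
fn run_dyn(step: &dyn Step, iters: u64) -> u64 {
    let mut acc = 0u64;
    for _ in 0..iters {
        acc = step.apply(black_box(acc));
    }
    acc
}

// Static dispatch: the call can be inlined into the loop body.
fn run_static<S: Step>(step: &S, iters: u64) -> u64 {
    let mut acc = 0u64;
    for _ in 0..iters {
        acc = step.apply(black_box(acc));
    }
    acc
}

// Naive timing; numbers from a single run are indicative at best.
fn time_both(iters: u64) -> (u128, u128) {
    let t = Instant::now();
    let a = run_dyn(&AddOne, iters);
    let dyn_ns = t.elapsed().as_nanos();

    let t = Instant::now();
    let b = run_static(&AddOne, iters);
    let static_ns = t.elapsed().as_nanos();

    assert_eq!(a, b); // sanity check: both paths compute the same result
    (dyn_ns, static_ns)
}
```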
Practical refactoring checklist
If you suspect dynamic dispatch is slowing down a Rust workload, work through this checklist:
- Identify the hot path
  - Use profiling to find where time is spent.
- Check call frequency
  - A trait object called once per request is usually fine.
  - A trait object called inside a million-iteration loop is a candidate for optimization.
- Ask whether the type set is fixed
  - If yes, prefer an enum.
- Ask whether the caller knows the type
  - If yes, prefer generics.
- Move dispatch outward
  - Dispatch once, then call specialized code repeatedly.
- Avoid boxing unless necessary
  - Boxing adds allocation and indirection.
- Re-benchmark after changes
  - Confirm the refactor actually helps.
A decision guide
| Situation | Recommended approach |
|---|---|
| Single known type | Generic function |
| Small fixed set of types | Enum dispatch |
| Runtime plugin architecture | Trait object |
| Performance-critical inner loop | Generic or enum |
| Public API with hidden implementation | Trait object at the boundary, concrete types inside |
This is the core design principle: keep flexibility where it matters, and keep the hot path concrete where it counts.
Conclusion
Dynamic dispatch is a useful tool in Rust, but it is not free. In performance-sensitive code, repeated calls through dyn Trait can limit inlining, add indirect branches, and reduce optimization opportunities. The best optimization is often structural: use generics when the type is known, use enums when the set is small, and reserve trait objects for boundaries where runtime flexibility is worth the cost.
If you treat trait objects as an architectural choice rather than a default, you can preserve Rust’s ergonomics while still getting excellent performance in critical paths.
