
Reducing Dynamic Dispatch Overhead in Rust
What dynamic dispatch costs in Rust
When you call a method through a trait object, Rust must resolve the target implementation at runtime using a vtable. That means the compiler cannot fully specialize the call site the way it can with generics.
In practical terms, dynamic dispatch can introduce:
- An indirect function call
- Reduced inlining opportunities
- Less optimization across the call boundary
- Potential branch prediction misses in tight loops
For most application code, this is fine. But if the call happens millions of times per second, the overhead can become measurable.
Static dispatch vs dynamic dispatch
| Approach | Example | Cost profile | Best use case |
|---|---|---|---|
| Static dispatch | fn run<T: Worker>(w: T) | Monomorphized, inlineable, usually fastest | Hot paths, known types |
| Dynamic dispatch | fn run(w: &dyn Worker) | Runtime vtable lookup, indirect call | Plugin systems, heterogeneous collections |
| Enum dispatch | enum WorkerKind { A(A), B(B) } | Compile-time branching, often fast | Small fixed set of implementations |
The key question is not “Is trait object dispatch slow?” but “Is the dynamic flexibility worth the runtime cost in this specific path?”
When trait objects are the right choice
Trait objects are often the cleanest solution when you need one or more of these:
- Heterogeneous collections of behavior
- Runtime selection of implementations
- Plugin or extension architectures
- Reduced code size compared to heavy monomorphization
- API boundaries where concrete types should remain hidden
For example, a logging subsystem may accept Box<dyn LogSink> because the overhead is tiny compared to I/O. A parser running in a tight loop over millions of tokens is a different story.
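A minimal sketch of that logging case, assuming a hypothetical LogSink trait and MemorySink type (not from any real library): the subsystem stores a Box<dyn LogSink>, and the per-call vtable lookup is negligible next to the I/O a real sink would do.

```rust
// Hypothetical logging trait; the dynamic dispatch cost per call is tiny
// compared to the file or socket write a real sink would perform.
trait LogSink {
    fn write_line(&mut self, line: &str);
}

// An in-memory sink, standing in for a file or network sink.
struct MemorySink {
    lines: Vec<String>,
}

impl LogSink for MemorySink {
    fn write_line(&mut self, line: &str) {
        self.lines.push(line.to_string());
    }
}

// The logger erases the concrete sink type behind Box<dyn LogSink>,
// so callers can plug in any implementation at runtime.
struct Logger {
    sink: Box<dyn LogSink>,
}

impl Logger {
    fn log(&mut self, msg: &str) {
        self.sink.write_line(msg);
    }
}
```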
A simple example
```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
struct Lz4Compressor;

impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec() // placeholder: a real implementation would run zstd
    }
}

impl Compressor for Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec() // placeholder: a real implementation would run lz4
    }
}

fn process(compressor: &dyn Compressor, data: &[u8]) -> Vec<u8> {
    compressor.compress(data)
}
```

This is readable and flexible. If process is called occasionally, the overhead is irrelevant. If it is called in a hot loop, you should consider alternatives.
Prefer generics when the caller can choose the type
If the implementation type is known at compile time, use generics. This lets the compiler inline the method call and optimize across boundaries.
```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

fn process<C: Compressor>(compressor: &C, data: &[u8]) -> Vec<u8> {
    compressor.compress(data)
}
```

This version is usually faster because:
- The compiler knows the concrete type at each call site
- The method can be inlined
- Dead code can be removed more aggressively
Rule of thumb
Use generics when:
- The caller chooses the type
- Performance matters
- You do not need to store different implementations in one collection
Use trait objects when:
- The callee must accept multiple unrelated implementations at runtime
- You need type erasure
- You are crossing an API boundary
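The two signatures can be put side by side, reusing the Worker trait from the table above (the Fast type and method body are hypothetical stand-ins):

```rust
trait Worker {
    fn work(&self) -> u32;
}

// Hypothetical concrete implementation used for illustration.
struct Fast;
impl Worker for Fast {
    fn work(&self) -> u32 { 1 }
}

// Static dispatch: monomorphized per concrete type, the call can be inlined.
fn run_static<T: Worker>(w: &T) -> u32 {
    w.work()
}

// Dynamic dispatch: one compiled body, the call goes through the vtable.
fn run_dyn(w: &dyn Worker) -> u32 {
    w.work()
}
```

Both produce the same result; they differ only in how the call is resolved.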
Replace trait objects with enums when the set is small and fixed
If you only have a few implementations, an enum can be faster than a trait object and still keep the code ergonomic.
```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
struct Lz4Compressor;

impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec() // placeholder: a real implementation would run zstd
    }
}

impl Compressor for Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec() // placeholder: a real implementation would run lz4
    }
}

enum CompressorKind {
    Zstd(ZstdCompressor),
    Lz4(Lz4Compressor),
}

impl Compressor for CompressorKind {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        match self {
            CompressorKind::Zstd(c) => c.compress(input),
            CompressorKind::Lz4(c) => c.compress(input),
        }
    }
}
```

This avoids vtable dispatch. The compiler can often optimize the match very well, especially when the enum is used in a hot loop.
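In use, the variant can still be chosen at runtime. A self-contained sketch (definitions restated as inherent methods so the block compiles alone; the from_name constructor is hypothetical, e.g. fed from a config string):

```rust
struct ZstdCompressor;
struct Lz4Compressor;

impl ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> { input.to_vec() } // stub
}
impl Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> { input.to_vec() } // stub
}

enum CompressorKind {
    Zstd(ZstdCompressor),
    Lz4(Lz4Compressor),
}

impl CompressorKind {
    // Dispatch via match: a predictable branch instead of a vtable load.
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        match self {
            CompressorKind::Zstd(c) => c.compress(input),
            CompressorKind::Lz4(c) => c.compress(input),
        }
    }
}

// Hypothetical runtime selection, e.g. from a configuration value.
fn from_name(name: &str) -> Option<CompressorKind> {
    match name {
        "zstd" => Some(CompressorKind::Zstd(ZstdCompressor)),
        "lz4" => Some(CompressorKind::Lz4(Lz4Compressor)),
        _ => None,
    }
}
```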
When enums are a good fit
- A small, known set of variants
- Performance-sensitive code
- You want exhaustive handling at compile time
- You can tolerate adding a new variant by editing the enum
Reduce repeated dynamic dispatch in hot loops
The biggest performance mistake is not using a trait object once; it is using it repeatedly inside a tight loop when the implementation does not change.
Bad pattern
```rust
fn run_all(tasks: &[Box<dyn Compressor>], input: &[u8]) {
    for task in tasks {
        let _ = task.compress(input);
    }
}
```

If tasks is large and compress is small, the dispatch overhead can become noticeable.
Better pattern: group by implementation
If possible, reorganize work so each implementation handles a batch of inputs.
```rust
fn run_batch<C: Compressor>(compressor: &C, inputs: &[Vec<u8>]) {
    for input in inputs {
        let _ = compressor.compress(input);
    }
}
```

This improves locality and gives the compiler more room to optimize.
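A self-contained version of that batching idea (trait and stub restated so the block compiles alone; here run_batch collects its results so the effect is observable):

```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> { input.to_vec() } // stub
}

// Monomorphized per compressor type: the inner call can be inlined,
// and the whole batch runs with zero dynamic dispatch.
fn run_batch<C: Compressor>(compressor: &C, inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    inputs.iter().map(|input| compressor.compress(input)).collect()
}
```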
Another option: move dispatch outside the loop
If the implementation is selected once, dispatch once and then call a specialized function.
```rust
fn process_with_zstd(input: &[u8]) -> Vec<u8> {
    input.to_vec() // placeholder: a real implementation would run zstd
}

fn process_with_lz4(input: &[u8]) -> Vec<u8> {
    input.to_vec() // placeholder: a real implementation would run lz4
}

enum Mode {
    Zstd,
    Lz4,
}

fn process(mode: Mode, input: &[u8]) -> Vec<u8> {
    match mode {
        Mode::Zstd => process_with_zstd(input),
        Mode::Lz4 => process_with_lz4(input),
    }
}
```

This pattern is especially useful when the selected path is reused many times.
Be careful with Box<dyn Trait> in data structures
Storing trait objects in collections is convenient, but it adds both dispatch overhead and pointer indirection. Each element is typically heap-allocated when boxed, which can hurt cache locality.
Common tradeoff
- Vec<Box<dyn Trait>>: flexible, but scattered allocations and indirect calls
- Vec<Enum>: compact, cache-friendly, and often faster
- Vec<T>: fastest when all elements share the same type
If you need polymorphism in a collection, ask whether the collection is performance-critical. If it is, an enum often performs better.
Example comparison
| Structure | Pros | Cons |
|---|---|---|
| Vec<Box<dyn Trait>> | Flexible, extensible | Heap allocation per item, indirect calls |
| Vec<Enum> | Compact, fast dispatch via match | Fixed set of variants |
| Vec<T> | Best locality and optimization | Single concrete type only |
A good compromise is to keep trait objects at the edges of your system and use concrete types internally.
Use trait object boundaries intentionally
A practical performance strategy is to isolate dynamic dispatch at coarse-grained boundaries rather than fine-grained inner loops.
Good boundary placement
- Configuration loading
- Dependency injection
- Top-level orchestration
- Plugin registration
- Request routing
Poor boundary placement
- Per-element processing in a tight loop
- Inner numeric kernels
- Token-by-token parsing
- Per-byte transformations
The more frequently a call is executed, the more important it is to make it statically dispatchable.
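One way to sketch that boundary placement (the Codec trait, Identity type, and orchestrate function are hypothetical): the trait object is called once per batch at the coarse boundary, while the hot per-element work happens in a concrete, inlineable function inside.

```rust
// Coarse-grained boundary: one virtual call covers a whole batch.
trait Codec {
    fn encode_all(&self, inputs: &[Vec<u8>]) -> Vec<Vec<u8>>;
}

struct Identity;

// Hot inner kernel: concrete and fully inlineable.
fn encode_batch(inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    inputs.iter().cloned().collect() // stub transformation
}

impl Codec for Identity {
    fn encode_all(&self, inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
        encode_batch(inputs) // the dyn call is amortized over the batch
    }
}

// The orchestrator pays one vtable lookup per batch, not per element.
fn orchestrate(codec: &dyn Codec, inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    codec.encode_all(inputs)
}
```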
Measure before and after
Dynamic dispatch overhead is real, but it is not always the bottleneck. In many cases, memory allocation, cache misses, or algorithmic complexity dominate.
Use benchmarking to validate your assumptions. Compare:
- A trait object version
- A generic version
- An enum-based version
If the difference is small, keep the clearer design. If the hot path is affected, refactor the boundary.
What to look for
- Reduced CPU cycles per operation
- Improved branch prediction
- Better inlining in generated code
- Lower instruction count
- Reduced heap traffic if boxing is removed
A microbenchmark that isolates the dispatch cost can be useful, but also test the real workload. Sometimes the dispatch cost disappears once the function does meaningful work.
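As a rough sketch of such a microbenchmark using only the standard library (the Step trait and AddOne type are hypothetical; std::hint::black_box keeps the optimizer from deleting the loops): for serious measurement, a harness such as Criterion is a better choice than raw Instant timing.

```rust
use std::hint::black_box;
use std::time::Instant;

trait Step {
    fn apply(&self, x: u64) -> u64;
}

struct AddOne;
impl Step for AddOne {
    fn apply(&self, x: u64) -> u64 {
        x.wrapping_add(1)
    }
}

// Dynamic dispatch: one indirect call per iteration.
fn run_dyn(step: &dyn Step, iters: u64) -> u64 {
    let mut acc = 0u64;
    for _ in 0..iters {
        acc = step.apply(black_box(acc));
    }
    acc
}

// Static dispatch: the call can be inlined into the loop body.
fn run_static<S: Step>(step: &S, iters: u64) -> u64 {
    let mut acc = 0u64;
    for _ in 0..iters {
        acc = step.apply(black_box(acc));
    }
    acc
}

// Naive timing; numbers from a single run are indicative at best.
fn time_both(iters: u64) -> (u128, u128) {
    let t = Instant::now();
    let a = run_dyn(&AddOne, iters);
    let dyn_ns = t.elapsed().as_nanos();

    let t = Instant::now();
    let b = run_static(&AddOne, iters);
    let static_ns = t.elapsed().as_nanos();

    assert_eq!(a, b); // sanity check: both paths compute the same result
    (dyn_ns, static_ns)
}
```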
Practical refactoring checklist
If you suspect dynamic dispatch is slowing down a Rust workload, work through this checklist:
- Identify the hot path
  - Use profiling to find where time is spent.
- Check call frequency
  - A trait object called once per request is usually fine.
  - A trait object called inside a million-iteration loop is a candidate for optimization.
- Ask whether the type set is fixed
  - If yes, prefer an enum.
- Ask whether the caller knows the type
  - If yes, prefer generics.
- Move dispatch outward
  - Dispatch once, then call specialized code repeatedly.
- Avoid boxing unless necessary
  - Boxing adds allocation and indirection.
- Re-benchmark after changes
  - Confirm the refactor actually helps.
A decision guide
| Situation | Recommended approach |
|---|---|
| Single known type | Generic function |
| Small fixed set of types | Enum dispatch |
| Runtime plugin architecture | Trait object |
| Performance-critical inner loop | Generic or enum |
| Public API with hidden implementation | Trait object at the boundary, concrete types inside |
This is the core design principle: keep flexibility where it matters, and keep the hot path concrete where it counts.
Conclusion
Dynamic dispatch is a useful tool in Rust, but it is not free. In performance-sensitive code, repeated calls through dyn Trait can limit inlining, add indirect branches, and reduce optimization opportunities. The best optimization is often structural: use generics when the type is known, use enums when the set is small, and reserve trait objects for boundaries where runtime flexibility is worth the cost.
If you treat trait objects as an architectural choice rather than a default, you can preserve Rust’s ergonomics while still getting excellent performance in critical paths.
