What dynamic dispatch costs in Rust

When you call a method through a trait object, Rust must resolve the target implementation at runtime using a vtable. That means the compiler cannot fully specialize the call site the way it can with generics.

In practical terms, dynamic dispatch can introduce:

  • An indirect function call
  • Reduced inlining opportunities
  • Less optimization across the call boundary
  • Potential branch prediction misses in tight loops

For most application code, this is fine. But if the call happens millions of times per second, the overhead can become measurable.

Static dispatch vs dynamic dispatch

| Approach | Example | Cost profile | Best use case |
| --- | --- | --- | --- |
| Static dispatch | fn run<T: Worker>(w: T) | Monomorphized, inlineable, usually fastest | Hot paths, known types |
| Dynamic dispatch | fn run(w: &dyn Worker) | Runtime vtable lookup, indirect call | Plugin systems, heterogeneous collections |
| Enum dispatch | enum WorkerKind { A(A), B(B) } | Compile-time branching, often fast | Small fixed set of implementations |

The key question is not “Is trait object dispatch slow?” but “Is the dynamic flexibility worth the runtime cost in this specific path?”


When trait objects are the right choice

Trait objects are often the cleanest solution when you need one or more of these:

  • Heterogeneous collections of behavior
  • Runtime selection of implementations
  • Plugin or extension architectures
  • Reduced code size compared to heavy monomorphization
  • API boundaries where concrete types should remain hidden

For example, a logging subsystem may accept Box<dyn LogSink> because the overhead is tiny compared to I/O. A parser running in a tight loop over millions of tokens is a different story.
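As a hedged sketch of that kind of boundary (LogSink, BufferSink, and Logger are hypothetical names, and the in-memory buffer stands in for real I/O):

```rust
// Hypothetical sketch of a logging boundary where one vtable call per
// line is negligible next to the I/O the sink performs.
trait LogSink {
    fn write_line(&mut self, line: &str);
}

// Collects lines in memory; a real sink would write to a file or socket.
struct BufferSink {
    lines: Vec<String>,
}

impl LogSink for BufferSink {
    fn write_line(&mut self, line: &str) {
        self.lines.push(line.to_string());
    }
}

// The concrete sink type stays hidden behind the trait object, so callers
// can swap sinks at runtime without recompiling.
struct Logger {
    sink: Box<dyn LogSink>,
}

impl Logger {
    fn log(&mut self, msg: &str) {
        self.sink.write_line(msg); // one indirect call per log line
    }
}
```

The indirect call happens once per log line, so the dispatch cost is lost in the noise of the write itself.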

A simple example

trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
struct Lz4Compressor;

impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        // Stub body for illustration; a real implementation would run the
        // zstd encoder here.
        input.to_vec()
    }
}

impl Compressor for Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        // Stub body for illustration.
        input.to_vec()
    }
}

fn process(compressor: &dyn Compressor, data: &[u8]) -> Vec<u8> {
    compressor.compress(data)
}

This is readable and flexible. If process is called occasionally, the overhead is irrelevant. If it is called in a hot loop, you should consider alternatives.


Prefer generics when the caller can choose the type

If the implementation type is known at compile time, use generics. This lets the compiler inline the method call and optimize across boundaries.

trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

fn process<C: Compressor>(compressor: &C, data: &[u8]) -> Vec<u8> {
    compressor.compress(data)
}

This version is usually faster because:

  • The compiler knows the concrete type at each call site
  • The method can be inlined
  • Dead code can be removed more aggressively
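The call site is where this pays off. A minimal self-contained sketch (NoopCompressor is a hypothetical stand-in implementation):

```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

// Hypothetical stand-in implementation; the body just copies its input.
struct NoopCompressor;

impl Compressor for NoopCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec()
    }
}

fn process<C: Compressor>(compressor: &C, data: &[u8]) -> Vec<u8> {
    // Monomorphized per concrete C: this call is direct and inlineable,
    // with no vtable lookup.
    compressor.compress(data)
}

// The caller names the concrete type, so the compiler generates
// process::<NoopCompressor> and can optimize straight through it.
fn roundtrip(data: &[u8]) -> Vec<u8> {
    process(&NoopCompressor, data)
}
```

Each distinct concrete type gets its own copy of process, which is exactly the code-size-for-speed trade monomorphization makes.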

Rule of thumb

Use generics when:

  • The caller chooses the type
  • Performance matters
  • You do not need to store different implementations in one collection

Use trait objects when:

  • The callee must accept multiple unrelated implementations at runtime
  • You need type erasure
  • You are crossing an API boundary

Replace trait objects with enums when the set is small and fixed

If you only have a few implementations, an enum can be faster than a trait object and still keep the code ergonomic.

trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
struct Lz4Compressor;

impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec()
    }
}

impl Compressor for Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        input.to_vec()
    }
}

enum CompressorKind {
    Zstd(ZstdCompressor),
    Lz4(Lz4Compressor),
}

impl Compressor for CompressorKind {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        match self {
            CompressorKind::Zstd(c) => c.compress(input),
            CompressorKind::Lz4(c) => c.compress(input),
        }
    }
}

This avoids vtable dispatch. The compiler can often optimize the match very well, especially when the enum is used in a hot loop.

When enums are a good fit

  • A small, known set of variants
  • Performance-sensitive code
  • You want exhaustive handling at compile time
  • You can tolerate adding a new variant by editing the enum
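A compact, self-contained usage sketch of the pattern, driving the enum over a mixed collection (the compressor bodies are stubs that just copy their input):

```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

struct ZstdCompressor;
struct Lz4Compressor;
impl Compressor for ZstdCompressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> { input.to_vec() }
}
impl Compressor for Lz4Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8> { input.to_vec() }
}

enum CompressorKind {
    Zstd(ZstdCompressor),
    Lz4(Lz4Compressor),
}

impl Compressor for CompressorKind {
    fn compress(&self, input: &[u8]) -> Vec<u8> {
        match self {
            CompressorKind::Zstd(c) => c.compress(input),
            CompressorKind::Lz4(c) => c.compress(input),
        }
    }
}

fn total_output(kinds: &[CompressorKind], input: &[u8]) -> usize {
    // Each iteration branches on the discriminant instead of chasing a
    // vtable pointer, and the Vec stores its variants inline.
    kinds.iter().map(|k| k.compress(input).len()).sum()
}
```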

Reduce repeated dynamic dispatch in hot loops

The biggest performance mistake is not using a trait object once; it is using it repeatedly inside a tight loop when the implementation does not change.

Bad pattern

fn run_all(tasks: &[Box<dyn Compressor>], input: &[u8]) {
    for task in tasks {
        let _ = task.compress(input);
    }
}

If tasks is large and compress is small, the dispatch overhead can become noticeable.

Better pattern: group by implementation

If possible, reorganize work so each implementation handles a batch of inputs.

fn run_batch<C: Compressor>(compressor: &C, inputs: &[Vec<u8>]) {
    for input in inputs {
        let _ = compressor.compress(input);
    }
}

This improves locality and gives the compiler more room to optimize.
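One way to sketch that grouping, under assumed names (Kind, run_all, and the A/B stub compressors are illustrative): partition the mixed jobs first, then run each batch through the monomorphized loop.

```rust
trait Compressor {
    fn compress(&self, input: &[u8]) -> Vec<u8>;
}

// Hypothetical stub implementations that just copy their input.
struct A;
struct B;
impl Compressor for A {
    fn compress(&self, i: &[u8]) -> Vec<u8> { i.to_vec() }
}
impl Compressor for B {
    fn compress(&self, i: &[u8]) -> Vec<u8> { i.to_vec() }
}

enum Kind { A, B }

// Fully static inner loop: no dispatch per element.
fn run_batch<C: Compressor>(c: &C, inputs: &[Vec<u8>]) -> usize {
    inputs.iter().map(|i| c.compress(i).len()).sum()
}

fn run_all(jobs: &[(Kind, Vec<u8>)]) -> usize {
    // Group inputs by implementation first...
    let (mut for_a, mut for_b) = (Vec::new(), Vec::new());
    for (kind, input) in jobs {
        match kind {
            Kind::A => for_a.push(input.clone()),
            Kind::B => for_b.push(input.clone()),
        }
    }
    // ...then run each batch through a monomorphized loop.
    run_batch(&A, &for_a) + run_batch(&B, &for_b)
}
```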

Another option: move dispatch outside the loop

If the implementation is selected once, dispatch once and then call a specialized function.

fn process_with_zstd(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

fn process_with_lz4(input: &[u8]) -> Vec<u8> {
    input.to_vec()
}

enum Mode {
    Zstd,
    Lz4,
}

fn process(mode: Mode, input: &[u8]) -> Vec<u8> {
    match mode {
        Mode::Zstd => process_with_zstd(input),
        Mode::Lz4 => process_with_lz4(input),
    }
}

This pattern is especially useful when the selected path is reused many times.
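To actually hoist the dispatch out of the loop, match on the mode once and process the whole batch inside that arm. A self-contained sketch (the process_with_* stubs just copy their input):

```rust
// Stub bodies for illustration; real versions would compress.
fn process_with_zstd(input: &[u8]) -> Vec<u8> { input.to_vec() }
fn process_with_lz4(input: &[u8]) -> Vec<u8> { input.to_vec() }

enum Mode {
    Zstd,
    Lz4,
}

fn process_many(mode: Mode, inputs: &[Vec<u8>]) -> Vec<Vec<u8>> {
    // One branch for the entire batch; each arm's inner loop makes a
    // direct, inlineable call with no per-iteration dispatch.
    match mode {
        Mode::Zstd => inputs.iter().map(|i| process_with_zstd(i)).collect(),
        Mode::Lz4 => inputs.iter().map(|i| process_with_lz4(i)).collect(),
    }
}
```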


Be careful with Box<dyn Trait> in data structures

Storing trait objects in collections is convenient, but it adds both dispatch overhead and pointer indirection. Each element is typically heap-allocated when boxed, which can hurt cache locality.

Common tradeoff

  • Vec<Box<dyn Trait>>: flexible, but scattered allocations and indirect calls
  • Vec<Enum>: compact, cache-friendly, and often faster
  • Vec<T>: fastest when all elements share the same type

If you need polymorphism in a collection, ask whether the collection is performance-critical. If it is, an enum often performs better.

Example comparison

| Structure | Pros | Cons |
| --- | --- | --- |
| Vec<Box<dyn Trait>> | Flexible, extensible | Heap allocation per item, indirect calls |
| Vec<Enum> | Compact, fast dispatch via match | Fixed set of variants |
| Vec<T> | Best locality and optimization | Single concrete type only |

A good compromise is to keep trait objects at the edges of your system and use concrete types internally.
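The layout difference is easy to verify. In this sketch (Codec and CodecKind are hypothetical), each boxed element is a fat pointer into the heap, while the enum stores its variants inline:

```rust
use std::mem::size_of;

trait Codec {
    fn id(&self) -> u8;
}

struct Zstd;
struct Lz4;
impl Codec for Zstd { fn id(&self) -> u8 { 0 } }
impl Codec for Lz4 { fn id(&self) -> u8 { 1 } }

// Variants live inline in the enum; no per-element heap allocation.
enum CodecKind {
    Zstd(Zstd),
    Lz4(Lz4),
}

// Returns (size of a boxed trait object, size of the enum).
fn layout() -> (usize, usize) {
    (size_of::<Box<dyn Codec>>(), size_of::<CodecKind>())
}
```

A Box<dyn Codec> is two words (data pointer plus vtable pointer), and every element of a Vec<Box<dyn Codec>> lives in its own allocation; a Vec<CodecKind> is one contiguous block.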


Use trait object boundaries intentionally

A practical performance strategy is to isolate dynamic dispatch at coarse-grained boundaries rather than fine-grained inner loops.

Good boundary placement

  • Configuration loading
  • Dependency injection
  • Top-level orchestration
  • Plugin registration
  • Request routing

Poor boundary placement

  • Per-element processing in a tight loop
  • Inner numeric kernels
  • Token-by-token parsing
  • Per-byte transformations

The more frequently a call is executed, the more important it is to make it statically dispatchable.
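A sketch of coarse-grained placement, with illustrative names (Stage, Negate, pipeline): the dynamic call happens once per stage per batch, while the per-byte hot loop stays concrete inside each implementation.

```rust
trait Stage {
    fn run(&self, data: &mut Vec<u8>);
}

struct Negate;
impl Stage for Negate {
    fn run(&self, data: &mut Vec<u8>) {
        // The per-byte hot loop lives inside the concrete impl, so it is
        // fully inlineable; the dynamic call above it happens once per stage.
        for b in data.iter_mut() {
            *b = !*b;
        }
    }
}

fn pipeline(stages: &[Box<dyn Stage>], data: &mut Vec<u8>) {
    for stage in stages {
        stage.run(data); // one indirect call per stage, per batch
    }
}
```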


Measure before and after

Dynamic dispatch overhead is real, but it is not always the bottleneck. In many cases, memory allocation, cache misses, or algorithmic complexity dominate.

Use benchmarking to validate your assumptions. Compare:

  • A trait object version
  • A generic version
  • An enum-based version

If the difference is small, keep the clearer design. If the hot path is affected, refactor the boundary.

What to look for

  • Reduced CPU cycles per operation
  • Improved branch prediction
  • Better inlining in generated code
  • Lower instruction count
  • Reduced heap traffic if boxing is removed

A microbenchmark that isolates the dispatch cost can be useful, but also test the real workload. Sometimes the dispatch cost disappears once the function does meaningful work.
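A minimal stdlib timing sketch of such a comparison (a real benchmark should use a harness such as criterion, which repeats runs and defeats the optimizer; Op and AddOne are hypothetical):

```rust
use std::time::{Duration, Instant};

trait Op {
    fn apply(&self, x: u64) -> u64;
}

struct AddOne;
impl Op for AddOne {
    fn apply(&self, x: u64) -> u64 { x + 1 }
}

// Dynamic dispatch: every apply goes through the vtable.
fn dynamic_sum(op: &dyn Op, n: u64) -> u64 {
    (0..n).map(|i| op.apply(i)).sum()
}

// Static dispatch: monomorphized, so apply can be inlined.
fn static_sum<O: Op>(op: &O, n: u64) -> u64 {
    (0..n).map(|i| op.apply(i)).sum()
}

// Times both versions on the same workload; returns (dyn, static).
fn compare(n: u64) -> (Duration, Duration) {
    let t = Instant::now();
    let a = dynamic_sum(&AddOne, n);
    let dyn_time = t.elapsed();

    let t = Instant::now();
    let b = static_sum(&AddOne, n);
    let static_time = t.elapsed();

    assert_eq!(a, b); // same result; only the dispatch mechanism differs
    (dyn_time, static_time)
}
```

Note that a single timed run like this is only a rough signal; the optimizer may constant-fold trivial bodies, which is itself evidence that dispatch cost depends heavily on what the function does.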


Practical refactoring checklist

If you suspect dynamic dispatch is slowing down a Rust workload, work through this checklist:

  1. Identify the hot path
  • Use profiling to find where time is spent.
  2. Check call frequency
  • A trait object called once per request is usually fine.
  • A trait object called inside a million-iteration loop is a candidate for optimization.
  3. Ask whether the type set is fixed
  • If yes, prefer an enum.
  4. Ask whether the caller knows the type
  • If yes, prefer generics.
  5. Move dispatch outward
  • Dispatch once, then call specialized code repeatedly.
  6. Avoid boxing unless necessary
  • Boxing adds allocation and indirection.
  7. Re-benchmark after changes
  • Confirm the refactor actually helps.

A decision guide

| Situation | Recommended approach |
| --- | --- |
| Single known type | Generic function |
| Small fixed set of types | Enum dispatch |
| Runtime plugin architecture | Trait object |
| Performance-critical inner loop | Generic or enum |
| Public API with hidden implementation | Trait object at the boundary, concrete types inside |

This is the core design principle: keep flexibility where it matters, and keep the hot path concrete where it counts.


Conclusion

Dynamic dispatch is a useful tool in Rust, but it is not free. In performance-sensitive code, repeated calls through dyn Trait can limit inlining, add indirect branches, and reduce optimization opportunities. The best optimization is often structural: use generics when the type is known, use enums when the set is small, and reserve trait objects for boundaries where runtime flexibility is worth the cost.

If you treat trait objects as an architectural choice rather than a default, you can preserve Rust’s ergonomics while still getting excellent performance in critical paths.
