
Preventing Unsafe YAML Deserialization in Rust
Why YAML needs defensive handling
YAML is often used for files that influence application behavior: feature flags, access rules, job definitions, webhook targets, or environment-specific settings. If an attacker can modify such a file, or if your service accepts YAML from an external source, unsafe parsing can lead to:
- unexpected defaults being applied
- type confusion, such as strings where integers are expected
- resource exhaustion from very large or deeply nested documents
- duplicate keys overriding earlier values
- accepting fields you never intended to support
Rust’s type system helps, but only if you deserialize into a narrow schema and then validate the result. The key idea is simple: parse into a strict model first, then validate the values it holds.
Use a strict data model
The most important defense is to deserialize into a dedicated struct rather than a generic map. This gives you compile-time structure and makes it easier to reject invalid input.
```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct AppConfig {
    host: String,
    port: u16,
    tls: bool,
    allowed_origins: Vec<String>,
}
```
This model already blocks many bad inputs:
- `port` must fit into `u16`
- `tls` must be a boolean
- `allowed_origins` must be a YAML sequence of strings
If the YAML contains an unexpected shape, deserialization fails immediately.
Example: safe parsing with serde_yaml
```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct AppConfig {
    host: String,
    port: u16,
    tls: bool,
    allowed_origins: Vec<String>,
}

fn load_config(yaml: &str) -> Result<AppConfig, serde_yaml::Error> {
    serde_yaml::from_str::<AppConfig>(yaml)
}

fn main() {
    let yaml = r#"
host: "0.0.0.0"
port: 8443
tls: true
allowed_origins:
- "https://example.com"
- "https://admin.example.com"
"#;

    let config = load_config(yaml).expect("valid config");
    println!("{:?}", config);
}
```
This is the baseline pattern: define a schema, deserialize into it, and fail closed on mismatch.
Reject unknown fields
A common mistake is to allow extra fields silently. That can hide typos in configuration and create a security problem when users think a setting is active but it is ignored.
Use #[serde(deny_unknown_fields)] on configuration structs:
```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(deny_unknown_fields)]
struct AppConfig {
    host: String,
    port: u16,
    tls: bool,
    allowed_origins: Vec<String>,
}
```
With this attribute, YAML like the following is rejected:
```yaml
host: "127.0.0.1"
port: 8080
tls: true
allowed_origins: []
debug: true
```
This is especially useful for security-sensitive settings such as:
- authentication modes
- admin endpoints
- network binding addresses
- allowlists and deny rules
Validate semantic rules after deserialization
Type correctness is not enough. A value can be structurally valid and still unsafe.
For example:
- `port: 0` is technically a valid `u16`, but may not be acceptable
- `host: "0.0.0.0"` may be fine for a server, but not for a client
- `allowed_origins: ["*"]` may be too permissive
- a path may be syntactically valid but point outside an intended directory
Add a validation step after parsing:
```rust
use serde::Deserialize;
use std::net::IpAddr;

#[derive(Debug, Deserialize)]
#[serde(deny_unknown_fields)]
struct AppConfig {
    host: String,
    port: u16,
    tls: bool,
    allowed_origins: Vec<String>,
}

impl AppConfig {
    fn validate(&self) -> Result<(), String> {
        // `host` is treated as a bind address here, so it must be a literal IP.
        if self.host.parse::<IpAddr>().is_err() {
            return Err("host must be a valid IP address".into());
        }
        if self.port == 0 {
            return Err("port must be greater than zero".into());
        }
        if self.allowed_origins.is_empty() {
            return Err("allowed_origins must not be empty".into());
        }
        if self.allowed_origins.iter().any(|o| o == "*") {
            return Err("wildcard origins are not allowed".into());
        }
        Ok(())
    }
}
```
A useful pattern is to keep deserialization and validation separate:
- Deserialize into a narrow struct.
- Validate business rules.
- Convert into the runtime configuration type.
This separation makes error handling clearer and testing easier.
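The convert step can be sketched as follows. This is a minimal, dependency-free illustration rather than a prescribed design: `RawConfig` and `ServerConfig` are hypothetical names, and `RawConfig` (the same shape as `AppConfig` above) omits `#[derive(Deserialize)]` only so the sketch compiles with the standard library alone. Validation and conversion happen together in a `TryFrom` impl.

```rust
use std::convert::TryFrom;
use std::net::{IpAddr, SocketAddr};

/// Hypothetical raw shape as it comes out of the parser.
/// In a real crate this would also carry `#[derive(serde::Deserialize)]`
/// and `#[serde(deny_unknown_fields)]`; the derive is omitted here so the
/// sketch builds without external crates.
#[derive(Debug)]
pub struct RawConfig {
    pub host: String,
    pub port: u16,
    pub tls: bool,
    pub allowed_origins: Vec<String>,
}

/// Hypothetical runtime type: host and port are combined into a typed
/// `SocketAddr`, so later code never handles the raw host string.
#[derive(Debug)]
pub struct ServerConfig {
    pub bind: SocketAddr,
    pub tls: bool,
    pub allowed_origins: Vec<String>,
}

impl TryFrom<RawConfig> for ServerConfig {
    type Error = String;

    fn try_from(raw: RawConfig) -> Result<Self, Self::Error> {
        // Semantic checks that the type system alone cannot express.
        if raw.port == 0 {
            return Err("port must be greater than zero".into());
        }
        if raw.allowed_origins.is_empty() {
            return Err("allowed_origins must not be empty".into());
        }
        if raw.allowed_origins.iter().any(|o| o == "*") {
            return Err("wildcard origins are not allowed".into());
        }
        // Assumption: `host` is a bind address, so it must be a literal IP.
        let ip: IpAddr = raw
            .host
            .parse()
            .map_err(|_| format!("host {:?} is not a valid IP address", raw.host))?;

        Ok(ServerConfig {
            bind: SocketAddr::new(ip, raw.port),
            tls: raw.tls,
            allowed_origins: raw.allowed_origins,
        })
    }
}
```

Putting the checks inside `try_from` means the only path from raw input to runtime state goes through validation, and each failure produces a distinct policy error.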
Handle duplicate keys deliberately
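The YAML specification requires the keys of a mapping to be unique, but parsers do not all enforce that rule: some reject duplicate keys, while others silently keep one of the values. Because the outcome depends on the parser, security-sensitive code should not rely on “last key wins” behavior.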
For example, this document is suspicious:
```yaml
host: "127.0.0.1"
port: 8080
port: 443
tls: true
allowed_origins:
  - "https://example.com"
```
Depending on parser behavior, the second port may override the first. If your application treats YAML as a trusted internal format, that may still be a bug. If the file is user-controlled, it can become a configuration injection vector.
The safest approach is to:
- use a parser that surfaces duplicate-key behavior clearly
- test how your chosen parser handles duplicates
- reject ambiguous input in your configuration pipeline if possible
When in doubt, treat duplicate keys as invalid input and fail the load.
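Where your code builds maps itself, a reject-duplicates policy can be sketched as below. Some context: serde's derived `Deserialize` for structs already returns a `duplicate field` error when a field name repeats, but serde's standard implementation for `HashMap` inserts each entry in turn, so a repeated key silently overwrites the earlier value. The `collect_unique` helper is a hypothetical, standard-library-only illustration of the policy, not a YAML-level duplicate detector; in a custom map visitor, the same `Entry::Occupied` check could be applied to each pair returned by `MapAccess::next_entry`.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;
use std::fmt::Debug;
use std::hash::Hash;

/// Builds a map from key/value pairs, failing on the first repeated key
/// instead of letting a later entry overwrite an earlier one.
pub fn collect_unique<K, V, I>(entries: I) -> Result<HashMap<K, V>, String>
where
    K: Eq + Hash + Debug,
    I: IntoIterator<Item = (K, V)>,
{
    let mut map = HashMap::new();
    for (key, value) in entries {
        match map.entry(key) {
            // The key was seen before: treat the whole input as ambiguous.
            Entry::Occupied(e) => return Err(format!("duplicate key: {:?}", e.key())),
            Entry::Vacant(e) => {
                e.insert(value);
            }
        }
    }
    Ok(map)
}
```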
Limit document size and nesting depth
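Even if a YAML document is structurally valid, it can still be used for a denial-of-service attack through resource exhaustion. Large sequences, deeply nested mappings, or anchors and aliases that expand into many copies of the same node (the YAML variant of a “billion laughs” attack) can consume memory and CPU.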
Defensive measures include:
- reading from bounded sources
- enforcing file size limits before parsing
- rejecting documents with excessive nesting
- avoiding untrusted YAML in hot paths
A practical rule is to keep YAML for configuration, not for high-volume data exchange. If you must accept external YAML, place it behind request size limits and parse it in a controlled worker.
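A size limit can be enforced before any YAML parsing happens. The sketch below assumes a hypothetical `read_bounded` helper and an illustrative 64 KiB limit: it reads from any `std::io::Read` source (a file, a socket, a request body) and stops one byte past the limit, so an oversized input is detected rather than silently truncated. It does not bound nesting depth or alias expansion; those still depend on the parser.

```rust
use std::io::{self, Read};

/// Illustrative upper bound for a configuration document; tune per use case.
pub const MAX_CONFIG_BYTES: u64 = 64 * 1024;

/// Reads at most `limit` bytes of UTF-8 text from `reader`.
/// Returns an error if the source holds more than `limit` bytes.
pub fn read_bounded<R: Read>(reader: R, limit: u64) -> io::Result<String> {
    let mut buf = String::new();
    // Read one byte past the limit so "exactly at the limit" can be told
    // apart from "over the limit". If the cut lands inside a multi-byte
    // character, read_to_string fails with InvalidData, which is also a rejection.
    reader.take(limit.saturating_add(1)).read_to_string(&mut buf)?;
    if buf.len() as u64 > limit {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            format!("input exceeds {} bytes", limit),
        ));
    }
    Ok(buf)
}
```

For a file, a caller might use `read_bounded(File::open(path)?, MAX_CONFIG_BYTES)` and pass the resulting string to `serde_yaml::from_str`.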
Comparison of common defenses
| Risk | Recommended defense | Notes |
|---|---|---|
| Unexpected fields | #[serde(deny_unknown_fields)] | Prevents silent typos and unsupported options |
| Wrong types | Strongly typed structs | Fails fast on malformed input |
| Invalid values | Post-deserialization validation | Checks business rules and policy constraints |
| Duplicate keys | Parser-aware rejection policy | Avoid ambiguous “last key wins” behavior |
| Oversized input | Size and depth limits | Reduces memory and CPU exhaustion risk |
Prefer enums for constrained choices
Whenever a setting has a fixed set of valid values, model it as an enum instead of a string. This avoids ad hoc string comparisons and makes invalid values fail during parsing.
```rust
use serde::Deserialize;

#[derive(Debug, Deserialize)]
#[serde(rename_all = "lowercase")]
enum LogLevel {
    Error,
    Warn,
    Info,
    Debug,
}

#[derive(Debug, Deserialize)]
#[serde(deny_unknown_fields)]
struct AppConfig {
    log_level: LogLevel,
}
```
With this model, `log_level: verbose` is rejected automatically. That is better than accepting a string and interpreting unknown values later.
Enums are especially useful for:
- authentication providers
- deployment modes
- retry policies
- transport protocols
- feature toggles with a small valid set
Use custom deserializers for sensitive fields
Sometimes a field needs stricter parsing than the default type provides. For example, you may want to accept only specific host formats, validated URLs, or non-empty strings.
A custom deserializer lets you enforce those rules at the boundary.
```rust
use serde::de::Error;
use serde::{Deserialize, Deserializer};

/// Deserializes a `String` and rejects values that are empty or whitespace-only.
fn non_empty_string<'de, D>(deserializer: D) -> Result<String, D::Error>
where
    D: Deserializer<'de>,
{
    let s = String::deserialize(deserializer)?;
    if s.trim().is_empty() {
        return Err(D::Error::custom("value must not be empty"));
    }
    Ok(s)
}

#[derive(Debug, Deserialize)]
#[serde(deny_unknown_fields)]
struct AppConfig {
    #[serde(deserialize_with = "non_empty_string")]
    host: String,
}
```
This pattern is useful when a field must satisfy a narrow policy, such as:
- non-empty identifiers
- validated hostnames
- restricted path formats
- normalized email addresses
Keep custom deserializers small and focused. If the logic becomes complex, move it into a dedicated validation function.
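A related option, if the same rule applies to several fields, is a validated newtype. serde's container attribute `#[serde(try_from = "String")]` deserializes the inner `String` first and then calls your `TryFrom` impl, so invalid values still fail during parsing. In this sketch `NonEmptyString` is a hypothetical name, and the serde attributes appear only in a comment so the code builds with the standard library alone.

```rust
use std::convert::TryFrom;

/// Hypothetical validated wrapper: a `NonEmptyString` can only be built
/// from a string containing at least one non-whitespace character.
// With serde available, the same rule can run during parsing by adding:
//     #[derive(Deserialize)]
//     #[serde(try_from = "String")]
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NonEmptyString(String);

impl TryFrom<String> for NonEmptyString {
    // serde's `try_from` attribute requires an error type that implements Display.
    type Error = String;

    fn try_from(s: String) -> Result<Self, Self::Error> {
        if s.trim().is_empty() {
            Err("value must not be empty".to_string())
        } else {
            Ok(NonEmptyString(s))
        }
    }
}

impl NonEmptyString {
    /// Read-only access; the inner value cannot be replaced with an empty string.
    pub fn as_str(&self) -> &str {
        &self.0
    }
}
```

Because the check lives in the type, every field declared as `NonEmptyString` gets the same policy without repeating `deserialize_with`.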
Treat YAML as configuration, not as a general-purpose object format
A frequent security mistake is to use YAML as a flexible interchange format for arbitrary application objects. That encourages broad schemas, dynamic maps, and permissive parsing.
Instead:
- define a dedicated config schema for each use case
- avoid `serde_yaml::Value` unless you truly need dynamic structure
- keep untrusted YAML away from privileged operations
- convert parsed data into a safer internal representation immediately
Using serde_yaml::Value can be acceptable for inspection or tooling, but it should not be your default for production configuration loading. The more generic the representation, the more validation burden you carry.
Build a robust loading pipeline
A secure YAML loading pipeline usually looks like this:
- Read input from a bounded source.
- Parse into a strict Rust struct.
- Reject unknown fields.
- Validate semantic constraints.
- Convert into runtime state.
- Log only sanitized error details.
Here is a practical example:
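```rust
use serde::Deserialize;
use std::fs::File;
use std::io::Read;
use std::path::Path;

// Upper bound for a config file; adjust to your needs.
const MAX_CONFIG_BYTES: u64 = 64 * 1024;

#[derive(Debug, Deserialize)]
#[serde(deny_unknown_fields)]
struct AppConfig {
    host: String,
    port: u16,
    tls: bool,
    allowed_origins: Vec<String>,
}

impl AppConfig {
    fn validate(&self) -> Result<(), String> {
        if self.port == 0 {
            return Err("port must be greater than zero".into());
        }
        if self.allowed_origins.is_empty() {
            return Err("allowed_origins cannot be empty".into());
        }
        Ok(())
    }
}

fn load_config(path: impl AsRef<Path>) -> Result<AppConfig, String> {
    // Read from a bounded source: stop one byte past the limit so that
    // oversized files are detected instead of silently truncated.
    let file = File::open(path).map_err(|e| format!("read error: {e}"))?;
    let mut content = String::new();
    file.take(MAX_CONFIG_BYTES + 1)
        .read_to_string(&mut content)
        .map_err(|e| format!("read error: {e}"))?;
    if content.len() as u64 > MAX_CONFIG_BYTES {
        return Err(format!("config exceeds {MAX_CONFIG_BYTES} bytes"));
    }
    // Parse into the strict struct; unknown fields are rejected here.
    let config: AppConfig =
        serde_yaml::from_str(&content).map_err(|e| format!("parse error: {e}"))?;
    // Enforce semantic constraints.
    config.validate()?;
    Ok(config)
}
```
This design keeps the failure modes explicit. Parsing errors and policy errors are separate, which is helpful for both debugging and security reviews.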
Practical checklist
Before shipping YAML-based configuration parsing, verify the following:
- The input is parsed into a dedicated struct, not a generic map.
- Unknown fields are rejected.
- Required fields are mandatory, not inferred from defaults unless intended.
- Values are validated after deserialization.
- Duplicate-key behavior is understood and tested.
- Input size is bounded.
- Deeply nested or attacker-controlled YAML is not accepted casually.
- Error messages are useful but do not expose unnecessary internal details.
If you follow these rules, YAML can remain a convenient configuration format without becoming a weak point in your application’s security posture.
