Understanding the Risks

When user input is used to construct file paths or system commands, it opens the door for attackers to manipulate the application's behavior. For example, an attacker might provide a malicious string like ../../etc/passwd to access restricted files. Similarly, unvalidated user input in system calls can lead to command injection attacks.

To mitigate these risks, developers should:

  • Avoid direct string concatenation when constructing file paths.
  • Sanitize and validate all user input.
  • Use secure functions and libraries that prevent common pitfalls.

Secure File Handling with os and pathlib

Python’s os and pathlib modules provide utilities for file system operations, but they must be used carefully. The pathlib module is preferred for its object-oriented approach and built-in safety features.

from pathlib import Path
import os

def safe_read_file(user_input):
    base_dir = Path("/safe/directory")
    user_path = Path(user_input)

    # Ensure the resolved path is within the base directory
    full_path = base_dir / user_path
    if not full_path.resolve().is_relative_to(base_dir):
        raise ValueError("Access denied: Path traversal attempt detected")

    if not full_path.is_file():
        raise FileNotFoundError(f"File {full_path} not found")

    with full_path.open("r") as f:
        return f.read()

In the example above, is_relative_to() ensures that the resolved path is within the intended directory, preventing path traversal attacks. This is more secure than using os.path functions like os.path.abspath() without additional checks.

Input Validation and Normalization

Validating and normalizing user input is critical for preventing injection and other attacks. The re module can be used for pattern matching, but it’s important to use it correctly.

import re

def validate_username(username):
    # Allow only alphanumeric characters and underscores
    if not re.fullmatch(r"^[a-zA-Z0-9_]+$", username):
        raise ValueError("Invalid username: Only alphanumeric characters and underscores are allowed")
    return username.lower()

This function ensures that the username conforms to a safe pattern and is normalized to lowercase, reducing the risk of case-based injection or duplicate entries.

Secure Deserialization and Data Unmarshalling

Deserialization of untrusted data can lead to arbitrary code execution. Avoid using pickle for untrusted data. Instead, use safer serialization formats like JSON.

import json

def safe_deserialize(data):
    try:
        return json.loads(data)
    except json.JSONDecodeError:
        raise ValueError("Invalid JSON data")

Always validate the structure and content of deserialized data to ensure it conforms to expected types and constraints.

Using Input Whitelisting for Command Arguments

When passing user input to system commands, use whitelisting and parameterized execution to prevent command injection.

import subprocess

def run_safe_command(command, args):
    allowed_commands = {"ls", "grep", "cat"}
    if command not in allowed_commands:
        raise ValueError("Command not allowed")

    # Normalize and sanitize arguments
    safe_args = [str(arg).strip() for arg in args]
    subprocess.run([command] + safe_args, check=True)

This example ensures that only pre-approved commands are executed and that arguments are sanitized before use.

Summary of Best Practices

PracticeDescription
Avoid direct path concatenationUse pathlib or os.path.join() to prevent path injection.
Validate all user inputUse regex or schema validation to ensure input conforms to expected patterns.
Use whitelisting for commandsOnly allow predefined commands to be executed.
Prefer JSON over pickleUse JSON for safe data serialization and deserialization.
Normalize inputConvert input to a standard form (e.g., lowercase) to avoid case-based issues.

Learn more with useful resources