
# Getting Started with Rust: Building a Reliable CSV Summarizer
## What we are building

The goal is a command-line tool that:

- reads a CSV file from disk
- expects a header row with columns such as `name`, `department`, and `salary`
- computes summary statistics like row count and average salary
- reports malformed rows clearly instead of failing silently

This is a useful starter project because it demonstrates a common pattern in Rust development: parse input, validate it, transform it, and return a clean result.
## Example input

```text
name,department,salary
Ava,Engineering,92000
Noah,Sales,78000
Mia,Engineering,101000
```

## Example output

```text
Rows processed: 3
Departments: 2
Average salary: 90333.33
Highest salary: 101000
```

## Create the project
Start with a new binary crate:
```shell
cargo new csv_summarizer
cd csv_summarizer
```

For this tutorial, we will use one external crate to simplify CSV parsing. Add the dependency to Cargo.toml:

```toml
[dependencies]
csv = "1"
```

The standard library can read files, but the csv crate handles quoting, delimiters, headers, and edge cases correctly. That matters because CSV is deceptively tricky: embedded commas, escaped quotes, and missing fields are common in production data.
## Define the data model

Before writing parsing logic, define a structure for each row. This keeps the code readable and makes validation explicit.

Create src/main.rs:

```rust
use std::collections::HashSet;
use std::error::Error;
use std::fs::File;

#[derive(Debug)]
struct Employee {
    name: String,
    department: String,
    salary: u32,
}
```

This struct models the data we expect from each row. Using `u32` for salary is a simple choice for a tutorial, but in real applications you may want `u64`, `Decimal`, or a domain-specific money type.
## Read and validate CSV rows

Now implement a parser that reads the file and converts each record into an Employee.

```rust
fn parse_employee_record(record: &csv::StringRecord) -> Result<Employee, String> {
    let name = record.get(0).ok_or("missing name")?.trim().to_string();
    let department = record.get(1).ok_or("missing department")?.trim().to_string();
    let salary_str = record.get(2).ok_or("missing salary")?.trim();

    if name.is_empty() {
        return Err("name cannot be empty".into());
    }
    if department.is_empty() {
        return Err("department cannot be empty".into());
    }

    let salary = salary_str
        .parse::<u32>()
        .map_err(|_| format!("invalid salary: {salary_str}"))?;

    Ok(Employee {
        name,
        department,
        salary,
    })
}
```

This function returns `Result<Employee, String>`, which is a practical way to separate valid records from invalid ones. In a real tool, you might return a richer error type, but a string is enough for a first pass.
## Why validate manually?

CSV files often contain inconsistent data:

- empty cells
- text in numeric columns
- extra whitespace
- partially exported rows

Manual validation gives you control over how strict the tool should be. For example, you may decide to skip bad rows and continue, or stop immediately on the first error.
## Summarize the dataset

Next, compute the statistics we want to print.

```rust
fn summarize(employees: &[Employee]) {
    let row_count = employees.len();
    let mut departments = HashSet::new();
    let mut total_salary: u64 = 0;
    let mut highest_salary: Option<u32> = None;

    for employee in employees {
        departments.insert(&employee.department);
        total_salary += employee.salary as u64;
        highest_salary = match highest_salary {
            Some(current_max) if current_max > employee.salary => Some(current_max),
            _ => Some(employee.salary),
        };
    }

    let average_salary = if row_count > 0 {
        total_salary as f64 / row_count as f64
    } else {
        0.0
    };

    println!("Rows processed: {row_count}");
    println!("Departments: {}", departments.len());
    println!("Average salary: {average_salary:.2}");
    if let Some(max_salary) = highest_salary {
        println!("Highest salary: {max_salary}");
    }
}
```

This function demonstrates a few useful Rust techniques:

- `HashSet` tracks unique departments
- `Option` handles the case where there are no rows
- `u64` is used for the total to reduce overflow risk when summing many salaries
## Best practice: choose accumulator types carefully

Even if individual values fit in `u32`, totals may not. When aggregating data, use a wider type for sums and averages. This is a small habit that prevents subtle bugs later.
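A quick sketch of why the wider accumulator matters, using two values that each fit comfortably in `u32`:

```rust
fn main() {
    // Each value fits in u32, but their sum exceeds u32::MAX (~4.29 billion).
    let a: u32 = 3_000_000_000;
    let b: u32 = 2_000_000_000;

    // checked_add returns None instead of wrapping or panicking.
    assert_eq!(a.checked_add(b), None);

    // Widening to u64 before summing keeps the total exact.
    let total = a as u64 + b as u64;
    assert_eq!(total, 5_000_000_000);
    println!("total = {total}");
}
```

In debug builds a plain `a + b` would panic on overflow; in release builds it would silently wrap. Widening the accumulator avoids both outcomes.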
## Wire everything together

Now implement main to open the file, read records, and handle errors cleanly.

```rust
fn main() -> Result<(), Box<dyn Error>> {
    let path = std::env::args()
        .nth(1)
        .ok_or("usage: csv_summarizer <file.csv>")?;

    let file = File::open(&path)?;
    let mut reader = csv::Reader::from_reader(file);
    let mut employees = Vec::new();

    for (index, result) in reader.records().enumerate() {
        let record = result?;
        match parse_employee_record(&record) {
            Ok(employee) => employees.push(employee),
            Err(message) => {
                eprintln!("Skipping row {}: {}", index + 2, message);
            }
        }
    }

    summarize(&employees);
    Ok(())
}
```

A few details are worth noting:

- `std::env::args()` reads command-line arguments.
- `File::open` returns an error if the file does not exist or cannot be accessed.
- `reader.records()` iterates over data rows, excluding the header.
- `index + 2` is used because CSV row numbers are typically 1-based, and row 1 is the header.
The `?` operator keeps the code concise while still propagating unexpected errors. This is one of Rust’s strengths: the happy path stays readable, and failure handling remains explicit.
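As a standalone illustration, with a hypothetical `add_strings` helper that is not part of the summarizer, `?` turns nested error handling into early returns:

```rust
use std::num::ParseIntError;

// Parse two numbers and add them, propagating the first parse
// error with `?` instead of nested match blocks.
fn add_strings(a: &str, b: &str) -> Result<u32, ParseIntError> {
    let x: u32 = a.trim().parse()?; // returns early on parse failure
    let y: u32 = b.trim().parse()?;
    Ok(x + y)
}

fn main() {
    assert_eq!(add_strings("40", " 2"), Ok(42));
    assert!(add_strings("40", "two").is_err());
    println!("ok");
}
```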
## Run the tool

Create a sample file named employees.csv:

```text
name,department,salary
Ava,Engineering,92000
Noah,Sales,78000
Mia,Engineering,101000
,Support,50000
Liam,Finance,not-a-number
```

Run the program:

```shell
cargo run -- employees.csv
```

Expected output:

```text
Skipping row 5: name cannot be empty
Skipping row 6: invalid salary: not-a-number
Rows processed: 3
Departments: 2
Average salary: 90333.33
Highest salary: 101000
```

This behavior is often preferable in data-processing tools. A single bad row should not necessarily invalidate the entire file, especially when the file comes from a human-edited spreadsheet.
## Make the tool more robust

The basic version works, but a few improvements make it more production-friendly.
### 1. Check the header row

Right now, the code assumes the columns are in the correct order. You can validate the header before reading records:

```rust
let headers = reader.headers()?;
let expected = ["name", "department", "salary"];
// Catch missing or extra columns before comparing names;
// zip alone would silently skip the difference.
if headers.len() != expected.len() {
    return Err(format!("expected {} columns, found {}", expected.len(), headers.len()).into());
}
for (actual, expected_name) in headers.iter().zip(expected.iter()) {
    if actual != *expected_name {
        return Err(format!("unexpected header: expected {expected_name}, found {actual}").into());
    }
}
```

This is useful when the input format must be stable. If the file may evolve, consider position-independent lookup by column name instead.
### 2. Use named fields instead of indexes

Index-based parsing is simple, but it is fragile if column order changes. The csv crate also supports header-based access, which is safer for long-term maintenance.
### 3. Separate parsing from reporting

As the tool grows, keep parsing, aggregation, and output formatting in separate functions. This makes testing easier and prevents main from becoming a large block of procedural code.
## Common design choices

When building a CSV summarizer, you will usually choose one of three strategies for bad input.
| Strategy | Behavior | Best for |
|---|---|---|
| Fail fast | Stop on the first invalid row | Strict data pipelines |
| Skip invalid rows | Continue and report warnings | Human-generated exports |
| Collect all errors | Process everything, then report a full list | Data quality audits |
For a beginner project, skipping invalid rows is a good default because it demonstrates error handling without making the program unusable. For ETL jobs, collecting all errors may be more valuable.
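The "collect all errors" strategy can be sketched with a simplified row parser; `parse_salary` here stands in for the full `parse_employee_record`:

```rust
// Parse a single salary cell, mirroring the tutorial's error style.
fn parse_salary(raw: &str) -> Result<u32, String> {
    raw.trim()
        .parse::<u32>()
        .map_err(|_| format!("invalid salary: {raw}"))
}

// Validate every row, keeping valid values and the full error list.
fn split_results(rows: &[&str]) -> (Vec<u32>, Vec<String>) {
    let mut values = Vec::new();
    let mut errors = Vec::new();
    for row in rows {
        match parse_salary(row) {
            Ok(v) => values.push(v),
            Err(e) => errors.push(e),
        }
    }
    (values, errors)
}

fn main() {
    let (values, errors) = split_results(&["92000", "oops", "78000"]);
    assert_eq!(values, vec![92000, 78000]);
    assert_eq!(errors.len(), 1);
    println!("{} valid, {} invalid", values.len(), errors.len());
}
```

Because nothing is reported until all rows are processed, a data-quality audit can show the complete error list in one pass.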
## Testing the parser
Rust makes it straightforward to test parsing logic independently from file access. Add a unit test to verify that valid and invalid rows are handled correctly.
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_valid_employee_record() {
        let record = csv::StringRecord::from(vec!["Ava", "Engineering", "92000"]);
        let employee = parse_employee_record(&record).unwrap();
        assert_eq!(employee.name, "Ava");
        assert_eq!(employee.department, "Engineering");
        assert_eq!(employee.salary, 92000);
    }

    #[test]
    fn rejects_invalid_salary() {
        let record = csv::StringRecord::from(vec!["Ava", "Engineering", "abc"]);
        let error = parse_employee_record(&record).unwrap_err();
        assert!(error.contains("invalid salary"));
    }
}
```

Testing parsing logic separately is a strong habit. It lets you verify edge cases without needing fixture files or integration tests for every scenario.
## Practical extensions

Once the basic summarizer works, there are several useful directions to expand it:

- compute per-department averages
- output JSON instead of plain text
- accept a custom delimiter such as `;`
- support configurable column names
- write warnings to a log file
- add a `--strict` flag to stop on the first invalid row
These enhancements are good exercises because they build on the same core skills: input validation, iteration, aggregation, and error handling.
## Example: per-department totals

If you want to group salaries by department, use a `HashMap<String, Vec<u32>>` or `HashMap<String, (u64, u32)>` to track totals and counts. That pattern is common in Rust data-processing code and scales well as your logic becomes more complex.
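A minimal sketch of that pattern, using a hypothetical `department_averages` helper over `(department, salary)` pairs; the `(u64, u32)` accumulator follows the wide-total advice from earlier:

```rust
use std::collections::HashMap;

// Compute the average salary per department from (department, salary) pairs.
fn department_averages(rows: &[(&str, u32)]) -> HashMap<String, f64> {
    let mut totals: HashMap<String, (u64, u32)> = HashMap::new();
    for (department, salary) in rows {
        // entry() inserts a zeroed (total, count) pair on first sight.
        let entry = totals.entry(department.to_string()).or_insert((0, 0));
        entry.0 += *salary as u64;
        entry.1 += 1;
    }
    totals
        .into_iter()
        .map(|(dept, (total, count))| (dept, total as f64 / count as f64))
        .collect()
}

fn main() {
    let rows = [("Engineering", 92000), ("Sales", 78000), ("Engineering", 101000)];
    let averages = department_averages(&rows);
    assert_eq!(averages["Engineering"], 96500.0);
    assert_eq!(averages["Sales"], 78000.0);
    println!("{averages:?}");
}
```

The `entry` API keeps the accumulation to a single map lookup per row, which is the idiomatic shape for this kind of grouping in Rust.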
## Conclusion
A CSV summarizer is a compact but realistic Rust project. It teaches you how to read files, parse structured data, handle errors explicitly, and produce useful output from raw input. More importantly, it shows how Rust encourages clear boundaries between parsing, validation, and reporting.
If you continue refining this tool, focus on making the input contract explicit and the error messages actionable. Those two qualities matter in almost every developer-facing utility.
