
# Getting Started with Rust: Building a Reliable CSV Summarizer
## What we are building

The goal is a command-line tool that:

- reads a CSV file from disk
- expects a header row with columns such as `name`, `department`, and `salary`
- computes summary statistics like row count and average salary
- reports malformed rows clearly instead of failing silently

This is a useful starter project because it demonstrates a common pattern in Rust development: parse input, validate it, transform it, and return a clean result.
## Example input

```text
name,department,salary
Ava,Engineering,92000
Noah,Sales,78000
Mia,Engineering,101000
```

## Example output

```text
Rows processed: 3
Departments: 2
Average salary: 90333.33
Highest salary: 101000
```

## Create the project
Start with a new binary crate:
```shell
cargo new csv_summarizer
cd csv_summarizer
```

For this tutorial, we will use one external crate to simplify CSV parsing. Add the dependency to Cargo.toml:

```toml
[dependencies]
csv = "1"
```

The standard library can read files, but the csv crate handles quoting, delimiters, headers, and edge cases correctly. That matters because CSV is deceptively tricky: embedded commas, escaped quotes, and missing fields are common in production data.
## Define the data model

Before writing parsing logic, define a structure for each row. This keeps the code readable and makes validation explicit.

Create src/main.rs:

```rust
use std::collections::HashSet;
use std::error::Error;
use std::fs::File;

#[derive(Debug)]
struct Employee {
    name: String,
    department: String,
    salary: u32,
}
```

This struct models the data we expect from each row. Using `u32` for salary is a simple choice for a tutorial, but in real applications you may want `u64`, `Decimal`, or a domain-specific money type.
## Read and validate CSV rows

Now implement a parser that reads the file and converts each record into an Employee.

```rust
fn parse_employee_record(record: &csv::StringRecord) -> Result<Employee, String> {
    let name = record.get(0).ok_or("missing name")?.trim().to_string();
    let department = record.get(1).ok_or("missing department")?.trim().to_string();
    let salary_str = record.get(2).ok_or("missing salary")?.trim();

    if name.is_empty() {
        return Err("name cannot be empty".into());
    }
    if department.is_empty() {
        return Err("department cannot be empty".into());
    }

    let salary = salary_str
        .parse::<u32>()
        .map_err(|_| format!("invalid salary: {salary_str}"))?;

    Ok(Employee {
        name,
        department,
        salary,
    })
}
```

This function returns `Result<Employee, String>`, which is a practical way to separate valid records from invalid ones. In a real tool, you might return a richer error type, but a string is enough for a first pass.
## Why validate manually?

CSV files often contain inconsistent data:

- empty cells
- text in numeric columns
- extra whitespace
- partially exported rows

Manual validation gives you control over how strict the tool should be. For example, you may decide to skip bad rows and continue, or stop immediately on the first error.
## Summarize the dataset

Next, compute the statistics we want to print.

```rust
fn summarize(employees: &[Employee]) {
    let row_count = employees.len();
    let mut departments = HashSet::new();
    let mut total_salary: u64 = 0;
    let mut highest_salary: Option<u32> = None;

    for employee in employees {
        departments.insert(&employee.department);
        total_salary += employee.salary as u64;
        highest_salary = match highest_salary {
            Some(current_max) if current_max > employee.salary => Some(current_max),
            _ => Some(employee.salary),
        };
    }

    let average_salary = if row_count > 0 {
        total_salary as f64 / row_count as f64
    } else {
        0.0
    };

    println!("Rows processed: {row_count}");
    println!("Departments: {}", departments.len());
    println!("Average salary: {average_salary:.2}");
    if let Some(max_salary) = highest_salary {
        println!("Highest salary: {max_salary}");
    }
}
```

This function demonstrates a few useful Rust techniques:

- `HashSet` tracks unique departments
- `Option` handles the case where there are no rows
- `u64` is used for the total to reduce overflow risk when summing many salaries
## Best practice: choose accumulator types carefully

Even if individual values fit in `u32`, totals may not. When aggregating data, use a wider type for sums and averages. This is a small habit that prevents subtle bugs later.
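A quick sketch of why the wider accumulator matters, using two values that each fit comfortably in `u32`:

```rust
fn main() {
    // Each value fits in u32, but their sum exceeds u32::MAX (~4.29 billion).
    let a: u32 = 3_000_000_000;
    let b: u32 = 2_000_000_000;

    // checked_add returns None instead of wrapping or panicking.
    assert_eq!(a.checked_add(b), None);

    // Widening to u64 before summing keeps the total exact.
    let total = a as u64 + b as u64;
    assert_eq!(total, 5_000_000_000);
    println!("total = {total}");
}
```

In debug builds a plain `a + b` would panic on overflow; in release builds it would silently wrap. Widening the accumulator avoids both outcomes.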
## Wire everything together

Now implement main to open the file, read records, and handle errors cleanly.

```rust
fn main() -> Result<(), Box<dyn Error>> {
    let path = std::env::args()
        .nth(1)
        .ok_or("usage: csv_summarizer <file.csv>")?;

    let file = File::open(&path)?;
    let mut reader = csv::Reader::from_reader(file);
    let mut employees = Vec::new();

    for (index, result) in reader.records().enumerate() {
        let record = result?;
        match parse_employee_record(&record) {
            Ok(employee) => employees.push(employee),
            Err(message) => {
                eprintln!("Skipping row {}: {}", index + 2, message);
            }
        }
    }

    summarize(&employees);
    Ok(())
}
```

A few details are worth noting:

- `std::env::args()` reads command-line arguments.
- `File::open` returns an error if the file does not exist or cannot be accessed.
- `reader.records()` iterates over data rows, excluding the header.
- `index + 2` is used because CSV row numbers are typically 1-based, and row 1 is the header.
The `?` operator keeps the code concise while still propagating unexpected errors. This is one of Rust’s strengths: the happy path stays readable, and failure handling remains explicit.
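As a standalone illustration, with a hypothetical `add_strings` helper that is not part of the summarizer, `?` turns nested error handling into early returns:

```rust
use std::num::ParseIntError;

// Parse two numbers and add them, propagating the first parse
// error with `?` instead of nested match blocks.
fn add_strings(a: &str, b: &str) -> Result<u32, ParseIntError> {
    let x: u32 = a.trim().parse()?; // returns early on parse failure
    let y: u32 = b.trim().parse()?;
    Ok(x + y)
}

fn main() {
    assert_eq!(add_strings("40", " 2"), Ok(42));
    assert!(add_strings("40", "two").is_err());
    println!("ok");
}
```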
## Run the tool

Create a sample file named employees.csv:

```text
name,department,salary
Ava,Engineering,92000
Noah,Sales,78000
Mia,Engineering,101000
,Support,50000
Liam,Finance,not-a-number
```

Run the program:

```shell
cargo run -- employees.csv
```

Expected output:

```text
Skipping row 5: name cannot be empty
Skipping row 6: invalid salary: not-a-number
Rows processed: 3
Departments: 2
Average salary: 90333.33
Highest salary: 101000
```

This behavior is often preferable in data-processing tools. A single bad row should not necessarily invalidate the entire file, especially when the file comes from a human-edited spreadsheet.
## Make the tool more robust

The basic version works, but a few improvements make it more production-friendly.
### 1. Check the header row

Right now, the code assumes the columns are in the correct order. You can validate the header before reading records:

```rust
let headers = reader.headers()?;
let expected = ["name", "department", "salary"];
// Catch missing or extra columns before comparing names;
// zip alone would silently skip the difference.
if headers.len() != expected.len() {
    return Err(format!("expected {} columns, found {}", expected.len(), headers.len()).into());
}
for (actual, expected_name) in headers.iter().zip(expected.iter()) {
    if actual != *expected_name {
        return Err(format!("unexpected header: expected {expected_name}, found {actual}").into());
    }
}
```

This is useful when the input format must be stable. If the file may evolve, consider position-independent lookup by column name instead.
### 2. Use named fields instead of indexes

Index-based parsing is simple, but it is fragile if column order changes. The csv crate also supports header-based access, which is safer for long-term maintenance.
### 3. Separate parsing from reporting

As the tool grows, keep parsing, aggregation, and output formatting in separate functions. This makes testing easier and prevents main from becoming a large block of procedural code.
## Common design choices

When building a CSV summarizer, you will usually choose one of three strategies for bad input.
| Strategy | Behavior | Best for |
|---|---|---|
| Fail fast | Stop on the first invalid row | Strict data pipelines |
| Skip invalid rows | Continue and report warnings | Human-generated exports |
| Collect all errors | Process everything, then report a full list | Data quality audits |
For a beginner project, skipping invalid rows is a good default because it demonstrates error handling without making the program unusable. For ETL jobs, collecting all errors may be more valuable.
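The "collect all errors" strategy can be sketched with a simplified row parser; `parse_salary` here stands in for the full `parse_employee_record`:

```rust
// Parse a single salary cell, mirroring the tutorial's error style.
fn parse_salary(raw: &str) -> Result<u32, String> {
    raw.trim()
        .parse::<u32>()
        .map_err(|_| format!("invalid salary: {raw}"))
}

// Validate every row, keeping valid values and the full error list.
fn split_results(rows: &[&str]) -> (Vec<u32>, Vec<String>) {
    let mut values = Vec::new();
    let mut errors = Vec::new();
    for row in rows {
        match parse_salary(row) {
            Ok(v) => values.push(v),
            Err(e) => errors.push(e),
        }
    }
    (values, errors)
}

fn main() {
    let (values, errors) = split_results(&["92000", "oops", "78000"]);
    assert_eq!(values, vec![92000, 78000]);
    assert_eq!(errors.len(), 1);
    println!("{} valid, {} invalid", values.len(), errors.len());
}
```

Because nothing is reported until all rows are processed, a data-quality audit can show the complete error list in one pass.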
## Testing the parser
Rust makes it straightforward to test parsing logic independently from file access. Add a unit test to verify that valid and invalid rows are handled correctly.
```rust
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn parses_valid_employee_record() {
        let record = csv::StringRecord::from(vec!["Ava", "Engineering", "92000"]);
        let employee = parse_employee_record(&record).unwrap();
        assert_eq!(employee.name, "Ava");
        assert_eq!(employee.department, "Engineering");
        assert_eq!(employee.salary, 92000);
    }

    #[test]
    fn rejects_invalid_salary() {
        let record = csv::StringRecord::from(vec!["Ava", "Engineering", "abc"]);
        let error = parse_employee_record(&record).unwrap_err();
        assert!(error.contains("invalid salary"));
    }
}
```

Testing parsing logic separately is a strong habit. It lets you verify edge cases without needing fixture files or integration tests for every scenario.
## Practical extensions

Once the basic summarizer works, there are several useful directions to expand it:

- compute per-department averages
- output JSON instead of plain text
- accept a custom delimiter such as `;`
- support configurable column names
- write warnings to a log file
- add a `--strict` flag to stop on the first invalid row
These enhancements are good exercises because they build on the same core skills: input validation, iteration, aggregation, and error handling.
## Example: per-department totals

If you want to group salaries by department, use a `HashMap<String, Vec<u32>>` or `HashMap<String, (u64, u32)>` to track totals and counts. That pattern is common in Rust data-processing code and scales well as your logic becomes more complex.
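A minimal sketch of that pattern, using a hypothetical `department_averages` helper over `(department, salary)` pairs; the `(u64, u32)` accumulator follows the wide-total advice from earlier:

```rust
use std::collections::HashMap;

// Compute the average salary per department from (department, salary) pairs.
fn department_averages(rows: &[(&str, u32)]) -> HashMap<String, f64> {
    let mut totals: HashMap<String, (u64, u32)> = HashMap::new();
    for (department, salary) in rows {
        // entry() inserts a zeroed (total, count) pair on first sight.
        let entry = totals.entry(department.to_string()).or_insert((0, 0));
        entry.0 += *salary as u64;
        entry.1 += 1;
    }
    totals
        .into_iter()
        .map(|(dept, (total, count))| (dept, total as f64 / count as f64))
        .collect()
}

fn main() {
    let rows = [("Engineering", 92000), ("Sales", 78000), ("Engineering", 101000)];
    let averages = department_averages(&rows);
    assert_eq!(averages["Engineering"], 96500.0);
    assert_eq!(averages["Sales"], 78000.0);
    println!("{averages:?}");
}
```

The `entry` API keeps the accumulation to a single map lookup per row, which is the idiomatic shape for this kind of grouping in Rust.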
## Conclusion
A CSV summarizer is a compact but realistic Rust project. It teaches you how to read files, parse structured data, handle errors explicitly, and produce useful output from raw input. More importantly, it shows how Rust encourages clear boundaries between parsing, validation, and reporting.
If you continue refining this tool, focus on making the input contract explicit and the error messages actionable. Those two qualities matter in almost every developer-facing utility.
