Setting Up Your SQL Environment

Before writing any queries, ensure you have access to a relational database system like PostgreSQL, MySQL, or SQLite. For this tutorial, we'll use SQLite due to its lightweight and portable nature. You can download SQLite from sqlite.org or install it via a package manager.

Once installed, open the SQLite command-line tool or use a GUI client like DB Browser for SQLite. Start by creating a sample database and table:

-- Create a sample database and table
CREATE TABLE employees (
    id INTEGER PRIMARY KEY,
    first_name TEXT NOT NULL,
    last_name TEXT NOT NULL,
    department TEXT,
    salary REAL
);

Writing Your First Query

Let’s insert some data and retrieve it:

-- Insert sample data
INSERT INTO employees (first_name, last_name, department, salary)
VALUES
    ('Alice', 'Smith', 'Engineering', 85000),
    ('Bob', 'Johnson', 'Sales', 60000),
    ('Charlie', 'Williams', 'Engineering', 92000),
    ('Diana', 'Brown', 'HR', 55000);

-- Select all records
SELECT * FROM employees;

This query retrieves all rows from the employees table. While this is simple, real-world applications often require filtering, sorting, and aggregating data.

Filtering and Sorting Data

Use the WHERE and ORDER BY clauses to filter and sort results:

-- Get engineers earning more than $80,000
SELECT * FROM employees
WHERE department = 'Engineering'
  AND salary > 80000
ORDER BY salary DESC;

This query returns all engineers with a salary greater than $80,000, sorted from highest to lowest. The ORDER BY clause is optional but often used for readability and usability.

Aggregating Data

SQL provides powerful functions for summarizing data. For example, to calculate the average salary by department:

-- Average salary by department
SELECT department, AVG(salary) AS average_salary
FROM employees
GROUP BY department;

This query groups rows by department and computes the average salary for each. The AS keyword gives a name to the computed column.

Joining Tables

In real-world applications, data is often spread across multiple tables. Consider the following schema:

-- Create a departments table
CREATE TABLE departments (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);

-- Update employees to reference departments
ALTER TABLE employees
ADD COLUMN department_id INTEGER;

-- Update existing records
UPDATE employees
SET department_id = 1
WHERE department = 'Engineering';

UPDATE employees
SET department_id = 2
WHERE department = 'Sales';

UPDATE employees
SET department_id = 3
WHERE department = 'HR';

Now, we can query the data using a JOIN:

-- Join employees with departments
SELECT e.first_name, e.last_name, d.name AS department
FROM employees e
JOIN departments d ON e.department_id = d.id;

This improves normalization and makes the data easier to maintain.

Using Subqueries and CTEs

Subqueries and Common Table Expressions (CTEs) help break down complex queries. For example, to find employees earning more than the average salary in their department:

-- Using a subquery
SELECT *
FROM employees
WHERE salary > (
    SELECT AVG(salary)
    FROM employees
    WHERE department_id = employees.department_id
);

Alternatively, using a CTE for better readability:

-- Using a CTE
WITH avg_salary_per_dept AS (
    SELECT department_id, AVG(salary) AS avg_salary
    FROM employees
    GROUP BY department_id
)
SELECT e.*
FROM employees e
JOIN avg_salary_per_dept a ON e.department_id = a.department_id
WHERE e.salary > a.avg_salary;

Writing Maintainable SQL

To keep your SQL code maintainable:

  • Use aliases for tables and columns.
  • Format consistently, especially for long queries.
  • Comment when necessary, especially for complex logic.
  • Avoid SELECT * in production code—name the columns you need.

Summary of Best Practices

PracticeReason
Use explicit column names | instead of SELECT *Reduces data transfer and avoids unexpected changes
Format SQL with consistent indentationImproves readability and collaboration
Use CTEs for complex queriesEnhances clarity and modularity
Normalize database designReduces redundancy and improves data integrity

Learn more with useful resources