🧠 How Window Functions Work: The Fundamentals

This tutorial explains how SQL window functions work, including key concepts like PARTITION BY, ORDER BY, and frame clauses. You'll learn how to rank rows, calculate running totals, and apply powerful logic without collapsing your data.


πŸš€ What Are Window Functions, Really?

Window functions compute values across a set of rows that are somehow related to the current row, without collapsing the result set. They help you compare, rank, or accumulate across rows while preserving individual detail.

Unlike aggregate functions (which group and collapse data), window functions add extra insight per row.

πŸ”§ Syntax Snapshot: Core Anatomy

FUNCTION(...) OVER (
  PARTITION BY ...
  ORDER BY ...
  [ROWS or RANGE BETWEEN ...]
)

Each clause plays a specific role:

  • PARTITION BY β†’ divides the dataset into groups
  • ORDER BY β†’ defines how rows are sequenced within those groups
  • ROWS BETWEEN β†’ defines which rows to include in each calculation

πŸ” PARTITION BY Demystified

Think of PARTITION BY as creating mini subgroups within your data. Each group resets the window function logic.

🧾 Sample Data

CustomerID OrderDate Amount
C1 2023-01-01 100
C1 2023-01-10 200
C2 2023-01-01 300
C2 2023-01-15 400

πŸ”„ Comparison: With vs Without PARTITION BY

SQL example: Compare ROW_NUMBER() with and without PARTITION BY to see how row numbering behaves globally vs. within each customer’s group.
SELECT CustomerID, OrderDate, Amount,
       ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNum_All,
       ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RowNum_ByCustomer
FROM Orders;
CustomerID OrderDate Amount RowNum_All RowNum_ByCustomer
C1 2023-01-01 100 1 1
C2 2023-01-01 300 2 1
C1 2023-01-10 200 3 2
C2 2023-01-15 400 4 2

πŸ“Œ Notice how PARTITION BY resets the row numbering inside each group.

  • PARTITION BY CustomerID restarts the function for each customer
  • Without it, the window applies across all rows

πŸ“Œ Use PARTITION BY when you want results relative to a subgroup (e.g., a customer’s order rank).


πŸ“ ORDER BY Inside the Window

ORDER BY determines the sequence the function uses inside each partition.

  • Example: ORDER BY OrderDate means "oldest to newest"
  • Omitting it? You may get unpredictable or default behavior

Changing the ORDER changes the meaning. It defines what β€œfirst”, β€œlast”, or β€œnext” really means.


πŸͺŸ Understanding Frame Clauses

A frame clause controls which rows are visible to the window function for each row.

Common Frame Patterns:

  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW β†’ running total
  • ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING β†’ forward-looking window
  • ROWS BETWEEN 2 PRECEDING AND CURRENT ROW β†’ moving average (3-row)

🧾 Sample Data

OrderDate Amount
2023-01-01 100
2023-01-05 200
2023-01-10 300
2023-01-15 400

πŸ”Ž Query Example

SQL example: Use SUM() and AVG() with different frame clauses to calculate a cumulative total (past), a future-facing total, and a 3-row moving average based on OrderDate.
SELECT OrderDate, Amount,
       SUM(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal,
       SUM(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS FutureTotal,
       AVG(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg3
FROM Orders;

πŸ“Š Result

OrderDate Amount RunningTotal FutureTotal MovingAvg3
2023-01-01 100 100 1000 100.0
2023-01-05 200 300 900 150.0
2023-01-10 300 600 700 200.0
2023-01-15 400 1000 400 300.0

ROWS BETWEEN vs RANGE BETWEEN

  • ROWS = includes a specific number of rows relative to the current row
  • RANGE = includes rows based on values being equal to the current row’s ORDER BY value

Example:

If OrderDate has duplicates:

OrderDate Amount
2023-01-01 100
2023-01-01 200
2023-01-02 300

Using ROWS BETWEEN CURRENT ROW AND CURRENT ROW only includes one row. Using RANGE BETWEEN CURRENT ROW AND CURRENT ROW includes both rows with 2023-01-01.

πŸ” Use RANGE when you want to include all rows with matching ORDER BY values.

πŸ“Œ ROWS is more precise for counting.
πŸ“Œ RANGE is better for grouped value comparisons.

Default behavior varies by function:

  • Ranking functions often don’t need explicit frames
  • LAST_VALUE() usually does


πŸ“¦ Why Use CTEs with Window Functions?

A Common Table Expression (CTE) is often necessary when you want to filter or manipulate the results of a window function.

Why?

You can't directly reference a window function result in the WHERE clause or reuse it in the same SELECT β€” SQL doesn't allow that. But by putting it in a CTE, you can treat the computed column as a normal field.

🧾 Example

Find customers with the top 10% order amounts:

SQL example: Use a CTE and PERCENT_RANK() to rank orders within each region, then filter for the top 10% of rows by amount.
WITH Ranked AS (
  SELECT *,
         PERCENT_RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS pr
  FROM Orders
)
SELECT * FROM Ranked WHERE pr <= 0.10;

πŸ“Œ The window function runs inside the CTE, and the outer query can safely filter on it.


⚠️ Common Mistakes and How to Spot Them

  • ❌ Forgetting ORDER BY β†’ RANK() will return 1 for all rows. Without ordering, many functions treat all rows as ties.
  • ❌ Applying filters in WHERE after window logic β†’ may exclude needed rows. You must use CTEs or subqueries to filter after computing window results.
  • ❌ Expecting default behavior with frames β†’ especially LAST_VALUE() surprises. If you don’t specify a frame, you might get unexpected results like the current row instead of the actual last row.
  • ❌ Using aggregates when a window function was needed (collapses rows!) β†’ Aggregates like SUM() without OVER() eliminate row detail; window versions keep all rows.

πŸ’‘ Tips for Learning and Debugging

  • Use ROW_NUMBER() to test sorting logic
  • Always specify ORDER BY, even when optional
  • Try SELECT * alongside your window to verify inputs
  • Start small (2-5 rows) to visually trace the function

Bootstrap JS is working!