🧠 How Window Functions Work: The Fundamentals


πŸš€ What Are Window Functions, Really?

Window functions compute values across a set of rows that are somehow related to the current row, without collapsing the result set. They help you compare, rank, or accumulate across rows while preserving individual detail.

Unlike aggregate functions (which group and collapse data), window functions add extra insight per row.

πŸ”§ Syntax Snapshot: Core Anatomy

FUNCTION(...) OVER (
  PARTITION BY ...
  ORDER BY ...
  [ROWS or RANGE BETWEEN ...]
)

Each clause plays a specific role:

πŸ” PARTITION BY Demystified

Think of PARTITION BY as creating mini subgroups within your data. Each group resets the window function logic.

🧾 Sample Data

CustomerID OrderDate Amount
C1 2023-01-01 100
C1 2023-01-10 200
C2 2023-01-01 300
C2 2023-01-15 400

πŸ”„ Comparison: With vs Without PARTITION BY

SELECT CustomerID, OrderDate, Amount,
       ROW_NUMBER() OVER (ORDER BY OrderDate) AS RowNum_All,
       ROW_NUMBER() OVER (PARTITION BY CustomerID ORDER BY OrderDate) AS RowNum_ByCustomer
FROM Orders;
CustomerID OrderDate Amount RowNum_All RowNum_ByCustomer
C1 2023-01-01 100 1 1
C2 2023-01-01 300 2 1
C1 2023-01-10 200 3 2
C2 2023-01-15 400 4 2

πŸ“Œ Notice how PARTITION BY resets the row numbering inside each group.

πŸ“Œ Use PARTITION BY when you want results relative to a subgroup (e.g., a customer’s order rank).


πŸ“ ORDER BY Inside the Window

ORDER BY determines the sequence the function uses inside each partition.

Changing the ORDER changes the meaning. It defines what β€œfirst”, β€œlast”, or β€œnext” really means.


πŸͺŸ Understanding Frame Clauses

A frame clause controls which rows are visible to the window function for each row.

Common Frame Patterns:

🧾 Sample Data

OrderDate Amount
2023-01-01 100
2023-01-05 200
2023-01-10 300
2023-01-15 400

πŸ”Ž Query Example

SELECT OrderDate, Amount,
       SUM(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS RunningTotal,
       SUM(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS FutureTotal,
       AVG(Amount) OVER (ORDER BY OrderDate
                         ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) AS MovingAvg3
FROM Orders;

πŸ“Š Result

OrderDate Amount RunningTotal FutureTotal MovingAvg3
2023-01-01 100 100 1000 100.0
2023-01-05 200 300 900 150.0
2023-01-10 300 600 700 200.0
2023-01-15 400 1000 400 300.0

ROWS BETWEEN vs RANGE BETWEEN

Example:

If OrderDate has duplicates:

OrderDate Amount
2023-01-01 100
2023-01-01 200
2023-01-02 300

Using ROWS BETWEEN CURRENT ROW AND CURRENT ROW only includes one row. Using RANGE BETWEEN CURRENT ROW AND CURRENT ROW includes both rows with 2023-01-01.

πŸ” Use RANGE when you want to include all rows with matching ORDER BY values.

πŸ“Œ ROWS is more precise for counting.
πŸ“Œ RANGE is better for grouped value comparisons.

Default behavior varies by function:



πŸ“¦ Why Use CTEs with Window Functions?

A Common Table Expression (CTE) is often necessary when you want to filter or manipulate the results of a window function.

Why?

You can't directly reference a window function result in the WHERE clause or reuse it in the same SELECT β€” SQL doesn't allow that. But by putting it in a CTE, you can treat the computed column as a normal field.

🧾 Example

Find customers with the top 10% order amounts:

WITH Ranked AS (
  SELECT *,
         PERCENT_RANK() OVER (PARTITION BY Region ORDER BY Amount DESC) AS pr
  FROM Orders
)
SELECT * FROM Ranked WHERE pr <= 0.10;

πŸ“Œ The window function runs inside the CTE, and the outer query can safely filter on it.


⚠️ Common Mistakes and How to Spot Them


πŸ’‘ Tips for Learning and Debugging