"Xorthax isn’t just interested in who’s first—he wants to know how far ahead they are in the galactic rankings. Whether it’s finding percentiles, calculating distributions, or creating tiered categories, these functions do the trick."
These functions calculate relative standing within a partition:
PERCENT_RANK()
gives a value from 0 to 1 indicating a row’s percentile rankCUME_DIST()
gives a value from 0 to 1 indicating the cumulative distributionNTILE(n)
splits rows into n
buckets (e.g. quartiles, deciles)Each helps understand "how far along" a row is within its ordered group.
PERCENT_RANK() OVER (PARTITION BY group ORDER BY metric)
CUME_DIST() OVER (PARTITION BY group ORDER BY metric)
NTILE(4) OVER (PARTITION BY group ORDER BY metric)
CustomerID | Region | OrderAmount |
---|---|---|
C1 | North | 100 |
C2 | North | 200 |
C3 | North | 200 |
C4 | North | 400 |
SELECT CustomerID, Region, OrderAmount,
PERCENT_RANK() OVER (PARTITION BY Region ORDER BY OrderAmount) AS PercentRank,
CUME_DIST() OVER (PARTITION BY Region ORDER BY OrderAmount) AS CumeDist,
NTILE(3) OVER (PARTITION BY Region ORDER BY OrderAmount) AS Tier
FROM Orders;
CustomerID | OrderAmount | PercentRank | CumeDist | Tier |
---|---|---|---|---|
C1 | 100 | 0.0 | 0.25 | 1 |
C2 | 200 | 0.333 | 0.50 | 2 |
C3 | 200 | 0.333 | 0.75 | 2 |
C4 | 400 | 1.0 | 1.0 | 3 |
🔍 How SQL Calculates These:
PERCENT_RANK()
is based on the formula: (rank - 1) / (total_rows - 1)
. Rows with the same value share a rank, so C2 and C3 have the same PercentRank.CUME_DIST()
reflects the proportion of rows less than or equal to the current row. For example, C3 is the 3rd of 4 rows (≤ 200), so its CUME_DIST is 0.75.NTILE(3)
divides the rows as evenly as possible into 3 buckets. With 4 rows, SQL assigns 1 row to the first and third groups, and 2 rows to the middle group—starting with the lowest values and filling tiers in order.PERCENT_RANK()
to answer: "What percentile is this row in?"CUME_DIST()
to answer: "What portion of the group is less than or equal to this row?"NTILE(n)
to bucket rows into equal groups for analysis (quartiles, deciles, etc.)❌ Using PERCENT_RANK()
expecting uniform distribution—it depends on row count and ties.
Example: With 4 rows, the rank values might be 1, 2, 2, and 4. Using the formula (rank - 1) / (total_rows - 1)
results in PercentRanks like 0.0, 0.333, 0.333, and 1.0—not a uniform spread. Tied values share the same rank, which compresses the distribution unevenly.
❌ Assuming NTILE(n)
gives exactly equal-sized groups when row counts don’t divide evenly
Example: With 4 rows and NTILE(3)
, SQL will assign the first row to Tier 1, the next two rows to Tier 2, and the last row to Tier 3. The groups aren't perfectly equal—some may have more rows than others depending on the row count and how evenly it divides.
ORDER BY
something meaningful in the window clauseWant to find all customers in the top 10% by revenue?
WITH Ranked AS (
SELECT *,
PERCENT_RANK() OVER (PARTITION BY Region ORDER BY OrderAmount DESC) AS pr
FROM Orders
)
SELECT * FROM Ranked WHERE pr <= 0.10;
🔍 Explanation: This query uses a Common Table Expression (CTE) to calculate PERCENT_RANK()
within each region, ordering by OrderAmount
from highest to lowest. Then, in the outer query, it filters the results to only include rows where the percentile rank is in the top 10% (i.e., pr <= 0.10
).
A CTE is required here because you can't filter directly on a window function like PERCENT_RANK()
in the same SELECT block where it's calculated—doing so would result in an error. The CTE allows you to evaluate the function first, then filter in the outer query.
Use NTILE()
or CUME_DIST()
to build segments for summary:
WITH Tiered AS (
SELECT *, NTILE(4) OVER (ORDER BY OrderAmount) AS Quartile
FROM Orders
)
SELECT Quartile, AVG(OrderAmount) AS AvgSpend
FROM Tiered
GROUP BY Quartile;
🔍 Explanation: This query uses NTILE(4)
to divide all rows into four approximate quartiles based on OrderAmount
. Then it calculates the average for each quartile using GROUP BY
. This makes it easy to compare how spending differs across tiers.
ROW_NUMBER()
for insight into row order(See also: RANK, Aggregate Window Functions, Frame Clauses)