📈 Percentile & Distribution Functions: PERCENT_RANK(), CUME_DIST(), and NTILE()


🚀 Understanding Relative Position in SQL

"Xorthax isn’t just interested in who’s first—he wants to know how far ahead they are in the galactic rankings. Whether it’s finding percentiles, calculating distributions, or creating tiered categories, these functions do the trick."


🤔 What Are PERCENT_RANK(), CUME_DIST(), and NTILE()?

These functions calculate relative standing within a partition:

Each helps understand "how far along" a row is within its ordered group.


🔧 Syntax Snapshot

PERCENT_RANK() OVER (PARTITION BY group ORDER BY metric)
CUME_DIST()   OVER (PARTITION BY group ORDER BY metric)
NTILE(4)      OVER (PARTITION BY group ORDER BY metric)

📋 See It in Action

🧾 Sample Data

CustomerID Region OrderAmount
C1 North 100
C2 North 200
C3 North 200
C4 North 400

🔎 Query Example

SELECT CustomerID, Region, OrderAmount,
       PERCENT_RANK() OVER (PARTITION BY Region ORDER BY OrderAmount) AS PercentRank,
       CUME_DIST()    OVER (PARTITION BY Region ORDER BY OrderAmount) AS CumeDist,
       NTILE(3)       OVER (PARTITION BY Region ORDER BY OrderAmount) AS Tier
FROM Orders;

📊 Result

CustomerID OrderAmount PercentRank CumeDist Tier
C1 100 0.0 0.25 1
C2 200 0.333 0.50 2
C3 200 0.333 0.75 2
C4 400 1.0 1.0 3

🔍 How SQL Calculates These:


🧭 When Should You Use Each?


🧠 Keywords That Hint You're in the Right Place


⚠️ Common Pitfalls


🔄 CTEs + Distribution Filtering

Want to find all customers in the top 10% by revenue?

WITH Ranked AS (
  SELECT *,
         PERCENT_RANK() OVER (PARTITION BY Region ORDER BY OrderAmount DESC) AS pr
  FROM Orders
)
SELECT * FROM Ranked WHERE pr <= 0.10;

🔍 Explanation: This query uses a Common Table Expression (CTE) to calculate PERCENT_RANK() within each region, ordering by OrderAmount from highest to lowest. Then, in the outer query, it filters the results to only include rows where the percentile rank is in the top 10% (i.e., pr <= 0.10).

A CTE is required here because you can't filter directly on a window function like PERCENT_RANK() in the same SELECT block where it's calculated—doing so would result in an error. The CTE allows you to evaluate the function first, then filter in the outer query.


➕ Combine with Aggregates

Use NTILE() or CUME_DIST() to build segments for summary:

WITH Tiered AS (
  SELECT *, NTILE(4) OVER (ORDER BY OrderAmount) AS Quartile
  FROM Orders
)
SELECT Quartile, AVG(OrderAmount) AS AvgSpend
FROM Tiered
GROUP BY Quartile;

🔍 Explanation: This query uses NTILE(4) to divide all rows into four approximate quartiles based on OrderAmount. Then it calculates the average for each quartile using GROUP BY. This makes it easy to compare how spending differs across tiers.

🧰 Debugging Tips



🔗 Related Topics

(See also: RANK, Aggregate Window Functions, Frame Clauses)