📈 SQL NTILE, SQL PERCENT_RANK, and SQL CUME_DIST – Percentile & Bin Distribution Functions

This guide explains SQL NTILE, PERCENT_RANK, and CUME_DIST functions—essential tools for percentile and ranking analysis. Learn how to calculate SQL percentiles, create data bins, and compare cumulative rank using window functions.

🧪 Try NTILE, PERCENT_RANK, and CUME_DIST problems now! 🧪

🚀 Understanding Percentiles and Relative Position in SQL

"Xorthax isn’t just interested in who’s first—he wants to know how far ahead they are in the galactic rankings. Whether it’s finding percentiles, calculating distributions, or creating tiered categories, these functions do the trick."

🤔 What Are SQL PERCENT_RANK(), SQL CUME_DIST(), and SQL NTILE()?

These functions calculate relative standing within a partition:

PERCENT_RANK() gives a value from 0 to 1 indicating a row’s percentile rank
CUME_DIST() gives a value from 0 to 1 indicating the cumulative distribution
NTILE(n) splits rows into n buckets (e.g. quartiles, deciles)

Each helps understand "how far along" a row is within its ordered group.

🔧 Syntax Snapshot: How to Write NTILE, PERCENT_RANK, and CUME_DIST

SQL example: Syntax for PERCENT_RANK(), CUME_DIST(), and NTILE() window functions used to calculate percentile ranks, cumulative distribution, and n buckets within a group.

PERCENT_RANK() OVER (PARTITION BY group ORDER BY metric)
CUME_DIST()   OVER (PARTITION BY group ORDER BY metric)
NTILE(n)      OVER (PARTITION BY group ORDER BY metric)

📋 See SQL Percentile and Bin Functions in Action

🧾 Sample Data

CustomerID	Region	OrderAmount
C1	North	100
C2	North	200
C3	North	200
C4	North	400

🔎 Query Example

SQL example: Calculate the percentile rank, cumulative distribution, and quantile tier for each customer's order within their region using PERCENT_RANK(), CUME_DIST(), and NTILE().

SELECT CustomerID, Region, OrderAmount,
       PERCENT_RANK() OVER (PARTITION BY Region ORDER BY OrderAmount) AS PercentRank,
       CUME_DIST()    OVER (PARTITION BY Region ORDER BY OrderAmount) AS CumeDist,
       NTILE(3)       OVER (PARTITION BY Region ORDER BY OrderAmount) AS Tier
FROM Orders;

📊 Result

CustomerID	OrderAmount	PercentRank	CumeDist	Tier
C1	100	0.0	0.25	1
C2	200	0.333	0.50	2
C3	200	0.333	0.75	2
C4	400	1.0	1.0	3

🔍 How SQL Calculates These:

PERCENT_RANK() is based on the formula: (rank - 1) / (total_rows - 1). Rows with the same value share a rank, so C2 and C3 have the same PercentRank.
CUME_DIST() reflects the proportion of rows less than or equal to the current row. For example, C3 is the 3rd of 4 rows (≤ 200), so its CUME_DIST is 0.75.
NTILE(3) divides the rows as evenly as possible into 3 buckets. With 4 rows, SQL assigns 1 row to the first and third groups, and 2 rows to the middle group—starting with the lowest values and filling tiers in order.

🧭 When Should You Use NTILE, PERCENT_RANK, or CUME_DIST?

Use PERCENT_RANK() to answer: "What percentile is this row in?"
Use CUME_DIST() to answer: "What portion of the group is less than or equal to this row?"
Use NTILE(n) to bucket rows into equal groups for analysis (quartiles, deciles, etc.)

🧠 SQL Keywords That Hint You’re Working with Percentile or Distribution Functions

"Percentile"
"Top 10%"
"Quartile"
"How far into the group"
"Split into tiers"

⚠️ Common Pitfalls When Using NTILE and Percentile Calculations

❌ Using PERCENT_RANK() expecting uniform distribution—it depends on row count and ties.

Example: With 4 rows, the rank values might be 1, 2, 2, and 4. Using the formula (rank - 1) / (total_rows - 1) results in PercentRanks like 0.0, 0.333, 0.333, and 1.0—not a uniform spread. Tied values share the same rank, which compresses the distribution unevenly.
❌ Assuming NTILE(n) gives exactly equal-sized groups when row counts don’t divide evenly

Example: With 4 rows and NTILE(3), SQL will assign the first row to Tier 1, the next two rows to Tier 2, and the last row to Tier 3. The groups aren't perfectly equal—some may have more rows than others depending on the row count and how evenly it divides.
❌ Forgetting to ORDER BY something meaningful in the window clause

🔄 Using CTEs to Filter or Group SQL Percentile and Bin Results

Want to find all customers in the top 10% by revenue?

SQL example: Use a CTE and PERCENT_RANK() to filter for the top 10% of customers by order amount within each region.

WITH Ranked AS (
  SELECT *,
         PERCENT_RANK() OVER (PARTITION BY Region ORDER BY OrderAmount DESC) AS pr
  FROM Orders
)
SELECT * FROM Ranked WHERE pr <= 0.10;

🔍 Explanation: This query uses a Common Table Expression (CTE) to calculate PERCENT_RANK() within each region, ordering by OrderAmount from highest to lowest. Then, in the outer query, it filters the results to only include rows where the percentile rank is in the top 10% (i.e., pr <= 0.10).

A CTE is required here because you can't filter directly on a window function like PERCENT_RANK() in the same SELECT block where it's calculated—doing so would result in an error. The CTE allows you to evaluate the function first, then filter in the outer query.

➕ Combine NTILE, PERCENT_RANK, and CUME_DIST with Aggregates

Use NTILE() or CUME_DIST() to build segments for summary:

SQL example: Use NTILE(4) to divide orders into quartiles, then calculate the average spend per quartile using GROUP BY and AVG().

WITH Tiered AS (
  SELECT *, NTILE(4) OVER (ORDER BY OrderAmount) AS Quartile
  FROM Orders
)
SELECT Quartile, AVG(OrderAmount) AS AvgSpend
FROM Tiered
GROUP BY Quartile;

🔍 Explanation: This query uses NTILE(4) to divide all rows into four approximate quartiles based on OrderAmount. Then it calculates the average for each quartile using GROUP BY. This makes it easy to compare how spending differs across tiers. Like the CTE example above, you cannot aggregate a window function inside the CTE

👉 Remember, you can’t combine the aggregate and window function in the same SELECT because window functions run after aggregation — they need the detailed row context first, so the CTE separates those steps.

🧰 Debugging Tips for SQL Percentile and Distribution Functions

Watch how ties affect percentile placement — Tied values can compress the range of ranks and distort spacing. Be cautious when interpreting distributions with many duplicates.
Add ROW_NUMBER() for insight into row order
Use small datasets first to test edge behavior

🧪 Practice Problems by Concept

🔗 Related Topics

More Resources by Function and Dialect

Want to explore all window functions across every SQL dialect? Visit the full SQL Dialects & Window Functions Documentation.