CUME_DIST in Spark SQL

This page is a quick-reference checkpoint for CUME_DIST in Spark SQL: behavior, syntax rules, edge cases, and a minimal example, plus a link to the official vendor documentation.

Function Details

CUME_DIST returns the cumulative distribution of the current row within its window partition: the number of rows whose ORDER BY value is less than or equal to the current row's value, divided by the total number of rows in the partition. The result is always in the range (0, 1], and rows that tie on the ORDER BY value receive the same result.

If this behavior feels unintuitive, the tutorial linked below explains the underlying pattern step by step.

`CUME_DIST()` takes no arguments and must be used with an OVER clause.

SELECT
  category,
  amount,
  CUME_DIST() OVER (PARTITION BY category ORDER BY amount) AS cume_dist
FROM sales;
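
To check the tie edge case without any table setup, here is a minimal self-contained sketch using an inline VALUES relation (the column name amount is an arbitrary placeholder):

-- Cumulative distribution over one implicit partition:
-- rows_with_value_less_than_or_equal / total_rows
SELECT
  amount,
  CUME_DIST() OVER (ORDER BY amount) AS cume_dist
FROM VALUES (10), (20), (20), (30) AS t(amount);

With four rows this yields 0.25, 0.75, 0.75, and 1.0: both 20s count toward “less than or equal”, so tied rows share a value, and the highest row always returns exactly 1.0.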

What should you do next?

If you came here to confirm syntax, you’re done. If you came here to get better at window functions, choose your next step.

Understand the pattern

CUME_DIST is part of a bigger window-function pattern. If you want the “why”, start here: Percentile Distribution
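
For a concrete feel for that pattern, here is a minimal sketch that puts CUME_DIST beside the closely related PERCENT_RANK, reusing the hypothetical sales table from the example above:

SELECT
  category,
  amount,
  -- proportion of rows <= current row: starts at 1/n, ends at 1.0
  CUME_DIST() OVER (PARTITION BY category ORDER BY amount) AS cume_dist,
  -- (rank - 1) / (rows - 1): starts at 0.0, ends at 1.0
  PERCENT_RANK() OVER (PARTITION BY category ORDER BY amount) AS pct_rank
FROM sales;

The two differ only in how they anchor the scale: PERCENT_RANK excludes the current row from its numerator, so the first row in each partition scores 0.0, while CUME_DIST includes it and never returns 0.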

Prove it with a real query

Reading docs is useful. Writing the query correctly under pressure is the skill.

Cumulative Spending by Species

Support Status

  • Supported: yes
  • Minimum Version: 1.4

Official Documentation

For the authoritative spec, use the vendor docs. This page is the fast “sanity check”.

View Spark SQL Documentation →

Looking for more functions across all SQL dialects? Visit the full SQL Dialects & Window Functions Documentation.