Learn how to solve Gap and Island SQL problems using window functions like ROW_NUMBER, LAG, and LEAD — an advanced SQL pattern for analyzing consecutive dates, detecting streaks, and answering popular interview questions with clear examples and practice problems.
Learning how to solve the Gap and Island SQL problem is essential for anyone preparing for SQL interview questions or working with time-based data. These challenges test advanced SQL queries beyond simple joins and aggregations to analyze consecutive dates, streaks, and gaps between events.
In real-world analytics, this skill helps you tackle complex SQL queries such as:
ROW_NUMBER(), LAG(), and LEAD() to uncover hidden streak analysis patterns.By mastering these SQL gaps and islands techniques, you’ll gain the ability to analyze churn, reliability, and consistency — key insights in data analytics, performance tracking, and operations.
The Gap and Island SQL problem refers to a family of analytical queries that uncover consecutive dates, streaks of activity, or gaps between events — frequent SQL interview questions and real-world analytics challenges. These problems are a favorite in interviews because they test how well you can use SQL window functions to work with sequential data. Let’s look at both types:
A Gap problem asks you to measure the break or inactivity period between consecutive rows, which is classic advanced SQL work that often appears as complex SQL queries or in technical interviews. Using LAG() and LEAD(), you compare the current row’s date or sequence number with the previous one to calculate how much time has passed — a core concept in SQL streak analysis.
An Island problem flips the idea: instead of measuring breaks, you look for continuous streaks using a slightly different Gap and Island SQL pattern (ROW_NUMBER with a date or sequence grain).
We’ll walk through both a Gap and an Island problem, showing how window functions like ROW_NUMBER, LAG, and LEAD power advanced SQL queries step by step.
Gap & Island: Step-by-StepA practical 4-step workflow using ROW_NUMBER, LAG, and LEAD.
::date, DATE_TRUNC(...)) and de-dupe if needed.LAG(date) to get the previous row’s date.ROW_NUMBER() to assign a running index.gap_days = date - prev_date.grp = date - rn * interval '1 day' (or week/month).WHERE rn = 1).MIN/MAX/COUNT), then rank to choose the longest streak.Let's see what that looks like for a gap problem!
"Xorthax suspects some of his loyal customers disappear for long stretches. For each customer, find the longest gap between two consecutive orders."
Create a time-grained series or group by the relevant grain.
WITH ordered AS (
SELECT DISTINCT
customerid,
orderdate::date AS orderdate
FROM orders
),
If we were to select * from ordered we would get results that look like:
| CustomerID | OrderDate |
|---|---|
| 1 | 2245-01-02 |
| 1 | 2245-01-03 |
| 1 | 2245-01-05 |
| 2 | 2245-01-20 |
| 2 | 2245-01-15 |
Use LAG() partitioned by entity and ordered by date.
sequenced AS (
SELECT
customerid,
orderdate,
LAG(orderdate) OVER (
PARTITION BY customerid
ORDER BY orderdate
) AS prevdate
FROM ordered
),
The output from that CTE would look like this:
| CustomerID | OrderDate | PrevDate |
|---|---|---|
| 1 | 2245-01-02 | NULL |
| 1 | 2245-01-03 | 2245-01-02 |
| 1 | 2245-01-05 | 2245-01-03 |
| 2 | 2245-01-15 | NULL |
| 2 | 2245-01-20 | 2245-01-15 |
Compute the gap.
gaps AS (
SELECT
customerid,
(orderdate - prevdate) AS gapdays,
prevdate AS gapstart,
orderdate AS gapend
FROM sequenced
WHERE prev_date IS NOT NULL
),
The output from "gaps" could look like:
| CustomerID | GapDays | GapStart | GapEnd |
|---|---|---|---|
| 1 | 1 | 2245-01-02 | 2245-01-03 |
| 1 | 2 | 2245-01-03 | 2245-01-05 |
| 2 | 5 | 2245-01-15 | 2245-01-20 |
| 3 | 7 | 2245-02-01 | 2245-02-08 |
| 3 | 3 | 2245-02-08 | 2245-02-11 |
Pick the desired gap per entity (in this case longest gap).
ranked AS (
SELECT g.*, ROW_NUMBER() OVER (
PARTITION BY customerid ORDER BY gapdays DESC, gapend DESC
) AS rn
FROM gaps g
)
SELECT customerid, gapdays AS longestgapdays, gapstart, gapend
FROM ranked
WHERE rn = 1;
This is what the final output would look like:
| CustomerID | LongestGapDays | GapStart | GapEnd |
|---|---|---|---|
| 1 | 2 | 2245-01-03 | 2245-01-05 |
| 2 | 5 | 2245-01-15 | 2245-01-20 |
| 3 | 7 | 2245-02-01 | 2245-02-08 |
WITH monthly AS (
SELECT
o.customerid,
DATE_TRUNC('month', o.orderdate)::date AS MonthStart
FROM orders o
GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
),
Numbered AS (
SELECT
Category,
MonthStart,
ROW_NUMBER() OVER (PARTITION BY Category ORDER BY MonthStart) AS rn
FROM monthly
),
grouped AS (
SELECT
customerid,
MonthStart,
MonthStart - (rn * INTERVAL '1 month') AS grp
FROM numbered
),
streaks AS (
SELECT
customerid,
MIN(MonthStart) AS StartMonth,
MAX(MonthStart) AS EndMonth,
COUNT(*) AS MonthCount
FROM grouped
GROUP BY customerid, grp
),
ranked AS (
SELECT
s.*,
ROW_NUMBER() OVER (
PARTITION BY customerid
ORDER BY monthcount DESC, endmonth DESC
) AS rnk
FROM streaks s
)
SELECT
customerid,
startmonth,
endmonth,
monthcount AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;
Now let's walk through an Island problem ...
"Xorthax wants to celebrate his most consistent shoppers. For each customer, find the longest streak of consecutive months in which they placed at least one order."
WITH monthly AS (
SELECT
o.customerid,
DATE_TRUNC('month', o.orderdate)::date AS MonthStart
FROM orders o
GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
),
Here is what the output of step 1 would look like:
| CustomerID | MonthStart |
|---|---|
| 1 | 2245-01-01 |
| 1 | 2245-02-01 |
| 1 | 2245-03-01 |
| 2 | 2245-01-01 |
| 2 | 2245-03-01 |
Use a window ordered by date, partitioned by entity.
Numbered AS (
SELECT
Category,
MonthStart,
ROW_NUMBER() OVER (PARTITION BY Category ORDER BY MonthStart) AS rn
FROM monthly
),
The output from the second CTE would look like:
| CustomerID | MonthStart | RN |
|---|---|---|
| 1 | 2245-01-01 | 1 |
| 1 | 2245-02-01 | 2 |
| 1 | 2245-03-01 | 3 |
| 2 | 2245-01-01 | 1 |
| 2 | 2245-03-01 | 2 |
grouped AS (
SELECT
customerid,
MonthStart,
MonthStart - (rn * INTERVAL '1 month') AS grp
FROM numbered
),
Here is the sample output from grouped:
| CustomerID | MonthStart | Grp |
|---|---|---|
| 1 | 2245-01-01 | 2245-01-01 |
| 1 | 2245-02-01 | 2245-01-01 |
| 1 | 2245-03-01 | 2245-01-01 |
| 2 | 2245-01-01 | 2245-01-01 |
| 2 | 2245-03-01 | 2245-02-01 |
streaks AS (
SELECT
customerid,
MIN(MonthStart) AS StartMonth,
MAX(MonthStart) AS EndMonth,
COUNT(*) AS MonthCount
FROM grouped
GROUP BY customerid, grp
),
The results from streaks look like this:
| CustomerID | StartMonth | EndMonth | MonthCount |
|---|---|---|---|
| 1 | 2245-01-01 | 2245-03-01 | 3 |
| 2 | 2245-01-01 | 2245-01-01 | 1 |
| 2 | 2245-03-01 | 2245-03-01 | 1 |
| 3 | 2245-02-01 | 2245-04-01 | 3 |
| 3 | 2245-05-01 | 2245-06-01 | 2 |
ranked AS (
SELECT
s.*,
ROW_NUMBER() OVER (
PARTITION BY customerid
ORDER BY monthcount DESC, endmonth DESC
) AS rnk
FROM streaks s
)
SELECT
customerid,
startmonth,
endmonth,
monthcount AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;
The final results would look like this:
| CustomerID | StartMonth | EndMonth | LongestStreakMonths |
|---|---|---|---|
| 1 | 2245-01-01 | 2245-03-01 | 3 |
| 2 | 2245-01-01 | 2245-01-01 | 1 |
| 3 | 2245-02-01 | 2245-04-01 | 3 |
| 4 | 2245-01-01 | 2245-02-01 | 2 |
| 5 | 2245-03-01 | 2245-05-01 | 3 |
WITH monthly AS (
SELECT
o.customerid,
DATE_TRUNC('month', o.orderdate)::date AS month_start
FROM orders o
GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
),
numbered AS (
SELECT
customerid,
month_start,
ROW_NUMBER() OVER (
PARTITION BY customerid
ORDER BY month_start
) AS rn
FROM monthly
),
grouped AS (
SELECT
customerid,
month_start,
-- Stable group key: shift each month back by rn months
month_start - (rn * INTERVAL '1 month') AS grp
FROM numbered
),
streaks AS (
SELECT
customerid,
MIN(month_start) AS start_month,
MAX(month_start) AS end_month,
COUNT(*) AS month_count
FROM grouped
GROUP BY customerid, grp
),
ranked AS (
SELECT
s.*,
ROW_NUMBER() OVER (
PARTITION BY customerid
ORDER BY month_count DESC, end_month DESC
) AS rnk
FROM streaks s
)
SELECT
customerid,
start_month,
end_month,
month_count AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;;
Once you understand the Gap and Island SQL pattern, the next step is choosing the right SQL window function to solve complex SQL queries on consecutive dates, streaks, and gaps.
Use this quick guide to decide which function helps you analyze consecutive dates, streaks, and gaps between events.LAG() (or LEAD() for “next”).ROW_NUMBER() + date grain + normalized key.RANK() or DENSE_RANK() to find the top sequence per entity.Each function supports a different angle of SQL streak analysis — finding breaks, identifying runs, or ranking their lengths. If you want the syntax-level breakdowns and typical combinations, expand the section below.
prev_date, then compute date - prev_date.LAG().LAG().rn).date - rn * interval, so consecutive rows fall in the same island.RANK() penalizes ties (1,2,2,4), DENSE_RANK() doesn’t (1,2,2,3).ROW_NUMBER() to break ties deterministically for “Top 1 per entity.”PARTITION BY the entity you care about.ORDER BY a deterministic key (date + tiebreaker).LAG(date) → compute gap_days → ROW_NUMBER() to pick max per entity.ROW_NUMBER() → normalize (date - rn * interval) → group/aggregate → rank longest streak.These pitfalls frequently appear in SQL interview questions and real-world projects, making it essential to understand how to troubleshoot and explain your Gap and Island SQL solutions clearly.
Symptom: “My streaks skip months / gaps look huge.”
Fix: Explicitly set grain:
::date,
DATE_TRUNC('week', ...),
DATE_TRUNC('month', ...).
Symptom: Different answers run-to-run.
Fix: Add a tie-breaker:
ORDER BY orderdate, orderid.
Symptom: Zero-day gaps; broken islands.
Fix: De-dup before windows
(SELECT DISTINCT customerid, orderdate::date)
or keep one row via ROW_NUMBER() and filter
(e.g., QUALIFY rn = 1 or a subquery).
Symptom: Islands undercounted because dates with no rows vanish.
Fix: For rules like “every day/week”, build a calendar/time series and join, then apply the condition.
Symptom: Streak length miscounts by 1.
Fix: Be explicit: for monthly streaks, count rows; for day streaks,
MAX(date) - MIN(date) + 1 only if every day is present.
Symptom: Streaks drop unexpectedly.
Fix: Apply the same filter in every CTE
(WHERE status = 'Completed'), not just the last step.
Symptom: Streaks/gaps bleed across entities.
Fix: Double-check PARTITION BY (customer vs supplier vs category)
matches the question.
Symptom: Consecutive months still split.
Fix: Create rn at the same grain you normalized (e.g., month),
then grp = month_start - rn * interval '1 month'.
Symptom: Gaps differ by 1 day around midnight.
Fix: Normalize to date (or a consistent timezone) before windows.
Symptom: Not the most recent or not truly the max.
Fix: Rank inside entity:
ORDER BY month_count DESC, end_month DESC (or your tie-break),
then filter to the top (e.g., WHERE rnk = 1).
Symptom: Slow queries on big tables.
Fix: Pre-filter date range; index
(customerid, orderdate); avoid unnecessary DISTINCTs; materialize intermediate CTEs if needed.
Here are some problems you can try right away:
See also: LEAD / LAG, ROWNUMBER() & RANK()