This guide explains how to tackle SQL Gap and Island problems. These are questions that focus on finding streaks of consecutive activity or gaps between events. You’ll learn how to apply window functions such as ROW_NUMBER
, LAG
, and LEAD
to identify patterns, measure breaks, and analyze sequences in your data sets.
Learning Gap and Island problems is essential because they appear frequently in SQL interviews and mirror real business challenges. Interviewers love them because they test whether you can go beyond simple aggregations and think about sequences over time. In practice, these problems help you:
By mastering this family of problems, you’ll be ready to analyze churn, reliability, and consistency in datasets — skills that are valuable in analytics, product monitoring, and operations.
A Gap problem asks you to measure the break between two consecutive events. Using window functions, especially LAG()
, you compare the current row’s date (or sequence number) with the previous one to find how much time elapsed.
An Island problem asks you to find stretches of consecutive events that meet a certain condition. Instead of measuring the breaks, you identify uninterrupted streaks (the islands) of activity, often by combining ROW_NUMBER()
with dates or sequence numbers to group consecutive rows together.
We’ll walk through both a Gap problem and an Island problem, showing how window functions like ROW_NUMBER
, LAG
, and LEAD
are used step by step.
Gap & Island: Step-by-StepA practical 4-step workflow using ROW_NUMBER, LAG, and LEAD.
::date
, DATE_TRUNC(...)
) and de-dupe if needed.LAG(date)
to get the previous row’s date.ROW_NUMBER()
to assign a running index.gap_days = date - prev_date
.grp = date - rn * interval '1 day'
(or week/month).WHERE rn = 1
).MIN
/MAX
/COUNT
), then rank to choose the longest streak.Let's see what that looks like for a gap problem!
"Xorthax suspects some of his loyal customers disappear for long stretches. For each customer, find the longest gap between two consecutive orders."
Create a time-grained series or group by the relevant grain.
WITH ordered AS (
SELECT DISTINCT
customerid,
orderdate::date AS orderdate
FROM orders
),
If we were to select * from ordered
we would get results that look like:
CustomerID | OrderDate |
---|---|
1 | 2245-01-02 |
1 | 2245-01-03 |
1 | 2245-01-05 |
2 | 2245-01-20 |
2 | 2245-01-15 |
Use LAG()
partitioned by entity and ordered by date.
sequenced AS (
SELECT
customerid,
orderdate,
LAG(orderdate) OVER (
PARTITION BY customerid
ORDER BY orderdate
) AS prevdate
FROM ordered
),
The output from that CTE would look like this:
CustomerID | OrderDate | PrevDate |
---|---|---|
1 | 2245-01-02 | NULL |
1 | 2245-01-03 | 2245-01-02 |
1 | 2245-01-05 | 2245-01-03 |
2 | 2245-01-15 | NULL |
2 | 2245-01-20 | 2245-01-15 |
Compute the gap.
gaps AS (
SELECT
customerid,
(orderdate - prevdate) AS gapdays,
prevdate AS gapstart,
orderdate AS gapend
FROM sequenced
WHERE prev_date IS NOT NULL
),
The output from "gaps" could look like:
CustomerID | GapDays | GapStart | GapEnd |
---|---|---|---|
1 | 1 | 2245-01-02 | 2245-01-03 |
1 | 2 | 2245-01-03 | 2245-01-05 |
2 | 5 | 2245-01-15 | 2245-01-20 |
3 | 7 | 2245-02-01 | 2245-02-08 |
3 | 3 | 2245-02-08 | 2245-02-11 |
Pick the desired gap per entity (in this case longest gap).
ranked AS (
SELECT g.*, ROW_NUMBER() OVER (
PARTITION BY customerid ORDER BY gapdays DESC, gapend DESC
) AS rn
FROM gaps g
)
SELECT customerid, gapdays AS longestgapdays, gapstart, gapend
FROM ranked
WHERE rn = 1;
This is what the final output would look like:
CustomerID | LongestGapDays | GapStart | GapEnd |
---|---|---|---|
1 | 2 | 2245-01-03 | 2245-01-05 |
2 | 5 | 2245-01-15 | 2245-01-20 |
3 | 7 | 2245-02-01 | 2245-02-08 |
WITH monthly AS (
SELECT
o.customerid,
DATE_TRUNC('month', o.orderdate)::date AS MonthStart
FROM orders o
GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
),
Numbered AS (
SELECT
Category,
MonthStart,
ROW_NUMBER() OVER (PARTITION BY Category ORDER BY MonthStart) AS rn
FROM monthly
),
grouped AS (
SELECT
customerid,
MonthStart,
MonthStart - (rn * INTERVAL '1 month') AS grp
FROM numbered
),
streaks AS (
SELECT
customerid,
MIN(MonthStart) AS StartMonth,
MAX(MonthStart) AS EndMonth,
COUNT(*) AS MonthCount
FROM grouped
GROUP BY customerid, grp
),
ranked AS (
SELECT
s.*,
ROW_NUMBER() OVER (
PARTITION BY customerid
ORDER BY monthcount DESC, endmonth DESC
) AS rnk
FROM streaks s
)
SELECT
customerid,
startmonth,
endmonth,
monthcount AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;
Now let's walk through an Island problem ...
"Xorthax wants to celebrate his most consistent shoppers. For each customer, find the longest streak of consecutive months in which they placed at least one order."
WITH monthly AS (
SELECT
o.customerid,
DATE_TRUNC('month', o.orderdate)::date AS MonthStart
FROM orders o
GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
),
Here is what the output of step 1 would look like:
CustomerID | MonthStart |
---|---|
1 | 2245-01-01 |
1 | 2245-02-01 |
1 | 2245-03-01 |
2 | 2245-01-01 |
2 | 2245-03-01 |
Use a window ordered by date, partitioned by entity.
Numbered AS (
SELECT
Category,
MonthStart,
ROW_NUMBER() OVER (PARTITION BY Category ORDER BY MonthStart) AS rn
FROM monthly
),
The output from the second CTE would look like:
CustomerID | MonthStart | RN |
---|---|---|
1 | 2245-01-01 | 1 |
1 | 2245-02-01 | 2 |
1 | 2245-03-01 | 3 |
2 | 2245-01-01 | 1 |
2 | 2245-03-01 | 2 |
grouped AS (
SELECT
customerid,
MonthStart,
MonthStart - (rn * INTERVAL '1 month') AS grp
FROM numbered
),
Here is the sample output from grouped:
CustomerID | MonthStart | Grp |
---|---|---|
1 | 2245-01-01 | 2245-01-01 |
1 | 2245-02-01 | 2245-01-01 |
1 | 2245-03-01 | 2245-01-01 |
2 | 2245-01-01 | 2245-01-01 |
2 | 2245-03-01 | 2245-02-01 |
streaks AS (
SELECT
customerid,
MIN(MonthStart) AS StartMonth,
MAX(MonthStart) AS EndMonth,
COUNT(*) AS MonthCount
FROM grouped
GROUP BY customerid, grp
),
The results from streaks look like this:
CustomerID | StartMonth | EndMonth | MonthCount |
---|---|---|---|
1 | 2245-01-01 | 2245-03-01 | 3 |
2 | 2245-01-01 | 2245-01-01 | 1 |
2 | 2245-03-01 | 2245-03-01 | 1 |
3 | 2245-02-01 | 2245-04-01 | 3 |
3 | 2245-05-01 | 2245-06-01 | 2 |
ranked AS (
SELECT
s.*,
ROW_NUMBER() OVER (
PARTITION BY customerid
ORDER BY monthcount DESC, endmonth DESC
) AS rnk
FROM streaks s
)
SELECT
customerid,
startmonth,
endmonth,
monthcount AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;
The final results would look like this:
CustomerID | StartMonth | EndMonth | LongestStreakMonths |
---|---|---|---|
1 | 2245-01-01 | 2245-03-01 | 3 |
2 | 2245-01-01 | 2245-01-01 | 1 |
3 | 2245-02-01 | 2245-04-01 | 3 |
4 | 2245-01-01 | 2245-02-01 | 2 |
5 | 2245-03-01 | 2245-05-01 | 3 |
WITH monthly AS (
SELECT
o.customerid,
DATE_TRUNC('month', o.orderdate)::date AS month_start
FROM orders o
GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
),
numbered AS (
SELECT
customerid,
month_start,
ROW_NUMBER() OVER (
PARTITION BY customerid
ORDER BY month_start
) AS rn
FROM monthly
),
grouped AS (
SELECT
customerid,
month_start,
-- Stable group key: shift each month back by rn months
month_start - (rn * INTERVAL '1 month') AS grp
FROM numbered
),
streaks AS (
SELECT
customerid,
MIN(month_start) AS start_month,
MAX(month_start) AS end_month,
COUNT(*) AS month_count
FROM grouped
GROUP BY customerid, grp
),
ranked AS (
SELECT
s.*,
ROW_NUMBER() OVER (
PARTITION BY customerid
ORDER BY month_count DESC, end_month DESC
) AS rnk
FROM streaks s
)
SELECT
customerid,
start_month,
end_month,
month_count AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;;
LAG()
(or LEAD()
if you prefer “next”).ROW_NUMBER()
+ date grain + normalized key.1,2,2,4
): RANK()
1,2,2,3
): DENSE_RANK()
1,2,3,4
): ROW_NUMBER()
with a deterministic ORDER BY
prev_date
, then compute date - prev_date
.LAG()
.LAG()
.rn
).date - rn*interval
, so consecutive rows fall in the same island.RANK()
penalizes ties (1,2,2,4
), DENSE_RANK()
doesn’t (1,2,2,3
).ROW_NUMBER()
to break ties deterministically for “Top 1 per entity.”PARTITION BY
the entity you care about.ORDER BY
a deterministic key (date + tiebreaker).LAG(date)
→ compute gap_days
→ ROW_NUMBER()
to pick max per entity.ROW_NUMBER()
→ normalize (date - rn*interval
) → group/aggregate → rank longest streak.Symptom: “My streaks skip months / gaps look huge.”
Fix: Explicitly set grain:
::date
,
DATE_TRUNC('week', ...)
,
DATE_TRUNC('month', ...)
.
Symptom: Different answers run-to-run.
Fix: Add a tie-breaker:
ORDER BY orderdate, orderid
.
Symptom: Zero-day gaps; broken islands.
Fix: De-dup before windows
(SELECT DISTINCT customerid, orderdate::date
)
or keep one row via ROW_NUMBER()
and filter
(e.g., QUALIFY rn = 1
or a subquery).
Symptom: Islands undercounted because dates with no rows vanish.
Fix: For rules like “every day/week”, build a calendar/time series and join, then apply the condition.
Symptom: Streak length miscounts by 1.
Fix: Be explicit: for monthly streaks, count rows; for day streaks,
MAX(date) - MIN(date) + 1
only if every day is present.
Symptom: Streaks drop unexpectedly.
Fix: Apply the same filter in every CTE
(WHERE status = 'Completed'
), not just the last step.
Symptom: Streaks/gaps bleed across entities.
Fix: Double-check PARTITION BY
(customer vs supplier vs category)
matches the question.
Symptom: Consecutive months still split.
Fix: Create rn
at the same grain you normalized (e.g., month),
then grp = month_start - rn * interval '1 month'
.
Symptom: Gaps differ by 1 day around midnight.
Fix: Normalize to date
(or a consistent timezone) before windows.
Symptom: Not the most recent or not truly the max.
Fix: Rank inside entity:
ORDER BY month_count DESC, end_month DESC
(or your tie-break),
then filter to the top (e.g., WHERE rnk = 1
).
Symptom: Slow queries on big tables.
Fix: Pre-filter date range; index
(customerid, orderdate)
; avoid unnecessary DISTINCT
s; materialize intermediate CTEs if needed.
Here are some problems you can try right away:
See also: LEAD / LAG, ROWNUMBER() & RANK()