🏝️ Gap and Island Problems in SQL

This guide explains how to tackle SQL Gap and Island problems. These are questions that focus on finding streaks of consecutive activity or gaps between events. You’ll learn how to apply window functions such as ROW_NUMBER, LAG, and LEAD to identify patterns, measure breaks, and analyze sequences in your data sets.




🌟 Why You Should Learn Gap and Island Problems

Learning Gap and Island problems is essential because they appear frequently in SQL interviews and mirror real business challenges. Interviewers love them because they test whether you can go beyond simple aggregations and think about sequences over time. In practice, these problems help you:

  • Detect consecutive days or weeks of activity (islands), such as a customer ordering every week for three months.
  • Measure gaps between events, like the longest break a supplier took between fulfilling orders.
  • Work with irregular data where missing days, weeks, or transactions matter.
  • Apply window functions in creative ways to identify streaks, breaks, and patterns in temporal data.

By mastering this family of problems, you’ll be ready to analyze churn, reliability, and consistency in datasets — skills that are valuable in analytics, product monitoring, and operations.




❓ What Are Gaps & Islands?

A Gap problem asks you to measure the break between two consecutive events. Using window functions, especially LAG(), you compare the current row’s date (or sequence number) with the previous one to find how much time elapsed.

Gap Examples

  • Customer Breaks in Ordering: Xorthax suspects some of his loyal customers disappear for long stretches. For each customer, find the longest gap between two consecutive orders.
  • Supplier Delays: The merchants’ guild is concerned about supplier reliability. Identify the longest break between two shipments from each supplier.
  • Inventory Restocking Lapses: Xorthax wants to know how long shelves sit empty. For each product, calculate the longest period with no recorded restock events.

An Island problem asks you to find stretches of consecutive events that meet a certain condition. Instead of measuring the breaks, you identify uninterrupted streaks (the islands) of activity, often by combining ROW_NUMBER() with dates or sequence numbers to group consecutive rows together.

Island Examples

  • Customer Streaks: Xorthax wants to celebrate his most consistent shoppers. For each customer, find the longest streak of consecutive months in which they placed at least one order.
  • Category Sales Runs: On Zlacksys 19, Xorthax wonders which product categories sell steadily. Find the longest chain of consecutive weeks where each category had at least 10 completed orders.
  • Supplier Reliability Streaks: The intergalactic merchants’ guild praises reliable suppliers. For each supplier, identify the biggest set of consecutive days where they fulfilled at least one product order.



🛠️ Core Technique: Window Functions

We’ll walk through both a Gap problem and an Island problem, showing how window functions like ROW_NUMBER, LAG, and LEAD are used step by step.

Gap & Island: Step-by-StepA practical 4-step workflow using ROW_NUMBER, LAG, and LEAD.

  1. Create a time-grained series (day/week/month) or group by the relevant grain.
    • Gaps: Normalize to a date grain (::date, DATE_TRUNC(...)) and de-dupe if needed.
    • Islands: Choose the streak grain (day/week/month) and ensure one row per period per entity.
  2. Use a window ordered by date, partitioned by entity.
    • Gaps: LAG(date) to get the previous row’s date.
    • Islands: ROW_NUMBER() to assign a running index.
  3. Transform the Data: Gap vs. Island Logic
    • Gaps: Compute the gap gap_days = date - prev_date.
    • Islands: Build a group key like grp = date - rn * interval '1 day' (or week/month).
  4. Summarize:
    • Gaps: Rank gaps and pick the largest (e.g. WHERE rn = 1).
    • Islands: Aggregate each group (MIN/MAX/COUNT), then rank to choose the longest streak.

Let's see what that looks like for a gap problem!

"Xorthax suspects some of his loyal customers disappear for long stretches. For each customer, find the longest gap between two consecutive orders."

  1. Create a time-grained series or group by the relevant grain.

    WITH ordered AS (
      SELECT DISTINCT
        customerid,
        orderdate::date AS orderdate
      FROM orders
    ),
    

    If we were to select * from ordered we would get results that look like:

    CustomerIDOrderDate
    12245-01-02
    12245-01-03
    12245-01-05
    22245-01-20
    22245-01-15
  2. Use LAG() partitioned by entity and ordered by date.

    sequenced AS (
    SELECT
      customerid,
      orderdate,
      LAG(orderdate) OVER (
        PARTITION BY customerid
        ORDER BY orderdate
      ) AS prevdate
    FROM ordered
    ),
    

    The output from that CTE would look like this:

    CustomerIDOrderDatePrevDate
    12245-01-02NULL
    12245-01-032245-01-02
    12245-01-052245-01-03
    22245-01-15NULL
    22245-01-202245-01-15
  3. Compute the gap.

    gaps AS (
      SELECT
        customerid,
        (orderdate - prevdate) AS gapdays,
        prevdate AS gapstart,
        orderdate AS gapend
      FROM sequenced
      WHERE prev_date IS NOT NULL
    ),
    

    The output from "gaps" could look like:

    CustomerIDGapDaysGapStartGapEnd
    112245-01-022245-01-03
    122245-01-032245-01-05
    252245-01-152245-01-20
    372245-02-012245-02-08
    332245-02-082245-02-11
  4. Pick the desired gap per entity (in this case longest gap).

ranked AS (
  SELECT g.*, ROW_NUMBER() OVER (
    PARTITION BY customerid ORDER BY gapdays DESC, gapend DESC
  ) AS rn
  FROM gaps g
)

SELECT customerid, gapdays AS longestgapdays, gapstart, gapend
FROM ranked
WHERE rn = 1;

This is what the final output would look like:

CustomerID LongestGapDays GapStart GapEnd
1 2 2245-01-03 2245-01-05
2 5 2245-01-15 2245-01-20
3 7 2245-02-01 2245-02-08

Here is the complete gap query:
WITH monthly AS (
  SELECT
    o.customerid,
    DATE_TRUNC('month', o.orderdate)::date AS MonthStart
  FROM orders o
  GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
),
Numbered AS (
   SELECT
       Category,
       MonthStart,
       ROW_NUMBER() OVER (PARTITION BY Category ORDER BY MonthStart) AS rn
   FROM monthly
),
grouped AS (
  SELECT
    customerid,
    MonthStart,
    MonthStart - (rn * INTERVAL '1 month') AS grp
  FROM numbered
),
streaks AS (
  SELECT
    customerid,
    MIN(MonthStart) AS StartMonth,
    MAX(MonthStart) AS EndMonth,
    COUNT(*)         AS MonthCount
  FROM grouped
  GROUP BY customerid, grp
),
ranked AS (
  SELECT
    s.*,
    ROW_NUMBER() OVER (
      PARTITION BY customerid
      ORDER BY monthcount DESC, endmonth DESC
    ) AS rnk
  FROM streaks s
)
SELECT
  customerid,
  startmonth,
  endmonth,
  monthcount AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;



Now let's walk through an Island problem ...

"Xorthax wants to celebrate his most consistent shoppers. For each customer, find the longest streak of consecutive months in which they placed at least one order."

  1. Create a time-grained series or group by the relevant grain.
    WITH monthly AS (
      SELECT
        o.customerid,
        DATE_TRUNC('month', o.orderdate)::date AS MonthStart
      FROM orders o
      GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
    ),
    

    Here is what the output of step 1 would look like:

    CustomerID MonthStart
    1 2245-01-01
    1 2245-02-01
    1 2245-03-01
    2 2245-01-01
    2 2245-03-01
  2. Use a window ordered by date, partitioned by entity.

    Numbered AS (
       SELECT
           Category,
           MonthStart,
           ROW_NUMBER() OVER (PARTITION BY Category ORDER BY MonthStart) AS rn
       FROM monthly
    ),
    

    The output from the second CTE would look like:

    CustomerID MonthStart RN
    1 2245-01-01 1
    1 2245-02-01 2
    1 2245-03-01 3
    2 2245-01-01 1
    2 2245-03-01 2
  3. Normalize with row number to form a stable group key 
    grouped AS (
      SELECT
        customerid,
        MonthStart,
        MonthStart - (rn * INTERVAL '1 month') AS grp
      FROM numbered
    ),
    

    Here is the sample output from grouped:

    CustomerID MonthStart Grp
    1 2245-01-01 2245-01-01
    1 2245-02-01 2245-01-01
    1 2245-03-01 2245-01-01
    2 2245-01-01 2245-01-01
    2 2245-03-01 2245-02-01
  4. Aggregate per group to get length and then pick the longest per entity.  This will take an additional CTE.
  5. streaks AS (
      SELECT
        customerid,
        MIN(MonthStart) AS StartMonth,
        MAX(MonthStart) AS EndMonth,
        COUNT(*)         AS MonthCount
      FROM grouped
      GROUP BY customerid, grp
    ),
    

    The results from streaks look like this:

    CustomerID StartMonth EndMonth MonthCount
    1 2245-01-01 2245-03-01 3
    2 2245-01-01 2245-01-01 1
    2 2245-03-01 2245-03-01 1
    3 2245-02-01 2245-04-01 3
    3 2245-05-01 2245-06-01 2

And finally …
ranked AS (
  SELECT
    s.*,
    ROW_NUMBER() OVER (
      PARTITION BY customerid
      ORDER BY monthcount DESC, endmonth DESC
    ) AS rnk
  FROM streaks s
)
SELECT
  customerid,
  startmonth,
  endmonth,
  monthcount AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;

The final results would look like this:

CustomerID StartMonth EndMonth LongestStreakMonths
1 2245-01-01 2245-03-01 3
2 2245-01-01 2245-01-01 1
3 2245-02-01 2245-04-01 3
4 2245-01-01 2245-02-01 2
5 2245-03-01 2245-05-01 3

Here is the complete island query:
WITH monthly AS (
  SELECT
    o.customerid,
    DATE_TRUNC('month', o.orderdate)::date AS month_start
  FROM orders o
  GROUP BY o.customerid, DATE_TRUNC('month', o.orderdate)::date
),
numbered AS (
  SELECT
    customerid,
    month_start,
    ROW_NUMBER() OVER (
      PARTITION BY customerid
      ORDER BY month_start
    ) AS rn
  FROM monthly
),
grouped AS (
  SELECT
    customerid,
    month_start,
    -- Stable group key: shift each month back by rn months
    month_start - (rn * INTERVAL '1 month') AS grp
  FROM numbered
),
streaks AS (
  SELECT
    customerid,
    MIN(month_start) AS start_month,
    MAX(month_start) AS end_month,
    COUNT(*) AS month_count
  FROM grouped
  GROUP BY customerid, grp
),
ranked AS (
  SELECT
    s.*,
    ROW_NUMBER() OVER (
      PARTITION BY customerid
      ORDER BY month_count DESC, end_month DESC
    ) AS rnk
  FROM streaks s
)
SELECT
  customerid,
  start_month,
  end_month,
  month_count AS longest_streak_months
FROM ranked
WHERE rnk = 1
ORDER BY customerid;;



🔍 When to Use Which Window Function?

Quick Decision Tree

  • Need to measure time between events? → Use LAG() (or LEAD() if you prefer “next”).
  • Need to form consecutive islands (streaks)? → Use ROW_NUMBER() + date grain + normalized key.
  • Need to handle rankings with ties?
    • Penalize ties (1,2,2,4): RANK()
    • Don’t skip numbers (1,2,2,3): DENSE_RANK()
    • Force unique order (1,2,3,4): ROW_NUMBER() with a deterministic ORDER BY

Function-by-Function (with Gaps vs Islands)

LAG(expr) OVER (PARTITION BY entity ORDER BY date)
  • Best for Gaps: get prev_date, then compute date - prev_date.
  • Also useful for “change since last event” (e.g., price jumps, inventory deltas).
  • Tip: De-dup the grain first so same-day repeats don’t create 0-day gaps unless you want them.
LEAD(expr) OVER (PARTITION BY entity ORDER BY date)
  • Mirror of LAG().
  • Reads more naturally to compare current → next (e.g., “days until next order”).
  • Use same partitioning/ordering rules as LAG().
ROW_NUMBER() OVER (PARTITION BY entity ORDER BY date)
  • Core for Islands: create a running index (rn).
  • Build a stable group key, e.g. date - rn*interval, so consecutive rows fall in the same island.
  • Handy for deduping: keep the first row per (entity, date).
RANK() / DENSE_RANK()
  • Use after forming islands to pick the longest streak or gap.
  • RANK() penalizes ties (1,2,2,4), DENSE_RANK() doesn’t (1,2,2,3).
  • Often wrapped with ROW_NUMBER() to break ties deterministically for “Top 1 per entity.”

Framing & ORDER BY Gotchas

  • Always PARTITION BY the entity you care about.
  • Always ORDER BY a deterministic key (date + tiebreaker).
  • Default window frame is usually fine; what matters is row order.

Typical Combos

  • Gaps: grain → LAG(date) → compute gap_daysROW_NUMBER() to pick max per entity.
  • Islands: grain → ROW_NUMBER() → normalize (date - rn*interval) → group/aggregate → rank longest streak.



⚠️ Common Pitfalls

1) Wrong time grain

Symptom: “My streaks skip months / gaps look huge.”

Fix: Explicitly set grain: ::date, DATE_TRUNC('week', ...), DATE_TRUNC('month', ...).

2) Non-deterministic window ordering

Symptom: Different answers run-to-run.

Fix: Add a tie-breaker: ORDER BY orderdate, orderid.

3) Duplicates at the grain

Symptom: Zero-day gaps; broken islands.

Fix: De-dup before windows (SELECT DISTINCT customerid, orderdate::date) or keep one row via ROW_NUMBER() and filter (e.g., QUALIFY rn = 1 or a subquery).

4) Missing scaffold for “no activity”

Symptom: Islands undercounted because dates with no rows vanish.

Fix: For rules like “every day/week”, build a calendar/time series and join, then apply the condition.

5) Off-by-one boundaries

Symptom: Streak length miscounts by 1.

Fix: Be explicit: for monthly streaks, count rows; for day streaks, MAX(date) - MIN(date) + 1 only if every day is present.

6) Mixing “any order” vs “Completed orders only”

Symptom: Streaks drop unexpectedly.

Fix: Apply the same filter in every CTE (WHERE status = 'Completed'), not just the last step.

7) Partition mismatch

Symptom: Streaks/gaps bleed across entities.

Fix: Double-check PARTITION BY (customer vs supplier vs category) matches the question.

8) Island grouping key done on wrong grain

Symptom: Consecutive months still split.

Fix: Create rn at the same grain you normalized (e.g., month), then grp = month_start - rn * interval '1 month'.

9) Time zones & timestamps

Symptom: Gaps differ by 1 day around midnight.

Fix: Normalize to date (or a consistent timezone) before windows.

10) Picking the “longest” incorrectly

Symptom: Not the most recent or not truly the max.

Fix: Rank inside entity: ORDER BY month_count DESC, end_month DESC (or your tie-break), then filter to the top (e.g., WHERE rnk = 1).

11) Non-performant (slow) query

Symptom: Slow queries on big tables.

Fix: Pre-filter date range; index (customerid, orderdate); avoid unnecessary DISTINCTs; materialize intermediate CTEs if needed.




🧩 Practice Problems

Here are some problems you can try right away:


🔗 Related Topics

See also: LEAD / LAG, ROWNUMBER() & RANK()