You can get into a situation where you have two tables with values associated with date ranges. What’s worse, those date ranges don’t necessarily have to align, which can make joining them a seemingly complex task, but it is surprisingly simple when you learn how to think of overlapping date ranges, along with this relatively simple T-SQL join pattern.
What’s the problem?
If we put the two tables next to each other on an imaginary horizontal axis representing the dates, they look something like this:
Suppose table A contains something that we want to distribute to table B (which has a higher granularity), or vice versa. The first step in doing that is to translate which ranges of table A overlap (i.e. intersect) table B:
Some test data
Let’s create two test tables, one with monthly sales numbers and one with weeks.
DROP TABLE IF EXISTS #months, #weeks;
CREATE TABLE #months (
startDate date NOT NULL,
endDate date NOT NULL,
monthName nvarchar(100) NOT NULL,
unitSales numeric(12, 2) NOT NULL,
PRIMARY KEY CLUSTERED (startDate)
);
INSERT INTO #months
SELECT mo,
DATEADD(month, 1, mo),
DATENAME(month, mo)+N' '+STR(YEAR(mo), 4, 0),
ABS(CHECKSUM(DATENAME(month, mo)))%10000
FROM (
SELECT TOP (12) DATEADD(month, ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-1, {d '2023-01-01'}) AS mo
FROM sys.objects
) AS m;
CREATE TABLE #weeks (
startDate date NOT NULL,
endDate date NOT NULL,
weekName nvarchar(100) NOT NULL,
PRIMARY KEY CLUSTERED (startDate)
);
INSERT INTO #weeks
SELECT wk,
DATEADD(day, 7, wk),
N'Week '+CAST(DATEPART(isowk, wk) AS nvarchar(10))+N', '+STR(YEAR(wk), 4, 0)
FROM (
SELECT TOP (52) DATEADD(day, 7*ROW_NUMBER() OVER (ORDER BY (SELECT NULL))-7, {d '2023-01-02'}) AS wk
FROM sys.objects
) AS m;
Defining an overlapping range
If you’re joining two unknown range sets, there are four scenarios you can run into:
Comparing start-to-start and end-to-end does not really give us the answer we’re looking for. Let’s instead go back to the basic definition of what makes one range intersect the other.
If A starts before B ends and B starts before A ends, the two ranges overlap. In T-SQL terms, then, you’d join them like this:
SELECT *
FROM #months AS mo
INNER JOIN #weeks AS wk ON
mo.startDate<wk.endDate AND
wk.startDate<mo.endDate
ORDER BY mo.startDate, wk.startDate;
Note that the end dates in my datasets are the same as the start dates of the next row. If your range ends on the day before the next (such as the last day of the month), you’ll have to use “<=” instead of just “<” in the join condition.
Note how our 52 weeks and 12 months generate a result set with 62 rows! This is because a week can span across more than one month, which is evident from weeks 5, 9 and 13 in the resultset above. The grain of the new resultset, thus, is not the week, but the intersection of months and weeks.
Computing the range intersection
To compute those range intersections in this resultset, i.e. the actual start and end dates of the overlap, you’ll have to look at the start and end dates of both tables, and find the smallest range possible. This means finding the greater of the two start dates and the smaller of the two end dates.
In T-SQL, this is a CASE before SQL Server 2022:
SELECT (CASE WHEN mo.startDate>wk.startDate THEN mo.startDate ELSE wk.startDate END) AS intersectionStart,
(CASE WHEN mo.endDate<wk.endDate THEN mo.endDate ELSE wk.endDate END) AS intersectionEnd,
*
FROM #months AS mo
INNER JOIN #weeks AS wk ON
mo.startDate<wk.endDate AND
wk.startDate<mo.endDate
ORDER BY mo.startDate, wk.startDate;
If you’re lucky enough to be on SQL Server 2022 or Azure SQL Database, you can use the new GREATEST() and LEAST() functions to make this code a little more readable:
SELECT GREATEST(mo.startDate, wk.startDate) AS intersectionStart,
LEAST(mo.endDate, wk.endDate) AS intersectionEnd,
*
FROM #months AS mo
INNER JOIN #weeks AS wk ON
mo.startDate<wk.endDate AND
wk.startDate<mo.endDate
ORDER BY mo.startDate, wk.startDate;
Distributing the sales numbers
You’ll notice that the month table has monthly sales numbers, which we want to distribute evenly to each week. First off, let’s compute what the sales numbers are per day for each month.
SELECT GREATEST(mo.startDate, wk.startDate) AS intersectionStart,
LEAST(mo.endDate, wk.endDate) AS intersectionEnd,
mo.unitSales/DATEDIFF(day, mo.startDate, mo.endDate) AS dailySales,
*
FROM #months AS mo
INNER JOIN #weeks AS wk ON
mo.startDate<wk.endDate AND
wk.startDate<mo.endDate
ORDER BY mo.startDate, wk.startDate;
Before we move on, I’m going to move the intersectionStart and intersectionEnd columns to a CROSS APPLY clause, because we’re going to reuse it, and I’m just too lazy to type it over and over again:
SELECT i.intersectionStart,
i.intersectionEnd,
mo.unitSales/DATEDIFF(day, mo.startDate, mo.endDate) AS dailySales,
*
FROM #months AS mo
INNER JOIN #weeks AS wk ON
mo.startDate<wk.endDate AND
wk.startDate<mo.endDate
CROSS APPLY (
VALUES (GREATEST(mo.startDate, wk.startDate),
LEAST(mo.endDate, wk.endDate))
) AS i(intersectionStart, intersectionEnd)
ORDER BY mo.startDate, wk.startDate;
Now, let’s multiply the computed daily sales by the number of days in each intersection:
SELECT i.intersectionStart,
i.intersectionEnd,
mo.unitSales
*DATEDIFF(day, i.intersectionStart, i.intersectionEnd)
/DATEDIFF(day, mo.startDate, mo.endDate) AS intersectionSales,
*
FROM #months AS mo
INNER JOIN #weeks AS wk ON
mo.startDate<wk.endDate AND
wk.startDate<mo.endDate
CROSS APPLY (
VALUES (GREATEST(mo.startDate, wk.startDate),
LEAST(mo.endDate, wk.endDate))
) AS i(intersectionStart, intersectionEnd)
ORDER BY mo.startDate, wk.startDate;
And finally, we’ll aggregate the intersection numbers up to week ranges to get our weekly sales:
SELECT wk.startDate,
wk.endDate,
wk.weekName,
SUM(mo.unitSales
*DATEDIFF(day, i.intersectionStart, i.intersectionEnd)
/DATEDIFF(day, mo.startDate, mo.endDate)) AS weeklySales
FROM #months AS mo
INNER JOIN #weeks AS wk ON
mo.startDate<wk.endDate AND
wk.startDate<mo.endDate
CROSS APPLY (
VALUES (GREATEST(mo.startDate, wk.startDate),
LEAST(mo.endDate, wk.endDate))
) AS i(intersectionStart, intersectionEnd)
GROUP BY wk.startDate, wk.endDate, wk.weekName
ORDER BY wk.startDate;
A word about performance
Because there is no equijoin, this query will generate a Nested Loop join, effectively scanning the month table, then seeking the week table once for every month row. Nested Loop joins can be great when the top input is small the the bottom input can use an Index Seek, but it really doesn’t scale well when those two conditions are not both met.
But that’s for another post.
1 comment