Today, we’re going to look at how to distribute values with what I call “overlapping ranges”. What I mean by that is that the value that you need to distribute has a given range and the objects you want to distribute the values to are also defined as ranges. Sound complicated? It doesn’t have to.
Say that you have a business that bills customers based on some type of contract period. A contract period can start and end at any given point in time, depending on, for instance, when the contract was started and completed. However, for your reporting needs, you need to figure out how much money you’ve billed for each reporting/accounting period, which in our example is going to be calendar months.
Example: For a particular project, you’ve billed $910 from january 12th to april 12th. How much of this applies to january, february, march and april respectively? Here’s how it should break down:
In total, the billed period spans a total of 91 days. 20 of those days are in january, 28 in february, 31 in march and 12 in april.
january 20 days $910*20/91=$200 february 28 days $910*28/91=$280 march 31 days $910*31/91=$310 april 12 days $910*12/91=$200 total =91 days =$910
Pretty simple, right? Here’s how it translates into T-SQL, step by step:
First off, let’s create some sample tables with data to work with:
--- These are our accounting/reporting periods: CREATE TABLE dbo.periods ( periodID int IDENTITY(1, 1) NOT NULL, startDate date NOT NULL, endDate date NOT NULL, CONSTRAINT PK_periods PRIMARY KEY CLUSTERED (periodID) ); INSERT INTO dbo.periods (startDate, endDate) VALUES ({d '2014-01-01'}, {d '2014-01-31'}), ({d '2014-02-01'}, {d '2014-02-28'}), ({d '2014-03-01'}, {d '2014-03-31'}), ({d '2014-04-01'}, {d '2014-04-30'}), ({d '2014-05-01'}, {d '2014-05-31'}), ({d '2014-06-01'}, {d '2014-06-30'}), ({d '2014-07-01'}, {d '2014-07-31'}), ({d '2014-08-01'}, {d '2014-08-31'}), ({d '2014-09-01'}, {d '2014-09-30'}), ({d '2014-10-01'}, {d '2014-10-31'}), ({d '2014-11-01'}, {d '2014-11-30'}), ({d '2014-12-01'}, {d '2014-12-31'}); --- ... and here are the billing transactions: CREATE TABLE dbo.transactions ( transactionID int IDENTITY(1, 1) NOT NULL, validFrom date NOT NULL, validTo date NOT NULL, amount numeric(12, 2) NOT NULL, CONSTRAINT PK_transactions PRIMARY KEY CLUSTERED (transactionID) ); INSERT INTO dbo.transactions (validFrom, validTo, amount) VALUES ({d '2014-01-12'}, {d '2014-04-12'}, 910), ({d '2014-04-13'}, {d '2014-05-22'}, 400), ({d '2014-05-23'}, {d '2014-09-01'}, 1020), ({d '2014-09-02'}, {d '2014-12-31'}, 1210);
The first problem we’re going to tackle is matching billed transactions to each and every period that it somehow applies to, even if it’s just for a single day. We’ll do this with a type of sem-cartesian join that splits each bill into multiple accounting periods. The join will look like this:
SELECT t.transactionID, t.validFrom, t.validTo, p.periodID, p.startDate, p.endDate FROM dbo.transactions AS t INNER JOIN dbo.periods AS p ON t.validFrom<=p.endDate AND t.validTo>=p.startDate ORDER BY t.transactionID, p.startDate;
A period is matched to a transaction if there is any overlap between the two – by definition this happens when the transaction begins before the period ends and the transaction ends after the period begins. The longer version of how this is done can be found in this post, which deals just with ranges but not with distributions.
From here, we can calculate which days within each period a transaction applies to. In our first example, the billed transaction only applies from the 12th to the 31st of january within that month. The logic for this is fairly simple, as you’ll see in the following CASE expression:
SELECT t.transactionID, t.validFrom, t.validTo, p.periodID, p.startDate, p.endDate, --- First day within the period for this transaction: (CASE WHEN t.validFrom<p.startDate THEN p.startDate ELSE t.validFrom END) AS firstDayWithinPeriod, --- Last day within the period for this transaction: (CASE WHEN t.validTo>p.endDate THEN p.endDate ELSE t.validTo END) AS lastDayWithinPeriod FROM dbo.transactions AS t INNER JOIN dbo.periods AS p ON t.validFrom<=p.endDate AND t.validTo>=p.startDate ORDER BY t.transactionID, p.startDate;
With that done, we can use DATEDIFF() to calculate the number of days 1) between the transaction’s start and end dates, and 2) between the transaction’s start and end dates within the period.
SELECT t.transactionID, t.validFrom, t.validTo, p.periodID, p.startDate, p.endDate, --- How many days within this period? 1+DATEDIFF(dd, (CASE WHEN t.validFrom<p.startDate THEN p.startDate ELSE t.validFrom END), (CASE WHEN t.validTo>p.endDate THEN p.endDate ELSE t.validTo END)) AS daysInPeriod, --- How many days total for this transaction? 1+DATEDIFF(dd, t.validFrom, t.validTo) AS daysValid FROM dbo.transactions AS t INNER JOIN dbo.periods AS p ON t.validFrom<=p.endDate AND t.validTo>=p.startDate ORDER BY t.transactionID, p.startDate;
Did you notice the “1+” prefixing the DATEDIFF() function? This construct is necessary because we implicitly define the end date as the last day on which the transaction/period is valid, which is actually 24 hours before. Example: A transaction from march 3rd to march 5th lasts for three days (the 3rd, 4th and 5th), but the DATEDIFF() of those two dates is 2. You could argue that the correct storage of this type of range would actually be to set march 6th as the actual end date for this range. But that’s a different blog post. Back to the subject:
The number of days within the period divided by the transaction’s total number of days gives us the distribution key that we can multiply with the transaction amount.
SELECT t.transactionID, t.validFrom, t.validTo, p.periodID, p.startDate, p.endDate, --- The number of days within each period divided by --- the total number of days for the transaction gives --- us the distribution key: (1.0+DATEDIFF(dd, (CASE WHEN t.validFrom<p.startDate THEN p.startDate ELSE t.validFrom END), (CASE WHEN t.validTo>p.endDate THEN p.endDate ELSE t.validTo END))) / (1+DATEDIFF(dd, t.validFrom, t.validTo)) AS distributionKey FROM dbo.transactions AS t INNER JOIN dbo.periods AS p ON t.validFrom<=p.endDate AND t.validTo>=p.startDate ORDER BY t.transactionID, p.startDate;
Finally, multiplying the distribution key with the transaction amount will give us the distributed amount for each particular period:
SELECT t.transactionID,
p.periodID, p.startDate, p.endDate,
--- Amount multiplied by the key:
t.amount*
(1.0+DATEDIFF(dd,
(CASE WHEN t.validFrom<p.startDate
THEN p.startDate
ELSE t.validFrom END),
(CASE WHEN t.validTo>p.endDate
THEN p.endDate
ELSE t.validTo END))) /
(1+DATEDIFF(dd, t.validFrom, t.validTo)
) AS distributedAmount
FROM dbo.transactions AS t
INNER JOIN dbo.periods AS p ON
t.validFrom<=p.endDate AND
t.validTo>=p.startDate
ORDER BY t.transactionID, p.startDate;
That’s it. Now go and distribute numbers!
As usual, if you have any comments, leave them in the comments section below. And don’t forget to like and subscribe to the Sunday Morning T-SQL Facebook page. See you next week!
1 comment