Distributing values with overlapping ranges

Today, we’re going to look at how to distribute values with what I call “overlapping ranges”. What I mean by that is that the value that you need to distribute has a given range and the objects you want to distribute the values to are also defined as ranges. Sound complicated? It doesn’t have to.

Say that you have a business that bills customers based on some type of contract period. A contract period can start and end at any given point in time, depending on, for instance, when the contract was started and completed. However, for your reporting needs, you need to figure out how much money you’ve billed for each reporting/accounting period, which in our example is going to be calendar months.

Example: For a particular project, you’ve billed $910 from january 12th to april 12th. How much of this applies to january, february, march and april respectively? Here’s how it should break down:

In total, the billed period spans a total of 91 days. 20 of those days are in january, 28 in february, 31 in march and 12 in april.

january    20 days   $910*20/91=$200
february   28 days   $910*28/91=$280
march      31 days   $910*31/91=$310
april      12 days   $910*12/91=$200

total     =91 days             =$910

Pretty simple, right? Here’s how it translates into T-SQL, step by step:

First off, let’s create some sample tables with data to work with:

--- These are our accounting/reporting periods:
CREATE TABLE dbo.periods (
    periodID      int IDENTITY(1, 1) NOT NULL,
    startDate     date NOT NULL,
    endDate       date NOT NULL,
    CONSTRAINT PK_periods PRIMARY KEY CLUSTERED (periodID)
);

INSERT INTO dbo.periods (startDate, endDate)
VALUES ({d '2014-01-01'}, {d '2014-01-31'}),
       ({d '2014-02-01'}, {d '2014-02-28'}),
       ({d '2014-03-01'}, {d '2014-03-31'}),
       ({d '2014-04-01'}, {d '2014-04-30'}),
       ({d '2014-05-01'}, {d '2014-05-31'}),
       ({d '2014-06-01'}, {d '2014-06-30'}),
       ({d '2014-07-01'}, {d '2014-07-31'}),
       ({d '2014-08-01'}, {d '2014-08-31'}),
       ({d '2014-09-01'}, {d '2014-09-30'}),
       ({d '2014-10-01'}, {d '2014-10-31'}),
       ({d '2014-11-01'}, {d '2014-11-30'}),
       ({d '2014-12-01'}, {d '2014-12-31'});

--- ... and here are the billing transactions:
CREATE TABLE dbo.transactions (
    transactionID int IDENTITY(1, 1) NOT NULL,
    validFrom     date NOT NULL,
    validTo       date NOT NULL,
    amount        numeric(12, 2) NOT NULL,
    CONSTRAINT PK_transactions PRIMARY KEY CLUSTERED (transactionID)
);

INSERT INTO dbo.transactions (validFrom, validTo, amount)
VALUES ({d '2014-01-12'}, {d '2014-04-12'},  910),
       ({d '2014-04-13'}, {d '2014-05-22'},  400),
       ({d '2014-05-23'}, {d '2014-09-01'}, 1020),
       ({d '2014-09-02'}, {d '2014-12-31'}, 1210);

The first problem we’re going to tackle is matching billed transactions to each and every period that it somehow applies to, even if it’s just for a single day. We’ll do this with a type of sem-cartesian join that splits each bill into multiple accounting periods. The join will look like this:

SELECT t.transactionID, t.validFrom, t.validTo,
       p.periodID, p.startDate, p.endDate
FROM dbo.transactions AS t
INNER JOIN dbo.periods AS p ON
    t.validFrom<=p.endDate AND
    t.validTo>=p.startDate
ORDER BY t.transactionID, p.startDate;

A period is matched to a transaction if there is any overlap between the two – by definition this happens when the transaction begins before the period ends and the transaction ends after the period begins. The longer version of how this is done can be found in this post, which deals just with ranges but not with distributions.

From here, we can calculate which days within each period a transaction applies to. In our first example, the billed transaction only applies from the 12th to the 31st of january within that month. The logic for this is fairly simple, as you’ll see in the following CASE expression:

SELECT t.transactionID, t.validFrom, t.validTo,
       p.periodID, p.startDate, p.endDate,
       --- First day within the period for this transaction:
       (CASE WHEN t.validFrom<p.startDate
             THEN p.startDate
         ELSE t.validFrom END) AS firstDayWithinPeriod,
       --- Last day within the period for this transaction:
       (CASE WHEN t.validTo>p.endDate
             THEN p.endDate
         ELSE t.validTo END) AS lastDayWithinPeriod
FROM dbo.transactions AS t
INNER JOIN dbo.periods AS p ON
    t.validFrom<=p.endDate AND
    t.validTo>=p.startDate
ORDER BY t.transactionID, p.startDate;

With that done, we can use DATEDIFF() to calculate the number of days 1) between the transaction’s start and end dates, and 2) between the transaction’s start and end dates within the period.

SELECT t.transactionID, t.validFrom, t.validTo,
       p.periodID, p.startDate, p.endDate,
       --- How many days within this period?
       1+DATEDIFF(dd,
           (CASE WHEN t.validFrom<p.startDate
             THEN p.startDate
         ELSE t.validFrom END),
           (CASE WHEN t.validTo>p.endDate
             THEN p.endDate
         ELSE t.validTo END)) AS daysInPeriod,
       --- How many days total for this transaction?
       1+DATEDIFF(dd, t.validFrom, t.validTo) AS daysValid
FROM dbo.transactions AS t
INNER JOIN dbo.periods AS p ON
    t.validFrom<=p.endDate AND
    t.validTo>=p.startDate
ORDER BY t.transactionID, p.startDate;

Did you notice the “1+” prefixing the DATEDIFF() function? This construct is necessary because we implicitly define the end date as the last day on which the transaction/period is valid, which is actually 24 hours before. Example: A transaction from march 3rd to march 5th lasts for three days (the 3rd, 4th and 5th), but the DATEDIFF() of those two dates is 2. You could argue that the correct storage of this type of range would actually be to set march 6th as the actual end date for this range. But that’s a different blog post. Back to the subject:

The number of days within the period divided by the transaction’s total number of days gives us the distribution key that we can multiply with the transaction amount.

SELECT t.transactionID, t.validFrom, t.validTo,
       p.periodID, p.startDate, p.endDate,
       --- The number of days within each period divided by
       --- the total number of days for the transaction gives
       --- us the distribution key:
       (1.0+DATEDIFF(dd,
           (CASE WHEN t.validFrom<p.startDate
             THEN p.startDate
         ELSE t.validFrom END),
           (CASE WHEN t.validTo>p.endDate
             THEN p.endDate
         ELSE t.validTo END))) /
       (1+DATEDIFF(dd, t.validFrom, t.validTo)) AS distributionKey
FROM dbo.transactions AS t
INNER JOIN dbo.periods AS p ON
    t.validFrom<=p.endDate AND
    t.validTo>=p.startDate
ORDER BY t.transactionID, p.startDate;

Finally, multiplying the distribution key with the transaction amount will give us the distributed amount for each particular period:

SELECT t.transactionID,
       p.periodID, p.startDate, p.endDate,
       --- Amount multiplied by the key:
       t.amount*
           (1.0+DATEDIFF(dd,
               (CASE WHEN t.validFrom<p.startDate
                     THEN p.startDate
                     ELSE t.validFrom END),
               (CASE WHEN t.validTo>p.endDate
                     THEN p.endDate
                     ELSE t.validTo END))) /
           (1+DATEDIFF(dd, t.validFrom, t.validTo)
       ) AS distributedAmount
FROM dbo.transactions AS t
INNER JOIN dbo.periods AS p ON
    t.validFrom<=p.endDate AND
    t.validTo>=p.startDate
ORDER BY t.transactionID, p.startDate;

That’s it. Now go and distribute numbers!

As usual, if you have any comments, leave them in the comments section below. And don’t forget to like and subscribe to the Sunday Morning T-SQL Facebook page. See you next week!

One thought on “Distributing values with overlapping ranges

  1. Pingback: Joining two SCD2 tables « Sunday morning T-SQL

Let me hear your thoughts!

This site uses Akismet to reduce spam. Learn how your comment data is processed.