Aggregating date and time values

Date and time values are not entirely intuitive to aggregate into averages in T-SQL, although the business case does arguably exist. Suppose, for instance, that you have a production log with a “duration” column (in the “time” datatype), and you want to find the totalt or average duration for a certain group of items.

It’s possible, but I would still call it a workaround.

First off, let’s set up some sample data:

CREATE TABLE dbo.ProductionLog (
    LogSequenceID       int IDENTITY(1, 1) NOT NULL,
    ItemCode            varchar(10) NOT NULL,
    ProductionStart     datetime2(3) NOT NULL,
    Duration            time(3) NOT NULL,
    PRIMARY KEY CLUSTERED (LogSequenceID)
);

INSERT INTO dbo.ProductionLog (ItemCode, ProductionStart, Duration)
SELECT 'ITEM'+LEFT(CAST(message_id AS varchar(10)), 1) AS ItemCode,
       DATEADD(second, -10*message_id, SYSDATETIME()) AS ProductionStart,
       DATEADD(ms, 100*(message_id%100), CAST('00:00' AS time(3))) AS Duration
FROM sys.messages;

We now have a sample log table with about 290 000 rows from our production process. How much time does it take, on average, to produce each item, and has there been a change in recent days? How much time have we spent in total, producing each item?

SELECT ItemCode,
       CAST(ProductionStart AS date) AS ProductionDate,
       AVG(Duration) AS AvgDuration,
       SUM(Duration) AS TotalDuration
FROM dbo.ProductionLog
GROUP BY ItemCode, CAST(ProductionStart AS date);

Because the “Duration” column is a “time(3)” datatype, it doesn’t really aggregate into an average that easily.

Msg 8117, Level 16, State 1, Line 20
Operand data type time is invalid for avg operator.

This may seem a little odd, as time is really a continuum just like any integer or floating-point value, right? So let’s just convert the time value into something that allows for aggregation:

SELECT ItemCode,
       CAST(ProductionStart AS date) AS ProductionDate,
       AVG(CAST(Duration AS float)) AS AvgDuration,
       SUM(CAST(Duration AS float)) AS TotalDuration
FROM dbo.ProductionLog
GROUP BY ItemCode, CAST(ProductionStart AS date);

Well, that didn’t pan out.

Msg 529, Level 16, State 2, Line 29
Explicit conversion from data type time to float is not allowed.

Another take: We can use DATEDIFF() to calculate the difference, in milliseconds, between midnight and “Duration”. Then we could just AVG() and SUM() that millisecond difference, which is an integer value.

SELECT ItemCode,
       CAST(ProductionStart AS date) AS ProductionDate,
       AVG(DATEDIFF(ms, '00:00', Duration)) AS AvgDuration,
       SUM(DATEDIFF(ms, '00:00', Duration)) AS TotalDuration
FROM dbo.ProductionLog
GROUP BY ItemCode, CAST(ProductionStart AS date);

Now we’re talking. We now have millisecond aggregates for each item:

All we have to do now is to convert those values back to time(3), after the AVG() and SUM() operations have completed. And just like converting a time(3) to milliseconds, we can do the reverse by using DATEADD() to add those values to “00:00”:

SELECT ItemCode,
       CAST(ProductionStart AS date) AS ProductionDate,
       DATEADD(ms, AVG(DATEDIFF(ms, '00:00', Duration)), CAST('00:00' AS time(3))) AS AvgDuration,
       DATEADD(ms, SUM(DATEDIFF(ms, '00:00', Duration)), CAST('00:00' AS time(3))) AS TotalDuration
FROM dbo.ProductionLog
GROUP BY ItemCode, CAST(ProductionStart AS date);

Note that I’ve explicitly declared “00:00” as a time(3), to avoid an implicit conversion to datetime.

A word about datatype overflow

In this example, our process (the “Duration” column) is typically only a few seconds, so the millisecond aggregates won’t cause a problem when we’re converting them to “int” values (which DATEDIFF does). However, if the SUM(Duration) aggregate for any group is more than about 18 months total, the DATEDIFF() in milliseconds will overflow the “int” datatype.

This may actually apply to AVG() as well, because in SQL Server, AVG() is just a SUM()/COUNT(*) under the hood.

The solution to this problem is to change DATEDIFF() to DATEDIFF_BIG(), which returns a “bigint” instead of the default “int”, but remember that DATEDIFF_BIG() available only in SQL Server 2016 and newer.

1 comment

Leave a comment

Your email address will not be published. Required fields are marked *