Continuing last week’s post on parallelism, here’s part two, where we take a closer look at when parallel plans are considered, what you can do to either force a query to run in parallel or prevent it from doing so, and what to avoid if you’re trying to achieve a parallel query plan.
Probably the single most important thing to know about parallelism is that a parallel query plan includes a certain amount of overhead because it needs to distribute/repartition and/or gather streams. For this reason, only a query that the optimizer deems expensive enough at first look will become a candidate for a parallel plan.
This is where the cost threshold for parallelism comes in – a server-level setting that defines the query cost (as seen in the query plan) at which the server will start considering a parallel plan. You can set this parameter using sp_configure, but in order for it to become visible, you’ll first have to enable “advanced options”, also with sp_configure:
EXECUTE sp_configure 'show advanced options', 1; RECONFIGURE;
With advanced options enabled, you can use sp_configure to set the threshold like this:
EXECUTE sp_configure 'cost threshold for parallelism', 5; RECONFIGURE;
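To verify what’s currently configured, one way is to query the sys.configurations catalog view, which shows both the configured value and the value actually in use (they can differ until you run RECONFIGURE):

```sql
--- Check the configured and running values of the threshold:
SELECT [name], [value], value_in_use
FROM sys.configurations
WHERE [name]='cost threshold for parallelism';
```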
The example above sets the threshold value to 5 (which is the default for new SQL Server instances), meaning that queries that are cheaper than that will always run serially. This value can be important when tuning loads on the server level – a lower value means that more queries will be considered for parallelization; a higher value means that a larger portion of your workload will run serially. The threshold is important to be aware of with regards to different types of server workloads. Without going into details, you could generally say that server loads can be illustrated with two extremes:
A high-load OLTP server with a large number of simultaneously active users, running relatively inexpensive queries, may benefit significantly from running more of its queries serially. This is primarily because many concurrent users means many simultaneous processes, so you’ll want to reduce the number of CPU threads for each process in order to be able to serve more users at the same time.
A data warehouse with fewer users running more expensive queries, on the other hand, should probably make significant use of parallelization where available, because the workloads are fewer but heavier.
Some SQL Server experts will argue that you should completely disable parallelism on OLTP servers for the reason stated above. I personally think that this is drastic – using the cost threshold setting allows you to more finely tune which queries get a boost from parallelization and which do not.
Max degree of parallelism
You can also configure the maximum number of threads that the server is allowed to use in a parallel plan, known as the “degree of parallelism”. You can do this on the server level, again using sp_configure:
EXECUTE sp_configure 'max degree of parallelism', 5; RECONFIGURE;
… or you can add a query hint to the individual query, giving you a greater amount of control.
SELECT something FROM someTable OPTION (MAXDOP 4); -- Allow up to four parallel threads
The most common use for the MAXDOP setting (server-wide and “locally”) is to force a serial plan, by specifying MAXDOP 1.
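As a sketch (the table and column names here are made up for illustration), forcing a serial plan for a single query looks like this:

```sql
--- Force this query to run serially, regardless of its estimated cost:
SELECT customerID, SUM(orderTotal)
FROM dbo.someOrderTable
GROUP BY customerID
OPTION (MAXDOP 1);
```

Because the hint applies only to this statement, it’s a much more surgical tool than changing the server-wide setting.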
Operators that can only be run serially
Some operations cannot run in parallel in SQL Server. Normally this is because a particular operation couldn’t logically (or realistically) run in parallel, but sometimes, as with index operations, it’s a matter of Microsoft wanting more of your money for their premium features. Compiling a list of these operations is difficult, partly because of sparse documentation, partly because this information changes from version to version (sometimes even between service packs).
An important part of tuning high-performance queries, however, is understanding which operations are performed serially and trying to eliminate those from your query where possible. I found the following examples by searching the web, most notably in a blog post by SQL Server MVP Paul White.
Some operations will affect the entire query plan, meaning that just a single operator that can’t be parallelized will make the entire plan turn serial. For example:
- INSERT, UPDATE and DELETE on table variables – use regular temporary tables instead of table variables where possible.
- User-defined scalar functions – these are really tricky to get good performance from in the first place, but they will also make your plan serial.
- References to system tables and DMVs.
- Recursive common table expressions – the very nature of a recursion forces it to run in a single thread. The optimizer cannot determine at compile-time how to partition the data streams so that all recursions within a CTE happen in the same stream.
- Any operation referencing in-memory tables – there are lots and lots of things in-memory tables just won’t do in SQL Server 2014. Parallelization is one of them.
- CLR code that is allowed “data access”.
- Inline table UPDATE (which isn’t a supported feature, anyway) because parallelization would potentially create inconsistencies in the ordering of the table. Just like with the recursive CTEs, SQL Server cannot guarantee that the streams are correctly partitioned, and sometimes it wouldn’t be theoretically possible at all.
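To illustrate the first point in the list above, here’s a minimal sketch (the source table name is hypothetical). The two statements do the same work, but only the temp table variant is eligible for a parallel plan:

```sql
--- This INSERT forces a serial plan, because the target is a table variable:
DECLARE @results TABLE (id int NOT NULL, total numeric(18, 2) NOT NULL);

INSERT INTO @results (id, total)
SELECT id, SUM(amount)
FROM dbo.someLargeTable
GROUP BY id;

--- The equivalent INSERT into a regular temp table can run in parallel:
CREATE TABLE #results (id int NOT NULL, total numeric(18, 2) NOT NULL);

INSERT INTO #results (id, total)
SELECT id, SUM(amount)
FROM dbo.someLargeTable
GROUP BY id;
```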
Some operators aren’t parallelized themselves, but they can still exist in a parallel plan. This works by placing them in a “serial zone”: SQL Server gathers any preceding parallel streams into a single stream, performs the operation at hand, and then redistributes the streams again if necessary.
You can tell the start and end of a parallel/serial “zone” by looking for the Parallelism (Distribute Streams) and Parallelism (Gather Streams) operators.
- Index operations – this is a paid feature, so you’ll have to shell out for Enterprise Edition licenses.
- TOP – because TOP requires data to be ordered, the streams will first have to be gathered.
- Sequence project – this operator handles ordered window functions like ROW_NUMBER() and RANK(). Because the data needs to be ordered first, it cannot run in parallel. That said, I’ve seen cases where the Sequence Project operator is parallelized, for instance with LEAD() and LAG(). That will be the basis of a future post, though.
- Multi-statement table value functions (i.e. the opposite of inline table value functions) run in a serial zone, but only when they’re referenced with a CROSS APPLY that passes arguments from another table or stream. This means that multi-statement TVFs can be parallelized when called with constants or variables.
- Backward range scans – for some reason, a forward scan of an index or table can be parallelized, but not a backward one.
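The multi-statement TVF point in the list above can be sketched like this (the function and table names are hypothetical, assuming a multi-statement TVF dbo.fn_getDetails that takes an int parameter):

```sql
--- Correlated CROSS APPLY: the function call ends up in a serial zone,
--- because its argument comes from another stream:
SELECT t.id, d.detailValue
FROM dbo.someTable AS t
CROSS APPLY dbo.fn_getDetails(t.id) AS d;

--- Called with a variable instead, the surrounding plan can still go parallel:
DECLARE @id int=42;

SELECT d.detailValue
FROM dbo.fn_getDetails(@id) AS d;
```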
Using DBCC to force a parallel plan
On isolated test/development machines, you can use a special DBCC command to change the way query plans are costed, to all but force SQL Server to consider a parallel plan for your query. This is done by modifying the “weight” of the CPU cost in relation to the cost of I/O operations.
With the following example, you’re telling SQL Server that everything the CPU has to do is crazy-expensive in relation to the cost of I/O operations – this will make queries a lot more expensive for the purpose of calculating a query plan, which in turn will make the plan a candidate for parallelism. The default value for the CPU weight is 1.
DBCC FREEPROCCACHE;
DBCC SETCPUWEIGHT(1000);
GO
Note that this feature is undocumented (i.e. unsupported), so don’t use it on a production machine. However, in a controlled environment, it can be great for troubleshooting and performance tuning. For more on the subject, Paul White has written an excellent post on plan costing.
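When you’re done experimenting, you’ll want to restore the default weight and flush the cached plans that were compiled with the skewed costing (a sketch, using the same undocumented command):

```sql
--- Reset the CPU weight to its default of 1 and clear the plan cache:
DBCC SETCPUWEIGHT(1);
DBCC FREEPROCCACHE;
GO
```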