Wednesday, September 21, 2005

Temp Tables vs Table Variables


There was a recent newsgroup posting where the poster asked if creating/modifying temp tables incurred physical disk I/O. A consultant had attempted to justify his choice of temp tables, as opposed to table variables, by declaring that temp tables are purely in-memory structures that incur no physical disk I/O and therefore perform as well as table variables. Well, the consultant was wrong, but also kind of right at the same time (sort of).

It's important to remember that temp tables & table variables are not simply the same thing with a different scope. They were created for different, but overlapping, purposes. Temp tables originated well before table variables and exist essentially for the storage & manipulation of temporary data. Table variables, on the other hand, were introduced with SQL Server 2000 and were designed for returning datasets from table-valued functions, that is, one or more rows and/or columns from a UDF as opposed to the single row/single column you get from a scalar-valued function. They are both instantiated in tempdb and are both backed by physical disk (as is every database, system or user, in SQL Server 2000). Changes to both temp tables and table variables are logged in the transaction log¹. However, since tempdb always uses the simple recovery model, those transaction log records only last until the next tempdb checkpoint, at which time the tempdb log is truncated.
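
If you want to see for yourself that both really do live in tempdb, a quick check along these lines (SQL Server 2000 syntax; the #demo/@demo names are just made up for illustration) shows the objects sitting in tempdb's system catalog:

-- the temp table shows up in tempdb..sysobjects (its name is padded out, hence the LIKE)
create table #demo (id int)
select name, crdate from tempdb..sysobjects where name like '#demo%'
drop table #demo

-- a table variable also gets an object in tempdb, with a system-generated # name
-- (the count will include any other temp objects currently in tempdb)
declare @demo table (id int)
select count(*) as temp_objects from tempdb..sysobjects where type = 'U' and name like '#%'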

There are many similarities between temp tables and table variables, but there are also some notable differences. Namely, temp tables can be altered with DDL statements but table variables can't (so you cannot create a nonclustered index on a table variable, for example). That makes every table variable a heap, or at best a table with a single, clustered index, and every table variable access a table scan (or clustered index scan, which is essentially the same thing).

Temp tables have a looser scope than table variables, but no less well defined. Transactions involving table variables last only for the duration of the DML statement (the table variable change is committed implicitly after each DML statement, e.g. four INSERT statements result in four begin/commit pairs in the tempdb transaction log), whereas transactions on temp tables can span multiple DML statements. As a result, transactions on table variables are typically shorter (than transactions involving temp tables) and require less locking².

SQL Server creates statistics for temp tables, so the query optimiser can choose different plans for data involving temp tables. However, SQL Server does not create statistics on columns in table variables (what's the point when the only way to access data in a table variable is through a table scan?). Table variables, when used in stored procedures, are supposed to result in fewer recompiles of those stored procedures. This is probably because the query optimiser doesn't have to come up with different plans based on the content of the table variables, because the only operations you can perform on table variables are heap/clustered index operations (table scan, clustered index insert, etc.), whereas a temp table may be manipulated through any number of indexes whose density, distribution & selectivity could change from access to access (hence the possibility of different execution plans, and therefore more stored proc recompiles).
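
As a rough sketch of that DDL difference (the table and column names here are made up purely for illustration):

-- temp table: ordinary DDL applies, so indexes can be added after creation
create table #orders (order_id int, customer_id int)
create nonclustered index ix_orders_customer on #orders (customer_id)
drop table #orders

-- table variable: no DDL after the declare; an inline primary key
-- (which becomes the clustered index) is about as much schema as you get
declare @orders table
(
    order_id int not null primary key,
    customer_id int
)
-- create index ix_orders_customer2 on @orders (customer_id)   -- not allowed, won't even parse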

I've heard it recommended that you should use table variables wherever you can and only use temp tables where the additional schema you can define for them (such as indexes, for example) would improve performance. But I've conducted my own tests on this issue and those tests yielded results indicating that, in general, the performance of temp tables versus table variables, for small datasets, was almost identical. The basic gist was that table variables performed marginally better than temp tables when the number of rows contained within was quite small (< 100,000 rows), but above a rough size temp tables performed better (and the performance gap widened as the size of the table increased). It should be noted, though, that the temp tables in my tests were intentionally heaps, i.e. no clustered index, so as to keep them as close in structure to the table variables as I could. I suspect that with a couple of indexes on my temp tables, the performance would have improved considerably. That said, these are the figures from my hour or two of testing:

avg batch duration (msec)

#rows        table variable    temp table
512                    15.1          14.5
1024                   17.9          29.4
2048                   29.2          39.9
4096                   61.8          71.6
8192                  127.9         145.7
16384                 254.2         291.7
32768                 544.6         583.0
65536                1117.4        1103.4
131072               2174.3        2224.2
262144               4377.1        4553.2
524288               8852.1        8402.0
1048576             17846.6       16188.6
2097152             36335.5       32739.4


I did the tests in batches of 10 (the figures above are averages for individual batches). Each batch basically consisted of a small loop that just inserted rows into, and selected rows from, the temp table/table variable as follows (the 22 in the while condition controls the number of rows to fill the table with; 2^(22-1) = 2097152):


-- table variable declaration (column definition assumed; the original post only showed the loop)
declare @mytabvar table (blah varchar(50))

declare @i int
set @i = 1

-- seed row
insert into @mytabvar (blah)
values ('This is a line of text')

-- each pass doubles the row count: 2^(22-1) = 2097152 rows when the loop finishes
while (@i < 22)
begin
    insert into @mytabvar (blah)
    select blah from @mytabvar

    set @i = @i + 1
end
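
For comparison, the temp table version of each batch was essentially the same loop pointed at a # table instead of the table variable; something along these lines (again, the column definition is my assumption):

-- deliberately a heap (no indexes) to keep it structurally close to the table variable
create table #mytab (blah varchar(50))

declare @i int
set @i = 1

insert into #mytab (blah)
values ('This is a line of text')

while (@i < 22)
begin
    insert into #mytab (blah)
    select blah from #mytab

    set @i = @i + 1
end

drop table #mytab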


Anyway, the conclusion I drew from my tests was that, all things being equal, temp tables & table variables are so close in performance for smallish datasets that any difference is purely academic. When the dataset grew a bit larger, temp tables started to shine a bit more than table variables, and when you add extra DDL, like indexes, constraints, etc., into the mix, temp tables seemed the better choice. But the point remains that they were designed for different purposes, and any problem involving temporary data storage should be tested with both approaches (temp tables & table variables); the solution that yields the better performance (assuming it is still functionally correct) should be the preferred one.

The consultant who told the original poster that temp tables did not write to disk was quite plainly wrong. However, that doesn't mean that his temp tables would have performed worse than the structurally equivalent table variables. In fact they might have performed as well as, if not better than, table variables, especially if there was a significant amount of data residing in those temp tables and they were indexed. Only testing the solutions under a realistic load could answer that question.

The take home message from the thread: test, test, test. (And don't believe everything you read - perform your own tests to verify others' conclusions/assertions.)



1. SQL Books Online says that since table variables are not part of the persistent database, they are not impacted by rollbacks. I can only assume they say this because table variables are instantiated in tempdb rather than the persistent (user) database and, as such, when you issue a rollback in the user DB, the transaction log for the user DB is rewound, but the tempdb transaction log does not change as a result of the rollback statement (the tempdb transaction log may have even been truncated by that stage so how could you roll it back?). However, the same would hold true for temp tables so I don't know why Microsoft don't spell that out explicitly in the temp table documentation (at least I haven't read any temp table documentation where that is explicitly stated).

[I've since discovered, through my own testing again, that temp tables and table variables do not behave the same way in rollbacks. Temp table changes will get rolled back, even though they're logged in a different transaction log than the permanent data you're working on. But table variable changes are not affected by rollbacks (which we already knew). That makes table variables very handy for creating/storing data that you wish to persist beyond rollbacks (like informational log entries, for example).]
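
A minimal repro of that rollback difference (table and column names made up for illustration):

create table #t (msg varchar(50))
declare @t table (msg varchar(50))

begin tran
    insert into #t (msg) values ('temp table row')
    insert into @t (msg) values ('table variable row')
rollback tran

select count(*) as temp_table_rows from #t      -- 0: the insert was rolled back
select count(*) as table_variable_rows from @t  -- 1: unaffected by the rollback

drop table #t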

2. Microsoft also say transactions involving table variables require less logging. I assume they say this because the transactions are shorter. Therefore, transactions stay open for less time and checkpoints can truncate the tempdb log more frequently without being blocked by the open transaction (open transactions cannot be checkpointed and flushed to disk before being committed, and log truncations cannot touch the uncommitted transactions, so the open transactions can remain in the transaction log for a longer period of time, therefore "more logging"). To me, the explicit LOP_BEGIN_XACT/LOP_COMMIT_XACT pairs for every INSERT, UPDATE or DELETE statement against a table variable mean more logging, but perhaps I've just misinterpreted what Microsoft meant by their comment in BOL.
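
If you want to see those begin/commit pairs for yourself, the undocumented ::fn_dblog function is one way to peek at the tempdb log. A rough sketch (unsupported and undocumented, so column and operation names may vary between builds; treat this as a curiosity, not a recommendation):

use tempdb
go

-- run a few table variable INSERTs in another window first, then look for the
-- LOP_BEGIN_XACT/LOP_COMMIT_XACT pairs wrapping each LOP_INSERT_ROWS
select [Current LSN], Operation, [Transaction ID]
from ::fn_dblog(null, null)
where Operation in ('LOP_BEGIN_XACT', 'LOP_COMMIT_XACT', 'LOP_INSERT_ROWS')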

Friday, September 02, 2005

TechEd Australia 2005: Day 3

Last day today - slept in as I figured it was going to be the last time for a while. By that I mean my internal clock let me sleep until about 7:30...sigh...this is what having children has done to me (but I don't regret it).

Shower, paper, breakfast, checkout. Very leisurely, surprisingly smooth checkout & baggage storage (along with about another 100 or so TechEd delegates). I skipped the first session; wasn't anything I wanted to go to anyway. Arrived for the second session: Internet Safety for Children (Jeff Alexander). All the sessions I've been to this week have been purely for work, except for this one - this one was for me. I figure this is all going to be very pertinent (given my eldest is 4 and going to school next year) in a few years. Next I did an online lab on Reporting Services as none of the sessions in that time slot were of much interest to me. It was really just a starter for using Report Designer - not that interesting.

I nicked off to Pacific Fair for lunch (a little early) in the hopes of finding something my dad & father-in-law might enjoy for Father's Day this Sunday. I was just going to wish them a happy Father's Day and defer the present thing for later, but I figured there wasn't much point in procrastinating (got a triple CD - female jazz singers from the 30's - 50's - for dad and a book on the Kokoda Trail for my father-in-law - he's a retired Lieutenant Colonel from the Army Reserve, so I think he'll probably find it a good read, and it comes highly recommended).

Back just after the next session kicked off (so I missed the first bit of it) - table partitioning in SQL 2005 (Danny Tamb), which was kind of interesting, and finished the day off with a session from local Microsoft boy Dave Lean on Report Builder. I was actually surprised how good that session was. It might even have been the best session of the week (and that's saying something as I'm a SQL purist - T-SQL query logic optimisation, execution plans, index tuning, storage engine internals, etc. - and don't really get into the whole BI/reporting bit of the SQL product suite). But I think there is a real future for Report Builder, which fits in between Report Designer & Report Manager and was not released with the early SQL 2000 release of Reporting Services. I was even envisaging which departments (Finance, HR, Marketing) would be big users of Report Builder if we gave them the infrastructure for it.

It's kind of fun checking out all these new Microsoft toys but at the same time kind of depressing because the opportunities to seriously evaluate them (in the context of the business) and deploy them are usually very limited. How many DBAs have the opportunity to play with table partitioning (not in production) in the context of flicking around 100+ GB of data on multiple filegroups, each with their own physical underlying RAID volume? I think I'm going to have to rip apart & rebuild my secret "play" SQL box in our secondary data centre (it's not really secret, but I just keep fairly quiet about it and people tend to keep forgetting it's out there <g>, otherwise it would probably get rebuilt in a heartbeat as some "useful" server for some other technology, like Citrix, Exchange or SMS or some such other crap). Yep, I'm definitely going to have to set up a proper play environment; where else am I going to mirror the DBs on my notebook to?

Anyway, left TechEd at about 5:30 and wandered around Broadbeach for about half an hour (couldn't go to the Gold Coast and not check out the beach...not bad...it's very long). By about 6pm I was running out of things to do to kill time (plane doesn't leave until 8pm) so I wandered back to the hotel, got my bags and cabbed it to Coolangatta airport; I figured reading my book at night at the airport wasn't much different from reading my book at night in the lobby of the Sofitel. Check-in was uneventful. Waiting for boarding was boring (that's when I was typing all today's musings). I'm expecting to get home tonight at almost 11pm and I'll bet there'll be a middle-of-the-night feed for my little girl. So I'm not expecting to get too much sleep tonight, especially since the wife wants the day off tomorrow (and rightly so - she's had a hard week). Oh well...next stop, Sydney airport.

Uneventful flight, 1 hour taxi ride home ($94 later - ouch!) and I'm home at about 11pm. Ah, but it's good to be home.

Thursday, September 01, 2005

TechEd Australia 2005: Day 2

Another fairly uneventful day. The sessions I went to today were XML, XML & more XML first thing this morning (I'm not much of an unstructured data kind of guy - give me a relational DB any day) and a tour of the SQL 2005 client tools (learnt a couple of things but not much since I've been playing with the tools for about a year) to top off the morning. The afternoon started with debunking security myths (it's the Jesper & Steve show - very entertaining), followed by SQL backup & recovery (I was actually a bit surprised at the number of extra backup/recovery features in SQL 2005). Late arvo/evening was upgrading to SQL 2005 (for all the components) and Jesper came back to lock down SQL 2000 security by hacking the life out of account permissions, registry ACLs, NTFS permissions & SQL system objects (mostly fairly dangerous stuff IMHO - likely to leave your SQL server rather unstable/unusable...but definitely more secure).

Lots of stuff to think about from today and experiment with. Time to build up a virtual PC, because there's going to be a fair bit of tearing apart and rebuilding methinks. And I've got to get me a copy of Jesper & Steve's book!