Temp Tables vs Table Variables
There was a recent newsgroup posting where the poster asked if creating/modifying temp tables incurred physical disk I/O. A consultant had attempted to justify his choice of using temp tables, as opposed to table variables, by declaring that temp tables are purely in-memory structures, incurring no physical disk I/O, and therefore perform as well as table variables. Well, the consultant was wrong, but kind of right at the same time.
It's important to remember that temp tables & table variables are not simply the same thing with a different scope. They were created for different, but overlapping, purposes. Temp tables originated well before table variables and are essentially for the storage & manipulation of temporary data. Table variables, on the other hand, were introduced with SQL Server 2000 and were designed for returning datasets from table-valued functions, that is, one or more rows and/or columns from a UDF, as opposed to the single row/single column you get from a scalar-valued function. Both are instantiated in tempdb and both are backed by physical disk (as is every database, system or user, in SQL Server 2000). Changes to both temp tables and table variables are logged in the transaction log [1]. However, since tempdb always uses the simple recovery model, those transaction log records only last until the next tempdb checkpoint, at which time the tempdb log is truncated.
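To make the distinction concrete, here's a minimal sketch of the two constructs (the table and column names are my own, purely for illustration):

```sql
-- a temp table: created with DDL, scoped to the session (or stored proc)
create table #mytemptable (blah varchar(50))

-- a table variable: declared like any other variable, scoped to the batch
declare @mytabvar table (blah varchar(50))

-- both live in tempdb; the temp table even shows up there by (decorated) name
select name from tempdb..sysobjects where name like '#mytemptable%'
```

Note that the temp table's name in tempdb is suffixed with padding and a unique hex value, which is why the query above uses a LIKE pattern rather than an exact match.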
There are many similarities between temp tables and table variables, but there are also some notable differences. Namely, temp tables can be altered with DDL statements but table variables can't (so you cannot create a nonclustered index on a table variable, for example). That makes every table variable a heap, or at best a table with a single, clustered index, and every table variable access a table scan (or clustered index scan, which is essentially the same thing). Temp tables have a looser scope than table variables, but no less well defined. Transactions involving table variables last only for the duration of the DML statement (the table variable change is committed implicitly after each DML statement, e.g. four INSERT statements result in four begin/commit pairs in the tempdb transaction log), whereas transactions on temp tables can span multiple DML statements. As a result, transactions on table variables are typically shorter than transactions involving temp tables and require less locking [2]. SQL Server creates statistics for temp tables, so the query optimiser can choose different plans for data involving temp tables. However, SQL Server does not create statistics on columns in table variables (what's the point when the only way to access data in a table variable is through a table scan?). Table variables, when used in stored procedures, are supposed to result in fewer recompiles of those stored procedures. This is probably because the query optimiser doesn't have to come up with different plans based on the content of the table variables, since the only operations you can perform on table variables are heap/clustered index operations (table scan, clustered index insert, etc.), whereas a temp table may be manipulated through any number of indexes whose density, distribution & selectivity could change from access to access (hence the possibility of different execution plans, and therefore more stored proc recompiles).
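For example, the extra DDL that temp tables allow looks like this (table and index names are mine, just for illustration):

```sql
create table #orders (orderid int, custid int)

-- legal: temp tables accept DDL after creation
create nonclustered index ix_orders_custid on #orders (custid)

-- a table variable can only get its schema at declaration time; the
-- primary key below implicitly creates an index, but that's the limit
declare @orders table (orderid int primary key, custid int)

-- this would be a syntax error - you can't run DDL against a table variable:
-- create nonclustered index ix_custid on @orders (custid)
```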
I've heard it recommended that you should use table variables wherever you can and only use temp tables where the additional schema you can define for them (such as indexes, for example) would improve performance. But I've conducted my own tests on this issue and those tests yielded results indicating that, in general, the performance of temp tables versus table variables, for small datasets, was almost identical. The basic gist was that table variables performed marginally better than temp tables when the number of rows contained within was quite small (< 100,000 rows), but over a rough size temp tables performed better (and the performance gap widened as the size of the table increased). It should be noted, though, that the temp tables in my tests were intentionally heaps, i.e. no clustered index, so as to keep them as close in structure to the table variables as I could. I suspect that with a couple of indexes on my temp tables, the performance would have improved considerably. That said, these are the figures from my hour or two of testing:
avg batch duration (msec)

#rows | table variable | temp table
I did the tests in batches of 10 (the figures above are averages for individual batches). Each batch basically consisted of a small loop that just inserted rows into, and selected rows from, the temp table/table variable as follows (the 22 in the while condition controls the number of rows to fill the table with; 2^(22-1) = 2097152):
-- table variable declaration (column name & type assumed for illustration)
declare @mytabvar table (blah varchar(50))

declare @i int
set @i = 1

insert into @mytabvar (blah)
values ('This is a line of text')

while (@i < 22)
begin
    -- double the row count: re-insert a copy of every existing row
    insert into @mytabvar (blah)
    select blah from @mytabvar

    set @i = @i + 1
end
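To time each batch, I wrapped the loop in something along these lines (a sketch with my own variable names, not the exact harness):

```sql
declare @start datetime
set @start = getdate()

-- ... run the insert/select loop for the temp table or table variable here ...

-- elapsed wall-clock time for the batch, in milliseconds
select datediff(ms, @start, getdate()) as [batch duration (msec)]
```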
Anyway, the conclusion I drew from my tests was that, for all intents and purposes, and all other things being equal, temp tables & table variables for smallish datasets were so close in comparative performance that any difference was purely academic. When the dataset grew a bit larger, temp tables started to shine a bit more than table variables, and when you add extra DDL, like indexes, constraints, etc., into the mix, temp tables seemed the better choice. But the point remains that they were designed for different purposes, and any problem involving temporary data storage should be tested with both scenarios (temp tables & table variables), with the solution that yields the better performance (assuming it is still functionally correct) being the preferred solution.
The consultant who told the original poster that temp tables did not write to disk was quite plainly wrong. However, that doesn't mean that his temp tables would have performed worse than the structurally equivalent table variables. In fact, they might have performed as well as, if not better than, table variables, especially if there was a significant amount of data residing in those temp tables and they were indexed. Only testing the solutions under a realistic load could answer that question.
The take-home message from the thread: test, test, test. (And don't believe everything you read - perform your own tests to verify others' conclusions/assertions.)
1. SQL Server Books Online says that since table variables are not part of the persistent database, they are not affected by rollbacks. I can only assume they say this because table variables are instantiated in tempdb rather than the persistent (user) database and, as such, when you issue a rollback in the user DB, the transaction log for the user DB is rewound, but the tempdb transaction log does not change as a result of the rollback statement (the tempdb transaction log may even have been truncated by that stage, so how could you roll it back?). However, the same would hold true for temp tables, so I don't know why Microsoft don't spell that out explicitly in the temp table documentation (at least, I haven't read any temp table documentation where that is explicitly stated).
[I've since discovered, through my own testing again, that temp tables and table variables do not behave the same way in rollbacks. Temp table changes will get rolled back, even though they're logged in a different transaction log than the permanent data you're working on. But table variable changes are not affected by rollbacks (which we already knew). That makes table variables very handy for creating/storing data that you wish to persist beyond rollbacks (like informational log entries, for example).]
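A quick illustrative script (the names are mine) demonstrates that difference in rollback behaviour:

```sql
create table #t (blah varchar(50))
declare @t table (blah varchar(50))

begin transaction
insert into #t (blah) values ('temp table row')
insert into @t (blah) values ('table variable row')
rollback transaction

select count(*) from #t   -- 0: the temp table insert was rolled back
select count(*) from @t   -- 1: the table variable insert survived the rollback
```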
2. Microsoft also say that transactions involving table variables require less logging. I assume they say this because the transactions are shorter. Transactions therefore stay open for less time, and checkpoints can truncate the tempdb log more frequently without being blocked by open transactions (open transactions cannot be checkpointed and flushed to disk before being committed, and log truncation cannot touch uncommitted transactions, so open transactions can remain in the transaction log for a longer period of time - hence "more logging"). To me, the explicit LOP_BEGIN_XACT/LOP_COMMIT_XACT pairs for every INSERT, UPDATE or DELETE statement against a table variable mean more logging, but perhaps I've just misinterpreted what Microsoft meant by their comment in BOL.
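If you want to see those begin/commit pairs for yourself, the undocumented (and unsupported, so use at your own risk) fn_dblog function can be pointed at tempdb's log; something along these lines:

```sql
use tempdb
go
declare @t table (blah varchar(50))
insert into @t (blah) values ('row 1')
insert into @t (blah) values ('row 2')

-- each INSERT above should appear as its own LOP_BEGIN_XACT/LOP_COMMIT_XACT pair
select [Current LSN], Operation, [Transaction ID]
from ::fn_dblog(null, null)
where Operation in ('LOP_BEGIN_XACT', 'LOP_COMMIT_XACT')
```

Being undocumented, fn_dblog's output columns and behaviour can change between versions, so treat this as a diagnostic curiosity rather than something to build on.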