Wednesday, September 21, 2005

Temp Tables vs Table Variables


There was a recent newsgroup posting where the poster asked if creating/modifying temp tables incurred physical disk I/O. A consultant had attempted to justify his choice of using temp tables, as opposed to table variables, by declaring that temp tables are purely in-memory structures, incurring no physical disk I/O, and therefore perform as well as table variables. Well, the consultant was wrong but kind of right at the same time (sort of).

It's important to remember that temp tables & table variables are not the same thing with a different scope. They were created for different, but overlapping, purposes. Temp tables originated well before table variables and are essentially for the storage & manipulation of temporal data. Whereas table variables were introduced with SQL Server 2000 and were designed for returning datasets from table-valued functions, that is one or more rows and/or columns from a UDF as opposed to a single row/single column from a scalar-valued function. They are both instantiated in tempdb and are both backed by physical disk (as is every database, system or user, in SQL Server 2000). Changes to both temp tables and table variables are logged in the transaction log1. However, since tempdb always uses the simple recovery model, those transaction log records only last until the next tempdb checkpoint, at which time the tempdb log is truncated.

There are many similarities between temp tables and table variables, but there are also some notable differences. Namely, temp tables can be altered with DDL statements but table variables can't (so you cannot create a nonclustered index on a table variable for example). That makes every table variable a heap, or at best a table with a single, clustered index, and every table variable access a table scan (or clustered index scan, which is essentially the same thing). Temp tables have a looser scope than table variables, but no less well defined. Transactions involving table variables last only for the duration of the DML statement (the table variable change is committed implicitly after each DML statement, eg. four INSERT statements result in four begin/commit pairs in the tempdb transaction log), whereas transactions on temp tables can span multiple DML statements. As a result, transactions on table variables are typically shorter (than transactions involving temp tables) and require less locking2. SQL Server creates statistics for temp tables, so the query optimiser can choose different plans for data involving temp tables. However, SQL Server does not create statistics on columns in table variables (what's the point when the only way to access data in a table variable is though a table scan?). Table variables, when used in stored procedures, are supposed to result in fewer recompiles of those stored procedures. This is probably because the query optimser doesn't have to come up with different plans based on the content of the table variables because the only operations you can perform on table variables are heap/clustered index operations (table scan, clustered index insert, etc.), whereas a temp table may be manipulated through any number of indexes whose density, distribution & selectivity could change from access to access (hence the possibility of different execution plans, therefore more stored proc recompiles).

I've heard it recommended that you should use table variables where ever you can and only use temp tables where the additional schema you can define for them (such as indexes for example) would improve performance. But I've conducted my own tests on this issue and those tests yielded results that indicated that, in general, the performance of temp tables versus table variable, for small datasets, was almost identical. The basic gist was that table variables performed marginally better than temp tables when the number of rows contained within was quite small (< 100,000 rows) but over a rough size temp tables performed better (and the performance gap widened as the size of the table increased). It should be noted, though, that the temp tables in my tests were intentionally heaps, i.e. no clustered index, so as to keep them as close in structure to the table variables as I could. I suspect with a couple indexes on my temp tables, the performance would have improved considerably. That said, these are the figures from my hour or two of testing:

avg batch duration (msec)
#rowstable variabletemp table
51215.114.5
102417.929.4
204829.239.9
409661.871.6
8192127.9145.7
16384254.2291.7
32768544.6583.0
655361117.41103.4
1310722174.32224.2
2621444377.14553.2
5242888852.18402.0
104857617846.616188.6
209715236335.532739.4


I did the tests in batches of 10 (the figures above are averages for individual batches). Each batch basically consisted of a small loop that just inserted rows into, and selected rows from, the temp table/table variable as follows (the 22 in the while condition controls the number of rows to fill the table with; 2^(22-1) = 2097152):


declare @i int
set @i = 1

insert into @mytabvar (blah)
values ('This is a line of text')

while (@i < 22)
begin
insert into @mytabvar (blah)
select blah from @mytabvar
set @i = @i + 1
end


Anyway, the conclusions that I drew from my tests were that, for all intents and purposes, temp tables & table variables, all things being equal, for smallish datasets were so close to each other in comparative performance that any difference was purely academic. When the dataset grew a bit larger, temp tables started to shine a bit more than table variables and when you add extra DDL, like indexes, constraints, etc., into the mix, temp tables seemed the better choice. But the point remained that they were designed for different purposes and any problem involving temporary data storage should be tested with both scenarios (temp tables & table variables) and the solution that yielded the better performance (assuming it was still functionally correct) should be the preferred solution.

The consultant that told the original poster that temp tables did not write to disk was quite plainly wrong. However, that doesn't mean that his temp tables would have performed worse than the structurally equivalent table variables. In fact they might have performed as well as, if not better than, table variables, especially if there was a significant amount of data residing those temp tables and they were indexed. Only testing the solutions under a realistic load could answer that question.

The take home message from the thread: test, test, test. (And don't believe everything you read - perform your own tests to verify other's conclusions/assertions.)



1. SQL Books Online says that since table variables are not part of the persistent database, they are not impacted by rollbacks. I can only assume they say this because table variables are instantiated in tempdb rather than the persistent (user) database and, as such, when you issue a rollback in the user DB, the transaction log for the user DB is rewound, but the tempdb transaction log does not change as a result of the rollback statement (the tempdb transaction log may have even been truncated by that stage so how could you roll it back?). However, the same would hold true for temp tables so I don't know why Microsoft don't spell that out explicitly in the temp table documentation (at least I haven't read any temp table documentation where that is explicitly stated).

[I've since discovered, through my own testing again, that temp tables and table variables do not behave the same way in rollbacks. Temp table changes will get rolled back, even though they're logged in a different transaction log than the permanent data you're working on. But table variable changes are not affected by roll backs (which we already knew). That makes table variables very handy for creating/storing data that you wish to persist beyond roll backs (like informational log entries for example).]

2. Microsoft also say transactions involving table variables require less logging. I assume they say this because the transactions are shorter. Therefore, transactions stay open for less time and checkpoints can truncate the tempdb log more frequently without being blocked by the open transaction (open transactions cannot be checkpointed and flushed to disk before being committed and log truncations cannot touch the uncommitted transactions so the open transactions can remain in the transaction log for a longer period of time, therefore "more logging"). To me, the explicit LOP_BEGIN_XACT/LOP_COMMIT_XACT pairs for every INSERT, UPDATE or DELETE statement against a table variable means more logging, but perhaps I've just misinterpretted what Microsoft meant by their comment in BOL.

28 Comments:

At 29/9/05 05:46, Blogger WesleyB said...

Great job!
I've been looking for some time to test this but you just saved me some valuable time :-)

 
At 20/10/05 13:13, Blogger Mike Hodgson said...

Glad I could help.

 
At 23/10/05 21:28, Blogger Nick HaC said...

cool blog... keep up the indepth SQL info

 
At 15/5/06 04:35, Anonymous Patrick said...

Its a little bit old but this link has some good information:

http://support.microsoft.com/kb/305977/en-us

 
At 15/5/06 09:47, Blogger Mike Hodgson said...

I have read that KB article (many times) and even referenced it at the start of the 4th paragraph.

 
At 17/9/06 17:44, Anonymous Anonymous said...

Really inforamtive !

 
At 22/11/06 22:13, Anonymous Anonymous said...

Gud one dude.

 
At 8/12/06 03:52, Anonymous Chris said...

Good article. I can't agree with you more about testing. I've been using temp tables and table variables for years and was told that table variables were always faster. I figured that at some point a temp table would be better, well that day has come. I was just testing the two options on a nasty query returning 150,00+ rows and 65 or so columns from nearly as many tables. The temp table takes about 2 1/2 min., switching to the table variable, I stopped it at 10 min. ... nuf said

 
At 12/7/07 07:04, Anonymous Michelle U. said...

Very informative article. Thank you for your efforts and knowledge.

 
At 20/7/07 06:03, Anonymous sunil said...

Very informative. Many thanks.

 
At 24/7/07 21:12, Blogger Pramod Deshpandey said...

very much informative. Just had one doubt, I think we can do a select * into another_table from temp_table but we can't do the same with table variable. can you please confirm me?

 
At 13/8/07 16:44, Blogger Mike Hodgson said...

Sure you can. Try this small test script:

declare @t table (foo int, bar varchar(10));

insert into @t (foo, bar) values (1, 'a');
insert into @t (foo, bar) values (2, 'b');
insert into @t (foo, bar) values (3, 'c');
insert into @t (foo, bar) values (4, 'd');
insert into @t (foo, bar) values (5, 'e');

select * into #x from @t;

select * from @t;
select * from #x;

drop table #x;


This is what it came up with on my test SQL 2005 box (Ent Edition):


(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)

(1 row(s) affected)

(5 row(s) affected)
foo bar
----------- ----------
1 a
2 b
3 c
4 d
5 e

(5 row(s) affected)

foo bar
----------- ----------
1 a
2 b
3 c
4 d
5 e

(5 row(s) affected)


This is doing a "SELECT INTO" from the table variable. I got the exact same response from SQL 2005 Dev Edition & SQL 2000 Ent Edition boxes.

(Sorry for the delay in response; sometimes I don't look at these blog comments for a while.)

 
At 18/8/07 06:33, Anonymous Anonymous said...

Great job! Thanks for the post!

 
At 20/8/07 23:44, Anonymous Anonymous said...

Good article, but this statement :"Namely, temp tables can be altered with DDL statements but table variables can't (so you cannot create a nonclustered index on a table variable for example)"

is not correct, I don't think, at least not in SQLServer2005.
I created a UNIQUE CLUSTERED index on one column of a table variable, did a SELECT *, put the index on a different column, did another SELECT *. In each instance, the resulset was returned ordered by the column which had the clustered index.

- Jim

 
At 21/8/07 13:10, Blogger Mike Hodgson said...

Thanks for the comment Jim.

That's very interesting, because SQL Books Online for SQL 2005 (regarding table variables) still says "Indexes cannot be created explicitly on table variables, and no statistics are kept on table variables" (about half way through the Remarks section of that referenced page).

Additionally, I was pretty sure this was still not possible so I tried writing a quick T-SQL script:

--********************************
declare @t table (id int, foo varchar(10), bar varchar(10));

insert into @t (id, foo, bar) values (1, 'a', 'z');
insert into @t (id, foo, bar) values (2, 'b', 'y');
insert into @t (id, foo, bar) values (3, 'c', 'x');

create unique clustered index CIX_t_id on @t (id);
create nonclustered index IX_t_foo on @t (foo);

select * from @t;
--********************************

and, as expected, I got a syntax error on both the CREATE INDEX statements (run on a SQL 2005 Ent Edition x64 box (build 9.00.2047.00)).

Could you please show us a quick T-SQL script to demonstrate how you managed to get this syntax past the SQL parser? It would be an interesting loophole to know (and to pass back to the SQL dev team @ Microsoft).

 
At 29/8/07 04:22, Anonymous Anonymous said...

You need to create the index on the column while U are creating it..
For Example:

-- NOTE THE Primary Key below
declare @t table (id int Primary Key, foo varchar(10), bar varchar(10));

insert into @t (id, foo, bar) values (1, 'a', 'z');
insert into @t (id, foo, bar) values (2, 'b', 'y');
insert into @t (id, foo, bar) values (3, 'c', 'x');

select * from @t;

 
At 6/9/07 21:27, Anonymous Anonymous said...

Hi Jim,

This is implicitly creating a clustered index via the PRIMARY KEY statement, which is allowed. What you can't so is explicitly create on using CREATE INDEX blah.

- Dave

 
At 6/9/07 21:29, Anonymous Anonymous said...

Sorry about the dodgy spelling there....

So the problem is that you can only have a clustered primary key index, no non primary key clustered or unclustered ones.

- Dave

 
At 19/9/07 22:50, Blogger CoryC said...

I'm concerned with your testing methodology. Using only batches of ten and being a single user it is impossible to determine how much of an impact locking plays in the performance equation.

Also, concerning indexes, just because you add a couple of indexes to a table doesn’t necessarily result in improved performance. I would guess that unless the sample size was notably larger the indexes would prove to have a negative impact on performance.

Thanks for the information.

 
At 29/9/07 04:56, Anonymous Anonymous said...

Very good article. I got asked that question in an interview the other day for a SQL Developer position, and didn't have a good answer besides how I use the two objects differently. Mainly the Select * Into #temp from nontemp to create the table for me.

 
At 1/4/08 05:47, Anonymous Anonymous said...

I have learned to perfer Table Variables because

1. I find them easier to debug. If it blows up in the middle of running a query, I don't need to drop the table

2. It forces me to explicitly create the table, which I have been lazy when doing this with temp tables

 
At 10/7/08 03:18, Anonymous Mike Good said...

I realize this is a 2yr old article, but still good stuff and I bet this page will continue to get a lot of hits.

1. You can have 2 indexes on table vars: a primary key and a unique key. Simple test will prove this if you look at query plan.

2. I feel the biggest difference that affects performance is that table vars will NOT participate in parallel operations. Take a query that involves a #temp table and a large perm table where the optimizer uses parallelism (little yellow circles in the query plan icons). Convert the #temp table to a @table var and you'll get basically the same plan but the little parallelism indicators will be gone.

I know there are lot of folks out there who hate parallelism, but it was put there for a reason, and I think this is the single biggest reason you might want to stick with #temp tables in some cases when working with large data. As others have said, test test test.

PS - When comparing these alternatives, I'd use stopwatch as the primary metric, not optimizer's "cost", which I believe is often very misleading.

 
At 7/11/08 01:02, Anonymous Anonymous said...

This is so good I was abt to go for table variable I needed add indexes on all the columns glad that I have searched for this

 
At 20/12/08 02:45, Anonymous Anonymous said...

Nice post, but I think you missed the point of an earlier response. You CANNOT do:
SELECT * INTO @table
like you CAN do:
SELECT * INTO #table
without defining the table schema first. And that's the real kicker isn't it? If I want to pull in all fields for a table based on some criterea it's very convenient to NOT have to define the table schema - SELECT * INTO #table does it for me. In your example you defined the @table schema. So, when someone goes in and changes a VARCHAR(10) field to VARCHAR(100) on the table being queried - your table variable is now out of date whereas your temp table would be fine.
A minor consideration, but something that rears it head often enough.
Pat Reddy

 
At 24/12/08 23:21, Anonymous Anonymous said...

Great Post! I am a big fan of table variables and have found that a careful primary key increases performance massively (greater than 10 times with temp tables with 100K + rows). Testing against temp tables should have primary keys.
But it's also work noting that sprocs can output to table variables using INSERT INTO #tbl EXEC dbo.mySproc - which is really useful for comparmentalising code and ensuring more granular query plans.

Rory

 
At 24/12/08 23:23, Anonymous Anonymous said...

Sorry - previous post should have said sprocs can oputput to temp tables. oops.
Rory

 
At 16/4/09 04:06, Anonymous Anonymous said...

"SQL Books Online says that since table variables are not part of the persistent database, they are not impacted by rollbacks. I can only assume they say this because table variables are instantiated in tempdb rather than the persistent (user) database and, as such, when you issue a rollback in the user DB, the transaction log for the user DB is rewound, but the tempdb transaction log does not change as a result of the rollback statement (the tempdb transaction log may have even been truncated by that stage so how could you roll it back?). However, the same would hold true for temp tables so I don't know why Microsoft don't spell that out explicitly in the temp table documentation (at least I haven't read any temp table documentation where that is explicitly stated).
"
Just run the query ,it will show you difference between them in inside a transaction.

create table #tab (col1 int )
declare @tab table (col1 int )
begin tran
insert #tab select 1
insert @tab select 1
rollback tran

select * from #tab --0 rows return
select * from @tab --1 rows return

 
At 18/6/09 11:36, Anonymous Anonymous said...

You cannot use syntax 'INSERT INTO @TableVariable EXEC spStoredProc' with a table variable. Webpedia

 

Post a Comment

<< Home