Temp Tables vs Table Variables
There was a recent newsgroup posting where the poster asked if creating/modifying temp tables incurred physical disk I/O. A consultant had attempted to justify his choice of using temp tables, as opposed to table variables, by declaring that temp tables are purely in-memory structures, incurring no physical disk I/O, and therefore perform as well as table variables. Well, the consultant was wrong but kind of right at the same time (sort of).
It's important to remember that temp tables & table variables are not the same thing with a different scope. They were created for different, but overlapping, purposes. Temp tables originated well before table variables and exist essentially for the storage & manipulation of temporary data. Table variables, on the other hand, were introduced with SQL Server 2000 and were designed for returning datasets from table-valued functions, that is, one or more rows and/or columns from a UDF as opposed to the single row/single column you get from a scalar-valued function. They are both instantiated in tempdb and are both backed by physical disk (as is every database, system or user, in SQL Server 2000). Changes to both temp tables and table variables are logged in the transaction log [1]. However, since tempdb always uses the simple recovery model, those transaction log records only last until the next tempdb checkpoint, at which time the tempdb log is truncated.
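As a quick side-by-side illustration, here's a minimal sketch (the table and column names are invented for the example): a temp table is created in tempdb with CREATE TABLE and a # prefix and lives until it's dropped or the session ends, while a table variable is declared like any other variable and is scoped to the batch, stored procedure or function that declares it.

-- temp table: lives in tempdb until dropped or the session ends
create table #session_scratch (id int, payload varchar(50));
insert into #session_scratch (id, payload) values (1, 'written to tempdb');

-- table variable: also lives in tempdb, but scoped to this batch/proc/function
declare @batch_scratch table (id int, payload varchar(50));
insert into @batch_scratch (id, payload) values (1, 'also written to tempdb');

select * from #session_scratch;
select * from @batch_scratch;

drop table #session_scratch;  -- the table variable simply goes out of scope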
There are many similarities between temp tables and table variables, but there are also some notable differences. Namely, temp tables can be altered with DDL statements but table variables can't (so you cannot create a nonclustered index on a table variable, for example). That makes every table variable a heap, or at best a table with a single, clustered index, and every table variable access a table scan (or clustered index scan, which is essentially the same thing).

Temp tables have a looser scope than table variables, but no less well defined. Transactions involving table variables last only for the duration of the DML statement (the table variable change is committed implicitly after each DML statement, e.g. four INSERT statements result in four begin/commit pairs in the tempdb transaction log), whereas transactions on temp tables can span multiple DML statements. As a result, transactions on table variables are typically shorter (than transactions involving temp tables) and require less locking [2].

SQL Server creates statistics for temp tables, so the query optimiser can choose different plans for data involving temp tables. However, SQL Server does not create statistics on columns in table variables (what's the point when the only way to access data in a table variable is through a table scan?). Table variables, when used in stored procedures, are supposed to result in fewer recompiles of those stored procedures. This is probably because the query optimiser doesn't have to come up with different plans based on the content of the table variables, since the only operations you can perform on table variables are heap/clustered index operations (table scan, clustered index insert, etc.), whereas a temp table may be manipulated through any number of indexes whose density, distribution & selectivity could change from access to access (hence the possibility of different execution plans, and therefore more stored proc recompiles).
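To make the DDL difference above concrete, here's a minimal sketch (the object names are invented for the example); the commented-out statement is the one the parser rejects:

-- DDL after creation is fine on a temp table...
create table #orders_stage (order_id int, customer_id int, amount money);
create nonclustered index IX_orders_stage_customer on #orders_stage (customer_id);

-- ...but not on a table variable; uncommenting the CREATE INDEX line below
-- produces a syntax error, so the table variable stays a heap (or gets a single
-- clustered index only via a PRIMARY KEY/UNIQUE constraint in its declaration)
declare @orders_stage table (order_id int, customer_id int, amount money);
--create nonclustered index IX_tv_customer on @orders_stage (customer_id);

drop table #orders_stage;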
I've heard it recommended that you should use table variables wherever you can and only use temp tables where the additional schema you can define for them (such as indexes, for example) would improve performance. But I've conducted my own tests on this issue, and those tests yielded results indicating that, in general, the performance of temp tables versus table variables, for small datasets, was almost identical. The basic gist was that table variables performed marginally better than temp tables when the number of rows contained within was quite small (< 100,000 rows), but beyond a certain size temp tables performed better (and the performance gap widened as the size of the table increased). It should be noted, though, that the temp tables in my tests were intentionally heaps, i.e. no clustered index, so as to keep them as close in structure to the table variables as I could. I suspect that with a couple of indexes on my temp tables, the performance would have improved considerably. That said, these are the figures from my hour or two of testing:
avg batch duration (msec)
#rows | table variable | temp table |
---|---|---|
512 | 15.1 | 14.5 |
1024 | 17.9 | 29.4 |
2048 | 29.2 | 39.9 |
4096 | 61.8 | 71.6 |
8192 | 127.9 | 145.7 |
16384 | 254.2 | 291.7 |
32768 | 544.6 | 583.0 |
65536 | 1117.4 | 1103.4 |
131072 | 2174.3 | 2224.2 |
262144 | 4377.1 | 4553.2 |
524288 | 8852.1 | 8402.0 |
1048576 | 17846.6 | 16188.6 |
2097152 | 36335.5 | 32739.4 |
I did the tests in batches of 10 (the figures above are averages for individual batches). Each batch basically consisted of a small loop that just inserted rows into, and selected rows from, the temp table/table variable, as follows (the table variable version is shown, with a sketch of the temp table equivalent after it; the 22 in the while condition controls the number of rows the table ends up with: 2^(22-1) = 2097152):
declare @mytabvar table (blah varchar(100));  -- assumed definition; the original post doesn't show the declaration

declare @i int;
set @i = 1;

-- seed the table with a single row
insert into @mytabvar (blah)
values ('This is a line of text');

-- each pass re-inserts every row already in the table, doubling the row count,
-- so 21 passes turn 1 row into 2^21 = 2097152 rows
while (@i < 22)
begin
    insert into @mytabvar (blah)
    select blah from @mytabvar;
    set @i = @i + 1;
end
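The temp table runs used essentially the same batch, something along these lines (again, the temp table deliberately left as a heap to keep it structurally close to the table variable):

create table #mytab (blah varchar(100));  -- same assumed column definition, no indexes

declare @i int;
set @i = 1;

insert into #mytab (blah)
values ('This is a line of text');

while (@i < 22)
begin
    insert into #mytab (blah)
    select blah from #mytab;
    set @i = @i + 1;
end

drop table #mytab;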
Anyway, the conclusion I drew from my tests was that, for all intents and purposes, temp tables & table variables, all other things being equal, were so close in comparative performance for smallish datasets that any difference was purely academic. When the dataset grew a bit larger, temp tables started to shine a bit more than table variables, and when you add extra DDL, like indexes, constraints, etc., into the mix, temp tables seemed the better choice. But the point remained that they were designed for different purposes, and any problem involving temporary data storage should be tested with both scenarios (temp tables & table variables), with the solution that yields the better performance (assuming it is still functionally correct) being the preferred one.
The consultant who told the original poster that temp tables did not write to disk was quite plainly wrong. However, that doesn't mean that his temp tables would have performed worse than the structurally equivalent table variables. In fact, they might have performed as well as, if not better than, table variables, especially if there was a significant amount of data residing in those temp tables and they were indexed. Only testing the solutions under a realistic load could answer that question.
The take-home message from the thread: test, test, test. (And don't believe everything you read - perform your own tests to verify others' conclusions/assertions.)
1. SQL Books Online says that since table variables are not part of the persistent database, they are not impacted by rollbacks. I can only assume they say this because table variables are instantiated in tempdb rather than the persistent (user) database and, as such, when you issue a rollback in the user DB, the transaction log for the user DB is rewound, but the tempdb transaction log does not change as a result of the rollback statement (the tempdb transaction log may have even been truncated by that stage so how could you roll it back?). However, the same would hold true for temp tables so I don't know why Microsoft don't spell that out explicitly in the temp table documentation (at least I haven't read any temp table documentation where that is explicitly stated).
[I've since discovered, through my own testing again, that temp tables and table variables do not behave the same way in rollbacks. Temp table changes will get rolled back, even though they're logged in a different transaction log than the permanent data you're working on. But table variable changes are not affected by rollbacks (which we already knew). That makes table variables very handy for creating/storing data that you wish to persist beyond rollbacks (like informational log entries, for example).]
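A minimal sketch of that rollback behaviour (the names are made up for the example):

declare @audit table (msg varchar(100));
create table #work (msg varchar(100));

begin tran;
insert into #work (msg) values ('temp table row');
insert into @audit (msg) values ('table variable row');
rollback tran;

select * from #work;   -- 0 rows: the temp table change was rolled back
select * from @audit;  -- 1 row: the table variable change survived the rollback

drop table #work;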
2. Microsoft also say transactions involving table variables require less logging. I assume they say this because the transactions are shorter. Therefore, transactions stay open for less time and checkpoints can truncate the tempdb log more frequently without being blocked by the open transaction (open transactions cannot be checkpointed and flushed to disk before being committed, and log truncation cannot touch uncommitted transactions, so open transactions can remain in the transaction log for a longer period of time, therefore "more logging"). To me, the explicit LOP_BEGIN_XACT/LOP_COMMIT_XACT pairs for every INSERT, UPDATE or DELETE statement against a table variable mean more logging, but perhaps I've just misinterpreted what Microsoft meant by their comment in BOL.
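If you want to see those log record pairs for yourself, one rough way is the undocumented fn_dblog function. Treat this purely as an exploratory sketch; fn_dblog is unsupported and its output columns can vary between versions:

use tempdb;

declare @t table (blah varchar(50));
insert into @t (blah) values ('one');
insert into @t (blah) values ('two');

-- each INSERT above should show up bracketed by its own LOP_BEGIN_XACT/LOP_COMMIT_XACT pair
select [Current LSN], Operation, [Transaction ID], [Transaction Name]
from fn_dblog(null, null)
where Operation in ('LOP_BEGIN_XACT', 'LOP_COMMIT_XACT', 'LOP_INSERT_ROWS');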
39 Comments:
Great job!
I've been looking for some time to test this but you just saved me some valuable time :-)
Glad I could help.
cool blog... keep up the in-depth SQL info
It's a little bit old, but this link has some good information:
http://support.microsoft.com/kb/305977/en-us
I have read that KB article (many times) and even referenced it at the start of the 4th paragraph.
Really informative!
Gud one dude.
Good article. I can't agree with you more about testing. I've been using temp tables and table variables for years and was told that table variables were always faster. I figured that at some point a temp table would be better, well that day has come. I was just testing the two options on a nasty query returning 150,00+ rows and 65 or so columns from nearly as many tables. The temp table takes about 2 1/2 min., switching to the table variable, I stopped it at 10 min. ... nuf said
Very informative article. Thank you for your efforts and knowledge.
Very informative. Many thanks.
Very informative. Just one doubt: I think we can do a select * into another_table from temp_table, but we can't do the same with a table variable. Can you please confirm?
Sure you can. Try this small test script:
declare @t table (foo int, bar varchar(10));
insert into @t (foo, bar) values (1, 'a');
insert into @t (foo, bar) values (2, 'b');
insert into @t (foo, bar) values (3, 'c');
insert into @t (foo, bar) values (4, 'd');
insert into @t (foo, bar) values (5, 'e');
select * into #x from @t;
select * from @t;
select * from #x;
drop table #x;
This is what it came up with on my test SQL 2005 box (Ent Edition):
(1 row(s) affected)
(1 row(s) affected)
(1 row(s) affected)
(1 row(s) affected)
(1 row(s) affected)
(5 row(s) affected)
foo bar
----------- ----------
1 a
2 b
3 c
4 d
5 e
(5 row(s) affected)
foo bar
----------- ----------
1 a
2 b
3 c
4 d
5 e
(5 row(s) affected)
This is doing a "SELECT INTO" from the table variable. I got the exact same response from SQL 2005 Dev Edition & SQL 2000 Ent Edition boxes.
(Sorry for the delay in response; sometimes I don't look at these blog comments for a while.)
Great job! Thanks for the post!
Good article, but this statement: "Namely, temp tables can be altered with DDL statements but table variables can't (so you cannot create a nonclustered index on a table variable for example)"
is not correct, I don't think, at least not in SQL Server 2005.
I created a UNIQUE CLUSTERED index on one column of a table variable, did a SELECT *, put the index on a different column, did another SELECT *. In each instance, the result set was returned ordered by the column which had the clustered index.
- Jim
Thanks for the comment Jim.
That's very interesting, because SQL Books Online for SQL 2005 (regarding table variables) still says "Indexes cannot be created explicitly on table variables, and no statistics are kept on table variables" (about half way through the Remarks section of that referenced page).
Additionally, I was pretty sure this was still not possible so I tried writing a quick T-SQL script:
--********************************
declare @t table (id int, foo varchar(10), bar varchar(10));
insert into @t (id, foo, bar) values (1, 'a', 'z');
insert into @t (id, foo, bar) values (2, 'b', 'y');
insert into @t (id, foo, bar) values (3, 'c', 'x');
create unique clustered index CIX_t_id on @t (id);
create nonclustered index IX_t_foo on @t (foo);
select * from @t;
--********************************
and, as expected, I got a syntax error on both the CREATE INDEX statements (run on a SQL 2005 Ent Edition x64 box (build 9.00.2047.00)).
Could you please show us a quick T-SQL script to demonstrate how you managed to get this syntax past the SQL parser? It would be an interesting loophole to know (and to pass back to the SQL dev team @ Microsoft).
You need to create the index on the column while you are creating the table.
For Example:
-- NOTE THE Primary Key below
declare @t table (id int Primary Key, foo varchar(10), bar varchar(10));
insert into @t (id, foo, bar) values (1, 'a', 'z');
insert into @t (id, foo, bar) values (2, 'b', 'y');
insert into @t (id, foo, bar) values (3, 'c', 'x');
select * from @t;
Hi Jim,
This is implicitly creating a clustered index via the PRIMARY KEY statement, which is allowed. What you can't so is explicitly create on using CREATE INDEX blah.
- Dave
Sorry about the dodgy spelling there....
So the problem is that you can only have a clustered primary key index, no non-primary-key clustered or nonclustered ones.
- Dave
I'm concerned about your testing methodology. Using only batches of ten, and being a single user, it is impossible to determine how much of an impact locking plays in the performance equation.
Also, concerning indexes, adding a couple of indexes to a table doesn't necessarily result in improved performance. I would guess that unless the sample size was notably larger, the indexes would prove to have a negative impact on performance.
Thanks for the information.
Very good article. I got asked that question in an interview the other day for a SQL Developer position, and didn't have a good answer besides how I use the two objects differently. Mainly the Select * Into #temp from nontemp to create the table for me.
I have learned to prefer table variables because:
1. I find them easier to debug. If it blows up in the middle of running a query, I don't need to drop the table
2. It forces me to explicitly create the table, something I have been lazy about when using temp tables
I realize this is a 2yr old article, but still good stuff and I bet this page will continue to get a lot of hits.
1. You can have 2 indexes on table vars: a primary key and a unique key. Simple test will prove this if you look at query plan.
2. I feel the biggest difference that affects performance is that table vars will NOT participate in parallel operations. Take a query that involves a #temp table and a large perm table where the optimizer uses parallelism (little yellow circles in the query plan icons). Convert the #temp table to a @table var and you'll get basically the same plan but the little parallelism indicators will be gone.
I know there are lot of folks out there who hate parallelism, but it was put there for a reason, and I think this is the single biggest reason you might want to stick with #temp tables in some cases when working with large data. As others have said, test test test.
PS - When comparing these alternatives, I'd use stopwatch as the primary metric, not optimizer's "cost", which I believe is often very misleading.
This is so good. I was about to go for a table variable where I needed to add indexes on all the columns; glad that I searched for this.
Nice post, but I think you missed the point of an earlier response. You CANNOT do:
SELECT * INTO @table
like you CAN do:
SELECT * INTO #table
without defining the table schema first. And that's the real kicker, isn't it? If I want to pull in all fields for a table based on some criteria, it's very convenient to NOT have to define the table schema - SELECT * INTO #table does it for me. In your example you defined the @table schema. So, when someone goes in and changes a VARCHAR(10) field to VARCHAR(100) on the table being queried - your table variable is now out of date whereas your temp table would be fine.
A minor consideration, but something that rears its head often enough.
Pat Reddy
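A quick sketch of the difference being described (dbo.Customers and its columns are made up for the example):

-- the temp table's schema is inferred from the SELECT, so it tracks the source table
select * into #customer_snapshot from dbo.Customers;

-- there is no table variable equivalent of the above; a table variable's schema
-- must be declared up front, so it can drift out of date if the source changes
declare @customer_snapshot table (customer_id int, name varchar(100));
insert into @customer_snapshot (customer_id, name)
select customer_id, name from dbo.Customers;

drop table #customer_snapshot;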
Great post! I am a big fan of table variables and have found that a careful primary key increases performance massively (greater than 10 times with temp tables with 100K+ rows). Testing against temp tables should include primary keys.
But it's also worth noting that sprocs can output to table variables using INSERT INTO #tbl EXEC dbo.mySproc - which is really useful for compartmentalising code and ensuring more granular query plans.
Rory
Sorry - previous post should have said sprocs can output to temp tables. Oops.
Rory
"SQL Books Online says that since table variables are not part of the persistent database, they are not impacted by rollbacks. I can only assume they say this because table variables are instantiated in tempdb rather than the persistent (user) database and, as such, when you issue a rollback in the user DB, the transaction log for the user DB is rewound, but the tempdb transaction log does not change as a result of the rollback statement (the tempdb transaction log may have even been truncated by that stage so how could you roll it back?). However, the same would hold true for temp tables so I don't know why Microsoft don't spell that out explicitly in the temp table documentation (at least I haven't read any temp table documentation where that is explicitly stated).
"
Just run this query; it will show you the difference between them inside a transaction.
create table #tab (col1 int )
declare @tab table (col1 int )
begin tran
insert #tab select 1
insert @tab select 1
rollback tran
select * from #tab -- 0 rows returned
select * from @tab -- 1 row returned
You cannot use syntax 'INSERT INTO @TableVariable EXEC spStoredProc' with a table variable.
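That restriction applies to SQL Server 2000, as far as I can tell; on SQL Server 2005 and later, INSERT ... EXEC into a table variable does work. A minimal sketch (using a dynamic SELECT in place of a real stored procedure):

declare @t table (name sysname);
-- on SQL Server 2005+ this works; on SQL Server 2000 it raises an error
insert into @t
exec ('select name from sys.databases');
select * from @t;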
I use temp tables a lot and love them. I tried a table variable last week but had an issue with a "dynamic" statement like the one below.
Can anyone help me.
Thank you
--code
DECLARE @mySQL Varchar(1000)
DECLARE @myLog Table(seq_no Int,
table_name nVarchar(50),
field_name nVarchar(25),
parent_name nVarchar(25),
parent_field nVarchar(25),
total Int);
INSERT INTO @myLog SELECT *,-1 FROM temp_log
SET @mySQL='select * from @myLog'
Exec (@mySQL)
--error
Msg 137, Level 15, State 2, Line 1
Must declare the variable '@myLog'.
great article
I came across a strange scenario, which is as below:
One of the stored procedures was using a table variable, referenced almost 20 times on a combination of three columns. It processes a huge amount of data and performance was not good.
Hoping to improve the performance, I converted the table variable to a temp table and created a nonclustered index on it.
After this, when I checked the execution plan, I was surprised:
- some of the clustered index seek operations converted into clustered index scans
- the number of Hash Match operations increased from 5 to 10
- however, table scans were reduced.
I am just struggling to understand why and how this happened just by changing from a table variable to a temp table.
If you get some time, perhaps you can help me understand the impact of converting a table variable to a temp table.
Thanks in advance
I had someone dispute your article, stating, "Running the same tests on a server that is constantly being hit for disk I/O in a production environment would yield surprisingly different results."
Could you clarify and tell me if this is true? It seems to me that this would impact both temp tables and table variables in a similar way.
good work mate
You cannot use a table variable in dynamic SQL, just like you cannot use a normal variable declared outside the string. You might want to use sp_executesql with parameters, or declare the table variable inside the dynamic string itself (see the sketch below).
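A minimal sketch of the second approach, reusing the @mySQL/@myLog names from the question above (the column list is trimmed down for brevity):

declare @mySQL nvarchar(1000);

-- the table variable is declared, filled and read entirely inside the dynamic batch,
-- so it is in scope when the batch is executed
set @mySQL = N'declare @myLog table (seq_no int, table_name nvarchar(50), total int);
insert into @myLog (seq_no, table_name, total) values (1, N''some_table'', -1);
select * from @myLog;';

exec sp_executesql @mySQL;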
What are the performance stats related to disk I/O, given that temp tables are stored on disk and table variables are stored partly in memory and partly on disk?
"...indexes cannot be created on table variables..."
Please refer to Tom Moreau's article on MS TechNet, wherein he does create indexes and constraints on a table variable:
http://technet.microsoft.com/en-us/library/aa175774(SQL.80).aspx
Excellent post and discussion. Thank you all.
Very useful post. My own experience says the same.
I have observed that up to 1,000 records or so, table variables & temp tables perform equally well. However, beyond a thousand or so rows, table variable performance (for joins) drops drastically... unless there is a clustered primary key on the table variable (which is not always possible).
Actually you can declare either a clustered or nonclustered primary key on a table variable. The following example will perform a Clustered Index Seek.
use msdb;
declare @t table
(
primary key clustered ( id ),
id int not null,
name varchar(180) not null
);
insert into @t
( id, name )
select row_number() over (order by name), name
from sys.objects;
select * from @t where (id between 100 and 200);
However, if you declare the table using a nonclustered key, it will perform a Table Scan.
declare @t table
(
primary key nonclustered ( id ),
id int not null,
name varchar(180) not null
);
By default, the key will be clustered, and I can't think of a benefit for the nonclustered key in this circumstance, but it's there.
My testing:
I'm optimising our system for huge amounts of data and testing temp tables vs table variables for caching some problem views before they are joined with other views. The optimised SQL query has more than 400 lines of code.
Results:
The original (unoptimised) query runs in ~51 secs. When I use a temp table to cache the problem view that is joined several times, the query runs in ~5 secs. When I use a table variable, the query runs in ~93 secs. I use no index on the temp table.
John