Friday, June 24, 2005

Index Fill Factor & Performance Considerations

A developer colleague (let's call him Pete because that's his name) asked me today about fill factors and the performance considerations behind choosing a fill factor for an index...so I thought, since it was fresh in my mind, I'd blog it. His basic situation was he's developing a middle-tier feature around a new table that has 200,000 rows in it and it took him about 2 minutes to delete those 200,000 rows, which seemed excessive. He suspected it was something to do with the fill factor he was choosing (I don't actually know what fill factor he chose) and so he asked me to explain a little about fill factors.

Well...

The fill factor applied to an index defines how full each leaf level page is when the index is created (or rebuilt with DBCC DBREINDEX(), which, for all intents & purposes, is basically the same as doing a DROP INDEX followed by a CREATE INDEX). If you specify a fill factor on 90 then that means each leaf page in the index will be 90% full of data initially. Fill factors are more relevant with clustered indexes (the actual table data) simply because clustered indexes tend to be much wider than your non-clustered indexes but the same principles apply to both clustered and non-clustered indexes.

So, why leave empty space in an index page? Well, what happens when a page becomes 100% full and SQL Server needs to insert another row in that page? What SQL Server does is take the full page, split it in 2 roughly equally filled pages (i.e 50% full) and then insert the new row into the appropriate of those 2 new pages, both of which should now have enough free space for the new row. This is known as a page split. As you can imagine, page splits increase the I/O activity on an index (allocating new pages, moving data, shuffling pages in the b-tree to put them in the right place, in addition to the actual insert you originally wanted to do) and so hurt performance. So you want to try to minimise page splits.

So does that mean I should just create my indexes with a fill factor of, say, 5%? (This was how the actual conversation with Pete went.) Well, if each page was only 5% full (as opposed to 100% full) then you'd need 20 times as many pages (each of which is 8k) to store the data. So we're talking about a pretty big index which is mostly full of empty space. That's great in terms of inserts & modifications (assuming you've got enough disk space to handle indexes that are suddenly 20 times larger than before) - you wouldn't expect page splits very often. However, if you're retrieving a range of data from that index with a SELECT statement then that data is most likely going to be spread out across 20 times more pages than before. So your range scans are going to have to hit 20 times more pages to return the same amount of data as before, hence 20 times more I/O.

Generally speaking, read performance of an index is directly proportional to the fill factor and write performance is
inversely proportional to the fill factor. Damned if you do, damned if you don't. This is one of the reasons you've got to know the data you're playing with. Do you optimise the index for reads or writes? Generally speaking, once again, an index will be used much more for reads than writes (Microsoft estimates that reads typically outnumber writes by at least a factor of 5 to 10). So, will this index (or the table in general) mostly get used for reads or writes? Also, when new rows get inserted into the index are they going to be allocated at the end of the index (in sequential order, like an IDENTITY column for example) or are they going to be all over the place (like new uniqueidentifier values assigned with NEWID())? The answers to these two questions will be very important in determining what initial fill factor to set when creating your index (or rebuilding it later), assuming you don't want to stick with the server default (set with sp_configure).

Firstly, since indexes are generally read more than written to, you should probably err on the side of larger fill factors (for example, at least greater than 50%). If you're expecting a large percentage of writes on the index then you'll want a lower fill factor to provide some room for future adds. On the other hand if you're not expecting many writes in the index then you should fill the pages more (if you don't change the server default fill factor and you don't specify what fill factor you want when creating an index then SQL Server will fill each page as much as it can, i.e. 100%). That's pretty straight forward.

However, you may also be able to get away with a very high fill factor if new rows will almost always go at the end of the index (not in the middle) and existing rows will rarely, if ever, get changed. This will create a hotspot at the end of the index and page splits will be rare because almost all new rows will go at the end of the index resulting in simple new page allocations rather than page splits. The classic example is a clustered index on an identity column; this is one argument for the surrogate key side of the famous (or should I say infamous?) surrogate key versus natural key debate. But I won't get into that can of worms here.

The basic theory behind fill factors that I've described here should stand you in good steed to come up with general rules of thumb when designing your indexes. But, since it's essentially a trade-off between read performance and write performance, that will really only give you a starting point. From there it's a case of trial and error, bearing in mind how changes in fill factor affect performance. If you're getting excessive page splits then rebuild your index with a smaller fill factor. If I/Os are high during SELECT statements and it's affecting performance then you may be able to rebuild your index with a bigger fill factor. (Fill factors are only a very minor point to consider during query optimisation; good query design and table normalisation will usually play a much bigger role.)

One more thing to think about is that with page splits comes index fragmentation. Fragmentation is really pretty unavoidable but excessive page splits will lead to greater index fragmentation and reduced I/O performance, both read performance and write performance (and I/O is typically the bottleneck in RDBMSs). But index fragmentation is a topic for a whole other blog. And, by the way, I think Pete's slow deletes problem had more to do with waits/locks on the resources he was trying to delete than fill factors...but anyway...

32 Comments:

At 19/5/06 04:26, Anonymous Anonymous said...

Thanks a lot. This topic was not that much clear to me before.

 
At 19/8/06 13:22, Anonymous Anonymous said...

Thanks for sharing, it's a good article and clear article on fill factor. That really helps!

 
At 13/9/06 06:02, Anonymous Anonymous said...

great job pete's friend.

 
At 15/9/06 06:04, Anonymous Anonymous said...

thx, good synopsis, helped me understand this concept better..

 
At 18/10/06 22:36, Anonymous Anonymous said...

nice article on fill factor

 
At 1/12/06 17:54, Anonymous Anonymous said...

Crystal clear explanation on Fill Factor. Gr8 Job

 
At 11/12/06 16:40, Anonymous Anonymous said...

neat and clear article.

 
At 1/3/07 15:28, Anonymous Anonymous said...

Thanks a lot. This is called knowledge imparting. Really appreciate the effort.

 
At 26/3/07 03:24, Blogger Adrian said...

Excellent article. Really cleared me up the topic. Thanks a lot.

 
At 27/3/07 05:04, Anonymous Anonymous said...

What happens if you specify a low fill factor and load an empty table using a non sequential unique clustered index? How are the pages filled and what are the page split consequences? Does the fill factor mean anything?

 
At 27/3/07 10:06, Blogger Mike Hodgson said...

Well, since the fill factor is only relevant when you create (or rebuild) the index, and since the table (and therefore indexes) in your example starts off empty, then the initial low fill factor is irrelevant. As you insert rows in the empty table, the index pages (both leaf & intermediate pages) will fill up and eventually need to split to make room for more rows (regardless of the distribution of data; ie. "non-sequential", unique, etc.).

This initial "load" process is often better done without any indexes on the table. Then, once the initial load is complete you can build your indexes on the table data, specifying the fill factor (ie. how much free space you want to leave on each leaf page going forward from this point). This will result in a faster initial data load (because the server isn't trying to keep indexes up to date at the same time) and also well organised, non-fragmented indexes at the end of the process. The best of both worlds. ;-)

 
At 28/3/07 03:50, Anonymous Anonymous said...

wow.. you guys are smart... just recently started with sql and this article put the end to my doubts about fill factor... thanks for sharing!!!

 
At 13/8/07 06:57, Blogger VenkiPenki said...

Thanks for your article. I understood and followed well to some extent. But,I'm unable to visualize the difference between page splits happens when we specify the fill factor and when we don't specify the fill factor.

Mr.Hodgson, Is there any chance of putting your words into Images?

Thank you.

 
At 7/9/07 03:01, Anonymous Anonymous said...

Best explanation of Fill Factor, & its performance implications that I have come aress

 
At 7/2/08 18:34, Blogger BlackPerl said...

Clear accurate and useful ...thanks for sharing :)

 
At 12/3/08 05:33, Anonymous Anonymous said...

Awesome article.. very well explained

 
At 22/5/08 17:15, Blogger Kunal Shah said...

Very Neatly explained.

Thanks

Kunal

 
At 21/10/08 03:52, Blogger Andrei said...

Thanks! Good job

 
At 6/1/09 17:03, Anonymous Anonymous said...

Thanks! Much appreciated.

 
At 20/1/09 14:53, Anonymous Anonymous said...

Hi, that's great. Thanks

 
At 29/4/09 03:01, Anonymous Anonymous said...

Clear and concise. This helped greatly in my gaining an understanding of these areas. Thanks.

 
At 9/5/09 00:55, Blogger Kueco said...

What is the Pad Index? When should I use it? Thank you very much!

 
At 1/10/09 06:36, Anonymous Anonymous said...

Wonderfull article

 
At 23/10/09 20:20, Blogger Unknown said...

Have got new information about Fill Factor.. Good article..

 
At 24/10/09 12:08, Blogger Arvind said...

Great job man...keep it up...keep sharing

 
At 25/11/09 19:38, Blogger GeneraL aFiFy said...

Thank u mike ...
Great Explaination from a great person

 
At 27/11/09 01:42, Anonymous Hugo said...

Thanks a lot for sharing and doing it well. This post made fill factor be something a lot simpler than how it looks in other websites. This post is perfectly clear.

 
At 4/4/11 12:52, Anonymous Anonymous said...

Thanks that was the best I have seen Fill Factors explained. Cheers have saved me lots of hours :)

 
At 20/6/11 23:25, Anonymous Anonymous said...

Thank you so much for this great article! It really helped me to finally understand this subject.

 
At 21/9/11 21:31, Anonymous Anonymous said...

What is the effect on NonClustered Index if I will change Fill Factor of it?

 
At 12/12/11 23:34, Anonymous Fran Lens said...

Thanks for the article!!Finally I understand Fill Factor

 
At 8/5/12 23:17, Anonymous Anonymous said...

This is really great article and Fill factor is clearly explained.

Thanku so much
Shailesh

 

Post a Comment

<< Home