Index Fill Factor & Performance Considerations
A developer colleague (let's call him Pete because that's his name) asked me today about fill factors and the performance considerations behind choosing a fill factor for an index...so I thought, since it was fresh in my mind, I'd blog it. His basic situation was he's developing a middle-tier feature around a new table that has 200,000 rows in it and it took him about 2 minutes to delete those 200,000 rows, which seemed excessive. He suspected it was something to do with the fill factor he was choosing (I don't actually know what fill factor he chose) and so he asked me to explain a little about fill factors.
The fill factor applied to an index defines how full each leaf level page is when the index is created (or rebuilt with DBCC DBREINDEX(), which, for all intents & purposes, is basically the same as doing a DROP INDEX followed by a CREATE INDEX). If you specify a fill factor on 90 then that means each leaf page in the index will be 90% full of data initially. Fill factors are more relevant with clustered indexes (the actual table data) simply because clustered indexes tend to be much wider than your non-clustered indexes but the same principles apply to both clustered and non-clustered indexes.
So, why leave empty space in an index page? Well, what happens when a page becomes 100% full and SQL Server needs to insert another row in that page? What SQL Server does is take the full page, split it in 2 roughly equally filled pages (i.e 50% full) and then insert the new row into the appropriate of those 2 new pages, both of which should now have enough free space for the new row. This is known as a page split. As you can imagine, page splits increase the I/O activity on an index (allocating new pages, moving data, shuffling pages in the b-tree to put them in the right place, in addition to the actual insert you originally wanted to do) and so hurt performance. So you want to try to minimise page splits.
So does that mean I should just create my indexes with a fill factor of, say, 5%? (This was how the actual conversation with Pete went.) Well, if each page was only 5% full (as opposed to 100% full) then you'd need 20 times as many pages (each of which is 8k) to store the data. So we're talking about a pretty big index which is mostly full of empty space. That's great in terms of inserts & modifications (assuming you've got enough disk space to handle indexes that are suddenly 20 times larger than before) - you wouldn't expect page splits very often. However, if you're retrieving a range of data from that index with a SELECT statement then that data is most likely going to be spread out across 20 times more pages than before. So your range scans are going to have to hit 20 times more pages to return the same amount of data as before, hence 20 times more I/O.
Generally speaking, read performance of an index is directly proportional to the fill factor and write performance is inversely proportional to the fill factor. Damned if you do, damned if you don't. This is one of the reasons you've got to know the data you're playing with. Do you optimise the index for reads or writes? Generally speaking, once again, an index will be used much more for reads than writes (Microsoft estimates that reads typically outnumber writes by at least a factor of 5 to 10). So, will this index (or the table in general) mostly get used for reads or writes? Also, when new rows get inserted into the index are they going to be allocated at the end of the index (in sequential order, like an IDENTITY column for example) or are they going to be all over the place (like new uniqueidentifier values assigned with NEWID())? The answers to these two questions will be very important in determining what initial fill factor to set when creating your index (or rebuilding it later), assuming you don't want to stick with the server default (set with sp_configure).
Firstly, since indexes are generally read more than written to, you should probably err on the side of larger fill factors (for example, at least greater than 50%). If you're expecting a large percentage of writes on the index then you'll want a lower fill factor to provide some room for future adds. On the other hand if you're not expecting many writes in the index then you should fill the pages more (if you don't change the server default fill factor and you don't specify what fill factor you want when creating an index then SQL Server will fill each page as much as it can, i.e. 100%). That's pretty straight forward.
However, you may also be able to get away with a very high fill factor if new rows will almost always go at the end of the index (not in the middle) and existing rows will rarely, if ever, get changed. This will create a hotspot at the end of the index and page splits will be rare because almost all new rows will go at the end of the index resulting in simple new page allocations rather than page splits. The classic example is a clustered index on an identity column; this is one argument for the surrogate key side of the famous (or should I say infamous?) surrogate key versus natural key debate. But I won't get into that can of worms here.
The basic theory behind fill factors that I've described here should stand you in good steed to come up with general rules of thumb when designing your indexes. But, since it's essentially a trade-off between read performance and write performance, that will really only give you a starting point. From there it's a case of trial and error, bearing in mind how changes in fill factor affect performance. If you're getting excessive page splits then rebuild your index with a smaller fill factor. If I/Os are high during SELECT statements and it's affecting performance then you may be able to rebuild your index with a bigger fill factor. (Fill factors are only a very minor point to consider during query optimisation; good query design and table normalisation will usually play a much bigger role.)
One more thing to think about is that with page splits comes index fragmentation. Fragmentation is really pretty unavoidable but excessive page splits will lead to greater index fragmentation and reduced I/O performance, both read performance and write performance (and I/O is typically the bottleneck in RDBMSs). But index fragmentation is a topic for a whole other blog. And, by the way, I think Pete's slow deletes problem had more to do with waits/locks on the resources he was trying to delete than fill factors...but anyway...