Very large table performance – basic data capture

Every once in a while I see a post on the forums that is similar to the one below:-

Hi

Can anybody provide tips on how to improve performance on a very large table, the table has 20 million rows in it.

Thanks

Most of the first 10 replies are basically just people asking for more info. So I wrote this script to help both the OP and the People providing answers with some context. The number of rows is not an accurate indicator of the size of the table and the possible types of performance issue that might be encountered on it.

/*
    This script retrieves detailed metadata and statistics about a specified table and its indexes in SQL Server.
    It is intended for performance analysis, index fragmentation review, and understanding table storage characteristics.

    Parameters:
        @TableName - The name of the table to analyze.

    Output Columns:
        - Verison: SQL Server version.
        - tbl_name: Name of the table.
        - ix_type: Type of index (HEAP, CLUSTERED, NONCLUSTERED, etc.).
        - object_id: Object ID of the table.
        - #_of_columns: Number of columns in the table.
        - partition_id: Partition ID (to check if the table is partitioned).
        - #_of_rows: Number of rows in the partition.
        - used_page_count: Number of pages used by the partition.
        - data_ix_used_size_in_kb: Size of the index data in KB.
        - in_vs_lob_vs_overflow_page_count: Page counts for in-row, LOB, and row-overflow data.
        - ixname: Name of the index.
        - alloc_unit_type_desc: Allocation unit type description.
        - index_level: Level of the index (0 = leaf, higher = internal/root).
        - avg_fragmentation_in_percent: Average fragmentation percentage of the index.
        - Statsinfo: XML with statistics properties (rows_sampled, rows, create_date, steps, modification_counter).

    Joins and Data Sources:
        - sys.tables: Table metadata.
        - sys.partitions: Partition information.
        - sys.dm_db_partition_stats: Partition statistics.
        - sys.indexes: Index metadata.
        - sys.dm_db_index_physical_stats: Index fragmentation and physical stats.
        - sys.stats: Statistics metadata.
        - sys.dm_db_stats_properties: Detailed statistics properties.

    Usage:
        - Change the value of @TableName to the table you want to analyze.
        - Run the script to get a comprehensive overview of the table's structure, index fragmentation, and statistics health.

    Notes:
        - Useful for identifying performance issues related to index fragmentation or outdated statistics.
        - Helps in capacity planning by showing storage usage and row distribution.
*/
DECLARE @TableName VARCHAR(50) = 'FactAirline'

SELECT DISTINCT @@version AS Verison
    ,t.name AS 'tbl_name'
    ,-- table name for obvious reasons
    'ix_type' = i.type_desc
    ,-- the format of the data stored in the table heap vs Clustered vs NC 
    t.object_id
    ,-- objectid for the table in case we need it later 
    max_column_id_used AS '#_of_columns'
    ,-- number of columns to figure out if we are dealing with a wide table ( fact ) vs a deep table ( oltp tran table) 
    ps.partition_id
    ,-- partition id to know if the table is already partitioned and the indexes are aligned 
    p.rows AS '#_of_rows'
    ,-- number of rows coz this is subjective by user 
    ps.used_page_count
    ,-- actual number of pages used to figure out number of rows/page and understand the size of the table ( 1 Million row can occupy less than 10 pages of more than 100 pages) 
    ps.in_row_used_page_count * 8 AS 'data_ix_used_size_in_kb'
    ,-- size of the object a true measure of a big table, 
    cast(ps.in_row_used_page_count AS VARCHAR(20)) + ' | ' + cast(ps.lob_used_page_count AS VARCHAR(20)) + ' | ' + cast(ps.row_overflow_used_page_count AS VARCHAR(20)) AS in_vs_lob_vs_overflow_page_count
    ,-- number of pages in each category ( helps to identify if the data is traditional rows or lob data) 
    i.name AS 'ixname'
    ,ip.alloc_unit_type_desc
    ,-- describes the index allocation type being analyzed 
    ip.index_level
    ,-- the level at which fragmentation occurs , 	0= leaf , highest number is the root. 
    ip.avg_fragmentation_in_percent
    ,-- index fragementation level to identify if its the index thats the cause for the performance issue 
    stp.Statsinfo -- stats info to identity if the stats are up to date or else it might be the reason for the issue 
FROM sys.tables t
LEFT OUTER JOIN sys.partitions p ON t.object_id = p.object_id
LEFT OUTER JOIN sys.dm_db_partition_stats ps ON p.partition_id = ps.partition_id
LEFT OUTER JOIN sys.indexes i ON ps.index_id = i.index_id
    AND ps.object_id = i.object_id
CROSS APPLY (
    SELECT *
    FROM sys.dm_db_index_physical_stats(db_id(), t.object_id, ps.index_id, p.partition_number, 'DETAILED')
    ) ip
LEFT OUTER JOIN sys.stats st ON st.object_id = t.object_id
    AND st.name LIKE '%' + i.name + '%'
CROSS APPLY (
    SELECT (
            SELECT rows_sampled
                ,rows
                ,create_date
                ,steps
                ,modification_counter
            FROM sys.dm_db_stats_properties(t.object_id, st.stats_id) dd
            FOR XML PATH('StatsInfo')
                ,ROOT('Root')
            ) AS Statsinfo
    ) stp
WHERE t.name = @tableName
ORDER BY ix_type
    ,i.name

Details of why each column was included and how to read it are mentioned below.

Version: – This column indicates the version of SQL, Edition of SQL, Architecture of SQL, Service Pack of SQL and Level of OS. This is probably the most important column since different OS and SQL Combinations have different features and memory and CPU capacity so it will help eliminate solutions that are not relevant e.g. partitioning on Standard edition etc.

Tbl_name:- The Name of the table, useful if meaningful naming conventions are used e.g. Fact_internetSale, CustomerProfiles etc. This can be used to infer or make certain assumptions on the nature of the table.

Ix_type:- Lets you know if the table has an index, if so what type. Pay attention to the type since Columnstore indexes etc. do not display all the info and are treated differently. There will be one row per index

Object_id:- Just the id of the table for future reference

#_of_Columns:- Indicate if we are dealing with a wide table (typical of a DW) or a narrow table (typical of an OLTP system).

Partition_id:- If there are multiple values for the ix_type it indicate the table is partitioned and if the index is aligned, helps avoid suggestions that are already implemented.

#_of_rows:- An estimate of the number of rows in the table, to provide an idea of avg number of rows per page.

Used_page_count:- Number of pages used by this table, an indication of the true size on disk of the table

Data_ix_used_size_in_kb:- Indicate the size of the table / Index on the disk for both data and index. True metric for size of the table.

In_vs_lob_vs_overflow_page_count:- Number of pages allocated as in_row, LOB or Row overflow. Helps to identify if the table is large because of special data types.

Ix_name:- Helps to identify the index being used for the following columns, might contain multiple rows based on number of levels in the index.

Avg_fragmentation_in_percent:– Indicates the level of fragmentation on each level of the B-tree, helps identify if a simple index rebuild or re-or might fix the issue.

Stats Info: – Provides details of the statistics for the index, like number of rows sampled and the modification counter which will indicate how recently the stats were updated.

WARNING: – The script has been tested for SQL Server 2014, certain DMVs do not work for older versions and table with Columnstore indexes modify as needed.

Please Consider Subscribing

Leave a Reply