NULL Management, SPARSE Columns, Vertical partitioning and a query

When discussing DW design I often take a lot of time to emphasize the impact Nulls have on query performance, aggregations, design, storage and much more. While this is not new to most seasoned DW developers it is often overlooked during the design phase and added at a later point in time mostly after testing etc. In the interest of putting it out there I figured I will post the topic here as well as a script I created few years back to identify suitable columns in the table.

Which is better Count (*) or Count (id)?

Here is a post on how nulls affect COUNT behavior and performance, the default measure added for every measure group is count so I guess this is important.

Now NULLS Occupy space and as a result a wide table that has a large number of columns which are mostly NULL should ideally be carved out into its own table with sparse columns which is what we call vertical partitioning. In fact I have often seen implementations where the wide table is converted into key value pairs so as to avoid having as many nulls in the table.

ID Name AGE PhNumber SSN

1 Jayanth 4 123-456-7890 NULL

2 James 5 987-064-1234 123456

ID FName FValue

1 Name Jayanth

1 PhNumber 123-456-7890

1 Age 4

2 Name James

2 Age 5

2 PhNumber 987-064-1234

2 SSN 123456

While the second approach does in fact eliminate nulls, it also significantly adds to the number of rows while at the same time requiring complex pivoting of data to bring it back to a record set format that users can understand. I.e. the second approach works fine for OLTP databases where all records of an individual might need to be fetched but doesn’t lend itself to aggregation which is what the DW would require.

The query below identifies nullable columns within a table and then check if there are actual null values in those nullable columns. It then either generates a view or actually creates a view with only those columns which are having values for all rows.

USE [madworks]
GO

/****** Object:  StoredProcedure [dbo].[CreateNotNullViews]    Script Date: 6/23/2015 1:35:32 PM ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

CREATE PROCEDURE [dbo].[CreateNotNullViews] (
	@Schema_name NVARCHAR(256)
	,@table_name NVARCHAR(256)
	,@Defonly BIT
	)
AS
BEGIN
	/***********************************************************
Author name :- Jayanth Kurup
Created date :- 201204012
Purpose:- Creates view definition or view with all columns in the table
that have not null values .
@defonyl paramter accepts values of 1 or 0
1 = schema definition (it wont create the actual view)
0 = it will create the view.

***********************************************************/
	SET NOCOUNT ON;

	WITH CTE
	AS (
		SELECT *
		FROM INFORMATION_SCHEMA.COLUMNS
		WHERE TABLE_NAME = @TABLE_NAME
			AND TABLE_SCHEMA = @SCHEMA_NAME
		)
	SELECT ROW_NUMBER() OVER (
			ORDER BY c.COLUMN_NAME
			) AS Colids
		,c.COLUMN_NAME
		,d.texts
		,NULL AS 'NullColumn'
	INTO #tempcolstore
	FROM CTE c
	CROSS APPLY (
		SELECT ' IF EXISTS (SELECT 1 FROM ' + @schema_name + '.' + @table_name + ' WHERE ' + c.COLUMN_NAME + ' IS NULL)
BEGIN
SELECT ' + CHAR(39) + c.COLUMN_NAME + CHAR(39) + ' AS NULLCOLUMN
END' AS texts
		) d
	WHERE IS_NULLABLE = 'YES'
		---- THE ABOVE CODE TELLS THE LOGIC TO IGNORE NULLABLE COLUMNS
		 

	DECLARE @colid INT

	SET @colid = 1

	DECLARE @NullLIST TABLE (Columnname NVARCHAR(256))

	WHILE @colid < = (
			SELECT MAX(Colids)
			FROM #tempcolstore
			)
	BEGIN
		DECLARE @sql NVARCHAR(MAX)

		SET @SQL = (
				SELECT TEXTS
				FROM #TEMPCOLSTORE
				WHERE COLIDS = @COLID
				)

		--PRINT @SQL
		INSERT INTO @NULLLIST
		EXEC SP_EXECUTESQL @SQL

		SET @colid = @colid + 1
	END

	---- THE ABOVE CODE PERFORMS THE NULL EXISTANCE CHECK
	SET @SQL = (
			SELECT 'CREATE VIEW ' + @Schema_name + '.vw_' + @table_name + ' AS SELECT '
			) + (
			SELECT DISTINCT Replace((
						SELECT DISTINCT u2.COLUMN_NAME + ',' AS 'data()'
						FROM INFORMATION_SCHEMA.COLUMNS u2
						WHERE TABLE_NAME = @TABLE_NAME
							AND TABLE_SCHEMA = @SCHEMA_NAME
							AND COLUMN_NAME NOT IN (
								SELECT COLUMNNAME
								FROM @NULLLIST
								)
						FOR XML PATH('')
						) + '$', ',$', '') AS Roles
			FROM INFORMATION_SCHEMA.COLUMNS u1
			WHERE TABLE_NAME = @TABLE_NAME
				AND TABLE_SCHEMA = @SCHEMA_NAME
				AND COLUMN_NAME NOT IN (
					SELECT COLUMNNAME
					FROM @NULLLIST
					)
			) + ' FROM ' + @schema_name + '.' + @table_name

	--- THE ABOVE CODE CREATES THE VIEW DEFINITION
	IF @DEFONLY = 1
	BEGIN
		SELECT @SQL AS ViewOutput
	END
	ELSE
	BEGIN
		EXEC SP_EXECUTESQL @SQL

		PRINT 'SUCCESS :- View created'
	END

	--- PERFORMS ACTION AS PER USER INPUT.
	DROP TABLE #TEMPCOLSTORE
		---- CLEAN UP
END

If the column is nullable but doesn’t have any null values then the script will include that column as part of the view definition say for example the Phnumber column in the above table, if the column is not nullable it will always appear in the view definition say for example the Name column in the same table and if the column is nullable and has at least 1 null value in the column it will be ignored from the view definition like the SSN column.

Therefore the query will create the table structure for the parent table in vertical partitioning. Why bother with all this, because it’s better to implement proper design in retrospect than to not do it at all; a case in point is horizontal partitioning which almost never gets implement proactively.

Please Consider Subscribing

Leave a Reply