Set based vs row by row (procedural code)

I was reading a blog about how to write procedural code today ( link below) and wanted to take a moment to explore another approach just to provide a more complete reference. To provide some context I am explaining the business case here.

A hospital wants to identify people whose heights have changed over time in order to detect if the person is who they say they are. Since the height of a person doesn’t change after a certain age the approach is used to detect identity theft. The requirement is that we compare the heights logged during different visit and flag any patients where they have different heights logged.

The code to create the dummy data is mentioned below and was taken from the link mentioned earlier.

/*
    This script creates and populates sample tables to simulate patient visit and height data.

    Steps performed:
    1. Creates a Dates table and populates it with 20 sequential weekly visit dates starting from the current date.
    2. Creates a PatientHeight table and populates it with 10,000 patients, assigning each a height between 59 and 74 inches.
    3. Adds a primary key constraint to the PatientHeight table.
    4. Generates a PatientVisit table as a cartesian join of all patients and all visit dates, resulting in 200,000 records.
    5. Adds a primary key constraint to the PatientVisit table.
    6. Updates the height for certain patients on specific visit dates to simulate changes in height.
    7. (Commented out) Provides cleanup statements to drop the created tables.

    Usage:
    - Intended for generating test data for scenarios involving patient visits and height tracking.
    - The cleanup section can be uncommented to remove all created tables after testing.
*/
SET NOCOUNT ON;

CREATE TABLE Dates (
    ID INT
    ,VisitDate DATETIME
    );

--populate table with 20 visit dates
DECLARE @i INT
    ,@startdate DATETIME;

SET @i = 1;
SET @startdate = GETDATE();

WHILE @i <= 20
BEGIN
    INSERT Dates (
        ID
        ,VisitDate
        )
    VALUES (
        @i
        ,@startdate
        );

    SET @startdate = DATEADD(dd, 7, @startdate);
    SET @i = @i + 1;
END

CREATE TABLE PatientHeight (
    PatientID INT NOT NULL
    ,Height INT
    );

-- populate table with 1000 patientids with heights between 59 and 74 inches
SET @i = 1;

WHILE @i <= 10000
BEGIN
    INSERT PatientHeight (
        PatientID
        ,Height
        )
    VALUES (
        @i
        ,@i % 16 + 59
        );

    SET @i = @i + 1;
END

ALTER TABLE PatientHeight ADD CONSTRAINT PK_PatientHeight PRIMARY KEY (PatientID);

-- cartesian join produces 200,000 PatientVisit records
SELECT ISNULL(PatientID, - 1) AS PatientID
    ,ISNULL(VisitDate, '19000101') AS VisitDate
    ,Height
INTO PatientVisit
FROM PatientHeight
CROSS JOIN Dates;

ALTER TABLE PatientVisit ADD CONSTRAINT PK_PatientVisit PRIMARY KEY (
    PatientID
    ,VisitDate
    );

-- create changes of height
SET @i = 3;

WHILE @i < 10000
BEGIN
    UPDATE pv
    SET Height = Height + 2
    FROM PatientVisit pv
    WHERE PatientID = @i
        AND pv.VisitDate = (
            SELECT TOP 1 VisitDate
            FROM Dates
            WHERE id = ABS(CHECKSUM(@i)) % 19
            );

    SET @i = @i + 7;
END
        /*
-- return AdventureWorks to its previous state when you are finished
-- with this example.

DROP TABLE Dates;
DROP TABLE PatientHeight;
DROP TABLE PatientVisit;
*/

The first approach uses a cursor to iterate row by row and identify if there has been a change in height. The second approach uses a ranking function to perform a self-join based on patient id to identify any change. Both these approaches are detailed below as well to save the reader time.

CUROSR BASED APPROACH

/*
    This script identifies and records changes in patient height across visits using a cursor-based approach.

    Steps:
    1. Creates a temporary table (#Changes) to store records where a patient's height changes between visits.
    2. Declares variables to track the current patient, visit date, and height values.
    3. Opens a FAST_FORWARD cursor on the PatientVisit table, ordered by PatientID and VisitDate.
    4. Iterates through each visit:
        - Detects the first record for each patient and initializes the baseline height.
        - Compares the current height to the baseline; if different, inserts a record into #Changes and updates the baseline.
    5. Closes and deallocates the cursor.
    6. Selects all records from #Changes to display the detected changes.
    7. Drops the temporary table.

    Purpose:
    - To track and report all instances where a patient's height has changed between visits.
    - Useful for auditing, data quality checks, or clinical analysis.
*/
-- CURSOR BASED APPROACH
CREATE TABLE #Changes (
    PatientID INT
    ,VisitDate DATETIME
    ,BeginHeight SMALLINT
    ,CurrentHeight SMALLINT
    );

DECLARE @PatientID INT
    ,@CurrentID INT
    ,@BeginHeight SMALLINT
    ,@CurrentHeight SMALLINT
    ,@VisitDate DATETIME;

SET @PatientID = 0;

DECLARE Patient_cur CURSOR FAST_FORWARD
FOR
SELECT PatientID
    ,VisitDate
    ,Height
FROM PatientVisit
ORDER BY PatientID
    ,VisitDate;

OPEN Patient_cur;

FETCH NEXT
FROM Patient_cur
INTO @CurrentID
    ,@VisitDate
    ,@CurrentHeight;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- first record for this patient
    IF @PatientID <> @CurrentID
    BEGIN
        SET @PatientID = @CurrentID;
        SET @BeginHeight = @CurrentHeight;
    END

    IF @BeginHeight <> @CurrentHeight
    BEGIN
        INSERT #Changes (
            PatientID
            ,VisitDate
            ,BeginHeight
            ,CurrentHeight
            )
        VALUES (
            @PatientID
            ,@VisitDate
            ,@BeginHeight
            ,@CurrentHeight
            );

        SET @BeginHeight = @CurrentHeight;
    END

    FETCH NEXT
    FROM Patient_cur
    INTO @CurrentID
        ,@VisitDate
        ,@CurrentHeight;
END

CLOSE Patient_cur;

DEALLOCATE Patient_cur;

SELECT *
FROM #Changes

DROP TABLE #Changes
GO

SET BASED APPROACH

/*
    This query identifies changes in patient height between consecutive visits using a ranking function approach.

    Steps:
    1. The CTE 'PV_RN' assigns a unique row number (ROWID) to each patient visit, ordered by PatientID and VisitDate.
    2. The main query self-joins 'PV_RN' to compare each visit (t1) with the immediately following visit (t2) for the same patient.
    3. It filters for cases where the PatientID matches and the Height value has changed between visits.
    4. The result shows the PatientID, the date of the height change, the previous height, and the new height.
    5. Results are ordered by PatientID and the date of the changed visit.
*/
-- RANKING FUNCTION BASED APPROACH
WITH PV_RN
AS (
    SELECT ROW_NUMBER() OVER (
            ORDER BY PatientID
                ,VisitDate
            ) AS ROWID
        ,*
    FROM PatientVisit
    )
SELECT t1.PatientID
    ,t2.VisitDate AS DateChanged
    ,t1.Height AS HeightChangedFrom
    ,t2.Height AS HeightChangedTo
FROM PV_RN t1
JOIN PV_RN t2 ON t2.ROWID = t1.ROWID + 1
WHERE t1.patientid = t2.patientid
    AND t1.Height <> t2.Height
ORDER BY t1.PatientID
    ,t2.VisitDate;

There is however a third option which I believe would be much simpler and is also much faster has been listed below. Here we use the math operator STDEV which calculates a standard deviation for the heights that were captured. When the height hasn’t changed STDEV will return a value of zero else a non-zero number which can be used to identify patient’s with different heights.

THIRD APPROACH

/*
    This query retrieves the minimum and maximum height measurements for each patient
    from the PatientVisit table, but only for those patients whose recorded heights
    show some variability (i.e., the standard deviation of their height measurements is greater than zero).
    This approach, based on mathematical operators, allows for a margin of error in measurements.
    - Filters patients to include only those with more than one unique height value.
    - For each qualifying patient, returns their PatientID, minimum height, and maximum height.
*/
-- MATH OPERATOR BASED APPROACH 
-- HAS THE ADVANTAGE OF PROVIDING A MARGIN FOR ERROR FOR MEASUREMENTS AS WELL 
SELECT PatientID
    ,MIN(Height)
    ,Max(Height)
FROM PatientVisit
WHERE PatientID IN (
        SELECT  patientid FROM  PatientVisit GROUP BY PatientID
        HAVING STDEV(height) > 0
        )
GROUP BY PatientID

Hope this approach helps clarify what we are doing with set based approach since recently I have noticed that ranking functions and CTEs seem to be the go to approach for most developers when they think set based logic.

Reference

http://sqlmag.com/t-sql/programming-sql-set-based-way

Please Consider Subscribing