Monty Hall Simulation using T-SQL

Please Consider Subscribing

The first time I heard about the Monty Hall problem was while watching TV. The host of the TV show had presented the problem as you typically see. I am recapping it for your benefit now. 

The TV show has three doors. Behind one of which there is a prize. The guest is expected to choose one of the doors. Upon the first selection the host opens up another door to reveal that there is no prize behind it. The dilemma facing the guest now is; should he change his original selection? Or stick with it. Everybody has heard some variation of this problem and it did cause quite some controversy when it first came out. It has since been proven that you should always switch the door. This presents an interesting problem, because what changed? Why would you suddenly want to switch the door?

The fact is you have new information, which should dictate your choice. Simply speaking, your probability will change between the first and the second run because of the fact that you have additional information. Now rather than try to prove this mathematically, I’m just going to go ahead and explain it another way. Say for example, you always put the prize behind door 1. After a couple of attempts, you start to figure out the pattern and you understand that the prize will always be behind door 1. This additional information. Should be taken into account when making your guess. While the probability remains 1/3 because you have three doors and only one of them can have a prize behind it. The additional information that the prize is always behind door one can change your probability from 1/3 to 100%.

This aspect of additional information changing the probability was something that wasn’t taken into account when this problem was first introduced. A number of famous mathematicians had ridiculed the original author (Vos Savant) who suggested that you should always switch. Subsequently, people have investigated this further and proven that you should always switch. Computer simulations have also done the same.

In this post I present one more computer simulation that you can do using Microsoft SQL Server.

/*
    ---------------------------------------------------------------------------
    Script: Monty Hall Problem Simulation in T-SQL
    ---------------------------------------------------------------------------
    Description:
        This script simulates the Monty Hall problem using T-SQL. It runs 10,001
        iterations of the game, storing the results of each run in a temporary table.
        The simulation consists of two scenarios for each run:
            1. The user sticks with their initial choice ("First Chance").
            2. The user always switches to the other unrevealed cup ("Second Chance").
        The script then aggregates and displays the win/loss statistics for both
        strategies, demonstrating the probability outcomes of the Monty Hall problem.

    Key Steps:
        - Drops any existing temporary table used for results.
        - Creates a temporary table #MontyHall to store simulation data.
        - Loops 10,001 times, simulating the Monty Hall game:
            - Randomly assigns the winning cup and the user's initial choice.
            - Records the outcome if the user sticks with their choice.
            - Identifies an empty cup to reveal (not the user's or winning cup).
            - Simulates the user switching to the remaining unrevealed cup.
            - Records the outcome if the user switches.
        - Aggregates and displays the number of wins and losses for both strategies.
        - Outputs the detailed results for each run.

    Columns in #MontyHall:
        - id: Identity column for each record.
        - runid: Unique identifier for each simulation run.
        - cup0, cup1, cup2: Indicates which cup contains the prize (1) or is empty (0/NULL).
        - usercup: The cup selected by the user.
        - firstrunwinner: The cup that actually contains the prize.
        - RUndescription: Describes whether the record is for the "First Chance" or "Second Chance".

    Usage:
        - Run the script in a SQL Server environment.
        - Review the aggregated results to observe the statistical advantage of switching.
        - Inspect the detailed results for individual simulation runs.

    Notes:
        - The script uses randomization functions and temporary tables.
        - The simulation demonstrates the classic probability puzzle and its counterintuitive result.
    ---------------------------------------------------------------------------
*/
SET NOCOUNT ON -- improves loop performace

-- drop temp table used to store results
IF EXISTS (
		SELECT 1
		FROM tempdb.INFORMATION_SCHEMA.TABLES
		WHERE TABLE_NAME LIKE '%Mont%'
		)
BEGIN
	DROP TABLE #MontyHall
END

-- create table to store results for montyhall choices
CREATE TABLE #MontyHall (
	id INT identity(1, 1)
	,runid VARCHAR(100)
	,cup0 BIT
	,cup1 BIT
	,cup2 BIT
	,usercup INT
	,firstrunwinner INT
	,RUndescription VARCHAR(100)
	)

DECLARE @cntr INT = 0

-- loop  a 10000 runs of the game
WHILE @cntr <= 10000
BEGIN
	--- declare our three cups
	DECLARE @cup0 BIT = 0
	DECLARE @cup1 BIT = 0
	DECLARE @cup2 BIT = 0
	-- run id to correlate first and second choices for the case when a switch happens
	DECLARE @runid VARCHAR(100) = cast(newid() AS VARCHAR(100))
	-- assigns a random number between 0 and 2 to simulate a guess of one of the cups -- this is the winning cup
	DECLARE @cupselect INT = (
			SELECT cast(round(rand() * 10, 0) AS INT) % 3
			)

	-- a step to set the value of cup vairable based on what was the value of the @cupselect variable
	IF @cupselect = 0
		SET @cup0 = 1

	IF @cupselect = 1
		SET @cup1 = 1

	IF @cupselect = 2
		SET @cup2 = 1

	--- simulates a guess by the user ( select 0,1 or 2  as the cup choices )
	DECLARE @cupuser INT = (
			SELECT cast(round(rand() * 10, 0) AS INT) % 3
			)

	-- In the first run the winner is already decicded after one run becuase the user never changes the inital selection
	-- storing the results to confirm the same and verify if the probability is 1/3 
	INSERT INTO #Montyhall (
		[runid]
		,[cup0]
		,[cup1]
		,[cup2]
		,[usercup]
		,[firstrunwinner]
		,RUndescription
		)
	SELECT @runid
		,@cup0 AS cup0
		,@cup1 AS cup1
		,@cup2 AS cup2
		,@cupuser AS UserCup
		,@cupselect AS WinnerFirstRound
		,'First Chance'

	--identify empty cup
	-- empty cup can only be the cup that doesnt contain the winner or the users original choice.
	-- the empty cup is the one that is revealed. Forcing the user to take a chance of sticking with the original 
	-- choice or switching the cup that wasnt revealed without knowing if they got it right the first time.
	-- in the second we simulate the user alwats switches
	DECLARE @emptycutp INT = (
			SELECT TOP 1 cup
			FROM (
				SELECT cup
				FROM (
					SELECT 0 AS cup
					
					UNION
					
					SELECT 1
					
					UNION
					
					SELECT 2
					) a -- order by newid()
				
				EXCEPT -- ignore the below two and choose what is left.
				
				SELECT *
				FROM (
					SELECT @cupuser AS cup -- users choice
					
					UNION
					
					SELECT @cupselect -- winning up
					) b
				) q
			ORDER BY newid()
			)

	-- Set empty cup as NULL just so that its easier to identify within the table.
	IF @emptycutp = 0
		SET @cup0 = NULL

	IF @emptycutp = 1
		SET @cup1 = NULL

	IF @emptycutp = 2
		SET @cup2 = NULL

	-- set the users choice to always select the cup that was not revealed 
	SELECT @cupuser = (
			SELECT cup
			FROM (
				SELECT 0 AS cup
				
				UNION
				
				SELECT 1
				
				UNION
				
				SELECT 2
				) a
			
			EXCEPT
			
			SELECT *
			FROM (
				SELECT @cupuser AS cup -- new choice cannot be the orignal choice
				
				UNION
				
				SELECT @emptycutp -- or the empty cup that was revealed.
				) aa
			)

	INSERT INTO #Montyhall (
		[runid]
		,[cup0]
		,[cup1]
		,[cup2]
		,[usercup]
		,[firstrunwinner]
		,RUndescription
		)
	SELECT @runid
		,@cup0 AS cup0
		,@cup1 AS cup1
		,@cup2 AS cup2
		,@cupuser AS UserCup
		,@cupselect AS WinnerFirstRound
		,'Second Chance'

	SET @cntr += 1
END

SELECT RUndescription
	,Winstate
	,count(*) AS '#wins'
FROM (
	SELECT *
		,WinState = CASE 
			WHEN [firstrunwinner] = [usercup]
				THEN 'Winner'
			ELSE 'Loser'
			END
	FROM #MontyHall
	) a
GROUP BY RUndescription
	,Winstate
ORDER BY RUndescription
	,Winstate

SELECT *
	,WinState = CASE 
		WHEN [firstrunwinner] = [usercup]
			THEN 'Winner'
		ELSE 'Loser'
		END
FROM #MontyHall
ORDER BY runid


 

Leave a Reply