SQL Server Comma-Delimited List

April 8, 2011 07:15 by wjchristenson2

I’ve come across a few solutions to creating comma-delimited lists in SQL Server.  The two most common solutions include CURSORS/WHILE loops and COALESCE.  CURSORS and WHILE loops are a performance drag so I won’t even go there.  COALESCE can work but the function requires variable declarations and function encapsulation if you want to return more than one record with delimited results without looping.  The approach I prefer is FOR XML PATH.  Below is an example using the AdventureWorks database.  I show a comma-delimited list of address types for 3 customers.

 

SELECT 
	customer.[CustomerID]
	,customer.[FirstName]
	,customer.[LastName]
	,SUBSTRING((
		SELECT 
			', ' + customer_address.AddressType
		FROM [SalesLT].[CustomerAddress] AS customer_address 
		WHERE customer_address.CustomerID = customer.CustomerID
	FOR XML PATH('')), 3, 200000) AS AddressTypes
FROM [SalesLT].[Customer] AS customer
WHERE customer.CustomerID IN (29544,29545,29559);

Bookmark and Share

SQL Server Triggers and Multiple Rows

March 4, 2011 09:51 by wjchristenson2

A common mistake that I’ve seen is the assumption that a DML trigger is fired once for each record modified.  This is not true.  A good SQL developer will use set-based DML statements to alter multiple records at a time.  Regardless of how many records may have been changed, the table’s DML trigger will fire only once.  The trigger’s INSERTED & DELETED tables will contain all the records affected.  I don’t like triggers, but if you find yourself in a pickle and need one… make sure you code for the possibility that more than one record was changed.

 

BAD trigger:
CREATE TRIGGER trgTableA ON TableA FOR UPDATE
AS
	DECLARE @ID int, @name varchar(50);

	SELECT @ID = i.ID, @name = i.Name FROM inserted i;

	UPDATE TableB SET Name = @name WHERE ID = @ID;
GO

 

GOOD trigger:
CREATE TRIGGER trgTableA ON TableA FOR UPDATE
AS
	UPDATE b SET 
		b.Name = i.Name
	FROM TableB b
		INNER JOIN inserted i ON i.ID = b.ID;
GO
Bookmark and Share

ADO.NET and SQL Server 2008 User-Defined Table Types

February 12, 2011 04:22 by wjchristenson2

I recently came across a situation where I needed to batch insert, update, and delete 1 to 100,000+ records to SQL Server at any given time.  Traditionally I would generate an xml string and parse it to a SQL Server table variable and do batch set operations from there.  However I heard about SQL Server 2008’s new UDTTs (user-defined table types) which I could pass a DataTable to a stored procedure as TVPs (table-valued parameters).  It sounded very attractive as I wouldn’t have to generate & parse xml strings to table variables in SQL Server.  I also thought it would give me huge performance gains.  I quickly found that UDTTs were not my saving grace after all and were less than impressive in my opinion.

Before I show examples on how to use UDTTs/TVPs from ADO.NET, I’ll give you my takeaways as to why I chose not to use them:

1) They do not scale.  If you profile SQL Server you’ll notice that for each record in your DataTable passed via ADO.NET, SQL Server will essentially create a table variable perform an insert for each record.  Imagine 10,000 inserts into a table variable before you can begin batch set operations… not a good thing.

2) “sp_executesql” does not play well with TVPs from ADO.NET.  I was in a situation where I had to dynamically generate SQL based on the schema of the table.  Therefore I wanted to cache the execution plan with “sp_executesql” as the schema would not change much.  However if you try and use “sp_executesql” with a SqlCommand with the CommandType as a stored procedure and are passing in a UDTT as a SqlParameter, it will fail.  Error:  Could not find stored procedure 'sp_executesql'.  This error only occurs if you try and pass a TVP from ADO.NET.  If you want to pass a TVP to “sp_executesql” with ADO.NET, you have to get creative with having the SqlCommand command’s type be text and use “sp_executesql” that way.

3) SQL Server compiles the execution plan every time.  You can see more info here and here.  If you use SQL Server Management Studio and pass TVPs to a stored procedure, no SQL compilations are performed.  However if you pass a TVP from ADO.NET, compilations will occur every time.  I won’t go into great detail over it as you can read the 2 links above.  However if you have a stored procedure which will be called a lot, you may want to consider this fact.

4) When passing a DataTable as a TVP from ADO.NET, the DataTable must have the exact same number of columns in the exact same order and the same compatible data types.  I can understand the data types, however there are times where I have a DataTable that may be used for other programmatic purposes outside of TVP/SQL operations. If I want to pass it as a TVP to SQL Server, I essentially have to make a new DataTable and refill it perfectly so the column count is the same and in the same order.  Very finicky and requires coordination and communication if anyone changes the UDTT as it will break code easily.

There’s my 4 reasons why I was not impressed with UDTTs/TVPs.  One nice thing is they are very simple to pass via ADO.NET and easy to use within a stored procedure once you set things up correctly.  Below is a quick example how to wire things up if you wish to use TVPs with ADO.NET.

 

Create the UDTT

IF EXISTS (SELECT * FROM sys.types st JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'udtt_PersonNames' AND ss.name = N'dbo')
DROP TYPE [dbo].[udtt_PersonNames];
GO

CREATE TYPE [dbo].[udtt_PersonNames] AS TABLE(
	ID [int] NOT NULL,
	[Name] [varchar](255) NOT NULL,
	UNIQUE CLUSTERED ([ID] ASC) WITH (IGNORE_DUP_KEY = OFF)
);
GO

 

Create the Stored Procedure accepting the TVP as a UDTT

IF EXISTS (SELECT * FROM sysobjects WHERE type = 'P' AND name = 'usp_InsertPersonNames')
	BEGIN
		DROP PROCEDURE dbo.usp_InsertPersonNames
	END
GO

CREATE PROCEDURE dbo.usp_InsertPersonNames (
	@PersonNamesDT AS dbo.udtt_PersonNames READONLY
)
AS
	SET NOCOUNT ON;
	
	INSERT INTO dbo.PersonNames([ID],[Name])
	SELECT [ID], [Name] FROM @PersonNamesDT;

	SET NOCOUNT OFF;
GO

 

Create an ADO.NET DataTable (same # of columns and order as the UDTT)

        Dim dt As DataTable = New DataTable()
        dt.Columns.Add(New DataColumn("ID", GetType(Integer)))
        dt.Columns.Add(New DataColumn("Name", GetType(String)))

        dt.Rows.Add(New Object() {1, "Joe"})
        dt.Rows.Add(New Object() {2, "Mark"})
        dt.Rows.Add(New Object() {3, "Ted"})

 

Pass the ADO.NET DataTable as a TVP to the Stored Procedure

        Using conn As SqlConnection = New SqlConnection("Data Source=localhost\sqlexpress;Initial Catalog=UDTT_test;Integrated Security=True;")
            conn.Open()

            Dim cmd As SqlCommand = New SqlCommand()
            With cmd
                .Connection = conn
                .CommandType = CommandType.StoredProcedure
                .CommandText = "dbo.usp_InsertPersonNames"

                Dim tvParam As SqlParameter = New SqlParameter("@PersonNamesDT", dt)
                tvParam.SqlDbType = SqlDbType.Structured
                tvParam.TypeName = "udtt_PersonNames"
                .Parameters.Add(tvParam)

                .ExecuteNonQuery()
            End With
        End Using

UserDefinedTableTypes_Soln.zip (82.82 kb)
 

Bookmark and Share

Primary Keys and Distributed Environments

September 22, 2010 14:24 by wjchristenson2

In architecting a new application or enhancing an existing one, important considerations need to be made when choosing how to implement primary keys/IDs.  From my own personal research and experience; unique, clustered, auto-incrementing, integer identity columns as the PK perform the best when it comes to DB performance.  They result in more compact clustered indexes, less IO when the table is queried, and they yield more efficient joins.  However, they have inherent problems when used in a distributed environment.

 

Auto-Increment (Identity Columns)
Auto-Increment columns on a SQL Server have seed and and increment properties.  Because these properties are fixed and can be selected from a finite number of possible values, the probability that you’ll get a PK collision is very high.  Therefore an auto-incrementing integer identity column should only be used in non-distributed/download-only situations.  No other server would be creating PK values and thus, you would not have PK conflicts.

There is one way where you could use this type of PK.  Each node which generates PKs could be assigned an identity range to avoid conflicts.  I can see some negatives to this solution: 
1) Have to manage nodes/ranges
2) Page splitting
3) Increased IO on new record inserts as the next identity value isn’t necessarily the the max
4) Loss of PK value possibilities as ranges may not be fully utilized

 

GUIDs - NEWID()
Using a GUID or UNIQUEIDENTIFIER in SQL Server is attractive because the generation algorithm provides a sufficiently high degree of probability that the same key would never be generated twice on any 2 nodes at any given time.

Although GUIDs meet the uniqueness required of a distributed environment, they have some glaring drawbacks that should be considered:
1) Huge in size (4x larger than an integer / 16 bytes). 
2) Large size negatively affects indexes.
3) Because they are random, they would be inserted into random locations within the clustered index resulting in fragmented clustered indexes which yields poor IO performance when the table is queried.
4) Generation of a new GUID in SQL Server does cost more resources than auto-incrementing an integer.  When bulk inserting, this has some glaring performance problems.
5) NEWID() uses the the server’s network card’s ID number plus a unique number from the CPU clock to generate the GUID.  The server would need to have a NIC to ensure GUID uniqueness.

 

GUIDs – NEWSEQUENTIALID()
The answer to the randomness problems with generating a new GUID is to generate sequential GUIDs.  SQL Server has the NEWSEQUENTIALID() function which will generate a GUID that will be greater in value than the previously generated one.  This solves the problems of fragmented clustered indexes.  It also solves the performance problems with generating a new GUID.  Because the new “sequential” GUID takes the previous GUID and increments it per se, the overhead is minimal and performs almost as well as an auto-incremented integer.

So what are the drawbacks to a sequential GUID?
1) Still Huge in size (4x larger than an integer / 16 bytes).
2) Large size negatively affects indexes.
3) NEWSEQUENTIALID() uses the the server’s network card’s MAC address to assist in the GUID generation process.  If privacy is a concern, then this probably isn’t a viable option for you.  Again, the server would need to have a NIC to ensure GUID uniqueness.

 

Natural Keys
In this strategy, you use business keys to uniquely identify records (e.g. social security number for a person).  The main drawbacks with this approach is:
1) Many columns may be needed to create uniqueness of the record.
2) Inconsistent PK columns/data types.
3) Not all records may have a glaring natural key to setup a PK for.

 

Node Identifiers
For this approach, the PK will combine a value that is unique to the node and a value that is unique to the topology.  For instance, you could have a PK that combines an auto-increment identity column and a unique id for the node/server column.  You could also create a custom system to generate values for each inserted row based on the rowID/nodeID.

 

Summary
In summary, there are many ways to architect a distributed database environment in regard to primary keys.  Personally sequential GUIDs and node identifiers are my top 2 picks.  If I had to choose between the two, I think I’d opt for node identifiers.  GUID size, privacy concerns, and ease of use appear to be a bit too much in my opinion.

Bookmark and Share

SQL Server 2008 FILESTREAM and ADO.NET

August 6, 2010 06:45 by wjchristenson2

In my last post, I gave a tutorial of SQL Server 2008’s new FILESTREAM feature.  I described what it is, why it should be used for BLOB data storage, and released some TSQL scripts to enable FILESTREAM and create a table to hold some media.  In this post, I will show how to insert and read FILESTREAM data from a SQL 2008 Server using ADO.NET.

ADO.NET comes with the SqlClient data provider (System.Data.SqlClient).  It supports FILESTREAM operations by using the SqlFileStream class which resides in the System.Data.SqlTypes namespace.  As its name suggests, the SqlFileStream is a stream and thus inherits from System.IO.Stream.  The SqlFileStream will provide methods to allow us to read and write streams of data to the SQL Server.  I will show 2 examples: inserting and reading FILESTREAM data with ADO.NET (the SqlFileStream object).  The program I created allows the user to save and view .jpg images from the SQL Server using FILESTREAM.  Here is a picture of the final example program.


Overview:
The SqlFileStream requires 2 arguments which are supplied from the SQL Server: the read/write path to the server’s file and the transactional context/token.  These 2 items are acquired via a preliminary SQL query before the the SqlFileStream is employed to read or write to the NTFS file system.

Inserting FILESTREAM Data
I first INSERT a record with a NULL value for the varbinary(max) field.  I use this blank record to acquire the server file path and also return the current FILESTREAM transactional context/token.  After this is complete, I can employ the SqlFileStream to write to the server file from my local file.  Below is my stored proc I used along with the .NET code.

 

IF EXISTS (SELECT * FROM sysobjects WHERE type = 'P' AND name = 'usp_InsertDocument')
	BEGIN
		DROP PROCEDURE dbo.usp_InsertDocument
	END
GO

CREATE PROCEDURE dbo.usp_InsertDocument 
	WITH ENCRYPTION
AS
	SET NOCOUNT ON;

	DECLARE @DocumentID INT;

	--insert NULL first to acquire ID & the transaction context values
	INSERT INTO dbo.Document (Document) VALUES (0x00);

	SELECT @DocumentID = SCOPE_IDENTITY();

	SELECT 
		doc.DocumentID,
		doc.DocumentGUID,
		doc.Document.PathName(0) AS [PathName], 
		GET_FILESTREAM_TRANSACTION_CONTEXT() AS [TransactionContext] 
	FROM dbo.Document AS doc WITH (NOLOCK) 
	WHERE doc.DocumentID = @DocumentID;

	SET NOCOUNT OFF;
GO

 

    Private Sub InsertDocument(ByVal clientFilePath As String)
        Dim dt As DataTable = New DataTable()

        Using conn As SqlConnection = New SqlConnection(_connectionString)
            Dim reader As SqlDataReader = Nothing
            Dim context() As Byte = Nothing
            Dim serverFilePath As String = ""

            conn.Open()

            Dim transaction As SqlTransaction = conn.BeginTransaction()

            Try
                Dim cmd As SqlCommand = conn.CreateCommand()
                With cmd
                    .Transaction = transaction
                    .CommandType = CommandType.StoredProcedure
                    .CommandText = "dbo.usp_InsertDocument"
                End With

                reader = cmd.ExecuteReader()
                dt.Load(reader)

                If dt.Rows.Count > 0 Then
                    serverFilePath = dt.Rows(0)("PathName").ToString()
                    context = DirectCast(dt.Rows(0)("TransactionContext"), Byte())

                    Using serverStream As SqlFileStream = New SqlFileStream(serverFilePath, context, FileAccess.Write)
                        Using clientStream As FileStream = New FileStream(clientFilePath, FileMode.Open, FileAccess.Read)
                            Dim bytesRead As Integer = 0, bufferSize As Integer = 4096
                            Dim buffer As Byte() = New Byte(bufferSize - 1) {}

                            Do
                                bytesRead = clientStream.Read(buffer, 0, bufferSize)
                                If bytesRead = 0 Then
                                    Exit Do
                                End If

                                serverStream.Write(buffer, 0, bytesRead)
                                serverStream.Flush()
                            Loop While True

                            clientStream.Close()
                        End Using

                        serverStream.Close()
                    End Using
                End If

                transaction.Commit()
            Catch ex As Exception
                Try
                    'attempt to rollback the transaction
                    transaction.Rollback()
                Catch ex2 As Exception
                    'rollback was unsuccessful (connection may be closed)
                End Try

                MessageBox.Show(ex.Message)
            Finally
                If reader IsNot Nothing AndAlso Not reader.IsClosed Then reader.Close()
            End Try
        End Using

        LoadDocuments()
    End Sub

 

Reading FILESTREAM Data
Similar to the insertion of a file, reading a file requires getting the server file path and then current FILESTREAM transactional context/token.  After those items are gathered from the SQL Server, we can connect to and stream the data anyway we like.  Below I streamed .jpg data from the SQL Server and loaded them into a WPF Image control for viewing purposes.

 

IF EXISTS (SELECT * FROM sysobjects WHERE type = 'P' AND name = 'usp_SelectDocument')
	BEGIN
		DROP PROCEDURE dbo.usp_SelectDocument
	END
GO

CREATE PROCEDURE dbo.usp_SelectDocument (
	@DocumentID INT
)
	WITH ENCRYPTION
AS
	SET NOCOUNT ON;

	SELECT 
		doc.DocumentID,
		doc.DocumentGUID,
		doc.Document.PathName(0) AS [PathName], 
		GET_FILESTREAM_TRANSACTION_CONTEXT() AS [TransactionContext] 
	FROM dbo.Document AS doc WITH (NOLOCK)
	WHERE doc.DocumentID = @DocumentID;

	SET NOCOUNT OFF;
GO

 

    Private Sub LoadDocument(ByVal documentID As Integer)
        Dim dt As DataTable = New DataTable()

        Using conn As SqlConnection = New SqlConnection(_connectionString)
            Dim reader As SqlDataReader = Nothing
            Dim context() As Byte = Nothing
            Dim serverFilePath As String = ""

            conn.Open()

            Dim transaction As SqlTransaction = conn.BeginTransaction()

            Try
                Dim cmd As SqlCommand = conn.CreateCommand()
                With cmd
                    .Transaction = transaction
                    .CommandType = CommandType.StoredProcedure
                    .CommandText = "dbo.usp_SelectDocument"
                    .Parameters.Add(New SqlParameter("@DocumentID", documentID))
                End With

                reader = cmd.ExecuteReader()
                dt.Load(reader)

                If dt.Rows.Count > 0 Then
                    serverFilePath = dt.Rows(0)("PathName").ToString()
                    context = DirectCast(dt.Rows(0)("TransactionContext"), Byte())

                    Using memStream As MemoryStream = New MemoryStream()
                        Using serverStream As SqlFileStream = New SqlFileStream(serverFilePath, context, FileAccess.Read)

                            Dim bytesRead As Integer = 0, bufferSize As Integer = 4096
                            Dim buffer As Byte() = New Byte(bufferSize - 1) {}

                            Do
                                bytesRead = serverStream.Read(buffer, 0, bufferSize)
                                If bytesRead = 0 Then
                                    Exit Do
                                End If

                                memStream.Write(buffer, 0, bytesRead)
                                memStream.Flush()
                            Loop While True

                            serverStream.Close()
                        End Using

                        memStream.Seek(0, SeekOrigin.Begin)

                        Dim bmpImage As BitmapImage = New BitmapImage()
                        With bmpImage
                            .BeginInit()
                            .StreamSource = memStream
                            .CreateOptions = BitmapCreateOptions.None
                            .CacheOption = BitmapCacheOption.OnLoad
                            .EndInit()
                            .Freeze()
                        End With

                        Me.imgView.Source = bmpImage

                        memStream.Close()
                    End Using
                End If

                transaction.Commit()
            Catch ex As Exception
            Try
                'attempt to rollback the transaction
                transaction.Rollback()
            Catch ex2 As Exception
                'rollback was unsuccessful (connection may be closed)
            End Try

            MessageBox.Show(ex.Message)
        Finally
            If reader IsNot Nothing AndAlso Not reader.IsClosed Then reader.Close()
        End Try
        End Using
    End Sub

 

So there you have it.  The basics of writing and reading to a FILESTREAM on a SQL 2008 Server using ADO.NET.  Below you’ll find my scripts and .NET source code.

SqlFileStreamExample_Soln.zip (559.27 kb)
 

Bookmark and Share