Welcome to Technohub.in

SQL Server 2000 | SQL Server 2005 | SQL Server 2008

   How to Share Data Between Stored Procedures
   Arrays and Lists in SQL Server
   Connecting to MS SQL Server from Unix
   Free SQL resource Links
   Conver Access DataABse to SQL Server
   Creating Stored Procedures in SQL Server 2000
   Moving SQL to New Server
   Accessing Data from SQL Server
   SQL Server 2005 FAQs


Note: this article was written for SQL 2000 and most of the material applies to SQL7 and SQL 6.5 as well. On the other hand, the article is less applicable to SQL 2005 as this version offers radically impoved methods for error handling, that beside a short section I am silent on. I hope to cover SQL 2005 in a future article.

Table of Contents: Top of the Page

   Introduction
   The Basics
      The Anatomy of an Error Message
      How to Detect an Error in T-SQL – @@error
      Return Values from Stored Procedures
      @@rowcount
      @@trancount
      More on Severity Levels
   What Happens when an Error Occurs?
      The Possible Actions
      When Does SQL Server Take which Action?
         Connection-termination
         Scope-abortion
         Statement-termination and Batch-abortion
      Trigger Context
      Errors in User-Defined Functions
   Control Over Error Handling
      SET XACT_ABORT
      ARITHABORT, ARITHIGNORE and ANSI_WARNINGS
      RAISERROR WITH NOWAIT
      Duplicates
      Using Linked Servers
      Retrieving the Text of an Error Message
      TRY-CATCH in SQL 2005
   Client-side Error Handling
      DB-Library
      ODBC
      ADO
      ADO .Net
   Acknowledgements and Feedback
   Revision History

Introduction

In many aspects SQL Server is a very good DBMS that permits you implement powerful solutions with good performance. However, when it comes to error handling... To be blunt: error handling in SQL Server is poor. It is a patchwork of not-always-so-consistent behaviour. It's also weak in that you have fairly little control over error handling, and for advanced error handling like suppressing errors or logging errors, you must take help from the client-side. Unfortunately, depeng on which client library you use, you may find that the client library has its own quirks, sometimes painting you into a corner where there is no real good solution.

In this article, I will first look at what parts an error message consists of, and how you can detect that an error has occurred in T-SQL code. Next, I describe the possible actions can SQL Server can take in case of an error. I then proceed to describe the few possibilities you have to control SQL Server's error handling. Finally, there is a section on how the different client libraries from Microsoft behave, with most of the focus on ADO and ADO .Net.

General disclaimer: whereas some information in this text is drawn from Books Online and other documentation from Microsoft, a lot of what I say is based on observations that I have made from working with SQL Server, and far from all of this is documented in Books Online. Therefore, you should be wary to rely on a specific behaviour like "this error have this-and-this effect", as it could be different in another version of SQL Server, even different between service packs.

Top of the Page

The Basics

The Anatomy of an Error Message

Here is a typical error message you can get from SQL Server when working from Query Analyzer.

Server: Msg 547, Level 16, State 1, Procedure error_demo_sp, Line 2 UPDATE statement conflicted with COLUMN FOREIGN KEY constraint 'fk7_acc_cur'. The conflict occurred in database 'bos_sommar', table 'currencies', column 'curcode'. The statement has been terminated. Note: Under Tools->Options->Connections, I have checked Parse ODBC Message Prefixes.

The error information that SQL Server passes to the client consists of several components, and the client is responsible for the final interpretation of the message. These are the components that SQL Server passes to the client.

Message number – each error message has a number. You can find most of the message numbers in the table sysmessages in the master database. (There some special numbers like 0 and 50000 that do not appear there.) In this example, the message number is 547. Since most interesting messages are errors, I will also use the term error number. Message numbers from 50001 and up are user-defined. Lower numbers are system defined.

Severity level – a number from 0 to 25. The stort story is that if the severity level is in the range 0-10, the message is informational or a warning, and not an error. Errors resulting from programming errors in your SQL code have a severity level in the range 11-16. Severity levels 17-25 indicate resource problems, hardware problems or internal problems in SQL Server, and if the severity is 20 or higher, the connection is terminated. For the long story, see the section More on Severity Levels for some interesting tidbits. For system messages you can find the severity level in master..sysmessages, but for some messages SQL Server employs a different severity level than what's in sysmessages.

State – a value between 0 and 127. The meaning of this item is specific to the error message, but Microsoft has not documented these values, so this value is rarely of interest to you.

Procedure – in which stored procedure, trigger or user-defined function the error occurred. Blank if the error occurred in a plain batch of SQL statements (inclug dynamic SQL).

Line – Line number within the procedure/function/trigger/batch the error occurred. A line number of 0 indicates that the problem occurred when the procedure was invoked.

Top of the Page

Message text – the actual text of the message that tells you what went wrong. You can find this text in master..sysmessages, or rather a template for it, with placeholders for names of databases, tables etc.

As I mentioned the client is responsible for the formatting of the error message, and for messages with a severity level with 10 or lower, most client programs print only the message text, but not severity level, procedure etc. In fact, we see an example of this above. The text The statement has been terminated is a message on its own, message 3621.

When you write your own client program, you can choose your own way to display error messages. You may be somewhat constrained by what your client library supplies to you. The full information is available with low-level interfaces such as DB-Library, ODBC or the OLE DB provider for SQL Server. On the other hand, in ADO you only have access to the error number and the text of the message.

There are two ways an error message can appear: 1) an SQL statement can result in an error (or a warning) 2) you emit it yourself with RAISERROR (or PRINT). Let's take a brief look at RAISERROR here. Here is sample statement:

RAISERROR('This is a test', 16, 1)

Here you supply the message text, the severity level and the state. The output is:

Server: Msg 50000, Level 16, State 1, Line 1 This is a test

Thus, SQL Server supplies the message number 50000, which is the error number you get when you supply a text string to RAISERROR. (There is no procedure name here, since I ran the statement directly from Query Analyzer.) Rather than a string, you could have a supplied a number of 50001 or greater, and SQL Server would have looked up that number in sysmessages to find the message text. You would have stored that message with the system procedure sp_addmessage. (If you just supply a random number, you will get an error message, saying that the message is missing.) Whichever method you use, the message can include placeholders, and you can provide values for these placeholders as parameters to RAISERROR, something I do not cover here. Please refer to Books Online for details.

As I mentioned State is rarely of interest. With RAISERROR, you can use it as you wish. If you raise the same message in several places, you can provide different values to State so that you can conclude which RAISERROR statement that fired. The command-line tools OSQL and ISQL have a special handling of state: if you use a state of 127, the two tools abort and set the DOS variable ERRORLEVEL to the message number. This can be handy in installation scripts if you want to abort the script if you detect some serious condition. (For instance, that database is not on the level that the installation script is written for.) This behaviour is entirely client-dependent; for instance, Query Analyzer does not react on state 127.

How to Detect an Error in T-SQL – @@error

Top of the Page

After each statement in T-SQL, with one single exception that I cover in the next section, SQL Server sets the global variable @@error to 0, unless an error occurs, in which case @@error is set to the number of that error. (Note: these days, the SQL Server documentation refers to @@error as a "function". Being an old-timer, I prefer "global variables" for the entities whose names that start with @@.)

More precisely, if SQL Server emits a message with a severity of 11 or higher, @@error will hold the number of that message. And if SQL Server emits a message with a severity level of 10 or lower, SQL Server does not set @@error, and thus you cannot tell from T-SQL that the message was produced.

But the message number is also the only field of the error message that you easily can access from T-SQL. A common question on the newsgroups is how to retrieve the text of an error message, and for a long time the answer was "you can't". But recently Mark Williams pointed out to me a way to do it, but as it requires that the user has sysadmin privileges, you cannot easily put in an application. I will return to this topic in the section Retrieving the Text of an Error Message.

There is no way to prevent SQL Server from raising error messages. There is a small set of conditions for which you can use SET commands to control whether these conditions are errors or not. We will look closer at these possibilities later, but I repeat that this is a small set, and there is no general way in T-SQL to suppress error messages. You will need to take care of that in your client code. (Another common question on the newsgroups.)

As I mentioned, @@error is set after each statement. Therefore, you should always save the save the value of @@error into a local variable, before you do anything with it. Here is an example of what happens if you don't:

CREATE TABLE notnull(a int NOT NULL) DECLARE @value int INSERT notnull VALUES (@value) IF @@error <> 0 PRINT '@@error is ' + ltrim(str(@@error)) + '.'

The output is:

Server: Msg 515, Level 16, State 2, Line 3 Cannot insert the value NULL into column 'a', table 'tempdb.dbo.notnull'; column does not allow nulls. INSERT fails. The statement has been terminated. @@error is 0.

Here is the correct way.

CREATE TABLE notnull(a int NOT NULL) DECLARE @err int, @value int INSERT notnull VALUES (@value) SELECT @err = @@error IF @err <> 0 PRINT '@err is ' + ltrim(str(@err)) + '.' The output is: Server: Msg 515, Level 16, State 2, Line 3 Cannot insert the value NULL into column 'a', table 'tempdb.dbo.notnull'; column does not allow nulls. INSERT fails. The statement has been terminated. @err is 515. Top of the Page

Return Values from Stored Procedures

All stored procedures have a return value, determined by the RETURN statement. The RETURN statement takes one optional argument, which should be a numeric value. If you say RETURN without providing a value, the return value is 0 if there is no error during execution. If an error occurs during execution of the procedure, the return value may be 0, or it may be a negative number. The same is true if there is no RETURN statement at all in the procedure: the return value may be a negative number or it may be 0.

Whether these negative numbers have any meaning, is a bit difficult to tell. It used to be the case, that the return values -1 to -99 were reserved for system-generated return values, and Books Online for earlier versions of SQL Server specified meanings for values -1 to -14. However, Books Online for SQL 2000 is silent on any such reservations, and does not explain what -1 to -14 would mean.

With some occasional exception, the system stored procedures that Microsoft ships with SQL Server return 0 to indicate success and any non-zero value indicates failure.

While there is no law that requires you to follow the same convention for your stored procedures, my strong recommendation is that you use return values solely to indicate success/failure. If you want to return data such as the id for an inserted row, number of affected rows or whatever, use an OUTPUT parameter instead. It follows from the fact that a blank RETURN may return 0, even if there has been an error during execution, that you should be careful to return an explict value yourself if an error occurs in the procedure.

There is one situation when a stored procedure does not return any value at all, leaving the variable receiving the return value unaffected. This is when the procedure is aborted because of a scope-aborting error. We will look more into this later. There is also one situation when the return value is NULL: this happens with remote procedures and occurs when the batch is aborted on the remote server. (Batch-abortion is also something we will look into more later on.)

There is one curious exception to the rule that @@error is set after each statement: a RETURN without parameters does not change the value of @@error, but leaves the variable unchanged. In my opinion, this is not really practically useful. (I owe this information to a correspondent who gave me this tip by e-mail. Alas, I lost his mail due to problems at my ISP, so I can credit him by name.)

@@rowcount

@@rowcount is a global variable reports the number of affected rows in the most recently executed statement. Just like @@error you need to save it in a local variable if you want to use the value later, since @@rowcount is set after each statement. Since with SET you can only assign variable at a time, you must use SELECT if you need to save both @@error and @@rowcount into local variables:

SELECT @err = @@error, @rowc = @@rowcount

(For this reason, I prefer to always use SELECT for variable assignment, despite Microsoft's recommendations to use SET.)

In T-SQL it is not an error if, for instance, an UPDATE statement did not affect any rows. But it can of course indicate an error in your application, as it could be an error if a SELECT returns more that one row. For these situations, you can check @@rowcount and raise an error and set a return value, if @@rowcount is not the expected value.

@@trancount

@@trancount is a global variable which reflects the level of nested transactions. Each BEGIN TRANSACTION increases @@trancount by 1, and each COMMIT TRANSACTION decreases @@trancount by 1. Nothing is actually committed until @@trancount reaches 0. ROLLBACK TRANSACTION rolls back everything to the outermost BEGIN TRANSACTION (unless you have used the fairly exotic SAVE TRANSACTION), and forces @@trancount to 0, regards of the previous value.

When you exit a stored procedure, if @@trancount does not have the same value as it had when the procedure commenced execution, SQL Server raises error 266. This error is not raised, though, if the procedure is called from a trigger, directly or indirectly. Neither is it raised if you are running with SET IMPLICIT TRANSACTIONS ON.

More on Severity Levels

In this section we will look a little closer on the various severity levels.

0
Messages with Level 0 are purely informational. A PRINT statement produces a message on severity level 0. These messages do not set @@error. Most query tools prints only the text part of a level 0 message. 1-9
These levels, too, are for informational messages/warnings. I cannot recall that I have encountered this from SQL Server, but I've used it myself in RAISERROR at times. Query Analyzer and SQL Management Studio prints the message number, the level and the state, but not the procedure and line number for these messages. 10
This level does not really exist. It appears that SQL Server internally converts level 10 to level 0, both for its own messages when you use level 10 in RAISERROR. 11-16
These levels indicate a regular programming error of some sort. But it is not the case that level 16 is more serious than level 11. Rather it appears to be a somewhat random categorisation. Books Online gives no details on what the levels might mean, but SQL Server MVP Jacco Schalkwijk pointed out to me that there is a drop-down box in the dialog for defining alerts in Enterprise Manager and SQL Management Studio which has description on these levels. Here is what the drop-down box has to say:

11 – Specified Database Object Not Found
12 – Unused
13 – User Transaction Syntax Error
14 – Insufficient Permission
15 – Syntax Error in SQL Statements
16 – Miscellaneous User Error

My experience is that it may not always be this way, but there certain are matches. Deadlock, for instance is level 13. (So now you know what a User Transaction Syntax Error is!)

17-25
Messages with any of these severity levels indicate some sort of resource problem (for instance running out of disk space), or internal error in SQL Server, or a problem with the operating system or hardware. The higher the severity, the more serious problems. These levels are documented in in the setion Troubleshooting->Error Messages->Error Message Formats->Error Message Severity Levels in Books Online. 19-25 To use level 19 or higher in RAISERROR you must use the WITH LOG option, and you must have sysadmin rights. 20-25 Errors with these severity levels are so fatal, that they always terminate the connection.

What Happens when an Error Occurs?

Many programming languages have a fairly consistent behaviour when there is a run-time error. Common is that the execution simply terminates in case of an error, unless you have set up an exception handler that takes care the error. In other languages, some error variable is set and you have to check this variable. T-SQL is confusing, because depending on what error that occurs and in which context it occurs, SQL Server can take no less than four different actions. I first give an overview of these alternatives, followed by a more detailed discussion of which errors that cause which actions. I then discuss two special cases: trigger context and user-defined functions.

The Possible Actions

These are the four main possible actions SQL Server can take:

Statement-termination. The current statement is aborted and rolled back. Execution continues on the next statement. Any open transaction is not rolled back. @@error is set to the number of the error. Since the statement is rolled back, this means that if you run an UPDATE statement that affects 1000 rows, and for one row a CHECK constraint is violated, none of the rows will be updated. But if the UPDATE statement was part of a longer transaction, the effect of the preceding INSERT, UPDATE or DELETE statements are not affected. You need to issue a ROLLBACK TRANSACTION yourself to undo them.

Scope-abortion. The current scope (stored procedure, user-defined function, or block of loose SQL statements, including dynamic SQL) is aborted, and execution continues on the next statement in the calling scope. That is, if stored procedure A calls B and B runs into a scope-aborting error, execution continues in A, just after the call to B. @@error is set, but the aborted procedure does not have a return value, but the variable to receive the return value is unaffected. As for statement-termination, any outstanding transaction is not affected, not even if it was started by the aborted procedure.

Batch-abortion. The execution of the entire batch – that is, the block of SQL statements that the client submitted to SQL Server – is aborted. Any open transaction is rolled back. @@error is still set, so if you would retrieve @@error first in the next batch, you would see a non-zero value. There is no way you can intercept batch-abortion in T-SQL code. (Almost. We will look a possibility using linked servers later on.)

Connection-termination. The client is disconnected and any open transaction is rolled back. In this case there is no @@error to access.

One can note from this, that there are two things that cannot happen:

  • The transaction is rolled back, but execution of the current batch continues.
  • The batch is aborted, but the transaction is not rolled back.

But I like to stress that this is based on my own observations. I have found no documentation that actually states that these two cases cannot occur under any circumstances.

The above caters for most of the error situations in SQL Server, but since a hallmark of the error handling in SQL Server is inconsistency, every now and then I discover some new odd situation. I am overlooking these cases here, not to burden the reader with too many nitty-gritty details.

There is however, one more situation you should be aware of and that is batch-cancellation. The client may at any time tell SQL Server to stop executing the batch, and SQL Server will comply more or less immediately. In this situation SQL Server will not roll back any open transaction. (In the general case that is. It seems that if the T-SQL execution is in a trigger, when the cancellation request comes, then there is a rollback.) However, if the current statement when the cancellation request comes in is an UPDATE, INSERT or DELETE statement, then SQL Server will roll back the updates from that particular statement. Batch-cancellation may occur because an explicit call to a cancellation method in the client code, but the most common reason is that a query timeout in the client library expires. ODBC, OLE DB, ADO and ADO.Net all have a default timeout of 30 seconds. (Which judging from the questions on the newsgroups, many programmers believe to come from SQL Server, but not so.)

When Does SQL Server Take which Action?

As you may guess, it depends on the error which action SQL Server takes, but not only. Context also matters. One is the setting of the command SET XACT_ABORT, which we shall look at in a later section. A special case is trigger context, in which almost all errors abort the batch and this will be the topic for the next section. Right now we will discuss the default context, that is outside triggers and when the setting XACT_ABORT is OFF.

You may guess that the more severe the error is, the more drastic action SQL Server takes, but this is only really true for connection-termination. When it comes to scope-abortion, this occurs for a fairly well-defined family, but I am not sure that I agree with that these errors are less severe than the errors that abort the batch. And there is not really any clear distinction between the errors that abort the batch on the one hand, and those that merely terminate the statement on the other. For this reason, I will first cover connection-termination, then scope-abortion and then the other two together.

Connection-termination

When SQL Server terminates the connection, this is because something really bad happened. The most common reason is an execution error in the SQL Server process itself, e.g. an access violation (that is, attempt to access an illegal memory address), a stack overflow, or an assertion error (a programmer-added check for a certain condition that must be true for his code to work). It could also be a protocol error in the communication between the client library and SQL Server. These errors are normally due to bugs in SQL Server or in the client library, but they can also appear due to hardware problems, network problems, database corruption or severe resource problems.

SQL Server terminates the connection, because it would not be safe to continue execution, as internal process structures may be damaged. In some cases, not only is your connection terminated, but SQL Server as such crashes.

Connection-termination can sometimes be due to errors in your application in so far that you may have written some bad SQL that SQL Server could not cope with. But in such case it is still an SQL Server bug if the connection terminates, because you should get a proper error message. (The error messages in conjunction with connection-termination are often very opaque.)

There is one case, though, where a bug in application code can cause connection-termination on its own, and that is if you have your written your own extended stored procedures or your own OLE objects that you call through the sp_OAxxxxx procedures. An unhandled execution error in such code will terminate your connection – and may crash SQL Server as well.

There is one way to terminate the connection from T-SQL: if you issue a RAISERROR statement with a severity level >= 20. To do this you must provide WITH LOG, and you must be sysadmin. Since errors with severities >= 19 may trigger an operator alert, and eventually may alert someone's pager, don't do this just for fun.

Scope-abortion

This appears to be confined to compilation errors. At least I have not seen it happen with any other sort of error. Due to the feature known as deferred name resolution (in my opinion this is a misfeature), compilation errors can happen during run-time too. Consider this example (you can run it in the Northwind database):

CREATE PROCEDURE inner_sp @productid int AS CREATE TABLE #temp (orderid int NOT NULL, orderdate datetime NOT NULL) PRINT 'This prints.' BEGIN TRANSACTION INSERT #temp (orderid, orderdate) SELECT o.OrderID, o.OrderDate FROM Orders WHERE EXISTS (SELECT * FROM [Order Details] od WHERE od.OrderID = o.OrderID AND od.ProductID = @productid) COMMIT TRANSACTION PRINT 'This does not print.' go CREATE PROCEDURE outer_sp AS DECLARE @ret int SET @ret = 4711 EXEC @ret = inner_sp 76 PRINT '@@error is ' + ltrim(str(@@error)) + '.' PRINT '@@trancount is ' + ltrim(str(@@trancount)) + '.' PRINT '@ret ' + coalesce(ltrim(str(@ret)), 'NULL') + '.' IF @@trancount > 0 ROLLBACK TRANSACTION go EXEC outer_sp go

Because the table #temp does not exist when you create inner_sp, SQL Server defers examination of the entire INSERT-SELECT statement until run-time. Again, when you invoke inner_sp, SQL Server cannot find #temp and defers building a query plan for the INSERT-SELECT statement until it actually comes to execute the statement. It is first at this point, that SQL Server discovers that the SELECT statement is incorrect (the alias for Orders is missing). And at that precise point, the execution of inner_sp is aborted. Here is the output:

This prints. Server: Msg 266, Level 16, State 2, Procedure inner_sp, Line 18 Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing. Previous count = 0, current count = 1. Server: Msg 107, Level 16, State 1, Procedure inner_sp, Line 9 The column prefix 'o' does not match with a table name or alias name used in the query. Server: Msg 107, Level 16, State 1, Procedure inner_sp, Line 9 The column prefix 'o' does not match with a table name or alias name used in the query. Server: Msg 107, Level 16, State 1, Procedure inner_sp, Line 9 The column prefix 'o' does not match with a table name or alias name used in the query. @@error is 266. @@trancount is 1. @ret is 4711.

Note the next-to-last line in the output: inner_sp started a transaction. But just because inner_sp was aborted does not mean that the transaction was rolled back. When you implement you error handling, this is something you need to consider, and I look closer at this in the accompanying article on error handling.

Also observe that @ret never was set, but retained the value it had prior to the call.

Not all compilation errors passes unnoticed when SQL Server loads procedure. A pure syntax error like a missing parenthesis will be reported when you try to create the procedure. But the list of errors not detected because of deferred name resolution is longer than you might expect. After all, one would expect SQL Server be able to detect the missing alias even if #temp is missing. With some effort, it could even detect the missing alias with the Orders table missing, couldn't it?

Actually, I can offer a way to avoid this problem altogether. On http://www.abaris.se/abaperls/ I have made available a toolset as freeware that includes a load tool, ABASQL. Before creating a procedure, ABASQL extracts all temp tables in the procedure and creates them, so that SQL Server will flag errors such as missing aliases or columns. ABASQL also checks the SQL code for references to non-existing tables.

Statement-termination and Batch-abortion

These two groups comprise regular run-time errors, such as duplicates in unique indexes, running out of disk space etc. As I have already have discussed, which error that causes which action is not always easy to predict beforehand. This table lists some common errors, and whether they abort the current statement or the entire batch.

I am only able to make out a semi-consistency. Some real fatal errors after which I would not really be interested in continuing execution do abort the batch. The examples here are deadlock victim and running out of disk space. But why would it be more severe to pass a superfluous parameter to a parameterless one, than to one that has parameters? And conversion errors? Are they more severe than a constraint violation? And why not all conversion errors? (We will return to conversion errors, as well as arithmetic errors that I purposely excluded from this table, when we discuss the SET commands ANSI_WARNINGS and ARITHABORT. They belong to the small et of errors, where you have some sort of a choice.)

And don't look to severity levels for help. As noteed above, the severity levels 11-16 is another classification, that don't reflect any difference in severity. Most of the errors above have severity level 16, but being a deadlock victim has severity level 13. (Running out of a disk space, which is a resource problem, is level 17.)

Trigger Context

You have trigger context when you are in a trigger, or you are in a stored procedure, user-defined function or block of dynamic SQL that has been called directly or indirectly from a trigger. That is, somewhere on the call stack, there is a trigger. If you are in trigger context, all errors terminate the batch and roll back the transaction on the spot. (Connection-terminating errors still terminate the connection, of course.)

Well, almost. When it comes to error handling in SQL Server, no rule is valid without an exception. Errors you raise yourself with RAISERROR do not abort the batch, not even in trigger context. Neither does error 266, Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing. This error is simply not raised at all when this condition occurs in trigger context. No, this is not a bug, but it is documented in Books Online, and according to Books Online, error 266 is informational only. (Now, taste that concept: an informational error.)

There is one more way that a trigger can terminate the batch. This happens if @@trancount is 0 when the trigger exits. A trigger always executes in the context of a transaction, since even if there is no multi-statement transaction in progress each INSERT, UPDATE and DELETE statement is its own transaction in SQL Server, and the trigger is part of that transaction. Thus, @@trancount is at least 1 when you enter a trigger, and if it is 0 on exit this means that somewhere has been a ROLLBACK statement. (Or sufficiently many COMMIT to bring @@trancount to 0.) Why this would have to abort the batch? Because the sky is blue. Seriously, I don't know, but it has always been that way, and there is no way you can change it.

The normal use for this is that if you have an integrity check in a trigger you raise a message and roll back the transaction, as in this example.

IF EXISTS (SELECT * FROM inserted i JOIN abainstallhistory inh ON i.inhid = inh.inhid WHERE inh.ss_label <> i.ss_label OR inh.ss_label IS NULL AND i.ss_label IS NOT NULL OR inh.ss_label IS NOT NULL AND i.ss_label IS NULL) BEGIN ROLLBACK TRANSACTION RAISERROR('Values on ss_label does not match abainstallhistory.', 16, 1) RETURN END

Thus, this trigger aborts the batch, not because of the RAISERROR, but because of the ROLLBACK TRANSACTION, and since the trigger is permitted to execute to end, the RAISERROR statement is executed.

Errors in User-Defined Functions

User-defined functions are usually invoked as part of a SET, SELECT, INSERT, UPDATE or DELETE statement. What I have found is that if an error appears in a multi-statement table-valued function or in a scalar function, the execution of the function is aborted immediately, and so is the statement the function is part of. Execution continues on the next line, unless the error aborted the batch. In either case, @@error is 0. Thus, there is no way to detect that an error occurred in a function from T-SQL.

The problem does not appear with inline table-functions, since an inline table-valued function is basically a macro that the query processor pastes into the query.

You can also execute scalar functions with the EXEC statement. In this case, execution continues if an error occurs (unless it is a batch-aborting error). @@error is set, and you can check the value of @@error within the function. It can be problematic to communicate the error to the caller though.

Control Over Error Handling

No, SQL Server does not offer much in this area, but we will look at the few possibilities, of which the most important is SET XACT_ABORT ON.

SET XACT_ABORT

What I have said this far applies to when XACT_ABORT is OFF, which is the default. When you issue SET XACT_ABORT ON, the very most of the statement-terminating errors instead become batch-aborting errors. Thus, if you don't want to litter your T-SQL code with checks on @@error, and if you are not interested in trying to recover from the error or invoke some error-logging routine in T-SQL, but you are content with execution being aborted on first error, then XACT_ABORT is for you.

Beware, though, that even when XACT_ABORT is ON, not all errors terminate the batch. Here are the exceptions I know of:

  • Errors you raise yourself with RAISERROR.
  • Compilation errors (which normally terminate the scope) do not terminate the batch.
  • Error 266, Transaction count after EXECUTE indicates that a COMMIT or ROLLBACK TRANSACTION statement is missing.

So at a minimum you still need to check @@error after the execution of a stored procedure or a block of dynamic SQL even if you use XACT_ABORT ON.

ARITHABORT, ARITHIGNORE and ANSI_WARNINGS

These three SET commands give you very fine-grained control for a very small set of errors. When a division by zero or an overflow occurs, there are no less four choices.

  • No action at all, result is NULL – when ARITHIGNORE is ON.
  • Warning message, result is NULL – when all are OFF.
  • Statement-termination – when ANSI_WARNINGS is ON.
  • Batch-abortion – when ARITHABORT is ON and ANSI_WARNINGS is OFF.

ARITHABORT and ARITHIGNORE also control domain errors, such as attempt to take the square root of a negative number. But this is error is not covered by ANSI_WARNINGS, so here you only have three choices.

As for what is an overflow, SQL Server has extended the domain of this error to datetime value in a way which is not really intuitive. Consider these two statements:

select convert(datetime, '2003123') -- This causes a conversion error select @@error go select convert(datetime, '20031234') -- This causes an overflow select @@error

Thus, if you have a string which conforms syntactically to some date format, but some element is out of range, this particular form of conversion error only aborts the batch with a certain setting – and in other settings it may not cause an error at all.

ANSI_WARNINGS controls a few more errors and warnings. With ANSI_WARNINGS ON, it is an error to assign a character or binary column a value that exceeds the the maximum length of the column, and this terminates the statement. When ANSI_WARNINGS is OFF, this condition is not an error, but the value is silently truncated. The error is never raised for variable assignment. Also, with ANSI_WARNINGS ON, if an aggregate function such as SUM() or MIN() sees a NULL value, you get a warning message. (Thus it does not set @@error, nor terminate the statement.)

When you use ODBC, OLE DB and Query Analyzer (SQL 2000), ANSI_WARNINGS is ON by default. Since some features (indexed views, index on computed columns and distributed queries) in SQL Server requires ANSI_WARNINGS to be ON, I strongly recommend that you stick to this. Indexed views and index on computed columns also require ARITHABORT to be ON, but I don't think you can rely on it being ON by default.

Finally, I should mention that there is one more SET command in this area: NUMERIC_ROUNDABORT. When ON, the batch is aborted if operation with a decimal data type results in loss of precision. The option is OFF by default, and it must be OFF for indexed views and indexes on computed columns to work.

RAISERROR WITH NOWAIT

SQL Server buffers the output, so an error message or a result set may not appear directly at the client. In many cases, this is not an issue, but if you are running a long-running procedure, you may want to produce diagnostic messages. To have them displayed immediately in the client, you can use the WITH NOWAIT clause to the RAISERROR statement, as in this example:

PRINT 'This message does not display immediately' WAITFOR DELAY '00:00:05' RAISERROR ('But this one does', 0, 1) WITH NOWAIT WAITFOR DELAY '00:00:05' PRINT 'It''s over now'

Once there is a message with NOWAIT, all that is ahead of the message in the buffer is also passed to the client.

Unfortunately, there is a bug in SQL Server with NOWAIT, which affects you only if you are calling a procedure through RPC (remote procedure call), so that it this case, SQL Server buffers the messages nevertheless. RPC is the normal way to call a procedure from an application (at least it should be), but if you are running a script from OSQL or Query Analyzer, this bug does not affect you.

Duplicates

Normally when you try to insert a value that would be a duplicate in a unique index, this is an error and the statement is rolled back. However, the syntax for the CREATE INDEX statement includes the option IGNORE_DUP_KEY. When this option is in effect, duplicates are merely discarded. The statement is not rolled back, and if the INSERT statement compassed several rows, the rows that do not violate the uniqueness of the index are inserted.

According to Books Online, SQL Server issues a warning when ignoring a duplicate row. However, in real life the message has severity level 16, and thus comes across to the client as an error. Nevertheless, SQL Server does not set @@error, and as I noted the statement is not rolled back, this message falls in none of four categories I have presented. Microsoft has acknowledged the incorrect severity level as a bug, so hopefully this will be fixed in some future version of SQL Server.

This option applies to unique indexes only. It is not available for PRIMARY KEY or UNIQUE constraints.

Using Linked Servers

There is no way to switch off batch-abortion on a general level. But there is actually one way to handle the case in T-SQL, and that is through linked servers. If you call a remote stored procedure, and the procedure runs into a batch-aborting error, the batch in the calling server is not aborted. On return to the local server, @@error holds the value of the error that aborted the batch on the remote server, and the return value of the stored procedure is set to NULL. (At least my tests indicate this. Thus, it is not the same case as when a local procedure dies with scope-abortion, when the return value is not set at all.) It goes without saying, that this is a fairly lame workaround that is only applicable in special situations.

Some notes:

  • It must be a truly remote server. If you call a procedure in the local server with four-part notation, SQL Server is too smart for you.
  • Set up the remote server with SQLOLEDB. When I set up the remote server with the OLE DB-over-ODBC provider (MSDASQL), the diagnostics about the error was poorer on the calling server.

Retrieving the Text of an Error Message

There is no supported way to retrieve the full text of an error message in SQL 2000. You can get a text from master.dbo.sysmessages, but then you only get placeholders for interesting things like which constraint that was violated. To get the full text of the error message in a proper way, you need a client to pick it up and log it.

However, Mark Williams pointed out you can retrieve the full mesage text from within T-SQL with help of DBCC OUTPUTBUFFER. To wit, after an error has been raised, the messge text is in the output buffer for the process.

The output from DBCC OUTPUTBUFFER is a single colunm, where each row as a byte number, a list of hex values, and a textual representation of the hex values. Mark made the effort to extract message from the last part, and was kind to send me a stored procedure he had written. As I looked at the output from DBCC OUTPUTBUFFER, I found a byte that appeared to hold the length of the message, which helped me to improve Mark's procedure.

There is a very significant restriction with this trick: to run DBCC OUTPUTBUFFER you need sysadmin rights even to look at your own spid, so you cannot put this in an application that is to be run by plain users.

Here is the stored procedure that Mark and I produced:

CREATE PROCEDURE showErrorMessage @errmsg nvarchar(500) OUTPUT AS DECLARE @dbccrow nchar(77), @msglen int, @lenstr nchar(2), @sql nvarchar(2000), @s tinyint -- Catch the output buffer. CREATE TABLE #DBCCOUT (col1 nchar(77) NOT NULL) INSERT INTO #DBCCOUT EXEC ('DBCC OUTPUTBUFFER(@@spid)') -- Set up a cursor over the table. We skip the first -- row, because there is nothing of interest. DECLARE error_cursor CURSOR STATIC FORWARD_ONLY FOR SELECT col1 FROM #DBCCOUT WHERE left(col1, 8) <> replicate('0', 8) ORDER BY col1 -- Init variable, and open cursor. SELECT @errmsg = '' OPEN error_cursor FETCH NEXT FROM error_cursor INTO @dbccrow -- On this first row we find the length. SELECT @lenstr = substring(@dbccrow, 15, 2) -- Convert hexstring to int SELECT @sql = 'SELECT @int = convert(int, 0x00' + @lenstr + ')' EXEC sp_executesql @sql, N'@int int OUTPUT', @msglen OUTPUT -- @s is where the text part of the buffer starts. SELECT @s = 62 -- Now assemble rest of string. WHILE @@FETCH_STATUS = 0 AND datalength(@errmsg) - 1 < 2 * @msglen BEGIN SELECT @errmsg = @errmsg + substring(@dbccrow, @s + 1, 1) + substring(@dbccrow, @s + 3, 1) + substring(@dbccrow, @s + 5, 1) + substring(@dbccrow, @s + 7, 1) + substring(@dbccrow, @s + 9, 1) + substring(@dbccrow, @s + 11, 1) + substring(@dbccrow, @s + 13, 1) + substring(@dbccrow, @s + 15, 1) FETCH NEXT FROM error_cursor INTO @dbccrow END CLOSE error_cursor DEALLOCATE error_cursor -- Now chop first character which is the length, and cut after end. SELECT @errmsg = substring(@errmsg, 2, @msglen)

All sorts of disclaimers apply: we have tested this on a couple of cases, and it seems to work. But it depends on undocumented behaviour. Use at your own risk.

TRY-CATCH in SQL 2005

Next version of SQL Server, SQL 2005, code-named Yukon, introduces significant improvements to the error handling in SQL Server. Here is a simple example:

BEGIN TRY SELECT convert(smallint, '2003121') END TRY BEGIN CATCH PRINT 'errno: ' + ltrim(str(error_number())) PRINT 'errmsg: ' + error_message() END CATCH

The output is:

errno: 244 errmsg: The conversion of the varchar value '2003121' overflowed an INT2 column. Use a larger integer column.

The construct is similar to error-handling concepts in languages like C++. If an error occurs in the TRY block, or in a stored procedure called by the TRY block, execution is transferred to the CATCH block. In the CATCH block, you have access to six new functions: error_number(), error_severity(), error_state(), error_message(), error_procedure() and error_state(), that gives you all parts of the message associated with the error. And, yes, error_message(), is the expanded message with the parameters filled in.

If you are in a transaction, and the error occurred is a batch-abortion error, your transaction will be doomed. This means that you cannot commit or perform any more updates within the transaction – you must roll back.

One caveat is that if you catch an error in this way, the client will never see the error, unless you call RAISERROR in the error handler. Unfortunately, you cannot reraise the exact error message, since RAISERROR does not permit you to use error numbers less than 50000.

Client-side Error Handling

The various client libraries from which you can access SQL Server have their quirks too. Some libraries are low-level libraries like DB-Library, ODBC and the SQLOLEDB provider. Others are higher-level libraries that sit on top of one of the low-level libraries, one example is ADO. If the low-level library has some quirk or limitation, the high-level library is likely to inherit that. The high-level library might also add its own quirks and limitations.

I am covering four libraries here: DB-Library, ODBC, ADO and ADO .Net, although the first two I discuss very briefly, since most devleopers today use ADO or ADO .Net.

DB-Library

When it comes to error handling, DB-Library is probably the best in the game. When SQL Server produces a message – be that an error, a warning or just an informational message such as a PRINT statement – DB-Library invokes a callback routine, and in that callback routine you have full access to all parts of the message: error number, severity level, state, procedure, line number and of course the message itself. You can then set some global variable to determine what should happen when you come back from the DB-Library call that caused the error.

Unfortunately, Microsoft stopped developing DB-Library with SQL 6.5, and you have poor or no support for new features in SQL Server with DB-Library. Thus, I cannot but discourage you from using DB-Library.

ODBC

With ODBC, you have to rely on return-status values, and then retrieve the error message yourself. Exactly how, I have to admit that I am bit foggy on at this point. However, you do have access to all parts of the error message, and you get all messages. This is evidenced by the fact that you get all this information in Query Analyzer which connects through ODBC.

ADO

ADO is not that good when it comes to error handling. First, you don't have full access to the error message. You only get the error number and the error text. You do not get the severity level (so you don't know whether really is an error at all), nor do you get state, procedure or line number. You do get something called SQLState, which is a five-letter code, not related to SQL Server but inherited from ODBC. Another problem is that you do far from always get all error messages, as I will detail below.

The basic operation with ADO appears simple: You submit a command to SQL Server and if there is an error in the T-SQL execution, ADO raises an error, and if you have set up an error handler with On Error Goto, this is where you will wind up. (On Error is for Basic-derived languages. In C++ I suppose you can use try-catch, but I have not verified this.) You can retrieve all messages from SQL Server in the Errors collection on the Connection object.

But there are quite some surprises hiding here. One thing that makes ADO complicated, is that there are so many ways that you can submit a command and retrieve the results. Partly, this is due to that ADO permits you to access other data sources than SQL Server, including non-relational ones. Also, as your "command" you can simply provide a table name. Since this text is about error handling with stored procedures in SQL Server, I disregard other possibilities. But even if you want to invoke a stored procedure, there are a whole lot of choices:

  • Which provider. You can use SQLOLEDB or MSDASQL (OLE DB over ODBC).
  • Cursor location. Server-side cursor or client-side cursor? (The concept of a cursor in this context confused me for a long time. Being an SQL programmer, I think cursors are bad and should be avoided. Eventually, I have understood that a client-side cursor is not really a cursor at all. You get the entire data to the client in one go. A Server-side cursor gets the data from the server in pieces, which may or may not involve an SQL cursor, depending on the cursor type.)
  • From which object to invoke the stored procedure. You can use the .Execute method of the Connection and Command objects or the .Open method of the Recordset object.
  • Command type. You can construct an EXEC command as a string and use adCmdText. You can also use adCmdText with ODBC syntax and supply parameters through the .Parameters collection. And you can use adCmdStoredProc to supply the name of a stored procedure and use the .Parameters collection.
  • Cursor type. Cursors can be forward-only, static, dynamic or keyset.
  • Lock type. You can choose between read-only, optimistic, batch optimistic and pessimistic.

And that's not really all.

What errors you see in your client code, depends on which combination of all these parameters you use. I developed a form, from which I could choose between these parameters, and then I played with a fairly stupid stored procedure which depending on input could cause some errors, generate some PRINT messages and produce some results sets. And there was a great difference in what I got back. When I used SQLOLEDB and client-side cursors, I did not get any of my two PRINT messages in my .Errrors collection if there were no errors, whereas with SQLOLEDB and server-side cursors I got both messages. With MSDASQL, I got the first PRINT message, but not the second, no matter the cursor location.

If there were error messages, I did not always get all of them, but at least one error was communicated and an error was raised in the VB code. However, there is a gotcha here, or two depending on how you see it. The first gotcha is that if the stored procedure produces one or more recordsets before the error occurs, ADO will not raise an error until you have walked past those preceding recordsets with .NextRecordset. This is not peculiar to ADO, but as far as I know applies to all client libraries, and is how SQL Server pass the information to the client. The only odd thing with ADO is that many programmers do not use .NextRecordset, or even know about it. I have also found that in some situations ADO may raise an error and say that .NextRecordset is not supported for your provider or cursor type.

The second gotcha is that your procedure may have more recordsets than you can imagine. To wit, INSERT, UPDATE and DELETE statements generate recordsets to report the rowcount, unless the setting NOCOUNT is ON.

Another irritating feature with ADO that I found, was that as soon there had been an error in the stored procedure, all subsequent result sets from the stored procedure were discarded. I could still tell from the return value of the stored procedure that execution had continued. I have found no combination where you can get the result sets that were produced after an error.

ADO also takes the freedom to make its own considerations about what is an error. I found that ADO always considers division by zero to be an error, even if both ARITHABORT and ANSI_WARNINGS are OFF. In this case, SQL Server merely produces a warning, but ADO opts to handle this warning as an error. A good thing in my opinion. Of what I have found, this only happens with division by zero; not with arithmetic errors such as overflow.

Above I said that even if I did not get all errors from SQL Server, ADO would raise an error. This is true as long as we are talking about commands you submit yourself. But ADO can submit commands behind your back, and if they result in errors, ADO may not alert you – even if the abort the batch and thereby rollback any outstanding transaction. This ugly situation is described further in KB article 810100.

Finally, a note on the return value and value of output parameters from a stored procedure. They are accessible from ADO, even if there is an error during execution of the stored procedure (as long the error does causes the procedure to terminate execution). If you use a client-side cursor you can normally access them directly after executing the procedure, whereas with a server-side cursor you must first retrieve all rows in all result sets. (Which means that if .NextRecordset is not supported for your cursor, you may not be able to retrieve the return value.) Beware that if you try to retrieve these values too soon, you will not be able to retrieve them even when you have retrieved all rows.

It is not really the topic for this text, but the reader might want to know my recommendation of what to choose from all these possibilities. And I say that you should use the SQLOLEDB provider (note that MSDASQL is the default), client-side cursors (note that server-side cursors is the default), invoke your stored procedures from the Command object, using adCmdStoredProcedure. Not because this is the best for error handling, but this appears to be the best from an overall programming perspective. (If you make these choices you will get a static read-only cursor.)

ADO .Net

Note: this applies to ADO .Net 1.1. Since some behaviour I describe may be due to bugs or design flaws, earlier or later versions of ADO .Net may be different in some points.

To some extent, ADO .Net is much better fitted than ADO to handle errors and informational messages from SQL Server, but unfortunately neither ADO .Net is without shortcomings.

The ADO .Net classes can be divided into two groups. The disconnected classes that are common for all data sources, and the connected classes that are data-source specific, but.derived from a common interface. A group such of connected classes makes up a .Net Data Provider and each provider has its own name space. Three providers can connect to SQL Server: There is SqlClient, which is specific to SQL Server, and there are the OLE DB and ODBC .Net Data Providers that connect to anything for which there is an OLE DB provider or an ODBC driver. I will refer to them here as OleDb and Odbc, as this is how their namespaces are spelled in the .Net Framework.

If the only data source you target is SQL Server, SqlClient is of course the natural choice. As we shall see, however, there are situations where OleDb may be preferrable. There is even the odd case where Odbc is the best choice, but as I will detail later, you do best to avoid Odbc when connecting to SQL Server.

The three data providers have some common characteristics when it comes to handling of errors and messages from SQL Server, but there are also significant differences. I will first cover the common features.

To invoke a stored procedure from ADO .Net, you need a Command object. (SqlCommand, OleDbCommand or OdbcCommand). Normally you specify the CommandType as StoredProcedure and provide the procedure name as the command text, but you can also use the CommandType Text and specify an EXEC statement.

There are four methods that you can use to invoke a stored procedure from ADO .Net, and I list them here in the order you are most likely to use them:

DataAdapter.Fill Fills a DataTable or a DataSet with the data from the stored procedure. The are several overloaded Fill methods, some of which permit you to pass a CommandBehavior to specify that you want key or schema information, or that you want only a single row or a single result set. ExecuteNonQuery Performs a command that does not return any result set (or if it does, you are not interested in it). One example is a store procedure that updates data. ExecuteReader Returns a DataReader object, through which you can access the rows as they come from SQL Server. If there are several result sets, you use .NextResult to traverse them. This is the most general method to access data. Also here you can specify CommandBehavior. ExecuteScalar Use this method to run a command that produces a result set of a single value.

To test the possible variations, I wrote a simple application in VB .Net, from which I could pass an SQL command or a stored procedure, and select which data provider and which call method to use. For most of the tests, I used a procedure that depending on input parameters would produce results sets, informational or error messages, possibly interleaved. What follows is based on my observations when playing with this application.

If an error occurs during execution of a stored procedure, the method you used to invoke the procedure will raise an exception. Thus, you should always call these methods within a Try-Catch block, so that you can handle the error message in some way. In the exception handler you have access to a provider-specific Exception object with an ErrorCollection, that containts information about the error. What information that is available is specific for the provider.

If you are interested in informational messages, that is messages with a severity ≤ 10, you can set up an InfoMessage event handler, which you register with the Connection object. It seems, though, if there are both errors and informational messages, that the informational messages comes with the exception. In the event handler, too, you have access to the ErrorsCollection from where you can retrieve the individual messages.

As long as you stick to Fill, ExecuteNonQuery and ExecuteScalar, your life is very simple, as all data has been retrieved once you come back, and if there is an error you wind up in your exception handler. Thus, in difference to ADO, you don't have to bother about unexpected result sets and all that. If you want the return value of a stored procedure or the value of output parameters, these are available in the Parameters collection. However, the OleDb and Odbc providers normally do not fill in these values, if an error occurs during execution of a stored procedure.

If you use ExecuteReader, there are a few extra precautions. If the stored procedure first produces a result set, and then a message, you must first call .NextResult before you get an exception, or, for an informational message, any InfoMessage event handler is invoked. In difference to ADO, ADO .Net does not produce extra result sets for the rowcount of of INSERT, UPDATE and DELETE statements. However, under some circumstances, errors and messages may give cause to extraneous result sets.

Beware that if .NextResult throws an exception, it does not return a value, so if you have something like:

Do .... Try more_results = reader.NextResult() Catch e as Exception MsgBox(e.Message) End Try Loop Until Not more_results more_results retains the value it had before you called .NextResult. (Caveat: I'm not an experienced .Net programmer, but this is my observation.)

To get the return value from a stored procedure and the value of output parameters when you use ExecuteReader, you first have to retrieve all rows and all result sets for these values to be available.

Just like ADO, ADO .Net can sometimes generate commands behind your back; this appears mainly to happen when you use the CommandBehaviors KeyInfo and SchemaOnly. But in difference to ADO, ADO .Net communicates any SQL errors from these extra commands, and throws an exception in this case too.

So far, it may seem that ADO .Net is lot more well-behaving than ADO. To some extent it is, but I will now will procede to the specifics for each data provider, and this mainly deals with their respective shortcomings.

SqlClient

One very nice thing with SqlClient, is that the SqlError class includes all components of an SQL Server message: server, error number, message text, severity level, state, procedure and line number.

Another good thing with SqlClient, is that in difference to the other two providers, you do almost always get the return value and the value of output parameters from a stored procedure, even if there is an error during execution (provided that the error does not terminate the execution of the procedure, of course.).

But there are a couple of bad things too:

  • If the procedure produces more than one error, you only get one error message, unless you are using ExecuteNonQuery. This may be addressed by the fix described in KB 823679.
  • If the procedure produces an error before the first result set, you cannot access any data with any of the methods. (ExecuteReader does not even return a SqlDataReader object.) If you need to access data in this case, Odbc is your sole possibility.
  • If the stored procedure produces a result set, then an error, then another result set, there is only one way to retrieve the second result set: use ExecuteReader and be sure to have SET NOCOUNT ON. If you run with NOCOUNT OFF, things can go really bad, and data may linger on the connection and come back when the connection is reused from the pool. Eventually SqlClient may get stuck in an infinite loop or throw some nonsensical exception.
  • RAISERROR WITH NOWAIT does not work with ExecuteNonQuery, but the messages are buffered as if there was no NOWAIT. Use any of the other methods, if you need RAISERROR WITH NOWAIT. (Note that to use NOWAIT; you must use CommandType Text, and a single unparameterized SQL string, due to a bug in SQL Server.)

OleDb

In an OleDbErrorCollection, you don't have access to all information about the error from SQL Server, but only the message text and the message number.

Notes on OleDb:

  • If there is an error message during execution, OleDb does in most situations not provide the return value of the stored procedure or the value of any output parameters.
  • If the procedure produces more than one error, you only get one error message if NOCOUNT is OFF. If NOCOUNT is ON, you may get all messages, unless there are result sets interleaved with the messages. For some reason the error messages comes in reverse order.
  • If the procedure produces an error before the first result set, you cannot access any data with any of the methods. (ExecuteReader does not even return a OleDbDataReader object.) If you need to access data in this case, Odbc is your sole possibility.
  • If the stored procedure produces a result set, then an error, then another result set, there is only one way to retrieve the second and successive result sets: use ExecuteReader and be sure to have SET NOCOUNT OFF. If you have NOCOUNT ON, you will still get a lot of result sets, but most of them will be empty.
  • RAISERROR WITH NOWAIT does not always work with OleDb, but the messages are sometimes buffered. I have not been able to find a pattern for this. For NOWAIT to work at all, you must use CommandType Text, because a bug in SQL 2000,

Odbc

In an OdbcErrorCollection, you don't have access to all information about the error from SQL Server, but only the message text and the message number.

Odbc has all sorts of problems with errors and informational messages. If there are several informational messages, Odbc may lose control and fail to return data, including providing the return value and the values of output parameters of stored procedures. It does not matter whether you have declared an InfoMessage event handler. If there are error messages, and you try to retrieve data, you may get exceptions from the ODBC SQL Server driver saying Function sequence error or Associated statement not prepared.

If there are error messages before any result sets are produced, Odbc may not throw an exception for the first error message, but only invoke your InfoMessage event handler. And if you don't have one, you will not even notice that there was an error. Under some circumstances more than one error message may be dropped this way.

Some of these problems may go away if you run with SET NOCOUNT ON, but not all. In general therefore, I'll advice against using the Odbc .Net Data Provider to access SQL Server.

Still, there is one situation where Odbc is your sole choice, and that is if you call a stored procedure that first produces an error message and then a result set. The other two providers never return any data in this situation. With Odbc you can do it – but it is a narrow path to follow. You must have SET NOCOUNT ON. If you only have one result set, you can probably use OdbcDataAdapter.Fill. If there are more than one result set, you must use ExecuteReader, and you must specify the CommandBehavior SingleResult (!). You may get an exception about Function Sequence Error at the end, but by then you have retrieved all your data.

Acknowledgements and Feedback

Thanks to Trevor Morris who pointed out the tidbit on IMPLICIT_TRANSACTIONS and error 266, Mark Williams who investigated DBCC OUTPUTBUFFER and SQL Server MVP Jacco Schalkwijk who found definition of the severity levels 11-16.

If you have suggestions for improvements or corrections on contents, language or formatting, please mail me at esquel@sommarskog.se. If you have technical questions that any knowledgeable person could answer, I encourage you to post to any of the newsgroups microsoft.public.sqlserver.programming or comp.databases.ms-sqlserver.

For more articles error-handling in .Net, check out ErrorBank.com.

Revision History

2006-01-21 - Added a section on how to retrieve the text from an error message, and a description of severity levels 11-16.

2004-12-26 – Rewritten the section on TRY-CATCH in SQL 2005, to adapt for the Beta 2 changes.

2004-05-30 – One more case where error 266 is not raised.

2004-02-17 – Added a section on ADO .Net.

2003-12-03 – Rewrote section on the RETURN statement. Added section on TRY-CATCH in Yukon.

How to Share Data Between Stored Procedures

Top of the Page

An SQL text by Erland Sommarskog, SQL Server MVP.

An older version of this article is available in Spanish, translated by Simon Hayes.

This article tackles two related questions:

  • How can I use the result set from one stored procedure in another, also expressed as How can I use the result set from a stored procedure in a SELECT statement?
  • How can I pass a table as a parameter from one stored procedure to another?

In this article I will discuss a number of methods, and also point out their advantages and drawbacks. Some of the methods apply only when you want to reuse a result set, whereas others apply in both situations. In the case you want to reuse a result set, most methods require you to rewrite the stored procedure in one way or another, but there are some methods that do not.

This table summarises the methods:

Method Input/ Output SQL Server versionsComment Using OUTPUT Parameters Output All Not generally applicable, but sometimes overlooked. Table-valued Functions OutputSQL 2000 Probably the best method for output, but has some restrictions. Inline Functions Use this when you want to reuse a single SELECT. Multi-statement Functions When you need to encapsulate more complex logic. Using a Table In/OutAll Most general methods with no restrictions, but a little more complex to use. Sharing a Temp Table Mainly for single pair of caller/callee. Process-keyed Table Best choice for many callers to same callee. INSERT-EXEC OutputSQL 6.5 Does not require rewrite. Has some gotchas. OPENQUERY OutputSQL 7 Does not require rewrite. Tricky with many pitfalls. Using XML In/OutSQL 2005 More cute than useful? Using Cursor Variables OutputSQL 7 Not recommendable.

A question very close to the second question above is how to pass table data from a client, but taat is a topic which is a bit outside the scope of this text. Of the methods that I discuss in this article, only XML is useful for this case. For a more general discussion on passing structured data from client to SQL Server, see my article Arrays and Lists in SQL Server.

OUTPUT Parameters

This method can only be used when the result set is one single row. Nevertheless, this is a method that is sometimes overlooked. Say you have this simple stored procedure:

CREATE PROCEDURE insert_customer @name nvarchar(50), @address nvarchar(50), @city nvarchar(50) AS DECLARE @cust_id int BEGIN TRANSACTION SELECT @cust_id = coalesce(MAX(cust_id), 0) + 1 FROM customers (UPDLOCK) INSERT customers (cust_id, name, address, city) VALUES (@cust_id, @name, @address, @city) COMMIT TRANSACTION SELECT @cust_id That is, the procedure inserts a row into a table, and returns the id for the row.

Rewrite this procedure as:

CREATE PROCEDURE insert_customer @name nvarchar(50), @address nvarchar(50), @city nvarchar(50), @cust_id int OUTPUT AS BEGIN TRANSACTION SELECT @cust_id = coalesce(MAX(cust_id), 0) + 1 FROM customers (UPDLOCK) INSERT customers (cust_id, name, address, city) VALUES (@cust_id, @name, @address, @city) COMMIT TRANSACTION

You can now easily call insert_customer from another stored procedure. Just recall that in T-SQL you need to specify the OUTPUT keyword also in the call:

EXEC insert_customer @name, @address, @city, @cust_id OUTPUT

Note: this example has a single output parameter, but a stored procedure can have many output parameters.

If you take this path, you need to learn how to use OUTPUT parameters with your client API. As this is not a text on client programming, I leave it to the reader to explore how to do that. I let suffice to say that whichever client library you are using, it can be done. I also like to add that using OUTPUT parameters rather than result sets (where this is possible) results in simpler client code, as you don't have to open recordsets and that. You can also expect better performance, although it is only likely to exhibit if you run a tight loop and call the same procedure all over again.

Table-valued User-defined Functions

User-defined functions were introduced in SQL 2000 and there are in fact three kinds of them: 1) scalar functions 2) inline table-valued functions. 3) multi-statement table-valued functions. It is only the latter two that are of interest here.

I am not going to give an in-depth description of user-defined functions, but just give some quick examples. For full details, please see the CREATE FUNCTION topic in Books Online.

Table-valued functions are far more generally applicable than output parameters, but as we shall see, there are still quite a few restrictions, so there are situations where you cannot use them.

Inline Functions

Here is a example of an inline function taken from Books Online:

CREATE FUNCTION SalesByStore (@storeid varchar(30)) RETURNS TABLE AS RETURN (SELECT title, qty FROM sales s, titles t WHERE s.stor_id = @storeid and t.title_id = s.title_id) To use it, you simply say: SELECT * FROM SalesByStore('6380') That is, you use it just like was a table or a view. Inline functions are indeed much like a parameterized view, because the query optimizer expands the function as if it was a macro, and generates the plan as if you had provided the expanded query. Thus, there is no performance cost for packaging a SELECT statement into a table-valued function. For this reason, when you want to reuse a stored procedure that consists of a single SELECT statement, rewriting it into an inline UDF is without doubt the best choice. (Or instead of rewriting it, move the SELECT into a UDF, and rewrite the existing procedure as a wrapper on the function, so that the client is unaffected.)

When you are on SQL 2000, there is one restriction, though: you cannot use system functions that are nondeterministic, that is they do not return the same value for the same input parameters on each call. A typical example is getdate(). This restriction has been lifted in SQL 2005.

Multi-statement Functions

A multi-statement function has a body that can have as many statements as you like. You need to declare a return table, and you insert the data to return into that table. Here is the function above as a multi-statement function:

CREATE FUNCTION SalesByStore (@storeid varchar(30)) RETURNS @t TABLE (title varchar(80) NOT NULL, qty smallint NOT NULL) AS BEGIN INSERT @t (title, qty) SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid RETURN END

You use multi-statement functions as inline functions, but in difference to inline functions, they are not expanded in place, but instead it's like you would call a stored procedure in the middle of the query, and return the data in a temp table. This permits you to move more complex stored procedure into functions.

Compared to inline functions, multi-statement functions incur some overhead due to the return table. More important, though, is that if you use the function in a query, the optimizer will have no idea of what the function returns, and will make standard assumptions. Many times this is not a real issue, but particularly if the result is huge, it can be. In this case, you may be better off getting the result into a temp table, as a temp table has statistics that the optimizer can consider. And once there, using one of the table methods or INSERT-EXEC may be better choice. (More about statistics etc in the section using a temp table.)

It follows from this, that there is rarely any reason to consider which sort of function to use. If you can express you problem in a single query, use a inline function. Use multi-statement only when you have to.

A drawback with multi-statement functions, is that there are many things you cannot do in a UDF, because a UDF must never change database state. This leads to the following restrictions:

  • You can only perform INSERT, UPDATE or DELETE statements on table variables local to the function.
  • You cannot call stored procedures (with the exception of extended stored procedures).
  • You cannot invoke dynamic SQL.
  • You cannot create temp tables (use table variables instead).

As with inline functions, on SQL 2000, you cannot access non-deterministic system functions. Yet a restriction is that on SQL 2000, you cannot use user-defined data types in the return table. On SQL 2005 you can, but not if they have rules or defaults bound to them.

Please see the Remarks section for the CREATE FUNCTION topic in Books Online for a complete list of restrictions.

Using a Table

Of course, what could be better to use in a database than a table? We will look at two ways to do this. The beauty of using tables to share the data, is that you can use this in all situations. You may feel, though, that they require more red tape to use. There are also some potential performance issues with these methods, related to recompilation.

Sharing a Temp Tables

When you want to share data between a single pair of a calling stored procedure (caller) and a called stored procedure (callee), using a local temp table is probably the most appealing choice, as you do not have to create a permanent table. But the more procedures you throw into the mix, the more likley that you will face a maintenance problem that makes this method less preferrable.

The method itself is as simple as this:

CREATE PROCEDURE called_procedure @par1 int, @par2 bit, ... AS ... INSERT/UPDATE/DELETE #tmp go CREATE PROCEDURE caller AS DECLARE ... CREATE TABLE #tmp (col1 int NOT NULL, col2 char(5) NULL, ...) ... EXEC called_procedure @par1, @par2 ... SELECT * FROM #tmp go

In this example, caller creates the temp table, and called_procedure fills in in. But it could also be the case that caller fills the table with input data, that called_procedure then uses for some processing.

There is a potential maintenance problem here: if called_procedure is called from many places, and you want to change what columns that called_procedure read/writes, you need to revisit all calling stored procedures to edit the temp-table definition. For this reason, sharing temp tables is mainly useful when you have a single pair of caller and callee. If there are several callers, using a process-keyed table is probably a better choice. Then again, if the temp is small, maybe only a single column of customer IDs to process, the table is likely to be very stable. Use your own good judgement to determine what is best for you.

(A more advanced way to tackle the maintenance problem is to use a pre-preprocessor, and put the definition of the temp table in an include file. If you have a C compiler around, you can use the C pre-processor. In our shop we use our own pre-processor, Preppis, which is part of our internal toolset, AbaPerls which is available as freeware.)

There is also an interesting possibility for flexibility here. The callee only has to care about the columns it reads or writes. This permits a caller to add extra columns for its own usage when it creates the temp tables, and two callers to the same callee could have different definitions of the same temp table.

There is also a possible performance problem with using temp tables and that is recompilation. This can be a serious issue on SQL 7 and SQL 2000. Each time the time the caller is invoked, a new instance of the temp table is created, and for this reason the callee will be recompiled (as the table actually could have a new definition). The cost for a recompilation of a procedure with a single SELECT that is moderately complex, may be less than 200 ms. A longer procedure of 100-200 lines of code may take over a second to compile. If the procedure itself needs a minute to run, this is no big issue. But if the procedure once compiled executes in sub-second, that recompilation second will be very noticeable to end users. So this is an important issue that you need to consider before you choose this method.

In SQL 2005 the situation is different, as it does recompilation on statement level. This means that only the statements that refer to the shared temp table will be recompiled. Thus, a callee that accepts input data into a temp table, could copy that data into a local temp table or a table variable, and only that statement would be recompiled, and that is likely to be cheap. Likewise a callee that fills in output data, only needs to refer to the shared table in a final INSERT statement.

Here I only discussed recompilation because the temp table is a new table each time. Temp tables can also cause recompilation issues because of auto-statistics, something that applies in general when you use temp table, not only when you share them between stored procedures. I return to the topic of auto-statistics when I look at process-keyed tables.

If you want to reuse an existing procedure by sharing a temp table, you do as for a table-valued function: you move to the core to a common procedure, and transform the existing procedure into a wrapper so that client-side programs are unaffected.

Process-Keyed Tables

This method evades the maintenance problem by using a permanent table instead. There is still a recompilation problem, but of a different nature.

A process-keyed table is simply a permanent table that acts like a temp table. To permit several processes to use the table simultaneously, the table has an extra column to identify the process. The simplest way to identify a process is the global variable @@spid (@@spid is the process id in SQL Server). In fact, this is so common, that these tables are often referred to as spid-keyed tables. Here is an example:

CREATE TABLE process_keyed (spid int NOT NULL, col1 int NOT NULL, col2 char(5) NULL, ...) go CREATE CLUSTERED INDEX processkey_ix ON process_keyed (spid) -- Add other columns as needed. go ... DELETE process_keyed WHERE spid = @@spid INSERT process_keyed (spi, col1, col2, ....) VALUES (@@spid, @val1, @val2, ...) ... SELECT col1, col2, ... FROM process_keyed WHERE spid = @@spid ... DELETE process_keyed WHERE spid = @@spid

A few things to note here:

  1. The table should have a clustered index on the process key (spid in this example), as all queries against the table will include the condition WHERE spid = @@spid.
  2. You should delete any existing data for @@spid before you insert any data into the table, as a safety precaution.
  3. And when you are finished using the data you should delete it, so that it does not occupy any extra space.

If you start to use this sort of tables on a larger scale in your database, I would recommend that you use some naming convention so that you easily can do:

SELECT 'TRUNCATE TABLE ' + name FROM sysobjects WHERE type = 'U' AND name LIKE '%yourpattern%'

and then cut and paste the result into a query window. It cannot be denied that a drawback with this method is that sloppy programmers could forget to delete data when they are done.

In the example, I used @@spid as the process key, and this is great as long only stored procedures are involved. But if a client needs to pass a process key, @@spid is likely to be a poor choice, as modern clients typically connect and disconnect and thus may get different spids. In this case, make the process-key a GUID (data type uniqueidentifier) and create the process key with the NEWID() function.

The nice thing with process-keyed tables, is that since the table is permanent, there is no maintenance problem. But what about recompilation? It turns out that also with a process-keyed table you are likely to face recompilation issues, but for a difference reason than for temp tables, to wit changed statistics. Recall that SQL Server has auto-statistics enabled by default. Auto-statistics kicks in when the first 500 rows are added, or when 20% of the rows have changed. (For full details on recompilation, see this white paper by Eric Hansen.) Since a process-keyed table is typically empty when it is not in use, auto-statistics sets in often. Sometimes this can be a good thing, as the statistics may help the optimizer to find a better plan. Other times, the recompilation adds extra seconds to the execution time, which may not always be acceptable. In difference to temp tables, there is a possible way out: use sp_autostats to disable auto-stats for you process-keyed table.

Overall, process-keyed tables is a very nice solution, and in the system I work with, we have quite a few of these tables.

INSERT-EXEC

The main advantage with INSERT-EXEC is that does not require a change to the procedure being called. You do like this:

CREATE TABLE #tmp (...) INSERT #tmp (...) EXEC @err = some_sp @par1, @par2 ...

The definition of #tmp must match the result set of the stored procedure exactly. The stored procedure should only return one result set. (But it may work with several result sets, as long as they match the table you insert into.) The table must be a real table, it cannot be a table variable.

At the surface, this method looks neat, but there are a couple of issues that you should be aware of:

  • It can't nest. If some_sp tries to call some_other_sp with INSERT-EXEC, you will get an error message. Thus, you can only have one INSERT-EXEC active at a time. This is a restriction in SQL Server.
  • There is a serious maintenance problem. If someone changes the result set in the callee, the INSERT-EXEC statement will fail, because the column list must match the INSERT statement exactly. Keep in mind that the person who changes the callee may not even be aware of that you are calling it with INSERT-EXEC.
  • The procedure is executed in the context of a transaction. Since the procedure is called as part of an INSERT statement, there will always be an open transaction when the procedure is executed. This is not cause for any greater concern, but occassionally it can cause surprises. (Unless you are still on 6.5 where creating temp tables in a transaction spells trouble.)
  • A ROLLBACK aborts the batch. If the called procedure issues a ROLLBACK statement, you get an error message saying you cannot do this. The transaction is rolled back, though. (Since the batch is aborted.) Warning: on SQL 6.5, this is a lot worse. When I made an experiment, my SQL Server process crashed, and when I reconnected, tempdb, where I had run the test, was corrupted.
  • INSERT-EXEC does not start an implicit transaction on some SQL Server versions. This applies only if: 1) you have SQL 2000 SP3 or earlier or SQL 7, and 2) you run with SET IMPLICIT_TRANSACTIONS ON. With this setting, an INSERT statement should start a transaction, but because of a bug this does not happen with INSERT-EXEC. This bug is fixed in SQL 2000 SP4 and SQL 2005. It is not present in SQL 6.5.

Because of these issues, I advice against using INSERT-EXEC as a general solution. This method is mainly useful when you cannot change the code of the procedure you are calling, for instance a system stored procedure, or a stored procedure in a third-party application.

Sometimes INSERT-EXEC can be useful with dynamic SQL, like this:

INSERT #tmp (...) EXEC(@sql)

In this case, the statement in @sql presumably created locally, so there is no maintenance problem. But it still can't nest, and you still get a transaction context. It might be better to put the INSERT statement inside the EXEC() statement. If you are using this to get scalar values back from the EXEC() statement to assign variables in the surrounding procedure, you should probably use sp_executesql instead. Dynamic SQL is a complex topic, and if you are not acquainted with it, I recommend you to read my article The Curse and Blessings of Dynamic SQL.

OPENQUERY

Just like INSERT-EXEC this is a method that leaves the called stored procedure untouched. But OPENQUERY is poor choice for the general case, because there are several traps as well as performance issues with OPENQUERY.

This construct was introduced in SQL 7 together with its cousin OPENROWSET to permit you to send pass-through queries to linked servers. It can be very useful, not the least with non-relational data sources such as Active Directory. You can also call back to your own server, permitting you to say things like:

SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC sp_who') WHERE status = 'runnable'

Thus, you can use the result set from a stored procedure right out of the box in a SELECT statement and do whatever you like with it.

While this is neat, it takes some effort to come there. The first thing is that the settings ANSI_NULLS and ANSI_WARNINGS must be ON for queries involving linked servers. For an ad-hoc statement the two commands

SET ANSI_NULLS ON SET ANSI_WARNINGS ON

will suffice. These settings are also ON by default with most means of connection, the main exception being DB-Library clients. If you call OPENQUERY from a stored procedure, you need to be aware of that the ANSI_NULLS setting for a stored procedure is saved with the procedure when it is created. And when you create procedures from Enterprise Manager, ANSI_NULLS is OFF. (EM is a poor tool for editing stored procedures anyway.)

The next thing is to define LOCALSERVER. It may look like a keyword in the example, but it is in fact only a name. This is how you do it:

sp_addlinkedserver @server = 'LOCALSERVER', @srvproduct = '', @provider = 'SQLOLEDB', @datasrc = @@servername

You must have sysadmin or setupadmin permissions to do this.

OPENQUERY opens a new connection to SQL Server. This has some implications:

  • The procedure that you call with OPENQUERY cannot refer temp tables created in the current connection.
  • The new connection has its own default database (defined with sp_addlinkedserver, default is master), so all object specification must include a database name.
  • If you have an open transaction and are holding locks when you call OPENQUERY, the called procedure can not access what you lock. That is, if you are not careful you will block yourself.
  • Connecting is not for free, so there is a performance penalty.

There are some other issues with OPENQUERY that we need to cover more in detail. The first to notice is that its parameters cannot be variables, so you cannot say:

SELECT * FROM OPENQUERY(LOCALSERVER, @sql)

If the stored procedure takes parameters, and these are not known until run-time, you will have to use dynamic SQL with all the baggage that comes with it. See here for an example of using OPENQUERY with dynamic SQL.

The second issue is that OPENQUERY uses the SQLOLEDB provider to connect to the server, and SQLOLEDB first runs the command with SET FMTONLY ON before actually executing it.When FMTONLY is ON, SQL Server does not execute any statements, but only returns metadata about the result sets. FMTONLY can be a cause for confusion in more than one way. One example, is that if the procedure creates a temp table, you will get an error message, since the table never gets created in FMTONLY mode. For instance:

SELECT * FROM OPENQUERY(LOOPBACK, 'EXEC pubs..sp_helpindex authors')

On SQL 2000, this results in:

Server: Msg 208, Level 16, State 1, Line 1 Invalid object name '#spindtab'.

(On SQL 2005, the message is different and longer. Not less confusing, though.) The workaround for this is to say:

SELECT * FROM OPENQUERY(LOCALSERVER, 'SET FMTONLY OFF EXEC pubs..sp_helpindex authors')

But beware! This means that the procedure is actually executed twice, so there certainly is a performance cost. And if the procedure performs updating actions, these are also performed twice which may be the completely wrong thing to do.

A similar issue appears, if the procedure you call uses dynamic SQL to return the results set. For instance:

SELECT * FROM OPENQUERY(LOOPBACK, 'EXEC sp_who2')

returns on SQL 2000:

Could not process object 'EXEC sp_who2'. The OLE DB provider 'SQLOLEDB' indicates that the object has no columns.

Again, the workaround with SET FMTONLY OFF can be applied. (But only on SQL 2000. On SQL 2005 you will get an error, because sp_who2 returns two columns called SPID.)

Yet another possible source of confusion with FMTONLY is demonstrated by this script:

CREATE TABLE nisse (a int NOT NULL) go CREATE PROCEDURE silly_sp @x int AS --SET NOCOUNT ON INSERT nisse VALUES (@x) SELECT @x * @@trancount SELECT @x * 3 go SELECT * FROM OPENQUERY(KESAMETSA, 'EXEC tempdb.dbo.silly_sp 7') go SELECT * FROM nisse go

The yields the same errors as for sp_who2, saying that there are no columns. The reason for this message is that the first "result set" comes from the rows affected message from the INSERT statement, which is thus the "result set" SQLOLEDB finds when it runs the query. Adding SET NOCOUNT ON to the procedure, resolves this issue.

Leaving FMTONLY, we turn to the next thing that may be confusing when using OPENQUERY. When SQLOLEDB comes around to send the command to SQL Server for actual execution, it first issues SET IMPLICIT_TRANSACTIONS ON. With this setting SQL Server starts a transaction when an INSERT, UPDATE or DELETE statement is executed. (This also applies to a few more statements, see Books Online for details.) This can give some surprises. For instance, take the script above. Once SET NOCOUNT ON is in force, this is the output:

----------- 7 (1 row(s) affected) a ----------- (0 row(s) affected)

You see that I get back '7' from the call to silly_sp, which indicates that @@trancount is 1, and there is thus an open transaction, despite there is no BEGIN TRANSACTION in the procedure. I don't get the '21' that I get when execute silly_sp directly, because with OPENQUERY, I only get one result set. You also see that when I SELECT directly from nisse after the call to OPENQUERY, that this table is empty; this is because the implicit transaction was rolled back.

There are a few more things to know about OPENQUERY:

  • If you use table variables rather than temp tables in the called procedure, you don't need to use the trick with SET FMTONLY OFF. Then again, if you are to rewrite the stored procedure you are calling, this article has presented other ways to access its result set that all are superior.
  • There is an alternative to SQLOLEDB. You can set up the linked server with MSDASQL instead, that is OLE DB over ODBC. It seems that MSDASQL does not have the peculiarities that SQLOLEDB has. However, Microsoft deprecates the use of MSDASQL, so it is difficult to recommend it.

By now, you might be overwhelmed by the complexity of the issues with OPENQUERY, and that is sort of the point. OPENQUERY was not intended for accessing the local server, and I think you should think twice before you use it.

Note: all through this section, I have talked about SQLOLEDB. You may know that SQL 2005 comes with a new OLE DB provider, SQL Native Client or SQLNCLI. SQLNCLI is essentially a successor to SQLOLEDB, and all issues I've discussed in this section applies to SQLNCLI as well.

Using XML

Already in SQL 2000, it is possible to use XML for input. You can pass an XML string as an input parameter to a stored procedure, and in the procedure you can use sp_xml_preparedocument and OPENXML to extract data from the XML document into one or more tables. This is a very valuable method to pass tablular data from client level to a stored procedure. But in SQL 2000, you cannot use XML to pass data from one procedure to another, at least not easily.

But this is possible in SQL 2005, thanks to the new xml data type. Here is an outline where we use XML for input data:

CREATE PROCEDURE callee @input xml AS DECLARE @d int EXEC sp_xml_preparedocument @d OUTPUT, @input SELECT .. FROM OPENXML(@d, '/ROOT/NODE/', 2) WITH(...) ... go CREATE PROCEDURE called AS DECLARE @paramdata xml SELECT @paramdata = (SELECT ... ... FOR XML ...) EXEC callee @paramdata

Here I used OPENXML to unpack @input, but SQL 2005 includes support for XQuery, and you could use .nodes() and .value() to extract the data, to relieve you from the use of sp_xml_preparedocument.

As for constructing the XML document, FOR XML has several options, that I am not detailing here. Please see Books Online for details.

Now, is this method a serious candidate or not? From a practical point of view, the methods looks a bit complicated to me, but I should hasten to add that I am only so-so with XML. If you are good at XML and speak XQuery fluently, you may find this method very handy. The method could also be useful, if you already have a procedure that accepts an XML document as its input parameter, and you want to call that procedure from another stored procedure. Then again, you could rewrite the callee to unpack the XML document into a table, and then use one of the table methods.

What about performance? I don't know. Obviously, there is an overhead for first creating the XML document and then taking it apart. Then again, if you are using a table, there is an overhead for saving things to disk when you don't really need to. I would suggest, though, that it matters more about what query plans you get when you use the data in queries, be that from XML or a shared table. When in doubt, benchmark.

So let's say that the jury is still out on this one. But if I am to give an opinion now, is that the method looks more cute than useful.

Using Cursor Variables

This method was suggested by Peter Radocchia. Cursor variables were introduced in SQL 7, but I suspect that many SQL developers are at most only dimly aware of their existence. I never use them myself. Here is an example of how you use them to bring the result set from one procedure to another:

CREATE PROCEDURE get_cursor @cursor CURSOR VARYING OUTPUT AS SET @cursor = CURSOR STATIC FOR SELECT au_id, au_lname, au_fname FROM pubs..authors OPEN @cursor go CREATE PROCEDURE caller AS DECLARE @cursor CURSOR DECLARE @au_id char(11), @au_fname varchar(40), @au_lname varchar(40) SET NOCOUNT ON EXEC get_cursor @cursor OUTPUT WHILE 1 = 1 BEGIN FETCH NEXT FROM @cursor into @au_id, @au_lname, @au_fname IF @@fetch_status <> 0 BREAK PRINT 'Au_id: ' + @au_id + ', name: ' + @au_fname + ' ' + @au_lname END DEALLOCATE @cursor go EXEC caller

Note that the cursor is STATIC. A FAST_FORWARD may also work, but you could get problems if you use the default keyset-driven cursors.

I will have to admit that I see little reason to use this method. Just like INSERT-EXEC, this method requires an exact match between the caller the callee for the column list. And since data is processed row by row, performance is likely to take a serious toll if there are any volumes.

Arrays and Lists in SQL Server

>Top of the Page

An SQL text by Erland Sommarskog, SQL Server MVP.

Introduction

In the public forums for SQL Server, you often see people asking How do I use arrays in SQL Server? Or Why does SELECT * FROM tbl WHERE col IN (@list) not work? This text describes several methods to do this, both good and bad ones. I also present data from performance tests of the various methods.

This is a quite a long article, and if all you want to know is how to handle a comma-separated list, then just click one these three quick links depending on which version of SQL Server you are using: SQL2000, SQL7, SQL6.5. If that answer does not meet your needs, come back and read from the beginning. :-)

If you feel deterred by the sheer length of this article, you should be relieved to know that this is the kind of article where you may come and go as you please. If you are a plain SQL programmer who want to know "how do I?", you may find the section on performance data be too much of nitty-gritty details. On the other hand true SQL buffs who are curious about the performance numbers, might find the explanations of the methods a little tedious and may prefer to skim these parts.

Here is a table of contents:

   Introduction
   The problem
   Overview of the Solutions
   General Performance Considerations
      Interface considerations
      The Benfit of Temp Tables
   The Iterative Method
      List-of-integers
      List-of-strings
   Using a Table of Numbers
      A Core Function
      Permitting Unlimited Input
      Unpacking a Table Column
   Fixed-Length Array Elements
      Fixed Length and the Iterative Method
      Performance and Format Caveats
   OPENXML
      A Don't With OPENXML
   Dynamic SQL
   Making the List Into a SELECT
   Really Slow Methods
   SQL Server 7
      List-of-string Procedure
      An Extravagant List-of-integers Procedure
   SQL Server 6.5
   Data from Performance Tests
      General Disclaimer
      The Contenders
      How the Tests Were Carried Out
      The Results
      Special Observations
      Try This at Home
   Acknowledgements and Feedback

The problem

You have a number of key values, identifying a couple of rows in a table, and you want to retrieve these rows. If you are composing the SELECT statement in the client code, you might have some code that looks like this:

SQL = "SELECT ProductID, ProductName FROM Northwind..Products " & _ "WHERE ProductID IN (" & List & ")" rs = cmd.Execute(SQL)

List is here a variable which you somewhere have assigned a string value of a comma-separated list, for instance "9, 12, 27, 39".

Now you want to use stored procedures. However, you don't seem to find that any apparent way of doing this. Some people try with:

CREATE PROCEDURE get_product_names @ids varchar(50) AS SELECT ProductID, ProductName FROM Northwind..Products WHERE ProductID IN (@ids)

Which they then try to call:

EXEC get_product_names '9, 12, 27, 37'

But this fails with:

Server: Msg 245, Level 16, State 1, Procedure get_product_names, Line 2 Syntax error converting the varchar value '9, 12, 27, 37' to a column of data type int.

This fails, because we are no longer composing an SQL statement dynamically, and @ids is just one value in the IN clause. An IN clause could also read:

... WHERE col IN (@a, @b, @c)

Others that run into the problem say to themselves: If T-SQL has arrays like any other normal language, then this is not a problem. Well, T-SQL does have arrays. They are called tables, and for all matters they are much more general in nature than arrays. If all you want is plain integer-indexed array with a single value in each cell, this is easily emulated in a table. However, you cannot pass a table as a parameter to a stored procedure. When one stored procedure calls another stored procedure this is not much of a problem, because they can agree on a common table to use for their interchange. This is not possible when you call a stored procedure from a client. But: a client can pass a string, and in SQL you can unpack that string into a table, and most of the methods I describe in this article use some technique to do this.

Overview of the Solutions

There are several possible solutions to this problem that can be implemented more or less generically. I present the methods I know of. Here is a summary of the methods that we shall look at, divided into two groups:

The good ones:

  • The iterative method. Looping through a comma-separated list, and returning the elements in a table, either with a used-defined function (UDF) or a stored procedure. Decent performance, and easy to understand and easily extensible.
  • Using a table of numbers to unpack a comma-separated list, either in a UDF or an SP. The fastest of all methods for a list with delimiters.
  • Fixed-length array. Rather than using a comma-separated list, use a string where all elements have the same length. You unpack the string with a table of numbers in a UDF or SP. This is the fastest method I know of.
  • XML. Overkill for a comma-separated list, but marvellous for an array of structured data. Only possible on SQL 2000. Decent performance, but slower than the iterative method.
The ones to stay away from:
  • Dynamic SQL. For a list of numbers, it may appear simpler than any other method, but there are permissions issues to consider. And messier to use for a list of strings. Performance is unacceptable for long input. It is probably the only viable method on SQL 6.5 though.
  • Making the List Into a SELECT. Transform a delimited list into an INSERT statement. Or many INSERT statements. Cute, but cannot handle input longer than ~3-5000 chars, and has no other particular benefit.
  • Really slow methods. Methods that uses charindex, patindex or LIKE. These solutions are just unbelievably slow even for short input.

If you want a shorter review of the various methods, SQL Server MVP Anith Sen has a good summary on his web site at http://www.bizdatasolutions.com/tsql/sqlarrays.asp. His assessment of the methods does not always agree with mine, though.

General Performance Considerations

Before I proceed to the various methods, I like to cover two general issues that both are related to performance.

Interface considerations

Most of these methods can be implemented as a table-valued functions or stored procedures. This is very nice, because it permits you to pick a solution and write one or two functions to put in your database, and then you can use it all over the place. Here is an example of how you would use such a function:

CREATE PROCEDURE get_company_names_iter @customers nvarchar(2000) AS SELECT C.CustomerID, C.CompanyName FROM Northwind..Customers C JOIN charlist_to_table(@customers) s ON C.CustomerID = s.str go EXEC get_company_names_iter 'ALFKI, BONAP, CACTU, FRANK'

If you are on SQL7 you would have to use a stored procedure instead. Please see the SQL7 section for details on the technique.

Now, how should the interface of a function like charlist_to_table be like? The input parameter should be a long text type, preferably text or ntext, so that the function can handle input of any size. But some methods use T-SQL functions that do not support text/ntext, in which case a varchar(8000) or nvarchar(4000) will have to do. You may also prefer to add a parameter that permits you to define another delimiter than comma or similar type of usability parameters. You will occasionally see such parameters in the examples in this text.

How should the table returned from the function look like? Obviously, it should include a column which holds the value of the list elements, and we will return to this in a minute. Sometimes you may also want a column which indicates the position in the list. Such a column is easily achieved with some of the methods I present, but are very difficult to achieve with others.

OK, list-element value, but which data type? You could write a function for every possible data type in SQL Server. In this document I only discuss lists of integers and strings, because it is my assumption that this covers 99% of the cases.

If you have a list of integers, you will have to convert from string to integer somewhere, because the input is text. It may be a matter of taste whether you want to write a function that returns a table of integers, so you can join directly with the table-function column, or whether you prefer to write:

CREATE PROCEDURE get_product_names_tblnum @ids varchar(50) AS SELECT P.ProductName, P.ProductID FROM Northwind..Products P JOIN charlist_to_table(@ids) i ON P.ProductID = convert(int, i.str) go EXEC get_product_names_tblnum '9, 12, 27, 37'

If you return a list of strings, you may think there is no conversion issue, but watch out! SQL Server has two families of string types: The 8-bit char/varchar/text datatypes and the 16-bit nchar/nvarchar/ntext datatypes for Unicode. Which of these should your function return? You may think that returning an nvarchar(4000) column is the best, because then you will get the right result with both 8-bit and 16-bit data, but not so fast! The answer is that if you are to join with a char/varchar column, your function must return varchar and if you join with an nchar/nvarchar column you must return nvarchar. The reason for this is that if you say: SELECT * FROM tbl JOIN charlist_to_table(@list) c ON tbl.varcharcol = c.nvarcharcol tbl.varcharcol will be implicitly converted to nvarchar and this prevents SQL Server from using any index on tbl.varcharcol. This can lead to a significant degrade in performance.

There are two ways to address this. One is to write two functions: one that returns varchar(8000) and one that returns nvarchar(4000). The alternative is to write one function that accepts ntext as input and returns two columns, one varchar and one nvarchar column. For some methods, there is some minor performance cost (5-10%) for the latter arrangement, whereas other methods do not seem to take a toll at all. You will see examples of both strategies in this text.

The Benefit of Temp Tables

Even if you have the data types correct, there is another potential problem with joining directly to a table-valued function (or in case of XML to OPENXML). The optimizer has no information of what is in that table beforehand, and will have to build the query plan from general assumptions. In many situations this will work alright, but sometimes this will work better:

CREATE PROCEDURE get_product_names_tblnum @ids varchar(50) AS CREATE TABLE #temp (id int NOT NULL) INSERT TABLE #temp (id) SELECT convert(int, i.str) FROM charlist_to_table(@ids) SELECT P.ProductName, P.ProductID FROM Northwind..Products P JOIN #temp t ON P.ProductID = t.id go

The reason for this is that a temp table has statistics, so SQL Server may recompile the procedure prior to executing the SELECT statement and use the statistics about the temp table to build a better query plan for the SELECT statement. On the other hand, there is a cost for the recompilation, so you may as well lose performance with using a temp table. As a general rule, only consider a temp table, if you find that joining directly to the function gives you bad performance. (One reader suggested using a table variable instead. Since a table variable does not have statistics, it should not make much difference as to join directly to a table-valued function.)

The Iterative Method

If all you want is a solution and then continue with your coding, this is my suggestion for your pick. There are faster methods than this one, but you need to have very long input to find performance for this method to be intolerable. Where the iterative method wins is simplicity. You just load the functions and then you are on the air. The faster methods require a help table with correct data to work. While not a major issue, it's one more component to keep track of. Also, the code for the iterative method is very easy to understand, not the least if you have a background with C or Visual Basic. This is not the least important if you have an input list with a special format that you need to adapt the code for.

List-of-integers

Here is you would use such a function in SQL 2000 for a list of integers.

CREATE PROCEDURE get_product_names_iter @ids varchar(50) AS SELECT P.ProductName, P.ProductID FROM Northwind..Products P JOIN iter_intlist_to_table(@ids) i ON P.ProductID = i.number go EXEC get_product_names_iter '9 12 27 37'

Yeah, that's right, that's not a comma-separated list. I figured that for a list of integers, the comma does not really have any function, so I opted to use space as separator instead when I wrote this function. Here is the code for iter_intlist_to_table:

CREATE FUNCTION iter_intlist_to_table (@list ntext) RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL, number int NOT NULL) AS BEGIN DECLARE @pos int, @textpos int, @chunklen smallint, @str nvarchar(4000), @tmpstr nvarchar(4000), @leftover nvarchar(4000) SET @textpos = 1 SET @leftover = '' WHILE @textpos <= datalength(@list) / 2 BEGIN SET @chunklen = 4000 - datalength(@leftover) / 2 SET @tmpstr = ltrim(@leftover + substring(@list, @textpos, @chunklen)) SET @textpos = @textpos + @chunklen SET @pos = charindex(' ', @tmpstr) WHILE @pos > 0 BEGIN SET @str = substring(@tmpstr, 1, @pos - 1) INSERT @tbl (number) VALUES(convert(int, @str)) SET @tmpstr = ltrim(substring(@tmpstr, @pos + 1, len(@tmpstr))) SET @pos = charindex(' ', @tmpstr) END SET @leftover = @tmpstr END IF ltrim(rtrim(@leftover)) <> '' INSERT @tbl (number) VALUES(convert(int, @leftover)) RETURN END

The table returned from this function has two columns. number is list element, and listpos the position of the number in the list.

Since this function only deals with numbers, it may surprise you that I am using the Unicode string types, ntext and nvarchar. But I found that I got some 10% better performance with them than the 8-bit data types. Exactly why I don't know, but my guess is that charindex is internally implemented un Unicode, so the 8-bit data types require conversion.

The iteration is a two-level loop. This is because the input parameter is of the type ntext to permit unlimited input. Not all string functions functions in T-SQL accept text/ntext as parameters (and some that do, do not operate beyond the limit of varchar/nvarchar.). Therefore, I chop up the input string in chunks of 4000 characters, before I move on to the main loop where I use charindex to search for space which is my delimiter. If you prefer comma, you can easily change that. When the inner loop is completed, there may be some leftover piece that should be joined to the start of the next chunk. And finally, when the last chunk is done, I pick up the last element in the list.

You might notice that I use both datalength and len. These are two functions that are deceivingly similar, but you need to apply them with care. Both return the length of a string, but datalength includes trailing spaces whereas len does not. More importantly, datalength is the only to support the text/ntext data types. datalength counts bytes, and len counts characters, which is why I divide the return value from datalength by 2, but not the result from len.

The function is naïve enough to assume that @list only includes numbers. If this is not the case, you will get a conversion error, and the batch will be aborted. If you create the list from internal identifiers in client code, you may feel confident that the list is valid, and this is not much of an issue for you. But what if you actually want to apply error-checking? For instance, what if you are a DBA, and you don't trust your client programmers? You can add code to verify that @str is a legal number, but if the test fails, you cannot use RAISERROR since you are in a function. The best you can do is to skip the illegal list item, or insert NULL (in which case you must change the return table to permit NULL). If you would like raise an error, then you need to implement the method in a stored procedure instead. You find an example in the SQL7 section.

List-of-strings

Here is a similar function, but that returns a table of strings.

CREATE FUNCTION iter_charlist_to_table (@list ntext, @delimiter nchar(1) = N',') RETURNS @tbl TABLE (listpos int IDENTITY(1, 1) NOT NULL, str varchar(4000), nstr nvarchar(2000)) AS BEGIN DECLARE @pos int, @textpos int, @chunklen smallint, @tmpstr nvarchar(4000), @leftover nvarchar(4000), @tmpval nvarchar(4000) SET @textpos = 1 SET @leftover = '' WHILE @textpos <= datalength(@list) / 2 BEGIN SET @chunklen = 4000 - datalength(@leftover) / 2 SET @tmpstr = @leftover + substring(@list, @textpos, @chunklen) SET @textpos = @textpos + @chunklen SET @pos = charindex(@delimiter, @tmpstr) WHILE @pos > 0 BEGIN SET @tmpval = ltrim(rtrim(left(@tmpstr, @pos - 1))) INSERT @tbl (str, nstr) VALUES(@tmpval, @tmpval) SET @tmpstr = substring(@tmpstr, @pos + 1, len(@tmpstr)) SET @pos = charindex(@delimiter, @tmpstr) END SET @leftover = @tmpstr END INSERT @tbl(str, nstr) VALUES (ltrim(rtrim(@leftover)), ltrim(rtrim(@leftover))) RETURN END

Here is an example on how you would use the function:

CREATE PROCEDURE get_company_names_iter @customers nvarchar(2000) AS SELECT C.CustomerID, C.CompanyName FROM Northwind..Customers C JOIN iter_charlist_to_table(@customers, DEFAULT) s ON C.CustomerID = s.nstr go EXEC get_company_names_iter 'ALFKI, BONAP, CACTU, FRANK'

This function is very similar to iter_intlist_to_table. There is one difference in the interface: you can specify which delimiter to use. There is a default value for the delimiter, but even if you are satisfied with it, you must specify DEFAULT when you call the function, because T-SQL does not permit you to leave to a function argument out completely. No matter which delimiter you use, leading and trailing blanks are stripped from the resulting strings.

The return table has two columns beside listpos, namely str and nstr which both contain the list elements, one is varchar and the other is nvarchar, and you should use the column that matches the column you are joining with. Since Northwind..Customers.CustomerID is nchar(10), nstr is the choice in the example. Do not use nstr when you join with a varchar column, because this can give very bad performance! (I discuss this in the section General Interface Considerations above.) You may think that this extra column comes with a performance cost, but I was not able to detect this when I ran my performance tests.

One thing which is nice with this solution, is that it is fairly extensible. Say that your input string looks like something like this: "Yes, I want to", "Charlie", "Give it to me"

That is the list items are enclosed by quotes, and the list delimiter can appear in the strings. The function above does not handle this format, but writing a function that handles this sort of input is no big deal.

Using a Table of Numbers

The fastest way to unpack a comma-separated list is to use a table numbers. A table of numbers is simply a table with one single integer column, with the values 1, 2, 3, ... Here is an easy way to fill in such a table:

SELECT TOP 8000 Number = IDENTITY(int, 1, 1) INTO Numbers FROM pubs..authors t1, pubs..authors t2, pubs..authors t3

A Core Function

I first show you a version that is limited to an input string of 7998 characters to demonstrate the core algorithm. I culled this function from SQL Server MVP Anith Sen's web site.

CREATE FUNCTION inline_split_me (@param varchar(7998)) RETURNS TABLE AS RETURN(SELECT substring(',' + @param + ',', Number + 1, charindex(',', ',' + @param + ',', Number + 1) - Number - 1) AS Value FROM Numbers WHERE Number <= len(',' + @param + ',') - 1 AND substring(',' + @param + ',', Number, 1) = ',')

While the iterative solution was somewhat long-winding but straightforward, this approach is compact, but not all readers may grasp this SQL in the first go. (I had a hard time myself.) The first thing to observe is the expression  ',' + @param + ',' which reoccurs no less than four times. By adding the delimiter in beginning and at the end of the string, the first and last items in the string appear in the same context as all other items.

Next, let's turn to the WHERE clause. The expression: substring(',' + @param + ',', Number, 1) = ','

evaluates to TRUE for all positions in the string where the delimiter appears. The expression: Number <= len(',' + @param + ',') - 1 simply sets an upper limit of which numbers we are using.

Let's now look at the SELECTed column Value. We extract the strings with substring, starting on the position after the delimiter. We find the length of the substring by searching for the next delimiter with charindex, taking use of its third parameter which tells charindex where to start searching. Once the next delimiter is found, we subtract the position for the current delimiter and then one more, as the delimiter itself should not be included in the extracted string, to get the length of the list item.

Here is an example on how to use this particular function: CREATE PROCEDURE get_company_names_inline @customers nvarchar(2000) AS SELECT C.CustomerID, C.CompanyName FROM Northwind..Customers C JOIN inline_split_me(@customers) s ON C.CustomerID = s.Value go EXEC get_company_names_inline 'ALFKI,BONAP,CACTU,FRANK'

Note here that, since this function does not strip leading and trailing blanks, there must be no spaces in the list.

This function is an inline function, in difference from the iterative functions which are multi-step functions. An inline function is basically a macro, so that when the optimizer builds the query plan, it inserts the text of the function into the query, making further optimizations possible. I find it difficult to believe though, that for this particular function that you will get much benefit from the function being inline. The function is too convoluted for that, and I have not been able to detect any considerable advantages in my performance tests.

Permitting Unlimited Input

The function inline_split_me limits the input to 7998 characters (or 3998 for nvarchar). It is not possible to use text or ntext for the input parameter, because you cannot use text/ntext parameters in expressions such as ',' + @param + ','. Even more importantly, charindex does not handle text/ntext data beyond the limit of varchar/nvarchar.

However, this is no fatal obstacle. As with the iterative functions we can break up the text in chunks. Here is a function that does this:

CREATE FUNCTION duo_text_split_me(@list ntext, @delim nchar(1) = N',') RETURNS @t TABLE (str varchar(4000), nstr nvarchar(2000)) AS BEGIN DECLARE @slices TABLE (slice nvarchar(4000) NOT NULL) DECLARE @slice nvarchar(4000), @textpos int, @maxlen int, @stoppos int SELECT @textpos = 1, @maxlen = 4000 - 2 WHILE datalength(@list) / 2 - (@textpos - 1) >= @maxlen BEGIN SELECT @slice = substring(@list, @textpos, @maxlen) SELECT @stoppos = @maxlen - charindex(@delim, reverse(@slice)) INSERT @slices (slice) VALUES (@delim + left(@slice, @stoppos) + @delim) SELECT @textpos = @textpos - 1 + @stoppos + 2 -- On the other side of the comma. END INSERT @slices (slice) VALUES (@delim + substring(@list, @textpos, @maxlen) + @delim) INSERT @t (str, nstr) SELECT str, str FROM (SELECT str = ltrim(rtrim(substring(s.slice, N.Number + 1, charindex(@delim, s.slice, N.Number + 1) - N.Number - 1))) FROM Numbers N JOIN @slices s ON N.Number <= len(s.slice) - 1 AND substring(s.slice, N.Number, 1) = @delim) AS x RETURN END

We first split up the text in slices that we put in the table variable @slices. When we do this, we need to apply some care, so that the last character in a slice is a delimiter. To find the last delimiter in a chunk that we chopped off from the input parameter, we feed charindex the result of the reverse function, neat little trick. While we're at it, we make sure that all rows in @slices start and end with a delimiter. You may note that if the input text is within the limits of a regular nvarchar, we will never enter the loop, but just insert the text directly in to the @slices table.

Once @slices is populated, we apply the core expression. Note here that we do not need to iterate over @slices; we can join directly with Numbers. We use a derived table, to save us from repeating the complex expression with substring and charindex, and which we now have augmented with trim functions to remove leading and trailing spaces. (A derived table is a table constructed by a SELECT expression in the middle of a query, this is a very very useful tool when building complex queries. The alias AS x may not seem to serve a purpose, but the SQL syntax mandates that a derived table must have an alias.)

As with iter_charlist_to_table, this function returns a table with both a varchar and an nvarchar column. In this case, though, my tests indicate that there is a cost of 3-5% over returning a table with only a nvarchar column. Since we are talking execution times in milliseconds, this cost is not likely to have any importance. The performance data that I present, is from a function that only returns an nvarchar column, though.

There is no listpos column here. This is somewhat more difficult to achieve with this method. One way is to add a sliceno column to @slices and an IDENTITY column to the result table, and then order by sliceno and Number when selecting into the result table. However, it is not fully clear whether you can rely on that the IDENTITY values are actually assigned according to the ORDER BY clause. A more reliable way is to join Numbers with itself, as shown by an example on Anith Sen's web site.

You might now be waiting for a function that unpacks a list of numbers, but I am not presenting such a creature. You could write a new function which uses convert in the right places, but you can also do like this:

CREATE PROCEDURE get_product_names_tblnum @ids varchar(50) AS SELECT P.ProductName, P.ProductID FROM Northwind..Products P JOIN duo_text_split_me(@ids, DEFAULT) i ON P.ProductID = convert(int, i.str) go EXEC get_product_names_tblnum '9, 12, 27, 37'

Unpacking a Table Column

In this text I focus on the case that the list you like to process is a single list that comes from a client. Occasionally you might stumble on a table contains one or more columns with comma-separated lists. For instance:

CREATE TABLE jackets (model varchar(30), sizes varchar(200), colours varchar(200))

Here sizes and colours contain comma-separated lists of the sizes and colours the jackets are available in. This database design violates the first normal form, and let me stress that, in the very most cases, this is a very poor design. In any case, say that you need to unpack these lists into tables. You can of course use any of the functions that we have looked at here, but you would need to iterate over the table, and unpack row by row, because you cannot pass a table column as a parameter to a table-valued function. As you may know, iterating over a table can be magnitudes slower than applying a set-based operation on all rows in one statement, so therefore it may be a better idea to join directly with the Numbers table. I'm not giving an example here, but you saw the way to do it, in the code for duo_text_split_me, when we joined with the @slices table.

Fixed-Length Array Elements

This is a method that was proposed by SQL Server MVP Steve Kass, inspired by an idea in Ken Henderson's book The Guru's Guide to Transact-SQL.

This basic idea is that rather than defining the array as a delimited string, the list elements have fixed length. An advantage from a usability point of view is that you don't have to worry about a delimiter appearing in the data. But the main reason for this approach is performance. This is the fastest of all methods presented here (but see the caveats below). Here is a demonstration of the technique, applied directly on a table:

CREATE PROCEDURE get_product_names_fix @ids varchar(8000), @itemlen tinyint AS SELECT P.ProductID, P.ProductName FROM Northwind..Products P JOIN Numbers n ON P.ProductID = convert(int, substring(@ids, @itemlen * (n.Number - 1) + 1, @itemlen)) AND n.Number <= len(@ids) / @itemlen GO EXEC get_product_names_fix ' 9 12 27 37', 4

Each element in the "array" has the same length, as specified by the parameter @itemlen. We use the substring function to extract each individual element. The table Numbers that appears here, is the same table that we created in the beginning of the section Using a table of Numbers.

Here is a function that embeds the unpacking of the string.

CREATE FUNCTION fixstring_single(@str text, @itemlen tinyint) RETURNS TABLE AS RETURN(SELECT listpos = n.Number, str = substring(@str, @itemlen * (n.Number - 1) + 1, @itemlen) FROM Numbers n WHERE n.Number <= datalength(@str) / @itemlen + CASE datalength(@str) % @itemlen WHEN 0 THEN 0 ELSE 1 END)

The purpose of the expression on the last line is to permit the last element in the array to be shorter than the others, in case trailing blanks have been stripped. Here is an example using fixstring_single with a list of strings:

CREATE PROCEDURE get_company_names_fix @customers nvarchar(2000) AS SELECT C.CustomerID, C.CompanyName FROM Northwind..Customers C JOIN fixstring_single(@customers, 6) s ON C.CustomerID = s.str go EXEC get_company_names_fix 'ALFKI BONAP CACTU FRANK'

This function uses text for the input parameter, and returns a varchar column, so for Unicode data you would need a twin function that accepts ntext for input and nvarchar for return data. For other methods in this article, I've opted to return varchar and nvarchar from the same function, but this has a bigger overhead for this method (because it's so fast). Beware that you don't use a Unicode function to join with a char/varchar column; this can wreck performance completely, see General Interface Considerations.

The result set also includes the listpos column, which gives the position of the item in the array. Very conveniently, this is the same as the number from the Numbers table.

fixstring_single handles input with as many elements as there are numbers in Numbers. The number I have used in this article, 8000, is likely to be sufficient for most applications. If you need to support more elements, you can just add more numbers to Numbers. If you really feel that you want to support unlimited input, you could write a multi-step function that chops up the input string in slices. Here's another twist, that Steve Kass came up with. This is a function that uses a self-join, so that with 8000 rows in Numbers fixstring_multi handles up to 64 million list items:

CREATE FUNCTION fixstring_multi(@str text, @itemlen tinyint) RETURNS TABLE AS RETURN(SELECT listpos = n1.Number + m.maxnum * (n2.Number - 1), str = substring(@str, @itemlen * (n1.Number + m.maxnum * (n2.Number - 1) - 1) + 1, @itemlen) FROM Numbers n1 CROSS JOIN (SELECT maxnum = MAX(Number) FROM Numbers) AS m JOIN Numbers n2 ON @itemlen * (n1.Number + m.maxnum * (n2.Number - 1) - 1) + 1 <= datalength(@str) WHERE n2.Number <= datalength(@str) / (m.maxnum * @itemlen) + 1 AND n1.Number <= CASE WHEN datalength(@str) / @itemlen <= m.maxnum THEN datalength(@str) / @itemlen + CASE datalength(@str) % @itemlen WHEN 0 THEN 0 ELSE 1 END ELSE m.maxnum END)

This is a more complex function that fixstring_single. I leave it as an exercise to the reader to understand what's going on, and I only make a note about the line with CROSS JOIN: this saves me from hard-coding the number of rows in Numbers.

Fixed Length and the Iterative Method

Since what is special with this method is the input format, not the algorithm itself, you could also use this input format with the iterative method. This might appeal to you, if the table of numbers makes you nervous. (If the table does not have the numbers as advertised, methods relying on it will yield incorrect result.) However, the performance benefit of using the iterative method with fixed-length input over delimited input is not extraordinary, and it is slower than using a table with numbers and delimited input. I'm not including an example of an iterative function using fixed input format, but in the full test data, you can see how well it fares.

Performance and Format Caveats

If you look in the performance section below, you will see that this method outsmarts everything else. But there are a few caveats you should keep in mind, of which the last one is the most important.

First caveat: The fixed format is more sensitive to disruptions. One extra character somewhere, and the the rest of the string is completely wrecked. When you compose a string in an application, this is not so much of an issue, once you have gotten it right. This leads, however, to the

Second caveat: The cost for composing the fixed-length may take more programming power in the client than composing a comma-separated list. I have not considered this for my performance tests, as it would have made those tests far more complex. However, consider these two points: 1) Client languages are generally better in string handling than SQL. 2) Moving execution from the server to the client means that the overall system scales better. So even if there is a certain toll on the client side, it is not likely to overshadow the gain on the server side.

Third caveat: The strings you pass from client to server when using fixed length are longer, so there is more network overhead. This could be an issue over slow links or on a crowded network. But it is not very likely that this is something to worry about. However, the observation on size leads to the

Fourth and most important caveat: It matters how you invoke a procedure using this method. My test script for the performance tests uses RPC (remote procedure call), and this is the recommended way to call a stored procedure from a client program. (For instance, if you use adCmdStoredProcedure for the CommandType on the Command object in ADO, you use RPC.) The other way to call a stored procedure is to send a command-text batch with an EXEC statement, which is what you do in Query Analyzer. I found that for a certain length of the input string, this method is actually somewhat slower than using a comma-separated list with the table-of-numbers function duo_text_split_me. In fact, I had an alternate version of fixstring_multi which was somewhat faster than the one above as long as I invoked it through RPC. But when I invoked it in a command batch, the response time was several seconds, far beyond what is acceptable. Why this is so, I don't know. But the string size surely has something to do with it. I have not examined at which size the effects sets in, but it might be around 8000, that is the limit for a regular varchar.

OPENXML

XML has emerged as a standard for data interchange in recent years. Just like HTML, XML is derived from the general SGML standard, so the visual appearance of XML and HTML is similar. But there are important differences. Unlike HTML, XML is case-sensitive and a single error in an XML document invalidates the document. The purpose of XML is not presentation, but transport of structured data. Microsoft added extensive support for XML in SQL Server 2000, and what interests us here is the OPENXML function which extracts data from an XML string and returns the data as a table.

This opens for the possibility to encode a list of values as an XML string, and then extract the values with OPENXML in T-SQL. But to be blunt:. this is not the right way to do it. Not because of performance: while slower than any other method we have looked at so far, performance is still perfectly acceptable for most situations. No, what disqualifies XML is simply that it is too complex for the task. Unless, of course, you already have an XML document with the values you want to use. But to build an XML string only to mimic an array is overkill in my opinion.

But: if you need to send SQL Server an array of structured data – which even might be nested – to insert data into one or more tables, XML is truly a great asset.

Let's nevertheless first look at the case of a simple list. This is how the XML version of get_product_names looks like:

CREATE PROCEDURE get_product_names_xml @ids ntext AS DECLARE @idoc int, @err int EXEC @err = sp_xml_preparedocument @idoc OUTPUT, @ids SELECT @err = @@error + coalesce(@err, 4711) IF @err <> 0 RETURN @err SELECT P.ProductName, P.ProductID FROM Northwind..Products P JOIN OPENXML(@idoc, '/Root/Num', 1) WITH (num int) AS a ON P.ProductID = a.num EXEC sp_xml_removedocument @idoc go EXEC get_product_names_xml N'<Root><Num num="9"/><Num num="12"/> <Num num="27"/><Num num="37"/></Root>

The important thing to notice here is that we first call sp_xml_prepraredocument which causes SQL Server to build an internal representation of the XML string. @idoc is a reference to that representation, which we use when we invoke OPENXML, which is kind of a table function, although the exact term used by Books Online is rowset provider. Before we exit the routine, we call sp_xml_removedocument to free up the memory allocated for the XML document. Failing to do this can lead to a memory leaks which eventually can prevent sp_xml_preparedocument from completing successfully. (By the way, while it is good habit to check return values from stored procedures, don't do this for sp_xml_removedocument. It always returns 1, which is a known bug. Check @@error instead.)

When you compose an XML string, you need to be careful, because there are several characters such as <, & and " serves as meta characters in XML, so you need to encode them in the same manner as in HTML. The same applies to non-ASCII data if you are not using Unicode. Also, you need to keep in mind that XML is very strict on that all tags must be closed. Thus, you are probably better off using some library routine to compose the XML string. This is not really my area, but MS XML seems to offer some methods for this, as does the XmlTextWriter class in the .Net Framework.

Where the power of XML comes into full play is when you have a bigger set of data that you want to insert into SQL Server, for instance orders and their order details. Before the advent of XML in SQL Server, you did not have much better choice than to call a stored procedure for each row to insert. (There has always been bulk-copy, but this is far from always a convenient solution.) With XML, you can compose an XML document of all that data, and you have a stored procedure which calls sp_xml_prepredocument once, and then you invoke OPENXML twice: first for the orders, and next for the order details. Replacing thousands of calls to stored procedures over the network with a single one, this can give a huge performance benefit.

I'm not including an example of this, as it would take up some space, and this article is already far too long. Rather I leave this as a teaser. If you are completely unacquainted with XML and using with XML with SQL Server, you might consider picking up a book or two on the subject. You might also find some useful information on SQL Server MVP Bryant Likes's site http://www.sqlxml.org.

Note: those who know XML, know that the above is attribute-centred XML. An alternative way to compose the XML string is to use element-centred XML (or a mix of the two). I'm not including any example of element-centred XML, as it there is little practical difference for unpacking a list, neither is there in terms of performance.

A Don't With OPENXML

Occasionally I have seen the suggestion that you should have a stored procedure that accepts a delimited string as parameter, and which uses the replace function in T-SQL to convert this string to XML before passing it to OPENXML. In my opinion, this is a bad idea for several reasons:

  1. Since the output of replace must be varchar, the resulting XML string cannot be longer than 8000 characters and thus the input string is even more restricted in length.
  2. You need to run several passes with replace to handle all characters that must be escaped (see above). The example I saw, completely ignored to do this.
  3. And since XML is slower than other methods that are easier to use, it is completely pointless anyway.

Dynamic SQL

For a list of numbers, this method may seem deceivingly palatable in its simplicity:

CREATE PROCEDURE get_product_names_exec @ids varchar(255) AS EXEC('SELECT ProductName, ProductID FROM Northwind..Products WHERE ProductID IN (' + @ids + ')') go EXEC get_product_names_exec '9, 12, 27, 37'

This example looks very similar to the client-code example in the beginning of this article. And in fact, this method is just a variant of sending down SQL statements from the client and it has the same issues, which we shall look at in a moment. First, though, let's look at using this method with a list of strings, to see that in this case the method is not equally attractive:

CREATE PROCEDURE get_company_names_exec @customers nvarchar(2000) AS EXEC('SELECT CustomerID, CompanyName FROM Northwind..Customers WHERE CustomerID IN (' + @customers + ')') go EXEC get_company_names_exec '''ALFKI'', ''BONAP'', ''CACTU'', ''FRANK'''

The nested quotes make procedures using this method difficult to call.

So what are the issues with this method?

  • Performance. This method is slower any of the other methods we have looked at this far, and execution times grows very fast when the input string grows in size. With a string of 1000 items, the other methods still operates in sub-second, but dynamic SQL may take over ten seconds. It is not the execution itself that takes up all that time; it's building the query plan from that long IN expression that occupies SQL Server. The plan is saved in cache, so if you use exactly the same list a second time, execution time can be shorter than for any most other methods. But it has to be a exactly the same list: one single character that is different and SQL Server will spend another ten seconds to build a plan. .
  • Permissions. This method requires that the user has direct SELECT permission to the involved tables. With a regular stored procedure, the user only needs EXEC permisssion on the procedure, and then the rights on the tables of the procedure owner is transferred to the user.
  • SQL Injection. This is a method by which an intruder can enter data into an input field, and if that input field is passed without further control to build an SQL statement dynamically, the intruder can cause the SQL batch to include something else than intended.

For further details on dynamic SQL, see my web article, The Curse and Blessings of Dynamic SQL.

Often when people ask how to handle comma-separated lists in the SQL newsgroups, dynamic SQL is presented as a solution. Therefore, I like to stress that this is a poor solution, not the least when there are nice and simple generic methods for converting a list to a table. So if you see this advice on the newsgroups, ignore it. And if you have the habit of giving it, please stop doing it.

There is one exception: on SQL Server 6.5 the other methods can only support lists that fit into varchar(255), so dynamic SQL is probably the only viable solution there.

Making the List Into a SELECT

In this section I've collected methods where you transform the list into one or more SELECT statement which you excecute with dynamic SQL. Here is the best of these three methods:

CREATE PROCEDURE unpack_without_union @list varchar(8000), @tbl varchar(30), @delimiter char(1) = ',' AS DECLARE @sql varchar(8000), @select varchar(8000) SET @select = 'SELECT ' + REPLACE(@list, @delimiter, ' SELECT ') SET @select = REPLACE(@select, '''', '''''') SET @sql = 'INSERT ' + @tbl + ' EXEC(''' + @select + ''')' EXEC (@sql)

This method builds on idea suggested by Jim Ebbers. Since the code is somewhat obscure with its double use of dynamic SQL, before I explain it, I directly proceed to a method suggested by SQL Server MVP Steve Kass, which is easier to understand, albeit not as good:

CREATE PROCEDURE unpack_with_union @list varchar(8000), @tbl varchar(30), @delimiter char(1) = ',' AS DECLARE @sql varchar(8000) SET @sql = 'INSERT INTO ' + @tbl + ' SELECT ''' + REPLACE(@list, @delimiter, ''' UNION ALL SELECT ''') + '''' EXEC (@sql)

The idea is that you transform the list to a SELECT statement with help of the UNION ALL operator. (Only UNION removes duplicates, UNION ALL includes them.) Then you use dynamic SQL to execute that statement and insert the data into a temp table, of which the name is passed to the procedure. Since the the dynamic SQL only refers to a temp table, there is no permissions issue here.

In the first method, I have made use of what Jim made me aware of: If you say INSERT tbl EXEC('SELECT 1 SELECT 2 SELECT 3'), INSERT will handle all those result sets from EXEC(), as if it was one and only. Notice that I also cater for the fact that there may be single quotes in the input. This is in fact necessary with unpack_with_union as well, but if you do it once in unpack_with_union, you should double that up in unpack_without_union. Why, I leave as an exercise to the reader to understand.

Here is how you would use these methods:

CREATE PROCEDURE get_company_names_nounion @customers varchar(8000) AS CREATE TABLE #temp (custid nchar(10) NOT NULL) EXEC unpack_without_union @customers, '#temp' SELECT C.CustomerID, C.CompanyName FROM Northwind..Customers C JOIN #temp t ON C.CustomerID = t.custid go EXEC get_company_names_nounion 'ALFKI,BONAP,CACTU,FRANK'

These method can possibly get a reward for being cute, but that's about it. Both methods, as presented here, have a fairly low limit on the maximum number of elements, since all the space you splice in takes up space. Jim's method is better than Steve's since he splices in less text. But for a list with 500 elements, that's still 3500 extra characters caused by those SELECT. You could write a version that takes a ntext parameter and breaks it up in slices. I have not pursued this, since my performance tests indicate that these method are slower than OPENXML, which in its turn is slower than the iterative method, and using a table with numbers. Neither is there any other particular advantage with these methods.

When I originally conducted the performance tests for this article, I was not aware of Jim's method, and since then I have changed hardware, why it is not possible to paste in the numbers for Jim's method (code-name NOUNION) with the old data. However, re-testing only OPENXML, UNION and NOUNION, I find that NOUNION is twice the speed of UNION, but still lags behind OPENXML.

SQL Server MVP Anith Sen proposed a similar method, where he simple converts the list to a bunch of INSERT statements: CREATE PROCEDURE unpack_with_insert @list varchar(8000), @tbl varchar(30), @delimiter char(1) = ',' AS DECLARE @sql varchar(8000) SET @sql = 'INSERT ' + @tbl + ' VALUES (' + REPLACE(@list, ',', ') INSERT ' + @tbl + ' VALUES (') + ')' EXEC (@sql) This method has the same problems as the other two methods in this section. In my performance tests it fared even worse than UNION.

Really Slow Methods

In a Q&A column of an SQL journal, the following solution was suggested: CREATE PROCEDURE get_company_names_charindex @customers nvarchar(2000) AS SELECT CustomerID, CompanyName FROM Northwind..Customers WHERE charindex(',' + CustomerID + ',', ',' + @customers + ',') > 0 go EXEC get_company_names_charindex 'ALFKI,BONAP,CACTU,FRANK'

You may recognize the theme from when we used a table of numbers. By adding commas of both sides of the input string, we can use charindex to find ",ALFKI," etc. (Note that you cannot have embedded blanks here.)

The author noted in his column that this method would not have good performance, since embedding the table column in an expression precludes use of any index on that column, leading to a table scan. But that's only a small part of the story. A plain table scan on my test table takes less than 100 ms, once the table is entirely in cache. This method needs 42 seconds, even for my shortest test list of 200 characters and 15 items!

Variations on this theme are illustrated by these WHERE clauses: WHERE patindex('%,' + CustomerID + ',%', ',' + @customers + ',') > 0 WHERE ',' + @customers + ',' LIKE '%,' + CustomerID + ',%'

The solution with patindex also needs another 42 seconds. The solution with LIKE was actually four times faster on one of my test machines (but exactly as slow as charindex and patindex on the other machines.) Unfortunately, it is still 100 times slower than dynamic SQL, and 200 times slower than the best methods.

SQL Server 7

If you are using SQL Server 7, you don't have access to user-defined functions and not to XML. Still you can implement the iterative method or use a table of numbers, with comma-separated or fixed-length input, in stored procedures instead. To demonstrate this, I'm including examples for the iterative method here. You could easily apply the same technique for the other two methods.

List-of-string Procedure

Here is a stored procedure which is very similar to the function iter_charlist_to_table above. Rather than returning a table variable, the procedure fills in the temp table #strings:

CREATE PROCEDURE charlist_to_table_sp @list ntext, @delimiter char(1) = N',' AS DECLARE @pos int, @textpos int, @chunklen smallint, @tmpstr nvarchar(4000), @leftover nvarchar(4000), @tmpval nvarchar(4000) SET NOCOUNT ON SELECT @textpos = 1, @leftover = '' WHILE @textpos <= datalength(@list) / 2 BEGIN SELECT @chunklen = 4000 - datalength(@leftover) / 2 SELECT @tmpstr = @leftover + substring(@list, @textpos, @chunklen) SELECT @textpos = @textpos + @chunklen SELECT @pos = charindex(@delimiter, @tmpstr) WHILE @pos > 0 BEGIN SELECT @tmpval = left(@tmpstr, @pos - 1) SELECT @tmpval = ltrim(rtrim(@tmpval)) INSERT #strings(str) VALUES (@tmpval) SELECT @tmpstr = substring(@tmpstr, @pos + 1, len(@tmpstr)) SELECT @pos = charindex(@delimiter, @tmpstr) END SELECT @leftover = @tmpstr END INSERT #strings(str) VALUES(ltrim(rtrim(@leftover)))

And here is an example of how you use it:

CREATE PROCEDURE get_company_names_iterproc @customerids nvarchar(2000) AS CREATE TABLE #strings (str nchar(10) NOT NULL) EXEC charlist_to_table_sp @customerids SELECT C.CustomerID, C.CompanyName FROM Northwind..Customers C JOIN #strings s ON C.CustomerID = s.str go EXEC get_company_names_iterproc 'ALFKI, BONAP, CACTU, FRANK'

The function for unpacking a list of strings had a dual return table, with both a varchar and an nvarchar column. There is not really any reason for this when you use a procedure. The caller specifies the data type that fits with the kind of data he is working with. In the same manner, the function does not populate any listpos column; if you want to keep track of list position, the caller can include an IDENTITY column in the table.

In an earlier incarnation of this article, I suggested that you could have the table name to be a parameter, and the use dynamic SQL to insert into that table. However, this was poor advice. When I ran my performance tests, I found that the performance penalty for the use of dynamic SQL for each INSERT statement is too big to be acceptable. I discuss this further in the section special observations in the performance tests..

There is however a potential performance problem with the solution above as well. It depends on how use it, but normally it will be a new temp table each time. In this case, charlist_to_table_sp will be recompiled on each invocation. In many situations, this would not be an issue. In fact, I found in my performance tests that this procedure was somewhat faster the corresponding function to unpack a string into table, despite the recompilation. However, on a busy system with an intensive call frequency you could experience blocking from compile locks, see KB article 263889.

One way to avoid this would be to use a permanent table, looking something like this:

CREATE TABLE stringarray (spid int NOT NULL, listpos int NOT NULL, str varchar(4000) NOT NULL, nstr nvarchar(2000) NOT NULL, CONSTRAINT pk_array PRIMARY KEY (spid, listpos))

For the column spid, you would use @@spid, the process id. A procedure populating this table would as its first statement include:

DELETE stringarray WHERE spid = @@spid And a caller would need to remember to include the spid column in all references as in this example: CREATE PROCEDURE get_company_names_spid @customerids nvarchar(2000) AS EXEC charlist_to_table_spid @customerids SELECT C.CustomerID, C.CompanyName FROM Northwind..Customers C JOIN stringarray s ON C.CustomerID = s.nstr WHERE s.spid = @@spid DELETE stringarray WHERE spid = @@spid -- Housekeeping. go EXEC get_company_names_spid 'ALFKI, BONAP, CACTU, FRANK'

(Follow the link to see the source code of charlist_to_table_spid.)

Yet another alternative would be to have the temp table local to the procedure, and then produce a result set, which the caller can catch with INSERT EXEC. For more details about this technique, and its weaknesses, see my article How to share data between stored procedures.

An Extravagant List-of-integers Procedure

The technique in the previous section can of course be applied to a list of integers as well, so what comes here is not a true port of the iter_intlist_to_table function, but a version that goes head over heels to validate that the list items are valid numbers to avoid a conversion error. And to be extra ambitious, the procedure permits for signed numbers such as +98 or -83. If a list item is not a legal number, the procedure produces a warning. The procedure fills in a temp table that has a listpos column; this column will show a gap if there is an illegal item in the input.

CREATE PROCEDURE intlist_to_table_sp @list ntext AS DECLARE @pos int, @textpos int, @listpos int, @chunklen smallint, @str nvarchar(4000), @tmpstr nvarchar(4000), @leftover nvarchar(4000) SET NOCOUNT ON SELECT @textpos = 1, @listpos = 1, @leftover = '' WHILE @textpos <= datalength(@list) / 2 BEGIN SELECT @chunklen = 4000 - datalength(@leftover) / 2 SELECT @tmpstr = ltrim(@leftover + substring(@list, @textpos, @chunklen)) SELECT @textpos = @textpos + @chunklen SELECT @pos = charindex(' ', @tmpstr) WHILE @pos > 0 BEGIN SELECT @str = rtrim(ltrim(substring(@tmpstr, 1, @pos - 1))) EXEC insert_str_to_number @str, @listpos SELECT @listpos = @listpos + 1 SELECT @tmpstr = ltrim(substring(@tmpstr, @pos + 1, len(@tmpstr))) SELECT @pos = charindex(' ', @tmpstr) END SELECT @leftover = @tmpstr END IF ltrim(rtrim(@leftover)) <> '' EXEC insert_str_to_number @leftover, @listpos go -- This is a sub-procedure to intlist_to_table_sp CREATE PROCEDURE insert_str_to_number @str nvarchar(200), @listpos int AS DECLARE @number int, @orgstr nvarchar(200), @sign smallint, @decimal decimal(10, 0) SELECT @orgstr = @str IF substring(@str, 1, 1) IN ('-', '+') BEGIN SELECT @sign = CASE substring(@str, 1, 1) WHEN '-' THEN -1 WHEN '+' THEN 1 END SELECT @str = substring(@str, 2, len(@str)) END ELSE SELECT @sign = 1 IF @str LIKE '%[0-9]%' AND @str NOT LIKE '%[^0-9]%' BEGIN IF len(@str) <= 9 SELECT @number = convert(int, @str) ELSE IF len(@str) = 10 BEGIN SELECT @decimal = convert(decimal(10, 0), @str) IF @decimal <= convert(int, 0x7FFFFFFF) SELECT @number = @decimal END END IF @number IS NOT NULL INSERT #numbers (listpos, number) VALUES (@listpos, @sign * @number) ELSE RAISERROR('Warning: at position %d, the string "%s" is not an legal integer', 10, -1, @listpos, @orgstr) go

Here is how you would use it: CREATE PROCEDURE get_product_names_iterproc @ids varchar(50) AS CREATE TABLE #numbers (listpos int NOT NULL, number int NOT NULL) EXEC intlist_to_table_sp @ids SELECT P.ProductID, P.ProductName FROM Northwind..Products P JOIN #numbers n ON P.ProductID = n.number go EXEC get_product_names_iterproc '9 12 27 37'

The validation of the list item is in the sub-procedure insert_str_to_number. For many purposes it would be sufficient to have the test @str NOT LIKE '%[^0-9]%' AND len(@str) BETWEEN 1 AND 9

which checks that @str only contain digits and is at most nine digits long (that is, you disapprove ten-digit numbers as well as signed numbers).

You might guess that there is a performance cost for this extravaganza, and indeed the procedure needs about 50% more time than the corresponding function. Still, for many situations, the execution time is acceptable.

One note about the warning produced with RAISERROR: with ADO, this warning may be difficult or impossible to detect on client level. If you change the severity from 10 to 11, it will be an error, and raise an error in your client code.

SQL Server 6.5

If you are on SQL 6.5, you are far more limited in what you can do. The varchar data type is limited to 255 characters, and the substring function does not work on text, so you cannot split up in chunks or extract fixed-length elements. You can still implement stored procedures as for SQL 7, but since the input string would be limited to 255 characters, the risk that you every now and then would pass a list exceeding this limit is too considerable to be neglected.

Therefore, the only method that is robust enough on 6.5 is dynamic SQL. You can still use text to pass a long list as in this example:

CREATE PROCEDURE get_authors_exec @authors text AS EXEC('SELECT au_id, au_lname, au_fname FROM pubs..authors WHERE au_id IN (' + @authors + ')') go EXEC get_authors_exec '''172-32-1176'', ''427-17-2319'', ''724-08-9931'''

Data from Performance Tests

General Disclaimer

Testing performance in a DBMS is not a simple task. There are so many factors that affect performance. In my tests I have tested unpacking the list alone into a table (for the methods where this is possible) and using the list to get data from a table. Merely unpacking tells us something about the method as such, but the great impact comes when you try to use the method to get data from one or more tables, in which case the nature of the table(s) involved (index, statistics, size etc) can have great importance. You may find that some method that I claim to be fast, results in slow query plan for your query. Another issue is how submit your SQL statements to the server. For the fixed-string functions, I found that there was a considerable difference between RPC calls and command-text batches. (All data I present are from calls submitted through RPC.) Yet, another issue what happens when the system comes under load. For the sake of simplicity, I have run the tests on idle machines.

So take these numbers for what they are. They certainly serve to point out some methods as hopelessly bad. But you will find that several of them have response times that are perfectly reasonable when run on their own. If you are worried about performance in a certain context, you should probably run your own benchmarks.

The Contenders

As I have assembled data for this article, I have tested too many methods to include data for all of them in the main body of this text. Some of the methods have just been small variations of each other like returning varchar instead of nvarchar, and I have not even bothered to save data for every little variation. Eventually I have data for 16 test cases. Of these, I have selected the nine to include in the main section of the test results in the article. Yet a few more appear in the section for special observations where I highlight some details. The full data for all 16 methods on the three test machines I have used are available as text files, for which there are links further down in this text.

I have given each method that I've tested a name, and I will use these names as a convenient short-hand notation. All methods appear in the text above; most of them in code examples, but some are only mentioned in passing.

Here are the nine main competitors:

ITERThe iterative functions: iter_intlist_to_table and iter_charlist_to_table. (Note: the data for iter_charlist_to_table if for an older version. The version presented in this article is some 10% faster.) TBLNUM A table of numbers: the function text_split_me which is the same as duo_text_split me above, save that it only returns a varchar column. FIX$SINGLE Fixed-length list elements: the function fixstring_single, joining once with the table of numbers. FIX$MULTI Fixed-length list elements: fixstring_multi, that joins the table of numbers twice to make it virtually unlimited. XMLATTROPENXML, using attribute-centred XML. EXEC$ADynamic SQL, when called with a new input string. UNIONMaking the list into a SELECT, the procedure unpack_with_union. REALSLOWThe really slow method, using charindex to find the list elements.. <SLOW$LIKEAnother really slow method, using LIKE to find the list elements.

And here are the other seven:

ITER$PROCThe iterative procedures iter_intlist_to_table_sp and iter_charlist_to_table_sp. (Note: the data for iter_charlist_to_table_sp if for an older version. The version presented in this article is some 10% faster.) ITER$EXECA variation of iter_charlist_to_table_sp that accepts a table name as parameter and uses dynamic SQL to insert into it. TBLNUM$IL A table with numbers; the inline function inline_split_me which is limited in input size. FIX$ITER Fixed-length list elements, unpacked iteratively. XMLELEMOPENXML, using element-centred XML. EXEC$BDynamic SQL, invoked for a second time for the same input string. INSERTA trick with INSERT, the procedure unpack_with_insert.

In the following sections I describe how the tests were set up and what the presumptions were. If you find this too detailed, please feel free to jump directly to the results.

How the Tests Were Carried Out

The Test Table

This is the table I have used for all tests.

CREATE TABLE usrdictwords (wordno int NOT NULL, word varchar(50) NOT NULL, guid char(36) NOT NULL) CREATE CLUSTERED INDEX wordno_ix ON usrdictwords(wordno) CREATE UNIQUE INDEX word_ix ON usrdictwords(word)

The table contains 202985 rows, and the average length of the column word is 9.7 characters. The longest word is 31 characters. The values in the column wordno are in fact unique; the reason that the index is not unique is simply a mistake that I didn't notice until I had ran all tests. The order of wordno is not correlated with the order of word. The column guid serves to make the table a little wider, and to be a token non-indexed column.

The Test Data

For each test, my test script randomly selected rows from usrdictwords, to construct one comma-separated list each of strings and integers from the columns word and wordno respectively. After each list item the script randomly added 0-3 spaces between the list elements. (The reason for this is that the test script also served to validate that the implementations of the methods yielded the correct result.) The script kept adding to the lists until the list of strings exceeded a pre-determined size. Thus, the integer lists were shorter in length, but the number of list elements was the same.


Google Reader or Homepage Subscribe Add to My Yahoo! Subscribe with Bloglines Subscribe in NewsGator Online Add to Technorati Favorites!
Web Development : Free Php Scripts | PHP Script and Programs | Javascripts | Javascript Kit | For latest IT news | Sitemap
Free Web Templates : Nuvio Templates - Free Web Templates & WordPress Themes
Best PHP email Systems and Forms : PHP Email Systems Script and Programs | Free ready to use Contact me form for your Website | Free PHP contact us form script | PHP Webmasters tools and help | Website Ranking - Rankings - Get Website Optimization basics correct, add great linking strategies and your website rankings will rocket. Most Web Masters lack website optimization service skills, website ranking and website optimization services is our specialty.

Search Technohub.in

Guest Book    |   Contact Me    |    Articles    |    Comments    |    Bookmark My Site !