Executes a command string or character string within a Transact-SQL batch, or one of the following modules: system stored procedure, user-defined stored procedure, CLR stored procedure, scalar-valued user-defined function, or extended stored procedure. The EXECUTE statement can be used to send pass-through commands to linked servers. Additionally, the context in which a string or command is executed can be explicitly set. Metadata for the result set can be defined by using the WITH RESULT SETS options.
Before you call EXECUTE with a character string, validate the character string. Never execute a command constructed from user input that has not been validated.
Transact-SQL Syntax Conventions
Is an optional integer variable that stores the return status of a module. This variable must be declared in the batch, stored procedure, or function before it is used in an EXECUTE statement.
When used to invoke a scalar-valued user-defined function, the @return_status variable can be of any scalar data type.
Is the fully qualified or nonfully qualified name of the stored procedure or scalar-valued user-defined function to call. Module names must comply with the rules for identifiers. The names of extended stored procedures are always case-sensitive, regardless of the collation of the server.
A module that has been created in another database can be executed if the user running the module owns the module or has the appropriate permission to execute it in that database. A module can be executed on another server running SQL Server if the user running the module has the appropriate permission to use that server (remote access) and to execute the module in that database. If a server name is specified but no database name is specified, the SQL Server Database Engine looks for the module in the default database of the user.
Applies to: SQL Server 2008 through SQL Server 2017
Is an optional integer that is used to group procedures of the same name. This parameter is not used for extended stored procedures.
This feature will be removed in a future version of Microsoft SQL Server. Avoid using this feature in new development work, and plan to modify applications that currently use this feature.
For more information about procedure groups, see CREATE PROCEDURE (Transact-SQL).
Is the name of a locally defined variable that represents a module name.
This can be a variable that holds the name of a natively compiled, scalar user-defined function.
Is the parameter for module_name, as defined in the module. Parameter names must be preceded by the at sign (@). When used with the @parameter_name=value form, parameter names and constants do not have to be supplied in the order in which they are defined in the module. However, if the @parameter_name=value form is used for any parameter, it must be used for all subsequent parameters.
By default, parameters are nullable.
Is the value of the parameter to pass to the module or pass-through command. If parameter names are not specified, parameter values must be supplied in the order defined in the module.
When executing pass-through commands against linked servers, the order of the parameter values depends on the OLE DB provider of the linked server. Most OLE DB providers bind values to parameters from left to right.
If the value of a parameter is an object name, character string, or qualified by a database name or schema name, the whole name must be enclosed in single quotation marks. If the value of a parameter is a keyword, the keyword must be enclosed in double quotation marks.
If a default is defined in the module, a user can execute the module without specifying a parameter.
The default can also be NULL. Generally, the module definition specifies the action that should be taken if a parameter value is NULL.
Is the variable that stores a parameter or a return parameter.
Specifies that the module or command string returns a parameter. The matching parameter in the module or command string must also have been created by using the keyword OUTPUT. Use this keyword when you use cursor variables as parameters.
If value is defined as OUTPUT of a module executed against a linked server, any changes to the corresponding @parameter performed by the OLE DB provider will be copied back to the variable at the end of the execution of module.
If OUTPUT parameters are being used and the intent is to use the return values in other statements within the calling batch or module, the value of the parameter must be passed as a variable, such as @parameter = @variable. You cannot execute a module by specifying OUTPUT for a parameter that is not defined as an OUTPUT parameter in the module. Constants cannot be passed to module by using OUTPUT; the return parameter requires a variable name. The data type of the variable must be declared and a value assigned before executing the procedure.
When EXECUTE is used against a remote stored procedure, or to execute a pass-through command against a linked server, OUTPUT parameters cannot be any one of the large object (LOB) data types.
Return parameters can be of any data type except the LOB data types.
Supplies the default value of the parameter as defined in the module. When the module expects a value for a parameter that does not have a defined default and either a parameter is missing or the DEFAULT keyword is specified, an error occurs.
Is the name of a local variable. @string_variable can be any char, varchar, nchar, or nvarchar data type. These include the (max) data types.
Is a constant string. tsql_string can be any nvarchar or varchar data type. If the N is included, the string is interpreted as nvarchar data type.
Specifies the context in which the statement is executed.
Applies to: SQL Server 2008 through SQL Server 2017
Specifies the context to be impersonated is a login. The scope of impersonation is the server.
Specifies the context to be impersonated is a user in the current database. The scope of impersonation is restricted to the current database. A context switch to a database user does not inherit the server-level permissions of that user.
While the context switch to the database user is active, any attempt to access resources outside the database will cause the statement to fail. This includes USE database statements, distributed queries, and queries that reference another database by using three- or four-part identifiers.
Is a valid user or login name. name must be a member of the sysadmin fixed server role or exist as a principal in sys.database_principals or sys.server_principals, respectively.
name cannot be a built-in account, such as NT AUTHORITY\LocalService, NT AUTHORITY\NetworkService, or NT AUTHORITY\LocalSystem.
For more information, see Specifying a User or Login Name later in this topic.
Is a constant string that contains the command to be passed through to the linked server. If the N is included, the string is interpreted as nvarchar data type.
Indicates parameters for which values are supplied in the <arg-list> of pass-through commands that are used in an EXEC('…', <arg-list>) AT <linkedsrv> statement.
Applies to: SQL Server 2008 through SQL Server 2017
Specifies that command_string is executed against linked_server_name and results, if any, are returned to the client. linked_server_name must refer to an existing linked server definition in the local server. Linked servers are defined by using sp_addlinkedserver.
Possible execute options. The RESULT SETS options cannot be specified in an INSERT…EXEC statement.
|RECOMPILE||Forces a new plan to be compiled, used, and discarded after the module is executed. If there is an existing query plan for the module, this plan remains in the cache.|
Use this option if the parameter you are supplying is atypical or if the data has significantly changed. This option is not used for extended stored procedures. We recommend that you use this option sparingly because it is expensive.
Note: You can not use WITH RECOMPILE when calling a stored procedure that uses OPENDATASOURCE syntax. The WITH RECOMPILE option is ignored when a four-part object name is specified.
Note: RECOMPILE is not supported with natively compiled, scalar user-defined functions. If you need to recompile, use sp_recompile (Transact-SQL).
|RESULT SETS UNDEFINED||Applies to: SQL Server 2012 through SQL Server 2017, Azure SQL Database.|
This option provides no guarantee of what results, if any, will be returned, and no definition is provided. The statement executes without error if any results are returned or no results are returned. RESULT SETS UNDEFINED is the default behavior if a result_sets_option is not provided.
For interpreted scalar user-defined functions, and natively compiled scalar user-defined functions, this option is not operational because the functions never return a result set.
|RESULT SETS NONE||Applies to: SQL Server 2012 through SQL Server 2017, Azure SQL Database.|
Guarantees that the execute statement will not return any results. If any results are returned the batch is aborted.
For interpreted scalar user-defined functions, and natively compiled scalar user-defined functions, this option is not operational because the functions never return a result set.
|<result_sets_definition>||Applies to: SQL Server 2012 through SQL Server 2017, Azure SQL Database.|
Provides a guarantee that the result will come back as specified in the result_sets_definition. For statements that return multiple result sets, provide multiple result_sets_definition sections. Enclose each result_sets_definition in parentheses, separated by commas. For more information, see <result_sets_definition> later in this topic.
This option always results in an error for natively compiled, scalar user-defined functions because the functions never return a result set.
<result_sets_definition> Applies to: SQL Server 2012 through SQL Server 2017, Azure SQL Database
Describes the result sets returned by the executed statements. The clauses of the result_sets_definition have the following meaning
[ COLLATE collation_name]
[NULL | NOT NULL]
|See the table below.|
|db_name||The name of the database containing the table, view or table valued function.|
|schema_name||The name of the schema owning the table, view or table valued function.|
|table_name | view_name | table_valued_function_name||Specifies that the columns returned will be those specified in the table, view or table valued function named. Table variables, temporary tables, and synonyms are not supported in the AS object syntax.|
|AS TYPE [schema_name.]table_type_name||Specifies that the columns returned will be those specified in the table type.|
|AS FOR XML||Specifies that the XML results from the statement or stored procedure called by the EXECUTE statement will be converted into the format as though they were produced by a SELECT … FOR XML … statement. All formatting from the type directives in the original statement are removed, and the results returned are as though no type directive was specified. AS FOR XML does not convert non-XML tabular results from the executed statement or stored procedure into XML.|
|column_name||The names of each column. If the number of columns differs from the result set, an error occurs and the batch is aborted. If the name of a column differs from the result set, the column name returned will be set to the name defined.|
|data_type||The data types of each column. If the data types differ, an implicit conversion to the defined data type is performed. If the conversion fails the batch is aborted|
|COLLATE collation_name||The collation of each column. If there is a collation mismatch, an implicit collation is attempted. If that fails, the batch is aborted.|
|NULL | NOT NULL||The nullability of each column. If the defined nullability is NOT NULL and the data returned contains NULLs an error occurs and the batch is aborted. If not specified, the default value conforms to the setting of the ANSI_NULL_DFLT_ON and ANSI_NULL_DFLT_OFF options.|
The actual result set being returned during execution can differ from the result defined using the WITH RESULT SETS clause in one of the following ways: number of result sets, number of columns, column name, nullability, and data type. If the number of result sets differs, an error occurs and the batch is aborted.
Parameters can be supplied either by using value or by using @parameter_name=value. A parameter is not part of a transaction; therefore, if a parameter is changed in a transaction that is later rolled back, the value of the parameter does not revert to its previous value. The value returned to the caller is always the value at the time the module returns.
Nesting occurs when one module calls another or executes managed code by referencing a common language runtime (CLR) module, user-defined type, or aggregate. The nesting level is incremented when the called module or managed code reference starts execution, and it is decremented when the called module or managed code reference has finished. Exceeding the maximum of 32 nesting levels causes the complete calling chain to fail. The current nesting level is stored in the @@NESTLEVEL system function.
Because remote stored procedures and extended stored procedures are not within the scope of a transaction (unless issued within a BEGIN DISTRIBUTED TRANSACTION statement or when used with various configuration options), commands executed through calls to them cannot be rolled back. For more information, see System Stored Procedures (Transact-SQL) and BEGIN DISTRIBUTED TRANSACTION (Transact-SQL).
When you use cursor variables, if you execute a procedure that passes in a cursor variable with a cursor allocated to it an error occurs.
You do not have to specify the EXECUTE keyword when executing modules if the statement is the first one in a batch.
For additional information specific to CLR stored procedures, see CLR Stored Procedures.
Using EXECUTE with Stored Procedures
You do not have to specify the EXECUTE keyword when you execute stored procedures when the statement is the first one in a batch.
SQL Server system stored procedures start with the characters sp_. They are physically stored in the Resource database, but logically appear in the sys schema of every system and user-defined database. When you execute a system stored procedure, either in a batch or inside a module such as a user-defined stored procedure or function, we recommend that you qualify the stored procedure name with the sys schema name.
SQL Server system extended stored procedures start with the characters xp_, and these are contained in the dbo schema of the master database. When you execute a system extended stored procedure, either in a batch or inside a module such as a user-defined stored procedure or function, we recommend that you qualify the stored procedure name with master.dbo.
When you execute a user-defined stored procedure, either in a batch or inside a module such as a user-defined stored procedure or function, we recommend that you qualify the stored procedure name with a schema name. We do not recommend that you name a user-defined stored procedure with the same name as a system stored procedure. For more information about executing stored procedures, see Execute a Stored Procedure.
Using EXECUTE with a Character String
In earlier versions of SQL Server, character strings are limited to 8,000 bytes. This requires concatenating large strings for dynamic execution. In SQL Server, the varchar(max) and nvarchar(max) data types can be specified that allow for character strings to be up to 2 gigabytes of data.
Changes in database context last only until the end of the EXECUTE statement. For example, after the in this following statement is run, the database context is master.
You can use the clause to switch the execution context of a dynamic statement. When the context switch is specified as , the duration of the context switch is limited to the scope of the query being executed.
Specifying a User or Login Name
The user or login name specified in must exist as a principal in sys.database_principals or sys.server_principals, respectively, or the statement will fail. Additionally, IMPERSONATE permissions must be granted on the principal. Unless the caller is the database owner or is a member of the sysadmin fixed server role, the principal must exist even when the user is accessing the database or instance of SQL Server through a Windows group membership. For example, assume the following conditions:
CompanyDomain\SQLUsers group has access to the Sales database.
CompanyDomain\SqlUser1 is a member of SQLUsers and, therefore, has implicit access to the Sales database.
Although CompanyDomain\SqlUser1 has access to the database through membership in the SQLUsers group, the statement will fail because does not exist as a principal in the database.
Specify a login or user that has the least privileges required to perform the operations that are defined in the statement or module. For example, do not specify a login name, which has server-level permissions, if only database-level permissions are required; or do not specify a database owner account unless those permissions are required.
Permissions are not required to run the EXECUTE statement. However, permissions are required on the securables that are referenced within the EXECUTE string. For example, if the string contains an INSERT statement, the caller of the EXECUTE statement must have INSERT permission on the target table. Permissions are checked at the time EXECUTE statement is encountered, even if the EXECUTE statement is included within a module.
EXECUTE permissions for a module default to the owner of the module, who can transfer them to other users. When a module is run that executes a string, permissions are checked in the context of the user who executes the module, not in the context of the user who created the module. However, if the same user owns the calling module and the module being called, EXECUTE permission checking is not performed for the second module.
If the module accesses other database objects, execution succeeds when you have EXECUTE permission on the module and one of the following is true:
The module is marked EXECUTE AS USER or SELF, and the module owner has the corresponding permissions on the referenced object. For more information about impersonation within a module, see EXECUTE AS Clause (Transact-SQL).
The module is marked EXECUTE AS CALLER, and you have the corresponding permissions on the object.
The module is marked EXECUTE AS user_name, and user_name has the corresponding permissions on the object.
Context Switching Permissions
To specify EXECUTE AS on a login, the caller must have IMPERSONATE permissions on the specified login name. To specify EXECUTE AS on a database user, the caller must have IMPERSONATE permissions on the specified user name. When no execution context is specified, or EXECUTE AS CALLER is specified, IMPERSONATE permissions are not required.
A. Using EXECUTE to pass a single parameter
The stored procedure in the AdventureWorks2012 database expects one parameter (). The following examples execute the stored procedure with as its parameter value.
The variable can be explicitly named in the execution:
If the following is the first statement in a batch or an osql or sqlcmd script, EXEC is not required.
B. Using multiple parameters
The following example executes the stored procedure in the AdventureWorks2012 database. It passes two parameters: the first parameter is a product ID () and the second parameter, is a value.
C. Using EXECUTE 'tsql_string' with a variable
The following example shows how handles dynamically built strings that contain variables. This example creates the cursor to hold a list of all user-defined tables in the AdventureWorks2012 database, and then uses that list to rebuild all indexes on the tables.
D. Using EXECUTE with a remote stored procedure
The following example executes the stored procedure on the remote server and stores the return status that indicates success or failure in .
Applies to: SQL Server 2008 through SQL Server 2017
E. Using EXECUTE with a stored procedure variable
The following example creates a variable that represents a stored procedure name.
F. Using EXECUTE with DEFAULT
The following example creates a stored procedure with default values for the first and third parameters. When the procedure is run, these defaults are inserted for the first and third parameters when no value is passed in the call or when the default is specified. Note the various ways the keyword can be used.
The stored procedure can be executed in many combinations.
G. Using EXECUTE with AT linked_server_name
The following example passes a command string to a remote server. It creates a linked server that points to another instance of SQL Server and executes a DDL statement () against that linked server.
Applies to: SQL Server 2008 through SQL Server 2017
H. Using EXECUTE WITH RECOMPILE
The following example executes the stored procedure and forces a new query plan to be compiled, used, and discarded after the module is executed.
I. Using EXECUTE with a user-defined function
The following example executes the scalar user-defined function in the AdventureWorks2012 database. It uses the variable to store the value returned by the function. The function expects one input parameter, . This is defined as a tinyint data type.
J. Using EXECUTE to query an Oracle database on a linked server
The following example executes several statements at the remote Oracle server. The example begins by adding the Oracle server as a linked server and creating linked server login.
Applies to: SQL Server 2008 through SQL Server 2017
K. Using EXECUTE AS USER to switch context to another user
The following example executes a Transact-SQL string that creates a table and specifies the clause to switch the execution context of the statement from the caller to . The Database Engine will check the permissions of when the statement is run. must exist as a user in the database and must have permission to create tables in the schema, or the statement fails.
L. Using a parameter with EXECUTE and AT linked_server_name
The following example passes a command string to a remote server by using a question mark () placeholder for a parameter. The example creates a linked server that points to another instance of SQL Server and executes a statement against that linked server. The statement uses the question mark as a place holder for the parameter (), which is provided after the statement.
Applies to: SQL Server 2008 through SQL Server 2017
M. Using EXECUTE to redefine a single result set
Some of the previous examples executed which returned 7 columns. The following example demonstrates using the syntax to change the names and data types of the returning result set.
Applies to: SQL Server 2012 through SQL Server 2017, Azure SQL Database
N. Using EXECUTE to redefine a two result sets
When executing a statement that returns more than one result set, define each expected result set. The following example in AdventureWorks2012 creates a procedure that returns two result sets. Then the procedure is executed using the WITH RESULT SETS clause, and specifying two result set definitions.
Applies to: SQL Server 2012 through SQL Server 2017, Azure SQL Database
Examples: Azure SQL Data Warehouse and Parallel Data Warehouse
Example O: Basic Procedure Execution
Executing a stored procedure:
Calling a stored procedure with name determined at runtime:
Calling a stored procedure from within a stored procedure:
Example P: Executing Strings
Executing a SQL string:
Executing a nested string:
Executing a string variable:
Example Q: Procedures with Parameters
The following example creates a procedure with parameters and demonstrates 3 ways to execute the procedure:
DECLARE @local_variable (Transact-SQL)
EXECUTE AS Clause (Transact-SQL)
Principals (Database Engine)
Scalar User-Defined Functions for In-Memory OLTP
How to Share Data between Stored Procedures
An SQL text by Erland Sommarskog, SQL Server MVP. Most recent update 2017-05-26.
Copyright applies to this text. See here for font conventions used in this article.
This article tackles two related questions:
- How can I use the result set from one stored procedure in another, also expressed as How can I use the result set from a stored procedure in a SELECT statement?
- How can I pass table data in a parameter from one stored procedure to another?
In this text I will discuss a number of possible solutions and point out their advantages and drawbacks. Some methods apply only when you want to access the output from a stored procedure, whereas other methods are good for the input scenario, and yet others are good for both input and output. In the case you want to access a result set, most methods require you to rewrite the stored procedure you are calling (the callee) in one way or another, but some solutions do not.
Here is a summary of the methods that I will cover. Required version refers to the earliest version of SQL Server where the solution is available. When the column is empty, this means all versions from SQL 2000 and up.
|Method||In/Out||Rewrite callee?||Required version||Comment|
|Output||Yes||Not generally applicable, but sometimes overlooked.|
|Table-valued Functions||Output||Yes||Often the best choice for output-only, but there are several restrictions.|
|Inline Functions||Use this to reuse a single SELECT.|
|Multi-statement Functions||When you need to encapsulate more complex logic.|
|Using a Table||In/Out||Yes||The most general solution. My favoured choice for input/output scenarios.|
|Sharing a Temp Table||Mainly for a single pair of caller/callee.|
|Process-keyed Table||Best choice for many callers to the same callee.|
|Table-valued Parameters||Input||Yes||SQL 2008||Mainly useful when passing data from a client.|
|Output||No||Deceivingly appealing, but should be used sparingly.|
|Output||No||SQL 2005||Complex, but useful as a last resort when INSERT-EXEC does not work.|
|Output||No||Tricky with many pitfalls. Discouraged.|
|In/Out||Yes||SQL 2005||A bit of a kludge, but not without advantages.|
|Using Cursor Variables||Output||Yes||Not recommendable.|
|Session Context||In/Out||Yes||SQL 2016||Not a general method, but useful to keep data globally available in a process.|
At the end of the article, I briefly discuss the particular situation when your stored procedures are on different servers, which is a quite challenging situation.
A related question is how to pass table data from a client, but this is a topic which is outside the scope for this text, but I discuss this in my article Using Table-Valued Parameters in SQL Server and .NET.
Examples in the article featuring tables such as authors, titles, sales etc run in the old sample database pubs. You can download the script for pubs from Microsoft's web site. (Some examples use purely fictive tables, and do not run in pubs.)
This method can only be used when the result set is one single row. Nevertheless, this is a method that is sometimes overlooked. Say you have this simple stored procedure:CREATE PROCEDURE insert_customer @name nvarchar(50), @address nvarchar(50), @city nvarchar(50) AS DECLARE @cust_id int BEGIN TRANSACTION SELECT @cust_id = coalesce(MAX(cust_id), 0) + 1 FROM customers (UPDLOCK) INSERT customers (cust_id, name, address, city) VALUES (@cust_id, @name, @address, @city) COMMIT TRANSACTION SELECT @cust_id That is, the procedure inserts a row into a table, and returns the id for the row.
Rewrite this procedure as:CREATE PROCEDURE insert_customer @name nvarchar(50), @address nvarchar(50), @city nvarchar(50), @cust_id int OUTPUT AS BEGIN TRANSACTION SELECT @cust_id = coalesce(MAX(cust_id), 0) + 1 FROM customers (UPDLOCK) INSERT customers (cust_id, name, address, city) VALUES (@cust_id, @name, @address, @city) COMMIT TRANSACTION
You can now easily call insert_customer from another stored procedure. Just recall that in T‑SQL you need to specify the OUTPUT keyword also in the call:EXEC insert_customer @name, @address, @city, @cust_id OUTPUT
Note: this example has a single output parameter, but a stored procedure can have many output parameters.
When all you want to do is to reuse the result set from a stored procedure, the first thing to investigate is whether it is possible to rewrite the stored procedure as a table-valued function. This is far from always possible, because SQL Server is very restrictive with what you can put into a function. But when it is possible, this is often the best choice.
There are two types of table functions in SQL Server: inline and multi-statement functions.
Here is a example of an inline function adapted from Books Online for SQL 2000:CREATE FUNCTION SalesByStore (@storeid varchar(30)) RETURNS TABLE AS RETURN (SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid) To use it, you simply say: SELECT * FROM SalesByStore('6380') You can filter the data with WHERE or use it in a bigger query that includes other tables. That is, you use the function just like was a table or a view. You could say that an inline function is a parameterised view, because the query optimizer expands the function as if it was a macro, and generates the plan as if you had provided the expanded query. Thus, there is no performance cost for packaging a SELECT statement into an inline function. For this reason, when you want to reuse a stored procedure that consists of a single SELECT statement, rewriting it into an inline UDF is without doubt the best choice. (Or instead of rewriting it, move the SELECT into a UDF, and rewrite the existing procedure as a wrapper on the function, so that the client is unaffected.)
There are a couple of system functions you cannot use in a UDF, because SQL Server thinks it matters that they are side-effecting. Two examples are newid(), and rand(). On SQL 2000 this restriction goes further and disallows all system functions that are nondeterministic, that is, functions that do not return the same value for the same input parameters on each call. A typical example is getdate().
A multi-statement function has a body that can have as many statements as you like. You need to declare a return table, and you insert the data to return into that table. Here is the function above as a multi-statement function:CREATE FUNCTION SalesByStore (@storeid varchar(30)) RETURNS @t TABLE (title varchar(80) NOT NULL PRIMARY KEY, qty smallint NOT NULL) AS BEGIN INSERT @t (title, qty) SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid RETURN END
You use multi-statement functions in the same way as you use inline functions, but in difference to inline functions, they are not expanded in place, but instead it's like you would call a stored procedure in the middle of the query and return the data in a table variable. This permits you to move the code of a more complex stored procedure into a function.
As you can see in the example, you can define a primary key for your return table. I like to point out that this definitely best practice for two reasons:
- It states your assumptions of the data. If your assumptions are incorrect, you will be told up front. (Instead of spending time to understand why your application presents incorrect data.)
- This information is valuable to the optimizer when you use the function in a larger query.
It goes without saying, that this is only meaningful if you define a primary key on the columns you produce in the body of the UDF. Adding an IDENTITY column to the return table only to get a primary key is pointless.
Compared to inline functions, multi-statement functions incur some overhead due to the return table. More important, though, is that if you use the function in a query where you join with other tables, the optimizer will have no idea of what the function returns, and will make standard assumptions. This is far from always an issue, but the more rows the function returns, the higher the risk that the optimizer will make incorrect estimates and produce an inefficient query plan. One way to avoid this is to insert the results from the function into a temp table. Since a temp table has statistics this helps the optimizer to make a better plan.
Note: In SQL 2017, Microsoft introduced something they called interleaved execution for multi-statement functions. When compiling the query, they first run the function, and once the optimizer knows the number of rows the function returns, it builds the rest of the plan based on this. However, because of the lack of distribution statistics, bouncing the data over a temp table may still yield better results. Also, beware that interleaved execution only applies to SELECT statements, but not to SELECT INTO, INSERT, UPDATE, DELETE or MERGE. Particularly the limitation with the first two can be deceitful, since a SELECT that works fine when it returns data to the client, may start to misbehave when you decide to capture the rows in a table in an INSERT or SELECT INTO statement instead, because you now get the blind estimate.
User-defined functions are quite restricted in what they can do, because a UDF is not permitted to change the database state. The most important restrictions are:
- You can only perform INSERT, UPDATE or DELETE statements on table variables local to the function.
- You cannot call stored procedures (with the exception of extended stored procedures).
- You cannot invoke dynamic SQL.
- You cannot create tables, neither permanent tables nor temp tables. You can use table variables.
- You cannot use RAISERROR, TRY-CATCH or BEGIN/COMMIT/ROLLBACK TRANSACTION.
- You cannot use "side-effecting" system functions, such as newid() and rand().
- On SQL 2000, you cannot use non-deterministic system functions.
Please see the Remarks section in the topic for CREATE FUNCTION in Books Online for a complete list of restrictions.
Using a Table
What could be better for passing data in a database than a table? When using a table there are no restrictions like there is when you use a table-valued function. There are two main variations of this method: 1) Sharing a local temp table. 2) Using a process-keyed table. The former is more lightweight, but it comes with a maintainability problem which the second alternative addresses by using a table with a persisted schema. Both solution comes with recompilation problems that can be a serious problem for stored procedures that are called with a high frequency, although for a process-keyed tables this can be mitigated. Using a local temp table also introduces a risk for cache littering. I will discuss these problems in more details as we move on.
Sharing a Temp Table
The method itself is as simple as this:CREATE PROCEDURE inner_sp @par1 int, @par2 bit, ... AS ... INSERT/UPDATE/DELETE #tmp go CREATE PROCEDURE outer_sp AS DECLARE ... CREATE TABLE #mytemp (col1 int NOT NULL, col2 char(5) NULL, ...) ... EXEC inner_sp @par1, @par2 ... SELECT * FROM #mytemp go
In this example, outer_sp creates the temp table, and inner_sp fills it in, that is, the table is output-only. A different scenario is that outer_sp fills the table with input data whereupon inner_sp performs some general computation, and the caller uses the result from that computation for some purpose. That is, the table is used for both input and output. Yet a scenario is that the caller prepares the temp table with data, and the callee first performs checks to verify that a number of business rules are not violated, and then goes on to update one or more tables. This would be an input-only scenario.
Changing Existing Code
Say that you have this procedure:CREATE PROCEDURE SalesByStore @storeid varchar(30) AS SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid
You want to reuse this result set in a second procedure that returns only titles that have sold above a certain quantity. How would you achieve this by sharing a temp table without affect existing clients? The solution is to move the meat of the procedure into a sub-procedure, and make the original procedure a wrapper on the original like this:CREATE PROCEDURE SalesByStore_core @storeid varchar(30) AS INSERT #SalesByStore (title, qty) SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid go CREATE PROCEDURE SalesByStore @storeid varchar(30) AS CREATE TABLE #SalesByStore(title varchar(80) NOT NULL PRIMARY KEY, qty smallint NOT NULL) EXEC SalesByStore_core @storeid SELECT * FROM #SalesByStore go CREATE PROCEDURE BigSalesByStore @storeid varchar(30), @qty smallint AS CREATE TABLE #SalesByStore(title varchar(80) NOT NULL PRIMARY KEY, qty smallint NOT NULL) EXEC SalesByStore_core @storeid SELECT * FROM #SalesByStore WHERE qty >= @qty go EXEC SalesByStore '7131' EXEC BigSalesByStore '7131', 25 go DROP PROCEDURE SalesByStore, BigSalesByStore, SalesByStore_core
Note: This script is a complete repro script that creates some objects, tests them, and then drops them, to permit simple testing of variations. We will look at more versions of these procedures later in this text.
Just like in the example with the multi-statement function, I have defined a primary key for the temp table, and exactly for the same reasons. Speaking of best practices, some readers may wonder about the use of SELECT * here. I think using SELECT * from a temp table created in the same procedure is OK, particularly if the purpose is to return all columns in the temp table. (In difference to using SELECT * from a table created elsewhere, and which may be altered without your knowledge.)
While this solution is straightforward, you may feel uneasy by the fact that the CREATE TABLE statement for the temp table appears in two places, and there is a third procedure that depends on the definition. Here is a solution which is a little more convoluted that to some extent alleviates the situation:CREATE PROCEDURE SalesByStore_core @storeid varchar(30), @wantresultset bit = 0 AS IF object_id('tempdb..#SalesByStore') IS NULL BEGIN CREATE TABLE #SalesByStore(title varchar(80) NOT NULL PRIMARY KEY, qty smallint NOT NULL) END INSERT #SalesByStore (title, qty) SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid IF @wantresultset = 1 SELECT * FROM #SalesByStore go CREATE PROCEDURE SalesByStore @storeid varchar(30) AS EXEC SalesByStore_core @storeid, 1 go
I've moved the CREATE TABLE statement from the wrapper into the core procedure, which only creates the temp table only if it does not already exist. The wrapper now consists of a single EXEC statement and passes the parameter @wantresultset as 1 to instruct the core procedure to produce the result set. Since this parameter has a default of 0, BigSalesByStore can be left unaffected from the previous example. (Thus, there are still two CREATE TABLE statements for the temp table.)
A Note on the Virtues of Code Reuse
Before we move on, I like to point out that the given example as such is not very good practice. Not because the concept of sharing temp tables as such is bad, but as with all solutions, you need to use them in the right place. As you realise, defining a temp table and creating one extra stored procedure is too heavy artillery for this simple problem. But an example where sharing temp tables would be a good solution would have to consist of many more lines of code, which would have obscured the forest with a number of trees. Thus, I've chosen a very simple example to highlight the technique as such.
Keep in mind that compared to languages such as C# and Java, Transact-SQL is poorly equipped for code reuse, why solutions in T‑SQL to reuse code are clumsier. For this reason, the bar for reuse is somewhat higher in T‑SQL. It's still a virtue, but not as big virtue as in modern object-oriented languages. In this simple problem, the best would of course be to add @qty as a parameter to SalesByStore. And if that would not be feasible for some reason, it would still be better to create BigSalesByStore by means of copy-paste than sharing a temp table.
There are two performance issues with this technique that you need to be aware of: it can cause quite some cache littering and all statements in the inner procedure that refer to the shared temp table will be recompiled every time.
The first issue is something I managed to sleep over myself for many years, until Alex Friedman made me aware of it. It is covered in the white paper Plan Caching in SQL Server 2008 (which despite the title apply to later versions as well) where it says:
If a stored procedure refers to a temporary table not created statically in the procedure, the spid (process ID) gets added to the cache key. This means that the plan for the stored procedure would only be reused when executed again by the same session. Temporary tables created statically within the stored procedure do not cause this behavior.
That is, if two sessions invoke outer_sp there be will be one entry each in the cache of inner_sp for the two sessions. Say now that you have a busy system where there are thousand sessions that invoke outer_sp. That results in thousand cache entries for one single procedure. Imagine now that you employ this technique in some 10-20 pairs of stored procedure with that usage pattern. That's up to 20000 cache entries. That is quite a lot. A second consequence of this is that depending on the session id you get you can get different execution plans due to parameter sniffing, which can lead to some confusion until you realise what is going on.
There is a simple way to avoid this cache littering, although it comes with its own price: add WITH RECOMPILE to the inner procedure. This prevents the plan for this procedure to be put in the cache at all, but it also means that the procedure has to be compiled every time. This is however not as bad it may sound at first. Or more precisely: even without WITH RECOMPILE, you may still have to face a lot of recompilation of the inner procedure and that is the second performance issue with this technique.
To wit: every time outer_sp is called, inner_sp sees a new instance of the shared temp table, and there is no guarantee that the schema is the same as last time. For this reason, SQL Server will recompile all statements in inner_sp that refers to the shared temp table. As long as the procedure is called with moderate frequency, this recompilation is not so much of concern, but in a high-frequency scenario it can cause quite an increase in CPU usage.
If the pattern is such that outer_sp is called with high frequency, but from a small number of sessions, so that you don't need WITH RECOMPILE to avoid cache littering, and the number of rows in the shared temp table is moderate, you can reduce the number of recompilations in this way: you write inner_sp so that it most of the time it works with a temp table local to the procedure (which you must not create with SELECT INTO!). On entry and exit from the procedure you copy data from/to the shared temp table. To be any point in this, I think you should use the query hint OPTION (KEEPFIXED PLAN) in the queries referring to this local temp table to prevent recompilation because of changes in statistics.
If the inner procedure is called from many places, and you want to change which columns it reads/writes, you need to revisit all calling stored procedures to edit the temp-table definition. For this reason, sharing temp tables is mainly useful when you have a single pair of caller and callee. Then again, if the temp table is narrow, maybe only a single column of customer IDs to process, the table is likely to be very stable.
There are some alternatives to overcome the maintenance problem. One is to use a process-keyed table, which we will look into in the next section. I have also received some interesting ideas from readers of this article.
One solution comes from Richard St-Aubin. The callers create the temp table with a single dummy column, and then call a stored procedure that uses ALTER TABLE to add the real columns. It would look something like this:CREATE PROCEDURE inner_sp @par1 int, @par2 bit, ... AS ... INSERT/UPDATE/DELETE #mytemp go CREATE PROCEDURE define_temp_table AS ALTER TABLE #mytemp ADD col1 int NOT NULL, col2 char(5) NULL, ... go CREATE PROCEDURE outer_sp AS DECLARE ... CREATE TABLE #mytemp (dummycol bit) EXEC define_temp_table ... EXEC inner_sp @par1, @par2 ... SELECT * FROM #mytemp go
You must create the temp table in outer_sp, since if you were to put the CREATE TABLE statement in define_temp_table, the table would be dropped when that procedure exits. This method can definitely be worth exploring, but beware that now you will get the schema-induced recompile that I discussed in the previous section in the outer procedures as well. Also, beware that this trick prevents SQL Server from caching the temp-table definition. This is one more thing that could have a significant impact in case of high-frequency calls.
Another solution, which requires SQL 2008, comes from Wayne Bloss. He creates a table type that holds the definition of the temp table. You can only use table types for declaring table variable and table parameters. But Wayne has a cure for this:DECLARE @dummy my_table_type SELECT * INTO #mytemp FROM @dummy
From this point you work with #mytemp; the sole purpose of @dummy is to be able to create #mytemp from a known and shared definition. (If you are unacquainted with table types, we will take a closer look on them in the section on table-valued parameters.) A limitation with this method is that you can only centralise column definitions this way, but not constraints as they are not copied with SELECT INTO. You may think that constraints are odd things that you rarely put in a temp table, but I have found that it is often fruitful to add constraints to my temp tables as assertions for my assumptions about the data. This does not the least apply for temp tables that are shared between stored procedures. Also, defining primary keys for your temp tables can avoid performance issues when you start to join them.
Let me end this section by pointing out that sharing temp tables opens for some flexibility. The callee only cares about the columns it reads or writes. This permits a caller to add extra columns for its own usage when it creates the temp table. Thus, two callers to the same inner procedure could have different definitions of the temp table, as long as the columns accessed by the inner procedure are defined consistently.
Note: A more advanced way to tackle the maintenance problem is to use a pre-processor and put the definition of the temp table in an include-file. If you have a C compiler around, you can use the C pre-processor. My AbaPerls includes a pre-processor, Preppis, which we use in the system I spend most of my time with.
A Note on SQL Server Data Tools
SQL Server Data Tools, SSDT, is a very versatile environment that gives you many benefits. One benefit is that if you write a stored procedure like:CREATE PROCEDURE example_sp AS CREATE TABLE #temp(a int NOT NULL) SELECT a FROM #temmp
SSDT will tell you up front of the misspelling about the temp table name, before you try to create the procedure. This is certainly a very helpful feature to have the typo trapped early. However, SSDT has no notion about sharing temp tables, so SSDT will also give you a warning for a procedure like SalesByStore_core, or more precisely three: one per column. They are only warnings, so you can proceed, but it only takes a handful of such procedures to clutter up the Error List window so there is a risk that you miss other and more important issues.
There is a way to suppress the warning: right-click the file in Solution Explorer and select Properties. There is a property Suppress T‑Sql Warning and here you can enter the code for the error. But this means that you lose the checking of all table names in the procedure; there is no means to only suppress the warning only for the shared temp table.
All and all, if you are using SSDT, you will find this to be an extra resistance barrier against sharing temp tables.
This method evades cache-littering problem and the maintenance problem by using a permanent table instead. There is still a recompilation problem, though, but of a different nature.
A process-keyed table is simply a permanent table that serves as a temp table. To permit processes to use the table simultaneously, the table has an extra column to identify the process. The simplest way to do this is the global variable @@spid (@@spid is the process id in SQL Server). In fact, this is so common, that these tables are often referred to as spid-keyed tables. Here is an outline; I will give you a more complete example later.CREATE TABLE process_keyed (spid int NOT NULL, col1 int NOT NULL, col2 char(5) NULL, ...) go CREATE CLUSTERED INDEX processkey_ix ON process_keyed (spid) -- Add other columns as needed. go ... DELETE process_keyed WHERE spid = @@spid INSERT process_keyed (spid, col1, col2, ....) VALUES (@@spid, @val1, @val2, ...) ... SELECT col1, col2, ... FROM process_keyed WHERE spid = @@spid ... DELETE process_keyed WHERE spid = @@spid
A few things to note here:
- The table should have a clustered index on the process key (spid in this example), as all queries against the table will include the condition .
- You should delete any existing data for @@spid before you insert any data into the table, as a safety precaution.
- When you are finished using the data you should delete it, so that it does not occupy any extra space.
Choosing the Process-key
While it's common to use @@spid as the process key there are two problems with this:
- If sloppy programmers neglect to delete all rows for the spid before and after use, old data may be passed to the callee, causing incorrect results that can be difficult to understand how they arose.
- If a client needs to pass a process-key around, there is no guarantee that it will always connect with the same @@spid.
One alternative for the process-key is to use a GUID (data type uniqueidentifier). If you create the process key in SQL Server, you can use the function newid(). (You can rely on newid() to return a unique value, why it addresses the first point.) You may have heard that you should not have guids in your clustered index, but that applies when the guid is the primary key alone, since this can cause fragmentation and a lot of page splits. In a process-keyed table, you will typically have many rows for the same guid, so it is a different situation. And more to the point: you don't get more page splits if you use a GUID than if you use @@spid.
Another alternative is to generate the process key from a sequence object, which you create with the statement CREATE SEQUENCE, for instance:CREATE SEQUENCE MySequence AS int
You can then use NEXT VALUE FOR to get values from the sequence:DECLARE @processkey int = NEXT VALUE FOR MySequence INSERT tbl (processkey, col1, col2, ...) SELECT @processkey, col1, col2 FROM ...
You need to use the variable, or else the sequence will generate a different value on each row. (The typical use for sequences is to generate surrogate keys just like IDENTITY.)
Note: Sequences were introduced in SQL 2012.
A Longer Example
Let's say that there are several places in the application where you need to compute the total number of sold books for one or more stores. You put this computation in a procedure ComputeTotalStoreQty, which operates on the table stores_aid. In this example, the procedure is nothing more than a simple UPDATE statement that computes the total number of books sold per store. A real-life problem could have a complex computation that runs over several hundred lines of code. There is also an example procedure TotalStoreQty which returns the returns the total sales for a certain state. It fills stores_aid with all stores in that state, calls ComputeTotalStoreQty and then returns the result to the client. Note that TotalStoreQty is careful to clear its entry in stores_aid both before and after the call.CREATE TABLE stores_aid (process_key uniqueidentifier NOT NULL, storeid char(4) NOT NULL, totalqty smallint NULL, CONSTRAINT pk_stores_aid PRIMARY KEY (process_key, storeid) ) go CREATE PROCEDURE ComputeTotalStoreQty @process_key uniqueidentifier AS UPDATE stores_aid SET totalqty = s.totalqty FROM stores_aid sa JOIN (SELECT stor_id, SUM(qty) AS totalqty FROM sales GROUP BY stor_id) AS s ON s.stor_id = sa.storeid WHERE sa.process_key = @process_key go CREATE PROCEDURE TotalStoreQty @state char(2) AS DECLARE @process_key uniqueidentifier SELECT @process_key = newid() DELETE stores_aid WHERE process_key = @process_key INSERT stores_aid(process_key, storeid) SELECT @process_key, stor_id FROM stores WHERE state = @state EXEC ComputeTotalStoreQty @process_key SELECT storeid, totalqty FROM stores_aid WHERE process_key = @process_key DELETE stores_aid WHERE process_key = @process_key go EXEC TotalStoreQty 'CA' go DROP PROCEDURE TotalStoreQty, ComputeTotalStoreQty DROP TABLE stores_aid
Please note that I have defined a proper key for stores_aid adhering to best practices.
Name Convention and Clean-up
You may wonder what that _aid in the table name comes from. In the environment where I do my daily chores, we have quite a few process-keyed tables, and we have adapted the convention that all these tables end in -aid. This way, when you read some code, you know directly that this is not a "real" table with persistent data. (Nevertheless, some of our aid tables are very important in our system as they are used by core functions.)
There is a second point with this name convention. It cannot be denied that a drawback with process-keyed tables is that sloppy programmers could forget to delete data when they are done. Not only this wastes space, it can also result in incorrect row-count estimates leading to poor query plans. For this reason, it is a good idea to clean up these tables on a regular basis. For instance, in our night job we have a procedure that runs the query below and then executes the generated statements:SELECT 'DELETE ' + quotename(name) FROM sys.objects WHERE type = 'U' AND name LIKE '%aid'
Issues with Recompilation
As we saw, when sharing temp tables, this causes recompilations in the inner procedure, because the temp table is a new table every time. While this issue does not exist with process-keyed tables, you can still get a fair share of recompilation because of auto-statistics, a feature which is enabled in SQL Server by default. For a permanent table, auto-statistics kicks in when the first 500 rows have been added, and then every time 20 % of the rows have changed. (For full details on recompilation, see this white paper by Eric Hanson and Yavor Angelov.) Since a process-keyed table is typically empty when it is not in use, auto-statistics sets in often. Sometimes this can be a good thing, as the statistics may help the optimizer to find a better plan. But the recompilation may also cause an unacceptable performance overhead. There are three ways to deal with this:
- Disable auto-statistics for the table entirely with sp_autostats.
- Use the query hint OPTION (KEEPFIXED PLAN) for queries which are costly to recompile, and where the changed statistics are unlikely to affect the outcome of the compilation.
- Copy the data to table variable on input/output and use the table variable as work area. Only do this when you expect a small number of rows, and the plans are not dependent on the data, since a table variable does not have statistics, and the optimizer typically will work with an estimate of one row.
The Cost of Logging
Compared to sharing temp tables, one disadvantage with process-keyed tables is that you tend to put them in the same database as your other tables. This has two ramifications:
- The tables are subject to complete logging; temp tables are only logged for rollbacks, not for recovery on start-up, since tempdb is always recreated when SQL Server starts.
- If the database has full recovery, the process-keyed table will consume extra space in your transaction-log backups.
You can address the second point putting your process-keyed tables in a separate database with simple recovery. Both points can be addressed by using a memory-optimised table or a global temp table, discussed in the next two sections.
Using Memory-optimised Tables
In SQL 2014, Microsoft introduced In-Memory OLTP, also known as Hekaton, which amongst other things introduced so-called memory-optimised tables. In SQL 2014, In-Memory OLTP is only available in Enterprise and Developer Edition, but starting with SQL 2016 SP1, it is available all editions, save for localdb and the very lowest editions of Windows Azure SQL Database.
A memory-optimised table is entirely in memory. By default, updates are also logged and written to files, so that data in the table survives a restart of the server. However, you can define a memory-optimised table to have a durability only for the schema. That is, all data in the table is lost on a server restart. This makes them a perfect fit for process-keyed tables, as these tables have hardly have any logging at all.
A couple of notes:
- To be able to create memory-optimised tables in your database, you need to add a filegroup for memory-optimised data. (This is a directory, akin to what you have for FILESTREAM data.)
- The surface area for Hekaton is limited, and currently these data types are not supported: xml, text, ntext, image, sql_variant, datetimeoffset and CLR data types. In SQL 2014 there are further restrictions: there is no support for the MAX data types, and the maximum data size for a row must not exceed 8060 bytes.
- You must define a primary key for the table. This can be a bit of a bummer – some of my process-keyed tables are a bit denormalised as they contain different types of data. But you can always work around this by adding an IDENTITY column. In such case, make sure you make the second column of your key. Always have the process key as the first column. The primary key must be a non-clustered index (or a hash index).
- You cannot access tables in other databases in queries that access a memory-optimised table.
Here is an in-memory version of the stores_aid table:CREATE TABLE stores_aid (process_key uniqueidentifier NOT NULL, storeid char(4) NOT NULL, totalqty smallint NULL, CONSTRAINT pk_stores_aid PRIMARY KEY NONCLUSTERED (process_key, storeid) ) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
Global Temp Tables
Using a global temp table is another solution to reduce logging for your process-keyed table, which you mainly would choose if you are on an edition or version of SQL Server where memory-optimised tables are not available to you. (Or you bump into to any of the restrictions for such tables.)
A global temp table has a name which has two leading hash marks (e.g. ##temp). In difference to a regular temp table, a global temp table is visible to all processes. When the process that created the table goes away, so does the table (with some delay if another process is running a query against the table in that precise moment). Under that condition, it is difficult to use such a table as a process-keyed table.
However, there is a special case that SQL Server MVP Itzik Ben-Gan made me aware of: if you create a global temp table in a start-up procedure, the global temp table will be around as long as the server is up, unless someone explicitly drops it. This permits you to use the table as a process-keyed table, because you can rely on the table to always be there. Since the table is in tempdb, you get less logging that for a table in your main database.
Here is a quick sample of how you create a global temp table when SQL Server starts:USE master go CREATE PROCEDURE create_global_temp AS CREATE TABLE ##global(process_key uniqueidentifier NOT NULL, -- other columns here ) go EXEC sp_procoption create_global_temp, 'startup', 'true'
It cannot be denied that there are some problems with this solution. What if you need to change the definition of the global temp table in way that cannot be handled with ALTER TABLE? Restarting the server to get the new definition in place may not be acceptable. Or what if you have different versions of your database on a test server, and the different versions require different schemas for the process-keyed table? I would recommend that you refer to your process-keyed table through a synonym, so that in a development database or on a common test server it can point to a database-local table, and only on production or acceptance-test servers it would point to the global temp table. This also permits you retarget the synonym if the schema has to be changed without a server restart.
While process-keyed tables are not without issues when it comes to performance, and they are certainly a bit heavy-handed for the simpler cases, I still see this is the best overall solution that I present in this article. It does not come with a ton of restrictions like table-valued functions and it is robust, meaning that code will not break because of seemingly innocent changes in difference to some of the other methods we will look at later.
But that does not mean that using a process-keyed table is always the way to go. For instance, if you only need output-only, and your procedure can be written as a table-valued function, that should be your choice.
Table-valued parameters (TVP) were introduced in SQL 2008. They permit you to pass a table variable as a parameter to a stored procedure. When you create your procedure, you don't put the table definition directly in the parameter list, instead you first have to create a table type and use that in the procedure definition. At first glance, it may seem like an extra step of work, but when you think of it, it makes very much sense: you will need to declare the table in at least two places, in the caller and in the callee. So why not have the definition in one place?
Here is a quick example of a table-valued parameter in play:CREATE TYPE my_table_type AS TABLE(a int NOT NULL, b int NOT NULL) go CREATE PROCEDURE inner_sp @indata my_table_type READONLY AS INSERT targettable (col1, col2) SELECT a, b FROM @indata go CREATE PROCEDURE outer_sp AS DECLARE @data my_table_type INSERT @data (a, b) VALUES (5, 7) EXEC outer_sp @data go
One thing to note is that a table-valued parameter always has an implicit default value of an empty table. So saying in this example would not be an error.
Table-valued parameters certainly seem like the definite solution, don't they? Unfortunately, TVPs have a very limited usage for the problem I'm discussing in this article. If you look closely at the procedure definition, you find the keyword READONLY. And that is not an optional keyword, but it is compulsory for TVPs. So if you want to use TVPs to pass data between stored procedures, they are usable solely for input-only scenarios. I don't know about you, but in almost all situations where I share a temp table or use a process-keyed table it's for input-output or output-only.
When I first heard that SQL 2008 was to have TVPs, I was really excited. And when I learnt that they were readonly, I was equally disappointed. During the beta of SQL 2008 I wrote an article, Why read-only table parameters is not enough, where I tried to whip up support for a Connect item in order to persuade the dev team to permit read-write TVPs when they are passed between stored procedures inside SQL Server. (Making them read-write when called from a client is likely to be a bigger challenge.) Ten years later, the Connect item is still active, but table-valued parameters are still readonly.
Note: While outside the scope for this article, table-valued parameters is still a welcome addition to SQL Server, since it makes it a lot easier to pass a set of data from client to server, and this context the READONLY restriction is not a big deal. I give an introduction how to use TVPs from ADO .Net in my article Using Table-Valued Parameters in SQL Server and .NET.
INSERT-EXEC is a method that has been in the product for a long time. It's a method that is seemingly very appealing, because it's very simple to use and understand. Also, it permits you use the result of a stored procedure without any changes to it. Above we had the example with the procedure SalesByStore. Here is a how we can implement BigSalesByStore with INSERT-EXEC:CREATE PROCEDURE SalesByStore @storeid varchar(30) AS SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid go CREATE PROCEDURE BigSalesByStore @storeid varchar(30), @qty smallint AS CREATE TABLE #SalesByStore(title varchar(80) NOT NULL PRIMARY KEY, qty smallint NOT NULL) INSERT #SalesByStore (title, qty) EXEC SalesByStore @storeid SELECT * FROM #SalesByStore WHERE qty >= @qty go EXEC SalesByStore '7131' EXEC BigSalesByStore '7131', 25 go DROP PROCEDURE SalesByStore, BigSalesByStore
In this example, I receive the data in a temp table, but it could also be a permanent table or a table variable. (Except on SQL 2000, where you cannot use a table variable.)
It cannot be denied that this solution is simpler than the solution with sharing a temp table. So why then did I first present a more complex solution? Because when we peel off the surface, we find that this method has a couple of issues that are quite problematic.
It Can't Nest
If you for some reason would try:CREATE TABLE #BigSalesByStore(titleid varchar(80) NOT NULL PRIMARY KEY, qty smallint NOT NULL) INSERT #BigSalesByStore (titleid, qty) EXEC BigSalesByStore '7131', 25
SQL Server will tell you:
Msg 8164, Level 16, State 1, Procedure BigSalesByStore, Line 8
An INSERT EXEC statement cannot be nested.
This is a restriction in SQL Server and there is not much you can do about it. Except than to save the use of INSERT-EXEC until when you really need it. That is, when rewriting the callee is out of the question, for instance because it is a system stored procedure.
There is a Serious Maintenance Problem
Six months later there is a user requirement for the application function that uses the result set from SalesByStore that the column title_id should be displayed. A developer merrily adds the column to the result set. Unfortunately, any attempt to use the form that calls BigSalesByStore now ends in tears:
Msg 213, Level 16, State 7, Procedure SalesByStore, Line 2
Column name or number of supplied values does not match table definition.
What it says. The result set from the called procedure must match the column list in the INSERT statement exactly. The procedure may produce multiple result sets, and that's alright as long as all of them match the INSERT statement.
From my perspective, having spent a lot of my professional life with systems development, this is completely unacceptable. Yes, there are many ways to break code in SQL Server. For instance, a developer could add a new mandatory parameter to SalesByStore and that would also break BigSalesByStore. But most developers are aware the risks with such a change to an API and therefore adds a default value for the new parameter. Likewise, most developers understand that removing a column from a result set could break client code that expects that column and they would not do this without checking all code that uses the procedure. But adding a column to a result set seems so innocent. And what is really bad: there is no way to find out that there is a dependency – save searching through all the database code for calls.
Provided that you can change the procedure you are calling, there are two ways to alleviate the problem. One is simply to add a comment in the code of the callee, so that the next developer that comes around is made aware of the dependency and hopefully changes your procedure as well.
Another way is to use table types (if you are on SQL 2008 or later). Here is an example:CREATE TYPE SalesByStore_tbl AS TABLE (title varchar(80) NOT NULL PRIMARY KEY, qty smallint NOT NULL) go CREATE PROCEDURE SalesByStore @storeid varchar(30) AS DECLARE @ret SalesByStore_tbl INSERT @ret (title, qty) SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid SELECT * FROM @ret go CREATE PROCEDURE BigSalesByStore @storeid varchar(30), @qty smallint AS DECLARE @data SalesByStore_tbl INSERT @data EXEC SalesByStore @storeid SELECT title, qty FROM @data WHERE qty >= @qty go EXEC SalesByStore '7131' EXEC BigSalesByStore '7131', 25 go DROP PROCEDURE SalesByStore, BigSalesByStore DROP TYPE SalesByStore_tbl
It is interesting to note that this code makes virtue of two things that usually are bad practice, to wit SELECT * and INSERT without an explicit column list. This is not a matter of sloppiness – it is essential here. If someone wants to extend the result set of SalesByStore, the developer has to change the table type, and BigSalesByStore will survive, even if the developer does not know about its existence.
You could argue that this almost like an output TVP, but don't forget the other problems with INSERT-EXEC – of which there are two more to cover.
The Procedure is Executed in the Context of a Transaction
Even if there is no explicit transaction started with BEGIN TRANSACTION, an INSERT statement constitutes a transaction of its own. (So that the statement can be rolled back in case of an error.) That includes any procedure called through INSERT-EXEC. Is this bad or not? In many cases, this is not much of an issue. But there are a couple of situations where this can cause problems:
- The procedure performs an update intended to be quick. Locks are now held for a longer duration, which may cause contention problems.
- The isolation level is REPEATABLE READ or SERIALIZABLE, as opposed to the default READ COMMITTED. This too causes locks to be held longer than intended.
- Some system procedures disagree to be called within a transaction.
- If the procedure accesses a linked server, you now have a distributed transaction. Distributed transactions are sometimes difficult to get working. See more about this in the closing chapter on linked servers.
Rollback and Error Handling is Difficult
In my articles on Error and Transaction Handling in SQL Server, I suggest that you should always have an error handler likeBEGIN CATCH IF @@trancount > 0 ROLLBACK TRANSACTION EXEC error_handler_sp RETURN 55555 END CATCH
The idea is that even if you do not start a transaction in the procedure, you should always include a ROLLBACK, because if you were not able to fulfil your contract, the transaction is not valid.
Unfortunately, this does not work well with INSERT-EXEC. If the called procedure executes a ROLLBACK statement, this happens:
Msg 3915, Level 16, State 0, Procedure SalesByStore, Line 9
Cannot use the ROLLBACK statement within an INSERT-EXEC statement.
The execution of the stored procedure is aborted. If there is no CATCH handler anywhere, the entire batch is aborted, and the transaction is rolled back. If the INSERT-EXEC is inside TRY-CATCH, that CATCH handler will fire, but the transaction is doomed, that is, you must roll it back. The net effect is that the rollback is achieved as requested, but the original error message that triggered the rollback is lost. That may seem like a small thing, but it makes troubleshooting much more difficult, because when you see this error, all you know is that something went wrong, but you don't know what.
If you at run-time can find out whether you are in INSERT-EXEC? Hm, yes, but that is a serious kludge. See this section in Part Two of Error and Transaction Handling in SQL Server for how to do it, if you absolutely need to.
You can also use INSERT-EXEC with dynamic SQL:INSERT #tmp (...) EXEC sp_executesql @sql, @params, @par1, ...
Presumably, you have created the statement in @sql within your stored procedure, so it is unlikely that a change in the result set will go unnoticed. So from this perspective, INSERT-EXEC is fine. But the restriction that INSERT-EXEC can't nest remains, so if you use it, no one can call you with INSERT-EXEC. For this reason, in many cases it is better to put the INSERT statement inside the dynamic SQL.
There is also a performance aspect, that SQL Server MVP Adam Machanic has detailed in a blog post. The short summary is that with INSERT-EXEC, data does not go directly to the target table but bounces over a "parameter table", which incurs some overhead. Then again, if your target table is a temp table, and you put the INSERT inside the dynamic SQL, you may face a performance issue because of recompilation.
Occasionally, I see people who use INSERT-EXEC to get back scalar values from their dynamic SQL statement, which they typically invoke with EXEC(). In this case, you should not use INSERT-EXEC at all, but instead use sp_executesql which permits you to use OUTPUT parameters. Dynamic SQL is a complex topic, and if you are not acquainted with it, I recommend you to read my article The Curse and Blessings of Dynamic SQL.
INSERT-EXEC is simple to use, and if all you want to do is to grab a big result set from a stored procedure for further analysis ad hoc, it's alright.
But you should be very restrictive to use it in application code. Only use it when rewriting the procedure you are calling is completely out of the question. That is, the procedure is not part of your application: a system stored procedure or part of a third-party product. And in this case, you should make it a routine to always test your code before you take a new version of the other product in use.
Using the CLR
If INSERT-EXEC shines in its simplicity, using the CLR is complex and bulky. It is not likely to be your first choice, and nor should it. However, if you are in the situation that you cannot change the callee, and nor it possible for you to use INSERT-EXEC because of any of its limitations, the CLR can be your last resort.
As a recap, here are the main situations where INSERT-EXEC fails you, and you may want to turn to the CLR:
- The called procedure returns several result sets with different structures. This is true for many system procedures in SQL Server.
- The called procedure cannot be called within an active transaction.
- The called procedure already uses INSERT-EXEC.
- The called procedure accesses a linked server, and you cannot get the distributed transaction to work.
The CLR has one more advantage over INSERT-EXEC: it is less sensitive to changes in the procedure you call. If a column is added to the result set of the procedure, your CLR procedure will not break.
The idea as such is simple: you write a stored procedure in a CLR language like C# or VB .NET that runs the callee and captures the result set(s) into a DataSet object. Then you write the data from the DataSet back to the table where you want the data. While simple, you need to write some code.
Let's have a look at an example. When you call the system procedure sp_helpdb for a specific database, it produces two result sets, of which the second lists the files for the database. Say that you want to gather this output for all databases on the server. You cannot use INSERT-EXEC due to the multiple result sets. To address this issue, I wrote a stored procedure in C# that you find in the file helpdb.cs. In the script helpdb.sql you can see how I employ it. This script defines a temp table #helpdb which has the same structure as the output as the second result set from sp_helpdb with two alterations: 1) I have added a column dbname to hold the name of the database the row comes from. 2) I have renamed the column name to logicalname to make things a little clearer.
The C# procedure runs sp_helpdb with the DataAdapter.Fill method to get the data into a DataSet. It then inserts the data in the second DataTable in the DataSet to #helpdb. This is done in a single INSERT statement by passing the DataTable directly to a table-valued parameter. The database name is passed in a separate parameter.
Note: If you still are on SQL 2005 which does not support table-valued parameters, you can use helpdb-2005.cs which inserts the rows one by one.
Undoubtedly, this solution requires a bit of work. You need to write more code than with most other methods, and you get an assembly that you must somehow deploy. If you already are using the CLR in your database, you probably already have routines for dealing with assemblies. But if you are not, that first assembly you add to the database is quite of a step to take. A further complication is that the CLR in SQL Server is disabled by default. To enable it, you (or the DBA) need to run:EXEC sp_configure 'clr enabled', 1 RECONFIGURE
Another issue is that this solution goes against best practices for using the CLR in SQL Server. First of all, data access from the CLR should be avoided, simply because T‑SQL is better equipped for this. But here we are talking about situations where we need to circumvent limitations in T‑SQL. Another violation of best practice is the use of the DataAdapter, DataTable and DataSet classes. This is something to be avoided in CLR stored procedures, because it means that you have data in memory in SQL Server outside the buffer pool. Of course, a few megabytes is not an issue, but if you would read several gigabytes of data into a DataSet, this could have quite nasty effects for the stability of the entire SQL Server process.
The alternative is to use a plain ExecuteReader and insert the rows as they come, possibly buffering them in small sets of say 500 rows to improve performance. This is certainly a viable solution, but it makes deployment even more difficult. To wit, you cannot perform the INSERT statements on the context connection while the DataReader is running, so you would need to open a second connection and this requires that the assembly has the permission EXTERNAL_ACCESS. So for practical purposes, you would only go this road, if you are anxious that you will read too much data than what is defensible for a DataSet.
Just like INSERT-EXEC this is a method where you can use the called stored procedure as-is. The purpose of OPENQUERY and its cousin OPENROWSET is to permit you to run pass-through queries on linked servers. It can be very useful, not the least if you want to join multiple tables on the remote server and want to be sure that the join is evaluated remotely.
Instead of accessing a remote server, you can make a loopback connection to your own server, so you can say things like:SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC sp_who') WHERE status = 'runnable'
If you want to create a table from the output of a stored procedure with SELECT INTO to save typing, this is the only method in the article that fits the bill.
So far, OPENQUERY looks very simple, but as this chapter moves on you will learn that OPENQUERY can be very difficult to use. Moreover, it is not aimed at improving performance. It may save you from rewriting your stored procedure, but most likely you will have to put in more work overall – and in the end you get a poorer solution. While I'm not enthusiastic over INSERT-EXEC, it is still a far better choice than OPENQUERY.
In the example, LOCALSERVER may look like a keyword, but it is only name. This is how you define it:EXEC sp_addlinkedserver @server = 'LOCALSERVER', @srvproduct = '', @provider = 'SQLOLEDB', @datasrc = @@servername
To create a linked server, you must have the permission ALTER ANY SERVER, or be a member of any of the fixed server roles sysadmin or setupadmin. Instead of SQLOLEDB, you can specify SQLNCLI, SQLNCLI10 or SQLNCLI11 depending on your version of SQL Server. SQL Server seems to use the most recent version of the provider anyway.
Implications of Using a Loopback Connection
It's important to understand that OPENQUERY opens a new connection to SQL Server. This has some implications:
- The procedure that you call with OPENQUERY cannot refer temp tables created in the current connection.
- The new connection will connect to the default database of the current user (unless you defined otherwise when you set up the linked server), so all object specifications must include a database name to avoid surprises.
- If you have an open transaction and you are holding locks when you call OPENQUERY, the called procedure cannot access what you lock. That is, if you are not careful you will block yourself.
- Connecting is not for free, so there is a performance penalty.
- There is also a performance penalty for passing the data out from SQL Server and back. Even if there is no network involved, data is copied twice extra compared to a plain SELECT query. This can be costly if the result set is big.
The settings ANSI_NULLS and ANSI_WARNINGS must be ON for queries involving linked servers. Thankfully, these settings are also on by default in most contexts. However, if you have a database which started its life on SQL 2000 or earlier there can be problems with the ANSI_NULLS setting as it is saved with the procedure. (ANSI_WARNINGS is a pure run-time setting.) One of the tools that came with SQL 2000 defaulted to create procedures with ANSI_NULLS OFF. If you script this procedure in SSMS, SSMS with faithfully include SET ANSI_NULLS OFF in the script why the setting is retained.
You know that you are victim to this issue if you see this error message:
Msg 7405, Level 16, State 1, Line 17
Heterogeneous queries require the ANSI_NULLS and ANSI_WARNINGS options to be set for the connection. This ensures consistent query semantics. Enable these options and then reissue your query.
The remedy is to change the ANSI_NULLS setting for the procedure (by altering it when SET ANSI_NULLS ON is in effect). but you will need to check that the procedure does not include logic which depends on ANSI_NULLS being OFF. (When ANSI_NULLS is OFF a comparison with NULL will yield TRUE or FALSE, not UNKNOWN.)
The Query Parameter
The second parameter to OPENQUERY is the query to run on the remote server, and you may expect to be able to use a variable here, but you cannot. The query string must be a constant, since SQL Server needs to be able to determine the shape of the result set at compile time. This means that you as soon your query has a parameter value, you need to use dynamic SQL. Here is how to implement BigSalesByStore with OPENQUERY:CREATE FUNCTION quotestring(@str nvarchar(MAX)) RETURNS nvarchar(MAX) AS BEGIN DECLARE @ret nvarchar(MAX), @sq char(1) SELECT @sq = '''' SELECT @ret = replace(@str, @sq, @sq + @sq) RETURN(@sq + @ret + @sq) END go CREATE PROCEDURE SalesByStore @storeid varchar(30) AS SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid go CREATE PROCEDURE BigSalesByStore @storeid varchar(30), @qty smallint, @debug bit = 0 AS DECLARE @remotesql nvarchar(MAX), @localsql nvarchar(MAX) SELECT @remotesql = 'EXEC ' + quotename(db_name()) + '.dbo.SalesByStore ' + dbo.quotestring(@storeid) SELECT @localsql = 'SELECT * FROM OPENQUERY(LOCALSERVER, ' + dbo.quotestring(@remotesql) + ') WHERE qty >= @qty' IF @debug = 1 PRINT @localsql EXEC sp_executesql @localsql, N'@qty smallint', @qty go EXEC SalesByStore '7131' EXEC BigSalesByStore '7131', 25, 1 go DROP PROCEDURE BigSalesByStore, SalesByStore DROP FUNCTION quotestring
What initially seemed simple to use, is no longer so simple. What I did not say above is there are two reasons why we need dynamic SQL here. Beside the parameter @storeid, there is also the database name. Since OPENQUERY opens a loopback connection, the EXEC statement must include the database name. Yes, you could hardcode the name, but sooner or later that will bite you, if nothing else the day you want to restore a copy of your database on the same server for test purposes. From this follows that in practice, there are not many situations in application code where you can use OPENQUERY without using dynamic SQL.
The code certainly requires some explanation. The function quotestring is a helper, taken from my article on dynamic SQL. It encloses a string in single quotes, and doubles any quotes within it to conform to the T‑SQL syntax. The problem with writing dynamic SQL which involves OPENQUERY is that you get at least three levels of nested strings, and if you try to do all at once, you will find yourself writing code which has up to eight consecutive single quotes that you or no one else cannot read. Therefore, it is essential to approach the problem in a structured way like I do above. I first form the query on the remote server, and I use quotestring to embed the store id. Then I form the SQL string to execute locally, and again I use quotestring to embed the remote query. I could also have embedded @qty in the string, but I prefer to adhere to best practices and pass it as a parameter to the dynamic SQL string. As always when I use dynamic SQL, I include a @debug parameter, so that I can inspect the statement I've generated.
The Metadata Blues
To be able to compile the query, SQL Server needs to know the shape of the result set returned from the procedure. From SQL 2012 and on, SQL Server does this with the help of the procedure sp_describe_first_result_set which trawls code for the procedure as well all sub-procedures to figure out the shape of the first result set. Beware that this procedure will not always succeed, and the classic example is when temp tables are involved. (This issue does not apply to table variables, only temp tables.) Here is one example. This query:SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC msdb..sp_helpindex sysjobs')
results in this error message:
Msg 11526, Level 16, State 1, Procedure sp_describe_first_result_set, Line 1 [Batch Start Line 17]
The metadata could not be determined because statement 'insert into #spindtab values (@indname, @indid, @ignore_dup_key, @is_unique, @is_hypothetical,
@' in procedure 'sp_helpindex' uses a temp table.
There is a remedy for this, though. Starting with SQL 2012, the EXEC statement provides the clause WITH RESULT SETS that permits you to specify the shape of the result set yourself. So this returns the data as desired:SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC msdb..sp_helpindex sysjobs WITH RESULT SETS ((index_name sysname, index_description nvarchar(500), index_keys nvarchar(500)))')
Note that SQL Server will validate that the result set returned from the procedure actually aligns with what you say in WITH RESULT SETS and raise an error if not. Also observe the syntax – there are two pairs of parentheses.
Obviously, this requires that you know what the result set looks like. If you really wanted to do something likeSELECT a FROM (EXEC mysp) WHERE b = 123
And even if you don't care about the umpteen other columns returned by mysp, you still have to list them all. And if someone adds one more column to the result set, your query will break – exactly the same problem as we saw with INSERT-EXEC.
Another situation where sp_describe_first_result_set bites the dust is when there is dynamic SQL involved. Again, you can use WITH RESULT SETS to save the show. If you have the powers to change the procedure you are calling, it is better to attach the clause directly to where you execute the dynamic SQL:EXEC sp_executesql @sql, @params, @param1, ... WITH RESULT SETS ((...))
Thanks to the extra clause, sp_describe_first_result_set can return data successfully.
The Battle with FMTONLY ON in SQL 2008 and Earlier
Up to SQL 2008, SQL Server used a different mechanism to get the metadata. During compilation SQL Server would run the SQL text fed to OPENQUERY preceded by the command SET FMTONLY ON. When FMTONLY is ON, SQL Server does not execute any data-retrieving statements, but only sifts through the statements to return metadata about the result sets. This is not a very robust mechanism, and FMTONLY can be a source for confusion in more than one way, and you will see some hilarious code later on.
As with sp_describe_first_result_set, temp tables is likely to cause you grief. When FMTONLY is ON, the temp table is not created. If you run this on SQL 2008:SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC msdb..sp_helpindex sysjobs')
this results in:
Msg 208, Level 16, State 1, Procedure sp_helpindex, Line 104
Invalid object name '#spindtab'.
Here you cannot add the WITH RESULT SETS clause, as this clause was introduced in SQL 2012. However, there is a trick you can do: you can turn off FMTONLY in your batch and lure SQL Server:SELECT * FROM OPENQUERY(LOCALSERVER, 'SET FMTONLY OFF EXEC msdb..sp_helpindex sysjobs')
At first glance, this may seem like a nicer solution than having to specify the exact shape of the result set using WITH RESULT SETS, but that is absolutely not the case, for two reasons:
- The statement SET FMTONLY OFF will have absolutely no effect they day you move to SQL 2012 or later, so it is not exactly future-proof code.
- This means that the stored procedure executes twice! For a quick short thing that only reads data that may be OK. Less so, if it takes long time to run or if it perform updates.
For these reasons, I strongly recommend you to stay away from this trick! The best solution if you want to use the data from your stored procedure is to use any other of the methods described in this article, include INSERT-EXEC. But if you absolutely want to run your stored procedure with a temp table through OPENQUERY on SQL 2008 or earlier, here is a trick that at least addresses the second point above. Beware, you are about to look at one of the most obscure pieces of T‑SQL code I've ever come up with:CREATE PROCEDURE temp_temp_trick AS DECLARE @fmtonlyon int SELECT @fmtonlyon = 0 IF 1 = 0 SELECT @fmtonlyon = 1 SET FMTONLY OFF CREATE TABLE #temp(...) IF @fmtonlyon = 1 SET FMTONLY ON -- Rest of the code goes here.
What happens here is that when FMTONLY is ON, variable assignments are still carried out. When it comes to IF statements, the conditions are not evaluated, but both the IF and ELSE branches are "executed" (that is, sifted through). Thus, only when FMTONLY is ON, the flag @fmtonlyon will be set. We turn off FMTONLY before the creation of the temp table, to prevent compilation from failing, but then we restore the setting immediately after thanks to the @fmtonlyon flag.
Again, keep in mind that this is not future-proof. The day you upgrade to a newer version of SQL Server, your loopback query will fail, and you will need to change it to add WITH RESULT SETS. And, of course, since this trick only possible when you are in the position that you can change the procedure, you are better off anyway by sharing a temp table or use a process-keyed table.
If the stored procedure you call returns its result set through dynamic SQL, this may work if the SQL string is formed without reading data from any table, but if not, the query will fail with a message like this one:
Msg 7357, Level 16, State 2, Line 21
Cannot process the object "EXEC sp_who2". The OLE DB provider "SQLNCLI10" for linked server "LOCALSERVER" indicates that either the object has no columns or the current user does not have permissions on that object.
Again, the workaround with SET FMTONLY OFF, but it is still not future-proof.
Finally, an amusement with FMTONLY you may run into is that since variable assignments are carried but conditions for IF are ignored, you may get seemingly inexplicable errors. For instance, if you have a procedure that can call itself, the attempt to call it through OPENQUERY is likely to end with exceeding the maximum nest level of 32 even if the procedure has perfectly correct recursion handling.
The Effect of DML Statements
Yet a problem with OPENQUERY is demonstrated by this script:USE tempdb go CREATE TABLE nisse (a int NOT NULL) go CREATE PROCEDURE silly_sp @x int AS --SET NOCOUNT ON INSERT nisse VALUES (@x) SELECT @x * @@trancount SELECT @x * 3 go SELECT * FROM OPENQUERY(LOCALSERVER, 'EXEC tempdb.dbo.silly_sp 7') go SELECT * FROM nisse
When you run this, you get this error message:
Msg 7357, Level 16, State 1, Line 11
Cannot process the object "EXEC tempdb.dbo.silly_sp 7". The OLE DB provider "SQLNCLI11" for linked server "LOCALSERVER" indicates that either the object has no columns or the current user does not have permissions on that object.
The reason for this message is that the first "result set" is the rows affected message generated by the INSERT statement, and this message lures OPENQUERY to think that there were no columns in the result set. Adding SET NOCOUNT ON to the procedure resolves this issue. You could also add SET NOCOUNT ON the command string you pass to OPENQUERY. (And in difference to tricking with SET FMTONLY ON, this is a perfectly valid thing to do with no menacing side effects.)
When SQL Server executes the query for real, the OLE DB provider first issues SET IMPLICIT_TRANSACTIONS ON. With this setting SQL Server starts a transaction when an INSERT, UPDATE or DELETE statement is executed. (This also applies to a few more statements, see Books Online for details.) This can give some surprises. For instance, take the script above. Once SET NOCOUNT ON is in force, this is the output:
(1 row(s) affected)
(0 row(s) affected)
We get back 7 from the call to silly_sp, which indicates that @@trancount is 1, and there is thus an open transaction, despite there is no BEGIN TRANSACTION in the procedure. (We don't get the '21' that we get when we execute silly_sp directly, because with OPENQUERY, we only get one result set.) You also see that when we SELECT directly from nisse after the call to OPENQUERY, that the table is empty. This is because the implicit transaction was rolled back since it was never committed.
As you have seen, at first OPENQUERY seemed very simple to use, but the stakes rise steeply. If you are still considering to use OPENQUERY after having read this section, I can only wish you good luck and I hope that you really understand what you are doing. OPENQUERY was not intended for accessing the local server, and you should think twice before you use it that way.
XML is a solution that aims at the same spot as sharing a temp table and process-keyed tables. That is, the realm of general solutions without restrictions, to the price of a little more work. While SQL 2000 has support for XML, if you want to use XML to pass data between stored procedures, you need to have at least SQL 2005.
Constructing the XML
We will look at a version of SalesByStore and BigSalesByStore which uses XML, but since this is a little too much to digest in one go, we first only look at SalesByStore_core to see how we construct the XML:CREATE PROCEDURE SalesByStore_core @storeid varchar(30), @xmldata xml OUTPUT AS SET @xmldata = ( SELECT t.title, s.qty FROM sales s JOIN titles t ON t.title_id = s.title_id WHERE s.stor_id = @storeid FOR XML RAW('SalesByStore'), TYPE) go
In the previous version of SalesByStore_core, we stored the data from the result in a temp table. Here we use FOR XML RAW to generate an XML document that we save to the output parameter @xmldata.
This is how the resulting XML document may look like (with one title abbreviated for space):<SalesByStore title="Is Anger the Enemy?" qty="20" /> <SalesByStore title="The Gourmet Microwave" qty="25" /> <SalesByStore title="Computer Phobic AND Non-Phobic Individuals" qty="20" /> <SalesByStore title="Life Without Fear" qty="25" /> <SalesByStore title="Prolonged Data Deprivation: Four Case Studies" qty="15" /> <SalesByStore title="Emotional Security: A New Algorithm" qty="25" />
FOR XML has three more options beside RAW: AUTO, ELEMENTS and PATH, but for our purposes here, RAW is the simplest to use. You don't have to specify a name for the elements; the default in this case will be row, but I would suggest that using a name is good for clarity.
The keyword TYPE ensures that the return type of the SELECT query is the xml data type; without TYPE the type would be nvarchar(MAX). TYPE is not needed here, since there will be an implicit conversion to xml anyway, but it can be considered good practice to include it. Except... there is a bug which makes XML documents created with TYPE to be less efficient. The Connect item is still open, but I have not re-evaluated the repro in the Connect item to verify that the issue still exists.
Converting the XML Data Back to Tabular Format
Since SalesByStore should work like it did originally, it has to convert the data back to tabular format, a process known as shredding. Here is how the XML version looks like:CREATE PROCEDURE SalesByStore @storeid varchar(30) AS DECLARE @xmldata xml EXEC SalesByStore_core @storeid, @xmldata OUTPUT SELECT T.c.value('@title', 'varchar(80)') AS title, T.c.value('@qty', 'smallint') AS qty FROM @xmldata.nodes('SalesByStore') AS T(c) go
To shred the document, we use two of the xml type methods. The first is nodes which shreds the documents into fragments of a single element. That is, this part:FROM @xmldata.nodes('SalesByStore') AS T(c)
The part T(c) defines an alias for the one-column table as well as an alias for the column. To get the values out of the fragments, we use another xml type method, value. The value method takes two arguments whereof the first addresses the value we want to extract, and the second specifies the data type. The first parameter is a fairly complex story, but as long as you follow the example above, you don't really need to know any more. Just keep in mind that you must put an @ before the attribute names, else you would be addressing an element. In the XML section of my article Arrays and Lists in SQL Server (The Long Version), I have some more information about nodes and value.
To make the example complete, here is the XML version of BigSalesByStore. To avoid having to repeat the call to value in the WHERE clause, I use a CTE (Common Table Expression).CREATE PROCEDURE BigSalesByStore @storeid varchar(30), @qty smallint AS DECLARE @xmldata xml EXEC SalesByStore_core @storeid, @xmldata OUTPUT ; WITH SalesByStore AS ( SELECT T.c.value('@title', 'varchar(80)') AS title, T.c.value('@qty', 'smallint') AS qty FROM @xmldata.nodes('SalesByStore') AS T(c) ) SELECT title, qty FROM SalesByStore WHERE qty >= @qty go
Input and Output
In this example the XML document is output-only, but it's easy to see that the same method can be used for input-only scenarios. The caller builds the XML document and the callee shreds it back to a table.
What about input-output scenarios like the procedure ComputeTotalStoreQty? One possibility is that the callee shreds the data into a temp table, performs its operation, and then converts the data back to XML. A second alternative is that the callee modifies the XML directly using the xml type method modify. I will spare you from an example of this, however, as it unlikely that you would try it, unless you already are proficient in XQuery. A better alternative may be to mix methods: use a table-valued parameter for input and only use XML for output.
The result set in the example is from a single table, but what if we have some form of parent/child-relationship? Say that we want to return the name of all authors, as well as all the titles they have written. With temp tables or process-keyed tables, the natural solution would be to use two tables (or actually three, since in pubs there is a many-to-many relationship between titles and authors, which I overlook here.) But since XML is hierarchical, it would be more natural to put everything in a single XML document, and here is a query to do this:SELECT a.au_id , a.au_lname, a.au_fname , (SELECT t.title FROM pubs..titleauthor ta JOIN pubs..titles t ON t.title_id = ta.title_id WHERE a.au_id = ta.au_id FOR XML RAW('titles'), TYPE) FROM pubs..authors a FOR XML RAW('authors'), TYPE
Rather than a regular join query, I use a subquery for the titles, because I only want one node per author with all titles. With a join, I get one author node for each title, so that authors with many books appear in multiple nodes. The subquery uses FOR XML to create a nested XML document, and this time the TYPE option is mandatory, since without it the nested XML data would be included as a plain string.
To retrieve the titles from the XML document, you could use this query:SELECT au_id = A.item.value('@au_id', 'varchar(11)'), title = T.item.value('@title', 'varchar(80)') FROM @x.nodes('/authors') AS A(item) CROSS APPLY A.item.nodes('titles') AS T(item)
The first call to nodes gives you a fragment per authors node, and then you use CROSS APPLY to dig down to the titles node. For a little longer discussion on this way of shredding a hierarchical XML document, see the XML section of my article Arrays and Lists in SQL Server (The Long Version).
Assessing the Method
So far the technique to use this method. Let's now assess it. If you have never worked with XML in SQL Server, you are probably saying to yourself I will never use that!. And one can hardly blame you. This method is like pushing the table camel through the needles eye of the parameter list of a stored procedure. Personally, I think the method spells k-l-u-d-g-e. But it's certainly a matter of opinion. I got a mail from David Walker, and he went as far as saying this is the only method that really works.
And, that cannot be denied, there are certainly advantages with XML over about all the other methods I have presented here. It is less contrived than using the CLR, and it is definitely a better option than OPENQUERY. You are not caught up with the limitations of table-valued functions. Nor do you have any of the issues with INSERT-EXEC. Compared to temp tables and process-keyed tables, you don't have to be worried about recompilation or that programmers fail to clean up a process-keyed table after use.
When it comes to performance, you get some cost for building the XML document and shredding it shortly thereafter. Then again, as long as the amount of data is small, say less than 200 KB, the data will stay in memory and there is no logging involved like when you use a table of any sort. Larger XML documents will spill to disk, though. A general caveat is that inappropriate addressing in a large XML document can be a real performance killer, so if you expect large amounts of data, you have to be careful. (And these issues can appear with sizes below 200 KB.)
Besides the daunting complexity, there are downsides with XML from a robustness perspective. XML is more sensitive to errors. If you make a spelling mistake in the first argument to value, you will silently get NULL back, and no error message. Likewise, if you get the argument to nodes