I recently worked with a team on a project to improve the performance of a legacy system, and one of the issues we encountered was poor insert performance across linked servers.
During the investigation phase, we found a number of cases where medium-sized datasets (50k-150k records) were taking considerable time (30-70s) to insert from one server to another across a linked server. A stored procedure similar to the following was being used to insert data from SQL01 into a table on SQL02:
USE DB01; -- Database on SQL01
GO
CREATE PROCEDURE dbo.DataInsertToSQL02
AS
BEGIN
INSERT INTO SQL02.DB02.dbo.Table02 (Id, ...)
SELECT Id, ... FROM dbo.Table01;
END
GO
In this scenario, SQL Server does not insert the data as a set. Instead, it performs a row-by-row insert against the remote server, so for 150k records the insert is executed 150k times.
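This behaviour can be confirmed by tracing what arrives at SQL02 while the push runs; the activity typically shows up as a long stream of per-row calls rather than a single INSERT. A minimal sketch using Extended Events, assuming sufficient permissions on SQL02 (the session name LinkedServerInserts is illustrative):
-- On SQL02: capture the calls arriving from SQL01 during the push
CREATE EVENT SESSION LinkedServerInserts ON SERVER
ADD EVENT sqlserver.rpc_completed, -- per-row remote inserts typically arrive as RPC/cursor calls
ADD EVENT sqlserver.sql_statement_completed
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION LinkedServerInserts ON SERVER STATE = START;
GO
-- ...run dbo.DataInsertToSQL02 on SQL01, then inspect the ring buffer target...
ALTER EVENT SESSION LinkedServerInserts ON SERVER STATE = STOP;
GO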
Our solution was to move the procedure from SQL01 to SQL02 and execute it from SQL01:
USE DB02; -- Database on SQL02
GO
CREATE PROCEDURE dbo.DataInsertFromSQL01
AS
BEGIN
INSERT INTO dbo.Table02 (Id, ...)
SELECT Id, ... FROM SQL01.DB01.dbo.Table01;
END
GO
USE DB01;
GO
EXECUTE SQL02.DB02.dbo.DataInsertFromSQL01;
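Note that calling a stored procedure on a linked server by its four-part name requires the linked server's rpc out option, which is disabled by default. Assuming the linked server on SQL01 is named SQL02, it can be enabled as follows:
-- On SQL01: allow remote procedure calls against the SQL02 linked server
EXEC master.dbo.sp_serveroption
    @server = N'SQL02',
    @optname = N'rpc out',
    @optvalue = N'true';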
In this scenario, SQL Server selects and inserts the data as a set. By applying this change to an as-live environment, we observed that most query times dropped to 3s or less and, on average, query times were 15 times faster for datasets up to 150k records.
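If creating a procedure on the remote server is not an option, the same pull pattern can be achieved ad hoc with EXECUTE ... AT, which runs the statement on the linked server itself (a sketch using the table names above; this also requires rpc out):
-- On SQL01: run the pull on SQL02 without deploying a procedure there
EXECUTE (N'INSERT INTO DB02.dbo.Table02 (Id) SELECT Id FROM SQL01.DB01.dbo.Table01;') AT SQL02;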
Conclusion:
When inserting data across linked servers, it is important to “pull” rather than “push” the data to maximise performance.