Merge/Intersection between table and dataset - How to achieve?
Clash Royale CLAN TAG#URR8PPP
Consider the following table:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0
(Id is an identity column)
In to that table I want to merge the following dataset:
Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06
(Id is NOT present in the dataset)
The rules for the merge are:
- If the hash does not exist in the table, insert it to the table;
- If the hash does not exist in the dataset, delete it from the table;
- If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.
The result of the merge should leave the table looking like this:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06
What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.
sql-server insert merge
add a comment |
Consider the following table:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0
(Id is an identity column)
In to that table I want to merge the following dataset:
Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06
(Id is NOT present in the dataset)
The rules for the merge are:
- If the hash does not exist in the table, insert it to the table;
- If the hash does not exist in the dataset, delete it from the table;
- If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.
The result of the merge should leave the table looking like this:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06
What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.
sql-server insert merge
add a comment |
Consider the following table:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0
(Id is an identity column)
In to that table I want to merge the following dataset:
Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06
(Id is NOT present in the dataset)
The rules for the merge are:
- If the hash does not exist in the table, insert it to the table;
- If the hash does not exist in the dataset, delete it from the table;
- If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.
The result of the merge should leave the table looking like this:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06
What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.
sql-server insert merge
Consider the following table:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0
(Id is an identity column)
In to that table I want to merge the following dataset:
Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06
(Id is NOT present in the dataset)
The rules for the merge are:
- If the hash does not exist in the table, insert it to the table;
- If the hash does not exist in the dataset, delete it from the table;
- If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.
The result of the merge should leave the table looking like this:
Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06
What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.
sql-server insert merge
sql-server insert merge
edited Jan 31 at 9:29
James
asked Jan 31 at 9:14
JamesJames
1186
1186
add a comment |
add a comment |
1 Answer
1
active
oldest
votes
Using the sample data:
DECLARE @T table
(
Id integer IDENTITY NOT NULL PRIMARY KEY,
[Hash] binary(16) NOT NULL INDEX h
);
INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);
DECLARE @S table
(
[Hash] binary(16) NOT NULL
);
INSERT @S
([Hash])
VALUES
(0x31F777F0804D301936411E3ECD760859),
(0x31F777F0804D301936411E3ECD760859),
(0x0C5A65264F92A543E7AAA06375349C06);
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
db<>fiddle
But for performance reasons (as well as some bugs), I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
db<>fiddle
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
1 Answer
1
active
oldest
votes
1 Answer
1
active
oldest
votes
active
oldest
votes
active
oldest
votes
Using the sample data:
DECLARE @T table
(
Id integer IDENTITY NOT NULL PRIMARY KEY,
[Hash] binary(16) NOT NULL INDEX h
);
INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);
DECLARE @S table
(
[Hash] binary(16) NOT NULL
);
INSERT @S
([Hash])
VALUES
(0x31F777F0804D301936411E3ECD760859),
(0x31F777F0804D301936411E3ECD760859),
(0x0C5A65264F92A543E7AAA06375349C06);
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
db<>fiddle
But for performance reasons (as well as some bugs), I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
db<>fiddle
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
add a comment |
Using the sample data:
DECLARE @T table
(
Id integer IDENTITY NOT NULL PRIMARY KEY,
[Hash] binary(16) NOT NULL INDEX h
);
INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);
DECLARE @S table
(
[Hash] binary(16) NOT NULL
);
INSERT @S
([Hash])
VALUES
(0x31F777F0804D301936411E3ECD760859),
(0x31F777F0804D301936411E3ECD760859),
(0x0C5A65264F92A543E7AAA06375349C06);
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
db<>fiddle
But for performance reasons (as well as some bugs), I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
db<>fiddle
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
add a comment |
Using the sample data:
DECLARE @T table
(
Id integer IDENTITY NOT NULL PRIMARY KEY,
[Hash] binary(16) NOT NULL INDEX h
);
INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);
DECLARE @S table
(
[Hash] binary(16) NOT NULL
);
INSERT @S
([Hash])
VALUES
(0x31F777F0804D301936411E3ECD760859),
(0x31F777F0804D301936411E3ECD760859),
(0x0C5A65264F92A543E7AAA06375349C06);
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
db<>fiddle
But for performance reasons (as well as some bugs), I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
db<>fiddle
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
Using the sample data:
DECLARE @T table
(
Id integer IDENTITY NOT NULL PRIMARY KEY,
[Hash] binary(16) NOT NULL INDEX h
);
INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);
DECLARE @S table
(
[Hash] binary(16) NOT NULL
);
INSERT @S
([Hash])
VALUES
(0x31F777F0804D301936411E3ECD760859),
(0x31F777F0804D301936411E3ECD760859),
(0x0C5A65264F92A543E7AAA06375349C06);
You could write it as a MERGE
:
WITH
T AS
(
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
),
S AS
(
SELECT DISTINCT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
)
MERGE T
USING S
ON S.[Hash] = T.[Hash]
AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;
db<>fiddle
But for performance reasons (as well as some bugs), I would normally write it as two separate statements:
WITH ToDelete AS
(
SELECT
T.*
FROM @T AS T
WHERE
NOT EXISTS
(
SELECT
S.*
FROM @S AS S
WHERE
S.[Hash] = T.[Hash]
)
)
DELETE ToDelete;
WITH ToInsert AS
(
SELECT
S.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY S.[Hash]
ORDER BY S.[Hash])
FROM @S AS S
EXCEPT
SELECT
T.[Hash],
rn = ROW_NUMBER() OVER (
PARTITION BY T.[Hash]
ORDER BY T.[Hash], T.Id)
FROM @T AS T
)
INSERT @T
([Hash])
SELECT
ToInsert.[Hash]
FROM ToInsert;
db<>fiddle
You should have a unique index on the target on ([Hash], [Id])
. Chances are you have this already, or equivalently an index on [Hash]
and a unique (possibly clustered) index on [Id]
.
There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.
edited Jan 31 at 13:25
answered Jan 31 at 10:26
Paul White♦Paul White
52.2k14278451
52.2k14278451
add a comment |
add a comment |
Thanks for contributing an answer to Database Administrators Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown