Merge/Intersection between table and dataset

Merge/Intersection between table and dataset - How to achieve?

Consider the following table:

Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0

(Id is an identity column)

In to that table I want to merge the following dataset:

Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06

(Id is NOT present in the dataset)

The rules for the merge are:

If the hash does not exist in the table, insert it to the table;

If the hash does not exist in the dataset, delete it from the table;

If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:

Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.

edited Jan 31 at 9:29

asked Jan 31 at 9:14

James

1186

add a comment |

Consider the following table:

Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0

(Id is an identity column)

In to that table I want to merge the following dataset:

Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06

(Id is NOT present in the dataset)

The rules for the merge are:

If the hash does not exist in the table, insert it to the table;

If the hash does not exist in the dataset, delete it from the table;

If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:

Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.

edited Jan 31 at 9:29

asked Jan 31 at 9:14

James

1186

add a comment |

Consider the following table:

Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0

(Id is an identity column)

In to that table I want to merge the following dataset:

Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06

(Id is NOT present in the dataset)

The rules for the merge are:

If the hash does not exist in the table, insert it to the table;

If the hash does not exist in the dataset, delete it from the table;

If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:

Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.

edited Jan 31 at 9:29

asked Jan 31 at 9:14

James

1186

Consider the following table:

Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0

(Id is an identity column)

In to that table I want to merge the following dataset:

Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06

(Id is NOT present in the dataset)

The rules for the merge are:

If the hash does not exist in the table, insert it to the table;

If the hash does not exist in the dataset, delete it from the table;

If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:

Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06

What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.

sql-server insert merge

edited Jan 31 at 9:29

asked Jan 31 at 9:14

James

1186

edited Jan 31 at 9:29

asked Jan 31 at 9:14

James

1186

edited Jan 31 at 9:29

asked Jan 31 at 9:14

James

1186

asked Jan 31 at 9:14

James

1186

asked Jan 31 at 9:14

James

1186

add a comment |

1 Answer
1

active

oldest

votes

Using the sample data:

DECLARE @T table
(
 Id integer IDENTITY NOT NULL PRIMARY KEY, 
 [Hash] binary(16) NOT NULL INDEX h
);

INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

DECLARE @S table 
(
 [Hash] binary(16) NOT NULL
);

INSERT @S
 ([Hash])
VALUES
 (0x31F777F0804D301936411E3ECD760859),
 (0x31F777F0804D301936411E3ECD760859),
 (0x0C5A65264F92A543E7AAA06375349C06);

You could write it as a MERGE:

WITH
 T AS
 (
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
 ),
 S AS
 (
 SELECT DISTINCT
 S.[Hash],
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 )
MERGE T
USING S
 ON S.[Hash] = T.[Hash]
 AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;

db<>fiddle

But for performance reasons (as well as some bugs), I would normally write it as two separate statements:

WITH ToDelete AS
(
 SELECT
 T.*
 FROM @T AS T
 WHERE 
 NOT EXISTS 
 (
 SELECT
 S.* 
 FROM @S AS S 
 WHERE 
 S.[Hash] = T.[Hash]
 )
)
DELETE ToDelete;

WITH ToInsert AS
(
 SELECT
 S.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 EXCEPT
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
)
INSERT @T
 ([Hash])
SELECT
 ToInsert.[Hash]
FROM ToInsert;

db<>fiddle

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

edited Jan 31 at 13:25

answered Jan 31 at 10:26

Paul White♦

52.2k14278451

add a comment |

Your Answer

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "182"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');

);

Post as a guest

Name

Required, but never shown

1 Answer
1

active

oldest

votes

1 Answer
1

active

oldest

votes

Using the sample data:

DECLARE @T table
(
 Id integer IDENTITY NOT NULL PRIMARY KEY, 
 [Hash] binary(16) NOT NULL INDEX h
);

INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

DECLARE @S table 
(
 [Hash] binary(16) NOT NULL
);

INSERT @S
 ([Hash])
VALUES
 (0x31F777F0804D301936411E3ECD760859),
 (0x31F777F0804D301936411E3ECD760859),
 (0x0C5A65264F92A543E7AAA06375349C06);

You could write it as a MERGE:

WITH
 T AS
 (
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
 ),
 S AS
 (
 SELECT DISTINCT
 S.[Hash],
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 )
MERGE T
USING S
 ON S.[Hash] = T.[Hash]
 AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;

db<>fiddle

But for performance reasons (as well as some bugs), I would normally write it as two separate statements:

WITH ToDelete AS
(
 SELECT
 T.*
 FROM @T AS T
 WHERE 
 NOT EXISTS 
 (
 SELECT
 S.* 
 FROM @S AS S 
 WHERE 
 S.[Hash] = T.[Hash]
 )
)
DELETE ToDelete;

WITH ToInsert AS
(
 SELECT
 S.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 EXCEPT
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
)
INSERT @T
 ([Hash])
SELECT
 ToInsert.[Hash]
FROM ToInsert;

db<>fiddle

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

edited Jan 31 at 13:25

answered Jan 31 at 10:26

Paul White♦

52.2k14278451

add a comment |

Using the sample data:

DECLARE @T table
(
 Id integer IDENTITY NOT NULL PRIMARY KEY, 
 [Hash] binary(16) NOT NULL INDEX h
);

INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

DECLARE @S table 
(
 [Hash] binary(16) NOT NULL
);

INSERT @S
 ([Hash])
VALUES
 (0x31F777F0804D301936411E3ECD760859),
 (0x31F777F0804D301936411E3ECD760859),
 (0x0C5A65264F92A543E7AAA06375349C06);

You could write it as a MERGE:

WITH
 T AS
 (
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
 ),
 S AS
 (
 SELECT DISTINCT
 S.[Hash],
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 )
MERGE T
USING S
 ON S.[Hash] = T.[Hash]
 AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;

db<>fiddle

But for performance reasons (as well as some bugs), I would normally write it as two separate statements:

WITH ToDelete AS
(
 SELECT
 T.*
 FROM @T AS T
 WHERE 
 NOT EXISTS 
 (
 SELECT
 S.* 
 FROM @S AS S 
 WHERE 
 S.[Hash] = T.[Hash]
 )
)
DELETE ToDelete;

WITH ToInsert AS
(
 SELECT
 S.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 EXCEPT
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
)
INSERT @T
 ([Hash])
SELECT
 ToInsert.[Hash]
FROM ToInsert;

db<>fiddle

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

edited Jan 31 at 13:25

answered Jan 31 at 10:26

Paul White♦

52.2k14278451

add a comment |

Using the sample data:

DECLARE @T table
(
 Id integer IDENTITY NOT NULL PRIMARY KEY, 
 [Hash] binary(16) NOT NULL INDEX h
);

INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

DECLARE @S table 
(
 [Hash] binary(16) NOT NULL
);

INSERT @S
 ([Hash])
VALUES
 (0x31F777F0804D301936411E3ECD760859),
 (0x31F777F0804D301936411E3ECD760859),
 (0x0C5A65264F92A543E7AAA06375349C06);

You could write it as a MERGE:

WITH
 T AS
 (
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
 ),
 S AS
 (
 SELECT DISTINCT
 S.[Hash],
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 )
MERGE T
USING S
 ON S.[Hash] = T.[Hash]
 AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;

db<>fiddle

But for performance reasons (as well as some bugs), I would normally write it as two separate statements:

WITH ToDelete AS
(
 SELECT
 T.*
 FROM @T AS T
 WHERE 
 NOT EXISTS 
 (
 SELECT
 S.* 
 FROM @S AS S 
 WHERE 
 S.[Hash] = T.[Hash]
 )
)
DELETE ToDelete;

WITH ToInsert AS
(
 SELECT
 S.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 EXCEPT
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
)
INSERT @T
 ([Hash])
SELECT
 ToInsert.[Hash]
FROM ToInsert;

db<>fiddle

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

edited Jan 31 at 13:25

answered Jan 31 at 10:26

Paul White♦

52.2k14278451

Using the sample data:

DECLARE @T table
(
 Id integer IDENTITY NOT NULL PRIMARY KEY, 
 [Hash] binary(16) NOT NULL INDEX h
);

INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

DECLARE @S table 
(
 [Hash] binary(16) NOT NULL
);

INSERT @S
 ([Hash])
VALUES
 (0x31F777F0804D301936411E3ECD760859),
 (0x31F777F0804D301936411E3ECD760859),
 (0x0C5A65264F92A543E7AAA06375349C06);

You could write it as a MERGE:

WITH
 T AS
 (
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
 ),
 S AS
 (
 SELECT DISTINCT
 S.[Hash],
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 )
MERGE T
USING S
 ON S.[Hash] = T.[Hash]
 AND S.rn = T.rn
WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
WHEN NOT MATCHED BY SOURCE THEN DELETE;

db<>fiddle

But for performance reasons (as well as some bugs), I would normally write it as two separate statements:

WITH ToDelete AS
(
 SELECT
 T.*
 FROM @T AS T
 WHERE 
 NOT EXISTS 
 (
 SELECT
 S.* 
 FROM @S AS S 
 WHERE 
 S.[Hash] = T.[Hash]
 )
)
DELETE ToDelete;

WITH ToInsert AS
(
 SELECT
 S.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY S.[Hash] 
 ORDER BY S.[Hash])
 FROM @S AS S
 EXCEPT
 SELECT
 T.[Hash], 
 rn = ROW_NUMBER() OVER (
 PARTITION BY T.[Hash] 
 ORDER BY T.[Hash], T.Id)
 FROM @T AS T
)
INSERT @T
 ([Hash])
SELECT
 ToInsert.[Hash]
FROM ToInsert;

db<>fiddle

You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].

There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.

edited Jan 31 at 13:25

answered Jan 31 at 10:26

Paul White♦

52.2k14278451

edited Jan 31 at 13:25

answered Jan 31 at 10:26

Paul White♦

52.2k14278451

answered Jan 31 at 10:26

Paul White♦

52.2k14278451

answered Jan 31 at 10:26

Paul White♦

52.2k14278451

add a comment |

draft saved

draft discarded

Thanks for contributing an answer to Database Administrators Stack Exchange!

Please be sure to answer the question. Provide details and share your research!

But avoid …

Asking for help, clarification, or responding to other answers.

Making statements based on opinion; back them up with references or personal experience.

To learn more, see our tips on writing great answers.

draft saved

draft discarded

Post as a guest

Name

Required, but never shown

Name

Required, but never shown

Name

Required, but never shown

搜尋此網誌

mjhjmtu