Merge/Intersection between table and dataset - How to achieve?

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP












2















Consider the following table:



Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
2 0xD64A593F3E9ACC972158D522A4289EA0

(Id is an identity column)


In to that table I want to merge the following dataset:



Hash
----------------------------------
0x31F777F0804D301936411E3ECD760859
0x31F777F0804D301936411E3ECD760859
0x0C5A65264F92A543E7AAA06375349C06

(Id is NOT present in the dataset)


The rules for the merge are:



  • If the hash does not exist in the table, insert it to the table;

  • If the hash does not exist in the dataset, delete it from the table;

  • If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

The result of the merge should leave the table looking like this:



Id Hash
----------- ----------------------------------
1 0x31F777F0804D301936411E3ECD760859
3 0x31F777F0804D301936411E3ECD760859
4 0x0C5A65264F92A543E7AAA06375349C06


What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.










share|improve this question




























    2















    Consider the following table:



    Id Hash
    ----------- ----------------------------------
    1 0x31F777F0804D301936411E3ECD760859
    2 0xD64A593F3E9ACC972158D522A4289EA0

    (Id is an identity column)


    In to that table I want to merge the following dataset:



    Hash
    ----------------------------------
    0x31F777F0804D301936411E3ECD760859
    0x31F777F0804D301936411E3ECD760859
    0x0C5A65264F92A543E7AAA06375349C06

    (Id is NOT present in the dataset)


    The rules for the merge are:



    • If the hash does not exist in the table, insert it to the table;

    • If the hash does not exist in the dataset, delete it from the table;

    • If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

    The result of the merge should leave the table looking like this:



    Id Hash
    ----------- ----------------------------------
    1 0x31F777F0804D301936411E3ECD760859
    3 0x31F777F0804D301936411E3ECD760859
    4 0x0C5A65264F92A543E7AAA06375349C06


    What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.










    share|improve this question


























      2












      2








      2


      1






      Consider the following table:



      Id Hash
      ----------- ----------------------------------
      1 0x31F777F0804D301936411E3ECD760859
      2 0xD64A593F3E9ACC972158D522A4289EA0

      (Id is an identity column)


      In to that table I want to merge the following dataset:



      Hash
      ----------------------------------
      0x31F777F0804D301936411E3ECD760859
      0x31F777F0804D301936411E3ECD760859
      0x0C5A65264F92A543E7AAA06375349C06

      (Id is NOT present in the dataset)


      The rules for the merge are:



      • If the hash does not exist in the table, insert it to the table;

      • If the hash does not exist in the dataset, delete it from the table;

      • If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

      The result of the merge should leave the table looking like this:



      Id Hash
      ----------- ----------------------------------
      1 0x31F777F0804D301936411E3ECD760859
      3 0x31F777F0804D301936411E3ECD760859
      4 0x0C5A65264F92A543E7AAA06375349C06


      What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.










      share|improve this question
















      Consider the following table:



      Id Hash
      ----------- ----------------------------------
      1 0x31F777F0804D301936411E3ECD760859
      2 0xD64A593F3E9ACC972158D522A4289EA0

      (Id is an identity column)


      In to that table I want to merge the following dataset:



      Hash
      ----------------------------------
      0x31F777F0804D301936411E3ECD760859
      0x31F777F0804D301936411E3ECD760859
      0x0C5A65264F92A543E7AAA06375349C06

      (Id is NOT present in the dataset)


      The rules for the merge are:



      • If the hash does not exist in the table, insert it to the table;

      • If the hash does not exist in the dataset, delete it from the table;

      • If the hash does exist on both sides, and there are X instances in the table and Y instances in the source, then (Y-X) instances should be inserted in to the table.

      The result of the merge should leave the table looking like this:



      Id Hash
      ----------- ----------------------------------
      1 0x31F777F0804D301936411E3ECD760859
      3 0x31F777F0804D301936411E3ECD760859
      4 0x0C5A65264F92A543E7AAA06375349C06


      What is the most efficient way to write a query to achieve this operation? FYI, other columns are omitted for brevity.







      sql-server insert merge






      share|improve this question















      share|improve this question













      share|improve this question




      share|improve this question








      edited Jan 31 at 9:29







      James

















      asked Jan 31 at 9:14









      JamesJames

      1186




      1186




















          1 Answer
          1






          active

          oldest

          votes


















          5














          Using the sample data:



          DECLARE @T table
          (
          Id integer IDENTITY NOT NULL PRIMARY KEY,
          [Hash] binary(16) NOT NULL INDEX h
          );

          INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
          INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

          DECLARE @S table
          (
          [Hash] binary(16) NOT NULL
          );

          INSERT @S
          ([Hash])
          VALUES
          (0x31F777F0804D301936411E3ECD760859),
          (0x31F777F0804D301936411E3ECD760859),
          (0x0C5A65264F92A543E7AAA06375349C06);


          You could write it as a MERGE:



          WITH
          T AS
          (
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          ),
          S AS
          (
          SELECT DISTINCT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          )
          MERGE T
          USING S
          ON S.[Hash] = T.[Hash]
          AND S.rn = T.rn
          WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
          WHEN NOT MATCHED BY SOURCE THEN DELETE;


          db<>fiddle



          But for performance reasons (as well as some bugs), I would normally write it as two separate statements:



          WITH ToDelete AS
          (
          SELECT
          T.*
          FROM @T AS T
          WHERE
          NOT EXISTS
          (
          SELECT
          S.*
          FROM @S AS S
          WHERE
          S.[Hash] = T.[Hash]
          )
          )
          DELETE ToDelete;


          WITH ToInsert AS
          (
          SELECT
          S.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY S.[Hash]
          ORDER BY S.[Hash])
          FROM @S AS S
          EXCEPT
          SELECT
          T.[Hash],
          rn = ROW_NUMBER() OVER (
          PARTITION BY T.[Hash]
          ORDER BY T.[Hash], T.Id)
          FROM @T AS T
          )
          INSERT @T
          ([Hash])
          SELECT
          ToInsert.[Hash]
          FROM ToInsert;


          db<>fiddle



          You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



          There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.






          share|improve this answer
























            Your Answer








            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "182"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            autoActivateHeartbeat: false,
            convertImagesToLinks: false,
            noModals: true,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: null,
            bindNavPrevention: true,
            postfix: "",
            imageUploader:
            brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
            contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
            allowUrls: true
            ,
            onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













            draft saved

            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');

            );

            Post as a guest















            Required, but never shown

























            1 Answer
            1






            active

            oldest

            votes








            1 Answer
            1






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes









            5














            Using the sample data:



            DECLARE @T table
            (
            Id integer IDENTITY NOT NULL PRIMARY KEY,
            [Hash] binary(16) NOT NULL INDEX h
            );

            INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
            INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

            DECLARE @S table
            (
            [Hash] binary(16) NOT NULL
            );

            INSERT @S
            ([Hash])
            VALUES
            (0x31F777F0804D301936411E3ECD760859),
            (0x31F777F0804D301936411E3ECD760859),
            (0x0C5A65264F92A543E7AAA06375349C06);


            You could write it as a MERGE:



            WITH
            T AS
            (
            SELECT
            T.[Hash],
            rn = ROW_NUMBER() OVER (
            PARTITION BY T.[Hash]
            ORDER BY T.[Hash], T.Id)
            FROM @T AS T
            ),
            S AS
            (
            SELECT DISTINCT
            S.[Hash],
            rn = ROW_NUMBER() OVER (
            PARTITION BY S.[Hash]
            ORDER BY S.[Hash])
            FROM @S AS S
            )
            MERGE T
            USING S
            ON S.[Hash] = T.[Hash]
            AND S.rn = T.rn
            WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
            WHEN NOT MATCHED BY SOURCE THEN DELETE;


            db<>fiddle



            But for performance reasons (as well as some bugs), I would normally write it as two separate statements:



            WITH ToDelete AS
            (
            SELECT
            T.*
            FROM @T AS T
            WHERE
            NOT EXISTS
            (
            SELECT
            S.*
            FROM @S AS S
            WHERE
            S.[Hash] = T.[Hash]
            )
            )
            DELETE ToDelete;


            WITH ToInsert AS
            (
            SELECT
            S.[Hash],
            rn = ROW_NUMBER() OVER (
            PARTITION BY S.[Hash]
            ORDER BY S.[Hash])
            FROM @S AS S
            EXCEPT
            SELECT
            T.[Hash],
            rn = ROW_NUMBER() OVER (
            PARTITION BY T.[Hash]
            ORDER BY T.[Hash], T.Id)
            FROM @T AS T
            )
            INSERT @T
            ([Hash])
            SELECT
            ToInsert.[Hash]
            FROM ToInsert;


            db<>fiddle



            You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



            There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.






            share|improve this answer





























              5














              Using the sample data:



              DECLARE @T table
              (
              Id integer IDENTITY NOT NULL PRIMARY KEY,
              [Hash] binary(16) NOT NULL INDEX h
              );

              INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
              INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

              DECLARE @S table
              (
              [Hash] binary(16) NOT NULL
              );

              INSERT @S
              ([Hash])
              VALUES
              (0x31F777F0804D301936411E3ECD760859),
              (0x31F777F0804D301936411E3ECD760859),
              (0x0C5A65264F92A543E7AAA06375349C06);


              You could write it as a MERGE:



              WITH
              T AS
              (
              SELECT
              T.[Hash],
              rn = ROW_NUMBER() OVER (
              PARTITION BY T.[Hash]
              ORDER BY T.[Hash], T.Id)
              FROM @T AS T
              ),
              S AS
              (
              SELECT DISTINCT
              S.[Hash],
              rn = ROW_NUMBER() OVER (
              PARTITION BY S.[Hash]
              ORDER BY S.[Hash])
              FROM @S AS S
              )
              MERGE T
              USING S
              ON S.[Hash] = T.[Hash]
              AND S.rn = T.rn
              WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
              WHEN NOT MATCHED BY SOURCE THEN DELETE;


              db<>fiddle



              But for performance reasons (as well as some bugs), I would normally write it as two separate statements:



              WITH ToDelete AS
              (
              SELECT
              T.*
              FROM @T AS T
              WHERE
              NOT EXISTS
              (
              SELECT
              S.*
              FROM @S AS S
              WHERE
              S.[Hash] = T.[Hash]
              )
              )
              DELETE ToDelete;


              WITH ToInsert AS
              (
              SELECT
              S.[Hash],
              rn = ROW_NUMBER() OVER (
              PARTITION BY S.[Hash]
              ORDER BY S.[Hash])
              FROM @S AS S
              EXCEPT
              SELECT
              T.[Hash],
              rn = ROW_NUMBER() OVER (
              PARTITION BY T.[Hash]
              ORDER BY T.[Hash], T.Id)
              FROM @T AS T
              )
              INSERT @T
              ([Hash])
              SELECT
              ToInsert.[Hash]
              FROM ToInsert;


              db<>fiddle



              You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



              There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.






              share|improve this answer



























                5












                5








                5







                Using the sample data:



                DECLARE @T table
                (
                Id integer IDENTITY NOT NULL PRIMARY KEY,
                [Hash] binary(16) NOT NULL INDEX h
                );

                INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
                INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

                DECLARE @S table
                (
                [Hash] binary(16) NOT NULL
                );

                INSERT @S
                ([Hash])
                VALUES
                (0x31F777F0804D301936411E3ECD760859),
                (0x31F777F0804D301936411E3ECD760859),
                (0x0C5A65264F92A543E7AAA06375349C06);


                You could write it as a MERGE:



                WITH
                T AS
                (
                SELECT
                T.[Hash],
                rn = ROW_NUMBER() OVER (
                PARTITION BY T.[Hash]
                ORDER BY T.[Hash], T.Id)
                FROM @T AS T
                ),
                S AS
                (
                SELECT DISTINCT
                S.[Hash],
                rn = ROW_NUMBER() OVER (
                PARTITION BY S.[Hash]
                ORDER BY S.[Hash])
                FROM @S AS S
                )
                MERGE T
                USING S
                ON S.[Hash] = T.[Hash]
                AND S.rn = T.rn
                WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
                WHEN NOT MATCHED BY SOURCE THEN DELETE;


                db<>fiddle



                But for performance reasons (as well as some bugs), I would normally write it as two separate statements:



                WITH ToDelete AS
                (
                SELECT
                T.*
                FROM @T AS T
                WHERE
                NOT EXISTS
                (
                SELECT
                S.*
                FROM @S AS S
                WHERE
                S.[Hash] = T.[Hash]
                )
                )
                DELETE ToDelete;


                WITH ToInsert AS
                (
                SELECT
                S.[Hash],
                rn = ROW_NUMBER() OVER (
                PARTITION BY S.[Hash]
                ORDER BY S.[Hash])
                FROM @S AS S
                EXCEPT
                SELECT
                T.[Hash],
                rn = ROW_NUMBER() OVER (
                PARTITION BY T.[Hash]
                ORDER BY T.[Hash], T.Id)
                FROM @T AS T
                )
                INSERT @T
                ([Hash])
                SELECT
                ToInsert.[Hash]
                FROM ToInsert;


                db<>fiddle



                You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



                There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.






                share|improve this answer















                Using the sample data:



                DECLARE @T table
                (
                Id integer IDENTITY NOT NULL PRIMARY KEY,
                [Hash] binary(16) NOT NULL INDEX h
                );

                INSERT @T ([Hash]) VALUES (0x31F777F0804D301936411E3ECD760859);
                INSERT @T ([Hash]) VALUES (0xD64A593F3E9ACC972158D522A4289EA0);

                DECLARE @S table
                (
                [Hash] binary(16) NOT NULL
                );

                INSERT @S
                ([Hash])
                VALUES
                (0x31F777F0804D301936411E3ECD760859),
                (0x31F777F0804D301936411E3ECD760859),
                (0x0C5A65264F92A543E7AAA06375349C06);


                You could write it as a MERGE:



                WITH
                T AS
                (
                SELECT
                T.[Hash],
                rn = ROW_NUMBER() OVER (
                PARTITION BY T.[Hash]
                ORDER BY T.[Hash], T.Id)
                FROM @T AS T
                ),
                S AS
                (
                SELECT DISTINCT
                S.[Hash],
                rn = ROW_NUMBER() OVER (
                PARTITION BY S.[Hash]
                ORDER BY S.[Hash])
                FROM @S AS S
                )
                MERGE T
                USING S
                ON S.[Hash] = T.[Hash]
                AND S.rn = T.rn
                WHEN NOT MATCHED BY TARGET THEN INSERT ([Hash]) VALUES (S.[Hash])
                WHEN NOT MATCHED BY SOURCE THEN DELETE;


                db<>fiddle



                But for performance reasons (as well as some bugs), I would normally write it as two separate statements:



                WITH ToDelete AS
                (
                SELECT
                T.*
                FROM @T AS T
                WHERE
                NOT EXISTS
                (
                SELECT
                S.*
                FROM @S AS S
                WHERE
                S.[Hash] = T.[Hash]
                )
                )
                DELETE ToDelete;


                WITH ToInsert AS
                (
                SELECT
                S.[Hash],
                rn = ROW_NUMBER() OVER (
                PARTITION BY S.[Hash]
                ORDER BY S.[Hash])
                FROM @S AS S
                EXCEPT
                SELECT
                T.[Hash],
                rn = ROW_NUMBER() OVER (
                PARTITION BY T.[Hash]
                ORDER BY T.[Hash], T.Id)
                FROM @T AS T
                )
                INSERT @T
                ([Hash])
                SELECT
                ToInsert.[Hash]
                FROM ToInsert;


                db<>fiddle



                You should have a unique index on the target on ([Hash], [Id]). Chances are you have this already, or equivalently an index on [Hash] and a unique (possibly clustered) index on [Id].



                There may be other issues depending on what else was omitted from the question for brevity. In any case it should provide a couple of possible starting points for your own solution.







                share|improve this answer














                share|improve this answer



                share|improve this answer








                edited Jan 31 at 13:25

























                answered Jan 31 at 10:26









                Paul WhitePaul White

                52.2k14278451




                52.2k14278451



























                    draft saved

                    draft discarded
















































                    Thanks for contributing an answer to Database Administrators Stack Exchange!


                    • Please be sure to answer the question. Provide details and share your research!

                    But avoid


                    • Asking for help, clarification, or responding to other answers.

                    • Making statements based on opinion; back them up with references or personal experience.

                    To learn more, see our tips on writing great answers.




                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fdba.stackexchange.com%2fquestions%2f228529%2fmerge-intersection-between-table-and-dataset-how-to-achieve%23new-answer', 'question_page');

                    );

                    Post as a guest















                    Required, but never shown





















































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown

































                    Required, but never shown














                    Required, but never shown












                    Required, but never shown







                    Required, but never shown






                    Popular posts from this blog

                    How to check contact read email or not when send email to Individual?

                    Displaying single band from multi-band raster using QGIS

                    How many registers does an x86_64 CPU actually have?