Entropy and total variation distance

The name of the pictureThe name of the pictureThe name of the pictureClash Royale CLAN TAG#URR8PPP











up vote
10
down vote

favorite
4












Let $X$, $Y$ be discrete random variables taking values within the same set of $N$ elements. Let the total variation distance $|P-Q|$ (which is half the $L_1$ distance between the distributions of $P$ and $Q$) be at most $epsilon$. What is a standard, easily provable bound on the difference between the entropies of $X$ and $Y$?



(It is easy to see that such a bound must depend not only on $epsilon$ but also on $N$.)










share|cite|improve this question



























    up vote
    10
    down vote

    favorite
    4












    Let $X$, $Y$ be discrete random variables taking values within the same set of $N$ elements. Let the total variation distance $|P-Q|$ (which is half the $L_1$ distance between the distributions of $P$ and $Q$) be at most $epsilon$. What is a standard, easily provable bound on the difference between the entropies of $X$ and $Y$?



    (It is easy to see that such a bound must depend not only on $epsilon$ but also on $N$.)










    share|cite|improve this question

























      up vote
      10
      down vote

      favorite
      4









      up vote
      10
      down vote

      favorite
      4






      4





      Let $X$, $Y$ be discrete random variables taking values within the same set of $N$ elements. Let the total variation distance $|P-Q|$ (which is half the $L_1$ distance between the distributions of $P$ and $Q$) be at most $epsilon$. What is a standard, easily provable bound on the difference between the entropies of $X$ and $Y$?



      (It is easy to see that such a bound must depend not only on $epsilon$ but also on $N$.)










      share|cite|improve this question















      Let $X$, $Y$ be discrete random variables taking values within the same set of $N$ elements. Let the total variation distance $|P-Q|$ (which is half the $L_1$ distance between the distributions of $P$ and $Q$) be at most $epsilon$. What is a standard, easily provable bound on the difference between the entropies of $X$ and $Y$?



      (It is easy to see that such a bound must depend not only on $epsilon$ but also on $N$.)







      pr.probability probability-distributions inequalities it.information-theory entropy






      share|cite|improve this question















      share|cite|improve this question













      share|cite|improve this question




      share|cite|improve this question








      edited Sep 16 at 14:39









      Iosif Pinelis

      15.2k12154




      15.2k12154










      asked Sep 16 at 12:41









      H A Helfgott

      4,2322479




      4,2322479




















          3 Answers
          3






          active

          oldest

          votes

















          up vote
          8
          down vote













          Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.



          Proof.
          Let $varepsilon':=|P-Q|$.
          Let $(X,Y)$ be an optimal coupling of $P$ and $Q$, so that
          beginalign
          mathbbP(Xneq Y) = |P-Q| ;.
          endalign
          Using a standard construction, we can assume that $X$ and $Y$ have the particular form
          beginalign
          X &:=
          begincases
          Z & textif $B=0$, \
          tildeX & textif $B=1$,
          endcases &
          Y &:=
          begincases
          Z & textif $B=0$, \
          tildeY & textif $B=1$,
          endcases
          endalign
          where $B$, $Z$ and $(tildeX,tildeY)$ are independent and $BsimtextBern(varepsilon')$.



          Note that
          beginalign
          H(X|B) leq H(X) leq H(B) + H(X|B) ;.
          endalign
          For $H(X|B)$ we can write
          beginalign
          H(X|B) &= varepsilon' H(X|B=1) + (1-varepsilon') H(X|B=0) \
          &= varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;.
          endalign
          Thus,
          beginalign
          varepsilon' H(tildeX) + (1-varepsilon') H(Z) &leq H(X)
          leq H(B) + varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;,
          tag$clubsuit$
          endalign
          and similarly,
          beginalign
          varepsilon' H(tildeY) + (1-varepsilon') H(Z) &leq H(Y)
          leq H(B) + varepsilon' H(tildeY) + (1-varepsilon') H(Z) ;.
          tag$spadesuit$
          endalign



          Combining ($clubsuit$) and ($spadesuit$) we get
          beginalign
          |H(X)-H(Y)| &leq
          H(B) + varepsilon' |H(tildeX) - H(tildeY)| \
          &leq H(varepsilon') + varepsilon' log N \
          &leq H(varepsilon) + varepsilon log N ;,
          endalign
          as claimed.
          QED




          Edit [2018--09--17] (following Iosif Pinelis's comment).

          Refining the above reasoning a little bit, we can get the better bound
          beginalign
          |H(P)-H(Q)|leq H(varepsilon) + varepsilonlog(N-1) ;.
          endalign



          Indeed, let $Sigma$ denote the $N$-element set that is the support of $P$ and $Q$. As before, let $varepsilon':=|P-Q|$, and let us discard the trivial cases $varepsilon'in0,1$, so that $0<varepsilon'<1$.



          Recalling from the construction of an optimal coupling, define for $ainSigma$,
          beginalign
          R_0(a) &:= P(a)land Q(a) &
          P_0(a) &:= P(a)-R_0(a) \
          & &
          Q_0(a) &:= Q(a)-R_0(a) ;.
          endalign
          Observe that $R_0$, $P_0$ and $Q_0$ are non-negative functions and
          beginalign
          sum_ainSigmaR_0(a)=1-varepsilon' qquadtextandqquad
          sum_ainSigmaP_0(a)=sum_ainSigmaQ_0(a)=varepsilon' ;.
          endalign
          Thus, $tildeR:=R_0/(1-varepsilon')$, $tildeP:=P_0/varepsilon'$ and $tildeQ:=Q_0/varepsilon'$ are probability distributions on $Sigma$ satisfying
          beginalign
          P(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeP \
          Q(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeQ ;.
          endalign
          If we now choose $ZsimtildeR$, $tildeXsimtildeP$, $tildeYsimtildeQ$ and $BsimtextBern(varepsilon')$ independently, we have a coupling as promised above.



          Back to the inequality, observe that since both $P$ and $Q$ are non-negative and normalized, there necessarily exist $a,binSigma$ such that $P_0(a)=0$ and $Q_0(b)=0$. This means that each of $tildeP$ and $tildeQ$ is in fact supported on a strict subset of $Sigma$. Hence,
          beginalign
          |H(tildeX)-H(tildeY)| &leq maxH(tildeP),H(tildeQ)
          leq log(N-1)
          endalign
          and the (updated) claim follows as before.



          Note.
          As the example H A Helfgott gave in the comments ($requireencloseenclosehorizontalstrikeN=0$, $requireencloseenclosehorizontalstrikeXsimtextBern(varepsilon)$ and $requireencloseenclosehorizontalstrikeYsimtextBern(0)$) shows, this refined bound is sharp at least when $requireencloseenclosehorizontalstrikeN=2$.
          Iosif Pinelis gave the following example in his comment below which shows that the refined bound is sharp for every $N$ and $varepsilon$:



          Let $Sigma:=1,2,ldots,N$ and
          beginalign
          P(a) &:=
          begincases
          1 & textif $a=1$,\
          0 & textotherwise,
          endcases &
          Q(a) &:=
          begincases
          1-varepsilon & textif $a=1$,\
          varepsilon/(N-1) & textotherwise.
          endcases
          endalign
          Then, $|P-Q|=varepsilon$ and $|H(P)-H(Q)|=H(Q)=H(varepsilon)+varepsilonlog(N-1)$.






          share|cite|improve this answer






















          • Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
            – H A Helfgott
            Sep 16 at 22:37






          • 1




            (Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
            – H A Helfgott
            Sep 17 at 14:13






          • 1




            @Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
            – Iosif Pinelis
            Sep 17 at 14:39






          • 1




            @Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
            – Iosif Pinelis
            Sep 17 at 17:34






          • 2




            Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
            – usul
            Sep 17 at 19:22


















          up vote
          5
          down vote













          $newcommanddedelta
          newcommandepepsilon$



          Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.



          Next, take any real $cge1$. Note that $g(p):=pln p+cp$ (with $g(0):=0$) is convex in $pin[0,1]$. So, assuming $0le ple qle1$ and $qge e^-c$, we have
          $g(p)le g(0)vee g(q)=0vee g(q)=g(q)$ and hence
          $pln p-qln qle c(q-p)=c|p-q|$.
          Thus,
          beginequation
          pln p-qln qle c|p-q|
          endequation
          whenever $0le p,qle1$ and $qge e^-c$.



          Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.



          Therefore, the difference between the entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$ is
          beginequation
          sum_1^N p_iln p_i-sum_1^N q_iln q_i=S_1+S_2+S_3,
          endequation
          where
          beginalign*
          S_1 &:=sum_icolon q_ige e^-c (p_iln p_i-q_iln q_i)le sum_icolon q_ige e^-cc|p_i-q_i|
          le cde quadtextif dege|P-Q|_1, \
          S_2 &:= sum_icolon q_i< e^-cp_iln p_ile0, \
          S_3 &:= sum_icolon q_i< e^-c(-q_iln q_i)lesum_icolon q_i< e^-cce^-cle Nce^-c.
          endalign*
          So, taking now $c=lnfrac Nde$ and assuming $Nge ede$, we see that
          beginequation
          sum_1^N p_iln p_i-sum_1^N q_iln q_ile
          cde+Nce^-c
          =2delnfrac Nde.
          endequation
          Taking here $de=2ep$ and noting that $Nge1$, we conclude that
          beginequation
          sum_1^N p_iln p_i-sum_1^N q_iln q_ile
          4eplnfracN2ep
          endequation
          if $eple1/(2e)$.






          share|cite|improve this answer





























            up vote
            4
            down vote













            $newcommandDeDelta
            newcommandepepsilon
            newcommandRmathbbR$



            Here is yet another answer providing the exact bound. This answer is perhaps a bit more elementary than the excellent answer given by user Algernon. Another advantage of this approach is that it produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.



            To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.



            Take indeed any convex function $fcolon[0,1]toR$ and consider the difference
            beginequation
            De:=sum_1^N f(p_i)-sum_1^N f(q_i)
            endequation
            between the "generalized" entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$.
            We want to find the exact upper bound on $De$ subject to the given conditions on $(P,Q)$.
            In what follows, $(P,Q)$ is a point satisfying these conditions.
            Without loss of generality (wlog), for some $kin1,dots,N$ we have $p_ige q_i$ for $ile k$, $p_ile q_i$ for $ige k+1$, and $q_1gecdotsge q_k$, so that
            beginequation
            ep=sum_1^k(p_i-q_i)=sum_k+1^N(q_i-p_i)>0.
            endequation



            Let $p_i^*:=q_i$ for $i=2,dots,k$ and $p_1^*:=q_1+ep[=sum_1^k p_i-sum_2^k q_ile1]$. Then the vector $(p_1^*,dots,p_k^*)$ majorizes (in the Schur sense) the vector $(p_1,dots,p_k)$ and still satisfies the condition $p_i^*ge q_i$ for $ile k$. Also, since $f$ is convex, $sum_1^k f(p_i)$ is Schur convex in $(p_1,dots,p_k)$. So, wlog $(p_1,dots,p_k)=(p_1^*,dots,p_k^*)$.



            In particular, $p_1>q_1$ and $q_mge p_m$ for all $m=2,dots,N$.
            Moreover, wlog $p_m=0$ for any $m=2,dots,N$. Indeed, take any $m=2,dots,N$ with $p_m>0$ and replace $p_1,q_1,p_m,q_m$ respectively by $p_1+t,q_1+t,p_m-t,q_m-t$, where $t:=p_min(0,1-p_1]$; then all the conditions on $P,Q$ will still hold. After this replacement, $De$ will change by the sum of the nonnegative expressions $[f(p_1+t)-f(q_1+t)]-[f(p_1)-f(q_1)]$ and $[f(q_m)-f(p_m)]-[f(q_m-t)-f(p_m-t)]$; this nonnegativity follows by the convexity of $f$. Making such replacements for each $m=2,dots,N$ with $p_m>0$, we will change $De$ by a nonnegative amount, and will also get $p_m=0$ for all $m=2,dots,N$ indeed.



            Thus, wlog
            begingather
            p_1=1=q_1+ep,quad p_i=0 forall i=2,dots,N,quad
            sum_2^N q_i=ep.
            endgather
            Since $sum_2^N f(q_i)$ is Schur convex in the $q_i$'s, wlog
            beginalign
            De&=f(1)-f(q_1)+sum_2^N f(0)-sum_2^N f(q_i) \
            &le f(1)-f(1-ep)+(N-1)f(0)-(N-1)f(tfracepN-1)=:De_f;ep,N.
            endalign
            The bound $De_f;ep,N$ on $De$ is obviously exact, since it is attained when $p_1=1=q_1+ep$ and $q_2=cdots=q_N=tfracepN-1$.



            In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.






            share|cite|improve this answer






















            • Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
              – Algernon
              Sep 18 at 16:52










            • Also, it should be $p^∗_ileq q_i$ for $i>k$.
              – Algernon
              Sep 18 at 16:52






            • 1




              @Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
              – Iosif Pinelis
              Sep 18 at 17:52






            • 1




              It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
              – Iosif Pinelis
              Sep 20 at 0:47










            Your Answer




            StackExchange.ifUsing("editor", function ()
            return StackExchange.using("mathjaxEditing", function ()
            StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
            StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\\(","\\)"]]);
            );
            );
            , "mathjax-editing");

            StackExchange.ready(function()
            var channelOptions =
            tags: "".split(" "),
            id: "504"
            ;
            initTagRenderer("".split(" "), "".split(" "), channelOptions);

            StackExchange.using("externalEditor", function()
            // Have to fire editor after snippets, if snippets enabled
            if (StackExchange.settings.snippets.snippetsEnabled)
            StackExchange.using("snippets", function()
            createEditor();
            );

            else
            createEditor();

            );

            function createEditor()
            StackExchange.prepareEditor(
            heartbeatType: 'answer',
            convertImagesToLinks: true,
            noModals: false,
            showLowRepImageUploadWarning: true,
            reputationToPostImages: 10,
            bindNavPrevention: true,
            postfix: "",
            noCode: true, onDemand: true,
            discardSelector: ".discard-answer"
            ,immediatelyShowMarkdownHelp:true
            );



            );













             

            draft saved


            draft discarded


















            StackExchange.ready(
            function ()
            StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f310689%2fentropy-and-total-variation-distance%23new-answer', 'question_page');

            );

            Post as a guest






























            3 Answers
            3






            active

            oldest

            votes








            3 Answers
            3






            active

            oldest

            votes









            active

            oldest

            votes






            active

            oldest

            votes








            up vote
            8
            down vote













            Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.



            Proof.
            Let $varepsilon':=|P-Q|$.
            Let $(X,Y)$ be an optimal coupling of $P$ and $Q$, so that
            beginalign
            mathbbP(Xneq Y) = |P-Q| ;.
            endalign
            Using a standard construction, we can assume that $X$ and $Y$ have the particular form
            beginalign
            X &:=
            begincases
            Z & textif $B=0$, \
            tildeX & textif $B=1$,
            endcases &
            Y &:=
            begincases
            Z & textif $B=0$, \
            tildeY & textif $B=1$,
            endcases
            endalign
            where $B$, $Z$ and $(tildeX,tildeY)$ are independent and $BsimtextBern(varepsilon')$.



            Note that
            beginalign
            H(X|B) leq H(X) leq H(B) + H(X|B) ;.
            endalign
            For $H(X|B)$ we can write
            beginalign
            H(X|B) &= varepsilon' H(X|B=1) + (1-varepsilon') H(X|B=0) \
            &= varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;.
            endalign
            Thus,
            beginalign
            varepsilon' H(tildeX) + (1-varepsilon') H(Z) &leq H(X)
            leq H(B) + varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;,
            tag$clubsuit$
            endalign
            and similarly,
            beginalign
            varepsilon' H(tildeY) + (1-varepsilon') H(Z) &leq H(Y)
            leq H(B) + varepsilon' H(tildeY) + (1-varepsilon') H(Z) ;.
            tag$spadesuit$
            endalign



            Combining ($clubsuit$) and ($spadesuit$) we get
            beginalign
            |H(X)-H(Y)| &leq
            H(B) + varepsilon' |H(tildeX) - H(tildeY)| \
            &leq H(varepsilon') + varepsilon' log N \
            &leq H(varepsilon) + varepsilon log N ;,
            endalign
            as claimed.
            QED




            Edit [2018--09--17] (following Iosif Pinelis's comment).

            Refining the above reasoning a little bit, we can get the better bound
            beginalign
            |H(P)-H(Q)|leq H(varepsilon) + varepsilonlog(N-1) ;.
            endalign



            Indeed, let $Sigma$ denote the $N$-element set that is the support of $P$ and $Q$. As before, let $varepsilon':=|P-Q|$, and let us discard the trivial cases $varepsilon'in0,1$, so that $0<varepsilon'<1$.



            Recalling from the construction of an optimal coupling, define for $ainSigma$,
            beginalign
            R_0(a) &:= P(a)land Q(a) &
            P_0(a) &:= P(a)-R_0(a) \
            & &
            Q_0(a) &:= Q(a)-R_0(a) ;.
            endalign
            Observe that $R_0$, $P_0$ and $Q_0$ are non-negative functions and
            beginalign
            sum_ainSigmaR_0(a)=1-varepsilon' qquadtextandqquad
            sum_ainSigmaP_0(a)=sum_ainSigmaQ_0(a)=varepsilon' ;.
            endalign
            Thus, $tildeR:=R_0/(1-varepsilon')$, $tildeP:=P_0/varepsilon'$ and $tildeQ:=Q_0/varepsilon'$ are probability distributions on $Sigma$ satisfying
            beginalign
            P(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeP \
            Q(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeQ ;.
            endalign
            If we now choose $ZsimtildeR$, $tildeXsimtildeP$, $tildeYsimtildeQ$ and $BsimtextBern(varepsilon')$ independently, we have a coupling as promised above.



            Back to the inequality, observe that since both $P$ and $Q$ are non-negative and normalized, there necessarily exist $a,binSigma$ such that $P_0(a)=0$ and $Q_0(b)=0$. This means that each of $tildeP$ and $tildeQ$ is in fact supported on a strict subset of $Sigma$. Hence,
            beginalign
            |H(tildeX)-H(tildeY)| &leq maxH(tildeP),H(tildeQ)
            leq log(N-1)
            endalign
            and the (updated) claim follows as before.



            Note.
            As the example H A Helfgott gave in the comments ($requireencloseenclosehorizontalstrikeN=0$, $requireencloseenclosehorizontalstrikeXsimtextBern(varepsilon)$ and $requireencloseenclosehorizontalstrikeYsimtextBern(0)$) shows, this refined bound is sharp at least when $requireencloseenclosehorizontalstrikeN=2$.
            Iosif Pinelis gave the following example in his comment below which shows that the refined bound is sharp for every $N$ and $varepsilon$:



            Let $Sigma:=1,2,ldots,N$ and
            beginalign
            P(a) &:=
            begincases
            1 & textif $a=1$,\
            0 & textotherwise,
            endcases &
            Q(a) &:=
            begincases
            1-varepsilon & textif $a=1$,\
            varepsilon/(N-1) & textotherwise.
            endcases
            endalign
            Then, $|P-Q|=varepsilon$ and $|H(P)-H(Q)|=H(Q)=H(varepsilon)+varepsilonlog(N-1)$.






            share|cite|improve this answer






















            • Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
              – H A Helfgott
              Sep 16 at 22:37






            • 1




              (Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
              – H A Helfgott
              Sep 17 at 14:13






            • 1




              @Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
              – Iosif Pinelis
              Sep 17 at 14:39






            • 1




              @Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
              – Iosif Pinelis
              Sep 17 at 17:34






            • 2




              Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
              – usul
              Sep 17 at 19:22















            up vote
            8
            down vote













            Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.



            Proof.
            Let $varepsilon':=|P-Q|$.
            Let $(X,Y)$ be an optimal coupling of $P$ and $Q$, so that
            beginalign
            mathbbP(Xneq Y) = |P-Q| ;.
            endalign
            Using a standard construction, we can assume that $X$ and $Y$ have the particular form
            beginalign
            X &:=
            begincases
            Z & textif $B=0$, \
            tildeX & textif $B=1$,
            endcases &
            Y &:=
            begincases
            Z & textif $B=0$, \
            tildeY & textif $B=1$,
            endcases
            endalign
            where $B$, $Z$ and $(tildeX,tildeY)$ are independent and $BsimtextBern(varepsilon')$.



            Note that
            beginalign
            H(X|B) leq H(X) leq H(B) + H(X|B) ;.
            endalign
            For $H(X|B)$ we can write
            beginalign
            H(X|B) &= varepsilon' H(X|B=1) + (1-varepsilon') H(X|B=0) \
            &= varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;.
            endalign
            Thus,
            beginalign
            varepsilon' H(tildeX) + (1-varepsilon') H(Z) &leq H(X)
            leq H(B) + varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;,
            tag$clubsuit$
            endalign
            and similarly,
            beginalign
            varepsilon' H(tildeY) + (1-varepsilon') H(Z) &leq H(Y)
            leq H(B) + varepsilon' H(tildeY) + (1-varepsilon') H(Z) ;.
            tag$spadesuit$
            endalign



            Combining ($clubsuit$) and ($spadesuit$) we get
            beginalign
            |H(X)-H(Y)| &leq
            H(B) + varepsilon' |H(tildeX) - H(tildeY)| \
            &leq H(varepsilon') + varepsilon' log N \
            &leq H(varepsilon) + varepsilon log N ;,
            endalign
            as claimed.
            QED




            Edit [2018--09--17] (following Iosif Pinelis's comment).

            Refining the above reasoning a little bit, we can get the better bound
            beginalign
            |H(P)-H(Q)|leq H(varepsilon) + varepsilonlog(N-1) ;.
            endalign



            Indeed, let $Sigma$ denote the $N$-element set that is the support of $P$ and $Q$. As before, let $varepsilon':=|P-Q|$, and let us discard the trivial cases $varepsilon'in0,1$, so that $0<varepsilon'<1$.



            Recalling from the construction of an optimal coupling, define for $ainSigma$,
            beginalign
            R_0(a) &:= P(a)land Q(a) &
            P_0(a) &:= P(a)-R_0(a) \
            & &
            Q_0(a) &:= Q(a)-R_0(a) ;.
            endalign
            Observe that $R_0$, $P_0$ and $Q_0$ are non-negative functions and
            beginalign
            sum_ainSigmaR_0(a)=1-varepsilon' qquadtextandqquad
            sum_ainSigmaP_0(a)=sum_ainSigmaQ_0(a)=varepsilon' ;.
            endalign
            Thus, $tildeR:=R_0/(1-varepsilon')$, $tildeP:=P_0/varepsilon'$ and $tildeQ:=Q_0/varepsilon'$ are probability distributions on $Sigma$ satisfying
            beginalign
            P(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeP \
            Q(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeQ ;.
            endalign
            If we now choose $ZsimtildeR$, $tildeXsimtildeP$, $tildeYsimtildeQ$ and $BsimtextBern(varepsilon')$ independently, we have a coupling as promised above.



            Back to the inequality, observe that since both $P$ and $Q$ are non-negative and normalized, there necessarily exist $a,binSigma$ such that $P_0(a)=0$ and $Q_0(b)=0$. This means that each of $tildeP$ and $tildeQ$ is in fact supported on a strict subset of $Sigma$. Hence,
            beginalign
            |H(tildeX)-H(tildeY)| &leq maxH(tildeP),H(tildeQ)
            leq log(N-1)
            endalign
            and the (updated) claim follows as before.



            Note.
            As the example H A Helfgott gave in the comments ($requireencloseenclosehorizontalstrikeN=0$, $requireencloseenclosehorizontalstrikeXsimtextBern(varepsilon)$ and $requireencloseenclosehorizontalstrikeYsimtextBern(0)$) shows, this refined bound is sharp at least when $requireencloseenclosehorizontalstrikeN=2$.
            Iosif Pinelis gave the following example in his comment below which shows that the refined bound is sharp for every $N$ and $varepsilon$:



            Let $Sigma:=1,2,ldots,N$ and
            beginalign
            P(a) &:=
            begincases
            1 & textif $a=1$,\
            0 & textotherwise,
            endcases &
            Q(a) &:=
            begincases
            1-varepsilon & textif $a=1$,\
            varepsilon/(N-1) & textotherwise.
            endcases
            endalign
            Then, $|P-Q|=varepsilon$ and $|H(P)-H(Q)|=H(Q)=H(varepsilon)+varepsilonlog(N-1)$.






            share|cite|improve this answer






















            • Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
              – H A Helfgott
              Sep 16 at 22:37






            • 1




              (Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
              – H A Helfgott
              Sep 17 at 14:13






            • 1




              @Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
              – Iosif Pinelis
              Sep 17 at 14:39






            • 1




              @Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
              – Iosif Pinelis
              Sep 17 at 17:34






            • 2




              Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
              – usul
              Sep 17 at 19:22













            up vote
            8
            down vote










            up vote
            8
            down vote









            Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.



            Proof.
            Let $varepsilon':=|P-Q|$.
            Let $(X,Y)$ be an optimal coupling of $P$ and $Q$, so that
            beginalign
            mathbbP(Xneq Y) = |P-Q| ;.
            endalign
            Using a standard construction, we can assume that $X$ and $Y$ have the particular form
            beginalign
            X &:=
            begincases
            Z & textif $B=0$, \
            tildeX & textif $B=1$,
            endcases &
            Y &:=
            begincases
            Z & textif $B=0$, \
            tildeY & textif $B=1$,
            endcases
            endalign
            where $B$, $Z$ and $(tildeX,tildeY)$ are independent and $BsimtextBern(varepsilon')$.



            Note that
            beginalign
            H(X|B) leq H(X) leq H(B) + H(X|B) ;.
            endalign
            For $H(X|B)$ we can write
            beginalign
            H(X|B) &= varepsilon' H(X|B=1) + (1-varepsilon') H(X|B=0) \
            &= varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;.
            endalign
            Thus,
            beginalign
            varepsilon' H(tildeX) + (1-varepsilon') H(Z) &leq H(X)
            leq H(B) + varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;,
            tag$clubsuit$
            endalign
            and similarly,
            beginalign
            varepsilon' H(tildeY) + (1-varepsilon') H(Z) &leq H(Y)
            leq H(B) + varepsilon' H(tildeY) + (1-varepsilon') H(Z) ;.
            tag$spadesuit$
            endalign



            Combining ($clubsuit$) and ($spadesuit$) we get
            beginalign
            |H(X)-H(Y)| &leq
            H(B) + varepsilon' |H(tildeX) - H(tildeY)| \
            &leq H(varepsilon') + varepsilon' log N \
            &leq H(varepsilon) + varepsilon log N ;,
            endalign
            as claimed.
            QED




            Edit [2018--09--17] (following Iosif Pinelis's comment).

            Refining the above reasoning a little bit, we can get the better bound
            beginalign
            |H(P)-H(Q)|leq H(varepsilon) + varepsilonlog(N-1) ;.
            endalign



            Indeed, let $Sigma$ denote the $N$-element set that is the support of $P$ and $Q$. As before, let $varepsilon':=|P-Q|$, and let us discard the trivial cases $varepsilon'in0,1$, so that $0<varepsilon'<1$.



            Recalling from the construction of an optimal coupling, define for $ainSigma$,
            beginalign
            R_0(a) &:= P(a)land Q(a) &
            P_0(a) &:= P(a)-R_0(a) \
            & &
            Q_0(a) &:= Q(a)-R_0(a) ;.
            endalign
            Observe that $R_0$, $P_0$ and $Q_0$ are non-negative functions and
            beginalign
            sum_ainSigmaR_0(a)=1-varepsilon' qquadtextandqquad
            sum_ainSigmaP_0(a)=sum_ainSigmaQ_0(a)=varepsilon' ;.
            endalign
            Thus, $tildeR:=R_0/(1-varepsilon')$, $tildeP:=P_0/varepsilon'$ and $tildeQ:=Q_0/varepsilon'$ are probability distributions on $Sigma$ satisfying
            beginalign
            P(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeP \
            Q(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeQ ;.
            endalign
            If we now choose $ZsimtildeR$, $tildeXsimtildeP$, $tildeYsimtildeQ$ and $BsimtextBern(varepsilon')$ independently, we have a coupling as promised above.



            Back to the inequality, observe that since both $P$ and $Q$ are non-negative and normalized, there necessarily exist $a,binSigma$ such that $P_0(a)=0$ and $Q_0(b)=0$. This means that each of $tildeP$ and $tildeQ$ is in fact supported on a strict subset of $Sigma$. Hence,
            beginalign
            |H(tildeX)-H(tildeY)| &leq maxH(tildeP),H(tildeQ)
            leq log(N-1)
            endalign
            and the (updated) claim follows as before.



            Note.
            As the example H A Helfgott gave in the comments ($requireencloseenclosehorizontalstrikeN=0$, $requireencloseenclosehorizontalstrikeXsimtextBern(varepsilon)$ and $requireencloseenclosehorizontalstrikeYsimtextBern(0)$) shows, this refined bound is sharp at least when $requireencloseenclosehorizontalstrikeN=2$.
            Iosif Pinelis gave the following example in his comment below which shows that the refined bound is sharp for every $N$ and $varepsilon$:



            Let $Sigma:=1,2,ldots,N$ and
            beginalign
            P(a) &:=
            begincases
            1 & textif $a=1$,\
            0 & textotherwise,
            endcases &
            Q(a) &:=
            begincases
            1-varepsilon & textif $a=1$,\
            varepsilon/(N-1) & textotherwise.
            endcases
            endalign
            Then, $|P-Q|=varepsilon$ and $|H(P)-H(Q)|=H(Q)=H(varepsilon)+varepsilonlog(N-1)$.






            share|cite|improve this answer














            Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.



            Proof.
            Let $varepsilon':=|P-Q|$.
            Let $(X,Y)$ be an optimal coupling of $P$ and $Q$, so that
            beginalign
            mathbbP(Xneq Y) = |P-Q| ;.
            endalign
            Using a standard construction, we can assume that $X$ and $Y$ have the particular form
            beginalign
            X &:=
            begincases
            Z & textif $B=0$, \
            tildeX & textif $B=1$,
            endcases &
            Y &:=
            begincases
            Z & textif $B=0$, \
            tildeY & textif $B=1$,
            endcases
            endalign
            where $B$, $Z$ and $(tildeX,tildeY)$ are independent and $BsimtextBern(varepsilon')$.



            Note that
            beginalign
            H(X|B) leq H(X) leq H(B) + H(X|B) ;.
            endalign
            For $H(X|B)$ we can write
            beginalign
            H(X|B) &= varepsilon' H(X|B=1) + (1-varepsilon') H(X|B=0) \
            &= varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;.
            endalign
            Thus,
            beginalign
            varepsilon' H(tildeX) + (1-varepsilon') H(Z) &leq H(X)
            leq H(B) + varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;,
            tag$clubsuit$
            endalign
            and similarly,
            beginalign
            varepsilon' H(tildeY) + (1-varepsilon') H(Z) &leq H(Y)
            leq H(B) + varepsilon' H(tildeY) + (1-varepsilon') H(Z) ;.
            tag$spadesuit$
            endalign



            Combining ($clubsuit$) and ($spadesuit$) we get
            beginalign
            |H(X)-H(Y)| &leq
            H(B) + varepsilon' |H(tildeX) - H(tildeY)| \
            &leq H(varepsilon') + varepsilon' log N \
            &leq H(varepsilon) + varepsilon log N ;,
            endalign
            as claimed.
            QED




            Edit [2018--09--17] (following Iosif Pinelis's comment).

            Refining the above reasoning a little bit, we can get the better bound
            beginalign
            |H(P)-H(Q)|leq H(varepsilon) + varepsilonlog(N-1) ;.
            endalign



            Indeed, let $Sigma$ denote the $N$-element set that is the support of $P$ and $Q$. As before, let $varepsilon':=|P-Q|$, and let us discard the trivial cases $varepsilon'in0,1$, so that $0<varepsilon'<1$.



            Recalling from the construction of an optimal coupling, define for $ainSigma$,
            beginalign
            R_0(a) &:= P(a)land Q(a) &
            P_0(a) &:= P(a)-R_0(a) \
            & &
            Q_0(a) &:= Q(a)-R_0(a) ;.
            endalign
            Observe that $R_0$, $P_0$ and $Q_0$ are non-negative functions and
            beginalign
            sum_ainSigmaR_0(a)=1-varepsilon' qquadtextandqquad
            sum_ainSigmaP_0(a)=sum_ainSigmaQ_0(a)=varepsilon' ;.
            endalign
            Thus, $tildeR:=R_0/(1-varepsilon')$, $tildeP:=P_0/varepsilon'$ and $tildeQ:=Q_0/varepsilon'$ are probability distributions on $Sigma$ satisfying
            beginalign
            P(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeP \
            Q(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeQ ;.
            endalign
            If we now choose $ZsimtildeR$, $tildeXsimtildeP$, $tildeYsimtildeQ$ and $BsimtextBern(varepsilon')$ independently, we have a coupling as promised above.



            Back to the inequality, observe that since both $P$ and $Q$ are non-negative and normalized, there necessarily exist $a,binSigma$ such that $P_0(a)=0$ and $Q_0(b)=0$. This means that each of $tildeP$ and $tildeQ$ is in fact supported on a strict subset of $Sigma$. Hence,
            beginalign
            |H(tildeX)-H(tildeY)| &leq maxH(tildeP),H(tildeQ)
            leq log(N-1)
            endalign
            and the (updated) claim follows as before.



            Note.
            As the example H A Helfgott gave in the comments ($requireencloseenclosehorizontalstrikeN=0$, $requireencloseenclosehorizontalstrikeXsimtextBern(varepsilon)$ and $requireencloseenclosehorizontalstrikeYsimtextBern(0)$) shows, this refined bound is sharp at least when $requireencloseenclosehorizontalstrikeN=2$.
            Iosif Pinelis gave the following example in his comment below which shows that the refined bound is sharp for every $N$ and $varepsilon$:



            Let $Sigma:=1,2,ldots,N$ and
            beginalign
            P(a) &:=
            begincases
            1 & textif $a=1$,\
            0 & textotherwise,
            endcases &
            Q(a) &:=
            begincases
            1-varepsilon & textif $a=1$,\
            varepsilon/(N-1) & textotherwise.
            endcases
            endalign
            Then, $|P-Q|=varepsilon$ and $|H(P)-H(Q)|=H(Q)=H(varepsilon)+varepsilonlog(N-1)$.







            share|cite|improve this answer














            share|cite|improve this answer



            share|cite|improve this answer








            edited Sep 17 at 20:25

























            answered Sep 16 at 16:41









            Algernon

            9491712




            9491712











            • Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
              – H A Helfgott
              Sep 16 at 22:37






            • 1




              (Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
              – H A Helfgott
              Sep 17 at 14:13






            • 1




              @Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
              – Iosif Pinelis
              Sep 17 at 14:39






            • 1




              @Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
              – Iosif Pinelis
              Sep 17 at 17:34






            • 2




              Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
              – usul
              Sep 17 at 19:22

















            • Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
              – H A Helfgott
              Sep 16 at 22:37






            • 1




              (Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
              – H A Helfgott
              Sep 17 at 14:13






            • 1




              @Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
              – Iosif Pinelis
              Sep 17 at 14:39






            • 1




              @Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
              – Iosif Pinelis
              Sep 17 at 17:34






            • 2




              Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
              – usul
              Sep 17 at 19:22
















            Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
            – H A Helfgott
            Sep 16 at 22:37




            Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
            – H A Helfgott
            Sep 16 at 22:37




            1




            1




            (Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
            – H A Helfgott
            Sep 17 at 14:13




            (Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
            – H A Helfgott
            Sep 17 at 14:13




            1




            1




            @Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
            – Iosif Pinelis
            Sep 17 at 14:39




            @Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
            – Iosif Pinelis
            Sep 17 at 14:39




            1




            1




            @Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
            – Iosif Pinelis
            Sep 17 at 17:34




            @Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
            – Iosif Pinelis
            Sep 17 at 17:34




            2




            2




            Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
            – usul
            Sep 17 at 19:22





            Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
            – usul
            Sep 17 at 19:22











            up vote
            5
            down vote













            $newcommanddedelta
            newcommandepepsilon$



            Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.



            Next, take any real $cge1$. Note that $g(p):=pln p+cp$ (with $g(0):=0$) is convex in $pin[0,1]$. So, assuming $0le ple qle1$ and $qge e^-c$, we have
            $g(p)le g(0)vee g(q)=0vee g(q)=g(q)$ and hence
            $pln p-qln qle c(q-p)=c|p-q|$.
            Thus,
            beginequation
            pln p-qln qle c|p-q|
            endequation
            whenever $0le p,qle1$ and $qge e^-c$.



            Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.



            Therefore, the difference between the entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$ is
            beginequation
            sum_1^N p_iln p_i-sum_1^N q_iln q_i=S_1+S_2+S_3,
            endequation
            where
            beginalign*
            S_1 &:=sum_icolon q_ige e^-c (p_iln p_i-q_iln q_i)le sum_icolon q_ige e^-cc|p_i-q_i|
            le cde quadtextif dege|P-Q|_1, \
            S_2 &:= sum_icolon q_i< e^-cp_iln p_ile0, \
            S_3 &:= sum_icolon q_i< e^-c(-q_iln q_i)lesum_icolon q_i< e^-cce^-cle Nce^-c.
            endalign*
            So, taking now $c=lnfrac Nde$ and assuming $Nge ede$, we see that
            beginequation
            sum_1^N p_iln p_i-sum_1^N q_iln q_ile
            cde+Nce^-c
            =2delnfrac Nde.
            endequation
            Taking here $de=2ep$ and noting that $Nge1$, we conclude that
            beginequation
            sum_1^N p_iln p_i-sum_1^N q_iln q_ile
            4eplnfracN2ep
            endequation
            if $eple1/(2e)$.






            share|cite|improve this answer


























              up vote
              5
              down vote













              $newcommanddedelta
              newcommandepepsilon$



              Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.



              Next, take any real $cge1$. Note that $g(p):=pln p+cp$ (with $g(0):=0$) is convex in $pin[0,1]$. So, assuming $0le ple qle1$ and $qge e^-c$, we have
              $g(p)le g(0)vee g(q)=0vee g(q)=g(q)$ and hence
              $pln p-qln qle c(q-p)=c|p-q|$.
              Thus,
              beginequation
              pln p-qln qle c|p-q|
              endequation
              whenever $0le p,qle1$ and $qge e^-c$.



              Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.



              Therefore, the difference between the entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$ is
              beginequation
              sum_1^N p_iln p_i-sum_1^N q_iln q_i=S_1+S_2+S_3,
              endequation
              where
              beginalign*
              S_1 &:=sum_icolon q_ige e^-c (p_iln p_i-q_iln q_i)le sum_icolon q_ige e^-cc|p_i-q_i|
              le cde quadtextif dege|P-Q|_1, \
              S_2 &:= sum_icolon q_i< e^-cp_iln p_ile0, \
              S_3 &:= sum_icolon q_i< e^-c(-q_iln q_i)lesum_icolon q_i< e^-cce^-cle Nce^-c.
              endalign*
              So, taking now $c=lnfrac Nde$ and assuming $Nge ede$, we see that
              beginequation
              sum_1^N p_iln p_i-sum_1^N q_iln q_ile
              cde+Nce^-c
              =2delnfrac Nde.
              endequation
              Taking here $de=2ep$ and noting that $Nge1$, we conclude that
              beginequation
              sum_1^N p_iln p_i-sum_1^N q_iln q_ile
              4eplnfracN2ep
              endequation
              if $eple1/(2e)$.






              share|cite|improve this answer
























                up vote
                5
                down vote










                up vote
                5
                down vote









                $newcommanddedelta
                newcommandepepsilon$



                Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.



                Next, take any real $cge1$. Note that $g(p):=pln p+cp$ (with $g(0):=0$) is convex in $pin[0,1]$. So, assuming $0le ple qle1$ and $qge e^-c$, we have
                $g(p)le g(0)vee g(q)=0vee g(q)=g(q)$ and hence
                $pln p-qln qle c(q-p)=c|p-q|$.
                Thus,
                beginequation
                pln p-qln qle c|p-q|
                endequation
                whenever $0le p,qle1$ and $qge e^-c$.



                Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.



                Therefore, the difference between the entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$ is
                beginequation
                sum_1^N p_iln p_i-sum_1^N q_iln q_i=S_1+S_2+S_3,
                endequation
                where
                beginalign*
                S_1 &:=sum_icolon q_ige e^-c (p_iln p_i-q_iln q_i)le sum_icolon q_ige e^-cc|p_i-q_i|
                le cde quadtextif dege|P-Q|_1, \
                S_2 &:= sum_icolon q_i< e^-cp_iln p_ile0, \
                S_3 &:= sum_icolon q_i< e^-c(-q_iln q_i)lesum_icolon q_i< e^-cce^-cle Nce^-c.
                endalign*
                So, taking now $c=lnfrac Nde$ and assuming $Nge ede$, we see that
                beginequation
                sum_1^N p_iln p_i-sum_1^N q_iln q_ile
                cde+Nce^-c
                =2delnfrac Nde.
                endequation
                Taking here $de=2ep$ and noting that $Nge1$, we conclude that
                beginequation
                sum_1^N p_iln p_i-sum_1^N q_iln q_ile
                4eplnfracN2ep
                endequation
                if $eple1/(2e)$.






                share|cite|improve this answer














                $newcommanddedelta
                newcommandepepsilon$



                Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.



                Next, take any real $cge1$. Note that $g(p):=pln p+cp$ (with $g(0):=0$) is convex in $pin[0,1]$. So, assuming $0le ple qle1$ and $qge e^-c$, we have
                $g(p)le g(0)vee g(q)=0vee g(q)=g(q)$ and hence
                $pln p-qln qle c(q-p)=c|p-q|$.
                Thus,
                beginequation
                pln p-qln qle c|p-q|
                endequation
                whenever $0le p,qle1$ and $qge e^-c$.



                Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.



                Therefore, the difference between the entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$ is
                beginequation
                sum_1^N p_iln p_i-sum_1^N q_iln q_i=S_1+S_2+S_3,
                endequation
                where
                beginalign*
                S_1 &:=sum_icolon q_ige e^-c (p_iln p_i-q_iln q_i)le sum_icolon q_ige e^-cc|p_i-q_i|
                le cde quadtextif dege|P-Q|_1, \
                S_2 &:= sum_icolon q_i< e^-cp_iln p_ile0, \
                S_3 &:= sum_icolon q_i< e^-c(-q_iln q_i)lesum_icolon q_i< e^-cce^-cle Nce^-c.
                endalign*
                So, taking now $c=lnfrac Nde$ and assuming $Nge ede$, we see that
                beginequation
                sum_1^N p_iln p_i-sum_1^N q_iln q_ile
                cde+Nce^-c
                =2delnfrac Nde.
                endequation
                Taking here $de=2ep$ and noting that $Nge1$, we conclude that
                beginequation
                sum_1^N p_iln p_i-sum_1^N q_iln q_ile
                4eplnfracN2ep
                endequation
                if $eple1/(2e)$.







                share|cite|improve this answer














                share|cite|improve this answer



                share|cite|improve this answer








                edited Sep 16 at 18:09

























                answered Sep 16 at 14:37









                Iosif Pinelis

                15.2k12154




                15.2k12154




















                    up vote
                    4
                    down vote













                    $newcommandDeDelta
                    newcommandepepsilon
                    newcommandRmathbbR$



                    Here is yet another answer providing the exact bound. This answer is perhaps a bit more elementary than the excellent answer given by user Algernon. Another advantage of this approach is that it produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.



                    To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.



                    Take indeed any convex function $fcolon[0,1]toR$ and consider the difference
                    beginequation
                    De:=sum_1^N f(p_i)-sum_1^N f(q_i)
                    endequation
                    between the "generalized" entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$.
                    We want to find the exact upper bound on $De$ subject to the given conditions on $(P,Q)$.
                    In what follows, $(P,Q)$ is a point satisfying these conditions.
                    Without loss of generality (wlog), for some $kin1,dots,N$ we have $p_ige q_i$ for $ile k$, $p_ile q_i$ for $ige k+1$, and $q_1gecdotsge q_k$, so that
                    beginequation
                    ep=sum_1^k(p_i-q_i)=sum_k+1^N(q_i-p_i)>0.
                    endequation



                    Let $p_i^*:=q_i$ for $i=2,dots,k$ and $p_1^*:=q_1+ep[=sum_1^k p_i-sum_2^k q_ile1]$. Then the vector $(p_1^*,dots,p_k^*)$ majorizes (in the Schur sense) the vector $(p_1,dots,p_k)$ and still satisfies the condition $p_i^*ge q_i$ for $ile k$. Also, since $f$ is convex, $sum_1^k f(p_i)$ is Schur convex in $(p_1,dots,p_k)$. So, wlog $(p_1,dots,p_k)=(p_1^*,dots,p_k^*)$.



                    In particular, $p_1>q_1$ and $q_mge p_m$ for all $m=2,dots,N$.
                    Moreover, wlog $p_m=0$ for any $m=2,dots,N$. Indeed, take any $m=2,dots,N$ with $p_m>0$ and replace $p_1,q_1,p_m,q_m$ respectively by $p_1+t,q_1+t,p_m-t,q_m-t$, where $t:=p_min(0,1-p_1]$; then all the conditions on $P,Q$ will still hold. After this replacement, $De$ will change by the sum of the nonnegative expressions $[f(p_1+t)-f(q_1+t)]-[f(p_1)-f(q_1)]$ and $[f(q_m)-f(p_m)]-[f(q_m-t)-f(p_m-t)]$; this nonnegativity follows by the convexity of $f$. Making such replacements for each $m=2,dots,N$ with $p_m>0$, we will change $De$ by a nonnegative amount, and will also get $p_m=0$ for all $m=2,dots,N$ indeed.



                    Thus, wlog
                    begingather
                    p_1=1=q_1+ep,quad p_i=0 forall i=2,dots,N,quad
                    sum_2^N q_i=ep.
                    endgather
                    Since $sum_2^N f(q_i)$ is Schur convex in the $q_i$'s, wlog
                    beginalign
                    De&=f(1)-f(q_1)+sum_2^N f(0)-sum_2^N f(q_i) \
                    &le f(1)-f(1-ep)+(N-1)f(0)-(N-1)f(tfracepN-1)=:De_f;ep,N.
                    endalign
                    The bound $De_f;ep,N$ on $De$ is obviously exact, since it is attained when $p_1=1=q_1+ep$ and $q_2=cdots=q_N=tfracepN-1$.



                    In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.






                    share|cite|improve this answer






















                    • Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
                      – Algernon
                      Sep 18 at 16:52










                    • Also, it should be $p^∗_ileq q_i$ for $i>k$.
                      – Algernon
                      Sep 18 at 16:52






                    • 1




                      @Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
                      – Iosif Pinelis
                      Sep 18 at 17:52






                    • 1




                      It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
                      – Iosif Pinelis
                      Sep 20 at 0:47














                    up vote
                    4
                    down vote













                    $newcommandDeDelta
                    newcommandepepsilon
                    newcommandRmathbbR$



                    Here is yet another answer providing the exact bound. This answer is perhaps a bit more elementary than the excellent answer given by user Algernon. Another advantage of this approach is that it produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.



                    To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.



                    Take indeed any convex function $fcolon[0,1]toR$ and consider the difference
                    beginequation
                    De:=sum_1^N f(p_i)-sum_1^N f(q_i)
                    endequation
                    between the "generalized" entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$.
                    We want to find the exact upper bound on $De$ subject to the given conditions on $(P,Q)$.
                    In what follows, $(P,Q)$ is a point satisfying these conditions.
                    Without loss of generality (wlog), for some $kin1,dots,N$ we have $p_ige q_i$ for $ile k$, $p_ile q_i$ for $ige k+1$, and $q_1gecdotsge q_k$, so that
                    beginequation
                    ep=sum_1^k(p_i-q_i)=sum_k+1^N(q_i-p_i)>0.
                    endequation



                    Let $p_i^*:=q_i$ for $i=2,dots,k$ and $p_1^*:=q_1+ep[=sum_1^k p_i-sum_2^k q_ile1]$. Then the vector $(p_1^*,dots,p_k^*)$ majorizes (in the Schur sense) the vector $(p_1,dots,p_k)$ and still satisfies the condition $p_i^*ge q_i$ for $ile k$. Also, since $f$ is convex, $sum_1^k f(p_i)$ is Schur convex in $(p_1,dots,p_k)$. So, wlog $(p_1,dots,p_k)=(p_1^*,dots,p_k^*)$.



                    In particular, $p_1>q_1$ and $q_mge p_m$ for all $m=2,dots,N$.
                    Moreover, wlog $p_m=0$ for any $m=2,dots,N$. Indeed, take any $m=2,dots,N$ with $p_m>0$ and replace $p_1,q_1,p_m,q_m$ respectively by $p_1+t,q_1+t,p_m-t,q_m-t$, where $t:=p_min(0,1-p_1]$; then all the conditions on $P,Q$ will still hold. After this replacement, $De$ will change by the sum of the nonnegative expressions $[f(p_1+t)-f(q_1+t)]-[f(p_1)-f(q_1)]$ and $[f(q_m)-f(p_m)]-[f(q_m-t)-f(p_m-t)]$; this nonnegativity follows by the convexity of $f$. Making such replacements for each $m=2,dots,N$ with $p_m>0$, we will change $De$ by a nonnegative amount, and will also get $p_m=0$ for all $m=2,dots,N$ indeed.



                    Thus, wlog
                    begingather
                    p_1=1=q_1+ep,quad p_i=0 forall i=2,dots,N,quad
                    sum_2^N q_i=ep.
                    endgather
                    Since $sum_2^N f(q_i)$ is Schur convex in the $q_i$'s, wlog
                    beginalign
                    De&=f(1)-f(q_1)+sum_2^N f(0)-sum_2^N f(q_i) \
                    &le f(1)-f(1-ep)+(N-1)f(0)-(N-1)f(tfracepN-1)=:De_f;ep,N.
                    endalign
                    The bound $De_f;ep,N$ on $De$ is obviously exact, since it is attained when $p_1=1=q_1+ep$ and $q_2=cdots=q_N=tfracepN-1$.



                    In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.






                    share|cite|improve this answer






















                    • Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
                      – Algernon
                      Sep 18 at 16:52










                    • Also, it should be $p^∗_ileq q_i$ for $i>k$.
                      – Algernon
                      Sep 18 at 16:52






                    • 1




                      @Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
                      – Iosif Pinelis
                      Sep 18 at 17:52






                    • 1




                      It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
                      – Iosif Pinelis
                      Sep 20 at 0:47












                    up vote
                    4
                    down vote










                    up vote
                    4
                    down vote









                    $newcommandDeDelta
                    newcommandepepsilon
                    newcommandRmathbbR$



                    Here is yet another answer providing the exact bound. This answer is perhaps a bit more elementary than the excellent answer given by user Algernon. Another advantage of this approach is that it produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.



                    To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.



                    Take indeed any convex function $fcolon[0,1]toR$ and consider the difference
                    beginequation
                    De:=sum_1^N f(p_i)-sum_1^N f(q_i)
                    endequation
                    between the "generalized" entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$.
                    We want to find the exact upper bound on $De$ subject to the given conditions on $(P,Q)$.
                    In what follows, $(P,Q)$ is a point satisfying these conditions.
                    Without loss of generality (wlog), for some $kin1,dots,N$ we have $p_ige q_i$ for $ile k$, $p_ile q_i$ for $ige k+1$, and $q_1gecdotsge q_k$, so that
                    beginequation
                    ep=sum_1^k(p_i-q_i)=sum_k+1^N(q_i-p_i)>0.
                    endequation



                    Let $p_i^*:=q_i$ for $i=2,dots,k$ and $p_1^*:=q_1+ep[=sum_1^k p_i-sum_2^k q_ile1]$. Then the vector $(p_1^*,dots,p_k^*)$ majorizes (in the Schur sense) the vector $(p_1,dots,p_k)$ and still satisfies the condition $p_i^*ge q_i$ for $ile k$. Also, since $f$ is convex, $sum_1^k f(p_i)$ is Schur convex in $(p_1,dots,p_k)$. So, wlog $(p_1,dots,p_k)=(p_1^*,dots,p_k^*)$.



                    In particular, $p_1>q_1$ and $q_mge p_m$ for all $m=2,dots,N$.
                    Moreover, wlog $p_m=0$ for any $m=2,dots,N$. Indeed, take any $m=2,dots,N$ with $p_m>0$ and replace $p_1,q_1,p_m,q_m$ respectively by $p_1+t,q_1+t,p_m-t,q_m-t$, where $t:=p_min(0,1-p_1]$; then all the conditions on $P,Q$ will still hold. After this replacement, $De$ will change by the sum of the nonnegative expressions $[f(p_1+t)-f(q_1+t)]-[f(p_1)-f(q_1)]$ and $[f(q_m)-f(p_m)]-[f(q_m-t)-f(p_m-t)]$; this nonnegativity follows by the convexity of $f$. Making such replacements for each $m=2,dots,N$ with $p_m>0$, we will change $De$ by a nonnegative amount, and will also get $p_m=0$ for all $m=2,dots,N$ indeed.



                    Thus, wlog
                    begingather
                    p_1=1=q_1+ep,quad p_i=0 forall i=2,dots,N,quad
                    sum_2^N q_i=ep.
                    endgather
                    Since $sum_2^N f(q_i)$ is Schur convex in the $q_i$'s, wlog
                    beginalign
                    De&=f(1)-f(q_1)+sum_2^N f(0)-sum_2^N f(q_i) \
                    &le f(1)-f(1-ep)+(N-1)f(0)-(N-1)f(tfracepN-1)=:De_f;ep,N.
                    endalign
                    The bound $De_f;ep,N$ on $De$ is obviously exact, since it is attained when $p_1=1=q_1+ep$ and $q_2=cdots=q_N=tfracepN-1$.



                    In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.






                    share|cite|improve this answer














                    $newcommandDeDelta
                    newcommandepepsilon
                    newcommandRmathbbR$



                    Here is yet another answer providing the exact bound. This answer is perhaps a bit more elementary than the excellent answer given by user Algernon. Another advantage of this approach is that it produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.



                    To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.



                    Take indeed any convex function $fcolon[0,1]toR$ and consider the difference
                    beginequation
                    De:=sum_1^N f(p_i)-sum_1^N f(q_i)
                    endequation
                    between the "generalized" entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$.
                    We want to find the exact upper bound on $De$ subject to the given conditions on $(P,Q)$.
                    In what follows, $(P,Q)$ is a point satisfying these conditions.
                    Without loss of generality (wlog), for some $kin1,dots,N$ we have $p_ige q_i$ for $ile k$, $p_ile q_i$ for $ige k+1$, and $q_1gecdotsge q_k$, so that
                    beginequation
                    ep=sum_1^k(p_i-q_i)=sum_k+1^N(q_i-p_i)>0.
                    endequation



                    Let $p_i^*:=q_i$ for $i=2,dots,k$ and $p_1^*:=q_1+ep[=sum_1^k p_i-sum_2^k q_ile1]$. Then the vector $(p_1^*,dots,p_k^*)$ majorizes (in the Schur sense) the vector $(p_1,dots,p_k)$ and still satisfies the condition $p_i^*ge q_i$ for $ile k$. Also, since $f$ is convex, $sum_1^k f(p_i)$ is Schur convex in $(p_1,dots,p_k)$. So, wlog $(p_1,dots,p_k)=(p_1^*,dots,p_k^*)$.



                    In particular, $p_1>q_1$ and $q_mge p_m$ for all $m=2,dots,N$.
                    Moreover, wlog $p_m=0$ for any $m=2,dots,N$. Indeed, take any $m=2,dots,N$ with $p_m>0$ and replace $p_1,q_1,p_m,q_m$ respectively by $p_1+t,q_1+t,p_m-t,q_m-t$, where $t:=p_min(0,1-p_1]$; then all the conditions on $P,Q$ will still hold. After this replacement, $De$ will change by the sum of the nonnegative expressions $[f(p_1+t)-f(q_1+t)]-[f(p_1)-f(q_1)]$ and $[f(q_m)-f(p_m)]-[f(q_m-t)-f(p_m-t)]$; this nonnegativity follows by the convexity of $f$. Making such replacements for each $m=2,dots,N$ with $p_m>0$, we will change $De$ by a nonnegative amount, and will also get $p_m=0$ for all $m=2,dots,N$ indeed.



                    Thus, wlog
                    begingather
                    p_1=1=q_1+ep,quad p_i=0 forall i=2,dots,N,quad
                    sum_2^N q_i=ep.
                    endgather
                    Since $sum_2^N f(q_i)$ is Schur convex in the $q_i$'s, wlog
                    beginalign
                    De&=f(1)-f(q_1)+sum_2^N f(0)-sum_2^N f(q_i) \
                    &le f(1)-f(1-ep)+(N-1)f(0)-(N-1)f(tfracepN-1)=:De_f;ep,N.
                    endalign
                    The bound $De_f;ep,N$ on $De$ is obviously exact, since it is attained when $p_1=1=q_1+ep$ and $q_2=cdots=q_N=tfracepN-1$.



                    In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.







                    share|cite|improve this answer














                    share|cite|improve this answer



                    share|cite|improve this answer








                    edited Sep 20 at 4:58

























                    answered Sep 18 at 6:00









                    Iosif Pinelis

                    15.2k12154




                    15.2k12154











                    • Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
                      – Algernon
                      Sep 18 at 16:52










                    • Also, it should be $p^∗_ileq q_i$ for $i>k$.
                      – Algernon
                      Sep 18 at 16:52






                    • 1




                      @Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
                      – Iosif Pinelis
                      Sep 18 at 17:52






                    • 1




                      It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
                      – Iosif Pinelis
                      Sep 20 at 0:47
















                    • Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
                      – Algernon
                      Sep 18 at 16:52










                    • Also, it should be $p^∗_ileq q_i$ for $i>k$.
                      – Algernon
                      Sep 18 at 16:52






                    • 1




                      @Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
                      – Iosif Pinelis
                      Sep 18 at 17:52






                    • 1




                      It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
                      – Iosif Pinelis
                      Sep 20 at 0:47















                    Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
                    – Algernon
                    Sep 18 at 16:52




                    Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
                    – Algernon
                    Sep 18 at 16:52












                    Also, it should be $p^∗_ileq q_i$ for $i>k$.
                    – Algernon
                    Sep 18 at 16:52




                    Also, it should be $p^∗_ileq q_i$ for $i>k$.
                    – Algernon
                    Sep 18 at 16:52




                    1




                    1




                    @Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
                    – Iosif Pinelis
                    Sep 18 at 17:52




                    @Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
                    – Iosif Pinelis
                    Sep 18 at 17:52




                    1




                    1




                    It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
                    – Iosif Pinelis
                    Sep 20 at 0:47




                    It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
                    – Iosif Pinelis
                    Sep 20 at 0:47

















                     

                    draft saved


                    draft discarded















































                     


                    draft saved


                    draft discarded














                    StackExchange.ready(
                    function ()
                    StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f310689%2fentropy-and-total-variation-distance%23new-answer', 'question_page');

                    );

                    Post as a guest













































































                    Popular posts from this blog

                    How to check contact read email or not when send email to Individual?

                    Bahrain

                    Postfix configuration issue with fips on centos 7; mailgun relay