Entropy and total variation distance

up vote
10
down vote

favorite

Let $X$, $Y$ be discrete random variables taking values within the same set of $N$ elements. Let the total variation distance $|P-Q|$ (which is half the $L_1$ distance between the distributions of $P$ and $Q$) be at most $epsilon$. What is a standard, easily provable bound on the difference between the entropies of $X$ and $Y$?

(It is easy to see that such a bound must depend not only on $epsilon$ but also on $N$.)

edited Sep 16 at 14:39

Iosif Pinelis

15.2k12154

asked Sep 16 at 12:41

H A Helfgott

4,2322479

add a commentÂ |Â

up vote
10
down vote

favorite

(It is easy to see that such a bound must depend not only on $epsilon$ but also on $N$.)

edited Sep 16 at 14:39

Iosif Pinelis

15.2k12154

asked Sep 16 at 12:41

H A Helfgott

4,2322479

add a commentÂ |Â

up vote
10
down vote

favorite

(It is easy to see that such a bound must depend not only on $epsilon$ but also on $N$.)

edited Sep 16 at 14:39

Iosif Pinelis

15.2k12154

asked Sep 16 at 12:41

H A Helfgott

4,2322479

(It is easy to see that such a bound must depend not only on $epsilon$ but also on $N$.)

pr.probability probability-distributions inequalities it.information-theory entropy

edited Sep 16 at 14:39

Iosif Pinelis

15.2k12154

asked Sep 16 at 12:41

H A Helfgott

4,2322479

edited Sep 16 at 14:39

Iosif Pinelis

15.2k12154

asked Sep 16 at 12:41

H A Helfgott

4,2322479

edited Sep 16 at 14:39

Iosif Pinelis

15.2k12154

edited Sep 16 at 14:39

Iosif Pinelis

15.2k12154

edited Sep 16 at 14:39

Iosif Pinelis

15.2k12154

asked Sep 16 at 12:41

H A Helfgott

4,2322479

asked Sep 16 at 12:41

H A Helfgott

4,2322479

asked Sep 16 at 12:41

H A Helfgott

4,2322479

add a commentÂ |Â

3 Answers
3

active

oldest

votes

up vote
8
down vote

Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.

Proof.
Let $varepsilon':=|P-Q|$.
Let $(X,Y)$ be an optimal coupling of $P$ and $Q$, so that
beginalign
mathbbP(Xneq Y) = |P-Q| ;.
endalign
Using a standard construction, we can assume that $X$ and $Y$ have the particular form
beginalign
X &:=
begincases
Z & textif $B=0$, \
tildeX & textif $B=1$,
endcases &
Y &:=
begincases
Z & textif $B=0$, \
tildeY & textif $B=1$,
endcases
endalign
where $B$, $Z$ and $(tildeX,tildeY)$ are independent and $BsimtextBern(varepsilon')$.

Note that
beginalign
H(X|B) leq H(X) leq H(B) + H(X|B) ;.
endalign
For $H(X|B)$ we can write
beginalign
H(X|B) &= varepsilon' H(X|B=1) + (1-varepsilon') H(X|B=0) \
&= varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;.
endalign
Thus,
beginalign
varepsilon' H(tildeX) + (1-varepsilon') H(Z) &leq H(X)
leq H(B) + varepsilon' H(tildeX) + (1-varepsilon') H(Z) ;,
tag$clubsuit$
endalign
and similarly,
beginalign
varepsilon' H(tildeY) + (1-varepsilon') H(Z) &leq H(Y)
leq H(B) + varepsilon' H(tildeY) + (1-varepsilon') H(Z) ;.
tag$spadesuit$
endalign

Combining ($clubsuit$) and ($spadesuit$) we get
beginalign
|H(X)-H(Y)| &leq
H(B) + varepsilon' |H(tildeX) - H(tildeY)| \
&leq H(varepsilon') + varepsilon' log N \
&leq H(varepsilon) + varepsilon log N ;,
endalign
as claimed.
QED

Edit [2018--09--17] (following Iosif Pinelis's comment).

Refining the above reasoning a little bit, we can get the better bound
beginalign
|H(P)-H(Q)|leq H(varepsilon) + varepsilonlog(N-1) ;.
endalign

Indeed, let $Sigma$ denote the $N$-element set that is the support of $P$ and $Q$. As before, let $varepsilon':=|P-Q|$, and let us discard the trivial cases $varepsilon'in0,1$, so that $0<varepsilon'<1$.

Recalling from the construction of an optimal coupling, define for $ainSigma$,
beginalign
R_0(a) &:= P(a)land Q(a) &
P_0(a) &:= P(a)-R_0(a) \
& &
Q_0(a) &:= Q(a)-R_0(a) ;.
endalign
Observe that $R_0$, $P_0$ and $Q_0$ are non-negative functions and
beginalign
sum_ainSigmaR_0(a)=1-varepsilon' qquadtextandqquad
sum_ainSigmaP_0(a)=sum_ainSigmaQ_0(a)=varepsilon' ;.
endalign
Thus, $tildeR:=R_0/(1-varepsilon')$, $tildeP:=P_0/varepsilon'$ and $tildeQ:=Q_0/varepsilon'$ are probability distributions on $Sigma$ satisfying
beginalign
P(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeP \
Q(a) &= (1-varepsilon')tildeR(a) + varepsilon'tildeQ ;.
endalign
If we now choose $ZsimtildeR$, $tildeXsimtildeP$, $tildeYsimtildeQ$ and $BsimtextBern(varepsilon')$ independently, we have a coupling as promised above.

Back to the inequality, observe that since both $P$ and $Q$ are non-negative and normalized, there necessarily exist $a,binSigma$ such that $P_0(a)=0$ and $Q_0(b)=0$. This means that each of $tildeP$ and $tildeQ$ is in fact supported on a strict subset of $Sigma$. Hence,
beginalign
|H(tildeX)-H(tildeY)| &leq maxH(tildeP),H(tildeQ)
leq log(N-1)
endalign
and the (updated) claim follows as before.

Note.
As the example H A Helfgott gave in the comments ($requireencloseenclosehorizontalstrikeN=0$, $requireencloseenclosehorizontalstrikeXsimtextBern(varepsilon)$ and $requireencloseenclosehorizontalstrikeYsimtextBern(0)$) shows, this refined bound is sharp at least when $requireencloseenclosehorizontalstrikeN=2$.
Iosif Pinelis gave the following example in his comment below which shows that the refined bound is sharp for every $N$ and $varepsilon$:

Let $Sigma:=1,2,ldots,N$ and
beginalign
P(a) &:=
begincases
1 & textif $a=1$,\
0 & textotherwise,
endcases &
Q(a) &:=
begincases
1-varepsilon & textif $a=1$,\
varepsilon/(N-1) & textotherwise.
endcases
endalign
Then, $|P-Q|=varepsilon$ and $|H(P)-H(Q)|=H(Q)=H(varepsilon)+varepsilonlog(N-1)$.

edited Sep 17 at 20:25

answered Sep 16 at 16:41

Algernon

9491712

Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
â€“Â H A Helfgott
Sep 16 at 22:37

1

(Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
â€“Â H A Helfgott
Sep 17 at 14:13

1

@Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
â€“Â Iosif Pinelis
Sep 17 at 14:39

1

@Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
â€“Â Iosif Pinelis
Sep 17 at 17:34

2

Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
â€“Â usul
Sep 17 at 19:22

Â |Â
show 8 more comments

up vote
5
down vote

$newcommanddedelta
newcommandepepsilon$

Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.

Next, take any real $cge1$. Note that $g(p):=pln p+cp$ (with $g(0):=0$) is convex in $pin[0,1]$. So, assuming $0le ple qle1$ and $qge e^-c$, we have
$g(p)le g(0)vee g(q)=0vee g(q)=g(q)$ and hence
$pln p-qln qle c(q-p)=c|p-q|$.
Thus,
beginequation
pln p-qln qle c|p-q|
endequation
whenever $0le p,qle1$ and $qge e^-c$.

Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.

Therefore, the difference between the entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$ is
beginequation
sum_1^N p_iln p_i-sum_1^N q_iln q_i=S_1+S_2+S_3,
endequation
where
beginalign*
S_1 &:=sum_icolon q_ige e^-c (p_iln p_i-q_iln q_i)le sum_icolon q_ige e^-cc|p_i-q_i|
le cde quadtextif dege|P-Q|_1, \
S_2 &:= sum_icolon q_i< e^-cp_iln p_ile0, \
S_3 &:= sum_icolon q_i< e^-c(-q_iln q_i)lesum_icolon q_i< e^-cce^-cle Nce^-c.
endalign*
So, taking now $c=lnfrac Nde$ and assuming $Nge ede$, we see that
beginequation
sum_1^N p_iln p_i-sum_1^N q_iln q_ile
cde+Nce^-c
=2delnfrac Nde.
endequation
Taking here $de=2ep$ and noting that $Nge1$, we conclude that
beginequation
sum_1^N p_iln p_i-sum_1^N q_iln q_ile
4eplnfracN2ep
endequation
if $eple1/(2e)$.

edited Sep 16 at 18:09

answered Sep 16 at 14:37

Iosif Pinelis

15.2k12154

add a commentÂ |Â

up vote
4
down vote

$newcommandDeDelta
newcommandepepsilon
newcommandRmathbbR$

Here is yet another answer providing the exact bound. This answer is perhaps a bit more elementary than the excellent answer given by user Algernon. Another advantage of this approach is that it produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.

To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.

Take indeed any convex function $fcolon[0,1]toR$ and consider the difference
beginequation
De:=sum_1^N f(p_i)-sum_1^N f(q_i)
endequation
between the "generalized" entropies of $Q=(q_i)_i=1^N$ and $P=(p_i)_i=1^N$.
We want to find the exact upper bound on $De$ subject to the given conditions on $(P,Q)$.
In what follows, $(P,Q)$ is a point satisfying these conditions.
Without loss of generality (wlog), for some $kin1,dots,N$ we have $p_ige q_i$ for $ile k$, $p_ile q_i$ for $ige k+1$, and $q_1gecdotsge q_k$, so that
beginequation
ep=sum_1^k(p_i-q_i)=sum_k+1^N(q_i-p_i)>0.
endequation

Let $p_i^*:=q_i$ for $i=2,dots,k$ and $p_1^*:=q_1+ep[=sum_1^k p_i-sum_2^k q_ile1]$. Then the vector $(p_1^*,dots,p_k^*)$ majorizes (in the Schur sense) the vector $(p_1,dots,p_k)$ and still satisfies the condition $p_i^*ge q_i$ for $ile k$. Also, since $f$ is convex, $sum_1^k f(p_i)$ is Schur convex in $(p_1,dots,p_k)$. So, wlog $(p_1,dots,p_k)=(p_1^*,dots,p_k^*)$.

In particular, $p_1>q_1$ and $q_mge p_m$ for all $m=2,dots,N$.
Moreover, wlog $p_m=0$ for any $m=2,dots,N$. Indeed, take any $m=2,dots,N$ with $p_m>0$ and replace $p_1,q_1,p_m,q_m$ respectively by $p_1+t,q_1+t,p_m-t,q_m-t$, where $t:=p_min(0,1-p_1]$; then all the conditions on $P,Q$ will still hold. After this replacement, $De$ will change by the sum of the nonnegative expressions $[f(p_1+t)-f(q_1+t)]-[f(p_1)-f(q_1)]$ and $[f(q_m)-f(p_m)]-[f(q_m-t)-f(p_m-t)]$; this nonnegativity follows by the convexity of $f$. Making such replacements for each $m=2,dots,N$ with $p_m>0$, we will change $De$ by a nonnegative amount, and will also get $p_m=0$ for all $m=2,dots,N$ indeed.

Thus, wlog
begingather
p_1=1=q_1+ep,quad p_i=0 forall i=2,dots,N,quad
sum_2^N q_i=ep.
endgather
Since $sum_2^N f(q_i)$ is Schur convex in the $q_i$'s, wlog
beginalign
De&=f(1)-f(q_1)+sum_2^N f(0)-sum_2^N f(q_i) \
&le f(1)-f(1-ep)+(N-1)f(0)-(N-1)f(tfracepN-1)=:De_f;ep,N.
endalign
The bound $De_f;ep,N$ on $De$ is obviously exact, since it is attained when $p_1=1=q_1+ep$ and $q_2=cdots=q_N=tfracepN-1$.

In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.

edited Sep 20 at 4:58

answered Sep 18 at 6:00

Iosif Pinelis

15.2k12154

Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
â€“Â Algernon
Sep 18 at 16:52

Also, it should be $p^Ã¢ÂˆÂ—_ileq q_i$ for $i>k$.
â€“Â Algernon
Sep 18 at 16:52

1

@Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
â€“Â Iosif Pinelis
Sep 18 at 17:52

1

It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
â€“Â Iosif Pinelis
Sep 20 at 0:47

add a commentÂ |Â

Your Answer

StackExchange.ifUsing("editor", function ()
return StackExchange.using("mathjaxEditing", function ()
StackExchange.MarkdownEditor.creationCallbacks.add(function (editor, postfix)
StackExchange.mathjaxEditing.prepareWmdForMathJax(editor, postfix, [["$", "$"], ["\$","\$"]]);
);
);
, "mathjax-editing");

StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "504"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);

StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);

else
createEditor();

);

function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
convertImagesToLinks: true,
noModals: false,
showLowRepImageUploadWarning: true,
reputationToPostImages: 10,
bindNavPrevention: true,
postfix: "",
noCode: true, onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);

);

draft saved

draft discarded

StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2fmathoverflow.net%2fquestions%2f310689%2fentropy-and-total-variation-distance%23new-answer', 'question_page');

);

Post as a guest

Name

3 Answers
3

active

oldest

votes

3 Answers
3

active

oldest

votes

up vote
8
down vote

Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.

edited Sep 17 at 20:25

answered Sep 16 at 16:41

Algernon

9491712

Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
â€“Â H A Helfgott
Sep 16 at 22:37

1

(Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
â€“Â H A Helfgott
Sep 17 at 14:13

1

@Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
â€“Â Iosif Pinelis
Sep 17 at 14:39

1

@Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
â€“Â Iosif Pinelis
Sep 17 at 17:34

2

Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
â€“Â usul
Sep 17 at 19:22

Â |Â
show 8 more comments

up vote
8
down vote

Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.

edited Sep 17 at 20:25

answered Sep 16 at 16:41

Algernon

9491712

Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
â€“Â H A Helfgott
Sep 16 at 22:37

1

(Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
â€“Â H A Helfgott
Sep 17 at 14:13

1

@Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
â€“Â Iosif Pinelis
Sep 17 at 14:39

1

@Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
â€“Â Iosif Pinelis
Sep 17 at 17:34

2

Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
â€“Â usul
Sep 17 at 19:22

Â |Â
show 8 more comments

up vote
8
down vote

Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.

edited Sep 17 at 20:25

answered Sep 16 at 16:41

Algernon

9491712

Claim. If $|P-Q|leqvarepsilonleqfrac12$, then $|H(P)-H(Q)| leq H(varepsilon) + varepsilonlog N$.

edited Sep 17 at 20:25

answered Sep 16 at 16:41

Algernon

9491712

edited Sep 17 at 20:25

answered Sep 16 at 16:41

Algernon

9491712

answered Sep 16 at 16:41

Algernon

9491712

answered Sep 16 at 16:41

Algernon

9491712

Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
â€“Â H A Helfgott
Sep 16 at 22:37

1

(Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
â€“Â H A Helfgott
Sep 17 at 14:13

1

@Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
â€“Â Iosif Pinelis
Sep 17 at 14:39

1

@Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
â€“Â Iosif Pinelis
Sep 17 at 17:34

2

Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
â€“Â usul
Sep 17 at 19:22

Â |Â
show 8 more comments

Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
â€“Â H A Helfgott
Sep 16 at 22:37

1

(Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
â€“Â H A Helfgott
Sep 17 at 14:13

1

@Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
â€“Â Iosif Pinelis
Sep 17 at 14:39

1

@Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
â€“Â Iosif Pinelis
Sep 17 at 17:34

2

Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
â€“Â usul
Sep 17 at 19:22

Thanks! This is in general shaper than the above solution. I wonder whether a completely elementary proof (such as the one above) is useful?
â€“Â H A Helfgott
Sep 16 at 22:37

(Well, for $Nleq 2$, $|H(X)-H(Y)|leq epsilon log(N/epsilon)$would be false , as the example of $Xsim textBern(epsilon)$, $Ysim textBern(0)$ shows. Stilldots)
â€“Â H A Helfgott
Sep 17 at 14:13

@Algernon : I think the exact bound is $H(varepsilon) + varepsilon ln(N-1)$. Can your reasoning be modified to get that?
â€“Â Iosif Pinelis
Sep 17 at 14:39

@Algernon : The bound $H(varepsilon) + varepsilonln(N-1)$ is actually optimal for all $N$ and $varepsilon$: Take $p_1=1$, $q_1=1-varepsilon$, $q_2=cdots=q_N=varepsilon/(N-1)$.
â€“Â Iosif Pinelis
Sep 17 at 17:34

Very nice answer, and I agree with Iosif that the example is tight for all $N$. I calculate $H(Y) = H(B) + H(Y|B)$, where $B$ is the indicator for $Y=1$. We have $H(B) = H(epsilon)$ and $H(Y|B) = (1-epsilon)(0) + epsilon H(U)$ where $U$ is uniform on $2,dots,N$. This gives $H(Y) = H(epsilon) + epsilon log(N-1)$.
â€“Â usul
Sep 17 at 19:22

Â |Â
show 8 more comments

up vote
5
down vote

$newcommanddedelta
newcommandepepsilon$

Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.

Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.

edited Sep 16 at 18:09

answered Sep 16 at 14:37

Iosif Pinelis

15.2k12154

add a commentÂ |Â

up vote
5
down vote

$newcommanddedelta
newcommandepepsilon$

Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.

Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.

edited Sep 16 at 18:09

answered Sep 16 at 14:37

Iosif Pinelis

15.2k12154

add a commentÂ |Â

up vote
5
down vote

$newcommanddedelta
newcommandepepsilon$

Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.

Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.

edited Sep 16 at 18:09

answered Sep 16 at 14:37

Iosif Pinelis

15.2k12154

$newcommanddedelta
newcommandepepsilon$

Note that $pln p-p$ is decreasing in $pin[0,1]$, so that $pln p-ple qln q-q$ and hence $pln p-qln qle p-q=|p-q|$ if $0le qle ple1$.

Also, $-qln q$ is increasing in $qin[0,e^-1]$ and hence in $qin[0,e^-c]$, so that $-qln qle ce^-c$ for $qin[0,e^-c]$. Also, $pln ple0$ if $0le ple1$.

edited Sep 16 at 18:09

answered Sep 16 at 14:37

Iosif Pinelis

15.2k12154

edited Sep 16 at 18:09

answered Sep 16 at 14:37

Iosif Pinelis

15.2k12154

answered Sep 16 at 14:37

Iosif Pinelis

15.2k12154

answered Sep 16 at 14:37

Iosif Pinelis

15.2k12154

add a commentÂ |Â

up vote
4
down vote

$newcommandDeDelta
newcommandepepsilon
newcommandRmathbbR$

To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.

In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.

edited Sep 20 at 4:58

answered Sep 18 at 6:00

Iosif Pinelis

15.2k12154

Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
â€“Â Algernon
Sep 18 at 16:52

Also, it should be $p^Ã¢ÂˆÂ—_ileq q_i$ for $i>k$.
â€“Â Algernon
Sep 18 at 16:52

1

@Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
â€“Â Iosif Pinelis
Sep 18 at 17:52

1

It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
â€“Â Iosif Pinelis
Sep 20 at 0:47

add a commentÂ |Â

up vote
4
down vote

$newcommandDeDelta
newcommandepepsilon
newcommandRmathbbR$

To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.

In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.

edited Sep 20 at 4:58

answered Sep 18 at 6:00

Iosif Pinelis

15.2k12154

Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
â€“Â Algernon
Sep 18 at 16:52

Also, it should be $p^Ã¢ÂˆÂ—_ileq q_i$ for $i>k$.
â€“Â Algernon
Sep 18 at 16:52

1

@Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
â€“Â Iosif Pinelis
Sep 18 at 17:52

1

It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
â€“Â Iosif Pinelis
Sep 20 at 0:47

add a commentÂ |Â

up vote
4
down vote

$newcommandDeDelta
newcommandepepsilon
newcommandRmathbbR$

To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.

In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.

edited Sep 20 at 4:58

answered Sep 18 at 6:00

Iosif Pinelis

15.2k12154

$newcommandDeDelta
newcommandepepsilon
newcommandRmathbbR$

To preserve the history of the question, I have also retained my previous answer, which used different (if somewhat similar) ideas and provided a suboptimal bound.

In the particular case when $f(p)=pln p$ (with $f(0)=0$), the exact bound $De_f;ep,N$ becomes $H(ep)+epln(N-1)$, where $H(ep):=eplnfrac1ep+(1-ep)lnfrac11-ep$.

edited Sep 20 at 4:58

answered Sep 18 at 6:00

Iosif Pinelis

15.2k12154

edited Sep 20 at 4:58

answered Sep 18 at 6:00

Iosif Pinelis

15.2k12154

answered Sep 18 at 6:00

Iosif Pinelis

15.2k12154

answered Sep 18 at 6:00

Iosif Pinelis

15.2k12154

Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
â€“Â Algernon
Sep 18 at 16:52

Also, it should be $p^Ã¢ÂˆÂ—_ileq q_i$ for $i>k$.
â€“Â Algernon
Sep 18 at 16:52

1

@Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
â€“Â Iosif Pinelis
Sep 18 at 17:52

1

It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
â€“Â Iosif Pinelis
Sep 20 at 0:47

add a commentÂ |Â

Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
â€“Â Algernon
Sep 18 at 16:52

Also, it should be $p^Ã¢ÂˆÂ—_ileq q_i$ for $i>k$.
â€“Â Algernon
Sep 18 at 16:52

1

@Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
â€“Â Iosif Pinelis
Sep 18 at 17:52

1

It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
â€“Â Iosif Pinelis
Sep 20 at 0:47

Nice approach! I only didn't understand when you say $(p^*_k+1,ldots,p^*_N)$ majorizes the vector $(p_k+1,ldots,p_N)$. Both $p$ and $p^*$ are probability vectors and you have already shown that $(p^*_1,ldots,p^*_k)$ majorizes $(p_1,ldots,p_k)$. However, I agree that the entropy of $p^*$ must be no larger than the entropy of $p$, which seems to be what you are using.
â€“Â Algernon
Sep 18 at 16:52

Also, it should be $p^Ã¢ÂˆÂ—_ileq q_i$ for $i>k$.
â€“Â Algernon
Sep 18 at 16:52

@Algernon : Thank you for your comments. I have now specified that the majorization is understood here in the Schur sense and given a corresponding reference. Also, I have now simplified the proof and no longer use $p_i^*$ for $i>k$.
â€“Â Iosif Pinelis
Sep 18 at 17:52

It is now shown that the same approach produces the exact bound for any convex function $f$ in place of the function $pmapsto pln p$.
â€“Â Iosif Pinelis
Sep 20 at 0:47

add a commentÂ |Â

draft saved

draft discarded

draft saved

draft discarded

Post as a guest

Name

搜尋此網誌

mjhjmtu