How does awk '!a[$0]++' work?

This one-liner removes duplicate lines from text input without pre-sorting.



For example:



$ cat >f
q
w
e
w
r
$ awk '!a[$0]++' <f
q
w
e
r
$


The original code I found on the Internet read:



awk '!_[$0]++'



This was even more perplexing to me, as I assumed _ had a special meaning in awk, as it does in Perl, but it turned out to be just the name of an array.



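Now, I understand the logic behind the one-liner:
each input line is used as a key in a hash array, so a line is printed only the first time it is seen. Upon completion, the hash contains each unique line once (in no particular order, since it is a hash; it is the printed output that keeps the order of arrival).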



What I would like to learn is how exactly this notation is interpreted by awk, e.g. what the bang (!) means, and what the other elements of this snippet do.



How does it work?

shell-script awk scripting sort uniq

asked Oct 6 '14 at 20:56 by Alexander Shcheblikin

  • title is misleading, it should be $0 (Zero), not $o (o).

    – Archemar
    Oct 6 '14 at 21:06

  • As it's a hash, it's unordered, so "in the order of arrival" isn't actually correct.

    – Kevin
    Oct 7 '14 at 6:10

2 Answers

Let's see:

 !a[$0]++

First,

 a[$0]

looks up the value of a[$0] (array a, with the whole input line $0 as the key).

If it does not exist yet, its value is empty (zero), which counts as false, so the negation (! is the logical "not" operator) makes the test true:

 !a[$0]

and awk prints the input line $0 (the default action when a pattern has no action).

Also, ++ adds one to a[$0], so the next time the same line is read, !a[$0] evaluates to false.
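The steps above can be written out long-hand. This is a sketch of equivalent logic, not a description of how awk implements the one-liner internally; the names seen and old are arbitrary choices for illustration. Note that the test is made on the value a[$0] had before ++, and the increment is stored before the line is printed:

```shell
# Long-hand version of awk '!a[$0]++' (the names "seen" and "old" are illustrative)
printf 'q\nw\ne\nw\nr\n' | awk '
{
    old = seen[$0] + 0    # value before ++ (0 if this line was never seen)
    seen[$0] = old + 1    # the side effect of ++
    if (!old)             # ! turns 0 into true, any count >= 1 into false
        print             # the default action: print $0
}'
```

It prints q, w, e and r, the same lines as awk '!a[$0]++' does for the example input in the question.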



Nice find! You should have a look at code golf!

answered Oct 6 '14 at 21:03 by Archemar, edited Oct 8 '14 at 8:01 by Gilles

  • So the essence is this: the expression in the single quotes is used by awk as a test for each input line; every time the test succeeds, awk executes the action in curly braces, which, when omitted, is print. Thanks!

    – Alexander Shcheblikin
    Oct 7 '14 at 1:21

  • @Archemar: This answer is wrong, see mine.

    – cuonglm
    Oct 7 '14 at 3:29

  • @AlexanderShcheblikin In awk, the default action is print $0. This means that any pattern that evaluates to true executes it by default. So, for example, awk '1' file prints all the lines, awk '$1' file prints all the lines whose first field is not empty or 0, etc.

    – fedorqui
    Oct 7 '14 at 10:31

  • @Gnouc I don't see any serious error in this answer. If that is what you're referring to, the incrementation is indeed applied after the value of the expression is calculated. It's true that the incrementation happens before the printing, but that's a minor imprecision which doesn't affect the basic explanation.

    – Gilles
    Oct 7 '14 at 17:23

  • @AlexanderShcheblikin: See my updated answer.

    – cuonglm
    Oct 8 '14 at 2:24

Here is the processing:



  • a[$0]: look up the value of key $0 in the associative array a. If the element does not exist, referencing it creates it (with an empty/zero value).

  • a[$0]++: increment a[$0] and return its old value as the value of the expression. If a[$0] did not exist, the expression returns 0 and a[$0] becomes 1 (the ++ operator always yields a numeric value).

  • !a[$0]++: negate the value of the expression. If a[$0]++ returns 0, the whole expression evaluates to true, which makes awk perform the default action, print $0. Otherwise, the whole expression evaluates to false and awk does nothing for that line.
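The three steps above can be observed directly in a BEGIN block, so no input is needed. This is a minimal sketch; the array a, the key "k" and the variables probe, old and r are arbitrary names chosen for illustration:

```shell
# Sketch: the array a, key "k" and variables probe/old/r are illustrative names
awk 'BEGIN {
    probe = a["k"]      # merely referencing a["k"] creates the element
    print ("k" in a)    # prints 1: the key now exists

    old = a["k"]++      # post-increment returns the old value (0) ...
    print old, a["k"]   # ... and stores old + 1: prints "0 1"

    r = !a["k"]++       # a["k"]++ yields 1, so !1 is 0; a["k"] becomes 2
    print r, a["k"]     # prints "0 2"
}'
```

The last pair is exactly what happens when a line is read for the second time: the test is false, so nothing is printed, but the counter still goes up.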


References:



  • Expression in awk

  • gawk - Increment and Decrement Operators

With gawk, we can use dgawk (or gawk --debug in newer versions) to debug a gawk script. First, create a script named test.awk:



BEGIN {
    a = 0;
    !a++;
}



Then run:



dgawk -f test.awk


or:



gawk --debug -f test.awk


In the debugger console:



$ dgawk -f test.awk
dgawk> trace on
dgawk> watch a
Watchpoint 1: a
dgawk> run
Starting program:
[ 1:0x7fe59154cfe0] Op_rule : [in_rule = BEGIN] [source_file = test.awk]
[ 2:0x7fe59154bf80] Op_push_i : 0 [PERM|NUMCUR|NUMBER]
[ 2:0x7fe59154bf20] Op_store_var : a [do_reference = FALSE]
[ 3:0x7fe59154bf60] Op_push_lhs : a [do_reference = TRUE]
Stopping in BEGIN ...
Watchpoint 1: a
Old value: untyped variable
New value: 0
main() at `test.awk':3
3 !a++;
dgawk> step
[ 3:0x7fe59154bfc0] Op_postincrement :
[ 3:0x7fe59154bf40] Op_not :
Watchpoint 1: a
Old value: 0
New value: 1
main() at `test.awk':3
3 !a++;
dgawk>


As you can see, Op_postincrement is executed before Op_not.



You can also use si (stepi) instead of s (step) to execute one instruction at a time and see this more clearly:



dgawk> si
[ 3:0x7ff061ac1fc0] Op_postincrement :
3 !a++;
dgawk> si
[ 3:0x7ff061ac1f40] Op_not :
Watchpoint 1: a
Old value: 0
New value: 1
main() at `test.awk':3
3 !a++;
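Without the debugger, a one-line check shows both halves of the story. This is a sketch using plain awk syntax, and r is an arbitrary variable name: ! is applied to the value a++ returns, which is the old value 0, yet a is already 1 when the result is printed. Precedence decides that ! applies to a++ as a whole; the value a++ hands to ! is the old one.

```shell
# Sketch: r is an illustrative variable name
awk 'BEGIN {
    a = 0
    r = !a++       # parsed as !(a++): a++ yields the old value 0, so r = !0 = 1
    print r, a     # prints "1 1": a was incremented even though ! saw the old value
}'
```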

answered Oct 7 '14 at 2:02 by cuonglm, edited Jun 8 '15 at 7:47

  • @Archemar: Your answer indicates that ! is applied before ++.

    – cuonglm
    Oct 7 '14 at 5:37

  • This answer is wrong. The increment happens after the result of the ! operator is calculated. You are confusing operator precedence (!a[$0]++ is parsed like !(a[$0]++)) with order of evaluation (the assignment of the new value of a[$0] happens after the value of the expression has been calculated).

    – Gilles
    Oct 7 '14 at 17:21

  • @Gnouc It says so right in the passage you quoted, and if it worked the way you described, this code wouldn't have the desired effect. First the value !x is calculated, where x is the old value of a[$0]. Then a[$0] is set to 1+x.

    – Gilles
    Oct 7 '14 at 17:26

  • I believe that your analysis of what awk does is correct. Sorry if I implied otherwise yesterday. However, your critique of Archemar's answer is wrong. Archemar does not misunderstand precedence; you do: you're confusing precedence with order of evaluation (see my previous comment). If you remove any mention of Archemar's answer from yours, your answer should be correct. As it is, it is focused on proving Archemar wrong, and he is not.

    – Gilles
    Oct 8 '14 at 7:59

  • Well, at least now I know about awk's debugger...

    – Archemar
    Oct 8 '14 at 8:51
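Gilles's point about precedence can be checked directly: if ++ binds more tightly than !, writing the parentheses out explicitly should change nothing. A small sketch, assuming a POSIX sh and awk; sample is a hypothetical helper that recreates the file f from the question:

```shell
#!/bin/sh
# sample is a hypothetical helper that reproduces the input file f from the question.
sample() { printf 'q\nw\ne\nw\nr\n'; }

# ++ binds more tightly than !, so these two programs are parsed identically.
implicit=$(sample | awk '!a[$0]++')
explicit=$(sample | awk '!(a[$0]++)')

# Grouping decides WHAT ! applies to (the result of a[$0]++, i.e. the old value);
# it does not decide WHEN the new value is stored into a[$0].
echo "$implicit"
if [ "$implicit" = "$explicit" ]; then
    echo "same output"
fi
```

Both programs print q, w, e and r, which is consistent with !a[$0]++ being parsed as !(a[$0]++).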




























