awk paragraph does not work
I have downloaded the KingBase Lite 2018 Update 3 file from here. I now want to extract data from a single event such as the "FIDE Candidates 2018": I want to get all the paragraphs containing this text and the paragraph below it, so I have the whole pgn for each game.
To first just get the paragraph that contains the text, I followed these recommendations.
However, when I try awk -v RS='' -v ORS='\n\n' '/FIDE Candidates 2018/' KingBaseLite2018-03.pgn, it just prints the whole file. When I search for a word that does not exist, it prints nothing. So I assume the search itself works, but the input is somehow not being split at blank lines. There might be something unusual about the newline characters in that file. When I try other suggestions from the above link, such as using perl, I get the same result.
What can I do to get the paragraph now? And how can I include one paragraph below as well?
awk sed grep perl
Might be due to your use of RS? Anyhow, you should post a sample input and desired output to help those who want to help you reproduce your issue and test the solutions.
– simlev
Jun 6 at 10:29
The problem seems to be that the file has Windows-style CRLF line endings.
– steeldriver
Jun 6 at 11:35
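One way to check the CRLF diagnosis is to count the lines that end in a carriage return. The sketch below uses a tiny hypothetical sample.pgn (not the real KingBase file) built with printf. With CRLF endings, the lines that look blank actually contain a single CR, so awk's paragraph mode (RS='') never sees an empty line and treats the whole file as one record.

```shell
# Hypothetical sample standing in for the real PGN file:
# one tag paragraph and one move paragraph, with CRLF line endings.
printf '[Event "A"]\r\n\r\n1. e4 e5\r\n\r\n' > sample.pgn

# Count the lines that end in a carriage return.
# A non-zero count suggests the file uses CRLF line endings.
crlf_lines=$(grep -c "$(printf '\r')\$" sample.pgn)
echo "$crlf_lines"
```

On the real file, replacing sample.pgn with KingBaseLite2018-03.pgn should give a count close to the total number of lines if every line ends in CRLF.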
... although it's not entirely equivalent, probably setting RS='\r\n\r\n' is sufficient in this case.
– steeldriver
Jun 6 at 11:54
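As an alternative to changing RS (a multi-character RS is not guaranteed to work in every awk), the carriage returns can be stripped with tr before awk sees the data, so the original paragraph mode works unchanged. This is a minimal sketch on a hypothetical sample.pgn, not a tested recipe for the full file; it also prints the paragraph that follows each match by calling getline, which assumes every tag paragraph is immediately followed by its move paragraph.

```shell
# Hypothetical sample standing in for KingBaseLite2018-03.pgn: two games,
# each a tag paragraph plus a move paragraph, with CRLF line endings.
printf '[Event "FIDE Candidates 2018"]\r\n[White "A"]\r\n\r\n1. e4 e5 1-0\r\n\r\n[Event "Other Open"]\r\n[White "B"]\r\n\r\n1. d4 d5 0-1\r\n\r\n' > sample.pgn

# Strip carriage returns, then read the input in paragraph mode (RS='').
# On a match, print the record, then read and print the next record too.
result=$(tr -d '\r' < sample.pgn |
  awk -v RS='' -v ORS='\n\n' '/FIDE Candidates 2018/ { print; if ((getline) > 0) print }')
printf '%s\n' "$result"
```

This leaves the file on disk untouched; only the stream passed to awk has its line endings converted.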
edited Jun 6 at 12:34
asked Jun 6 at 10:07
maddingl
1 Answer
accepted
I downloaded and unzipped the file, and the line endings are CRLF, so you need to account for that, either by converting the file with a tool like fromdos, or, if you don't want to modify the file, by telling Perl to do the translation with its :crlf PerlIO layer, which is what I'm doing below with the PERLIO environment variable. (There are other ways to change the layers, but this one was easiest for a one-liner.)
I'm using the flip-flop operator ... to extract only the paragraph that matches the regex plus the following one that matches /^1\./ (since all the paragraphs in the file start with either [ or 1.).
wget http://kingbase-chess.net/download/650 -O KingBaseLite2018-03.zip
unzip KingBaseLite2018-03.zip
PERLIO=:crlf perl -00ne 'print if /"FIDE Candidates 2018"/.../^1\./' KingBaseLite2018-03.pgn
This works perfectly! Thank you very much :)
– maddingl
Jun 6 at 12:33
edited Jun 6 at 11:45
answered Jun 6 at 11:38
haukex