awk paragraph does not work

I have downloaded the KingBase Lite 2018 Update 3 file from here. I now want to extract data from a single event such as the "FIDE Candidates 2018": I want to get all the paragraphs containing this text and the paragraph below it, so I have the whole pgn for each game.

To first just get the paragraph that contains the text, I followed these recommendations.

However, when I try awk -v RS='' -v ORS='\n\n' '/FIDE Candidates 2018/' KingBaseLite2018-03.pgn, it just prints the whole file. When I search for a word that does not exist, it prints nothing. So I assume the search itself works, but the file is somehow not being split at the blank lines. There might be something odd about the newline characters in that file. When I try other suggestions from the above link, such as using perl, I get the same result.

What can I do to get the paragraph now? And how can I include one paragraph below as well?

edited Jun 6 at 12:34

asked Jun 6 at 10:07

maddingl

Might be due to your use of RS? Anyhow, you should post a sample input and desired output to help those who want to help you reproduce your issue and test the solutions.
– simlev
Jun 6 at 10:29

The problem seems to be that the file has Windows-style CRLF line-endings
– steeldriver
Jun 6 at 11:35
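
A quick way to check this diagnosis is to count the lines that end in a carriage return. The sketch below is an illustration only: it builds a tiny hypothetical sample (not the real KingBase file) with CRLF endings, and the names sample, cr, crlf_lines and records are made up for the example.

```shell
# Hypothetical sample: one header paragraph and one move paragraph, CRLF endings.
sample=$(mktemp)
printf '[Event "FIDE Candidates 2018"]\r\n\r\n1. e4 e5 1/2-1/2\r\n' > "$sample"

# Hold a literal carriage return in a variable (works in any POSIX shell).
cr=$(printf '\r')

# Count lines whose last character is CR; a non-zero count suggests CRLF endings.
crlf_lines=$(grep -c "$cr\$" "$sample")
echo "lines ending in CR: $crlf_lines"

# Count the records awk sees in paragraph mode. The "blank" separator line
# still contains a CR, so awk does not treat it as empty.
records=$(awk -v RS='' 'END { print NR }' "$sample")
echo "awk paragraph-mode records: $records"
```

If awk reports a single record for a file that visibly contains several paragraphs, a pattern that matches anywhere prints everything, which is consistent with the behaviour described in the question.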

... although it's not entirely equivalent, probably setting RS='\r\n\r\n' is sufficient in this case
– steeldriver
Jun 6 at 11:54
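
One portable variation on this idea, sketched here under assumptions, is to delete the carriage returns in a pipeline and keep awk's ordinary paragraph mode, since a multi-character RS is treated as a regular expression by gawk and mawk but is not guaranteed by POSIX. The sample data and the variable names sample and matched are hypothetical; the real file is not modified by this approach.

```shell
# Hypothetical three-paragraph sample with CRLF endings.
sample=$(mktemp)
printf '[Event "Other"]\r\n\r\n1. d4 d5 1-0\r\n\r\n[Event "FIDE Candidates 2018"]\r\n[White "B"]\r\n' > "$sample"

# Delete every CR on the fly (the file on disk is untouched), then let awk
# split on blank lines and print only the paragraphs that mention the event.
matched=$(tr -d '\r' < "$sample" |
  awk -v RS='' -v ORS='\n\n' '/FIDE Candidates 2018/')
printf '%s\n' "$matched"
```

Note that tr removes every carriage return, not only those at line ends, which should be harmless for a text format such as PGN.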

1 Answer

accepted

I downloaded and unzipped the file, and its line endings are CRLF, so you need to account for that. You can either convert the file with a tool like fromdos or, if you don't want to modify it, tell Perl to do the translation with its :crlf PerlIO layer, which is what I'm doing below via the PERLIO environment variable. (There are other ways to change the layers, but this one was easiest for a one-liner.)

I'm using the three-dot flip-flop operator ... to print only the paragraph that matches the regex, plus everything up to and including the next paragraph that matches /^1\./ (since every paragraph in the file starts with either [ or 1.).

wget http://kingbase-chess.net/download/650 -O KingBaseLite2018-03.zip
unzip KingBaseLite2018-03.zip
PERLIO=:crlf perl -00ne 'print if /"FIDE Candidates 2018"/.../^1\./' KingBaseLite2018-03.pgn
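
For comparison, awk's range pattern can express roughly the same header-plus-moves selection. This is a sketch under assumptions, not part of the original answer: the sample data and the names sample and games are hypothetical, and the carriage returns are stripped with tr rather than with a PerlIO layer.

```shell
# Hypothetical sample: three games, each a header paragraph plus a move paragraph.
sample=$(mktemp)
printf '[Event "Other"]\r\n\r\n1. d4 d5 1-0\r\n\r\n[Event "FIDE Candidates 2018"]\r\n\r\n1. e4 e5 1/2-1/2\r\n\r\n[Event "Other"]\r\n\r\n1. c4 c5 0-1\r\n' > "$sample"

# The range starts at a paragraph naming the event and ends at the next
# paragraph that begins with "1.", so both paragraphs of the game are printed.
games=$(tr -d '\r' < "$sample" |
  awk -v RS='' -v ORS='\n\n' '/"FIDE Candidates 2018"/, /^1\./')
printf '%s\n' "$games"
```

Unlike Perl's three-dot form, an awk range tests the end pattern on the same record that started the range. That makes no difference here, because a header paragraph begins with [ and so cannot match /^1\./.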

edited Jun 6 at 11:45

answered Jun 6 at 11:38

haukex

This works perfectly! Thank you very much :)
– maddingl
Jun 6 at 12:33
