Finding incorrect YAML headers
I am trying to identify which files in my project have incorrect headers. The files all start like this:
---
header:
.
.
.
title:
some header:
.
.
.
more headers:
level:
.
.
.
---
Where . . . only represents more headers. The headers contain no indentation. Using the following expression I have been able to extract the YAML header from every file:
grep -Przo --include=*.md "^---(.|\n)*?---" .
Now I want to list the incorrect YAML headers.
- Every YAML header must have a title: some text
- Every YAML header must have language: [a-z]{2}
- It must contain either external: .* or author: .*
- The placement of title:, level:, external: and language: varies.
I tried to do something like
grep -L --include=*.md -e "external: .*" -e "author: .*" .
However, the problem with this is that it searches the entire file, not just the YAML header. So I guess solving the issues above boils down to how I can feed the YAML header result from my previous search into grep again. I tried
grep -Przo --include=*.md "^---(.|\n)*?---" . | xargs -0 grep "title:";
However, this gave me an error "No such file or directory", so I am a bit uncertain how to proceed.
Examples:
---
title: Rull-en-ball
level: 1
author: Transkribert og oversatt fra [Unity3D](http://unity3d.com)
translator: Bjørn Fjukstad
license: Oversatt fra [unity3d.com](https://unity3d.com/learn/tutorials/projects/roll-ball-tutorial)
language: nb
---
Correct YAML, has an author, language and title.
---
title: Mini Golf
level: 2
language: en
external: http://appinventor.mit.edu/explore/ai2/minigolf.html
---
Correct YAML, has a title, language, and external instead of author.
---
title: 'Stjerner og galakser'
level: 2
logo: ../../assets/img/ccuk_logo.png
license: '[Code Club World Limited Terms of Service](https://github.com/CodeClub/scratch-curriculum/blob/master/LICENSE.md)'
translator: 'Ole Andreas Ramsdal'
language: nb
---
Incorrect YAML header, missing author.
grep yaml
Could you replace the . . . with actual data, including "correct" headers as well as "incorrect" headers, so that we know when a solution is working as intended?
– Jeff Schaller
Jul 20 at 13:35
Also, the yaml I've seen (for Ansible) has indentation; does yours?
– Jeff Schaller
Jul 20 at 13:36
@JeffSchaller, no indentation. I will update my question accordingly.
– Øistein Søvik
Jul 20 at 13:40
edited Jul 20 at 13:47
asked Jul 19 at 18:03
Øistein Søvik
1 Answer
accepted
Here's one way to do it. I assume you have bash (to loop recursively through the files), sed, and awk. Instead of using bash, you could alternatively use find with -exec to search for the files.
The general flow is:
- ask bash for the list of *.md files, recursively
- pass each file to sed to extract the YAML header
- pass that YAML header to awk for validation
- if the header fails validation, print the filename
The script:
#!/bin/bash
shopt -s globstar
for file in **/*.md
do
  # use sed for the header
  sed -n '/^---$/,/^---$/p' "$file" |
  awk '
    BEGIN {
      good_title=0
      good_lang=0
      good_extaut=0
    }
    /^title: .*/ { good_title=1 }
    /^language: [a-z][a-z]$/ { good_lang=1 }
    /^author: .*/ { good_extaut=1 }
    /^external: .*/ { good_extaut=1 }
    END {
      if (good_title && good_lang && good_extaut) {
        exit 0
      } else {
        exit 1
      }
    }
  ' || printf "Incorrect header found in %s\n" "$file"
done
You can easily adjust the regex matching patterns in the awk script to be stricter or looser, depending on your exact requirements (perhaps you want alphanumeric characters instead of "any", as the current . in your example has).
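As one illustration of a stricter pattern, the sketch below rejects a title that consists only of spaces. The helper name title_ok is made up for this example, and whether an empty title should count as incorrect is an assumption about the requirements.

```shell
#!/bin/bash
# title_ok (hypothetical helper) reads header lines on standard input and
# succeeds only if a "title:" line has a non-space character after the colon.
title_ok() {
  awk '/^title: *[^ ]/ { found = 1 } END { exit !found }'
}

# A title with text passes the check.
printf 'title: Mini Golf\n' | title_ok && echo "accepted"

# A title followed only by a space does not.
printf 'title: \n' | title_ok || echo "rejected"
```

The same idea carries over to the other fields: tighten the regex on the left and leave the flag assignment on the right unchanged.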
The sed statement extracts the YAML header by:
- suppressing default-printing (-n)
- asking for a range of lines whose start and end addresses both match the pattern: beginning of line, ---, end of line; the second pattern must occur after the first pattern.
- that range of addresses is then printed (p)
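To see the range address in action, here is a small sketch run against a throwaway file (the file contents and variable names are invented for illustration). It also shows a caveat: sed opens a new range at any later line that is exactly ---, such as a Markdown horizontal rule, so body text can end up in the extracted header. The awk one-liner is one possible stricter alternative, not part of the script above; it stops reading at the second delimiter.

```shell
#!/bin/bash
# Throwaway sample file; its contents are invented for this demonstration.
sample=$(mktemp)
printf '%s\n' '---' 'title: Mini Golf' 'language: en' '---' \
  'Body text' '---' 'author: only in the body' > "$sample"

# The range runs from a line that is exactly "---" to the next such line.
# After the range closes, a later "---" opens a new range, so the last two
# lines of the sample are printed as well.
sed_header=$(sed -n '/^---$/,/^---$/p' "$sample")

# Alternative sketch: count delimiter lines, print only what lies between
# the first and the second one, and stop reading at the second.
awk_header=$(awk '/^---$/ { if (++n == 2) exit; next } n == 1' "$sample")

printf 'sed range:\n%s\n\n' "$sed_header"
printf 'awk, first block only:\n%s\n' "$awk_header"

rm -f "$sample"
```

With this sample, the sed output includes the author line from the body, which would let a header without an author pass validation; the awk variant prints only the title and language lines.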
The awk script is a little over-built, but I wanted to spell it out for clarity. Each time awk is called, it sets three flag variables to zero or false. If we see lines that match our criteria, we set the corresponding flag to one/true. Once all the lines have been seen, we return success or failure based on the status of those flags -- they must all be true in order to "pass" validation.
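Because awk treats unset variables as zero, the flag logic can also be written more compactly. The following is a minimal sketch of the same validation as a shell function; validate_header is a hypothetical name, and the sample headers are shortened from the ones in the question.

```shell
#!/bin/bash
# validate_header (hypothetical name) reads a YAML header on standard input
# and succeeds only if all three requirements are met.
validate_header() {
  awk '
    /^title: /               { title = 1 }   # a title line was seen
    /^language: [a-z][a-z]$/ { lang = 1 }    # two-letter language code
    /^(author|external): /   { credit = 1 }  # either field satisfies the rule
    END { exit !(title && lang && credit) }  # status 0 only if all flags set
  '
}

# Header with title, language and external: passes.
printf '%s\n' '---' 'title: Mini Golf' 'level: 2' 'language: en' \
  'external: http://appinventor.mit.edu/explore/ai2/minigolf.html' '---' |
  validate_header && echo "first header: ok"

# Header with neither author nor external: fails.
printf '%s\n' '---' 'title: Stjerner og galakser' 'level: 2' 'language: nb' '---' |
  validate_header || echo "second header: incorrect"
```

Returning the result as an exit status keeps the function usable with && and ||, the same way the script above chains awk and printf.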
With these appropriately-named sample files scattered into the current directory and a subdirectory:
$ tree .
.
├── bad1.md
├── good1.md
├── good2.md
└── subdir
    ├── bad1.md
    └── good1.md

1 directory, 5 files
... the script outputs:
Incorrect header found in bad1.md
Incorrect header found in subdir/bad1.md
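The find alternative mentioned at the start of this answer might look like the sketch below. It is not a drop-in copy of the script above: it folds the sed step into awk so that a single awk process can handle many files, and it only inspects the first block delimited by --- lines in each file. The function name check_headers and the sample files are made up for the example.

```shell
#!/bin/bash
# check_headers (hypothetical name) prints one line per *.md file below the
# directory "$1" whose first block between --- lines fails validation.
check_headers() {
  find "$1" -type f -name '*.md' -exec awk '
    # Print a message for the file just finished, if it failed.
    function report() {
      if (file != "" && !(title && lang && credit))
        print "Incorrect header found in " file
    }
    # First line of each file: report on the previous file, reset the flags.
    FNR == 1 { report(); file = FILENAME; title = lang = credit = delims = 0 }
    /^---$/  { delims++; next }
    delims == 1 && /^title: /               { title = 1 }
    delims == 1 && /^language: [a-z][a-z]$/ { lang = 1 }
    delims == 1 && /^(author|external): /   { credit = 1 }
    END { report() }
  ' {} +
}

# Demonstration on throwaway files; names and contents are invented.
demo=$(mktemp -d)
mkdir "$demo/subdir"
printf '%s\n' '---' 'title: Mini Golf' 'language: en' \
  'external: http://appinventor.mit.edu/explore/ai2/minigolf.html' '---' \
  > "$demo/good1.md"
printf '%s\n' '---' 'title: Stjerner og galakser' 'language: nb' '---' \
  'author: only in the body' > "$demo/subdir/bad1.md"
check_headers "$demo"
rm -rf "$demo"
```

In the demonstration only subdir/bad1.md is reported, because its author line sits below the closing --- and is therefore ignored. One limitation of this sketch is that completely empty files are never reported, since no line triggers the per-file reset.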
answered Jul 20 at 21:28
Jeff Schaller