Having trouble parsing a csv file from the commandline

I have a CSV file that I've been working on for most of the day and I'm not having any luck getting it parsed properly using a Regex with awk.



awk is not processing the Regex as expected.



Here are the inputs:



  • GNU Awk 4.1.4, API: 1.1 (GNU MPFR 3.1.5-p2, GNU MP 6.1.2)

  • Regex: /([(.*?)])|[^,]+/g

  • Sample text: hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31

  • Raw text (before stripping out double-quotes, which I do at a step not listed explicitly in this question):
    hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31

When I run this through Regexr.com, it shows the proper matches:
Screencap from Regexr



I pipe from cat -> sed -> awk (the sample text above is what comes out of sed) and run the following command (I only want the first 9 fields, which includes the entirety of the first field enclosed in [ ], but not anything after that):



awk '/([(.*?)])|[^,]+/g{print $1,$2,$3,$4,$5,$6,$7,$8,$9}'


What I'm expecting to see as an output:
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]



NOTE: The important part about this is treating the field with the roles (between the brackets) as a single field, or at least including all of the roles in the output, but none of the recipes.



What I'm actually getting out is the full line that was fed in.



From playing around with the variables, I found the following field assignments coming out of awk (a minimal reproduction follows the list):



  • $1 = hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1,

  • $2 = role2,

  • $3 = role3],[recipe1,

  • $4 = recipe2,

  • $5 = recipe3],2019-01-10

  • $6 = 06:06:31
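
For reference, those field assignments can be reproduced straight from the sample line with a bare print (a minimal sketch of what I observed, not my real pipeline):

$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3],[recipe1, recipe2, recipe3],2019-01-10 06:06:31' |
  awk '{ print $2; print $6 }'
role2,
06:06:31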

I've tried using the accepted answer from this Stack Overflow question outright, and I've tried tweaking it to use the brackets as the delimiters instead of ", and that gets me closer, but it's still not treating the role field as a single field.










  • What do you intend for the g after the regular expression to mean? This would be a syntax error in an extended regular expression (which awk uses).

    – Kusalananda
    Jan 11 at 20:31











  • Actually, what are you trying to do with the regex generally? Which lines do you want to print?

    – terdon
    Jan 11 at 20:32











  • I only included the g because Regexr did; in the current state, it doesn't change the output at all. @terdon - I want it to print all of the fields up to (and including) the field that contains the 3 roles, but nothing after that

    – Taegost
    Jan 11 at 20:34











  • @Taegost OK, but what is the regex doing? Your print statement already does that.

    – terdon
    Jan 11 at 20:38















2 Answers
By default, awk will use whitespace to define fields, which explains why you're getting the output you see. Since you want to use a comma to separate the fields, you need to say so with -F:



awk -F, '...' 


To have awk print comma separated output, you need to set the OFS variable:



awk -F, -vOFS=, '...' 
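
For instance, on a toy line (not the question's data), the two settings combine like this:

$ echo 'one,two three,four' | awk -F, -vOFS=, '{ print $1, $2 }'
one,two three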


The real difficulty here is that you are then trying to treat [role1, role2, role3] as a single field, but that's 3 fields. There are commas there, so that will be split into [role1, role2 and role3]. If you know there will always be exactly 3 fields there, it's easy:



$ awk -F, -vOFS=, '{print $1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]



However, based on the raw data you have now added, while a proper CSV parser will always be the better approach, you can still do it in awk. Just run this on your original input data:



$ awk -F']' -vOFS=, '{gsub(/"/,""); print $1"]"}' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]


The trick is to use ] as the field separator and tell awk to only print the 1st field. This will print everything up to the first ]. We then add back the ] (since that is removed when the fields are built). The gsub removes all the quotes.
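
As an aside, GNU awk (4.0 and later, so your 4.1.4 qualifies) also offers the FPAT variable, which describes what a field looks like rather than what separates fields. A minimal sketch along those lines, assuming the de-quoted sample line is in a file named file (a placeholder name): each field is either a complete bracketed group or a run of non-comma characters, so the role list survives as a single field ($9).

$ awk 'BEGIN { FPAT = "(\\[[^]]*\\])|([^,]+)"; OFS = "," } { print $1,$2,$3,$4,$5,$6,$7,$8,$9 }' file
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]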






  • The field with the roles and the one with the recipes are each a single data field, which is the problem. I stripped out the quotation marks that were in it initially because I couldn't get any Regex pattern to treat it as a single field, but still separate the rest on a comma. I'll update the question with the raw data, that may help. It'll take me a few minutes

    – Taegost
    Jan 11 at 20:36











  • @Taegost they might be for csv, but they aren't for awk.

    – terdon
    Jan 11 at 20:39











  • Correct, which is the entire point of the regex: To separate the fields by a comma, except for the ones that include multiple entries (the role and recipe fields), where everything within the brackets needs to be treated as a single field

    – Taegost
    Jan 11 at 20:42











  • If you're using the same character to delimit data and also as part of the data, you're going to have a bad time.

    – DopeGhoti
    Jan 11 at 20:52











  • @Taegost given your input data, you can use the fact that you only want the text until the first ]. But a dedicated CSV parser is a safer approach.

    – terdon
    Jan 11 at 21:24


















If you're dealing with a complex CSV file - in particular, one whose fields may contain quoted delimiters (in this case, commas) - then a proper CSV parser is going to save a lot of headaches, e.g. with csvtool



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 -
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]"


or (to remove the quotes)



$ echo 'hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,"[""role1"", ""role2"", ""role3""]","[""recipe1"", ""recipe2"", ""recipe3""]",2019-01-10 06:06:31' | 
csvtool col 1-9 - | tr -d '"'
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]



If you can't obtain a standalone CSV parser such as csvtool, then both Perl and Python have CSV modules e.g.



perl -MText::CSV -lpe '
BEGIN{ $p = Text::CSV->new() }
$_ = join ",", map { $_ = s/"//gr } ($p->fields())[0..8] if $p->parse($_)
'
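
Ruby, which the comments below mention is the interpreter actually available here, bundles a csv library in its standard distribution, so a rough sketch along the same lines (file.csv is a placeholder name) might be:

$ ruby -rcsv -ne 'puts CSV.parse_line($_)[0..8].map { |f| f.delete("\"") }.join(",")' file.csv
hostname,hostname.domain.com,hostname.domain.com,windows,6.2.9200,1.2.3,location,environment,[role1, role2, role3]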





  • This is an excellent answer and is definitely the right way to go. Unfortunately installing 3rd party tools is locked way down and I don't have Perl or Python (But I do have Ruby, didn't think to try that). Thank you for taking the time to write this up, in the future this will definitely be my preferred approach!

    – Taegost
    Jan 11 at 21:41











  • @Taegost if you figure out how to do it in Ruby, please add that as an answer!

    – steeldriver
    Jan 11 at 21:45









