How could I easily expand a list of numbers with hyphens replacing repeated parts?

ATTENTION! I have changed the RegEx and sample data so some answers could be wrong! I apologize if doing this is bad practice.

I used grep (an online tool, with the -o flag) to extract a list of data where repeated parts are sometimes substituted with hyphens. The numbers are always 8 digits. There may be more 8-digit numbers following these.
The RegEx used was: [0-9]{8}(, -[0-9]*)*(, [0-9]{8})*
Sample data below:

33520470
33520850, -60, -70, -80, -90, 33630077
25453810
13815206, -07, -08, 60682651, 60709994
13340820
61040146, -55
60819060, -79
60819088

And my desired output would be:

Could this be done with grep? If not, could you suggest any unix or other tools to achieve this result? I was thinking sed or awk.

EDIT: This has been solved. I will include the correct command here just for convenience to skip having to dig through the comments:

awk -F ', ' '{ print $1; for (a = 2; a <= NF; a++) if (length($a) <= 7) printf("%s%s\n", substr($1, 1, length($1) - (length($a) - 1)), substr($a, 2)); else print $a }'
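One caveat with the length test: a comment below says a suffix may carry up to 7 digits, in which case the field (hyphen included) is 8 characters long and `length($a) <= 7` would print it unchanged. Here is a minimal sketch of a variant that checks for the leading hyphen instead. The function name `expand_hyphens` is made up for illustration, and the sketch assumes fields are separated by a comma and a single space, as in the sample data.

```shell
# Hypothetical helper: expand hyphen-abbreviated numbers read from stdin.
# Assumes the first field on each line is always a full number.
expand_hyphens() {
    awk -F ', ' '{
        print $1
        for (a = 2; a <= NF; a++) {
            # Fields starting with "-" are suffixes; anything else is a full number.
            if ($a ~ /^-/)
                print substr($1, 1, length($1) - (length($a) - 1)) substr($a, 2)
            else
                print $a
        }
    }'
}

printf '%s\n' '13815206, -07, -08, 60682651' '12345678, -7654321' | expand_hyphens
```

With this input the sketch should print 13815206, 13815207, 13815208, 60682651, 12345678 and 17654321, one per line. The last value is the 7-digit suffix case that the length test would miss.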

edited Feb 18 at 14:13

asked Feb 4 at 6:59

Sigmund Freud

You can't use grep for that. awk will work, but you will need a small program to do it.

– RalfFriedl
Feb 4 at 7:06

Are the hyphened parts always exactly two digits?

– Sparhawk
Feb 4 at 7:19

There may be up to 7; the hyphen substitutes for the repeating parts of the numbers. It would be safer to cover all possible occurrences.

– Sigmund Freud
Feb 4 at 7:28

I am not sure about the version, I am using the website online-utility.org/text/grep.jsp

– Sigmund Freud
Feb 4 at 7:40

This can still be done with grep, but with the new conditions it is definitely better handled by awk and sed.

– WAF
Feb 4 at 9:50

awk sed grep


2 Answers

I tried it with awk:

cat file | awk -F ', ' '{ print $1; for (a = 2; a <= NF; a++) printf("%s%s\n", substr($1, 1, length($1) - (length($a) - 1)), substr($a, 2)) }'

Output:

Edit:

Code to get correct result:

cat file | awk -F ', ' '{ print $1; for (a = 2; a <= NF; a++) if (length($a) <= 3) printf("%s%s\n", substr($1, 1, length($1) - (length($a) - 1)), substr($a, 2)); else print $a }'

Result:

edited Feb 4 at 11:29

answered Feb 4 at 7:39

Matej


This works well. Does awk support RegEx? Could it be tweaked further to include what grep does in the first example? Currently I have to use grep and then your solution on grep's output.

– Sigmund Freud
Feb 4 at 7:46

@SigmundFreud Yes, awk supports regex: tutorialspoint.com/awk/awk_regular_expressions.htm but I think that using multiple commands to get the result is not a bad solution.

– Matej
Feb 4 at 7:49

I discovered that I had an error in my RegEx. The new one is [0-9]{8}(, -[0-9]*)*(, [0-9]{8})* and returns rows with more 8-digit numbers such as 13815206, -07, -08, 60682651, 60709994 which does not work with that command. The solution worked perfectly with the previously provided data set, however!

– Sigmund Freud
Feb 4 at 8:58

@SigmundFreud The difference is in these numbers? -07 -08

– Matej
Feb 4 at 9:03

You did it! It works perfectly now.

– Sigmund Freud
Feb 4 at 11:42


Updated with a pre-processing step to handle the modified input.

The rest of this answer assumes that the data has been pre-processed with

grep -oE '[0-9]{8}(, -[0-9]+)*'

I.e., the full solution would require

grep -oE ... file | awk ...

BEGIN { FS = ", *" }

{
    print $1
    for (i = 2; i <= NF; ++i)
        print substr($1, 1, length($1) - length($i) + 1) substr($i, 2)
}

For each line, this awk script prints the first comma-delimited field. It then loops over the remaining fields and, for each one, prints the first field with its trailing characters replaced by the characters that follow the - in that field.

The code allows for "suffixes" of variable length.
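To make the arithmetic concrete, here is a small sketch that prints the intermediate pieces for a single pair of fields. The names `show_pieces`, `first`, `field` and `keep` are illustrative only and do not appear in the script above.

```shell
# Hypothetical demo: show how one suffix field is merged into the first field.
show_pieces() {
    awk -v first="$1" -v field="$2" 'BEGIN {
        keep = length(first) - length(field) + 1   # characters of first to keep
        prefix = substr(first, 1, keep)
        suffix = substr(field, 2)                  # drop the leading hyphen
        printf "keep=%d prefix=%s suffix=%s result=%s\n", keep, prefix, suffix, prefix suffix
    }'
}

show_pieces 33520850 -60
show_pieces 2222 -333
```

For 33520850 and -60 this should report keep=6, prefix=335208, suffix=60 and result=33520860. For 2222 and -333 only one character of the first field is kept, giving 2333.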

Testing:

$ awk -f script.awk file
33520470
33520850
33520860
33520870
33520880
33520890
25453810
13340820
61040146
61040155
60819060
60819079
60819088

Another example:

$ cat file
1111
2222,-3,-4, -33,-44, -333,-444

$ awk -f script.awk file
1111
2222
2223
2224
2233
2244
2333
2444

As a "one-liner":

awk -F ', *' '{ print $1; for (i = 2; i <= NF; ++i) print substr($1, 1, length($1) - length($i) + 1) substr($i, 2) }' file
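Putting the two steps together, a sketch of the whole pipeline could look like the following. The function name `expand_all` is a placeholder, and the `{8}` interval in the pattern is an assumption based on the question's statement that the numbers are always 8 digits. The sketch also assumes a grep that supports -o and -E, such as GNU or BSD grep.

```shell
# Hypothetical wrapper: extract each run with grep, then expand it with awk.
# Reads from stdin; each grep match is one full number plus its suffixes.
expand_all() {
    grep -oE '[0-9]{8}(, -[0-9]+)*' |
        awk -F ', *' '{
            print $1
            for (i = 2; i <= NF; ++i)
                print substr($1, 1, length($1) - length($i) + 1) substr($i, 2)
        }'
}

# Two sample lines taken from the question.
printf '%s\n' '13815206, -07, -08, 60682651, 60709994' '61040146, -55' | expand_all
```

Because grep -o emits every match on its own line, the trailing full numbers 60682651 and 60709994 reach awk as separate single-field lines and are printed unchanged, so the awk part needs no special case for them.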

edited Feb 4 at 9:24

answered Feb 4 at 8:01

Kusalananda
