Abbreviate multiple column names in linux, keeping the last field

I have a file where all the column headers are the path names. I want to abbreviate each column header from something that looks like:

/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam

to:

sample1 sample2

How do I do this in Linux? My files have anywhere from 46 to 100+ columns, so manually editing the column names is not an option. My desired column names are each 7 characters in length, as above.

Thanks

The header has the filenames. Each column header (name) is

/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam

where I just want it to be

sample1

To clarify, this is one text file with 46 columns. Each column header (name) appears as the lengthy string above, and I want to truncate each header to the 7-character version, e.g. 'sample1' ... 'sampl46'.

Desired Example file (with data under each column header)

sample1 sample2 sample3 sample4 sample5 ...

edited Feb 6 at 18:47

agc

asked Feb 6 at 10:50

Cece

The header has the filenames, or the data?

– Jeff Schaller
Feb 6 at 11:28

What does a sample header look like, given that there are many columns? What's the delimiter?

– Jeff Schaller
Feb 6 at 11:29

Is this something you can use? basename /mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam | sed 's/\.[[:alnum:]]\+//g' prints sample1

– eblock
Feb 6 at 11:39

@eblock, I don't think I can use the sed command as above because it references a single column. The file has approximately 46 columns and I'm looking for a command/ script which will enable me to abbreviate all the column headers simultaneously. But thanks.

– Cece
Feb 6 at 12:41

Is there an input example of ...sample46... that gets abbreviated to sampl46 in order to get it down to 7 characters, or does the input start with ...sampl46...?

– Jeff Schaller
Feb 6 at 13:58

linux text-processing

4 Answers
Assuming the unwanted suffix is always ".so.rg.mk.bam", GNU sed's evaluate flag (e) can be used to run basename on just the first line of filename and replace that line with the output. Because basename -a prints one name per line, paste joins the names back into a single space-separated line:

sed -i '1s/.*/basename -as .so.rg.mk.bam & | paste -sd" " -/e' filename

For non-GNU seds, head and command substitution can be used instead:

sed -i '1s/.*/'"$(basename -as .so.rg.mk.bam $(head -n 1 filename) | paste -sd' ' -)"'/' filename

Note: To see the results without changing the file, try it without the -i first.
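To see what basename contributes before touching the file at all, the header can be processed on its own. This is only a sketch: it assumes a basename that supports -a and -s (GNU coreutils does), a whitespace-delimited header, and paths without spaces or glob characters. The function name header_names is made up for illustration.

```shell
# header_names: hypothetical helper. Prints the shortened names from the
# first line of the file given as $1, joined by single spaces.
header_names() {
    # The command substitution is deliberately unquoted so that the shell
    # splits the header into one argument per column.
    basename -a -s .so.rg.mk.bam $(head -n 1 "$1") | paste -s -d ' ' -
}

# Small demonstration on a throwaway file.
demo=$(mktemp)
printf '%s\n' \
    '/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam' \
    'abc def' > "$demo"
header_names "$demo"
rm -f "$demo"
```

If the printed line looks right, the same basename and paste pipeline is what ends up replacing line 1 in the sed commands.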

edited Feb 8 at 19:16

answered Feb 6 at 18:57

agc


I would write a short program to copy the original file into a new file with the short names. Keeping the original file gives you a backup should something go wrong. Exactly what you write depends on the language you are comfortable with. This may be your shell, such as Bash, or any of a number of languages such as Java, C, Perl, Python, etc.

Here is some pseudocode:

old is the original file and new is the new file
create new

begin a loop to read each line in old
    read line from old
    delete all characters from line up to and including the last "/"
    delete all characters from line after the first 7
    // this is what you want to save unless it conflicts with a previously saved line
    determine if you have a conflict
    if there is a conflict
        add a number to the end of line to make it unique
    save line to new
end of loop
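A minimal sketch of the pseudocode above as a shell function wrapping awk. It rests on assumptions that are not in the answer: the file is whitespace-delimited, only the first line holds names (so the loop runs over the header fields rather than over lines), and a conflicting name gets "_" plus a counter appended, which makes it longer than 7 characters. The function name shorten_header is hypothetical.

```shell
# shorten_header: hypothetical helper. Reads a table on stdin and writes it
# to stdout with the first line rewritten; data lines pass through unchanged.
shorten_header() {
    awk '
    NR == 1 {
        for (i = 1; i <= NF; i++) {
            name = $i
            sub(/.*\//, "", name)        # drop everything through the last "/"
            name = substr(name, 1, 7)    # keep the first 7 characters
            if (seen[name]++) {          # already used by an earlier column
                name = name "_" seen[name]
            }
            $i = name
        }
    }
    { print }
    '
}

# Demonstration on inline data.
printf '%s\n' \
    '/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam' \
    'abc def' | shorten_header
```

Used as shorten_header < old > new, this leaves old untouched as the backup. Note that assigning to a field makes awk rebuild the header with single spaces between the names, and that a name produced by the conflict step is not itself re-checked for uniqueness.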

edited Feb 6 at 14:21

answered Feb 6 at 14:16

Ed Roberts

The pseudocode I provided got thrown into a single paragraph. I hope you can make sense of it.

– Ed Roberts
Feb 6 at 14:19


Let's say I have a file with 4 columns and two lines:

host:~ # cat file2
/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample3.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample4.so.rg.mk.bam
abc def ghi jkl

This command worked for me (not very handy, but still):

host:~ # sed -i -e 's/^\///g' -e 's/[[:alnum:]]\+\///g' -e 's/\.[[:alnum:]]\+//g' -e 's/\///g' file2
host:~ # cat file2
sample1 sample2 sample3 sample4
abc def ghi jkl

I'm sure there's a more efficient way, but you could give it a try.

answered Feb 6 at 14:44

eblock


You could use awk to process the headers. The following awk script works only on the first line (NR==1). It loops through all of the fields in that line, one at a time. For each field, it performs the following steps:

Find the first instance of the text /sample and trim the text up to that (and through the /).

Find the first instance in the remainder of a period and trim off the portion from the period onwards.

If the remainder is too long, then trim the sample text down as much as needed. The formula for how much of it to keep turns out to be "6 plus the position of the first digit minus the overall length".

Once we're done processing this field, print it out with a trailing space.

Once we're done looping through all of the fields, print a newline.

Note that this leaves you with a trailing space at the end of the line.

The awk script:

NR == 1 {
  for(i=1; i <= NF; i++) {
    tail=substr($i, 1 + match($i, "/sample")) # delete up to the first instance of "/sample"
    tail=substr(tail, 1, index(tail, ".") - 1) # find, then stop short of, the first period
    if (length(tail) > 7) { # if it's too long
      match(tail, "[0-9]") # find the first digit
      # trim the beginning down, then append the number
      tail=substr(tail, 1, 6 + RSTART - length(tail))substr(tail, RSTART)
    }
    printf tail" "
  }
  print ""
}

On a sample input of:

/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample47.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample4631.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample1234567.so.rg.mk.bam

The sample output is:

sample1 sampl47 sam4631 1234567
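The trimming formula can also be checked in isolation. The sketch below applies only the "6 plus the position of the first digit minus the overall length" step to a name that has already had its directory and suffix removed. The function name abbreviate is hypothetical, and the guard for names without any digit is an added assumption, not part of the script above.

```shell
# abbreviate: hypothetical helper. Shortens one bare name (no directory,
# no suffix) to 7 characters while keeping its trailing number intact.
abbreviate() {
    awk -v name="$1" 'BEGIN {
        # Only names longer than 7 characters that contain a digit are cut.
        if (length(name) > 7 && match(name, /[0-9]/)) {
            # Leading characters to keep so that text plus number is 7 long.
            keep = 6 + RSTART - length(name)
            name = substr(name, 1, keep) substr(name, RSTART)
        }
        print name
    }'
}

for n in sample1 sample47 sample4631 sample1234567; do
    abbreviate "$n"
done
```

For sample47 the first digit is at position 7 and the length is 8, so 6 + 7 - 8 = 5 leading characters are kept, giving sampl47. When the number alone is 7 or more digits long, the count drops to zero or below and only the number remains.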

answered Feb 23 at 2:25

Jeff Schaller
