Abbreviate multiple column names in linux, keeping the last field
I have a file where all the column headers are the path names. I want to abbreviate each column header from something that looks like:
/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam
to:
sample1 sample2
How do I do this in Linux? My files have anywhere from 46 to more than 100 columns, so manually editing the column names is not an option. My desired column names are each 7 characters long, as above.
Thanks
The header contains the file names.
Each column header/name is
/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam
where I just want it to be
sample1
To clarify, this is one text file with 46 columns. Each column header or name appears as the lengthy string above, and I want to truncate each header to the 7-character version, e.g. 'sample1'...'sampl46'.
Desired example file (with data under each column header):
sample1 sample2 sample3 sample4 sample5 ...
linux text-processing
edited Feb 6 at 18:47 by agc
asked Feb 6 at 10:50 by Cece
The header has the filenames, or the data?
– Jeff Schaller
Feb 6 at 11:28
What does a sample header look like, given that there are many columns? What's the delimiter?
– Jeff Schaller
Feb 6 at 11:29
Is this something you can use?
basename /mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam | sed 's/\.[[:alnum:]]\+//g'
sample1
– eblock
Feb 6 at 11:39
@eblock, I don't think I can use the sed command above because it works on a single column. The file has approximately 46 columns, and I'm looking for a command/script that will let me abbreviate all the column headers at once. But thanks.
– Cece
Feb 6 at 12:41
Is there an input example of ...sample46... that gets abbreviated to sampl46 in order to get it down to 7 characters, or does the input start with ...sampl46...?
– Jeff Schaller
Feb 6 at 13:58
4 Answers
Assuming the unwanted suffix is always ".so.rg.mk.bam", GNU sed's s///e (evaluate) flag can be used to run basename on just the first line of filename, replacing it with the required output. Because basename -a prints one name per line, wrapping it in echo $(...) puts the names back on a single space-separated line:
sed -i '1s/.*/echo $(basename -a -s .so.rg.mk.bam &)/e' filename
For non-GNU seds, head can be used instead:
sed -i '1s/.*/'"$(echo $(basename -a -s .so.rg.mk.bam $(head -n 1 filename)))"'/' filename
Note: to see the results without changing the file, try it without -i first.
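The same basename idea also works without sed. This is a minimal sketch, assuming GNU coreutils (basename -a -s), a whitespace-separated header, and the fixed .so.rg.mk.bam suffix; basename_header is a hypothetical name, and the result goes to standard output so the original file is left untouched:

```shell
#!/bin/sh
# Hypothetical helper: print FILE with its header shortened by basename.
# Assumes GNU coreutils (basename -a -s) and a fixed ".so.rg.mk.bam" suffix.
basename_header() {
  # Header: one path per field -> strip directories and the suffix,
  # then join the names back onto one space-separated line.
  head -n 1 "$1" | xargs basename -a -s .so.rg.mk.bam | paste -sd ' ' -
  # Data rows are copied through unchanged.
  tail -n +2 "$1"
}

# Demo on a small made-up file.
demo=$(mktemp)
printf '%s %s\n%s\n' \
  /mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam \
  /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam \
  'abc def' > "$demo"
basename_header "$demo"
```

For example, basename_header filename > filename.new. If the header is tab-separated, paste -s - (whose default delimiter is a tab) keeps the tabs.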
edited Feb 8 at 19:16
answered Feb 6 at 18:57 by agc
I would write a short program to copy the original file into a new file with the short names. Keeping the original file gives you a backup should something go wrong. Exactly what you write depends on the language you are comfortable with. This may be your shell, such as Bash, or any of a number of languages such as Java, C, Perl, Python, etc.
Here is some pseudocode:
old is the original file and new is the new file
create new
begin a loop to read each line in old
    read line from old
    delete all characters from line up to and including the last "/"
    delete all characters from line after the first 7
    // This is what you want to save unless it conflicts with a previously saved line
    determine if you have a conflict
    if there is a conflict
        add a number to the end of line to make it unique
    save line to new
end of loop
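A runnable sketch of the pseudocode above, with two assumptions that are not in the answer: it is applied to each field of the header line (where the long names are) rather than to every line, and a conflict is resolved by appending _2, _3, ... to the shortened name. shorten_fields is a hypothetical name. The demo shows why the conflict check matters: cutting to 7 characters turns both sample4 and sample46 into "sample4".

```shell
#!/bin/sh
# Hypothetical helper following the pseudocode, applied (as an assumption)
# to each field of the header line; data rows are copied unchanged.
shorten_fields() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        name = $i
        sub(/.*\//, "", name)              # delete up to and including the last "/"
        name = substr(name, 1, 7)          # keep only the first 7 characters
        if (name in seen) {                # conflict with an earlier name:
          name = name "_" (++seen[name])   # append _2, _3, ... (assumed scheme)
        } else {
          seen[name] = 1
        }
        printf "%s%s", name, (i < NF ? " " : "\n")
      }
      next
    }
    { print }                              # data rows: copy unchanged
  ' "$1"
}

# Demo: sample4 and sample46 both cut to "sample4", so the second gets "_2".
demo=$(mktemp)
printf '%s %s %s\n%s\n' \
  /mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam \
  /mydir/cat/dog/hen/test/block/sample4.so.rg.mk.bam \
  /mydir/cat/dog/hen/test/block/sample46.so.rg.mk.bam \
  '1 2 3' > "$demo"
shorten_fields "$demo" > "$demo.new"   # write a new file; keep the original
cat "$demo.new"
```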
edited Feb 6 at 14:21
answered Feb 6 at 14:16 by Ed Roberts
The pseudocode I provided got thrown into a single paragraph. I hope you can make sense of it.
– Ed Roberts
Feb 6 at 14:19
Let's say I have a file with 4 columns and two lines:
host:~ # cat file2
/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample3.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample4.so.rg.mk.bam
abc def ghi jkl
This command worked for me (not very handy, but still):
host:~ # sed -i -e '1s/^\///g' -e '1s/[[:alnum:]]\+\///g' -e '1s/\.[[:alnum:]]\+//g' -e '1s/\///g' file2
host:~ # cat file2
sample1 sample2 sample3 sample4
abc def ghi jkl
I'm sure there's a more efficient way, but you could give it a try.
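One possibly shorter variant folds the four substitutions into a single extended-regex substitution. This is a sketch, assuming GNU or BSD sed (both accept -E) and whitespace-separated fields; sed_header is a hypothetical name. For each field on line 1 it keeps the text between the last "/" and the first "." after it, and it leaves data rows (including decimals such as 1.5) untouched:

```shell
#!/bin/sh
# Hypothetical helper: one sed substitution, applied to line 1 only.
# For every whitespace-separated field it keeps the text between the
# last "/" and the first "." after it, e.g.
#   /mydir/.../sample1.so.rg.mk.bam -> sample1
sed_header() {
  sed -E '1s#[^[:space:]]*/([^./[:space:]]+)[^/[:space:]]*#\1#g' "$1"
}

demo=$(mktemp)
printf '%s %s\n%s\n' \
  /mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam \
  /mydir/cat/dog/hen/test/block/sample2.so.rg.mk.bam \
  'abc 1.5' > "$demo"
sed_header "$demo"
```

Fields without a "/" (for example a plain ID column) are left unchanged because the pattern does not match them.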
answered Feb 6 at 14:44 by eblock
You could use awk to process the headers. The following awk script works only on the first line (NR==1). It loops through all of the fields in that line, one at a time. For each field, it performs the following steps:
- Find the first instance of the text /sample and trim off everything up to it (including the /).
- Find the first period in the remainder and trim off the portion from the period onwards.
- If the remainder is too long, trim the sample text down as much as needed. The number of characters to keep turns out to be "6 plus the position of the first digit minus the overall length".
- Once we're done processing this field, print it out with a trailing space.
- Once we're done looping through all of the fields, print a newline.
Note that this leaves you with a trailing space at the end of the line.
The awk script:
NR == 1 {
    for (i = 1; i <= NF; i++) {
        tail = substr($i, 1 + match($i, "/sample"))  # delete up to the first instance of "/sample"
        tail = substr(tail, 1, index(tail, ".") - 1) # find, then stop short of, the first period
        if (length(tail) > 7) {                      # if it's too long
            match(tail, "[0-9]")                     # find the first digit
            # trim the beginning down, then append the number
            tail = substr(tail, 1, 6 + RSTART - length(tail)) substr(tail, RSTART)
        }
        printf "%s ", tail
    }
    print ""
}
On a sample input of:
/mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample47.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample4631.so.rg.mk.bam /mydir/cat/dog/hen/test/block/sample1234567.so.rg.mk.bam
The sample output is:
sample1 sampl47 sam4631 1234567
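A sketch of running the script above over a whole file from the shell. As assumptions beyond the answer, it also prints the data rows unchanged (the script as written prints only the header) and joins the header fields without the trailing space; awk_header is a hypothetical name:

```shell
#!/bin/sh
# Hypothetical helper: the awk logic from this answer, plus two small
# changes: data rows are printed unchanged, and the header is joined
# without a trailing space.
awk_header() {
  awk '
    NR == 1 {
      for (i = 1; i <= NF; i++) {
        tail = substr($i, 1 + match($i, "/sample"))  # drop everything through "/"
        tail = substr(tail, 1, index(tail, ".") - 1) # stop short of the first "."
        if (length(tail) > 7) {                      # too long: shorten "sample"
          match(tail, "[0-9]")                       # RSTART = first digit
          tail = substr(tail, 1, 6 + RSTART - length(tail)) substr(tail, RSTART)
        }
        printf "%s%s", tail, (i < NF ? " " : "\n")
      }
      next
    }
    { print }
  ' "$1"
}

demo=$(mktemp)
printf '%s %s %s %s\n%s\n' \
  /mydir/cat/dog/hen/test/block/sample1.so.rg.mk.bam \
  /mydir/cat/dog/hen/test/block/sample47.so.rg.mk.bam \
  /mydir/cat/dog/hen/test/block/sample4631.so.rg.mk.bam \
  /mydir/cat/dog/hen/test/block/sample1234567.so.rg.mk.bam \
  'w x y z' > "$demo"
awk_header "$demo" > "$demo.new"   # new file; the original is kept
cat "$demo.new"
```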
answered Feb 23 at 2:25 by Jeff Schaller