Filter rows with a specific header name and containing "1" in a column

I have a big file with many columns and rows. It looks like this:

A B C D E F1 F2 F3 F4 F5
a1 b1 c1 d1 e1 0 0 1 0 1
a2 b2 c2 d2 e2 1 0 0 1 1
a3 b3 c3 d3 e3 1 1 0 0 1
....

Columns A, B, C, D and E contain some information, and columns F1-F5 represent ids. A 0 or 1 means that the A-E information in that row is absent or present for that id.

I want to create one file per id, with each file containing the A-E information that the id has.
For example, F5 has three 1s in the first 3 rows, so

F5.txt:

A B C D E 
a1 b1 c1 d1 e1 
a2 b2 c2 d2 e2 
a3 b3 c3 d3 e3

F1 has two 1s in the first 3 rows, so

F1.txt:

A B C D E 
a2 b2 c2 d2 e2 
a3 b3 c3 d3 e3

How can I filter this file and create new files named after the ids (F1, F2, ...) using awk?

asked Oct 21 '17 at 16:25

MagicPants



1 Answer

accepted

AWK solution:

awk 'NR==1 { split($0,h); columns=sprintf("%s %s %s %s %s",h[1],h[2],h[3],h[4],h[5]); next }
 {
   for (i=6;i<=NF;i++)
     if ($i) {
       # write the header line the first time this file is used
       if (!a[h[i]]++) print columns > (h[i]".txt")
       print $1,$2,$3,$4,$5 > (h[i]".txt")
     }
 }' file

split($0,h) - splits the 1st record into the array h to obtain the header column names

columns=sprintf("%s %s %s %s %s",h[1],h[2],h[3],h[4],h[5]) - builds the common columns string A B C D E

if ($i) - true when the current field (starting from the 6th field) is neither "" (the empty string) nor 0; only such fields are processed further

h[i] - the header name for the current field, i.e. F1 etc. (or, as you wrote, the id); it is used as the base of the output filename

if (!a[h[i]]++) - a[h[i]] counts how many rows have already been written for h[i]. It is 0 (false) the first time, so the condition is true only on the first write to that file, and the header/columns line is printed to it as the 1st line
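The two conditions can also be tried in isolation. This is a small sketch, not part of the original answer; the helper names first_seen and truthy are made up for illustration.

```shell
# first_seen (hypothetical helper): print each input line only the first
# time it appears. seen[$0] is 0 on first use, so !seen[$0]++ is true once.
first_seen() { awk '!seen[$0]++'; }

# truthy (hypothetical helper): report whether awk treats the first field
# as true. A field holding 0 or nothing at all is false.
truthy() { awk '{ print ($1 ? "true" : "false") }'; }

printf 'F1\nF5\nF1\nF2\n' | first_seen   # prints F1, F5, F2
printf '0\n1\n\n' | truthy               # prints false, true, false
```

The same post-increment trick is what makes the header line appear exactly once per output file.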

Viewing results:

$ head F*.txt
==> F1.txt <==
A B C D E
a2 b2 c2 d2 e2
a3 b3 c3 d3 e3

==> F2.txt <==
A B C D E
a3 b3 c3 d3 e3

==> F3.txt <==
A B C D E
a1 b1 c1 d1 e1

==> F4.txt <==
A B C D E
a2 b2 c2 d2 e2

==> F5.txt <==
A B C D E
a1 b1 c1 d1 e1
a2 b2 c2 d2 e2
a3 b3 c3 d3 e3
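
If the real file has many more id columns, two things may need adjusting: the five information columns are hard-coded above, and some awk implementations limit how many output files can be open at the same time. The following is only a sketch under those assumptions, not the answer's method; the function name split_by_id and its second argument (the number of leading information columns, defaulting to 5) are made up for illustration. It closes each file after every write, which is slower but keeps only one output file open at a time.

```shell
# split_by_id FILE [N] (hypothetical helper): write one <id>.txt per id column.
# N is the number of leading information columns (assumed to be 5 if omitted).
split_by_id() {
  awk -v n="${2:-5}" '
    NR == 1 { for (i = 1; i <= NF; i++) h[i] = $i }   # remember header names
    {
      # join the first n fields of the current record
      line = $1
      for (i = 2; i <= n; i++) line = line OFS $i
    }
    NR == 1 { header = line; next }
    {
      for (i = n + 1; i <= NF; i++)
        if ($i) {
          out = h[i] ".txt"
          # first use: truncate the file and write the header line
          if (!seen[out]++) { print header > out; close(out) }
          print line >> out    # append the row
          close(out)           # keep at most one output file open
        }
    }
  ' "$1"
}
```

With the sample data saved as file, running split_by_id file should give the same F1.txt ... F5.txt as shown above.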

edited Oct 21 '17 at 21:10

answered Oct 21 '17 at 16:42

RomanPerekhrest


That's awesome! Why does "if($i)" match the condition? And what does "if(!a[h[i]]++)" mean?
– MagicPants
Oct 21 '17 at 17:23

@MagicPants, ok, added explanation
– RomanPerekhrest
Oct 21 '17 at 21:11
