awk script to rearrange similar rows

I want to rearrange about 5 million rows (with 300 columns) into groups.



The data looks like the following. Various experiments (column 2) were conducted at different locations (the column headers in the top row, from column 4 onwards) in different years (column 1) using different instruments (column 3). The numbers in the matrix (row 2 onwards, column 4 onwards) indicate how many instances of each experiment were successful.

What I want is to rearrange the rows.



Input



 345 346 347 348 349 350 351 352
2014 Exp1 IBM 24 45 22
2014 Exp2 LEN 23 32 34
2014 Exp3 LEN 2 34 34
2014 Exp4 IBM 34 44 43
2014 Exp5 IBM 2 45 51 45
2014 Exp6 IBM 34 23 54
2014 Exp7 IBM 23 23 24
2014 Exp8 IBM 34 45 56
2014 Exp9 LEN 24 45 45
2014 Exp10 LEN 43 45 32
2015 Exp11 IBM 34 55 33 34
2015 Exp12 IBM 1 33 4 5
2015 Exp13 IBM 43 55 34 43
2015 Exp14 IBM 45 32 43 4
2015 Exp15 IBM 23 4 5
2015 Exp16 IBM 32 34 43
2015 Exp17 IBM 32 34 46
2015 Exp18 LEN 32 54 67
2015 Exp19 SCL 56 6 4 45 56
2015 Exp20 LEN 67 56 76
2015 Exp21 LEN 45 56 65
2015 Exp22 SCL 45 55 54
2015 Exp23 SCL 4 55 45


I would like the rows rearranged into groups such that all rows in a group are

1) within the same year, and
2) using the same instrument,

and each group has at least 3 locations in common, each of which has at least 20 successful experiments.



Requested Output



 345 346 347 348 349 350 351 352
1 2014 Exp1 IBM 24 45 22
1 2014 Exp4 IBM 34 44 43
1 2014 Exp7 IBM 23 23 24
2 2014 Exp2 LEN 23 32 34
2 2014 Exp9 LEN 24 45 45
2 2014 Exp10 LEN 43 45 32
3 2014 Exp5 IBM 2 45 51 45
3 2014 Exp6 IBM 34 23 54
3 2014 Exp8 IBM 34 45 56
4 2015 Exp11 IBM 34 55 33 34
4 2015 Exp13 IBM 43 55 34 43
4 2015 Exp14 IBM 45 32 43 4
5 2015 Exp16 IBM 32 34 43
5 2015 Exp17 IBM 32 34 46
6 2015 Exp18 LEN 32 54 67
6 2015 Exp20 LEN 67 56 76
6 2015 Exp21 LEN 45 56 65
7 2015 Exp19 SCL 56 6 4 45 56
7 2015 Exp22 SCL 45 55 54
2014 Exp3 LEN 2 34 34
2015 Exp12 IBM 1 33 4 5
2015 Exp15 IBM 23 4 5
2015 Exp23 SCL 4 55 45


Here is what I tried.



awk ' NR>1 for (i=4;i<=NF;i++) if ($i!="") arr1[$1,$2,$3]=$i ; next 
$1,$2,$3 in arr1 {
for (j=1;j<length(arr1);j++))
{if (arr1[j] > 20)
group++;
END
for (j in n)
print group, arr1[j]

' input input






bash awk perl






edited Mar 9 at 12:24 by Rui F Ribeiro
asked Mar 30 '15 at 0:45 by Sheetal Kaul












  • As far as I can see, to group the output correctly would require some form of clustering. This is non-trivial, and there would be several "correct" solutions in terms of how the rows were ordered.

    – Kusalananda
    Mar 9 at 18:16

















1 Answer
A few random hints depending on the actual data formatting and other issues...



How are the data fields separated? (The first three gaps give the impression that there is a TAB character in between, while the remaining columns seem to be space-separated.) You should be aware that the column information is lost for columns 4-N if the field separator is left at its default, because any run of blanks is then treated as a single separator and empty cells simply vanish. So the logic of your code is seriously flawed.



If you have no TAB separators, only blanks, you can use GNU awk's FIELDWIDTHS feature to access the data (including the missing "blank" data, as you seem to be trying to achieve).



If you have TABs for the first three separators and blanks for the rest, you should explicitly define FS="\t", so that you can work directly on fields 1-3 and keep the spacing intact in the remaining data (which you can address as a whole as field 4). That makes it easy to find "blank data".
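As a rough illustration of why an explicit TAB separator matters, here is a minimal sketch that assumes the file is TAB-delimited throughout, with an empty field wherever a location has no value. The function name `qualifying_locations`, the sample rows, and the "at least 20" threshold are assumptions made for this example only.

```shell
# Sketch only: list, per row, the locations with at least 20 successes.
# Reads TAB-separated data (header line first) on standard input.
qualifying_locations() {
    awk -F'\t' '
    NR == 1 { for (i = 4; i <= NF; i++) loc[i] = $i; next }   # remember location names
    {
        hits = ""
        for (i = 4; i <= NF; i++) {
            if ($i != "" && $i + 0 >= 20) {                   # assumed threshold
                sep = (hits == "") ? "" : ","
                hits = hits sep loc[i]
            }
        }
        print $2 ": " hits
    }'
}

# Hypothetical sample: Exp1 has no value for 346, Exp3 has none for 347.
printf '\t\t\t345\t346\t347\t348\n2014\tExp1\tIBM\t24\t\t45\t22\n2014\tExp3\tLEN\t2\t34\t\t34\n' |
    qualifying_locations
```

With this sample the sketch prints `Exp1: 345,347,348` and `Exp3: 346,348`. Because empty fields keep their position, field i always lines up with the location named in column i of the header, which is exactly what the default separator cannot guarantee.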



It may further make processing easier if you create subsets of your data on the fly, operate on those, and concatenate the individual subsets afterwards. To separate the data into files depending on, say, year and instrument you can write:



awk '{ print > ("set_" $1 "_" $3) }' input


and it will create files named, e.g., set_2015_LEN or set_2014_IBM containing the respective entries.
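A slightly more defensive variant of the same split could look like the sketch below. It assumes TAB-delimited input with the location names on line 1, skips that header, and closes a file whenever the year/instrument pair changes so that awk implementations with a low limit on open files are not a concern. The function name `split_by_year_instrument` and the output-directory argument are made up for this example.

```shell
# Sketch only: write each year/instrument subset to its own file.
# $1: TAB-separated input file with a header line; $2: existing, empty output directory.
split_by_year_instrument() {
    awk -F'\t' -v dir="$2" '
    NR == 1 { next }                            # skip the header with the location names
    {
        out = dir "/set_" $1 "_" $3
        if (out != prev) {                      # key changed: release the previous file
            if (prev != "") close(prev)
            prev = out
        }
        print >> out                            # append, since a file may be reopened later
    }' "$1"
}

# Hypothetical usage on a throwaway directory.
tmp=$(mktemp -d)
printf '\t\t\t345\t346\n2014\tExp1\tIBM\t24\t45\n2014\tExp2\tLEN\t23\t32\n2014\tExp4\tIBM\t34\t44\n' > "$tmp/input.tsv"
mkdir "$tmp/sets"
split_by_year_instrument "$tmp/input.tsv" "$tmp/sets"
ls "$tmp/sets"
```

Because the sketch appends with `>>`, the output directory should be empty before each run, otherwise rows from an earlier run would be kept.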



The final task, identifying "matching numeric column sets", depends on the previously mentioned topics; if, for example, the final eight data columns can be addressed as one fixed-length entity, it might suffice to use the sort utility with an appropriately defined key specification (see sort's option -k).
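One way to obtain such a fixed-length entity is sketched below, assuming TAB-delimited input with a header line: each row gets a prefix of 0/1 flags, one per location column, and sort then orders by year, instrument, and that prefix. The function name `sort_by_signature` and the threshold of 20 are assumptions for this example, and the header line is dropped from the output.

```shell
# Sketch only: prefix every row with a 0/1 signature and sort on it.
# Reads TAB-separated data (header line first) on standard input.
sort_by_signature() {
    awk -F'\t' -v OFS='\t' -v min=20 '
    NR == 1 { ncol = NF; next }                 # the header fixes the number of columns
    {
        sig = ""
        for (i = 4; i <= ncol; i++) {
            bit = ($i != "" && $i + 0 >= min) ? "1" : "0"
            sig = sig bit
        }
        print sig, $0                           # new field 1: the signature
    }' |
    LC_ALL=C sort -t "$(printf '\t')" -k2,2 -k4,4 -k1,1   # year, instrument, signature
}

# Hypothetical sample with three location columns.
printf '\t\t\t345\t346\t347\n2014\tExp1\tIBM\t24\t45\t2\n2014\tExp2\tLEN\t23\t32\t34\n2014\tExp4\tIBM\t34\t44\t43\n2014\tExp7\tIBM\t23\t23\t1\n' |
    sort_by_signature
```

Note that this only brings rows with identical signatures next to each other. That is a stricter notion than "at least 3 locations in common", so it is a starting point for inspection rather than the requested grouping.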



(BTW: for a compound index test you have to write ($1,$2,$3) in arr1 instead of $1,$2,$3 in arr1.)
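To show the parenthesised compound-subscript test in context, here is a rough sketch of a greedy grouping pass. It is a heuristic under stated assumptions, not a definitive solution: input is TAB-delimited with a header line, a location "qualifies" at 20 or more successes, and each row joins the first existing group of its year and instrument with which it still shares at least 3 qualifying locations. As the comment on the question points out, several valid groupings exist, and this pass finds only one of them. All names (`greedy_groups`, `common`, `ngroups`, `id`) are hypothetical.

```shell
# Sketch only: greedy grouping; prints the group number (or "-") before each row.
# Reads TAB-separated data (header line first) on standard input.
greedy_groups() {
    awk -F'\t' -v OFS='\t' -v min=20 -v need=3 '
    NR == 1 { ncol = NF; print "group", $0; next }
    {
        key = $1 SUBSEP $3                            # year plus instrument
        nq = 0
        for (i = 4; i <= ncol; i++) {
            q[i] = ($i != "" && $i + 0 >= min)        # 1 if this location qualifies
            nq += q[i]
        }
        if (nq < need) { print "-", $0; next }        # too few locations for any group

        # Try the groups already opened for this year/instrument, oldest first.
        for (g = 1; g <= ngroups[key]; g++) {
            shared = 0
            for (i = 4; i <= ncol; i++)
                if (q[i] && ((key, g, i) in common)) shared++
            if (shared >= need) break
        }
        if (g > ngroups[key]) {                       # no fit: open a new group
            ngroups[key] = g
            id[key, g] = ++last_id
            for (i = 4; i <= ncol; i++)
                if (q[i]) common[key, g, i] = 1
        } else {                                      # fit: shrink the common set
            for (i = 4; i <= ncol; i++)
                if (!q[i]) delete common[key, g, i]
        }
        print id[key, g], $0
    }'
}

# Hypothetical sample with four location columns.
printf '\t\t\t345\t346\t347\t348\n2014\tExp1\tIBM\t24\t45\t22\t\n2014\tExp2\tLEN\t23\t32\t34\t\n2014\tExp3\tLEN\t2\t34\t34\t\n2014\tExp4\tIBM\t34\t44\t43\t\n2014\tExp5\tIBM\t\t45\t51\t45\n2015\tExp6\tIBM\t24\t45\t22\t30\n' |
    greedy_groups
```

The output can be piped through sort on the first field to bring the members of each group together. A row that qualifies but finds no partner ends up in a group of its own, and on 5 million rows the inner loops over groups and columns may be slow, so running this per year/instrument subset is probably the more practical route.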







answered Mar 30 '15 at 1:24 by Janis












  • Thank you for your suggestions. The data is tab-delimited; I couldn't seem to line it up in this post, so I introduced some manual spaces. The actual data has 300 columns. What I am thinking of is replacing the >20 values by 1 and the others by blank, so that I can treat it as a fixed-length entity. I checked out the sort -k option, but I couldn't follow how it would apply here, especially for so many columns; would you give me a small example? Thanks again for your suggestion on splitting the data set.

    – Sheetal Kaul
    Mar 30 '15 at 2:34

















