Getting a match count of objects in a file

I have a large file that has entries that look like this:

entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:

Entries are separated by a blank line. I need a count of the entries that have an empType of A and MUST ALSO have a value after ADID (2 in this example). I've tried awk, grep, and egrep, and I'm still having no luck. Any ideas?

asked Dec 14 '17 at 3:29

King of NES


What exactly did you try in awk? I would think something like awk -vRS= '/empType: A/ && /ADID: [0-9]+/ {n++} END {print n}' file should work
– steeldriver
Dec 14 '17 at 3:39

Running your command, I got "awk: record `smapsHistory: [NDSEn...' too long record number 213244". There are only about 100 records with an employeeType of C, and it's going crazy....
– King of NES
Dec 14 '17 at 3:54

You did include the correct filename to read as input?
– bu5hman
Dec 14 '17 at 4:15

It was the correct file...
– King of NES
Dec 14 '17 at 4:18


6 Answers

Awk solution:

awk '/empType: /{ f=($2=="A"? 1:0) } f && /ADID: [0-9]+/{ c++ } END{ print c }' file

f - flag indicating that an empType: A section is being processed

c - count of empType: A entries with a filled ADID key

The output:

2

edited Dec 14 '17 at 6:12

answered Dec 14 '17 at 6:00

RomanPerekhrest


Here is an alternative awk solution that uses a blank line "" as the record separator RS and a newline \n as the field separator FS

BEGIN { RS=""; FS="\n" }
{
    split($4,a,": ")
    split($5,b,": ")
}
a[2]=="A" && b[2]!="" { c++ }
END { print c }

the script can be executed with

awk -f main.awk file
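
For readers more comfortable with Python, here is a minimal sketch of the same paragraph-mode idea. It is not part of the original answer: the function name count_positional and the SAMPLE string are made up for illustration, and like the awk script it assumes empType is always the 4th line of an entry and ADID the 5th.

```python
import re

# The sample data from the question, embedded so the sketch runs on its own.
SAMPLE = """\
entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def count_positional(text):
    """Count records whose 4th line has value 'A' and whose 5th line has any value."""
    count = 0
    # Roughly what RS="" does: records are separated by one or more blank lines.
    for record in re.split(r"\n\s*\n", text.strip()):
        fields = record.split("\n")  # roughly what FS="\n" does
        if len(fields) < 5:
            continue  # too short to have both fields; skip it
        emp_type = fields[3].partition(":")[2].strip()
        adid = fields[4].partition(":")[2].strip()
        if emp_type == "A" and adid:
            count += 1
    return count


print(count_positional(SAMPLE))
```

Because the fields are picked by position, this only works while every entry keeps the same line order; a key-based lookup is safer if the real file is less regular than the sample.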

answered Dec 14 '17 at 6:39

etopylight


Simple two grep method, where data is the input file:

grep -A1 'empType: A' data | grep -c 'ADID: .\+'

Output:

2
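
The two-grep pipeline relies on the ADID line coming directly after the empType line. A rough Python sketch of that adjacency check makes the assumption visible; the function name count_adjacent and the SAMPLE string are hypothetical, and this is an approximation of the pipeline, not a byte-for-byte emulation of grep.

```python
import re

# The sample data from the question, embedded so the sketch runs on its own.
SAMPLE = """\
entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def count_adjacent(lines):
    """Count 'empType: A' lines immediately followed by an ADID line with a value."""
    count = 0
    # Pair every line with the one after it, much like grep -A1 shows one line of context.
    for current, following in zip(lines, lines[1:]):
        if "empType: A" in current and re.search(r"ADID: .", following):
            count += 1
    return count


print(count_adjacent(SAMPLE.splitlines()))
# If ADID comes before empType, the adjacency check finds nothing:
print(count_adjacent(["entry-id: 9", "ADID: 42", "empType: A"]))
```

The second call shows the limitation: an entry with the same two fields in the opposite order is not counted.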

edited Dec 14 '17 at 7:15

answered Dec 14 '17 at 7:09

agc


I like the idea of getting the records that satisfy your requirements (better for e.g. testing) and counting them with wc -l. So here is an awk script that does just that:

#!/usr/bin/env awk
# getids.awk

BEGIN {
    RS="";
    FS="\n"
}

/ADID: [0-9]/ && /empType: A/ {print $1}

And here it is in action:

user@host:~$ awk -f getids.awk data.txt
entry-id: 1
entry-id: 3

user@host:~$ awk -f getids.awk data.txt | wc -l
2

Of course if you just want the count we can do that too:

#!/usr/bin/env awk
# count.awk

BEGIN {
    RS="";
    FS="\n";
    count=0;
}

/ADID: [0-9]/ && /empType: A/ {count++}

END {
    print count
}

And because I love Python, here is a Python script that does the same thing:

#!/usr/bin/env python2
# -*- coding: ascii -*-
"""getids.py"""

import sys

# Create a list to store the matched records
records = []

# Iterate over the lines of the input file
with open(sys.argv[1]) as data:
    for line in data:

        # When an "entry-id" is reached, create a new record
        if line.startswith('entry-id'):
            entry_id = line.split(':')[1].strip()
            records.append({'entry-id': entry_id})

        # For other lines, update the current record
        elif line.strip():
            key = line.partition(':')[0].strip()
            value = line.partition(':')[2].strip()
            records[-1][key] = value

    # Extract the list of records meeting the desired criteria
    # (.get avoids a KeyError for records that lack one of the keys)
    matches = [record for record in records if record.get('empType') == 'A' and record.get('ADID')]

    # Print out the entry-ids for all of the matches
    for match in matches:
        print('entry-id: ' + match['entry-id'])

And here's the Python script in action:

user@host:~$ python getids.py data.txt
entry-id: 1
entry-id: 3

user@host:~$ python getids.py data.txt | wc -l
2

And if we really do just want the counts:

#!/usr/bin/env python2
# -*- coding: ascii -*-
"""count.py"""

import sys

# Keep a count of the number of matches
count = 0

# Use flags to keep track of the current record
emptype_flag = False
adid_flag = False

# Iterate over the lines of the input file
with open(sys.argv[1]) as data:
    for line in data:

        # When an "entry-id" is reached, reset the flags
        if line.startswith('entry-id'):
            emptype_flag = False
            adid_flag = False
        elif line.strip() == "empType: A":
            emptype_flag = True
        elif line.startswith("ADID") and line.strip().split(':')[1]:
            adid_flag = True

        # If both conditions hold then increment the counter
        # and reset the flags
        if emptype_flag and adid_flag:
            count = count + 1
            emptype_flag = False
            adid_flag = False

    # Print the number of matches
    print(count)

And, while we're at it, how about a pure Bash script? Here's one:

#!/usr/bin/env bash

# getids.bash

# IFS= and -r keep each line exactly as it appears in the file
while IFS= read -r line; do
    if [[ "$line" =~ "entry-id:" ]]; then
        entry_id="$line"
        emptype=false
        adid=false
    elif [[ "$line" =~ "empType: A" ]]; then
        emptype=true
    elif [[ "$line" =~ ADID:\ [0-9] ]]; then
        adid=true
    fi
    if [[ "$emptype" == true && "$adid" == true ]]; then
        echo "$entry_id"
        emptype=false
        adid=false
    fi
done < "$1"

And running the bash script:

user@host:~$ bash getids.bash data.txt
entry-id: 1
entry-id: 3

And finally, here's something using just grep and wc:

user@host:~$ cat data.txt | grep -A1 'empType: A' | grep "ADID: \S" | wc -l

2

edited Dec 14 '17 at 13:36

answered Dec 14 '17 at 5:39

igal


With perl, that could be:

perl -l -00ne '
  my %f = /(.*?):\s*(.*)/g;
  ++$n if $f{empType} eq "A" && $f{ADID} ne "";
  END {print 0+$n}' < file

-n causes the code given to -e to be applied to each input record.

-00 makes the records paragraphs.

We build a %f associative array where keys and values are mapped to each (key):spaces(value) in the record,

and increment $n where the conditions are met.

We print $n in the END block (adding 0 to make sure we get 0 and not an empty string if there's no match).
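
The key/value mapping described above can be sketched in Python as well. This is an illustrative equivalent, not a translation of the Perl one-liner: parse_record, count_matches and SAMPLE are hypothetical names, and the parsing is done line by line with str.partition instead of a single global regex match.

```python
import re

# The sample data from the question, embedded so the sketch runs on its own.
SAMPLE = """\
entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def parse_record(paragraph):
    """Map each 'key: value' line of one paragraph to a dict entry."""
    fields = {}
    for line in paragraph.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that contain no colon at all
            fields[key.strip()] = value.strip()
    return fields


def count_matches(text):
    """Count paragraphs with empType equal to 'A' and a non-empty ADID."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    records = (parse_record(p) for p in paragraphs)
    return sum(1 for r in records
               if r.get("empType") == "A" and r.get("ADID", "") != "")


print(count_matches(SAMPLE))
```

Looking fields up by key means the result does not depend on the order of the lines inside an entry, and entries that lack one of the keys are simply not counted.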

edited Dec 14 '17 at 14:57

answered Dec 14 '17 at 14:14

Stéphane Chazelas


I wasn't able to get anywhere with grep's -A option, and the other answers returned "label too long" or some other error.
What I did find that worked was

perl -000 -ne 'print if/empType: A/' file.ldif|grep -i -c "^ADID: [0-9A-Za-z]"

I didn't know what perl -000 does, but I think it makes Perl read the input a paragraph at a time, so a whole entry is searched at once.
-n wraps the program in a while loop over the input.
-e supplies a one-line program.
print if /empType: A/ prints the paragraph if it contains empType: A.
| pipes the matched paragraphs to grep.
grep -i -c "^ADID: [0-9A-Za-z]" ignores case and counts the ADID lines that have a value.
I'm not sure if the other commands failed because of my Linux version, but the above command worked pretty well. I'm not sure how to make the empType match ignore case, though.
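
As a hedged illustration of the two-stage approach, including one way to make the empType test ignore case, here is a small Python sketch. The name count_two_stage and the SAMPLE string are made up for this example; in Perl itself, the usual way to ignore case is the /i modifier on the match.

```python
import re

# The sample data from the question, embedded so the sketch runs on its own.
SAMPLE = """\
entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def count_two_stage(text):
    """Filter paragraphs by empType, then count ADID lines that carry a value."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    # Stage 1: keep paragraphs mentioning "empType: A", ignoring case.
    kept = [p for p in paragraphs if re.search(r"empType: A", p, re.IGNORECASE)]
    # Stage 2: count lines that start with "ADID: " plus a letter or digit,
    # ignoring case, similar in spirit to grep -i -c "^ADID: [0-9A-Za-z]".
    adid_line = re.compile(r"^ADID: [0-9A-Za-z]", re.IGNORECASE | re.MULTILINE)
    return sum(len(adid_line.findall(p)) for p in kept)


print(count_two_stage(SAMPLE))
```

Note that the stage 1 pattern is unanchored, as in the original command, so it would also keep an entry whose empType merely starts with A.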

answered Dec 14 '17 at 16:13

King of NES
