Getting a match count of objects in a file
I have a large file that has entries that look like this:

    entry-id: 1
    sn: John
    cn: Smith
    empType: A
    ADID: 123456

    entry-id: 2
    sn: James
    cn: Smith
    empType: B
    ADID: 123456

    entry-id: 3
    sn: Jobu
    cn: Smith
    empType: A
    ADID: 123456

    entry-id: 4
    sn: Jobu
    cn: Smith
    empType: A
    ADID:

Each entry is separated by a new line. I need a count of entries that have an empType of A and MUST ALSO have a value after ADID (a total of 2 in this sample). I've tried awk, grep and egrep, and I'm still having no luck. Any ideas?
linux command-line
asked Dec 14 '17 at 3:29
King of NES
What exactly did you try in awk? I would think something like
    awk -vRS= '/empType: A/ && /ADID: [0-9]+/ {n++} END {print n}' file
should work
- steeldriver, Dec 14 '17 at 3:39
Running your command, I got "awk: record `smapsHistory: [NDSEn...' too long record number 213244". There are only about 100 records with an employeeType of C, and it's going crazy....
- King of NES, Dec 14 '17 at 3:54
You did include the correct filename to read as input?
- bu5hman, Dec 14 '17 at 4:15
It was the correct file...
- King of NES, Dec 14 '17 at 4:18
6 Answers
Awk solution:

    awk '/empType: /{ f=($2=="A"? 1:0) } f && /ADID: [0-9]+/{ c++ } END{ print c }' file

- f - flag indicating empType: A section processing
- c - count of empType: A entries with filled ADID key

The output:

    2
edited Dec 14 '17 at 6:12
answered Dec 14 '17 at 6:00
RomanPerekhrest
Here is an alternative awk solution that uses blank line "" as record separator RS and new line \n as field separator FS

    BEGIN { RS=""; FS="\n" }
    {
        split($4,a,": ")
        split($5,b,": ")
    }
    a[2]=="A" && b[2]!="" { c++ }
    END { print c }

the script can be executed with

    awk -f main.awk file
answered Dec 14 '17 at 6:39
etopylight
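The awk script above reads empType and ADID from fixed positions ($4 and $5), so it relies on every entry listing its attributes in the same order. As a minimal sketch, assuming entries are separated by blank lines and every line has the form "key: value", the same count can be keyed on attribute names instead. The names SAMPLE, parse_entries and count_matches are made up for this illustration and do not come from the answer.

```python
import re

# Sample data copied from the question; blank lines separate entries.
SAMPLE = """\
entry-id: 1
sn: John
cn: Smith
empType: A
ADID: 123456

entry-id: 2
sn: James
cn: Smith
empType: B
ADID: 123456

entry-id: 3
sn: Jobu
cn: Smith
empType: A
ADID: 123456

entry-id: 4
sn: Jobu
cn: Smith
empType: A
ADID:
"""


def parse_entries(text):
    """Yield one dict per blank-line-separated entry (hypothetical helper)."""
    for paragraph in re.split(r"\n\s*\n", text):
        entry = {}
        for line in paragraph.splitlines():
            key, sep, value = line.partition(":")
            if sep:  # ignore lines that contain no colon
                entry[key.strip()] = value.strip()
        if entry:
            yield entry


def count_matches(text):
    """Count entries whose empType is A and whose ADID is non-empty."""
    return sum(
        1
        for entry in parse_entries(text)
        if entry.get("empType") == "A" and entry.get("ADID")
    )


print(count_matches(SAMPLE))  # prints 2 for the sample above
```

Looking fields up by name keeps the count correct when an entry carries extra attributes or lists them in a different order.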
Simple two grep method, where data is the input file:

    grep -A1 'empType: A' data | grep -c 'ADID: .\+'

Output:

    2
edited Dec 14 '17 at 7:15
answered Dec 14 '17 at 7:09
agc
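The two-grep pipeline above works because grep -A1 prints one line of trailing context, so it counts an entry only when the ADID line comes directly after the empType line. The following is a rough emulation of that adjacency rule, not a replacement for the pipeline. The function name count_adjacent and the extra "title: x" line are invented for the example.

```python
def count_adjacent(lines):
    """Count 'empType: A' lines whose very next line is 'ADID: ' plus a value.

    Roughly mirrors what `grep -A1 ... | grep -c ...` depends on:
    the ADID line must immediately follow the empType line.
    """
    prefix = "ADID: "
    count = 0
    previous_was_type_a = False
    for raw in lines:
        line = raw.rstrip("\n")
        if previous_was_type_a and line.startswith(prefix) and len(line) > len(prefix):
            count += 1
        previous_was_type_a = "empType: A" in line
    return count


adjacent = ["empType: A", "ADID: 123456"]
separated = ["empType: A", "title: x", "ADID: 123456"]

print(count_adjacent(adjacent))   # 1: ADID directly follows empType
print(count_adjacent(separated))  # 0: another attribute sits in between
```

If the real file puts other attributes between empType and ADID, a paragraph-based approach is the safer choice.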
I like the idea of getting the records that satisfy your requirements (better for e.g. testing) and counting them with wc -l. So here is an awk script that does just that:

    #!/usr/bin/env awk
    # getids.awk

    BEGIN {
        RS="";
        FS="\n"
    }

    /ADID: [0-9]/ && /empType: A/ {print $1}

And here it is in action:

    user@host:~$ awk -f getids.awk data.txt
    entry-id: 1
    entry-id: 3

    user@host:~$ awk -f getids.awk data.txt | wc -l
    2
Of course if you just want the count we can do that too:

    #!/usr/bin/env awk
    # count.awk

    BEGIN {
        RS="";
        FS="\n";
        count=0;
    }

    /ADID: [0-9]/ && /empType: A/ {count++}

    END {
        print count
    }
And because I love Python, here is a Python script that does the same thing:

    #!/usr/bin/env python2
    # -*- coding: ascii -*-
    """getids.py"""

    import sys

    # Create a list to store the matched records
    records = []

    # Iterate over the lines of the input file
    with open(sys.argv[1]) as data:
        for line in data:

            # When an "entry-id" is reached, create a new record
            if line.startswith('entry-id'):
                entry_id = line.split(':')[1].strip()
                records.append({'entry-id': entry_id})

            # For other lines, update the current record
            elif line.strip():
                key = line.partition(':')[0].strip()
                value = line.partition(':')[2].strip()
                records[-1][key] = value

    # Extract the list of records meeting the desired criteria
    # (.get avoids a KeyError when an entry lacks one of the keys)
    matches = [record for record in records if record.get('empType') == 'A' and record.get('ADID')]

    # Print out the entry-ids for all of the matches
    for match in matches:
        print('entry-id: ' + match['entry-id'])

And here's the Python script in action:

    user@host:~$ python getids.py data.txt
    entry-id: 1
    entry-id: 3

    user@host:~$ python getids.py data.txt | wc -l
    2
And if we really do just want the counts:

    #!/usr/bin/env python2
    # -*- coding: ascii -*-
    """count.py"""

    import sys

    # Keep a count of the number of matches
    count = 0

    # Use flags to keep track of the current record
    emptype_flag = False
    adid_flag = False

    # Iterate over the lines of the input file
    with open(sys.argv[1]) as data:
        for line in data:

            # When an "entry-id" is reached, reset the flags
            if line.startswith('entry-id'):
                emptype_flag = False
                adid_flag = False

            elif line.strip() == "empType: A":
                emptype_flag = True

            elif line.startswith("ADID") and line.strip().split(':')[1]:
                adid_flag = True

            # If both conditions hold then increment the counter
            # and reset the flags
            if emptype_flag and adid_flag:
                count = count + 1
                emptype_flag = False
                adid_flag = False

    # Print the number of matches
    print(count)
And, while we're at it, how about a pure Bash script? Here's one:

    #!/usr/bin/env bash
    # getids.bash

    while read line; do

        if [[ "$line" =~ "entry-id:" ]]; then
            entry_id="$line"
            emptype=false
            adid=false

        elif [[ "$line" =~ "empType: A" ]]; then
            emptype=true

        elif [[ "$line" =~ ADID:\ [0-9] ]]; then
            adid=true
        fi

        if [[ "$emptype" == true && "$adid" == true ]]; then
            echo "$entry_id"
            emptype=false
            adid=false
        fi

    done < "$1"

And running the bash script:

    user@host:~$ bash getids.bash data.txt
    entry-id: 1
    entry-id: 3
And finally, here's something using just grep and wc:

    user@host:~$ cat data.txt | grep -A1 'empType: A' | grep "ADID: \S" | wc -l
    2
edited Dec 14 '17 at 13:36
answered Dec 14 '17 at 5:39
igal
With perl, that could be:

    perl -l -00ne '
      my %f = /(.*?):\s*(.*)/g;
      ++$n if $f{empType} eq "A" && $f{ADID} ne "";
      END {print 0+$n}' < file

- -n causes the code given to -e to be applied to each input record
- -00 for records to be paragraphs.
- We build a %f associative array where key and values are mapped to each (key):spaces(value) in the record.
- and increment $n where the conditions are met.
- we print $n in the END (adding 0 to make sure we get 0 and not an empty string if there's no match).
edited Dec 14 '17 at 14:57
answered Dec 14 '17 at 14:14
Stéphane Chazelas
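For readers more at home in Python, the idea behind the perl answer (one regex pass per paragraph that turns every "key: value" line into a hash entry) can be sketched roughly as follows. This is an illustration under assumptions, not a translation of the answer: the names FIELD, fields and count_matches are invented here, and the pattern uses [ \t]* rather than \s* so that an empty value cannot absorb the following line.

```python
import re

# One match per "key: value" line; [ \t]* keeps the match on a single line.
FIELD = re.compile(r"^(.*?):[ \t]*(.*)$", re.MULTILINE)


def fields(paragraph):
    """Map each key in a paragraph to its value (hypothetical helper)."""
    return dict(FIELD.findall(paragraph))


def count_matches(text):
    """Count paragraphs with empType A and a non-empty ADID."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return sum(
        1
        for f in map(fields, paragraphs)
        if f.get("empType") == "A" and f.get("ADID", "") != ""
    )


sample = "entry-id: 1\nempType: A\nADID: 123456\n\nentry-id: 4\nempType: A\nADID:\n"
print(count_matches(sample))  # 1: only the first entry has an ADID value
```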
I wasn't able to do anything with -A on grep, and the other answers returned "label too long" or some other error.

What I did find that worked was

    perl -000 -ne 'print if/empType: A/' file.ldif|grep -i -c "^ADID: [0-9A-Za-z]"

I didn't know what perl -000 does, but I think it's saying to search multiple lines within a paragraph:

- -n: while loop
- -e: one line of program??
- print the paragraph if you find empType: A
- now pipe (|) those matched paragraphs to grep
- grep -i -c "^ADID:": find, ignoring case, and count the number of ADIDs.

I'm not sure if the other commands failed because of my Linux version, but the above command worked pretty well. I'm not sure how to make the empType match ignore case, though....
answered Dec 14 '17 at 16:13
King of NES
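On the open point about ignoring case for empType: Perl regular expressions accept an i modifier (as in /empType: A/i), which is the usual way to do it in the command above. The same idea is sketched below in Python as a hedged illustration. The pattern names are invented for the example, and the sketch assumes blank-line-separated entries. It is also a little stricter than the original command: empType must be exactly A, and any non-blank ADID value counts.

```python
import re

# Case-insensitive, line-anchored patterns (illustrative names only).
EMPTYPE_A = re.compile(r"^empType:[ \t]*A[ \t]*$", re.IGNORECASE | re.MULTILINE)
ADID_SET = re.compile(r"^ADID:[ \t]*\S", re.IGNORECASE | re.MULTILINE)


def count_matches(text):
    """Count paragraphs that have empType A (any case) and a non-empty ADID."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return sum(1 for p in paragraphs if EMPTYPE_A.search(p) and ADID_SET.search(p))


sample = "emptype: a\nadid: 123456\n\nEMPTYPE: A\nADID:\n\nempType: B\nADID: 1\n"
print(count_matches(sample))  # 1: only the first paragraph satisfies both tests
```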