Splitting text files based on a semi-regular expression

I have a number of pretty large text files that I want to split into a bunch of smaller files (the number of output files varies from file to file).
All of them follow the same pattern:

    {
      "id": 999999,
      "url": "https://***",
      "name": "****",
      "name_abbreviation": "****",
      "decision_date": "****",
      "docket_number": "****",
      "first_page": "***",
      "last_page": "***",
      "citations": [
        {
          "type": "***",
          "cite": "***"
        },
        {
          "type": "***",
          "cite": "***"
        }
      ],
      "volume": {
        "url": "https://***",
        "volume_number": "**"
      },
      "reporter": {
        "url": "***",
        "full_name": "***"
      },
      "court": {
        "url": "https://***",
        "id": ***,
        "slug": "***",
        "name": "***",
        "name_abbreviation": "***"
      },
      "jurisdiction": {
        "url": "https://***",
        "id": **,
        "slug": "**",
        "name": "***.",
        "name_long": "***",
        "whitelisted": ***
      },
      "casebody": {
        "status": "ok",
        "data": {
          "attorneys": [
            "****",
            "***"
          ],
          "opinions": [
            {
              "type": "***",
              "text": "INSERT MANY LINES OF TEXT",
              "author": "***"
            }
          ],
          "judges": [
            "***"
          ],
          "parties": [
            "***"
          ],
          "head_matter": "***"
        }
      }
    },

This block then repeats a variable number of times.
I am trying to split the file so that each of these repeats goes into its own new text file, i.e. from the first instance of "id": 99999, through the body of text and the final "head_matter" field, up to the point where the next "id": 99999 comes up.
My problem is that each block contains three '"id": ' patterns (the record's own, the court's and the jurisdiction's), but I only want to split at the first.
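A minimal Python sketch of the splitting rule described above. It is not awk or csplit, but the same logic fits in a single awk flag variable. It assumes every record ends with a "head_matter" line and that keys sit one per line as in the sample; split_records, write_records and the record_NNN.txt output names are hypothetical. An "id" line opens a new record only if it is the first "id" seen since the last "head_matter" line, so the court and jurisdiction ids never trigger a split.

```python
import re
from pathlib import Path

# An "id" key at the start of a line (leading whitespace allowed).
ID_LINE = re.compile(r'^\s*"id":')
# The "head_matter" key, assumed to be the last field of every record.
HEAD_MATTER_LINE = re.compile(r'^\s*"head_matter":')


def split_records(lines):
    """Group lines into records, splitting only at record-level "id" lines.

    An "id" line opens a new record only if it is the first one in the file
    or the first one since the last "head_matter" line, so the nested court
    and jurisdiction "id" lines never trigger a split. Lines before the first
    record (for example an opening "[") stay with the first record.
    """
    records = []
    current = []
    started = False         # have we seen the first record-level "id" yet?
    expecting_start = True  # the next "id" line begins a new record
    for line in lines:
        if expecting_start and ID_LINE.match(line):
            if started:
                records.append(current)
                current = []
            started = True
            expecting_start = False
        elif HEAD_MATTER_LINE.match(line):
            expecting_start = True
        current.append(line)
    if current:
        records.append(current)
    return records


def write_records(path, prefix="record_"):
    """Write each record to <prefix><NNN>.txt next to the input file."""
    src = Path(path)
    with src.open(encoding="utf-8") as fh:
        records = split_records(fh)
    for n, record in enumerate(records):
        out = src.with_name(f"{prefix}{n:03d}.txt")
        out.write_text("".join(record), encoding="utf-8")
    return len(records)
```

Like csplit, this cuts exactly at the "id" line, so the closing "}," and the next record's opening "{" end up at the tail of the previous file.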
[A solution using awk, grep or csplit would be preferable, since this is going into a larger C shell script.]
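If the text file keeps the JSON braces, nesting depth is another way to tell the record-level "id" apart: it is the only "id" key directly inside the record object (depth 1), while the court and jurisdiction ids sit one level deeper. The Python sketch below assumes records are top-level objects (optionally wrapped in [ ]) and ignores braces inside quoted strings; the function names are hypothetical.

```python
import re

# An "id" key at the start of a line (leading whitespace allowed).
ID_LINE = re.compile(r'^\s*"id":')


def top_level_id_indexes(lines):
    """Return indexes of "id" lines that sit directly inside a record object.

    Only { and } change the depth; characters inside double-quoted strings
    are skipped, and the in-string state is carried across line breaks in
    case a text field spans several lines.
    """
    indexes = []
    depth = 0
    in_string = False
    escaped = False
    for index, line in enumerate(lines):
        # Check the depth as it stands at the start of this line.
        if depth == 1 and not in_string and ID_LINE.match(line):
            indexes.append(index)
        for ch in line:
            if in_string:
                if escaped:
                    escaped = False
                elif ch == "\\":
                    escaped = True
                elif ch == '"':
                    in_string = False
            elif ch == '"':
                in_string = True
            elif ch == "{":
                depth += 1
            elif ch == "}":
                depth -= 1
    return indexes


def split_by_depth(lines):
    """Split lines into chunks that each begin at a record-level "id" line."""
    lines = list(lines)
    starts = top_level_id_indexes(lines)
    if not starts:
        return [lines] if lines else []
    starts[0] = 0  # keep any preamble such as "[" or "{" with the first record
    bounds = starts + [len(lines)]
    return [lines[a:b] for a, b in zip(bounds, bounds[1:])]
```

This does not depend on which field comes last in a record, but it does rely on the braces being present and balanced.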
regular-expression split
Perl is the tool you need. I'm on my phone, will explain more later. – waltinator, 3 hours ago
That looks suspiciously like JSON; is it? – Jeff Schaller, 2 hours ago
I have a JSON version! I've been working with the txt version, but I have access to both! – Sara Alexandra, 2 hours ago
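Since a JSON version is available, parsing it avoids the three-"id" ambiguity altogether: each element of the top-level array is one record, and nested "id" keys are never treated as boundaries. A minimal Python sketch (Perl, as suggested above, would work the same way), assuming the JSON file is either an array of record objects or a single object; records_from_json, write_json_records and the record_NNN.json names are hypothetical.

```python
import json
from pathlib import Path


def records_from_json(text):
    """Parse the JSON text and return its records as a list.

    Assumes the file holds either a JSON array of record objects or a
    single record object.
    """
    data = json.loads(text)
    return data if isinstance(data, list) else [data]


def write_json_records(path, prefix="record_"):
    """Write every record to <prefix><NNN>.json next to the input file.

    The naming scheme is a hypothetical choice; returns the record count.
    """
    src = Path(path)
    records = records_from_json(src.read_text(encoding="utf-8"))
    for n, record in enumerate(records):
        out = src.with_name(f"{prefix}{n:03d}.json")
        out.write_text(json.dumps(record, indent=2, ensure_ascii=False),
                       encoding="utf-8")
    return len(records)
```

json.loads parses the whole document in memory, which may matter for very large files. If the data instead has one JSON object per line (JSON Lines), calling json.loads on each line would replace the single parse.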
edited 3 hours ago by Rui F Ribeiro
asked 3 hours ago by Sara Alexandra