Splitting text files based on a semi-regular expression

I have a number of fairly large text files that I want to split into a bunch of smaller files (the number of output files will vary from file to file).



All of them follow the same pattern:



 "id": 999999,
"url": "https://***",
"name": "****",
"name_abbreviation": "****",
"decision_date": "****",
"docket_number": "****",
"first_page": "***",
"last_page": "***",
"citations": [

"type": "***",
"cite": "***"
,

"type": "***",
"cite": "***"

],
"volume":
"url": "https://***",
"volume_number": "**"
,
"reporter":
"url": "***",
"full_name": "***"
,
"court":
"url": "https://***",
"id": ***,
"slug": "***",
"name": "***",
"name_abbreviation": "***"
,
"jurisdiction":
"url": "https://***",
"id": **,
"slug": "**",
"name": "***.",
"name_long": "***",
"whitelisted": ***
,
"casebody":
"status": "ok",
"data":
"attorneys": [
"****",
"***"
],
"opinions": [

"type": "***",
"text": "INSERT MANY LINES OF TEXT",
"author": "***"

],
"judges": [
"***"
],
"parties": [
"***"
],
"head_matter": "***"


},


This pattern then repeats a variable number of times.



I am trying to write each of these repeats to its own new text file: everything from the first instance of "id": 99999, through the body of text and the final "head_matter" field, up to the point where the next record's "id": 99999 appears.



My problem is that each repeat contains three '"id": ' lines, but I only want to split at the first one.



[A solution using awk, grep, or csplit would be preferable, since this is going into a larger C shell script.]
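
One possible approach, as a minimal sketch rather than a definitive answer: let awk remember whether the previous record's "head_matter" line has already been seen, and start a new output file only at the first "id": line after it. This assumes, as in the sample above, that every record ends with a "head_matter" line and that neither "id": nor "head_matter": appears inside the opinion text. The file names cases.txt and case_NNN.txt and the two-record sample data are made up for illustration.

```shell
#!/bin/sh
# Build a tiny two-record sample (hypothetical data) in a scratch directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > cases.txt <<'EOF'
 "id": 111,
"url": "https://example.org/a",
"court":
"url": "https://example.org/c",
"id": 1,
"jurisdiction":
"url": "https://example.org/j",
"id": 2,
"head_matter": "first"
},
 "id": 222,
"url": "https://example.org/b",
"court":
"url": "https://example.org/c",
"id": 3,
"jurisdiction":
"url": "https://example.org/j",
"id": 4,
"head_matter": "second"
},
EOF

# Open a new output file only at the first "id": line seen at the start of
# the input or after the previous record's "head_matter" line. The nested
# "id": lines (court, jurisdiction) arrive while expect_start is 0, so they
# are simply copied into the current file.
awk '
    BEGIN { expect_start = 1 }
    /"id":/ && expect_start {
        if (out != "") close(out)              # avoid too many open files
        out = sprintf("case_%03d.txt", ++n)    # case_001.txt, case_002.txt, ...
        expect_start = 0
    }
    out != "" { print > (out) }                # copy the line to the current record
    /"head_matter":/ { expect_start = 1 }      # the next "id": starts a new record
' cases.txt

ls case_*.txt
```

With the sample above this should leave two files, case_001.txt and case_002.txt, each holding one full record including its nested "id": lines. Because csh does not cope well with multi-line quoted strings, it may be simplest to save the awk program in its own file (say split.awk, a hypothetical name) and call it from the C shell script as awk -f split.awk cases.txt.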










  • Perl is the tool you need. I'm on my phone, will explain more later.
    – waltinator
    3 hours ago

  • That looks suspiciously like JSON; is it?
    – Jeff Schaller
    2 hours ago

  • I have a JSON version! I've been working with the txt version, but I have access to both!
    – Sara Alexandra
    2 hours ago














Tags: regular-expression, split

edited 3 hours ago
Rui F Ribeiro

asked 3 hours ago
Sara Alexandra










