How to parse the file

I have the file1.txt shown below. I want to take value1 through value7 and output them in one row. The values are scanned between the words "Start" and "End". If a label/value is missing, the output should show "NA".

The desired output.txt is shown below.

In short, I want to copy the values between Start and End and output them on one line. If a label doesn't exist, its value should be shown as NA. The scan should then continue with the next record (Start through End) until the end of file1.txt.



file1.txt



Start

label1 label2 label3 label4

value1 value2 value3 value4

label5

value5

label6 label7

value6 value7

End


Start

label1 label2 label4

valueA valueB valueD

label5

valueE

label6

valueF

End


Start
.
.
.
End


output.txt



label1 label2 label3 label4 label5 label6 label7

value1 value2 value3 value4 value5 value6 value7

valueA valueB NA valueD valueE valueF NA






  • Did you try anything?
    – Kiwy
    May 9 at 14:09














edited May 9 at 13:41 by Jeff Schaller

asked May 9 at 13:39 by user290080





















1 Answer
This Python script should do what you want:



#!/usr/bin/env python
# -*- encoding: ascii -*-
"""parse.py

Parses a custom-format data-file.
Processes the file first and then prints the results.
"""

import sys

# Initialize a dictionary to collect the values for each label
# (columns are printed in first-seen order, which relies on
# dictionaries keeping insertion order, i.e. Python 3.7+)
labels = {}

# Initialize a stack to keep track of block state
stack = []

# Initialize a counter to count the number of blocks
block = 0

# Read and process the file line by line
with open(sys.argv[1], 'r') as infile:
    for line in infile:

        # Remove white-space
        line = line.strip()

        # The stack should be empty when we start a new block
        if line.lower() == "start":
            if stack:
                raise Exception("Invalid File Format: Bad Start")
            else:
                stack.append(line)

        # Otherwise the bottom of the stack should be a "Start"
        # When we reach the end of a block we empty the stack
        # and increment the block counter
        elif line.lower() == "end":
            if not stack or stack[0].lower() != "start":
                raise Exception("Invalid File Format: Bad End")
            else:
                block += 1
                stack = []

        # Other lines should come in consecutive label/value pairs
        # i.e. a value row should follow a label row
        elif line:

            # If there are an odd number of data rows in the stack then
            # the current row should be a value row - check that it matches
            # the corresponding label row
            if len(stack[1:]) % 2 == 1:

                _labels = stack[-1].split()
                _values = line.split()

                # Verify that the label row and value row have the same number
                # of columns
                if len(_labels) == len(_values):
                    stack.append(line)
                    for label, value in zip(_labels, _values):

                        # Add new labels to the labels dictionary
                        if label not in labels:
                            labels[label] = {
                                "cols": len(label)
                            }

                        # Add the value for the current block
                        labels[label][block] = value

                        # Keep track of the longest value for each label
                        # so we can format the output later
                        if len(value) > labels[label]["cols"]:
                            labels[label]["cols"] = len(value)
                else:
                    raise Exception("Invalid File Format: Label/Value Mismatch")

            # If there are an even number of data rows in the stack then
            # the current row should be a label row - append it to the stack
            else:
                stack.append(line)

# Construct the header row
header = ""
for label in labels:
    cols = labels[label]["cols"]
    header += "{0: <{width}}".format(label, width=cols+1)

# Construct the data rows
rows = []
for i in range(0, block):
    row = ""
    for label in labels:
        cols = labels[label]["cols"]
        row += "{0: <{width}}".format(labels[label].get(i, "NA"), width=cols+1)
    rows.append(row)

# Print the results
print(header)
for row in rows:
    print(row)
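
The header and data rows are padded with str.format so that the columns line up. As a small illustration of that one step, here is a sketch that assumes the format string is "{0: <{width}}" (fill with spaces, left-align, width taken from a nested field). The helper name pad is made up for this example and is not part of the script.

```python
def pad(cell, width):
    """Left-align cell in a field of the given width, filling with spaces."""
    # The inner {width} is substituted first, so width=7 yields "{0: <7}".
    return "{0: <{width}}".format(cell, width=width)


cells = ["valueA", "NA", "valueD"]
line = "".join(pad(c, 7) for c in cells)
print(repr(line))
```

A shorter cell such as "NA" is filled out with trailing spaces, so the next column still starts at the same offset.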


You can run it like this:



python parse.py file1.txt


It produces the following output on your example data:



label1 label2 label3 label4 label5 label6 label7
value1 value2 value3 value4 value5 value6 value7
valueA valueB NA     valueD valueE valueF NA
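
If you would rather have the parsing as a function you can reuse or test, the same idea can be sketched roughly as below. This is only a sketch under a few assumptions: the names parse_records, missing and sample are invented here, blank lines are ignored, and anything outside a Start/End block is skipped rather than reported as an error.

```python
def parse_records(lines, missing="NA"):
    """Return (labels, rows) for Start/End blocks of alternating label/value rows."""
    labels = []      # column order: the first time each label is seen
    records = []     # one {label: value} dict per completed block
    current = None   # dict for the block being read, or None outside a block
    pending = None   # label row still waiting for its value row

    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.lower() == "start":
            current, pending = {}, None
        elif line.lower() == "end":
            if current is not None:
                records.append(current)
            current = None
        elif current is not None:
            if pending is None:
                pending = line.split()
            else:
                values = line.split()
                if len(values) != len(pending):
                    raise ValueError("label/value mismatch: %r" % line)
                for label, value in zip(pending, values):
                    if label not in labels:
                        labels.append(label)
                    current[label] = value
                pending = None

    # Fill in the placeholder for labels a block never mentioned
    rows = [[rec.get(label, missing) for label in labels] for rec in records]
    return labels, rows


sample = """Start
label1 label2 label3 label4
value1 value2 value3 value4
label5
value5
label6 label7
value6 value7
End
Start
label1 label2 label4
valueA valueB valueD
label5
valueE
label6
valueF
End
"""

labels, rows = parse_records(sample.splitlines())
print(" ".join(labels))
for row in rows:
    print(" ".join(row))
```

Keeping the labels in a list (rather than relying on dictionary order) fixes the column order to first appearance on any Python version.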





answered May 10 at 0:01 by igal






















             
