script to translate xml to json

Clash Royale CLAN TAG#URR8PPP
.everyoneloves__top-leaderboard:empty,.everyoneloves__mid-leaderboard:empty,.everyoneloves__bot-mid-leaderboard:empty margin-bottom:0;
I have 5000 questions in txt file like this:
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
I want to write a script in Ubuntu to convert all questions like this:
"text":"The question her",
"answer1":"text",
"answer2":"text",
"answer3":"text",
"answer4":"text"
,
ubuntu text-processing scripting xml json
|
show 4 more comments
I have 5000 questions in txt file like this:
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
I want to write a script in Ubuntu to convert all questions like this:
"text":"The question her",
"answer1":"text",
"answer2":"text",
"answer3":"text",
"answer4":"text"
,
ubuntu text-processing scripting xml json
that's better be done in python perl etc.. than in pure bash. so OS is irrelevant, it's more stack overflow question
– dgan
Mar 7 at 10:11
Is this XML to JSON?
– Arkadiusz Drabczyk
Mar 7 at 10:13
yes it is - Ark
– Silver
Mar 7 at 10:14
What kind of assumptions can you make on the input? Are questions and answers always on separate lines, are the<quiz>and</quiz>tags always on their own on a line. Could the text have some encoding (CDATA,{,é....), could it contain double quotes or backslashes? Is the XML file encoded in UTF-8?
– Stéphane Chazelas
Mar 7 at 10:14
encoding="UTF-8" - all Questions yes - Stephane.
– Silver
Mar 7 at 10:16
|
show 4 more comments
I have 5000 questions in txt file like this:
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
I want to write a script in Ubuntu to convert all questions like this:
"text":"The question her",
"answer1":"text",
"answer2":"text",
"answer3":"text",
"answer4":"text"
,
ubuntu text-processing scripting xml json
I have 5000 questions in txt file like this:
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
I want to write a script in Ubuntu to convert all questions like this:
"text":"The question her",
"answer1":"text",
"answer2":"text",
"answer3":"text",
"answer4":"text"
,
ubuntu text-processing scripting xml json
ubuntu text-processing scripting xml json
edited Mar 7 at 10:28
Stéphane Chazelas
313k57592948
313k57592948
asked Mar 7 at 10:09
SilverSilver
1
1
that's better be done in python perl etc.. than in pure bash. so OS is irrelevant, it's more stack overflow question
– dgan
Mar 7 at 10:11
Is this XML to JSON?
– Arkadiusz Drabczyk
Mar 7 at 10:13
yes it is - Ark
– Silver
Mar 7 at 10:14
What kind of assumptions can you make on the input? Are questions and answers always on separate lines, are the<quiz>and</quiz>tags always on their own on a line. Could the text have some encoding (CDATA,{,é....), could it contain double quotes or backslashes? Is the XML file encoded in UTF-8?
– Stéphane Chazelas
Mar 7 at 10:14
encoding="UTF-8" - all Questions yes - Stephane.
– Silver
Mar 7 at 10:16
|
show 4 more comments
that's better be done in python perl etc.. than in pure bash. so OS is irrelevant, it's more stack overflow question
– dgan
Mar 7 at 10:11
Is this XML to JSON?
– Arkadiusz Drabczyk
Mar 7 at 10:13
yes it is - Ark
– Silver
Mar 7 at 10:14
What kind of assumptions can you make on the input? Are questions and answers always on separate lines, are the<quiz>and</quiz>tags always on their own on a line. Could the text have some encoding (CDATA,{,é....), could it contain double quotes or backslashes? Is the XML file encoded in UTF-8?
– Stéphane Chazelas
Mar 7 at 10:14
encoding="UTF-8" - all Questions yes - Stephane.
– Silver
Mar 7 at 10:16
that's better be done in python perl etc.. than in pure bash. so OS is irrelevant, it's more stack overflow question
– dgan
Mar 7 at 10:11
that's better be done in python perl etc.. than in pure bash. so OS is irrelevant, it's more stack overflow question
– dgan
Mar 7 at 10:11
Is this XML to JSON?
– Arkadiusz Drabczyk
Mar 7 at 10:13
Is this XML to JSON?
– Arkadiusz Drabczyk
Mar 7 at 10:13
yes it is - Ark
– Silver
Mar 7 at 10:14
yes it is - Ark
– Silver
Mar 7 at 10:14
What kind of assumptions can you make on the input? Are questions and answers always on separate lines, are the
<quiz> and </quiz> tags always on their own on a line. Could the text have some encoding (CDATA, {, é....), could it contain double quotes or backslashes? Is the XML file encoded in UTF-8?– Stéphane Chazelas
Mar 7 at 10:14
What kind of assumptions can you make on the input? Are questions and answers always on separate lines, are the
<quiz> and </quiz> tags always on their own on a line. Could the text have some encoding (CDATA, {, é....), could it contain double quotes or backslashes? Is the XML file encoded in UTF-8?– Stéphane Chazelas
Mar 7 at 10:14
encoding="UTF-8" - all Questions yes - Stephane.
– Silver
Mar 7 at 10:16
encoding="UTF-8" - all Questions yes - Stephane.
– Silver
Mar 7 at 10:16
|
show 4 more comments
3 Answers
3
active
oldest
votes
I assume your Ubuntu has python installed
#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree
d = """<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
"""
s = io.StringIO(d)
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out =
i = 1
for child in root:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
out[name] = value
print(json.dumps(out))
save it and chmod to executable
you can easily modify to take a file as input instead of just text
EDIT
Okey, this is a more complete script:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
print(json.dumps(out))
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
I made a file with following, which I named xmltest:
<questions>
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>
So you have a list of quiz inside some other container.
Now, I launch it like this:$ chmod u+x scratch.py, then scratch.py filenamewithxml
This gives me the answer:
$ ./scratch4.py xmltest
["answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text", "answer2": "stuff", "question": "Question number 1", "answer1": "blabla"]
I love this kind of code. Read in XML push out JSON. Wonderful.
– roaima
Mar 7 at 10:30
That only works if the input has just one<quiz>
– Stéphane Chazelas
Mar 7 at 10:36
add a comment |
in fact, you could get away here even w/o Python programming, just using 2 unix utilities:
jtm- that one allows xml <-> json lossless conversionjtc- that one allows manipulating JSONs
Thus, assuming your xml is in file.xml, jtm would convert it to the following json:
bash $ jtm file.xml
[
"quiz": [
"que": "The question her"
,
"ca": "text"
,
"ia": "text"
,
"ia": "text"
,
"ia": "text"
]
]
bash $
and then, applying a series of JSON transformations, you can arrive to a desired result:
bash $ jtm file.xml | jtc -w'<quiz>l:[1:][-2]' -ei echo '"answer[-]"': ; -i'<quiz>l:[1:]' | jtc -w'<quiz>l:[-1][:][0]' -w'<quiz>l:[-1][:]' -s | jtc -w'<quiz>l:' -w'<quiz>l:[0]' -s | jtc -w'<quiz>l: <>v' -u'"text"'
[
"answer1": "text",
"answer2": "text",
"answer3": "text",
"answer4": "text",
"text": "The question her"
]
bash $
Though, due to involved shell scripting (echo command), it'll be slower than Python's - for 5000 questions, I'd expect it would run around a minute. (In the future version of the jtc I plan to allow interpolations even in statically specified JSONs, so that for templating no external shell-scripting would be required, then the operations will be blazing fast)
if you're curious about jtc syntax, you could find a user guide here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md
add a comment |
Thank you dgan, but your code:
1- print the output in the screen not in json file, and not support encoding= utf-8, so i change it:
##!/usr/bin/python3
import json, codecs
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
with open('data.json', 'w') as outfile:
json.dump(out, codecs.getwriter('utf-8')(outfile), sort_keys=True, ensure_ascii=False)
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
`
add a comment |
Your Answer
StackExchange.ready(function()
var channelOptions =
tags: "".split(" "),
id: "106"
;
initTagRenderer("".split(" "), "".split(" "), channelOptions);
StackExchange.using("externalEditor", function()
// Have to fire editor after snippets, if snippets enabled
if (StackExchange.settings.snippets.snippetsEnabled)
StackExchange.using("snippets", function()
createEditor();
);
else
createEditor();
);
function createEditor()
StackExchange.prepareEditor(
heartbeatType: 'answer',
autoActivateHeartbeat: false,
convertImagesToLinks: false,
noModals: true,
showLowRepImageUploadWarning: true,
reputationToPostImages: null,
bindNavPrevention: true,
postfix: "",
imageUploader:
brandingHtml: "Powered by u003ca class="icon-imgur-white" href="https://imgur.com/"u003eu003c/au003e",
contentPolicyHtml: "User contributions licensed under u003ca href="https://creativecommons.org/licenses/by-sa/3.0/"u003ecc by-sa 3.0 with attribution requiredu003c/au003e u003ca href="https://stackoverflow.com/legal/content-policy"u003e(content policy)u003c/au003e",
allowUrls: true
,
onDemand: true,
discardSelector: ".discard-answer"
,immediatelyShowMarkdownHelp:true
);
);
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f504880%2fscript-to-translate-xml-to-json%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
3 Answers
3
active
oldest
votes
3 Answers
3
active
oldest
votes
active
oldest
votes
active
oldest
votes
I assume your Ubuntu has python installed
#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree
d = """<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
"""
s = io.StringIO(d)
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out =
i = 1
for child in root:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
out[name] = value
print(json.dumps(out))
save it and chmod to executable
you can easily modify to take a file as input instead of just text
EDIT
Okey, this is a more complete script:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
print(json.dumps(out))
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
I made a file with following, which I named xmltest:
<questions>
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>
So you have a list of quiz inside some other container.
Now, I launch it like this:$ chmod u+x scratch.py, then scratch.py filenamewithxml
This gives me the answer:
$ ./scratch4.py xmltest
["answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text", "answer2": "stuff", "question": "Question number 1", "answer1": "blabla"]
I love this kind of code. Read in XML push out JSON. Wonderful.
– roaima
Mar 7 at 10:30
That only works if the input has just one<quiz>
– Stéphane Chazelas
Mar 7 at 10:36
add a comment |
I assume your Ubuntu has python installed
#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree
d = """<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
"""
s = io.StringIO(d)
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out =
i = 1
for child in root:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
out[name] = value
print(json.dumps(out))
save it and chmod to executable
you can easily modify to take a file as input instead of just text
EDIT
Okey, this is a more complete script:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
print(json.dumps(out))
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
I made a file with following, which I named xmltest:
<questions>
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>
So you have a list of quiz inside some other container.
Now, I launch it like this:$ chmod u+x scratch.py, then scratch.py filenamewithxml
This gives me the answer:
$ ./scratch4.py xmltest
["answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text", "answer2": "stuff", "question": "Question number 1", "answer1": "blabla"]
I love this kind of code. Read in XML push out JSON. Wonderful.
– roaima
Mar 7 at 10:30
That only works if the input has just one<quiz>
– Stéphane Chazelas
Mar 7 at 10:36
add a comment |
I assume your Ubuntu has python installed
#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree
d = """<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
"""
s = io.StringIO(d)
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out =
i = 1
for child in root:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
out[name] = value
print(json.dumps(out))
save it and chmod to executable
you can easily modify to take a file as input instead of just text
EDIT
Okey, this is a more complete script:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
print(json.dumps(out))
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
I made a file with following, which I named xmltest:
<questions>
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>
So you have a list of quiz inside some other container.
Now, I launch it like this:$ chmod u+x scratch.py, then scratch.py filenamewithxml
This gives me the answer:
$ ./scratch4.py xmltest
["answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text", "answer2": "stuff", "question": "Question number 1", "answer1": "blabla"]
I assume your Ubuntu has python installed
#!/usr/bin/python3
import io
import json
import xml.etree.ElementTree
d = """<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
"""
s = io.StringIO(d)
# root = xml.etree.ElementTree.parse("filename_here").getroot()
root = xml.etree.ElementTree.parse(s).getroot()
out =
i = 1
for child in root:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
out[name] = value
print(json.dumps(out))
save it and chmod to executable
you can easily modify to take a file as input instead of just text
EDIT
Okey, this is a more complete script:
#!/usr/bin/python3
import json
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
print(json.dumps(out))
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
I made a file with following, which I named xmltest:
<questions>
<quiz>
<que>The question her</que>
<ca>text</ca>
<ia>text</ia>
<ia>text</ia>
<ia>text</ia>
</quiz>
<quiz>
<que>Question number 1</que>
<ca>blabla</ca>
<ia>stuff</ia>
</quiz>
</questions>
So you have a list of quiz inside some other container.
Now, I launch it like this:$ chmod u+x scratch.py, then scratch.py filenamewithxml
This gives me the answer:
$ ./scratch4.py xmltest
["answer3": "text", "answer2": "text", "question": "The question her", "answer4": "text", "answer1": "text", "answer2": "stuff", "question": "Question number 1", "answer1": "blabla"]
edited Mar 7 at 11:03
answered Mar 7 at 10:25
dgandgan
13919
13919
I love this kind of code. Read in XML push out JSON. Wonderful.
– roaima
Mar 7 at 10:30
That only works if the input has just one<quiz>
– Stéphane Chazelas
Mar 7 at 10:36
add a comment |
I love this kind of code. Read in XML push out JSON. Wonderful.
– roaima
Mar 7 at 10:30
That only works if the input has just one<quiz>
– Stéphane Chazelas
Mar 7 at 10:36
I love this kind of code. Read in XML push out JSON. Wonderful.
– roaima
Mar 7 at 10:30
I love this kind of code. Read in XML push out JSON. Wonderful.
– roaima
Mar 7 at 10:30
That only works if the input has just one
<quiz>– Stéphane Chazelas
Mar 7 at 10:36
That only works if the input has just one
<quiz>– Stéphane Chazelas
Mar 7 at 10:36
add a comment |
in fact, you could get away here even w/o Python programming, just using 2 unix utilities:
jtm- that one allows xml <-> json lossless conversionjtc- that one allows manipulating JSONs
Thus, assuming your xml is in file.xml, jtm would convert it to the following json:
bash $ jtm file.xml
[
"quiz": [
"que": "The question her"
,
"ca": "text"
,
"ia": "text"
,
"ia": "text"
,
"ia": "text"
]
]
bash $
and then, applying a series of JSON transformations, you can arrive to a desired result:
bash $ jtm file.xml | jtc -w'<quiz>l:[1:][-2]' -ei echo '"answer[-]"': ; -i'<quiz>l:[1:]' | jtc -w'<quiz>l:[-1][:][0]' -w'<quiz>l:[-1][:]' -s | jtc -w'<quiz>l:' -w'<quiz>l:[0]' -s | jtc -w'<quiz>l: <>v' -u'"text"'
[
"answer1": "text",
"answer2": "text",
"answer3": "text",
"answer4": "text",
"text": "The question her"
]
bash $
Though, due to involved shell scripting (echo command), it'll be slower than Python's - for 5000 questions, I'd expect it would run around a minute. (In the future version of the jtc I plan to allow interpolations even in statically specified JSONs, so that for templating no external shell-scripting would be required, then the operations will be blazing fast)
if you're curious about jtc syntax, you could find a user guide here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md
add a comment |
in fact, you could get away here even w/o Python programming, just using 2 unix utilities:
jtm- that one allows xml <-> json lossless conversionjtc- that one allows manipulating JSONs
Thus, assuming your xml is in file.xml, jtm would convert it to the following json:
bash $ jtm file.xml
[
"quiz": [
"que": "The question her"
,
"ca": "text"
,
"ia": "text"
,
"ia": "text"
,
"ia": "text"
]
]
bash $
and then, applying a series of JSON transformations, you can arrive to a desired result:
bash $ jtm file.xml | jtc -w'<quiz>l:[1:][-2]' -ei echo '"answer[-]"': ; -i'<quiz>l:[1:]' | jtc -w'<quiz>l:[-1][:][0]' -w'<quiz>l:[-1][:]' -s | jtc -w'<quiz>l:' -w'<quiz>l:[0]' -s | jtc -w'<quiz>l: <>v' -u'"text"'
[
"answer1": "text",
"answer2": "text",
"answer3": "text",
"answer4": "text",
"text": "The question her"
]
bash $
Though, due to involved shell scripting (echo command), it'll be slower than Python's - for 5000 questions, I'd expect it would run around a minute. (In the future version of the jtc I plan to allow interpolations even in statically specified JSONs, so that for templating no external shell-scripting would be required, then the operations will be blazing fast)
if you're curious about jtc syntax, you could find a user guide here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md
add a comment |
in fact, you could get away here even w/o Python programming, just using 2 unix utilities:
jtm- that one allows xml <-> json lossless conversionjtc- that one allows manipulating JSONs
Thus, assuming your xml is in file.xml, jtm would convert it to the following json:
bash $ jtm file.xml
[
"quiz": [
"que": "The question her"
,
"ca": "text"
,
"ia": "text"
,
"ia": "text"
,
"ia": "text"
]
]
bash $
and then, applying a series of JSON transformations, you can arrive to a desired result:
bash $ jtm file.xml | jtc -w'<quiz>l:[1:][-2]' -ei echo '"answer[-]"': ; -i'<quiz>l:[1:]' | jtc -w'<quiz>l:[-1][:][0]' -w'<quiz>l:[-1][:]' -s | jtc -w'<quiz>l:' -w'<quiz>l:[0]' -s | jtc -w'<quiz>l: <>v' -u'"text"'
[
"answer1": "text",
"answer2": "text",
"answer3": "text",
"answer4": "text",
"text": "The question her"
]
bash $
Though, due to involved shell scripting (echo command), it'll be slower than Python's - for 5000 questions, I'd expect it would run around a minute. (In the future version of the jtc I plan to allow interpolations even in statically specified JSONs, so that for templating no external shell-scripting would be required, then the operations will be blazing fast)
if you're curious about jtc syntax, you could find a user guide here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md
in fact, you could get away here even w/o Python programming, just using 2 unix utilities:
jtm- that one allows xml <-> json lossless conversionjtc- that one allows manipulating JSONs
Thus, assuming your xml is in file.xml, jtm would convert it to the following json:
bash $ jtm file.xml
[
"quiz": [
"que": "The question her"
,
"ca": "text"
,
"ia": "text"
,
"ia": "text"
,
"ia": "text"
]
]
bash $
and then, applying a series of JSON transformations, you can arrive to a desired result:
bash $ jtm file.xml | jtc -w'<quiz>l:[1:][-2]' -ei echo '"answer[-]"': ; -i'<quiz>l:[1:]' | jtc -w'<quiz>l:[-1][:][0]' -w'<quiz>l:[-1][:]' -s | jtc -w'<quiz>l:' -w'<quiz>l:[0]' -s | jtc -w'<quiz>l: <>v' -u'"text"'
[
"answer1": "text",
"answer2": "text",
"answer3": "text",
"answer4": "text",
"text": "The question her"
]
bash $
Though, due to involved shell scripting (echo command), it'll be slower than Python's - for 5000 questions, I'd expect it would run around a minute. (In the future version of the jtc I plan to allow interpolations even in statically specified JSONs, so that for templating no external shell-scripting would be required, then the operations will be blazing fast)
if you're curious about jtc syntax, you could find a user guide here: https://github.com/ldn-softdev/jtc/blob/master/User%20Guide.md
edited Mar 7 at 15:20
answered Mar 7 at 14:10
DmitryDmitry
812
812
add a comment |
add a comment |
Thank you dgan, but your code:
1- print the output in the screen not in json file, and not support encoding= utf-8, so i change it:
##!/usr/bin/python3
import json, codecs
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
with open('data.json', 'w') as outfile:
json.dump(out, codecs.getwriter('utf-8')(outfile), sort_keys=True, ensure_ascii=False)
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
`
add a comment |
Thank you dgan, but your code:
1- print the output in the screen not in json file, and not support encoding= utf-8, so i change it:
##!/usr/bin/python3
import json, codecs
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
with open('data.json', 'w') as outfile:
json.dump(out, codecs.getwriter('utf-8')(outfile), sort_keys=True, ensure_ascii=False)
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
`
add a comment |
Thank you dgan, but your code:
1- print the output in the screen not in json file, and not support encoding= utf-8, so i change it:
##!/usr/bin/python3
import json, codecs
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
with open('data.json', 'w') as outfile:
json.dump(out, codecs.getwriter('utf-8')(outfile), sort_keys=True, ensure_ascii=False)
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
`
Thank you dgan, but your code:
1- print the output in the screen not in json file, and not support encoding= utf-8, so i change it:
##!/usr/bin/python3
import json, codecs
import sys
import xml.etree.ElementTree
def read_file(filename):
root = xml.etree.ElementTree.parse(filename).getroot()
return root
# assule we have a list of <quiz>, contained in some other element
def parse_quiz(quiz_element, out):
i = 1
tmp =
for child in quiz_element:
name, value = child.tag, child.text
if name == 'que':
name = 'question'
else:
name = 'answer%s' % i
i += 1
tmp[name] = value
out.append(tmp)
def parse_root(root_element, out):
for child in root_element:
if child.tag == 'quiz':
parse_quiz(child, out)
def convert_xml_to_json(filename):
root = read_file(filename)
out =
parse_root(root, out)
with open('data.json', 'w') as outfile:
json.dump(out, codecs.getwriter('utf-8')(outfile), sort_keys=True, ensure_ascii=False)
if __name__ == '__main__':
if len(sys.argv) > 1:
convert_xml_to_json(sys.argv[1])
else:
print("Usage: script <filename_with_xml>")
`
edited Mar 7 at 16:11
answered Mar 7 at 15:08
silversilver
133
133
add a comment |
add a comment |
Thanks for contributing an answer to Unix & Linux Stack Exchange!
- Please be sure to answer the question. Provide details and share your research!
But avoid …
- Asking for help, clarification, or responding to other answers.
- Making statements based on opinion; back them up with references or personal experience.
To learn more, see our tips on writing great answers.
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
StackExchange.ready(
function ()
StackExchange.openid.initPostLogin('.new-post-login', 'https%3a%2f%2funix.stackexchange.com%2fquestions%2f504880%2fscript-to-translate-xml-to-json%23new-answer', 'question_page');
);
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Sign up or log in
StackExchange.ready(function ()
StackExchange.helpers.onClickDraftSave('#login-link');
);
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Sign up using Google
Sign up using Facebook
Sign up using Email and Password
Post as a guest
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
Required, but never shown
that's better be done in python perl etc.. than in pure bash. so OS is irrelevant, it's more stack overflow question
– dgan
Mar 7 at 10:11
Is this XML to JSON?
– Arkadiusz Drabczyk
Mar 7 at 10:13
yes it is - Ark
– Silver
Mar 7 at 10:14
What kind of assumptions can you make on the input? Are questions and answers always on separate lines, are the
<quiz>and</quiz>tags always on their own on a line. Could the text have some encoding (CDATA,{,é....), could it contain double quotes or backslashes? Is the XML file encoded in UTF-8?– Stéphane Chazelas
Mar 7 at 10:14
encoding="UTF-8" - all Questions yes - Stephane.
– Silver
Mar 7 at 10:16