Parsing a string of key-value pairs as a dictionary













6












$begingroup$


I often use nested list and dictionary comprehensions to parse unstructured data, and this is a typical example.



In [14]: data = """
41:n
43:n
44:n
46:n
47:n
49:n
50:n
51:n
52:n
53:n
54:n
55:cm
56:n
57:n
58:n"""
In [15]: {int(line.split(":")[0]): line.split(":")[1] for line in data.split("\n") if len(line.split(":")) == 2}
Out[15]:
{41: 'n',
 43: 'n',
 44: 'n',
 46: 'n',
 47: 'n',
 49: 'n',
 50: 'n',
 51: 'n',
 52: 'n',
 53: 'n',
 54: 'n',
 55: 'cm',
 56: 'n',
 57: 'n',
 58: 'n'}


Here I am calling line.split(":") three times. Is there a better way to do this?





















$endgroup$







  • 1




    $begingroup$
    This would benefit from a better description of "unstructured data". The presented example is very well structured and could be eval'd as a dict with only minor changes.
    $endgroup$
    – TemporalWolf
    Mar 1 at 18:48






















python python-3.x parsing dictionary














edited Mar 1 at 8:15









200_success











asked Mar 1 at 4:15









Rahul Patel

















4 Answers


















9












$begingroup$

Your string looks very similar to YAML syntax. Indeed, it is almost a valid YAML mapping; only the space after each : is missing. So why not use a YAML parser?



import yaml

data = """
41:n
43:n
44:n
46:n
47:n
49:n
50:n
51:n
52:n
53:n
54:n
55:cm
56:n
57:n
58:n"""

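# safe_load only builds plain Python objects; recent PyYAML versions
# require an explicit Loader argument for yaml.load.
print(yaml.safe_load(data.replace(":", ": ")))
# {41: 'n',
#  43: 'n',
#  44: 'n',
#  46: 'n',
#  47: 'n',
#  49: 'n',
#  50: 'n',
#  51: 'n',
#  52: 'n',
#  53: 'n',
#  54: 'n',
#  55: 'cm',
#  56: 'n',
#  57: 'n',
#  58: 'n'}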


You might have to install it first, which you can do via pip install pyyaml.















$endgroup$












  • $begingroup$
    Thanks. This is wonderful.
    $endgroup$
    – Rahul Patel
    Mar 2 at 5:49






  • 1




    $begingroup$
    I believe that whenever a parser exists for a format (csv, xml, json, yaml, config and so on), you should try it first. Therefore I accept this as the answer. I had heard about YAML but never looked into it. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 2 at 10:08


















10












$begingroup$

Note that the comprehension is much easier to read if you break it across several lines instead of writing it all on one line.



You could use unpacking to avoid the repeated calls to line.split:



>>> {
...     int(k): v
...     for line in data.split()
...     for k, v in (line.split(':'),)
... }
{41: 'n', 43: 'n', 44: 'n', 46: 'n', 47: 'n', 49: 'n', 50: 'n', 51: 'n', 52: 'n', 53: 'n', 54: 'n', 55: 'cm', 56: 'n', 57: 'n', 58: 'n'}
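The inner for clause above iterates over a one-element tuple, so it runs exactly once per line and only serves to bind k and v. As a minimal sketch, assuming a shortened sample in the question's key:value format, the same result can be had by moving the split into a nested generator expression:

```python
# Shortened sample input, assumed to follow the question's "key:value" format.
data = """
41:n
55:cm
58:n"""

# One-element-tuple idiom: the inner loop runs once per line and binds k, v.
with_tuple = {
    int(k): v
    for line in data.split()
    for k, v in (line.split(":"),)
}

# Equivalent form: split each line once in a generator, then unpack the pairs.
pairs = (line.split(":") for line in data.split())
with_generator = {int(k): v for k, v in pairs}

print(with_tuple)      # {41: 'n', 55: 'cm', 58: 'n'}
print(with_generator)  # {41: 'n', 55: 'cm', 58: 'n'}
```

Which form reads better is a matter of taste; both call line.split only once per line.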


Or, if the keys can stay as str, you could use dict().



dict() accepts an iterable of two-item sequences, so it turns each line.split(':') result into a key-value pair for you:



>>> dict(
...     line.split(':')
...     for line in data.split()
... )
{'41': 'n', '43': 'n', '44': 'n', '46': 'n', '47': 'n', '49': 'n', '50': 'n', '51': 'n', '52': 'n', '53': 'n', '54': 'n', '55': 'cm', '56': 'n', '57': 'n', '58': 'n'}
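Note that dict() raises an error if any line does not split into exactly two parts, whereas the comprehension in the question filtered such lines out. If that filter matters, one possible sketch, assuming Python 3.8 or later for the := assignment expression and a made-up malformed line for illustration, keeps the check while still splitting each line only once:

```python
# Sample input with one deliberately malformed line (an assumption for the demo).
data = """
41:n
malformed line
55:cm
58:n"""

# The condition is evaluated before the key and value expressions,
# so `parts` is already bound when int(parts[0]) runs.
result = {
    int(parts[0]): parts[1]
    for line in data.split("\n")
    if len(parts := line.split(":")) == 2
}

print(result)  # {41: 'n', 55: 'cm', 58: 'n'}
```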
















$endgroup$












  • $begingroup$
    This is great. This will be very helpful in the future.
    $endgroup$
    – Rahul Patel
    Mar 2 at 5:58


















9












$begingroup$

You have too much logic in the dict comprehension:




{int(line.split(":")[0]): line.split(":")[1] for line in data.split("\n") if len(line.split(":")) == 2}



First of all, let's expand it to a normal for-loop:



>>> result = {}
>>> for line in data.split("\n"):
...     if len(line.split(":")) == 2:
...         result[int(line.split(":")[0])] = line.split(":")[1]
...
>>> result


I can see that you use the check if len(line.split(":")) == 2: to skip the empty string that data.split("\n") produces at the start:



>>> data.split("\n")
['',
'41:n',
'43:n',
...
'58:n']


But the docs for str.split note that calling str.split() without a sep argument splits on runs of whitespace and leaves no empty strings at the start or end of the result:



>>> data.split()
['41:n',
'43:n',
...
'58:n']
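The difference between the splitting variants is easy to check on a small string. The sketch below uses a made-up sample with a blank line in the middle; the last line shows a caveat worth keeping in mind, namely that split() without arguments also breaks on spaces inside a line:

```python
# Made-up sample with leading, trailing and interior blank lines.
data = "\n41:n\n\n55:cm\n"

by_newline = data.split("\n")   # explicit separator keeps empty strings
by_whitespace = data.split()    # no separator: empty strings are dropped
by_lines = data.splitlines()    # line-based, keeps blank lines inside the text

print(by_newline)     # ['', '41:n', '', '55:cm', '']
print(by_whitespace)  # ['41:n', '55:cm']
print(by_lines)       # ['', '41:n', '', '55:cm']

# Caveat: split() also breaks on spaces inside a line.
spaced = "41:a b".split()
print(spaced)         # ['41:a', 'b']
```

So data.split() is a good fit only as long as neither keys nor values contain whitespace.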


So now we can remove the unnecessary check from your code:



>>> result = {}
>>> for line in data.split():
...     result[int(line.split(":")[0])] = line.split(":")[1]
...
>>> result


Here you compute line.split(":") twice. Do it once and unpack the result:



>>> result = {}
>>> for line in data.split():
...     key, value = line.split(":")
...     result[int(key)] = value
...
>>> result


This is the most basic version. Don't put it back into a dict comprehension, as it will still look quite complex. But you could make a function out of it. For example, something like this:



>>> def to_key_value(line, sep=':'):
...     key, value = line.split(sep)
...     return int(key), value
...
>>> dict(map(to_key_value, data.split()))
{41: 'n',
 43: 'n',
 ...
 58: 'n'}
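One thing a helper function makes easy is tightening the parsing rules in a single place. As a hedged sketch, here is a hypothetical variant of to_key_value that splits at the first separator only (via str.partition), so values may themselves contain a colon, and that reports lines without any separator explicitly. The sample lines are made up for illustration:

```python
def to_key_value(line, sep=":"):
    """Split at the first separator only; hypothetical variant of the helper."""
    key, found, value = line.partition(sep)
    if not found:
        # partition returns an empty middle element when sep is absent
        raise ValueError(f"missing {sep!r} in line: {line!r}")
    return int(key), value


lines = ["41:n", "55:cm", "60:a:b"]
parsed = dict(map(to_key_value, lines))
print(parsed)  # {41: 'n', 55: 'cm', 60: 'a:b'}
```

With the plain line.split(sep) version, the line "60:a:b" would fail to unpack into two names instead.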


Another option that I came up with:



>>> from functools import partial
>>> lines = data.split()
>>> split_by_colon = partial(str.split, sep=':')
>>> key_value_pairs = map(split_by_colon, lines)
>>> {int(key): value for key, value in key_value_pairs}
{41: 'n',
 43: 'n',
 ...
 58: 'n'}


Also, if you don't want to keep the whole list returned by data.split in memory, you might find this Stack Overflow question helpful: "Is there a generator version of string.split() in Python?"
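One way to split lazily, shown here only as a sketch and not necessarily what the linked question recommends, is to let re.finditer yield the non-whitespace chunks one at a time. The helper name iter_chunks is made up for this example:

```python
import re


def iter_chunks(text):
    """Yield whitespace-separated chunks lazily (hypothetical helper)."""
    for match in re.finditer(r"\S+", text):
        yield match.group()


data = "\n41:n\n55:cm\n58:n"

# Only one chunk is held at a time while the dict is being built.
result = {int(k): v for k, v in (chunk.split(":") for chunk in iter_chunks(data))}
print(result)  # {41: 'n', 55: 'cm', 58: 'n'}
```

For input of this size the saving is negligible; it only starts to matter for very large strings.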















$endgroup$












  • $begingroup$
    I said I wanted a solution using a list/dict comprehension. Your solution is nice but looks ugly. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 1 at 12:09






  • 2




    $begingroup$
    @RahulPatel: You might want to learn constructive criticism and some diplomacy ;) Georgy was nice enough to spend time on your problem...
    $endgroup$
    – Eric Duminil
    Mar 1 at 21:01










  • $begingroup$
    @EricDuminil Totally agree. I really apologize. He really spent time on my problem, which I didn't deserve. Thanks, Georgy.
    $endgroup$
    – Rahul Patel
    Mar 2 at 5:47











  • $begingroup$
    @RahulPatel No worries! I didn't understand from the question that you wanted a solution using only a dict comprehension. In any case, if my approach didn't work well for you, it may work for someone else, considering that the problem is pretty generic.
    $endgroup$
    – Georgy
    Mar 2 at 9:06










  • $begingroup$
    Thank you @Georgy. I am a very basic scripter who doesn't yet understand advanced solutions like yours. Once I understand enough, this might come in handy. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 2 at 10:03


















8












$begingroup$

There's nothing wrong with the solution you have come up with, but if you want an alternative, regex might come in handy here:



In [10]: import re
In [11]: data = """
...: 41:n
...: 43:n
...: 44:n
...: 46:n
...: 47:n
...: 49:n
...: 50:n
...: 51:n
...: 52:n
...: 53:n
...: 54:n
...: 55:cm
...: 56:n
...: 57:n
...: 58:n"""

In [12]: dict(re.findall(r'(\d+):(.*)', data))
Out[12]:
{'41': 'n',
 '43': 'n',
 '44': 'n',
 '46': 'n',
 '47': 'n',
 '49': 'n',
 '50': 'n',
 '51': 'n',
 '52': 'n',
 '53': 'n',
 '54': 'n',
 '55': 'cm',
 '56': 'n',
 '57': 'n',
 '58': 'n'}


Explanation:



1st capturing group (\d+):



\d matches a digit (equal to [0-9])
+ quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)

: matches the character : literally (case sensitive)



2nd capturing group (.*):



. matches any character (except for line terminators)
* quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)



If there might be letters in the first matching group (though I doubt it, since you're casting it to an int), you might want to use:



dict(re.findall(r'(.*):(.*)', data))
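Two details of these patterns are worth seeing on a concrete input. re.findall returns the keys as strings, so an int conversion is still needed if integer keys are wanted, and the greedy first group in (.*):(.*) splits each line at its last colon. A small sketch, using a made-up sample whose last value contains a colon:

```python
import re

# Made-up sample; the last line has a colon inside the value.
data = "\n41:n\n55:cm\n60:a:b"

# \d+ restricts keys to digits; findall yields (key, value) tuples of strings.
digits = {int(k): v for k, v in re.findall(r"(\d+):(.*)", data)}
print(digits)  # {41: 'n', 55: 'cm', 60: 'a:b'}

# The greedy first group in (.*):(.*) splits at the LAST colon on each line.
greedy = dict(re.findall(r"(.*):(.*)", data))
print(greedy)  # {'41': 'n', '55': 'cm', '60:a': 'b'}
```

If keys may contain letters but not colons, a first group such as ([^:\n]+) would split at the first colon instead.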


I usually prefer using split()s over regexes because I feel like I have more control over the functionality of the code.



You might ask, why would you want to use the more complicated and verbose syntax of regular expressions rather than the more intuitive and simple string methods? Sometimes, the advantage is that regular expressions offer far more flexibility.




Regarding @Rahul's comment about speed, I'd say it depends:



Although string manipulation will usually be somewhat faster, the actual performance heavily depends on a number of factors, including:



  • How many times you parse the regex

  • How cleverly you write your string code

  • Whether the regex is precompiled

As the regex gets more complicated, it will take much more effort and complexity to write equivalent string manipulation code that performs well.



As far as I can tell, string operations will almost always beat regular expressions. But the more complex the pattern gets, the harder it becomes for string operations to keep up, not only in performance but also in maintainability.
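Rather than guessing, the two approaches can be timed on representative input. The following is a rough sketch with synthetic data (1000 generated lines, an assumption made purely for the benchmark) and a precompiled pattern. Actual numbers depend on the machine and Python version, so no timings are shown:

```python
import re
import timeit

# Synthetic input in the question's format: "0:n", "1:n", ...
data = "\n" + "\n".join(f"{i}:n" for i in range(1000))

PAIR = re.compile(r"(\d+):(.*)")  # compiled once, reused on every call


def with_split():
    return {int(k): v for k, v in (line.split(":") for line in data.split())}


def with_regex():
    return {int(k): v for k, v in PAIR.findall(data)}


# Both approaches must agree before their timings are worth comparing.
assert with_split() == with_regex()

for func in (with_split, with_regex):
    seconds = timeit.timeit(func, number=200)
    print(f"{func.__name__}: {seconds:.4f} s")
```

Python's re module also keeps an internal cache of recently compiled patterns, so the gain from precompiling is often small for a single pattern.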

















$endgroup$












  • $begingroup$
    Yeah. I think regexes are slow too.
    $endgroup$
    – Rahul Patel
    Mar 1 at 7:29























































answered Mar 1 at 15:59 by Graipher (author of the YAML answer)
































answered Mar 1 at 10:53, edited Mar 1 at 12:55, by Ludisposed (author of the unpacking and dict() answer)






















9












$begingroup$

You have too much logic in the dict comprehension:




int(line.split(":")[0]):line.split(":")[1] for line in data.split("n") if len(line.split(":"))==2



First of all, let's expand it to a normal for-loop:



>>> result = 
>>> for line in data.split("n"):
... if len(line.split(":"))==2:
... result[int(line.split(":")[0])] = line.split(":")[1]
>>> result


I can see that you use the following check if len(line.split(":"))==2: to eliminate the first blank space from the data.split("n"):



>>> data.split("n")
['',
'41:n',
'43:n',
...
'58:n']


But the docs for str.split advice to use str.split() without specifying a sep parameter if you wanna discard the empty string at the beginning:



>>> data.split()
['41:n',
'43:n',
...
'58:n']


So, now we can remove unnecessary check from your code:



>>> result = 
>>> for line in data.split():
... result[int(line.split(":")[0])] = line.split(":")[1]
>>> result


Here you calculate line.split(":") twice. Take it out:



>>> result = 
>>> for line in data.split():
... key, value = line.split(":")
... result[int(key)] = value
>>> result


This is the most basic version. Don't put it back to a dict comprehension as it will still look quite complex. But you could make a function out of it. For example, something like this:



>>> def to_key_value(line, sep=':'):
... key, value = line.split(sep)
... return int(key), value

>>> dict(map(to_key_value, data.split()))
41: 'n',
43: 'n',
...
58: 'n'


Another option that I came up with:



>>> from functools import partial
>>> lines = data.split()
>>> split_by_colon = partial(str.split, sep=':')
>>> key_value_pairs = map(split_by_colon, lines)
>>> int(key): value for key, value in key_value_pairs
41: 'n',
43: 'n',
...
58: 'n'


Also, if you don't want to keep in memory a list of results from data.split, you might find this helpful: Is there a generator version of string.split() in Python?






share|improve this answer









$endgroup$












  • $begingroup$
    I said I want solution for list/dict comprehension. Your solution is nice but looks ugly. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 1 at 12:09






  • 2




    $begingroup$
    @RahulPatel: You might want to learn constructive criticism and some diplomacy ;) Georgy was nice enough to spend time on your problem...
    $endgroup$
    – Eric Duminil
    Mar 1 at 21:01










  • $begingroup$
    @EricDuminil Totally agree. I really apologize here. He has really spend time for me who don't deserve. Thanks Georgy
    $endgroup$
    – Rahul Patel
    Mar 2 at 5:47











  • $begingroup$
    @RahulPatel No worries! I didn't understand from the question that you wanted solution only with dict comprehension. In any case, if my approach didn't work well for you, it may work for someone else, considering that the problem is pretty generic.
    $endgroup$
    – Georgy
    Mar 2 at 9:06










  • $begingroup$
    Thank you @Georgy. I am a very basic scripter who actually don't understands advanced solution like one of yours. once I understand enough, this might come in handy. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 2 at 10:03















9












$begingroup$

You have too much logic in the dict comprehension:




int(line.split(":")[0]):line.split(":")[1] for line in data.split("n") if len(line.split(":"))==2



First of all, let's expand it to a normal for-loop:



>>> result = 
>>> for line in data.split("n"):
... if len(line.split(":"))==2:
... result[int(line.split(":")[0])] = line.split(":")[1]
>>> result


I can see that you use the following check if len(line.split(":"))==2: to eliminate the first blank space from the data.split("n"):



>>> data.split("n")
['',
'41:n',
'43:n',
...
'58:n']


But the docs for str.split advice to use str.split() without specifying a sep parameter if you wanna discard the empty string at the beginning:



>>> data.split()
['41:n',
'43:n',
...
'58:n']


So, now we can remove unnecessary check from your code:



>>> result = 
>>> for line in data.split():
... result[int(line.split(":")[0])] = line.split(":")[1]
>>> result


Here you calculate line.split(":") twice. Take it out:



>>> result = 
>>> for line in data.split():
... key, value = line.split(":")
... result[int(key)] = value
>>> result


This is the most basic version. Don't put it back to a dict comprehension as it will still look quite complex. But you could make a function out of it. For example, something like this:



>>> def to_key_value(line, sep=':'):
... key, value = line.split(sep)
... return int(key), value

>>> dict(map(to_key_value, data.split()))
41: 'n',
43: 'n',
...
58: 'n'


Another option that I came up with:



>>> from functools import partial
>>> lines = data.split()
>>> split_by_colon = partial(str.split, sep=':')
>>> key_value_pairs = map(split_by_colon, lines)
>>> int(key): value for key, value in key_value_pairs
41: 'n',
43: 'n',
...
58: 'n'


Also, if you don't want to keep in memory a list of results from data.split, you might find this helpful: Is there a generator version of string.split() in Python?






share|improve this answer









$endgroup$












  • $begingroup$
    I said I want solution for list/dict comprehension. Your solution is nice but looks ugly. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 1 at 12:09






  • 2




    $begingroup$
    @RahulPatel: You might want to learn constructive criticism and some diplomacy ;) Georgy was nice enough to spend time on your problem...
    $endgroup$
    – Eric Duminil
    Mar 1 at 21:01










  • $begingroup$
    @EricDuminil Totally agree. I really apologize here. He has really spend time for me who don't deserve. Thanks Georgy
    $endgroup$
    – Rahul Patel
    Mar 2 at 5:47











  • $begingroup$
    @RahulPatel No worries! I didn't understand from the question that you wanted solution only with dict comprehension. In any case, if my approach didn't work well for you, it may work for someone else, considering that the problem is pretty generic.
    $endgroup$
    – Georgy
    Mar 2 at 9:06










  • $begingroup$
    Thank you @Georgy. I am a very basic scripter who actually don't understands advanced solution like one of yours. once I understand enough, this might come in handy. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 2 at 10:03













9












9








9





$begingroup$

You have too much logic in the dict comprehension:




int(line.split(":")[0]):line.split(":")[1] for line in data.split("n") if len(line.split(":"))==2



First of all, let's expand it to a normal for-loop:



>>> result = 
>>> for line in data.split("n"):
... if len(line.split(":"))==2:
... result[int(line.split(":")[0])] = line.split(":")[1]
>>> result


I can see that you use the following check if len(line.split(":"))==2: to eliminate the first blank space from the data.split("n"):



>>> data.split("n")
['',
'41:n',
'43:n',
...
'58:n']


But the docs for str.split advice to use str.split() without specifying a sep parameter if you wanna discard the empty string at the beginning:



>>> data.split()
['41:n',
'43:n',
...
'58:n']


So, now we can remove unnecessary check from your code:



>>> result = 
>>> for line in data.split():
... result[int(line.split(":")[0])] = line.split(":")[1]
>>> result


Here you calculate line.split(":") twice. Take it out:



>>> result = 
>>> for line in data.split():
... key, value = line.split(":")
... result[int(key)] = value
>>> result


This is the most basic version. Don't put it back to a dict comprehension as it will still look quite complex. But you could make a function out of it. For example, something like this:



>>> def to_key_value(line, sep=':'):
...     key, value = line.split(sep)
...     return int(key), value
...
>>> dict(map(to_key_value, data.split()))
{41: 'n',
 43: 'n',
 ...
 58: 'n'}
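
One caveat with unpacking line.split(sep): it raises ValueError when a token has no separator, or when the value itself contains one. Below is a minimal sketch of a more tolerant variant. It assumes that only the first separator should split the token and that tokens without a separator can be skipped silently. The helper parse_pairs and the sample string are hypothetical, made up for illustration only.

```python
def to_key_value(line, sep=":"):
    """Split on the first separator only, so the value may contain sep."""
    key, value = line.split(sep, 1)  # maxsplit=1 keeps "c:m" intact in the value
    return int(key), value


def parse_pairs(text, sep=":"):
    """Hypothetical helper: build a dict, skipping tokens without a separator."""
    return dict(to_key_value(tok, sep) for tok in text.split() if sep in tok)


# Made-up sample with a colon inside a value and one malformed token.
sample = "\n41:n\n55:c:m\nbroken\n58:n"
print(parse_pairs(sample))
```

A non-numeric key would still raise ValueError in int(key). Whether to skip, log, or fail on such input depends on how strict the parser should be.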


Another option that I came up with:



>>> from functools import partial
>>> lines = data.split()
>>> split_by_colon = partial(str.split, sep=':')
>>> key_value_pairs = map(split_by_colon, lines)
>>> {int(key): value for key, value in key_value_pairs}
{41: 'n',
 43: 'n',
 ...
 58: 'n'}
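
For readers who want to keep the original single dict comprehension, including its length check, here is a sketch that splits each line only once. It assumes Python 3.8 or newer, because it relies on an assignment expression (the := operator). The name parts and the shortened sample data are illustrative.

```python
data = "\n41:n\n43:n\n55:cm\n58:n"

# The filter clause runs first for each line, so `parts` is already bound
# when the key and value expressions are evaluated.
result = {
    int(parts[0]): parts[1]
    for line in data.split("\n")
    if len(parts := line.split(":")) == 2
}
print(result)
```

This keeps the len(...) == 2 filter from the question, so the empty first line is still skipped without switching to data.split().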


Also, if you don't want to keep the list returned by data.split() in memory, you might find this helpful: Is there a generator version of string.split() in Python?
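
As a rough illustration of that idea, the sketch below yields tokens lazily with re.finditer instead of building the whole list first. This is one possible approach and not necessarily the one recommended in the linked question. The function name iter_tokens and the sample data are assumptions made for this example.

```python
import re


def iter_tokens(text):
    """Lazily yield whitespace-separated tokens, similar to str.split()."""
    for match in re.finditer(r"\S+", text):
        yield match.group()


data = "\n41:n\n43:n\n55:cm\n"

result = {}
for token in iter_tokens(data):
    key, value = token.split(":")
    result[int(key)] = value
print(result)
```

For a string that already fits in memory the saving is small. A generator mainly helps when the input is very large.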















$endgroup$













answered Mar 1 at 11:56









Georgy











  • $begingroup$
    I said I want a solution using a list/dict comprehension. Your solution is nice but looks ugly. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 1 at 12:09






  • 2




    $begingroup$
    @RahulPatel: You might want to learn constructive criticism and some diplomacy ;) Georgy was nice enough to spend time on your problem...
    $endgroup$
    – Eric Duminil
    Mar 1 at 21:01










  • $begingroup$
    @EricDuminil Totally agree. I really apologize here. He really spent time on me, which I don't deserve. Thanks Georgy
    $endgroup$
    – Rahul Patel
    Mar 2 at 5:47











  • $begingroup$
    @RahulPatel No worries! I didn't understand from the question that you wanted a solution using only a dict comprehension. In any case, if my approach didn't work well for you, it may work for someone else, considering that the problem is pretty generic.
    $endgroup$
    – Georgy
    Mar 2 at 9:06










  • $begingroup$
    Thank you @Georgy. I am a very basic scripter who doesn't really understand advanced solutions like yours. Once I understand enough, this might come in handy. Thanks.
    $endgroup$
    – Rahul Patel
    Mar 2 at 10:03



























8












$begingroup$

There's nothing wrong with the solution you have come up with, but if you want an alternative, regex might come in handy here:



In [10]: import re
In [11]: data = """
...: 41:n
...: 43:n
...: 44:n
...: 46:n
...: 47:n
...: 49:n
...: 50:n
...: 51:n
...: 52:n
...: 53:n
...: 54:n
...: 55:cm
...: 56:n
...: 57:n
...: 58:n"""

In [12]: dict(re.findall(r'(\d+):(.*)', data))
Out[12]:
{'41': 'n',
 '43': 'n',
 '44': 'n',
 '46': 'n',
 '47': 'n',
 '49': 'n',
 '50': 'n',
 '51': 'n',
 '52': 'n',
 '53': 'n',
 '54': 'n',
 '55': 'cm',
 '56': 'n',
 '57': 'n',
 '58': 'n'}


Explanation:



1st Capturing Group (\d+):

\d matches a digit ([0-9] for ASCII input)
+ quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
: matches the character : literally (case sensitive)

2nd Capturing Group (.*):

. matches any character (except for line terminators)
* quantifier: matches between zero and unlimited times, as many times as possible, giving back as needed (greedy)
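
Note that re.findall returns strings, so the keys above are '41', '43', and so on, whereas the question builds integer keys. A minimal sketch of converting them while building the dict, using a shortened sample of the data:

```python
import re

data = "\n41:n\n43:n\n55:cm\n58:n"

# findall returns (key, value) tuples of strings; convert each key to int.
result = {int(key): value for key, value in re.findall(r"(\d+):(.*)", data)}
print(result)
```

Because \d+ only matches digits, int(key) cannot fail on the captured text.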



If there might be letters in the first matching group (though I doubt it, since you're casting that to an int), you might want to use:



dict(re.findall(r'(.*):(.*)', data))
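
One thing to watch with this looser pattern: the first (.*) is greedy, so if a value ever contains a colon, the key will extend up to the last colon on the line. The sketch below shows the difference on a made-up line. The alternative pattern that excludes colons from the key is a suggestion, not part of the original answer.

```python
import re

line = "55:c:m"  # made-up line whose value contains a colon

# Greedy (.*) gives back only as far as the last colon on the line.
greedy = re.findall(r"(.*):(.*)", line)
# Excluding ':' (and newlines) from the key stops it at the first colon.
strict = re.findall(r"([^:\n]*):(.*)", line)

print(greedy)  # [('55:c', 'm')]
print(strict)  # [('55', 'c:m')]
```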


I usually prefer using split()s over regexes because I feel like I have more control over the functionality of the code.



You might ask, why would you want to use the more complicated and verbose syntax of regular expressions rather than the more intuitive and simple string methods? Sometimes, the advantage is that regular expressions offer far more flexibility.




Regarding @Rahul's comment about speed, I'd say it depends:



Although string manipulation will usually be somewhat faster, the actual performance heavily depends on a number of factors, including:



  • How many times you parse the regex

  • How cleverly you write your string code

  • Whether the regex is precompiled

As the regex gets more complicated, it will take much more effort and complexity to write equivalent string manipulation code that performs well.



As far as I can tell, string operations will almost always beat regular expressions. But as the parsing gets more complex, it becomes harder for string operations to keep up, both in performance and in maintainability.
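
Rather than guessing, both approaches can be timed on your own data. Below is a minimal sketch using timeit with a precompiled pattern. The synthetic input, the function names, and the repeat count are arbitrary choices for illustration. No timings are shown because they depend on the machine and the Python version.

```python
import re
import timeit

# Synthetic input in the same "key:value" shape as the question's data.
data = "\n" + "\n".join(f"{i}:n" for i in range(1000))
PAIR = re.compile(r"(\d+):(.*)")  # compiled once, outside the timed code


def with_split(text):
    result = {}
    for line in text.split():
        key, value = line.split(":")
        result[int(key)] = value
    return result


def with_regex(text):
    return {int(key): value for key, value in PAIR.findall(text)}


# Check that both produce the same dict before comparing speed.
assert with_split(data) == with_regex(data)
print("split:", timeit.timeit(lambda: with_split(data), number=200))
print("regex:", timeit.timeit(lambda: with_regex(data), number=200))
```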

















$endgroup$












  • $begingroup$
    Yeah. I think regexes are slow too.
    $endgroup$
    – Rahul Patel
    Mar 1 at 7:29























edited Mar 1 at 8:00

























answered Mar 1 at 7:25









яүυк



























