Python Parse JSON value with space delimited subfields - python

I need to parse JSON like this:
{
"entity": " a=123455 b=234234 c=S d=CO e=1 f=user1 timestamp=null",
"otherField": "text"
}
I want to get values for a, b, c, d, e, timestamp separately. Is there a better way than assigning the entity value to a string, then parsing with REGEX?

Nothing in the JSON standard parses that value for you; you'll have to do this in Python.
It could be easier to just split that string on whitespace, then on =:
entities = dict(keyvalue.split('=', 1) for keyvalue in data['entity'].split())
This results in:
>>> data = {'entity': " a=123455 b=234234 c=S d=CO e=1 f=user1 timestamp=null"}
>>> dict(keyvalue.split('=', 1) for keyvalue in data['entity'].split())
{'a': '123455', 'c': 'S', 'b': '234234', 'e': '1', 'd': 'CO', 'f': 'user1', 'timestamp': 'null'}
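For completeness, here is a minimal end-to-end sketch that starts from the raw JSON text. The helper name parse_entity and the choice to map the literal string 'null' to None are assumptions made for illustration; they are not part of the question.

```python
import json

raw = '{"entity": " a=123455 b=234234 c=S d=CO e=1 f=user1 timestamp=null", "otherField": "text"}'


def parse_entity(entity):
    """Split 'k=v k=v ...' into a dict; 'null' becomes None (an assumption)."""
    pairs = (keyvalue.split("=", 1) for keyvalue in entity.split())
    return {key: (None if value == "null" else value) for key, value in pairs}


data = json.loads(raw)            # JSON only gives you the outer object
entities = parse_entity(data["entity"])
print(entities["a"], entities["timestamp"])
```

All other values stay strings; convert them with int() yourself if you need numbers.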

What about this:
>>> dic = dict(item.split("=") for item in s['entity'].strip().split(" "))
>>> dic
{'a': '123455', 'c': 'S', 'b': '234234', 'e': '1', 'd': 'CO', 'f': 'user1', 'timestamp': 'null'}
>>> dic['a']
'123455'
>>> dic['b']
'234234'
>>> dic['c']
'S'
>>> dic['d']
'CO'
>>>
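The two answers differ in how they split, and that matters for less tidy input. The sketch below uses a made-up sample string (not from the question) with a double space and a value that itself contains "=":

```python
sample = "a=1  b=x=y"  # hypothetical input: two spaces, and '=' inside a value

print(sample.split(" "))        # ['a=1', '', 'b=x=y']  (empty item from the double space)
print(sample.split())           # ['a=1', 'b=x=y']
print("b=x=y".split("="))       # ['b', 'x', 'y']       (three pieces, dict() would fail)
print("b=x=y".split("=", 1))    # ['b', 'x=y']

# Splitting on any whitespace and limiting to one '=' split handles both cases
robust = dict(item.split("=", 1) for item in sample.split())
print(robust)
```

For the exact string in the question both variants give the same result.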

Related

Extract outermost lists from string using regex

I have a long string containing attributes. To parse it, I am trying to extract the 'lists' from the string, and I'm having trouble in particular with nested (multi-dimensional) lists.
An Example String:
'a="foo",c=[d="test",f="bar",g=[h="some",i="text"],j="over"],k="here",i=[j="baz"]'
I would like to extract
c=[d="test",f="bar",g=[h="some",i="text"],j="over"]
and
i=[j="baz"]
from this string.
Is this possible using regex?
I've tried numerous different regexes; this is my most recent one:
([^\W0-9]\w*=\[.*\])
This string looks like a JSON object, with a few differences. My plan is to turn this into a JSON string, then parse it. After that, it is a matter of picking out what you want:
import json
import re
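def str2obj(the_string):
    # Turn name= into "name": so every key becomes a JSON key
    out = re.sub(r"(\w+)=", r'"\1":', the_string)
    # JSON objects use curly braces, not square brackets
    out = out.replace("[", "{").replace("]", "}")
    out = "{%s}" % out
    out = json.loads(out)
    return out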
string_object = 'a="foo",c=[d="test",f="bar",g=[h="some",i="text"],j="over"],k="here",i=[j="baz"]'
json_object = str2obj(string_object)
print(json_object)
assert json_object["a"] == "foo"
assert json_object["c"] == {
    'd': 'test',
    'f': 'bar',
    'g': {'h': 'some', 'i': 'text'},
    'j': 'over'
}
assert json_object["k"] == "here"
assert json_object["i"] == {"j": "baz"}
Output:
{'a': 'foo', 'c': {'d': 'test', 'f': 'bar', 'g': {'h': 'some', 'i': 'text'}, 'j': 'over'}, 'k': 'here', 'i': {'j': 'baz'}}
Notes
The re.sub call replaces a= with "a":
The replace calls turn the square brackets into curly ones
There is no error checking in the code; I assume the input is valid in terms of balanced brackets
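If you only need the outermost lists as substrings, as the question asks, a bracket-depth scan is an alternative to both regex and JSON conversion. This is a minimal sketch, not the answer's method: the function name outermost_lists is hypothetical, and it assumes that brackets and commas never appear inside the quoted values.

```python
def outermost_lists(text):
    """Return the top-level name=[...] items of a comma-separated string.

    Assumes brackets and commas never occur inside the quoted values.
    """
    items = []
    depth = 0   # current bracket nesting level
    start = 0   # index where the current top-level item begins
    for pos, char in enumerate(text):
        if char == "[":
            depth += 1
        elif char == "]":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced ']' at index %d" % pos)
        elif char == "," and depth == 0:
            # A comma outside all brackets ends a top-level item
            items.append(text[start:pos])
            start = pos + 1
    if depth != 0:
        raise ValueError("unbalanced '['")
    items.append(text[start:])
    # Keep only the items whose value is a bracketed list
    return [item for item in items if "=[" in item and item.endswith("]")]


string_object = 'a="foo",c=[d="test",f="bar",g=[h="some",i="text"],j="over"],k="here",i=[j="baz"]'
for item in outermost_lists(string_object):
    print(item)
```

Unlike str2obj, this version reports unbalanced brackets instead of failing later inside json.loads.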

Dictionary values are in "set", how can I manage to remove the sets?

dict = {'a': {'Islamabad'}, 'b' : {'Islamabad'}, 'c': {'Paris'},
'd': {'Bern'}, 'e': {'Moscow'}}
result wanted:
dict = {'a': 'Islamabad', 'b' : 'Islamabad', 'c': 'Paris',
'd': 'Bern', 'e': 'Moscow'}
A simple way to get one element from a set is set.pop().
dct = {'a': {'Islamabad'}, 'b' : {'Islamabad'}, 'c': {'Paris'},
'd': {'Bern'}, 'e': {'Moscow'}}
dct = {k: v.pop() for k, v in dct.items()}
print(dct)
v.pop() modifies the original set v. If this is not desired, use next(iter(v)).
Update: Or use an assignment with a [target_list] or (target_list), as suggested in a comment by @Kelly Bundy.
P.S. don't use dict as a variable name
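The two non-mutating alternatives mentioned above can be sketched as follows. The helper name only is an assumption for illustration.

```python
dct = {'a': {'Islamabad'}, 'b': {'Islamabad'}, 'c': {'Paris'},
       'd': {'Bern'}, 'e': {'Moscow'}}

# next(iter(v)) reads one element and leaves the set untouched
flat = {k: next(iter(v)) for k, v in dct.items()}


def only(items):
    """Return the single element of items; raise ValueError otherwise."""
    [item] = items   # unpacking fails unless there is exactly one element
    return item


also_flat = {k: only(v) for k, v in dct.items()}
print(flat)
print(dct['a'])      # the original sets are still intact
```

The unpacking form has the side benefit of failing loudly if a set ever holds more than one value, whereas next(iter(v)) silently picks an arbitrary one.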
You can convert each set to a string by unpacking it with * inside str(), i.e. str(*v). The result is the element itself, without the braces. This only works when each set holds exactly one string.
You have to loop over the dict and change each value to a string, like this:
dict1 = {'a': {'Islamabad'}, 'b' : {'Islamabad'}, 'c': {'Paris'}, 'd': {'Bern'}, 'e': {'Moscow'}}
dict1 = {k: str(*v) for k, v in dict1.items()}
output: {'a': 'Islamabad', 'b': 'Islamabad', 'c': 'Paris', 'd': 'Bern', 'e': 'Moscow'}
Also, don't use "dict" as a name for your dictionaries. dict is a Python built-in type, and shadowing it can lead to errors in some scenarios.
This is extremely poor code; please don't use it for anything other than your own case, but it works with your current example.
import ast
dictt = {'a': {'Islamabad'}, 'b' : {'Islamabad'}, 'c': {'Paris'},
'd': {'Bern'}, 'e': {'Moscow'}}
dictt = str(dictt).replace("{","") #Convert to string and remove {
dictt = "{" + dictt.replace("}","") +"}" #remove } and re-add outer { }
dictt = ast.literal_eval(dictt) #Back to dict
print(dictt)
output:
{'a': 'Islamabad', 'b': 'Islamabad', 'c': 'Paris', 'd': 'Bern', 'e': 'Moscow'}

assigning a value from dictionary to equation

I want to substitute values into a string equation, but I am stuck on the logic.
dic1 = {'d': '2', 'a': '1', 'c': '3', 'b': '2'}
equation_string = 'ab+cd'
I want output like:
'12+32' = 44
My logic:
1 -> Write a for loop to assign the values to the string, but I don't know how to skip the '+' sign in the string.
for itr in range(0, len(equation_string)):
    equation_string[itr] = dic1[equation_string[itr]]
To achieve this, first replace each key of dic1 that appears in equation_string with its corresponding value using str.replace(). Once all the replacements are made, evaluate the resulting expression with eval(). Below is sample code:
>>> dic1 = {'d': '2', 'a': '1', 'c': '3', 'b': '2'}
>>> equation_string = 'ab+cd'
>>> for k, v in dic1.items():
...     equation_string = equation_string.replace(k, v)
...
>>> equation_string
'12+32' # Updated value of equation_string
>>> eval(equation_string)
44
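If you would rather not call eval() on a constructed string, the same result can be reached with a single-pass substitution and plain integer arithmetic. This is only a sketch under narrow assumptions: every name is a single lowercase letter present in dic1, and the only operator is '+'.

```python
import re

dic1 = {'d': '2', 'a': '1', 'c': '3', 'b': '2'}
equation_string = 'ab+cd'

# Replace every single-letter name in one pass; '+' is not matched, so it is skipped
substituted = re.sub(r"[a-z]", lambda m: dic1[m.group()], equation_string)

# Assumption: only '+' between non-negative integers needs to be supported
total = sum(int(term) for term in substituted.split("+"))
print(substituted, "=", total)  # 12+32 = 44
```

Because the substitution happens in one pass, a replaced value can never be picked up and replaced again, which can happen with repeated str.replace() calls when values contain characters that are also keys.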

Regex How to match Empty

My log lines have a structure like
a b c|
so for example:
Mozilla 5.0 white|
should be matched/extracted to something like
a: Mozilla, b: 5.0, c: white
but one entry in my log is:
iOS|
which can be read as
a:iOS, b:null, c:null
I am using Python 3's re module, matching with named groups (?P<name>...).
Is there any way to achieve this?
>>> import re
>>> m = re.match(r"(?P<a>[^\s]+)(?:\s+(?P<b>[^\s]+))?(?:\s+(?P<c>[^\s]+))?\s*\|", "iOS|")
>>> m.groups()
('iOS', None, None)
>>> m.groupdict()
{'a': 'iOS', 'b': None, 'c': None}
>>> m = re.match(r"(?P<a>[^\s]+)(?:\s+(?P<b>[^\s]+))?(?:\s+(?P<c>[^\s]+))?\s*\|", "Mozilla 5.0 white|")
>>> m.groups()
('Mozilla', '5.0', 'white')
>>> m.groupdict()
{'a': 'Mozilla', 'b': '5.0', 'c': 'white'}
UPDATE:
I noticed that the previous version included spaces in the returned groups - I had factored the \s+ into the (?P<>...) to save a couple bytes, but it had that side effect. So I fixed that and also made it tolerant of spaces before the final '|'
You can put your patterns in a list like following :
>>> pattern = ['a', 'b', 'c']
Then use re.findall() to find all the matching parts, and use zip and dict to build the corresponding dictionary:
>>> s = "IOS|"
>>> dict(zip(pattern, re.findall(r'([^\s]+)?\s?([^\s]+)?\s?([^\s]+)?\|', s)[0]))
{'a': 'IOS', 'b': '', 'c': ''}
>>>
>>> s = "Mozilla 5.0 white|"
>>>
>>> dict(zip(pattern, re.findall(r'([^\s]+)?\s?([^\s]+)?\s?([^\s]+)?\|', s)[0]))
{'a': 'Mozilla', 'b': '5.0', 'c': 'white'}
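For a format this simple, a regex-free variant may also be worth considering: strip the trailing '|', split on whitespace, and pad the missing fields with None. The function name parse_line and its names parameter are assumptions for illustration.

```python
def parse_line(line, names=("a", "b", "c")):
    """Map whitespace-separated fields before the trailing '|' to names."""
    fields = line.rstrip().rstrip("|").split()
    # Pad with None so that missing trailing fields still get a key
    fields += [None] * (len(names) - len(fields))
    return dict(zip(names, fields))


print(parse_line("iOS|"))
print(parse_line("Mozilla 5.0 white|"))
```

Note that zip stops at the shorter input, so any fields beyond the third are silently dropped in this sketch.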

Filling a python dictionary in for loop returns same values

For my project I'm iterating over some data and adding the needed values to a premade dictionary.
Here is a stripped-down example of the code that illustrates my question:
class Parser:
    def __init__(self):
        self.records = [['value1','value2','value3'],
                        ['value4','value5','value6'],
                        ['value7','value8','value9']]
    def get_parsed(self):
        parsed = []
        dic = {'key1': '',
               'key2': '',
               'key3': '',
               }
        for i in self.records:
            dic['key1'] = i[0]
            dic['key2'] = i[1]
            dic['key3'] = i[2]
            parsed.append(dic)
        return parsed
What I expect to get is a list of dicts like this:
[{'key1':'value1','key2':'value2','key3':'value3'},
{'key1':'value4','key2':'value5','key3':'value6'},
{'key1':'value7','key2':'value8','key3':'value9'}]
But what I'm getting is the same dict three times, holding the values of the last record:
[{'key1':'value7','key2':'value8','key3':'value9'},
{'key1':'value7','key2':'value8','key3':'value9'},
{'key1':'value7','key2':'value8','key3':'value9'}]
If I move the dictionary initialization into the for loop, I get the desired result, but I don't understand why that happens.
EDIT:
The question is more "Why does it happen this way?"
I did some testing in IPython and this is what I got:
In [1]: d = {'a':'','b':'','c':''}
In [2]: d
Out[2]: {'a': '', 'b': '', 'c': ''}
In [3]: d['a'] = 'value'
In [4]: d['b'] = 'other_value'
In [5]: d['c'] = 'third_value'
In [6]: d
Out[6]: {'a': 'value', 'b': 'other_value', 'c': 'third_value'}
In [7]: d['a'] = 'inserting new value'
In [8]: d
Out[8]: {'a': 'inserting new value', 'b': 'other_value', 'c': 'third_value'}
So the value for a key can be updated and it changes; why doesn't that happen in the for loop?
Because your dic is created outside the loop, you only create one dict. If you want three different dicts, you need to create three different dicts, so move the initial creation of dic inside the loop.
To answer your updated question: although you think you are appending a new dict with each parsed.append(dic), you are just appending the same dict three times. append doesn't copy the dict. So whenever you modify that dict, all the dicts in parsed show the change, since they are all the same dict. This version of your second code example may be more illustrative:
>>> d = {'a': '', 'b': '', 'c': ''}
>>> stuff = []
>>> stuff.append(d)
>>> print(stuff)
[{'a': '', 'b': '', 'c': ''}]
>>> d['a'] = 'other'
>>> print(stuff)
[{'a': 'other', 'b': '', 'c': ''}]
>>> stuff.append(d)
>>> print(stuff)
[{'a': 'other', 'b': '', 'c': ''}, {'a': 'other', 'b': '', 'c': ''}]
>>> d['a'] = 'yet another'
>>> print(stuff)
[{'a': 'yet another', 'b': '', 'c': ''}, {'a': 'yet another', 'b': '', 'c': ''}]
Notice that changing the dict "works" in that it indeed changes the value, but regardless of that, the list still contains the same dict multiple times, so any changes you make overwrite whatever changes you made earlier. In the end, your list only contains the last version of the dict, because all earlier changes were overwritten in all dicts in the list.
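The identity of the list elements can be checked directly with the is operator, and dict.copy() is one way to store an independent snapshot on each iteration. A minimal sketch with made-up values:

```python
d = {'a': ''}
stuff = [d, d]
print(stuff[0] is stuff[1])   # True: both entries are the very same object

copies = []
for value in ('first', 'second'):
    d['a'] = value
    copies.append(d.copy())   # shallow copy: a new dict with the current contents
print(copies)
print(copies[0] is copies[1])  # False: two separate dicts
```

dict.copy() is shallow, which is enough here because the values are strings; nested mutable values would still be shared between the copies.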
You are modifying and inserting the same dictionary to your list three times. Create a new dictionary for each iteration of the loop:
for i in self.records:
    dic = {'key1': i[0], 'key2': i[1], 'key3': i[2]}
    parsed.append(dic)
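The same idea can be written more compactly by pairing a tuple of key names with each record. This is a sketch using module-level names (records, keys) instead of the Parser class from the question:

```python
records = [['value1', 'value2', 'value3'],
           ['value4', 'value5', 'value6'],
           ['value7', 'value8', 'value9']]
keys = ('key1', 'key2', 'key3')

# dict(zip(...)) builds a brand-new dict for every record
parsed = [dict(zip(keys, record)) for record in records]
print(parsed)
```

Each list element is a distinct dict, so later changes to one entry do not show up in the others.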
