A really basic requirement.
I would like to convert from this format:
"column1=value1;column2=value2"
to this format (JSON):
{"column1":"value1","column2":"value2"}
Any best approach in Python would be appreciated.
Thanks in advance.
Using regular expressions:
import re
REGEX = r"([^=;]+)=([^=;]+)"
finder = re.compile(REGEX)
s = "column1=value1;column2=value2"
matches = re.finditer(finder, s)
d = {}
for match in matches:
    key = match.group(1)
    val = match.group(2)
    d[key] = val
print(d)
Output:
{'column2': 'value2', 'column1': 'value1'}
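To get actual JSON text rather than a Python dict, the parsed pairs can be passed to json.dumps. A minimal sketch, assuming that no key or value contains ';' and that the first '=' in each item separates key from value; the helper name kv_to_json is made up for illustration:

```python
import json

def kv_to_json(text):
    """Convert 'k1=v1;k2=v2' into a JSON object string."""
    pairs = {}
    for item in text.split(";"):
        if not item:
            continue  # tolerate a trailing ';'
        # partition splits on the first '=' only
        key, _, value = item.partition("=")
        pairs[key] = value
    return json.dumps(pairs)

result = kv_to_json("column1=value1;column2=value2")
print(result)
```

Plain str.split is enough here; the regular expression only becomes useful if the input can contain malformed items that should be skipped.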
If you really want to parse your string to JSON you should try something like this:
import json # simplejson if you use a python version below 2.6
string = u'{"column1":"value1", "column2": "value2"}'
data = json.loads(string)  # do not rebind the name json, or the module is shadowed
If you want to parse your string to a dictionary you should try ast:
import ast
string = u'{"column1":"value1", "column2": "value2"}'
ast.literal_eval(string)  # => {'column1': 'value1', 'column2': 'value2'}
Related
Assuming the structure of the json string does not change, is the order of a jsonpath match value result stable?
import json
import jsonpath_ng
response = json.loads(response)
jsonpath_expression_name = jsonpath_ng.parse("$[forms][0][questionGroups][*][questions]..[name]")
match_name = [match.value for match in jsonpath_expression_name.find(response)]
jsonpath_expression_id = jsonpath_ng.parse("$[forms][0][questionGroups][*][questions]..[id]")
matches_id = [match.value for match in jsonpath_expression_id.find(response)]
survey_q_dict = { k:v for (k,v) in zip(matches_id, match_name)}
Thank you!
I need a little help converting a string to a dict. The string is not in a common format; it is the output of a UDF.
The value returned by the PySpark UDF looks like the string below:
"{list=[{a=1}, {a=2}, {a=3}]}"
And I need to convert it to a python dictionary with the structure below:
{
    "list": [
        {"a": 1},
        {"a": 2},
        {"a": 3}
    ]
}
So I can access its values, like
dict["list"][1]["a"]
I already tried using:
json.loads()
ast.literal_eval()
Could someone please help me?
As an example of how this unparsed string is generated:
@udf()
def execute_method():
    return {"list": [{"a":1},{"b":1},{"c":1}]}
df_result = df_source.withColumn("result", execute_method())
At the very least, you will need to replace = with : and surround the keys with double quotes:
import json
import re
string = "{list=[{a=1}, {a=2}, {a=3}]}"
fixed_string = re.sub(r'(\w+)=', r'"\1":', string)
print(type(fixed_string), fixed_string)
parsed = json.loads(fixed_string)
print(type(parsed), parsed)
outputs
<class 'str'> {"list":[{"a":1}, {"a":2}, {"a":3}]}
<class 'dict'> {'list': [{'a': 1}, {'a': 2}, {'a': 3}]}
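The same substitution can be wrapped in a small helper so that the result can be indexed right away. This is only a sketch: it assumes keys consist of word characters and that values are numbers or nested lists/objects (unquoted string values would still make json.loads fail), and the name parse_udf_string is hypothetical:

```python
import json
import re

def parse_udf_string(raw):
    # Quote each bare key and turn '=' into ':' so the text becomes valid JSON.
    fixed = re.sub(r'(\w+)=', r'"\1":', raw)
    return json.loads(fixed)

parsed = parse_udf_string("{list=[{a=1}, {a=2}, {a=3}]}")
second_a = parsed["list"][1]["a"]
print(second_a)
```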
Try this:
import re
import json
data="{list=[{a=1}, {a=2}, {a=3}]}"
data=data.replace('=',':')
pattern=[e.group() for e in re.finditer('[a-z]+', data, flags=re.IGNORECASE)]
for e in set(pattern):
    data=data.replace(e,"\""+e+"\"")
print(json.loads(data))
I have a string as the output of a function, as follows:
tmp = <"last seen":1568,"reviews [{"id":15869,"author":"abnbvg","changes":........>
How will I fetch the "id":15869 out of it?
The string content looks like JSON, so either use the json module or use a regular expression to extract the specific string you need.
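A sketch of the regular-expression route, assuming the id is always a plain integer and that the first "id" in the string is the one wanted. The sample string here is a shortened, well-formed stand-in for the real output, not the actual data:

```python
import re

# Stand-in for the real (much longer) string.
tmp = '"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg"}]'

# Look for the first "id" key followed by an integer value.
match = re.search(r'"id"\s*:\s*(\d+)', tmp)
review_id = int(match.group(1)) if match else None
print(review_id)
```

Unlike json.loads, this also works on a truncated or otherwise malformed string, but it knows nothing about nesting, so an "id" belonging to some other object would match as well.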
The data looks like a JSON string. Use:
try:
    import json
except ImportError:
    import simplejson as json
tmp = '"last seen":1568,"reviews":[{"id":15869,"author":"abnbvg"}]'
data = json.loads('{{{}}}'.format(tmp))
>>> print data
{u'reviews': [{u'id': 15869, u'author': u'abnbvg'}], u'last seen': 1568}
>>> print data['reviews'][0]['id']
15869
Note that I wrapped the string in { and } to make a dictionary. You might not have to do that if the actual JSON string is already encapsulated with braces.
If id is the only thing you need from the string and it will always be something like {"id":15869,"author":"abnbvg"..., then you can use a simple string split instead of JSON conversion.
tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg","changes":........'
tmp1 = tmp.split('"id":', 1)[1]
id = tmp1.split(",", 1)[0]
Note that the tmp1 line raises IndexError if no "id" key is found in the string. You could index with -1 instead of 1 to sidestep the error, but catching the exception lets you report that "id" was not found:
try:
    tmp1 = tmp.split('"id":', 1)[1]
    id = tmp1.split(",", 1)[0]
except IndexError:
    print("id key is not present in the json")
    id = None
If you do really need more variables from the json string, please go with mhawke's solution of converting the json to dictionary and getting the value. You can use ast.literal_eval
from ast import literal_eval
tmp = '"last seen":1568,"reviews" : [{"id":15869,"author":"abnbvg","changes":........'
tmp_dict = literal_eval("""{%s}"""%(tmp))
print(tmp_dict["reviews"][0]["id"])
In the second case, if you need to collect all the "id" keys in the list, this will help:
id_list = []
for id_dict in tmp_dict["reviews"]:
    id_list.append(id_dict["id"])
print(id_list)
So I'm parsing a really big log file with some embedded json.
So I'll see lines like this
foo="{my_object:foo, bar:baz}" a=b c=d
The problem is that the internal JSON can contain spaces, but outside of the JSON, spaces act as tuple delimiters (except where there are unquoted strings; huzzah for whatever idiot thought that was a good idea). I'm not sure how to figure out where the end of the JSON string is without reimplementing large portions of a JSON parser.
Is there a json parser for Python where I can give it '{"my_object":"foo", "bar":"baz"} asdfasdf', and it can return ({'my_object' : 'foo', 'bar':'baz'}, 'asdfasdf') or am I going to have to reimplement the json parser by hand?
Found a really cool answer. Use json.JSONDecoder's scan_once function
In [30]: import json
In [31]: d = json.JSONDecoder()
In [32]: my_string = 'key="{"foo":"bar"}"more_gibberish'
In [33]: d.scan_once(my_string, 5)
Out[33]: ({u'foo': u'bar'}, 18)
In [37]: my_string[18:]
Out[37]: '"more_gibberish'
Just be careful
In [38]: d.scan_once(my_string, 6)
Out[38]: (u'foo', 11)
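scan_once is not part of the documented json API. The documented counterpart is JSONDecoder.raw_decode, which also returns the parsed value together with the index where parsing stopped. A sketch, assuming the embedded JSON is valid (quoted keys and values), using the example string from the question:

```python
import json

decoder = json.JSONDecoder()
text = '{"my_object":"foo", "bar":"baz"} asdfasdf'

# raw_decode parses one JSON document starting at the given index (0 by
# default) and returns the value plus the index just past its end.
obj, end = decoder.raw_decode(text)
rest = text[end:].lstrip()
print(obj, rest)
```

raw_decode does not skip leading whitespace, so the start index has to point exactly at the opening brace.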
Match everything around it.
>>> re.search('^foo="(.*)" a=.+ c=.+$', 'foo="{my_object:foo, bar:baz}" a=b c=d').group(1)
'{my_object:foo, bar:baz}'
Use shlex and json.
Something like:
import shlex
import json
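def decode_line(line):
    decoded = {}
    # shlex.split honours shell-style quoting, so quoted JSON stays in one field
    fields = shlex.split(line)
    for f in fields:
        k, v = f.split('=', 1)
        if k == "foo":
            # only the foo field carries JSON
            v = json.loads(v)
        decoded[k] = v
    return decoded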
This does assume that the JSON inside the quotes is quoted properly.
Here's a short example program that uses the above:
import shlex  # shlex.quote replaces pipes.quote, which was removed in Python 3.13
testdict = {"hello": "world", "foo": "bar"}
line = 'foo=' + shlex.quote(json.dumps(testdict)) + ' a=b c=d'
print(line)
print(decode_line(line))
With output (Python 3.7+):
foo='{"hello": "world", "foo": "bar"}' a=b c=d
{'foo': {'hello': 'world', 'foo': 'bar'}, 'a': 'b', 'c': 'd'}
Straight to the point.
A str object below:
s = '{"key1":"value1", "key2":"value2", "key3":"value3"}'
As you can see, a dict is wrapped in a str. How can I extract the dict from the str?
In other words, is there some d = operate(s) such that d["key1"] == "value1", and so on?
>>> ast.literal_eval('{"key1":"value1", "key2":"value2", "key3":"value3"}')
{'key3': 'value3', 'key2': 'value2', 'key1': 'value1'}
I'd use json:
try:
    import json
except ImportError:
    import simplejson as json
d = json.loads(s)
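The two approaches differ once the string contains JSON-only tokens. A small sketch; the sample string below, with its extra "flag" and "missing" keys, is an assumption added for illustration and is not the string from the question:

```python
import ast
import json

s = '{"key1": "value1", "flag": true, "missing": null}'

d = json.loads(s)  # JSON true/null become Python True/None

try:
    ast.literal_eval(s)
    literal_eval_ok = True
except ValueError:
    # 'true' and 'null' are not Python literals
    literal_eval_ok = False

print(d, literal_eval_ok)
```

For the string in the question both calls give the same dict; json.loads is the better fit when the text really is JSON, while ast.literal_eval suits Python literal syntax such as single-quoted strings.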
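You're looking for eval, although it executes arbitrary code; for untrusted input, ast.literal_eval or json.loads is the safer choice.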