Convert a String to Python Dictionary or JSON Object - python

Here is the problem: I have a string in the following format (note: there are no line breaks). I simply want to parse this string into a Python dictionary or a JSON object so I can navigate it easily. I have tried both ast.literal_eval and json, but the end result is either an error or simply another string. I have been scratching my head over this for some time, and I know there must be a simpler, more elegant solution than writing my own parser.
{
table_name:
{
"columns":
[
{
"col_1":{"col_1_1":"value_1_1","col_1_2":"value_1_2"},
"col_2":{"col_2_1":"value_2_1","col_2_2":"value_2_2"},
"col_3":"value_3","col_4":"value_4","col_5":"value_5"}],
"Rows":1,"Total":1,"Flag":1,"Instruction":none
}
}

Note that the JSON decoder expects each property name to be enclosed in double quotes. Use the following approach with the re.sub() and json.loads() functions:
import json, re
s = '{table_name:{"columns":[{"col_1":{"col_1_1":"value_1_1","col_1_2":"value_1_2"},"col_2":{"col_2_1":"value_2_1","col_2_2":"value_2_2"},"col_3":"value_3","col_4":"value_4","col_5":"value_5"}],"Rows":1,"Total":1,"Flag":1,"Instruction":none}}'
s = re.sub(r'\b(?<!\")([_\w]+)(?=\:)', r'"\1"', s).replace('none', '"None"')
obj = json.loads(s)
print(obj)
The output:
{'table_name': {'columns': [{'col_5': 'value_5', 'col_2': {'col_2_1': 'value_2_1', 'col_2_2': 'value_2_2'}, 'col_3': 'value_3', 'col_1': {'col_1_2': 'value_1_2', 'col_1_1': 'value_1_1'}, 'col_4': 'value_4'}], 'Flag': 1, 'Total': 1, 'Instruction': 'None', 'Rows': 1}}
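A variation on the same idea, sketched under the assumption that unquoted keys only ever follow { or , and that none only ever appears as a bare value. It maps none to JSON null (so Python gets None rather than the string 'None'), and it only rewrites none when it sits right after a colon instead of every occurrence of that substring. The helper name to_json_text is hypothetical.

```python
import json
import re

s = '{table_name:{"columns":[{"col_1":{"col_1_1":"value_1_1","col_1_2":"value_1_2"},"col_2":{"col_2_1":"value_2_1","col_2_2":"value_2_2"},"col_3":"value_3","col_4":"value_4","col_5":"value_5"}],"Rows":1,"Total":1,"Flag":1,"Instruction":none}}'


def to_json_text(text):
    # Quote bare keys: an identifier that follows "{" or "," and precedes ":".
    # Already-quoted keys start with '"', so they are left untouched.
    text = re.sub(r'([{,]\s*)([A-Za-z_]\w*)(\s*:)', r'\1"\2"\3', text)
    # Turn a bare `none` value (directly after a colon) into JSON null.
    text = re.sub(r'(:\s*)none\b', r'\1null', text)
    return text


obj = json.loads(to_json_text(s))
print(obj["table_name"]["Instruction"])  # None
print(obj["table_name"]["columns"][0]["col_1"]["col_1_1"])  # value_1_1
```

Values that happen to contain text like ",foo:" or ":none" inside a quoted string would still be rewritten, so this is only suitable when you know the shape of the input.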

Related

How to convert a string to python object?

So, I have got this piece of string:
"[{'id': 45, 'user_id': 2, 'cart_item_id': UUID('0fdc9e75-3d9c-4b89-912b-7058e1233432'), 'quantity': 1}]"
And I want to convert it to a list of dicts in Python.
Can anyone please help me out?
A nice way would be using eval():
from uuid import UUID  # the string calls UUID(...), so it must be in scope
my_obj = eval(your_string)
A not very nice, but a bit safer way would be parsing it as JSON. But the quotes have to be handled first:
import json
my_obj = json.loads(your_string.replace("'", '"'))
Option 1 has to be used with caution, as eval will evaluate any Python code and is thus prone to attacks.
Option 2 is safer, but it breaks on values that themselves contain quotes, and it cannot handle this particular string as-is, because UUID(...) is not valid JSON.
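If you want to avoid eval altogether, ast.literal_eval is the usual safe choice, but it rejects the UUID(...) call. A sketch that works around this, assuming UUIDs always appear as UUID('...') with single quotes: tag each call as a plain tuple, let literal_eval parse the rest, then turn the tagged tuples back into UUID objects. The helper names and the '__uuid__' tag are hypothetical.

```python
import ast
import re
from uuid import UUID

s = "[{'id': 45, 'user_id': 2, 'cart_item_id': UUID('0fdc9e75-3d9c-4b89-912b-7058e1233432'), 'quantity': 1}]"

# Matches UUID('<hex digits and dashes>') and captures the text inside the quotes.
UUID_CALL = re.compile(r"UUID\('([0-9a-fA-F-]+)'\)")


def _restore(value):
    # Walk the parsed structure and turn tagged tuples back into UUID objects.
    if isinstance(value, tuple) and len(value) == 2 and value[0] == "__uuid__":
        return UUID(value[1])
    if isinstance(value, list):
        return [_restore(v) for v in value]
    if isinstance(value, dict):
        return {k: _restore(v) for k, v in value.items()}
    return value


def parse_with_uuids(text):
    # Replace each UUID('...') call with the literal tuple ('__uuid__', '...'),
    # which ast.literal_eval accepts, then restore the UUIDs afterwards.
    tagged = UUID_CALL.sub(r"('__uuid__', '\1')", text)
    return _restore(ast.literal_eval(tagged))


rows = parse_with_uuids(s)
print(rows[0]["cart_item_id"])  # 0fdc9e75-3d9c-4b89-912b-7058e1233432
```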
Try this
def Convert(string):
    li = list(string.split("-"))
    return li
# Driver code
str1 = "[{'id': 45, 'user_id': 2, 'cart_item_id': UUID('0fdc9e75-3d9c-4b89-912b-7058e1233432'), 'quantity': 1}]"
print(Convert(str1))
You can use eval to evaluate your string into an object. Just make sure the classes used in the string are available in the local or global namespace.
from uuid import UUID
output = eval(
"""[{'id': 45, 'user_id': 2, 'cart_item_id': UUID('0fdc9e75-3d9c-4b89-912b-7058e1233432'), 'quantity': 1}]""",
)
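A variation that makes the namespace explicit: pass eval a globals dict that contains only UUID, with the builtins emptied. This is only a sketch; it narrows what the string can reach, but it is still not a sandbox and should not be used on untrusted input.

```python
from uuid import UUID

s = "[{'id': 45, 'user_id': 2, 'cart_item_id': UUID('0fdc9e75-3d9c-4b89-912b-7058e1233432'), 'quantity': 1}]"

# Only UUID is visible to the evaluated expression; builtins are emptied.
# This limits accidental name lookups, but it is NOT safe for untrusted input.
allowed = {"__builtins__": {}, "UUID": UUID}
output = eval(s, allowed)
print(output[0]["cart_item_id"])  # 0fdc9e75-3d9c-4b89-912b-7058e1233432
```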
Here is a way to do that:
string = "[{'id': 45, 'user_id': 2, 'cart_item_id': UUID('0fdc9e75-3d9c-4b89-912b-7058e1233432'), 'quantity': 1}]"
with open('your_file.py') as file:
    data = file.read()
with open('your_file.py', 'w') as file:
    file.write(f'list_ = {string}\n{data}')
This literally prepends the list to the file as a list_ = ... assignment.

Formatting string representation of structure to python dictionary

I need a little help converting a string to a dict. The string is not in a common format; it is the output of a UDF.
The return from the PySpark UDF looks like the string below:
"{list=[{a=1}, {a=2}, {a=3}]}"
And I need to convert it to a python dictionary with the structure below:
{
"list": [
{"a": 1},
{"a": 2},
{"a": 3}
]
}
So I can access its values, like:
dict["list"][1]["a"]
I already tried using:
json.loads
ast.literal_eval()
Could someone please help me?
As an example of how this unparsed string is generated:
from pyspark.sql.functions import udf
@udf()
def execute_method():
    return {"list": [{"a":1},{"b":1},{"c":1}]}
df_result = df_source.withColumn("result", execute_method())
At the very least, you will need to replace = with : and surround the keys with double quotes:
import json
import re
string = "{list=[{a=1}, {a=2}, {a=3}]}"
fixed_string = re.sub(r'(\w+)=', r'"\1":', string)
print(type(fixed_string), fixed_string)
parsed = json.loads(fixed_string)
print(type(parsed), parsed)
outputs
<class 'str'> {"list":[{"a":1}, {"a":2}, {"a":3}]}
<class 'dict'> {'list': [{'a': 1}, {'a': 2}, {'a': 3}]}
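The regex above only quotes keys, so it assumes every value is a number or a nested structure. If the UDF can also return bare text values (an assumption; the example only shows integers), a second substitution can quote those too. A rough sketch, with a hypothetical helper name:

```python
import json
import re


def udf_string_to_dict(text):
    # 1) key=  ->  "key":
    text = re.sub(r'(\w+)=', r'"\1":', text)
    # 2) bare word values such as :x  ->  :"x" (numbers and [ or { are left alone)
    text = re.sub(r':\s*([A-Za-z_]\w*)', r':"\1"', text)
    return json.loads(text)


print(udf_string_to_dict("{list=[{a=x}, {a=2}]}"))  # {'list': [{'a': 'x'}, {'a': 2}]}
```

It would also quote words such as true or null, and values containing spaces or commas are not handled.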
Try this:
import re
import json
data="{list=[{a=1}, {a=2}, {a=3}]}"
data=data.replace('=',':')
pattern=[e.group() for e in re.finditer('[a-z]+', data, flags=re.IGNORECASE)]
for e in set(pattern):
    data=data.replace(e,"\""+e+"\"")
print(json.loads(data))
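Another option, if you control the UDF: the default return type of udf is StringType, which is likely why the dict arrives as Java-style {key=value} text. A minimal sketch, assuming you can change execute_method, is to return JSON text yourself and parse it later with json.loads. The Spark decorator and DataFrame are left out so the snippet runs without PySpark.

```python
import json


def execute_method():
    # Hypothetical UDF body: return JSON text instead of a dict, so the
    # string column holds valid JSON rather than the {key=value} form.
    return json.dumps({"list": [{"a": 1}, {"a": 2}, {"a": 3}]})


parsed = json.loads(execute_method())
print(parsed["list"][1]["a"])  # 2
```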

How to get the string containing given content with a regular expression

I have JSON data, as follows:
'[
{"max":0,"min":0,"name":"tom","age":18},
{"max":0,"min":0,"name":"jack","age":28},
.....
]'
Given that I know name=tom, how can I get the dict containing tom using regular expressions? Or are there better ways?
Like this:
'{"max":0,"min":0,"name":"tom","age":18}'
Thank you very much!!
Assuming this is a list of dicts:
lst=[{"max":0,"min":0,"name":"tom","age":18},
{"max":0,"min":0,"name":"jack","age":28}]
Then
print(list(filter(lambda x:x["name"]=="tom",lst)))
Outputs
[{'max': 0, 'min': 0, 'name': 'tom', 'age': 18}]
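If you start from the JSON text and only want the first match back as a string (as in the question), a sketch using json.loads, next() and json.dumps could look like this; the separators argument is there only to reproduce the compact formatting from the question:

```python
import json

raw = '[{"max":0,"min":0,"name":"tom","age":18},{"max":0,"min":0,"name":"jack","age":28}]'
records = json.loads(raw)

# next() returns the first matching dict, or None if nobody is called tom
tom = next((d for d in records if d["name"] == "tom"), None)
print(json.dumps(tom, separators=(",", ":")))  # {"max":0,"min":0,"name":"tom","age":18}
```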
You can deserialize the JSON file with the json module and then iterate over the dictionaries in the usual way, like this:
import json
from typing import Dict, List
with open("file.json", "r") as f:
    data: List[Dict] = json.load(f)  # the top level is a list of dicts
for d in data:
    if d["name"] == "tom":
        print(d)
Notice that your JSON file is malformed, with unusual ':' characters.
Here is a corrected version:
[
{
"max": 0,
"min": 0,
"name": "tom",
"age": 18
},
{
"max": 0,
"min": 0,
"name": "jack",
"age": 28
}
]
As others have mentioned, it would be handier to use json for your problem.
But if you're limited to regex for whatever reason, here is a simple regex that would work:
{.+\"name\":\"tom\".+}
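Note that .+ is greedy, so if the whole array sits on one line, that pattern runs on to the last } and also captures jack's object. A slightly tighter sketch, assuming the objects are flat (no nested braces) and there is no whitespace around the colons:

```python
import re

s = '[{"max":0,"min":0,"name":"tom","age":18},{"max":0,"min":0,"name":"jack","age":28}]'

# [^{}]* cannot cross a brace, so the match stays inside one flat object
m = re.search(r'\{[^{}]*"name":"tom"[^{}]*\}', s)
print(m.group(0))  # {"max":0,"min":0,"name":"tom","age":18}
```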
You should use the built-in json module to parse a JSON file. Regex is not the best choice, because JSON is a standard format and there are many parsers for it. Below is how you can use Python's built-in json module.
test.json:
[
{
"max":0,
"min":0,
"name":"tom",
"age":18
},
{
"max":0,
"min":0,
"name":"jack",
"age":28
}
]
Code:
import json
with open("test.json", "r") as opened_file:
    load_data = json.load(opened_file)
for elem in load_data:
    if elem["name"] == "tom":
        print(elem)
Output:
$ python3 test.py
{'max': 0, 'min': 0, 'name': 'tom', 'age': 18}
Another solution, without regex, using only string methods (assuming s holds the entire data as a string):
idx = s.find('"name":"tom"') + len('"name":"tom"')
idx2 = s[:idx].rfind('{')
dict_json = s[idx2:idx] + s[idx:idx+s[idx:].find('}')+1]
dict_json will be {"max":0,"min":0,"name":"tom","age":18}

converting a string to a line in a json file in python

I am trying to write a json file. The json file should look something like this -
{
"roster":[
{"name":"Andy","age":11},
{"name":"Nathan","age":10},
{"name":"Amy","age":12}
],
"links":[
{"source":1,"target":0,"value":1},
{"source":2,"target":0,"value":8},
{"source":3,"target":0,"value":10}
]
}
I am trying to create the roster part of the json by running through a for loop. In each iteration I am trying to append a line to the json file as follows -
wf = open("abc.json", "w")
wf.write('{\n"Roster":[\n')
for example in data:
    name = ----some code here ----
    group = ----some code here ----
    wf.write('{"name":"'+name+'","group":'+group+'},\n')
I am getting a TypeError: str and int objects cannot be concatenated. I understand why I am getting that error; I was just wondering if there is a better way to do it.
Python is strongly typed, which means:
>>> s = "foo"
>>> s += 123
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: cannot concatenate 'str' and 'int' objects
If you are concatenating strings (which, by the way, are immutable), you need to convert the int to a str:
s = "foo"
s += str(123)
You are getting the error:
typeError - str and int objects cannot be concatenated
because strings and integers cannot be concatenated.
Use str() to cast the integer to a string for concatenation instead.
wf.write('{"name":"'+name+'","group":'+str(group)+'},\n')
Or, even better, read up on string formatting in Python.
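For example, the same line could be built with an f-string (Python 3.6+). The doubled braces {{ and }} produce literal braces, and {group} is converted to text automatically, so no str() call is needed. The values here are hypothetical stand-ins for the question's loop variables.

```python
name = "Andy"  # hypothetical values for illustration
group = 11

# {{ and }} are literal braces; {name} and {group} are substituted
line = f'{{"name":"{name}","group":{group}}},\n'
print(line, end="")  # {"name":"Andy","group":11},
```

Nothing here escapes quotes inside name, which is one reason the json module is the safer choice.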
Or, better still, read up on the Python json module.
Check your string: every piece you concatenate must be a str.
Try:
'{"name":"'+str(name)+'","group":'+str(group)+'},\n'
Or use the json module to help you output the JSON file easily:
import json
output = {}
output['roster'] = [
{'name':'Andy', 'age':11},
{'name': 'Nathan', 'age': 10}
]
print(json.dumps(output))
Output:
{"roster": [{"name": "Andy", "age": 11}, {"name": "Nathan", "age": 10}]}
If you really want to write the json by hand, then the way to get around the concatenation error is to simply typecast the number to a string:
wf.write('{"name" : "'+str(name)+'","group":'+str(group)+'},\n')
It would definitely be advisable to use the json module in Python; it's a pretty great library.
For example, to get the json example you gave us:
{
"roster":[
{"name":"Andy","age":11},
{"name":"Nathan","age":10},
{"name":"Amy","age":12}
],
"links":[
{"source":1,"target":0,"value":1},
{"source":2,"target":0,"value":8},
{"source":3,"target":0,"value":10}
]
}
You could use the following:
import json
result = {}
result['roster'] = []
result['roster'].append({'name' : 'Andy', 'age' : 11})
result['roster'].append({'name' : 'Nathan', 'age' : 10})
result['roster'].append({'name' : 'Amy', 'age' : 12})
result['links'] = []
result['links'].append({'source' : 1, 'target' : 0, 'value' : 1})
...
(Obviously you would probably do this with a for loop instead of by hand in your program).
And then to convert this python dictionary to a json object you just use:
json.dumps(result)
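Putting that together with a for loop and writing straight to the file could look like the sketch below. The people list is a hypothetical stand-in for whatever the question's data loop produces; json.dump handles the quoting and the int-to-number conversion, so no str() calls are needed.

```python
import json

# Hypothetical input standing in for the question's `data` loop.
people = [("Andy", 11), ("Nathan", 10), ("Amy", 12)]

result = {"roster": [], "links": []}
for name, age in people:
    # json takes care of quoting strings and writing ints as JSON numbers
    result["roster"].append({"name": name, "age": age})

with open("abc.json", "w") as wf:
    json.dump(result, wf, indent=2)
```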

How to parse json with ijson and python

I have JSON data as an array of dictionaries which comes as the request payload.
[
{ "Field1": 1, "Feld2": "5" },
{ "Field1": 3, "Feld2": "6" }
]
I tried ijson.items(f, '') which yields the entire JSON object as one single item. Is there a way I can iterate the items inside the array one by one using ijson?
Here is the sample code I tried, which yields the JSON as one single object:
import ijson
f = open("metadatam1.json")
objs = ijson.items(f, '')
for o in objs:
    print(str(o) + "\n")
[{'Feld2': u'5', 'Field1': 1}, {'Feld2': u'6', 'Field1': 3}]
I'm not very familiar with ijson, but from reading some of its code, it looks like calling items with a prefix of "item" should work to get the items of the array, rather than the top-level object:
for item in ijson.items(f, "item"):
    # do stuff with the item dict
    print(item)
