Error converting JSON string to Python dict using repr() - python

I have a sticky problem processing the JSON string below. My goal is to convert the JSON string into a Python dict.
When I prefix my string with r, I can successfully convert the JSON string to a Python dict. Since this string comes from an API request, I need a different solution.
I have tried the (recommended) repr() approach, but it doesn't work.
Not working but ideal
import json
s = '{"success":true,"dashboard":{"id":347,"name":"Revenue","layout_id":0,"layout":"{\"l\":[{\"dimensions\":{\"w\":4,\"h\":5,\"x\":0,\"y\":0,\"i\":\"0\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":9853,\"type\":\"chart\",\"report\":{\"id\":1407}}},{\"dimensions\":{\"w\":4,\"h\":10,\"x\":4,\"y\":0,\"i\":\"1\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":1935,\"type\":\"chart\",\"report\":{\"id\":1408}}},{\"dimensions\":{\"w\":4,\"h\":10,\"x\":8,\"y\":20,\"i\":\"2\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":2717,\"type\":\"chart\",\"report\":{\"id\":1409}}},{\"dimensions\":{\"w\":4,\"h\":10,\"x\":0,\"y\":5,\"i\":\"3\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":9831,\"type\":\"chart\",\"report\":{\"id\":1406}}},{\"dimensions\":{\"w\":3,\"h\":5,\"x\":8,\"y\":0,\"i\":\"4\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":3578,\"type\":\"summary\",\"report\":{\"id\":1414}}},{\"dimensions\":{\"w\":4,\"h\":10,\"x\":6,\"y\":10,\"i\":\"5\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":3125,\"type\":\"chart\",\"report\":{\"id\":1408}}}]}","defaultdash":0,"permission_id":1,"description":null,"created_by":2,"group_id":null,"field1":null,"fieldtype1":null,"field2":null,"fieldtype2":null,"field3":null,"fieldtype3":null,"field4":null,"fieldtype4":null,"field5":null,"fieldtype5":null,"field6":null,"fieldtype6":null,"field7":null,"fieldtype7":null,"field8":null,"fieldtype8":null,"field9":null,"fieldtype9":null,"field10":null,"fieldtype10":null,"field11":null,"fieldtype11":null,"field12":null,"fieldtype12":null,"field13":null,"fieldtype13":null,"field14":null,"fieldtype14":null,"field15":null,"fieldtype15":null,"field16":null,"fieldtype16":null,"field17":null,"fieldtype17":null,"field18":null,"fieldtype18":null,"created_at":"2022-12-05 09:43:09","updated_at":"2023-02-08 08:52:27","deleted_at":null}}'
s1 = repr(s)[1:-1]
s2 = json.loads(s1)
print(type(s2))
Working but not ideal
import json
s = r'{"success":true,"dashboard":{"id":347,"name":"Revenue","layout_id":0,"layout":"{\"l\":[{\"dimensions\":{\"w\":4,\"h\":5,\"x\":0,\"y\":0,\"i\":\"0\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":9853,\"type\":\"chart\",\"report\":{\"id\":1407}}},{\"dimensions\":{\"w\":4,\"h\":10,\"x\":4,\"y\":0,\"i\":\"1\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":1935,\"type\":\"chart\",\"report\":{\"id\":1408}}},{\"dimensions\":{\"w\":4,\"h\":10,\"x\":8,\"y\":20,\"i\":\"2\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":2717,\"type\":\"chart\",\"report\":{\"id\":1409}}},{\"dimensions\":{\"w\":4,\"h\":10,\"x\":0,\"y\":5,\"i\":\"3\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":9831,\"type\":\"chart\",\"report\":{\"id\":1406}}},{\"dimensions\":{\"w\":3,\"h\":5,\"x\":8,\"y\":0,\"i\":\"4\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":3578,\"type\":\"summary\",\"report\":{\"id\":1414}}},{\"dimensions\":{\"w\":4,\"h\":10,\"x\":6,\"y\":10,\"i\":\"5\",\"minW\":3,\"minH\":5,\"moved\":false,\"static\":false},\"widget\":{\"id\":3125,\"type\":\"chart\",\"report\":{\"id\":1408}}}]}","defaultdash":0,"permission_id":1,"description":null,"created_by":2,"group_id":null,"field1":null,"fieldtype1":null,"field2":null,"fieldtype2":null,"field3":null,"fieldtype3":null,"field4":null,"fieldtype4":null,"field5":null,"fieldtype5":null,"field6":null,"fieldtype6":null,"field7":null,"fieldtype7":null,"field8":null,"fieldtype8":null,"field9":null,"fieldtype9":null,"field10":null,"fieldtype10":null,"field11":null,"fieldtype11":null,"field12":null,"fieldtype12":null,"field13":null,"fieldtype13":null,"field14":null,"fieldtype14":null,"field15":null,"fieldtype15":null,"field16":null,"fieldtype16":null,"field17":null,"fieldtype17":null,"field18":null,"fieldtype18":null,"created_at":"2022-12-05 09:43:09","updated_at":"2023-02-08 08:52:27","deleted_at":null}}'
s1 = json.loads(s)
print(type(s1))
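A likely explanation, offered as a sketch rather than a definitive diagnosis: in a normal (non-raw) string literal Python turns each \" into a plain " while compiling the source, so the backslashes are already gone before repr() ever sees the string, and repr() cannot restore them. Text received from an API is never run through that literal-escape step, so it can usually be passed to json.loads() as is. The shortened body below is a hypothetical stand-in for the real response; the r prefix is only there so that the literal holds the same characters the server would send.

```python
import json

# Hypothetical, shortened stand-in for the API response body. The r prefix
# only makes this literal contain real backslashes, as the received text would.
body = r'{"success":true,"dashboard":{"id":347,"layout":"{\"l\":[{\"widget\":{\"id\":9853}}]}"}}'

outer = json.loads(body)                           # first pass: the response itself
layout = json.loads(outer["dashboard"]["layout"])  # second pass: "layout" is a JSON string
print(type(outer), layout["l"][0]["widget"]["id"])

# The same kind of text written WITHOUT the r prefix: the backslashes are
# consumed by Python's literal syntax, so the string is no longer valid JSON.
broken = '{"layout":"{\"l\":1}"}'
repr_round_trip = repr(broken)[1:-1]   # identical to broken, nothing is restored
try:
    json.loads(repr_round_trip)
    repr_fixed_it = True
except json.JSONDecodeError:
    repr_fixed_it = False
```

With the requests library, response.json() or json.loads(response.text) performs the same first pass as json.loads(body) here.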

Related

How to preserve bytes objects in a dataframe column through an HTTP server in Python 3?

I have a pandas dataframe. The 'data' column contains bytes objects (binary files).
df = pd.DataFrame({'file_hash' : [01ccba93f3647ca50..., 739b24dc0dfea....],
'data' : [b'x\x9cd\xbbuT\x1c\xc1\xb7-\xdc\xf8 A\x12\xdc..., b'x\x9c\xcc\xbaeTT\xdf\x1b?z\x08\t\xa5A%$\x15a...]})
Now, I am sending this through an HTTP server:
bytes_obj = zlib.compress((output.to_csv(index=False)).encode())
self.wfile.write(bytes_obj)
While I am able to read the dataframe on the client side,
response = requests.get(url)
response_bytes = response.content
response_dataframe = pd.read_csv(io.BytesIO(zlib.decompress(response_bytes)))
the bytes objects are now strings like "b'x\x9cd\xbbuT\x1c\xc1\xb7-...". If I convert these strings, they become like b'b\'x\\x9cd\\xbbuT\\x1c\\xc1\\xb7
I tried many ways but just cannot get back the exact bytes objects. I would really appreciate some suggestions.
Thanks
It feels stupid but I think I should mention the solution as it can save someone's time.
I just used the literal_eval function from the ast module.
Basically, literal_eval converts a string that looks like a bytes object into a real bytes object. So, the string "b'x\x9cd\xbbuT\x1c\xc1\xb7-..." becomes the bytes object b'x\x9cd\xbbuT\x1c\xc1\xb7-...
In my case,
response_dataframe["data"] = response_dataframe["data"].apply(lambda x: ast.literal_eval(x))
and now the data column contains the bytes objects I needed.
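A minimal, standard-library sketch of that round trip, assuming the CSV cell holds exactly the text Python produces for repr() of a bytes object. The sample bytes are made up for illustration. The second half shows a common alternative that is not part of the original answer: base64-encoding the bytes before they go into the CSV, which avoids evaluating a literal on the client.

```python
import ast
import base64

# Made-up sample standing in for one cell of the 'data' column.
original = b"x\x9cd\xbbuT\x1c\xc1\xb7-\x00\xff"

# Writing the column to CSV stores the textual form of the bytes object.
as_text = repr(original)

# literal_eval safely evaluates that text back into a real bytes object.
restored = ast.literal_eval(as_text)

# Alternative (an assumption, not from the answer above): send base64 text.
encoded = base64.b64encode(original).decode("ascii")  # safe to put in a CSV cell
decoded = base64.b64decode(encoded)                   # client side

print(restored == original, decoded == original)
```

With pandas, the same functions could be applied per cell, for example with .apply(ast.literal_eval) or .apply(base64.b64decode).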

Convert JSON Dict to Pandas Dataframe

I have what appears to be a very simple JSON dict I need to convert into a Pandas dataframe. The dict is being pulled in for me as a string which I have little control over.
{
"data": "[{'key1':'value1'}]"
}
I have tried the usual methods such as pd.read_json() and json_normalize(), but can't seem to get anywhere close. Does anyone have a few different suggestions to try? I think I've seen every error message Python has at this stage.
It seems to me that your JSON data is improperly formatted. The double quotes around the brackets mean that everything inside them is a single string. Essentially, the data is treated as a string and not as an array of values. Remove the double quotes around the brackets, and use double quotes for the inner keys and values, to create an array in your JSON file.
{
"data": [{"key1":"value1"}]
}
This will create the array and allow your JSON to be parsed properly using the methods you mentioned.
The example provided is a single key, but in general you can use pandas to load json and nested json with pd.json_normalize(yourjsonhere)
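If the payload cannot be changed at the source, one possible workaround is to parse it in two steps. This is only a sketch, assuming the inner string always uses Python-style single quotes as in the question: json.loads() handles the outer object, and ast.literal_eval() handles the inner string, which is not valid JSON but is a valid Python literal.

```python
import ast
import json

# The payload from the question, kept as the string it arrives as.
payload = """{"data": "[{'key1':'value1'}]"}"""

outer = json.loads(payload)            # {'data': "[{'key1':'value1'}]"}
rows = ast.literal_eval(outer["data"]) # inner string -> list of dicts

print(rows)
```

The resulting rows list can then be handed to pd.DataFrame(rows) or pd.json_normalize(rows).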

Is there a way to convert list of string formatted dictionary to a dataframe in Python?

I am practicing how to use BeautifulSoup and am currently in a pickle, as I can't convert the results to a dataframe. Hope to get your help.
In this example, the page I want to scrape can be obtained using the following:
from bs4 import BeautifulSoup
import requests
import pandas as pd
page = requests.get("https://store.moncler.com/en-ca/women/autumn-winter/view-all-outerwear?tp=72010&ds_rl=1243188&gclid=EAIaIQobChMIpfDj9bjP5wIVlJOzCh0-9ghJEAAYASAAEgLuSfD_BwE&gclsrc=aw.ds", verify = False)
soup = BeautifulSoup(page.content, 'html.parser')
I have managed to isolate to the product section using the following code
test_class = []
for section_tag in soup.find_all('section', class_='search__products__shelf search__products__shelf--moncler'):
    for test in section_tag.find_all('article'):
        test_class.append(test.get('data-ytos-track-product-data'))
The result of this is a list of string-formatted dictionaries, which looks like the following:
['{"product_position":0,"product_title":"TREPORT","product_brand":"MONCLER","product_category":"3074457345616676837/3074457345616676843","product_micro_category":"Outerwear","product_micro_category_id":"3074457345616676843","product_macro_category":"OUTERWEAR","product_macro_category_id":"3074457345616676837","product_color_id":"Dark blue","product_color":"Dark blue","product_price":0.0,"product_discountedPrice":2530.0,"product_price_tf":"0","product_discountedPrice_tf":"2126.05","product_id":"1890828705323513","product_variant_id":"1890828705323514","list":"searchresult","product_quantity":1,"product_coupon":"","product_cod8":null,"product_cod10":null,"product_legacy_macro_id":"1012","product_legacy_micro_id":"1446","product_is_in_stock":true,"is_rsi_product":false,"rsi_product_tracking_url":""}',
'{"product_position":1,"product_title":"RIMAC","product_brand":"MONCLER","product_category":"3074457345616676837/3074457345616676854","product_micro_category":"Bomber Jacket","product_micro_category_id":"3074457345616676854","product_macro_category":"OUTERWEAR","product_macro_category_id":"3074457345616676837","product_color_id":"Dark blue","product_color":"Dark blue","product_price":0.0,"product_discountedPrice":2340.0,"product_price_tf":"0","product_discountedPrice_tf":"1966.39","product_id":"5549023491788128","product_variant_id":"5549023491788129","list":"searchresult","product_quantity":1,"product_coupon":"","product_cod8":null,"product_cod10":null,"product_legacy_macro_id":"1012","product_legacy_micro_id":"4715","product_is_in_stock":true,"is_rsi_product":false,"rsi_product_tracking_url":""}',
My question is: how do I convert a list of string-formatted dictionaries like that into a pandas dataframe?
I have tried the code below to start with
import ast
ast.literal_eval(test_class[1])
but to no avail (it gives me the error below).
ValueError: malformed node or string: <_ast.Name object at 0x000001985A976748>
The end result should store each key of the dictionary as a column in a DataFrame (i.e. 'product_position', 'product_title', 'product_brand', etc.)
Any help / guidance would be much appreciated.
Thanks.
It looks like the question is really about how to parse a string, not how to do something with pandas.
The list you have seems to contain plain, valid JSON strings. You can convert them to Python dicts using json.loads() from the standard library. Of course, if some strings are malformed, that's another story: you'll have to look up how to parse malformed JSON.
Once you have a list of Python dicts, turning them into a DataFrame is trivial.
You can use json.loads and then instantiate pandas.DataFrame with the obtained list of dictionaries:
d = [json.loads(e) for e in data]
df = pd.DataFrame(d)
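To show why ast.literal_eval fails on these strings while json.loads succeeds, here is a small sketch using shortened, hypothetical versions of the scraped entries. The strings contain the JSON literals null and true, which are not Python literals, so literal_eval sees them as bare names and raises ValueError.

```python
import ast
import json

# Shortened, hypothetical versions of the scraped attribute values.
test_class = [
    '{"product_position":0,"product_title":"TREPORT","product_cod8":null,"product_is_in_stock":true}',
    '{"product_position":1,"product_title":"RIMAC","product_cod8":null,"product_is_in_stock":true}',
]

# null/true are JSON, not Python, so literal_eval rejects the string.
try:
    ast.literal_eval(test_class[0])
    literal_eval_failed = False
except ValueError:
    literal_eval_failed = True

# json.loads maps null -> None and true -> True.
records = [json.loads(entry) for entry in test_class]
print(literal_eval_failed, records[1]["product_title"])
```

Passing records to pd.DataFrame(records) then gives one column per key.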

Python/Pandas: read nested JSON

I am reading a data table from an API that returns me the data in JSON format, and one of the columns is itself a JSON string. I succeed in creating a Pandas dataframe for the overall table, but in the process of reading it, double quotes in the JSON string get converted to single quotes, and I can't parse the nested JSON.
I can't provide a reproducible example, but here is the key code:
myResult = requests.get(myURL, headers = myHeaders).text
myDF = pd.read_json(myResult, orient = "records", dtype = {"custom": str}, encoding = "unicode_escape")
Where custom is the nested JSON string. Try as I might by setting the dtype and encoding arguments, I cannot force Pandas to preserve the double quotes in the string.
So what started off as:
"custom": {"Field1":"Value1","Field2":"Value2"}
gets into the dataframe as:
{'Field1':'Value1','Field2':'Value2'}
I found this question which suggests using a custom parser for read_csv - but I can't see that this option is available for read_json.
I found a few suggestions here but the only one I could try was manually replacing the double quotes - and this causes fresh errors because there are apostrophes contained within the nested field values themselves...
The JSON strings are formatted correctly within myResult so it's the parsing applied by read_json that's the problem. Is there any way to change that or do I need to find some other way of reading this in?
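One approach that may help, sketched under the assumption that custom arrives as a nested object inside each record (as in the snippet above): parse the response with the json module first, then re-serialize only the nested field with json.dumps, so that it stays a valid, double-quoted JSON string. The my_result sample and its id field are made up for illustration.

```python
import json

# Made-up response text; the apostrophe shows why quote replacement breaks.
my_result = '[{"id": 1, "custom": {"Field1": "Value1", "Field2": "It\'s fine"}}]'

records = json.loads(my_result)

# Turn each nested object back into a JSON string with double quotes.
for record in records:
    record["custom"] = json.dumps(record["custom"])

# The string can be parsed again later, apostrophes included.
reparsed = json.loads(records[0]["custom"])
print(records[0]["custom"])
```

The records list could then be passed to pd.DataFrame(records), leaving custom as a string column that json.loads can parse row by row.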

Python JSON dict to dataframe no rows

So after hours of unsuccessful googling I finally decided to post this here.
I am trying to convert some data obtained from an API call to a pandas.DataFrame().
This is my code:
response = requests.get(url)
data_as_list = response.json()['data']
for dct in data_as_list:
    json_df = pd.DataFrame.from_records(dct)
Unfortunately, the returned dataframe only contains the column names, but no row data at all, even though the dictionary has some. I already tried from_dict and pd.read_json() (after dumping it into a JSON string). But all of these had the same result.
The data is a nested dictionary in JSON format and looks like this
You can build a DataFrame directly from a Python list that contains dictionaries (or nested lists), like this:
json_df = pd.DataFrame(data_as_list)
Do this,
pd.DataFrame(data_as_list)
