Loading python string with 'u' as json - python

I have the following two strings:
json_string = '{u"favorited": false, u"contributors": null}'
json_string1 = '{"favorited": false, "contributors": null}'
The following json load works fine.
json.loads(json_string1 )
But the following call raises a ValueError. How can I fix this?
json.loads(json_string)
ValueError: Expecting property name: line 1 column 2 (char 1)

I faced the same problem with strings I received from a customer. The strings arrived with u's. I found a workaround using the ast package:
import ast
import json
my_str='{u"favorited": false, u"contributors": null}'
my_str=my_str.replace('"',"'")
my_str=my_str.replace(': false',': False')
my_str=my_str.replace(': null',': None')
my_str = ast.literal_eval(my_str)
my_dumps=json.dumps(my_str)
my_json=json.loads(my_dumps)
Note the replacement of "false" and "null" with "False" and "None": literal_eval only recognizes Python literal structures. You may need more replacements in your code (for example "true" to "True"), depending on the strings you receive.

You could remove the u prefix from the string using a regex and then load the JSON:
import json
import re
s = '{u"favorited": false, u"contributors": null}'
json_string = re.sub(r'(\W)\s*u"', r'\1"', s)
json.loads(json_string)

Use json.dumps to convert a Python dictionary to a string, not str. Then you can expect json.loads to work:
Incorrect:
>>> D = {u"favorited": False, u"contributors": None}
>>> s = str(D)
>>> s
"{u'favorited': False, u'contributors': None}"
>>> json.loads(s)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "D:\dev\Python27\lib\json\__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "D:\dev\Python27\lib\json\decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "D:\dev\Python27\lib\json\decoder.py", line 380, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting property name: line 1 column 2 (char 1)
Correct:
>>> D = {u"favorited": False, u"contributors": None}
>>> s = json.dumps(D)
>>> s
'{"favorited": false, "contributors": null}'
>>> json.loads(s)
{u'favorited': False, u'contributors': None}

Related

How to pull json data from HTML into a python dictionary?

I am trying to extract structured data from a JSON-like statement inside an HTML page. I retrieved the HTML and tried to load the script content via XPath:
json.loads(response.xpath('//*[@id="product"]/script[2]/text()').extract_first())
The data starts like this:
response.xpath('//*[@id="product"]/script[2]/text()').extract_first()
"\r\ndataLayer.push({\r\n\t'event': 'EECproductDetailView',\r\n\t'ecommerce': {\r\n\t\t'detail': {\r\n\r\n\t\t\t'products': [{\r\n\t\t\t\t'id': '14171171',\r\n\t\t\t\t'name': 'Gingium 120mg',\r\n\t\t\t\t'price': '27.9',\r\n\r\n\t\t\t\t'brand': 'Hexal AG',\r\n\r\n\r\n\t\t\t\t'variant': 'Filmtabletten, 60 Stück, N2',\r\n\r\n\r\n\t\t\t\t'category': 'gedaechtnis-konzentration'\r\n\t\t\t}]\r\n\t\t}\r\n\t}\r\n});\r\n"
Sample structured json:
<script>
dataLayer.push({
'event': 'EECproductDetailView',
'ecommerce': {
'detail': {
'products': [{
'id': '14122171',
'name': 'test',
'price': '27.9'
}]
}
}
});
</script>
The error message is:
>>> json.loads(response.xpath('//*[@id="product"]/script[2]/text()').extract_first())
Traceback (most recent call last):
File "<console>", line 1, in <module>
File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
return _default_decoder.decode(s)
File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/Cellar/python/3.7.1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 355, in raw_decode
raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 2 column 1 (char 2)
I also tried to decode:
>>> json.loads(response.xpath('//*[@id="product"]/script[2]/text()').extract_first().decode("utf-8"))
Traceback (most recent call last):
File "<console>", line 1, in <module>
AttributeError: 'str' object has no attribute 'decode'
>>>
How can I pull the product data into a python dictionary?
There are several issues with your approach, discussed below. You want to parse the value passed to the push function as JSON, and you have this as input:
dataLayer.push({
'event': 'EECproductDetailView',
'ecommerce': {
'detail': {
'products': [{
'id': '14122171',
'name': 'test',
'price': '27.9'
}]
}
}
});
Issues:
This data is raw, so you shouldn't pass it directly to json.loads. First grab the {'event' .... } part of your string via a regex or string slicing. For example, if the data always has this format and the script contains no other braces outside the object, take the index of the first { and the last } and slice out the substring between them.
This data uses ' as the string delimiter, but the JSON standard requires double quotes ". You need to replace them as well.
After resolving these issues you can use json.loads to parse your input.
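As a rough sketch of the two steps above, the slicing and quote replacement could look like the following. It assumes the script always has the dataLayer.push({...}); shape and that no key or value contains an apostrophe or a double quote. The function name extract_pushed_object and the script_text sample are made up for illustration.

```python
import json


def extract_pushed_object(script_text):
    """Slice out the outermost {...} and parse it as JSON (hypothetical helper)."""
    start = script_text.index("{")    # first opening brace
    end = script_text.rindex("}")     # last closing brace
    payload = script_text[start:end + 1]
    # Assumption: single quotes only ever delimit strings in this payload.
    payload = payload.replace("'", '"')
    return json.loads(payload)


# Sample shaped like the script content from the question.
script_text = (
    "\r\ndataLayer.push({\r\n\t'event': 'EECproductDetailView',\r\n"
    "\t'ecommerce': {\r\n\t\t'detail': {\r\n\t\t\t'products': [{\r\n"
    "\t\t\t\t'id': '14122171',\r\n\t\t\t\t'name': 'test',\r\n"
    "\t\t\t\t'price': '27.9'\r\n\t\t\t}]\r\n\t\t}\r\n\t}\r\n});\r\n"
)

data = extract_pushed_object(script_text)
product = data["ecommerce"]["detail"]["products"][0]
print(product["name"], product["price"])
```

If a value can contain an apostrophe (a product name, for instance), the blanket quote replacement breaks and a more careful parser would be needed.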

How to deserialize a string which has a quote in a string value?

I have the following string I need to deserialize:
{
"bw": 20,
"center_freq": 2437,
"channel": 6,
"essid": "DIRECT-sB47" Philips 6198",
"freq": 2437
}
This is almost a correct JSON, except for the quote in the value DIRECT-sB47" Philips 6198 which prematurely ends the string, breaking the rest of the JSON.
Is there a way to deserialize elements which have the pattern
"key": "something which includes a quote",
or should I try to first pre-process the string with a regex to remove that quote (I do not care about it, nor about any other weird characters in the keys or values)?
UPDATE: sorry for not posting the code (it is a standard deserialization via json). The code is also available at repl.it
import json
data = '''
{
"bw": 20,
"center_freq": 2437,
"channel": 6,
"essid": "DIRECT-sB47" Philips 6198",
"freq": 2437
}
'''
trans = json.loads(data)
print(trans)
The traceback:
Traceback (most recent call last):
File "main.py", line 12, in <module>
trans = json.loads(data)
File "/usr/local/lib/python3.6/json/__init__.py", line 354, in loads
return _default_decoder.decode(s)
File "/usr/local/lib/python3.6/json/decoder.py", line 339, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/local/lib/python3.6/json/decoder.py", line 355, in raw_decode
obj, end = self.scan_once(s, idx)
json.decoder.JSONDecodeError: Expecting ',' delimiter: line 6 column 26 (char 79)
The same code without the quote works fine
import json
data = '''
{
"bw": 20,
"center_freq": 2437,
"channel": 6,
"essid": "DIRECT-sB47 Philips 6198",
"freq": 2437
}
'''
trans = json.loads(data)
print(trans)
COMMENT: I realize that the provider of the JSON should fix their code (I opened a bug report with them). In the meantime, until the bug is fixed (if it is) I would like to try a workaround.
I ended up analyzing the exception which includes the place of the faulty character, removing it and deserializing again (in a loop).
Worst case the whole data string is swallowed, which in my case is better than crashing.
import json
import re
data = '''
{
"bw": 20,
"center_freq": 2437,
"channel": 6,
"essid": "DIRECT-sB47" Philips 6198",
"freq": 2437
}
'''
while True:
    try:
        trans = json.loads(data)
    except json.decoder.JSONDecodeError as e:
        # For this data the reported offset is two characters past the stray quote (quote, then space).
        s = int(re.search(r"\.*char (\d+)", str(e)).group(1))-2
        print(f"incorrect character at position {s}, removing")
        data = data[:s] + data[(s + 1):]
    else:
        break
print(trans)
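A possible variant of the loop above, offered only as a sketch: JSONDecodeError exposes the error offset as its pos attribute, so the message does not have to be parsed with a regex. This version deletes the nearest double quote before that offset and gives up after a fixed number of attempts. The function name loads_dropping_stray_quotes and the max_repairs parameter are invented for this example, and the heuristic can still mangle data whose error has a different cause.

```python
import json


def loads_dropping_stray_quotes(text, max_repairs=10):
    """Parse JSON, deleting the double quote nearest before each error offset."""
    for _ in range(max_repairs):
        try:
            return json.loads(text)
        except json.JSONDecodeError as err:
            # err.pos is the index where the decoder gave up.
            quote_at = text.rfind('"', 0, err.pos)
            if quote_at == -1:
                raise  # no quote left to remove, so re-raise the error
            text = text[:quote_at] + text[quote_at + 1:]
    return json.loads(text)  # final attempt; raises if still broken


data = '''
{
"bw": 20,
"center_freq": 2437,
"channel": 6,
"essid": "DIRECT-sB47" Philips 6198",
"freq": 2437
}
'''

trans = loads_dropping_stray_quotes(data)
print(trans["essid"])
```

The retry cap means a badly damaged payload ends in an exception rather than in a long loop that slowly eats the string.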

Python json parse ValueError: No JSON object could be decoded [duplicate]

I'm actually having trouble with a JSON string in Python. I've written a script that uses the find-my-iphone Python module, and this module gives me this string as output:
{u'locationType': u'', u'altitude': 0.0, u'locationFinished': True, u'longitude': 7.340714223689717, u'positionType': u'GPS', u'floorLevel': 0, u'timeStamp': 1497518502892L, u'latitude': 47.81268700030429, u'isOld': False, u'isInaccurate': False, u'verticalAccuracy': 0.0, u'horizontalAccuracy': 50.0}
After a bit of processing with:
loc = api.devices[deviceID].location()
locstr = str(loc).replace("u'",'"').replace("'",'"') #.replace("}","")
I obtain a string that looks like this:
{"locationType": "", "altitude": 0.0, "locationFinished": False, "longitude": 7.340450948111099, "positionType": "GPS", "floorLevel": 0, "timeStamp": 1497518436368L, "latitude": 47.81275740829093, "isOld": False, "isInaccurate": False, "verticalAccuracy": 0.0, "horizontalAccuracy": 100.0}
Here is my code:
from pyicloud import PyiCloudService
from geopy.distance import vincenty
import json
import sys
api = PyiCloudService('*****.*****@free.fr','******')
deviceID = u"Qo+Jyvct3IIl7N3MXrz6LfDvm8qjDCHjkedOvse1mhzWf1sikvSFQOHYVNSUzmWV" # Needed
deviceNAME = "<AppleDevice(iPhone 5s: David Smartphone)>" # Just an help
api.devices[deviceID].location()
api.devices[deviceID].status()
loc = api.devices[deviceID].location()
locstr = str(loc).replace("u'",'"').replace("'",'"') #.replace("}","")
But when I try to use
json.loads(locstr)
Python gives me:
Traceback (most recent call last):
File "distancePAPA.py", line 19, in <module>
t = json.loads(locstr)
File "/usr/lib/python2.7/json/__init__.py", line 339, in loads
return _default_decoder.decode(s)
File "/usr/lib/python2.7/json/decoder.py", line 364, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
I don't really know what I did wrong, so I'm asking for help.
PS1: I really just need the GPS coordinates.
PS2: I'm French, so sorry for any mistakes.
>>> import json
>>> loc = api.devices[deviceID].location()
>>> locstr = json.dumps(loc)
>>> locstr
'{"horizontalAccuracy": 50.0, "floorLevel": 0, "isOld": false, "isInaccurate": false, "verticalAccuracy": 0.0, "timeStamp": 1497518502892, "altitude": 0.0, "locationFinished": true, "longitude": 7.340714223689717, "positionType": "GPS", "locationType": "", "latitude": 47.81268700030429}'
>>> json.loads(locstr)
{u'timeStamp': 1497518502892, u'altitude': 0.0, u'locationFinished': True, u'longitude': 7.340714223689717, u'horizontalAccuracy': 50.0, u'floorLevel': 0, u'locationType': u'', u'latitude': 47.81268700030429, u'isOld': False, u'isInaccurate': False, u'verticalAccuracy': 0.0, u'positionType': u'GPS'}
Use the ast module:
import ast
loc_str = ast.literal_eval(str(loc))
JSON does not support the Python literals True and False, nor the L suffix in 1497518436368L. To make the string valid JSON, we need to convert those as well:
s = locstr.replace('True', 'true').replace('False', 'false').replace('L,', ',')
Now we can use json.loads(s).
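On Python 3 the two ideas above can be combined, as a sketch: ast.literal_eval accepts the u'...' prefixes and True/False directly, so only the Python 2 long suffix L has to be stripped before evaluating. The raw string below is a shortened copy of the module output from the question, and the regex assumes that no string value contains a digit immediately followed by L.

```python
import ast
import json
import re

# Shortened sample of the repr() text shown in the question.
raw = (
    "{u'locationType': u'', u'altitude': 0.0, u'locationFinished': True, "
    "u'timeStamp': 1497518502892L, u'latitude': 47.81268700030429, "
    "u'longitude': 7.340714223689717, u'isOld': False}"
)

# Python 3 rejects the long suffix, so drop an L that directly follows a digit.
cleaned = re.sub(r"(?<=\d)L\b", "", raw)

loc = ast.literal_eval(cleaned)   # safe evaluation of a Python literal
locstr = json.dumps(loc)          # now a valid JSON document

print(loc["latitude"], loc["longitude"])
```

When the original dictionary is still available, calling json.dumps on it directly avoids the string round trip altogether.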

Json error : "end is out of bounds"?

I'm having an issue I can't understand. I'm sending JSON data as a string via Redis (as a queue), and the receiver is throwing the following error:
[ERROR JSON (in queue)] - {"ip": null, "domain": "somedomain.com", "name": "Some user name", "contact_id": 12345, "signature":
"6f496a4eaba2c1ea4e371ea2c4951ad92f41ddf45ff4949ffa761b0648a22e38"} => end is out of bounds
The code that throws the exception is the following :
try:
    item = json.loads(item[1])
except ValueError as e:
    sys.stderr.write("[ERROR JSON (in queue)] - {1} => {0}\n".format(str(e), str(item)))
    return None
What is really odd, is that if I open a python console and do the following :
>>> import json
>>> s = '{"ip": null, "domain": "somedomain.com", "name": "Some user name", "contact_id": 12345, "signature": "6f496a4eaba2c1ea4e371ea2c4951ad92f41ddf45ff4949ffa761b0648a22e38"}'
>>> print s
I have no issue: the string (copy/pasted into the Python console) yields no errors at all, but my original code throws one!
Do you have any idea about what is causing the issue?
You are loading item[1], which is the second character of the string item:
>>> json.loads('"')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: end is out of bounds
You should write:
item = json.loads(item)
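To make the difference concrete, here is a small Python 3 sketch. It assumes item is already the JSON text itself; the sample payload is a shortened, made-up version of the one in the question. On Python 3 the error message differs from the Python 2 one shown above, but the cause is the same.

```python
import json

# Assumption: the queue handed us the JSON text directly as a string.
item = '{"ip": null, "domain": "somedomain.com", "contact_id": 12345}'

# Indexing a string gives a single character, here the opening double quote.
second_char = item[1]

try:
    json.loads(second_char)
    single_char_failed = False
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    single_char_failed = True

# Passing the whole string works.
record = json.loads(item)
print(repr(second_char), single_char_failed, record["domain"])
```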

Python json.loads ValueError, expecting delimiter

I am extracting a postgres table as json. The output file contains lines like:
{"data": {"test": 1, "hello": "I have \" !"}, "id": 4}
Now I need to load them in my python code using json.loads, but I get this error:
Traceback (most recent call last):
File "test.py", line 33, in <module>
print json.loads('''{"id": 4, "data": {"test": 1, "hello": "I have \" !"}}''')
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/__init__.py", line 338, in loads
return _default_decoder.decode(s)
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 365, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "/System/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/json/decoder.py", line 381, in raw_decode
obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 50 (char 49)
I figured out the fix is to add another \ to \". So, if I pass
{"data": {"test": 1, "hello": "I have \\" !"}, "id": 4}
to json.loads, I get this:
{u'data': {u'test': 1, u'hello': u'I have " !'}, u'id': 4}
Is there a way to do this without adding the extra \? Like passing a parameter to json.loads or something?
You can use so-called “raw strings”:
>>> print r'{"data": {"test": 1, "hello": "I have \" !"}, "id": 4}'
{"data": {"test": 1, "hello": "I have \" !"}, "id": 4}
They don’t interpret the backslashes.
Ordinary strings turn \" into ", so you can have " characters in strings that are themselves delimited by double quotes:
>>> "foo\"bar"
'foo"bar'
So the transformation from \" to " is not done by json.loads, but by Python itself.
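A short Python 3 check of this explanation, using the line from the question: the ordinary literal loses its backslash before json.loads ever sees it, while the raw literal keeps it. The variable names are chosen for this example only.

```python
import json

# Same source text, once as an ordinary literal and once as a raw literal.
plain = '{"data": {"test": 1, "hello": "I have \" !"}, "id": 4}'
raw = r'{"data": {"test": 1, "hello": "I have \" !"}, "id": 4}'

# The raw literal is one character longer: it still contains the backslash.
print(len(raw) - len(plain))  # 1

parsed = json.loads(raw)

try:
    json.loads(plain)
    plain_ok = True
except ValueError:
    # Without the backslash the inner quote ends the JSON string early.
    plain_ok = False

print(parsed["data"]["hello"], plain_ok)
```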
Try this:
json.loads(r'{"data": {"test": 1, "hello": "I have \" !"}, "id": 4}')
If you already have that string in a variable (for example, a line read from the file), the backslash is already part of the data, so no extra escaping is needed:
json.loads(data)
Hope it helps!
Try using a raw triple-quoted string (r"""), so there is no need to worry about the backslashes.
json_string = r"""
{
"jsonObj": []
}
"""
data = json.loads(json_string)
Try source.replace('""', '') or an equivalent re.sub, because "" in the source prevents json.loads(source) from telling where a string ends.
In my case, I wrote:
STRING.replace("': '", '": "').replace("', '", '", "').replace("{'", '{"').replace("'}", '"}').replace("': \"", '": "').replace("', \"", '", "').replace("\", '", '", "').replace("'", '\\"')
and it works like a charm.
