JSONDecodeError when using for loop in Python [duplicate]

I've been trying to do some API queries to get some missing data in my DF. I'm using the grequests library to send multiple requests and build a list of response objects. Then I use a for loop to load each response as JSON to retrieve the missing data. What I noticed is that loading the data with .json() directly from the list, using the notation list[0].json(), works fine, but when iterating over the list and loading each response as JSON, this error comes up: JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Here's my code:
import requests
import json
import grequests

ls = []
for i in null_data['name']:
    url = 'https://pokeapi.co/api/v2/pokemon/' + i.lower()
    ls.append(url)

rs = (grequests.get(u) for u in ls)
s = grequests.map(rs)

# This line works
print(s[0].json()['weight']/10)

for x in s:
    # This one fails
    js = x.json()
    peso = js['weight']/10
    null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
<ipython-input-21-9f404bc56f66> in <module>
13
14 for x in s:
---> 15 js = x.json()
16 peso = js['weight']/10
17 null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
JSONDecodeError: Expecting value: line 1 column 1 (char 0)

One (or more) of the elements is empty.
So:
...
for x in s:
    if x != "":
        js = x.json()
        peso = js['weight']/10
        null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
...
or
...
for x in s:
    try:
        js = x.json()
        peso = js['weight']/10
        null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso
    except json.JSONDecodeError as ex:
        print("Failed to decode (%s)" % ex)
...
The first checks whether x is an empty string, while the second tries to decode every response but, upon an exception, just prints an error message instead of quitting.
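A variant that inspects the Response object itself may be more robust, since grequests.map() returns None for requests that failed outright and x is a Response rather than a string. A sketch, keeping the original loop:

for x in s:
    # grequests.map() yields None for requests that failed entirely
    if x is None or not x.ok:
        continue
    try:
        js = x.json()
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        print("Skipping non-JSON response from " + x.url)
        continue
    peso = js['weight'] / 10
    null_data.loc[null_data['name'] == i.capitalize(), 'weight_kg'] = peso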

Related

Applying google_translator to a column of a pandas dataframe

I am trying the following. I am applying a detect-and-translate function to a column containing free-text values, which are the professional roles of some customers:
from langdetect import detect
from google_trans_new import google_translator

# simple function to detect and translate text
def detect_and_translate(text, target_lang='en'):
    result_lang = detect(text)
    if result_lang == target_lang:
        return text
    else:
        translator = google_translator()
        translate_text = translator.translate(text, lang_src=result_lang, lang_tgt=target_lang)
        return translate_text

df_processed['Position_Employed'] = df_processed['Position_Employed'].replace({'0': 'unknown', 0: 'unknown'})
df_processed['Position_Employed'] = df_processed['Position_Employed'].apply(detect_and_translate)
But I am getting the following error:
JSONDecodeError: Extra data: line 1 column 433 (char 432)
I have tried the solution from this link, editing line 151 in google_trans_new/google_trans_new.py from response = (decoded_line + ']') to response = decoded_line, but it did not work:
Python google-trans-new translate raises error: JSONDecodeError: Extra data:
What can I do?
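One defensive option while the library bug stands (a sketch, not a fix for google_trans_new itself; returning the untranslated text as a fallback is an assumption) is to wrap the call so that a single bad response does not abort the whole apply:

def detect_and_translate_safe(text, target_lang='en'):
    try:
        return detect_and_translate(text, target_lang)
    except Exception as ex:  # the JSONDecodeError is raised deep inside google_trans_new
        print('Translation failed for %r: %s' % (text, ex))
        return text  # fall back to the untranslated text (assumption)

df_processed['Position_Employed'] = df_processed['Position_Employed'].apply(detect_and_translate_safe)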

How do I combine several JSON API responses into a single variable/object?

I am pulling data in from an API that limits the number of records per request to 100. There are 7274 records in total and everything is returned as JSON.
I want to concatenate all 7274 records into a single variable/object and eventually export to a JSON file.
The response JSON objects are structured like this:
{"data":[{"key1":87,"key2":"Ottawa",..."key21":"ReggieWatts"}],"total":7274}
I just want the objects inside the "data" array so that the output looks like this:
{'key1': 87, 'key2': 'Ottawa', 'key21': 'ReggieWatts'},{'key1': 23, 'key2': 'Cincinnati', 'key21': 'BabeRuth'},...
I’ve tried without success to use the dict.update() method to concatenate the new values to a variable that’s collecting all the records.
I am getting this error:
ValueError: dictionary update sequence element #0 has length 21; 2 is required
Here’s the stripped down code.
import json
import time
import random
import requests
from requests.exceptions import HTTPError

api_request_limit = 100
# total_num_players = 7274
total_num_players = 201  # only using 201 for now so that we aren't hammering the api while testing
start_index = 0
base_api_url = "https://api.nhle.com/stats/rest/en/skater/bios?isAggregate=true&isGame=false&sort=[{%22property%22:%22playerId%22,%22direction%22:%22ASC%22}]&limit=100&factCayenneExp=gamesPlayed%3E=1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20202021%20and%20seasonId%3E=19171918&start="

player_data = {}
curr_data = {}

for curr_start_index in range(start_index, total_num_players, api_request_limit):
    api_url = base_api_url + str(curr_start_index)
    try:
        response = requests.get(api_url)
        # if successful, no exceptions
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
    else:
        # print('Success!')
        curr_data = response.json()
        player_data.update(curr_data['data'])
        # player_data = {**player_data, **curr_data['data']}  # Does not work either
        # print(player_data)
        # for i in curr_skaters['data']:
        #     print(str(i['firstSeasonForGameType']) + ": " + str(i['skaterFullName']) + " " + str(i['playerId']))
    set_delay = (random.random() * 2) + 1
    time.sleep(set_delay)
Should I be iterating through each of the 100 records individually to add them to player_data?
The ValueError implies that the issue is with the number of key:value pairs in each object which says to me I'm using the .update() method incorrectly here.
Thanks
If you want them all in a single dictionary which can be exported to a json file, you'll need to have unique keys for each response. Perhaps the following will accomplish what you want:
response0 = {"data":[{"key1":87,"key2":"Ottawa","key21":"ReggieWatts"}],"total":7274}
response1 = {"data":[{"key1":23,"key2":"Cincinnati","key21":"BabeRuth"}],"total":4555}

all_data = {}
for i, resp in enumerate([response0, response1]):
    all_data[f'resp{i}'] = resp['data'][0]
This returns
all_data = {'resp0': {'key1': 87, 'key2': 'Ottawa', 'key21': 'ReggieWatts'},
'resp1': {'key1': 23, 'key2': 'Cincinnati', 'key21': 'BabeRuth'}}
Edit: I went for a dictionary object initially since I think it saves more naturally as json, but to get it as a python list, you can use the following:
all_data = []
for resp in [response0, response1]:
    all_data.append(resp['data'][0])
Finally, this object is easily saveable as json:
import json

with open('saved_responses.json', 'w') as file:
    json.dump(all_data, file)
I figured it out with a big thanks to William.
I also found this post very helpful.
https://stackoverflow.com/a/26853961/610406
Here's the fix I eventually landed on:
import json
import time
import random
import requests
from requests.exceptions import HTTPError

api_request_limit = 100
# total_num_players = 7274 # skaters
total_num_players = 201  # only using 201 for now so that we aren't hammering the api while testing
start_index = 0
base_api_url_skaters = "https://api.nhle.com/stats/rest/en/skater/bios?isAggregate=true&isGame=false&sort=[{%22property%22:%22playerId%22,%22direction%22:%22ASC%22}]&limit=100&factCayenneExp=gamesPlayed%3E=1&cayenneExp=gameTypeId=2%20and%20seasonId%3C=20202021%20and%20seasonId%3E=19171918&start="

player_data = []  # Needs to be a list.
curr_data = {}

for curr_start_index in range(start_index, total_num_players, api_request_limit):
    api_url = base_api_url_skaters + str(curr_start_index)
    try:
        response = requests.get(api_url)
        # if successful, no exceptions
        response.raise_for_status()
    except HTTPError as http_err:
        print(f'HTTP error occurred: {http_err}')
    except Exception as err:
        print(f'Other error occurred: {err}')
    else:
        # print('Success!')
        curr_data = response.json()
        # *** >>> This line is what I needed! So simple in retrospect. <<< ***
        player_data = [*player_data, *curr_data['data']]
    set_delay = (random.random() * 3) + 1
    time.sleep(set_delay)
    print(f'Counter: {curr_start_index}. Delay: {set_delay}. Record Count: {len(player_data)}.')

with open('nhl_skaters_bios_1917-2021.json', 'w') as f:
    json.dump(player_data, f)
As a Gist:
https://gist.github.com/sspboyd/68ec8f5c5cd15ee7467d4326e3b74111

ValueError: dictionary update sequence element #13 has length 1; 2 is required

I am getting the following error:
ValueError Traceback (most recent call last)
<ipython-input-19-ec485c9b9711> in <module>
31 except Exception as e:
32 print(e)
---> 33 raise e
34 print(i)
35 i = i+1
<ipython-input-19-ec485c9b9711> in <module>
21 # cc = dict(x.split(':') for x in c.split(','))
22 c = '"'.join(c)
---> 23 cc = dict(x.split(':') for x in c.split(','))
24 df_temp = pd.DataFrame(cc.items())
25 df_temp = df_temp.replace('"','',regex=True)
ValueError: dictionary update sequence element #13 has length 1; 2 is required
Below is the block which is throwing the error. I checked out some of the posts here but they are code-specific. Not sure if it is an input issue or the code.
df_final = pd.DataFrame()
i = 1
for file in files:
    try:
        s3 = session.resource('s3')
        key = file
        obj = s3.Object('my-bucket', key)
        n = obj.get()['Body'].read()
        gzipfile = BytesIO(n)
        gzipfile = gzip.GzipFile(fileobj=gzipfile)
        content = gzipfile.read()
        content = content.decode('utf-8')
        if len(content) > 0:
            content = re.findall(r"(?<=\{)(.*?)(?=\})", content)
            for c in content:
                c = c.split('"')
                for index, val in enumerate(c):
                    if index % 2 == 1:
                        c[index] = val.replace(':', '_').replace(',', '_')
                c = '"'.join(c)
                cc = dict(x.split(':') for x in c.split(','))
                df_temp = pd.DataFrame(cc.items())
                df_temp = df_temp.replace('"', '', regex=True)
                df_temp = df_temp.T
                new_header = df_temp.iloc[0]  # grab the first row for the header
                df_temp = df_temp[1:]  # take the data less the header row
                df_temp.columns = new_header
                df_final = pd.concat([df_final, df_temp])
    except Exception as e:
        print(e)
        raise e
    print(i)
    i = i + 1
Can you share what the issue is here? This used to work fine before. Should I make a change or ignore the error?
My guess is that your data is malformed. I'm guessing that at some point, x.split(':') is producing a list with only one element in it because there is no : in x, the string being split. This leads, during the creation of a dictionary from this data, to a single value being passed when a pair of values (for "key" and "value") is expected.
I would suggest that you fire up your debugger and either let the debugger stop when it hits this error, or figure out when it happens and get to a point where you're just about to have the error occur. Then look at the data that's being or about to be processed in your debugger display and see if you can find this malformed data that is causing the problem. You might have to run a prep pass on the data to fix this problem and others like it before running the line that is throwing the exception.
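As a sketch of such a prep pass (assuming malformed fragments can simply be skipped, and reusing the question's variable c), you could split on the first colon only and drop fragments that contain none:

pairs = []
for x in c.split(','):
    parts = x.split(':', 1)  # split on the first colon only
    if len(parts) == 2:
        pairs.append(parts)
    else:
        print('Skipping malformed fragment: %r' % x)
cc = dict(pairs)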

Comparison of parsing data in Ruby using Nokogiri and Python using lxml and xpath

I am trying to convert Ruby syntax into its Python equivalent:
Here is the Ruby code:
collection_action :updateBackPlayers, :method => :get do
  url = "http://fantasyfootball.telegraph.co.uk/premierleague/select-team"
  website = Nokogiri::HTML(open(url))
  elements = website.xpath('//*[@id="list-GK"]/table/tr')
  arr = []
  elements.each do |row|
    x = row.xpath('td')
    name = x[0].text
    club = x[1].text
    @player = Player.find_by_name_and_club(name, club)
    arr = x[7].text.split("|")
    score = arr[1].split("/")
    cards = arr[3].split("/")
    clean_sheets = arr[2].split("/")
    goals = arr[4]
    weekly_points = arr[0].to_i - (@player.points || 0)
    @player.update_attributes(:weekly_points => weekly_points, :points => arr[0].to_i, :value => x[2].text.to_f, :games => score[0].to_i, :part_appearances => score[1], :goals => goals.to_i)
    @player.update_attributes(:yellows => cards[0], :reds => cards[1], :clean_sheets => clean_sheets[0], :part_clean_sheets => clean_sheets[1])
    # @player.penalties_saved = arr[2]
  end
end
This is of course using Nokogiri, but I would like to do the same using lxml and XPath and then save this information in a Django model using SQLite3.
Here is the Python code.
from lxml.html import parse

url = "http://fantasyfootball.telegraph.co.uk/premierleague/select-team"
website = parse(url)
elements = website.xpath('//div[@id="list-GK"]//table/tr')
for row in elements:
    x = row.xpath('td')
    for z in x:
        name = z[1].text
        club = z[2].text
But this doesn't work and I get an out-of-range error. I think the problem is that elements.each do |row| isn't completely the same as for row in elements: in Python.
I get the following error:
Traceback (most recent call last):
File "<stdin>", line 3, in <module>
IndexError: list index out of range
But if I do:
from lxml.html import parse
from fanaments.models import Player

url = "http://fantasyfootball.telegraph.co.uk/premierleague/select-team"
page = parse(url)
elements = page.xpath('//div[@id="list-GK"]//table/tr')
for row in elements:
    x = row.xpath('td')
    for z in x:
        print z.text
I get the following:
None
Hart, J
MCY
4.0
20
11.6%
None
Hart, Joe
MCY|£4.0m|5|0||20|0|0|0|4|0|0|3|0|0||||unknown|
MCY|£4.0m|39|0||140|1|0|0||0||18|0|1|||||
None
de Gea, D
MUN
3.9
15
6.4%
None
de Gea, David
MUN|£3.9m|5|0||15|0|0|0|6|0|0|2|0|0||||unknown|
MUN|£3.9m|33|0||99|0|0|0||0||11|0|0|||||
How can I manage to save the name as name and so forth? Please help, I have been stuck on this too long :)
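For reference, the Ruby loop indexes the td list once per row (x[0], x[1], x[7]) rather than looping over each cell, so a closer Python translation would look like the sketch below; the IndexError comes from treating each td element z as a list. The length guard for header or partial rows is an assumption:

for row in elements:
    x = row.xpath('td')
    if len(x) < 8:
        continue  # skip header or partial rows (assumption)
    name = x[0].text
    club = x[1].text
    arr = x[7].text.split('|')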

Displaying better error message than "No JSON object could be decoded"

Python code to load data from some long complicated JSON file:
with open(filename, "r") as f:
    data = json.loads(f.read())
(note: the better version is:
with open(filename, "r") as f:
    data = json.load(f)
but both exhibit similar behavior)
For many types of JSON error (missing delimiters, incorrect backslashes in strings, etc), this prints a nice helpful message containing the line and column number where the JSON error was found.
However, for other types of JSON error (including the classic "using comma on the last item in a list", but also other things like capitalising true/false), Python's output is just:
Traceback (most recent call last):
File "myfile.py", line 8, in myfunction
config = json.loads(f.read())
File "c:\python27\lib\json\__init__.py", line 326, in loads
return _default_decoder.decode(s)
File "c:\python27\lib\json\decoder.py", line 360, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
File "c:\python27\lib\json\decoder.py", line 378, in raw_decode
raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
For that type of ValueError, how do you get Python to tell you where is the error in the JSON file?
I've found that the simplejson module gives more descriptive errors in many cases where the built-in json module is vague. For instance, for the case of having a comma after the last item in a list:
json.loads('[1,2,]')
....
ValueError: No JSON object could be decoded
which is not very descriptive. The same operation with simplejson:
simplejson.loads('[1,2,]')
...
simplejson.decoder.JSONDecodeError: Expecting object: line 1 column 5 (char 5)
Much better! Likewise for other common errors like capitalizing True.
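If you need the position programmatically rather than in the message text, simplejson's JSONDecodeError also exposes it as attributes; a short sketch:

import simplejson

try:
    simplejson.loads('[1,2,]')
except simplejson.JSONDecodeError as e:
    # lineno/colno/pos locate the first point where decoding failed
    print('%s at line %d column %d (char %d)' % (e.msg, e.lineno, e.colno, e.pos))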
You won't be able to get Python to tell you where the JSON is incorrect. You will need to use an online linter like this.
This will show you the errors in the JSON you are trying to decode.
You could try the rson library found here: http://code.google.com/p/rson/ . It is also up on PyPI: https://pypi.python.org/pypi/rson/0.9 so you can use easy_install or pip to get it.
For the example given by Tom:
>>> rson.loads('[1,2,]')
...
rson.base.tokenizer.RSONDecodeError: Unexpected trailing comma: line 1, column 6, text ']'
RSON is a designed to be a superset of JSON, so it can parse JSON files. It also has an alternate syntax which is much nicer for humans to look at and edit. I use it quite a bit for input files.
As for the capitalizing of boolean values: it appears that rson reads incorrectly capitalized booleans as strings.
>>> rson.loads('[true,False]')
[True, u'False']
I had a similar problem and it was due to single quotes. The JSON standard (http://json.org) talks only about using double quotes, so it must be that the Python json library supports only double quotes.
For my particular version of this problem, I went ahead and searched the function declaration of load_json_file(path) within the packaging.py file, then smuggled a print line into it:
def load_json_file(path):
    data = open(path, 'r').read()
    print data
    try:
        return Bunch(json.loads(data))
    except ValueError, e:
        raise MalformedJsonFileError('%s when reading "%s"' % (str(e), path))
That way it would print the content of the json file before entering the try-catch, and that way – even with my barely existing Python knowledge – I was able to quickly figure out why my configuration couldn't read the json file.
(It was because I had set up my text editor to write a UTF-8 BOM … stupid)
Just mentioning this because, while maybe not a good answer to the OP's specific problem, this was a rather quick method of determining the source of a very oppressive bug. And I bet that many people will stumble upon this article while searching for a more verbose solution to a MalformedJsonFileError: No JSON object could be decoded when reading …. So this might help them.
In my case, the JSON file was very large, and using the standard json module in Python produced the above error.
After installing simplejson with sudo pip install simplejson, the problem was solved.
import json
import simplejson

def test_parse_json():
    f_path = '/home/hello/_data.json'
    with open(f_path) as f:
        # j_data = json.load(f)  # ValueError: No JSON object could be decoded
        j_data = simplejson.load(f)  # right
    lst_img = j_data['images']['image']
    print lst_img[0]

if __name__ == '__main__':
    test_parse_json()
I had a similar problem; this was my code:
json_file=json.dumps(pyJson)
file = open("list.json",'w')
file.write(json_file)
json_file = open("list.json","r")
json_decoded = json.load(json_file)
print json_decoded
The problem was I had forgotten to call file.close(). I did that and it fixed the problem.
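Using with blocks sidesteps this class of bug entirely, since the file is flushed and closed when the block exits. A minimal sketch of the same round trip:

import json

with open("list.json", 'w') as f:
    json.dump(pyJson, f)  # flushed and closed on exiting the block

with open("list.json", "r") as f:
    json_decoded = json.load(f)

print(json_decoded)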
The accepted answer is the easiest way to fix the problem. But in case you are not allowed to install simplejson due to your company policy, I propose the solution below to fix the particular issue of "using a comma after the last item in a list":
Create a child class JSONLintCheck that inherits from JSONDecoder and overrides its __init__ method like below:
class JSONLintCheck(JSONDecoder):
    def __init__(self, encoding=None, object_hook=None, parse_float=None,
                 parse_int=None, parse_constant=None, strict=True,
                 object_pairs_hook=None):
        super(JSONLintCheck, self).__init__(encoding=encoding, object_hook=object_hook,
                                            parse_float=parse_float, parse_int=parse_int,
                                            parse_constant=parse_constant, strict=strict,
                                            object_pairs_hook=object_pairs_hook)
        self.scan_once = make_scanner(self)
make_scanner is a new function used to override the scan_once method of the above class. Here is the code for it:
#!/usr/bin/env python
from json import JSONDecoder
from json import decoder
import re

NUMBER_RE = re.compile(
    r'(-?(?:0|[1-9]\d*))(\.\d+)?([eE][-+]?\d+)?',
    (re.VERBOSE | re.MULTILINE | re.DOTALL))

def py_make_scanner(context):
    parse_object = context.parse_object
    parse_array = context.parse_array
    parse_string = context.parse_string
    match_number = NUMBER_RE.match
    encoding = context.encoding
    strict = context.strict
    parse_float = context.parse_float
    parse_int = context.parse_int
    parse_constant = context.parse_constant
    object_hook = context.object_hook
    object_pairs_hook = context.object_pairs_hook

    def _scan_once(string, idx):
        try:
            nextchar = string[idx]
        except IndexError:
            raise ValueError(decoder.errmsg("Could not get the next character", string, idx))
            # raise StopIteration

        if nextchar == '"':
            return parse_string(string, idx + 1, encoding, strict)
        elif nextchar == '{':
            return parse_object((string, idx + 1), encoding, strict,
                                _scan_once, object_hook, object_pairs_hook)
        elif nextchar == '[':
            return parse_array((string, idx + 1), _scan_once)
        elif nextchar == 'n' and string[idx:idx + 4] == 'null':
            return None, idx + 4
        elif nextchar == 't' and string[idx:idx + 4] == 'true':
            return True, idx + 4
        elif nextchar == 'f' and string[idx:idx + 5] == 'false':
            return False, idx + 5

        m = match_number(string, idx)
        if m is not None:
            integer, frac, exp = m.groups()
            if frac or exp:
                res = parse_float(integer + (frac or '') + (exp or ''))
            else:
                res = parse_int(integer)
            return res, m.end()
        elif nextchar == 'N' and string[idx:idx + 3] == 'NaN':
            return parse_constant('NaN'), idx + 3
        elif nextchar == 'I' and string[idx:idx + 8] == 'Infinity':
            return parse_constant('Infinity'), idx + 8
        elif nextchar == '-' and string[idx:idx + 9] == '-Infinity':
            return parse_constant('-Infinity'), idx + 9
        else:
            # raise StopIteration  # Here is where it needs modification
            raise ValueError(decoder.errmsg("Expecting property name enclosed in double quotes", string, idx))

    return _scan_once

make_scanner = py_make_scanner
It's best to put the make_scanner function and the new child class into the same file.
Just hit the same issue, and in my case the problem was related to a BOM (byte order mark) at the beginning of the file.
json.tool would refuse to process even an empty file (just curly braces) until I removed the UTF BOM mark.
What I did was:
opened my JSON file with vim,
removed the byte order mark (set nobomb),
saved the file.
This resolved the problem with json.tool. Hope this helps!
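The same fix can be done from Python itself: the utf-8-sig codec strips a leading BOM transparently when reading. A minimal sketch:

import io
import json

# utf-8-sig decodes plain UTF-8 too, but discards a leading BOM if present
with io.open(filename, 'r', encoding='utf-8-sig') as f:
    data = json.load(f)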
When your file is created, instead of leaving its content empty, write an empty JSON object into it:
json.dump({}, file)
You could use cjson, which claims to be up to 250 times faster than pure-Python implementations, given that you have "some long complicated JSON file" and you will probably need to run it several times (decoders fail and report only the first error they encounter).
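A minimal sketch, assuming the python-cjson package and its decode function:

import cjson  # from the python-cjson package (Python 2)

with open(filename) as f:
    data = cjson.decode(f.read())  # raises cjson.DecodeError on bad input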
