JSONDecodeError: Extra data: Python - python

I am loading JSON from files using this code:
from json import loads
file = 'file_name'
obj_list = []
with open(file) as f:
    for json_obj in f:
        obj_list.append(loads(json_obj))
I get this error:
JSONDecodeError: Extra data: line 1 column 21 (char 20)
All my files look like this, but much larger:
{"some":"property2"}{"some":"property"}{"some":"property3"}
Is there a way to parse this in Python for a large number of files?

Your JSON is not valid. It should look something like this:
[{"some": "property2"}, {"some": "property"}, {"some": "property3"}]

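import json
with open(file, 'r') as f:
    # Insert the missing commas between adjacent objects, then wrap in [].
    json_str = '[' + f.read().replace('}{', '},{') + ']'
obj_list = json.loads(json_str)
This reads the content, inserts commas between adjacent objects, wraps the result in [] to make it valid JSON, and then loads it with the json package. Note that the simple replace would also alter any string value that happens to contain }{.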
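A more robust alternative, sketched here under the assumption that each file holds JSON values written back to back, is to walk the text with json.JSONDecoder.raw_decode, which parses one value and reports the index where it ended. The helper name iter_concatenated_json and the sample string are illustrative and not part of the original answer.

```python
import json


def iter_concatenated_json(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    length = len(text)
    while pos < length:
        # raw_decode does not skip leading whitespace, so do it by hand.
        while pos < length and text[pos].isspace():
            pos += 1
        if pos >= length:
            break
        # Parse one value starting at pos; end is the index just past it.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


sample = '{"some":"property2"}{"some":"property"}{"some":"property3"}'
obj_list = list(iter_concatenated_json(sample))
```

For a real file, the same helper could be called on f.read(). Because it never rewrites the text, string values containing }{ are left untouched.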

Related

How to replace table tag with comma or semicolon in python list

I am posting to WordPress from a Python list read from a CSV file, and I need the result as a WordPress table. How can I remove the [] and " characters and separate the items in my Python list with commas?
import csv
html_output = ''
names = []
# 'my_list.csv' contains many rows of data
with open('my_list.csv', 'r') as data_file:
    csv_data = csv.DictReader(data_file)
    # 'Ingredients' is the header name
    for line in csv_data:
        if line['Ingredients'] == 'No Reward':
            break
        names.append(f"{line['Ingredients']}")
html_output += '\n<!-- wp:list -->'
for name in names:
    html_output += f'\n\t <ul> <li>{name}</li> </ul>'
html_output += '\n<!-- /wp:list -->'
print(html_output)
When I post this Python list to WordPress, I get an error and the list is not rendered correctly in the HTML table.
Use ast.literal_eval() to parse the CSV column into a Python list. Then you can format the list however you like.
import ast
import csv
names = []
with open('my_list.csv', 'r') as data_file:
    csv_data = csv.DictReader(data_file)
    # 'Ingredients' is the header name
    for line in csv_data:
        if line['Ingredients'] == 'No Reward':
            break
        ingredients = ast.literal_eval(line['Ingredients'])
        names.append(', '.join(ingredients))
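To see what ast.literal_eval() does to a single cell, here is a minimal sketch. It assumes the Ingredients column stores a Python list literal as text; the cell value below is made up for illustration.

```python
import ast

# Hypothetical cell value: a Python list literal stored as text in the CSV.
cell = "['flour', 'sugar', 'eggs']"

ingredients = ast.literal_eval(cell)  # parsed into a real list of strings
joined = ', '.join(ingredients)       # the brackets and quotes are gone
list_item = f'<li>{joined}</li>'      # ready to embed in the HTML output
```

If a cell is not a valid Python literal, ast.literal_eval() raises an exception (typically ValueError or SyntaxError), so rows with plain text may need separate handling.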

Python API gets invalid json

I started using Python a few days ago and can't work out the JSON format.
I use requests to get JSON data from an API. However, the JSON I save is malformed (a JSON validator finds errors).
import requests
webpage = 'https://parser-api.com/parser/arbitr_api/run.php'
API = 'cant post it'  # key withheld
output_results = []
cases = ['А65-22925/2017']
for i in cases:
    params = {'key': API, 'CaseNumber': i}
    results = requests.get(webpage, params=params)
    output_results.append(results.text)
print(output_results)
with open('file_name_case.json', 'w', encoding='utf8') as wr:
    wr.write(str(output_results))
This is a snippet of the output I get, which is wrong:
['{"Cases":[{"CaseId":"998ecaef-3da8-45ab-9f56-90bfc3375e11","CaseNumber":"\\u041065-22925\\/2017","CaseType":"\\u0410","Thirds":[],"Plaintiffs":[{"Name":"\\u041e\\u041e\\u041e \\"\\u0412\\u0420-\\u041f\\u043b\\u0430\\u0441\\u0442\\", \\u0433.\\u041a\\u0430\\u0437\\u0430\\u043d\\u044c","Address":"421001, \\u0420\\u043e\\u0441\\u0441\\u0438\\u044f, \\u0433.\\u041a\\u0430\\u0437\\u0430\\u043d\\u044c, \\u0420\\u0422, \\u0443\\u043b.\\u0421.\\u0425\\u0430\\u043a\\u0438\\u043c\\u0430, \\u0434.60, \\u043e\\u0444\\u0438\\u0441 164","Id":"dc26df83-6361-4de0-bc93-8c20ae0a4417"}],"Respondents":[{"Name":"\\u0424\\u0435\\u0434\\u0435\\u0440\\u0430\\u043b\\u044c\\u043d\\u0430\\u044f \\u0422\\u0430\\u043c\\u043e\\u0436\\u0435\\u043d\\u043d\\u0430\\u044f \\u0441\\u043b\\u0443\\u0436\\u0431\\u0430 \\u041f\\u0440\\u0438\\u0432\\u043e\\u043b\\u0436\\u0441\\u043a\\u043e\\u0435 \\u0442\\u0430\\u043c\\u043e\\u0436\\u0435\\u043d\\u043d\\u043e\\u0435 \\u0443\\u043f\\u0440\\u0430\\u0432\\u043b\\u0435\\u043d\\u0438\\u0435 \\u0422\\u0430\\u0442\\u0430\\u0440\\u0441\\u0442\\u0430\\u043d\\u0441\\u043a\\u0430\\u044f \\u0442\\u0430\\u043c\\u043e\\u0436\\u043d\\u044f, \\u0433.\\u041a\\u0430\\u0437\\u0430\\u043d\\u044c","Address":"420094, \\u0420\\u043e\\u0441\\u0441\\u0438\\u044f, \\u0433.\\u041a\\u0430\\u0437\\u0430\\u043d\\u044c, \\u0420\\u0422, \\u0443\\u043b.\\u041a\\u043e\\u0440\\u043e\\u043b\\u0435\\u043d\\u043a\\u043e, \\u0434.56","Id":"4b21e3e9-9d0c-42ce-bbec-4e1615e34698"}]...
The right format is supposed to look like this:
{"Cases":[{"CaseId":"998ecaef-3da8-45ab-9f56-90bfc3375e11","CaseNumber":"\u041065-22925\/2017","CaseType":"\u0410","Thirds":[],"Plaintiffs":[{"Name":"\u041e\u041e\u041e \"\u0412\u0420-\u041f\u043b\u0430\u0441\u0442\", \u0433.\u041a\u0430\u0437\u0430\u043d\u044c","Address":"421001, \u0420\u043e\u0441\u0441\u0438\u044f, \u0433.\u041a\u0430\u0437\u0430\u043d\u044c, \u0420\u0422, \u0443\u043b.\u0421.\u0425\u0430\u043a\u0438\u043c\u0430, \u0434.60, \u043e\u0444\u0438\u0441 164","Id":"dc26df83-6361-4de0-bc93-8c20ae0a4417"}],"Respondents":[{"Name":"\u0424\u0435\u0434\u0435\u0440\u0430\u043b\u044c\u043d\u0430\u044f \u0422\u0430\u043c\u043e\u0436\u0435\u043d\u043d\u0430\u044f \u0441\u043b\u0443\u0436\u0431\u0430 \u041f\u0440\u0438\u0432\u043e\u043b\u0436\u0441\u043a\u043e\u0435 \u0442\u0430\u043c\u043e\u0436\u0435\u043d\u043d\u043e\u0435 \u0443\u043f\u0440\u0430\u0432\u043b\u0435\u043d\u0438\u0435 \u0422\u0430\u0442\u0430\u0440\u0441\u0442\u0430\u043d\u0441\u043a\u0430\u044f \u0442\u0430\u043c\u043e\u0436\u043d\u044f, \u0433.\u041a\u0430\u0437\u0430\u043d\u044c","Address":"420094, \u0420\u043e\u0441\u0441\u0438\u044f, \u0433.\u041a\u0430\u0437\u0430\u043d\u044c, \u0420\u0422, \u0443\u043b.\u041a\u043e\u0440\u043e\u043b\u0435\u043d\u043a\u043e, \u0434.56","Id":"4b21e3e9-9d0c-42ce-bbec-4e1615e34698"}]
Please help
You can do something like this:
import json
with open('file_name_case.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
Although Python literal syntax resembles JSON in many ways, str(<some Python object>) does not produce valid JSON. You need to use the json module to write JSON.
Instead of taking results.text directly, use results.json() to decode the JSON response from the server.
Now you have a Python list of dicts rather than a Python list of strings containing JSON.
Then, as in kup's answer, you can convert back to JSON:
import json
import requests
output_results = []
for idx in cases:
    params = {'key': API, 'CaseNumber': idx}
    results = requests.get(webpage, params=params)
    output_results.append(results.json())
with open('file_name_case.json', 'w') as fobj:
    json.dump(output_results, fobj)
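The difference between str() and the json module can be checked without calling the API. The sketch below uses a small made-up record in place of a decoded response; the helper is_valid_json is illustrative only.

```python
import json

# Hypothetical stand-in for one decoded API response.
record = {"CaseType": "\u0410", "Thirds": [], "Closed": None}

as_repr = str(record)                              # Python repr: single quotes, None
as_json = json.dumps(record, ensure_ascii=False)   # JSON: double quotes, null


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


repr_ok = is_valid_json(as_repr)
json_ok = is_valid_json(as_json)
```

Here repr_ok is False and json_ok is True: the str() form uses single quotes and None, neither of which JSON allows.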

Saving print output as dict or JSON

I have the following code which utilises boto3 for AWS.
import boto3
from trp import Document
# Document
s3BucketName = "bucket"
documentName = "doc.png"
# Amazon Textract client
textract = boto3.client('textract')
# Call Amazon Textract
response = textract.analyze_document(
    Document={
        'S3Object': {
            'Bucket': s3BucketName,
            'Name': documentName
        }
    },
    FeatureTypes=["FORMS"])
#print(response)
doc = Document(response)
for page in doc.pages:
    # Print fields
    print("Fields:")
    for field in page.form.fields:
        print("Key: {}, Value: {}".format(field.key, field.value))
I am trying to save the output of that function as a dict, JSON, or CSV, but I am not an experienced Python programmer yet.
I tried this:
key_map = {}
filepath = 'output.txt'
with open(filepath) as fp:
    line = fp.readline()
    cnt = 1
    while line:
        for page in doc.pages:
            # Print fields
            print("Fields:")
            for field in page.form.fields:
                #print("Key: {}, Value: {}".format(field.key, field.value))
                key_map[str(field.key, field.value)] = cnt
        line = fp.readline()
        cnt += 1
But I don't think this solution works. Any tips on how to save the output of that for loop as JSON?
If you want CSV output, you can use the csv module:
import csv
doc = Document(response)
# newline='' prevents blank rows between records on Windows
with open('aws_doc.csv', mode='w', newline='') as aws_field_file:
    field_write = csv.writer(aws_field_file, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    for page in doc.pages:
        for field in page.form.fields:
            # This writes each row as <key>, <value>
            field_write.writerow([field.key, field.value])
If you want headers in the file, you can also use DictWriter, which lets you pass a dictionary for each row:
https://docs.python.org/3.4/library/csv.html#csv.DictWriter
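Since the question also asks about dict and JSON output, here is a minimal sketch of both, together with the DictWriter variant. It does not call Textract: the fields list is a made-up stand-in for the pairs read from page.form.fields, and it assumes keys and values have already been converted to plain strings.

```python
import csv
import io
import json

# Hypothetical stand-in for the (key, value) pairs from page.form.fields.
fields = [("Name", "Jane Doe"), ("Date", "2020-01-01")]

# Collect the pairs into a dict, then serialize the dict as JSON.
key_map = {key: value for key, value in fields}
json_text = json.dumps(key_map, indent=2)

# Write the same pairs as CSV with a header row using DictWriter.
buffer = io.StringIO()  # stands in for open('aws_doc.csv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=["key", "value"])
writer.writeheader()
for key, value in fields:
    writer.writerow({"key": key, "value": value})
csv_text = buffer.getvalue()
```

To write real files, the StringIO buffer can be replaced with an open file handle, and json.dump(key_map, f) can be used in place of json.dumps.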

Python JSONDecodeError: Expecting value: line 1 column 1

I get the error JSONDecodeError: Expecting value: line 1 column 1 (char 0), but I don't understand why.
Here is my code :
import json
import urllib.request
url = "apiurl"
data = json.loads(url)
# Open the URL as Browser, not as python urllib
page = urllib.request.Request(url, headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1') # Read the content as string decoded with ISO-8859-1
command_obj = {x['command']: x for x in data}
with open('new_command.json', 'w') as f:
    json.dump(command_obj, f, indent=2)
With this function, I'm just trying to fetch data from an API and change its format. Thanks for your help.
You're trying to read the URL itself (and not its content) as JSON:
data = json.loads(url)
... instead you want to read the content returned from the API as JSON:
# Open the URL as Browser, not as python urllib
page = urllib.request.Request(url,headers={'User-Agent': 'Mozilla/5.0'})
infile = urllib.request.urlopen(page).read()
data = infile.decode('ISO-8859-1')
# avoid re-using `data` variable name
json_data = json.loads(data)
However, be aware that JSON exchanged between systems should be encoded as UTF-8 (RFC 8259 requires it), not ISO-8859-1 / latin-1, so decoding the response as UTF-8 is the safer choice.
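Both points can be reproduced without a network call. In this sketch the URL and the response body are made up; body stands in for what urlopen(page).read() would return.

```python
import json

url = "https://example.com/api"  # hypothetical URL, never fetched here

# Parsing the URL string itself fails: it is not a JSON document.
try:
    json.loads(url)
except json.JSONDecodeError as exc:
    error_message = str(exc)

# Hypothetical response body, standing in for urlopen(page).read().
body = '[{"command": "start", "label": "d\u00e9marrer"}]'.encode("utf-8")

# Decode the bytes as UTF-8, then parse the resulting text as JSON.
json_data = json.loads(body.decode("utf-8"))
command_obj = {item["command"]: item for item in json_data}
```

The error_message here is the same "Expecting value: line 1 column 1 (char 0)" text from the question, because the first character of the URL cannot start a JSON value.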

Save every scraped element in a loop to a json file

I am scraping a website. For each URL I scrape, I write the results to a dict. I want to write every dictionary to a JSON file. With the following loop, the file is saved not as a list but as a sequence of objects like {} {}, which cannot be parsed.
df_price_m = {}
with open(r"C:\Users\USER\Desktop\diploma\information.json", 'w', encoding='utf8') as fout:
    row = 0
    for url in data:
        row += 1
        driver.get(url)
        user_name_xpath = "//h1[@itemprop='name' and @data-shmid='profilePrepName']"
        user_name = get_elements(user_name_xpath)
        user_about_xpath = "//*[@class='desktop-profile-page__about-text']"
        user_about = get_elements(user_about_xpath)
        df_info['id'] = url
        df_info['user_name'] = user_name[0]
        df_info['user_about'] = user_about[0]
        json.dump(df_price_m, fout, ensure_ascii=False)
I get the following JSON:
{"id": "www.aina.com", "user_name": "Aina Nurma", "user_about": "I am a student"}
{"id": "www.aina.ru", "user_name": "Aina Nur", "user_about": "I am a teacher"}
It looks like some of your code is missing, but I'd suggest collecting all of the data as a list of dicts and dumping it once at the end, rather than dumping to the file after processing each URL.
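A minimal sketch of that suggestion follows. The scrape function is a hypothetical stand-in for the Selenium lookups in the question, and an in-memory buffer replaces the output file so the example runs on its own.

```python
import io
import json


def scrape(url):
    # Hypothetical stand-in for driver.get(url) plus the XPath lookups.
    return {"id": url, "user_name": "name for " + url, "user_about": "about " + url}


urls = ["www.aina.com", "www.aina.ru"]

records = []
for url in urls:
    # Build a fresh dict per URL so earlier entries are not overwritten.
    records.append(scrape(url))

# Dump once, after the loop, so the file holds a single JSON array.
fout = io.StringIO()  # stands in for open(path, 'w', encoding='utf8')
json.dump(records, fout, ensure_ascii=False)
saved = fout.getvalue()
```

The saved text is one JSON array of objects, so json.load() can read the whole file back in a single call.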
