I am getting the above error for the last two lines, and I am not sure what this error means. I am trying to get the "pretty" print in the output. Could someone please provide some insight about this error? What have I missed?
import json
import urllib

serviceurl = 'http://python-data.dr-chuck.net/comments_42.json'

while True:
    url = serviceurl + urllib.urlencode({'sensor': 'false', 'address': 'address'})
    print "Retrieving", url
    uh = urllib.urlopen(url)
    data = uh.read()
    print "Retrieved", len(data), "characters"
    try:
        js = json.loads(str(data))
    except:
        js = None
    js = js["comments"][0]["count"]
    print js.dumps(js, indent = 4)
    print sum(js.get('count', 0) for js in js['comments'])
JSON data looks like this:
{
    "comments": [
        {
            "name": "Matthias",
            "count": 97
        },
        {
            "name": "Geomer",
            "count": 97
        }
        ...
    ]
}
You assigned an integer to the variable js on the line before:
js = js["comments"][0]["count"]
You can't then call js.dumps(), because int objects have no such method. Perhaps you wanted to call json.dumps() instead?
print json.dumps(js, indent = 4)
You'll have more problems, however: since you replaced js with an int, you also can't loop over it here:
print sum(js.get('count', 0) for js in js['comments'])
Perhaps you should use a different variable for the count; count would work here, and comment for the loop variable:
count = js["comments"][0]["count"]
print json.dumps(js, indent = 4)
print sum(comment.get('count', 0) for comment in js['comments'])
Related
How can I print only .jpg/.png urls from API using json?
import requests
import json

r = requests.get("https://random.dog/woof.json")
print("Kod:", r.status_code)

def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)

jprint(r.json())
Results:
{
    "fileSizeBytes": 78208,
    "url": "https://random.dog/24141-29115-27188.jpg"
}
I tried .endswith() but without any success.
I'm a beginner.
This example uses str.endswith to check whether the dictionary value under the key url ends with .jpg or .png.
I'm doing 10 requests and using time.sleep to wait 1 second between each request:
import requests
from time import sleep

url = "https://random.dog/woof.json"

for i in range(10):
    print("Attempt {}".format(i + 1))
    data = requests.get(url).json()
    if data["url"].endswith((".jpg", ".png")):
        print(data["url"])
    sleep(1)
Prints (for example, data returned from the server is random):
Attempt 1
https://random.dog/aa8e5e24-5c58-4963-9809-10f4aa695cfc.jpg
Attempt 2
https://random.dog/5b2a4e74-58da-4519-a67b-d0eed900b676.jpg
Attempt 3
https://random.dog/56217498-0e6b-4c24-bdd1-cc5dbb2201bb.jpg
Attempt 4
https://random.dog/l6CIQaS.jpg
Attempt 5
Attempt 6
Attempt 7
https://random.dog/3b5eae93-b3bd-4012-b789-64eb6cdaac65.png
Attempt 8
https://random.dog/oq9izk0057hy.jpg
Attempt 9
Attempt 10
Try using .lower(), since .endswith() is case sensitive; some of the data has the file extension in upper case, which is why your plain .endswith() check wasn't matching.
def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    if 'url' in obj and obj['url'] and obj['url'].lower().endswith((".jpg", ".png")):
        print(obj['url'])
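For instance, a quick check with hypothetical URL values:

print("HTTPS://EXAMPLE.COM/DOG.JPG".endswith((".jpg", ".png")))          # False - case mismatch
print("HTTPS://EXAMPLE.COM/DOG.JPG".lower().endswith((".jpg", ".png")))  # True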
This code should do it:
import requests
import json

r = requests.get("https://random.dog/woof.json")
print("Kod:", r.status_code)

def jprint(obj):
    text = json.dumps(obj, sort_keys=True, indent=4)
    print(text)

jprint(r.json())

to_remove = []
for key, item in r.json().items():
    try:
        if not (item.endswith(".png") or item.endswith(".gif") or item.endswith(".jpg") or item.endswith(".mp4")):
            to_remove.append([key, item])  # the key is useless - delete it later
    except AttributeError:  # the value is not a string!
        to_remove.append([key, item])  # the key is useless - delete it later

new_r = {}
for key, item in r.json().items():
    if not [key, item] in to_remove:
        new_r[key] = item  # item was not in to_remove, so add it to the new filtered dict

jprint(new_r)
Explanation
After we get the request, we loop through the r.json() dict; if a value is not a string (e.g. the int fileSizeBytes) or does not end with a photo/video extension, we add the key and value to a list. Then we create a new dict and keep every entry from the original response that is not mentioned in that list. A more compact alternative is sketched after the example output below.
Example Output
Kod: 200
{
    "fileSizeBytes": 2896573,
    "url": "https://random.dog/2fc649b9-f688-4e65-a1a7-cdbc6228e0c0.mp4"
}
{
    "url": "https://random.dog/2fc649b9-f688-4e65-a1a7-cdbc6228e0c0.mp4"
}
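The same filtering can be written more compactly with a dict comprehension; this is just an alternative sketch, not the code from the answer above:

media_exts = (".png", ".gif", ".jpg", ".mp4")
new_r = {key: item for key, item in r.json().items()
         if isinstance(item, str) and item.endswith(media_exts)}
jprint(new_r)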
My current Cloud Run URL returns a long string, matching the exact format as described here.
When I run the following code in Google Apps Script, I get a log output of '1'. What happens is that the entire string is put in the [0][0] position of the data array instead of actually being parsed.
function myFunction() {
  const token = ScriptApp.getIdentityToken();
  const options = {
    headers: {'Authorization': 'Bearer ' + token}
  }
  var responseString = UrlFetchApp.fetch("https://*myproject*.a.run.app", options).getContentText();
  var data = Utilities.parseCsv(responseString, '\t');
  Logger.log(data.length);
}
My expected output is a 2D array as described in the aforementioned link, with a logged output length of 18.
I have confirmed the output of my response by:
Logging the responseString
Copying the output log into a separate var -> var temp = "copied-output"
Changing the parseCsv line to -> var data = Utilities.parseCsv(temp, '\t')
Saving and running the new code. This then outputs a successful 2D array with a length of 18.
So why is it, that my current code doesn't work?
Happy to try anything because I am out of ideas.
Edit: More information below.
Python script code
#app.route("/")
def hello_world():
# Navigate to webpage and get page source
driver.get("https://www.asxlistedcompanies.com/")
soup = BeautifulSoup(driver.page_source, 'html.parser')
# ##############################################################################
# Used by Google Apps Script to create Arrays
# This creates a two-dimensional array of the format [[a, b, c], [d, e, f]]
# var csvString = "a\tb\tc\nd\te\tf";
# var data = Utilities.parseCsv(csvString, '\t');
# ##############################################################################
long_string = ""
limit = 1
for row in soup.select('tr'):
if limit == 20:
break
else:
tds = [td.a.get_text(strip=True) if td.a else td.get_text(strip=True) for td in row.select('td')]
count = 0
for column in tds:
if count == 4:
linetext = column + r"\n"
long_string = long_string+linetext
else:
text = column + r"\t"
long_string = long_string+text
count = count+1
limit = limit+1
return long_string
GAS Code edited:
function myFunction() {
  const token = ScriptApp.getIdentityToken();
  const options = {
    headers: {'Authorization': 'Bearer ' + token}
  }
  var responseString = UrlFetchApp.fetch("https://*myfunction*.a.run.app", options).getContentText();
  Logger.log("The responseString: " + responseString);
  Logger.log("responseString length: " + responseString.length)
  Logger.log("responseString type: " + typeof(responseString))
  var data = Utilities.parseCsv(responseString, '\t');
  Logger.log(data.length);
}
GAS logs/output as requested:
6:17:11 AM Notice Execution started
6:17:22 AM Info The responseString: 14D\t1414 Degrees Ltd\tIndustrials\t21,133,400\t0.001\n1ST\t1ST Group Ltd\tHealth Care\t12,738,500\t0.001\n3PL\t3P Learning Ltd\tConsumer Discretionary\t104,613,000\t0.005\n4DS\t4DS Memory Ltd\tInformation Technology\t58,091,300\t0.003\n5GN\t5G Networks Ltd\t\t82,746,600\t0.004\n88E\t88 Energy Ltd\tEnergy\t42,657,800\t0.002\n8CO\t8COMMON Ltd\tInformation Technology\t11,157,900\t0.001\n8IH\t8I Holdings Ltd\tFinancials\t35,814,200\t0.002\n8EC\t8IP Emerging Companies Ltd\t\t3,199,410\t0\n8VI\t8VIC Holdings Ltd\tConsumer Discretionary\t13,073,200\t0.001\n9SP\t9 Spokes International Ltd\tInformation Technology\t21,880,100\t0.001\nACB\tA-Cap Energy Ltd\tEnergy\t7,846,960\t0\nA2B\tA2B Australia Ltd\tIndustrials\t95,140,200\t0.005\nABP\tAbacus Property Group\tReal Estate\t1,679,500,000\t0.082\nABL\tAbilene Oil and Gas Ltd\tEnergy\t397,614\t0\nAEG\tAbsolute Equity Performance Fund Ltd\t\t107,297,000\t0.005\nABT\tAbundant Produce Ltd\tConsumer Staples\t1,355,970\t0\nACS\tAccent Resources NL\tMaterials\t905,001\t0\n
6:17:22 AM Info responseString length: 1020
6:17:22 AM Info responseString type: string
6:17:22 AM Info 1.0
6:17:22 AM Notice Execution completed
Issue:
Using the r'' raw-string prefix makes \n and \t a literal backslash followed by n or t, rather than a newline or tab character. So parseCsv never sees real tab or newline separators and returns everything as a single cell. This also explains why you were able to copy the "displayed" logs into a variable and parse that successfully: in the copied source string, the escape sequences became real characters.
Solution:
Don't use the r prefix.
Snippet:
if count == 4:
    linetext = column + "\n"  # no r prefix
    long_string = long_string + linetext
else:
    text = column + "\t"  # no r prefix
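To see the difference directly, a quick illustration with made-up values:

s_raw = r"a\tb"      # raw string: four characters; the backslash and 't' stay literal
s_normal = "a\tb"    # normal string: three characters, with a real tab in the middle
print(len(s_raw), len(s_normal))   # 4 3
print(list(s_raw))                 # ['a', '\\', 't', 'b']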
I face the following problem: for pa in data["result"] displays the error "TypeError: list indices must be integers or slices, not str". Could you suggest what change I should make to the code so it runs without this error?
with open("data/indicePA/indicePA.tsv", 'w') as f_indice_pa, open("data/indicePA/otherPA.tsv", 'w') as f_other:
    writer_indice_pa = csv.writer(f_indice_pa, delimiter='\t')
    writer_other_pa = csv.writer(f_other, delimiter='\t')
    count = 0
    res = ["cf", "cod_amm", "regione", "provincia", "comune", "indirizzo", "tipologia_istat", "tipologia_amm"]
    writer_indice_pa.writerow(res)
    writer_other_pa.writerow(["cf"])
    for pa in data["result"]:
        esito = pa["esitoUltimoTentativoAccessoUrl"]
        if esito == "successo":
            cf = pa["codiceFiscale"]
            if cf in cf_set_amm:
                try:
                    cod_amm = df_amm.loc[df_amm['Cf'] == cf].iloc[0]['cod_amm']
                    take0 = df_amm.loc[df_amm['cod_amm'] == cod_amm].iloc[0]
                    regione = take0['Regione'].replace("\t", "")
                    provincia = str(take0['Provincia']).replace("\t", "")
                    comune = take0['Comune'].replace("\t", "")
                    indirizzo = take0['Indirizzo'].replace("\t", "")
                    tipologia_istat = take0['tipologia_istat'].replace("\t", "")
                    tipologia_amm = take0['tipologia_amm'].replace("\t", "")
                    res = [cf, cod_amm, regione, provincia, comune, indirizzo, tipologia_istat, tipologia_amm]
                    writer_indice_pa.writerow(res)
                except:  # catch *all* exceptions
                    print("CF in df_amm", cf)
            elif cf in cf_set_serv_fatt:
                try:
                    cod_amm = df_serv_fatt.loc[df_serv_fatt['Cf'] == cf].iloc[0]['cod_amm']
                    take0 = df_amm.loc[df_amm['cod_amm'] == cod_amm].iloc[0]
                    regione = take0['Regione'].replace("\t", "")
                    provincia = str(take0['Provincia']).replace("\t", "")
                    comune = take0['Comune'].replace("\t", "")
                    indirizzo = take0['Indirizzo'].replace("\t", "")
                    tipologia_istat = take0['tipologia_istat'].replace("\t", "")
                    tipologia_amm = take0["tipologia_amm"].replace("\t", "")
                    res = [cf, cod_amm, regione, provincia, comune, indirizzo, tipologia_istat, tipologia_amm]
                    writer_indice_pa.writerow(res)
                except:  # catch *all* exceptions
                    # e = sys.exc_info()[0]
                    print("CF in df_serv_fatt", cf)
            else:
                # print(cf, " is not present")
                count = count + 1
                writer_other_pa.writerow([cf])
                # if(count % 100 == 0):
                #     print(cf)
    print("Totale cf non presenti in IndicePA: ", count)
    f_indice_pa.flush()
    f_other.flush()
I expect to obtain the output "Totale cf non presenti in IndicePA: 1148", because this code has already been run successfully before. But now I face this error. How can I overcome it? Is there any change I can make to the original code?
Thanks for your help in advance.
It is possible to find a wider explanation of the code at the following link: link resource
According to the documentation for the json module you are using to parse that JSON file (see the conversion table at https://docs.python.org/2/library/json.html, section 18.2.2), a JSON array is decoded to a Python list, not a dict.
It seems like your top-level JSON value is getting parsed as a list. According to that table, the document itself must then be an array, like so:
[
    {
        "result": { someObject }
    }
]
A simple fix might be editing the json so it is parsed correctly to a dict. For example...
{
    "result": {
        someObjectStuff
    }
}
Another fix could be accessing the first element in the list. If your json looks like the following
[
    {
        "result": { someObject }
    }
]
Then changing the code to
    for pa in data[0]["result"]:
might also fix the problem.
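If you are unsure which shape a given file has, a small check at load time handles both cases. This is a sketch, assuming data was produced by json.load / json.loads; the file name is hypothetical:

import json

with open("data.json") as f:        # hypothetical file name
    data = json.load(f)

if isinstance(data, list):          # top-level JSON array -> Python list
    result = data[0]["result"]
else:                               # top-level JSON object -> Python dict
    result = data["result"]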
Would love some help here. Full context this is my first "purposeful" Python script. Prior to this I've only dabbled a bit and am honestly still learning so maybe I jumped in a bit too early here.
Long story short, I've been running all over fixing various type mismatches or just general indentation issues (dear lord, Python isn't forgiving on this).
I think I'm about finished but have a few last issues. Most of them seem to come from the same section, too. This script is just meant to take a CSV file that has 3 columns and use that to send requests based on the first column (either iOS or Android). The problem is when I'm creating the body to send...
Here's the code (a few tokens omitted for postability):
#!/usr/bin/python
# -*- coding: utf-8 -*-
import requests
import json
import pandas as pd
from tqdm import tqdm
from datetime import *
import uuid
import warnings
from math import isnan
import time

## throttling based on AF's 80 request per 2 minute rule
def throttle():
    i = 0
    while i <= 3:
        print("PAUSED FOR THROTTLING!" + "\n" + str(3 - i) + " minutes remaining")
        time.sleep(60)
        i = i + 1
        print(i)
    return 0

## function for reformating the dates
def date():
    d = datetime.utcnow()  # <-- get time in UTC
    d = d.isoformat('T') + 'Z'
    t = d.split('.')
    t = t[0] + 'Z'
    return str(t)

## function for dealing with Android requests
def android_request(madv_id, mtime, muuid, android_app, token, endpoint):
    headers = {'Content-Type': 'application/json', 'Accept': 'application/json'}
    params = {'api_token': token}
    subject_identities = {
        "identity_format": "raw",
        "identity_type": "android_advertising_id",
        "identity_value": madv_id
    }
    body = {
        'subject_request_id': muuid,
        'subject_request_type': 'erasure',
        'submitted_time': mtime,
        'subject_identities': dict(subject_identities),
        'property_id': android_app
    }
    body = json.dumps(body)
    res = requests.request('POST', endpoint, headers=headers,
                           data=body, params=params)
    print("android " + res.text)

## function for dealing with iOS requests
def ios_request(midfa, mtime, muuid, ios_app, token, endpoint):
    headers = {'Content-Type': 'application/json',
               'Accept': 'application/json'}
    params = {'api_token': token}
    subject_identities = {
        'identity_format': 'raw',
        'identity_type': 'ios_advertising_id',
        'identity_value': midfa,
    }
    body = {
        'subject_request_id': muuid,
        'subject_request_type': 'erasure',
        'submitted_time': mtime,
        'subject_identities': list(subject_identities),
        'property_id': ios_app,
    }
    body = json.dumps(body)
    res = requests.request('POST', endpoint, headers=headers, data=body, params=params)
    print("ios " + res.text)

## main run function. Determines whether it is iOS or Android request and sends if not LAT-user
def run(output, mdf, is_test):
    ## assigning variables to the columns I need from file
    print('Sending requests! Stand by...')
    platform = mdf.platform
    device = mdf.device_id
    if is_test == "y":
        ios = 'id000000000'
        android = 'com.tacos.okay'
        token = 'OMMITTED_FOR_STACKOVERFLOW_Q'
        endpoint = 'https://hq1.appsflyer.com/gdpr/stub'
    else:
        ios = 'id000000000'
        android = 'com.tacos.best'
        token = 'OMMITTED_FOR_STACKOVERFLOW_Q'
        endpoint = 'https://hq1.appsflyer.com/gdpr/opengdpr_requests'
    for position in tqdm(range(len(device))):
        if position % 80 == 0 and position != 0:
            throttle()
        else:
            req_id = str(uuid.uuid4())
            timestamp = str(date())
            if platform[position] == 'android' and device[position] != '':
                android_request(device[position], timestamp, req_id, android, token, endpoint)
                mdf['subject_request_id'][position] = req_id
            if platform[position] == 'ios' and device[position] != '':
                ios_request(device[position], timestamp, req_id, ios, token, endpoint)
                mdf['subject_request_id'][position] = req_id
            if 'LAT' in platform[position]:
                mdf['subject_request_id'][position] = 'null'
                mdf['error status'][position] = 'Limit Ad Tracking Users Unsupported. Device ID Required'
    mdf.to_csv(output, sep=',', index=False, header=True)
    # mdf.close()
    print('\nDONE. Please see ' + output
          + ' for the subject_request_id and/or error messages\n')

## takes the CSV given by the user and makes a copy of it for us to use
def read(mname):
    orig_csv = pd.read_csv(mname)
    mdf = orig_csv.copy()
    # Check that both dataframes are actually the same
    # print(pd.DataFrame.equals(orig_csv, mdf))
    return mdf

## just used to create the renamed file with _LOGS.csv
def rename(mname):
    msuffix = '_LOG.csv'
    i = mname.split('.')
    i = i[0] + msuffix
    return i

## adds relevant columns to the log file
def logs_csv(out, df):
    mdf = df
    mdf['subject_request_id'] = ''
    mdf['error status'] = ''
    mdf['device_id'].fillna('')
    mdf.to_csv(out, sep=',', index=None, header=True)
    return mdf

## solely for reading in the file name from the user. creates string out of filename
def readin_name():
    mprefix = input('FILE NAME: ')
    msuffix = '.csv'
    mname = str(mprefix + msuffix)
    print('\n' + 'Reading in file: ' + mname)
    return mname

def start():
    print('\nWelcome to GDPR STREAMLINE')
    ## blue = OpenFile()
    testing = input('Is this a test? (y/n) : ')
    # return a CSV
    name = readin_name()
    import_csv = read(name)
    output_name = rename(name)
    output_file = logs_csv(output_name, import_csv)
    run(output_name, output_file, testing)
    ## print("FILE PATH:" + blue)

## to disable all warnings in console logs
warnings.filterwarnings('ignore')
start()
And here's the error stacktrace:
Reading in file: test.csv
Sending requests! Stand by...
0%| | 0/384 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "a_GDPR_delete.py", line 199, in <module>
    start()
  File "a_GDPR_delete.py", line 191, in start
    run( output_name, output_file, testing)
  File "a_GDPR_delete.py", line 114, in run
    android_request(device[position], timestamp, req_id, android, token, endpoint)
  File "a_GDPR_delete.py", line 57, in android_request
    body = json.dumps(body)
  File "/Users/joseph/anaconda3/lib/python3.6/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/Users/joseph/anaconda3/lib/python3.6/json/encoder.py", line 199, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/Users/joseph/anaconda3/lib/python3.6/json/encoder.py", line 257, in iterencode
    return _iterencode(o, 0)
  File "/Users/joseph/anaconda3/lib/python3.6/json/encoder.py", line 180, in default
    o.__class__.__name__)
TypeError: Object of type 'int64' is not JSON serializable
TL;DR:
Getting a TypeError when calling this on a JSON body with another nested JSON. I've confirmed that the nested JSON is the problem, because if I remove the "subject_identities" section this compiles and works... but the API I'm using NEEDS those values, so this doesn't actually do anything without that section.
Here's the relevant code again (and in the version I first used that WAS working previously):
def android(madv_id, mtime, muuid):
    headers = {
        "Content-Type": "application/json",
        "Accept": "application/json"
    }
    params = {
        "api_token": "OMMITTED_FOR_STACKOVERFLOW_Q"
    }
    body = {
        "subject_request_id": muuid,  # muuid,
        "subject_request_type": "erasure",
        "submitted_time": mtime,
        "subject_identities": [
            {"identity_type": "android_advertising_id",
             "identity_value": madv_id,
             "identity_format": "raw"}
        ],
        "property_id": "com.tacos.best"
    }
    body = json.dumps(body)
    res = requests.request("POST",
                           "https://hq1.appsflyer.com/gdpr/opengdpr_requests",
                           headers=headers, data=body, params=params)
I get the feeling I'm close to this working. I had a much simpler version early on that worked, but I rewrote this to be more dynamic and use fewer hard-coded values (so that I can eventually apply it to any app I'm working with, and not only the two it was made for).
Please be nice, I'm entirely new to python and also just rusty on coding in general (thus trying to do projects like this one)
You can check for numpy dtypes like so:
if hasattr(obj, 'dtype'):
    obj = obj.item()
This will convert it to the closest equivalent data type
EDIT:
Apparently np.nan is JSON serializable so I've removed that catch from my answer
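One way to wire that check into serialization (a sketch, not the only option) is the default= hook of json.dumps, which is called for any object the encoder can't handle natively:

import json
import numpy as np

def np_converter(obj):
    if hasattr(obj, 'dtype'):        # numpy scalar (or size-1 array)
        return obj.item()            # convert to the closest native Python type
    raise TypeError("Not serializable: %r" % type(obj))

body = {"count": np.int64(42)}       # hypothetical value standing in for a pandas cell
print(json.dumps(body, default=np_converter))   # {"count": 42}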
Thanks to everyone for helping so quickly here. Apparently I was deceived by the error message, as the fix from @juanpa.arrivillaga did the job with one adjustment.
Corrected code was on these parts:
android_request(str(device[position]), timestamp, req_id, android, token, endpoint)
and here:
ios_request(str(device[position]), timestamp, req_id, ios, token, endpoint)
I had to cast to string, apparently, even though these values are not originally integers and tend to look like this instead: ab12ab12-12ab-34cd-56ef-1234abcd5678
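A likely reason the cast was needed (a sketch of the effect, with made-up values): pandas infers a numeric dtype for a column whenever every cell parses as a number, so a device-id cell can come back as numpy.int64 even though most real IDs are hex strings:

import pandas as pd

df = pd.DataFrame({'device_id': [12345678, 87654321]})  # hypothetical all-numeric column
print(type(df.device_id[0]))        # <class 'numpy.int64'> - not JSON serializable
print(type(str(df.device_id[0])))   # <class 'str'> - safe to json.dumps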
What is the simplest way to pretty-print a string of JSON as a string with indentation when the initial JSON string is formatted without extra spaces or line breaks?
Currently I'm running json.loads() and then running json.dumps() with indent=2 on the result. This works, but it feels like I'm throwing a lot of compute down the drain.
Is there a more simple or efficient (built-in) way to pretty-print a JSON string? (while keeping it as valid JSON)
Example
import requests
import json
response = requests.get('http://spam.eggs/breakfast')
one_line_json = response.content.decode('utf-8')
pretty_json = json.dumps(json.loads(response.content), indent=2)
print(f'Original: {one_line_json}')
print(f'Pretty: {pretty_json}')
Output:
Original: {"breakfast": ["spam", "spam", "eggs"]}
Pretty: {
  "breakfast": [
    "spam",
    "spam",
    "eggs"
  ]
}
json.dumps(obj, indent=2) is better than pprint because:
It is faster with the same load methodology.
It has the same or similar simplicity.
The output will produce valid JSON, whereas pprint will not.
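The validity point is easy to demonstrate (a quick illustration):

import json
import pprint

d = json.loads('{"flag": true, "missing": null}')
pprint.pprint(d)        # {'flag': True, 'missing': None}  <- Python literals, not valid JSON
print(json.dumps(d))    # {"flag": true, "missing": null}  <- still valid JSON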
pprint_vs_dumps.py
import cProfile
import json
import pprint
from urllib.request import urlopen

def custom_pretty_print():
    url_to_read = "https://www.cbcmusic.ca/Component/Playlog/GetPlaylog?stationId=96&date=2018-11-05"
    with urlopen(url_to_read) as resp:
        pretty_json = json.dumps(json.load(resp), indent=2)
        print(f'Pretty: {pretty_json}')

def pprint_json():
    url_to_read = "https://www.cbcmusic.ca/Component/Playlog/GetPlaylog?stationId=96&date=2018-11-05"
    with urlopen(url_to_read) as resp:
        info = json.load(resp)
    pprint.pprint(info)

cProfile.run('custom_pretty_print()')
>>> 71027 function calls (42309 primitive calls) in 0.084 seconds

cProfile.run('pprint_json()')
>>> 164241 function calls (140121 primitive calls) in 0.208 seconds
Thanks @tobias_k for pointing out my errors along the way.
I think for a true JSON pretty-print, json.dumps(json.loads(s), indent=2) is probably as good as it gets. timeit(number=10000) for the following took about 5.659214497s:
import json

d = {
    'breakfast': [
        'spam', 'spam', 'eggs',
        {
            'another': 'level',
            'nested': [
                {'a': 'b'},
                {'c': 'd'}
            ]
        }
    ],
    'foo': True,
    'bar': None
}
s = json.dumps(d)
q = json.dumps(json.loads(s), indent=2)
print(q)
I tried pprint, but it won't print the pure JSON string unless it's converted to a Python dict first, and that conversion loses the valid-JSON true, false and null literals, as mentioned in the other answer. It also doesn't retain the order in which the items appeared, so it's not great if order is important for readability.
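Note: on Python 3.8+ pprint can at least preserve insertion order with sort_dicts=False, though the output is still Python literals rather than valid JSON:

import pprint

pprint.pprint({'b': 1, 'a': None}, sort_dicts=False)   # {'b': 1, 'a': None}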
Just for fun I whipped up the following function:
def pretty_json_for_savages(j, indentor='  '):
    ind_lvl = 0
    temp = ''
    for i, c in enumerate(j):
        if c in '{[':
            print(indentor * ind_lvl + temp.strip() + c)
            ind_lvl += 1
            temp = ''
        elif c in '}]':
            print(indentor * ind_lvl + temp.strip() + '\n' + indentor * (ind_lvl - 1) + c, end='')
            ind_lvl -= 1
            temp = ''
        elif c in ',':
            print(indentor * (0 if j[i-1] in '{}[]' else ind_lvl) + temp.strip() + c)
            temp = ''
        else:
            temp += c
    print('')
# {
#   "breakfast":[
#     "spam",
#     "spam",
#     "eggs",
#     {
#       "another": "level",
#       "nested":[
#         {
#           "a": "b"
#         },
#         {
#           "c": "d"
#         }
#       ]
#     }
#   ],
#   "foo": true,
#   "bar": null
# }
It prints pretty alright, and unsurprisingly it took a whopping 16.701202023s to run in timeit(number=10000), which is 3 times as much as json.dumps(json.loads()) would take. It's probably not worthwhile to build your own function to achieve this unless you spend some time optimizing it, and with the lack of a builtin for the same, it's probably best you stick to your guns, since your efforts will most likely give diminishing returns.
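Worth noting: the standard library also exposes this pretty-printer from the command line via the json.tool module, which may be all you need for one-off formatting (the default indent is 4):

echo '{"breakfast": ["spam", "spam", "eggs"]}' | python -m json.tool
# {
#     "breakfast": [
#         "spam",
#         "spam",
#         "eggs"
#     ]
# }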