Loading multiple JSON files - python

So I am trying to load multiple JSON files with Python HTTP requests, but I can't figure out how to do it correctly.
Loading one JSON file with Python is pretty simple:
response = requests.get(url)
te = response.content.decode()
da = json.loads(te[te.find("{"):te.rfind("}")+1])
But how can I load multiple JSON files?
I have a list of URLs; I tried requesting every URL in a loop and then loading every line of the result, but it seems this does not work.
This is the code I am using:
t = []
for url in urls:
    req = requests.get(url)
    te = req.content.decode()
    daten = json.loads(te[te.find("{"):te.rfind("}")+1])
    t.append(daten)
But I am getting this error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0).
I am pretty new to JSON, but I do understand that I can't read it line by line in a loop, because that breaks the JSON structure(?).
So how can I read multiple JSON files?
EDIT: Found the error.
Some links do not return valid JSON.

With the requests library, if the endpoint you are requesting returns a well-formed JSON response, all you need to do is call the .json() method on the response object:
t = []
for url in urls:
    resp = requests.get(url)
    t.append(resp.json())
Then, if you want to handle bad responses, wrap the code above in a try:...except block
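A minimal sketch of that, assuming urls is your list of endpoints (the timeout value is illustrative):
import requests

t = []
for url in urls:  # urls: your list of endpoints
    try:
        resp = requests.get(url, timeout=10)  # timeout value is illustrative
        resp.raise_for_status()               # raise on HTTP 4xx/5xx errors
        t.append(resp.json())                 # raises ValueError on invalid JSON
    except (requests.RequestException, ValueError) as exc:
        print("Skipping {}: {}".format(url, exc))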

Assuming you receive valid JSON from every site, the problem is that you never constructed the combined result JSON.
You might write something like
t = []
for url in urls:
    t.append(requests.get(url).content.decode('utf-8'))
result = json.loads('{{"data": [{}]}}'.format(','.join(t)))
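Splicing JSON strings together works here, but it is fragile; a safer sketch is to parse each response and build the envelope as a Python dict (again assuming urls is your list):
import json
import requests

data = [requests.get(url).json() for url in urls]  # parse each response
result = {"data": data}                            # build the envelope as a dict
print(json.dumps(result))                          # serialize back if a string is needed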

Related

TypeError: expected str, bytes or os.PathLike object, not dict

This is my code:
from os import rename, write
import requests
import json
url = "https://api.github.com/search/users?q=%7Bquery%7D%7B&page,per_page,sort,order%7D"
data = requests.get(url).json()
print(data)
outfile = open("C:/Users/vladi/Desktop/json files Vlad/file structure first attemp.json", "r")
json_object = json.load(outfile)
with open(data,'w') as endfile:
    endfile.write(json_object)
print(endfile)
I want to make an API request.
I want to take the data from this URL: https://api.github.com/search/users?q=%7Bquery%7D%7B&page,per_page,sort,order%7D,
rewrite it with my own data, which is in my file called file structure first attemp.json,
and update the URL with my own data.
import requests

url = "https://api.github.com/search/users?q=%7Bquery%7D%7B&page,per_page,sort,order%7D"
data = requests.get(url)
with open("output.json", "w") as endfile:  # pass a file path, not the response object
    endfile.write(data.text)
json.load() returns a Python dictionary; passing that dictionary to open() as if it were a file path is what raises the TypeError. Simply write the string returned from the URL.
response.json() is a built-in feature of requests that loads the JSON returned from the URL, so you were loading the JSON twice.
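As a concrete sketch (the output file names are illustrative), either write the raw text, or parse and re-serialize:
import json
import requests

url = "https://api.github.com/search/users?q=%7Bquery%7D%7B&page,per_page,sort,order%7D"
response = requests.get(url)

# Option 1: write the raw JSON text exactly as the server returned it
with open("raw_output.json", "w") as f:
    f.write(response.text)

# Option 2: parse first, then serialize the dictionary back to JSON
with open("parsed_output.json", "w") as f:
    json.dump(response.json(), f, indent=2)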

How to access url from dict structure of a json url

import urllib.request

req = urllib.request.Request(url)
data = urllib.request.urlopen(req).read().decode()
Once I get the AWS site string I can load it up in Python, but how do I catch it?
The .json is actually a website with the structure below. The AWS website is a CSV when you open it.
Is there a library within json which can help with this?
Structure
You can get the URL like this:
import json

data = json.load(f)  # f is an open file object containing the JSON
url = data.get('export').get('url')
If it is a string, use:
json.loads(string)
If it is a JSON file, use:
json.load(file_object)
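Putting it together for this case, a sketch under the assumption that the JSON really has an 'export' key holding a 'url', as the structure above shows (the json_url value and output filename are illustrative):
import json
import urllib.request

json_url = 'https://example.com/export.json'  # illustrative endpoint
raw = urllib.request.urlopen(json_url).read().decode()
data = json.loads(raw)           # raw is a string, so json.loads

csv_url = data['export']['url']  # assumed structure: {"export": {"url": ...}}

# Download the CSV that the extracted URL points to
with open('export.csv', 'wb') as f:
    f.write(urllib.request.urlopen(csv_url).read())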

How to return data pulled from python requests call into json

I am trying to make a GHE API call and convert the returned data into JSON. I am sure this is fairly simple (my current code writes the data into a .txt file), but I am incredibly new to Python.
I am having a hard time understanding how to use json.dumps.
import requests
import json

GITHUB_ENTERPRISE_TOKEN = 'token xxx'
SEARCH_QUERY = "Evidence+locker+Seed+in:readme"
headers = {
    'Authorization': GITHUB_ENTERPRISE_TOKEN,
}
url = "https://github.ibm.com/api/v3/search/repositories?q=" + SEARCH_QUERY
# Set up the URL to include the GHE API endpoint and the search query
response = requests.get(url, headers=headers)
with open('./evidencelockerevidence.txt', 'w') as file:
    file.write(response.text)
# Writes the evidence fetched from GHE to a .txt file
Rather than having the last two lines of functional code write the data into a .txt file, I would like to save it as a JSON object in the same directory.
json.dumps simply stringifies, i.e. serializes, your JSON object so you can store it as a plain text file. Its counterpart is json.loads.
with open('a.jsonl', 'wt') as f:
    f.write(json.dumps(jobj))
People usually write one JSON object per line, a.k.a. the JSONL format.
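For instance, a sketch of writing several objects in JSONL form (the records list is illustrative):
import json

records = [{"id": 1}, {"id": 2}]         # illustrative objects
with open('a.jsonl', 'wt') as f:
    for obj in records:
        f.write(json.dumps(obj) + '\n')  # one JSON object per line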
json.dump directly stores your JSON object to a file. Its counterpart is json.load.
with open('a.json', 'wt') as f:
    json.dump(jobj, f)
A .json file contains a single JSON object, either on one line or spread across multiple lines.
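Applied to the question's code, the last two lines could become something like this sketch (the .json filename mirrors the original .txt one and is illustrative):
import json

# response comes from the requests.get(url, headers=headers) call above
with open('./evidencelockerevidence.json', 'w') as file:
    json.dump(response.json(), file, indent=2)  # parse the response, then write it as JSON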

Python - How to fill out a web form then download the file that is generated

I'm trying to use Python to go to this page, https://comtrade.un.org/data/, fill in the form, and "click" the download button, then get the CSV file that is generated.
Does anyone have some sample code for automating the download in Python?
Thx.
You might be interested in trying out pywinauto. I have not had too much experience with it, but I do believe it could do the job.
Good luck!
The site you are accessing has an exposed API: you can use that form to generate the API URL and simply call it to get a JSON or CSV response. To get this with Python, you can use requests and the core json module to parse the data if you want to use it inside Python:
CSV File
import requests

api_url = 'https://comtrade.un.org/api/get?max=500&type=C&freq=A&px=HS&ps=2017&r=all&p=0&rg=all&cc=TOTAL&fmt=csv'
response = requests.get(api_url)
data = response.content
with open('output.csv', 'wb') as output:
    output.write(data)
Note the fmt=csv parameter in the URL.
Python Dictionary
import requests, json
api_url = 'https://comtrade.un.org/api/get?max=500&type=C&freq=A&px=HS&ps=2017&r=all&p=0&rg=all&cc=TOTAL'
response = requests.get(api_url)
data = json.loads(response.content)
print(data)
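Since requests can parse JSON itself, the same result can also be had with response.json(); a small sketch of the alternative:
import requests

api_url = 'https://comtrade.un.org/api/get?max=500&type=C&freq=A&px=HS&ps=2017&r=all&p=0&rg=all&cc=TOTAL'
data = requests.get(api_url).json()  # equivalent to json.loads(response.content)
print(data)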
Note that the API URL in the example came from submitting the default form and clicking 'View API Call' under the generated table.

Writing data to csv or text file using python

I am trying to write some data to a CSV file after checking a condition, as below.
I have a list of URLs in a text file, as below:
urls.txt
www.example.com/3gusb_form.aspx?cid=mum
www.example_second.com/postpaid_mum.aspx?cid=mum
www.example_second.com/feedback.aspx?cid=mum
Now I go through each URL from the text file, read the content of the URL using the urllib2 module in Python, and search for a string in the entire HTML page. If the required string is found, I write that URL into a CSV file.
But when I try to write the data (the URL) into the CSV file, it saves each character into its own column, as below, instead of saving the entire URL into one column:
h t t p s : / / w w w......
Code.py
import urllib2
import csv

search_string = 'Listen Capcha'
html_urls = open('/path/to/input/file/urls.txt', 'r').readlines()
outputcsv = csv.writer(open('output/path' + 'urls_contaning _%s.csv' % search_string, "wb"), delimiter=',', quoting=csv.QUOTE_MINIMAL)
outputcsv.writerow(['URL'])
for url in html_urls:
    url = url.replace('\n', '').strip()
    if not len(url) == 0:
        req = urllib2.Request(url)
        response = urllib2.urlopen(req)
        if str(search_string) in response.read():
            outputcsv.writerow(url)
So what's wrong with the above code, and what needs to be done to save the entire URL (string) into one column of the CSV file?
Also, how can we write the same data to a text file?
Edited
Also, I had a URL like http://www.vodafone.in/Pages/tuesdayoffers_che.aspx.
In a browser this URL is actually redirected to http://www.vodafone.in/pages/home_che.aspx?cid=che, but when I tried it through the code below, the result is essentially the original URL:
import urllib2, httplib
httplib.HTTPConnection.debuglevel = 1
request = urllib2.Request("http://www.vodafone.in/Pages/tuesdayoffers_che.aspx")
opener = urllib2.build_opener()
f = opener.open(request)
print f.geturl()
Result
http://www.vodafone.in/pages/tuesdayoffers_che.aspx?cid=che
So, finally, how do I catch the redirected URL with urllib2 and fetch the data from it?
Change the last line to:
outputcsv.writerow([url])
writerow() expects a sequence of column values; a string is itself a sequence of characters, so passing the bare URL puts each character in its own column.
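For the text-file part of the question, no csv module is needed; a minimal sketch (matching_urls stands in for the URLs collected in the loop above):
# Write one matching URL per line to a plain text file
with open('urls_containing_%s.txt' % search_string, 'w') as out:
    for url in matching_urls:  # matching_urls: assumed list of URLs that passed the check
        out.write(url + '\n')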
