How can I access these JSON objects in Python?

I'm making some data visualizations from a movie database API. I can already fetch the data the normal way, but when I load the JSON and loop over it to print it, all I get are the top-level keys; I need to access the objects inside.
url = "https://api.themoviedb.org/3/discover/movie?api_key="+ api_key
+"&language=en- US&sort_by=popularity.desc&include_adult=
false&include_video=false&page=1" # api url
response = urllib.request.urlopen(url)
raw_json = response.read().decode("utf-8")
data = json.loads(raw_json)
for j in data:
print(j)
I expected the output to be:
[{'popularity': 15,
  'id': 611,
  'video': False,
  'vote_count': 1403,
  'vote_average': 8.9,
  'title': 'lalalalo'}, {....}]
but the actual output is
page
total_results
total_pages
results

The results are one level down. You are looping through the metadata.
Try changing your code to:
import json
import urllib.request

api_key = "your api key"
url = ("https://api.themoviedb.org/3/discover/movie?api_key=" + api_key +
       "&language=en-US&sort_by=popularity.desc&include_adult=false"
       "&include_video=false&page=1")  # API URL
response = urllib.request.urlopen(url)
raw_json = response.read().decode("utf-8")
data = json.loads(raw_json)
for j in data['results']:
    print(j)
You need to change data to data['results'].

You can simply use the requests module:
import requests
import json
your_link = " "
r = requests.get(your_link)
data = json.loads(r.content)
Now the JSON is loaded; use the "results" key (data["results"]) and loop through the data you got.

Related

How to get unshortened/redirected URL even when site 404s or fails in Python

I'm trying to get the destination of a bunch of t.co links from Twitter. I can get this for active links, but when they are 404s or dead links, the program dies. If I enter the same link into a browser, it shows me the destination URL.
Is there a way to do this in Python 3?
This is my existing code:
import json
import requests
import pandas as pd
from requests.models import Response

# Loading my array of links
data = pd.read_json('tco-links.json')
links = pd.DataFrame(data)
output = []
session = requests.Session()  # so connections are recycled

with open('output.json', 'w') as f:
    for index, row in links.iterrows():
        fullLink = 'http://' + row['link']
        try:
            response = session.head(fullLink, allow_redirects=True)
        except:
            # how I'm handling errors right now
            response = Response()
            response.url = 'Failed'
        output.append({
            'link': fullLink,
            'id': row['id'],
            'unshortened': response.url
        })
    for x in output:
        f.write(json.dumps(x) + '\n')

How to search for books that have spaces in their title using Google books API

When I search for books with a single-word title (e.g. bluets) my code works fine, but when I search for books whose titles have two words or spaces (e.g. white whale) I get an error (jinja2 syntax). How do I solve this error?
@app.route("/book", methods=["GET", "POST"])
def get_books():
    api_key = os.environ.get("API_KEY")
    if request.method == "POST":
        book = request.form.get("book")
        url = f"https://www.googleapis.com/books/v1/volumes?q={book}:keyes&key={api_key}"
        response = urllib.request.urlopen(url)
        data = response.read()
        jsondata = json.loads(data)
        return render_template("book.html", books=jsondata["items"])
I tried to search for similar cases and only found one solution, but I didn't understand it.
Here is my error message:
http.client.InvalidURL
http.client.InvalidURL: URL can't contain control characters. '/books/v1/volumes?q=white whale:keyes&key=AIzaSyDtjvhKOniHFwkIcz7-720bgtnubagFxS8' (found at least ' ')
Some characters in a URL need to be encoded - in your situation you have to use + or %20 instead of the space.
This URL has %20 instead of the space and it works for me. If I use + it also works.
import urllib.request
import json
url = 'https://www.googleapis.com/books/v1/volumes?q=white%20whale:keyes&key=AIzaSyDtjvhKOniHFwkIcz7-720bgtnubagFxS8'
#url = 'https://www.googleapis.com/books/v1/volumes?q=white+whale:keyes&key=AIzaSyDtjvhKOniHFwkIcz7-720bgtnubagFxS8'
response = urllib.request.urlopen(url)
text = response.read()
data = json.loads(text)
print(data)
With requests you don't even have to do it manually, because it encodes the URL automatically:
import requests
url = 'https://www.googleapis.com/books/v1/volumes?q=white whale:keyes&key=AIzaSyDtjvhKOniHFwkIcz7-720bgtnubagFxS8'
r = requests.get(url)
data = r.json()
print(data)
You may use urllib.parse.urlencode() to make sure all characters are correctly encoded.
import urllib.request
import urllib.parse
import json
payload = {
'q': 'white whale:keyes',
'key': 'AIzaSyDtjvhKOniHFwkIcz7-720bgtnubagFxS8',
}
query = urllib.parse.urlencode(payload)
url = 'https://www.googleapis.com/books/v1/volumes?' + query
response = urllib.request.urlopen(url)
text = response.read()
data = json.loads(text)
print(data)
And the same with requests - again, it doesn't need manual encoding:
import requests
payload = {
'q': 'white whale:keyes',
'key': 'AIzaSyDtjvhKOniHFwkIcz7-720bgtnubagFxS8',
}
url = 'https://www.googleapis.com/books/v1/volumes'
r = requests.get(url, params=payload)
data = r.json()
print(data)

Find specific keyword in API JSON response - Python

I am trying to fetch a JSON response of multiple issues from an API, and I am able to get the response successfully. The next part I want to perform is to fetch/print only those entries that have the specific keywords "moviepass" and "login" in the JSON field "body". Here is my code:
import json
import re
import requests

api_url = '***************************************'
headers = {'Content-Type': 'application/json',
           'Authorization': 'Basic **************************'}
response = requests.get(api_url, headers=headers)
#print(response.text)

words = ('moviepass', 'login')

def lookingfor(words):
    data = response.text
    for line in data:
        for word in words:
            match = re.findall(word, line['body'])
            if match:
                print((word, line[]))

lookingfor(words)
My JSON looks like:
[{"tags":["moviepass"],"assignee_name":null,"app_id":"*******","hs_user_id":"*******","title":"1234","redacted":false,"updated_at":1611753805497,"messages":[{"body":"moviepass - Not '
'sure if this is what you guys meant or not but here '
'haha.","created_at":********,"author":{"name":"abc","id":"*****","emails":["abc#qwerty.com"]},"origin":"end-user","id":"*********"}]
You don't need regular expressions; you can use json_data['tags'].
But if you want to use regular expressions, you need to convert the JSON to a string first:
import json
json.dumps(json_obj)  # returns the same data, but as a string
Parse the JSON response - it's a list of [nested] dicts. You can use the Response.json() method, so there is no need to import json:
import requests

api_url = '***************************************'
headers = {'Content-Type': 'application/json',
           'Authorization': 'Basic **************************'}
words = ('moviepass', 'login')

response = requests.get(api_url, headers=headers)
data = response.json()

for item in data:
    if any(word in item.get('tags', []) for word in words):
        print(item)

How to download images from a website without the 'img' tag?

Recently I've been trying to learn how to web scrape in order to download all the images from my school directory. However, the pages don't store the images under an img tag; instead they ALL appear as a CSS background, like this: background-image: url("/common/pages/GalleryPhoto.aspx?photoId=323070&width=180&height=180");
Is there any way to work around this?
Here is my current code, which downloads images from a targeted website:
import os, requests, bs4, webbrowser, random

url = 'https://jhs.lsc.k12.in.us/staff_directory'
res = requests.get(url)
try:
    res.raise_for_status()
except Exception as exc:
    print('Sorry an error occurred:', exc)

soup = bs4.BeautifulSoup(res.text, 'html.parser')
element = soup.select('background-image')
for i in range(len(element)):
    url = element[i].get('img')
    name = random.randrange(1, 25)
    file = open(str(name) + '.jpg', 'wb')
    res = requests.get(url)
    for chunk in res.iter_content(10000):
        file.write(chunk)
    file.close()
print('done')
You can use the internal API this site uses to get the data, including the image URLs. It first gets the list of staff groups using the /Settings endpoint, then calls the /Search API with all the group IDs.
The flow is the following:
- get the portletInstanceId value from a div tag with the attribute data-portlet-instance-id
- call the Settings API to get the group IDs:
POST https://jhs.lsc.k12.in.us/Common/controls/StaffDirectory/ws/StaffDirectoryWS.asmx/Settings
- call the Search API with pagination parameters; you can choose how many people to request and how many per page:
POST https://jhs.lsc.k12.in.us/Common/controls/StaffDirectory/ws/StaffDirectoryWS.asmx/Search
The following script gets the first 20 people and puts the results in a pandas DataFrame:
import requests
from bs4 import BeautifulSoup
import pandas as pd

r = requests.get("https://jhs.lsc.k12.in.us/staff_directory")
soup = BeautifulSoup(r.content, "lxml")
portletInstanceId = soup.select('div[data-portlet-instance-id].staffDirectoryComponent')[0]["data-portlet-instance-id"]

r = requests.post("https://jhs.lsc.k12.in.us/Common/controls/StaffDirectory/ws/StaffDirectoryWS.asmx/Settings",
                  json={"portletInstanceId": portletInstanceId})
groupIds = [t["groupID"] for t in r.json()["d"]["groups"]]
print(groupIds)

payload = {
    "firstRecord": 0,
    "groupIds": groupIds,
    "lastRecord": 20,
    "portletInstanceId": portletInstanceId,
    "searchByJobTitle": True,
    "searchTerm": "",
    "sortOrder": "LastName,FirstName ASC"
}
r = requests.post("https://jhs.lsc.k12.in.us/Common/controls/StaffDirectory/ws/StaffDirectoryWS.asmx/Search",
                  json=payload)
results = r.json()["d"]["results"]

# add image url based on userID
for t in results:
    t["imageURL"] = f'https://jhs.lsc.k12.in.us/{t["imageURL"]}' if t["imageURL"] else ''

df = pd.DataFrame(results)

# whole data
print(df)

# only image url
with pd.option_context('display.max_colwidth', 400):
    print(df["imageURL"])
Try this on repl.it
You will need to update the firstRecord and lastRecord fields accordingly.

Python JSON data into HTML table

I'm pretty lost. Not going to lie. I'm trying to figure out how to parse JSON data from the college scorecard API into an HTML file. I used Python to store the JSON data in a dictionary, but other than that, I'm pretty dang lost. How would you write an example sending this data to an HTML file?
import requests

def main():
    url = 'https://api.data.gov/ed/collegescorecard/v1/schools.json'
    payload = {
        'api_key': "api_key_string",
        '_fields': ','.join([
            'school.name',
            'school.school_url',
            'school.city',
            'school.state',
            'school.zip',
            '2015.student.size',
        ]),
        'school.operating': '1',
        '2015.academics.program_available.assoc_or_bachelors': 'true',
        '2015.student.size__range': '1..',
        'school.degrees_awarded.predominant__range': '1..3',
        'school.degrees_awarded.highest__range': '2..4',
        'id': '240444',
    }
    data = requests.get(url, params=payload).json()
    for result in data['results']:
        print result  # Python 2 print statement, matching the output below

main()
Output:
{u'school.city': u'Madison', u'school.school_url': u'www.wisc.edu', u'school.zip': u'53706-1380', u'2015.student.size': 29579, u'school.state': u'WI', u'school.name': u'University of Wisconsin-Madison'}
Edit: For clarification, I need to insert the returned data into an HTML file that formats it, strips the raw styling, and places it in a table.
Edit II: Json2html edit
data = requests.get(url, params=payload).json()
for result in data['results']:
    print result

data_processed = json.loads(data)
formatted_table = json2html.convert(json=data_processed)

index = open("index.html", "w")
index.write(formatted_table)
index.close()
Edit: Json2html output:
Output image here
Try using the json2html module! This will convert the JSON that was returned into a 'human readable HTML Table representation'.
This code will take your JSON output and create the HTML (note that json.loads() expects a string; if you already called .json(), data is a dict and can be passed to convert() directly):
import json
from json2html import json2html

data_processed = json.loads(data)
formatted_table = json2html.convert(json=data_processed)
Then to save it as HTML you can do this:
your_file= open("filename","w")
your_file.write(formatted_table)
your_file.close()
