Map JSON response from Ruby to Python

[EDIT]: Sorry, I didn't explain in enough detail the first time; I kind of wanted to figure the rest out for myself, but I ended up confusing myself even more.
I have a small problem.
I wanted to take advantage of a website's API JSON response:
{
    "Class": {
        "Id": 1948237,
        "family": "nature",
        "Timestamp": 941439
    },
    "Subtitles": [
        {
            "Id": 151398,
            "Content": "Tree",
            "Language": "en"
        },
        {
            "Id": 151399,
            "Content": "Bush",
            "Language": "en"
        }
    ]
}
So I'd like to print the URL along with a combined string of each line of subtitles, separated by newlines.
And I managed to do so in Ruby like this:
def get_word
  r = HTTParty.get('https://example.com/api/new')
  # Check if the request had a valid response.
  if r.code == 200
    json = r.parsed_response
    # Extract the family and timestamp from the API response.
    _, family, timestamp = json["Class"].values
    # Build a proper URL.
    image_url = "https://example.com/image/" + family + "/" + timestamp.to_s
    # Combine each line of subtitles into one string, separated by newlines.
    word = json["Subtitles"].map { |subtitle| subtitle["Content"] }.join("\n")
    return image_url, word
  end
end
However, now I need to port this to Python, and because I'm terrible at Python I can't really seem to figure it out.
I'm using requests instead of HTTParty, as I think it's the closest equivalent.
I tried doing this:
def get_word():
    r = requests.request('GET', 'https://example.com/api/new')
    if r.status_code == 200:
        json = requests.Response
        # [DOESN'T WORK] Extract the family and timestamp from the API response.
        _, family, timestamp = json["Class"].values
        # Build a proper URL.
        image_url = "https://example.com/image/" + family + "/" + timestamp.to_s
        # Combine each line of subtitles into one string, separated by newlines.
        word = "\n".join(subtitle["Content"] for subtitle in json["Subtitles"])
        print(image_url + '\n' + word)

get_word()
However, I get stuck at extracting the JSON response and combining the lines.

The Pythonic way is to build the string with str.join over a generator expression:
word = "\n".join(subtitle["Content"] for subtitle in json["Subtitles"])

You might need to convert the incoming JSON to a Python dictionary.
Assuming this is your response body (a JSON string):
response = '{"Subtitles": ...}'
# convert to dict
import json
json_data = json.loads(response)
# print content
for a_subtitle in json_data['Subtitles']:
    print(a_subtitle['Content'])
# extract family and timestamp
family = json_data["Class"]["family"]
timestamp = json_data["Class"]["Timestamp"]
image_url = "https://example.com/image/" + family + "/" + str(timestamp)
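Putting the pieces together, here is a minimal sketch of the full port (assuming the same placeholder endpoint as the Ruby version; requests.get plus r.json() stands in for HTTParty's parsed_response):
import requests

def get_word():
    r = requests.get('https://example.com/api/new')
    # Check if the request had a valid response.
    if r.status_code == 200:
        data = r.json()  # parse the JSON body into a dict
        # Extract the family and timestamp by key; clearer than unpacking .values().
        family = data["Class"]["family"]
        timestamp = data["Class"]["Timestamp"]
        # Build a proper URL; str() plays the role of Ruby's to_s.
        image_url = "https://example.com/image/" + family + "/" + str(timestamp)
        # Combine each line of subtitles into one string, separated by newlines.
        word = "\n".join(subtitle["Content"] for subtitle in data["Subtitles"])
        return image_url, word

result = get_word()
if result:
    image_url, word = result
    print(image_url + '\n' + word)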

Related

Python, Json - get_wiki_main_image doesn't return a link for img

Why doesn't the script below return a photo URL link? I've tried modifying the code but it has no effect.
import requests
import json

def get_wiki_main_image(title):
    url = 'https://pl.wikipedia.org/wiki/Zamek_Kr%C3%B3lewski_na_Wawelu'
    data = {
        'action': 'query',
        'format': 'json',
        'formatversion': 2,
        'prop': 'pageimages|pageterms',
        'piprop': 'original',
        'titles': title
    }
    response = requests.get(url, data)
    json_data = json.loads(response.text)
    return json_data['query']['pages'][0]['original']['source'] if len(json_data['query']['pages']) > 0 else 'Not found'

urllink = get_wiki_main_image('zamek królewski na wawelu')
print(urllink)
Thanks for help.
By observation, all the pictures on Wikipedia live under the folder https://upload.wikimedia.org/wikipedia/commons/thumb. Here is a way to do it without additional libraries:
import requests

r = requests.get('https://pl.wikipedia.org/wiki/Zamek_Kr%C3%B3lewski_na_Wawelu')
gen = r.iter_lines()  # create a byte string generator
for s in gen:
    # Is there such a substring, with the folder we need, in this line?
    if s.find(b'https://upload.wikimedia.org/wikipedia/commons/thumb') == -1:
        continue
    else:
        ss = s.split(b'"')  # split the byte string to separate the url
        print(ss[3].decode('utf-8'))  # take the url and convert it to a string
Console output:
https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Royal_Castle%2C_Wawel_Hill%2C_4_Wawel%2C_Old_Town%2C_Krak%C3%B3w%2C_Poland.jpg/1200px-Royal_Castle%2C_Wawel_Hill%2C_4_Wawel%2C_Old_Town%2C_Krak%C3%B3w%2C_Poland.jpg
https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Royal_Castle%2C_Wawel_Hill%2C_4_Wawel%2C_Old_Town%2C_Krak%C3%B3w%2C_Poland.jpg/800px-Royal_Castle%2C_Wawel_Hill%2C_4_Wawel%2C_Old_Town%2C_Krak%C3%B3w%2C_Poland.jpg
https://upload.wikimedia.org/wikipedia/commons/thumb/e/ea/Royal_Castle%2C_Wawel_Hill%2C_4_Wawel%2C_Old_Town%2C_Krak%C3%B3w%2C_Poland.jpg/640px-Royal_Castle%2C_Wawel_Hill%2C_4_Wawel%2C_Old_Town%2C_Krak%C3%B3w%2C_Poland.jpg
There are three static pictures on the site with different sizes.
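For what it's worth, the likely root cause in the original function is that it sends the query parameters to the article URL instead of the MediaWiki API endpoint. A minimal sketch of that fix, assuming the standard /w/api.php endpoint:
import requests

def get_wiki_main_image(title):
    url = 'https://pl.wikipedia.org/w/api.php'  # the API endpoint, not the article URL
    params = {
        'action': 'query',
        'format': 'json',
        'formatversion': 2,
        'prop': 'pageimages',
        'piprop': 'original',
        'titles': title,
    }
    json_data = requests.get(url, params=params).json()
    pages = json_data['query']['pages']
    # 'original' is only present when the page actually has a main image
    return pages[0]['original']['source'] if pages and 'original' in pages[0] else 'Not found'

print(get_wiki_main_image('Zamek Królewski na Wawelu'))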

Output non JSON data from regex web scraping to a JSON file

I'm using requests and regex to scrape data from an entire website and then save it to a JSON file, hosted on GitHub so that I and anyone else can access the data from other devices.
The first thing I tried was just opening every single page on the website and grabbing all the data I want, but I found that to be unnecessary, so I decided to make two scripts: the first one finds the URL of every page on the site, and the second one gets called and scrapes the given URL. What I'm having trouble with right now is getting my data formatted correctly for the JSON file. Currently, this is a sample of what the output looks like:
{
    "Console": "/neo-geo-aes",
    "Call ID": "62815",
    "URL": "https://www.pricecharting.com/game/jp-sega-mega-drive/bare-knuckle"
}{
    "Console": "/neo-geo-cd",
    "Call ID": "62817",
    "URL": "https://www.pricecharting.com/game/jp-sega-mega-drive/bare-knuckle-2"
}{
    "Console": "/neo-geo-pocket-color",
    "Call ID": "62578",
    "URL": "https://www.pricecharting.com/game/jp-sega-mega-drive/batman"
}{
    "Console": "/playstation",
    "Call ID": "62580",
    "URL": "https://www.pricecharting.com/game/jp-sega-mega-drive/batman-forever"
}
I've looked into this a lot and can't find a solution; here's the code in question:
import re
import requests
import json

## The base URL
URL = "https://www.pricecharting.com/"
r = requests.get(URL)
htmltext = r.text

## Find all system URLs
dataUrl = re.findall('(?<=<li><a href="\/console).*(?=">)', htmltext)
print(dataUrl)

## For each item (number of consoles) find games
for i in range(len(dataUrl)):
    ## Make console URL
    newUrl = ("https://www.pricecharting.com/console" + dataUrl[i])
    req = requests.get(newUrl)
    newHtml = req.text
    ## Get item URLs
    urlOne = re.findall('(?<=<a href="\/game).*(?=">)', newHtml)
    itemId = re.findall('(?<=tr id="product-).*(?=" data)', newHtml)
    ## For every item in list (items per console)
    out_list = []
    for i in range(len(urlOne)):
        ## Make item URL
        itemUrl = ("https://www.pricecharting.com/game" + urlOne[i])
        callId = (itemId[i])
        ## Format for JSON
        json_file_content = {}
        json_file_content['Console'] = dataUrl[i]
        json_file_content['Call ID'] = callId
        json_file_content['URL'] = itemUrl
        out_list.append(json_file_content)
    data_json_filename = 'docs/result.json'
    with open(data_json_filename, 'a') as data_json_file:
        json.dump(out_list, data_json_file, indent=4)
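The `}{` pattern appears because the file is opened in append mode and json.dump runs once per console, so the file ends up holding several independent JSON documents back to back, which is not valid JSON. A minimal sketch of one fix, reworking the loops above: collect every item into a single list for the whole crawl and write the file once at the end.
all_items = []  # one list for the entire crawl

for console in dataUrl:
    # ... fetch the console page and build urlOne / itemId as before ...
    for itemUrl, callId in zip(urlOne, itemId):
        all_items.append({
            'Console': console,
            'Call ID': callId,
            'URL': "https://www.pricecharting.com/game" + itemUrl,
        })

# Write once, in 'w' mode, so the file contains a single valid JSON array.
with open('docs/result.json', 'w') as data_json_file:
    json.dump(all_items, data_json_file, indent=4)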

Roblox Purchasing an item from catalog

I have written a script that should purchase an asset from the catalog.
import re
from requests import post, get

cookie = "blablabla"
ID = 1562150

# getting x-csrf-token
token = post("https://auth.roblox.com/v2/logout", cookies={".ROBLOSECURITY": cookie}).headers['X-CSRF-TOKEN']
print(token)

# getting item details
detail_res = get(f"https://www.roblox.com/library/{ID}")
text = detail_res.text
productId = int(get(f"https://api.roblox.com/marketplace/productinfo?assetId={ID}").json()["ProductId"])
expectedPrice = int(re.search("data-expected-price=\"(\d+)\"", text).group(1))
expectedSellerId = int(re.search("data-expected-seller-id=\"(\d+)\"", text).group(1))

headers = {
    "x-csrf-token": token,
    "content-type": "application/json; charset=UTF-8"
}
data = {
    "expectedCurrency": 1,
    "expectedPrice": expectedPrice,
    "expectedSellerId": expectedSellerId
}
buyres = post(f"https://economy.roblox.com/v1/purchases/products/{productId}",
              headers=headers,
              data=data,
              cookies={".ROBLOSECURITY": cookie})
if buyres.status_code == 200:
    print("Successfully bought item")
The problem is that it doesn't purchase anything; the request fails with error 500 (InternalServerError).
Someone told me that if I add json.dumps() to the script it might work.
How do I add json.dumps() here (I don't understand it even though I read the docs), and how do I fix the script so it purchases the item?
Big thanks to anyone who can help.
Import the json package.
json.dumps() converts a Python dictionary to a JSON string.
I'm guessing this is what you want:
buyres = post(f"https://economy.roblox.com/v1/purchases/products/{productId}",
              headers=json.dumps(headers),
              data=json.dumps(data),
              cookies={".ROBLOSECURITY": cookie})
I finally found the answer; I had to do it like this:
dataLoad = json.dumps(data)
buyres = post(f"https://economy.roblox.com/v1/purchases/products/{productId}",
              headers=headers,
              data=dataLoad,
              cookies={".ROBLOSECURITY": cookie})

Porting json response in Ruby to Python [duplicate]

This question already has answers here:
Map JSON response from Ruby to Python
(2 answers)
Closed 4 years ago.
Hey, I made a program that takes advantage of a JSON API response in Ruby and I'd like to port it to Python, but I don't really know how.
JSON response:
{
    "Class": {
        "Id": 1948237,
        "family": "nature",
        "Timestamp": 941439
    },
    "Subtitles": [
        {
            "Id": 151398,
            "Content": "Tree",
            "Language": "en"
        },
        {
            "Id": 151399,
            "Content": "Bush",
            "Language": "en"
        }
    ]
}
And here's the Ruby code:
def get_word
  r = HTTParty.get('https://example.com/api/new')
  # Check if the request had a valid response.
  if r.code == 200
    json = r.parsed_response
    # Extract the family and timestamp from the API response.
    _, family, timestamp = json["Class"].values
    # Build a proper URL.
    image_url = "https://example.com/image/" + family + "/" + timestamp.to_s
    # Combine each line of subtitles into one string, separated by newlines.
    word = json["Subtitles"].map { |subtitle| subtitle["Content"] }.join("\n")
    return image_url, word
  end
end
Is there any way I could port this code to Python using the requests and maybe json modules?
I tried but failed miserably.
Per request, here's what I've already tried:
def get_word():
    r = requests.request('GET', 'https://example.com/api/new')
    if r.status_code == 200:
        # ![DOESN'T WORK]! Extract the family and timestamp from the API
        json = requests.Response
        _, family, timestamp = json["Class"].values
        # Build a proper URL
        image_url = "https://example.com/image/" + family + "/" + timestamp
        # Combine each line of subtitles into one string, separated by newlines.
        word = "\n".join(subtitle["Content"] for subtitle in json["Subtitles"])
        print(image_url + '\n' + word)

get_word()
The response handling and the _, family, timestamp = json["Class"].values line don't work, as I don't know how to port them.
If you're using the requests module, you can call requests.get() to make a GET call, and then use json() to get the JSON response. Also, you shouldn't be using json as a variable name if you're importing the json module.
Try making the following changes in your function:
def get_word():
    r = requests.get("https://example.com/api/new")
    if r.status_code == 200:
        # Extract the family and timestamp from the API
        json_response = r.json()
        # json_response will now be a dictionary that you can simply use
        ...
And use the json_response dictionary to get anything you need for your variables.
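For completeness, a sketch of how the rest of the Ruby body maps onto that dictionary, continuing inside the if block above (explicit key access is easier to follow than unpacking .values(), and str() replaces Ruby's to_s):
        family = json_response["Class"]["family"]
        timestamp = json_response["Class"]["Timestamp"]
        image_url = "https://example.com/image/" + family + "/" + str(timestamp)
        word = "\n".join(subtitle["Content"] for subtitle in json_response["Subtitles"])
        return image_url, word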

How to determine if my Python Requests call to API returns no data

I have a query to a job board API using Python Requests. It then writes to a table that is included in a web page. Sometimes the request returns no data (if there are no open jobs). If so, I want to write a string to the included file instead of the table. What is the best way to identify a response with no data? Is it as simple as if response == "", or something along those lines?
Here is my Python code making the API request:
#!/usr/bin/python
import requests
import json
from datetime import datetime
import dateutil.parser

url = "https://data.usajobs.gov/api/Search"
querystring = {"Organization": "LF00", "WhoMayApply": "All"}
headers = {
    'authorization-key': "ZQbNd1iLrQ+rPN3Rj2Q9gDy2Qpi/3haXSXGuHbP1SRk=",
    'user-agent': "jcarroll#fec.gov",
    'host': "data.usajobs.gov",
    'cache-control': "no-cache",
}
response = requests.request("GET", url, headers=headers, params=querystring)
responses = response.json()

with open('/Users/jcarroll/work/infoweb_branch4/rep_infoweb/trunk/fec_jobs.html', 'w') as jobtable:
    jobtable.write("Content-Type: text/html\n\n")
    table_head = """<table class="job_table" style="border:#000">
    <tbody>
        <tr>
            <th>Vacancy</th>
            <th>Grade</th>
            <th>Open Period</th>
            <th>Who May Apply</th>
        </tr>"""
    jobtable.write(table_head)
    for i in responses['SearchResult']['SearchResultItems']:
        start_date = dateutil.parser.parse(i['MatchedObjectDescriptor']['PositionStartDate'])
        end_date = dateutil.parser.parse(i['MatchedObjectDescriptor']['PositionEndDate'])
        jobtable.write("<tr><td><strong><a href='" + i['MatchedObjectDescriptor']['PositionURI'] + "'>"
                       + i['MatchedObjectDescriptor']['PositionID'] + ", " + i['MatchedObjectDescriptor']['PositionTitle']
                       + "</a></strong></td><td>" + i['MatchedObjectDescriptor']['JobGrade'][0]['Code'] + "-"
                       + i['MatchedObjectDescriptor']['UserArea']['Details']['LowGrade'] + " - "
                       + i['MatchedObjectDescriptor']['UserArea']['Details']['HighGrade'] + "</td><td>"
                       + start_date.strftime('%b %d, %Y') + " - " + end_date.strftime('%b %d, %Y') + "</td><td>"
                       + i['MatchedObjectDescriptor']['UserArea']['Details']['WhoMayApply']['Name'] + "</td></tr>")
    jobtable.write("</tbody></table>")
    # the with block closes the file automatically
You have a couple of options, depending on what the response actually is. I assume case 3 applies best:
# 1. Test if the response body contains anything
if response.text:  # body as str
    # ...
    # body = response.content  # body as bytes, useful for binary data

# 2. Handle the error if deserialization fails (because of no text or bad format)
try:
    json_data = response.json()
    # ...
except ValueError:
    # no JSON returned
    pass

# 3. Check that .json() did NOT return an empty dict/list
if json_data:
    # ...

# 4. Safeguard against a malformed/unexpected data structure
try:
    data_point = json_data[some_key][some_index][...][...]
except (KeyError, IndexError, TypeError):
    # data does not have the inner structure you expect
    pass

# 5. Check if data_point is actually something useful (truthy in this example)
if data_point:
    # ...
else:
    # data_point is falsy ([], {}, None, 0, '', ...)
    pass
If your API has been written with correct status codes, then:
200 means a successful response with a body
204 means a successful response without a body
In Python you can check for that as simply as the following:
if response.status_code == 204:
    # do something awesome
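Applied to the job-board code above, a minimal guard might look like this (assuming the API returns an empty SearchResultItems list when there are no openings):
items = responses.get('SearchResult', {}).get('SearchResultItems', [])
if not items:
    # no open jobs: write the fallback string instead of the table
    jobtable.write("<p>There are no job openings at this time.</p>")
else:
    for i in items:
        # build the table rows as before
        ...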
