I'm going crazy trying to get data through an API call using requests and pandas. It looks like it's nested data, but I can't get the data I need.
https://xorosoft.docs.apiary.io/#reference/sales-orders/get-sales-orders
above is the API documentation. I'm just trying to keep it simple and get the ItemNumber and QtyRemainingToShip, but I can't even figure out how to access the nested data. I'm trying to use DataFrame to get it, but am just lost. Any help would be appreciated. I keep getting stuck at the 'Data' level.
type(json['Data'])
df = pd.DataFrame(['Data'])
df.explode('SoEstimateHeader')
Cell In [64], line 1
df.explode([0:])
^
SyntaxError: invalid syntax
I used the link to grab a sample response from the API documentation page you provided. From the code you provided it looks like you are already able to get the data, and I'm assuming that you have it as a dictionary already.
From what I can tell, I don't think you should be using pandas, unless it's some downstream requirement in the task you are doing. But to get the ItemNumber & QtyRemainingToShip you can use the code below.
# get the interesting part of the data out of the api response
data_list = json['Data']
# data_list is only one element long, so grab the first element, which is a dictionary
data = data_list[0]
# the dictionary has two keys at the top level
so_estimate_header = data['SoEstimateHeader']
# the value under "SoEstimateItemLineArr" is also a one-element list, so grab its first & only element
so_estimate_item_line_arr = data['SoEstimateItemLineArr'][0]
# now we can grab the pieces of information we're interested in out of the dictionary
qtyremainingtoship = so_estimate_item_line_arr["QtyRemainingToShip"]
itemnumber = so_estimate_item_line_arr["ItemNumber"]
print("QtyRemainingToShip: ", qtyremainingtoship)
print("ItemNumber: ", itemnumber)
Output
QtyRemainingToShip: 1
ItemNumber: BC
Side Note
As a side note, I wouldn't name any variables json, because that's also the name of a popular Python library for parsing JSON. It will confuse future readers, and it will clash with the name if you end up having to import the json library.
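That said, if pandas really is a downstream requirement, something like pd.json_normalize should also work. Here's a sketch against a made-up stand-in for the response (only the two fields from the question are real; the values are invented for illustration):

```python
import pandas as pd

# a minimal stand-in for the API response; the field names come from the
# question, the values here are made up
response_json = {
    "Data": [
        {
            "SoEstimateHeader": {"SoEstimateHeaderId": 1},
            "SoEstimateItemLineArr": [
                {"ItemNumber": "BC", "QtyRemainingToShip": 1},
            ],
        }
    ]
}

# record_path walks into the nested list of line items, one row per line item
df = pd.json_normalize(response_json["Data"], record_path="SoEstimateItemLineArr")
print(df[["ItemNumber", "QtyRemainingToShip"]])
```

This would scale to responses with more than one line item per order, since every element of SoEstimateItemLineArr becomes its own row.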
I'm working on a divvy dataset project.
I want to scrape the information for each suggested location, along with the comments provided, from here: http://suggest.divvybikes.com/.
Am I able to scrape this information from Mapbox? It is displayed on a map so it must have the information somewhere.
I visited the page, and logged my network traffic using Google Chrome's Developer Tools. Filtering the requests to view only XHR (XmlHttpRequest) requests, I saw a lot of HTTP GET requests to various REST APIs. These REST APIs return JSON, which is ideal.
Only two of these APIs seem to be relevant for your purposes - one is for places, the other for comments associated with those places. The places API's JSON contains interesting information, such as place ids and coordinates. The comments API's JSON contains all comments regarding a specific place, identified by its id.
Mimicking those calls is pretty straightforward with the third-party requests module. Fortunately, the APIs don't seem to care about request headers. The query-string parameters (the params dictionary) need to be well-formulated though, of course.
I was able to come up with the following two functions: get_places makes multiple calls to the same API, each time with a different page query-string parameter. It seems that "page" is the term they use internally to split up all their data into different chunks - all the different places/features/stations are split up across multiple pages, and you can only get one page per API call. The while-loop accumulates all places in a giant list, and it keeps going until we receive a response which tells us there are no more pages. Once the loop ends, we return the list of places.
The other function is get_comments, which takes a place id (string) as a parameter. It then makes an HTTP GET request to the appropriate API, and returns a list of comments for that place. This list may be empty if there are no comments.
def get_places():
    import requests
    from itertools import count

    api_url = "http://suggest.divvybikes.com/api/places"
    page_counter = count(1)
    places = []

    for page_nr in page_counter:
        params = {
            "page": str(page_nr),
            "include_submissions": "true"
        }
        response = requests.get(api_url, params=params)
        response.raise_for_status()
        content = response.json()
        places.extend(content["features"])
        if content["metadata"]["next"] is None:
            break

    return places

def get_comments(place_id):
    import requests
    api_url = "http://suggest.divvybikes.com/api/places/{}/comments".format(place_id)
    response = requests.get(api_url)
    response.raise_for_status()
    return response.json()["results"]

def main():
    from operator import itemgetter

    places = get_places()
    place_id = places[12]["id"]
    print("Printing comments for the thirteenth place (id: {})\n".format(place_id))
    for comment in map(itemgetter("comment"), get_comments(place_id)):
        print(comment)

    return 0

if __name__ == "__main__":
    import sys
    sys.exit(main())
Output:
Printing comments for the thirteenth place (id: 107062)
I contacted Divvy about this five years ago and would like to pick the conversation back up! The Evanston Divvy bikes are regularly spotted in Wilmette and we'd love to expand the system for riders. We could easily have four stations - at the Metra Train Station, and the CTA station, at the lakefront Gillson Park and possibly one at Edens Plaza in west Wilmette. Please, please, please contact me directly. Thanks.
For this example, I'm printing all the comments for the 13th place in our list of places. I picked that one because it is the first place that actually has comments (places 0-11 had none; most places don't seem to have comments). In this case, this place only had one comment.
EDIT - If you wanted to save the place ids, latitude, longitude and comments in a CSV, you can try changing the main function to:
def main():
    import csv

    print("Getting places...")
    places = get_places()
    print("Got all places.")

    # note: GeoJSON coordinates are [longitude, latitude], in that order
    fieldnames = ["place id", "longitude", "latitude", "comments"]

    print("Writing to CSV file...")
    with open("output.csv", "w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames)
        writer.writeheader()
        num_places_to_write = 25
        for place_nr, place in enumerate(places[:num_places_to_write], start=1):
            print("Writing place #{}/{} with id {}".format(place_nr, num_places_to_write, place["id"]))
            writer.writerow(dict(zip(fieldnames, [place["id"], *place["geometry"]["coordinates"], [c["comment"] for c in get_comments(place["id"])]])))

    return 0
With this, I got results like:
place id,longitude,latitude,comments
107098,-87.6711076553,41.9718155716,[]
107097,-87.759540081,42.0121073671,[]
107096,-87.747695446,42.0263916146,[]
107090,-87.6642036438,42.0162096564,[]
107089,-87.6609444613,41.8852953922,[]
107083,-87.6007853815,41.8199433342,[]
107082,-87.6355862613,41.8532736671,[]
107075,-87.6210737228,41.8862644836,[]
107074,-87.6210737228,41.8862644836,[]
107073,-87.6210737228,41.8862644836,[]
107065,-87.6499611139,41.9627251578,[]
107064,-87.6136027649,41.8332984674,[]
107062,-87.7073025402,42.0760990584,"[""I contacted Divvy about this five years ago and would like to pick the conversation back up! The Evanston Divvy bikes are regularly spotted in Wilmette and we'd love to expand the system for riders. We could easily have four stations - at the Metra Train Station, and the CTA station, at the lakefront Gillson Park and possibly one at Edens Plaza in west Wilmette. Please, please, please contact me directly. Thanks.""]"
In this case, I used the list-slicing syntax (places[:num_places_to_write]) to only write the first 25 places to the CSV file, just for demonstration purposes. However, after about the first thirteen were written, I got this exception message:
A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
So, I'm guessing that the comment-API doesn't expect to receive so many requests in such a short amount of time. You may have to sleep in the loop for a bit to get around this. It's also possible that the API doesn't care, and just happened to timeout.
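If you do go the sleeping route, one sketch (the one-second delay is a guess on my part, not something the API documents) is a small wrapper that throttles the calls to get_comments:

```python
import time

def get_comments_politely(place_ids, get_comments, delay=1.0):
    """Fetch comments for many places, pausing between requests so the
    API isn't hammered. `get_comments` is the function defined earlier."""
    all_comments = {}
    for place_id in place_ids:
        all_comments[place_id] = get_comments(place_id)
        time.sleep(delay)  # throttle: at most one request per `delay` seconds
    return all_comments
```

You could then call get_comments_politely([place["id"] for place in places], get_comments) once, instead of calling get_comments inside the CSV-writing loop.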
I am trying to write my first API query with python. I am calling an extremely simple dataset. (http://api.open-notify.org/astros.json). This displays information about the number of people in space.
I can return the number, but I want to try and display the names. So far I have:
import requests
response = requests.get("http://api.open-notify.org/astros.json")
data = response.json()
print(data["number"])
Any help would be greatly appreciated.
If you want to get names or crafts, just do this:
print("First name: ", data["people"][0]["name"])
print("First craft: ", data["people"][0]["craft"])
Alternatively, you can put it in a for loop like this:
for i in range(len(data["people"])):
    print(data["people"][i]["name"])
You should iterate over data['people'] and then get the name:
for person in data['people']:
    print(person['name'])
I've crawled a tracklist of 36,000 songs which have been played on the Danish national radio station P3. I want to do some statistics on how frequently each of the genres has been played within this period, so I figured the Discogs API might help label each track with a genre. However, the documentation for the API doesn't seem to include an example for querying the genre of a particular song.
I have a CSV file with 3 columns: Artist, Title & Test (Test is where I want the API to label each song with the genre).
Here's a sample of the script I've built so far:
import json
import pandas as pd
import requests
import discogs_client

d = discogs_client.Client('ExampleApplication/0.1')
d.set_consumer_key('key-here', 'secret-here')

input = pd.read_csv('Desktop/TEST.csv', encoding='utf-8', error_bad_lines=False)
df = input[['Artist', 'Title', 'Test']]
df.columns = ['Artist', 'Title', 'Test']

for i in range(0, len(list(df.Artist))):
    x = df.Artist[i]
    g = d.artist(x)
    df.Test[i] = str(g)

df.to_csv('Desktop/TEST2.csv', encoding='utf-8', index=False)
This script has been working with a dummy file with 3 records in it so far, for mapping the artist of a given ID#. But as soon as the file gets larger (e.g. 2000 records), it returns an HTTPError when it cannot find the artist.
I have some questions regarding this approach:
1) Would you recommend using the search query function in the API for retrieving a field such as 'Genre'? Or do you think it is possible to retrieve Genre with a 'd.' function from the API?
2) Will I need to acquire an API key? I have successfully mapped the 3 records without an API key so far. Looks like the key is free, though.
Here's the guide I have been following:
https://github.com/discogs/discogs_client
And here's the documentation for the API:
https://www.discogs.com/developers/#page:home,header:home-quickstart
Maybe you need to re-read the discogs_client examples; I am not an expert myself, but a newbie trying to use this API.
AFAIK, g = d.artist(x) fails because x must be an integer (the Discogs artist id), not a string.
So you must first do a search, then get the artist id, then call d.artist(artist_id).
Sorry for not providing an example; I am a Python newbie right now ;)
Also have you checked acoustid for
It's probably a rate limit.
Read the status code of your response; you should find a 429 Too Many Requests.
Unfortunately, if that's the case, the only solution is to add a sleep in your code to make one request per second.
Check out the API doc:
http://www.discogs.com/developers/#page:home,header:home-rate-limiting
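A generic way to honour that limit is to catch the failure, sleep, and retry. This is just a sketch; it assumes the exception raised by your HTTP layer carries a status_code attribute (discogs_client's HTTPError does in the versions I've seen, but check yours):

```python
import time

def call_with_retry(fn, attempts=5, wait=1.1):
    """Call fn(); on an exception carrying a 429 status_code, sleep and retry.
    Any other exception (or running out of attempts) is re-raised."""
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            status = getattr(exc, "status_code", None)
            if status == 429 and attempt < attempts - 1:
                time.sleep(wait)  # stay under roughly one request per second
                continue
            raise
```

In the loop above you would then write something like g = call_with_retry(lambda: d.artist(artist_id)) instead of calling d.artist directly.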
I found this guide:
https://github.com/neutralino1/discogs_client.
Access the API with your user token and try something like:
d = discogs_client.Client('something.py', user_token=auth_token)
release = d.release(774004)
genre = release.genres
If you found a better solution please share.
I'm trying to get my head around the dbpedia JSON schema and can't figure out an efficient way of extracting a specific node:
This is what dbpedia gives me:
http://dbpedia.org/data/Ceramic_art.json
I've got the whole thing as a JSON object in Python but don't really understand how to get the english abstract from this data. I've gotten this far:
u = "http://dbpedia.org/data/Ceramic_art.json"
data = urlfetch.fetch(url=u)
json_data = json.loads(data.content)
for j in json_data["http://dbpedia.org/resource/Ceramic_art"]:
    if j == "http://dbpedia.org/ontology/abstract":
        print "it's here"
Not sure how to proceed from here. As you can see there are multiple languages. I need to get the english abstract.
Thanks for your help,
g
print [abstract['value']
       for abstract in json_data["http://dbpedia.org/resource/Ceramic_art"]["http://dbpedia.org/ontology/abstract"]
       if abstract['lang'] == 'en'][0]
Obviously, you'd want to do more error checking than that, in case the data is bad, but that's the basic idea.
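For completeness, here is the same idea with next(), which handles the missing-abstract case without an IndexError. It's run against a trimmed, made-up stand-in for the dbpedia payload (the abstract texts are shortened; only the keys are real):

```python
# trimmed stand-in for http://dbpedia.org/data/Ceramic_art.json
# (abstract texts shortened for illustration)
json_data = {
    "http://dbpedia.org/resource/Ceramic_art": {
        "http://dbpedia.org/ontology/abstract": [
            {"lang": "de", "value": "Keramik bezeichnet ..."},
            {"lang": "en", "value": "Ceramic art is art made from ceramic materials."},
        ]
    }
}

abstracts = json_data["http://dbpedia.org/resource/Ceramic_art"][
    "http://dbpedia.org/ontology/abstract"]

# next() with a default avoids an exception when no English abstract exists
english = next((a["value"] for a in abstracts if a["lang"] == "en"), None)
print(english)
```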
It's a list of dicts. Just iterate through the elements of the list until you find the one whose value for u'lang' is u'en'.