ReadTimeout error for API data with SodaPy client - Python

I'm trying to make API calls on the consumer complaint dataset, available online (https://data.consumerfinance.gov/dataset/Consumer-Complaints/s6ew-h6mp), with the SodaPy library (https://github.com/xmunoz/sodapy). I just want to get the CSV data; the webpage says it has 906182 rows.
I've followed the example on GitHub as best I can, but it's just not working. Here's the code:
from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", username="myusername", password="mypassword")
results = client.get("s6ew-h6mp")
I want to get the entire dataset, but I keep getting the following error:
ReadTimeout: HTTPSConnectionPool(host='data.consumerfinance.gov', port=443): Read timed out. (read timeout=10)
Any clues on how to work through this?

By default, the Socrata connection will time out after 10 seconds.
You can increase the timeout limit for the Socrata client by updating the timeout instance variable, like so:
from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", username="myusername", password="mypassword")
# increase the timeout (in seconds) from the default of 10
client.timeout = 50
results = client.get("s6ew-h6mp")

It's possible that the connection is timing out because the file is too large. You can try to download a subset of the data using the limit option, e.g.
results = client.get("s6ew-h6mp", limit=1000)
You can also query subsets of the data using SoQL keywords.
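For example, sodapy passes SoQL parameters such as select, where, order and limit straight through to the API. A quick sketch (the column names date_received, product, company and state are assumptions about this dataset; check the dataset's field list before using them):

from sodapy import Socrata

client = Socrata("data.consumerfinance.gov", "apptoken")
# fetch a few columns for complaints from one state, newest first
results = client.get("s6ew-h6mp",
                     select="date_received, product, company",
                     where="state = 'CA'",
                     order="date_received DESC",
                     limit=1000)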
Otherwise, the sodapy module is built on the requests module so looking at the documentation for that could be useful.

Looking at the source code on GitHub, the Socrata constructor has a parameter for the timeout. The following example increases the timeout from 10 to 25 seconds:
from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", timeout=25)
results = client.get("s6ew-h6mp")

I think this may actually resolve the issue: make sure you request the data from the API endpoint. The 4x4 ID is slightly different (when viewing the dataset page, click Export and then SODA API). Try:
results = client.get("jhzv-w97w")

Related

Steam Web API IPlayerService response empty after certain amount of requests

I'm writing a Python script that keeps track of my playtime. Every 30 seconds I 'refresh' the data I get from the Steam Web API, but after a certain number of calls (around 60), the response is totally empty.
I am aware of the maximum of 100,000 API calls per day, but it doesn't seem like I'm getting rate-limited, because I also tested refreshing every 60 seconds and even every 5 minutes, and it always comes back empty after around 60 calls.
Here's some of the code:
from steam.webapi import WebAPI
from time import sleep

api = WebAPI(API_KEY, raw=False, format='json', https=True, http_timeout=10)

def game_data(steamids):
    data = api.call('IPlayerService.GetOwnedGames', appids_filter=None, include_appinfo=True,
                    include_free_sub=True, include_played_free_games=True, steamid=steamids)
    return data['response']['games']

while True:
    g_data = game_data(steamids)
    playtime = []
    for i in g_data:
        playtime.append(i['playtime_forever'])
    print(playtime)
    sleep(30)
Output
{"response":{}}
I'm using the steam library, which works basically the same as requesting data with the requests library. It seems like the problem is only with the IPlayerService interface.
Counting the requests shows that it is the 60th request that fails: it raises a KeyError exception, because the response is empty.
Please let me know if you need any other information; hopefully someone knows how to fix this.
Thanks in advance!
So I just found a fix: instead of GetOwnedGames, use GetRecentlyPlayedGames.
This one doesn't seem to have the limit of around 60 calls that the other one has. It returns basically the same response, except it only includes games that have been played in the past 2 weeks, which is totally fine for my use.
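A minimal sketch of the swap, reusing the api object from the question (count is the Steam Web API parameter for how many games to return; 0 is assumed here to mean all recently played games):

def recent_game_data(steamids):
    # GetRecentlyPlayedGames only covers games played in the last two weeks
    data = api.call('IPlayerService.GetRecentlyPlayedGames', steamid=steamids, count=0)
    return data['response'].get('games', [])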

Returning all Data from a CKAN API Request? (Python)

This is my first time using a CKAN Data API. I am trying to download public road accident data from a government website, but it only shows the first 100 rows. The CKAN documentation says the default limit on the number of rows returned is 100. I am pretty sure you can append a CKAN expression to the end of the URL to get the maximum number of rows, but I am not sure how to write it. Please see the Python code below for what I have so far. Is it possible? Thanks.
Is there any way I can write code similar to the pseudo CKAN request below?
url='https://data.gov.au/data/api/3/action/datastore_search?resource_id=d54f7465-74b8-4fff-8653-37e724d0ebbb&limit=MAX_ROWS'
CKAN Documentation reference: http://docs.ckan.org/en/latest/maintaining/datastore.html
There are several interesting fields in the documentation for ckanext.datastore.logic.action.datastore_search(), but the ones that pop out are limit and offset.
limit seems to have an absolute maximum of 32000 so depending on the amount of data you might still hit this limit.
offset seems to be the way to go. You keep calling the API with the offset increasing by a set amount until you have all the data. See the code below.
But actually calling the API revealed something interesting: it generates a next URL which you can call, and it automagically updates the offset based on the limit used (while maintaining the limit set on the initial call).
You can call this URL to get the next batch of results.
Some testing showed that it will go past the maximum though, so you need to check whether the number of returned records is lower than the limit you used.
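For reference, the plain-offset loop described above would look roughly like this (a sketch, assuming datastore_search accepts limit and offset as query parameters, as the docs describe):

import requests

base = "https://data.gov.au/data/api/3/action/datastore_search"
resource = "d54f7465-74b8-4fff-8653-37e724d0ebbb"
limit, offset, records = 10000, 0, []
while True:
    params = {"resource_id": resource, "limit": limit, "offset": offset}
    js = requests.get(base, params=params).json()["result"]
    records.extend(js["records"])
    if len(js["records"]) < limit:
        break  # a short page means the end has been reached
    offset += limit

The code below uses the next link instead, which does the offset bookkeeping for you: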
import requests

BASE_URL = "https://data.gov.au/data"
INITIAL_URL = "/api/3/action/datastore_search?resource_id=d54f7465-74b8-4fff-8653-37e724d0ebbb"
LIMIT = 10000

def get_all() -> list:
    result = []
    resp = requests.get(f"{BASE_URL}{INITIAL_URL}&limit={LIMIT}")
    js = resp.json()["result"]
    result.extend(js["records"])
    while "_links" in js and "next" in js["_links"]:
        resp = requests.get(BASE_URL + js["_links"]["next"])
        js = resp.json()["result"]
        result.extend(js["records"])
        print(js["_links"]["next"])  # just so you know it's actually doing stuff
        if len(js["records"]) < LIMIT:
            # if it returned fewer records than the limit, the end has been reached
            break
    return result

print(len(get_all()))
Note: when exploring an API, it helps to check what exactly is returned. I used the simple code below to inspect the response, which made exploring the API a lot easier. Also, reading the docs helps, like the one linked above.
from pprint import pprint
pprint(requests.get(BASE_URL+INITIAL_URL+"&limit=1").json()["result"])

API data retrieval with requests.get in Python keeps sending me back the same exact batch of data inside a Django View

Hello everyone and thanks a lot for your time.
I'm facing a really weird problem. There is an organization that has an API service up for us to retrieve data from them. It is a simple URL that returns a JSON with exactly 100 records.
So I created a Python script to retrieve this data and store it in our local database. Every time we run it, we get 100 records, until the organization's API is empty and there is nothing left to return until the next day. To be clear: if the organization wants us to import 360 records, we have to run the API GET call 4 times, 3 times to get batches of 100 records and a fourth time to retrieve the last 60 records. If I then run it a 5th time, the response tells me there are no more records for the day.
My problem starts here. I wanted to run the API GET call inside a while loop to retrieve all the batches and store them in a list. But inside the while loop, every time we run the API GET call again, its response is exactly the same as the previous one. The data doesn't change at all, and the API on the organization's side doesn't send any more batches of records, as if no new requests were coming from us. Let me show you how it looks.
import requests

listOfResponses = []
tempResponseList = []

while True:
    tempResponseList = requests.get(url=apiURL, headers=headers, params=params).json()
    if tempResponseList:
        listOfResponses.append(tempResponseList)
        tempResponseList = []
    else:
        print('There are no more records')
        break
I have read many articles suggesting the problem may be with the keep-alive behaviour of the requests library in Python, but no matter what I try, I can't get it to reset the connection or make the API GET call return new data. I'm stuck having to run the program as many times as needed to retrieve all the data from the API.
I tried adding { 'Connection': 'close' } to the headers of the request, and it closed the connection, but still no new data.
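For reference, that attempt looked roughly like this (a sketch reusing the apiURL, headers and params from the snippet above):

headers['Connection'] = 'close'  # ask the server to close the connection after each response
tempResponseList = requests.get(url=apiURL, headers=headers, params=params).json()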
I tried using requests.Session() and closing the session, but still no solution:
s = requests.Session()
# all the code above executed, but with s.get instead of requests.get,
# followed by this:
s.close()
I even tried a solution posted here on the forum that suggested adding this code after the s.close():
s.mount('http://', requests.adapters.HTTPAdapter())
s.mount('https://', requests.adapters.HTTPAdapter())
I'm a little bit confused with this so any help, observations or suggestions are greatly appreciated.

Tweepy error: code: 261 Application cannot perform write actions. How to work around this?

I was just trying to create a bot that uploads a tweet after a time interval (not necessarily a regular one). However, after a certain number of tweets my app gets limited and restricted by Twitter. Is there a workaround for this?
The max number of tweets I've been able to send is 30. I even tried using sleep() with random time limits, but it still doesn't work.
import tweepy
import random
import time

consumerKey = ''
consumerSecret = ''
accessToken = ''
accessTokenSec = ''

def OAuth():
    try:
        auth = tweepy.OAuthHandler(consumerKey, consumerSecret)
        auth.set_access_token(accessToken, accessTokenSec)
        return auth
    except Exception as e:
        return None

oauth = OAuth()
api = tweepy.API(oauth, wait_on_rate_limit=True)

tweets = ['i love roses', '7 is my favourite number', 'Studies are hard',
          'Guess how many donuts I just ate', 'A cat ran over my foot']

for i in range(40):
    num2 = random.randint(0, 4)
    randtime = random.randint(60, 120)
    api.update_with_media(imglink, tweets[num2])
    print("status uploaded")
    time.sleep(randtime)
Same problem here; unfortunately the Twitter API has restrictions for normal users.
You need to be a company or something similar, and Twitter needs to know how you use the data. There is no way around it, sorry...
Same thing happened to me. You could create a new standalone app (in the overview) and replace the consumer and access tokens with the new ones. It worked for me.

Connection timeout error in bitly URL shortener

I am trying to use the bitly-api-python library to shorten all the URLs in an array.
def bitly3_shorten_oauth(url):
    c = bitly3.Connection(access_token=bitly_access_token)
    sh = c.shorten(url)
    return sh['url']

for i in arr:
    print i[1], bitly3_shorten_oauth(i[1])
I am calling them one after the other without any delay between calls, since I couldn't find any such precaution in bitly's best practices documentation.
Here is my complete code, please have a look : http://pastie.org/8419004
But what is happening is that it shortens 2 or 3 of the URLs and then fails with a connection timeout error.
What might be causing this error, and how do I debug it?
From the documentation you linked:
bitly currently institutes per-hour, per-minute,
and per-IP rate limits for each API method
And
High-Volume Shorten Requests
If you need to shorten a large number of URLs at once, we recommend that
you leave ample time to spread these requests out over many hours. Our API
rate limits reset hourly, and rate limited batch requests can be resumed at
the top of the hour.
So it does look like you simply need to slow down your code.
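A minimal way to do that, reusing arr and bitly3_shorten_oauth from the question (the 2-second delay is an arbitrary starting point; tune it to stay within bitly's per-minute limit):

import time

shortened = []
for i in arr:
    shortened.append(bitly3_shorten_oauth(i[1]))
    time.sleep(2)  # spread the requests out instead of firing them back to back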
If anybody finds this outdated post as a starting point, please note that the Bit.ly API rejects non-OAuth API keys nowadays.
You can get an OAuth access token with curl:
curl -u "username:password" -X POST "https://api-ssl.bitly.com/oauth/access_token"
Doc link
As of 2019, there is the bitlyshortener package, although it works only with Python ≥ 3.7. I have not experienced any errors using it.
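A quick sketch of how it can be used (the token is a placeholder; check the package's README for the exact interface, which may have changed):

from bitlyshortener import Shortener

shortener = Shortener(tokens=["YOUR_BITLY_OAUTH_TOKEN"], max_cache_size=256)
print(shortener.shorten_urls(["https://example.com/some/very/long/url"]))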
