I am trying to use the bitly-api-python library to shorten all the URLs in an array.
def bitly3_shorten_oauth(url):
    c = bitly3.Connection(access_token=bitly_access_token)
    sh = c.shorten(url)
    return sh['url']

for i in arr:
    print i[1], bitly3_shorten_oauth(i[1])
I am calling them one after the other without any delay, since I couldn't find any such precaution in bitly's best practices documentation.
Here is my complete code, please have a look: http://pastie.org/8419004
What happens is that it shortens 2 or 3 of the URLs and then fails with a connection timeout error.
What might be causing this error and how do I debug it?
From the documentation you linked:
bitly currently institutes per-hour, per-minute, and per-IP rate limits for each API method
And
High-Volume Shorten Requests
If you need to shorten a large number of URLs at once, we recommend that you leave ample time to spread these requests out over many hours. Our API rate limits reset hourly, and rate limited batch requests can be resumed at the top of the hour.
So it does look like you simply need to slow down your code.
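For example, the simplest way to slow things down is to sleep between calls; the delay below is just a guess, not an official bitly number, so tune it for your volume:

import time

def shorten_all(urls, delay_seconds=2):
    # Shorten one URL at a time, pausing between calls to stay under
    # the per-minute/per-hour rate limits.
    short_urls = []
    for url in urls:
        short_urls.append(bitly3_shorten_oauth(url))  # the helper from the question
        time.sleep(delay_seconds)
    return short_urls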
If anybody finds this outdated post as a starting point, please note that the Bit.ly API rejects non-OAuth API keys nowadays.
You can get your OAuth access token with curl:
curl -u "username:password" -X POST "https://api-ssl.bitly.com/oauth/access_token"
Doc link
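If you'd rather do the same thing from Python, something like this should work (untested; substitute your own bitly credentials):

import requests

resp = requests.post(
    "https://api-ssl.bitly.com/oauth/access_token",
    auth=("username", "password"),  # HTTP Basic auth, same as the -u flag above
)
resp.raise_for_status()
access_token = resp.text.strip()  # this endpoint returns the token as plain text
print(access_token)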
As of 2019, there is the bitlyshortener package, although it works only with Python ≥3.7. I have not experienced any error using it.
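For reference, a minimal sketch of how I use it (double-check the package README for the exact interface; the token below is a placeholder):

from bitlyshortener import Shortener

tokens = ["YOUR_BITLY_TOKEN"]  # one or more generic access tokens
shortener = Shortener(tokens=tokens, max_cache_size=256)

long_urls = ["https://example.com/a-very-long-page", "https://example.com/another-one"]
print(shortener.shorten_urls(long_urls))  # returns a list of shortened links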
I would like to get information about all the review requests on my server. Here's the code I used to try to achieve that:
from rbtools.api.client import RBClient
client = RBClient('http://my-server.net/')
root = client.get_root()
reviews = root.get_review_requests()
The variable reviews contains just 25 review requests (I expected many, many more). What's even stranger, I tried something a bit different:
count = root.get_review_requests(counts_only=True)
Now count.count is equal to 17164. How can I extract the rest of my review requests? I checked the official documentation but haven't found anything related to my problem.
According to the documentation (https://www.reviewboard.org/docs/manual/dev/webapi/2.0/resources/review-request-list/#webapi2.0-review-request-list-resource), counts_only is just a Boolean flag that indicates the following:
If specified, a single count field is returned with the number of results, instead of the results themselves.
But what you could do is also pass status, so:
count = root.get_review_requests(counts_only=True, status='all')
should return you all the requests.
Keep in mind that I didn't test this part of the code locally. I referred to their repo test example -> https://github.com/reviewboard/rbtools/blob/master/rbtools/utils/tests/test_review_request.py#L643 and the documentation link posted above.
You have to use pagination (unfortunately I can't provide exact code without being able to reproduce your setup):
The maximum number of results to return in this list. By default, this is 25. There is a hard limit of 200; if you need more than 200 results, you will need to make more than one request, using the “next” pagination link.
It looks like a pagination helper class is also available.
If you want to get 200 results at a time, you can set max_results:
requests = root.get_review_requests(max_results=200)
Anyway, HERE is a good example of how to iterate over the results.
Also, I don't recommend fetching all 17164 results with a single request even if it were possible, because the total response would be huge (say each result is about 10 KB; the total would then be more than 171 MB).
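A rough, untested sketch of that pagination loop, assuming the list resource exposes get_next() as described in the RBTools Web API docs (adjust for your RBTools version), and combining it with status='all' from the other answer:

from rbtools.api.client import RBClient

client = RBClient('http://my-server.net/')
root = client.get_root()

all_requests = []
# Ask for pages of up to 200 review requests at a time.
page = root.get_review_requests(status='all', max_results=200)
while True:
    all_requests.extend(page)   # the list resource iterates over the current page
    try:
        page = page.get_next()  # follows the "next" pagination link
    except StopIteration:
        break                   # no more pages

print(len(all_requests))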
This is my first time using a CKAN Data API. I am trying to download public road accident data from a government website, but it only returns the first 100 rows. The CKAN documentation says the default number of rows returned is 100. I'm pretty sure you can append a CKAN expression to the end of the URL to return the maximum number of rows, but I'm not sure how to write it. Please see my code below for what I have so far. Is it possible? Thanks.
Is there any way I can write something similar to the pseudo CKAN request below?
url='https://data.gov.au/data/api/3/action/datastore_search?resource_id=d54f7465-74b8-4fff-8653-37e724d0ebbb&limit=MAX_ROWS'
CKAN Documentation reference: http://docs.ckan.org/en/latest/maintaining/datastore.html
There are several interesting fields in the documentation for ckanext.datastore.logic.action.datastore_search(), but the ones that pop out are limit and offset.
limit seems to have an absolute maximum of 32000 so depending on the amount of data you might still hit this limit.
offset seems to be the way to go. You keep calling the API with the offset increased by a set amount until you have all the data; a bare-bones version of that loop is sketched below.
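For illustration only (untested), an offset-based loop against the same resource would look roughly like this:

import requests

BASE = "https://data.gov.au/data/api/3/action/datastore_search"
RESOURCE_ID = "d54f7465-74b8-4fff-8653-37e724d0ebbb"
LIMIT = 10000

records = []
offset = 0
while True:
    resp = requests.get(BASE, params={"resource_id": RESOURCE_ID,
                                      "limit": LIMIT, "offset": offset})
    batch = resp.json()["result"]["records"]
    records.extend(batch)
    if len(batch) < LIMIT:
        break  # fewer records than the limit means the end has been reached
    offset += LIMIT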
But, actually calling the API revealed something interesting. It generates a next URL which you can call, it automagically updates the offset based on the limit used (and maintaining the limit set on the initial call).
You can call this URL to get the next batch of results.
Some testing showed that it will go past the maximum though, so you need to check if the returned records are lower than the limit you use.
import requests

BASE_URL = "https://data.gov.au/data"
INITIAL_URL = "/api/3/action/datastore_search?resource_id=d54f7465-74b8-4fff-8653-37e724d0ebbb"
LIMIT = 10000

def get_all() -> list:
    result = []
    resp = requests.get(f"{BASE_URL}{INITIAL_URL}&limit={LIMIT}")
    js = resp.json()["result"]
    result.extend(js["records"])
    while "_links" in js and "next" in js["_links"]:
        resp = requests.get(BASE_URL + js["_links"]["next"])
        js = resp.json()["result"]
        result.extend(js["records"])
        print(js["_links"]["next"])  # just so you know it's actually doing stuff
        if len(js["records"]) < LIMIT:
            # if it returned fewer records than the limit, the end has been reached
            break
    return result

print(len(get_all()))
Note: when exploring an API, it helps to check exactly what is returned. I used the simple snippet below for that, which made exploring the API a lot easier. Also, reading the docs helps, like the one I linked above.
from pprint import pprint
pprint(requests.get(BASE_URL+INITIAL_URL+"&limit=1").json()["result"])
I'm trying to make API calls on the consumer complaint dataset available online (https://data.consumerfinance.gov/dataset/Consumer-Complaints/s6ew-h6mp) with the SodaPy library (https://github.com/xmunoz/sodapy). I just want to get the CSV data; the webpage says it has 906182 rows.
I've followed the example on GitHub as best as I can, but it's just not working. Here's the code:
from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", username="myusername", password="mypassword")
results = client.get("s6ew-h6mp")
I want to get the entire dataset, but I keep getting the following error:
ReadTimeout: HTTPSConnectionPool(host='data.consumerfinance.gov', port=443): Read timed out. (read timeout=10)
Any clues on how to work through this?
By default, the Socrata connection will timeout after 10 seconds.
You are able to increase the timeout limit for the Socrata client by updating the 'timeout' instance variable like so:
from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", username="myusername", password="mypassword")
# change the timeout variable to an arbitrarily large number of seconds
client.timeout = 50
results = client.get("s6ew-h6mp")
It's possible that the connection is timing out because the file is too large. You can try to download a subset of the data using the limit option, e.g.
results = client.get("s6ew-h6mp", limit=1000)
You can also query subsets of the data using SoQL keywords.
Otherwise, the sodapy module is built on the requests module so looking at the documentation for that could be useful.
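For example (untested; the keyword arguments follow the sodapy README), you could page through the whole dataset with limit and offset, keeping a stable sort order so the pages don't shift under you:

from sodapy import Socrata

client = Socrata("data.consumerfinance.gov", "apptoken",
                 username="myusername", password="mypassword")
client.timeout = 60

all_results = []
offset = 0
batch_size = 50000  # lower this if the requests still time out
while True:
    # Ordering by the :id system field keeps paging deterministic on newer
    # Socrata endpoints; fall back to one of the dataset's own columns if it errors.
    batch = client.get("s6ew-h6mp", limit=batch_size, offset=offset, order=":id")
    all_results.extend(batch)
    if len(batch) < batch_size:
        break  # last page reached
    offset += batch_size

print(len(all_results))

A SoQL filter works the same way, e.g. client.get("s6ew-h6mp", where="product = 'Mortgage'", limit=1000) (the column name here is just an illustration).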
Looking at the source code on GitHub, the Socrata constructor has a parameter for the timeout. The following example increases the timeout from 10 to 25 seconds:
from sodapy import Socrata
client = Socrata("data.consumerfinance.gov", "apptoken", timeout=25)
results = client.get("s6ew-h6mp")
I think this may actually resolve the issue: make sure you request the data from the API endpoint. The 4x4 ID is slightly different (when viewing the dataset here, click Export then SODA API). Try:
results = client.get("jhzv-w97w")
I am trying to get some data from Upwork's API.
I am using Requests-OAuthlib. For one API request it works, but for the second one I get this error: "Duplicate timestamp/nonce combination, possible replay attack. Request rejected."
So I tried to modify Requests-OAuthlib and change the timestamp and nonce manually by putting this inside the constructor:
ur = u''+str(SystemRandom().random())
ur = ur.replace("0.","")
self.client.nonce = ur
ts = u'' + str(int(time()))
self.client.timestamp = ts
right after self.client = client_class( ...
But it still does not work.
I am a complete beginner at both Python and OAuth, so I would rather use this library instead of building the request URL manually.
Here's the source code of the library Requests-OAuthlib source code
If I print them at the end of the call they have the same values as the ones I set, but setting them doesn't seem to have any effect; Upwork still reports a replay attack.
I also tried putting them in the headers, which still doesn't work:
r.headers['oauth_nonce'] = ur
r.headers['oauth_timestamp'] = ts
Update:
I printed r.headers and it contains these:
For the first call:
oauth_nonce="55156586115444478931487605669", oauth_timestamp="1487605669"
For the second call:
oauth_nonce="117844793977954758411487605670", oauth_timestamp="1487605670"
Nonces and timestamps are different from one another. So why is upwork giving me : "Duplicate timestamp/nonce combination, possible replay attack. Request rejected." ?
Update 2: It's probably just some strange Upwork behaviour; I'm still waiting for an answer from them. I believe that because if I change something in the endpoint it works, so the nonces/timestamps seem unrelated to the problem.
Update 3: I got an answer from Upwork. Honestly, I can't understand the answer, but if you think it makes sense you can close the question. I found a workaround anyway.
https://community.upwork.com/t5/API-Questions-Answers/Wrong-API-error-message/td-p/306489
For anyone coming across this issue, I was banging my head against it for a few hours until I finally used Fiddler to look at the requests and responses.
The server was responding with a 302 redirect, and my HTTP library was helpfully following the redirect and re-sending the same headers - which of course included the duplicate nonce and timestamp.
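If you want to confirm this without Fiddler, a quick sketch with plain requests (not tied to any particular OAuth library; the URL below is a placeholder) is to disable redirect following and look at what comes back:

import requests

# Placeholder endpoint -- substitute the request your OAuth library is making.
url = "https://www.upwork.com/api/some/endpoint"

# Don't follow redirects, so the 302 is visible directly.
resp = requests.get(url, allow_redirects=False)
print(resp.status_code)              # 302 means the server redirected you
print(resp.headers.get("Location"))  # where it wanted to send you

# If redirects are followed, the intermediate hops show up in resp.history.
resp = requests.get(url)
print([r.status_code for r in resp.history])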
Does anyone have experience with the Dota 2 API library in Python called 'dota2api'? I wish to pull a list of 200 recent games filtered by various criteria. I'm using the get_match_history() request (see link). Here's my code:
import dota2api
key = '<key>'
api = dota2api.Initialise(key)
match_list = api.get_match_history(matches_requested=200)
I haven't specified any filters yet, since I can't even get the matches_requested argument to work. When I run this code, I get exactly 100 matches. In fact, no matter how I specify the matches_requested argument, I always get 100 matches.
Does anyone know whether I'm specifying the argument wrong, or if there's some other reason why it isn't working as intended?
Thanks in advance.
For such rarely used libraries it is hard to get an answer here.
I found this issue on the library's GitHub:
You can't get more than 500 matches through get_match_history, it's limited by valve api. One approach you can do is alternate hero_id, like, requesting with account_id, hero_id and start_at_match_id (none if first request), values assigned, this way you can get at least 500 matches of each hero from that account_id.
Probably that has since changed and the parameter is now ignored by the API completely. Try creating a new issue on the library's GitHub.
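If you want to try working around it yourself, a rough, untested sketch along the lines of that issue comment is to keep passing start_at_match_id from the previous batch (the 'matches' / 'match_id' field names follow the Steam Web API GetMatchHistory response, which dota2api wraps):

import dota2api

key = '<key>'
api = dota2api.Initialise(key)

def fetch_matches(target=200):
    matches = []
    start_at = None
    while len(matches) < target:
        if start_at is None:
            batch = api.get_match_history(matches_requested=100)
        else:
            batch = api.get_match_history(matches_requested=100,
                                          start_at_match_id=start_at)
        new = batch['matches']
        if not new:
            break  # no more history available
        matches.extend(new)
        start_at = new[-1]['match_id'] - 1  # continue just below the oldest match seen
    return matches[:target]

print(len(fetch_matches(200)))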