Using httplib2 in python 3 properly? (Timeout problems) - python

Hey, first-time post; I'm really stuck on httplib2. I've been reading up on it from diveintopython3.org, but it mentions nothing about a timeout function. I looked up the documentation, but the only thing I see is the ability to pass a timeout int, with no units specified (seconds? milliseconds? What's the default if None?). This is what I have (I also have code to check what the response is and try again, but it has never retried more than once):
h = httplib2.Http('.cache', timeout=None)
for url in list:
    response, content = h.request(url)
    more stuff...
So the Http object stays around until some arbitrary time, but I'm downloading a ton of pages from the same server, and after a while, it hangs on getting a page. No errors are thrown, the thing just hangs at a page. So then I try:
h = httplib2.Http('.cache', timeout=None)
for url in list:
    try:
        response, content = h.request(url)
    except:
        h = httplib2.Http('.cache', timeout=None)
    more stuff...
But then it creates another Http object every time (it goes down the 'except' path)... I don't understand how to keep fetching with the same object until it expires and I make another. Also, is there a way to set a timeout on an individual request?
Thanks for the help!

Due to a bug, httplib2 measured the timeout in seconds multiplied by 2 until version 0.7.5 (2012-08-28).

Set the timeout to 1, and you'll pretty quickly know if it means one millisecond or one second.
I don't know what your try/except is supposed to solve; if it hangs on h.request(url) in one case, it should hang in the other.
If you run out of memory in that code, then httplib2 doesn't get garbage collected properly. It may be that you have circular references (although it doesn't look like it above) or it may be a bug in httplib2.
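To illustrate, here's a minimal sketch of how that loop could look with a timeout, assuming the timeout argument is in seconds (as it is in fixed versions of httplib2) and that a stalled request surfaces as socket.timeout:
import socket
import httplib2

# Sketch only: reuse one Http object and give each request up to 30 seconds.
# Assumes the timeout is interpreted in seconds and that a stalled request
# raises socket.timeout instead of hanging forever.
h = httplib2.Http('.cache', timeout=30)
for url in urls:  # 'urls' stands in for the 'list' variable in the question
    try:
        response, content = h.request(url)
    except socket.timeout:
        print('Timed out fetching %s, skipping it' % url)
        continue
    # more stuff...
As far as I can tell, the timeout is a property of the Http object rather than of individual requests, so different per-request timeouts would mean constructing separate Http objects.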

Related

How to handle a timeout

I have an application using authorize.net, and we seem to have connection issues from time to time.
I don't think that the response is ever returned here.
response = createtransactioncontroller.getresponse()
When looking at the code for the authorizenet library, I noticed that the timeout is not being set.
self._httpResponse = requests.post(self.endpoint, data=xmlRequest, headers=constants.headers, proxies=proxyDictionary)
If that's the case, how should I handle the case where the response never gets returned? Patching a timeout into the authorizenet library seems like a hack, since the library is maintained outside the scope of my project, but adding my own timer doesn't seem like best practice either. What solution would you recommend?
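One way to bound the wait without patching the authorizenet library is to run the blocking call in a worker thread and give up after a deadline. A rough sketch, where the controller object and its execute()/getresponse() methods are taken from the question and the 30-second limit is just an illustrative value:
from concurrent.futures import ThreadPoolExecutor, TimeoutError

def execute_with_deadline(controller, seconds=30):
    # Run the blocking SDK call in a worker thread and wait at most `seconds`.
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(controller.execute)  # the call that posts to the gateway
    try:
        future.result(timeout=seconds)
    except TimeoutError:
        return None  # give up; the worker thread is abandoned, not killed
    finally:
        pool.shutdown(wait=False)
    return controller.getresponse()
This keeps the timeout in your own code rather than in the third-party library, at the cost of leaving a stuck worker thread to die with the process.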

How can I speed this up? (urllib2, requests)

Problem: I am trying to validate a captcha that can be anything from 0000-9999. Using the normal requests module it takes around 45 minutes to go through all of them (0000-9999). How can I multithread this or speed it up? It would be really helpful if I could get the HTTP status code from the site to see whether I got the code correct (200 = correct, 400 = incorrect). If I could get two examples (GET and POST) of this, that would be fantastic!
I have been searching for quite some time; most of the modules I've looked at are outdated (I have been using grequests recently).
example url = https://www.google.com/
example params = captcha=0001
example post data = {"captcha":0001}
Thank you!
You really shouldn't be trying to bypass a captcha programmatically!
You could use several threads to make simultaneous requests, but at that point the service you're attacking will most likely ban your IP. At the very least, they've probably got throttling on the service; there's a reason it's supposed to take 45 minutes.
Threading in Python is usually achieved by creating a thread object with a run() method containing your long running code. In your case, you might want to create a thread object which takes a number range to poll. Once instantiated, you'd call the .start() method to have that thread begin working. If any thread should get a success message it would return a message to the main thread, halt itself, and the main thread could then tell all the other threads in the thread pool to stop.

How to wait for a POST request (requests.post) completion in Python?

I'm using the requests library in Python to do a POST call.
My POST call takes about 5 minutes to complete. It will create a file in an S3 bucket.
After that, I want to download this file. However, I need extra logic to wait for my POST to finish before executing the next line of my code, which downloads the file.
Any suggestions?
Is it possible to use the subprocess library for this? If so, what would the syntax be?
Code:
import requests
r = requests.post(url)
# wait for the post call to finish
download_file(file_name)
It should already wait until it's finished.
Python, unlike Node.js, makes HTTP requests synchronously (blocking) by default. You'd have to explicitly run the request in another thread if you wanted it to be async. If your POST request takes 5 minutes to complete, the download line won't run until the 5 minutes are up and the POST request has finished.
The question says the POST request takes 5 minutes to return, but maybe that's not quite right? Maybe the POST request returns promptly, but the server keeps grinding for 5 minutes creating the file for the S3 bucket? In that case, the need for a delay makes sense. The fact that a separate download is needed at all tends to support this interpretation (the requested info doesn't come back from the request itself).
If a failed download throws an exception, try this:
import time
r = requests.post(url)
while True:
    time.sleep(60)  # sixty-second delay
    try:
        download_file(file_name)
        break
    except Exception:
        print("File not ready, trying again in one minute")
Or if download_file simply returns False on failure:
import time
r = requests.post(url)
while True:
    time.sleep(60)  # sixty-second delay
    if download_file(file_name):
        break
    print("File not ready, trying again in one minute")
Since my interpretation of the question is speculative, I'll delete this answer if it's not to the point.
Michael's answer is correct. However, in case you're running Selenium to crawl the webpage, the frontend JS takes some time to appropriately render and show the request result. In such scenarios I tend to use:
import time
time.sleep(5)
That said, in such cases you have explicit and implicit waits as other options, too. Take a look at the Selenium Waits documentation.
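For example, an explicit wait might look roughly like this (the driver, URL, and locator are placeholders):
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

driver = webdriver.Chrome()
driver.get("https://example.com/result-page")  # placeholder URL
# Poll for up to 30 seconds until the element showing the request result is
# present, instead of sleeping for a fixed amount of time.
result = WebDriverWait(driver, 30).until(
    EC.presence_of_element_located((By.ID, "result"))  # placeholder locator
)
print(result.text)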
In case you're directly sending requests to the API, Python waits for the response until it's complete.

Various timeouts for python httplib

I'm implementing a little service that fetches web pages from various servers. I need to be able to configure different types of timeouts. I've tried mucking around with the settimeout method of sockets, but it doesn't do exactly what I'd like. Here are the problems.
I need to specify a timeout for the initial DNS lookup. I understand this is done when I instantiate the HTTPConnection at the beginning.
My code is written in such a way that I first .read a chunk of data (around 10 MB), and if the entire payload fits in this, I move on to other parts of the code. If it doesn't fit, I stream the payload directly out to a file rather than into memory. When this happens, I do an unbounded .read() to get the data, and if the remote side sends me, say, a byte of data every second, the connection just keeps waiting, receiving one byte every second. I want to be able to disconnect with a "you're taking too long". A thread-based solution would be the last resort.
httplib is too straightforward for what you are looking for.
I would recommend taking a look at http://pycurl.sourceforge.net/ and the http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPTTIMEOUT option.
The http://curl.haxx.se/libcurl/c/curl_easy_setopt.html#CURLOPT_NOSIGNAL option also sounds interesting:
Consider building libcurl with c-ares support to enable asynchronous DNS lookups, which enables nice timeouts for name resolves without signals.
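A rough sketch of what that could look like with pycurl; the URL and the limits are illustrative values:
import pycurl
from io import BytesIO

buf = BytesIO()
c = pycurl.Curl()
c.setopt(pycurl.URL, 'http://example.com/')  # placeholder URL
c.setopt(pycurl.CONNECTTIMEOUT, 10)          # bound name resolution + connect
c.setopt(pycurl.TIMEOUT, 60)                 # bound the whole transfer
c.setopt(pycurl.NOSIGNAL, 1)                 # avoid signal-based timeouts in threads
# Abort if the transfer stays below 100 bytes/s for 30 seconds,
# which covers the "one byte per second" case from the question.
c.setopt(pycurl.LOW_SPEED_LIMIT, 100)
c.setopt(pycurl.LOW_SPEED_TIME, 30)
c.setopt(pycurl.WRITEFUNCTION, buf.write)
try:
    c.perform()
except pycurl.error as exc:
    print("transfer aborted:", exc)
finally:
    c.close()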
Have you tried requests?
You can set timeouts conveniently: http://docs.python-requests.org/en/latest/user/quickstart/#timeouts
>>> requests.get('http://github.com', timeout=0.001)
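Note that newer versions of requests (2.4+) also accept separate connect and read timeouts as a (connect, read) tuple, which maps fairly well onto the two problems above, with the caveat that the read timeout bounds each wait for data rather than the total transfer time:
import requests

try:
    # 3.05 s to connect (covers DNS + TCP), 27 s maximum wait between reads.
    r = requests.get('http://example.com/', timeout=(3.05, 27))
except requests.exceptions.ConnectTimeout:
    print('could not connect in time')
except requests.exceptions.ReadTimeout:
    print('server stopped sending data')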
EDIT:
I missed part 2 of the question. For that you could use this:
import signal
import requests

class TimeoutException(Exception):
    pass

def get_timeout(url, dns_timeout=10, load_timeout=60):
    def timeout_handler(signum, frame):
        raise TimeoutException()
    signal.signal(signal.SIGALRM, timeout_handler)
    signal.alarm(load_timeout)  # trigger the alarm after load_timeout seconds
    try:
        response = requests.get(url, timeout=dns_timeout)
    except TimeoutException:
        return "you're taking too long"
    finally:
        signal.alarm(0)  # cancel the alarm once the request has returned
    return response
and in your code use the get_timeout function.
If you need the timeout to be available for other functions you could create a decorator.
Above code from http://pguides.net/python-tutorial/python-timeout-a-function/.
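A minimal sketch of such a decorator, assuming a Unix-like system where SIGALRM is available and reusing the TimeoutException class from the snippet above:
import functools
import signal
import requests

def with_timeout(seconds):
    # Sketch: abort the wrapped call after `seconds` (SIGALRM, main thread, Unix only).
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            def handler(signum, frame):
                raise TimeoutException()  # defined in the snippet above
            previous = signal.signal(signal.SIGALRM, handler)
            signal.alarm(seconds)
            try:
                return func(*args, **kwargs)
            finally:
                signal.alarm(0)  # cancel any pending alarm
                signal.signal(signal.SIGALRM, previous)
        return wrapper
    return decorator

@with_timeout(60)
def fetch(url):
    return requests.get(url, timeout=10)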

how to time-out gracefully while downloading with python

I'm downloading a huge set of files with following code in a loop:
try:
    urllib.urlretrieve(url2download, destination_on_local_filesystem)
except KeyboardInterrupt:
    break
except:
    print "Timed-out or got some other exception: " + url2download
If the server times out on URL url2download when the connection is just being initiated, the last exception is handled properly. But sometimes the server responds and the download starts, yet the server is so slow that it'll take hours for even one file, and eventually it returns something like:
Enter username for Clients Only at albrightandomalley.com:
Enter password for in Clients Only at albrightandomalley.com:
and just hangs there (although no username/password is asked for if the same link is downloaded through the browser).
What I'd like to do in this situation is skip this file and go to the next one. The question is: how do I do that? Is there a way in Python to specify how long it's OK to spend downloading one file, and if more time has already been spent, to interrupt and move on?
Try:
import socket
socket.setdefaulttimeout(30)
If you're not limited to what's shipped with python out of the box, then the urlgrabber module might come in handy:
import urlgrabber
urlgrabber.urlgrab(url2download, destination_on_local_filesystem,
                   timeout=30.0)
There's a discussion of this here. Caveats (in addition to the ones they mention): I haven't tried it, and they're using urllib2, not urllib (would that be a problem for you?) (Actually, now that I think about it, this technique would probably work for urllib, too).
This question is more general about timing out a function:
How to limit execution time of a function call in Python
I've used the method described in my answer there to write a wait-for-text function that times out while attempting an auto-login. If you'd like similar functionality, you can reference the code here:
http://code.google.com/p/psftplib/source/browse/trunk/psftplib.py
