How to handle 30x redirects when using feedparser to parse an RSS URL - Python

I am using feedparser with Python 3 to parse an RSS URL. This is my code:

import logging
import feedparser

logger = logging.getLogger(__name__)

if __name__ == "__main__":
    try:
        feed = feedparser.parse("https://ucw.moe/feed/rss")
        print(feed.status)
    except Exception as e:
        logger.error(e)
but I get this error:
HTTP Error 302: The HTTP server returned a redirect error that would lead to an infinite loop.
The last 30x error message was:
Found
What should I do to fix this problem?

Fetch the feed with requests first, then hand the response body to feedparser:

import requests
import feedparser

# requests follows the redirect chain itself, so feedparser never sees the 30x
page = requests.get("https://ucw.moe/feed/rss")
print(page.status_code)

feed = feedparser.parse(page.content)
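
If you also want to see which 30x responses were involved, requests keeps the redirect chain it followed on the response object. A minimal sketch building on the snippet above (the 10-second timeout is an arbitrary choice, not something from the thread):

import requests
import feedparser

page = requests.get("https://ucw.moe/feed/rss", timeout=10)

# each 30x hop that requests followed before the final response
for hop in page.history:
    print(hop.status_code, hop.url)

print(page.status_code)  # status of the final, post-redirect response

feed = feedparser.parse(page.content)
for entry in feed.entries:
    print(entry.get("title", "(no title)"))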

Related

Python requests.head(): how to get a response code using the requests module instead of an exception

requests.head() returns 200, 301 and other responses as expected. But when I try to get a response from a non-existent website, it throws an exception instead of returning a code. Below is the code I used for "www.googl.com"; I was expecting a response code in this scenario too. I can handle it with try/except, but I actually need the response code.
Code:
import requests
print (requests.head("https://www.googl.com"))
Since nothing is being returned, there is no response code, because there was no response.
Your best bet is just doing this:
import requests

try:
    response = requests.head("https://www.googl.com")
except requests.exceptions.RequestException:
    response = 404  # or whatever fallback you want
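
If you really need a status code in every case, a small helper keeps the calling code simple. This is only a sketch; get_status and the 0 fallback are illustrative names, not anything provided by requests:

import requests

def get_status(url, timeout=5):
    """Return the HTTP status code, or 0 if no response could be obtained."""
    try:
        return requests.head(url, timeout=timeout, allow_redirects=True).status_code
    except requests.exceptions.RequestException:
        return 0

print(get_status("https://www.google.com"))  # e.g. 200
print(get_status("https://www.googl.com"))   # 0 when the host cannot be reached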

Bad URLs in Python 3.4.3

I am new to this, so please help me. I am using urllib.request to open and read webpages. Can someone tell me how my code can handle redirects, timeouts and badly formed URLs?
I have sort of found a way to handle timeouts, though I am not sure whether it is correct. Is it? All opinions are welcome! Here it is:
import logging
from socket import timeout
from urllib.error import HTTPError, URLError
import urllib.request

try:
    text = urllib.request.urlopen(url, timeout=10).read().decode('utf-8')
except (HTTPError, URLError) as error:
    logging.error('Data of %s not retrieved because %s\nURL: %s', name, error, url)
except timeout:
    logging.error('socket timed out - URL %s', url)
Please help me as I am new to this. Thanks!
Take a look at the urllib error page.
So, for the following behaviours:
Redirect: HTTP code 302, so that's an HTTPError with a code (urllib follows ordinary redirects automatically; only a redirect it cannot resolve, such as a loop, surfaces as an HTTPError). You could also use HTTPRedirectHandler to customise the behaviour instead of failing.
Timeouts: You have that correct.
Badly formed URLs: That's a URLError.
Here's the code I would use:
from socket import timeout
import urllib.error
import urllib.request

try:
    text = urllib.request.urlopen("http://www.google.com", timeout=0.1).read()
except urllib.error.HTTPError as error:
    print(error)
except urllib.error.URLError as error:
    print(error)
except timeout as error:
    print(error)
I couldn't find a redirecting URL to test with, so I'm not exactly sure how to check whether the HTTPError is a redirect.
You might find the requests package is a bit easier to use (it's suggested on the urllib page).
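
On the point above about telling whether an HTTPError is a redirect: the exception carries the status code, so one option (a sketch only, not tested against an actual redirect loop) is to compare error.code with the 3xx range:

import urllib.error
import urllib.request

try:
    text = urllib.request.urlopen("http://www.google.com", timeout=10).read()
except urllib.error.HTTPError as error:
    if 300 <= error.code < 400:
        print("redirect-related failure:", error.code)
    else:
        print("HTTP error:", error.code)
except urllib.error.URLError as error:
    print("bad URL or network problem:", error.reason)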
Using the requests package I was able to find a better solution. The only exceptions you need to handle are:
import requests

try:
    r = requests.get(url, timeout=5)
except requests.exceptions.Timeout:
    pass  # maybe set up for a retry, or continue in a retry loop
except requests.exceptions.TooManyRedirects:
    pass  # tell the user their URL was bad and try a different one
except requests.exceptions.ConnectionError:
    pass  # the connection could not be completed
except requests.exceptions.RequestException:
    pass  # catastrophic error, bail
And to get the text of that page, all you need to do is:
r.text
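
One thing to add: none of the exceptions above fire for an HTTP error status such as 404 or 500; requests returns those as a normal response. If you want them to raise too, call raise_for_status(). A minimal sketch (the URL is just a placeholder):

import requests

try:
    r = requests.get("https://example.com/some-page", timeout=5)
    r.raise_for_status()  # turns 4xx/5xx responses into requests.exceptions.HTTPError
    print(r.text[:200])
except requests.exceptions.HTTPError as error:
    print("HTTP error status:", error.response.status_code)
except requests.exceptions.RequestException as error:
    print("request failed:", error)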

How to send one more request with python-requests when the first one fails?

I am writing a URL fetcher. When I send a request like:
import requests
response = requests.get("http://example.com")
Sometimes an error like this occurs:
ConnectionError: ('Connection aborted.', BadStatusLine(""''''"))
But when I try one more time, it works. So I would like to send the request one more time when such an error occurs. How can I do that? Thank you in advance!
Repeat the request again when the exception is raised.
import requests

url = "http://example.com"
try:
    response = requests.get(url)
except requests.exceptions.ConnectionError:
    response = requests.get(url)
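
If a single retry is not enough, requests can also retry at the transport level through urllib3's Retry. A minimal sketch; the retry count, backoff factor and status list below are arbitrary choices, not anything the original answer specifies:

import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

session = requests.Session()
retries = Retry(total=3, backoff_factor=0.5,
                status_forcelist=[500, 502, 503, 504])
adapter = HTTPAdapter(max_retries=retries)
session.mount("http://", adapter)
session.mount("https://", adapter)

# retried up to 3 times with exponential backoff before an exception propagates
response = session.get("http://example.com", timeout=5)
print(response.status_code)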

How to access the XML response when the site also issues an HTTP error code

The following URL returns the expected response in the browser:
http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4
<lfm status="failed">
<error code="6">No user with that name was found</error>
</lfm>
But I am unable to access the XML in Python via the following code, because an HTTPError exception is raised:
"due to an HTTP Error 400: Bad Request"
import urllib2
urllib2.urlopen('http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4')
I see that I can work around this by using urlretrieve rather than urlopen, but then the response gets written to disk.
Is there a way, using just the Python 2.7 standard library, to get hold of the XML response without having to read it from disk and do housekeeping?
I see that this question has been asked before in a PHP context, but I don't know how to apply the answer to Python:
DOMDocument load on a page returning 400 Bad Request status
Copying from here: http://www.voidspace.org.uk/python/articles/urllib2.shtml#httperror
The exception that is thrown contains the full body of the error page:
#!/usr/bin/python2
import urllib2

try:
    resp = urllib2.urlopen('http://ws.audioscrobbler.com/2.0/?method=user.getinfo&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4')
except urllib2.HTTPError, e:
    print e.code
    print e.read()
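
To do the housekeeping on that body using only the standard library, you can hand it to xml.etree.ElementTree. A minimal sketch, still Python 2.7, built on the response shown in the question:

#!/usr/bin/python2
import urllib2
import xml.etree.ElementTree as ET

url = ('http://ws.audioscrobbler.com/2.0/?method=user.getinfo'
       '&user=notonfile99&api_key=8e9de6bd545880f19d2d2032c28992b4')
try:
    body = urllib2.urlopen(url).read()
except urllib2.HTTPError, e:
    body = e.read()  # the XML error document arrives with the 400

root = ET.fromstring(body)        # <lfm status="failed"> ... </lfm>
print root.get('status')          # "failed"
error = root.find('error')
if error is not None:
    print error.get('code'), error.text   # 6 No user with that name was found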

Google search error

I don't understand why this code works:
import urllib2
url = urllib2.urlopen('http://www.google.fr/search?hl=en&q=voiture').read()
print url
and not this one:
import urllib2
url = urllib2.urlopen('http://www.google.fr/search?hl=en&q=voiture&start=2&sa=N').read()
print url
it displays the following error:
urllib2.HTTPError: HTTP Error 403: Forbidden
Thanks ;)
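
One likely explanation (this is an assumption, not confirmed in the thread): Google rejects the default urllib2 User-Agent for the paginated results URL, so sending a browser-like User-Agent header often helps. A sketch; the header value below is only an example:

import urllib2

# assumption: the 403 comes from Google rejecting urllib2's default User-Agent
req = urllib2.Request(
    'http://www.google.fr/search?hl=en&q=voiture&start=2&sa=N',
    headers={'User-Agent': 'Mozilla/5.0'}  # example browser-like value
)
print urllib2.urlopen(req).read()[:200]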
