My error message when running my Python scripts on a Raspberry Pi
Traceback (most recent call last):
File "test.py", line 6, in <module>
import appengineauth
File "/home/pi/Downloads/google_appengine/appengineauth.py", line 30, in (module)
auth_resp = urllib2.urlopen(auth_req)
File "/usr/lib/python2.7/urllib2.py", line 154, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python2.7/urllib2.py", line 437, in open
response = meth(req, response)
File "/usr/lib/python2.7/urllib2.py", line 550, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python2.7/urllib2.py", line 475, in error
return self._call_chain(*args)
File "/usr/lib/python2.7/urllib2.py", line 409, in _call_chain
result = func(*args)
File "/usr/lib/python2.7/urllib2.py", line 558, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 404: Not Found
I'm able to access the website in a browser, so I'm not sure what the actual problem is.
If you're using https://github.com/adafruit/Tweet-a-Watt/blob/master/appengineauth.py (you don't tell us where you got your appengineauth.py from, so we have to guess), and its line
auth_uri = 'https://www.google.com/accounts/ClientLogin'
then you're likely running into the deprecation documented at https://developers.google.com/identity/protocols/AuthForInstalledApps , and I quote:
Important: ClientLogin has been officially deprecated since April 20, 2012 and is now no longer available. Requests to ClientLogin will fail with a HTTP 404 response. We encourage you to migrate to OAuth 2.0 as soon as possible.
I.e., the 404 you're getting would then be exactly the symptom the warning tells you about, now that ClientLogin has been removed, more than 3.5 years after the original deprecation warning.
I'm not sure of the best way to connect your Raspberry Pi to App Engine (or any other Google service requiring authentication) with OAuth 2.0, now that ClientLogin is no longer an option. http://guy.carpenter.id.au/gaugette/2012/11/06/using-google-oauth2-for-devices/ (written shortly after the deprecation, but smartly avoiding reliance on the already-deprecated ClientLogin service) recommends an "OAuth2 for Devices" library and summarizes how to use it. I haven't tried that library myself (and I don't have a Raspberry Pi to try it on), but it does seem like a potentially fruitful avenue for you to explore.
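For orientation, here is a rough sketch of the device flow itself in plain urllib2. The endpoints, parameter names, and response fields reflect my reading of Google's "OAuth 2.0 for devices" documentation, and the client credentials are placeholders, so verify everything against the current docs before relying on this:

import json
import time
import urllib
import urllib2

CLIENT_ID = 'your-client-id'          # placeholder: from the Google API console
CLIENT_SECRET = 'your-client-secret'  # placeholder: from the Google API console

# Step 1: ask Google for a user code and a verification URL.
body = urllib.urlencode({'client_id': CLIENT_ID, 'scope': 'email'})
resp = json.load(urllib2.urlopen('https://oauth2.googleapis.com/device/code', body))
print 'Visit %s and enter the code %s' % (resp['verification_url'], resp['user_code'])

# Step 2: poll the token endpoint until the user approves the device.
while True:
    time.sleep(resp['interval'])
    body = urllib.urlencode({
        'client_id': CLIENT_ID,
        'client_secret': CLIENT_SECRET,
        'device_code': resp['device_code'],
        'grant_type': 'urn:ietf:params:oauth:grant-type:device_code',
    })
    try:
        token = json.load(urllib2.urlopen('https://oauth2.googleapis.com/token', body))
        print 'Access token:', token['access_token']
        break
    except urllib2.HTTPError:
        pass  # authorization still pending; keep polling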
Related
I am trying to connect to the BTC.COM API and query the balances of a universe of wallets (approx. 500,000 wallets). It looks like that is too much for the API in one call. Could you help me read the error and debug? My understanding is that the query is too big, but I don't know where to find the limit. How many wallets can the API handle in one call?
Any help is appreciated.
The API code is:
import json
import util  # the OP's helper module that performs the actual HTTP call

class MultiAddress:
    def __init__(self, a):
        self.final_balance = a['wallet']['final_balance']

    def __repr__(self):
        return f'{{"balance": {self.final_balance}}}'

def get_multi_address(addresses):
    # NOTE: 'resource' was undefined in the original snippet; this line is a
    # hypothetical reconstruction -- the real query format depends on the endpoint.
    resource = 'multiaddr?active=' + '|'.join(addresses)
    response = util.call_api(resource)
    json_response = json.loads(response)
    return MultiAddress(json_response)

p = get_multi_address(addresses=tuple_of_addresses)  # tuple_of_addresses defined elsewhere
sum_bal = p.final_balance
The error:
Traceback (most recent call last):
File "/Users/lolo/Documents/MARKET_RISK/python/util.py", line 32, in call_api
response = urlopen(base_url + resource, payload, timeout=TIMEOUT).read()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 414: Request-URI Too Large
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "explo_bal.py", line 56, in <module>
p = get_multi_address(addresses=tuple_of_addresses)
File "/Users/delalma/Documents/MARKET_RISK/python/explorer.py", line 165, in get_multi_address
response = util.call_api(resource)
File "/Users/delalma/Documents/MARKET_RISK/python/util.py", line 36, in call_api
raise APIException(handle_response(e.read()), e.code)
util.APIException: <html>
<head><title>414 Request-URI Too Large</title></head>
<body>
<center><h1>414 Request-URI Too Large</h1></center>
<hr><center>cloudflare</center>
</body>
</html>
I will answer this in a general context, as I can't find the required details of the API in question.
A 414 error can normally be solved by using a POST request instead: convert the query string into a JSON body and send it to the API with POST.
With GET requests, the maximum length of the request depends on the server as well as the client. Most web servers limit URLs to around 8 KB, though this is configurable. On the client side, different browsers have different limits: IE and Safari about 2 KB, Opera 4 KB, and Firefox 8 KB. In practice, then, the safe maximum for a GET request is the smallest of these limits, about 2 KB.
You can also consider a workaround like the following: suppose your URI contains a string stringdata that is too long. Simply break it into a number of parts sized to your server's limits, then submit the first part (in my case, to write a file) and submit the subsequent parts to append to the previously sent data.
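As a concrete illustration, here is a minimal sketch of batching the addresses and POSTing each batch; the endpoint URL, payload shape, and response fields are placeholders, not the real BTC.COM API:

import json
from urllib.request import Request, urlopen

def chunks(seq, size):
    # Yield successive slices of seq, each at most 'size' items long.
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def total_balance(addresses, batch_size=100):
    total = 0
    for batch in chunks(addresses, batch_size):
        # Hypothetical endpoint and payload shape -- adjust to the real API.
        payload = json.dumps({'addresses': list(batch)}).encode('utf-8')
        req = Request('https://example.com/api/multiaddress', data=payload,
                      headers={'Content-Type': 'application/json'})
        with urlopen(req, timeout=30) as resp:
            data = json.load(resp)
        total += data['wallet']['final_balance']
    return total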
I'm trying to make a terminal app that crawls a website and returns the time for an entered city name. This is my code so far:
import re
import urllib.request
city = input('Enter city name: ')
url = 'https://time.is/'
rawData = urllib.request.urlopen(url).read()
decodedData = rawData.decode('utf-8')
print(decodedData)
After the last line I get this error:
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
rawData = urllib.request.urlopen(url).read()
File "~/Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "~/Python\Python35-32\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "~/Python\Python35-32\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "~/Python\Python35-32\lib\urllib\request.py", line 510, in error
return self._call_chain(*args)
File "~/Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "~/Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Why do I get this error? What's wrong?
[EDIT]
The reason is that time.is bans automated requests. Always remember to read the terms and conditions when doing web scraping. Free APIs can be found to do the same job, too.
When this happens, I usually open the debugger and try to find out what's being called when I access the website. It seems that time.is doesn't like having scripts call their website.
A quick search yielded this:
1532027279136 0 161_(UTC,_UTC+00:00) 1532027279104
Time.is is for humans. To use from scripts and apps, please ask about our API. Thank you!
Here are some APIs you could use to build your project. https://www.programmableweb.com/category/time/api
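If you go the API route, here is a minimal sketch using the free World Time API (worldtimeapi.org); the endpoint and response field below are assumptions based on its public docs, so check them before relying on this:

import json
import urllib.request

# The World Time API expects an IANA timezone name rather than a bare
# city name, e.g. 'Europe/London' instead of 'London'.
zone = input('Enter an IANA timezone (e.g. Europe/London): ')
url = 'http://worldtimeapi.org/api/timezone/' + zone
with urllib.request.urlopen(url) as resp:
    data = json.loads(resp.read().decode('utf-8'))
print(data['datetime'])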
The problem begins with this link
https://i1.pixiv.net/img-zip-ugoira/img/2017/04/05/00/24/41/62259492_ugoira600x600.zip
The file downloads completely when I use a download manager.
But when I try to use Python to download the file:
from urllib import request
import sys
request.urlretrieve('https://i1.pixiv.net/img-zip-ugoira/img/2017/04/05/00/24/41/62259492_ugoira600x600.zip', '123.zip')
Traceback (most recent call last):
File "C:/Users/ssshooter/PycharmProjects/first/111.py", line 3, in <module>
request.urlretrieve('https://i1.pixiv.net/img-zip-ugoira/img/2017/04/05/00/24/41/62259492_ugoira600x600.zip', '123.zip')
File "C:\Users\ssshooter\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 248, in urlretrieve
with contextlib.closing(urlopen(url, data)) as fp:
File "C:\Users\ssshooter\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 223, in urlopen
return opener.open(url, data, timeout)
File "C:\Users\ssshooter\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 532, in open
response = meth(req, response)
File "C:\Users\ssshooter\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 642, in http_response
'http', request, response, code, msg, hdrs)
File "C:\Users\ssshooter\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 570, in error
return self._call_chain(*args)
File "C:\Users\ssshooter\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 504, in _call_chain
result = func(*args)
File "C:\Users\ssshooter\AppData\Local\Programs\Python\Python36\lib\urllib\request.py", line 650, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
It doesn't work.
The differences between your browser's request and your Python request are:
You're using different SSL information: your browser has a built-in set of certificate authorities, while Python uses a set that comes with the OS. They differ, and if the site you're accessing uses one known to your browser but not to Python, Python will throw an exception.
You're using different User-Agents. Your browser tells the server it's Chrome or IE or whatever; Python tells the server it's Python. The server may decide it doesn't like that and return Forbidden.
The server may be working harder than you think: while it appears the request is for a simple file, you're really requesting a resource. It may be (though unlikely in this case) that serving the resource involves multiple interactions between the server and your browser -- cookies, javascript, etc. -- which your browser carries out successfully before the server delivers the file. Your Python request does none of that.
Your browser may have existing state which your Python code does not. You say you can access the file using your browser, but that may only work because you've previously accessed other resources on the site, or logged in, or whatever. Your browser communicates that state (perhaps a session_id via cookie?) and the server recognizes it. Your Python code starts with no previous state, so the server forbids the request.
Which is it in this case? You'll need to investigate. Can you get wget or curl to work? Debug your browser's access: what headers are being sent, and what do you receive in reply? One cheap experiment is to mimic browser headers, as in the sketch below.
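Here is a minimal sketch of that experiment; the header values are guesses to test, not documented requirements of pixiv:

import urllib.request

url = ('https://i1.pixiv.net/img-zip-ugoira/img/2017/04/05/00/24/41/'
       '62259492_ugoira600x600.zip')
# Browser-like headers: some image hosts reject the default Python
# User-Agent, and some may also require a Referer from the site itself.
req = urllib.request.Request(url, headers={
    'User-Agent': 'Mozilla/5.0',
    'Referer': 'https://www.pixiv.net/',
})
with urllib.request.urlopen(req) as resp, open('123.zip', 'wb') as out:
    out.write(resp.read())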
I cannot consistently get JSON from a given URL; it works only about 60% of the time.
import json
from urllib2 import urlopen
jsonurl = urlopen('http://www.reddit.com/r/funny/hot.json?limit=16')
r_content = json.load(jsonurl)['data']['children']
The program sometimes crashes on the second line, because the info from the URL is not retrieved properly for some reason.
With some debugging, I found out that I was getting the following error from the first line:
<addinfourl at 4321460952 whose fp = <socket._fileobject object at 0x10185b050>>
This occurs about 40% of the time; the other 60% of the time, the code works perfectly. What am I doing wrong? How do I make the URL opening more consistent?
It is usually not an issue from the client side. Your code is consistent in behavior but the server response can vary.
I ran your code a few times and It does throw up certain issues:
>>> jsonurl = urlopen('http://www.reddit.com/r/funny/hot.json?limit=16')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 429: Unknown
You have to handle the cases where the server response is anything but HTTP 200. You can wrap your code in a try/except block and pass the response to json.load() only when the request succeeds.
Also, urlopen returns a file-like object, so the <addinfourl ...> line you saw is not an error: printing jsonurl simply shows its __repr__() value. See below:
>>> jsonurl.__repr__()
'<addinfourl at 4393153672 whose fp = <socket._fileobject object at 0x105978450>>'
You have to look for the following:
>>> jsonurl.getcode()
200
>>>
and only if it is 200 should you process the data obtained from the request.
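Putting that together, here is a minimal sketch of the check-and-retry pattern (the helper name and the backoff policy are mine, not anything reddit documents):

import json
import time
import urllib2

def fetch_json(url, retries=3):
    # Try a few times, backing off after failures such as HTTP 429.
    for attempt in range(retries):
        try:
            resp = urllib2.urlopen(url)
            if resp.getcode() == 200:
                return json.load(resp)
        except urllib2.HTTPError:
            pass
        time.sleep(2 ** attempt)  # simple exponential backoff
    raise RuntimeError('could not fetch %s after %d attempts' % (url, retries))

r_content = fetch_json('http://www.reddit.com/r/funny/hot.json?limit=16')['data']['children']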
I have been running a cron job on Google App Engine for over a month now without any issues. The job does a variety of things, one being that it uses urllib2 to make a call to retrieve a json response from Reddit as well as a few other sites. About two weeks ago I started seeing errors when invoking Reddit, but no errors when invoking the other sites. The error I am receiving is HTTP error 429.
I have tried executing the same code outside of Google App Engine and do not have any issues. I tried using urlFetch, but receive the same error.
You can see the error when using the app engine's interactive shell with the following code.
import urllib2
data = urllib2.urlopen('http://www.reddit.com/r/Music/.json', timeout=60)
Edit: Not sure why it always fails for me and not for others. This is the error that I receive:
>>> import urllib2
>>> data = urllib2.urlopen('http://www.reddit.com/r/Music/.json', timeout=60)
Traceback (most recent call last):
File "/base/data/home/apps/s~shell-27/1.356011914885973647/shell.py", line 267, in get
exec compiled in statement_module.__dict__
File "<string>", line 1, in <module>
File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 400, in open
response = meth(req, response)
File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 513, in http_response
'http', request, response, code, msg, hdrs)
File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 438, in error
return self._call_chain(*args)
File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 372, in _call_chain
result = func(*args)
File "/base/python27_runtime/python27_dist/lib/python2.7/urllib2.py", line 521, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
HTTPError: HTTP Error 429: Unknown
similar code running outside of app engine with no problem:
print urllib2.urlopen('http://www.reddit.com/r/Music/.json').read()
At first I thought it had to do with a timeout problem since it was originally working, but since I'm getting this strange HTTPError rather than a timeout error, I'm not sure.
Any ideas?
Reddit rate limits the api pretty severely for the default user agent for the python shell. You need to set a unique user agent with your reddit username in it, like this:
User-Agent: super happy flair bot by /u/spladug
More info about the reddit api here https://github.com/reddit/reddit/wiki/API.
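With urllib2 you can attach that User-Agent via a Request object; the header value below is just reddit's example format, so substitute your own bot description and username:

import urllib2

# A descriptive, unique User-Agent keeps reddit from lumping you in with
# every other default python client.
req = urllib2.Request('http://www.reddit.com/r/Music/.json',
                      headers={'User-Agent': 'super happy flair bot by /u/spladug'})
data = urllib2.urlopen(req, timeout=60).read()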
It's possible that Reddit is counting calls based on IP - which means that other applications on GAE which share your IP might already be exhausting the quota.
This might get better if you use Reddit API keys (I don't know if they issue them) or if they agree to rate limit API calls based on the app header.