I cannot consistently get JSON from a given URL; it works only about 60% of the time.
jsonurl = urlopen('http://www.reddit.com/r/funny/hot.json?limit=16')
r_content = json.load(jsonurl)['data']['children']
The program sometimes crashes on the second line, because the data from the URL is not retrieved properly for some reason.
With some debugging, I found that the first line was producing the following "error":
<addinfourl at 4321460952 whose fp = <socket._fileobject object at 0x10185b050>>
This happens about 40% of the time; the other 60% of the time the code works perfectly. What am I doing wrong? How do I make the URL opening more consistent?
This is usually not an issue on the client side. Your code behaves consistently, but the server's response can vary. I ran your code a few times and it does occasionally fail:
>>> jsonurl = urlopen('http://www.reddit.com/r/funny/hot.json?limit=16')
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 126, in urlopen
return _opener.open(url, data, timeout)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 406, in open
response = meth(req, response)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 519, in http_response
'http', request, response, code, msg, hdrs)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 444, in error
return self._call_chain(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 378, in _call_chain
result = func(*args)
File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py", line 527, in http_error_default
raise HTTPError(req.get_full_url(), code, msg, hdrs, fp)
urllib2.HTTPError: HTTP Error 429: Unknown
You have to handle the cases where the server's response is anything other than HTTP 200. Wrap your code in a try / except block and pass jsonurl to json.load() only when the request succeeds.
Also, urlopen returns a file-like object, so if you print jsonurl, you simply get its __repr__() value. See below:
>>> jsonurl.__repr__()
'<addinfourl at 4393153672 whose fp = <socket._fileobject object at 0x105978450>>'
You have to look for the following:
>>> jsonurl.getcode()
200
>>>
and only if it is 200 should you process the data obtained from the request.
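Putting those pieces together, here is a minimal sketch (written for Python 3's urllib.request; the question uses Python 2's urllib2, but the idea is the same) that retries when the server answers HTTP 429, the rate-limit response seen in the traceback above:

```python
import json
import time
from urllib.request import urlopen
from urllib.error import HTTPError

def fetch_json(url, retries=3):
    """Fetch a URL and decode its body as JSON, backing off and
    retrying when the server answers 429 (rate limited)."""
    for attempt in range(retries):
        try:
            with urlopen(url) as resp:
                return json.load(resp)  # only reached on a successful response
        except HTTPError as e:
            # anything but 429, or the last attempt: give up and surface it
            if e.code != 429 or attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # simple exponential back-off
```

Reddit in particular rate-limits clients aggressively, so sending a descriptive User-Agent header in addition to retrying also helps avoid 429s.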
I am trying to delete a simple triple from GraphDB (GraphDB Free) using Python's SPARQLWrapper and the update example from https://github.com/RDFLib/sparqlwrapper. I always get the following exception:
SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.
My code is:
sparql = SPARQLWrapper('http://192.168.0.242:7200/repositories/DataCitation')
sparql.setMethod(POST)
sparql.setQuery("""
PREFIX pub: <http://ontology.ontotext.com/taxonomy/>
delete where {
<http://ontology.ontotext.com/resource/tsk9hdnas934> pub:occupation "Cook".
}
""")
results = sparql.query()
print(results.response.read())
When I run an ASK or SELECT query against the same endpoint I get a valid result; only the update statements fail.
This is the full stack trace:
/home/filip/anaconda3/envs/TripleStoreCitationFramework/bin/python /home/filip/PycharmProjects/TripleStoreCitationFramework/GraphDB/Playground.py
Traceback (most recent call last):
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/site-packages/SPARQLWrapper/Wrapper.py", line 1073, in _query
response = urlopener(request)
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/urllib/request.py", line 531, in open
response = meth(req, response)
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/urllib/request.py", line 640, in http_response
response = self.parent.error(
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/urllib/request.py", line 502, in _call_chain
result = func(*args)
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400:
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/filip/PycharmProjects/TripleStoreCitationFramework/GraphDB/Playground.py", line 6, in <module>
citing.delete_triples("s")
File "/home/filip/PycharmProjects/TripleStoreCitationFramework/GraphDB/DataCiting.py", line 32, in delete_triples
results = sparql.query()
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/site-packages/SPARQLWrapper/Wrapper.py", line 1107, in query
return QueryResult(self._query())
File "/home/filip/anaconda3/envs/TripleStoreCitationFramework/lib/python3.8/site-packages/SPARQLWrapper/Wrapper.py", line 1077, in _query
raise QueryBadFormed(e.read())
SPARQLWrapper.SPARQLExceptions.QueryBadFormed: QueryBadFormed: a bad request has been sent to the endpoint, probably the sparql query is bad formed.
Response:
b'Missing parameter: query'
The endpoint you need for inserting statements isn't the plain SPARQL endpoint you use for ordinary queries, but the dedicated /statements endpoint:
http://<server>/repositories/<repository-id>/statements
This endpoint is also used for DELETE statements.
You can look up some examples in the RDF4J documentation.
Furthermore, if you are passing your data in a query string instead of in the request body, be aware that it must start with "?update=" instead of "?query=".
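As a sketch (the helper name is mine; the repository URL is the one from the question), deriving the update endpoint from the query endpoint looks like this:

```python
def statements_endpoint(repository_url):
    """GraphDB/RDF4J serves SPARQL updates (INSERT/DELETE) at
    <repo>/statements, while plain queries go to the repository URL itself."""
    return repository_url.rstrip("/") + "/statements"

# With SPARQLWrapper (requires a running GraphDB instance):
# sparql = SPARQLWrapper(statements_endpoint(
#     'http://192.168.0.242:7200/repositories/DataCitation'))
# sparql.setMethod(POST)
# sparql.setQuery(...)  # the DELETE WHERE update from the question
# sparql.query()
```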
I am trying to connect to the BTC.COM API and query balances on a universe of wallets (approx. 500,000 wallets). It looks like that is too much for the API in one call. Could you help me read the error and debug? My understanding is that the query is too big, but I don't know where to find the limit documented. How many wallets can the API handle in one call?
Any help is appreciated.
The API code is:
class MultiAddress:
    def __init__(self, a):
        self.final_balance = a['wallet']['final_balance']

    def __repr__(self):
        return f'{{"balance": {self.final_balance}}}'

def get_multi_address(addresses):
    # 'resource' is built from the address list (construction not shown here)
    response = util.call_api(resource)
    json_response = json.loads(response)
    return MultiAddress(json_response)

p = get_multi_address(addresses=tuple_of_addresses)
sum_bal = p.final_balance
The error:
Traceback (most recent call last):
File "/Users/lolo/Documents/MARKET_RISK/python/util.py", line 32, in call_api
response = urlopen(base_url + resource, payload, timeout=TIMEOUT).read()
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 531, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 641, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 569, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/urllib/request.py", line 649, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 414: Request-URI Too Large
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "explo_bal.py", line 56, in <module>
p = get_multi_address(addresses=tuple_of_addresses)
File "/Users/delalma/Documents/MARKET_RISK/python/explorer.py", line 165, in get_multi_address
response = util.call_api(resource)
File "/Users/delalma/Documents/MARKET_RISK/python/util.py", line 36, in call_api
raise APIException(handle_response(e.read()), e.code)
util.APIException: <html>
<head><title>414 Request-URI Too Large</title></head>
<body>
<center><h1>414 Request-URI Too Large</h1></center>
<hr><center>cloudflare</center>
</body>
</html>
I will answer this in a general context, since I can't find the relevant limits documented for this particular API.
A 414 error can normally be avoided by switching to a POST request: convert the query string into a JSON object and send it in the request body. See: Convert query string to json object and send to API request with POST.
With GET requests, the maximum length of the request depends on both the server and the client. Most web servers limit the URL to around 8 KB, though this is configurable. On the client side, limits vary by browser: IE and Safari allow about 2 KB, Opera 4 KB, and Firefox 8 KB. So in practice, the maximum safe length for a GET request URL is about 2 KB.
You can also consider a workaround: if your URI contains a string that is too long, break it into parts sized to your server's limit. Submit the first part (in my case, to write a file), then submit the remaining parts to append to the previously sent data.
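Applied to the question, the batching idea could be sketched as follows: split the address list into chunks small enough to keep each request URI under the limit, then sum the per-batch balances. The chunk size and resource format below are assumptions, since the question elides how `resource` is built:

```python
def chunked(seq, size):
    """Yield successive slices of seq, each at most size items long."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# total = 0
# for batch in chunked(tuple_of_addresses, 50):        # 50 is a guess; tune it
#     resource = 'multiaddr?active=' + '|'.join(batch) # assumed URI format
#     total += MultiAddress(json.loads(util.call_api(resource))).final_balance
```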
I'm trying to make a terminal app to crawl a website and return the time of the entered city name. this is my code so far:
import re
import urllib.request
city = input('Enter city name: ')
url = 'https://time.is/'
rawData = urllib.request.urlopen(url).read()
decodedData = rawData.decode('utf-8')
print(decodedData)
After the last line I get this error:
Traceback (most recent call last):
File "<pyshell#13>", line 1, in <module>
rawData = urllib.request.urlopen(url).read()
File "~/Python\Python35-32\lib\urllib\request.py", line 163, in urlopen
return opener.open(url, data, timeout)
File "~/Python\Python35-32\lib\urllib\request.py", line 472, in open
response = meth(req, response)
File "~/Python\Python35-32\lib\urllib\request.py", line 582, in http_response
'http', request, response, code, msg, hdrs)
File "~/Python\Python35-32\lib\urllib\request.py", line 510, in error
return self._call_chain(*args)
File "~/Python\Python35-32\lib\urllib\request.py", line 444, in _call_chain
result = func(*args)
File "~/Python\Python35-32\lib\urllib\request.py", line 590, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Why do I get this error? What's wrong?
[EDIT]
The reason is that time.is bans automated requests. Always remember to read the terms and conditions when doing web scraping. Free APIs can be found that do the same job.
When this happens, I usually open the debugger and try to find out what's being called when I access the website. It seems that time.is doesn't like having scripts call their website.
A quick search yielded this:
1532027279136 0 161_(UTC,_UTC+00:00) 1532027279104
Time.is is for humans. To use from scripts and apps, please ask about our API. Thank you!
Here are some APIs you could use to build your project. https://www.programmableweb.com/category/time/api
So I'm trying to get the URL of a page in python3...
If I do the following,
from urllib.request import urlopen
html = urlopen("http://google.com/")
html.read()
I get the html as desired.
However, if I were to choose a different url, as in the following,
from urllib.request import urlopen
html = urlopen("http://www.stackoverflow.com/")
html.read()
I get the following error after the second line:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 153, in urlopen
return opener.open(url, data, timeout)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 461, in open
response = meth(req, response)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 574, in http_response
'http', request, response, code, msg, hdrs)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 499, in error
return self._call_chain(*args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 433, in _call_chain
result = func(*args)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/urllib/request.py", line 582, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
Any ideas why this would be happening and how to fix it?
If you look closer at the error message, you'll see that it is an HTTP error, and a specific one:
HTTP Error 403: Forbidden
So you talked to the server and got a response back, but you don't know why you were denied.
You can get a more detailed message from the HTML the server returns, with something like this:
from urllib.request import urlopen
from urllib.error import HTTPError

try:
    html = urlopen("http://www.stackoverflow.com/")
    print(html.read())
except HTTPError as e:
    # the error response body often explains why you were denied
    print(e.read().decode('utf-8'))
For me it says:
<h2 data-translate="what_happened">What happened?</h2>
<p>The owner of this website (www.stackoverflow.com) has banned your access based on your browser's signature (213702c58d2116a6-ua48).</p>
You can treat HTTPError as a file object (https://docs.python.org/3/library/urllib.error.html#urllib.error.HTTPError):
Though being an exception (a subclass of URLError), an HTTPError can
also function as a non-exceptional file-like return value (the same
thing that urlopen() returns). This is useful when handling exotic
HTTP errors, such as requests for authentication.
I've set up Kannel on Ubuntu with a USB modem, and I can send SMS via the browser using the URL seen below:
localhost:13013/cgi-bin/sendsms?username=kannel&password=kannel&to=+254781923855&text='Kid got swag'
In Python, I have the following script, which works only if the message to be sent does not contain spaces:
import urllib.request

def send_sms(mobile_no, message):
    url = "http://%s:%d/cgi-bin/sendsms?username=%s&password=%s&to=%s&text=%s" \
        % ('localhost', 13013, 'kannel', 'kannel', str(mobile_no), message)
    f = urllib.request.urlopen(url)
    print("sms sent")
If I call the function with NO spaces in the message, it works and the message is sent.
sms.send_sms('+254781923855', 'kid_got_swag')
If there are spaces in the message, it fails with the error below:
sms.send_sms('+254781923855', 'kid got swag')
Traceback (most recent call last):
File "/home/lukik/workspace/projx/src/short_message.py", line 24, in <module>
sms.send_sms('+254781923855', 'kid got swag')
File "/home/lukik/workspace/projx/src/short_message.py", line 18, in send_sms
f = urllib.request.urlopen(url)
File "/usr/lib/python3.2/urllib/request.py", line 139, in urlopen
return opener.open(url, data, timeout)
File "/usr/lib/python3.2/urllib/request.py", line 376, in open
response = meth(req, response)
File "/usr/lib/python3.2/urllib/request.py", line 488, in http_response
'http', request, response, code, msg, hdrs)
File "/usr/lib/python3.2/urllib/request.py", line 414, in error
return self._call_chain(*args)
File "/usr/lib/python3.2/urllib/request.py", line 348, in _call_chain
result = func(*args)
File "/usr/lib/python3.2/urllib/request.py", line 496, in http_error_default
raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request
I've tried other variants of calling urllib, but they all fail because of the spaces in the message.
In the request you send via the browser, the message is inside quotes -
&text='Kid got swag'
Try that in your request -
url="http://%s:%d/cgi-bin/sendsms?username=%s&password=%s&to=%s&text='%s'" \
% ('localhost', 13013, 'kannel', 'kannel', str(mobile_no), message)
Notice the single quotes at &text='%s'.
PS: I'd recommend using the requests library for calls like this. You can construct your URLs more cleanly that way, like this -
>>> payload = {'key1': 'value1', 'key2': 'value2'}
>>> r = requests.get("http://httpbin.org/get", params=payload)
URLs are not permitted to contain spaces. When you tried it in your browser, the browser took care of correctly encoding the URL before issuing the request. In your program, you need to encode the URL yourself. Fortunately, urllib has functions built in to take care of the details.
http://docs.python.org/3.3/library/urllib.parse.html#url-quoting
You need to URL-encode the values you pass as parameters, otherwise a broken URL gets constructed, that's why the request fails. I believe urllib.parse.urlencode does what you need.
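A minimal sketch of the question's send_sms rewritten with urllib.parse.urlencode (URL construction is split into its own function so the encoding is visible; the credentials and port are the ones from the question):

```python
from urllib.parse import urlencode

def build_sms_url(mobile_no, message):
    """Build the Kannel sendsms URL with properly encoded query parameters."""
    params = urlencode({
        'username': 'kannel',
        'password': 'kannel',
        'to': str(mobile_no),   # '+' is encoded as %2B
        'text': message,        # spaces become '+'
    })
    return 'http://localhost:13013/cgi-bin/sendsms?' + params

# Sending is then just:
# urllib.request.urlopen(build_sms_url('+254781923855', 'kid got swag'))
```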