I have a DataFrame common_ips containing IPs as shown below.
I need to achieve two basic tasks:
Identify private and public IPs.
Check organisation for public IPs.
Here is what I am doing:
import json
import urllib
import re
baseurl = 'http://ipinfo.io/' # no HTTPS supported (at least: not without a plan)
def isIPpublic(ipaddress):
return not isIPprivate(ipaddress)
def isIPprivate(ipaddress):
if ipaddress.startswith("::ffff:"):
ipaddress=ipaddress.replace("::ffff:", "")
# IPv4 Regexp from https://stackoverflow.com/questions/30674845/
if re.search(r"^(?:10|127|172\.(?:1[6-9]|2[0-9]|3[01])|192\.168)\..*", ipaddress):
# Yes, so match, so a local or RFC1918 IPv4 address
return True
if ipaddress == "::1":
# Yes, IPv6 localhost
return True
return False
def getipInfo(ipaddress):
url = '%s%s/json' % (baseurl, ipaddress)
try:
urlresult = urllib.request.urlopen(url)
jsonresult = urlresult.read() # get the JSON
parsedjson = json.loads(jsonresult) # put parsed JSON into dictionary
return parsedjson
except:
return None
def checkIP(ipaddress):
if (isIPpublic(ipaddress)):
if bool(getipInfo(ipaddress)):
if 'bogon' in getipInfo(ipaddress).keys():
return 'Private IP'
elif bool(getipInfo(ipaddress).get('org')):
return getipInfo(ipaddress)['org']
else:
return 'No organization data'
else:
return 'No data available'
else:
return 'Private IP'
And applying it to my common_ips DataFrame with
common_ips['Info'] = common_ips.IP.apply(checkIP)
But it's taking longer than I expected. And for some IPs, it's giving incorrect Info.
For instance:
where it should have been AS19902 Department of Administrative Services as I cross-checked it by
and
What am I missing here ? And how can I achieve these tasks in a more Pythonic way ?
A blanket except: is basically always a bug. You are returning None instead of handling any anomalous or error response from the server, and of course the rest of your code has no way to recover.
As a first debugging step, simply take out the try/except handling. Maybe then you can find a way to put back a somewhat more detailed error handler for some cases which you know how to recover from.
def getipInfo(ipaddress):
url = '%s%s/json' % (baseurl, ipaddress)
urlresult = urllib.request.urlopen(url)
jsonresult = urlresult.read() # get the JSON
parsedjson = json.loads(jsonresult) # put parsed JSON into dictionary
return parsedjson
Perhaps the calling code in checkIP should have a try/except instead, and e.g. retry after sleeping for a bit if the server indicates that you are going too fast.
(In the absence of an authorization token, it looks like you are using the free version of this service, which is probably not in any way guaranteed anyway. Also maybe look at using their recommended library -- I haven't looked at it in more detail, but I would imagine it at the very least knows better how to behave in the case of a server-side error. It's almost certainly also more Pythonic, at least in the sense that you should not reinvent things which already exist.)
Related
I am running the following:
import geopy
geolocator = geopy.geocoders.OpenMapQuest(api_key='my_key_here')
location1 = geolocator.geocode('Madrid')
where my_key_here is my consumer key for mapquest, and I get the following error:
GeocoderInsufficientPrivileges: HTTP Error 403: Forbidden
Not sure what I am doing wrong.
Thanks!
I've also tried the same with the same result. After checking the Library, I found out, that the error is referring to the line, where the request ist build and it seems, that the API Key is not transmitted. If you add no key in the init statement, the api_key='' so I tried to change the line 66 in my own Library of the file: https://github.com/geopy/geopy/blob/master/geopy/geocoders/openmapquest.py to my key.
Still no success! The key itself works, I've tested it with calling the URL that is also called in the Library:
http://open.mapquestapi.com/nominatim/v1/search.php?key="MY_KEY"&format=json&json_callback=renderBasicSearchNarrative&q=westminster+abbey
no idea why this isn't working…
Cheers.kg
I made slight progress with fixing this one. I was able to get the query written correctly, but its the json parsing that kind of have me stumped. Maybe someone knows. I know the url is being sent correctly (I checked it in the browser and it returned a json object). Maybe someone knows how to parse the returned json object to get it to finally work.
Anyways, I had to go in the openmapquest.py source code, and starting from line 66, I made the following modifications:
self.api_key = api_key
self.api = "http://www.mapquestapi.com/geocoding/v1/address?"
def geocode(self, query, exactly_one=True, timeout=None): # pylint: disable=W0221
"""
Geocode a location query.
:param string query: The address or query you wish to geocode.
:param bool exactly_one: Return one result or a list of results, if
available.
:param int timeout: Time, in seconds, to wait for the geocoding service
to respond before raising a :class:`geopy.exc.GeocoderTimedOut`
exception. Set this only if you wish to override, on this call
only, the value set during the geocoder's initialization.
.. versionadded:: 0.97
"""
params = {
'key': self.api_key,
'location': self.format_string % query
}
if exactly_one:
params['maxResults'] = 1
url = "&".join((self.api, urlencode(params)))
print url # Print the URL just to make sure it's produced correctly
Now the task remains to get the _parse_json function working.
I have 2 files compiled by django-pipeline along with s3boto: master.css and master.js. They are set to "Public" in my buckets. However, when I access them, sometimes master.css is served, sometimes it errs with SignatureDoesNotMatch. The same with master.js. This doesn't happen on Chrome. What could I be missing?
EDIT: It now happens on Chrome too.
Happened to me too...
Took a few hours to find, but I figured it out eventually.
Turns out that if the right signature is :
ssCNsAOxLf5vA80ldAI3M0CU2%2Bw=
Then AWS will NOT accept:
ssCNsAOxLf5vA80ldAI3M0CU2+w=
Where the only difference is the translation of %2B to '+'.
S3BotoStorage actually yields it correctly but the encoding happens on CachedFilesMixin in the final line of the url method (return unquote(final_url)).
To fix it, I derived a new CachedFilesMixin to undo the "damage" (I should mention that I don't know why this unquote exists in the first place, so undoing it might cause other problems)
class MyCachedFilesMixin(CachedFilesMixin):
def url(self, *a, **kw):
s = super(MyCachedFilesMixin, self).url(*a, **kw)
if isinstance(s, unicode):
s = s.encode('utf-8', 'ignore')
scheme, netloc, path, qs, anchor = urlparse.urlsplit(s)
path = urllib.quote(path, '/%')
qs = urllib.quote_plus(qs, ':&=')
return urlparse.urlunsplit((scheme, netloc, path, qs, anchor))
Where I used the code I found here.
Hope this helps...
I had a similar issue causing SignatureDoesNotMatch errors when downloading files using an S3 signed URL and the python requests HTTP library.
My problem ended up being a bad content-type. The documentation at AWS on Authenticating REST Requests helped me figure it out, and has examples in Python.
I was struggling with this for a while, and I didn't like the idea of messing up with CachedFilesMixin (seemed like an overkill to me).
Until a proper fix is issued to the django platform, I've found quoting the signature two times is a good option. I know it's not pretty, but it works and it's simple.
So you'll just have to do something like this:
signature = urllib.quote_plus(signature.strip())
signature = urllib.quote_plus(signature.strip())
Hope it helps!
This article on Flask is a good resource on getting your signatures right: https://devcenter.heroku.com/articles/s3-upload-python
#app.route('/sign_s3/')
def sign_s3():
AWS_ACCESS_KEY = os.environ.get('AWS_ACCESS_KEY_ID')
AWS_SECRET_KEY = os.environ.get('AWS_SECRET_ACCESS_KEY')
S3_BUCKET = os.environ.get('S3_BUCKET')
object_name = request.args.get('s3_object_name')
mime_type = request.args.get('s3_object_type')
expires = int(time.time()+10)
amz_headers = "x-amz-acl:public-read"
put_request = "PUT\n\n%s\n%d\n%s\n/%s/%s" % (mime_type, expires, amz_headers, S3_BUCKET, object_name)
signature = base64.encodestring(hmac.new(AWS_SECRET_KEY,put_request, sha).digest())
signature = urllib.quote_plus(signature.strip())
url = 'https://%s.s3.amazonaws.com/%s' % (S3_BUCKET, object_name)
return json.dumps({
'signed_request': '%s?AWSAccessKeyId=%s&Expires=%d&Signature=%s' % (url, AWS_ACCESS_KEY, expires, signature),
'url': url
})
Simple workaround for me was to generate a new access key with only alphanumeric characters (ie no special characters such as "/", "+", etc. which AWS sometimes adds to the keys).
I have seen this thread already - How can I unshorten a URL?
My issue with the resolved answer (that is using the unshort.me API) is that I am focusing on unshortening youtube links. Since unshort.me is used readily, this returns almost 90% of the results with captchas which I am unable to resolve.
So far I am stuck with using:
def unshorten_url(url):
resolvedURL = urllib2.urlopen(url)
print resolvedURL.url
#t = Test()
#c = pycurl.Curl()
#c.setopt(c.URL, 'http://api.unshort.me/?r=%s&t=xml' % (url))
#c.setopt(c.WRITEFUNCTION, t.body_callback)
#c.perform()
#c.close()
#dom = xml.dom.minidom.parseString(t.contents)
#resolvedURL = dom.getElementsByTagName("resolvedURL")[0].firstChild.nodeValue
return resolvedURL.url
Note: everything in the comments is what I tried to do when using the unshort.me service which was returning captcha links.
Does anyone know of a more efficient way to complete this operation without using open (since it is a waste of bandwidth)?
one line functions, using requests library and yes, it supports recursion.
def unshorten_url(url):
return requests.head(url, allow_redirects=True).url
Use the best rated answer (not the accepted answer) in that question:
# This is for Py2k. For Py3k, use http.client and urllib.parse instead, and
# use // instead of / for the division
import httplib
import urlparse
def unshorten_url(url):
parsed = urlparse.urlparse(url)
h = httplib.HTTPConnection(parsed.netloc)
resource = parsed.path
if parsed.query != "":
resource += "?" + parsed.query
h.request('HEAD', resource )
response = h.getresponse()
if response.status/100 == 3 and response.getheader('Location'):
return unshorten_url(response.getheader('Location')) # changed to process chains of short urls
else:
return url
You DO have to open it, otherwise you won't know what URL it will redirect to. As Greg put it:
A short link is a key into somebody else's database; you can't expand the link without querying the database
Now to your question.
Does anyone know of a more efficient way to complete this operation
without using open (since it is a waste of bandwidth)?
The more efficient way is to not close the connection, keep it open in the background, by using HTTP's Connection: keep-alive.
After a small test, unshorten.me seems to take the HEAD method into account and doing a redirect to itself:
> telnet unshorten.me 80
Trying 64.202.189.170...
Connected to unshorten.me.
Escape character is '^]'.
HEAD http://unshort.me/index.php?r=http%3A%2F%2Fbit.ly%2FcXEInp HTTP/1.1
Host: unshorten.me
HTTP/1.1 301 Moved Permanently
Date: Mon, 22 Aug 2011 20:42:46 GMT
Server: Microsoft-IIS/6.0
X-Powered-By: ASP.NET
X-AspNet-Version: 2.0.50727
Location: http://resolves.me/index.php?r=http%3A%2F%2Fbit.ly%2FcXEInp
Cache-Control: private
Content-Length: 0
So if you use the HEAD HTTP method, instead of GET, you will actually end up doing the same work twice.
Instead, you should keep the connection alive, which will save you only a little bandwidth, but what it will certainly save is the latency of establishing a new connection every time. Establishing a TCP/IP connection is expensive.
You should get away with a number of kept-alive connections to the unshorten service equal to the number of concurrent connections your own service receives.
You could manage these connections in a pool. That's the closest you can get. Beside tweaking your kernel's TCP/IP stack.
Here a src code that takes into account almost of the useful corner cases:
set a custom Timeout.
set a custom User Agent.
check whether we have to use an http or https connection.
resolve recursively the input url and prevent ending within a loop.
The src code is on github # https://github.com/amirkrifa/UnShortenUrl
comments are welcome ...
import logging
logging.basicConfig(level=logging.DEBUG)
TIMEOUT = 10
class UnShortenUrl:
def process(self, url, previous_url=None):
logging.info('Init url: %s'%url)
import urlparse
import httplib
try:
parsed = urlparse.urlparse(url)
if parsed.scheme == 'https':
h = httplib.HTTPSConnection(parsed.netloc, timeout=TIMEOUT)
else:
h = httplib.HTTPConnection(parsed.netloc, timeout=TIMEOUT)
resource = parsed.path
if parsed.query != "":
resource += "?" + parsed.query
try:
h.request('HEAD',
resource,
headers={'User-Agent': 'curl/7.38.0'}
)
response = h.getresponse()
except:
import traceback
traceback.print_exec()
return url
logging.info('Response status: %d'%response.status)
if response.status/100 == 3 and response.getheader('Location'):
red_url = response.getheader('Location')
logging.info('Red, previous: %s, %s'%(red_url, previous_url))
if red_url == previous_url:
return red_url
return self.process(red_url, previous_url=url)
else:
return url
except:
import traceback
traceback.print_exc()
return None
import requests
short_url = "<your short url goes here>"
long_url = requests.get(short_url).url
print(long_url)
Well I have some old python code that seems to not work right, I have researched to the ends of the internet trying to find a fix.
def getURL(self, context):
# Make this an absolute URL- currently it's required for
# links placed in the RSS and XML feeds, and won't
# hurt elsewhere.
req = context['request']
port = req.host[2]
hostname = req.getRequestHostname()
if req.isSecure():
default = 443
else:
default = 80
if port == default:
hostport = ''
else:
hostport = ':%d' % port
path = posixpath.join('/stats',
*(tuple(self.target.pathSegments) + self.relativePathSegments))
return quote('http%s://%s%s%s' % (
req.isSecure() and 's' or '',
hostname,
hostport,
path), "/:")
now I think its just the context['request'] giving me issues but I'm not sure.
This code block was from the CIA.vc project (link.py to be exact), so if something doesn't make sense check there
also the 1st error i get from python is:
File "/home/justasic/cia/cia/LibCIA/Web/Stats/Link.py", line 41, in getURL port = req.host[2]
exceptions.TypeError: unindexable object
but I got more about the context['request'] not being defined after I found what I think was a simple fix
It seems to me like Context['request'] doesn't fit on there... where does Context come from? As param you get context all lowercase. Probably you sould use the param 'context' instead, so ...
a) make Context['request'] to context['request']
... or, if your are already using context in lowercase, and it's only a typo here on the post, then
b) I searched a while and found this snippet http://djangosnippets.org/snippets/2428/... so maybe something like this might work:
from django.template import resolve_variable
...
def getURL(self, context):
req = resolve_variable('request', context)
The answer to a previous question showed that Nexus implement a custom authentication helper called "NxBASIC".
How do I begin to implement a handler in python?
Update:
Implementing the handler per Alex's suggestion looks to be the right approach, but fails trying to extract the scheme and realm from the authreq.
The returned value for authreq is:
str: NxBASIC realm="Sonatype Nexus Repository Manager API""
AbstractBasicAuthHandler.rx.search(authreq) is only returning a single tuple:
tuple: ('NxBASIC', '"', 'Sonatype Nexus Repository Manager API')
so scheme,realm = mo.groups() fails. From my limited regex knowledge it looks like the standard regex from AbstractBasicAuthHandler should match scheme and realm, but it seems not to.
The regex is:
rx = re.compile('(?:.*,)*[ \t]*([^ \t]+)[ \t]+'
'realm=(["\'])(.*?)\\2', re.I)
Update 2:
From inspection of AbstractBasicAuthHandler, the default processing is to do:
scheme, quote, realm = mo.groups()
Changing to this works. I now just need to set the password against the correct realm. Thanks Alex!
If, as described, name and description are the only differences between this "NxBasic" and good old "Basic", then you could essentially copy-paste-edit some code from urllib2.py (which unfortunately doesn't expose the scheme name as easily overridable in itself), as follows (see urllib2.py's online sources):
import urllib2
class HTTPNxBasicAuthHandler(urllib2.HTTPBasicAuthHandler):
def http_error_auth_reqed(self, authreq, host, req, headers):
# host may be an authority (without userinfo) or a URL with an
# authority
# XXX could be multiple headers
authreq = headers.get(authreq, None)
if authreq:
mo = AbstractBasicAuthHandler.rx.search(authreq)
if mo:
scheme, realm = mo.groups()
if scheme.lower() == 'nxbasic':
return self.retry_http_basic_auth(host, req, realm)
def retry_http_basic_auth(self, host, req, realm):
user, pw = self.passwd.find_user_password(realm, host)
if pw is not None:
raw = "%s:%s" % (user, pw)
auth = 'NxBasic %s' % base64.b64encode(raw).strip()
if req.headers.get(self.auth_header, None) == auth:
return None
req.add_header(self.auth_header, auth)
return self.parent.open(req)
else:
return None
As you can see by inspection, I've just changed two strings from "Basic" to "NxBasic" (and the lowercase equivalents) from what's in urrlib2.py (in the abstract basic auth handler superclass of the http basic auth handler class).
Try using this version -- and if it's still not working, at least having it be your code can help you add print/logging statements, breakpoints, etc, to better understand what's breaking and how. Best of luck! (Sorry I can't help further but I don't have any Nexus around to experiment with).