urllib2 - get resource if you already know the IP - python

In my Python script, I am fetching pages, but I already know the IP of the server.
So I could save it the hassle of doing a DNS lookup if I can somehow pass in the IP and hostname in the request.
So, if I call
http://111.111.111.111/
and then pass the hostname in the Host header, I should be OK. However, the issue I see is on the server side: if the user looks at the incoming request (i.e. REQUEST_URI), they will see that I went for the IP.
Anyone have any ideas?

First, the main idea is suspicious. You can "know" the IP of the server, but this knowledge is temporary, and how long it stays correct is controlled by DNS TTLs. For a stable configuration, the server admin can publish a DNS record with a long TTL (e.g. a few days), so the DNS request will nearly always be answered by the nearest caching resolver or nscd. For a changing configuration, the TTL can be reduced to a few seconds or even to 0 (meaning no caching), which can be useful for some kinds of load balancers. You are trying to build your own resolver cache that ignores TTLs, and this can lead to requests going to non-functioning or wrong servers, with incorrect contents. So I suggest not doing this.
If you are absolutely sure you must do this and you can't use external tools such as a custom resolver or even /etc/hosts, try installing a custom "opener" (see the urllib2.build_opener() function in the documentation) that overrides the DNS lookup. However, I have never done this myself; my knowledge comes only from reading the documentation just now.
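A minimal sketch of such an opener, written for Python 3, where urllib2 became urllib.request. The names PinnedHTTPConnection, PinnedHTTPHandler and build_pinned_opener are made up for this illustration and are not part of the standard library. It only covers plain http:// URLs, and it deliberately ignores DNS TTLs, with the risks described above.

```python
import http.client
import socket
import urllib.request


class PinnedHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that always dials a fixed IP (hypothetical helper)."""

    def __init__(self, host, pinned_ip, **kwargs):
        super().__init__(host, **kwargs)
        self.pinned_ip = pinned_ip

    def connect(self):
        # Open the TCP socket to the pinned IP instead of resolving self.host.
        # The Host header sent to the server still carries the original hostname.
        self.sock = socket.create_connection((self.pinned_ip, self.port), self.timeout)


class PinnedHTTPHandler(urllib.request.HTTPHandler):
    """Handler that makes the opener use PinnedHTTPConnection for http:// URLs."""

    def __init__(self, pinned_ip):
        super().__init__()
        self.pinned_ip = pinned_ip

    def http_open(self, req):
        # do_open() forwards extra keyword arguments to the connection class.
        return self.do_open(PinnedHTTPConnection, req, pinned_ip=self.pinned_ip)


def build_pinned_opener(pinned_ip):
    # ProxyHandler({}) disables environment proxies, which would resolve names themselves.
    return urllib.request.build_opener(
        urllib.request.ProxyHandler({}), PinnedHTTPHandler(pinned_ip)
    )
```

Calling build_pinned_opener("111.111.111.111").open("http://www.example.com/") would then connect to the given IP while the URL, and therefore the Host header, keeps the hostname.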

You can add the IP address mapping to the hosts file.

Get gTLD or ccTLD from IP address

There are many questions on SO related to fetching an IP address from URL, but not vice versa.
As the title suggests, I would like to get the website URL for a given IP address. For instance:
>>> import socket
>>> print(socket.gethostbyname('google.com'))
This looks up the domain and returns 172.217.20.14. I am looking for the counterpart, e.g.:
>>> print(socket.getnamebyhost('172.217.20.14'))
Anything similar that would return the domain as google.com for the IP specified.
Is this possible to do in python3?
If yes, how can this be achieved?
UPDATE
Unfortunately, the way I'm approaching this is wrong. There are IPs that share a one-to-many relationship i.e. the nameserver points to numerous urls, unless the PTR record indicates otherwise. My question rephrased:
How do IP-to-domain data providers like ipinfo.io return
top-level domains for a single IP?
To my understanding, the A or AAAA records play an important role, but the only thing I get from these are ns rather than the domain. I don't know how to extract the gTLD or ccTLD from the records. I'm open to any suggestions, if anyone is willing to share an answer on how to parse gTLD(s) or ccTLD(s) from any IP. Preferably in python, but a shell script would also suffice.
socket.gethostbyaddr('172.217.20.14') would be the right way to go here, but it will not necessarily return what you want. Here's why:
Domain to IP resolution goes like:
domain > root server > origin server > origin server's hostname to IP configurations.
Now, to reverse-engineer it, we have to take into account:
There can be multiple domains sharing the same IP address, as is the case with shared hosting.
Assuming the domain has a dedicated IP, nslookup or gethostbyaddr 'should' return the domain name, but there can be proxy servers in front, like Cloudflare and whatever Google is using.
So even if you try to find out manually which IP Google's server is actually running on, you cannot, as exposing it would open their central server to all kinds of attacks, most importantly DDoS.
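As a rough sketch of the PTR-based approach, and only for the case where a PTR record exists, the last label of the reverse name can be taken as the TLD. The helper names tld_of, classify_tld and tld_for_ip are hypothetical, and the two-letter rule for ccTLDs is a simplification that ignores internationalized ccTLDs. It does not recover the other domains hosted on a shared IP.

```python
import socket


def tld_of(hostname):
    """Return the last DNS label of a hostname, lowercased."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


def classify_tld(tld):
    # Simplification: ASCII ccTLDs are exactly two letters; treat the rest as gTLDs.
    return "ccTLD" if len(tld) == 2 and tld.isascii() and tld.isalpha() else "gTLD"


def tld_for_ip(ip):
    """Reverse-resolve ip and return (tld, kind), or None without a usable PTR record."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except (socket.herror, socket.gaierror):
        return None
    tld = tld_of(hostname)
    return tld, classify_tld(tld)
```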

Python Uvicorn – obtain SSL certificate information

I have a gunicorn + uvicorn + fastApi stack.
(Basically, I am using https://hub.docker.com/r/tiangolo/uvicorn-gunicorn-fastapi docker image).
I've already implemented SSL based authentication by providing appropriate gunicorn configuration options: certfile, keyfile, ca_certs, cert_reqs.
And it works fine: users have to provide a client SSL certificate in order to make API calls.
What I need to do now is to obtain client certificate data and pass it further (add it to request headers) into my application, since it contains some client credentials.
For example, I've found a way to do it with a gunicorn worker by overriding gunicorn.workers.sync.SyncWorker: https://gist.github.com/jmvrbanac/089540b255d6b40ca555c8e7ee484c13.
But is there a way to do the same thing using UvicornWorker? I've tried to look through the UvicornWorker's source code, but didn't find a way to do it.
I went deeper into the Uvicorn source code, and as far as I understand, in order to access the client TLS certificate data, I need to do some tricks with python asyncio library (https://docs.python.org/3/library/asyncio-eventloop.html), possibly with Server (https://docs.python.org/3/library/asyncio-eventloop.html#asyncio.Server) class and override some of the UvicornWorker's methods.
I am still not quite sure if it is possible to achieve the desired result though.
I ended up setting up nginx (OpenResty) in front of my server and added a script that reads the client certificate and puts it into a header.
Here is a part of my nginx config:
set_by_lua_block $client_cert {
    local client_certificate = ngx.var.ssl_client_raw_cert
    if (client_certificate ~= nil) then
        client_certificate = string.gsub(client_certificate, "\n", "")
        ngx.req.set_header("X-CLIENT-ID", client_certificate)
    end
    return client_certificate
}
It is also possible to extract some specific field from a client certificate (like CN, serial number etc.) directly inside nginx configuration, but I decided to pass the whole certificate further.
My problem is solved, though not with gunicorn alone as I originally wanted; this is the only good solution I've found so far.
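On the application side, the header value can be turned back into a certificate. Because the Lua block above strips the newlines, the value arrives as a single-line PEM. The sketch below uses only the Python standard library to restore the line breaks and decode the DER bytes. The function names are hypothetical, and reading the X-CLIENT-ID header from the incoming request is left out.

```python
import base64

PEM_HEADER = "-----BEGIN CERTIFICATE-----"
PEM_FOOTER = "-----END CERTIFICATE-----"


def pem_from_header(value):
    """Rebuild a multi-line PEM certificate from a single-line header value."""
    value = value.strip()
    if not (value.startswith(PEM_HEADER) and value.endswith(PEM_FOOTER)):
        raise ValueError("header does not contain a PEM certificate")
    body = value[len(PEM_HEADER):-len(PEM_FOOTER)]
    body = "".join(body.split())  # drop any leftover whitespace
    # PEM bodies are conventionally wrapped at 64 characters per line.
    lines = [body[i:i + 64] for i in range(0, len(body), 64)]
    return "\n".join([PEM_HEADER, *lines, PEM_FOOTER]) + "\n"


def der_from_header(value):
    """Return the raw DER bytes of the certificate carried in the header."""
    pem = pem_from_header(value)
    body = "".join(pem.splitlines()[1:-1])
    return base64.b64decode(body, validate=True)
```

The DER bytes can then be handed to whatever X.509 parser the application already uses to pull out the CN or serial number.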

How do I safely get the user's ip address in Flask that has a proxy?

I am using Flask and need to get the user's IP address. This is usually done through request.remote_addr, but since this app is hosted by a third party (and behind Cloudflare), it just returns localhost.
Flask suggests getting the X-Forwarded-Host but then they immediately say it is a security risk. Is there a safe way to get the client's real ip?
The Problem
The issue here is not that the ProxyFix itself will cause the user to get access to your system, but rather the fact that the ProxyFix will take what was once mostly reliable information and replace it instead with potentially unreliable information.
For starters, when you don't use ProxyFix, the REMOTE_ADDR attribute is most likely retrieved from the source IP address in the TCP packets. While not impossible, the source IP address in TCP packets is tough to spoof. Therefore, if you need a reliable way to retrieve the user's IP address, REMOTE_ADDR is a good way to do it; in most cases, you can rely on it to give you something accurate when you read request.remote_addr.
The problem is, of course, in a reverse-proxy situation the TCP connection is not coming from the end user; instead, the end user makes a TCP connection with the reverse proxy, and the reverse proxy then makes a second TCP connection with your web app. Therefore, the request.remote_addr in your app will have the IP address of the reverse proxy rather than the original user.
A Potential Solution
ProxyFix is supposed to solve this problem so that request.remote_addr holds the user's IP address rather than the proxy's. It does this by looking at the HTTP header that reverse proxies (like Apache and Nginx) typically add to the request (X-Forwarded-For) and using the user's IP address it finds there. Note that Cloudflare uses a different HTTP header, so ProxyFix probably won't help you; you'll need to write your own implementation of this middleware to get request.remote_addr to use the original client's IP address. However, in the rest of this answer I will continue to refer to that fix as "ProxyFix".
This solution, however, is problematic. The problem is that while the TCP header is mostly reliable, the HTTP headers are not; if a user can bypass your reverse proxy and send data right to the server, they can put whatever they want in the HTTP header. For example, they can make the IP address in the HTTP header the IP address of someone else! If you use the IP address for authentication, the user can spoof that authentication mechanism. If you store the IP address in your database and then display it in your application to another user in HTML, the user could inject SQL or JavaScript into the header, potentially causing SQL injection or XSS vulnerabilities.
So, to summarize: ProxyFix takes a known mostly-safe way of retrieving the user's IP address from a TCP packet and replaces it with the not-very-safe-by-itself approach of parsing an easily spoofed HTTP header.
Therefore, the recommendation to use ProxyFix ONLY in reverse proxy situations means just that: don't use it if you accept connections from places that are NOT the proxy. This often means having the reverse proxy (like Nginx or Apache) handle all your incoming traffic and keeping the application that actually uses ProxyFix safe behind a firewall.
You should also read this post, which explains how ProxyFix was broken in the past (although it is now fixed). It also explains how ProxyFix works and gives you ideas on how to set your num_proxies argument.
A Better Solution
Let's say your user is at point A, they send the request to Cloudflare (B) which eventually sends the request to your final application (point C). Cloudflare will send the IP address of A in the CF-Connecting-IP header.
As explained above, if the user finds the IP address to point C, they could send a specially crafted HTTP request directly to point C which includes any header info they want. ProxyFix will use its logic to determine what the IP address is from the HTTP header, which of course is problematic if you rely on that value for, well, mostly anything.
Therefore, you might want to look at using something like mod_cloudflare, which allows you to do these proxy fixes directly in the Apache mod, but only when the HTTP connection comes from Cloudflare IP addresses (as defined by the TCP IP source). You can also have it only accept connections from Cloudflare. See How do I restore original visitor IP to my server logs for more info on this and help doing this with other servers (like Nginx).
This should give you a start. However, keep in mind that you're still not "safe": you've only shut down one possible attack vector, and that attack vector assumed that the attacker knew the IP address of your actual application. In that case, the malicious user could try a TCP attack with a spoofed Cloudflare IP address, although this would be extremely difficult. More likely, if they wanted to cause havoc, they would just DDoS your origin server, since they've bypassed Cloudflare. So there are plenty more things to think about in securing your application. Hopefully this helps you understand how to make one part slightly safer.
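To make the idea concrete, here is a small Python sketch of roughly the check described above: the CF-Connecting-IP header is trusted only when the TCP peer address belongs to a known proxy network, and its value is parsed as an IP address before use. The function name and the network list are placeholders (203.0.113.0/24 is a documentation range, not a Cloudflare range); a real deployment would load Cloudflare's published ranges instead.

```python
import ipaddress

# Placeholder only: replace with the proxy's published address ranges.
TRUSTED_PROXIES = [ipaddress.ip_network("203.0.113.0/24")]


def client_ip(remote_addr, headers, trusted=TRUSTED_PROXIES):
    """Return the best-known client IP as a string."""
    peer = ipaddress.ip_address(remote_addr)
    if not any(peer in net for net in trusted):
        # Not from the proxy: ignore the header and keep the TCP peer address.
        return str(peer)
    forwarded = headers.get("CF-Connecting-IP", "")
    try:
        # Parsing as an IP rejects injected markup or SQL hidden in the header.
        return str(ipaddress.ip_address(forwarded.strip()))
    except ValueError:
        return str(peer)
```

In a Flask view this could be called as client_ip(request.remote_addr, request.headers). It narrows the header-spoofing vector only; the firewall advice above still applies.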

Python Sockets: gethostbyaddr : Reverse DNS Lookup Failure

I've been having a problem getting the host name when using socket.gethostbyaddr(ip_addr) on specific sites.
I will not go into detail about which site this is not working for.
Getting the host by name works fine for every site I've tried so far, but when I try to get the site name from the IP, I get an error saying host not found.
A fix or an alternative would be nice, so that I have complete data. If there is no fix, I can simply leave out the host name. No biggie. Thanks for the help.
# not full code
import socket
hostip = socket.gethostbyname(hostname)  # forward lookup: name -> IP
print(socket.gethostbyaddr(hostip))  # reverse lookup: IP -> name
Error: socket.herror: [Errno 11004] host not found
Not every IP address has reverse DNS. Sometimes this is on purpose, sometimes it's because you're looking at an internal address and there's no need for it inside the network so it wasn't worth setting up, sometimes someone just screwed up.
Why would anyone do this on purpose? Most commonly, because multiple domain names map to the same IP address.
For example, a shared hosting site might map sites for three of its customers, www.foo.com, www.bar.com, and www.baz.com, all to 1.2.3.4. HTTP gives the server the requested host name in a Host: header, so it can figure out which site your browser wanted to go to. But outside of HTTP (or some other higher-level protocol), there's no way to figure out which of the three names you meant with 1.2.3.4. So there's nothing they can provide that would be useful to you. There may also be a name like shared_1234.hostingcompany.com which is useful to their own IT people, in which case they might provide that, but otherwise they won't bother with any reverse DNS.
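Since a missing PTR record is a normal condition, one way to keep the data complete is to treat the failure as "no name" instead of letting the exception propagate. A minimal Python 3 sketch follows; the function name and the injectable resolver parameter are illustrative choices, not anything the socket module prescribes.

```python
import socket


def reverse_lookup(ip, resolver=socket.gethostbyaddr):
    """Return the primary reverse-DNS name for ip, or None if there is none."""
    try:
        hostname, _aliases, _addresses = resolver(ip)
    except (socket.herror, socket.gaierror):
        # herror: no PTR record ("host not found").
        # gaierror: the address could not be parsed or resolved.
        return None
    return hostname
```

The resolver argument defaults to the real lookup and exists only so the function can be exercised without network access.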

Alternate host/IP for python script

I want my Python script to access a URL through an IP specified in the script instead of through the default DNS for the domain. Basically I want the equivalent of adding an entry to my /etc/hosts file, but I want the change to apply only to my script instead of globally on the whole server. Any ideas?
Whether this works or not will depend on whether the far-end site is using HTTP/1.1 name-based virtual hosting or not.
If they're not, you can simply replace the hostname part of the URL with their IP address, per @Greg's answer.
If they are, however, you have to ensure that the correct Host: header is sent as part of the HTTP request. Without that, a virtual hosting web server won't know which site's content to give you. Refer to your HTTP client API (Curl?) to see if you can add or change default request headers.
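With Python 3's urllib.request, for instance, the header can be set on the request object. This is a sketch under the assumption of plain HTTP; the function name and the example address and hostname are placeholders. For HTTPS, certificate verification against an IP-based URL would typically fail, and that case is not handled here.

```python
import urllib.request


def request_via_ip(ip, hostname, path="/"):
    """Build a request that connects to ip but asks for hostname's site."""
    url = "http://%s%s" % (ip, path)
    # An explicit Host header replaces the one urllib would derive from the URL.
    return urllib.request.Request(url, headers={"Host": hostname})


req = request_via_ip("127.0.0.1", "www.example.com", "/index.html")
# urllib.request.urlopen(req) would then send "Host: www.example.com" to 127.0.0.1.
```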
You can use an explicit IP number to connect to a specific machine by embedding that into the URL: http://127.0.0.1/index.html is equivalent to http://localhost/index.html
That said, it isn't a good idea to use IP numbers instead of DNS entries. IPs change a lot more often than DNS entries, meaning your script has a greater chance of breaking if you hard-code the address instead of letting it resolve normally.
