MOD_WSGI : "SERVER_NAME"( of php) equivalency in Python or WSGI? - python

Suppose there are 100 domain names pointed to my server's IP address. I have not set up VirtualHosts for these domains, and I do not plan to.
What I want to do is simply extract what comes before the .com. So if the domain name is 111.com, all I want is the "111" value.
My index page is a WSGI script, which means that when requests for these domains arrive at my server IP, I can simply extract the value before the .com and, depending on that value, return different content.
That would eliminate the need to create VirtualHosts via Apache and httpd.conf.
Is this a viable method?
If so, I would also like to know the Python/WSGI equivalent of PHP's SERVER_NAME, so that I can extract the domains that point to my server IP.

The environ argument passed to the WSGI application handler contains the items normally found in $_SERVER in PHP, including a "SERVER_NAME" item.
print(environ['SERVER_NAME'], file=sys.stderr)
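As a rough illustration of the idea in the question, here is a minimal sketch of a WSGI application that picks content based on the label before ".com". The `site_key` helper and the `CONTENT` mapping are hypothetical names made up for this example. The sketch also assumes that the client's Host header (exposed as `HTTP_HOST`) is the safer thing to read first, since depending on server configuration `SERVER_NAME` may hold the configured server name rather than the host that was requested.

```python
# Hypothetical per-domain content table; keys are the part before ".com".
CONTENT = {
    "111": "Content for site 111",
    "222": "Content for site 222",
}


def site_key(environ):
    """Return the part of the requested host before '.com', or None."""
    # Prefer the Host header sent by the client; fall back to SERVER_NAME.
    host = environ.get("HTTP_HOST") or environ.get("SERVER_NAME", "")
    # Drop an optional ":port" suffix and normalise case.
    host = host.split(":", 1)[0].lower()
    if host.endswith(".com"):
        return host[: -len(".com")]
    return None


def application(environ, start_response):
    key = site_key(environ)
    if key in CONTENT:
        status, text = "200 OK", CONTENT[key]
    else:
        status, text = "404 Not Found", "Unknown site"
    body = text.encode("utf-8")
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

The Host header is supplied by the client, so the extracted value should be treated as untrusted input and only used as a lookup key, as above.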

Related

Get gTLD or ccTLD from IP address

There are many questions on SO related to fetching an IP address from URL, but not vice versa.
As the title suggests, I would like to get the website URL of its respective IP address. For instance:
>>> import socket
>>> print(socket.gethostbyname('google.com'))
This looks up the domain and returns 172.217.20.14. I am looking for the counterpart, something like:
>>> print(socket.getnamebyhost('172.217.20.14'))
Anything similar that would return the domain as google.com for the IP specified.
Is this possible to do in python3?
If yes, how can this be achieved?
UPDATE
Unfortunately, the way I'm approaching this is wrong. There are IPs that share a one-to-many relationship i.e. the nameserver points to numerous urls, unless the PTR record indicates otherwise. My question rephrased:
How do IP-to-domain data providers like ipinfo.io return
top-level domains for a single IP?
To my understanding, the A or AAAA records play an important role, but the only thing I get from these are ns rather than the domain. I don't know how to extract the gTLD or ccTLD from the records. I'm open to any suggestions, if anyone is willing to share an answer on how to parse gTLD(s) or ccTLD(s) from any IP. Preferably in python, but a shell script would also suffice.
socket.gethostbyaddr('172.217.20.14') would be the right way to go here, but it will not necessarily return the domain you expect. Here's why.
Domain-to-IP resolution goes roughly like this:
domain > root server > origin server > origin server's hostname-to-IP configuration.
To reverse-engineer it, we have to take into account:
There can be multiple domains sharing the same IP address, as is the case with shared hosting.
Assuming the domain has a dedicated IP, nslookup or gethostbyaddr 'should' return the domain name, but there can be proxy servers in front, such as Cloudflare or whatever Google is using.
So even if you try to do this manually, you cannot find the actual IP that Google's server is running on, as exposing it would open their central server to all kinds of attacks, most importantly DDoS.
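For reference, the reverse lookup mentioned above can be sketched as follows. The helper names `reverse_lookup` and `tld_of` are invented for this example. The result is whatever PTR record the IP's owner has published, which, for the reasons given above, is often not the site's domain. Taking the last label as the TLD is a simplification that ignores multi-label suffixes such as co.uk.

```python
import socket


def reverse_lookup(ip):
    """Return the hostname from the PTR record for ip, or None if unavailable."""
    try:
        hostname, _aliases, _addresses = socket.gethostbyaddr(ip)
    except OSError:
        # socket.herror and socket.gaierror are both subclasses of OSError.
        return None
    return hostname


def tld_of(hostname):
    """Return the last dot-separated label of hostname (a naive TLD guess)."""
    return hostname.rstrip(".").rsplit(".", 1)[-1].lower()


if __name__ == "__main__":
    name = reverse_lookup("172.217.20.14")
    if name is None:
        print("no PTR record found")
    else:
        print(name, tld_of(name))
```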

Scrapy - use multiple IP Addresses for a host

Wasn't able to find anything in the docs/SO relating to my question.
So basically I'm crawling a website with 8 or so subdomains.
They are all using Akamai/CDN.
My question is: if I can find the IPs of a few different Akamai data centres, can I somehow explicitly say that this subdomain should use this IP for the host name, and so on? In other words, can I override the automatic DNS resolution?
This would allow greater efficiency, and I would imagine I'd be less likely to be throttled, as I'd be distributing the crawling.
Thanks
You can just set your DNS names manually in your hosts file. On Windows this can be found at C:\Windows\System32\Drivers\etc\hosts, and on Linux at /etc/hosts.

can WSGI get the full URL rather than simply : environ['SERVER_NAME'] ?.. if so.. Mod_Rewrite alternative?

Currently, environ['SERVER_NAME'] can get the domain name, such as
domain.tld
or
sub.domain.tld
but can it also get
domain.tld/file/in/question.extension
as well?
If that is possible, can this somehow replace mod_rewrite and do something like the following? Suppose the URL is
domain.tld/01
We split this at "/", and if the first segment starts with 0, it must be something that we are looking for. We then use the number after the '0' to pull out variable content.
Can this replace mod_rewrite?
I do not like touching mod_rewrite because of the complexity of its regexes.
It cannot really replace mod_rewrite if the intent is that the rewritten URL be fed back in as a sub request into Apache as a whole with request content still available. All you can really do is perform URL rewriting for use of the URL and request within a specific Python WSGI application.
The qualification in the first sentence about request content is there because, when using daemon mode of mod_wsgi, it is possible to return a 200 response with no content and a Location response header, where the value of the Location header is a server relative URL. This will cause a new sub request to be performed at the Apache level, but in doing that it can only issue a GET request with no request content.
The 200/Location response can thus be used as a way for a WSGI application to reroute a request the WSGI application handles back to Apache to serve up a different resource, such as a static file or even a PHP application. This is by no means though a general mechanism that could be used to replace mod_rewrite and would only be suitable in certain use cases.
And on the specific point of access to other parts of the original request information, you have:
SERVER_NAME
SCRIPT_NAME
PATH_INFO
See for example URL reconstruction in the WSGI specification at:
https://www.python.org/dev/peps/pep-3333/#url-reconstruction
Under Apache/mod_wsgi you also have REQUEST_URI, but that can be complicated to deal with, as it is not decoded or normalised and can be either a server relative URL or, technically, a full URI including scheme and host/port information.
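To make this concrete, here is a minimal sketch that rebuilds the URL from the WSGI environ, following the URL reconstruction recipe in PEP 3333, and then applies the "starts with 0" rule from the question to PATH_INFO. The function names `reconstruct_url` and `content_id_from_path` are made up for this example, and the rule itself is only one reading of what the question describes.

```python
from urllib.parse import quote


def reconstruct_url(environ):
    """Rebuild the request URL from a WSGI environ (after PEP 3333)."""
    scheme = environ["wsgi.url_scheme"]
    url = scheme + "://"
    if environ.get("HTTP_HOST"):
        url += environ["HTTP_HOST"]
    else:
        url += environ["SERVER_NAME"]
        default_port = "443" if scheme == "https" else "80"
        if environ["SERVER_PORT"] != default_port:
            url += ":" + environ["SERVER_PORT"]
    url += quote(environ.get("SCRIPT_NAME", ""))
    url += quote(environ.get("PATH_INFO", ""))
    if environ.get("QUERY_STRING"):
        url += "?" + environ["QUERY_STRING"]
    return url


def content_id_from_path(path_info):
    """Return the number after a leading '0' in the first path segment, else None."""
    first_segment = path_info.strip("/").split("/")[0]
    if first_segment.startswith("0") and first_segment[1:].isdigit():
        return int(first_segment[1:])
    return None
```

Inside an application, `content_id_from_path(environ.get("PATH_INFO", ""))` could then be used to choose which content to return, without any mod_rewrite rules, as long as the routing only needs to happen within that one WSGI application.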

How is the django's HttpRequest.META dictionary populated?

How is Django's HttpRequest.META dictionary populated? Do all the keys and values come from the headers of the HTTP request sent by the client? If so, must I assume that all of these values can be modified by the client?
I am asking because I can't find most of the keys in the headers that are displayed in my Chrome debugging console. And some of those keys are definitely not the client's business, for example the username of a user logged in via Shibboleth. It makes no sense to me why this kind of data would be sent first from the server to the client and then back to the server via the HTTP request.
Most of request.META comes from the script's environment; cf. the django.core.handlers.wsgi.WSGIRequest class initializer. I'm talking about the WSGI handler only here, but AFAICT it's currently the only concrete handler subclass, and all other deployment options end up using WSGI one way or another (cf. django.core.servers.fastcgi and django.core.servers.basehttp).
IOW: what you get in request.META depends on what the calling script passed in, which depends on the front server etc.
I believe you are correct, and the data should never be trusted. The only thing between you and the client is the server, e.g. nginx, which might modify the headers (for example, by allowing only a certain size). But I could be wrong :)
I ran into the same problem when I tried to add custom request.META keys in Django.
From the official documentation:
With the exception of CONTENT_LENGTH and CONTENT_TYPE, as given above, any HTTP headers in the request are converted to META keys by converting all characters to uppercase, replacing any hyphens with underscores and adding an HTTP_ prefix to the name. So, for example, a header called X-Bender would be mapped to the META key HTTP_X_BENDER.
Note that runserver strips all headers with underscores in the name, so you won’t see them in META. This prevents header-spoofing based on ambiguity between underscores and dashes both being normalizing to underscores in WSGI environment variables. It matches the behavior of Web servers like Nginx and Apache 2.4+.
That means a header named "new_meta" would be converted to "HTTP_NEW_META" (although, per the note above, servers that strip headers containing underscores will drop it, so "New-Meta" is the safer spelling), and a header named "new" would be converted to "HTTP_NEW".
By the way, the other keys in request.META without the "HTTP_" prefix come from the server environment; you can find them by running export on your server host.
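The naming rule quoted from the documentation can be sketched as a small function. This is only an illustration of the rule as described, not Django's actual code, and `header_to_meta_key` is a name invented for this example.

```python
def header_to_meta_key(header_name):
    """Map an HTTP header name to the META key described in the Django docs."""
    key = header_name.upper().replace("-", "_")
    # CONTENT_LENGTH and CONTENT_TYPE are the documented exceptions: no prefix.
    if key in ("CONTENT_LENGTH", "CONTENT_TYPE"):
        return key
    return "HTTP_" + key
```

For example, `header_to_meta_key("X-Bender")` gives `"HTTP_X_BENDER"`, matching the documentation's own example. It also shows why "X-Bender" and "X_Bender" would collide on the same key, which is the ambiguity that stripping underscored headers is meant to prevent.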

Alternate host/IP for python script

I want my Python script to access a URL through an IP specified in the script instead of through the default DNS for the domain. Basically I want the equivalent of adding an entry to my /etc/hosts file, but I want the change to apply only to my script instead of globally on the whole server. Any ideas?
Whether this works or not will depend on whether the far end site is using HTTP/1.1 name-based virtual hosting or not.
If they're not, you can simply replace the hostname part of the URL with their IP address, per @Greg's answer.
If they are, however, you have to ensure that the correct Host: header is sent as part of the HTTP request. Without that, a virtual hosting web server won't know which site's content to give you. Refer to your HTTP client API (Curl?) to see if you can add or change default request headers.
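As a sketch of the Host header approach using only the standard library, the request below targets an IP address directly while naming the intended site in the Host header. The IP 203.0.113.10 (a documentation-only address), the hostname example.com, and the helper `build_request` are placeholders for this example. It assumes plain HTTP; with HTTPS, certificate checks and SNI are tied to the hostname, so this simple substitution would not be enough.

```python
import urllib.request


def build_request(ip, hostname, path="/"):
    """Build a request that connects to ip but asks for hostname's content."""
    url = "http://{}{}".format(ip, path)
    # An explicit Host header tells a name-based virtual host which site is wanted.
    return urllib.request.Request(url, headers={"Host": hostname})


req = build_request("203.0.113.10", "example.com", "/index.html")
# To actually send it: urllib.request.urlopen(req, timeout=10)
```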
You can use an explicit IP number to connect to a specific machine by embedding that into the URL: http://127.0.0.1/index.html is equivalent to http://localhost/index.html
That said, it isn't a good idea to use IP numbers instead of DNS entries. IPs change a lot more often than DNS entries, meaning your script has a greater chance of breaking if you hard-code the address instead of letting it resolve normally.
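If a script-local equivalent of an /etc/hosts entry is still wanted, one commonly used trick is to wrap `socket.getaddrinfo` inside the script. The sketch below assumes that the HTTP client being used resolves names through `socket.getaddrinfo` (as the standard library's `http.client` and `urllib` do); the `HOST_OVERRIDES` table and its contents are placeholders.

```python
import socket

# Hypothetical per-script "hosts file": hostname -> IP address to connect to.
HOST_OVERRIDES = {
    "example.test": "127.0.0.1",
}

_original_getaddrinfo = socket.getaddrinfo


def getaddrinfo_with_overrides(host, *args, **kwargs):
    """Resolve overridden hostnames to their fixed IPs; defer otherwise."""
    return _original_getaddrinfo(HOST_OVERRIDES.get(host, host), *args, **kwargs)


# Applies to this process only; the system resolver configuration is untouched.
socket.getaddrinfo = getaddrinfo_with_overrides
```

Because the URL still contains the original hostname, the Host header sent by the client stays correct, which matters for name-based virtual hosts.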
