Get in Python what web server is used by a website

How can I find out whether a website is using Apache, nginx, or another server, and get this information in Python? Thanks in advance

This information, if available, is given in the headers of the response to an HTTP request. With Python you can perform HTTP requests using the requests module.
Make a simple GET request to the site you are interested in, then print the headers attribute of the returned object.
import requests
r = requests.get(YOUR_SITE)  # YOUR_SITE is the URL of the site to inspect
print(r.headers)
The output is a dictionary of header names and values; look for the Server key:
server = r.headers['Server']
Be aware that not all websites return this information, for several reasons, so this key may be missing from the response headers.
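A minimal sketch that handles the missing-header case (example.com is just a placeholder):
import requests
r = requests.get('https://example.com')
# Use .get() so a missing Server header returns a default instead of raising KeyError.
server = r.headers.get('Server', 'unknown')
print(server)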

Related

Freedom of information act API. API key error

I am having some trouble running the Freedom of Information Act API in Python. I am sure it is related to how I am implementing my API key, but I am uncertain as to where I am dropping the ball. Any help is greatly appreciated.
import requests
apikey= ''
api_base_url = f"https://api.foia.gov/api/webform/submit"
endpoint = f"{api_base_url}{apikey}"
r = requests.get(endpoint)
print(r.status_code)
print(r.text)
The error I receive is requests.exceptions.InvalidSchema: No connection adapters were found for this website. Thanks again
According to the documentation, the API requires the API key to be passed as a request header ("X-API-Key"). Your Python code appears to be simply concatenating the API key and the URL.
The following Q&A explains how to set a request header using requests.
Using headers with the Python requests library's get method
It would be something like this:
import requests
apikey= ...
api_base_url = ...
r = requests.get(api_base_url,
                 headers={"X-API-Key": apikey})
print(r.status_code)
print(r.text)
Note that the documentation for the FOIA site explains what you need to do to submit a FOIA request form. It is significantly different from what your Python code is apparently trying to do. I would advise you to read the documentation. Also read the manual entry for the curl command so that you understand the requests that the examples show.
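Purely as a hedged sketch of the header mechanics, not of the real form schema (the payload below is a placeholder; the actual field names come from the documentation):
import requests

apikey = 'YOUR_API_KEY'
api_base_url = 'https://api.foia.gov/api/webform/submit'

# The key travels in the X-API-Key header; the form itself is POSTed,
# this being a submission endpoint. The payload fields are hypothetical.
r = requests.post(api_base_url,
                  headers={'X-API-Key': apikey},
                  json={'placeholder_field': 'value'})
print(r.status_code)
print(r.text)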

Unable to get complete source code of web page using Python [duplicate]

I would like to send a GET request to this website:
requests.get('https://rent.591.com.tw')
and I always get
<Response [404]>
I know this is a common problem and have tried different approaches, but I still fail.
All other websites work fine, though.
Any suggestions?
Webservers are black boxes. They are permitted to return any valid HTTP response, based on your request, the time of day, the phase of the moon, or any other criteria they pick. If another HTTP client gets a different response, consistently, try to figure out what the differences are in the request that Python sends and the request the other client sends.
That means you need to:
Record all aspects of the working request
Record all aspects of the failing request
Try out what changes you can make to make the failing request more like the working request, and minimise those changes.
I usually point my requests at an http://httpbin.org endpoint, have it record the request, and then experiment.
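For example, a minimal sketch of that kind of experiment, using httpbin's /headers endpoint to echo back what requests sends by default:
import requests

# httpbin echoes the request headers back as JSON, so you can see
# exactly what requests sent without guessing.
r = requests.get('https://httpbin.org/headers')
print(r.json()['headers'])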
For requests, there are several headers that are set automatically, and many of these you would not normally expect to have to change:
Host; this must be set to the hostname you are contacting, so that it can properly multi-host different sites. requests sets this one.
Content-Length and Content-Type, for POST requests, are usually set from the arguments you pass to requests. If these don't match, alter the arguments you pass in to requests (but watch out with multipart/* requests, which use a generated boundary recorded in the Content-Type header; leave generating that to requests).
Connection: leave this to the client to manage
Cookies: these are often set on an initial GET request, or after first logging into the site. Make sure you capture cookies with a requests.Session() object and that you are logged in (supply the credentials the same way the browser did).
Everything else is fair game but if requests has set a default value, then more often than not those defaults are not the issue. That said, I usually start with the User-Agent header and work my way up from there.
In this case, the site is filtering on the user agent. It looks like they are blacklisting Python; setting it to almost any other value already works:
>>> requests.get('https://rent.591.com.tw', headers={'User-Agent': 'Custom'})
<Response [200]>
Next, you need to take into account that requests is not a browser. requests is only an HTTP client; a browser does much, much more. A browser parses HTML for additional resources such as images, fonts, styling and scripts, loads those additional resources too, and executes scripts. Scripts can then alter what the browser displays and load additional resources. If your requests results don't match what you see in the browser, but the initial request the browser makes matches, then you'll need to figure out what other resources the browser has loaded and make additional requests with requests as needed. If all else fails, use a project like requests-html, which lets you run a URL through an actual, headless Chromium browser.
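For completeness, a minimal requests-html sketch might look like this; note that render() downloads a Chromium build on first use, and this is an illustration, not necessarily something this particular site needs:
from requests_html import HTMLSession

session = HTMLSession()
r = session.get('https://rent.591.com.tw')
r.html.render()            # runs the page's JavaScript in headless Chromium
print(r.html.html[:500])   # start of the rendered page source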
The site you are trying to contact makes an additional AJAX request to https://rent.591.com.tw/home/search/rsList?is_new_list=1&type=1&kind=0&searchtype=1&region=1; take that into account if you are trying to scrape data from this site.
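A hedged sketch of hitting that endpoint directly, combining it with the User-Agent workaround above; whether the response is JSON, and what other headers the server checks, are assumptions to verify in your browser's network tab:
import requests

url = ('https://rent.591.com.tw/home/search/rsList'
       '?is_new_list=1&type=1&kind=0&searchtype=1&region=1')
r = requests.get(url, headers={'User-Agent': 'Custom'})
print(r.status_code)
print(r.text[:500])  # inspect the raw body first; r.json() is an assumption until verified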
Next, well-built sites will use security best-practices such as CSRF tokens, which require you to make requests in the right order (e.g. a GET request to retrieve a form before a POST to the handler) and handle cookies or otherwise extract the extra information a server expects to be passed from one request to another.
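As a hedged illustration of that request-ordering flow (the URLs and the csrf_token field name are hypothetical; real sites embed the token in their own way):
import re
import requests

s = requests.Session()
# First GET the page with the form, so the server sets its cookies
# and embeds a CSRF token in the HTML.
page = s.get('https://example.com/form')
match = re.search(r'name="csrf_token" value="([^"]+)"', page.text)
token = match.group(1) if match else None
# Then POST through the same session, echoing the token back.
r = s.post('https://example.com/submit',
           data={'csrf_token': token, 'field': 'value'})
print(r.status_code)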
Last but not least, if a site is blocking scripts from making requests, they are probably either trying to enforce terms of service that prohibit scraping, or they have an API they'd rather have you use. Check for either, and take into consideration that you might be blocked more effectively if you continue to scrape the site anyway.
One thing to note: I was using requests.get() to do some web scraping off links I was reading from a file. What I didn't realise was that the links had a trailing newline character (\n) when I read each line from the file.
If you're getting links from a file rather than from a Python string, make sure to strip any \r or \n characters before you call requests.get("your link"). In my case, I used:
with open("filepath") as file:  # open for reading; 'w' mode would truncate the file
    links = file.read().splitlines()  # splitlines() already strips the trailing \n / \r
for link in links:
    response = requests.get(link)
In my case this was due to the fact that the website address had recently changed and I had been given the old address. At least this changed the status code from 404 to 500, which, I think, is progress :)

Using python, Is it possible to directly send form data to a website server and receive response without using a browser?

I took a programming class in Python, so I know the basics of the language. A project I'm currently attempting involves submitting a form repeatedly until the request is successful. To achieve success faster, I thought cutting the browser out of the process by directly sending and receiving data from the server would be quicker. Also, the website I'm creating the program for has a tendency to crash, but I'm pretty sure I could still send requests to and receive responses from the server. Currently, I'm just researching different resources I could use to complete the task. I understand mechanize makes it easy to fill forms and submit them, but it requires a browser. So my question is: what would be the best resource to use within Python to communicate directly with the server, without a browser?
I apologize if any of my knowledge is flawed. I did take the class but I'm still relatively new to the language.
Yes, there are plenty of ways to do this, but the easiest is the third-party library requests.
With that installed, you can do for example:
requests.post("https://mywebsite/path", data={"key": "value"})
You can try this below.
from urllib.parse import urlencode
from urllib.request import Request, urlopen
url = 'https://httpbin.org/post' # Set destination URL here
post_fields = {'foo': 'bar'} # Set POST fields here
request = Request(url, urlencode(post_fields).encode())
json = urlopen(request).read().decode()
print(json)
I see from your tags that you've already decided to use requests.
Here's how to perform a basic POST request using requests:
Typically, you want to send some form-encoded data, much like an HTML form. To do this, simply pass a dictionary to the data argument. Your dictionary of data will automatically be form-encoded when the request is made.
import requests
payload = {'key1': 'value1', 'key2': 'value2'}
response = requests.post("http://httpbin.org/post", data=payload)
print(response.text)
I took this example from the official requests documentation.
I suggest you read it and also try the other examples available, to become more confident and decide which approach best suits your task.

Python requests - saving cookie for later url usage

I've been trying to get a cookie and post it to a URL for later use in the program, but I can't seem to get the cookie parameters to work.
Right now I have
response = requests.get("url")
But how exactly do I retrieve cookies from this URL and post them to a new URL (the same cookies)? The tutorial in requests is somewhat vague on the topic and gives examples I cannot test. I hope someone can help with further examples.
This is Python 2.7, by the way.
You want to use a session:
s = requests.session()
response = s.get('url')
You use the session just like the requests module (it has the same methods), but it'll retain cookies for you and send them along on future requests.
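A minimal sketch against httpbin, whose /cookies/set and /cookies endpoints exist for exactly this kind of test:
import requests

s = requests.session()
# The server sets a cookie on this response; the session stores it.
s.get('http://httpbin.org/cookies/set/sessionid/12345')
# The same session automatically sends the stored cookie back.
r = s.get('http://httpbin.org/cookies')
print(r.json())  # {'cookies': {'sessionid': '12345'}}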

shopify xml request from GAE python

I'm trying to make requests to the shopify.com API from GAE Python.
The URL I have to request is not formed in the usual way.
It is composed like http://apikey:password@hostname/admin/resource.xml
With urllib I can request it, but I can't set the headers for an XML request, so it doesn't work.
urllib2, httplib, ... are having problems with the ':'.
I get either 'nodename nor servname provided, or not known' or a 'nonnumeric port' error, because they expect a port number after the colon.
Any help?
Look into how to do HTTP Basic authentication in Python. See especially the section on Doing it Properly.
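As a hedged urllib2 sketch of that pattern, splitting the credentials out of the URL (the hostname and credentials are placeholders):
import urllib2

# Register the credentials with a password manager instead of embedding
# apikey:password in the URL itself, which is what trips up urllib2.
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, 'https://hostname/', 'apikey', 'password')
opener = urllib2.build_opener(urllib2.HTTPBasicAuthHandler(password_mgr))

request = urllib2.Request('https://hostname/admin/resource.xml',
                          headers={'Accept': 'application/xml'})
response = opener.open(request)
print(response.read())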
