How to check if a domain has an SSL certificate in Python?

So, I have a problem and 2 hours of research haven't helped.
I want to write a method that returns a Boolean indicating whether a web domain (e.g. "google.com", without http/s) has an SSL certificate.
I have a big CSV of domain names that I need to process, checking whether each domain has an SSL certificate. Is there perhaps a method for this in Python's ssl module?
Thank you for your help, TomiiPomii.

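Using the ssl module, you can call ssl.get_server_certificate((host, port)) to retrieve the certificate of the specified host in PEM format. You then have to parse the PEM data to read the certificate's values, but if you only care whether the website has a certificate at all, check whether the call succeeds: it returns the PEM string on success and raises an exception (for example ssl.SSLError or a socket error) if the TLS connection cannot be established.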
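A minimal sketch of such a Boolean check, assuming that only the presence of a certificate matters and not whether it is valid or trusted. The function name has_ssl_certificate, the default port and the timeout are illustrative choices, not part of the answer above. It uses a plain socket plus SSLContext.wrap_socket instead of ssl.get_server_certificate so that a timeout can be set on older Python 3 versions as well.

```python
import socket
import ssl


def has_ssl_certificate(host, port=443, timeout=5.0):
    """Return True if `host` completes a TLS handshake and presents a certificate."""
    # Verification is switched off on purpose: the question is only whether
    # a certificate is served, not whether it is valid for this host.
    context = ssl.create_default_context()
    context.check_hostname = False  # must be disabled before setting CERT_NONE
    context.verify_mode = ssl.CERT_NONE
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            with context.wrap_socket(sock, server_hostname=host) as tls:
                # DER bytes if the peer sent a certificate, otherwise None
                return tls.getpeercert(binary_form=True) is not None
    except (OSError, ValueError):
        # DNS failure, refused connection, timeout, handshake error
        # (ssl.SSLError is an OSError) or a malformed host name
        return False
```

For a CSV of domains, the function can be called once per row (for example while iterating over csv.reader); a short timeout keeps unreachable hosts from stalling the run.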

Related

Python Request: SSL Verify

I am using the Python requests module to call a REST API. I have to use SSL for security.
I see that I can set
requests.get(url, verify='/path/ca/bundle/')
However, I am confused about what needs to be passed as CA_BUNDLE.
I get the server certificate using
cert = ssl.get_server_certificate((server, port))
Can someone tell me how I should use this certificate in my request? Should I convert the cert to an X509/.pem/.der/.crt file?

Solved it. Apparently I needed to get the entire certificate chain and create a CA bundle out of it.

Trying to extract Certificate information in Python

I am very new to Python and cannot seem to figure out how to accomplish this task. I want to connect to a website and extract certificate information such as the issuer and the expiration dates.
I have looked all over and tried all kinds of steps, but because I am new I am getting lost in the sockets, wrappers etc.
To make matters worse, I am in a proxy environment and it seems to really complicate things.
Does anyone know how I could connect and extract the information while behind the proxy?

As explained in this Answer:
You can still get the server certificate with the
ssl.get_server_certificate() function, but it returns it in PEM
format.
import ssl
print(ssl.get_server_certificate(('server.test.com', 443)))
From here, I would use M2Crypto or OpenSSL to read the cert and get values:
# M2Crypto
import M2Crypto
cert = ssl.get_server_certificate(('www.google.com', 443))
x509 = M2Crypto.X509.load_cert_string(cert)
x509.get_subject().as_text()
# 'C=US, ST=California, L=Mountain View, O=Google Inc, CN=www.google.com'
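If installing M2Crypto is not an option, the standard library can expose some of the same fields. The sketch below assumes a dict shaped like the return value of SSLSocket.getpeercert() on a verified connection (on an unverified connection that dict is empty). summarize_cert is a hypothetical helper name and the sample values are made up for illustration.

```python
import ssl
from datetime import datetime, timezone


def summarize_cert(cert):
    """Extract issuer fields and expiry from a getpeercert()-style dict."""
    # "issuer" is a tuple of RDNs, each RDN a tuple of (name, value) pairs
    issuer = {name: value for rdn in cert.get("issuer", ()) for name, value in rdn}
    # "notAfter" is a string such as "Jan  5 09:34:43 2018 GMT"
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    return {"issuer": issuer, "expires": expires}


# Made-up sample in the shape getpeercert() returns
sample = {
    "issuer": (
        (("countryName", "US"),),
        (("organizationName", "Example CA"),),
        (("commonName", "Example CA R1"),),
    ),
    "notAfter": "Jan  5 09:34:43 2018 GMT",
}
info = summarize_cert(sample)
print(info["issuer"]["commonName"], info["expires"].isoformat())
```

This only covers reading the fields; it does not address the proxy part of the question.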

Python's ssl library doesn't deal with proxies.

NET::ERR_CERT_COMMON_NAME_INVALID - Error Message

I built a website some time ago with Flask. Now all of a sudden when I try to navigate there I get the following:
NET::ERR_CERT_COMMON_NAME_INVALID
Your connection is not private
Attackers might be trying to steal your information from www.mysite.org (for example, passwords, messages, or credit cards). Learn more
Does anyone know what's going on?

The error means that the host name you use in the web browser does not match any of the names present in the subjectAlternativeName extension of the certificate.
If your server has multiple DNS entries, you need to include all of them in the certificate to be able to use them with HTTPS. If you access the server using its IP address, like https://10.1.2.3, then the IP address also has to be present in the certificate (of course, this only makes sense if you have a static IP address that never changes).
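To see which names a certificate actually covers, the subjectAltName entries can be compared with the host name. This is a simplified sketch, assuming a dict in the format returned by SSLSocket.getpeercert(); the helper names are hypothetical and the wildcard handling (a leading "*." matching exactly one label) is a reduced version of what browsers implement, not a full reimplementation.

```python
def san_dns_names(cert):
    """Return the DNS entries from a getpeercert()-style dict."""
    return [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]


def hostname_in_san(hostname, cert):
    """Rough check whether `hostname` is covered by the certificate's DNS SANs."""
    host = hostname.lower().rstrip(".")
    for name in san_dns_names(cert):
        name = name.lower()
        if name == host:
            return True
        # "*.example.org" covers "www.example.org" but not "a.b.example.org"
        if name.startswith("*.") and "." in host:
            if host.split(".", 1)[1] == name[2:]:
                return True
    return False


# Made-up certificate data for illustration
cert = {
    "subjectAltName": (
        ("DNS", "example.org"),
        ("DNS", "*.example.org"),
        ("IP Address", "10.1.2.3"),
    )
}
print(hostname_in_san("www.example.org", cert))
print(hostname_in_san("www.mysite.org", cert))
```

A host name that returns False here is one for which a browser would be expected to report a name mismatch.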

The certificate subject alternative name can be a domain name or IP address. If the certificate doesn’t have the correct subjectAlternativeName extension, users get a NET::ERR_CERT_COMMON_NAME_INVALID error letting them know that the connection isn’t private. If the certificate is missing a subjectAlternativeName extension, users see a warning in the Security panel in Chrome DevTools that lets them know the subject alternative name is missing.
https://support.google.com/chrome/a/answer/7391219?hl=en

For Chrome 58 and later, only the subjectAlternativeName extension, not commonName, is used to match the domain name and site certificate. So, if you are missing the Subject Alternative Name in your certificate then you will experience the NET::ERR_CERT_COMMON_NAME_INVALID error.
In order to have a Subject Alternative Name (SAN) on an SSL certificate, you must first edit your OpenSSL configuration. On Ubuntu/Debian, that can be found at /etc/ssl/openssl.cnf. Find the section of that file with the heading [ v3_ca ] and add the line with your SAN there (each entry needs a type prefix such as DNS: or IP:):
subjectAltName = DNS:www.example.com

Is it safe to disable SSL certificate verification in Python's requests lib?

I'm well aware of the fact that generally speaking, it's not. But in my particular case, I'm writing a simple python web-scraper which will be run as a cron job every hour and I'd like to be sure that it's not a risk to ignore verifying an SSL certificate by setting verify to False.
P.S.
The reason I'm set on disabling this feature is that when I try to make a request with response = requests.get('url'), it raises an SSLError and I don't see how to handle it.
EDIT:
Okay, with the help of sigmavirus24 and others I've finally managed to resolve the problem. Here's the explanation of how I did it:
I ran a test at https://ssllabs.com/ and, according to the report provided by SSLLabs, the SSL error was raised because of the "incomplete certificate chain" issue (for more details on how certificate verification works, read sigmavirus24's answer).
In my case, one of the intermediaries was missing.
I searched for its fingerprint using google and downloaded it in .pem format.
Then I used "certifi" (it's a python package for providing Mozilla's CA Bundle. If you don't have it, you can install it with sudo pip install certifi) to find the root cert (again by its fingerprint). This can be done as follows:
$ ipython
In [1]: import certifi
In [2]: certifi.where()
Out[2]: '/usr/lib/python3.6/site-packages/certifi/cacert.pem'
In [3]: quit
$ emacs -nw /usr/lib/python3.6/site-packages/certifi/cacert.pem
Or in bash you can issue $ emacs -nw $(python -m certifi) to open the cacert.pem file.
I concatenated the two certs into one file and then provided its path to the verify parameter.
Another (simpler but not always possible) way to do this is to download the whole chain from SSLLabs: right in front of the "Additional Certificates (if supplied)" section there's the "Download server chain" button. Click it, save the chain in a .pem file and, when calling requests' get method, provide the file path to the verify parameter.

The correct answer here is "it depends".
You've given us very little information to go on, so I'm going to make some assumptions and list them below (if any of them do not match, then you should reconsider your choice):
You are constantly connecting to the same website in your CRON job
You know the website fairly well and are certain that the certificate-related errors are benign
You are not sending sensitive data to the website in order to scrape it (such as login and user name)
If that is the situation (which I am guessing it is) then it should be generally harmless. That said, whether or not it is "safe" depends on your definition of that word in the context of two computers talking to each other over the internet.
As others have said, Requests does not attempt to render HTML, parse XML, or execute JavaScript. Because it simply retrieves your data, the biggest risk you run is receiving data that cannot be verified as coming from the server you thought it was coming from. If, however, you're using requests in combination with something that does the above, there are a myriad of potential attacks that a malicious man in the middle could use against you.
There are also options that mean you don't have to forgo verification. For example, if the server uses a self-signed certificate, you could get the certificate in PEM format, save it to a file and provide the path to that file to the verify argument instead. Requests would then be able to validate the certificate for you.
So, as I said, it depends.
Update based on Albert's replies
So what appears to be happening is that the website in question sends only the leaf certificate which is valid. This website is relying on browser behaviour that currently works like so:
The browser connects to the website and notes that the site does not send its full certificate chain. It then goes and retrieves the intermediaries, validates them, and completes the connection. Requests, however, uses OpenSSL for validation and OpenSSL does not contain any such behaviour. Since the validation logic is almost entirely in OpenSSL, Requests has no way to emulate a browser in this case.
Further, security tooling (e.g., SSLLabs) has started counting this configuration against a website's security ranking. It's increasingly the opinion that websites should send the entire chain. If you encounter a website that doesn't, contacting them and informing them of that is the best course forward.
If the website refuses to update their certificate chain, then Requests' users can retrieve the PEM encoded intermediary certificates and stick them in a .pem file which they then provide to the verify parameter. Requests presently only includes Root certificates in its trust store (as every browser does). It will never ship intermediary certificates because there are just too many. So including the intermediaries in a bundle with the root certificate(s) will allow you to verify the website's certificate. OpenSSL will have a PEM encoded file that has each link in the chain and will be able to verify up to the root certificate.
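The bundling step described above could look roughly like this. It is a sketch under assumptions: build_ca_bundle is a hypothetical helper, the input files are assumed to already be PEM encoded, and the file names in the usage note are placeholders.

```python
from pathlib import Path

PEM_HEADER = "-----BEGIN CERTIFICATE-----"


def build_ca_bundle(cert_paths, bundle_path):
    """Concatenate PEM files (intermediates, then roots) into one bundle file."""
    parts = []
    for path in cert_paths:
        # utf-8 because CA bundles may carry non-ASCII comments next to the certs
        text = Path(path).read_text(encoding="utf-8").strip()
        if PEM_HEADER not in text:
            raise ValueError(f"{path} does not look like a PEM certificate")
        parts.append(text)
    Path(bundle_path).write_text("\n".join(parts) + "\n", encoding="utf-8")
    return str(bundle_path)
```

The resulting path is what would be handed to the verify parameter, for example requests.get(url, verify=build_ca_bundle(["intermediate.pem", certifi.where()], "bundle.pem")), where intermediate.pem stands for the downloaded intermediary certificate.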

This question is probably more appropriate for https://security.stackexchange.com/.
Effectively, disabling verification is only slightly better than using HTTP instead of HTTPS. The one difference is that, without the server's certificate, someone would have to actively interfere with the connection; apart from that, almost all of the risks of HTTP apply.
Basically, it would be possible for someone to see both the sent and the received data through a man-in-the-middle attack, or if that site had ever been compromised and its certificate stolen. If you are storing cookies for that site, those cookies will be revealed (e.g. for facebook.com, a session token could be stolen); if you are logging in with a username and password, those could be stolen too.
What do you do with that data once you retrieve it? Are you downloading any executable code? Are you downloading something (images that you store on a web server?) where a skilled attacker (even by doing something like modifying the DNS settings on your router) could force you to download a file ("news.php") and store it on your web server, where it could become executable (a .php script instead of a web page)?

From the documentation:
Requests can also ignore verifying the SSL certificate if you set verify to False.
requests.get('https://kennethreitz.com', verify=False)
<Response [200]>
It is 'safe' if you aren't using sensitive information in your request.
You can't put a virus in the HTML itself (as far as I know). JavaScript can be a vulnerability, so it's a great thing that Python doesn't process it.
So all in all, you should be safe.

Python requests send certificate as string

I can't seem to get the handshake working properly.
cert = 'path/to/cert_file.pem'
url = 'https://example.com/api'
requests.get(url, cert=cert, verify=True)
This is fine when I use it locally, where I have the file physically.
We host our application on Heroku and use environment variables.
The requests module doesn't seem to accept certificates as strings, e.g.
$ export CERTIFICATE="long-list-of-characters"
requests.get(url, cert=get_env('CERTIFICATE'), verify=True)
I have also tried something like this:
cert = tempfile.NamedTemporaryFile()
cert.write(CERTIFICATE)
cert.seek(0)
requests.get(url, cert=cert.name, verify=True)
First of all, it works locally but not on Heroku. Anyway, it doesn't feel like a solid solution.
I get an SSL handshake error.
Any suggestions?

Vasili's answer is technically correct, though per se it doesn't answer your question. The keyfile, truly, must be unencrypted to begin with.
I myself have just resolved a situation like yours. You were on the right path; all you had to do was
1. Pass delete=False to NamedTemporaryFile(), so the file wouldn't be deleted after calling close()
2. close() the tempfile before using it, so it would be saved
Note that this is a very unsafe thing to do. delete=False, as I understand, causes the file to stay on disk even after deleting the reference to it. So, to delete the file, you should manually call os.unlink(tmpfile.name).
Doing this with certificates is a huge security risk: you must ensure that the string with the certificate is secured and hidden and nobody has access to the server.
Nevertheless, it is quite a useful practice in case of, for example, managing your app both on a Heroku server as a test environment and in a Docker image built in the cloud, where COPY directives are not an option. It is also definitely better than storing the file in your git repository :D
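The two steps plus the manual cleanup can be wrapped in a small context manager. This is a sketch, not the answer author's exact code: pem_tempfile is a hypothetical name, and the CERTIFICATE environment variable in the usage note is the one from the question.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def pem_tempfile(pem_text):
    """Write `pem_text` to a temporary .pem file and yield its path."""
    # delete=False keeps the file on disk after close(), so that another
    # reader (such as requests) can open it by name
    tmp = tempfile.NamedTemporaryFile(mode="w", suffix=".pem", delete=False)
    try:
        tmp.write(pem_text)
        tmp.close()  # flush everything to disk before the path is used
        yield tmp.name
    finally:
        tmp.close()  # harmless if already closed
        os.unlink(tmp.name)  # manual cleanup, since delete=False
```

Usage would then be along the lines of: with pem_tempfile(os.environ['CERTIFICATE']) as path: requests.get(url, cert=path, verify=True). The file exists only for the duration of the with block, which limits, but does not remove, the exposure described above.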

This is an old question, but since I ended up here and the question wasn't answered I figure I'll point to the solution I came up with for a similar question that can be used to solve the OP's problem.
This can be done by monkey patching requests using this technique.

One simple hack is to use verify=False and not send the certificates at all. This works in most cases and when you are okay with not verifying the connection.

As per the requests documentation:
The private key to your local certificate must be unencrypted. Currently, Requests does not support using encrypted keys.
You can [also] specify a local cert to use as a client-side certificate, as a single file (containing the private key and the certificate) or as a tuple of both files' paths:
requests.get('https://kennethreitz.com', cert=('/path/client.cert', '/path/client.key'))
You must include the path for both public and private key... or you can include the path to a single file that contains both.
