Best way to obtain a secure connection using Python urllib - python

Hope this will be an easy question for someone to answer. I'm in the process of programming an application in Python, and part of the application uses an API to obtain a download link and then uses that link to download the corresponding server updates. I'm currently accomplishing this with urllib.request.urlopen() and would like to do so securely, so I'm wondering whether just specifying https in the URL is enough, or whether I also have to use the context parameter.
The Python documentation is a bit vague about how it handles HTTPS requests, but as I understand it, specifying https in the URL should be sufficient.

In the documentation for urllib.request.urlopen, there is a reference to the http.client.HTTPSConnection class in regards to the context parameter. In the documentation for the HTTPSConnection class, there is a link to security considerations, which states:
For client use, if you don’t have any special requirements for your security policy, it is highly recommended that you use the create_default_context() function to create your SSL context. It will load the system’s trusted CA certificates, enable certificate validation and hostname checking, and try to choose reasonably secure protocol and cipher settings.
Given that the documentation on urllib.request.urlopen() shows that context is an optional parameter, you probably don't HAVE to use it to make secure https connections, but given what the security considerations section says, I would use
ssl.create_default_context()
to generate the context, just as good practice:
urllib.request.urlopen("https://www.stackoverflow.com", context=ssl.create_default_context())
EDIT
Upon reviewing the source code for urllib.request.urlopen: if you use an https URL but don't specify a context, it looks like it will provide a default context for you. When no context is given, urlopen() calls build_opener(), and in THAT function's comments it states
The opener will use several default handlers, including support
for HTTP, FTP and when applicable HTTPS.
So the final answer is that you should be fine providing no context; all it should need is the URL.
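For illustration, here is a minimal sketch of both approaches, assuming Python 3 and using stackoverflow.com only as an example URL:

import ssl
import urllib.request

# Relying on the default handlers that build_opener() installs for https URLs:
with urllib.request.urlopen("https://www.stackoverflow.com") as resp:
    print(resp.status)

# Passing an explicit context from ssl.create_default_context(), as the
# security considerations section recommends:
ctx = ssl.create_default_context()
with urllib.request.urlopen("https://www.stackoverflow.com", context=ctx) as resp:
    print(resp.status)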

Related

How to pass HTTP/HTTPS proxy credentials to boto3?

How can a proxy username & password be passed to boto3 without using environment variables?
There is a similar Stack Overflow question; however, that question and its answer focus on the URL/port specification. I am sitting behind a corporate proxy and need to also specify user credentials, but I am not allowed to put my login credentials in an environment variable.
I will have to read the username/password into memory, but once I have them, where in boto3 do they get passed in?
Thank you in advance for your consideration and response.
Proxy configuration for boto3 is described here.
Passing a username and password is not documented; however, if you look at the underlying code (httpsession.py), it will extract the username and password from a proxy URL like https://username:password@example.com:443 and insert a Proxy-Authorization header using basic auth.
If that works with your company, you should be OK. However, some proxies require a different authorization method, and this will fail.
In that case you will need to discuss your company's exact proxy mechanism with your IT group. They may suggest workarounds, such as running your own proxy to handle authentication, or they may permit you to use a cloud development tool that avoids the use of proxies.
I mention this because your deployment environment -- whether cloud or a local data center -- probably doesn't use an authenticating proxy. Which means that code written with the expectation of such a proxy won't work in a production deployment.
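A hedged sketch of what that could look like, assuming the proxy host and port are placeholders and the username/password have already been read into memory:

import boto3
from botocore.config import Config

# Placeholders -- in practice these would come from a prompt or a secrets store,
# never from source code or environment variables.
username = "user"
password = "secret"

# botocore will pull the credentials out of the proxy URL and send them as a
# basic-auth Proxy-Authorization header, as described above.
proxy_config = Config(
    proxies={
        "https": f"https://{username}:{password}@proxy.example.com:443",
    }
)

s3 = boto3.client("s3", config=proxy_config)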

Is the data the requests library sends secure in python? [duplicate]

I'm about to use Python requests to get data from my own online API to my local PC. My API requires authentication, which for now is done through simply posting user/pass:
params = {'user': 'username', 'pass':'password'}
requests.post(url, params=params)
Are these requests safe, or will they allow a man-in-the-middle to capture that user/pass?
P.S. My API is using a Let's Encrypt SSL certificate. Python version 3.7.0.
This has nothing to do with the python-requests package, but with the HTTP (and HTTPS) protocols. HTTP is plain text, so anyone who manages to sniff your packets can read the content (hence the username/password pair in clear text). HTTPS uses strong encryption, so even someone sniffing your traffic will have a hard time deciphering it. No encryption scheme is 100% safe of course, but decrypting SSL traffic is currently way too costly even for the NSA.
IOW, what will make your requests "safe" is the use of the HTTPS protocol, not which python (or not python) package you use to write your client code.
Use the HTTPS protocol and it's safe, provided you have a valid SSL certificate on your API. If you still feel paranoid/insecure, you can implement end-to-end encryption on top of it using an existing algorithm, or create your own custom scheme.
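As a minimal sketch of the above, with a placeholder URL: requests verifies certificates by default, so an HTTPS URL with a valid (e.g. Let's Encrypt) certificate is enough. Sending the credentials in the request body (data=) rather than the query string (params=) also keeps them out of access logs, which typically record full URLs.

import requests

payload = {"user": "username", "pass": "password"}
# Credentials travel inside the encrypted TLS connection; using the body (data=)
# instead of the query string avoids leaving them in server logs.
response = requests.post("https://api.example.com/login", data=payload)
response.raise_for_status()
print(response.json())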

nodejs or python proxy lib with relative url support

I am looking for a lib that lets me roughly:
connect to localhost:port, but see http://somesite.com
rewrite all static assets to point to localhost:port instead of somesite.com
support cookies / authentication
I know that http://betterinternet.co/ does this already, but they won't give me their source code for some reason.
I assume this doesn't exist as free code, so if I were to write one, are there any nuances to it? If I replace all occurrences of somesite.com in the HTML and headers, will that be enough?
So...you want an http proxy that does link rewriting? Sounds like Apache and mod_proxy_html. It's not written in node or Python, but I think it will do what you're asking.
I don't see any straightforward solution to your problem. If I've understood correctly, you want a caching HTTP proxy which serves static contents locally, with URL rewriting rules defined in Python (or nodejs). That's quite a task.
A caching HTTP proxy implementation is not trivial. So I'd use an existing implementation, such as Squid (or Apache if it does caching too).
You could then place a (relatively) simple HTTP server written in Python in front of that (e.g. based on BaseHTTPServer and urllib2) which performs the URL rewriting as you want them and forwards the requests to the proxy (or direct to internet).
The idea would be to rely on the proxy setup to perform all the processing you don't want to modify (including basic rewrite rules, authentication, caching and cache management) and limit your front-end implementation to performing only the custom rewriting you are interested in.
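As a rough sketch of such a front-end in Python 3 (the answer mentions the Python 2 modules BaseHTTPServer and urllib2; http.server and urllib.request are their modern equivalents), with somesite.com and the local port as placeholders, handling only GET requests and naive link rewriting, and leaving cookies/authentication forwarding to the proxy behind it:

import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

UPSTREAM = "http://somesite.com"   # the site to mirror
LOCAL = b"localhost:8080"          # what links should be rewritten to

class RewritingProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Forward the request path to the upstream site (or to the caching proxy).
        with urllib.request.urlopen(UPSTREAM + self.path) as upstream:
            body = upstream.read()
            content_type = upstream.headers.get("Content-Type", "")
        # Naively rewrite absolute links so static assets point back at us.
        if "text/html" in content_type:
            body = body.replace(b"somesite.com", LOCAL)
        self.send_response(200)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

if __name__ == "__main__":
    HTTPServer(("localhost", 8080), RewritingProxyHandler).serve_forever()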

How to use SSL in Python?

I want to use SSL on Google App Engine. Is there a 3rd-party Python module I must use or can I just use the Google SDK?
Should work just fine out of the box, see:
https://code.google.com/appengine/docs/python/config/appconfig.html#Secure_URLs
"Use" SLL for what? Joachim has answered regarding serving your pages over SSL.
If you want an SSL client, then urlfetch allows https URLs. It gives you no control other than the "validate_certificate" boolean parameter, and I don't immediately see any documentation of what CAs/certificates it trusts. Of course it doesn't support any protocol other than HTTPS, but that's in keeping with the fact that, in general, GAE does not allow free use of sockets.
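For the client side, a minimal sketch with urlfetch (the URL is a placeholder):

from google.appengine.api import urlfetch

# validate_certificate is the only TLS-related knob urlfetch exposes.
result = urlfetch.fetch("https://example.com/api", validate_certificate=True)
if result.status_code == 200:
    print(result.content)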

For user-based and certificate-based authentication, do I want to use urllib, urllib2, or curl?

A few months ago, I hastily put together a Python program that hit my company's web services API. It worked in three different modes:
1) HTTP with no authentication
2) HTTP with user-name and password authentication
3) HTTPS with client certificate authentication
I got 1) to work with urllib, but ran into problems with 2) and 3). Instead of figuring it out, I ended up calculating the proper command-line parameters to curl, and executing it via os.system().
Now I get to re-write this program with the benefit of experience, and I'm not sure if I should use urllib, urllib2, or just stick with curl.
The urllib documentation mentions:
When performing basic authentication, a FancyURLopener instance
calls its prompt_user_passwd() method. The default implementation
asks the users for the required information on the controlling
terminal. A subclass may override this method to support
more appropriate behavior if needed.
It also mentions the **x509 argument to urllib.URLopener():
Additional keyword parameters, collected in x509, may be used for
authentication of the client when using the https: scheme. The
keywords key_file and cert_file are supported to provide an SSL
key and certificate; both are needed to support client
authentication.
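For illustration, the client-certificate usage described by those keywords would look roughly like this (Python 2 urllib; the paths and URL are placeholders):

import urllib

# key_file / cert_file are the **x509 keywords quoted above.
opener = urllib.URLopener(key_file="client.key", cert_file="client.crt")
response = opener.open("https://api.example.com/endpoint")
print(response.read())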
But urllib2 is one greater than urllib, so naturally I want to use it instead. The urllib2 documentation is full of information about authentication handlers that seem to be designed for 2) above, but makes no mention whatsoever of client certificates.
My question: do I want to use urllib, because it appears to support everything I need to achieve? Or should I just stick with curl?
Thanks.
Edit: Maybe my question isn't specific enough, so here's another shot. Can I achieve what I want to do with urllib? Or with urllib2? Or am I forced to use curl out of necessity?
I believe that mechanize is the module you need.
EDIT: mechanize objects have this method for authentication: add_password(self, url, user, password, realm=None)
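A minimal sketch of that, assuming a placeholder URL and credentials:

import mechanize

br = mechanize.Browser()
# add_password(url, user, password, realm=None), as noted above.
br.add_password("https://api.example.com/endpoint", "user", "secret")
response = br.open("https://api.example.com/endpoint")
print(response.read())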
