I built a simple Flask application that receives a POST request and performs some actions after receiving it. Here is my simple code:
@app.route('/<user>/', methods=['POST'])
def Receiver(user):
    # .first() runs the query and returns one User row, or None if no token matches
    Query = User.query.filter_by(token=user).first()
    Content = request.data.decode('UTF-8')
    Data = {'Content': Content, 'Username': Query.Username, 'UserID': Query.UserID}
    return jsonify(Data)
I would like to make this code as safe as possible, but I'm just getting started with Flask and security in general. What dangers do I run with this code? I'm using the variable user to query my database; can it be harmful if that variable is set to an SQL query, for example? What other threats should I consider in this case?
Here are some of my thoughts on your question:
Why is this a POST request and not a GET request? POST requests are meant to change data, GETs are for queries.
You don't validate input data. What happens if the user sends you, e.g., a 100 kB long user name? How will the database handle it? Will it have an impact on performance? Will it allow a DoS attack on the server or database?
Yes, SQL injection too. It is a concern everywhere relational databases are involved.
What if the user ID does not exist? Should we not return 404?
What is actually security? What is safety? The two terms are not interchangeable. Safety is when the code does not harm the world. Security is when the world does not harm the code.
There is a wide variety of things to consider that could impact your code's security (meaning the confidentiality, integrity and availability of the data the code touches) and that are unrelated to your code, like communication channel protection, server misconfigurations, DDoS attacks... Even if your code is perfect, the system holding it might still be insecure.
Just to add to what Marek said, I would also recommend changing to a GET, as long as there's no sensitive information being passed along in the URL. This link nicely explains the differences. It might be a good idea to look at encrypting the URL token string too, so that parameters aren't passed in plain text, as this leaves room for vulnerability.
Alongside this, if the site is to be made live, definitely use SSL encryption.
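To make the validation and 404 points concrete, here is a minimal sketch in plain Python (no Flask needed to run it). The names MAX_TOKEN_LENGTH, TOKEN_PATTERN, USERS and lookup_user are made up for illustration; in the real view the lookup would be the database query and the status code would go out through Flask.

```python
import re

# Hypothetical limits; pick values that match how tokens are really issued.
MAX_TOKEN_LENGTH = 64
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9_-]+")

# Stand-in for the database table (assumption for this sketch).
USERS = {"abc123": {"Username": "alice", "UserID": 1}}


def lookup_user(token):
    """Return (status_code, payload) for a token taken from the URL."""
    # Reject oversized or oddly shaped input before it reaches the database.
    if len(token) > MAX_TOKEN_LENGTH or not TOKEN_PATTERN.fullmatch(token):
        return 400, {"error": "invalid token"}
    user = USERS.get(token)
    if user is None:
        # Unknown token: answer 404 instead of failing with a server error.
        return 404, {"error": "not found"}
    return 200, user
```

Checking the length first means a 100 kB "token" is turned away cheaply, before any pattern matching or database work happens.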
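In terms of SQL validation, you'll need to sanitize the input before it ever reaches the database. In Flask you could simply escape the HTML special characters, and Flask provides its own function for that. This link might help in that regard. Keep in mind that escaping of that kind protects rendered HTML rather than SQL; for the database itself, bound query parameters are the usual defence.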
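On the SQL side, the usual protection is to pass values as bound parameters instead of pasting them into the query string (an ORM call such as SQLAlchemy's filter_by does this for you). Here is a minimal sketch with the standard-library sqlite3 module; the users table and the find_user helper are assumptions for illustration, not part of the original app.

```python
import sqlite3

# In-memory database with a hypothetical users table (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (token TEXT, username TEXT)")
conn.execute("INSERT INTO users VALUES ('abc123', 'alice')")


def find_user(token):
    # The "?" placeholder sends the value separately from the SQL text,
    # so quotes inside `token` cannot change the structure of the query.
    row = conn.execute(
        "SELECT username FROM users WHERE token = ?", (token,)
    ).fetchone()
    return row[0] if row else None
```

With this shape, a value like ' OR '1'='1 is just an unknown token and matches nothing.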
In terms of error handling, I found this tutorial mighty useful. That whole series of blog posts walks you right the way through.
Developing a simple chat system via Django - I've implemented some methods for sending/receiving messages.
For example (this isn't my implementation, just an example from another chat application):
def post(request):
    time.sleep(2)
    if not request.is_ajax():
        return HttpResponse("Not an AJAX request")
    if request.method == 'POST':
        if request.POST['message']:
            message = request.POST['message']
            to_user = request.POST['to_user']
            ChatMessage.objects.create(sender=request.user, receiver=User.objects.get(username=to_user), message=message, session=Session.objects.get(session_key=request.session.session_key))
    return HttpResponse("Not a POST request")
Now that I have the method written, I need to test whether the message is added. At this point I have not written any JavaScript, e.g. for intervals to refresh and wait for messages, etc. Should I go straight into writing the JS, or test this method first to see if it works correctly and then write the JS for it? It sounds like an idiotic question, but I'm finding it hard to understand how I'd go about testing the method...
In my opinion it doesn't matter. You will find many developers who are test-before, and many who are test-after. Implement it in whatever order you prefer. Sometimes, when in haste, it is easier to write a working prototype without any tests first (especially if the interfaces aren't finalized and still change a lot during development; in that case you don't waste time adjusting a lot of tests every time an interface changes). Just don't be too lazy to write the tests once the interfaces are finalized.
What really matters is to have well defined interfaces. From a backend perspective you will have an interface implementation (the view in this case) and one or more users of the interface: both the frontend and the test are users of the backend implementation, but there could be even more. Once you have the interface, it really doesn't matter which one you implement first. By mocking one side of the interface you can implement the other side without having the original version of the mocked side. For example, by mocking/faking server responses with JavaScript code (with pre-baked, possibly constant data) you can write the frontend without any backend code. You can also write the backend first, or the tests for the backend... You decide.
In teams where you have specialized developers (frontend/backend) you can agree on an interface and then, on both the frontend and the backend side, "mock" (fake) the other side of the interface: the frontend guys write some code that emulates the server responses with some fake data, and the backend guys write some tests that emulate the client with some fake requests. This way the frontend and backend teams don't have to wait for each other to finish the code, and both the frontend and the backend are testable alone. Of course, later it is recommended to add end-to-end (e2e) testing that tests the whole stack connected together.
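As a rough illustration of faking the other side of an interface, here is a sketch that tests a backend function against a mocked storage layer using the standard-library unittest.mock. The post_message function and its store argument are hypothetical stand-ins, not the Django view from the question.

```python
from unittest.mock import Mock


def post_message(store, sender, receiver, text):
    """Backend side of a hypothetical interface: validate, then persist."""
    if not text:
        return {"status": 400, "error": "empty message"}
    store.create(sender=sender, receiver=receiver, message=text)
    return {"status": 201}


def test_post_message_stores_message():
    # The mock stands in for the real storage layer (e.g. a model manager).
    store = Mock()
    response = post_message(store, "alice", "bob", "hello")
    assert response == {"status": 201}
    store.create.assert_called_once_with(
        sender="alice", receiver="bob", message="hello"
    )


def test_post_message_rejects_empty_text():
    store = Mock()
    assert post_message(store, "alice", "bob", "")["status"] == 400
    store.create.assert_not_called()
```

Neither test needs a database or a browser, which is the point: each side of the interface can be exercised before the other side exists.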
Again, what really matters most is usually having well defined interfaces and not the code that is written around the interfaces. In crappy systems the problem is usually that you have only code without interfaces... If a system is architecturally well built and the interfaces are well defined then quite a lot of crappy code written around the interfaces can be manageable.
In case of django views that have a well defined interface I usually develop the backend first along with the tests. In your case the django test is super simple: You just create a django test client (https://docs.djangoproject.com/en/1.8/topics/testing/tools/#test-client), post some fake requests with it to simulate the client and then you check whether the db contains the expected objects as a result.
Some additional advice:
Decorate your view with @require_POST
I think you shouldn't use request.is_ajax() to deny responding to the client. request.is_ajax() is usually used to find out what kind of response is needed by the client. If the POST request was sent by a form of an html page then you want to generate another html page as a response. If the request was sent using ajax then you usually want to respond with processable data (json, xml, etc...) instead of html.
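A small sketch of that idea in plain Python: the AJAX check decides the shape of the response instead of rejecting the request. The header test mirrors what Django's request.is_ajax() looked at; render_result and the HTML snippet are hypothetical.

```python
import html
import json


def is_ajax(headers):
    # AJAX libraries conventionally set this header on their requests.
    return headers.get("X-Requested-With") == "XMLHttpRequest"


def render_result(headers, result):
    """Return (content_type, body) suited to the kind of client."""
    if is_ajax(headers):
        # Script clients get data they can process.
        return "application/json", json.dumps(result)
    # A plain form post gets an HTML page back (hypothetical markup).
    body = "<p>Message sent to {}</p>".format(html.escape(result["to_user"]))
    return "text/html", body
```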
Since your method needs a request made by the client and then serves an HttpResponse (on the whole), you can only check that via an AJAX query from the front end. I would recommend building a quick AJAX query, also because the ChatMessage creation involves a session_key, which can only be found when a request is made in the first place.
I am writing a Django app, and am wondering if any client side validation is necessary. Django handles all validation through forms in python on the backend. If something validates wrong, the user is returned to the screen with all their information still there.
I can't see any reason I need to implement client side validation in Django? Is this true? The only reason I can think of is it would save a few hits to the server, but this seems negligible.
If you have a web application that faces the public internet, client-side validation is pretty much a user expectation. You might be able to ignore this if volume is low and people are motivated to use your website.
For a company intranet site, the additional development cost may weigh against client-side validation. However, if you use an available client framework (e.g. jQuery or django-parsley), the additional cost of client-side validation is actually fairly small and likely worth the effort in intranet applications.
ADDED
Yes, as others have already stated, client-side-only validation is very bad, as it is the same as no validation: you can coerce the browser to send whatever you want back to the server.
You can also do lots of nice things client side that you cannot do server side. Sometimes these are closely related to client-side validation.
E.g., limiting a comment to 500 characters. With client side code you can display a characters remaining count on screen -- with a little planning this can be integrated with the validation code.
Client-side validation may improve user experience (fewer page reloads). It may decrease the number of hits to the server (but sometimes this number is increased :). But it is not necessary.
Anyway server side validation is a must. You can't trust data from user input.
This is largely a matter of opinion, but I would have to say no - you don't need to implement client-side validation. Especially when you can get all of the errors from your Django form returned as JSON via a simple Ajax POST.
Django forms already do an excellent job of validating input, so why add yet more code you have to maintain in two places that does the same thing? You absolutely MUST do server-side validation anyway, so why not just do it all in one place?
Additionally, if you don't implement the same validations on the server as on the client, or worse - only do client-side validation, someone can always turn JavaScript off in the browser and possibly bypass your validation(s) or allow junk data to get into your database if you're not careful.
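To show what "do it all on the server and hand the errors back as JSON" can look like, here is a framework-free sketch. It borrows the 500-character comment limit mentioned earlier; validate_comment and validation_response are hypothetical helpers, not Django form APIs.

```python
import json

MAX_COMMENT_LENGTH = 500  # the example limit from the discussion above


def validate_comment(data):
    """Return a dict mapping field names to lists of error messages."""
    errors = {}
    text = data.get("comment", "")
    if not text.strip():
        errors["comment"] = ["This field is required."]
    elif len(text) > MAX_COMMENT_LENGTH:
        over = len(text) - MAX_COMMENT_LENGTH
        errors["comment"] = ["Comment is {} characters too long.".format(over)]
    return errors


def validation_response(data):
    # An empty error dict means the input was accepted.
    errors = validate_comment(data)
    status = 400 if errors else 200
    return status, json.dumps({"errors": errors})
```

The client can render whatever comes back in "errors" next to the matching field, so the rules live in one place.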
I'm making a new RESTful API in Flask that should accept both GET (for requesting the resource) and PATCH (for performing various incremental, non-idempotent updates) for a given object. The thing is that some of the data that's patched in must be authenticated, and some shouldn't be.
To clarify this with an example, let's say I'm building an app that lets everyone query how many times a resource has been clicked on and how many times its page has been viewed. It also lets people do an update on the resource in JavaScript saying the resource was clicked again (unauthenticated, since it's coming from the front-end). It additionally lets an authenticated backend increment the number of times the page has been viewed.
So, following RESTful principles, I'm thinking all three actions should be done on the same path-- something like /pages/some_page_name which should accept both GET and PATCH and should accept two different kinds of data with PATCH. The problem is that in Flask, it looks like authentication is always done with a decorator around a method, so if I have a method like @app.route('/pages/<page_id>', methods=['GET', 'PATCH']), my authentication would be done with a decorator like @auth.login_required for that whole method, which would force even the methods that don't require authentication to be authenticated.
So, my question is three-fold:
Am I right in structuring all three actions mentioned under the same path/ is this important?
If I am right, and this is important, how do I require authentication only for the one type of PATCH?
If this is not important, what's a better or simpler way to structure this API?
I see several problems with your design.
let's say I'm building an app that lets everyone query how many times a resource has been clicked on and how many times its page has been viewed
Hmm. This isn't really a good REST design. You can't have clients query select "properties" of resources, only the resources themselves. If your resource is a "page", then a GET request to /pages/some_page_name should return something like this (in JSON):
{
    "url": "http://example.com/api/pages/some_page_name",
    "clicks": 35,
    "page_views": 102,
    <any other properties of a page resource here>
}
It also lets people do an update on the resource in JavaScript saying the resource was clicked again
"Clicking something" is an action, so it isn't a good REST model. I don't know enough about your project, so I could be wrong, but I think the best solution for this is to let the user click the thing, then the server will receive some sort of a request (maybe a GET to obtain the resource that was clicked?). The server is then in a position to increment the clicks property of the resource on its own.
(unauthenticated, since it's coming from the front-end).
This can be dangerous. If you allow changes to your resources from anybody, then you are open to attacks, which may be a problem. Nothing will prevent me from looking at your JavaScript, reverse engineering your API, and then sending bogus requests to artificially change the counters. This may be an acceptable risk, but make sure you understand that this can happen.
It additionally lets an authenticated backend increment the number of times the page has been viewed.
Backend? Is this a client or a server? Sounds like it should be a client. Once again, "incrementing" is not a good match for REST type APIs. Let the server manage the counters based on the requests it receives from clients.
Assuming I understand what you are saying, it seems to me you only need to support GET. The server can update these counters on its own as it receives requests, clients do not need to bother with that.
UPDATE: After some additional info provided in the comments below, what I think you can do to be RESTful is to also implement a PUT request (or PATCH if you are into partial resource updates).
If you do a PUT, then the client will send the same JSON representation above, but it will increment the corresponding counter. You could add validation in the server to ensure that the counters are incremented sequentially, and return a 400 status code if it finds that they are not (maybe this validation is skipped for certain authenticated users, up to you). For example, starting from the above example, if you need to increment the clicks (but not the page views), then send a PUT request with:
{
    "url": "http://example.com/api/pages/some_page_name",
    "clicks": 36,
    "page_views": 102
}
If you are using PATCH, then you can remove the items that don't change:
{
    "clicks": 36
}
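The server-side check described above could look roughly like this. It is only a sketch: apply_patch, COUNTER_FIELDS and the trusted flag are hypothetical names, and the resource is a plain dict rather than a database row.

```python
COUNTER_FIELDS = ("clicks", "page_views")


def apply_patch(page, patch, trusted=False):
    """Apply a partial update; return (status_code, resulting_page)."""
    updated = dict(page)
    for field, value in patch.items():
        is_counter = field in COUNTER_FIELDS
        is_integer = isinstance(value, int) and not isinstance(value, bool)
        if not (is_counter and is_integer):
            # Only integer counters may be changed through this endpoint.
            return 400, page
        # Untrusted clients may only move a counter forward by exactly one.
        if not trusted and value != page[field] + 1:
            return 400, page
        updated[field] = value
    return 200, updated
```

The original dict is never modified, so a rejected request leaves the stored resource exactly as it was.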
I honestly feel this is not the best design for your problem. You have a very specific client and server here that are designed to work with each other. REST is a good design for decoupled clients and servers, but if you are on both sides of the line then REST doesn't really give you a lot.
Now, regarding your authentication question: if your PUT/PATCH needs to selectively authenticate, then you can issue the HTTP Basic authentication exchange only when necessary. I wrote the Flask-HTTPAuth extension; you can look at how I implemented this exchange and copy the code into your view function, so that you can issue it only when necessary.
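The "authenticate only when necessary" idea can be sketched without any framework. This is not the Flask-HTTPAuth code, just an illustration of the decision; the credentials, PROTECTED_FIELDS, check_basic_auth and authorize_patch are all hypothetical.

```python
import base64
import hmac

# Hypothetical credentials and field list, for illustration only.
BACKEND_USER = "backend"
BACKEND_PASSWORD = "s3cret"
PROTECTED_FIELDS = {"page_views"}


def check_basic_auth(header):
    """Validate the value of an 'Authorization: Basic ...' header."""
    if not header or not header.startswith("Basic "):
        return False
    try:
        decoded = base64.b64decode(header[len("Basic "):])
    except ValueError:
        return False
    username, _, password = decoded.partition(b":")
    # compare_digest avoids leaking information through comparison timing.
    user_ok = hmac.compare_digest(username, BACKEND_USER.encode("utf-8"))
    password_ok = hmac.compare_digest(password, BACKEND_PASSWORD.encode("utf-8"))
    return user_ok and password_ok


def authorize_patch(patch, auth_header=None):
    """Return 401 only when the patch touches a protected field."""
    if PROTECTED_FIELDS & set(patch) and not check_basic_auth(auth_header):
        # The caller would add a WWW-Authenticate header to start the exchange.
        return 401
    return 200
```

A click update passes straight through, while a page_views update gets a 401 until valid credentials are supplied.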
I hope this clarifies things a bit.
I'm looking to see if more seasoned web service veterans can comment on the best way to design a RESTful URI where I need mandatory parameters. Case in point, I'd like to design a URI that requests data:
example.com/request/distribution
However, my understanding is that more data should be returned at the higher levels, while more detailed data is returned as more specific URI keywords are applied. In my case, I need at least 3 values for that to happen: a date value, an account value and proprietary distribution code values. For example:
example.com/request/distribution?acct=123&date=20030102&distcode=1A;1B;1C
Is that considered a "RESTful" URL or is there a better approach that would make more sense? Any input is greatly appreciated.
BTW, Python is the language of choice. Thanks!
URIs cannot, by definition, be "unRESTful" in themselves, because the URI specification was guided by the REST architectural style. How you use a URI can violate the REST style by:
Not following the "client-server" constraint; for example, by using WebSockets to implement server push.
Not following the "identification of resources" constraint; for example, using a portion of the URI to specify control data or resource metadata rather than stick to identifying a resource, or by identifying resource via some mechanism other than the URI (like session state or other out-of-band mechanisms).
Not following the "manipulation of resources through representations" constraint; for example, by using the querystring portion of a URI to transfer state.
Not following the "self-descriptive messages" constraint; for example, using HTTP GET to modify state, or transferring JSON with a Content-Type of "text/html".
Not following the "hypermedia as the engine of application state" constraint; for example, not providing the user agent hyperlinks to follow, but instead assuming it will construct them using out-of-band knowledge.
Not following the "layered system" constraint, by requiring the client to know details about the innards of how the server works (especially requiring the client to provide them in a request).
None of the above are necessarily bad choices. They might be the best choice for your system because they foster certain architectural properties (such as efficiency or security). They're just not part of the REST style.
The fact that your resource is identified by multiple mandatory segments is part and parcel of the design of URIs. As Anton points out, the choice between example.com/request/distribution?acct=123&date=20030102&distcode=1A;1B;1C and, say, example.com/accounts/123/distributions/20030102/1A;1B;1C is purely one of data design, and not a concern at the URI layer itself. There is nothing wrong, for example, with responding to a PUT, POST, or DELETE request to the former. A client which failed to follow a link to either one would be considered broken. A system which expected either one to be made available to the client by some means other than a hypermedia response would be considered "unRESTful".
It's better to approach creating a RESTful API in terms of resources first, not URIs. It has more to do with your data design than, say, with your language of choice.
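To illustrate that the two URI shapes carry the same identifying data, here is a small sketch that parses both into one (account, date, codes) tuple. The function names are made up, and the http:// prefix is added only so the standard-library parser sees a full URI.

```python
from urllib.parse import urlsplit


def parse_query_form(uri):
    """Parse .../request/distribution?acct=..&date=..&distcode=.."""
    query = urlsplit(uri).query
    # Split on "&" only, so the ";" inside distcode stays part of the value.
    params = dict(pair.split("=", 1) for pair in query.split("&"))
    return params["acct"], params["date"], tuple(params["distcode"].split(";"))


def parse_path_form(uri):
    """Parse .../accounts/<acct>/distributions/<date>/<codes>"""
    parts = urlsplit(uri).path.strip("/").split("/")
    _, acct, _, date, codes = parts
    return acct, date, tuple(codes.split(";"))
```

Either parser hands the same tuple to the code that actually looks up the distribution, which is why the choice between them is a data-design question.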
E.g., you have a Distribution resource. You want to represent it in your web-based API, so it needs to have an appropriate unique resource identifier (URI). It should be simple, readable, and unlikely to change. This would be a decent example:
http://example.com/api/distribution/<some_unique_id>
Think twice before putting more things and hierarchy into your URIs.
You don't want to change your URIs as your data model or authentication scheme evolves. Changing URIs is uncool and a pain for you and the developers that use your API. So, if you need to pass authentication to the back-end, you probably should use GET parameters or HTTP headers (AWS S3 API, for example, allows both).
Putting too much into GET parameters (e.g., http://example.com/api/distribution/?id=<some_unique_id>) may seem like a bad idea, but IMO it doesn't really matter[0]—as long as you keep your API documentation accessible and up-to-date.
[0] Update: For read-only APIs, at least. For CRUD APIs, as @daniel has pointed out, it's more convenient when you have endpoints like in the first example above. That way you can nicely use HTTP methods by enabling GET, PUT, DELETE for individual resources at /api/distribution/<id>, and POST to /api/distribution to create new distributions.
While researching the answer, I found a nice presentation about RESTful APIs: Designing HTTP Interfaces and RESTful Web Services.
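The method-per-endpoint layout from the update can be sketched as a tiny routing table. The action names and the resolve helper are hypothetical; a real framework would do this matching for you.

```python
import re

# (method, path pattern, action name); the action names are hypothetical.
ROUTES = [
    ("POST", r"/api/distribution", "create"),
    ("GET", r"/api/distribution/(?P<id>[^/]+)", "read"),
    ("PUT", r"/api/distribution/(?P<id>[^/]+)", "update"),
    ("DELETE", r"/api/distribution/(?P<id>[^/]+)", "delete"),
]


def resolve(method, path):
    """Return (action, params), or (None, {}) when nothing matches."""
    for route_method, pattern, action in ROUTES:
        match = re.fullmatch(pattern, path)
        if route_method == method and match:
            return action, match.groupdict()
    return None, {}
```

One URI per resource plus the HTTP method is enough to tell the four operations apart; nothing about the operation needs to be encoded in the path.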
The RESTful way is to represent the data as a resource, not parameters to a request:
example.com/distribution/123/20030102/1A;1B;1C
When you think about RESTful, most of the time you should also think about CRUD.
example.com/request/distribution?acct=123&date=20030102&distcode=1A;1B;1C
is fine for a GET-Request to show something (The R in CRUD).
But what URLs do you consider for the CUD-Parts?
A client wants to ensure that I cannot read sensitive data from their site, which will still be administered by me. In practice, this means that I'll have database access, but it can't be possible for me to read the contents of certain Model Fields. Is there any way to make the data inaccessible to me, but still decrypted by the server to be browsed by the client?
This is possible with public key encryption. I have done something similar before in PHP but the idea is the same for a Django app:
All data on this website was stored encrypted using a public key held by the system software. The corresponding private key to decrypt the data was held by the client in a text file.
When the client wanted to access their data, they pasted the private key into an authorisation form (holding the key in the session) which unlocked the data.
When done, they deauthorised their session.
This protected the information against unauthorised access to the web app (so safe against weak username/passwords) and also from leaks at the database level.
This is still not completely secure: if you have root access to the machine you can capture the key as it is uploaded, or inspect the session information. For that the cure could be to run the reading software on the client's machine and access the database through an API.
I realise this is an old question but I thought I'd clarify that it is indeed possible.
No, it's not possible to have data that is simultaneously in a form you can't decrypt and in a form you can decrypt to show it to the client. The best you can do is reversible encryption on the content, so that at least their data is safe if your server is compromised.
Take a look at Django-fields
You might find Django Encrypted Fields useful.
You and your client could agree on the values being obscured. A simple XOR operation or something similar will make the values unreadable in the admin, and they can be decoded just in time when they are needed on the site.
This way you can safely administer the site without "accidentally" reading something.
Make sure your client understands that it is technically possible for you to get the actual contents but that it would require active effort.
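For what it's worth, the XOR idea might look like the sketch below. This is obfuscation, not encryption: anyone with the key (or some patience) can recover the values. The obscure and reveal names and the sample key are assumptions for illustration.

```python
import base64
from itertools import cycle


def obscure(text, key):
    """XOR the UTF-8 bytes of text with a repeating key, then base64-encode."""
    if not key:
        raise ValueError("key must not be empty")
    data = text.encode("utf-8")
    mixed = bytes(b ^ k for b, k in zip(data, cycle(key)))
    return base64.b64encode(mixed).decode("ascii")


def reveal(token, key):
    """Reverse obscure(): base64-decode, XOR with the same key, decode UTF-8."""
    if not key:
        raise ValueError("key must not be empty")
    mixed = base64.b64decode(token)
    data = bytes(b ^ k for b, k in zip(mixed, cycle(key)))
    return data.decode("utf-8")
```

The stored value shows up in the admin as an opaque base64 string, and the site calls reveal() only at the moment it needs to display the real value.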
Some other issues to consider are that the web application will then not be able to sort or easily query on the encrypted fields. It would be helpful to know what administrative functions the client wants you to have. Another approach would be to have a separate app / access channel that does not show the critical data but still allows you to perform your admin functions only.