Serving all REST requests over GET with Tornado - python

I have a REST (or almost-REST) web API.
I want API users to be able to use the whole API even if, for some reason, they can only make GET calls. The plan is to accept a URL parameter (query string) like request_method that can be GET (the default), POST, PUT or DELETE, and to route the request accordingly.
My question is: other than the standard request-handler overrides, and checking in each RequestHandler's get(self) method whether the request is really meant to be a POST, PUT or DELETE and calling the appropriate function, is there a way to do this "routing" in a more general way, e.g. in the URL patterns in the application definition, or by overriding a routing function, or something similar?
To make it clear, these requests all come in over GET with a parameter, for example ?request_method=POST.
Any suggestions are appreciated.
Possible Solutions:
only have a ".*" URL pattern and handle all the routing in a single RequestHandler. This should work fine, except that I won't be taking advantage of Tornado's URL pattern matching features.
add an if to all the get(self) methods in all the request handlers to check whether the request should really be handled by get; if not, call the relevant method.
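A third option is to centralize the override in one shared place so that Tornado's URL pattern matching is kept. The sketch below is illustrative (the names are not from any library), and the core rule is shown framework-free; keep in mind the safety caveat about tunnelling state-changing verbs over GET.

```python
from urllib.parse import parse_qs

# Verbs we allow a GET request to tunnel (illustrative; see the safety
# warning about exposing state-changing actions over GET).
ALLOWED_OVERRIDES = {"POST", "PUT", "DELETE"}

def effective_method(query_string):
    """Return the HTTP verb the client intended for a tunnelled GET request."""
    args = parse_qs(query_string)
    requested = args.get("request_method", ["GET"])[0].upper()
    return requested if requested in ALLOWED_OVERRIDES else "GET"

# In Tornado, the same rule would live in a common base class that every
# handler inherits from, roughly like this (sketch, not tested):
#
# class TunnelHandler(tornado.web.RequestHandler):
#     def get(self, *args, **kwargs):
#         method = self.get_argument("request_method", "GET").upper()
#         if method in ("POST", "PUT", "DELETE"):
#             return getattr(self, method.lower())(*args, **kwargs)
#         return self.real_get(*args, **kwargs)
```

With a base class like this, individual handlers only implement real_get/post/put/delete and the URL patterns in the Application stay untouched.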

This would be a very foolish thing to do. Both Chrome and Firefox, along with many other web user agents, will speculatively fetch (GET) some or all of the links on a page, including your request_method=DELETE URLs. You will find your database has been emptied out just because someone was looking around. Do not deliberately break HTTP. GET is defined to be a "safe" method, meaning it's okay to GET any URL you like and nothing bad will happen.
EDIT for others in similar situations:
The OP says he is using JSONP and is in control of both the API server and the client web app. In such a case the ideal solution is Cross-Origin Resource Sharing (CORS, spec), although this technology requires IE8+, Firefox 3.5+, Safari 4+ or Chrome 3+. If you need to target earlier browsers, and you control both domains, I would recommend merging the content of the two domains at least for your own client web app. The api domain can remain for external clients, but they would be restricted by the CORS browser requirements.
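For reference, the heart of CORS is just a few response headers the API server attaches; the header names below come from the CORS spec, while the helper function and origin values are illustrative:

```python
def cors_headers(request_origin, allowed_origins):
    """Build the response headers that let a browser accept a cross-origin API call."""
    if request_origin not in allowed_origins:
        return {}  # no CORS headers: the browser will block the response
    return {
        "Access-Control-Allow-Origin": request_origin,
        "Access-Control-Allow-Methods": "GET, POST, PUT, DELETE",
        "Access-Control-Allow-Headers": "Content-Type",
    }

# Example: the client web app at app.example.com calling the api domain.
headers = cors_headers("https://app.example.com", {"https://app.example.com"})
```

In Tornado these would typically be set on every response from a common handler base class.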

Related

Do I need a regex to determine if a url is a dynamic url? [closed]

I want to create a function to determine whether a well-formed URL was dynamically generated (according to this article: https://www.webopedia.com/TERM/D/dynamic_URL.html).
My first attempt was to check if any of these characters appear in the url:
ex:
def is_dynamic_url(url):
    for ch in ["?", "&", "%", "+", "=", "$", "cgi-bin", ".cgi"]:
        if ch in url:
            return True
    return False
Is this sufficient or are there edge cases I am not considering?
You can't determine if a URL is 'dynamic' from the characters in the string. Web servers have moved way beyond CGI scripts served from a hard-coded URL path. Even when the article was current, it was never more than a weak heuristic.
A URL is simply an address for a resource; the acronym stands for Uniform Resource Locator. When the URL starts with http: or https:, you have a URL for a web page, but URLs can address far more than just web pages.
For the type of URLs that article talks about, a client (your browser, say) will use the first portion, between // and the next /, to connect to a specific server and exchange messages using the HTTP standard. The client sends everything after the host information (the path component) to the server. For this question, the full URL shown in the browser is https://stackoverflow.com/questions/53230441/do-i-need-a-regex-to-determine-if-a-url-is-a-dynamic-url, so the browser uses an encrypted connection to the server named stackoverflow.com and sends it a request to serve the /questions/53230441/do-i-need-a-regex-to-determine-if-a-url-is-a-dynamic-url path.
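The host/path split described above is easy to see with Python's standard library:

```python
from urllib.parse import urlsplit

# The question's own URL, split into the parts the browser actually uses.
parts = urlsplit(
    "https://stackoverflow.com/questions/53230441/"
    "do-i-need-a-regex-to-determine-if-a-url-is-a-dynamic-url"
)
print(parts.scheme)  # 'https' -> use an encrypted connection
print(parts.netloc)  # 'stackoverflow.com' -> the server to connect to
print(parts.path)    # '/questions/53230441/...' -> sent to that server
```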
How the server responds is entirely up to the server. An HTTP server is essentially a black box in this exchange. It can do whatever it likes with the information, and within the broad confines of the HTTP standard, it can produce a response by any means it likes.
In the very early days of the web, the HTTP server would only ever map the given path directly to a filesystem. For example, based on the server configuration, the path /foo/bar/baz.html would be mapped to the file /var/data/www/foo/bar/baz.html, and if it existed the server would read the contents of that file and return them to the client together with some metadata, and that was it. If you wanted to customise this process, you either wrote your own HTTP server or used some kind of extension mechanism specific to the web server. The NCSA web server had a different mechanism from the Netscape server, which differed from the Apache HTTP server, and so on. Not many sites needed this kind of processing: computers powerful enough to run databases were expensive, and programming such exotic behaviour took a lot of time and specialist knowledge.
Then the NCSA HTTP server implementation created a standard for delegating an HTTP request to arbitrary programs (such as scripts), called the Common Gateway Interface, or CGI. Because everything was still centered on mapping URL paths to files, web site administrators were expected to put CGI programs in a dedicated directory, usually named cgi-bin. A path starting with that name would then be mapped to such a configured location, and instead of reading the files found there and serving them back, the file would be executed and the result it produced passed back. For a while that was the most common way to build a website that didn't consist of just static files.
And to let you pass information from the client to the server, the most common way to configure a CGI program is to use additional information in the URL, such as the query string (the part starting at ? if there is one). The standard HTTP server of yore did not let you alter the URL path for a CGI script much, but would pass through the query string unaltered. So adding ?foo=1&bar=2 is a good way to configure such a script.
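The query string has a simple key=value&key=value structure that both servers and clients can parse mechanically; in Python:

```python
from urllib.parse import urlsplit, parse_qs

# A CGI-era style URL (the host and script name are made up for illustration).
url = "http://example.com/cgi-bin/report.cgi?foo=1&bar=2"
query = urlsplit(url).query   # everything after the '?': 'foo=1&bar=2'
params = parse_qs(query)      # {'foo': ['1'], 'bar': ['2']}
```

Values come back as lists because a key may legally appear more than once in a query string.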
And that's the kind of URL that article refers to; it gives you a simple heuristic for judging if a URL might map to a CGI script and so might be called dynamic. It was never meant to be a hard and fast rule that you can teach a computer to look for though.
These days, we have moved far, far beyond CGI scripts. Modern HTTP web servers make it easy to map every request, regardless of path, to a set of long-running processes, or are themselves directly handling requests via embedded programming-language support. For example, Stack Overflow itself is built using the ASP.NET framework, which runs directly in the Microsoft IIS HTTP server. Every page you see on this site is 'dynamic' in that it shows you information combined from different sources (databases, configuration files, templates stored on disk, etc.). The /questions/53230441/do-i-need-a-regex-to-determine-if-a-url-is-a-dynamic-url path is dissected by the Stack Overflow application and mapped to dedicated pieces of code configured to handle patterns in the URL. A path that starts with /questions/ and a series of digits, followed by / and more text, results in database queries for information on the question with number 53230441.
It's trivial these days to build such a site yourself. Take a look at a simple web framework like Flask for example. With Python and the Flask library installed, I can put
from flask import Flask

app = Flask(__name__)

@app.route('/')
@app.route('/<name>')
def hello_world(name='World'):
    return f'Hello, {name}!'
into a file named site.py, execute the command FLASK_APP=site flask run and point my browser to the URL http://localhost:5000/ and see the text Hello, World! appear, or load http://localhost:5000/Han instead and see Hello, Han! in the browser. Those are dynamic URLs too!
Note: I haven't even touched on using JavaScript in the web browser here, which adds a whole new level of dynamism, where the client is now smart and can change the behaviour of web pages, load additional URLs in the background and keep changing web page content all the time.
All this means that you can't tell much, if anything, from just the characters in a URL as to what it'll produce or whether that result was built "dynamically".

Structuring RESTful API in Flask when some methods require authentication and some don't

I'm making a new RESTful API in Flask that should accept both GET (for requesting the resource) and PATCH (for performing various incremental, non-idempotent updates) for a given object. The thing is that some of the data that's patched in must be authenticated, and some shouldn't be.
To clarify this with an example, let's say I'm building an app that lets everyone query how many times a resource has been clicked on and how many times its page has been viewed. It also lets people do an update on the resource in JavaScript saying the resource was clicked again (unauthenticated, since it's coming from the front-end). It additionally lets an authenticated backend increment the number of times the page has been viewed.
So, following RESTful principles, I'm thinking all three actions should be done on the same path, something like /pages/some_page_name, which should accept both GET and PATCH and should accept two different kinds of data with PATCH. The problem is that in Flask, it looks like authentication is always done with a decorator around a method, so if I have a method like @app.route('/pages/<page_id>', methods=['GET', 'PATCH']), my authentication would be done with a decorator like @auth.login_required for that whole method, which would force even the actions that don't require authentication to be authenticated.
So, my question is three-fold:
Am I right in structuring all three actions mentioned under the same path/ is this important?
If I am right, and this is important, how do I require authentication only for the one type of PATCH?
If this is not important, what's a better or simpler way to structure this API?
I see several problems with your design.
let's say I'm building an app that let's everyone query how many times a resource has been clicked on and how many times its page has been viewed
Hmm. This isn't really a good REST design. You can't have clients query select "properties" of resources, only the resources themselves. If your resource is a "page", then a GET request to /pages/some_page_name should return something like this (in JSON):
{
'url': 'http://example.com/api/pages/some_page_name',
'clicks': 35,
'page_views': 102,
<any other properties of a page resource here>
}
It also let's people do an update on the resource in javascript saying the resource was clicked again
"clicking something" is an action, so it isn't a good REST model. I don't know enough about your project so I can be wrong, but I think the best solution for this is to let the user click the thing, then the server will receive some sort of a request (maybe a GET to obtain the resource that was clicked?). The server is then in a position to increment the clicks property of the resource on its own.
(unauthenticated, since it's coming from the front-end).
This can be dangerous. If you allow changes to your resources from anybody, then you are open to attacks, which may be a problem. Nothing will prevent me from looking at your Javascript and reverse engineering your API, and then send bogus requests to artificially change the counters. This may be an acceptable risk, but make sure you understand this may happen.
It additionally let's an authenticated backend increment the number of times the page has been viewed.
Backend? Is this a client or a server? Sounds like it should be a client. Once again, "incrementing" is not a good match for REST type APIs. Let the server manage the counters based on the requests it receives from clients.
Assuming I understand what you are saying, it seems to me you only need to support GET. The server can update these counters on its own as it receives requests, clients do not need to bother with that.
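The idea of the server owning the counters can be sketched without any framework; the in-memory store and names here are illustrative (a real app would persist them in a database and handle concurrency):

```python
# Illustrative in-memory store keyed by page id; a real server would use a DB.
PAGES = {"some_page_name": {"clicks": 35, "page_views": 102}}

def handle_get(page_id):
    """Serve the resource representation, counting the view as a side effect."""
    page = PAGES[page_id]
    page["page_views"] += 1  # the server, not the client, owns this counter
    return {
        "url": f"http://example.com/api/pages/{page_id}",
        "clicks": page["clicks"],
        "page_views": page["page_views"],
    }
```

Clients only ever issue plain GETs; they never send counter values, so they cannot forge them.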
UPDATE: After some additional info provided in the comments below, what I think you can do to be RESTful is to also implement a PUT request (or PATCH if you are into partial resource updates).
If you do a PUT, then the client will send the same JSON representation above, but it will increment the corresponding counter. You could add validation in the server to ensure that the counters are incremented sequentially, and return a 400 status code if it finds that they are not (maybe this validation is skipped for certain authenticated users, up to you). For example, starting from the above example, if you need to increment the clicks (but not the page views), then send a PUT request with:
{
'url': 'http://example.com/api/pages/some_page_name',
'clicks': 36,
'page_views': 102
}
If you are using PATCH, then you can remove the items that don't change:
{
'clicks': 36
}
I honestly feel this is not the best design for your problem. You have very specific client and server here, that are designed to work with each other. REST is a good design for decoupled clients and servers, but if you are on both sides of the line then REST doesn't really give you a lot.
Now regarding your authentication question, if your PUT/PATCH needs to selectively authenticate, then you can issue the HTTP Basic authentication exchange only when necessary. I wrote the Flask-HTTPAuth extension, you can look at how I implemented this exchange and copy the code into your view function, so that you can issue it only when necessary.
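The selective check boils down to deciding, per request body, whether credentials are needed before demanding them. A framework-free sketch of that decision (the field name, the credentials_ok flag, and the realm string are all assumptions for illustration):

```python
# Fields that only an authenticated client may PATCH (illustrative choice).
PROTECTED_FIELDS = {"page_views"}

def needs_auth(patch_payload):
    """True if this PATCH touches any field that requires authentication."""
    return bool(PROTECTED_FIELDS & set(patch_payload))

def handle_patch(patch_payload, credentials_ok):
    """credentials_ok: result of your auth check (e.g. a verify-password callback)."""
    if needs_auth(patch_payload) and not credentials_ok:
        # Issue the HTTP Basic exchange only when it is actually necessary.
        return 401, {"WWW-Authenticate": 'Basic realm="api"'}
    return 200, {}
```

An anonymous PATCH that only bumps clicks goes through, while one touching page_views triggers the 401 challenge.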
I hope this clarifies things a bit.

nodejs or python proxy lib with relative url support

I am looking for a lib that lets me roughly:
connect to localhost:port, but see http://somesite.com
rewrite all static assets to point to localhost:port instead of somesite.com
support cookies / authentication
I know that http://betterinternet.co/ does this already, but they won't give me their source code for some reason.
I assume this doesn't exist as free code, so if I were to write one, are there any nuances to it? If I replace all occurrences of somesite.com in the HTML and headers, will that be enough?
So...you want an http proxy that does link rewriting? Sounds like Apache and mod_proxy_html. It's not written in node or Python, but I think it will do what you're asking.
I don't see any straightforward solution to your problem. If I've understood correctly, you want a caching HTTP proxy which serves static content locally, with URL rewriting rules defined in Python (or Node.js). That's quite a task.
A caching HTTP proxy implementation is not trivial. So I'd use an existing implementation, such as Squid (or Apache if it does caching too).
You could then place a (relatively) simple HTTP server written in Python in front of that (e.g. based on BaseHTTPServer and urllib2) which performs the URL rewriting as you want them and forwards the requests to the proxy (or direct to internet).
The idea would be to rely on the proxy setup to perform all the processing you don't want to modify (including basic rewrite rules, authentication, caching and cache management) and limit your front-end implementation to performing only the custom rewriting you are interested in.
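The custom rewriting step itself is simple string and header surgery. A minimal sketch of the front-end's rewrite logic, using the question's example host names (the functions are illustrative and ignore edge cases such as URLs split across response chunks or protocol-relative links):

```python
def rewrite_body(html, upstream="http://somesite.com", local="http://localhost:8080"):
    """Point asset links in the HTML at the local proxy instead of the upstream site."""
    return html.replace(upstream, local)

def rewrite_headers(headers, upstream_host="somesite.com", local_host="localhost:8080"):
    """Fix redirect and cookie headers so the browser stays on the proxy."""
    out = {}
    for name, value in headers.items():
        if name.lower() in ("location", "set-cookie"):
            value = value.replace(upstream_host, local_host)
        out[name] = value
    return out
```

Location and Set-Cookie are the headers that most commonly leak the upstream host back to the browser, which is why they get the same treatment as the body.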

What's the best design for a RESTful URI with multiple mandatory parameters?

I'm looking to see if more of the seasoned web service veterans can comment on the best way to design a RESTful URI in where I need mandatory parameters. Case in point, I'd like to design an URI that requests data:
example.com/request/distribution
However, my understanding is that higher levels of the URI should return broader data, while more specific URI segments return more detailed data; in my case, though, I need at least 3 values for that to happen. Those 3 values would be a date value, an account value and proprietary distribution code values. For example:
example.com/request/distribution?acct=123&date=20030102&distcode=1A;1B;1C
Is that considered a "RESTful" URL, or is there a better approach that would make more sense? Any input is greatly appreciated.
BTW, Python is the language of choice. Thanks!
URIs cannot, by definition, be "unRESTful" in themselves, because the URI specification was guided by the REST architectural style. How you use a URI can violate the REST style by:
Not following the "client-server" constraint; for example, by using WebSockets to implement server push.
Not following the "identification of resources" constraint; for example, using a portion of the URI to specify control data or resource metadata rather than stick to identifying a resource, or by identifying resource via some mechanism other than the URI (like session state or other out-of-band mechanisms).
Not following the "manipulation of resources through representations" constraint; for example, by using the querystring portion of a URI to transfer state.
Not following the "self-descriptive messages" constraint; for example, using HTTP GET to modify state, or transferring JSON with a Content-Type of "text/html".
Not following the "hypermedia as the engine of application state" constraint; for example, not providing the user agent hyperlinks to follow, but instead assuming it will construct them using out-of-band knowledge.
Not following the "layered system" constraint, by requiring the client to know details about the innards of how the server works (especially requiring the client to provide them in a request).
None of the above are necessarily bad choices. They might be the best choice for your system because they foster certain architectural properties (such as efficiency or security). They're just not part of the REST style.
The fact that your resource is identified by multiple mandatory segments is part and parcel of the design of URIs. As Anton points out, the choice between example.com/request/distribution?acct=123&date=20030102&distcode=1A;1B;1C and, say, example.com/accounts/123/distributions/20030102/1A;1B;1C is purely one of data design, and not a concern at the URI layer itself. There is nothing wrong, for example, with responding to a PUT, POST, or DELETE request to the former. A client which failed to follow a link to either one would be considered broken. A system which expected either one to be made available to the client by some means other than a hypermedia response would be considered "unRESTful".
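To make the hypermedia point concrete, a HATEOAS-style response hands the client the URI ready-made instead of a recipe for assembling it (the field names below are illustrative):

```python
# The server's representation of an account includes ready-made links,
# so clients never build /accounts/123/distributions/... themselves.
account = {
    "url": "http://example.com/accounts/123",
    "links": {
        "distributions": "http://example.com/accounts/123/distributions/20030102/1A;1B;1C",
    },
}

# The client just follows the link it was given.
next_url = account["links"]["distributions"]
```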
It's better to go about creating RESTful API in terms of resources first, not URIs. It has more to do with your data design than, say, with your language of choice.
E.g., you have a Distribution resource. You want to represent it in your web-based API, so it needs to have an appropriate unique resource identifier (URI). It should be simple, readable, and unlikely to change. This would be a decent example:
http://example.com/api/distribution/<some_unique_id>
Think twice before putting more things and hierarchy into your URIs.
You don't want to change your URIs as your data model or authentication scheme evolve. Changing URIs is uncool and a pain for you and the developers that use your API. So, if you need to pass authentication to the back-end, you probably should use GET parameters or HTTP headers (AWS S3 API, for example, allows both).
Putting too much into GET parameters (e.g., http://example.com/api/distribution/?id=<some_unique_id>) may seem like a bad idea, but IMO it doesn't really matter [0], as long as you keep your API documentation accessible and up-to-date.
[0] Update: For read-only APIs, at least. For CRUD APIs, as @daniel has pointed out, it's more convenient when you have endpoints like in the first example above. That way you can nicely use HTTP methods by enabling GET, PUT, DELETE for individual resources at /api/distribution/<id>, and POST to /api/distribution to create new distributions.
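The endpoint split in that update can be written down as a small route table; the paths mirror the examples above, and the dispatch helper is an illustrative sketch of what a framework does for you:

```python
import re

# Method + path pattern -> CRUD operation (paths follow the examples above).
ROUTES = [
    ("POST",   re.compile(r"^/api/distribution$"),         "create"),
    ("GET",    re.compile(r"^/api/distribution/([^/]+)$"), "read"),
    ("PUT",    re.compile(r"^/api/distribution/([^/]+)$"), "update"),
    ("DELETE", re.compile(r"^/api/distribution/([^/]+)$"), "delete"),
]

def dispatch(method, path):
    """Return the CRUD operation name for a request, or None if unrouted."""
    for route_method, pattern, operation in ROUTES:
        if route_method == method and pattern.match(path):
            return operation
    return None
```

Note how the same path /api/distribution/<id> serves three different operations, distinguished only by the HTTP method.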
While researching the answer, found a nice presentation about RESTful APIs: Designing HTTP Interfaces and RESTful Web Services.
The RESTful way is to represent the data as a resource, not parameters to a request:
example.com/distribution/123/20030102/1A;1B;1C
When you think about RESTful, most of the times you also should think about CRUD.
example.com/request/distribution?acct=123&date=20030102&distcode=1A;1B;1C
is fine for a GET-Request to show something (The R in CRUD).
But what URLs do you consider for the CUD-Parts?

getting epydoc output from output during runtime?

I'm trying to create a self documenting python web service.
My Situation:
I've got an object accessible via RESTful python web service.
http://foo.com/api/monkey
And, what I'd like to do is if
there's an error during a call to http://foo.com/api/monkey like http://foo.com/api/monkey/get was called without "&monkey_id={some number}"
*or*
the web service call is made specifically to http://foo.com/api/monkey/help
then i want it to return the formatted html epydoc output for that object (dynamically).
I've reviewed cornice, but it's a pain because I don't want to use Pyramid. I really don't want to be coupled to any particular web framework to be able to do this.
My question is this: Is what I want to do, with epydoc, possible?
The first use case ("there's an error during a call") is poorly defined. 404 errors, for example, don't result in help pages; they're perfectly ordinary.
A http://foo.com/api/bad/path/get request can't figure out which help page to send, since it didn't make a monkey request.
Also, putting /get on your path is not really very RESTful at all. Doing /monkey/get/monkey_id={some number} is considered bad form. You should consider doing /monkey/{some number}/ instead. That's considered RESTful.
There are very, very few situations where you'll want to show help.
However, there may be some kinds of error handling where you do want to show help.
For these you should provide a 301 redirect to http://foo.com/api/monkey/help instead of some other error page.
Your ordinary http://foo.com/api/monkey/help URL's should be handled by Apache (or nginx or lighttpd or whatever your web server is) to redirect to the static Epydoc-produced HTML files.
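The redirect-to-help idea is a small amount of glue in whatever dispatcher you use. A framework-free sketch (the redirect status and help path follow the suggestion above; which error codes qualify, and the resource name, are assumptions for illustration):

```python
def error_response(status, resource="monkey"):
    """For client errors where help is useful, redirect to the static epydoc page."""
    if status in (400, 422):  # e.g. a call missing a required parameter
        return 301, {"Location": f"http://foo.com/api/{resource}/help"}
    return status, {}  # anything else (404 etc.) stays an ordinary error
```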
