loading from app.yaml as a wsgi application

loading from app.yaml as a wsgi application - python

I couldn't immediately determine if this was possible by reading the source of the sdk.
But, is there a way to get a wsgi version of the application that dev_appserver would load from app.yaml?
I was so hoping there would be a function like
def app_from_yaml(path_to_yaml):
...
If this existed I could actually write automated tests for the blobstore logic and not have to do the crap manually anymore. Any Ideas?

I'm not aware of any solution that does what you want. I strongly suspect the reason is that dev_appserver does a lot of things when loading an application, including parsing the various yaml files, set up routing, stub out APIs (both, App Engine and Python), restrict the environment to emulate appserver, and so on. A function app_from_yaml(path_to_yaml) would have to do what dev_appserver.py does. And since dev_appserver.py does it already, I think no-one bothered to add another implementation.
I see 2 ways to solve your problem.
Make the blobstore API more testable
start your app with dev_appserver and run tests against it
The former one is rather difficult because it would require to refactor how things are done currently which risks the introduction of subtle regressions. The latter is something we do a lot for such large tests (which are really integration tests). We use gaedriver for that.
In your specific case where you want to test blobstore (which we do, too), we start an app from within a test, upload a blob using a certain URL and then hit another and check if the blob was processed correctly. It's not as nice as using testbed, but it works, is fairly straight-forward and fairly fast.

I'm not 100% sure of what you're asking, but the answer may lie in google.appengine.ext.webapp.util.run_wsgi_app.
In terms of the blobstore itself, there is already google.appengine.api.blobstore.blobstore_stub which can be used to test against the Blobstore (though I don't really understand what "manually" means in your context, so maybe that's not helpful).

Related

Should I take steps to ensure a Django app can scale before writing it?

So, I'm looking at writing an app with python2 django(-rest-framework), postgres and angular.
I'm aware there are lots of things that can be done
multi-server setup behind load balancer
DB replication/sharding?
caching (in various ways)
swapping DRF serialiser for serpy
running on python3
running on pypy
my question is - Which of these (or other things) should really be done right at the start of the project?

Write with scalability in mind.
scaliability is not limited just to production servers/environments but also to development environments.
Always write with scalability in mind.
At development
Scalability at development let you develop the product seemlessly.
Structure your repository
Use git branching models like GitFlow so that developers can work on parallel, or a single developer can switch working on different features. Use a bug tracker.
Design your apps.
Before actually writing a single line of code, write down what apps you are going to write. Design apps as to minimize Relations (ManyToMany, ForeignKey, etc..), imports. Django offers pulggable app architecture feel free to use it wisely.
Write your tests first.
This ensures that you can migrate(production environments), upgrade and downgrade with less pain and hairloss. Trust me writing tests feels so boring, but it is worth to.
Abstract models, managers
Use Abstract Models and Managers, it can eliminate bolier plate model code and help you maintain the code.
Name variables, classes and methods descriptive.
Name the variables, classes and methods descriptive, as you would be able to know what it represents without looking documentation.
Document code.
Feel free to document classes and methods, so you or other peers who look into the code get an idea of what it is indented for than stacktracing to see what a method is doing.
Use debug toolbar
Use django debug toolbar on developement, as you test your API, use prefectch_related() and select_related() to minimize/eliminate duplicated queries.
Modularize code.
Modularize the code. Python and django in general encourage use of modules. Modules are easy to manage. Use classes, more inheritance and abstract base classes to reuse code.
Use Continuous Integration
use continuous integration test your repo and make sure new pushes don't break the system.
At production.
scalability at production let you serve the product to infinite users seemlessly.
For multi server setup
Stick with rest design principles.
Eliminate sessions.
Use distributed cache like Redis.
Swapping DRF serializer for serpy
Start with serpy if you need more speed and if you are comfortable with. It is better to stick with Serpy than rewriting DRF serializer as writing both looks smiliar, but make sure you are not wasting time by optimizing for the lost 1 or 2ms.
Running on python3
Depends on the libraries that you plan to use.
Running on pypy
pypy is faster than the standard implementations. It depends on the library compatibility to use pypy. A list of compatibile package and status of compatibility.
now the question,
Which of these (or other things) should really be done right at the start of the project?
Ans: Developemet (1,2,3,4,5,6,7,8) Production(1,2)

I don't think you need to start worrying about the setup right away. I would discourage premature optimizations. Rather, run the app in production, profile it. See what affects the performance when you hit scale - you would know what's the bottleneck.

The first and main things you have to get right are a clean and correct db schema and clear, readable and correctly factored (DRY... unless it's accidental duplication) and decoupled code. If you know to design a relational DB schema and learn to use Python and Django properly you shouldn't have much problems so far, and if you get both these things right it will (well it should) be easy to scale - by adding cache where needed (Redis, Memcache, or an intermediary NoSQL document database storing "pre-processed" versions of your often accessed data), adding servers, load-balancing etc, depending on your application's needs. Django is built to scale easily, and unless you do stupid things it does scale easily.

How do I make a web server to make timed multiple choice tests?

I'd like to make a webapp that asks people multiple choice questions, and times how long they take to answer.
I'd like those who want to, to be able to make accounts, and to store the data for how well they've done and how their performance is increasing.
I've never written any sort of web app before, although I'm a good programmer and understand how http works.
I'm assuming (without evidence) that it's better to use a 'framework' than to hack something together from scratch, and I'd appreciate advice on which framework people think would be most appropriate.
I hope that it will prove popular, but would rather get something working than spend time at the start worrying about scaling. Is this sane?
And I'd like to be able to develop and test this on my own machine, and then deploy it to a virtual server or some other hosting solution.
I'd prefer to use a language like Clojure or Lisp or Haskell, but if the advantages of using, say, Python or Ruby would outweigh the fact that I'd enjoy it more in a more maths-y language, then I like both of those too.
I probably draw the line at perl, but if perl or even something like Java or C have compelling advantages then I'm quite happy with them too. They just don't seem appropriate for this sort of thing.

I'd prefer to use a language like Clojure or Lisp or Haskell
If you're comfortable with Haskell your option is Yesod (I love it).
If not, what is your priority?
learn, enjoy and productivity -> then your option is Haskell + Yesod.
productivity quickly -> Python, Ruby, ...

I can recommend to start with Flask. It's very easy.
from flask import Flask
app = Flask(__name__)
#app.route("/")
def hello():
return "Hello World!"
if __name__ == "__main__":
app.run()
with templates (jinja,mako, etc.) you can easily create dynamic web pages: http://flask.pocoo.org/docs/templating/
An easy solution for your time calculation problem would probably be to put the server time in the form for the question and calculate the delta when getting the response back.
EDIT : Oh, and another thing, you can deploy a Flask app relatively easily on Heroku, which is a free cloud application platform.

When the server-side creates the form, encode an hidden field with the timestamp of the request, so when the users POSTs his form, you can see the time difference.
How to implement that is up to you, which server you have available, and several other factors.

As far as Haskell web frameworks go, I would also recommend Happstack, but Yesod is itself fine software with perhaps a stronger support base. In either case, do not be talked out of Haskell; I wrote my own blog engine (presently running The Ambulatory Sesquipedalian) using Happstack and found no reason to believe that Haskell is unsuited to these tasks. Local testing followed by deployment to a VPS is natural for both--they will run on about anything you can install the Haskell Platform on.
As far as the timing goes: however you implement it, be careful that users cannot fake times. Since you seem to be committed to user accounts, I would recommend having users check out tests, leaving a timestamped entry in their profile of the time they started the test and checking against that when they submit their answers. Solutions that use client-side state are also possible, but remember to protect against client manipulation of data (editing cookies or manually constructing POSTs with hidden fields altered)--you would probably need to somehow encrypt the time (it could still be altered, but not to a user-chosen value).

Difference between django-webtest and selenium

I have been reading about testing in django. One thing that was recommended was use of django-webtest for functional testing. I found a decent article here that teaches how to go about functional testing in selenium using python. But people have also recommended Ian Bicking's WebTest's extension djagno-webtest to use for testing forms in django. How is testing with webtest and testing with selenium different in context of django forms?
So from functional testing point of view:
How does django-webtest and selenium go side by side?
Do we need to have both of them or any one would do?

The key difference is that selenium runs an actual browser, while WebTest hooks to the WSGI.
This results in the following differences:
You can't test JS code with WebTest, since there is nothing to run it.
WebTest is much faster since it hooks to the WSGI, this also means a smaller memory footprint
WebTest does not require to actually run the server on a port so it's a bit easier to parallize
WebTest does not check different problems that occur with actual browsers, like specific browser version bugs (cough.. internet explorer.. cough..)
Bottom line:
PREFER to use WebTest, unless you MUST use Selenium for things that can't be tested with WebTest.

The important thing to know about Selenium is that it's primarily built to be a server-agnostic testing framework. It doesn't matter what framework or server-side implementation is used to create the front-end as long as it behaves as expected. Also, while you can (and when possible you probably should) write tests manually in Selenium, many tests are recorded macros of someone going through the motions that are then turned into code automatically.
On the other hand, django-webtest is built to work specifically on Django websites. It's actually a Django-specific extension to WebTest, which is not Django-only, but WSGI-only (and therefore Python-only). Because of that, it can interact with the application with a higher level of awareness of how things work on the server. This can make running tests faster and can also makes it easy to write more granular, detailed tests. Also, unlike Selenium, your tests can't be automatically written as recorded macros.
Otherwise, the two tools have generally the same purpose and are intended to test the same kinds of things. That said, I would suggest picking one rather than using both.

Testing for external resource consistency / skipping django tests

I'm writing tests for a Django application that uses an external data source. Obviously, I'm using fake data to test all the inner workings of my class but I'd like to have a couple of tests for the actual fetcher as well. One of these will need to verify that the external source is still sending the data in the format my application expects, which will mean making the request to retrieve that information in the test.
Obviously, I don't want our CI to come down when there is a network problem or the data provider has a spot of downtime. In this case, I would like to throw a warning that skips the rest of that test method and doesn't contribute to an overall failure. This way, if the data arrives successfully I can check it for consistency (failing if there is a problem) but if the data cannot be fetched it logs a warning so I (or another developer) know to quickly check that the data source is ok.
Basically, I would like to test my external source without being dependant on it!
Django's test suite uses Python's unittest module (at least, that's how I'm using it) which looks useful, given that the documentation for it describes Skipping tests and expected failures. This feature is apparently 'new in version 2.7', which explains why I can't get it to work - I've checked the version of unittest I have installed on from the console and it appears to be 1.63!
I can't find a later version of unittest in pypi so I'm wondering where I can get hold of the unittest version described in that document and whether it will work with Django (1.2).
I'm obviously open to recommendations / discussion on whether or not this is the best approach to my problem :)
[EDIT - additional information / clarification]
As I said, I'm obviously mocking the dependancy and doing my tests on that. However, I would also like to be able to check that the external resource (which typically is going to be an API) still matches my expected format, without bringing down CI if there is a network problem or their server is temporarily down. I basically just want to check the consistency of the resource.
Consider the following case...
If you have written a Twitter application, you will have tests for all your application's methods and behaviours - these will use fake Twitter data. This gives you a complete, self-contained set of tests for your application. The problem is that this doesn't actually check that the application works because your application inherently depends on the consistency of Twitter's API. If Twitter were to change an API call (perhaps change the URL, the parameters or the response) the application would stop working even though the unit tests would still pass. (Or perhaps if they were to completely switch off basic authentication!)
My use case is simpler - I have a single xml resource that is used to import information. I have faked the resource and tested my import code but I would like to have a test that checks the format of that xml resource has not changed.
My question is about skipping tests in Django's test runner so I can throw a warning if the resource is unavailable without the tests failing, specifically getting a version of Python's unittest module that supports this behaviour. I've given this much background information to allow anyone with experience in this area to offer alternative suggestions.
Apologies for the lengthy question, I'm aware most people won't read this now.
I've 'bolded' the important bits to make it easier to read.

I created a separate answer since your edit invalidated my last answer.
I assume you're running on Python version 2.6 - I believe the changes that you're looking for in unittest are available in Python version 2.7. Since unittest is in the standard library, updating to Python 2.7 should make those changes available to you. Is that an option that will work for you?
One other thing that I might suggest is to maybe separate the "external source format verification" test(s) into a separate test suite and run that separately from the rest of your unit tests. That way your core unit tests are still fast and you don't have to worry about the external dependencies breaking your main test suites. If you're using Hudson, it should be fairly easy to create a separate job that will handle those tests for you. Just a suggestion.

The new features in unittest in 2.7 have been backported to 2.6 as unittest2. You can just pip install and substitute unittest2 for unittest and your tests will work as thyey did plus you get the new features without upgrading to 2.7.

What are you trying to test? The code in your Django application(s) or the dependency? Can you just Mock whatever that external dependency is? If you're just wanting to test your Django application, then I would say Mock the external dependency, so your tests are not dependent upon that external resource's availability.
If you can post some code of your "actual fetcher" maybe you will get some tips on how you could use mocks.

How would one make Python objects persistent in a web-app?

I'm writing a reasonably complex web application. The Python backend runs an algorithm whose state depends on data stored in several interrelated database tables which does not change often, plus user specific data which does change often. The algorithm's per-user state undergoes many small changes as a user works with the application. This algorithm is used often during each user's work to make certain important decisions.
For performance reasons, re-initializing the state on every request from the (semi-normalized) database data quickly becomes non-feasible. It would be highly preferable, for example, to cache the state's Python object in some way so that it can simply be used and/or updated whenever necessary. However, since this is a web application, there several processes serving requests, so using a global variable is out of the question.
I've tried serializing the relevant object (via pickle) and saving the serialized data to the DB, and am now experimenting with caching the serialized data via memcached. However, this still has the significant overhead of serializing and deserializing the object often.
I've looked at shared memory solutions but the only relevant thing I've found is POSH. However POSH doesn't seem to be widely used and I don't feel easy integrating such an experimental component into my application.
I need some advice! This is my first shot at developing a web application, so I'm hoping this is a common enough issue that there are well-known solutions to such problems. At this point solutions which assume the Python back-end is running on a single server would be sufficient, but extra points for solutions which scale to multiple servers as well :)
Notes:
I have this application working, currently live and with active users. I started out without doing any premature optimization, and then optimized as needed. I've done the measuring and testing to make sure the above mentioned issue is the actual bottleneck. I'm sure pretty sure I could squeeze more performance out of the current setup, but I wanted to ask if there's a better way.
The setup itself is still a work in progress; assume that the system's architecture can be whatever suites your solution.

Be cautious of premature optimization.
Addition: The "Python backend runs an algorithm whose state..." is the session in the web framework. That's it. Let the Django framework maintain session state in cache. Period.
"The algorithm's per-user state undergoes many small changes as a user works with the application." Most web frameworks offer a cached session object. Often it is very high performance. See Django's session documentation for this.
Advice. [Revised]
It appears you have something that works. Leverage to learn your framework, learn the tools, and learn what knobs you can turn without breaking a sweat. Specifically, using session state.
Second, fiddle with caching, session management, and things that are easy to adjust, and see if you have enough speed. Find out whether MySQL socket or named pipe is faster by trying them out. These are the no-programming optimizations.
Third, measure performance to find your actual bottleneck. Be prepared to provide (and defend) the measurements as fine-grained enough to be useful and stable enough to providing meaningful comparison of alternatives.
For example, show the performance difference between persistent sessions and cached sessions.

I think that the multiprocessing framework has what might be applicable here - namely the shared ctypes module.
Multiprocessing is fairly new to Python, so it might have some oddities. I am not quite sure whether the solution works with processes not spawned via multiprocessing.

I think you can give ZODB a shot.
"A major feature of ZODB is transparency. You do not need to write any code to explicitly read or write your objects to or from a database. You just put your persistent objects into a container that works just like a Python dictionary. Everything inside this dictionary is saved in the database. This dictionary is said to be the "root" of the database. It's like a magic bag; any Python object that you put inside it becomes persistent."
Initailly it was a integral part of Zope, but lately a standalone package is also available.
It has the following limitation:
"Actually there are a few restrictions on what you can store in the ZODB. You can store any objects that can be "pickled" into a standard, cross-platform serial format. Objects like lists, dictionaries, and numbers can be pickled. Objects like files, sockets, and Python code objects, cannot be stored in the database because they cannot be pickled."
I have read it but haven't given it a shot myself though.
Other possible thing could be a in-memory sqlite db, that may speed up the process a bit - being an in-memory db, but still you would have to do the serialization stuff and all.
Note: In memory db is expensive on resources.
Here is a link: http://www.zope.org/Documentation/Articles/ZODB1

First of all your approach is not a common web development practice. Even multi threading is being used, web applications are designed to be able to run multi-processing environments, for both scalability and easier deployment .
If you need to just initialize a large object, and do not need to change later, you can do it easily by using a global variable that is initialized while your WSGI application is being created, or the module contains the object is being loaded etc, multi processing will do fine for you.
If you need to change the object and access it from every thread, you need to be sure your object is thread safe, use locks to ensure that. And use a single server context, a process. Any multi threading python server will serve you well, also FCGI is a good choice for this kind of design.
But, if multiple threads are accessing and changing your object the locks may have a really bad effect on your performance gain, which is likely to make all the benefits go away.

This is Durus, a persistent object system for applications written in the Python
programming language. Durus offers an easy way to use and maintain a consistent
collection of object instances used by one or more processes. Access and change of a
persistent instances is managed through a cached Connection instance which includes
commit() and abort() methods so that changes are transactional.
http://www.mems-exchange.org/software/durus/
I've used it before in some research code, where I wanted to persist the results of certain computations. I eventually switched to pytables as it met my needs better.

Another option is to review the requirement for state, it sounds like if the serialisation is the bottle neck then the object is very large. Do you really need an object that large?
I know in the Stackoverflow podcast 27 the reddit guys discuss what they use for state, so that maybe useful to listen to.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.