Optimization Techniques in Python

Optimization Techniques in Python - python

Recently i have developed a billing application for my company with Python/Django. For few months everything was fine but now i am observing that the performance is dropping because of more and more users using that applications. Now the problem is that the application is now very critical for the finance team. Now the finance team are after my life for sorting out the performance issue. I have no other option but to find a way to increase the performance of the billing application.
So do you guys know any performance optimization techniques in python that will really help me with the scalability issue
Guys we are using mysql database and its hosted on apache web server on Linux box. Secondly what i have noticed more is the over all application is slow and not the database transactional part. For example once the application is loaded then it works fine but if they navigate to other link on that application then it takes a whole lot of time.
And yes we are using HTML, CSS and Javascript

As I said in comment, you must start by finding what part of your code is slow.
Nobody can help you without this information.
You can profile your code with the Python profilers then go back to us with the result.
If it's a Web app, the first suspect is generally the database. If it's a calculus intensive GUI app, then look first at the calculations algo first.
But remember that perf issues car be highly unintuitive and therefor, an objective assessment is the only way to go.

ok, not entirely to the point, but before you go and start fixing it, make sure everyone understands the situation. it seems to me that they're putting some pressure on you to fix the "problem".
well first of all, when you wrote the application, have they specified the performance requirements? did they tell you that they need operation X to take less than Y secs to complete? Did they specify how many concurrent users must be supported without penalty to the performance? If not, then tell them to back off and that it is iteration (phase, stage, whatever) one of the deployment, and the main goal was the functionality and testing. phase two is performance improvements. let them (with your help obviously) come up with some non functional requirements for the performance of your system.
by doing all this, a) you'll remove the pressure applied by the finance team (and i know they can be a real pain in the bum) b) both you and your clients will have a clear idea of what you mean by "performance" c) you'll have a base that you can measure your progress and most importantly d) you'll have some agreed time to implement/fix the performance issues.
PS. that aside, look at the indexing... :)

A surprising feature of Python is that the pythonic code is quite efficient... So a few general hints:
Use built-ins and standard functions whenever possible, they're already quite well optimized.
Try to use lazy generators instead one-off temporary lists.
Use numpy for vector arithmetic.
Use psyco if running on x86 32bit.
Write performance critical loops in a lower level language (C, Pyrex, Cython, etc.).
When calling the same method of a collection of objects, get a reference to the class function and use it, it will save lookups in the objects dictionaries (this one is a micro-optimization, not sure it's worth)
And of course, if scalability is what matters:
Use O(n) (or better) algorithms! Otherwise your system cannot be linearly scalable.
Write multiprocessor aware code. At some point you'll need to throw more computing power at it, and your software must be ready to use it!

before you can "fix" something you need to know what is "broken". In software development that means profiling, profiling, profiling. Did I mention profiling. Without profiling you don't know where CPU cycles and wall clock time is going. Like others have said to get any more useful information you need to post the details of your entire stack. Python version, what you are using to store the data in (mysql, postgres, flat files, etc), what web server interface cgi, fcgi, wsgi, passenger, etc. how you are generating the HTML, CSS and assuming Javascript. Then you can get more specific answers to those tiers.

You may be interested in this document I've found some time ago.
As personal advice, be as more pythonic as you can: lazy evaluation is the keyword, so learn to use iterators and generators.

For the type of application you are describing (a web application probably backed by a database) your performance problems are unlikely to be language specific. They are far more likely to stem from design or architecture issues, though they could be simple coding problems too.
To sort this out you need to figure out where the bottlenecks are in your application and for that you need some sort of profiler.
Once you have found your bottlenecks you will be in a much better position. You can evaluate then problem areas for common issues including:
Design and Architecture issues
SQL anti-patterns
Incorrect usage of your framework (perhaps relying on inappropriate defaults)
Badly structured algorithms
The specifics of any solution are going to depend on the specifics of the bottlenecks your find.

http://wiki.python.org/moin/PythonSpeed/PerformanceTips
I optimized some python code a while back, the most surprising thing to me was how much each function call costs. If you minimize function calls or replace loops with builtins you'll be running much faster.

There are some great suggestions here… So let me suggest an implementation detail. I have found the runprofileserver command found in django-command-extensions very convenient for profiling my Django code.

I am not sure if this would solve the problem but you should have a look at psyco

Related

Is Python's speed enough or should I use Pyrex?

info:
I'm using Django.
question:
Is Python's speed enough for providing a low latency web service or should I translate my functions to C using Pyrex?

Lots of people do use Python to implement web services (hence Django existing at all), and find it low enough latency for their purposes. So in one sense, the answer is a trivial "yes".
To answer properly requires lots more information and study, and isn't really appropriate for SO's format. For starters, you need to know how fast is "fast enough" (and even for that, you need to figure out how much latency there's going to be due to other factors, such as network latency). It also obviously depends on what your implementation actually is; if all your program does is fetch records from a database, then the code execution will probably be dwarfed by database and network latency whether you use pure Python or C. OTOH, if you're solving arbitrarily large NP-hard computational problems, Python might be starting to look a little less attractive. OTOOH, if you're solving really tricky to implement computational problems, Python will probably dramatically decrease the time it takes you to have your service at all, and a slow service is usually preferable to a non-existent one.
With no actual concrete knowledge, the existence of other web services written in Python makes me intuit that you'll probably be fine in Python, and you should just go and do it and then see if there are any performance bottlenecks that would benefit from being Pyrexed. There's the usual "premature optimisation is the root of all evil" line to consider; before you've even written any code is WAY too early to be thinking about optimisation. As long as it's not blindingly obvious that your approach can never be fast enough, go with the simplest implementation and speed it up later.

Really, the only way you can know (IMO) is to try it and see. If and when you start experiencing performance issues, then it is time to profile, and see if it is code execution, or something else causing the delays.
Personally, I think you will have no problems. But then again, it depends on what exactly your web service is doing.

If you think about translating the code you haven't even written yet to C you might as well write your web service in C from the start. That'll get you the lowest latency possible.

As I understand it it, you don't want to use Pyrex anyway. You want to use Cython as it's a more advanced version of the same thing.
Secondly, surely the beauty of using something like Cython is that you can just write your code in Python, and if it's not fast enough, the changes aren't huge to get the speedups you need.
Optimise when you know there's a problem.

Python/Django for an enterprise large scale web based system?

My company is highly dependent on Java and jsf; All projects since I was hired are implemented using them. But most of those projects are facing problems related to performance and availability. So am finally considering a shift to other technologies and I have tried to research in the net, and im about to decide to try python. But before i start i would like to hear ur answer that Python would solve me the performance problems we are facing.
To make things clear the performance problems we mostly face are related to glassfish server and page loading. We are currently using ice faces and have tried wood stock back then. Additionally I can't use .net for some policy related issues. And PHP is also out of question due to some security leaks experienced in earlier projects.
So am expecting to read the pros and cons related with performance and availability in trying to convince my boss and customers in to python.

I have some doubts that you will gain performance by using Django or a Python based solution. I don't know the Glassfish server nor how it scales up but unless badly designed I don't see why it should perform badly.
From the explanation of your performance issues, it doesn't seem to be a problem of language speed but instead, server configuration and availability.
Assuming that your Java code is reasonably optimal (i.e. efficient and acceptably fast), you won't solve the problem by using some Python solution. Instead you should invest some time into studying caching mechanisms and/or proxy solutions.
Depending on how your server is setup, an additional advice would be to let all the static content be served by a dedicated server such as Apache, nginx or similar and only leave the the dynamic content on to be interpreted by your glassfish server.
Since your projects are written in Java you are in theory using a language that can potentially be faster than Python, I don't see why a Python solution would perform better unless there is something wrong with the framework you are using.
If you want to talk about prototyping or faster development, then that's a different subject discussed multiple times on stackoverflow.

Performance of Python worth the cost?

I'm looking at implementing a fuzzy logic controller based on either PyFuzzy (Python) or FFLL (C++) libraries.
I'd prefer to work with python but am unsure if the performance will be acceptable in the embedded environment it will work in (either ARM or embedded x86 proc both ~64Mbs of RAM).
The main concern is that response times are as fast as possible (an update rate of 5hz+ would be ideal >2Hz is required). The system would be reading from multiple (probably 5) sensors from an RS232 port and provide 2/3 outputs based on the results of the fuzzy evaluation.
Should I be concerned that Python will be too slow for this task?

In general, you shouldn't obsess over performance until you've actually seen it become a problem. Since we don't know the details of your app, we can't say how it'd perform if implemented in Python. And since you haven't implemented it yet, neither can you.
Implement the version you're most comfortable with, and can implement fastest, first. Then benchmark it. And if it is too slow, you have three options which should be done in order:
First, optimize your Python code
If that's not enough, write the most performance-critical functions in C/C++, and call that from your Python code
And finally, if you really need top performance, you might have to rewrite the whole thing in C++. But then at least you'll have a working prototype in Python, and you'll have a much clearer idea of how it should be implemented. You'll know what pitfalls to avoid, and you'll have an already correct implementation to test against and compare results to.

Python is very slow at handling large amounts of non-string data. For some operations, you may see that it is 1000 times slower than C/C++, so yes, you should investigate into this and do necessary benchmarks before you make time-critical algorithms in Python.
However, you can extend python with modules in C/C++ code, so that time-critical things are fast, while still being able to use python for the main code.

Make it work, then make it work fast.

If most of your runtime is spent in C libraries, the language you use to call these libraries isn't important. What language are your time-eating libraries written in ?

From your description, speed should not be much of a concern (and you can use C, cython, whatever you want to make it faster), but memory would be. For environments with 64 Mb max (where the OS and all should fit as well, right ?), I think there is a good chance that python may not be the right tool for target deployment.
If you have non trivial logic to handle, I would still prototype in python, though.

I never really measured the performance of pyfuzzy's examples, but as the new version 0.1.0 can read FCL files as FFLL does. Just describe your fuzzy system in this format, write some wrappers, and check the performance of both variants.
For reading FCL with pyfuzzy you need the antlr python runtime, but after reading you should be able to pickle the read object, so you don't need the antlr overhead on the target.

Will python provide enough performance for a proxy?

I want to start writing a http proxy that will modify responses according to some rules/filters I will configure. However, before I start coding it, I want to make sure I'm making the right choice in going with Python. Later, this tool would have to be able to process a lot of requests, so, I would like to know I can count on it later on to be able to perform when "push comes to shove".

As long as the bulk of the processing uses Python's built-in modules it should be fine as far as performance. The biggest strength of Python is its clear syntax and ease of testing/maintainability. If you find that one section of your code is slowing down the process, you can rewrite that section and use it as a C module, while keeping the bulk of your control code in Python.
However if you're looking to make the most optimized Python Code you may want to check out this SO post.

Yes, I think you will find Python to be perfectly adequate for your needs. There's a huge number of web frameworks, WSGI libraries, etc. to choose from, or learn from when building your own.
There's an interesting post on the Python History blog about how Python was supporting high performance websites in 1996.

This will depend on the library you use more than the language itself. The twisted framework is known to scale well.
Here's a proxy server example in python/twisted to get you started.
Bottomline: choose your third party tools wisely and I'm sure you'll be fine.

Python performs pretty well for most tasks, but you'll need to change the way you program if you're used to other languages. See Python is not Java for more info.
If plain old CPython doesn't give the performance you need, you have other options as well.
As has been mentioned, you can extend it in C (using a tool like swig or Pyrex). I also hear good things about PyPy as well, but bear in mind that it uses a restricted subset of Python. Lastly, a lot of people use psyco to speed up performance.

If it is decided that our system needs an overhaul, what is the best way to go about it?

We are mainting a web application that is built on Classic ASP using VBScript as the primary language. We are in agreement that our backend (framework if you will) is out dated and doesn't provide us with the proper tools to move forward in a quick manner. We have pretty much embraced the current webMVC pattern that is all over the place, and cannot do it, in a reasonable manner, with the current technology. The big missing features are proper dispatching and templating with inheritance, amongst others.
Currently there are two paths being discussed:
Port the existing application to Classic ASP using JScript, which will allow us to hopefully go from there to .NET MSJscript without too much trouble, and eventually end up on the .NET platform (preferably the MVC stuff will be done by then, ASP.NET isn't much better than were we are on now, in our opinions). This has been argued as the safer path with less risk than the next option, albeit it might take slightly longer.
Completely rewrite the application using some other technology, right now the leader of the pack is Python WSGI with a custom framework, ORM, and a good templating solution. There is wiggle room here for even django and other pre-built solutions. This method would hopefully be the quickest solution, as we would probably run a beta beside the actual product, but it does have the potential for a big waste of time if we can't/don't get it right.
This does not mean that our logic is gone, as what we have built over the years is fairly stable, as noted just difficult to deal with. It is built on SQL Server 2005 with heavy use of stored procedures and published on IIS 6, just for a little more background.
Now, the question. Has anyone taken either of the two paths above? If so, was it successful, how could it have been better, etc. We aren't looking to deviate much from doing one of those two things, but some suggestions or other solutions would potentially be helpful.

Don't throw away your code!
It's the single worst mistake you can make (on a large codebase). See Things You Should Never Do, Part 1.
You've invested a lot of effort into that old code and worked out many bugs. Throwing it away is a classic developer mistake (and one I've done many times). It makes you feel "better", like a spring cleaning. But you don't need to buy a new apartment and all new furniture to outfit your house. You can work on one room at a time... and maybe some things just need a new paintjob. Hence, this is where refactoring comes in.
For new functionality in your app, write it in C# and call it from your classic ASP. You'll be forced to be modular when you rewrite this new code. When you have time, refactor parts of your old code into C# as well, and work out the bugs as you go. Eventually, you'll have replaced your app with all new code.
You could also write your own compiler. We wrote one for our classic ASP app a long time ago to allow us to output PHP. It's called Wasabi and I think it's the reason Jeff Atwood thought Joel Spolsky went off his rocker. Actually, maybe we should just ship it, and then you could use that.
It allowed us to switch our entire codebase to .NET for the next release while only rewriting a very small portion of our source. It also caused a bunch of people to call us crazy, but writing a compiler is not that complicated, and it gave us a lot of flexibility.
Also, if this is an internal only app, just leave it. Don't rewrite it - you are the only customer and if the requirement is you need to run it as classic asp, you can meet that requirement.

Use this as an opportunity to remove unused features! Definitely go with the new language. Call it 2.0. It will be a lot less work to rebuild the 80% of it that you really need.
Start by wiping your brain clean of the whole application. Sit down with a list of its overall goals, then decide which features are needed based on which ones are used. Then redesign it with those features in mind, and build.
(I love to delete code.)

It works out better than you'd believe.
Recently I did a large reverse-engineering job on a hideous old collection of C code. Function by function I reallocated the features that were still relevant into classes, wrote unit tests for the classes, and built up what looked like a replacement application. It had some of the original "logic flow" through the classes, and some classes were poorly designed [Mostly this was because of a subset of the global variables that was too hard to tease apart.]
It passed unit tests at the class level and at the overall application level. The legacy source was mostly used as a kind of "specification in C" to ferret out the really obscure business rules.
Last year, I wrote a project plan for replacing 30-year old COBOL. The customer was leaning toward Java. I prototyped the revised data model in Python using Django as part of the planning effort. I could demo the core transactions before I was done planning.
Note: It was quicker to build a the model and admin interface in Django than to plan the project as a whole.
Because of the "we need to use Java" mentality, the resulting project will be larger and more expensive than finishing the Django demo. With no real value to balance that cost.
Also, I did the same basic "prototype in Django" for a VB desktop application that needed to become a web application. I built the model in Django, loaded legacy data, and was up and running in a few weeks. I used that working prototype to specify the rest of the conversion effort.
Note: I had a working Django implementation (model and admin pages only) that I used to plan the rest of the effort.
The best part about doing this kind of prototyping in Django is that you can mess around with the model, unit tests and admin pages until you get it right. Once the model's right, you can spend the rest of your time fiddling around with the user interface until everyone's happy.

Whatever you do, see if you can manage to follow a plan where you do not have to port the application all in one big bang. It is tempting to throw it all away and start from scratch, but if you can manage to do it gradually the mistakes you do will not cost so much and cause so much panic.

Half a year ago I took over a large web application (fortunately already in Python) which had some major architectural deficiencies (templates and code mixed, code duplication, you name it...).
My plan is to eventually have the system respond to WSGI, but I am not there yet. I found the best way to do it, is in small steps. Over the last 6 month, code reuse has gone up and progress has accelerated.
General principles which have worked for me:
Throw away code which is not used or commented out
Throw away all comments which are not useful
Define a layer hierarchy (models, business logic, view/controller logic, display logic, etc.) of your application. This has not to be very clear cut architecture but rather should help you think about the various parts of your application and help you better categorize your code.
If something grossly violates this hierarchy, change the offending code. Move the code around, recode it at another place, etc. At the same time adjust the rest of your application to use this code instead of the old one. Throw the old one away if not used anymore.
Keep you APIs simple!
Progress can be painstakingly slow, but should be worth it.

I would not recommend JScript as that is definitely the road less traveled.
ASP.NET MVC is rapidly maturing, and I think that you could begin a migration to it, simultaneously ramping up on the ASP.NET MVC framework as its finalization comes through.
Another option would be to use something like ASP.NET w/Subsonic or NHibernate.

Don't try and go 2.0 ( more features then currently exists or scheduled) instead build your new platform with the intent of resolving the current issues with the code base (maintainability/speed/wtf) and go from there.

A good place to begin if you're considering the move to Python is to rewrite your administrator interface in Django. This will help you get some of the kinks worked out in terms of getting Python up and running with IIS (or to migrate it to Apache). Speaking of which, I recommend isapi-wsgi. It's by far the easiest way to get up and running with IIS.

I agree with Michael Pryor and Joel that it's almost always a better idea to continue evolving your existing code base rather than re-writing from scratch. There are typically opportunities to just re-write or re-factor certain components for performance or flexibility.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.