psyco seems to be quite helpful in optimizing Python code, and it does it in a very non-intrusive way.
Therefore, one has to wonder: assuming you're always on an x86 architecture (which is where most apps run these days), why not just always use psyco for all Python code? Does it sometimes make mistakes and ruin the correctness of the program? Does it increase the runtime in some weird cases?
Have you had any negative experiences with it? My most negative experience so far was that it made my code faster by only 15%. Usually it's better.
Naturally, using psyco is not a replacement for efficient algorithms and coding. But if you can improve the performance of your code for the cost of two lines (importing and calling psyco), I see no good reason not to.
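For reference, the two lines I mean look roughly like this (guarded, since psyco may not be installed or may not be available for your interpreter):

    try:
        import psyco   # only available on 32-bit x86 builds of CPython
        psyco.full()   # JIT-compile every function as it is first executed
    except ImportError:
        pass           # psyco not available; run as plain CPython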
1) The memory overhead is the main one, as described in other answers. You also pay the compilation cost, which can be prohibitive if you aren't selective. From the user reference:
Compiling everything is often overkill for medium- or large-sized applications. The drawbacks of compiling too much are in the time spent compiling, plus the amount of memory that this process consumes. It is a subtle balance to keep.
2) Performance can actually be harmed by Psyco compilation. Again from the user guide ("known bugs" section):
There are also performance bugs: situations in which Psyco slows down the code instead of accelerating it. It is difficult to make a complete list of the possible reasons, but here are a few common ones:
The built-in map and filter functions must be avoided and replaced by list comprehension. For example, map(lambda x: x*x, lst) should be replaced by the more readable but more recent syntax [x*x for x in lst].
The compilation of regular expressions doesn't seem to benefit from Psyco. (The execution of regular expressions is unaffected, since it is C code.) Don't enable Psyco on this module; if necessary, disable it explicitly, e.g. by calling psyco.cannotcompile(re.compile).
3) Finally, there are some relatively obscure situations where using Psyco will actually introduce bugs. Some of them are listed here.
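If you do decide to use it, being selective helps with both points 1) and 2). Here is a minimal sketch using psyco's documented entry points (the hot function below is just a placeholder):

    def hot_inner_loop(data):
        # placeholder for a function that profiling showed to be a hot spot
        return sum(x * x for x in data)

    try:
        import psyco
        import re
        psyco.bind(hot_inner_loop)        # compile just this function, not everything
        # psyco.profile()                 # or: let psyco pick hot functions at runtime
        psyco.cannotcompile(re.compile)   # keep psyco away from code it handles badly
    except ImportError:
        pass                              # psyco not installed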
Psyco currently uses a lot of memory.
It only runs on Intel 386-compatible processors (under any OS) right now.
There are some subtle semantic differences (i.e. bugs) with the way Python works; they should not be apparent in most programs.
See also the caveats section. For a concrete example, I noticed that my web app with Cheetah-generated templates and DB I/O gained no appreciable speedup.
When using pyglet I found that I couldn't use psyco on the entire app without making my app non-functional. I could use it in small sections of math-heavy code, of course, but it wasn't necessary, so I didn't bother.
Also, psyco has done strange things with my profiling results (such as, well, not altering them at all from the non-psyco version). I suspect it doesn't play nice with the profiling code.
I just don't really use it unless I really want the speed, which is not all that often. My priority is algorithm optimization, which generally results in nicer speedups.
It also depends on where your bottleneck is. I mostly do web apps, and there the bottlenecks are more likely I/O and the database. So you should know where to optimize.
Also, you should probably think about your code first instead of directly throwing psyco at it. So I agree with Devin that algorithm optimizations should come first, and they carry a smaller chance of unwanted side effects.
psyco is dead and no longer maintained. It is time to find another solution, such as PyPy.
One should never rely on some magic bullet to fix your problems. Using psyco to make a slow program faster is usually not necessary. Bad algorithms can be rewritten, and parts that require speed can be written in another language. Of course, your question asks why we don't use it for the speed boost anyway, and there's a bit of overhead when you use psyco. Psyco uses memory, and those two lines just sort of feel like overhead when you look at them. As for my personal reason why I don't use psyco: it doesn't support x86_64, which I see as the new up-and-coming architecture (especially with 2038 approaching sooner or later). My alternative is PyPy, but I'm not entirely fond of that either.
A couple of other things:
It doesn't seem to be very actively maintained.
It can be a memory hog.
Quite simply: "Because the code already runs fast enough".
I'm rephrasing my question because I think many thought it was the question "does Python have threads?". It does, but CPython also has the GIL, which will never let more than one thread execute Python bytecode at any given time. That makes CPython threads useless for CPU-intensive computations.
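To illustrate what I mean, here is a minimal sketch (timings are machine-dependent): two threads doing pure-Python arithmetic take about as long as, or longer than, doing the same work sequentially.

    import threading
    import time

    def burn(n):
        total = 0
        for i in xrange(n):   # Python 2.7; use range() on Python 3
            total += i * i
        return total

    N = 10 ** 7

    start = time.time()
    burn(N); burn(N)
    print("sequential: %.2fs" % (time.time() - start))

    start = time.time()
    threads = [threading.Thread(target=burn, args=(N,)) for _ in range(2)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("threaded:   %.2fs" % (time.time() - start))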
I need to use threads; process parallelism won't work for me because of the IPC costs (I have large shared objects).
I'm currently using Jython (no GIL) with JyNI so that I can use numpy. I got this to work, since JyNI does now support numpy. However, JyNI is alpha and buggy, and the whole process is slow.
I've read a bunch of old threads. I wonder whether a viable option has emerged since then. I'm forced to use Python 2.7.
Thanks.
At the moment, Jython is still considerably slower than CPython. Depending on the program and how well the JIT can optimize it, multithreading might or might not pay off. Jython's primary design goal is compatibility, before performance. It is mainly intended for glue code and there is still a lot of potential for efficiency improvements. See e.g. zippy for a blazingly fast Python implementation in Java; however, it is experimental and lacks Jython's compatibility level. In a way it represents the opposite design goal.
Now, adding JyNI to Jython does not exactly make it faster, but so far I have found that performance optimization in JyNI would be premature, and usually the Jython part dominates the runtime anyway. Also, e.g. for NumPy, the native numerics workload vastly dominates the glue-code cost.
Finally, note that JyNI must emulate a GIL on the C side. For details, have a look at the paper https://arxiv.org/abs/1607.00825. Maybe it will be possible to operate certain extensions without a GIL; it depends on implementation details and on how sensitive an extension is to that. Currently the C-side GIL is mandatory. That's why you might not benefit from Java multithreading when using NumPy. C extensions have the option to explicitly release the GIL, e.g. during computationally intense operations that don't interact with the interpreter. I don't know if NumPy makes use of this.
JyNI is alpha and buggy
Please make sure to report bugs at the issue tracker.
info:
I'm using Django.
question:
Is Python's speed enough for providing a low latency web service or should I translate my functions to C using Pyrex?
Lots of people do use Python to implement web services (hence Django existing at all), and find it low enough latency for their purposes. So in one sense, the answer is a trivial "yes".
To answer properly requires lots more information and study, and isn't really appropriate for SO's format. For starters, you need to know how fast is "fast enough" (and even for that, you need to figure out how much latency there's going to be due to other factors, such as network latency). It also obviously depends on what your implementation actually is; if all your program does is fetch records from a database, then the code execution will probably be dwarfed by database and network latency whether you use pure Python or C. OTOH, if you're solving arbitrarily large NP-hard computational problems, Python might be starting to look a little less attractive. OTOOH, if you're solving really tricky to implement computational problems, Python will probably dramatically decrease the time it takes you to have your service at all, and a slow service is usually preferable to a non-existent one.
With no actual concrete knowledge, the existence of other web services written in Python makes me intuit that you'll probably be fine in Python, and you should just go and do it and then see if there are any performance bottlenecks that would benefit from being Pyrexed. There's the usual "premature optimisation is the root of all evil" line to consider; before you've even written any code is WAY too early to be thinking about optimisation. As long as it's not blindingly obvious that your approach can never be fast enough, go with the simplest implementation and speed it up later.
Really, the only way you can know (IMO) is to try it and see. If and when you start experiencing performance issues, then it is time to profile, and see if it is code execution, or something else causing the delays.
Personally, I think you will have no problems. But then again, it depends on what exactly your web service is doing.
If you think about translating the code you haven't even written yet to C you might as well write your web service in C from the start. That'll get you the lowest latency possible.
As I understand it, you don't want to use Pyrex anyway. You want to use Cython, as it's a more advanced version of the same thing.
Secondly, surely the beauty of using something like Cython is that you can just write your code in Python, and if it's not fast enough, the changes aren't huge to get the speedups you need.
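For example (a sketch with a hypothetical module name): once a function shows up as a bottleneck, you can move it into a .pyx file and keep importing it from Python, letting Cython compile it on the fly:

    # hot_math.pyx (hypothetical) sits next to this file; it starts out as
    # ordinary Python code, and you add cdef type annotations only if needed.
    import pyximport
    pyximport.install()    # compiles .pyx modules transparently on import

    import hot_math        # hypothetical module containing the slow function
    result = hot_math.heavy_calculation(42)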
Optimise when you know there's a problem.
I have a huge grammar developed for pyparsing as part of a large, pure Python application.
I have reached the limit of performance tweaking and I'm at the point where the diminishing returns make me start to look elsewhere. Yes, I think I know most of the tips and tricks and I've profiled my grammar and my application to dust.
What next?
I hope to find a parser that gives me the same readability, usability (I'm using many advanced features of pyparsing such as parse-actions to start the post processing of the input which is being parsed) and python integration but at 10× the performance.
I love the fact that the grammar is pure Python.
All my basic blocks are regular expressions, so reusing them would be nice.
I know I can't have everything so I am willing to give up on some of the features I have today to get to the requested 10× performance.
Where do I go from here?
It looks like the pyparsing folks have anticipated your problem. From https://github.com/pyparsing/pyparsing/blob/master/docs/HowToUsePyparsing.rst :
Performance of pyparsing may be slow for complex grammars and/or large input strings. The psyco package can be used to improve the speed of the pyparsing module with no changes to grammar or program logic - observed improvements have been in the 20-50% range.
However, as Vangel noted in the comments below, psyco is an obsolete project as of March 2012. Its successor is the PyPy project, which starts from the same basic approach to performance: use a JIT native-code compiler instead of a bytecode interpreter. You should be able to achieve similar or greater gains with PyPy if switching Python implementations will work for you.
If you're really a speed demon, but want to keep some of the legibility and declarative syntax, I'd suggest having a look at ANTLR. Probably not the Python-generating backend; I'm skeptical whether that's mature or high-performance enough for your needs. I'm talking about the goods: the C backend that started it all.
Wrap a Python C extension module around the entry point to the parser, and turn it loose.
Having said that, you'll be giving up a lot in this transition: basically any Python you want to do in your parser will have to be done through the C API (not altogether pretty). Also, you'll have to get used to very different ways of doing things. ANTLR has its charms, but it's not based on combinators, so there's not the easy and fluid relationship between your grammar and your language that there is in pyparsing. Plus, it's its own DSL, much like lex/yacc, which can present a learning curve – but, because it's LL based, you'll probably find it easier to adapt to your needs.
Switch to a generated C/C++ parser (using ANTLR, flex/bison, etc.). If you can delay all the action rules until after you are done parsing, you might be able to build an AST with trivial code and then pass that back to your Python code via something like SWIG and process it with your current action rules. OTOH, for that to give you a speed boost, the parsing has to be the heavy lifting. If your action rules are the big cost, then this will buy you nothing unless you write your action rules in C as well (but you might have to do it anyway to avoid paying for whatever impedance mismatch you get between the Python and C code).
If you really want performance for large grammars, look no farther than SimpleParse (which itself relies on mxTextTools, a C extension). However, know now that it comes at the cost of being more cryptic and requiring that you be well-versed in EBNF.
It's definitely not the more Pythonic route, and you're going to have to start all over with an EBNF grammar to use SimpleParse.
A bit late to the party, but PLY (Python Lex-Yacc) has served me very well. PLY gives you a pure Python framework for constructing lex-based tokenizers and yacc-based LR parsers.
I went this route when I hit performance issues with pyparsing.
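To give a flavour of it, here is a minimal sketch of a PLY lexer and parser (a toy calculator, not your grammar):

    import ply.lex as lex
    import ply.yacc as yacc

    # --- lexer ---
    tokens = ('NUMBER', 'PLUS', 'TIMES')

    t_PLUS = r'\+'
    t_TIMES = r'\*'
    t_ignore = ' \t'

    def t_NUMBER(t):
        r'\d+'
        t.value = int(t.value)
        return t

    def t_error(t):
        raise SyntaxError("illegal character %r" % t.value[0])

    lexer = lex.lex()

    # --- parser (grammar rules live in the docstrings) ---
    def p_expr_plus(p):
        'expr : expr PLUS term'
        p[0] = p[1] + p[3]

    def p_expr_term(p):
        'expr : term'
        p[0] = p[1]

    def p_term_times(p):
        'term : term TIMES NUMBER'
        p[0] = p[1] * p[3]

    def p_term_number(p):
        'term : NUMBER'
        p[0] = p[1]

    def p_error(p):
        raise SyntaxError("syntax error at %r" % (p,))

    parser = yacc.yacc()
    print(parser.parse("2+3*4", lexer=lexer))   # -> 14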
Here is a somewhat old but still interesting article on Python parsing which includes benchmarks for ANTLR, PLY and pyparsing. PLY is roughly 4 times faster than pyparsing in this test.
There's no way to know what kind of benefit you'll get without just testing it, but it's within the range of possibility that you could get 10x benefit just from using Unladen Swallow if your process is long-running and repetitive. (Also, if you have many things to parse and you typically start a new interpreter for each one, Unladen Swallow gets faster - to a point - the longer you run your process, so while parsing one input might not show much gain, you might get significant gains on the 2nd and 3rd inputs in the same process).
(Note: pull the latest out of SVN - you'll get far better performance than the latest tarball)
Recently I developed a billing application for my company with Python/Django. For a few months everything was fine, but now I am observing that performance is dropping as more and more users use the application. The problem is that the application is now very critical for the finance team, and they are after my life to sort out the performance issue. I have no other option but to find a way to increase the performance of the billing application.
So do you guys know any performance optimization techniques in Python that will really help me with the scalability issue?
Guys, we are using a MySQL database and it's hosted on an Apache web server on a Linux box. Secondly, what I have noticed is that the overall application is slow, not the database transactional part. For example, once the application is loaded it works fine, but if they navigate to another link in the application, it takes a whole lot of time.
And yes, we are using HTML, CSS and JavaScript.
As I said in comment, you must start by finding what part of your code is slow.
Nobody can help you without this information.
You can profile your code with the Python profilers, then come back to us with the results.
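For example, a minimal sketch with the standard cProfile module (generate_invoices is just a placeholder for whatever view or function you suspect):

    import cProfile
    import pstats

    # run the suspect code under the profiler and dump the stats to a file
    cProfile.run('generate_invoices()', 'billing.prof')   # placeholder callable

    # print the 20 most expensive calls by cumulative time
    stats = pstats.Stats('billing.prof')
    stats.sort_stats('cumulative').print_stats(20)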
If it's a web app, the first suspect is generally the database. If it's a computation-intensive GUI app, then look at the calculation algorithms first.
But remember that performance issues can be highly unintuitive, and therefore an objective assessment is the only way to go.
OK, not entirely to the point, but before you go and start fixing it, make sure everyone understands the situation. It seems to me that they're putting some pressure on you to fix the "problem".
Well, first of all: when you wrote the application, did they specify the performance requirements? Did they tell you that operation X needs to take less than Y seconds to complete? Did they specify how many concurrent users must be supported without a performance penalty? If not, then tell them to back off, and that this is iteration (phase, stage, whatever) one of the deployment, where the main goal was functionality and testing. Phase two is performance improvements. Let them (with your help, obviously) come up with some non-functional requirements for the performance of your system.
By doing all this: a) you'll remove the pressure applied by the finance team (and I know they can be a real pain in the bum), b) both you and your clients will have a clear idea of what you mean by "performance", c) you'll have a baseline against which you can measure your progress, and most importantly, d) you'll have some agreed time to implement/fix the performance issues.
PS. that aside, look at the indexing... :)
A surprising feature of Python is that Pythonic code is quite efficient... So a few general hints:
Use built-ins and standard functions whenever possible, they're already quite well optimized.
Try to use lazy generators instead of one-off temporary lists.
Use numpy for vector arithmetic.
Use psyco if running on x86 32bit.
Write performance critical loops in a lower level language (C, Pyrex, Cython, etc.).
When calling the same method on a collection of objects, get a reference to the class function and use it; this saves lookups in the objects' dictionaries (this one is a micro-optimization, and I'm not sure it's worth it; see the sketch after this list).
And of course, if scalability is what matters:
Use O(n) (or better) algorithms! Otherwise your system cannot be linearly scalable.
Write multiprocessor aware code. At some point you'll need to throw more computing power at it, and your software must be ready to use it!
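A small, self-contained sketch of two of the hints above (the Order class and the data are hypothetical, purely for illustration):

    class Order(object):
        def __init__(self, amount):
            self.amount = amount

        def compute_total(self, rate):
            return self.amount * rate

    orders = [Order(a) for a in range(1000)]

    # Lazy generator instead of a one-off temporary list: no intermediate list is built.
    total = sum(order.amount for order in orders)

    # Method-reference micro-optimization: look the method up once, not once per object.
    compute = Order.compute_total
    totals = [compute(order, 1.2) for order in orders]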
Before you can "fix" something you need to know what is "broken". In software development that means profiling, profiling, profiling. Did I mention profiling? Without profiling you don't know where CPU cycles and wall-clock time are going. Like others have said, to get more useful information you need to post the details of your entire stack: Python version, what you are using to store the data (MySQL, PostgreSQL, flat files, etc.), which web server interface (CGI, FastCGI, WSGI, Passenger, etc.), and how you are generating the HTML, CSS and, presumably, JavaScript. Then you can get more specific answers for those tiers.
You may be interested in this document I've found some time ago.
As personal advice, be as Pythonic as you can: lazy evaluation is the key idea, so learn to use iterators and generators.
For the type of application you are describing (a web application probably backed by a database) your performance problems are unlikely to be language specific. They are far more likely to stem from design or architecture issues, though they could be simple coding problems too.
To sort this out you need to figure out where the bottlenecks are in your application and for that you need some sort of profiler.
Once you have found your bottlenecks you will be in a much better position. You can then evaluate the problem areas for common issues, including:
Design and Architecture issues
SQL anti-patterns
Incorrect usage of your framework (perhaps relying on inappropriate defaults)
Badly structured algorithms
The specifics of any solution are going to depend on the specifics of the bottlenecks you find.
http://wiki.python.org/moin/PythonSpeed/PerformanceTips
I optimized some Python code a while back, and the most surprising thing to me was how much each function call costs. If you minimize function calls or replace loops with builtins, you'll be running much faster.
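As a rough illustration (the numbers vary by machine and interpreter), you can measure the per-call overhead with timeit:

    import timeit

    # the same addition, inline vs. wrapped in a function call
    print(timeit.timeit('x = i + 1', setup='i = 1'))
    print(timeit.timeit('x = add(i)', setup='def add(i): return i + 1\ni = 1'))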
There are some great suggestions here… So let me suggest an implementation detail. I have found the runprofileserver command found in django-command-extensions very convenient for profiling my Django code.
I am not sure if this would solve the problem but you should have a look at psyco
I'm looking at implementing a fuzzy logic controller based on either PyFuzzy (Python) or FFLL (C++) libraries.
I'd prefer to work with Python but am unsure if the performance will be acceptable in the embedded environment it will run in (either an ARM or an embedded x86 processor, both with ~64 MB of RAM).
The main concern is that response times are as fast as possible (an update rate of 5 Hz or more would be ideal; more than 2 Hz is required). The system would be reading from multiple (probably 5) sensors over an RS-232 port and providing 2-3 outputs based on the results of the fuzzy evaluation.
Should I be concerned that Python will be too slow for this task?
In general, you shouldn't obsess over performance until you've actually seen it become a problem. Since we don't know the details of your app, we can't say how it'd perform if implemented in Python. And since you haven't implemented it yet, neither can you.
Implement the version you're most comfortable with, and can implement fastest, first. Then benchmark it. And if it is too slow, you have three options which should be done in order:
First, optimize your Python code
If that's not enough, write the most performance-critical functions in C/C++ and call them from your Python code (see the sketch after this list)
And finally, if you really need top performance, you might have to rewrite the whole thing in C++. But then at least you'll have a working prototype in Python, and you'll have a much clearer idea of how it should be implemented. You'll know what pitfalls to avoid, and you'll have an already correct implementation to test against and compare results to.
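As a sketch of the second option, here is the ctypes route on a Unix-like system, using the C math library's cos() as a stand-in for your own performance-critical function compiled into a shared library:

    import ctypes
    import ctypes.util

    # load the system math library (stand-in for your own .so built from C/C++)
    libm = ctypes.CDLL(ctypes.util.find_library('m'))
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]

    print(libm.cos(0.0))   # 1.0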
Python is very slow at handling large amounts of non-string data. For some operations, you may see that it is 1000 times slower than C/C++, so yes, you should investigate into this and do necessary benchmarks before you make time-critical algorithms in Python.
However, you can extend python with modules in C/C++ code, so that time-critical things are fast, while still being able to use python for the main code.
Make it work, then make it work fast.
If most of your runtime is spent in C libraries, the language you use to call these libraries isn't important. What language are your time-eating libraries written in?
From your description, speed should not be much of a concern (and you can use C, Cython, whatever you want to make it faster), but memory would be. For environments with 64 MB max (where the OS and everything else should fit as well, right?), I think there is a good chance that Python may not be the right tool for the target deployment.
If you have non-trivial logic to handle, I would still prototype in Python, though.
I never really measured the performance of pyfuzzy's examples, but the new version 0.1.0 can read FCL files, as FFLL does. Just describe your fuzzy system in this format, write some wrappers, and check the performance of both variants.
For reading FCL with pyfuzzy you need the ANTLR Python runtime, but after reading you should be able to pickle the parsed object, so you don't need the ANTLR overhead on the target.
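For example, a sketch of that workflow (load_fcl is a placeholder for whatever pyfuzzy call you use to read the FCL file):

    import pickle

    # On the development machine: parse once (this needs the ANTLR runtime)...
    system = load_fcl("controller.fcl")          # placeholder for the pyfuzzy reader
    with open("controller.pickle", "wb") as f:
        pickle.dump(system, f, pickle.HIGHEST_PROTOCOL)

    # ...and on the embedded target: just unpickle, no ANTLR needed.
    with open("controller.pickle", "rb") as f:
        system = pickle.load(f)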