Python code optimization - traceback_utils error_handler - python

I am trying to speed up some python code. It's a fairly complex system, including a loop running inference of a Convolutional Neural network (Tensorflow 2) and subsequently handle its results in a reconstruction algorithm. The NN running on GPU is fast enough, but the subsequent python code can't catch up. So I profiled the code and found the hot-path to include a lot of error-handling stuff it seems:
In a production setup I'd love to get rid of this for performance. Is there any way I can achieve this in python? It seems there is some back-tracing/error-handling involved in each function call. Is that correct? If so, in C++ I'd probably try inlining functions as a remedy. Are there similar strategies in python? I'd appreciate any general advice.
Note: No minimum working example or code, as it's a fairly complex setup and I figure the question is not very specific to my code. But I can share the code and provide more detailed information if people think it'd be helpful.

Related

Python and C++ performance comparison

In a lecture I've encountered the following problem:
Given a simple program which computes the sum of a column in a large data set, performance of a python and a c++ implementation are being compared. The main bottleneck should be reading the data. The computation itself is rather simple. On first execution, the python version is about 2 times slower than c++ which makes sense.
Then on the second execution, the c++ program speeds up from 4 seconds to 1 second because apparently the "first execution is I/O bound, second is CPU bound". This still makes sense since probably the file contents were cached omitting the slow reading from disk.
However, the python implementation did not speed up at all on the second run, despite the warm cache. I know python is slow, but is it that slow? Does this mean that executing this simple computation in python is slower than reading about .7 GB from disk?
If this is always the case, I'm wondering why the biggest deep learning frameworks I know (PyTorch, tensorflow) have python apis. For real time object detection for example, it must be slower to parse the input (read frames from a video, maybe preprocess) to the network and to interpret the output, than performing the forward propagation itself on a gpu.
Have I misunderstood something? Thank you.
That's not so easy to answer without implementation details, but in general, python is known for it's much less cache friendliness, because you mostly haven't the option to low-level optimize cache behaviour in python. However, this isn't always correct. You propably can optimize the cache friendliness in python directly, or you use parts of c++ code for critical sections. But always consider, that you can just optimize your code better in C++. So if you have really critical code parts, where you want to achieve every percent of speed and effiency, you should use C++. That's the reason, that many programs use both, C++ for raw performance things and python for a nice interface and program structure.

How to use pypy with python existing calculation code?

I am little bit new to python and I have a large code base written in python 3.3.2 (32 bit). It uses numpy 1.7.1 and takes a very long time to run because of computationally intensive calculations.
I need to parallelize the code to increase the performance. I am thinking about using pypy to parallelize but am unsure how to use it with existing code.
I have search Google but couldn't find and appropriate or satisfactory answer. I also read about using cython but I am unsure how to use that as well.
Could anyone provide pointers on increasing the performance of my code?
Since you're new to Python, I highly recommend taking the time to survey all of the possibilities before jumping into something pypy which may not be appropriate for your needs. There are lots of ways to speed up NumPy code, and the best way really depends on exactly what you're doing.
A great starting off point is Ian Oszvald's High Performance Python tutorial. Don't just watch it: follow along and try out the examples!
From there, you should think about whether parallelization will help. There are several options for parallelization, like the stdlib's multiprocessing module, but a lot of people in the scientific space are using IPython's parallelization capabilities. For learning about this, check out Min Ragan-Kelley's IPython Parallel tutorial (pt 1, pt 2, pt 3).
Once you have some sense of what Python is capable of, pick a method for speeding up your code and try it out. When you run into more specific problems, StackOverflow will be able to provide more concrete answers than just some links to tutorials ;)

Instrumentation for numerical linear algebra in Python

I use numpy for numerical linear algebra. I suspect that I can get much better performance if I make small modifications in how I carry out certain computations so that they are more memory efficient, for example.
I was wondering if there is any form of instrumentation available in python to detect cache and TLB misses. There is a very nice api, PAPI, that I learned about in a recent class but it doesn't have a Python interface:
http://icl.cs.utk.edu/papi/overview/index.html
Also, is there a good way in general to profile numpy or other python numerical code? The timeit module is hard to integrate into code. mpi4py has a nice way to profile using the MPE library. A snippet from demo code (demo/mpe-logging/cpilog.py):
communication = MPE.newLogState("Comunicate", "red")
with communication:
comm.Bcast([n, MPI.INT], root=0)
A log file is created that can be displayed graphically. But this is a bit MPI specific.
Thanks.
Robert Kern (one of the NumPy devs) wrote line_profiler for exactly this scenario. It is more suited to profiling NumPy-heavy code than hotspot/cProfile.
Maybe one of the provided profilers might help you find the hotspots?
see profiling python
These will probably not give enough detail to trigger direct action, but should indicate where to look for improvement and help to determine the point of diminishing returns.

Just Curious about Python+Numpy to Realtime Gesture Recognition

i 'm just finish labs meeting with my advisor, previous code is written in matlab and it run offline mode not realtime mode, so i decide to convert to python+numpy (in offline version) but after labs meeting, my advisor raise issue about speed of realtime recognition, so i have doubt about speed of python+numpy to do this project. or better in c? my project is about using electronic glove (2x sensors) to get realtime data and do data processing, recognition process
NumPy is very fast if you follow some basic rules. You should avoid Python loops, using the operators provided by NumPy instead whenever you can. This and this should be a good starting points.
After reading through that, why don't you write some simple code in both Matlab and NumPy and compare the performance? If it performs well in NumPy, it should be enough to convince your advisor, especially if the code is representative of the actual algorithms you are using in your project.
Note: you should also see that your algorithm really is suited for realtime recognition.
I think the answer depends on three things: how well you code in Matlab, how well you code in Python/Numpy, and your algorithm. Both Matlab and Python can be fast for number crunching if you're diligent about vectorizing everything and using library calls.
If your Matlab code is already very good I would be surprised if you saw much performance benefit moving to Numpy unless there's some specific idiom you can use to your advantage. You might not even see a large benefit moving to C. I this case your effort would likely be better spent tuning your algorithm.
If your Matlab code isn't so good you could 1) write better Matlab code, 2) rewrite in good Numpy code, or 3) rewrite in C.
You might look at OpenCV, which has Python libs
ctypes-opencv
and opencv-cython;
I haven't used these myself.
Ideally you want to combine a fast-running C inner loop
with a flexible Python/Numpy play-with-algorithms.
Bytheway google "opencv gesture recognition" → 6680 hits.

What is MATLAB good for? Why is it so used by universities? When is it better than Python? [closed]

As it currently stands, this question is not a good fit for our Q&A format. We expect answers to be supported by facts, references, or expertise, but this question will likely solicit debate, arguments, polling, or extended discussion. If you feel that this question can be improved and possibly reopened, visit the help center for guidance.
Closed 10 years ago.
Locked. This question and its answers are locked because the question is off-topic but has historical significance. It is not currently accepting new answers or interactions.
I've been recently asked to learn some MATLAB basics for a class.
What does make it so cool for researchers and people that works in university?
I saw it's cool to work with matrices and plotting things... (things that can be done easily in Python using some libraries).
Writing a function or parsing a file is just painful. I'm still at the start, what am I missing?
In the "real" world, what should I think to use it for? When should it can do better than Python? For better I mean: easy way to write something performing.
UPDATE 1: One of the things I'd like to know the most is "Am I missing something?" :D
UPDATE 2: Thank you for your answers. My question is not about buy or not to buy MATLAB. The university has the possibility to give me a copy of an old version of MATLAB (MATLAB 5 I guess) for free, without breaking the license. I'm interested in its capabilities and if it deserves a deeper study (I won't need anything more than basic MATLAB in oder to pass the exam :P ) it will really be better than Python for a specific kind of task in the real world.
Adam is only partially right. Many, if not most, mathematicians will never touch it. If there is a computer tool used at all, it's going to be something like Mathematica or Maple. Engineering departments, on the other hand, often rely on it and there are definitely useful things for some applied mathematicians. It's also used heavily in industry in some areas.
Something you have to realize about MATLAB is that it started off as a wrapper on Fortran libraries for linear algebra. For a long time, it had an attitude that "all the world is an array of doubles (floats)". As a language, it has grown very organically, and there are some flaws that are very much baked in, if you look at it just as a programming language.
However, if you look at it as an environment for doing certain types of research in, it has some real strengths. It's about as good as it gets for doing floating point linear algebra. The notation is simple and powerful, the implementation fast and trusted. It is very good at generating plots and other interactive tasks. There are a large number of `toolboxes' with good code for particular tasks, that are affordable. There is a large community of users that share numerical codes (Python + NumPy has nothing in the same league, at least yet)
Python, warts and all, is a much better programming language (as are many others). However, it's a decade or so behind in terms of the tools.
The key point is that the majority of people who use MATLAB are not programmers really, and don't want to be.
It's a lousy choice for a general programming language; it's quirky, slow for many tasks (you need to vectorize things to get efficient codes), and not easy to integrate with the outside world. On the other hand, for the things it is good at, it is very very good. Very few things compare. There's a company with reasonable support and who knows how many man-years put into it. This can matter in industry.
Strictly looking at your Python vs. MATLAB comparison, they are mostly different tools for different jobs. In the areas where they do overlap a bit, it's hard to say what the better route to go is (depends a lot on what you're trying to do). But mostly Python isn't all that good at MATLAB's core strengths, and vice versa.
Most of answers do not get the point.
There is ONE reason matlab is so good and so widely used:
EXTREMELY FAST CODING
I am a computer vision phD student and have been using matlab for 4 years, before my phD I was using different languages including C++, java, php, python... Most of the computer vision researchers are using exclusively matlab.
1) Researchers need fast prototyping
In research environment, we have (hopefully) often new ideas, and we want to test them really quick to see if it's worth keeping on in that direction. And most often only a tiny sub-part of what we code will be useful.
Matlab is often slower at execution time, but we don't care much. Because we don't know in advance what method is going to be successful, we have to try many things, so our bottle neck is programming time, because our code will most often run a few times to get the results to publish, and that's all.
So let's see how matlab can help.
2) Everything I need is already there
Matlab has really a lot of functions that I need, so that I don't have to reinvent them all the time:
change the index of a matrix to 2d coordinate: ind2sub extract all patches of an image: im2col; compute a histogram of an image: hist(Im(:)); find the unique elements in a list unique(list); add a vector to all vectors of a matrix bsxfun(#plus,M,V); convolution on n-dimensional arrays convn(A); calculate the computation time of a sub part of the code: tic; %%code; toc; graphical interface to crop an image: imcrop(im);
The list could be very long...
And they are very easy to find by using the help.
The closest to that is python...But It's just a pain in python, I have to go to google each time to look for the name of the function I need, and then I need to add packages, and the packages are not compatible one with another, the format of the matrix change, the convolution function only handle doubles but does not make an error when I give it char, just give a wrong output... no
3) IDE
An example: I launch a script. It produces an error because of a matrix. I can still execute code with the command line. I visualize it doing: imagesc(matrix). I see that the last line of the matrix is weird. I fix the bug. All variables are still set. I select the remaining of the code, press F9 to execute the selection, and everything goes on. Debuging becomes fast, thanks to that.
Matlab underlines some of my errors before execution. So I can quickly see the problems. It proposes some way to make my code faster.
There is an awesome profiler included in the IDE. KCahcegrind is such a pain to use compared to that.
python's IDEs are awefull. python without ipython is not usable. I never manage to debug, using ipython.
+autocompletion, help for function arguments,...
4) Concise code
To normalize all the columns of a matrix ( which I need all the time), I do:
bsxfun(#times,A,1./sqrt(sum(A.^2)))
To remove from a matrix all colums with small sum:
A(:,sum(A)<e)=[]
To do the computation on the GPU:
gpuX = gpuarray(X);
%%% code normally and everything is done on GPU
To paralize my code:
parfor n=1:100
%%% code normally and everything is multi-threaded
What language can beat that?
And of course, I rarely need to make loops, everything is included in functions, which make the code way easier to read, and no headache with indices. So I can focus, on what I want to program, not how to program it.
5) Plotting tools
Matlab is famous for its plotting tools. They are very helpful.
Python's plotting tools have much less features. But there is one thing super annoying. You can plot figures only once per script??? if I have along script I cannot display stuffs at each step ---> useless.
6) Documentation
Everything is very quick to access, everything is crystal clear, function names are well chosen.
With python, I always need to google stuff, look in forums or stackoverflow.... complete time hog.
PS: Finally, what I hate with matlab: its price
I've been using matlab for many years in my research. It's great for linear algebra and has a large set of well-written toolboxes. The most recent versions are starting to push it into being closer to a general-purpose language (better optimizers, a much better object model, richer scoping rules, etc.).
This past summer, I had a job where I used Python + numpy instead of Matlab. I enjoyed the change of pace. It's a "real" language (and all that entails), and it has some great numeric features like broadcasting arrays. I also really like the ipython environment.
Here are some things that I prefer about Matlab:
consistency: MathWorks has spent a lot of effort making the toolboxes look and work like each other. They haven't done a perfect job, but it's one of the best I've seen for a codebase that's decades old.
documentation: I find it very frustrating to figure out some things in numpy and/or python because the documentation quality is spotty: some things are documented very well, some not at all. It's often most frustrating when I see things that appear to mimic Matlab, but don't quite work the same. Being able to grab the source is invaluable (to be fair, most of the Matlab toolboxes ship with source too)
compactness: for what I do, Matlab's syntax is often more compact (but not always)
momentum: I have too much Matlab code to change now
If I didn't have such a large existing codebase, I'd seriously consider switching to Python + numpy.
Hold everything. When's the last time you programed your calculator to play tetris? Did you actually think you could write anything you want in those 128k of RAM? Likely not. MATLAB is not for programming unless you're dealing with huge matrices. It's the graphing calculator you whip out when you've got Megabytes to Gigabytes of data to crunch and/or plot. Learn just basic stuff, but also don't kill yourself trying to make Python be a graphing calculator.
You'll quickly get a feel for when you want to crunch, plot or explore in MATLAB and when you want to have all that Python offers. Lots of engineers turn to pre and post processing in Python or Perl. Occasionally even just calling out to MATLAB for the hard bits.
They are such completely different tools that you should learn their basic strengths first without trying to replace one with the other. Granted for saving money I'd either use Octave or skimp on ease and learn to work with sparse matrices in Perl or Python.
MATLAB is great for doing array manipulation, doing specialized math functions, and for creating nice plots quick.
I'd probably only use it for large programs if I could use a lot of array/matrix manipulation.
You don't have to worry about the IDE as much as in more formal packages, so it's easier for students without a lot of programming experience to pick up.
MATLAB is a popular and widely adapted piece of a
sophisticated software package. It'd be a mistake to think
it's merely a math software since it has a wide range of
"toolboxes". I recently used Matplotlib to plot some data
from a database and it did the job without needing all the
bells and whistles of MATLAB. However, it may not be proper
to compare Python and MATLAB in every situation. As with
everything else the decision depends on what you need to do.
I used MATLAB in undergrad for control systems design and
simulation and also for image processing in grad school. For
these fields MATLAB makes the most sense because of the
powerful control and image processing toolboxes. As everyone
mentioned, array operations, which are used in every MATLAB
script you'd need to write, are very easy with MATLAB.
Another nice thing about MATLAB is that it's very easy and
fast to do prototyping and trying out ideas using the built
in toolbox functions. For instance, it takes no effort to
import an image and compute it's histogram or do some simple
processing on it. One disadvantage of MATLAB could be it's
speed because of its interpreted nature. However, if one
really needs speed than he can choose to implement the
tested logic in C/C++, etc.
For further comparison with Python, I can say that MATLAB
provides a full package for you to do your work without the
need of looking around for external libraries and
implementing extra functions.
One last point about MATLAB which I see is not mentioned in
the answers here is that it has a very powerful visual
modeling/simulation environment called Simulink. It's
easier to design and simulate larger systems with Simulink.
Finally, again, it all depends on the problem you need to
solve. If your problem domain can make use of one of
MATLAB's toolboxes and you have access to MATLAB then you
can be sure that you'll have the right tool to solve it.
MATLAB, as mentioned by others, is great at matrix manipulation, and was originally built as an extension of the well-known BLAS and LAPACK libraries used for linear algebra. It interfaces well with other languages like Java, and is well favored by engineering and scientific companies for its well developed and documented libraries. From what I know of Python and NumPy, while they share many of the fundamental capabilities of MATLAB, they don't have the full breadth and depth of capabilities with their libraries.
Personally, I use MATLAB because that's what I learned in my internship, that's what I used in grad school, and that's what I used in my first job. I don't have anything against Python (or any other language). It's just what I'm used too.
Also, there is another free version in addition to scilab mentioned by #Jim C from gnu called Octave.
Personally, I tend to think of Matlab as an interactive matrix calculator and plotting tool with a few scripting capabilities, rather than as a full-fledged programming language like Python or C. The reason for its success is that matrix stuff and plotting work out of the box, and you can do a few very specific things in it with virtually no actual programming knowledge. The language is, as you point out, extremely frustrating to use for more general-purpose tasks, such as even the simplest string processing. Its syntax is quirky, and it wasn't created with the abstractions necessary for projects of more than 100 lines or so in mind.
I think the reason why people try to use Matlab as a serious programming language is that most engineers (there are exceptions; my degree is in biomedical engineering and I like programming) are horrible programmers and hate to program. They're taught Matlab in college mostly for the matrix math, and they learn some rudimentary programming as part of learning Matlab, and just assume that Matlab is good enough. I can't think of anyone I know who knows any language besides Matlab, but still uses Matlab for anything other than a few pure number crunching applications.
The most likely reason that it's used so much in universities is that the mathematics faculty are used to it, understand it, and know how to incorporate it into their curriculum.
Between matplotlib+pylab and NumPy I don't think there's much actual difference between Matlab and python other than cultural inertia as suggested by #Adam Bellaire.
I believe you have a very good point and it's one that has been raised in the company where I work. The company is limited in it's ability to apply matlab because of the licensing costs involved. One developer proved that Python was a very suitable replacement but it fell on ignorant ears because to the owners of those ears...
No-one in the company knew Python although many of us wanted to use it.
MatLab has a name, a company, and task force behind it to solve any problems.
There were some (but not a lot) of legacy MatLab projects that would need to be re-written.
If it's worth £10,000 (??) it's gotta be worth it!!
I'm with you here. Python is a very good replacement for MatLab.
I should point out that I've been told the company uses maybe 5% to 10% of MatLabs capabilities and that is the basis for my agreement with the original poster
MATLAB is a fantastic tool for
prototyping
engineering simulation and
fast visualization of data
You can really play with, visualize and test your ideas on a data set very effectively. It should not be regarded as an alternative to other software languages used for product development. I highly recommend it for the above tasks, though it is expensive - free alternatives like Octave and Python are catching up.
Seems to be pure inertia. Where it is in use, everyone is too busy to learn IDL or numpy in sufficient detail to switch, and don't want to rewrite good working programs. Luckily that's not strictly true, but true enough in enough places that Matlab will be around a long time. Like Fortran (in active use where i work!)
The main reason it is useful in industry is the plug-ins built on top of the core functionality. Almost all active Matlab development for the last few years has focused on these.
Unfortunately, you won't have much opportunity to use these in an academic environment.
I know this question is old, and therefore may no longer be
watched, but I felt it was necessary to comment. As an
aerospace engineer at Georgia Tech, I can say, with no
qualms, that MATLAB is awesome. You can have it quickly
interface with your Excel spreadsheets to pull in data about
how high and fast rockets are flying, how the wind affects
those same rockets, and how different engines matter. Beyond
rocketry, similar concepts come into play for cars, trucks,
aircraft, spacecraft, and even athletics. You can pull in
large amounts of data, manipulate all of it, and make sure
your results are as they should be. In the event something is
off, you can add a line break where an error occurs to debug
your program without having to recompile every time you want
to run your program. Is it slower than some other programs?
Well, technically. I'm sure if you want to do the number
crunching it's great for on an NVIDIA graphics processor, it
would probably be faster, but it requires a lot more effort
with harder debugging.
As a general programming language, MATLAB is weak. It's not
meant to work against Python, Java, ActionScript, C/C++ or
any other general purpose language. It's meant for the
engineering and mathematics niche the name implies, and it
does so fantastically.
MATLAB WAS a wrapper around commonly available libraries.
And in many cases it still is. When you get to larger
datasets, it has many additional optimizations, including
examining and special casing common problems (reducing to
sparse matrices where useful, for example), and handling
edge cases. Often, you can submit a problem in a standard
form to a general function, and it will determine the best
underlying algorithm to use based on your data. For small
N, all algorithms are fast, but MATLAB makes determining the
optimal algorithm a non-issue.
This is written by someone who hates MATLAB, and has tried
to replace it due to integration issues. From your
question, you mention getting MATLAB 5 and using it for a
course. At that level, you might want to look at
Octave, an open source implementation with the same
syntax. I'm guessing it is up to MATLAB 5 levels by now (I
only play around with it). That should allow you to "pass
your exam". For bare MATLAB functionality it seems to be
close. It is lacking in the toolbox support (which, again,
mostly serves to reformulate the function calls to forms
familiar to engineers in the field and selects the right
underlying algorithm to use).
One reason MATLAB is popular with universities is the same reason a lot of things are popular with universities: there's a lot of professors familiar with it, and it's fairly robust.
I've spoken to a lot of folks who are especially interested in MATLAB's nascent ability to tap into the GPU instead of working serially. Having used Python in grad school, I kind of wish I had the licks to work with MATLAB in that case. It sure would make vector space calculations a breeze.
It's been some time since I've used Matlab, but from memory it does provide (albeit with extra plugins) the ability to generate source to allow you to realise your algorithm on a DSP.
Since python is a general purpose programming language there is no reason why you couldn't do everything in python that you can do in matlab. However, matlab does provide a number of other tools - eg. a very broad array of dsp features, a broad array of S and Z domain features.
All of these could be hand coded in python (since it's a general purpose language), but if all you're after is the results perhaps spending the money on Matlab is the cheaper option?
These features have also been tuned for performance. eg. The documentation for Numpy specifies that their Fourier transform is optimised for power of 2 point data sets. As I understand Matlab has been written to use the most efficient Fourier transform to suit the size of the data set, not just power of 2.
edit: Oh, and in Matlab you can produce some sensational looking plots very easily, which is important when you're presenting your data. Again, certainly not impossible using other tools.
I think you answered your own question when you noted that Matlab is "cool to work with matrixes and plotting things". Any application that requires a lot of matrix maths and visualisation will probably be easiest to do in Matlab.
That said, Matlab's syntax feels awkward and shows the language's age. In contrast, Python is a much nicer general purpose programming language and, with the right libraries can do much of what Matlab does. However, Matlab is always going to have a more concise syntax than Python for vector and matrix manipulation.
If much of your programming involves these sorts of manipulations, such as in signal processing and some statistical techniques, then Matlab will be a better choice.
First Mover Advantage. Matlab has been around since the late 1970s. Python came along more recently, and the libraries that make it suitable for Matlab type tasks came along even more recently. People are used to Matlab, so they use it.
Matlab is good at doing number crunching. Also Matrix and matrix manipulation. It has many helpful built in libraries(depends on the what version) I think it is easier to use than python if you are going to be calculating equations.

Categories

Resources