In a program, I am using multiple libraries that include boost bindings to the C++ eigen library. (In my case, OMPL & Pinocchio, but I guess the problem is not related to the specific libraries)
Both of them implement a binding for the same C++ class (an STL vector of integers). My problem is, the two bindings differ from each other - in my specific case,
pinocchio.pinocchio_pywrap.StdVec_Int has a tolist() method, while ompl.util._util.vectorInt doesn't.
It seems that, whatever library is imported first, is the deciding factor for which binding is used. This is dangerous because it leads to errors that are hard to trace, especially for externals that are using the code - all of a sudden, methods fail that worked before, just because of a new import.
My preferred solution to this problem would be forcing both libraries to use their own bindings only - but I don't know if that is possible.
Alternatively - is there a way to globally define which C++ bindings to prefer in case two libraries collide?
Example for the "hard-to-explain-bug": Imagine running several unittests in one session - and the one on ompl-related methods runs first. It fails later on when the tolist() method is used on an object which is expected to be of type pinocchio.pinocchio_pywrap.StdVec_Int. You start to debug and run the failing unittest only - and it runs smoothly because ompl was not imported before...
Related
So you've got some legacy code lying around in a fairly hefty project. How can you find and delete dead functions?
I've seen these two references: Find unused code and Tool to find unused functions in php project, but they seem specific to C# and PHP, respectively.
Is there a Python tool that'll help you find functions that aren't referenced anywhere else in the source code (notwithstanding reflection/etc.)?
In Python you can find unused code by using dynamic or static code analyzers. Two examples for dynamic analyzers are coverage and figleaf. They have the drawback that you have to run all possible branches of your code in order to find unused parts, but they also have the advantage that you get very reliable results.
Alternatively, you can use static code analyzers that just look at your code, but don't actually run it. They run much faster, but due to Python's dynamic nature the results may contain false positives.
Two tools in this category are pyflakes and vulture. Pyflakes finds unused imports and unused local variables. Vulture finds all kinds of unused and unreachable code. (Full disclosure: I'm the maintainer of Vulture.)
The tools are available in the Python Package Index https://pypi.org/.
I'm not sure if this is helpful, but you might try using the coverage, figleaf or other similar modules, which record which parts of your source code is used as you actually run your scripts/application.
Because of the fairly strict way python code is presented, would it be that hard to build a list of functions based on a regex looking for def function_name(..) ?
And then search for each name and tot up how many times it features in the code. It wouldn't naturally take comments into account but as long as you're having a look at functions with less than two or three instances...
It's a bit Spartan but it sounds like a nice sleepy-weekend task =)
unless you know that your code uses reflection, as you said, I would go for a trivial grep. Do not underestimate the power of the asterisk in vim as well (performs a search of the word you have under your cursor in the file), albeit this is limited only to the file you are currently editing.
Another solution you could implement is to have a very good testsuite (seldomly happens, unfortunately) and then wrap the routine with a deprecation routine. if you get the deprecation output, it means that the routine was called, so it's still used somewhere. This works even for reflection behavior, but of course you can never be sure if you don't trigger the situation when your routine call is performed.
its not only searching function names, but also all the imported packages not in use.
you need to search the code for all the imported packages (including aliases) and search used functions, then create a list of the specific imports from each package (example instead of import os, replace with from os import listdir, getcwd,......)
A way I usually use to understand a project architecture in C++ is to set a break point in a particular function using GDB, and then using the traceback feature, I can easily understand the function calls between the different classes. I am totally new to Python and I want to ask what are the best Technics used to understand the Python project,
I took a look at traceback but the problem is that it is only tracing the functions inside the same module, so if the caller is in a different module it won't be traced. Besides, the size of the stack is also limited, correct me if I am wrong.
Could you please share the Technics you are using based on your own experience.
I don't know where you got the idea that python's tracebacks would be limited to a single module or limited in size - when an exception occurs, you of course have the full stack trace available.
This being said, Python has a full step debugger in it's stdlib, which lets you inspect and navigate the whole call stack. And there are of course third-part step debuggers in various IDEs and custom shells or environments (ie IPython etc).
NB the inspect module might also be of interest to you - and not only for stack inspection ;-)
I am starting a new Python project that is supposed to run both sequentially and in parallel. However, because the behavior is entirely different, running in parallel would require a completely different set of classes than those used when running sequentially. But there is so much overlap between the two codes that it makes sense to have a unified code and defer the parallel/sequential behavior to a certain group of classes.
Coming from a C++ world, I would let the user set a Parallel or Serial class in the main file and use that as a template parameter to instantiate other classes at runtime. In Python there is no compilation time so I'm looking for the most Pythonic way to accomplish this. Ideally, it would be great that the code determines whether the user is running sequentially or in parallel to select the classes automatically. So if the user runs mpirun -np 4 python __main__.py the code should behave entirely different than when the user calls just python __main__.py. Somehow it makes no sense to me to have if statements to determine the type of an object at runtime, there has to be a much more elegant way to do this. In short, I would like to avoid:
if isintance(a, Parallel):
m = ParallelObject()
elif ifinstance(a, Serial):
m = SerialObject()
I've been reading about this, and it seems I can use factories (which somewhat have this conditional statement buried in the implementation). Yet, using factories for this problem is not an option because I would have to create too many factories.
In fact, it would be great if I can just "mimic" C++'s behavior here and somehow use Parallel/Serial classes to choose classes properly. Is this even possible in Python? If so, what's the most Pythonic way to do this?
Another idea would be to detect whether the user is running in parallel or sequentially and then load the appropriate module (either from a parallel or sequential folder) with the appropriate classes. For instance, I could have the user type in the main script:
from myPackage.parallel import *
or
from myPackage.serial import *
and then have the parallel or serial folders import all shared modules. This would allow me to keep all classes that differentiate parallel/serial behavior with the same names. This seems to be the best option so far, but I'm concerned about what would happen when I'm running py.test because some test files will load parallel modules and some other test files would load the serial modules. Would testing work with this setup?
You may want to check how a similar issue is solved in the stdlib: https://github.com/python/cpython/blob/master/Lib/os.py - it's not a 100% match to your own problem, nor the only possible solution FWIW, but you can safely assume this to be a rather "pythonic" solution.
wrt/ the "automagic" thing depending on execution context, if you decide to go for it, by all means make sure that 1/ both implementations can still be explicitely imported (like os.ntpath and os.posixpath) so they are truly unit-testable, and 2/ the user can still manually force the choice.
EDIT:
So if I understand it correctly, this file you points out imports modules depending on (...)
What it "depends on" is actually mostly irrelevant (in this case it's a builtin name because the target OS is known when the runtime is compiled, but this could be an environment variable, a command line argument, a value in a config file etc). The point was about both conditional import of modules with same API but different implementations while still providing direct explicit access to those modules.
So in a similar way, I could let the user type from myPackage.parallel import * and then in myPackage/init.py I could import all the required modules for the parallel calculation. Is this what you suggest?
Not exactly. I posted this as an example of conditional imports mostly, and eventually as a way to build a "bridge" module that can automagically select the appropriate implementation at runtime (on which basis it does so is up to you).
The point is that the end user should be able to either explicitely select an implementation (by explicitely importing the right submodule - serial or parallel and using it directly) OR - still explicitely - ask the system to select one or the other depending on the context.
So you'd have myPackage.serial and myPackage.parallel (just as they are now), and an additional myPackage.automagic that dynamically selects either serial or parallel. The "recommended" choice would then be to use the "automagic" module so the same code can be run either serial or parallel without the user having to care about it, but with still the ability to force using one or the other where it makes sense.
My fear is that py.test will have modules from parallel and serial while testing different files and create a mess
Why and how would this happen ? Remember that Python has no "process-global" namespace - "globals" are really "module-level" only - and that python's import is absolutely nothing like C/C++ includes.
import loads a module object (can be built directly from python source code, or from compiled C code, or even dynamically created - remember, at runtime a module is an object, instance of the module type) and binds this object (or attributes of this object) into the enclosing scope. Also, modules are garanteed (with a couple caveats, but those are to be considered as error cases) to be imported only once for a given process (and then cached) so importing the same module twice in a same process will yield the same object (IOW a module is a singleton).
All this means that given something like
# module A
def foo():
return bar(42)
def bar(x):
return x * 2
and
# module B
def foo():
return bar(33)
def bar(x):
return x / 2
It's garanteed that however you import from A and B, A.foo will ALWAYS call A.bar and NEVER call B.bar and B.foo will only ever call B.bar (unless you explicitely monkeyptach them of course but that's not the point).
Also, this means that within a module you cannot have access to the importing namespace (the module or function that's importing your module), so you cannot have a module depending on "global" names set by the importer.
To make a long story short, you really need to forget about C++ and learn how Python works, as those are wildly different languages with wildly different object models, execution models and idioms. A couple interesting reads are http://effbot.org/zone/import-confusion.htm and https://nedbatchelder.com/text/names.html
EDIT 2:
(about the 'automagic' module)
I would do that based on whether the user runs mpirun or just python. However, it seems it's not possible (see for instance this or this) in a portable way without a hack. Any ideas in that direction?
I've never ever had anything to do with mpi so I can't help with this - but if the general consensus is that there's no reliable portable way to detect this then obviously there's your answer.
This being said, simple stupid solutions are sometimes overlooked. In your case, explicitly setting an environment variable or passing a command-line switch to your main script would JustWork(tm), ie the user should for example use
SOMEFLAG=serial python main.py
vs
SOMEFLAG=parallel mpirun -np4 python main.py
or
python main.py serial
vs
mpirun -np4 python main.py parallel
(whichever works best for you needs - is the most easily portable).
This of course requires a bit more documentation and some more effort from the end-user but well...
I'm not really what you're asking here. Python classes are just (callable/instantiable) objects themselves, so you can of course select and use them conditionally. If multiple classes within multiple modules are involved, you can also make the imports conditional.
if user_says_parallel:
from myPackage.parallel import ParallelObject
ObjectClass = ParallelObject
else:
from myPackage.serial import SerialObject
ObjectClass = SerialObject
my_abstract_object = ObjectClass()
If that's very useful depends on your classes and the effort it takes to make sure they have the same API so they're compatible when replacing each other. Maybe even inheritance à la ParallelObject => SerialObject is possible, or at least a common (virtual) base class to put all the shared code. But that's just the same as in C++.
i just discovered http://code.google.com/p/re2, a promising library that uses a long-neglected way (Thompson NFA) to implement a regular expression engine that can be orders of magnitudes faster than the available engines of awk, Perl, or Python.
so i downloaded the code and did the usual sudo make install thing. however, that action had seemingly done little more than adding /usr/local/include/re2/re2.h to my system. there seemed to be some *.a file in addition, but then what is it with this *.a extension?
i would like to use re2 from Python (preferrably Python 3.1) and was excited to see files like make_unicode_groups.py in the distro (maybe just used during the build process?). those however were not deployed on my machine.
how can i use re2 from Python?
update two friendly people have pointed out that i could try to build DLLs / *.so files from the sources and then use Python’s ctypes library to access those. can anyone give useful pointers how to do just that? i’m pretty much clueless here, especially with the first part (building the *.so files).
update i have also posted this question (earlier) to the re2 developers’ group, without reply till now (it is a small group), and today to the (somewhat more populous) comp.lang.py group [—thread here—]. the hope is that people from various corners can contact each other. my guess is a skilled person can do this in a few hours during their 20% your-free-time-belongs-google-too timeslice; it would tie me for weeks. is there a tool to automatically dumb-down C++ to whatever flavor of C that Python needs to be able to connect? then maybe getting a viable result can be reduced to clever tool chaining.
(rant)why is this so difficult? to think that in 2010 we still cannot have our abundant pieces of software just talk to each other. this is such a roadblock that whenever you want to address some C code from Python you must always cruft these linking bits. this requires a lot of work, but only delivers an extension module that is specific to the version of the C code and the version of Python, so it ages fast.(/rant) would it be possible to run such things in separate processes (say if i had an re2 executable that can produce results for data that comes in on, say, subprocess/Popen/communicate())? (this should not be a pure command-line tool that necessitates the opening of a process each time it is needed, but a single processs that runs continuously; maybe there exist wrappers that sort of ‘demonize’ such C code).
David Reiss has put together a Python wrapper for re2. It doesn't have all of the functionality of Python's re module, but it's a start. It's available here: http://github.com/facebook/pyre2.
Possible yes, easy no. Looking at the re2.h, this is a C++ library exposed as a class. There are two ways you could use it from Python.
1.) As Tuomas says, compile it as a DLL/so and use ctypes. In order to use it from python, though, you would need to wrap the object init and methods into c style externed functions. I've done this in the past with ctypes by externing functions that pass a pointer to the object around. The "init" function returns a void pointer to the object that gets passed on each subsequent method call. Very messy indeed.
2.) Wrap it into a true python module. Again those functions exposed to python would need to be extern "C". One option is to use Boost.Python, that would ease this work.
SWIG handles C++ (unlike ctypes), so it may be more straightforward to use it.
You could try to build re2 into its own DLL/so and use ctypes to call functions from that DLL/so. You will probably need to define your own entry points in the DLL/so.
You can use the python package https://pypi.org/project/google-re2/. Although look at the bottom, there are a few requirements to install yourself before installing the python package.
So you've got some legacy code lying around in a fairly hefty project. How can you find and delete dead functions?
I've seen these two references: Find unused code and Tool to find unused functions in php project, but they seem specific to C# and PHP, respectively.
Is there a Python tool that'll help you find functions that aren't referenced anywhere else in the source code (notwithstanding reflection/etc.)?
In Python you can find unused code by using dynamic or static code analyzers. Two examples for dynamic analyzers are coverage and figleaf. They have the drawback that you have to run all possible branches of your code in order to find unused parts, but they also have the advantage that you get very reliable results.
Alternatively, you can use static code analyzers that just look at your code, but don't actually run it. They run much faster, but due to Python's dynamic nature the results may contain false positives.
Two tools in this category are pyflakes and vulture. Pyflakes finds unused imports and unused local variables. Vulture finds all kinds of unused and unreachable code. (Full disclosure: I'm the maintainer of Vulture.)
The tools are available in the Python Package Index https://pypi.org/.
I'm not sure if this is helpful, but you might try using the coverage, figleaf or other similar modules, which record which parts of your source code is used as you actually run your scripts/application.
Because of the fairly strict way python code is presented, would it be that hard to build a list of functions based on a regex looking for def function_name(..) ?
And then search for each name and tot up how many times it features in the code. It wouldn't naturally take comments into account but as long as you're having a look at functions with less than two or three instances...
It's a bit Spartan but it sounds like a nice sleepy-weekend task =)
unless you know that your code uses reflection, as you said, I would go for a trivial grep. Do not underestimate the power of the asterisk in vim as well (performs a search of the word you have under your cursor in the file), albeit this is limited only to the file you are currently editing.
Another solution you could implement is to have a very good testsuite (seldomly happens, unfortunately) and then wrap the routine with a deprecation routine. if you get the deprecation output, it means that the routine was called, so it's still used somewhere. This works even for reflection behavior, but of course you can never be sure if you don't trigger the situation when your routine call is performed.
its not only searching function names, but also all the imported packages not in use.
you need to search the code for all the imported packages (including aliases) and search used functions, then create a list of the specific imports from each package (example instead of import os, replace with from os import listdir, getcwd,......)