I'm using a function from a Python package that changes class attributes of multiple classes. Running the function once is fine, but running it a second time causes problems. As a work-around I decided to reload the package, but I don't know how to do it in a way that restores all changes to the package's attributes, and to its classes' attributes, to their initial values.
(I know this might look like a duplicate, but I really have read multiple posts on stackoverflow and can't find a solution that works for me.)
This is a toy example that uses a widely known library, but hopefully it explains the issue well:
import pandas

# let's say something is changing or adding an attribute
pandas.tmp = 1
From various stackoverflow questions I understood that I should use importlib.reload (I'm on Python 3.7):
import importlib as imp
imp.reload(pandas)
but pandas still has tmp and this still returns 1
pandas.tmp
I've also tried to use del, with a similar result:
del pandas
import pandas
# still returns 1
pandas.tmp
I've even tried to use del sys.modules["pandas"], but it leads to various problems depending on the package (I don't understand them, so I won't even try to explain) and it does not solve my problem anyway.
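(For reference: importlib.reload re-executes the module in its existing namespace, so attributes added from outside, like tmp, survive. A fresh import after purging the module cache creates a brand-new module object instead. A minimal sketch of that idea, with two caveats: any other module still holding a reference to the old module or its classes keeps seeing the old objects, and some packages with C extensions may not re-import cleanly.)

import sys

def purge_package(pkg_name):
    # Drop the package and every submodule from the import cache,
    # so the next import re-executes the package from scratch.
    for name in list(sys.modules):
        if name == pkg_name or name.startswith(pkg_name + "."):
            del sys.modules[name]

purge_package("pandas")
import pandas

print(hasattr(pandas, "tmp"))  # False: fresh module object, no stale attribute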
First a small example to illustrate what I mean. Suppose I have a class MyClass in a subpackage in somepackage. Suppose I can import it in these two ways, because sys.path is set accordingly:
from somepackage.subpackage import MyClass as mc1
from subpackage import MyClass as mc2
then:
mc1() == mc2()
will return False, because Python apparently treats these as objects of two different classes, as far as I understand it.
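You can see the two distinct class objects directly (a quick check, assuming the package layout above):

import sys
from somepackage.subpackage import MyClass as mc1
from subpackage import MyClass as mc2

# The module file is executed twice, once under each name, so the import
# cache holds two independent module objects, each with its own MyClass:
print("somepackage.subpackage" in sys.modules)  # True
print("subpackage" in sys.modules)              # True
print(mc1 is mc2)                               # False: two distinct classes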
Is my reasoning correct here? How do I deal with such situations? This seems like an easy way to break code.
To make this a little easier, here is a more concrete example of the issue. The above class is actually part of a library. This library seems to rely on the fact that somepackage is on sys.path, and it imports directly from the subpackages, as in the second line above. For this reason I need to modify sys.path in my application as well to include somepackage, so that the library's imports can be resolved when I use it.
I think it's bad enough that I have to modify sys.path at all, but at least I didn't want to rely on this modification in my own code, which I can control, so I imported the library with its full name ("the regular way"). I then ran exactly into the issue described above.
I have from somepackage.subpackage import MyClass as mc1 and pass an object created from it to the library code. The library compares that object with a second one it created itself after importing the class with from subpackage import MyClass as mc2. And then the code failed.
Do I just have to accept that the library relies on the sys.path modification and import everything the same way? This is what I currently do, but it feels really bad. Is there a better way?
How can I detect such issues in general? I was "lucky" that in my use case this led to an exception, so I found the problem very quickly. But in general this sort of bug seems extremely dangerous to me.
Small bonus question: is there some guideline or similar which says that libraries shouldn't import things like this? Something I could send to the author to convince them that it would be better not to do it like this, because at least to me it seems like bad style.
I'm using this decorator to manage __all__ in a DRY manner:
import sys

def export(obj):
    # Add obj's name to __all__ of the module in which obj was defined.
    mod = sys.modules[obj.__module__]
    if hasattr(mod, '__all__'):
        mod.__all__.append(obj.__name__)
    else:
        mod.__all__ = [obj.__name__]
    return obj
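Used like this, the decorator keeps __all__ next to the definitions (the module name is just for illustration):

# mymodule.py
@export
def public_helper():
    ...

# Elsewhere, from mymodule import * now picks up public_helper,
# because export() appended its name to mymodule.__all__ at import time.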
For names imported with import *, PyCharm issues an unresolved reference error, which is understandable, since it analyses the code statically rather than running it. But it is an obvious inconvenience.
How would you solve it (or maybe already solved)?
My assumptions:
Adding some automatic linter plugin or altering PyCharm's existing inspection code would be fine.
Something that actually edits the .py source is viable, but not fine.
This method is probably not the best one, so suggesting another convenient technique for dealing with exports is fine too.
You may be interested in an alternative approach to managing __all__:
https://pypi.org/project/auto-all/
This provides start_all() and end_all() functions to place in your module around the items you want to make accessible. This approach works with PyCharm's code inspection.
from auto_all import start_all, end_all

# Imports outside the start and end function calls are not included in __all__.
from pathlib import Path

def a_private_function():
    print("This is a private function.")

# Start defining externally accessible objects.
start_all(globals())

def a_public_function():
    print("This is a public function.")

# Stop defining externally accessible objects.
end_all(globals())
I feel this is a reasonable approach to managing __all__, and one that I have used on more complex packages. The source code for the package is small, so it could easily be included directly in your code to avoid external dependencies if you need to.
The reason I use this is that I have some modules where lots of items need to be "exported", and I want to keep imported items out of the export list. I have multiple developers working on the code, and it's easy to add new items and forget to include them in __all__, so automating this helps.
In a legacy system, we have created an init module that loads information which is then used by various other modules (via import statements). It's a big module that consumes a lot of memory and takes a long time to process, and some of the information it loads is not needed or has never been used. There are two proposed solutions.
Can we determine in Python who is using this module? For example:
LoadData.py (the init module):
# contains 100 data members
A.py
import LoadData
b = LoadData.name
B.py
import LoadData
b = LoadData.width
In the above example, A.py uses name and B.py uses width; the rest of the information is not required (the other 98 data members are unused).
Is there any way to determine the usage of the LoadData module, along with the usage of its individual data members?
Put simply, right now we would need to traverse A.py and B.py and search manually to identify which objects are used.
I am trying to implement the first solution, as I have more than 1000 modules and it would be painful to determine this by traversing each one. I am open to any tool that can be integrated with Python.
Your question is quite broad, so I can't give you an exact answer. However, what I would generally do here is run a linter like flake8 over the whole codebase to show you where you have unused imports and where your files reference things they haven't imported. It won't tell you if a whole file is never imported by anything, but if you remove all unused imports, you can then search your codebase for imports of a particular module, and if none are found, you can (relatively) safely delete that module.
You can integrate tools like flake8 with most good text editors, so that they highlight mistakes in real time.
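For the unused-import case specifically, you can restrict flake8 to just that check (F401 is the pyflakes code for "module imported but unused"):

flake8 --select=F401 .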
As you're working with legacy code, you'll more than likely see many errors when you run the tool, as it looks for style issues as well as the kinds of import/usage issues that you mention. I would recommend fixing these as a matter of principle (as they are non-functional in nature), and then making sure that you run flake8 as part of your continuous integration to avoid regressions. You can, however, disable particular warnings with command-line arguments, which might help you stage things.
Another thing you can start to do, though it will take a little longer to yield results, is to write and run unit tests with code coverage switched on, so you can see areas of your codebase that are never executed. With a large legacy project, however, this might be tough going! It will, however, help you gain better insight into the attribute usage you mention in point 1. Because Python is very dynamic, static analysis can only go so far in giving you information about attribute usage.
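As a rough starting point for that attribute question, a small script built on the standard-library ast module can at least find explicit LoadData.<attr> accesses. A minimal sketch; it deliberately ignores from LoadData import ... and dynamic access via getattr:

import ast
import sys
from pathlib import Path

def find_attribute_uses(root_dir, module_name="LoadData"):
    # Map each accessed attribute of module_name to the files that use it.
    uses = {}
    for path in Path(root_dir).rglob("*.py"):
        tree = ast.parse(path.read_text(), filename=str(path))
        for node in ast.walk(tree):
            if (isinstance(node, ast.Attribute)
                    and isinstance(node.value, ast.Name)
                    and node.value.id == module_name):
                uses.setdefault(node.attr, set()).add(path.name)
    return uses

for attr, files in sorted(find_attribute_uses(sys.argv[1]).items()):
    print(attr, "is used in", ", ".join(sorted(files)))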
Also, make sure you are using a version control tool (such as git) so that you can track any changes and revert them if you go wrong.
I have a core Python module called mfxLib that we use in our facility. I need to be able to keep different versions of this module without breaking all the other modules/plugins that import it.
My solution was to keep duplicates of my module by renaming them mfxLib01 and mfxLib02, then to replace the original mfxLib module with an empty module containing only an __init__.py file that imports the latest version.
# content of mfxLib.__init__.py
from mfxLib02 import *
This seems logical and seems to work, but I was wondering if there is a common practice for doing this? Guidelines to follow? Etc.
Thanks
You can import a module under another name. Commonly people use this to save typing for a long module name, for example:
import numpy as np
np.array([1,2,3,4])
Hence you could do:
import mfxLib01 as mfxLib
or
import mfxLib02 as mfxLib
then your code uses mfxLib everywhere.
That might help...
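Alternatively, the shim __init__.py from the question could choose the version at runtime instead of hard-coding it. A sketch of that idea; the environment variable name here is made up for illustration:

# mfxLib/__init__.py -- version-selecting shim (illustrative)
import importlib
import os

# Pick the implementation once per process; default to the latest version.
_version = os.environ.get("MFXLIB_VERSION", "02")
_impl = importlib.import_module("mfxLib" + _version)

# Re-export the chosen implementation's public names.
globals().update({name: value for name, value in vars(_impl).items()
                  if not name.startswith("_")})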
If you have different scripts requiring different versions, your current approach should be the best, but I'd suggest using a version control system like Git or SVN. That would allow you to commit and revert to earlier versions easily, as well as share the module with other users.
Version control will almost certainly make your life easier. In addition to Petterson's recommendations, consider Mercurial. Like git and SVN, it's free. It's also written in Python and should run without difficulty on any of your systems.
Spacedman's recommendations are also useful, especially if the differences between the versions represent customizations for particular systems and the customizations are relatively stable. Note that you can use that approach in combination with a version control system.
Finally, it's always worthwhile to make a strong effort to write your module so that it can work without modification everywhere. Often, you can accomplish this by adding some optional arguments to a few key functions to handle the different requirements. Python is really convenient in that regard because keyword arguments at the end of the arg list are always optional, so you can easily arrange to provide the existing behavior by giving them suitable default values.
def foo(oldarg1, oldarg2, newarg1=None):
    if newarg1 is not None:
        ...  ## behave differently
    else:
        ...  ## behave as usual
I have a bunch of Python modules I want to clean up, reorganize and refactor (there's some duplicate code, some unused code ...), and I'm wondering if there's a tool to make a map of which module uses which other module.
Ideally, I'd like a map like this:
main.py
-> task_runner.py
-> task_utils.py
-> deserialization.py
-> file_utils.py
-> server.py
-> (deserialization.py)
-> db_access.py
checkup_script.py
re_test.py
main_bkp0.py
unit_tests.py
... so that I could tell which files I can start moving around first (file_utils.py, db_access.py), which files are not used by my main.py and so could be deleted, etc. (I'm actually working with around 60 modules)
Writing a script that does this probably wouldn't be very complicated (though there are different syntaxes for import to handle), but I'd also expect that I'm not the first one to want to do this (and if someone made a tool for this, it might include other neat features such as telling me which classes and functions are probably not used).
Do you know of any tools (even simple scripts) that assist code reorganization?
Do you know of a more exact term for what I'm trying to do? Code reorganization?
Python's modulefinder does this. It is quite easy to write a script that will turn this information into an import graph (which you can render with e.g. graphviz): here's a clear explanation. There's also snakefood, which does all the work for you (and uses ASTs, too!).
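For example, a minimal modulefinder sketch that lists everything a given entry-point script pulls in (the script name is illustrative):

from modulefinder import ModuleFinder

finder = ModuleFinder()
finder.run_script("main.py")  # your project's entry point

# Each discovered module, with the file it was loaded from (None for built-ins).
for name, mod in sorted(finder.modules.items()):
    print(name, "->", mod.__file__)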
You might want to look into pylint or pychecker for more general maintenance tasks.
Writing a script that does this probably wouldn't be very complicated (though there are different syntaxes for import to handle),
It's trivial. There's import and from module import. Two syntaxes to handle.
Do you know of a more exact term for what I'm trying to do? Code reorganization?
Design. It's called design. Yes, you're refactoring an existing design, but...
Rule One
Don't start a design effort with what you have. If you do, you'll only "nibble around the edges" making small and sometimes inconsequential changes.
Rule Two
Start a design effort with what you should have had if you'd only been smarter. Think broadly and clearly about what you're really supposed to be doing. Ignore what you did.
Rule Three
Design from the ground up (or de novo as some folks say) with the correct package and module architecture.
Create a separate project for this.
Rule Four
Test First. Write unit tests for your new architecture. If you have existing unit tests, copy them into the new project. Modify the imports to reflect the new architecture and rewrite the tests to express your glorious new simplification.
All the tests fail, because you haven't moved any code. That's a good thing.
Rule Five
Move code into the new structure last. Stop moving code when the tests pass.
You don't need to analyze imports to do this, BTW. You're just using grep to find modules and classes. The old imports and the tangled relationships among them don't matter, and don't need to be analyzed. You're throwing them away. You don't need tools smarter than grep.
If you feel an urge to move code, you must be very disciplined: (1) you must have tests that fail, and then (2) you can move some code to make the failing tests pass.
chuckmove is a tool that lets you recursively rewrite imports in your entire source tree to refer to a new location of a module.
chuckmove --old sound.utils --new media.sound.utils src
...this descends into src and rewrites statements that import sound.utils to import media.sound.utils instead. It supports the whole range of Python import formats, i.e. from x import y, import x.y.z as w, etc.
Modulefinder may not work with Python 3.5*, but pydeps worked very well:
Installation:
sudo apt install python-pygraphviz
pip install pydeps
Then, in the directory where you want to map from,
pydeps --max-bacon=0 .
...to create a map of maximum depth.
*An issue in Python 3.5 but not 3.6 caused the problems with modulefinder, similar to this