Mapping module imports in Python for easy refactoring

I have a bunch of Python modules I want to clean up, reorganize and refactor (there's some duplicate code, some unused code ...), and I'm wondering if there's a tool to make a map of which module uses which other module.
Ideally, I'd like a map like this:
main.py
-> task_runner.py
-> task_utils.py
-> deserialization.py
-> file_utils.py
-> server.py
-> (deserialization.py)
-> db_access.py
checkup_script.py
re_test.py
main_bkp0.py
unit_tests.py
... so that I could tell which files I can start moving around first (file_utils.py, db_access.py), which files are not used by my main.py and so could be deleted, etc. (I'm actually working with around 60 modules)
Writing a script that does this probably wouldn't be very complicated (though there are different syntaxes for import to handle), but I'd also expect that I'm not the first one to want to do this (and if someone made a tool for this, it might include other neat features such as telling me which classes and functions are probably not used).
Do you know of any tools (even simple scripts) that assist code reorganization?
Do you know of a more exact term for what I'm trying to do? Code reorganization?

Python's modulefinder does this. It is quite easy to write a script that will turn this information into an import graph (which you can render with e.g. graphviz): here's a clear explanation. There's also snakefood which does all the work for you (and using ASTs, too!)
You might want to look into pylint or pychecker for more general maintenance tasks.
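As a rough sketch of the modulefinder approach (not the script from the linked explanation), the standard-library ModuleFinder can be run over an entry script to list every project module it pulls in. The helper name local_modules_used and the choice to restrict the search path to the project directory are assumptions made here for illustration.

```python
import os
from modulefinder import ModuleFinder


def local_modules_used(script_path, project_dir):
    """Return the sorted names of project modules that script_path imports,
    directly or indirectly."""
    project_dir = os.path.abspath(project_dir)
    # Searching only project_dir keeps standard-library and third-party
    # modules out of finder.modules (unresolved names go to badmodules).
    finder = ModuleFinder(path=[project_dir])
    finder.run_script(script_path)

    used = []
    for name, module in finder.modules.items():
        filename = getattr(module, "__file__", None)
        if name == "__main__" or not filename:
            continue  # skip the entry script itself and built-in modules
        if os.path.abspath(filename).startswith(project_dir + os.sep):
            used.append(name)
    return sorted(used)
```

Project modules that never show up in this list for any of your entry scripts are candidates for deletion, subject to a manual check for dynamic imports.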

Writing a script that does this probably wouldn't be very complicated (though there are different syntaxes for import to handle),
It's trivial. There's import and from module import. Two syntaxes to handle.
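A minimal sketch of handling those two forms, assuming you parse the source with the standard ast module rather than matching text (that choice, and the function name imported_modules, are illustrative and not from the answer above):

```python
import ast


def imported_modules(source):
    """Return the set of module names imported by a piece of Python source."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import a.b, c as d" yields a.b and c
            found.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # "from pkg.mod import name" yields pkg.mod.
            # Relative imports keep their leading dots, e.g. ".sibling".
            found.add("." * node.level + (node.module or ""))
    return found
```

Running this over every file in the project gives the edges of the import map; it does not see imports done through importlib or __import__.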
Do you know of a more exact term for what I'm trying to do? Code reorganization?
Design. It's called design. Yes, you're refactoring an existing design, but...
Rule One
Don't start a design effort with what you have. If you do, you'll only "nibble around the edges" making small and sometimes inconsequential changes.
Rule Two
Start a design effort with what you should have had if you'd only been smarter. Think broadly and clearly about what you're really supposed to be doing. Ignore what you did.
Rule Three
Design from the ground up (or de novo as some folks say) with the correct package and module architecture.
Create a separate project for this.
Rule Four
Test First. Write unit tests for your new architecture. If you have existing unit tests, copy them into the new project. Modify the imports to reflect the new architecture and rewrite the tests to express your glorious new simplification.
All the tests fail, because you haven't moved any code. That's a good thing.
Rule Five
Move code into the new structure last. Stop moving code when the tests pass.
You don't need to analyze imports to do this, BTW. You're just using grep to find modules and classes. The old imports and the tangled relationships among them don't matter and don't need to be analyzed. You're throwing them away. You don't need tools smarter than grep.
If you feel an urge to move code, you must be very disciplined: (1) you must have tests that fail, and then (2) you can move some code to pass the failing tests.

chuckmove is a tool that lets you recursively rewrite imports in your entire source tree to refer to a new location of a module.
chuckmove --old sound.utils --new media.sound.utils src
...this descends into src and rewrites statements that import sound.utils so that they import media.sound.utils instead. It supports the whole range of Python import formats, e.g. from x import y, import x.y.z as w, etc.

Modulefinder may not work with Python 3.5*, but pydeps worked very well:
Installation:
sudo apt install python-pygraphviz
pip install pydeps
Then, in the directory where you want to map from,
pydeps --max-bacon=0 .
...to create a map of maximum depth.
*An issue in Python 3.5 but not 3.6 caused the problems with modulefinder, similar to this

Related

How To Remove Unused Python Function Automatically [duplicate]

So you've got some legacy code lying around in a fairly hefty project. How can you find and delete dead functions?
I've seen these two references: Find unused code and Tool to find unused functions in php project, but they seem specific to C# and PHP, respectively.
Is there a Python tool that'll help you find functions that aren't referenced anywhere else in the source code (notwithstanding reflection/etc.)?
In Python you can find unused code by using dynamic or static code analyzers. Two examples for dynamic analyzers are coverage and figleaf. They have the drawback that you have to run all possible branches of your code in order to find unused parts, but they also have the advantage that you get very reliable results.
Alternatively, you can use static code analyzers that just look at your code, but don't actually run it. They run much faster, but due to Python's dynamic nature the results may contain false positives.
Two tools in this category are pyflakes and vulture. Pyflakes finds unused imports and unused local variables. Vulture finds all kinds of unused and unreachable code. (Full disclosure: I'm the maintainer of Vulture.)
The tools are available in the Python Package Index https://pypi.org/.
I'm not sure if this is helpful, but you might try using coverage, figleaf or other similar modules, which record which parts of your source code are used as you actually run your scripts or application.
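To illustrate the dynamic idea without depending on either package, here is a small sketch that uses sys.setprofile from the standard library as a stand-in for a real coverage tool. It only records function names, not lines or branches, and the name record_called_functions is made up for this example.

```python
import sys


def record_called_functions(entry_point):
    """Run entry_point() and return the names of Python functions called."""
    called = set()

    def profiler(frame, event, arg):
        # "call" fires when a Python-level function is entered
        if event == "call":
            called.add(frame.f_code.co_name)

    sys.setprofile(profiler)
    try:
        entry_point()
    finally:
        sys.setprofile(None)  # always switch profiling off again
    return called
```

Any function that is defined in the project but never shows up in the returned set was not exercised by that run, which is only evidence of dead code if the run covered all the relevant paths.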
Because of the fairly strict way Python code is laid out, would it be that hard to build a list of functions based on a regex looking for def function_name(..)?
And then search for each name and tot up how many times it features in the code. It wouldn't take comments into account, but that is fine as long as you only look at functions with fewer than two or three instances...
It's a bit Spartan but it sounds like a nice sleepy-weekend task =)
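The regex idea can be sketched roughly as follows. The input format (a dict of filename to source text) and the function name are assumptions for illustration; as noted, mentions in comments and strings are counted too.

```python
import re

# Matches "def name(" at the start of a line, allowing indentation
DEF_PATTERN = re.compile(r"^\s*def\s+(\w+)\s*\(", re.MULTILINE)


def count_function_mentions(sources):
    """Map each function name defined in sources to its number of mentions.

    sources is a dict of {filename: source text}. A count of 1 means the
    name appears only in its own def line.
    """
    all_text = "\n".join(sources.values())
    counts = {}
    for name in DEF_PATTERN.findall(all_text):
        pattern = r"\b%s\b" % re.escape(name)
        counts[name] = len(re.findall(pattern, all_text))
    return counts
```

Names with a count of 1 are the first candidates to inspect by hand.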
Unless you know that your code uses reflection, as you said, I would go for a trivial grep. Do not underestimate the power of the asterisk in vim either (it searches for the word under your cursor), although this is limited to the file you are currently editing.
Another solution is to have a very good test suite (which seldom happens, unfortunately) and then wrap the routine with a deprecation routine. If you get the deprecation output, the routine was called, so it is still used somewhere. This works even for reflection, but of course you can never be sure the routine is unused if your tests don't trigger the situation in which it would be called.
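One possible shape for such a deprecation wrapper, as a sketch only: the decorator name suspected_dead is hypothetical, and the standard warnings module is used to produce the output.

```python
import functools
import warnings


def suspected_dead(func):
    """Warn whenever func is called, which shows it is still in use."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        warnings.warn(
            "%s is still being called" % func.__name__,
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller, not the wrapper
        )
        return func(*args, **kwargs)
    return wrapper
```

Decorate the routines you suspect are dead and run the test suite. Python's default warning filters can hide DeprecationWarning, so running with python -W always is a safe way to make sure the output is shown.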
It's not only a matter of searching for function names, but also for all the imported packages that are not in use.
You need to search the code for all the imported packages (including aliases) and for the functions used from them, then create a list of the specific imports from each package (for example, instead of import os, use from os import listdir, getcwd, ...).

Structure of a python project that is not a package

If you search the internet for Python project structures, you will find articles about Python package structure. What I want to know is whether there are any guidelines for structuring Python projects that are not packages, that is, projects where the code is itself the end product.
For example, I created a package that handles requests to some specific endpoints. This package serves the main code, which processes the data the package fetches. The main code is not a package, that is, it doesn't have classes or __init__ files, because at this software layer there is no need for code reuse. Instead, the main code is itself the end product.
Are there any guidelines for this?
It would be good to see the structure itself instead of reading a description of it. That would help to visualize the problem and answer your case properly 😉
projects that are not packages, that is, projects where the code is itself the end product
In general, I would say you should always structure your code! By that I mean working with modules and packages. Structure is needed mostly to separate responsibilities and to introduce things that can be reused. It also makes things easier and faster to find than going through tons of unstructured code.
Of course, as I said, this is a general thought, and once you are experienced you can experiment with the structure to find the best one for the project you are working on. But without any structure, you won't survive in a bigger project (or life will be harder than you want).

Determine usage/creation of object and data member into another module

In a legacy system, we created an init module that loads information and is used by various other modules (through import statements). It is a big module that consumes a lot of memory and takes a long time to process, and some of the information is not needed or has never been used. There are two proposed solutions.
Can we determine in Python who is using this module? For example:
LoadData.py (init module)
contains 100 data members
A.py
import LoadData
b = LoadData.name
B.py
import LoadData
b = LoadData.width
In the above example, A.py uses name and B.py uses width, and the rest of the information is not required (98 data members are not required).
Is there any way to determine the usage of the LoadData module along with the usage of its data members?
Put simply, we would otherwise need to go through A.py and B.py manually to identify the usage of each object.
I am trying to implement the first solution, as I have more than 1000 modules and it would be painful to go through each one. I am open to any tool that can be integrated with Python.
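As a sketch of what an automated pass over A.py and B.py could look like, assuming plain attribute access such as LoadData.name and using the standard ast module (the function name attributes_used is hypothetical, and dynamic access through getattr is not detected):

```python
import ast


def attributes_used(source, module_name):
    """Return the names that source takes from the module module_name."""
    tree = ast.parse(source)
    local_names = set()  # names the module is bound to in this file
    used = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == module_name:
                    # "import LoadData as ld" binds the module to "ld"
                    local_names.add(alias.asname or alias.name)
        elif isinstance(node, ast.ImportFrom) and node.module == module_name:
            # "from LoadData import name" uses name directly
            # (a star import is reported as "*")
            used.update(alias.name for alias in node.names)
    for node in ast.walk(tree):
        if (isinstance(node, ast.Attribute)
                and isinstance(node.value, ast.Name)
                and node.value.id in local_names):
            used.add(node.attr)
    return used
```

Taking the union of the results over all modules and comparing it with the names defined in LoadData.py would show which data members nobody reads.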
Your question is quite broad, so I can't give you an exact answer. However, what I would generally do here is to run a linter like flake8 over the whole codebase to show you where you have unused imports and if you have references in your files to things that you haven't imported. It won't tell you if a whole file is never imported by anything, but if you remove all unused imports, you can then search your codebase for imports of a particular module and if none are found, you can (relatively) safely delete that module.
You can integrate tools like flake8 with most good text editors, so that they highlight mistakes in real time.
As you're trying to work with legacy code, you'll more than likely get many errors when you run the tool, as it looks for style issues as well as the kinds of import and usage issues that you mention. I would recommend fixing these as a matter of principle (as they are non-functional in nature), and then making sure that you run flake8 as part of your continuous integration to avoid regressions. You can, however, disable particular warnings with command-line arguments, which might help you stage things.
Another thing you can start to do, though it will take a little longer to yield results, is to write and run unit tests with code coverage switched on, so you can see areas of your codebase that are never executed. With a large legacy project this might be tough going, but it will help you gain better insight into the attribute usage you mention in point 1. Because Python is very dynamic, static analysis can only go so far in giving you information about attribute usage.
Also, make sure you are using a version control tool (such as git) so that you can track any changes and revert them if you go wrong.

Is importing a file good in Python

I have a function of around 80 lines in a file. I need the same functionality in another file, so I am currently importing the other file to get the function.
My question is which technique would be better in terms of running time on a machine: importing the complete file and running the function, or copying the function as it is and running it from the same package?
I know it won't matter much in practice, but I want to understand this: when building a large project, is it better to import a complete file in Python or to just add the function to the current namespace?
Importing is how you're supposed to do it. That's why it's possible. Performance is a complicated question, but in general it really doesn't matter. People who really, really need performance, and can't be satisfied by just fixing the basic algorithm, are not using Python in the first place. :) (At least not for the tiny part of the project where the performance really matters. ;) )
Importing is good because it helps you manage things easily. What if you need the same function again? Instead of making changes in multiple places, there is just one centralized location: your module.
If the function is small and you won't need it anywhere else, put it in the file itself.
If it is complex and will need to be used again, separate it and put it inside a module.
Performance should not be your concern here. It should hardly matter. And even if it does, ask yourself: does it matter to you?
Copy/Paste cannot be better. Importing affects load-time performance, not run-time (if you import it at the top-level).
The whole point of importing is to allow code reuse and organization.
Remember too that you can do either
import MyModule
to get the whole file or
from MyModule import MyFunction
for when you only need to reference that one part of the module.
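A small illustration of the two forms, using the standard math module as a stand-in for the hypothetical MyModule. Both forms load the whole module once; they differ only in which names end up in your namespace.

```python
import sys

import math              # binds the name "math" to the whole module
from math import sqrt    # binds only the name "sqrt"

# Either way the module is loaded once and cached in sys.modules,
# so the "from" form does not skip executing the rest of the module.
assert sys.modules["math"] is math
assert math.sqrt is sqrt

print(sqrt(16.0))  # prints 4.0
```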
If the two modules are unrelated except for that common function, you may wish to consider extracting that function (and maybe other things that are related to that function) into a third module.

