Utilizing the Dependency-Graph of pip

I want to write a visualization of the dependency graph of all Python packages installed with pip. My problem is that the code is poorly documented, and I'm unable to find where the graph is stored in the source code.
I hope someone has enough knowledge of pip's source code to help me out.
Also, I'm new to Python and not sure whether I should make my adjustments in the existing source code or write a separate module for it, although I'm leaning towards the latter.
// edit: I can get all installed modules via pip freeze, but that gives me only a flat list without the dependencies. So I have to find a way to extract the dependencies from that list.

Yes, pip's code is quite unreadable if you're not used to it. I don't recall it containing anything like that, and I would not use it. You may be better served by distlib, which has a module just for that: https://distlib.readthedocs.org/en/latest/depgraph.html

Here's what I found during my search:
pip doesn't use a dependency graph at all internally (as of version 1.3.x).
So one solution is the following:
Install setuptools if you haven't already. It ships a module named pkg_resources.
This module has all the tools to list every installed distribution (not only the ones installed with pip) in the distributions directory of your choice. You can then read the metadata (including requirements/dependencies) with methods that are also part of pkg_resources.
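A minimal sketch of that idea follows. It is an illustration under stated assumptions, not pip's own mechanism: it uses the standard library's importlib.metadata (Python 3.8+) in place of pkg_resources, since it reads the same installed-distribution metadata, and the helper names normalize, requirement_name and dependency_graph are made up for this example. Environment markers and extras are ignored, so optional dependencies show up as ordinary edges.

```python
import re
from importlib import metadata

# Leading project name of a requirement string such as "requests (>=2.0)".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def normalize(name):
    """Normalize a project name so 'Foo_Bar' and 'foo-bar' compare equal."""
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(requirement):
    """Extract the normalized project name from a requirement string."""
    match = _NAME.match(requirement)
    return normalize(match.group(1)) if match else None


def dependency_graph():
    """Map each installed distribution to the set of names it requires.

    Assumption: version specifiers, extras and environment markers are
    dropped, so the result over-approximates the real dependencies.
    """
    graph = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if not name:
            continue  # skip entries without a usable name
        deps = set()
        for requirement in dist.requires or []:
            dep = requirement_name(requirement)
            if dep:
                deps.add(dep)
        graph[normalize(name)] = deps
    return graph
```

Calling dependency_graph() returns a plain dict of sets, which can then be handed to whatever graph-drawing library you prefer.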

Related

If I write a function that calls a library like numpy, and someone else uses it, will they need to have numpy installed?

It is pretty much what the title says. If code in my GitHub repository uses a non-built-in library and someone copies it, that person will have to have the library installed, right?

Short answer: yes.
Long answer: yes, but there are two things you can do to make the script runnable on other systems.
The usual way is to add a requirements.txt file that lists the libraries that need to be installed. This is normally used together with a virtual environment, which makes sure the installed packages don't get mixed up with the main Python installation.
The other way is a rough solution that I would use only in very extreme scenarios (I used it when I had to run Python code on AWS Lambda and the library I needed was compiled from C beforehand): copy the library folder directly into your code and use it from there. Mind you, this increases the code size and is absolutely not recommended.
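To illustrate why the other person needs the library installed, here is a small sketch that checks up front whether the modules a script depends on can be found. The function name missing_modules and the list of required names are made up for illustration; importlib.util.find_spec is the standard-library call doing the lookup.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this system."""
    missing = []
    for name in names:
        # find_spec returns None when no finder can locate the module.
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    required = ["json", "numpy"]  # hypothetical list for this example
    absent = missing_modules(required)
    if absent:
        print("Please install:", ", ".join(absent))
    else:
        print("All required modules are available.")
```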

How do I determine which requirements are actually needed in setup.py?

I'm cleaning up packaging for a python project I didn't create. Currently, it does some explicitly unsupported magic to get its dependencies from a requirements.txt file. The file looks like it may have been generated by pip freeze; there are fixed versions for everything, and many apparently-extraneous packages listed. I am pretty sure some of these aren't real dependencies, but I don't know which ones.
Given just the source tree, how would I figure out, from scratch, what dependencies ought to be included in install_requires?
As a first stab, I'm grepping for non-stdlib import statements. I hope there's a better way.

There's no way to do this perfectly, because Python is too flexible.
But it's usually possible to do it well enough.
You can start with the stdlib's modulefinder.
Beyond that, a number of projects—mostly projects designed for building binary executables, installers, etc. for Python apps—have come up with heuristics that go even farther.
These usually work. And, when they fail, you usually immediately spot it on your first test. Even if they aren't sufficient, they're at the very least good sample code. Here are a few off the top of my head:
cx_Freeze
py2exe
py2app
pyInstaller
In case you're wondering why it's impossible:
Even forgetting about the problem of dependencies in C extension modules, Python is just too flexible to catch all the ways you could import a module via static analysis.
Sure, you'd have to be dealing with code written by someone crazy enough to use explicitly unsupported magic for no good reason… but if you were, there's nothing to stop someone from writing this instead of import lxml [1]:
import codecs
import sys
with open('picture.jpg', encoding='cp500') as f:
    getattr(list(sys.modules.values())[11], codecs.encode('vzcbeg_zbqhyr', 'rot13'))(f.read().strip())
In reality, things aren't going to be that bad. But they could easily be too bad for rg import to be sufficient.
You could try to detect all the imports dynamically with a simple import hook, but that's only guaranteed to work if you can exercise 100% of the code paths.
1. Of course, this only works if importlib was the 12th module loaded, and if picture.jpg is not a JPEG image but a text file whose contents are, in EBCDIC, lxml\n.
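The import hook mentioned above could look roughly like the following sketch. The class and function names (ImportRecorder, record_imports) are invented for this example. The recorder sits at the front of sys.meta_path, notes every name the import system asks about, and then declines so the real finders do the work. One caveat: modules already cached in sys.modules never reach the hook, so it only sees first-time imports.

```python
import importlib.abc
import sys


class ImportRecorder(importlib.abc.MetaPathFinder):
    """Meta path finder that records requested names and finds nothing itself."""

    def __init__(self):
        self.seen = set()

    def find_spec(self, fullname, path, target=None):
        # Keep only the top-level package name, e.g. "lxml" for "lxml.etree".
        self.seen.add(fullname.split(".")[0])
        return None  # defer to the normal finders


def record_imports(func):
    """Run func() and return the top-level names it tried to import."""
    recorder = ImportRecorder()
    sys.meta_path.insert(0, recorder)
    try:
        func()
    finally:
        sys.meta_path.remove(recorder)
    return recorder.seen
```

You would pass in a callable that exercises the project, for example its test suite entry point. Only code paths that actually run are observed.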

I've had great results with pipreqs that will automatically generate a requirements.txt file from your source code.
pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt

I wrote a tool, realreq, specifically for this issue.
You can install it using pip: python3 -m pip install realreq. Using it is as easy as:
realreq -s /path/to/your/source
It will then gather the dependencies actually used in your source code.

I mean, the most effective way would honestly be to go through the code line by line and determine which packages may not be needed, which packages need updates, etc. I know Python 2 and 3 both have ModuleFinder, which finds all the modules a script imports, but I've never used it before, so I'm not sure how effective it is, especially for what you're doing. However, if you're interested, I'll attach the link below.
https://docs.python.org/3/library/modulefinder.html
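For what it's worth, a rough sketch of how ModuleFinder might be used for this is shown below. The function name third_party_candidates is made up, and the standard-library filter relies on sys.stdlib_module_names, which only exists on Python 3.10+ (on older versions nothing is filtered out). The result is a list of candidates to review, not a definitive install_requires.

```python
import sys
from modulefinder import ModuleFinder


def third_party_candidates(script_path):
    """List top-level modules a script imports that are not in the stdlib."""
    finder = ModuleFinder()
    finder.run_script(script_path)

    # finder.modules holds imports that were found, finder.badmodules holds
    # imports that could not be resolved (often uninstalled dependencies).
    names = set(finder.modules) | set(finder.badmodules)
    top_level = {name.split(".")[0] for name in names}
    top_level.discard("__main__")

    # Assumption: fall back to "no filtering" where the attribute is missing.
    stdlib = getattr(sys, "stdlib_module_names", frozenset())
    return sorted(top_level - set(stdlib))
```

Note that ModuleFinder also follows the imports of the modules it finds, so installed third-party packages will pull in their own dependencies too.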

import openpyxl in django

I am quite new to Python and Django. I have a problem with integrating a Python package (openpyxl) into my Django app. I'd like to use the methods from these files in my views.py file.
My problem is first that I don't know where's the best place to put the openpyxl folder containing all the files in my file hierarchy.
My hierarchy looks like this:
http://imgur.com/t4iOX98
Is it well placed? Should I put it outside the international folder? inside the carte_interactive folder?
And my biggest problem is inside the __init__.py of openpyxl. I get errors on lines like this one:
from openpyxl.xml import LXML
There is no resolved reference to LXML, even though it is actually defined in openpyxl's xml file.
Is it my bad file placement that caused this? Or is it Django? Or is it openpyxl's fault? Does anyone have an idea?
You can see openpyxl's source files here, where I downloaded them:
https://bitbucket.org/openpyxl/openpyxl/src
If you need any more details, please ask!
Thanks in advance!

I applaud your enthusiasm for wanting to learn Django while being new to Python. That said, the way you have things set up right now will make your life unnecessarily difficult to manage.
I would first recommend reading up on best practices for setting up a Django project. Just doing a quick google search for "Django project layout best practices" will give you a lot of resources, but they'll all essentially tell you to do what's in the SO answer above.
The second very basic thing is using pip to install other Python packages. This is especially important for a Django project, where you often have a lot of dependencies besides Django. pip is a program that installs additional Python packages. They get installed into a directory on Python's module search path (sys.path, which the PYTHONPATH environment variable can extend), which is just a list of file paths on disk where Python looks for packages. On a *NIX system, this is usually something like /usr/lib/python2.7/. Once a library is on that path, you can use it from any piece of code via Python's import system. Essentially, all an import does is look through each location on the search path for the library you're trying to import.
Finally, with regard to lxml specifically, you will want to install it via apt or some other package installer (e.g. on Ubuntu, apt install python-lxml).
In order to keep track of all your external python-dependencies, stuff them in a file named "requirements.txt" in the top level directory. This is a pretty standard thing to do for Django projects, so don't worry about shipping code with ALL dependencies inside the project.
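To make the search-path lookup described above concrete, here is a small sketch that prints the directories Python searches and reports where a given module would be loaded from. The helper name where_is is invented for this example, and the module names are only placeholders.

```python
import importlib.util
import sys


def where_is(module_name):
    """Return the location a top-level module would be imported from, or None."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("Python searches these locations, in order:")
    for entry in sys.path:
        print("  ", entry)
    for name in ("json", "openpyxl"):
        print(name, "->", where_is(name) or "not installed")
```

If openpyxl shows up under a site-packages directory, it was installed properly and does not need to be copied into the project tree.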

Thanks to all of you! I'm using Jetbrains Pycharm and when I wrote import openpyxl, it gave me the choice to install the package. I suppose it does it with pip, which would certainly have worked the same. And I put the package in requirements.txt, so that other users would only have to install this requirement!
It works now! And thanks for the link on the best practices. I'll read that!

How do I use script provided by Anaconda without using the given python?

It seems that when I install Anaconda, I can use neither the normal Python nor the Python provided with Anaconda, even though Anaconda is already in the path.
I do realize that Anaconda also comes with a Python, but it comes with both 2.7 and 3.2, and that's kind of scary given the path conflict I had earlier. It may have ended like this; fortunately, everything went back to normal when I uninstalled it.
After uninstalling, my plan is to take only the installed libraries and then uninstall Anaconda:
Reinstall Anaconda
Copy the library (scipy,numpy,etc)
Paste it to normal Python2
Uninstall Anaconda and its family
Happily ever after
But this doesn't seem foolproof. Is there a better way?
Note: As I mentioned, I know Anaconda has a Python available too, so my other alternative is to uninstall the normal Python and just use Anaconda. But again, when I saw that it provides two versions, I decided to go the way described above.

Python is getting more complex and installing libraries in a way that they work is becoming more brittle. You can install pip which will try to download the source code for libraries and compile them for your OS (which might or might not need a C compiler locally installed and working).
Anaconda tries to solve this hassle by providing a set of working, well-maintained libraries which you can install easily using the conda tool. When I last installed the product, it didn't try to install both Python 2 and 3; you have to select one. It also asks whether it should add itself to your path; you can say "no".
But you have to choose between "I know exactly what I'm doing", in which case you're on your own, and "I don't know enough", in which case you trust some unknown expert to get it right most of the time.
Your copy-and-paste approach might work, since I haven't seen a Python library where absolute paths were compiled in. On the other hand, some of those libraries have hundreds of thousands of lines of code. It's hard to say which of them will break when you start moving things around.
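When several Pythons compete on the path, it can help to check which interpreter is actually running and which ones the shell would pick up. The sketch below is only an illustration; the function name interpreter_report and the list of command names are assumptions, not anything Anaconda provides.

```python
import shutil
import sys


def interpreter_report(command_names=("python", "python2", "python3")):
    """Describe the running interpreter and what PATH resolves each name to."""
    report = {
        "running": sys.executable,  # the interpreter executing this code
        "prefix": sys.prefix,       # its installation root
        "version": "%d.%d.%d" % sys.version_info[:3],
    }
    for name in command_names:
        # shutil.which returns None when the command is not on PATH.
        report["PATH:" + name] = shutil.which(name)
    return report


if __name__ == "__main__":
    for key, value in interpreter_report().items():
        print(key, "=", value)
```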

Difference between installing and importing modules

New to Python, so excuse my lack of specific technical jargon. Pretty simple question really, but I can't seem to grasp or understand the concept.
It seems that a lot of modules require using pip or easy_install and running setup.py to "install" into your Python installation or your virtualenv. What is the difference between installing a module and simply taking it and importing it into another script? It seems that you access the modules the same way.
Thanks!

It's like the difference between:
Uploading a photo to the internet
Linking the photo URL inside an HTML page
Installing puts the code somewhere python expects those kinds of things to be, and the import statement says "go look there for something named X now, and make the data available to me for use".

For a single module, it usually doesn't make any difference. For complicated webs of modules, though, an installation program may do many things that wouldn't be immediately obvious. For example, it may also copy data files into locations where the new modules can find them, put executables (binary libraries, or DLLs on Windows, for example) where the new modules can find them, do different things depending on which version of Python you have, and so on.
If deploying a web of modules were always easy, nobody would have written setup programs to begin with ;-)
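For the single-module case, the following sketch shows that an import only needs the file to sit somewhere on the search path; "installing" is one way of getting it there. The helper name import_from_directory and the file name in the usage note are made up for this example.

```python
import importlib
import sys


def import_from_directory(directory, module_name):
    """Import module_name from directory by putting it on sys.path temporarily."""
    location = str(directory)
    sys.path.insert(0, location)
    try:
        # Make sure the import system notices files created just now.
        importlib.invalidate_caches()
        return importlib.import_module(module_name)
    finally:
        sys.path.remove(location)
```

For instance, if /tmp/demo contains a file greetings.py, then import_from_directory("/tmp/demo", "greetings") returns the loaded module without any installation step. An installer does the same kind of placement once, into a location that is always searched.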
