Difference between installing and importing modules

Difference between installing and importing modules - python

New to Python, so excuse my lack of specific technical jargon. Pretty simple question really, but I can't seem to grasp or understand the concept.
It seems that a lot of modules require using pip or easy_install and running setup.py to "install" into your python installation or your virtualenv. What is the difference between installing a module and simply taking it and importing the into another script? It seems that you access the modules the same way.
Thanks!

It's like the difference between:
Uploading a photo to the internet
Linking the photo URL inside an HTML page
Installing puts the code somewhere python expects those kinds of things to be, and the import statement says "go look there for something named X now, and make the data available to me for use".

For a single module, it usually doesn't make any difference. For complicated webs of modules, though, an installation program may do many things that wouldn't be immediately obvious. For example, it may also copy data files into locations the new modules can find them, put executables (binary libraries, or DLLs on Windws, for example) where the new modules can find them, do different things depending on which version of Python you have, and so on.
If deploying a web of modules were always easy, nobody would have written setup programs to begin with ;-)

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

I want to know if I can create a python script with a folder in the same directory with all the assets of a python module, so when someone wants to use it, they would not have to pip install module, because it would import from the directory.

Yes, you can, but it doesn't mean that you should.
First, ask yourself who is suposed to use that code.
If you plan to give it to consumers, it would be a good idea to use a tool like py2exe and create executable file which would include all modules and not allow for code to be changed.
If you plan to share it with another developer, you might want to look into virtual environments and requirements.txt file.
There are multiple reasons why sharing modules is bad idea:
It is harder to update modules later, at least without upgrading whole project.
It uses more space on version control, which can create issues on huge projects with hundreds of modules and branches
It might be illegal as some licenses specifically forbid including their code in your source code.
The pip install of some module might do different things depending on operating system version or installed packages. The modules on your machine might be suboptimal on someone else's machine, and in some instances might not even work.
And probably more that I can't think of right now.
The only situation where I saw this being unavoidable was when the module didn't support python implementation the application was running on. The module was changed, and its source was put under lib folder with the rest of the libraries.

I think you can add the directory with python modules into PYTHONPATH. Then people want to use those modules just need has this envvar set.
https://docs.python.org/3/using/cmdline.html#envvar-PYTHONPATH

How do I determine which requirements are actually needed in setup.py?

I'm cleaning up packaging for a python project I didn't create. Currently, it does some explicitly unsupported magic to get its dependencies from a requirements.txt file. The file looks like it may have been generated by pip freeze; there are fixed versions for everything, and many apparently-extraneous packages listed. I am pretty sure some of these aren't real dependencies, but I don't know which ones.
Given just the source tree, how would I figure out, from scratch, what dependencies ought to be included in install_requires?
As a first stab, I'm grepping for non-stdlib import statements. I hope there's a better way.

There's no way to do this perfectly, because Python is too flexible.
But it's usually possible to do it well enough.
You can use start with the stdlib's modulefinder.
Beyond that, a number of projects—mostly projects designed for building binary executables, installers, etc. for Python apps—have come up with heuristics that go even farther.
These usually work. And, when they fail, you usually immediately spot it on your first test. Even if they aren't sufficient, they're at the very least good sample code. Here are a few off the top of my head:
cx_Freeze
py2exe
py2app
pyInstaller
In case you're wondering why it's impossible:
Even forgetting about the program of dependencies in C extension modules, Python is just too flexible to catch all the ways you could import a module via static analysis.
Sure, you'd have to be dealing with code written by someone crazy enough to use explicitly unsupported magic for no good reason… but if you were, there's nothing to stop someone from writing this instead of import lxml:1
with open('picture.jpg', encoding='cp500') as f:
getattr(sys.modules[11], codecs.encode('vzcbeg_zbqhyr', 'rot13'))(f.read().strip())
In reality, things aren't going to be that bad. But they could easily be too bad for rg import to be sufficient.
You could try to detect all the imports dynamically with a simple import hook, but that's only guaranteed to work if you can exercise 100% of the code paths.
1. Of course this only works if importlib was the 12th module loaded, and if picture.jpg is not a JPEG image but a textfile whose contents are, in EBCDIC, lxml\n

I've had great results with pipreqs that will automatically generate a requirements.txt file from your source code.
pipreqs /home/project/location
Successfully saved requirements file in /home/project/location/requirements.txt

I wrote a tool, realreq, specifically for this issue.
You can install it using pip python3 -m pip install realreq. Using it is easy as:
realreq -s /path/to/your/source
It will then gather your dependencies actually used in your source code.

I mean, the most effective way would honestly be to go through the code line by line and determine what packages may not be needed, what packages need updates, etc. I know Python 2 and 3 both have ModuleFinder which finds all the modules a script needs to successfully compile and run, but I've never used it before, so not sure how effective it is, especially for what you're doing. However, if you're interested, I'll attach the link below.
https://docs.python.org/3/library/modulefinder.html

Lazily download/install python submodules

Would it be possible to create a python module that lazily downloads and installs submodules as needed? I've worked with "subclassed" modules that mimic real modules, but I've never tried to do so with downloads involved. Is there a guaranteed directory that I can download source code and data to, that the module would then be able to use on subsequent runs?
To make this more concrete, here is the ideal behavior:
User runs pip install magic_module and the lightweight magic_module is installed to their system.
User runs the code import magic_module.alpha
The code goes to a predetermine URL, is told that there is an "alpha" subpackage, and is then given the URLs of alpha.py and alpha.csv files.
The system downloads these files to somewhere that it knows about, and then loads the alpha module.
On subsequent runs, the user is able to take advantage of the downloaded files to skip the server trip.
At some point down the road, the user could run a import magic_module.alpha ; alpha._upgrade() function from the command line to clear the cache and get the latest version.
Is this possible? Is this reasonable? What kinds of problems will I run into with permissions?

Doable, certainly. The core feature will probably be import hooks. The relevant module would be importlib in python 3.
Extending the import mechanism is needed when you want to load modules that are stored in a non-standard way. Examples include [...] modules that are loaded from a database over a network.
Convenient, probably not. The import machinery is one of the parts of python that has seen several changes over releases. It's undergoing a full refactoring right now, with most of the existing things being deprecated.
Reasonable, well it's up to you. Here are some caveats I can think of:
Tricky to get right, especially if you have to support several python versions.
What about error handling? Should application be prepared for import to fail in normal circumstances? Should they degrade gracefully? Or just crash and spew a traceback?
Security? Basically you're downloading code from someplace, how do you ensure the connection is not being hijacked?
How about versionning? If you update some of the remote modules, how can make the application download the correct version?
Dependencies? Pushing of security updates? Permissions management?
Summing it up, you'll have to solve most of the issues of a package manager, along with securing downloads and permissions issues of course. All those issues are tricky to begin with, easy to get wrong with dire consequences.
So with all that in mind, it really comes down to how much resources you deem worth investing into that, and what value that adds over a regular use of readily available tools such as pip.
(the permission question cannot really be answered until you come up with a design for your package manager)

How to package/ distribute python applications

I've spent countless hours trying to understand this and unfortunately I haven't gotten to an answer yet. Or at least I don't think I have.
First up I should say that I am a Java Developer. I've only recently started working with Python and the build-process is a bit...odd for me.
In my mind I write an application, I compile it to run and I package it into a .jar for other people to use. Either as a library or for end-users to execute and have fun with it. (ignoring stuff like maven or gradle...)
I wrote a little CLT in python that consists of ~6 files and I wanted to distribute it. From what I could find I was supposed to write a setup.py and I found some guides on how to do that but ... to be honest I'm not even sure what that did. I could get my source code bundled into a tar.gz with some other meta data or it would create some weird files that I don't know what to do with.
Then I found PyInstaller and it worked great to package everything into a binary. However I've run into some problems trying to create a Debian package and it has made me re-assess and question the fact that there doesn't seem to be something in Python (without having to use an external tool) that lets me package/ bundle/ whatever my application into a single file to be run.
And that's something I can't get my head around. I mean...before there were tools like PyInstaller and P2Exe and what not, how did people distribute their applications? Am I expected to write a C application, somehow include the python code in there and compile that? Sorry if this seems like a stupid question but I'm really asking. I've googled around so much and spent so much time on it and haven't found a satisfactory answer so I hope someone here can help me with this! Thanks.

If you package your Python code for pip, you can include some executable scripts that start your program. I don't know how the situation was 5 years ago when this question got asked, but nowadays pip is pretty much integrated with Python, to the point that there's a standard library module to bootstrap pip in case it's missing:
https://docs.python.org/3/library/ensurepip.html
The situation is different if you want to package an application for some other package manager, like Anaconda or the package managers of various Linux distributions, or as a Windows installer. Obviously, you'll have to create a separate package for each package manager or installation technique you want to support.

How to deal with third party Python imports when sharing a script with other people on Git?

I made a Python module (https://github.com/Yannbane/Tick.py) and a Python program (https://github.com/Yannbane/Flatland.py). The program imports the module, and without it, it cannot work. I have intended for people to download both of these files before they can run the program, but, I am concerned about this a bit.
In the program, I've added these lines:
sys.path.append("/home/bane/Tick.py")
import tick
"/home/bane/Tick.py" is the path to my local repo of the module that needs to be included, but this will obviously be different to other people! How can I solve this situation better?

What suggested by #Lattyware is a viable option. However, its not uncommon to have core dependencies boundled with the main program (Django and PyDev do this for example). This works fine especially if the main code is tweaked against a specific version of the library.
In order to avoid the troubles mentioned by Lattyware when it comes to code maintenance, you should look into git submodules, which allow precisely this kind of layout, keeping code versioning sane.
From the structure of your directory it seems that both files live in the same directory. This might be the tell-tale than they might be two modules of a same package. In that case you should simply add an empty file called __init__.py to the directory, and then your import could work by:
import bane.tick
or
from bane import tick
Oh, and yes... you should use lower case for module names (it's worth to take a in-depth look at PEP8 if you are going to code in python! :)
HTH!

You might want to try submitting your module to the Python Package Index, that way people can easily install it (pip tick) into their path, and you can just import it without having to add it to the python path.
Otherwise, I would suggest simply telling people to download the module as well, and place it in a subdirectory of the program. If you really feel that is too much effort, you could place a copy of the module into the repository for the program (of course, that means ensuring you keep both versions up-to-date, which is a bit of a pain, although I imagine it may be possible just to use a symlink).
It's also worth noting your repo name is a bit misleading, capitalisation is often important, so you might want to call the repo tick.py to match the module, and python naming conventions.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.

Difference between installing and importing modules - python

Related

Is it possible to have users not pip install modules and instead include the modules used in a different folder and then import that?

How do I determine which requirements are actually needed in setup.py?

Lazily download/install python submodules

How to package/ distribute python applications

How to deal with third party Python imports when sharing a script with other people on Git?

Categories

Resources