How to structure a simple command-line-based Python project

I wrote a command-line app in a single file, app_name.py, and it works. Now I am breaking it into different .py files for easier management and readability. I have placed all these .py files in a src/ folder, as I see that a lot on GitHub.
app_name/
    src/
        file1.py
        file2.py
        cli.py
        __init__.py
I have placed all the imports in my __init__.py. The relative imports like from .file1 import function1 are not in __init__.py; they are placed within the individual files where they are needed. For example,
#!/usr/bin/env python
# __init__.py
import os
import argparse
# etc...
from .cli import run_cli  # this import is needed for the call below
run_cli()
In cli.py, I have
from .file1 import function1

def run_cli(): pass

if __name__ == '__main__':
    run_cli()
This is because when I run app_name <arguments> on the actual command line, __name__ isn't '__main__', so I call run_cli() in __init__.py. This doesn't look right, though, as I will have to invoke src rather than app_name.

I think you may be confusing two things. Having a src directory in your source distribution is a reasonably common, idiomatic thing to do; having that in the installed package or app, not so much.
There are two basic ways to structure this.
First, you can build a Python distribution that installs a package into site-packages, and installs a script into wherever Python scripts go on the PATH. The Python Packaging User Guide covers this in its tutorial on building and distributing packages. See the Layout and Scripts sections, and the samples linked from there.
This will usually give you an installed layout something like this:
<...somewhere system/user/venv .../lib/pythonX.Y/site-packages>
    app_name/
        file1.py
        file2.py
        cli.py
        __init__.py
<...somewhere.../bin/>
    app_name
But, depending on how the user has chosen to install things, it could instead be an egg, or a zipped package, or a wheel, or anything else. You don't care, as long as your code works. In particular, your code can assume that app_name is an importable package.
Ideally, the app_name script on your path is an "entry-point" script (pip itself may be a good example on your system), ideally one built on the fly at install time. With setuptools, you just specify which package the script should import and which function in that package it should call; setuptools does everything else: putting the Python actually used at install time in the shebang, figuring out how to pkg_resources up the package and add it to sys.path (not needed by default, but if you don't want the package to be importable, you can make that work), and so on.
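As a concrete sketch (the app_name and run_cli names come from the question above, and the package directory is assumed to be named app_name as in the installed layout; the rest is ordinary setuptools usage, not a prescription), the setup.py for such an entry point could look like:

from setuptools import setup, find_packages

setup(
    name='app_name',
    version='0.1',
    packages=find_packages(),
    entry_points={
        'console_scripts': [
            # install a script named "app_name" on the PATH that
            # imports app_name.cli and calls run_cli()
            'app_name = app_name.cli:run_cli',
        ],
    },
)

After installation, an app_name command appears on the PATH and does nothing but import the package and call run_cli().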
As mentioned in a comment from the original asker, python-commandline-bootstrap might help you put this solution together more quickly.
The alternative is to keep everything out of site-packages, and make the package specific to your app. In this case, what you basically want to do is:
Install a directory that contains both the package (either as a directory, or zipped up) and the wrapper script.
In the wrapper script, add dirname(abspath(argv[0])) to sys.path before the import.
Symlink the wrapper script into some place on the user's PATH.
In this case you have to write the wrapper script manually, but in exchange you don't need anything fancy.
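A minimal sketch of such a wrapper (the app_name package name is reused from the earlier example; the realpath call is an extra step I've added so the symlinking in step 3 still resolves to the install directory):

#!/usr/bin/env python
import os
import sys

# Find the directory the wrapper really lives in (resolving any
# symlink), and put it on sys.path before importing the package.
here = os.path.dirname(os.path.abspath(os.path.realpath(sys.argv[0])))
sys.path.insert(0, here)

from app_name.cli import run_cli
run_cli()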
But often, you don't really want an application to depend on the user having some specific version and setup of Python. You may want to use a tool like PyInstaller, py2exe, py2app, cx_Freeze, zc.buildout, etc., that does all of the above and more. They all do different things, all the way up to the extreme of building a Mac .app bundle directory with a custom, standalone, stripped Python framework and stdlib and a wrapper executable that embeds the framework.
Either way, you really don't want to call the package directory src. That means the package itself will be named src, which is not a great name. If your app is called spamifier, you want to see spamifier in your tracebacks, not src, right?

Related

Strategy for making sure imports of custom modules within a project work from crontab?

I have a code project with various Python scripts and modules. The folder structure of the GitHub project is something like this:
/data_collection
/analysis
/modules
/helpers
Most of the scripts in data_collection and analysis will import stuff from modules or helpers. The code for doing this, in an example script /data_collection/pull_data.py, would be something like this:
import sys
sys.path.insert(0, '..')
from modules import my_module
from helpers import my_helper
Now, if I simply run this code from the shell (from the dir that the script is in), it works just fine.
BUT: I want to run this from the crontab. It doesn't work, because crontab's PWD is always the cron user's home dir.
Now, I realise that I could add PWD=/path/to/project at the top of the crontab. But what if I also have other projects' scripts firing from cron?
I also realise that I could reorganise the whole folder structure of the project, perhaps putting all these folders into a folder called app and adding __init__.py to each folder -- but I'm not really in a position to do that at this moment.
So - I wonder, is there a possibility to achieve the following:
retain the relative paths in sys.path.insert within the scripts, or perhaps find some solution that avoids the sys.path business altogether (so that it can run without modification on other systems)
be able to run these scripts from the crontab while also running scripts that live in other project directories from the crontab
Many thanks in advance!
I've created an experimental import library: ultraimport
It lets you do file-system-based imports with relative (or absolute) paths that will always work, no matter how you run the code or which user runs it (provided the user has read access to the files you're trying to import).
In your example script /data_collection/pull_data.py you would then write:
import ultraimport
my_module = ultraimport('__dir__/../modules/my_module.py')
my_helper = ultraimport('__dir__/../helpers/my_helper.py')
No need for any __init__.py, no need to change your directory structure, and no need to change sys.path. Your current working directory (which you can check by running pwd) plays no role in finding the imported files.
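A crontab entry then needs no PWD or PYTHONPATH setup, because the imports resolve relative to the script file itself (the schedule and paths below are made up for illustration):

# run daily at 06:00; works regardless of cron's working directory
0 6 * * * /usr/bin/python3 /path/to/project/data_collection/pull_data.py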

Is it necessary to add my project to the environment variables PATH or PYTHONPATH?

I've been reading a lot on how to set up projects, with an __init__.py in each folder of the structure and one more above, in the root folder where the project sits.
I have this folder structure, and running it in PyCharm works, since PyCharm adds the path to the environment variables when it starts.
C:\test_folder\folder_struture\project
C:\test_folder\folder_struture\project\pak1
C:\test_folder\folder_struture\project\pak1\pak_num1.py
C:\test_folder\folder_struture\project\pak1\__init__.py
C:\test_folder\folder_struture\project\program
C:\test_folder\folder_struture\project\program\program.py
C:\test_folder\folder_struture\project\program\__init__.py
C:\test_folder\folder_struture\project\__init__.py
C:\test_folder\folder_struture\__init__.py
When I try to run program.py where I have:
from project.pak1 import pak_num1
I get the error that the module doesn't exist. When I add the project to the PYTHONPATH variable or the PATH variable on my Windows machine, everything works fine. Is it possible that the tutorials are missing the part about putting the project's root folder into the environment variables because they assume you have already done it?
For every project, do I need to put the project root into the environment variables, or is there a way for Python to recognize on its own that it is inside a Python package structure?
Adding it to the environment lets me do absolute imports, but if I try a relative import with
from ..pak1 import pak_num1
I get:
ImportError: attempted relative import with no known parent package
If I run program.py, does it look for the __init__.py in the same folder, and if it finds one, does it go one level up to find another __init__.py, and so on, to work out the structure?
If you are writing lots of little scripts and start wanting to organize some of them into utility packages, mostly to be used by other scripts outside those packages, put all the scripts that are called from the command line (your entry points for execution, main scripts, or whatever you want to call the scripts that run as commands) side by side, in the same folder as the root folders of all your packages.
Then you can import anything from any package from the top-level scripts. Starting execution from the top-level scripts not only gives access to every package, but also allows all the package-internal imports to work, given that an __init__.py file rests in every package/sub-package folder.
For code inside a package to import from another sibling package (sibling at the top-level folder), you either need to append the sibling package's location to sys.path, or, for example, wrap everything in yet another folder with an __init__.py file in it. Then, again, you should start execution from outside this overall package, not from the scripts that were top-level before you did this.
Think of packages as something to be imported for use, not to start running from a random inner entry point.
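For example, a minimal sketch of that layout, reusing the names from the question:

project/
    program.py          # top-level entry point: start execution here
    pak1/
        __init__.py
        pak_num1.py

With program.py at the top level, from pak1 import pak_num1 works no matter where you invoke it from, because Python puts the script's own directory on sys.path.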
Many times a better alternative is to configure a virtualenv; every package you install in the environment becomes known within that environment. This also isolates the dependencies from project to project, including the Python version in use if you need that. Note that this also solves a different, and hairy, problem along the way.
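A minimal sketch of that workflow on Windows (the question's platform), using the standard-library venv module; the package names are whatever your project needs:

python -m venv .venv
.venv\Scripts\activate
pip install <your-packages>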

Do I need PYTHONPATH

There are many similar questions about PYTHONPATH and imports, but I didn't find exactly what I needed.
I have a git repository that contains a few python helper scripts. The scripts are naturally organized in a few packages. Something like:
scripts/main.py
scripts/other_main.py
scripts/__init__.py
a/foo.py
a/bar.py
a/__init__.py
b/foo.py
b/bar.py
b/__init__.py
__init__.py
scripts depends on a and b. I'm using absolute imports in all modules. I run python3 scripts/main.py. Everything works as long as I set PYTHONPATH to the root of my project.
However, I'd like to avoid users the hassle of setting up an environment variable.
What would be the right way to go? I expected this to work like in Java, where the current dir is in the classpath by default, but that doesn't seem to be the case. I've also tried relative imports, without success.
EDIT: it seems to work if I remove the top-level __init__.py
Firstly, you're right in that I don't think you need the top-level __init__.py. Removing it doesn't solve any import error for me, though.
You won't need to set PYTHONPATH and there are a few alternatives that I can think of:
Use a virtual environment (https://virtualenv.pypa.io/en/latest/). This would also require you to package up your code into an installable package (https://packaging.python.org/). I won't explain this option further since it's not directly related to your question.
Move your modules under your scripts directory. Python automatically adds the script's directory into the Python path.
Modify the sys.path variable in your scripts so they can find your local modules.
The second option is the most straightforward.
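With the second option, the question's layout would become something like this; import a in main.py then works because Python puts the script's directory on sys.path:

scripts/
    main.py
    other_main.py
    a/
        __init__.py
        foo.py
        bar.py
    b/
        __init__.py
        foo.py
        bar.py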
The third option would require you to add some python code to the top of your scripts, above your normal imports. In your main.py it would look like:
#!/usr/bin/env python
import os.path, sys

# Prepend the script's parent directory (the project root) to sys.path.
# abspath (an addition to the original snippet) makes this independent
# of how the script is invoked.
sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))

import a
import b
What this does is:
Take the filename of the script
Calculate the parent directory of the script's directory
Prepend that directory to sys.path
Then do normal imports of your modules

Editing PYTHONPATH for every script in a project

So I structure almost all of my projects like this:
root/
|- scripts/
|- src/
|- etc. ...
I put runnable scripts in scripts/ and importable modules in src/, and by convention run every script from the root directory (so I always stay in root, then type 'python scripts/whatever')
In order to be able to import code from src/, I've decided to start every script with this:
import sys
import os
# use this to make sure we always have the dir we ran from in path
sys.path.append(os.getcwd())
To make sure root/ is always in the path for scripts being run from root.
My question is: is this considered bad style? I like my conventions of always running scripts from the root directory, and keeping my scripts separate from my modules, but it seems like a weird policy to always edit the path variable for every script I write.
If this is considered bad style, could you provide alternative recommendations? Either different ways for me to keep my existing conventions or recommendations for different ways to structure projects would be great!
Thanks!
My recommendation is to use:
root/$ python -m scripts.whatever
With -m you use dot notation rather than the file path, and you won't need the path-setup code in each of the scripts, because -m tells Python to start looking for imports in the directory from which you invoked it.
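As a sketch, a script run this way can import from src/ with no path manipulation at all, because root/ (the directory you invoke Python from) ends up on sys.path; mymodule and do_something are hypothetical stand-ins for your real code:

# scripts/whatever.py
from src import mymodule      # hypothetical module living in src/

def main():
    mymodule.do_something()   # hypothetical function

if __name__ == '__main__':
    main()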
If your file structure also happens to be installed using setup.py and may be found within site-packages, there are some other things to consider:
If you call -m from the root of the directory structure (as I've shown above) it will call the code found in your directories
If you call -m from anywhere else, it will find the installed code from sys.path and call that
This can be a subtle gotcha if you happen to be running an interpreter that has your package installed while you are trying to make changes to your scripts locally and can't figure out why your changes aren't taking effect (this happened to a coworker of mine who wasn't using virtual environments).
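One way to avoid that gotcha (my suggestion, not part of the original answer) is an editable install into a virtual environment, so the interpreter always imports your working tree rather than a stale installed copy:

# run from the project root, where setup.py lives
pip install -e .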

How to build a single Python file from multiple scripts?

I have a simple Python script which imports various other modules I've written (and so on). Due to my environment, my PYTHONPATH is quite long. I'm also using Python 2.4.
What I need to do is somehow package up my script and all the dependencies that aren't part of the standard Python, so that I can email a single file to another system where I want to execute it. I know the target version of Python is the same, but it's on Linux while I'm on Windows; otherwise I'd just use py2exe.
Ideally I'd like to send a .py file that somehow embeds all the required modules, but I'd settle for automatically building a zip I can just unzip, with the required modules all in a single directory.
I've had a look at various packaging solutions, but I can't seem to find a suitable way of doing this. Have I missed something?
[edit] I appear to have been quite unclear about what I'm after. I'm basically looking for something like py2exe that will produce a single file (or two files) from a given Python script, automatically including all the imported modules.
For example, if I have the following two files:
[\foo\module.py]
def example():
    print "Hello"
[\bar\program.py]
import module
module.example()
And I run:
cd \bar
set PYTHONPATH=\foo
program.py
Then it will work. What I want is to be able to say:
magic program.py
and end up with a single file, or possibly a file and a zip, that I can then copy to linux and run. I don't want to be installing my modules on the target linux system.
I found this useful:
http://blog.ablepear.com/2012/10/bundling-python-files-into-stand-alone.html
In short, you can zip your modules together with a __main__.py file inside, which enables you to run the archive like so:
python3 app.zip
Since my app is small I made a link from my main script to __main__.py.
Addendum:
You can also make the zip self-executable on UNIX-like systems by prepending a single line to the file. This may be important for scripts that need Python 3.
echo '#!/usr/bin/env python3' | cat - app.zip > app
chmod a+x app
The result can now be executed without specifying python:
./app
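On Python 3.5 and later, the standard-library zipapp module automates both the zipping and the shebang step; assuming your code lives in a directory myapp/ containing a __main__.py:

# build a self-contained, executable archive from the myapp/ directory
python3 -m zipapp myapp -p '/usr/bin/env python3' -o app.pyz
./app.pyz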
Use the stickytape module:
stickytape scripts/blah --add-python-path . > /tmp/blah-standalone
This will result in a functioning script, but not necessarily a human-readable one.
You can try converting the script into an executable file.
First, use:
pip install pyinstaller
After installation, run the following (making sure you are in the directory of the file of interest):
pyinstaller --onefile --windowed filename.py
This will create an executable version of your script containing all the necessary modules. You can then transfer (copy) this executable to the PC or machine where you want to run your script.
I hope this helps.
You should create an egg file. This is an archive of python files.
See this question for guidance: How to create Python egg file
Update: Consider wheels in 2019
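For instance, with a setup.py or pyproject.toml already in place, the pypa build tool produces a wheel (the filename below is illustrative):

pip install build
python -m build --wheel
# the wheel lands in dist/, e.g. dist/app_name-0.1-py3-none-any.whl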
The only way to send a single .py file is if the code from all of the various modules were moved into the single script, and then you'd have to redo everything to reference the new locations.
A better way of doing it would be to move the modules in question into subdirectories under the same directory as your command. You can then make sure that the subdirectory containing the module has an __init__.py that imports the primary module file. At that point you can reference things through it.
For example:
App Directory: /test
Module Directory: /test/hello
/test/hello/__init__.py contents:
import sayhello
/test/hello/sayhello.py contents:
def print_hello():
    print 'hello!'
/test/test.py contents:
#!/usr/bin/python2.7
import hello
hello.sayhello.print_hello()
If you run /test/test.py you will see that it runs the print_hello function from the module directory under the existing directory, with no changes to your PYTHONPATH required.
If you want to package your script with all its dependencies into a single file (it won't be a .py file) you should look into virtualenv. This is a tool that lets you build a sandbox environment to install Python packages into, and manages all the PATH, PYTHONPATH, and LD_LIBRARY_PATH issues to make sure that the sandbox is completely self-contained.
If you start with a virgin Python with no additional libraries installed, then easy_install your dependencies into the virtual environment, you will end up with a built project in the virtualenv that requires only Python to run.
The sandbox is a directory tree, not a single file, but for distribution you can tar/zip it. I have never tried distributing the env, so there may be path dependencies; I'm not sure.
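A sketch of that flow (the package name is a placeholder, and as noted the resulting tarball may still have path dependencies baked in):

virtualenv sandbox
sandbox/bin/pip install some_dependency   # the answer uses easy_install; pip works the same way
tar czf sandbox.tgz sandbox/              # ship this and untar on the target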
You may need to, instead, distribute a build script that builds out a virtual environment on the target machine. zc.buildout is a tool that helps automate that process, sort of like a "make install" that is tightly integrated with the Python package system and PyPI.
I've come up with a solution involving modulefinder, the compiler, and the zip function that works well. Unfortunately I can't paste a working program here as it's intermingled with other irrelevant code, but here are some snippets:
import os
import subprocess
import sys
from modulefinder import ModuleFinder
from zipfile import ZipFile, ZIP_DEFLATED

# dest_dir, zip_name, source_name, python_exe and dest_path are
# defined elsewhere in the original program.
archive = ZipFile(os.path.join(dest_dir, zip_name), 'w', ZIP_DEFLATED)
sys.path.insert(0, '.')
finder = ModuleFinder()
finder.run_script(source_name)
for name, mod in finder.modules.iteritems():  # .items() on Python 3
    filename = mod.__file__
    if filename is None:
        continue  # built-in module; nothing to bundle
    if "python" in filename.lower():
        continue  # crude filter to skip stdlib modules
    # byte-compile with optimizations, then add the file to the archive
    subprocess.call('"%s" -OO -m py_compile "%s"' % (python_exe, filename))
    archive.write(filename, dest_path)
Have you taken into consideration the automatic script creation feature of distribute, the official packaging solution?
What you do is create a setup.py for your program and provide entry points that will be turned into executables that you will be able to run. This way you don't have to change your source layout, while still being able to easily distribute and run your program.
You will find an example of this system used in a real app in gunicorn's setup.py.
