Pythonic way for magaging and using user-created shared libraries

Pythonic way for magaging and using user-created shared libraries - python

tl;dr I have a directory of common files outside of my various project directories. What is the pythonic way of using/importing these common files inside my projects, and for building them into an output directory.
Background:
I'm in school and taking a data structures class that uses Python as the language. I'm learning the languages as I take the class but have having some issues trying to maintain a shared code base.
In all of the other languages I've used, both compiled and interpreted, there has been a fairly intuitive way of being able to keep shared modules separate from the code that is using them so that updating a shared module doesn't require updates to the calling code.
This is how I initially had my directory structure organized.
/.../Projects
Assignment_1
__init__.py
classA.py
classB.py
Assignment_2
__init__.pu
classC.py
(etc)
After realizing that much of the functionality of classA and classB would be required later on, I reorganized to this:
/.../Projects
Common
Sorters
__init__.py
BubbleSort.py
MergeSort.py
__init__.py
SimpleProfiler.py
Assignment_1
__init__.py
main.py
Assignment_2
__init__.py
main.py
My issue is that I can't find a good way of importing things like SimpleProfiler or MergeSort from main,py. Right now I'm manually coping all of the Common files into each assignment, which is bad.
I understand that one possible solution is to update the path to include the common folder form within each main.py file, but I've also read that this is very hacky and isn't encouraged.
Another Stackoverflow answer to a similar question suggested that the user structure everything under one large project. I tried this but still couldn't import modules from one sibling into another sibling.
My other issue is how to package everything together when submitting the assignment. In other languages it was easy to implement a build script that would scan the main project for any imports, then copy (flatten) those imported files into a single output directory which I could then compress and submit for grading. I'm using PyCharm, but can't seem to find a way to reference the imports as part of the build process. Is there any kind of script for this? Whatever the solution is, I need to be able to submit the project in such a way that all the instructor has to do is call a single python file (such as main.py)
This issue isn't unique to a school setting, but seems universal to most programming projects. So, what is the pythonic way of managing a shared code base and for building that shared code into a final project?

[Disclaimer: I think it is better to use PYTHONPATH environment variable]
I think of two very similar alternatives:
/.../Projects
Common
Sorters
__init__.py
BubbleSort.py
MergeSort.py
__init__.py
SimpleProfiler.py
assignment_1.py
assignment_2.py
If you use, from assignment_1.py, the following import: from Common.Sorters.BubbleSort import bubble_sort. This is because, by default, PYTHONPATH considers the current path as a valid PYTHONPATH. This is assuming that you are invoking the scripts assignment_* directly.
The other alternative would be:
/.../Projects
Common
Sorters
__init__.py
BubbleSort.py
MergeSort.py
__init__.py
SimpleProfiler.py
Assignment_1
__init__.py
__main__.py
Assignment_2
__init__.py
__main__.py
And invoking the assignments like so: python -m Assignment_1 (from the Projects folder). By default, "executing" a module like that will load its __main__.py code. (This is not a rigurous explanation, although the official one is a bit short).
It works for the same reasons as before: Python interpreter will consider the current path as a valid PYTHONPATH.

Try setting PYTHONPATH environment variable to your directory.
Python first searches for files being imported in sys.path, and the first directory in sys.path is the current directory. PYTHONPATH is the next where python will look for files.

On the minimum end, make a PyCharm run configuration that sets your PYTHONPATH before executing and include the other directory. That way you don't need to do a sys.path call in your code.
Closer to the "perfect" end, make your other directory into a Python package with a setup.py. Then, using the interpreter from your project, do "python path/to/other/dir/setup.py develop" to bring your separately-developed package into the consuming project.

Related

Importing modules from adjacent folders in Python

I wanted to ask if there is a way to import modules/functions from a folder in a project that is adjacent to another folder.
For example lets say I have two files:
project/src/training.py
project/lib/functions.py
Now both these folders have the __init__.py file in them. If I wanted to import functions.py into training.py, it doesn't seem to detect. I'm trying to use from lib.functions import * .I know this works from the upper level of the folder structure, where I can call both files from a script, but is there a way to do it files in above/sideways folders?

Fundamentally, the best way of doing this depends on exactly how the two modules are related. There's two possibilities:
The modules are part of one cohesive unit that is intended to be used as a single whole, rather than as separate pieces. (This is the most common situation, and it should be your default if you're not sure.)
The functions.py module is a dependency for training.py but is intended to be used separately.
Cohesive unit
If the modules are one cohesive unit, this is not the standard way of structuring a project in Python.
If you need multiple modules in the same project, the standard way of structuring the folders is to include all the modules in a single package, like so:
project/
trainingproject/
__init__.py
training.py
functions.py
other/
...
project/
...
folders/
...
The __init__.py file causes Python to recognize the trainproject/ directory as a single unit called a package. Using a package enables to use of relative imports:
training.py
from . import functions
# The rest of training.py code
Assuming your current directory is project, you can then invoke training.py as a module:
python -m trainingproject.training
Separate units
If your modules are actually separate packages, then the simplest idiomatic solutions during development is to modify the PYTHONPATH environment variable:
sh-derviative syntax:
# All the extra PYTHONPATH references on the right are to append if it already has values
# Using $PWD ensures that the path in the environment variable is absolute.
PYTHONPATH=$PYTHONPATH${PYTHONPATH:+:}$PWD/lib/
python ./src/training.py
PowerShell syntax:
$env:PYTHONPATH = $(if($env:PYTHONPATH) {$env:PYTHONPATH + ';'}) + (Resolve-Path ./lib)
python ./src/training.py
(This is possible in Command Prompt, too, but I'm omitting that since PowerShell is preferred.)
In your module, you would just do a normal import statement:
training.py
import functions
# Rest of training.py code
Doing this will work when you deploy your code to production as well if you copy all the files over and set up the correct paths, but you might want to consider putting functions.py in a wheel and then installing it with pip. That will eliminate the need to set up PYTHONPATH by installing functions.py to site-packages, which will make the import statement just work out of the box. That will also make it easier to distribute functions.py for use with other scripts independent of training.py. I'm not going to cover how to create a wheel here since that is beyond the scope of this question, but here's an introduction.

Yes, it’s as simple as writing the entire path from the working directory:
from project.src.training import *
Or
from project.lib.functions import *

I agree with what polymath stated above. If you were also wondering how to run these specific scripts or functions once they are imported, use: your_function_name(parameters), and to run a script that you have imported from the same directory, etc, use: exec(‘script_name.py). I would recommend making functions instead of using the exec command however, because it can be a bit hard to use correctly.

Creating Modules with PyCharm

I am working with PyCharm and am trying to create a module from code I've created so that I can import it into new files. In IntelliJ you can start the module creator but in PyCharm this option does not seem to exist.
Without a module when I type:
import my_code
I receive a warning saying "No module named my_code".
I've tried creating packages to replace the module but this does not work.
How do you repackage code in PyCharm so you can import it into a new file?
The project structure is quite simple. I have a number of files I've created as part of a tutorial. I want to make one of the files, "Importing_Files" a module so that I can import it into another file, i.e., "Import_Tester". I've added a picture below to show the tree.

Here's what I would suggest. It looks like you've already tried to set things up correctly, but you need to organize things in Pycharm a bit differently. I ran into a similar problem, which is why I think having an answer to this question is useful.
Your .idea directory is within the package, which makes things awkward. Try this:
Create a new Pycharm project based on the top level of the project.
Make src and test directories within that project, and set them as source root and test root, respectively.
Move the HelloWorld package into src (make sure it's still recognized as a package).
Create new files in src with main sections for any functions you need to run from the command line, add imports for your package, and move your main code into it.
For any main functions that define tests, do the same thing -- create files with main logic in the tests directory. Unit tests are a better way to do that, but this directory structure should work.
Remove the old project (delete the .idea directory in HelloWorld).
The final project layout should look something like this:
CompletePythonMasterClassUdemy
.idea
src
command_line_main.py
HelloWorld
__init__.py
...
test
test_account.py
This is a better way to organize things that should work both within and outside of Pycharm. Unlike the Java world, Python doesn't have as many common conventions for correctly setting up projects. There are very likely better ways to do things, but this works well for me. It should work well for people getting started with Python library development.

Python init.py proved to be irrelevant in version 3.4?

I am running python 3.4 on the main.py file in the same directory.
/root directory is not in python path. It is simply the current directory that the script is executing in. All pycache folders were deleted after each test
So why exactly is __init__.py important? I thought it was necessary as stated in this post:
What is __init__.py for?
If you remove the init.py file, Python will no longer look for submodules inside that directory, so attempts to import the module will fail.
Right now, it seems to me that __init__.py is nothing more than an optional constructor where we do housekeeping and other optional things like specifying the "all" variable, etc. But not a critical item to have.
Image showing the results of the test:
Can someone explain the discrepancy or what is the cause of this issue?

As confusing as it may be, although the basics will work without __init__.py files, you should probably still use them. Many external tools, as well as package-related functions in the standard library, will not work as expected without them. More words of wisdom here (as well as a misleading accepted answer): Is __init__.py not required for packages in Python 3.3+.

Found Answer
In essence, init.py is not needed, and its purpose is for legacy and optional housekeeping tasks that you may or may not want or need in Python versions 2.7 vs 3.0+. However, it is important to take into account that they have slightly different behavior during more complex parsing if you are building something more complex.
Please refer to the following links for additional reading material:
https://www.python.org/dev/peps/pep-0420/#namespace-packages-today
How do I create a namespace package in Python?
What's the difference between a Python module and a Python package?
https://softwareengineering.stackexchange.com/questions/276888/python-namespace-vs-module-with-underscores

Best practice for handling path/executables in project scripts in Python (e.g. something like Django's manage.py, or fabric)

I do a lot of work on different projects (I'm a scientist) in a fairly standardised directory structure. e.g.:
project
/analyses/
/lib
/doc
/results
/bin
I put all my various utility scripts in /bin/ because cleanliness is next to godliness. However, I have to hard code paths (e.g. ../../x/y/z) and then I have to run things within ./bin/ or they break.
I've used Django and that has /manage.py which runs various django-things and automatically handles the path. I've also used fabric to run various user defined functions.
Question: How do I do something similar? and what's the best way? I can easily write something in /manage.py to inject the root dir into sys.path etc, but then I'd like to be able to do "./manage.py foo" which would run /bin/foo.py. Or is it possible to get fabric to call executables from a certain directory?
Basically - I want something easy and low maintenance. I want to be able to drop an executable script/file/whatever into ./bin/ and not have to deal with path issues or import issues.
What is the best way to do this?

Keep Execution at TLD
In general, try to keep your runtime at top-level. This will straighten out your imports tremendously.
If you have to do a lot of import addressing with relative imports, there's probably a
better way.
Modifying The Path
Other poster's have mentioned the PYTHONPATH. That's a great way to do it permanently in your shell.
If you don't want to/aren't able to manipulate the PYTHONPATH project path directly you can use sys.path to get yourself out of relative import hell.
Using sys.path.append
sys.path is just a list internally. You can append to it to add stuff to into your path.
Say I'm in /bin and there's a library markdown in lib/. You can append a relative paths with sys.path to import what you want.
import sys
sys.path.append('../lib')
import markdown
print markdown.markdown("""
Hello world!
------------
""")
Word to the wise: Don't get too crazy with your sys.path additions. Keep your schema simple to avoid yourself a lot confusion.
Overly eager imports can sometimes lead to cases where a python module needs to import itself, at which point execution will halt!
Using Packages and __init__.py
Another great trick is creating python packages by adding __init__.py files. __init__.py gets loaded before any other modules in the directory, so it's a great way to add imports across the entire directory. This makes it an ideal spot to add sys.path hackery.
You don't even need to necessarily add anything to the file. It's sufficient to just do touch __init__.py at the console to make a directory a package.
See this SO post for a more concrete example.

In a shell script that you source (not run) in your current shell you set the following environment variables:
PATH=$PATH:$PROJECTDIR/bin
PYTHONPATH=$PROJECTDIR/lib
Then you put your Python modules and package tree in your projects ./lib directory. Python automatically adds the PYTHONPATH environment variable to sys.path.
Then you can run any top-level script from the shell without specifying the path, and any imports from your library modules are looked for in the lib directory.
I recommend very simple top-level scripts, such as:
#!/usr/bin/python
import sys
import mytool
mytool.main(sys.argv)
Then you never have to change that, you just edit the module code, and also benefit from the byte-code caching.

You can easily achieve your goals by creating a mini package that hosts each one of your projects. Use paste scripts to create a simple project skeleton. And to make it executable, just install it via setup.py develop. Now your bin scripts just need to import the entry point to this package and execute it.

What differences does `init` make to a directory?

In python, a directory containing one or more modules sometimes has __init__.py, so that the directory can be treated as a python package, is this correct? What differences the __init__ makes? (also another Q, is a python module just a python code-file with related and possibly independent (to other files) set of classes, functions and variables?)

Here's an explanation for why __init__.py is needed:
The __init__.py files are required to make Python treat the directories as containing packages; this is done to prevent directories with a common name, such as string, from unintentionally hiding valid modules that occur later on the module search path. In the simplest case, __init__.py can just be an empty file, but it can also execute initialization code for the package or set the __all__ variable, described later.
As I've just recommended to another poster, the tutorial on modules is pretty informative.

In addition, the contents of __init__.py becomes the contents of the package when treated as a module, i.e. the contents of somepackage/__init__.py will be found in dir(somepackage) when you import somepackage.
Modules themselves can be Python code, specially-crafted C code, or they could be an artificial construct injected by the executable that loads the Python VM.

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.