Python: basic project structure & import paths

For my Python 3 project, I have the following directory structure:
├── myapp
│   ├── compute
│   │   └── compute.py
│   └── mymain.py
├── setup.py
└── tests
    └── compute
        └── compute_test.py
My goal is to be able to run the code in three ways:
1. Unit tests. I've arbitrarily chosen pytest for these, but any framework should be fine;
2. python myapp/mymain.py <arguments> when I want to do a quick "manual test";
3. Something like pip install and/or a Docker image for a proper deployment.
Now, the first and third of these seem to be no problem, but I'm having trouble with the middle one.
Here are the contents of the files:
compute.py:
import math
class MyComputation:
    # performs an extremely difficult and relevant computation
    @staticmethod
    def compute(i: int) -> float:
        return math.sqrt(abs(i))
compute_test.py:
import pytest
from myapp.compute.compute import MyComputation
def test_computation_normal_case():
    ins = [-4, 9, -36, 121]
    outs = list(map(lambda i: MyComputation.compute(i), ins))
    expected = [2.0, 3.0, 6.0, 11.0]
    assert outs == expected
mymain.py:
import random
from myapp.compute.compute import MyComputation
class MyApp:
    @staticmethod
    def main():
        print("Loading data...")
        i = random.randint(1, 100000)
        print("Input: {}".format(i))
        print("Computing...")
        f = MyComputation.compute(i)
        print("Output: {}".format(f))
        print("Done!")
if __name__ == "__main__":
    MyApp.main()
When I run, say, pytest from the command line, it works fine: it finds the test, runs it, and the test passes.
However, when I try to run the main class:
$ python myapp/mymain.py
Traceback (most recent call last):
File "myapp/mymain.py", line 8, in <module>
from myapp.compute.compute import MyComputation
ImportError: No module named myapp.compute.compute
It makes no difference whether I add __init__.py files inside the directories or not.
But if I add the following to mymain.py, it can then be run from the command line as expected:
import os
import sys
root_path = os.path.abspath(os.path.join(os.path.dirname(os.path.abspath(__file__)), '../'))
sys.path.insert(0, root_path)
So, questions:
1) What is the correct, Pythonic, idiomatic way to do the main class? What I want is essentially "run this code here, in-place, as is". Where do I put my main class? Do I need to pip install my stuff locally first? Do I need to do the imports differently?
2) Surely the sys.path.insert() stuff cannot be the "official" way of accomplishing what I want to do here? There must be a less ridiculous way... right?
3) Why do the unit tests work just fine while the main class doesn't? Does the unit test framework do something similar to the sys.path.insert() stuff under the covers? Or is there a better way of handling the imports?
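Below is a minimal sketch of the difference behind questions 1 and 2. It builds a throwaway copy of the layout above, with `__init__.py` files and a simplified, hypothetical mymain.py that only prints one result. Running `python myapp/mymain.py` prepends the script's own directory (`myapp/`) to `sys.path`, so the top-level package `myapp` cannot be found. Running `python -m myapp.mymain` from the project root prepends the current directory instead, so the same absolute import succeeds without any `sys.path.insert()`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Simplified, hypothetical copies of the question's files.
FILES = {
    "myapp/__init__.py": "",
    "myapp/compute/__init__.py": "",
    "myapp/compute/compute.py": (
        "import math\n"
        "class MyComputation:\n"
        "    @staticmethod\n"
        "    def compute(i: int) -> float:\n"
        "        return math.sqrt(abs(i))\n"
    ),
    "myapp/mymain.py": (
        "from myapp.compute.compute import MyComputation\n"
        "if __name__ == '__main__':\n"
        "    print(MyComputation.compute(-16))\n"
    ),
}

# Drop variables that would change how the child's sys.path is built.
ENV = {k: v for k, v in os.environ.items() if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

def run_from(root: Path, *args: str) -> subprocess.CompletedProcess:
    """Start a fresh interpreter in `root` with the given command-line arguments."""
    return subprocess.run([sys.executable, *args], cwd=root, env=ENV,
                          capture_output=True, text=True)

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for rel, text in FILES.items():
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    as_script = run_from(root, "myapp/mymain.py")     # sys.path[0] is <root>/myapp
    as_module = run_from(root, "-m", "myapp.mymain")  # sys.path[0] is the cwd, <root>

print(as_script.returncode, "ModuleNotFoundError" in as_script.stderr)
print(as_module.returncode, as_module.stdout.strip())
```

Installing the project in editable mode (`pip install -e .`) is another commonly used way to make `myapp` importable from anywhere. With `-m`, though, no installation is needed for a quick in-place run.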

Related

issue with nested python module importing when executing imported function itself

I suspect this question has already been asked, but I cannot find an explanation for my problem; sorry if it is a duplicate.
Folder
├── Generator.py
└── modules
    ├── Function1.py
    └── Subfunction.py
Generator.py imports Function1, and Function1 imports Subfunction.
Function1 must be usable both as a standalone program and as a module imported by Generator.py.
That in itself is not a problem, as I am using if __name__ == "__main__": to detect how it was called.
But, depending on which file I execute, the program fails when importing Subfunction.
# Generator.py
import Function1
# Function1.py, option 1
import Subfunction
# Function1.py, option 2
import modules.Subfunction
The first one works if I execute Function1.py, but it fails if I run Generator.py
The second one works if I execute Generator.py, but it fails if I run Function1.py
I thought imports and relative paths were resolved relative to the module where the code lives, not from the perspective of the top-level caller. I tried import .modules.Function1 and import .Function1, but the issue remains.
Is there an elegant way to import Subfunction that works for both uses, or do I need to put the import under if __name__ == "__main__": or wrap it in try/except?
Edit: full code for @Bastien B
In this shape it works if I execute Function1.py itself.
If I execute Generator.py, I get the ModuleNotFoundError: No module named 'Function1'
# Generator.py
import Function1
print(Function1.Function1_return())
# Function1.py
def Function1_return():
    return Subfunction.Subfunction_return()
import Subfunction
if __name__ == '__main__':
    print(Function1_return())
# Subfunction.py
def Subfunction_return():
    return "this is subfunction"
From what I can see, your if __name__ ... check is not the problem.
If you use a local venv (so that Folder is importable as a package), this should work just fine:
Generator.py
from Folder.modules.Function1 import Function1_return
print(Function1_return())
Function1.py
from Folder.modules.Subfunction import Subfunction_return
def Function1_return():
    return Subfunction_return()
if __name__ == '__main__':
    print(Function1_return())
Subfunction.py
def Subfunction_return():
    return "this is subfunction"
Depending on your file structure, you may have to tweak this a bit.
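A related arrangement, sketched here as an assumption and not taken from the answer above, avoids installing anything. Every import is written as absolute from Folder (`from modules import Subfunction`). Generator.py is run normally, and Function1 is started as a module with `python -m modules.Function1` from inside Folder. The sketch recreates the files in a temporary directory and runs all three variants. The last one shows why starting `modules/Function1.py` directly breaks: `sys.path[0]` then becomes `Folder/modules`.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical versions of the question's files, all using imports absolute from Folder.
FILES = {
    "Generator.py": "from modules import Function1\nprint(Function1.Function1_return())\n",
    "modules/__init__.py": "",
    "modules/Function1.py": (
        "from modules import Subfunction\n"
        "def Function1_return():\n"
        "    return Subfunction.Subfunction_return()\n"
        "if __name__ == '__main__':\n"
        "    print(Function1_return())\n"
    ),
    "modules/Subfunction.py": "def Subfunction_return():\n    return 'this is subfunction'\n",
}

# Drop variables that would change how the child's sys.path is built.
ENV = {k: v for k, v in os.environ.items() if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

def run(folder: Path, *args: str) -> subprocess.CompletedProcess:
    """Run a fresh interpreter inside `folder`."""
    return subprocess.run([sys.executable, *args], cwd=folder, env=ENV,
                          capture_output=True, text=True)

with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp, "Folder")
    for rel, text in FILES.items():
        path = folder / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(text)
    via_generator = run(folder, "Generator.py")           # sys.path[0] is Folder
    via_module = run(folder, "-m", "modules.Function1")   # sys.path[0] is the cwd, Folder
    via_script = run(folder, "modules/Function1.py")      # sys.path[0] is Folder/modules

print(via_generator.stdout.strip())
print(via_module.stdout.strip())
print(via_script.returncode)
```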

Mock.patch in python unittest could work for two paths

I have a directory with the following structure:
├── aaa.py
├── src
│   └── subsrc
│       ├── else.py
│       └── util.py (contains a "foo" function)
└── tests
    ├── __init__.py
    └── unittests
        ├── __init__.py
        └── test_aaa.py
So aaa.py, the tests directory and the src directory are all in the project root. In test_aaa.py, I use mock to patch a function in util.py:
from src.subsrc.util import foo
import pytest
from unittest import mock
@mock.patch("src.subsrc.util.foo")
def test_foo(mock):
    mock.return_value = 111
Then I ran python3.7 -m pytest inside the unittests directory, and it worked. This makes sense to me, since pytest finds the first directory (walking upward) without an __init__.py and adds it to sys.path (in this case the project root), so it can find src.subsrc.util.foo.
But then I made a small change to test_aaa.py: in its mock.patch call, I added "aaa" at the beginning:
from src.subsrc.util import foo
import pytest
from unittest import mock
@mock.patch("aaa.src.subsrc.util.foo")
def test_foo(mock):
    mock.return_value = 111
It still worked. aaa.py is an executable script; here is aaa.py:
#!python3.7
from src.subsrc.else import other
if __name__ == "__main__":
    # ...
    pass
I am very confused about why @mock.patch("aaa.src.subsrc.util.foo") also worked. Is Python so smart that it can ignore 'aaa' and then follow "src.subsrc..." to find what it needs? Thanks!
update:
I suspected it was because the name aaa.py is special, so I renamed it, but it still worked. For example, after renaming it to bbb.py, "aaa.src..." no longer works in mock.patch, but "bbb.src..." does. So I am sure mock.patch finds this executable first.
update:
I guess it could be related to how "mock.patch()" works?
Your example seems to be a bit too stripped-down, but I'll try to expand it in order to explain. When reading about mocks in Python, you will often encounter the phrase "mock it where it's used", which isn't really helpful if you are new to the topic (but here's an excellent article on this concept).
In your test_aaa.py you will probably want to test some functionality of your aaa.py module, which may call some function from src/subsrc/util.py. After you import foo() into the aaa.py module, that is exactly where @mock.patch should point: @mock.patch("aaa.foo"). This way, your mock intercepts every call to foo() made by the functions you are about to test, namely aaa.do_something(). I've expanded your example as follows:
# aaa.py
from src.subsrc.util import foo
def do_something():
    return foo()
if __name__ == "__main__":
    value = do_something()
    print(f"value is {value}")
# src/subsrc/util.py
def foo():
    return 222
# tests/unittests/test_aaa.py
from unittest import mock
from aaa import do_something
@mock.patch("aaa.foo")
def test_foo(foo_mocked):
    foo_mocked.return_value = 111
    value = do_something()
    assert value == 111
When executing this like python aaa.py, I get the output as expected (value is 222) while the test passes with its assert value == 111.
In your example, @mock.patch("src.subsrc.util.foo") obviously worked, but it probably didn't do what you intended. From your example code, I cannot see why @mock.patch("aaa.src.subsrc.util.foo") didn't raise a ModuleNotFoundError.
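The sketch below illustrates both points with hypothetical in-memory modules; nothing is imported from disk. It makes one loud assumption: that the real aaa.py binds the name `src`, for example via `import src.subsrc.util`. The stripped-down aaa.py shown in the question only does `from src.subsrc.else import other`, which binds just `other`. `mock.patch` imports the first component of the target and then walks the rest largely with `getattr`. So if `aaa` has an attribute `src`, the path `aaa.src.subsrc.util` reaches the very same `util` module that `src.subsrc.util` does. The sketch also shows "mock it where it's used" in action.

```python
import sys
import types
from unittest import mock

# Hypothetical in-memory stand-ins for src/subsrc/util.py and aaa.py.
util = types.ModuleType("src.subsrc.util")
exec("def foo():\n    return 222\n", util.__dict__)
subsrc = types.ModuleType("src.subsrc")
subsrc.util = util
src = types.ModuleType("src")
src.subsrc = subsrc

aaa = types.ModuleType("aaa")
aaa.foo = util.foo  # what `from src.subsrc.util import foo` binds inside aaa
aaa.src = src       # ASSUMPTION: what `import src.subsrc.util` would bind inside aaa
exec("def do_something():\n    return foo()\n", aaa.__dict__)

sys.modules.update({"src": src, "src.subsrc": subsrc, "src.subsrc.util": util, "aaa": aaa})

# Patching the definition site leaves aaa's own reference to foo untouched.
with mock.patch("src.subsrc.util.foo", return_value=111):
    patched_at_definition = aaa.do_something()

# Patching where foo is used (the name inside aaa) is what do_something() sees.
with mock.patch("aaa.foo", return_value=111):
    patched_where_used = aaa.do_something()

# Because aaa has an attribute named src, this target resolves to the same util module.
with mock.patch("aaa.src.subsrc.util.foo", return_value=111):
    patched_through_aaa = util.foo()

print(patched_at_definition, patched_where_used, patched_through_aaa)
```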

Convert python script to directory with __main__.py

As much as I think I understand Python's import system, I still find myself lost...
I want to turn a file (my program's main entry point) into a directory, yet I can't get the imports to work.
I can't seem to understand how to get sys.path to match.
$ cat > prog.py << EOF
> import sys
> print(sys.path[0])
> EOF
$ python3 prog.py
/home/me/pyprogram
$ mkdir prog
$ mv prog.py prog/__main__.py
$ python3 prog
prog
$ mv prog/__main__.py prog/__init__.py
$ python3 prog/__init__.py
/home/me/pyprogram/prog
For a bit more context on what I am trying to achieve (I might be designing my program wrong, so feedback is gladly accepted):
$ tree --dirsfirst
.
├── prog
│ ├── data_process.py
│ └── __init__.py
├── destination.py
└── source.py
1 directory, 4 files
$ cat source.py
def get():
    return 'raw data'
$ cat destination.py
def put(data):
    print(f"{data} has been passed successfully")
$ cat prog/__init__.py
#!/usr/bin/env python
import os
class Task:
    def __init__(self, func, args=None, kwargs=None):
        self.func = func
        self.args = args if args else []
        self.kwargs = kwargs if kwargs else {}
    def run(self):
        self.func(*self.args, **self.kwargs)
tasks = []
def register_task(args=None, kwargs=None):
    def registerer(func):
        tasks.append(Task(func, args, kwargs))
        return func
    return registerer
for module in os.listdir(os.path.dirname(os.path.abspath(__file__))):
    if module.startswith('_') or module.startswith('.'):
        continue
    __import__(os.path.splitext(module)[0])
del module
for task in tasks:
    task.run()
$ cat prog/data_process.py
from source import get
from destination import put
from . import register_task
@register_task(kwargs={'replace_with': 'cleaned'})
def process(replace_with):
    raw = get()
    cleaned = raw.replace('raw', replace_with)
    put(cleaned)
$ python3 prog/__init__.py
Traceback (most recent call last):
File "prog/__init__.py", line 27, in <module>
__import__(os.path.splitext(module)[0])
File "/home/me/pyprogram/prog/data_process.py", line 1, in <module>
from source import get
ModuleNotFoundError: No module named 'source'
$ mv prog/__init__.py prog/__main__.py
$ python3 prog/
Traceback (most recent call last):
File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "prog/__main__.py", line 27, in <module>
__import__(os.path.splitext(module)[0])
File "prog/data_process.py", line 1, in <module>
from source import get
ModuleNotFoundError: No module named 'source'
Project structure update
I changed the structure:
1. All libraries are placed in utils.
2. All projects are placed in projects (using __init__.py to allow easy import of all the projects created in the folder).
3. The main program script, program.py, is in the top-level project directory.
Project structure:
$ tree
.
├── utils
│   ├── source.py
│   ├── remote_dest.py
│   ├── local_dest.py
│   └── __init__.py
├── projects
│   ├── process2.py
│   ├── process1.py
│   └── __init__.py
└── program.py
Contents of libraries defined in utils directory:
$ cat utils/source.py
"""
Emulates an expensive resource to get,
bringing the need to cache it for all client projects.
"""
import time
class _Cache:
    def __init__(self):
        self.data = None
_cache = _Cache()
def get():
    """
    Exposed source API for getting the data:
    gets it from the remote resource, or returns it from the cache if available.
    """
    if _cache.data is None:  # As well as cache expiration.
        _cache.data = list(_expensive_get())
    return _cache.data
def _expensive_get():
    """
    Emulate an expensive `get` request,
    prints to console if it was invoked.
    """
    print('Invoking expensive get')
    sample_data = [
        'some random raw data',
        'which is in some raw format',
        'it is so raw that it will need cleaning',
        'but now it is very raw'
    ]
    for row in sample_data:
        time.sleep(1)
        yield row
$ cat utils/remote_dest.py
"""
Emulate a limited remote resource.
Use a thread and a queue to send the data in the background.
"""
import time
import threading
import queue
_q = queue.Queue()
def put(data):
    """
    Exposed remote API `put` method
    """
    _q.put(data)
def _send(q):
    """
    Emulate remote resource,
    prints to console when data is processed.
    """
    while True:
        time.sleep(1)
        data = q.get()
        print(f"Sending {data}")
threading.Thread(target=_send, args=(_q,), daemon=True).start()
$ cat utils/local_dest.py
"""
Emulate a second data destination,
demonstrating the need for shared libraries.
"""
import datetime
import os
# Create `out` dir if it doesn't yet exist.
_out_dir = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'out')
if not os.path.exists(_out_dir):
    os.makedirs(_out_dir)
def save(data):
    """
    Exposed API to store data locally.
    """
    out_file = os.path.join(_out_dir, 'data.txt')
    with open(out_file, 'a') as f:
        f.write(f"[{datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')}] {data}\n")
Main program execution script contents:
$ cat program.py
#!/usr/bin/env python
import os
class Task:
    """
    Class storing `func` along with its `args` and `kwargs` to be run with.
    """
    def __init__(self, func, args=None, kwargs=None):
        self.func = func
        self.args = args if args else []
        self.kwargs = kwargs if kwargs else {}
    def run(self):
        """
        Executes stored `func` with its arguments.
        """
        self.func(*self.args, **self.kwargs)
    def __repr__(self):
        return f"<Task({self.func.__name__})>"
# List that will store the registered tasks to be executed by the main program.
tasks = []
def register_task(args=None, kwargs=None):
    """
    Registers decorated function along with the passed `args` and `kwargs` in the `tasks` list
    as a `Task` for maintained execution.
    """
    def registerer(func):
        print(f"Appending '{func.__name__}' in {__name__}")
        tasks.append(Task(func, args, kwargs))  # Saves the function as a task.
        print(f"> tasks in {__name__}: {tasks}")
        return func  # returns the function untouched.
    return registerer
print(f"Before importing projects as {__name__}. tasks: {tasks}")
import projects
print(f"After importing projects as {__name__}. tasks: {tasks}")
print(f"Iterating over tasks: {tasks} in {__name__}")
while True:
    for task in tasks:
        task.run()
    break  # Only run once in the simulation
Contents of the individual projects defined in the projects directory:
$ cat projects/process1.py
"""
Sample project that uses the shared remote resource to get data
and passes it on to another remote resource after processing.
"""
from utils.source import get
from utils.remote_dest import put
from program import register_task
@register_task(kwargs={'replace_with': 'cleaned'})
def process1(replace_with):
    raw = get()
    for record in raw:
        put(record.replace('raw', replace_with))
$ cat projects/process2.py
"""
Sample project that uses the shared remote resource to get data
and saves it locally after processing.
"""
from utils.source import get
from utils.local_dest import save
from program import register_task
@register_task()
def process2():
    raw = get()
    for record in raw:
        save(record.replace('raw', '----'))
Content of __init__.py file in the projects directory:
$ cat projects/__init__.py
"""
use __init__ file to import all projects
that might have been registered with `program.py` using `register_task`
"""
from . import process1, process2
# TODO: Dynamically import all projects (whether file or directory (as project)) that will be created in the `projects` directory automatically (ignoring any modules that start with an `_`)
# Something in the sense of:
# ```
# for module in os.listdir(os.path.dirname(os.path.abspath(__file__))):
#     if module.startswith('_') or module.startswith('.'):
#         continue
#     __import__(os.path.splitext(module)[0])
# ```
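The TODO above could be sketched with `pkgutil.iter_modules` and `importlib.import_module`, which together list and import every module or subpackage found directly inside a package's directory. This is a hedged sketch: the helper name `import_submodules` and the throwaway `demo_projects` package are hypothetical. In the real projects/__init__.py, a call such as `import_submodules(sys.modules[__name__])` would go at the very bottom of the file.

```python
import importlib
import pkgutil
import sys
import tempfile
from pathlib import Path
from types import ModuleType

def import_submodules(package: ModuleType) -> list:
    """Import each direct submodule or subpackage of `package` whose name does not
    start with an underscore, and return the imported names in sorted order."""
    imported = []
    for info in pkgutil.iter_modules(package.__path__):
        if info.name.startswith("_"):
            continue  # skip private helpers such as _helpers.py
        importlib.import_module(f"{package.__name__}.{info.name}")
        imported.append(info.name)
    return sorted(imported)

# Demo with a hypothetical `demo_projects` package: one file project,
# one directory project and one private module that must be skipped.
tmp = tempfile.mkdtemp()
pkg_dir = Path(tmp, "demo_projects")
(pkg_dir / "process2").mkdir(parents=True)
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "process1.py").write_text("NAME = 'process1'\n")
(pkg_dir / "process2" / "__init__.py").write_text("NAME = 'process2'\n")
(pkg_dir / "_helpers.py").write_text("")

sys.path.insert(0, tmp)
importlib.invalidate_caches()  # make sure the import system sees the new files
demo_projects = importlib.import_module("demo_projects")
loaded = import_submodules(demo_projects)
print(loaded)
```

Unlike `os.listdir` plus `__import__`, `iter_modules` reports both single-file modules and subpackages (directories with an `__init__.py`) and never yields `__init__` itself. Importing through the full dotted name also keeps the projects registered under the package instead of as top-level modules.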
Yet when I run the program, I see that:
1. program.py gets executed twice (once as __main__ and once as program).
2. The tasks are appended (during the second execution).
Yet when iterating over the tasks, none are found.
$ python3 program.py
Before importing projects as __main__. tasks: []
Before importing projects as program. tasks: []
After importing projects as program. tasks: []
Iterating over tasks: [] in program
Appending 'process1' in program
> tasks in program: [<Task(process1)>]
Appending 'process2' in program
> tasks in program: [<Task(process1)>, <Task(process2)>]
After importing projects as __main__. tasks: []
Iterating over tasks: [] in __main__
I don't understand:
Why is the main file (program.py) being executed twice? I thought circular imports couldn't cause that, since Python caches imported modules.
(I took the idea of circular imports from Flask applications: app.py imports routes, models, etc., which all import app and use it to define their functionality, and app.py imports them back so that the functionality gets added, since Flask only runs app.py.)
Why is the tasks list empty after the processes are appended to it?
I compared my circular import to a Flask-based app that uses circular imports as follows:
Sample flask program that uses circular imports
Flask app structure
(venv) $ echo $FLASK_APP
mgflask.py
(venv) $ tree
.
├── app
│   ├── models
│   │   ├── __init__.py
│   │   ├── post.py
│   │   └── user.py
│   ├── templates/
│   ├── forms.py
│   ├── __init__.py
│   └── routes.py
├── config.py
└── mgflask.py
(venv) $ cat mgflask.py
#!/usr/bin/env python
from app import app
# ...
(venv) $ cat app/__init__.py
from flask import Flask
from config import Config
# ... # config imports
app = Flask(__name__) # <---
# ... # config setup
from . import routes, models, errors # <---
(venv) $ cat app/routes.py
from flask import render_template, flash, redirect, url_for, request
# ... # import extensions
from . import app, db # <---
from .forms import ...
from .models import ...
@app.route('/')
def index():
    return render_template('index.html', title='Home')
(venv) $ flask run
* Serving Flask app "mgflask.py" (lazy loading)
* Environment: production
WARNING: Do not use the development server in a production environment.
Use a production WSGI server instead.
* Debug mode: on
* Running on http://127.0.0.1:5000/ (Press CTRL+C to quit)
* Restarting with stat
* Debugger is active!
* Debugger PIN: ???-???-???
I restructured my app as follows:
I moved the Task class, the tasks list and the register_task decorator function into projects/__init__.py, and at the bottom of that __init__.py file I import the projects defined in the directory.
In program.py I just do from projects import tasks, and everything works as desired.
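The double execution can be reproduced with a stripped-down, hypothetical program.py and plugin.py. A file started as a script is registered in `sys.modules` as `__main__`, not as `program`. So the first `from program import register_task` finds no cached `program` module and executes program.py a second time under that name, with its own fresh `tasks` list. The decorator then fills the list of that second module object, while the `__main__` copy stays empty.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical, stripped-down versions of program.py and one project module.
PROGRAM = """\
import sys
tasks = []
def register_task(func):
    tasks.append(func.__name__)
    return func
import plugin  # plugin.py does `from program import register_task`
print(f"{__name__}: {tasks}")
if __name__ == "__main__":
    print(f"program module: {sys.modules['program'].tasks}")
"""

PLUGIN = """\
from program import register_task
@register_task
def process1():
    pass
"""

# Drop variables that would change how the child's sys.path is built.
ENV = {k: v for k, v in os.environ.items() if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "program.py").write_text(PROGRAM)
    Path(tmp, "plugin.py").write_text(PLUGIN)
    result = subprocess.run([sys.executable, "program.py"], cwd=tmp, env=ENV,
                            capture_output=True, text=True, check=True)

lines = result.stdout.splitlines()
print(lines)
```

Moving `tasks` and `register_task` out of the script, as in the restructuring above, avoids this. Only one module object (`projects`) ever holds the list, and importing it from the script no longer re-executes the entry point.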
The only remaining question is: what is the difference between running prog.py and prog/ (which contains __main__.py)? (This was the first iteration of my question.)
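A small sketch of that difference, using throwaway files that only print `sys.path[0]`. Running `python prog.py` prepends the directory that contains prog.py. Running `python prog` (a directory with `__main__.py`) prepends the `prog` directory itself, which matches the `prog` output in the first session above. As a result, sibling modules such as source.py and destination.py, which live next to prog/ rather than inside it, are no longer importable.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

SHOW_PATH0 = "import sys\nprint(sys.path[0])\n"
# Drop variables that would change how the child's sys.path is built.
ENV = {k: v for k, v in os.environ.items() if k not in ("PYTHONPATH", "PYTHONSAFEPATH")}

def first_path_entry(cwd: Path, target: str) -> Path:
    """Run `python <target>` in `cwd` and return its sys.path[0] as an absolute path."""
    out = subprocess.run([sys.executable, target], cwd=cwd, env=ENV,
                         capture_output=True, text=True, check=True).stdout.strip()
    return (cwd / out).resolve()  # a relative entry such as "prog" is relative to cwd

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp).resolve()
    (root / "prog.py").write_text(SHOW_PATH0)
    (root / "prog").mkdir()
    (root / "prog" / "__main__.py").write_text(SHOW_PATH0)
    script_entry = first_path_entry(root, "prog.py")  # directory containing prog.py
    package_entry = first_path_entry(root, "prog")    # the prog/ directory itself

print(script_entry == root, package_entry == root / "prog")
```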

How to execute python file from other directory?

I've got this structure:
├── main.py
└── dir
    ├── data.txt
    └── other.py
Contents of other.py:
print(open('data.txt', encoding='utf-8').read())
I run main.py, and it must start dir/other.py.
But other.py needs data.txt to work. Is there a way to start other.py from main.py without editing other.py?
Note:
The user must be able to start other.py manually without any errors.
For this purpose you can use the import keyword. All you have to do is create an __init__.py file in the dir directory, which turns the directory into a package. Then you can just use from dir import other in the main script.
It is recommended to wrap the code in other.py in the snippet below:
if __name__ == '__main__':
    ...  # do stuff
otherwise its top-level code will run every time you import it.
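update
It is far simpler: just change the working directory with os.chdir("./dir") before importing, so that other.py's relative open() call finds its data file. Since sys.path[0] is main.py's own folder rather than the current working directory, dir also has to be added to sys.path for a plain import other to succeed; the import then executes the script.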
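./dir/other.py:
print("Module starts")
print(open('data.txt', 'r').read())
print("Module ends")
./main.py
print("Main start")
import os
import sys
os.chdir("./dir")
sys.path.insert(0, os.getcwd())  # sys.path[0] is main.py's folder, not the new cwd
import other  # executes other.py's top-level code
print("Main end")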
You can also import other in the main file with from dir.other import * (though its relative open() call still depends on the current working directory).
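If other.py must stay untouched and still be runnable on its own, `runpy.run_path` offers one alternative, swapped in here rather than taken from the answers above. It executes a file as if it had been started directly (`__name__ == "__main__"`). Combined with a temporary change of working directory, the script's relative `open('data.txt', ...)` call resolves. This is a minimal sketch: the helper names `working_directory` and `run_other` are hypothetical, and on Python 3.11+ `contextlib.chdir` provides the same kind of context manager.

```python
import os
import runpy
import tempfile
from contextlib import contextmanager
from pathlib import Path

@contextmanager
def working_directory(path):
    """Temporarily switch the process's working directory, restoring it afterwards."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        os.chdir(previous)

def run_other(script: Path) -> dict:
    """Run `script` with __name__ == "__main__" from inside its own folder and
    return the module globals it left behind."""
    with working_directory(script.parent):
        return runpy.run_path(str(script), run_name="__main__")

# Demo: recreate dir/other.py and dir/data.txt in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    data_dir = Path(tmp, "dir")
    data_dir.mkdir()
    (data_dir / "data.txt").write_text("hello from data.txt", encoding="utf-8")
    (data_dir / "other.py").write_text(
        "content = open('data.txt', encoding='utf-8').read()\n"
        "print(content)\n"
    )
    namespace = run_other(data_dir / "other.py")

print(namespace["content"])
```

Because the working directory is restored in a `finally` block, main.py's own relative paths keep working after other.py has run, even if other.py raises.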

Using a fake mongoDB for pytest testing

I have code that connects to a MongoDB client and I'm trying to test it. For testing, I don't want to connect to the actual client, so I'm trying to figure out how to make a fake one. The basic flow of the code is: a function somewhere creates a pymongo client, then queries it and builds a dict that is used elsewhere.
I want to write some tests using pytest that will test different functions and classes that will call get_stuff. My problem is that get_stuff calls mongo() which is what actually makes the connection to the database. I was trying to just use pytest.fixture(autouse=True) and mongomock.MongoClient() to replace mongo().
But this isn't replacing the mongo_stuff.mongo(). Is there some way I can tell pytest to replace a function so my fixture is called instead of the actual function? I thought making the fixture would put my testing mongo() higher priority in the namespace than the function in the actual module.
Here is an example file structure with my example:
.
├── project
│   ├── __init__.py
│   ├── mongo_stuff
│   │   ├── __init__.py
│   │   └── mongo_stuff.py
│   └── working_class
│       ├── __init__.py
│       └── somewhere_else.py
└── testing
    ├── __init__.py
    └── test_stuff.py
mongo_stuff.py
import pymongo
def mongo():
    return pymongo.MongoClient(connection_params)  # connection_params is defined elsewhere (not shown)
def get_stuff():
    db = mongo()  # Makes the connection using another function
    stuff = query_function(db)  # Does the query and makes a dict
    return stuff
somewhere_else.py
from project.mongo_stuff import mongo_stuff
mongo_dict = mongo_stuff.get_stuff()
test_stuff.py
import pytest
import mongomock
@pytest.fixture(autouse=True)
def patch_mongo(monkeypatch):
    db = mongomock.MongoClient()
    def fake_mongo():
        return db
    monkeypatch.setattr('project.mongo_stuff.mongo', fake_mongo)
from project.working_class import somewhere_else  # This starts by calling project.mongo_stuff.mongo_stuff.get_stuff()
This currently gives me a connection error, since the connection params in mongo_stuff.py only work in the production environment. If I put the import statement from test_stuff.py into a test function, it works fine and the mongomock db is used in the testing environment. I also tried changing the setattr to monkeypatch.setattr('project.working_class.mongo_stuff.mongo', fake_mongo), which does not work either.
You're halfway there: you have created a mock for the db client, now you have to patch the mongo_stuff.mongo function to return the mock instead of a real connection:
@pytest.fixture(autouse=True)
def patch_mongo(monkeypatch):
    db = mongomock.MongoClient()
    def fake_mongo():
        return db
    monkeypatch.setattr('project.mongo_stuff.mongo_stuff.mongo', fake_mongo)
Edit:
The reason you get the connection error is that you are importing somewhere_else at module level in test_stuff, and somewhere_else also runs connection code at module level. So patching with fixtures comes too late and has no effect. If you want to import at module level, you have to patch the mongo client before importing somewhere_else. This avoids the error, but it is extremely ugly:
from project.mongo_stuff import mongo_stuff
import mongomock
import pytest
from unittest.mock import patch
with patch.object(mongo_stuff, 'mongo', return_value=mongomock.MongoClient()):
    from project.working_class import somewhere_else
@patch.object(mongo_stuff, 'mongo', return_value=mongomock.MongoClient())
def test_db1(mocked_mongo):
    mongo_stuff.mongo()
    assert True
@patch.object(mongo_stuff, 'mongo', return_value=mongomock.MongoClient())
def test_db2(mocked_mongo):
    somewhere_else.foo()
    assert True
You should rather avoid running code on module level when possible, or run the imports that execute code on module level inside the tests (as you already found out in the comments).
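To see why the fixture comes too late, here is a stdlib-only sketch. mongomock is swapped for a plain dict, and both modules are built in memory with hypothetical contents, so nothing here is the real pymongo or mongomock API. Module-level code runs at import time, and `get_stuff()` looks up `mongo` in its module's globals on every call. Replacing that module attribute therefore works as long as it happens before the import that triggers the call.

```python
import types
from unittest import mock

# Hypothetical stand-in for project/mongo_stuff/mongo_stuff.py whose real client is unreachable.
mongo_stuff = types.ModuleType("mongo_stuff")
exec(
    "def mongo():\n"
    "    raise ConnectionError('connection params only work in production')\n"
    "def get_stuff():\n"
    "    db = mongo()  # resolved through this module's globals at call time\n"
    "    return {'items': db['items']}\n",
    mongo_stuff.__dict__,
)

def import_somewhere_else() -> types.ModuleType:
    """Mimic importing somewhere_else.py, whose module-level code calls get_stuff()."""
    module = types.ModuleType("somewhere_else")
    module.mongo_stuff = mongo_stuff
    exec("mongo_dict = mongo_stuff.get_stuff()\n", module.__dict__)
    return module

# 1) Without a patch, the "import" itself fails, before any fixture could run.
try:
    import_somewhere_else()
    unpatched_error = None
except ConnectionError as exc:
    unpatched_error = exc

# 2) Replacing the module attribute *before* the import makes get_stuff() use the fake.
fake_client = {"items": [1, 2, 3]}  # stands in for mongomock.MongoClient()
with mock.patch.object(mongo_stuff, "mongo", return_value=fake_client):
    somewhere_else = import_somewhere_else()

print(type(unpatched_error).__name__, somewhere_else.mongo_dict)
```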
