Python: Project Package / Module structure dependency Problem

I was hoping someone could help me figure out an odd "dependency" problem. I have a fairly large Python project, with a slimmed-down structure that looks like:
Sitka
│ DataTickers.py
│ example.csv
│ FinDates.py
│ SitkaMongo.py
│ tickers_csv.csv
│ __init__.py
│
├───Fin
│ │ main.py
│ │ md_provider_control.py
│ │ Tofino.py
│ │ __init__.py
│ │
│ │
│ ├───Instruments
│ │ │ market_standard_instruments.py
│ │ └ __init__.py
│ │
│ ├───Env
│ │ │ CurveClass.py
│ │
│ ├───Utils
│ │ charting.py
│ │ exchange_identifier_mapper.py
│ │ fin_mapper.py
│ │ md_provider_simulation.py
│ └ __init__.py
Tofino.py:
from .Env.CurveClass import CurveData as _CurveData
class Tofino():
    def __init__(self, mdp, VAL_ENV=None):
        mdp.tofino = self  # link Tofino
        # Public VE Reference
        self.val_env = VAL_ENV
        self.ir_config = VAL_ENV.market
market_standard_instruments.py:
# Standard Imports
import Sitka.FinDates as fdate
import datetime as dt
import re
from itertools import product
# bunch of functions after this.
CurveClass.py:
import pandas as pd
import datetime as dt
from dateutil.relativedelta import relativedelta
class CurveData():
    def __init__(self):
        self.do_stuff = self._stuff()
main.py:
from Sitka.FinDates import getMainDates
# Sitka- Custom Imports
from .md_provider_control import MD_ProviderV3
from .Tofino import Tofino
import Sitka.Fin.Instruments.market_standard_instruments as mkt_std
def main() -> Tofino:
    # < ---- do a bunch of stuff ---- >
    return Tofino(mdp=mdp, VAL_ENV=ve.GLOBAL_VALN_ENV)
And lastly, Sitka.Fin.__init__.py:
import logging
import traceback
# Run Valuation Environment Startup
from .main import main
# Global Variables:
from .Tofino import Tofino as _Tofino
tofino: _Tofino
tofino = None
try:
    tofino = main()  # I was trying some stuff out here, hence the weird traceback in try
except:
    print(traceback.format_exc())
My issue is that, after all that, when I run import Sitka.Fin as fin, this line in main.py
import Sitka.Fin.Instruments.market_standard_instruments as mkt_std
fires off the Sitka.Fin.__init__ process again before we even get to the try block (so __init__ basically runs twice).
Any help is appreciated!
P.S. Basically, I'm only including the subfolder __init__.py files because it's the only way I know to get IntelliSense/autocomplete in the IDE to work nicely... I would love to know how to make my code 'cleaner' in that sense.
Edit:
A simpler way to look at the problem. Let's say I open a new IPython console, and only do:
import Sitka.Fin.Instruments.market_standard_instruments as mkt_std
Simply doing this kicks off the entire Sitka.Fin.__init__ procedure [which I wouldn't have expected]

It seems you only want some of the code in main.py to run when the file itself is executed. Try using:
if __name__ == "__main__":  # All Sitka imports
    from Sitka.FinDates import getMainDates
    from .md_provider_control import MD_ProviderV3
    from .Tofino import Tofino
    import Sitka.Fin.Instruments.market_standard_instruments as mkt_std
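For background on the behaviour described in the question: importing a submodule always imports each of its parent packages first, so every parent __init__.py is executed the first time anything beneath it is imported. The sketch below shows this with a hypothetical throwaway package named pkg_demo (an assumption for illustration, not part of the Sitka project); the STARTED flag stands in for the startup work done in Sitka.Fin.__init__.

```python
import importlib
import pathlib
import sys
import tempfile

# Build a hypothetical package on disk: pkg_demo/sub/leaf.py
root = pathlib.Path(tempfile.mkdtemp())
leaf_dir = root / "pkg_demo" / "sub"
leaf_dir.mkdir(parents=True)
(root / "pkg_demo" / "__init__.py").write_text("STARTED = True  # stands in for main()\n")
(leaf_dir / "__init__.py").write_text("")
(leaf_dir / "leaf.py").write_text("VALUE = 42\n")

# Make the temporary directory importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()

# Import only the innermost module...
leaf = importlib.import_module("pkg_demo.sub.leaf")

# ...and both parent packages were imported (their __init__.py executed) as well.
names = ("pkg_demo", "pkg_demo.sub", "pkg_demo.sub.leaf")
loaded = [name for name in names if name in sys.modules]
print(loaded)
print(sys.modules["pkg_demo"].STARTED)
```

If the startup work should not happen on every import, one option is to move it out of __init__.py into a function that callers invoke explicitly.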

Related

Importing Python modules in large projects and "ModuleNotFoundError"

I have faced a rather famous issue while importing my python modules inside the project.
This code is written to replicate the existing situation:
multiply_func.py:
def multiplier(num_1, num_2):
    return num_1 * num_2
power_func.py:
from math_tools import multiplier
def pow(num_1, num_2):
    result = num_1
    for _ in range(num_2 - 1):
        result = multiplier(num_1, result)
    return result
The project structure:
project/
│ main.py
│
└─── tools/
│ __init__.py
│ power_func.py
│
└─── math_tools/
│ __init__.py
│ multiply_func.py
I've added these lines to __init__ files to make the importing easier:
__init__.py (math_tools):
from .multiply_func import multiplier
__init__.py (tools):
from .power_func import pow
from .math_tools.multiply_func import multiplier
Here is my main file.
main.py:
from tools import pow
print(pow(2, 3))
Whenever I run it, there is this error:
>>> ModuleNotFoundError: No module named 'math_tools'
I tried manipulating the sys.path, but I had no luck eliminating this puzzling issue. I'd appreciate your kind help. Thank you in advance!

The problem is in the "power_func.py" file.
You have to put a . before math_tools so that the import is resolved relative to the current package (tools) rather than as a top-level module.
Update "power_func.py" as below and it works.
from .math_tools import multiplier
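As a quick check, the layout from the question can be rebuilt in a temporary directory with the relative import in place. This is only a sketch: the temporary directory and the sys.path manipulation are stand-ins for running main.py from the project root.

```python
import importlib
import pathlib
import sys
import tempfile

# Recreate the layout from the question in a temporary directory.
root = pathlib.Path(tempfile.mkdtemp())
math_tools = root / "tools" / "math_tools"
math_tools.mkdir(parents=True)

(math_tools / "multiply_func.py").write_text(
    "def multiplier(num_1, num_2):\n"
    "    return num_1 * num_2\n"
)
(math_tools / "__init__.py").write_text("from .multiply_func import multiplier\n")

# power_func.py with the package-relative import.
(root / "tools" / "power_func.py").write_text(
    "from .math_tools import multiplier\n"
    "\n"
    "def pow(num_1, num_2):\n"
    "    result = num_1\n"
    "    for _ in range(num_2 - 1):\n"
    "        result = multiplier(num_1, result)\n"
    "    return result\n"
)
(root / "tools" / "__init__.py").write_text(
    "from .power_func import pow\n"
    "from .math_tools.multiply_func import multiplier\n"
)

# Stand-in for running main.py from the project root.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
tools = importlib.import_module("tools")
result = tools.pow(2, 3)
print(result)
```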

Python how to shorten import path

My Project looks like this:
├─outer_module
│ │ __init__.py
│ │
│ └─inner_module
│ a.py
│ b.py
├─test.py
__init__.py:
from outer_module.inner_module import a
from outer_module.inner_module import b
a.py:
instance_a = 1
b.py:
instance_b = 1
print("instance_b created!")
test.py:
from outer_module.inner_module import a
I want to shorten the import path in test.py, i.e. use from outer_module import a. That is not unusual when turning a project into a released module. But with this __init__.py, importing outer_module automatically executes b.py and prints instance_b created!. Separating a.py and b.py out of inner_module is not recommended because they are functionally similar. Other .py files may use b.py, so b must appear in __init__.py.
Could anyone give some advice?

In your __init__.py file, try importing a and b individually and adding both names to your __all__ variable.
from outer_module.inner_module import a
from outer_module.inner_module import b
__all__ = [
    'a',
    'b',
]
Now you can import a and b directly from outer_module.
from outer_module import a
from outer_module import b
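Listing b in __all__ does not stop b.py from being executed when outer_module is imported. If the aim is to expose b from outer_module without running it until it is first used, one possible alternative on Python 3.7 or later is a module-level __getattr__ (PEP 562). The sketch below is a different technique from the __all__ approach and rests on an assumption about what the asker wants; it rebuilds the layout in a temporary directory, and the _LAZY name and the empty inner_module/__init__.py are additions for illustration.

```python
import importlib
import pathlib
import sys
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
inner = root / "outer_module" / "inner_module"
inner.mkdir(parents=True)
(inner / "__init__.py").write_text("")  # added here; the original tree has none
(inner / "a.py").write_text("instance_a = 1\n")
(inner / "b.py").write_text("instance_b = 1\nprint('instance_b created!')\n")

# outer_module/__init__.py: import a submodule only on first attribute access.
(root / "outer_module" / "__init__.py").write_text(
    "import importlib\n"
    "\n"
    "_LAZY = ('a', 'b')\n"
    "\n"
    "def __getattr__(name):\n"
    "    if name in _LAZY:\n"
    "        module = importlib.import_module('.inner_module.' + name, __name__)\n"
    "        globals()[name] = module  # cache so __getattr__ is not hit again\n"
    "        return module\n"
    "    raise AttributeError(f'module {__name__!r} has no attribute {name!r}')\n"
)

sys.path.insert(0, str(root))
importlib.invalidate_caches()

from outer_module import a  # only a.py is executed here
b_loaded_early = "outer_module.inner_module.b" in sys.modules

from outer_module import b  # first access executes b.py
b_loaded_late = "outer_module.inner_module.b" in sys.modules
```

The trade-off is that IDE autocompletion may not see lazily resolved names unless they are also declared for the type checker.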

What package is this: from schemas.tokens import Token

In this tutorial one line of the code reads
from schemas.tokens import Token
Which package do I need to install? I cannot find it via Google.

Further down the tutorial we read:
We need a schema to verify that we are returning an access_token and token_type as defined in our response_model. Let's put this code in schemas > tokens.py
So it's a package created in the tutorial itself, i.e. a custom package, not from some library.

Yeah, that's the problem.
If you had read the entire tutorial, you would have seen this tree structure:
backend/
├─.env
├─apis/
│ └─general_pages/
│ └─route_homepage.py
├─core/
│ └─config.py
├─db/
│ ├─base.py
│ ├─base_class.py
│ ├─models/
│ │ ├─jobs.py
│ │ └─users.py
│ └─session.py
├─main.py
├─requirements.txt
├─schemas/ # <---------------- HERE
│ ├─jobs.py
│ └─users.py
├─static/
│ └─images/
│ └─logo.png
└─templates/
  ├─components/
  │ └─navbar.html
  ├─general_pages/
  │ └─homepage.html
  └─shared/
    └─base.html
where schemas is a package inside the project root.

How do I pass in a self argument to python cProfile

I am trying to use cProfile with Python.
My python project has the following directory structure:
my-project
├── src
│ ├── lib
│ └── app
│ └── data
│ └── car_sim.py
│
│
│
│
├── ptests
│ ├── src
│ └── lib
│ └── app
│ └── data
│ └── cprofile_test.py
I have a method inside car_sim.py, called "sim_text", that I want to profile with cProfile. The module looks like this:
#car_sim.py
import os
class RootSimulator:
    def sim_text(self, text):
        return text
I use the following code inside cprofile_test.py:
#cprofile_test.py
import cProfile
import pstats
import io
import src.lib.app.data.car_sim as car_sim_functions
pr = cProfile.Profile()
pr.enable()
text = 'my blabla sentence' #i can pass in this text below i guess...
#how do i pass to the below????!!
my_result = car_sim_functions.RootSimulator.sim_text()
pr.disable()
s = io.StringIO()
ps = pstats.Stats(pr, stream=s).sort_stats('tottime')
ps.print_stats()
with open('test.txt', 'w+') as f:
    f.write(s.getvalue())
Now... when I run it using the command
python -m cProfile ptests/src/lib/app/data/cprofile_test.py
I get the following error:
TypeError: sim_text() missing 2 required positional arguments: 'self' and 'text'
My question is... It expects 2 args, so how do I pass in the "self" arg. For the 2nd arg, "text" I can pass in a value no problem.

class RootSimulator:
    def sim_text(self, text):
        return text
defines an instance method on instances of RootSimulator. You are trying to call sim_text on the class itself. You need to create an instance and pass the text to it:
simulator = car_sim_functions.RootSimulator()
my_result = simulator.sim_text(text)
If sim_text() does not actually need to be attached to an instance of the simulator, perhaps you don't need a class at all (just make it a plain function), or you could make it a static method:
class RootSimulator:
    @staticmethod
    def sim_text(text):
        return text
Note that it doesn't need self anymore.
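Both options can be tried under the profiler in one self-contained script. This is a minimal sketch: the class is redefined inline rather than imported from car_sim.py, and sim_text_static is a hypothetical name used here only so that both variants can sit on the same class.

```python
import cProfile
import io
import pstats


class RootSimulator:
    def sim_text(self, text):
        # Instance method: Python supplies self when called on an instance.
        return text

    @staticmethod
    def sim_text_static(text):
        # Static method: no self, so it can be called on the class itself.
        return text


text = "my blabla sentence"

profiler = cProfile.Profile()
profiler.enable()
simulator = RootSimulator()            # create the instance first
my_result = simulator.sim_text(text)   # self is passed implicitly
static_result = RootSimulator.sim_text_static(text)
profiler.disable()

# Collect the report in memory, sorted by total time per function.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("tottime").print_stats()
report = stream.getvalue()
print(report)
```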

Multiprocessing error : can't import module

I'm getting an error message that I'm unable to resolve. I don't understand what the issue with the multiprocessing library is, or why it says that it is impossible to import the build_database module when at the same time it executes a function from that module perfectly.
Could somebody tell me if they see something? Thank you.
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
Traceback (most recent call last):
File "<string>", line 1, in <module>
prepare(preparation_data)
File "C:\Python27\lib\multiprocessing\forking.py", line 380, in main
File "C:\Python27\lib\multiprocessing\forking.py", line 495, in prepare
prepare(preparation_data)
'__parents_main__', file, path_name, etc
File "C:\Python27\lib\multiprocessing\forking.py", line 495, in prepare
File "C:\Users\Comp3\Desktop\User\Data\main.py", line 4, in <module>
'__parents_main__', file, path_name, etc
import database.build_database
File "C:\Users\Comp3\Desktop\User\Data\main.py", line 4, in <module>
ImportError : import database.build_database
NImportErroro module named build_database:
No module named build_database
This is what I have in my load_bigquery.py file:
# Send CSV to Cloud Storage
def load_send_csv(table):
    job = multiprocessing.current_process().name
    print '[' + table + '] : job starting (' + job + ')'
    bigquery.send_csv(table)
#timer.print_timing
def send_csv(tables):
    jobs = []
    build_csv(tables)
    for t in tables:
        if t not in csv_targets:
            continue
        print ">>>> Starting " + t
        # Load CSV in BigQuery, as parallel jobs
        j = multiprocessing.Process(target=load_send_csv, args=(t,))
        jobs.append(j)
        j.start()
    # Wait for jobs to complete
    for j in jobs:
        j.join()
And I call it like this from my main.py:
bigquery.load_bigquery.send_csv(tables)
My folder is like this:
src
| main.py
|
├───bigquery
│ │ bigquery.py
│ │ bigquery2.dat
│ │ client_secrets.json
│ │ herokudb.py
│ │ herokudb.pyc
│ │ distimo.py
│ │ flurry.py
│ │ load_bigquery.py
│ │ load_bigquery.pyc
│ │ timer.py
│ │ __init__.py
│ │ __init__.pyc
│ │
│ │
├───database
│ │ build_database.py
│ │ build_database.pyc
│ │ build_database2.py
│ │ postgresql.py
│ │ timer.py
│ │ __init__.py
│ │ __init__.pyc
That function works perfectly if I execute load_bigquery.py alone, but if I import it into main.py it fails with the errors given above.
UPDATE:
Here are my imports, in case they help:
main.py
import database.build_database
import bigquery.load_bigquery
import views.build_analytics
import argparse
import getopt
import sys
import os
load_bigquery.py
import sys
import os
import subprocess
import time
import timer
import distimo
import flurry
import herokudb
import bigquery
import multiprocessing
import httplib2
bigquery.py
import sys
import os
import subprocess
import json
import time
import timer
import httplib2
from pprint import pprint
from apiclient.discovery import build
from oauth2client.file import Storage
from oauth2client.client import AccessTokenRefreshError
from oauth2client.client import OAuth2WebServerFlow
from oauth2client.client import flow_from_clientsecrets
from oauth2client.tools import run
from apiclient.errors import HttpError
Maybe the issue is with the fact that load_bigquery.py imports multiprocessing and then main.py imports load_bigquery.py ?

You are probably missing the __init__.py inside src/bigquery/. So your source folders should be:
> src/main.py
> src/bigquery/__init__.py
> src/bigquery/load_bigquery.py
> src/bigquery/bigquery.py
The __init__.py just needs to be empty and is only there so that Python knows that bigquery is a Python package.
UPDATED: Apparently the __init__.py file is present. The actual error message points to a different problem: Python cannot import database.build_database.
My suggestion is to look into that. It is not mentioned as being in the src folder...
UPDATE 2: I think you have a clash with your imports. Python 2 has a slightly fuzzy relative import, which sometimes catches people out. You have both a package at the same level of main.py called database and one inside bigquery called database. I think somehow you are ending up with the one inside bigquery, which doesn't have build_database. Try renaming one of them.
