I have a Python module I'm developing that includes a big JSON file as part of the data it relies on. I want Python users to be able to import the JSON file as a Python variable, and users of other programming languages to be able to use the JSON file directly.
So, what I'm trying to figure out is what's the best way to make the JSON object "importable". Right now, my solution is in __init__.py:
import json
import os
with open(os.path.join(os.path.dirname(os.path.realpath(__file__)), 'seals.json')) as f:
    seals_data = json.load(f)
Then Python devs can call:
from my_module import seals_data
And it more or less works, but this feels weird to me, and I want to make sure there isn't a cleaner way to make the JSON importable.
Your option is probably the best way to store the data as a Python structure in a module for someone to use. The only downside I see with it is that anything importing that module has to wait for the IO, which is not ideal, especially if the module is ever used for things that do not involve that data.
I would use a function to lazily load that data, deferring the IO hit until the data is first requested.
Here is an example:
import json
import os

SEALS_DATA = None

def get_seals_data():
    global SEALS_DATA
    if SEALS_DATA is None:
        path = os.path.join(os.path.dirname(os.path.realpath(__file__)), 'seals.json')
        with open(path) as f:
            SEALS_DATA = json.load(f)
    return SEALS_DATA
After they import the module they will incur no delay until the first time they call get_seals_data(). Any call after that will already have the data loaded.
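For callers, usage stays close to a plain import, just with a function call (a minimal sketch, assuming the package is named my_module as in the question):

from my_module import get_seals_data

seals = get_seals_data()  # the file is read here, on the first call
seals = get_seals_data()  # later calls return the already-loaded data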
I have kind of a tricky question, one that is difficult even to describe.
Suppose I have this script, which we will call master:
#in master.py
import slave as slv

def import_func():
    import time

slv.method(import_func)
I want to make sure method in slave.py, which looks like this:
#in slave.py
def method(import_func):
    import_func()
    time.sleep(10)
actually runs as if I had imported the time package. Currently it does not work, I believe because the import exists only in the scope of import_func().
Keep in mind that the rules of the game are:
- I cannot import anything in slave.py outside method
- I need to pass the imports which method needs through import_func() in master.py
- the procedure must work for a variable number of imports inside method. In other words, method cannot know how many imports it will receive but needs to work nonetheless.
- the procedure needs to work for any import possible. So options like pyforest are not suitable.
I know it can theoretically be done through importlib, but I would prefer a more straightforward idea, because if we have a lot of imports with different 'as' labels it would become extremely tedious and convoluted with importlib.
I know it is kind of a quirky question but I'd really like to know if it is possible. Thanks
What you can do is this in the master file:
#in master.py
import slave as slv

def import_func():
    import time
    return time

slv.method(import_func)
Now use the returned time value in the slave file:
#in slave.py
def method(import_func):
    time = import_func()
    time.sleep(10)
Why do you have to do this? It's because of scoping. When import_func() is called inside slave.py, the import statement binds the name time as a local variable inside import_func. When the function returns, that local binding is released, so nothing in method can see the name (the module object itself stays cached in sys.modules, but it is no longer reachable by name).
By returning time from import_func(), you guarantee the name remains usable after the function finishes executing.
Now, to import more modules? Simple: return a list with multiple modules inside, or maybe a dictionary for simple access. That's one way of doing it.
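For example, a minimal sketch of the dictionary variant (the module choices are just for illustration):

#in master.py
def import_func():
    import time
    import os
    return {'time': time, 'os': os}

#in slave.py
def method(import_func):
    mods = import_func()
    mods['time'].sleep(10)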
[Edit] Using a dictionary and importlib to pass multiple imports to slave.py:
master.py:
import slave as slv
import importlib

def master_import(packname, imports):
    imports[packname] = importlib.import_module(packname)

def import_func():
    imports = {}
    master_import('time', imports)
    return imports

slv.method(import_func)
slave.py:
#in slave.py
def method(import_func):
    imports = import_func()
    imports['time'].sleep(10)
This way, you can import literally any modules you want on the master.py side, using the master_import() function, and pass them to the slave script.
Check this answer on how to use importlib.
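As for the 'as' aliases the question worries about: with the dictionary approach an alias is simply the key you store the module under. A minimal sketch (numpy is a hypothetical example):

def import_func():
    imports = {}
    imports['np'] = importlib.import_module('numpy')  # plays the role of "import numpy as np"
    return imports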
Is it possible to import a Python module from over the internet using the http(s), ftp, smb or any other protocol? If so, how? If not, why?
I guess it's about making Python use more than the one protocol it already has (reading the filesystem) and enabling it to use others as well. Yes, I agree it would be many times slower, but some optimization and larger future bandwidths would certainly balance that out.
E.g.:
import site
site.addsitedir("https://bitbucket.org/zzzeek/sqlalchemy/src/e8167548429b9d4937caaa09740ffe9bdab1ef61/lib")
import sqlalchemy
import sqlalchemy.engine
Another version:
I like this answer. When I applied it, I simplified it a bit, similar to the look and feel of JavaScript includes over HTTP.
This is the result:
import os
import imp
import requests

def import_cdn(uri, name=None):
    if not name:
        # note: rstrip('.py') would strip characters, not the suffix
        # (e.g. 'numpy.py' -> 'num'), so trim the extension explicitly
        name = os.path.basename(uri).lower()
        if name.endswith('.py'):
            name = name[:-3]
    r = requests.get(uri)
    r.raise_for_status()
    codeobj = compile(r.content, uri, 'exec')
    module = imp.new_module(name)
    exec(codeobj, module.__dict__)
    return module
Usage:
redisdl = import_cdn("https://raw.githubusercontent.com/p/redis-dump-load/master/redisdl.py")
# Regular usage of the dynamic included library
json_text = redisdl.dumps(host='127.0.0.1')
Tip: place the import_cdn function in a common library; that way you can re-use this small function.
Bear in mind that it will fail when there is no connectivity to that file over HTTP.
In principle, yes, but all of the built-in tools which sort of support this go through the filesystem.
To do this, you're going to have to load the source from wherever, compile it with compile, and exec it with the __dict__ of a new module. See below.
I have left actually grabbing the text from the internet, parsing URIs, etc. as an exercise for the reader (for beginners: I suggest using requests).
In PEP 302 terms, this would be the implementation behind a loader.load_module function (the parameters are different). See that document for details on how to integrate this with the import statement.
import imp

modulesource = 'a=1;b=2'  # load from the internet or wherever

def makemodule(modulesource, sourcestr='http://some/url/or/whatever', modname=None):
    # If loading from the internet, you'd probably want to parse the URI
    # and use the last part as the module name. It'll come up in tracebacks
    # and the like.
    if not modname:
        modname = 'newmodulename'
    # Must be 'exec' mode. Every module needs a source by which it is
    # identified; it can be any value, but if loading from the internet
    # you'd use the URI.
    codeobj = compile(modulesource, sourcestr, 'exec')
    newmodule = imp.new_module(modname)
    exec(codeobj, newmodule.__dict__)
    return newmodule

newmodule = makemodule(modulesource)
print(newmodule.a)
At this point newmodule is already a module object in scope, so you don't need to import it or anything.
modulesource = '''
a = 'foo'
def myfun(astr):
    return a + astr
'''

newmod = makemodule(modulesource)
print(newmod.myfun('bat'))
Ideone here: http://ideone.com/dXGziO
Tested with Python 2; it should work with Python 3 (a textually compatible print is used, as is the function-like exec syntax).
This seems to be a use case for a self-written import hook. Look up how exactly they work in PEP 302.
Essentially, you'll have to provide a finder object which, in turn, provides a loader object. I don't fully understand the process at first glance (otherwise I'd be more explicit), but the PEP contains all the details needed to implement it.
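A minimal sketch of such a hook for Python 3, using the importlib machinery that grew out of PEP 302 (the base URL and the imported module name below are hypothetical):

import sys
import urllib.request
import importlib.abc
import importlib.util

class HttpFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    def __init__(self, base_url):
        self.base_url = base_url

    def find_spec(self, fullname, path=None, target=None):
        url = '%s/%s.py' % (self.base_url, fullname)
        try:
            self._source = urllib.request.urlopen(url).read()
        except OSError:
            return None  # not ours; let the other finders try
        return importlib.util.spec_from_loader(fullname, self, origin=url)

    def exec_module(self, module):
        # compile against the URL so it shows up in tracebacks
        exec(compile(self._source, module.__spec__.origin, 'exec'), module.__dict__)

sys.meta_path.append(HttpFinder('http://example.com/pymodules'))
import somemodule  # fetched from http://example.com/pymodules/somemodule.py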
As glglgl's answer has it, this import hook has been implemented for Python 2 and Python 3 in a module called httpimport.
It uses a custom finder/loader object to locate resources over HTTP/S.
Additionally, the import_cdn function in Jossef Harush's answer is almost identically implemented in httpimport's github_repo and bitbucket_repo functions.
Marcin's answer contains a good portion of the code of httpimport's loader class.
Is there a downside to importing the same built-in module in both my script and custom module?
I have a script that imports my custom module and also imports the built-in csv module to open a CSV file and append any necessary content to a list.
I then have a method in my custom module to which I pass a path, a filename, and a list, and which writes a CSV, but I have to import the csv module again (in my module).
I do not understand what happens when I import the csv module twice, so I wanted to know whether there is a more uniform way of doing what I'm doing, or whether this is OK.
No, there is no downside. Importing a module does two things:
1. If not yet in memory, load the module, storing the resulting object in sys.modules.
2. Bind names to either the module object (import modulename) or to attributes of the module object (from modulename import objectname).
Additional imports only execute step 2, as the module is already loaded.
See The import system in the Python reference documentation for the nitty gritty details:
The import statement combines two operations; it searches for the named module, then it binds the results of that search to a name in the local scope.
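A quick way to see the caching in action (a minimal sketch; nothing about it is specific to csv):

import sys
import csv

csv_again = __import__('csv')  # roughly what a second "import csv" does
assert csv is csv_again is sys.modules['csv']  # one shared module object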
The short answer is no, there is no downside.
That being said, it may be helpful to understand what imports mean, particularly for anyone new to programming or coming from a different language background.
I imagine your code looks something like this:
# my_module.py
import os
import csv

def bar(path, filename, rows):
    full_path = os.path.join(path, filename)
    with open(full_path, 'w') as f:
        csv_writer = csv.writer(f)  # csv.writer must be called with the file object
        csv_writer.writerows(rows)
and
# my_script.py
import csv
import my_module

def foo(path):
    contents = []
    with open(path, 'r') as f:
        csv_reader = csv.reader(f)
        for row in csv_reader:
            contents.append(row)
    return contents
As a high-level overview, when you do an import in this manner, Python determines whether the module has already been imported. If not, then it searches the Python path to determine where the imported module lives on the file system, then it loads the imported module's code into memory and executes it. The interpreter takes all objects that are created during the execution of the imported module and makes them attributes on a new module object that the interpreter creates. Then the interpreter stores this module object into a dictionary-like structure that maps the module name to the module object. Finally, the interpreter brings the imported module's name into the importing module's scope.
This has some interesting consequences. For example, it means that you could simply use my_module.csv to access the csv module within my_script.py. It also means that importing csv in both is trivial and is probably the clearest thing you can do.
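To make that concrete, a minimal sketch (the assert is purely illustrative):

# in my_script.py
import csv
import my_module

assert my_module.csv is csv  # both names refer to the same cached module object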
One very interesting consequence is that if any statements that get executed during import have any side effects, those side effects will only happen when the module is first loaded by the interpreter. For example, suppose you had two modules a.py and b.py with the following code:
# a.py
print('hello world')
# b.py
print('goodbye world')
import a
If you run import a followed by import b then you will see
>>> import a
hello world
>>> import b
goodbye world
>>>
However, if you import in the opposite order, you get this:
>>> import b
goodbye world
hello world
>>> import a
>>>
Anyway, I think I've rambled enough and I hope I've adequately answered the question while giving some background. If this is at all interesting, I'd recommend Allison Kaptur's PyCon 2014 talk about import.
You can import the same module in separate files (custom modules) as far as I know. Python keeps track of already imported modules and knows how to resolve a second import.
Summary: when a certain Python module is imported, I want to be able to intercept this action, and instead of loading the required class, I want to load another class of my choice.
Reason: I am working on some legacy code. I need to write some unit test code before I start some enhancement/refactoring. The code imports a certain module which will fail in a unit test setting, however, because of a database server dependency.
Pseudocode:
from LegacyDataLoader import load_me_data
...
def do_something():
    data = load_me_data()
So, ideally, when Python executes the import line above in a unit test, an alternative class, say MockDataLoader, is loaded instead.
I am still using 2.4.3. I suppose there is an import hook I can manipulate.
Edit
Thanks a lot for the answers so far. They are all very helpful.
One particular type of suggestion is about manipulation of PYTHONPATH. It does not work in my case. So I will elaborate my particular situation here.
The original codebase is organised in this way
./dir1/myapp/database/LegacyDataLoader.py
./dir1/myapp/database/Other.py
./dir1/myapp/database/__init__.py
./dir1/myapp/__init__.py
My goal is to enhance the Other class in the Other module. But since it is legacy code, I do not feel comfortable working on it without strapping a test suite around it first.
Now I introduce this unit test code
./unit_test/test.py
The content is simply:
from myapp.database.Other import Other

def test1():
    o = Other()
    o.do_something()

if __name__ == "__main__":
    test1()
When the CI server runs the above test, the test fails. It is because class Other uses LegacyDataLoader, and LegacyDataLoader cannot establish a database connection to the db server from the CI box.
Now let's add a fake class as suggested:
./unit_test_fake/myapp/database/LegacyDataLoader.py
./unit_test_fake/myapp/database/__init__.py
./unit_test_fake/myapp/__init__.py
Modify the PYTHONPATH to
export PYTHONPATH=unit_test_fake:dir1:unit_test
Now the test fails for another reason:
File "unit_test/test.py", line 1, in <module>
from myapp.database.Other import Other
ImportError: No module named Other
It has something to do with the way Python resolves classes/attributes in a module.
You can intercept import and from ... import statements by defining your own __import__ function and assigning it to __builtin__.__import__ (make sure to save the previous value, since your override will no doubt want to delegate to it; and you'll need to import __builtin__ to get the builtin-objects module).
For example (Py2.4 specific, since that's what you're asking about), save in aim.py the following:
import __builtin__

realimp = __builtin__.__import__

def my_import(name, globals={}, locals={}, fromlist=[]):
    print 'importing', name, fromlist
    return realimp(name, globals, locals, fromlist)

__builtin__.__import__ = my_import

from os import path
and now:
$ python2.4 aim.py
importing os ('path',)
So this lets you intercept any specific import request you want, and alter the imported module[s] as you wish before you return them -- see the specs here. This is the kind of "hook" you're looking for, right?
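Applied to this question, a minimal sketch in the same Python 2.4 style (MockDataLoader is assumed to define the same names as LegacyDataLoader):

import __builtin__

realimp = __builtin__.__import__

def my_import(name, globals={}, locals={}, fromlist=[]):
    if name == 'LegacyDataLoader':
        name = 'MockDataLoader'  # silently redirect the import to the mock
    return realimp(name, globals, locals, fromlist)

__builtin__.__import__ = my_import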
There are cleaner ways to do this, but I'll assume that you can't modify the file containing from LegacyDataLoader import load_me_data.
The simplest thing to do is probably to create a new directory called testing_shims, and create a LegacyDataLoader.py file in it. In that file, define whatever fake load_me_data you like. When running the unit tests, put testing_shims into your PYTHONPATH environment variable as the first directory. Alternatively, you can modify your test runner to insert testing_shims as the first value in sys.path.
This way, your file will be found when importing LegacyDataLoader, and your code will be loaded instead of the real code.
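For example, the shim module could be as small as this (the canned return value is only a placeholder):

# testing_shims/LegacyDataLoader.py
def load_me_data():
    # return canned data instead of talking to the database server
    return {'rows': []}

and the sys.path variant amounts to:

import sys
sys.path.insert(0, 'testing_shims')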
The import statement just grabs stuff from sys.modules if a matching name is found there, so the simplest thing is to make sure you insert your own module into sys.modules under the target name before anything else tries to import the real thing.
# in test code
import sys
import MockDataLoader
sys.modules['LegacyDataLoader'] = MockDataLoader
import module_under_test
There are a handful of variations on the theme, but that basic approach should work fine to do what you describe in the question. A slightly simpler approach would be this, using just a mock function to replace the one in question:
# in test code
import module_under_test

def mock_load_me_data():
    # do mock stuff here
    pass

module_under_test.load_me_data = mock_load_me_data
That simply replaces the appropriate name right in the module itself, so when you invoke the code under test, presumably do_something() in your question, it calls your mock routine.
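On modern Python versions (not 2.4; noted only for completeness) the same name-replacement idea is packaged as unittest.mock.patch:

from unittest import mock
import module_under_test

with mock.patch('module_under_test.load_me_data', return_value={'rows': []}):
    module_under_test.do_something()  # sees the mock instead of the real loader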
Well, if the import fails by raising an exception, you could put it in a try...except block:
try:
    from LegacyDataLoader import load_me_data
except ImportError:  # put the error that actually occurs here, so as not to mask real problems
    from MockDataLoader import load_me_data
Is that what you're looking for? If it fails but doesn't raise an exception, you could have it run the unit test with a special command line flag, like --unittest, like this:
import sys

if "--unittest" in sys.argv:
    from MockDataLoader import load_me_data
else:
    from LegacyDataLoader import load_me_data
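The test run would then be invoked with the flag, mirroring the transcript style above (script name hypothetical):

$ python2.4 test.py --unittest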