I am writing my first Python package, which I want to upload to PyPI. I structured my code based on this blog post.
I want to store user settings in a config.ini file, read it once (every time the package is run) in a separate Python module in the same package, and save the settings in global variables of that module. Later, I import those in other modules.
To recreate the error I just edited a few lines of code in the template described in the blog post. (Please refer to it, since it would take too much typing to recreate the entire thing here in the question.)
The only difference is that my stuff.py reads from config file like this:
from ConfigParser import SafeConfigParser
config = SafeConfigParser()
config.read('config.ini')
TEST_KEY = config.get('main', 'test_key')
Here are the contents of config.ini (placed in same dir as stuff.py):
[main]
test_key=value
And my bootstrap.py just imports and prints TEST_KEY:
from .stuff import TEST_KEY
def main():
    print(TEST_KEY)
But on executing the package, the import fails with this error:
Traceback (most recent call last):
  File "D:\Coding\bootstrap\bootstrap-runner.py", line 8, in <module>
    from bootstrap.bootstrap import main
  File "D:\Coding\bootstrap\bootstrap\bootstrap.py", line 11, in <module>
    from .stuff import TEST_KEY
  File "D:\Coding\bootstrap\bootstrap\stuff.py", line 14, in <module>
    TEST_KEY = config.get('main', 'test_key')
  File "C:\Python27\Lib\ConfigParser.py", line 607, in get
    raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'main'
The import keeps raising ConfigParser.NoSectionError, but if you build/run only stuff.py (I use Sublime Text 3), the module gives no errors and printing TEST_KEY outputs value.
Also, this method of import does work when I just use the 3 files (config, stuff, main) in a directory and execute main as a script. But there I had to import it like this:
from stuff import TEST_KEY
I'm just using the explicit relative imports as described in that post, but I don't have enough understanding of them. I guess the error is due to the project structure and imports, since running stuff.py as a standalone script raises no ConfigParser.NoSectionError.
Another method to read the config file once and then use the data in other modules would be really helpful as well.
There are two aspects to this question. The first is the weird behavior of ConfigParser: when ConfigParser is unable to locate the .ini file, it never, for some annoying reason, raises an IOError or any error indicating that it is unable to read the file.
In my case it kept raising ConfigParser.NoSectionError even though the section was clearly present. (When I caught the ConfigParser.NoSectionError, the import failed with an ImportError instead!) It never tells you that it is simply unable to read the file.
The second is how to safely read data files that are included in your package. The only way I found to do this was to use the __file__ attribute. This is how you would safely read config.ini in the above question, for both Python 2.7 and Python 3:
import os

try:
    # Python 3.2+
    from configparser import ConfigParser
except ImportError:
    # Python 2.7:
    # refer to the older SafeConfigParser as ConfigParser
    from ConfigParser import SafeConfigParser as ConfigParser

config = ConfigParser()

# get the path to config.ini
config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.ini')

# check if the path is to a valid file
if not os.path.isfile(config_path):
    raise BadConfigError  # not a standard Python exception; define your own

config.read(config_path)
TEST_KEY = config.get('main', 'test_key')  # 'value'
This relies on the fact that config.ini is located inside our package bootstrap and is expected to be shipped with it.
The important bit is how you get the config_path:
config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.ini')
__file__ refers to the location of the current script being executed. In my question that means the location of stuff.py, which is inside the bootstrap folder, as is config.ini.
The above line of code then means: get the absolute path to stuff.py; from that, get the path to the directory containing it; and join that with config.ini (since it is in the same directory) to get the absolute path to config.ini. Then you can proceed to read it and raise an exception just in case.
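On Python 3.4+, the same construction can be written with pathlib; this is just an equivalent sketch of the line above:

```python
from pathlib import Path

# absolute path to the directory holding this module,
# joined with config.ini (which ships alongside it)
config_path = Path(__file__).resolve().parent / 'config.ini'
```

str(config_path) can then be passed to config.read() exactly as before.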
This will work even after you release your package on PyPI and a user installs it from there.
As a bonus, and digressing from the question a bit: if you are releasing your package with data files inside it, you must tell setuptools to include them when you build sdists and bdists. So to include config.ini in the above package, add the following lines to the setup() call in setup.py:
include_package_data = True,
package_data = {
    # If any package contains *.ini files, include them
    '': ['*.ini'],
},
But it still may not work in some cases, e.g. when building wheels. So also do the same in your MANIFEST.in file:
include LICENSE
include bootstrap/*.ini
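Putting those pieces together, a minimal setup.py for the bootstrap package above might look like this (the name and version are placeholders, not taken from the question):

```python
from setuptools import setup, find_packages

setup(
    name='bootstrap',           # placeholder
    version='0.1.0',            # placeholder
    packages=find_packages(),
    # ship the non-code files declared below and in MANIFEST.in
    include_package_data=True,
    package_data={
        # if any package contains *.ini files, include them
        '': ['*.ini'],
    },
)
```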
abhimanyuPathania: The issue is with the path of config.ini in stuff.py. Change config.read('config.ini') to config.read('./bootstrap/config.ini') in stuff.py. I tried the solution; it works for me.
Enjoying Pythoning...
I am writing a python package with modules that need to open data files in a ./data/ subdirectory. Right now I have the paths to the files hardcoded into my classes and functions. I would like to write more robust code that can access the subdirectory regardless of where it is installed on the user's system.
I've tried a variety of methods, but so far I have had no luck. It seems that most of the "current directory" commands return the directory of the system's Python interpreter, not the directory of the module.
This seems like it ought to be a trivial, common problem, yet I can't seem to figure it out. Part of the problem is that my data files are not .py files, so I can't use import and the like.
Any suggestions?
Right now my package directory looks like:
/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt
I am trying to access data.txt from module*.py!
The standard way to do this is with setuptools packages and pkg_resources.
You can lay out your package according to the following hierarchy, and configure the package setup file to point to your data resources, as per this link:
http://docs.python.org/distutils/setupscript.html#installing-package-data
You can then re-find and use those files using pkg_resources, as per this link:
http://peak.telecommunity.com/DevCenter/PkgResources#basic-resource-access
import pkg_resources
DATA_PATH = pkg_resources.resource_filename('<package name>', 'data/')
DB_FILE = pkg_resources.resource_filename('<package name>', 'data/sqlite.db')
There is often no point in making an answer that details code which does not work as-is, but I believe this to be an exception. Python 3.7 added importlib.resources, which is supposed to replace pkg_resources. It works for accessing resources whose names do not contain slashes, i.e. given
foo/
    __init__.py
    module1.py
    module2.py
    data/
        data.txt
        data2.txt
you could access data2.txt inside package foo with, for example,
importlib.resources.open_binary('foo', 'data2.txt')
but it would fail with an exception for
>>> importlib.resources.open_binary('foo', 'data/data.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.7/importlib/resources.py", line 87, in open_binary
    resource = _normalize_path(resource)
  File "/usr/lib/python3.7/importlib/resources.py", line 61, in _normalize_path
    raise ValueError('{!r} must be only a file name'.format(path))
ValueError: 'data/data.txt' must be only a file name
This cannot be worked around except by placing an __init__.py inside data and then using it as a package:
importlib.resources.open_binary('foo.data', 'data.txt')
The reason for this behaviour is "it is by design"; but the design might change...
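The design did change: Python 3.9 added importlib.resources.files(), which returns a Traversable that can walk into subdirectories, so the data directory no longer needs an __init__.py. A sketch, using the hypothetical foo package above (the helper name is mine, not a stdlib function):

```python
from importlib.resources import files  # Python 3.9+

def read_packaged_data(package, *parts):
    # walk the Traversable into subdirectories, e.g.
    # read_packaged_data('foo', 'data', 'data.txt')
    resource = files(package)
    for part in parts:
        resource = resource / part
    return resource.read_bytes()
```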
You can use __file__ to get the path to the package, like this:
import os

this_dir, this_filename = os.path.split(__file__)
DATA_PATH = os.path.join(this_dir, "data", "data.txt")
print(open(DATA_PATH).read())
To provide a solution that works today: definitely use this API instead of reinventing all those wheels.
A true filesystem filename is needed; zipped eggs will be extracted to a cache directory:
from pkg_resources import resource_filename, Requirement
path_to_vik_logo = resource_filename(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
Return a readable file-like object for the specified resource; it may be an actual file, a StringIO, or some similar object. The stream is in “binary mode”, in the sense that whatever bytes are in the resource will be read as-is.
from pkg_resources import resource_stream, Requirement
vik_logo_as_stream = resource_stream(Requirement.parse("enb.portals"), "enb/portals/reports/VIK_logo.png")
Package Discovery and Resource Access using pkg_resources
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#resource-extraction
https://setuptools.readthedocs.io/en/latest/pkg_resources.html#basic-resource-access
You need the name of your whole package; the directory tree you've given doesn't list that detail. For me this worked:
import pkg_resources

print(
    pkg_resources.resource_filename(__name__, 'data/data.txt')
)
Notably, setuptools does not appear to resolve files based on a name match with packaged data files, so you're going to have to include the data/ prefix pretty much no matter what. You can use os.path.join('data', 'data.txt') if you need alternate directory separators; generally, though, I find no compatibility problems with hard-coded Unix-style directory separators.
I think I hunted down an answer.
I make a module data_path.py, which I import into my other modules, containing:
import os
data_path = os.path.join(os.path.dirname(__file__), 'data')
And then I open all my files with
open(os.path.join(data_path,'filename'), <param>)
I have a file structure like this:
/package/main.py
/package/__init__.py
/package/config_files/config.py
/package/config_files/__init__.py
I'm trying to dynamically import config.py from main.py based on a command line argument, something like this:
#!/usr/bin/env python
import sys
from importlib import import_module

cmd_arguments = sys.argv
config_str = cmd_arguments[1]
config = import_module(config_str, 'config_files')
but it breaks with ModuleNotFoundError: No module named 'default_config'. Similar code in IPython does not suffer the same issue when called from /package.
If there is a better way to load a package at run time via user input, I'm open to suggestions.
You are trying to import from a nested package, so use the full package name:
config = import_module('package.config_files.' + config_str)
Alternatively, use a relative import, and set the second argument to package:
config = import_module('.config_files.{}'.format(config_str), 'package')
This is more secure, as now the config_str string can't be used to 'break out' of the config_files sub package.
You do want to strip dots from user-provided names (and, preferably, limit the name to valid Python identifiers only).
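A sketch of that validation, assuming the package.config_files layout from the question (load_config is my name, not part of any library):

```python
from importlib import import_module

def load_config(name):
    # a plain identifier cannot contain dots or path separators,
    # so the user cannot escape the config_files subpackage
    if not name.isidentifier():
        raise ValueError('invalid config module name: %r' % name)
    return import_module('.' + name, 'package.config_files')
```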
Preliminary:
I have Anaconda 3 on Windows 10, and a folder, folder_default, that I have put on the Python path. I'm not actually sure whether that's the right terminology, so to be clear: regardless of where my Python script is, a line of code that says import myFile will succeed if myFile.py is in folder_default.
My issue:
In folder_default, I have:
A subfolder called useful_files which contains a text file called useful_file_1.txt.
A python script called my_misc.py.
my_misc.py has a line similar to np.loadtxt('useful_files/useful_file_1.txt'). This line does not work if I use import my_misc in a Python file in a location other than folder_default, since useful_files/useful_file_1.txt is not the path relative to the Python file that imports my_misc.py. I don't want to start using absolute file paths if I can avoid it.
How can I access files using file paths relative to the imported python module, rather than relative to the python script that imports that module?
Please let me know if the question is unclear. I tried to write a fake, minimal version of the setup that's actually on my computer in the hope that it would simplify things, but I can change it if it actually makes things more confusing.
Thanks.
You can get the path to the current module using the getfile method of the inspect module: inspect.getfile(inspect.currentframe()).
For example:
# File: my_misc.py
import os
import inspect
import numpy as np

# get the path to the directory of the current module
module_dir_path = os.path.dirname(os.path.abspath(inspect.getfile(inspect.currentframe())))
useful_file_path = os.path.join(module_dir_path, 'useful_files', 'useful_file_1.txt')

# path stored in useful_file_path; do whatever you want with it
np.loadtxt(useful_file_path)
I have a simple test script:
import requests
response = requests.get('http://httpbin.org/get')
print response.text
It works when the Python script is named test.py, but fails if it is named email.py or logging.py:
Traceback (most recent call last):
  File "./email.py", line 3, in <module>
    import requests
  File "/usr/lib/python2.7/dist-packages/requests/__init__.py", line 53, in <module>
    from urllib3.contrib import pyopenssl
  File "/usr/lib/python2.7/dist-packages/urllib3/__init__.py", line 16, in <module>
    from .connectionpool import (
  File "/usr/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 59, in <module>
    from .request import RequestMethods
  File "/usr/lib/python2.7/dist-packages/urllib3/request.py", line 12, in <module>
    from .filepost import encode_multipart_formdata
  File "/usr/lib/python2.7/dist-packages/urllib3/filepost.py", line 15, in <module>
    from .fields import RequestField
  File "/usr/lib/python2.7/dist-packages/urllib3/fields.py", line 7, in <module>
    import email.utils
  File "/home/ubuntu/temp/email.py", line 4, in <module>
    response = requests.get('http://httpbin.org/get')
AttributeError: 'module' object has no attribute 'get'
It appears that requests imports urllib3, which imports the built-in email module. Why does Python not find the built-in email module first, instead of looking in the current path for email.py?
Is there a way to make this work, or do I just have to always avoid naming my Python scripts any built-in module that may be imported by any dependency?
First, as you indicated in a comment, Python checks for a 'built-in' module of that name. Not all modules in the standard library are 'built-in'. You can see the list by:
print sys.builtin_module_names
If it's not found there, the order searched is outlined by Burhan's accepted answer here:
What is the extent of the import statement in Python
Python searches for things it can import in the following order:
From the directory where the script was executed.
From the directories in the PYTHONPATH environment variable (if its set).
From the system-wide Python installation's module directory.
In your case, email is not built-in, so the current directory is checked first.
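You can verify this quickly: sys really is compiled into the interpreter, while email is an ordinary stdlib package found on sys.path:

```python
import sys

# compiled-in modules are listed in this tuple
print('sys' in sys.builtin_module_names)    # True
print('email' in sys.builtin_module_names)  # False; it lives on sys.path
```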
So, yes, don't shadow a python library name. Technically you could shadow a built-in module's name, but please - please don't. It makes Python cry.
Take a look at sys.path and you'll see '' as the first entry:
>>> import sys
>>> sys.path
['',
'/.../3.3/lib/python33.zip',
'/.../3.3/lib/python3.3',
'/.../3.3/lib/python3.3/plat-darwin',
'/.../3.3/lib/python3.3/lib-dynload',
'/.../3.3/lib/python3.3/site-packages']
That '' is the current directory.
You could modify sys.path, but it's a wiser decision, long-term, to just not give python files names that match builtin modules.
The reason Python looks for modules in the current directory first is that that's how Python works. If you look at sys.path, you will see that its first element is almost always '', which indicates the current directory. This applies to everything but the built-in modules, which are "imported" at interpreter startup. (They are not really imported, just assigned into sys.modules.)
In general, you are supposed to not name your own modules the same names as other modules you're using (whether they came with Python or not). I tend to put my initials on scripts I'm working with just to avoid this pitfall.
Of course, you can just manipulate sys.path so it doesn't look for modules in the current directory:
import sys

if not sys.path[0]:
    del sys.path[0]

import requests
Note that you only need to remove the '' entry when you import. You should put it back after you do your imports, in case you need to import modules of the same name from your own script's directory. A context manager is handy for this.
import sys
from contextlib import contextmanager

@contextmanager
def no_cwd_imports():
    old_path = sys.path[:]
    if not sys.path[0]:
        del sys.path[0]
    try:
        yield
    finally:
        sys.path[:] = old_path

with no_cwd_imports():
    import requests
That's not a problem with Requests.
The "problem" comes from Python. email and logging are standard modules; it is not advisable to give your own modules the same names unless you understand and are willing to resolve the conflicts.
I have a function that loads data using the current path like this: open('./filename', 'rb'). When I call it from a module located in the same package, it works, but when I import its package from a module in a different package and call it, I get an error telling me that the path './filename' does not exist. The error is raised by the call to open. What is causing this, and how do I fix this?
I'm not aware of best practices, but modules have a __file__ attribute set to a string representation of the name of the file they were loaded from. Thus, you can do this:
import os.path
# Get the directory this module is being loaded from
module_directory = os.path.dirname(__file__)
# Get the path to the file we want to open
file_path = os.path.join(module_directory, 'filename')
with open(file_path, 'rb') as f:
    data = f.read()  # do what you want with the file