I have the following in setup.py:
from setuptools import setup, find_packages
# ...
setup(
    name='xml-boiler',
    version='0.0.1',
    url='https://github.com/vporton/xml-boiler',
    license='AGPLv3',
    author='Victor Porton',
    author_email='porton@narod.ru',
    description='Automatically transform between XML namespaces',
    packages=find_packages(),
    package_data={'': ['*.ttl', '*.xml']},
    scripts=['bin/boiler'],
    data_files=[
        ('/etc/xmlboiler', ['etc/config-cli.ttl'])
    ],
    test_suite="xmlboiler.tests",
    cmdclass={'build_py': MyBuild},
)
But after I run python setup.py build, the build directory does not contain any *.xml or *.ttl files.
What is my error?
I also want to distribute all files from xmlboiler/core/data/assets/.
I don't understand how it works:
package_data={'': ['**/*.xml', '**/*.ttl', '**/*.net', 'data/assets/*', 'data/scripts/*.xslt', 'xmlboiler/doc/*.html', 'xmlboiler/doc/*.css']},
included xmlboiler/core/data/scripts/section.xslt but not xmlboiler/tests/core/data/xml/simple.xml. Why?!
package_data is a mapping of package names to files or file globs. This means that
package_data = {'': ['*.xml', '*.ttl']}
will include every file ending in .xml or .ttl located directly in any package directory, for example xmlboiler/file.xml or xmlboiler/core/file.ttl. It will, however, not include xmlboiler/core/data/interpreters.ttl, because data is not a package directory (it contains no __init__.py file). To include that file, use its path relative to the package:
package_data = {'xmlboiler.core': ['data/interpreters.ttl']}
To include every .ttl file under xmlboiler/core/data:
package_data = {'xmlboiler.core': ['data/*.ttl', 'data/**/*.ttl']}
This will include every .ttl file in data directory (glob data/*.ttl) and every .ttl file in every subdirectory of data (glob data/**/*.ttl).
To include every .ttl and .xml file in every package:
package_data = {'': ['*.xml', '**/*.xml', '*.ttl', '**/*.ttl']}
I also want to distribute all files from xmlboiler/core/data/assets/
Use the same approach for data/assets, but omit the file extension from the globs:
package_data={
'xmlboiler.core': ['data/assets/*', 'data/assets/**/*'],
}
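A quick way to sanity-check which files a glob pattern would pick up is to evaluate the same pattern with pathlib against your source tree. This is only a sketch; the throwaway directory layout below is fabricated to mirror the package structure from the question:

```python
import pathlib
import tempfile

# Build a throwaway tree mimicking the package layout.
root = pathlib.Path(tempfile.mkdtemp())
(root / "xmlboiler" / "core" / "data" / "assets").mkdir(parents=True)
(root / "xmlboiler" / "core" / "data" / "interpreters.ttl").write_text("")
(root / "xmlboiler" / "core" / "data" / "assets" / "logo.svg").write_text("")

pkg_dir = root / "xmlboiler" / "core"

# 'data/*.ttl' matches only files directly inside data/ ...
top_level = sorted(p.name for p in pkg_dir.glob("data/*.ttl"))
# ... while 'data/**/*' also descends into every subdirectory.
recursive = sorted(p.name for p in pkg_dir.glob("data/**/*") if p.is_file())

print(top_level)   # ['interpreters.ttl']
print(recursive)   # ['interpreters.ttl', 'logo.svg']
```

Note that pathlib's glob semantics are close to, but not identical with, the glob handling inside setuptools, so treat this as a rough check rather than an exact simulation.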
Related
I have a Python script that downloads some JSON data and then uploads it somewhere else. The project is built and run using Bazel and has the following simplified structure:
scripts/
    exported_file_1.json
    main.py
tests/
BUILD
My issue is that when it comes to reading the exported files and loading them in memory, those files cannot be found using:
import os

def get_filepath(filename):
    """Get the full path to a file located next to this module."""
    cwd = os.path.dirname(__file__)
    return os.path.join(cwd, filename)
If I manually add a JSON file to the project structure, declare it in the BUILD file to be visible, and run bazel build, then it works fine:
py_binary(
    name = "main",
    srcs = [
        "main.py",
    ],
    data = ["scripts/exported_file_1.json"],
    python_version = "PY2",
    visibility = ["//visibility:public"],
    deps = [],
)
But how would one handle the case when your files are added dynamically?
Perhaps using glob might work?
Something like:
import glob
import os

def get_json_files():
    cwd = os.path.dirname(__file__)
    pattern = f"{cwd}/*.json"
    return glob.glob(pattern)

for json_file in get_json_files():
    print(f"found file: {json_file}")
Using glob in the BUILD file will allow the binary to find all .json files in the scripts directory:
py_binary(
    name = "main",
    srcs = [
        "scripts/main.py",
    ],
    data = glob(["scripts/*.json"]),
    python_version = "PY2",
    visibility = ["//visibility:public"],
    deps = [],
)
This would still require the JSON files to be downloaded and present in the scripts/ directory when bazel run :main is executed.
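At run time, the same dynamic discovery can be paired with json.load to parse whatever files are present. A minimal self-contained sketch; the scripts/ directory name comes from the question, but the file contents here are fabricated for illustration:

```python
import glob
import json
import os
import tempfile

def load_json_files(directory):
    """Find every .json file in `directory` and parse each one."""
    results = {}
    for path in sorted(glob.glob(os.path.join(directory, "*.json"))):
        with open(path, "r", encoding="utf-8") as fh:
            results[os.path.basename(path)] = json.load(fh)
    return results

# Demonstrate with a throwaway scripts/ directory.
scripts_dir = os.path.join(tempfile.mkdtemp(), "scripts")
os.makedirs(scripts_dir)
with open(os.path.join(scripts_dir, "exported_file_1.json"), "w") as fh:
    json.dump({"status": "exported"}, fh)

loaded = load_json_files(scripts_dir)
print(loaded)  # {'exported_file_1.json': {'status': 'exported'}}
```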
I want to package a Python project, but when the packaged Python files need to use data files, a FileNotFoundError appears. (The question included screenshots of the project structure, which are not reproduced here.)
setup.py:
from setuptools import setup, find_packages
# To use a consistent encoding
from codecs import open
from os import path
here = path.abspath(path.dirname(__file__))
# Get the long description from the README file
with open(path.join(here, 'README.md'), encoding='utf-8') as f:
    long_description = f.read()
setup(
    name='NERChinese',
    version='0.0.6',
    # "A character-level Chinese named-entity recognition model based on BiLSTM-CRF"
    description='基于BiLSTM-CRF的字级别中文命名实体识别模型',
    long_description=long_description,
    long_description_content_type='text/markdown',
    # The project's main homepage.
    url='https://github.com/cswangjiawei/Chinese-NER',
    # Author details
    # author='wangjiawei',
    # author_email='cswangjiawei@163.com',
    # Choose your license
    # license='MIT',
    # What does your project relate to?
    keywords='Named-entity recognition using neural networks',
    # You can just specify the packages manually here if your project is
    # simple. Or you can use find_packages().
    packages=find_packages(),
    zip_safe=False,
    package_data={
        'NERChinese': ['data/*'],
    },
)
"FileNotFoundError: [Errno 2] No such file or directory: 'data/msra_train.txt'" appears when the line "with open(train_path, 'r', encoding='utf-8') as F1:" in the utils.py file runs. How should I package the data files?
It seems you don't build the correct path when you open the file. You can try instead (note the added import os):
import os

with open(os.path.join(os.path.dirname(__file__), 'data/msra_train.txt'), 'r', encoding='utf-8') as F1:
This way, the path is relative to your package's module, not to the directory from which you call it.
If it still does not work, check whether your data files actually ended up in site-packages.
I've been attempting to build a Windows executable with py2exe for a Python program that uses the jsonschema package, but every time I try to run the executable it fails with the following error:
File "jsonschema\__init__.pyc", line 18, in <module>
File "jsonschema\validators.pyc", line 163, in <module>
File "jsonschema\_utils.pyc", line 57, in load_schema
File "pkgutil.pyc", line 591, in get_data
IOError: [Errno 0] Error: 'jsonschema\\schemas\\draft3.json'
I've tried adding json and jsonschema to the packages option for py2exe in setup.py, and I also tried manually copying the jsonschema directory from its location in Python27\Libs\site-packages into library.zip, but neither of those worked. I also attempted the solution found at http://crazedmonkey.com/blog/python/pkg_resources-with-py2exe.html, which suggests extending py2exe to copy files into the zip file, but that did not seem to work either.
I'm assuming this happens because py2exe only includes Python files in the library.zip, but I was wondering if there is any way for this to work without having to convert draft3.json and draft4.json into .py files in their original location.
Thank you in advance
Well after some more googling (I hate ugly) I got it working without patching the build_exe.py file. The key to the whole thing was the recipe at http://crazedmonkey.com/blog/python/pkg_resources-with-py2exe.html. My collector class looks like this:
import os

import jsonschema
from py2exe.build_exe import py2exe as build_exe

class JsonSchemaCollector(build_exe):
    """
    Adds the jsonschema files draft3.json and draft4.json to the
    list of compiled files so they are included in the zipfile.
    """
    def copy_extensions(self, extensions):
        build_exe.copy_extensions(self, extensions)
        # Define the data path where the files reside.
        data_path = os.path.join(jsonschema.__path__[0], 'schemas')
        # Create the subdir where the json files are collected.
        media = os.path.join('jsonschema', 'schemas')
        full = os.path.join(self.collect_dir, media)
        self.mkpath(full)
        # Copy the json files to the collection dir. Also add each copied file
        # to the list of compiled files so it will be included in the zipfile.
        for name in os.listdir(data_path):
            file_name = os.path.join(data_path, name)
            self.copy_file(file_name, os.path.join(full, name))
            self.compiled_files.append(os.path.join(media, name))
What's left is to add it to the core setup like this:
options = {"bundle_files": 1,    # Bundle ALL files inside the EXE
           "compressed": 2,      # Compress the library archive
           "optimize": 2,        # Like python -OO
           "packages": packages,  # Packages needed by lxml
           "excludes": excludes,  # COM stuff we don't want
           "dll_excludes": skip}  # Exclude unused DLLs

distutils.core.setup(
    cmdclass={"py2exe": JsonSchemaCollector},
    options={"py2exe": options},
    zipfile=None,
    console=[prog])
Some of the code is omitted since it's not relevant in this context but I think you get the drift.
I have a python package with this file structure:
package/
    bin/
        clean_spam_ratings.py
    spam_module/
        data/
            spam_ratings.csv
        __init__.py
        spam_ratings_functions.py
Contents of clean_spam_ratings.py:
import spam_module

with open(path_to_spam_ratings_csv, 'r') as fin:
    spam_module.spam_ratings_functions(fin)
What should I set path_to_spam_ratings_csv to?
If you are in a module, then you can get the absolute path for the directory that contains that module via:
os.path.dirname(__file__)
You can then use that to construct the path to your CSV file. Given the layout above, the data directory sits inside spam_module, next to spam_ratings_functions.py, so from within spam_ratings_functions.py use:
path_to_spam_ratings_csv = os.path.join(os.path.dirname(__file__), "data", "spam_ratings.csv")
From bin/clean_spam_ratings.py, anchor on the imported package instead:
path_to_spam_ratings_csv = os.path.join(os.path.dirname(spam_module.__file__), "data", "spam_ratings.csv")
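Anchoring on the imported package's __file__ attribute can be demonstrated end to end. This sketch fabricates the spam_module package from the question in a temporary directory, imports it, and resolves the CSV path independently of the current working directory; the file contents are made up:

```python
import importlib
import os
import sys
import tempfile

# Fabricate the package from the question: spam_module with a data/ dir.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "spam_module")
os.makedirs(os.path.join(pkg, "data"))
open(os.path.join(pkg, "__init__.py"), "w").close()
with open(os.path.join(pkg, "data", "spam_ratings.csv"), "w") as fh:
    fh.write("rating\n5\n")

# Import it the same way bin/clean_spam_ratings.py would.
sys.path.insert(0, root)
spam_module = importlib.import_module("spam_module")

# Anchor the data path on the package, not on the caller's cwd.
path_to_spam_ratings_csv = os.path.join(
    os.path.dirname(spam_module.__file__), "data", "spam_ratings.csv")

with open(path_to_spam_ratings_csv) as fin:
    print(fin.readline().strip())  # rating
```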
I am attempting to automate the specification of a sub-directory which one of my scripts requires. The idea is to have the script search the C: drive for a folder of a specific name. In my mind, this begs for a recursive search function. The plan is to check all sub-directories, if none are the desired directory, begin searching the sub-directories of the current sub-directories
While researching how to do this, I came across this question and started using os.walk(dir).next()[1] to list directories. This had limited success: as the script searched through directories, it would eventually give up and break, raising a StopIteration error. Sample output is below, searching for a sub-directory within TEST1.
C:\Python27>test.py
curDir: C:\Python27
['DLLs', 'Doc', 'include', 'Lib', 'libs', 'pyinstaller-2.0', 'Scripts', 'tcl', 'TEST1', 'Tools']
curDir: DLLs
[]
curDir: Doc
[]
curDir: include
[]
curDir: Lib
['bsddb', 'compiler', 'ctypes', 'curses', 'distutils', 'email', 'encodings', 'hotshot',
'idlelib', 'importlib', 'json', 'lib-tk', 'lib2to3', 'logging', 'msilib',
'multiprocessing', 'pydoc_data', 'site-packages', 'sqlite3', 'test', 'unittest', 'wsgiref', 'xml']
curDir: bsddb
Traceback (most recent call last):
File "C:\Python27\test.py", line 24, in <module>
if __name__ == "__main__": main()
File "C:\Python27\test.py", line 21, in main
path = searcher(os.getcwd())
File "C:\Python27\test.py", line 17, in searcher
path = searcher(entry)
File "C:\Python27\test.py", line 17, in searcher
path = searcher(entry)
File "C:\Python27\test.py", line 6, in searcher
dirList = os.walk(dir).next()[1]
StopIteration
curDir is the current directory being searched, and the next line of output is the list of its subdirectories. Once the program finds a directory with no sub-directories, it kicks back up one level and goes to the next directory.
I can provide my code if required, but didn't want to initially post it to avoid an even bigger wall of text.
My question is: why does the script give up after searching a few folders? Thanks in advance for your help!
StopIteration is raised whenever an iterator has no more values to generate.
Why are you using os.walk(dir).next()[1]? Wouldn't it be easier to just do everything in a for loop? Like:
for root, dirs, files in os.walk(mydir):
    # dirs here should be equivalent to dirList
Here is the documentation for os.walk.
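For the original goal of locating a directory with a specific name, the loop alone is enough, because os.walk already recurses; no hand-rolled recursion or next() calls are needed. A sketch, with the tree and target name fabricated for illustration:

```python
import os
import tempfile

def find_dir(root, target_name):
    """Return the first path under `root` whose basename is `target_name`."""
    for dirpath, dirnames, filenames in os.walk(root):
        if target_name in dirnames:
            return os.path.join(dirpath, target_name)
    return None  # not found anywhere under root

# Demonstrate on a throwaway tree: root/a/b/TEST1
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "a", "b", "TEST1"))

found = find_dir(root, "TEST1")
print(found.endswith(os.path.join("a", "b", "TEST1")))  # True
```

Because os.walk yields directories top-down, this returns the shallowest match first, which is usually what a drive-wide search wants.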
What worked for me is specifying the full path in os.walk, rather than just the directory name:
# Full path of the directory of interest whose subfolders are iterated (Mydir)
fullpath = os.path.join(os.path.dirname(__file__), 'Mydir')

# Iteration (on Python 3, write next(os.walk(fullpath))[1] instead)
subfolders = os.walk(fullpath).next()[1]
This happened to me in particular when a module that contains os.walk is located in a subfolder itself, imported by a script in a parent folder.
Parent/
    script
    Folder/
        module
        Mydir/
            Subfolder1
            Subfolder2
In script, os.walk('Mydir') will look in Parent/Mydir, which does not exist.
On the other hand, os.walk(fullpath) will look in Parent/Folder/Mydir.