Making Python guess a file Name - python

I have the following function:
unpack_binaryfunction('third-party/jdk-6u29-linux-i586.bin' , ('/home/user/%s/third-party' % installdir), 'jdk1.6.0_29')
This uses os.system to run a Java deployment. The line, together with the function (which is unimportant; it just runs some Linux commands), works perfectly.
However, it only works if the 'third-party' folder contains specifically that version of the JDK.
Therefore I need code that looks at the files in the 'third-party' folder, finds one that starts with 'jdk', and fills out the rest of the filename itself.
I am absolutely stuck. Are there any functions or libraries that can help with file searching etc?
To clarify: the code should not hard-code the full name jdk-6u29-linux-i586.bin, but should use whichever jdk-xxxx... file is in the third-party folder.

This can easily be done using the glob module, plus a bit of string parsing to extract the version.
import glob
import os.path
for path in glob.glob('third-party/jdk-*'):
    parent, name = os.path.split(path)  # "third-party", "jdk-6u29-linux-i586.bin"
    version, update = name.split('-')[1].split('u')  # ("6", "29")
    unpack_binaryfunction(path, ('/home/user/%s/third-party' % installdir), 'jdk1.{}.0_{}'.format(version, update))
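The parsing step can also be pulled out into a small helper, which makes it checkable without a real installer on disk. This is only a sketch: the name `jdk_dir_name` and the regular expression are hypothetical (not part of the answer above), and it assumes file names of the form `jdk-<version>u<update>-...`.

```python
import re

# Assumed file name layout: jdk-<version>u<update>-<platform>.bin
_JDK_NAME = re.compile(r"^jdk-(\d+)u(\d+)-")


def jdk_dir_name(filename):
    """Return the unpacked directory name for a JDK installer, or None."""
    match = _JDK_NAME.match(filename)
    if match is None:
        # Not a file name this sketch knows how to interpret.
        return None
    version, update = match.groups()
    return "jdk1.{}.0_{}".format(version, update)
```

Returning None for unexpected names lets the caller skip other files that happen to start with 'jdk' instead of failing on a split.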

Related

Is this the approved way to access data adjacent to/packaged with a Python script?

I have a Python script that needs some data that's stored in a file that will always be in the same location as the script. I have a setup.py for the script, and I want to make sure it's pip installable in a wide variety of environments, and can be turned into a standalone executable if necessary.
Currently the script runs with Python 2.7 and Python 3.3 or higher (though I don't have a test environment for 3.3 so I can't be sure about that).
I came up with this method to get the data. This script isn't part of a module directory with __init__.py or anything, it's just a standalone file that will work if just run with python directly, but also has an entry point defined in the setup.py file. It's all one file. Is this the correct way?
def fetch_wordlist():
    wordlist = 'wordlist.txt'
    try:
        import importlib.resources as res
        return res.read_binary(__file__, wordlist)
    except ImportError:
        pass
    try:
        import pkg_resources as resources
        req = resources.Requirement.parse('makepw')
        wordlist = resources.resource_filename(req, wordlist)
    except ImportError:
        import os.path
        wordlist = os.path.join(os.path.dirname(__file__), wordlist)
    with open(wordlist, 'rb') as f:
        return f.read()
This seems ridiculously complex. Also, it seems to rely on the package management system in ways I'm uncomfortable with. The script no longer works unless it's been pip-installed, and that also doesn't seem desirable.
Resources living on the filesystem
The standard way to read a file adjacent to your python script would be:
a) If you've got Python >= 3.5, I'd suggest you use the pathlib module (pathlib itself arrived in 3.4, but Path.read_text was added in 3.5), like this:
from pathlib import Path

def fetch_wordlist(filename="wordlist.txt"):
    return (Path(__file__).parent / filename).read_text()

if __name__ == '__main__':
    print(fetch_wordlist())
b) And if you're still using a python version <3.4 or you still want to use the good old os.path module you should do something like this:
import os
def fetch_wordlist(filename="wordlist.txt"):
    with open(os.path.join(os.path.dirname(__file__), filename)) as f:
        return f.read()

if __name__ == '__main__':
    print(fetch_wordlist())
Also, I'd suggest you catch exceptions in the outer callers. The methods above are the standard way to read files in Python, so you don't really need to wrap them in a function like fetch_wordlist; in other words, reading a file in Python is already an "atomic" operation.
Now, it may happen that you've frozen your program using a freezer such as cx_freeze, pyinstaller or similar. In that case you need to detect it; here's a simple way to check:
a) using os.path:
import os
import sys

if getattr(sys, 'frozen', False):
    app_path = os.path.dirname(sys.executable)
elif __file__:
    app_path = os.path.dirname(__file__)
b) using pathlib:
import sys
from pathlib import Path

if getattr(sys, 'frozen', False):
    app_path = Path(sys.executable).parent
elif __file__:
    app_path = Path(__file__).parent
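Putting the frozen check and the file lookup together might look like the sketch below. The helper names `get_app_path` and `resource_path` are hypothetical, and the sketch assumes the data file sits next to the executable when frozen; some freezers place bundled data elsewhere, so check your freezer's documentation.

```python
import os
import sys


def get_app_path(script_file):
    """Directory of the frozen executable, or else of the given script file."""
    if getattr(sys, "frozen", False):
        # Frozen app: __file__ is unreliable, so use the executable's location.
        return os.path.dirname(os.path.abspath(sys.executable))
    return os.path.dirname(os.path.abspath(script_file))


def resource_path(script_file, filename):
    """Absolute path of a data file stored next to the script or executable."""
    return os.path.join(get_app_path(script_file), filename)
```

A script would call it as `resource_path(__file__, "wordlist.txt")` and pass the result to open().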
Resources living inside a zip file
The above solutions work if the code lives on the file system, but not if the package lives inside a zip file. In that case you could use either importlib.resources (new in version 3.7) or pkg_resources, as you've already shown in the question (perhaps wrapped in some helpers), or you could use a nice third-party library called importlib_resources, which should work with both old and modern Python versions:
pypi: https://pypi.org/project/importlib_resources/
documentation: https://importlib-resources.readthedocs.io/en/latest/
For your particular problem I'd suggest you take a look at https://importlib-resources.readthedocs.io/en/latest/using.html#file-system-or-zip-file.
If you'd rather not install a third-party library and want to know what it does behind the scenes, you can find the code for py2 here and py3 here and pick out the relevant bits for your problem.
I'm going to go out on a limb and make an assumption because it may drastically simplify your problem. The only way I can imagine that you can claim that this data is "stored in a file that will always be in the same location as the script" is because you created this data, once, and put it in a file in the source code directory. Even though this data is binary, have you considered making the data a literal byte-string in a python file, and then simply importing it as you would anything else?
You're right about the fact that your method of reading a file is a bit unnecessarily complex. Unless you have got a really specific reason to use the importlib and pkg_resources modules, it's rather simple.
import os
def fetch_wordlist():
    if not os.path.exists('wordlist.txt'):
        raise FileNotFoundError
    with open('wordlist.txt', 'rb') as wordlist:
        return wordlist.read()
You haven't given much information regarding your script, so I cannot comment on why it doesn't work unless it's installed using pip. My best guess: your script is probably packed into a python package.

How can I get the directory from a script called by another script in python via a function imported [duplicate]

When writing throwaway scripts it's often needed to load a configuration file, image, or some such thing from the same directory as the script. Preferably this should continue to work correctly regardless of the directory the script is executed from, so we may not want to simply rely on the current working directory.
Something like this works fine if defined within the same file you're using it from:
from os.path import abspath, dirname, join
def prepend_script_directory(s):
    here = dirname(abspath(__file__))
    return join(here, s)
It's not desirable to copy-paste or rewrite this same function into every module, but there's a problem: if you move it into a separate library, and import as a function, __file__ is now referencing some other module and the results are incorrect.
We could perhaps use this instead, but it seems like the sys.argv may not be reliable either.
import sys

def prepend_script_directory(s):
    here = dirname(abspath(sys.argv[0]))
    return join(here, s)
How to write prepend_script_directory robustly and correctly?
I would personally just os.chdir into the script's directory whenever I execute it. It is just:
import os
os.chdir(os.path.dirname(os.path.abspath(__file__)))
However if you did want to refactor this thing into a library, you are in essence wanting a function that is aware of its caller's state. You thus have to make it
prepend_script_directory(__file__, blah)
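A minimal sketch of that explicit version, where the caller hands over its own `__file__`. The parameter names `caller_file` and `name` are just illustrative.

```python
from os.path import abspath, dirname, join


def prepend_script_directory(caller_file, name):
    """Join name onto the directory that contains caller_file."""
    return join(dirname(abspath(caller_file)), name)
```

This can live in any shared library, because it never looks at the library's own `__file__`; each caller writes `prepend_script_directory(__file__, "config.ini")`.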
If you just wanted to write
prepend_script_directory(blah)
you'd have to do cpython-specific tricks with stack frames:
import inspect
def getCallerModule():
    # gets globals of the module this was called from, and prints out its __file__ global
    print(inspect.currentframe().f_back.f_globals['__file__'])
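Applied to the function from the question, the stack-frame trick could be sketched like this. Two things here are assumptions rather than part of the answer above: the fallback to the current working directory when the caller has no `__file__` (for example in an interactive session), and the guard for interpreters where `inspect.currentframe()` returns None.

```python
import inspect
import os


def prepend_script_directory(name):
    """Join name onto the directory of the module that called this function."""
    frame = inspect.currentframe()  # may be None on some non-CPython interpreters
    caller_globals = frame.f_back.f_globals if frame and frame.f_back else {}
    caller_file = caller_globals.get("__file__")
    if caller_file is None:
        # Assumed fallback: no __file__ available, so use the working directory.
        return os.path.join(os.getcwd(), name)
    return os.path.join(os.path.dirname(os.path.abspath(caller_file)), name)
```

Note that this only looks one frame up, so wrapping it in another helper changes whose `__file__` is found.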
I think the reason it doesn't smell right is that $PYTHONPATH (or sys.path) is the proper general mechanism to use.
You want pkg_resources
import pkg_resources
foo_fname = pkg_resources.resource_filename(__name__, "foo.txt")

How to pass entire directory into python via command line and sys.argv

So in the past, when I used a Unix server for my Python development and wanted to pass in an entire folder or directory, I would just put an asterisk (*) on the end of it. An example would be something like users/wilkenia/shakespeare/* to pass in a set of files containing each of Shakespeare's plays. Is there a way to do this in Windows? I've tried putting in C:\Users\Alexander\Desktop\coding-data-exam\wx_data* and the same with the drive name removed. Nothing has worked so far; in fact, it takes in the directory itself as the argument.
Edit: implemented glob, getting a permissions error, even though I'm running as administrator. Here's my code if anyone wants to have a look.
To show how you can use pathlib to achieve this result, you can do something like this:
some_script.py:
import sys
from pathlib import Path

path = Path(sys.argv[1])
glob_path = path.glob('*')
for file_path in glob_path:
    print(file_path)
Demo:
python some_script.py C:/some/path/
Output:
C:/some/path/depth_1.txt
C:/some/path/dude.docx
C:/some/path/dude.py
C:/some/path/dude_bock.txt
The nice thing about pathlib is that it takes an object-oriented approach that makes working with the filesystem easier.
Note: pathlib is available out-of-the-box from Python 3.4 and above. If you are using an older version of Python, you will need to use the backported package that you can get from pypi: here
Simply: pip install pathlib2
You can use the glob module, it does exactly this.
A quick demo:
In [81]: import glob
In [82]: glob.glob('*')
Out[82]:
[
... # a bunch of my personal files from my cwd
]
If you want to extend this for your use case, you'll need to do something along the lines of:
import sys
import glob
arg = sys.argv[1]
for file in glob.glob(arg):
    ...
You'll read your args with sys.argv and pass it onto glob.
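On Unix the shell expands the asterisk before Python starts, whereas the Windows command prompt passes the pattern through unchanged, so the script has to expand it. A small sketch that accepts either a directory or a wildcard pattern follows; the helper name `expand_argument` is hypothetical. It keeps regular files only, since calling open() on a directory is one possible source of a permissions error on Windows.

```python
import glob
import os


def expand_argument(arg):
    """Expand one command-line argument into a sorted list of file paths."""
    if os.path.isdir(arg):
        # A bare directory: treat it as "everything directly inside it".
        arg = os.path.join(arg, "*")
    # Keep regular files only, so subdirectories are never passed to open().
    return sorted(p for p in glob.glob(arg) if os.path.isfile(p))
```

In a script this would be called as `expand_argument(sys.argv[1])`.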

Making a directory for a file

I was making an exercise generator algorithm for my friend, but I stumbled across a problem. It is a Python program, and I wanted to create a folder in the directory above the program's location (for example, the Python file is in 'C:\Documents\foo' and the folder should be created in 'C:\Documents') so that it could then store the file the program created. Is there a way to do this or should I try something else?
Use the path argument of the os.mkdir() function.
Getting the current script directory is not a built-in feature, but there are multiple hacks suggested here.
Once you get the current script directory, you can build a path based off of that.
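As a rough sketch of those two steps with pathlib (Python 3.5+ for `exist_ok`): the function name `make_folder_above_script` and its parameters are hypothetical, and the script passes in its own `__file__`.

```python
from pathlib import Path


def make_folder_above_script(script_file, folder_name):
    """Create folder_name in the directory above the script's own directory."""
    script_dir = Path(script_file).resolve().parent  # e.g. C:\Documents\foo
    target = script_dir.parent / folder_name         # e.g. C:\Documents\<folder_name>
    target.mkdir(parents=True, exist_ok=True)        # no error if it already exists
    return target
```

Calling `make_folder_above_script(__file__, "exercises")` returns the created path, which can then be used to build the output file name.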
Not super familiar with Python in a Windows environment, but this should be easily do-able. Here is a similar question that might be worth looking at: How to check if a directory exists and create it if necessary?
Looks like the pathlib module might do what you are looking for.
from pathlib import Path
path = Path("/my/directory/filename.txt")
try:
    if not path.parent.exists():
        path.parent.mkdir(parents=True)
except OSError:
    # handle error; you can also catch specific errors like
    # FileExistsError and so on.
    raise
Appears to work on Win 7 with Python 2.7.8 as described:
import os
createDir = os.path.join(os.path.abspath(os.path.join(os.getcwd(), os.pardir)), 'Foo')
if not os.path.exists(createDir):
    os.makedirs(createDir)

Way to access resource files in python

What is the proper way to access resources in Python programs?
Basically in many of my python modules I end up writing code like that:
DIRNAME = os.path.split(__file__)[0]
(...)
template_file = os.path.join(DIRNAME, "template.foo")
Which is OK but:
It will break if I start to use Python zip packages
It is boilerplate code
In Java I had a function that did exactly the same --- but worked both when the code was lying in a bunch of folders and when it was packaged in a .jar file.
Is there such a function in Python, or is there any other pattern that I might use?
You'll want to look at using either pkgutil.get_data in the stdlib or pkg_resources from setuptools/distribute. Which one you use probably depends on whether you're already using distribute to package your code as an egg.
Since version 3.7 of Python, the proper way to access a file in resources is to use the importlib.resources library.
One can, for example, use the path function to access a particular file in a Python package:
import importlib.resources
with importlib.resources.path("your.package.templates", "template.foo") as template_file:
    ...
Starting with Python 3.9, this package introduced the files() API, to be preferred over the legacy API.
One can use the files function to access a particular file in a Python package:
template_res = importlib.resources.files("your.package.templates").joinpath("template.foo")
with importlib.resources.as_file(template_res) as template_file:
    ...
For older versions, I recommend installing and using the importlib-resources library. The documentation also explains in detail how to migrate your old implementation using pkg_resources to importlib-resources.
Trying to understand how we could combine the two aspects together:
Loading resources from the native filesystem
Loading resources packaged in zipped files
Reading through the quick tutorial on zipimport : http://www.doughellmann.com/PyMOTW/zipimport/
I see the following example:
import sys
sys.path.insert(0, 'zipimport_example.zip')
import os
import zipimport
importer = zipimport.zipimporter('zipimport_example.zip')
module = importer.load_module('example_package')
print(module.__file__)
print(module.__loader__.get_data('example_package/README.txt'))
I think that output of __file__ is "zipimport_example.zip/example_package/__init__.pyc"
Need to check how it looks from inside.
But then we could always do something like this:
if ".zip" in example_package.__file__:
    ...
    load using get_data
else:
    load by building the correct file path
[Edit:] I have tried to work out the example a bit better.
If the package gets imported from a zipped file then two things happen:
__file__ contains ".zip" in its path.
__loader__ is available in the namespace
If these two conditions are met then within the package you could do:
print(__loader__.get_data(os.path.join('package_name','README.txt')))
else the module was loaded normally and you can follow the regular approach to loading the file.
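The branch on ".zip" may not be needed at all: the standard library's pkgutil.get_data asks the package's loader for the bytes, so the same call covers both a normal directory and a zip archive. The sketch below builds a throwaway zip to show this; the package name `demo_zip_pkg`, the archive name and both helper functions are made up for the demonstration.

```python
import os
import pkgutil
import sys
import tempfile
import zipfile


def build_demo_zip(directory):
    """Create a zip holding a tiny hypothetical package plus a data file."""
    zip_path = os.path.join(directory, "demo_bundle.zip")
    with zipfile.ZipFile(zip_path, "w") as zf:
        zf.writestr("demo_zip_pkg/__init__.py", "")
        zf.writestr("demo_zip_pkg/README.txt", "hello from the zip\n")
    return zip_path


def read_readme_from_zip():
    """Import the package from the zip and read its README through the loader."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = build_demo_zip(tmp)
        sys.path.insert(0, zip_path)  # zip archives on sys.path are importable
        try:
            # Same call as for a package in an ordinary directory.
            return pkgutil.get_data("demo_zip_pkg", "README.txt")
        finally:
            sys.path.remove(zip_path)
            sys.modules.pop("demo_zip_pkg", None)
```

The result is a bytes object, so decode it if you need text.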
I guess the zipimport standard python module could be an answer...
EDIT: well, not the use of the module directly, but using sys.path as shown in the example could be a good way:
I have a zip file test.zip with one python module test and a file test.foo inside
To test that the zipped Python module test can be aware of test.foo, it contains this code:
import os
DIRNAME = os.path.dirname(__file__)
if os.path.exists(os.path.join(DIRNAME, 'test.foo')):
    print('OK')
else:
    print('KO')
Test looks ok:
>>> import sys
>>> sys.path.insert(0, r'D:\DATA\FP12210\My Documents\Outils\SVN\05_impl\2_tools\test.zip')
>>> import test
OK
>>>
So a solution could be to loop in your zip file to retrieve all python modules, and add them in sys.path; this piece of code would be ideally the 1st one loaded by your application.
