Detecting symlinks (mklink) on Vista/7 in Python without Pywin32 - python

Currently the buildout recipe collective.recipe.omelette uses junction.exe on all versions of Windows to create symlinks. However junction.exe does not come with Windows by default and most importantly does not support creating symlinks to files (only directories) which causes a problem with quite a few Python packages.
On NT6+ (Vista and 7) there is now the mklink utility that not only comes by default but is also capable of creating symlinks to files as well as directories. I would like to update collective.recipe.omelette to use this if available and have done so except for one otherwise simple feature; detecting whether a file or folder is actually a symlink. Since this is a small buildout recipe, requiring Pywin32 in my opinion is a bit too much (unless setuptools could somehow only download it on Windows?).
Currently on Windows what omelette does is call junction.exe on the folder and then grep the response for "Substitute Name:" but I can't find anything as simple for mklink.
The only method I can think of is to call "dir" in the directory and then to go through the response line by line looking for "<SYMLINK>" and the folder/filename on the same line. Surely there is something better?

See jaraco.windows.filesystem (part of the jaraco.windows package) for extensive examples on symlink operations in Windows without pywin32.

Could you use ctypes to access the various needed functions and structures? this patch, currently under discussion, is intended to add symlink functionality to module os under Vista and Windows 7 -- but it won't be in before Python 2.7 and 3.2, so the wait (and the requirement for the very latest versions when they do eventually come) will likely be too long; a ctypes-based solution might tide you over and the code in the patch shows what it takes in C to do it (and ctypes-based programmed in Python is only a bit harder than the same programming in C).
Unless somebody's already released some other stand-alone utility like junction.exe for this purpose, I don't see other workable approaches (until that patch finally makes it into a future Python's stdlib) beyond using ctypes, Pywin32, or dir, and you've already ruled out the last two of these three options...!-)

On windows junctions and symbolic links have the attribute FILE_ATTRIBUTE_REPARSE_POINT (0x400) for reparse points. If you get the file's attributes, then detect this on?
You could use ctypes (as stated in the other answer) to access Kernel32.dll and GetFileAttributes, and detect this value.

You could leverage the Tcl you have available with Tkinter, as that has a 'file link' command that knows about junctions, unlike Pythons os module.

I have searched widely for a better solution, however python 2.7 just does not have a good solution for this. So eventually, I ended up with this (code below), its admittedly ugly, but it's downright pretty compared to all the ctypes hacks I've seen.
import os, subprocess
def realpath(path):
# not a folder path, ignore
if (not os.path.isdir(path)):
return path
rootpath = os.path.abspath(path)
oneup, foldername = os.path.split(rootpath)
output = subprocess.check_output("dir " + oneup, shell=True)
links = {}
for line in output.splitlines():
pos = line.find("<SYMLINKD>")
if not pos == -1:
link = line[pos+15:].split()
links[link[0]] = link[1].strip("[]")
return links[foldername] if links.has_key(foldername) else rootpath

Related

Analyzing impact of adding an import-path

I have the following directory structure:
TestFolder:
test.py
CommonFolder:
common.py
And in test.py, I need to import common.py.
In order to do that, in test.py I add the path of CommonFolder to the system paths.
Here is what I started off with:
sys.path.append(os.path.join(os.path.dirname(os.path.dirname(__file__)), 'CommonFolder'))
Then I figured that / is a valid separator in pretty much every OS, so I changed it to:
sys.path.append(os.path.dirname(os.path.dirname(__file__)) + '/CommonFolder')
Then I figured that .. is also a valid syntax in pretty much every OS, so I changed it to:
sys.path.append(os.path.dirname(__file__) + '/../CommonFolder')
My questions:
Are my assumptions above correct, and will the code run correctly on every OS?
In my last change, I essentially add a slightly longer path to the system paths. More precisely - FullPath/TestFolder/../CommonFolder instead of FullPath/CommonFolder. Is the any runtime impact to this? I suppose that every import statement might be executed slightly slower, but even if so, that would be minor. Is there any good reason not to do it this way?
If you're writing code to span multiple Operating Systems it's best not to try to construct the paths yourself. Between Linux and Windows you immediately run into the forward vs backwards slash issue, just as an example.
I'd recommend looking into the Python pathlib library. It handles generating paths for different operating systems.
https://docs.python.org/3/library/pathlib.html
This is a great blog about this subject and how to use the library:
https://medium.com/#ageitgey/python-3-quick-tip-the-easy-way-to-deal-with-file-paths-on-windows-mac-and-linux-11a072b58d5f
UPDATE:
Updating this with a more specific answer.
Regarding the directory paths, as long as you're not building the paths yourself (using a utility such as pathlib) the paths you've created should be fine. Linux, Mac, and Windows all support relative paths (both mac and linux are Unix ~based of course).
As for whether it's efficient, unless you're frequently dynamically loading or reloading your source files (which is not common) most files are loaded into memory before the code is run, so there would be no performance impact on setting up the file paths in this way.
Use os.path.join() for OS independent path separator instead of /
Example: os.path.join(os.path.dirname(__file__),"..","CommonFolder")
Or instead you can make CommonFolder as python package by just placing a empty file by name __init__.py inside CommonFolder. After that you can simply import common in test.py as:-
from CommonFolder import common

Freezing shared objects with cx_freeze on Linux

I am trying to CX_Freeze an application for the Linux Platform. The Windows MSI installer works perfectly but the Linux counter-part doesn't really function the way I want it.
When the package is build it runs perfectly on the original system, but when ported to a different system (although same architecture) it generates a segfault. First thing I did was check the librarys and there are some huge version differences with libc, pthread and libdl. So I decided to include these in the build, like so:
if windows_build:
build_exe_options['packages'].append("win32net")
build_exe_options['packages'].append("win32security")
build_exe_options['packages'].append("win32con")
pywintypes_dll = 'pywintypes{0}{1}.dll'.format(*sys.version_info[0:2]) # e.g. pywintypes27.dll
build_exe_options['include_files'].append((os.path.join(GetSystemDirectory(), pywintypes_dll), pywintypes_dll))
else:
build_exe_options['packages'].append("subprocess")
build_exe_options['packages'].append("encodings")
arch_lib_path = ("/lib/%s-linux-gnu" % os.uname()[4])
shared_objects = ["libc.so.6", "libpthread.so.0", "libz.so.1", "libdl.so.2", "libutil.so.1", "libm.so.6", "libgcc_s.so.1", "ld-linux-x86-64.so.2"]
lib_paths = ["/lib", arch_lib_path, "/lib64"]
for so in shared_objects:
for lib in lib_paths:
lib_path = "%s/%s" % (lib, so)
if os.path.isfile(lib_path):
build_exe_options['include_files'].append((lib_path, so))
break
After checking the original cx_frozen bin it seems the dynamic libraries play there part and intercept the calls perfectly. Although now I am at the part where pthread segfaults due to the fact that he tries the systems libc instead of mine (checked with ldd and gdb).
My question is quite simple, this method I am trying is horrible as it doesn't do the recursive depency resolving. Therefore my question would be "what is the better way of doing this? or should I write recursive depency solving within my installer?"
And in order to beat the Solution: "Use native Python instead", we got some hardware appliances (think 2~4U) with Linux (and Bash access) where we want to run this aswell. porting entire python (with its dynamic links etcetc) seemd like way to much work when we can cx_freeze it and ship the librarys with.
I don't know about your other problems, but shipping libc.so.6 to another system the way you do it can not possibly work, as explained here.

Can I use msilib or other Python libraries to extract one file from an .msi file?

What I really want to do is determine whether a particular file in the MSI exists and contains a particular string.
My current idea is to run:
db = msilib.OpenDatabase('c:\Temp\myfile.msi',1)
query = "select * from File"
view = db.OpenView(query)
view.Execute(None)
cur_record = view.Fetch() # do this until I get the record I want
print cur_record.GetString(3) # do stuff with this value
And then if it's there, extract all the files using
msiexec /a c:\Temp\myfile.msi /qn TARGETDIR=c:\foo
and use whatever parser to see whether my string is there. But I'm hoping a less clunky way exists.
Note that, as the docs for msilib say, "Support for reading .cab files is currently not implemented". And. more generally, the library is designed for building .msi files, not reading them. And there is nothing else in the stdlib that will do what you want.
So, there are a few possibilities:
Find and install another library, like pycabinet. I know nothing about this particular library; it's just the first search hit I got; you probably want to search on your own. But it claims to provide a zipfile-like API for CAB files, which sounds like exactly the part you're missing.
Use win32com (if you've got pywin32) or ctypes (if you're a masochist) to talk to the underlying COM interfaces and/or the classic Cabinet API (which I think is now deprecated, but still works).
Use IronPython instead of CPython, so you can use the simpler .NET interfaces.
Since I don't have a Windows box here, I can't test this, but here's a sketch of Christopher Painter's .NET solution written in IronPython instead of C#:
import clr
clr.AddReference('Microsoft.Deployment.WindowsInstaller')
clr.AddReference('Microsoft.Deployment.WindowsInstaller.Package')
from Microsoft.Deployment.WindowsInstaller import *
from Microsoft.Deployment.WindowsInstaller.Package import *
def FindAndExtractFiles(packagePath, longFileName):
with InstallPackage(packagePath, DatabaseOpenMode.ReadOnly) as installPackage:
if installPackage.FindFiles(longFileName).Count() > 0:
installPackage.ExtractFiles()
Realize that in using Python you have to deal with the Windows Installer (COM) Automation interface. This means you have to do all the database connections, querying and processing yourself.
If you could move to C# ( or say PowerShell ) you could leverage some higher level classes that exist in Windows Installer XML (WiX) Deployment Tools Foundation (DTF).
using Microsoft.Deployment.WindowsInstaller;
using Microsoft.Deployment.WindowsInstaller.Package;
static void FindAndExtractFiles(string packagePath, string longFileName)
{
using (var installPackage = new InstallPackage(packagePath, DatabaseOpenMode.ReadOnly))
{
if(installPackage.FindFiles(longFileName).Count() > 0 )
installPackage.ExtractFiles();
}
}
You could also write this as ComVisible(True) and call it from Python.
The MSI APIs are inherently clunky, so it's only a matter of where the abstraction lies. Bear in mind that if you just need this a couple times, it may be easier to browse the cab file(s) manually in Explorer. (Files are stored by file key instead of file name).

Reliable way to get path to py file of a module

I'm trying to figure out the best way to reliably discover at runtime the location on the file system of the py file for a given module. I need to do this because I plan to externalize some configuration data about some methods (in this case, schemas to be used for validation of responses from service calls for which interfaces are defined in a module) for cleanliness and ease of maintenance.
An simplified illusutration of the system:
package
|
|-service.py
|
|-call1.scm
|
|-call2.scm
service.py (_call() is a method on the base class, though that's irrelevant to the question)
class FooServ(AbstractService):
def call1(*args):
result = self._call('/relative/uri/call1', *args)
# additional call specific processing
return result
def call2(*args):
result = self._call('/relative/uri/call2', *args)
# additional call specific processing
return result
call1.scm and call2.scm define the response schemas (in the current case, using the draft JSON schema format, though again, irrelevant to the question)
In another place in the codebase, when the service calls are actually made, I want to be able to detect the location of service.py so that I can traverse the file structure and find the scm files. At least on my system, I think that this will work:
# I realize this is contrived here, but in my code, the method is accessed this way
method = FooServ().call1
module_path = sys.modules[method.__self__.__class__.__module__].__file__
schema_path = os.path.join(os.path.dirname(module_path), method.__name__ + '.scm')
However, I wanted to make sure this would be safe on all platforms and installation configurations, and I came across this while doing research, which made me concerned that trying to do this this way will not work reliably. Will this work universally, or is the fact that __file__ on a module object returns the location of the pyc file, which could be in some location other than along side the py file, going to make this solution ineffective? If it will make it ineffective, what, if anything, can I do instead?
In the PEP you link, it says:
In Python 3, when you import a module, its __file__ attribute points to its source py file (in Python 2, it points to the pyc file).
So in Python 3 you're fine because __file__ will always point to the .py file. In Python 2 it might point to the .pyc file, but that .pyc will only ever be in the same directory as the .py file for Python 2.
Okay, I think you're referring to this bit:
Because these distributions cannot share pyc files, elaborate mechanisms have been developed to put the resulting pyc files in non-shared locations while the source code is still shared. Examples include the symlink-based Debian regimes python-support [8] and python-central [9]. These approaches make for much more complicated, fragile, inscrutable, and fragmented policies for delivering Python applications to a wide range of users.
I believe those mechanisms are applied only to Python modules that are packaged by the distribution. I don't think they should affect modules installed manually outside of the distribution's packaging system. It would be the responsibility of whoever was packaging your module for that distribution to make sure that the module isn't broken by the mechanism.

Most efficient way to traverse file structure Python

Is using os.walk in the way below the least time consuming way to recursively search through a folder and return all the files that end with .tnt?
for root, dirs, files in os.walk('C:\\data'):
print "Now in root %s" %root
for f in files:
if f.endswith('.tnt'):
Yes, using os.walk is indeed the best way to do that.
As everyone has said, os.walk is almost certainly the best way to do it.
If you actually have a performance problem, and profiling has shown that it's caused by os.walk (and/or iterating the results with .endswith), your best answer is probably to step outside Python. Replace all of the code above with:
for f in sys.argv[1:]:
Now you need some outside tool that can gather the paths and run your script. (Ideally batching as many paths as possible into each script execution.)
If you can rely on Windows Desktop Search having indexed the drive, it should only need to do a quick database operation to find all files under a certain path with a certain extension. I have no idea how to write a batch file that runs that query and gets the results as a list of arguments to pass to a Python script (or a PowerShell file that runs the query and passes the results to IronPython without serializing it into a list of arguments), but it would be worth researching this before anything else.
If you can't rely on your platform's desktop search index, on any POSIX platform, it would almost certainly be fastest and simplest to use this one-liner shell script:
find /my/path -name '*.tnt' -exec myscript.py {} +
Unfortunately, you're not on a POSIX platform, you're on Windows, which doesn't come with the find tool, which is the thing that's doing all the heavy lifting here.
There are ports of find to native Windows, but you'll have to figure out the command-line intricaties to get everything quoted right and format the path and so on, so you can write the one-liner batch file. Alternatively, you could install cygwin and use the exact same shell script you'd use on a POSIX system. Or you could find a more Windows-y tool that does what you need.
This could conceivably be slower rather than faster—Windows isn't designed to execute lots of little processes with as little overhead as possible, and I believe it has smaller limits on command lines than platforms like linux or OS X, so you may spend more time waiting for the interpreter to start and exit than you save. You'd have to test to see. In fact, you probably want to test both native and cygwin versions (with both native and cygwin Python, in the latter case).
You don't actually have to move the find invocation into a batch/shell script; it's probably the simplest answer, but there are others, such as using subprocess to call find from within Python. This might solve performance problems caused by starting the interpreter too many times.
Getting the right amount of parallelism may also help—spin off each invocation of your script to the background and don't wait for them to finish. (I believe on Windows, the shell isn't involved in this; instead there's a tool named something like "run" that kicks off a process detached from the shell. But I don't remember the details.)
If none of this works out, you may have to write a custom C extension that does the fastest possible Win32 or .NET thing (which also means you have to do the research to find out what that is…) so you can call that from within Python.

Categories

Resources