Pythonic way to retrieve case sensitive path? - python

I was wondering if there was a faster way to implement a function that returns a case-sensitive path in python. One of the solutions I came up with works with both linux and windows, but requires that I iterate os.listdir, which can be slow.
This solution works fine for an application and context that does not need plenty of speed:
def correctPath(start, path):
'Returns a unix-type case-sensitive path, works in windows and linux'
start = unicode(start);
path = unicode(path);
b = '';
if path[-1] == '/':
path = path[:-1];
parts = path.split('\\');
d = start;
c = 0;
for p in parts:
listing = os.listdir(d);
_ = None;
for l in listing:
if p.lower() == l.lower():
if p != l:
c += 1;
d = os.path.join(d, l);
_ = os.path.join(b, l);
break;
if not _:
return None;
b = _;
return b, c; #(corrected path, number of corrections)
>>> correctPath('C:\\Windows', 'SYSTEM32\\CmD.EXe')
(u'System32\\cmd.exe', 2)
This however, will not be as fast when the context is gathering filenames from a large 50,000+ entry database.
One method would be to create a dict tree for each directory. Match the dict tree with the directory parts of the path, and if a key-miss occurs, perform an os.listdir to find and create a dict entry for the new directory and remove the unused parts or keep a variable counter as a way to assign a "lifetime" to each directory.

The following is a slight re-write of your own code with three modifications: checking if the filename is already correct before matching, processing the listing to lowercase before testing, using index to find the relevant 'true case' file.
def corrected_path(start, path):
'''Returns a unix-type case-sensitive path, works in windows and linux'''
start = unicode(start)
path = unicode(path)
corrected_path = ''
if path[-1] == '/':
path = path[:-1]
parts = path.split('\\')
cd = start
corrections_count = 0
for p in parts:
if not os.path.exists(os.path.join(cd,p)): # Check it's not correct already
listing = os.listdir(cd)
cip = p.lower()
cilisting = [l.lower() for l in listing]
if cip in cilisting:
l = listing[ cilisting.index(cip) ] # Get our real folder name
cd = os.path.join(cd, l)
corrected_path = os.path.join(corrected_path, l)
corrections_count += 1
else:
return False # Error, this path element isn't found
else:
cd = os.path.join(cd, p)
corrected_path = os.path.join(corrected_path, p)
return corrected_path, corrections_count
I'm not sure if this will be much faster, though there is a little less testing going on, plus the 'already-correct' catch at the beginning may help.

An extended version with case-insensitive caching to pull out the corrected path:
import os,re
def corrected_paths(start, pathlist):
''' This wrapper function takes a list of paths to correct vs. to allow caching '''
start = unicode(start)
pathlist = [unicode(path[:-1]) if path[-1] == '/' else unicode(path) for path in pathlist ]
# Use a dict as a cache, storing oldpath > newpath for first-pass replacement
# with path keys from incorrect to corrected paths
cache = dict()
corrected_path_list = []
corrections_count = 0
path_split = re.compile('(/+|\+)')
for path in pathlist:
cd = start
corrected_path = ''
parts = path_split.split(path)
# Pre-process against the cache
for n,p in enumerate(parts):
# We pass *parts to send through the contents of the list as a series of strings
uncorrected_path= os.path.join( cd, *parts[0:len(parts)-n] ).lower() # Walk backwards
if uncorrected_path in cache:
# Move up the basepath to the latest matched position
cd = os.path.join(cd, cache[uncorrected_path])
parts = parts[len(parts)-n:] # Retrieve the unmatched segment
break; # First hit, we exit since we're going backwards
# Fallback to walking, from the base path cd point
for n,p in enumerate(parts):
if not os.path.exists(os.path.join(cd,p)): # Check it's not correct already
#if p not in os.listdir(cd): # Alternative: The above does not work on Mac Os, returns case-insensitive path test
listing = os.listdir(cd)
cip = p.lower()
cilisting = [l.lower() for l in listing]
if cip in cilisting:
l = listing[ cilisting.index(cip) ] # Get our real folder name
# Store the path correction in the cache for next iteration
cache[ os.path.join(cd,p).lower() ] = os.path.join(cd, l)
cd = os.path.join(cd, l)
corrections_count += 1
else:
print "Error %s not in folder %s" % (cip, cilisting)
return False # Error, this path element isn't found
else:
cd = os.path.join(cd, p)
corrected_path_list.append(cd)
return corrected_path_list, corrections_count
On an example run for a set of paths, this reduces the number of listdirs considerably (this is obviously dependent on how alike your paths are):
corrected_paths('/Users/', ['mxF793/ScRiPtS/meTApaTH','mxF793/ScRiPtS/meTApaTH/metapAth/html','mxF793/ScRiPtS/meTApaTH/metapAth/html/css','mxF793/ScRiPts/PuBfig'])
([u'/Users/mxf793/Scripts/metapath', u'/Users/mxf793/Scripts/metapath/metapath/html', u'/Users/mxf793/Scripts/metapath/metapath/html/css', u'/Users/mxf793/Scripts/pubfig'], 14)
([u'/Users/mxf793/Scripts/metapath', u'/Users/mxf793/Scripts/metapath/metapath/html', u'/Users/mxf793/Scripts/metapath/metapath/html/css', u'/Users/mxf793/Scripts/pubfig'], 5)
On the way to this I realised the on Mac OSX Python returns path matches as if they are case-insensitive, so the test for existence always succeeds. In that case the listdir can be shifted up to replace it.

Related

How do I Check a Folder's 'hidden' Attribute in Python?

I am working on a load/save module (a GUI written in Python) that will be used with and imported to future programs. My operating system is Windows 10. The problem I've run into is that my get_folders() method is grabbing ALL folder names, including ones that I would rather ignore, such as system folders and hidden folders (best seen on the c-drive).
I have a work around using a hard-coded exclusion list. But this only works for folders already on the list, not for hidden folders that my wizard may come across in the future. I would like to exclude ALL folders that have their 'hidden' attribute set. I would like to avoid methods that require installing new libraries that would have to be re-installed whenever I wipe my system. Also, if the solution is non-Windows specific, yet will work with Windows 10, all the better.
I have searched SO and the web for an answer but have come up empty. The closest I have found is contained in Answer #4 of this thread: Check for a folder, then create a hidden folder in Python, which shows how to set the hidden attribute when creating a new directory, but not how to read the hidden attribute of an existing directory.
Here is my Question: Does anyone know of a way to check if a folder's 'hidden' attribute is set, using either native python, pygame, or os. commands? Or, lacking a native answer, I would accept an imported method from a library that achieves my goal.
The following program demonstrates my get_folders() method, and shows the issue at hand:
# Written for Windows 10
import os
import win32gui
CLS = lambda :os.system('cls||echo -e \\\\033c') # Clear-Command-Console function
def get_folders(path = -1, exclude_list = -1):
if path == -1: path = os.getcwd()
if exclude_list == -1: exclude_list = ["Config.Msi","Documents and Settings","System Volume Information","Recovery","ProgramData"]
dir_list = [entry.name for entry in os.scandir(path) if entry.is_dir()] if exclude_list == [] else\
[entry.name for entry in os.scandir(path) if entry.is_dir() and '$' not in entry.name and entry.name not in exclude_list]
return dir_list
def main():
HWND = win32gui.GetForegroundWindow() # Get Command Console Handle.
win32gui.MoveWindow(HWND,100,50,650,750,False) # Size and Position Command Console.
CLS() # Clear Console Screen.
print(''.join(['\n','Folder Names'.center(50),'\n ',('-'*50).center(50)]))
# Example 1: Current Working Directory
dirs = get_folders() # Get folder names in current directory (uses exclude list.)
for elm in dirs:print(' ',elm) # Show the folder names.
print('','-'*50)
# Examle 2: C Drive, All Folders Included
dirs = get_folders('c:\\', exclude_list = []) # Get a list of folders in the root c: drive, nothing excluded.
for elm in dirs: print(' ',elm) # Show the fiolder names
print('','-'*50)
# Example 3: C Drive, Excluded Folder List Work-Around
dirs = get_folders('c:\\') # Get a list of folders in the root c: drive, excluding sytem dirs those named in the exclude_list.
for elm in dirs:print(' ',elm) # Show the folder names.
print("\n Question: Is there a way to identify folders that have the 'hidden' attribute\n\t set to True, rather than using a hard-coded exclusion list?",end='\n\n' )
# ==========================
if __name__ == "__main__":
main()
input(' Press [Enter] to Quit: ')
CLS()
Here is my revised get_folders() method, with thanks to Alexander.
# Written for Windows 10
# key portions of this code borrowed from:
# https://stackoverflow.com/questions/40367961/how-to-read-or-write-the-a-s-h-r-i-file-attributes-on-windows-using-python-and-c/40372658#40372658
# with thanks to Alexander Goryushkin.
import os
from os import scandir, stat
from stat import (
FILE_ATTRIBUTE_ARCHIVE as A,
FILE_ATTRIBUTE_SYSTEM as S,
FILE_ATTRIBUTE_HIDDEN as H,
FILE_ATTRIBUTE_READONLY as R,
FILE_ATTRIBUTE_NOT_CONTENT_INDEXED as I
)
from ctypes import WinDLL, WinError, get_last_error
import win32gui
CLS = lambda :os.system('cls||echo -e \\\\033c') # Clear-Command-Console function
def read_or_write_attribs(kernel32, entry, a=None, s=None, h=None, r=None, i=None, update=False):
# Get the file attributes as an integer.
if not update: attrs = entry.stat(follow_symlinks=False).st_file_attributes# Fast because we access the stats from the entry
else:
# Notice that this will raise a "WinError: Access denied" on some entries,
# for example C:\System Volume Information\
attrs = stat(entry.path, follow_symlinks=False).st_file_attributes
# A bit slower because we re-read the stats from the file path.
# Construct the new attributes
newattrs = attrs
def setattrib(attr, value):
nonlocal newattrs
# Use '{0:032b}'.format(number) to understand what this does.
if value is True: newattrs = newattrs | attr
elif value is False:
newattrs = newattrs & ~attr
setattrib(A, a)
setattrib(S, s)
setattrib(H, h)
setattrib(R, r)
setattrib(I, i if i is None else not i) # Because this attribute is True when the file is _not_ indexed
# Optional add more attributes here.
# See https://docs.python.org/3/library/stat.html#stat.FILE_ATTRIBUTE_ARCHIVE
# Write the new attributes if they changed
if newattrs != attrs:
if not kernel32.SetFileAttributesW(entry.path, newattrs):
raise WinError(get_last_error())
# Return an info tuple consisting of bools
return ( bool(newattrs & A),
bool(newattrs & S),
bool(newattrs & H),
bool(newattrs & R),
not bool(newattrs & I) )# Because this attribute is true when the file is _not_ indexed)
# Get the file attributes as an integer.
if not update:
# Fast because we access the stats from the entry
attrs = entry.stat(follow_symlinks=False).st_file_attributes
else:
# A bit slower because we re-read the stats from the file path.
# Notice that this will raise a "WinError: Access denied" on some entries,
# for example C:\System Volume Information\
attrs = stat(entry.path, follow_symlinks=False).st_file_attributes
return dir_list
def get_folders(path = -1, show_hidden = False):
if path == -1: path = os.getcwd()
dir_list = []
kernel32 = WinDLL('kernel32', use_last_error=True)
for entry in scandir(path):
a,s,hidden,r,i = read_or_write_attribs(kernel32,entry)
if entry.is_dir() and (show_hidden or not hidden): dir_list.append(entry.name)
return dir_list
def main():
HWND = win32gui.GetForegroundWindow() # Get Command Console Handle.
win32gui.MoveWindow(HWND,100,50,650,750,False) # Size and Position Command Console.
CLS() # Clear Console Screen.
line_len = 36
# Example 1: C Drive, Exclude Hidden Folders
print(''.join(['\n','All Folders Not Hidden:'.center(line_len),'\n ',('-'*line_len).center(line_len)]))
dirs = get_folders('c:\\') # Get a list of folders on the c: drive, exclude hidden.
for elm in dirs:print(' ',elm) # Show the folder names.
print('','='*line_len)
# Examle 2: C Drive, Include Hidden Folders
print(" All Folders Including Hidden\n "+"-"*line_len)
dirs = get_folders('c:\\', show_hidden = True) # Get a list of folders on the c: drive, including hidden.
for elm in dirs: print(' ',elm) # Show the fiolder names
print('','-'*line_len)
# ==========================
if __name__ == "__main__":
main()
input(' Press [Enter] to Quit: ')
CLS()

How to calculate a Directory size in ADLS using PySpark?

I want to calculate a directory(e.g- XYZ) size which contains sub folders and sub files.
I want total size of all the files and everything inside XYZ.
I could find out all the folders inside a particular path. But I want size of all together.
Also I see
display(dbutils.fs.ls("/mnt/datalake/.../XYZ/.../abc.parquet"))
gives me data size of abc file.
But I want complete size of XYZ.
The dbutils.fs.ls doesn't have a recurse functionality like cp, mv or rm. Thus, you need to iterate yourself. Here is a snippet that will do the task for you. Run the code from a Databricks Notebook.
from dbutils import FileInfo
from typing import List
root_path = "/mnt/datalake/.../XYZ"
def discover_size(path: str, verbose: bool = True):
def loop_path(paths: List[FileInfo], accum_size: float):
if not paths:
return accum_size
else:
head, tail = paths[0], paths[1:]
if head.size > 0:
if verbose:
print(f"{head.path}: {head.size / 1e6} MB")
accum_size += head.size / 1e6
return loop_path(tail, accum_size)
else:
extended_tail = dbutils.fs.ls(head.path) + tail
return loop_path(extended_tail, accum_size)
return loop_path(dbutils.fs.ls(path), 0.0)
discover_size(root_path, verbose=True) # Total size in megabytes at the end
If the location is mounted in the dbfs. Then you could use the du -h approach (have not test it). If you are in the Notebook, create a new cell with:
%sh
du -h /mnt/datalake/.../XYZ
The #Emer answer is good, but can hit a RecursionError: maximum recursion depth exceeded really quickly, because it does a recursion for each files (if you have X files you will have X imbricated recursions).
Here is the same thing with recursion only for folders:
%python
from dbutils import FileInfo
from typing import List
def discover_size2(path: str, verbose: bool = True):
def loop_path(path: str):
accum_size = 0.0
path_list = dbutils.fs.ls(path)
if path_list:
for path_object in path_list:
if path_object.size > 0:
if verbose:
print(f"{path_object.path}: {path_object.size / 1e6} MB")
accum_size += path_object.size / 1e6
else:
# Folder: recursive discovery
accum_size += loop_path(path_object.path)
return accum_size
return loop_path(path)
Try using the dbutils ls command, get the list of files in a dataframe and query by using aggregate function SUM() on size column:
val fsds = dbutils.fs.ls("/mnt/datalake/.../XYZ/.../abc.parquet").toDF
fsds.createOrReplaceTempView("filesList")
display(spark.sql("select COUNT(name) as NoOfRows, SUM(size) as sizeInBytes from fileListPROD"))
For anyone still hitting recursion limits with #robin loche's approach, here is a purely iterative answer:
# from dbutils import FileInfo # Not required in databricks
# from dbruntime.dbutils import FileInfo # may work for some people
def get_size_of_path(path):
return sum([file.size for file in get_all_files_in_path(path)])
def get_all_files_in_path(path, verbose=False):
nodes_new = []
nodes_new = dbutils.fs.ls(path)
files = []
while len(nodes_new) > 0:
current_nodes = nodes_new
nodes_new = []
for node in current_nodes:
if verbose:
print(f"Processing {node.path}")
children = dbutils.fs.ls(node.path)
for child in children:
if child.size == 0 and child.path != node.path:
nodes_new.append(child)
elif child.path != node.path:
files.append(child)
return files
path = "s3://some/path/"
print(f"Size of {path} in gb: {get_size_of_path(path) / 1024 / 1024 / 1024}")
Love the answer by Emer!
Small addition:
If you met
"ModuleNotFoundError: No module named 'dbutils'"
try this from dbruntime.dbutils instead of from dbutils. It works for me!
-- Shizheng

Calculate Each dropbox folder size recursively using python api

EDIT: I want to calculate each folder size not just entire dropbox size... My code is working fine for whole dropbox size
I am having difficulty in calculating each folder size of dropbox using python api
as dropbox returns folder size as zero
here's my code so far but it's giving me wrong answer
def main(dp_path):
a= client.metadata(dp_path)
size_local = 0
for x in a['contents']:
if x['is_dir']==False:
global size
size += int(x['bytes'])
size_local += int(x['bytes'])
#print "Total size so far :"+str(size/(1024.00*1024.00))+" Mb..."
if x['is_dir']==True:
a = main(str(x['path']))
print str(x['path'])+" size=="+str(a/(1024.00*1024.00))+" Mb..."
return size_local+size
if __name__ == '__main__':
global size
size=0
main('/')
print str(size/(1024.00*1024.00))+" Mb"
EDIT 2: It seems I misunderstood the question. Here's code that prints out the sizes of each folder (in order of decreasing size):
from dropbox.client import DropboxClient
from collections import defaultdict
client = DropboxClient('<YOUR ACCESS TOKEN>')
sizes = {}
cursor = None
while cursor is None or result['has_more']:
result = client.delta(cursor)
for path, metadata in result['entries']:
sizes[path] = metadata['bytes'] if metadata else 0
cursor = result['cursor']
foldersizes = defaultdict(lambda: 0)
for path, size in sizes.items():
segments = path.split('/')
for i in range(1, len(segments)):
folder = '/'.join(segments[:i])
if folder == '': folder = '/'
foldersizes[folder] += size
for folder in reversed(sorted(foldersizes.keys(), key=lambda x: foldersizes[x])):
print '%s: %d' % (folder, foldersizes[folder])
EDIT: I had a major bug in the second code snippet (the delta one), and I've now tested all three and found them all to report the same number.
This works:
from dropbox.client import DropboxClient
client = DropboxClient('<YOUR ACCESS TOKEN>')
def size(path):
return sum(
f['bytes'] if not f['is_dir'] else size(f['path'])
for f in client.metadata(path)['contents']
)
print size('/')
But it's much more efficient to use /delta:
sizes = {}
cursor = None
while cursor is None or result['has_more']:
result = client.delta(cursor)
for path, metadata in result['entries']:
sizes[path] = metadata['bytes'] if metadata else 0
cursor = result['cursor']
print sum(sizes.values())
And if you truly just need to know the overall usage for the account, you can just do this:
quota_info = client.account_info()['quota_info']
print quota_info['normal'] + quota_info['shared']

shortest path from goal to root in directed graph with cycles python

I want to find the shortest path from goal to root working backwards
My input for root is {'4345092': ['6570646', '40586', '484']}
My input for goal is {'886619': ['GOAL']}
My input for path_holder is an input but it gets converted to dct and is used for this function. I am getting stuck regarding the while loop as it creates the path for me backwards. Right now I can't get q to print because that part of the code isn't being ran. dct is basically a directed graph representation that contains cycles. I can't seem to figure out how to start from GOAL and end up at the root node. I was wondering if someone could help me figure this out thanks!
dct:
dct =
{ '612803266': ['12408765', '46589', '5880', '31848'],
'8140983': ['7922972', '56008'],
'7496838': ['612803266'],
'1558536111': ['7496838'],
'31848': ['DEADEND'],
'1910530': ['8140983'],
'242010': ['58644', '886619'],
'727315568': ['DEADEND'],
'12408765': ['DEADEND'],
'56008': ['DEADEND'],
'58644': ['DEADEND'],
'886619': ['GOAL'],
'40586': ['931', '727315568', '242010', '1910530'],
'5880': ['1558536111'],
'46589': ['DEADEND'],
'6570646': ['2549003','43045', '13830'],
'931': ['299159122'],
'484': ['1311310', '612803266'],
'1311310': ['DEADEND'],
'7922972': ['DEADEND']
}
my function:
def backtrace(path_holder, root, goal):
dct = {}
for d in path_holder:
dct.update(d)
rootnode = root.keys()[0]
goal = goal.keys()[0]
path = []
path.append(goal)
q = 0
while goal != rootnode:
# find key that contains goal in list
for i in dct: #iterate keys
if i in dct: # prevent repeat of path
continue
for j in dct[i]: #iterate though children
if j == goal:
path.append(i)
goal = i # look for new goal
q += 1
print q
#print goal
# append key that has goal in the list
# set goal to be the key that was appended
# repeat
return path
Just find the paths and then invert them.
UPDATED: Added "[]" to "DEADEND" and "GOAL" in end conditions.
import copy as cp
DCT = {...} # You already know what goes here.
FOUND_PATHS = [] # In case of more than one path to GOAL.
FOUND_REVERSE_PATHS = []
COUNTER = len(DCT)
def back_track(root, target_path = [], counter=COUNTER):
"""
#param root: DCT key.
#type root: str.
#param target_path: Reference to the path we are constructing.
#type target_path: list.
"""
global FOUND_PATHS
# Avoiding cycles.
if counter == 0:
return
# Some nodes aren't full generated.
try:
DCT[root]
except KeyError:
return
# End condition.
if DCT[root] == ['DEADEND']:
return
# Path found.
if DCT[root] == ['GOAL']:
FOUND_PATHS.append(target_path) # The normal path.
reverse_path = cp.copy(target_path)
reverse_path.reverse()
FOUND_REVERSE_PATHS.append(reverse_path) # The path you want.
return
for node in DCT[root]:
# Makes copy of target parh and add the node.
path_copy = cp.copy(target_path)
path_copy.append(node)
# Call back_track with current node and the copy
# of target_path.
back_track(node, path_copy, counter=(counter - 1))
if __name__ == '__main__':
back_track('4345092')
print(FOUND_PATHS)
print(FOUND_REVERSE_PATHS)

Reading the target of a .lnk file in Python?

I'm trying to read the target file/directory of a shortcut (.lnk) file from Python. Is there a headache-free way to do it? The spec is way over my head.
I don't mind using Windows-only APIs.
My ultimate goal is to find the "(My) Videos" folder on Windows XP and Vista. On XP, by default, it's at %HOMEPATH%\My Documents\My Videos, and on Vista it's %HOMEPATH%\Videos. However, the user can relocate this folder. In the case, the %HOMEPATH%\Videos folder ceases to exists and is replaced by %HOMEPATH%\Videos.lnk which points to the new "My Videos" folder. And I want its absolute location.
Create a shortcut using Python (via WSH)
import sys
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shortcut = shell.CreateShortCut("t:\\test.lnk")
shortcut.Targetpath = "t:\\ftemp"
shortcut.save()
Read the Target of a Shortcut using Python (via WSH)
import sys
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shortcut = shell.CreateShortCut("t:\\test.lnk")
print(shortcut.Targetpath)
I know this is an older thread but I feel that there isn't much information on the method that uses the link specification as noted in the original question.
My shortcut target implementation could not use the win32com module and after a lot of searching, decided to come up with my own. Nothing else seemed to accomplish what I needed under my restrictions. Hopefully this will help other folks in this same situation.
It uses the binary structure Microsoft has provided for MS-SHLLINK.
import struct
path = 'myfile.txt.lnk'
target = ''
with open(path, 'rb') as stream:
content = stream.read()
# skip first 20 bytes (HeaderSize and LinkCLSID)
# read the LinkFlags structure (4 bytes)
lflags = struct.unpack('I', content[0x14:0x18])[0]
position = 0x18
# if the HasLinkTargetIDList bit is set then skip the stored IDList
# structure and header
if (lflags & 0x01) == 1:
position = struct.unpack('H', content[0x4C:0x4E])[0] + 0x4E
last_pos = position
position += 0x04
# get how long the file information is (LinkInfoSize)
length = struct.unpack('I', content[last_pos:position])[0]
# skip 12 bytes (LinkInfoHeaderSize, LinkInfoFlags, and VolumeIDOffset)
position += 0x0C
# go to the LocalBasePath position
lbpos = struct.unpack('I', content[position:position+0x04])[0]
position = last_pos + lbpos
# read the string at the given position of the determined length
size= (length + last_pos) - position - 0x02
temp = struct.unpack('c' * size, content[position:position+size])
target = ''.join([chr(ord(a)) for a in temp])
Alternatively, you could try using SHGetFolderPath(). The following code might work, but I'm not on a Windows machine right now so I can't test it.
import ctypes
shell32 = ctypes.windll.shell32
# allocate MAX_PATH bytes in buffer
video_folder_path = ctypes.create_string_buffer(260)
# 0xE is CSIDL_MYVIDEO
# 0 is SHGFP_TYPE_CURRENT
# If you want a Unicode path, use SHGetFolderPathW instead
if shell32.SHGetFolderPathA(None, 0xE, None, 0, video_folder_path) >= 0:
# success, video_folder_path now contains the correct path
else:
# error
Basically call the Windows API directly. Here is a good example found after Googling:
import os, sys
import pythoncom
from win32com.shell import shell, shellcon
shortcut = pythoncom.CoCreateInstance (
shell.CLSID_ShellLink,
None,
pythoncom.CLSCTX_INPROC_SERVER,
shell.IID_IShellLink
)
desktop_path = shell.SHGetFolderPath (0, shellcon.CSIDL_DESKTOP, 0, 0)
shortcut_path = os.path.join (desktop_path, "python.lnk")
persist_file = shortcut.QueryInterface (pythoncom.IID_IPersistFile)
persist_file.Load (shortcut_path)
shortcut.SetDescription ("Updated Python %s" % sys.version)
mydocs_path = shell.SHGetFolderPath (0, shellcon.CSIDL_PERSONAL, 0, 0)
shortcut.SetWorkingDirectory (mydocs_path)
persist_file.Save (shortcut_path, 0)
This is from http://timgolden.me.uk/python/win32_how_do_i/create-a-shortcut.html.
You can search for "python ishelllink" for other examples.
Also, the API reference helps too: http://msdn.microsoft.com/en-us/library/bb774950(VS.85).aspx
I also realize this question is old, but I found the answers to be most relevant to my situation.
Like Jared's answer, I also could not use the win32com module. So Jared's use of the binary structure from MS-SHLLINK got me part of the way there. I needed read shortcuts on both Windows and Linux, where the shortcuts are created on a samba share by Windows. Jared's implementation didn't quite work for me, I think only because I encountered some different variations on the shortcut format. But, it gave me the start I needed (thanks Jared).
So, here is a class named MSShortcut which expands on Jared's approach. However, the implementation is only Python3.4 and above, due to using some pathlib features added in Python3.4
#!/usr/bin/python3
# Link Format from MS: https://msdn.microsoft.com/en-us/library/dd871305.aspx
# Need to be able to read shortcut target from .lnk file on linux or windows.
# Original inspiration from: http://stackoverflow.com/questions/397125/reading-the-target-of-a-lnk-file-in-python
from pathlib import Path, PureWindowsPath
import struct, sys, warnings
if sys.hexversion < 0x03040000:
warnings.warn("'{}' module requires python3.4 version or above".format(__file__), ImportWarning)
# doc says class id =
# 00021401-0000-0000-C000-000000000046
# requiredCLSID = b'\x00\x02\x14\x01\x00\x00\x00\x00\xC0\x00\x00\x00\x00\x00\x00\x46'
# Actually Getting:
requiredCLSID = b'\x01\x14\x02\x00\x00\x00\x00\x00\xC0\x00\x00\x00\x00\x00\x00\x46' # puzzling
class ShortCutError(RuntimeError):
pass
class MSShortcut():
"""
interface to Microsoft Shortcut Objects. Purpose:
- I need to be able to get the target from a samba shared on a linux machine
- Also need to get access from a Windows machine.
- Need to support several forms of the shortcut, as they seem be created differently depending on the
creating machine.
- Included some 'flag' types in external interface to help test differences in shortcut types
Args:
scPath (str): path to shortcut
Limitations:
- There are some omitted object properties in the implementation.
Only implemented / tested enough to recover the shortcut target information. Recognized omissions:
- LinkTargetIDList
- VolumeId structure (if captured later, should be a separate class object to hold info)
- Only captured environment block from extra data
- I don't know how or when some of the shortcut information is used, only captured what I recognized,
so there may be bugs related to use of the information
- NO shortcut update support (though might be nice)
- Implementation requires python 3.4 or greater
- Tested only with Unicode data on a 64bit little endian machine, did not consider potential endian issues
Not Debugged:
- localBasePath - didn't check if parsed correctly or not.
- commonPathSuffix
- commonNetworkRelativeLink
"""
def __init__(self, scPath):
"""
Parse and keep shortcut properties on creation
"""
self.scPath = Path(scPath)
self._clsid = None
self._lnkFlags = None
self._lnkInfoFlags = None
self._localBasePath = None
self._commonPathSuffix = None
self._commonNetworkRelativeLink = None
self._name = None
self._relativePath = None
self._workingDir = None
self._commandLineArgs = None
self._iconLocation = None
self._envTarget = None
self._ParseLnkFile(self.scPath)
#property
def clsid(self):
return self._clsid
#property
def lnkFlags(self):
return self._lnkFlags
#property
def lnkInfoFlags(self):
return self._lnkInfoFlags
#property
def localBasePath(self):
return self._localBasePath
#property
def commonPathSuffix(self):
return self._commonPathSuffix
#property
def commonNetworkRelativeLink(self):
return self._commonNetworkRelativeLink
#property
def name(self):
return self._name
#property
def relativePath(self):
return self._relativePath
#property
def workingDir(self):
return self._workingDir
#property
def commandLineArgs(self):
return self._commandLineArgs
#property
def iconLocation(self):
return self._iconLocation
#property
def envTarget(self):
return self._envTarget
#property
def targetPath(self):
"""
Args:
woAnchor (bool): remove the anchor (\\server\path or drive:) from returned path.
Returns:
a libpath PureWindowsPath object for combined workingDir/relative path
or the envTarget
Raises:
ShortCutError when no target path found in Shortcut
"""
target = None
if self.workingDir:
target = PureWindowsPath(self.workingDir)
if self.relativePath:
target = target / PureWindowsPath(self.relativePath)
else: target = None
if not target and self.envTarget:
target = PureWindowsPath(self.envTarget)
if not target:
raise ShortCutError("Unable to retrieve target path from MS Shortcut: shortcut = {}"
.format(str(self.scPath)))
return target
#property
def targetPathWOAnchor(self):
tp = self.targetPath
return tp.relative_to(tp.anchor)
def _ParseLnkFile(self, lnkPath):
with lnkPath.open('rb') as f:
content = f.read()
# verify size (4 bytes)
hdrSize = struct.unpack('I', content[0x00:0x04])[0]
if hdrSize != 0x4C:
raise ShortCutError("MS Shortcut HeaderSize = {}, but required to be = {}: shortcut = {}"
.format(hdrSize, 0x4C, str(lnkPath)))
# verify LinkCLSID id (16 bytes)
self._clsid = bytes(struct.unpack('B'*16, content[0x04:0x14]))
if self._clsid != requiredCLSID:
raise ShortCutError("MS Shortcut ClassID = {}, but required to be = {}: shortcut = {}"
.format(self._clsid, requiredCLSID, str(lnkPath)))
# read the LinkFlags structure (4 bytes)
self._lnkFlags = struct.unpack('I', content[0x14:0x18])[0]
#logger.debug("lnkFlags=0x%0.8x" % self._lnkFlags)
position = 0x4C
# if HasLinkTargetIDList bit, then position to skip the stored IDList structure and header
if (self._lnkFlags & 0x01):
idListSize = struct.unpack('H', content[position:position+0x02])[0]
position += idListSize + 2
# if HasLinkInfo, then process the linkinfo structure
if (self._lnkFlags & 0x02):
(linkInfoSize, linkInfoHdrSize, self._linkInfoFlags,
volIdOffset, localBasePathOffset,
cmnNetRelativeLinkOffset, cmnPathSuffixOffset) = struct.unpack('IIIIIII', content[position:position+28])
# check for optional offsets
localBasePathOffsetUnicode = None
cmnPathSuffixOffsetUnicode = None
if linkInfoHdrSize >= 0x24:
(localBasePathOffsetUnicode, cmnPathSuffixOffsetUnicode) = struct.unpack('II', content[position+28:position+36])
#logger.debug("0x%0.8X" % linkInfoSize)
#logger.debug("0x%0.8X" % linkInfoHdrSize)
#logger.debug("0x%0.8X" % self._linkInfoFlags)
#logger.debug("0x%0.8X" % volIdOffset)
#logger.debug("0x%0.8X" % localBasePathOffset)
#logger.debug("0x%0.8X" % cmnNetRelativeLinkOffset)
#logger.debug("0x%0.8X" % cmnPathSuffixOffset)
#logger.debug("0x%0.8X" % localBasePathOffsetUnicode)
#logger.debug("0x%0.8X" % cmnPathSuffixOffsetUnicode)
# if info has a localBasePath
if (self._linkInfoFlags & 0x01):
bpPosition = position + localBasePathOffset
# not debugged - don't know if this works or not
self._localBasePath = UnpackZ('z', content[bpPosition:])[0].decode('ascii')
#logger.debug("localBasePath: {}".format(self._localBasePath))
if localBasePathOffsetUnicode:
bpPosition = position + localBasePathOffsetUnicode
self._localBasePath = UnpackUnicodeZ('z', content[bpPosition:])[0]
self._localBasePath = self._localBasePath.decode('utf-16')
#logger.debug("localBasePathUnicode: {}".format(self._localBasePath))
# get common Path Suffix
cmnSuffixPosition = position + cmnPathSuffixOffset
self._commonPathSuffix = UnpackZ('z', content[cmnSuffixPosition:])[0].decode('ascii')
#logger.debug("commonPathSuffix: {}".format(self._commonPathSuffix))
if cmnPathSuffixOffsetUnicode:
cmnSuffixPosition = position + cmnPathSuffixOffsetUnicode
self._commonPathSuffix = UnpackUnicodeZ('z', content[cmnSuffixPosition:])[0]
self._commonPathSuffix = self._commonPathSuffix.decode('utf-16')
#logger.debug("commonPathSuffix: {}".format(self._commonPathSuffix))
# check for CommonNetworkRelativeLink
if (self._linkInfoFlags & 0x02):
relPosition = position + cmnNetRelativeLinkOffset
self._commonNetworkRelativeLink = CommonNetworkRelativeLink(content, relPosition)
position += linkInfoSize
# If HasName
if (self._lnkFlags & 0x04):
(position, self._name) = self.readStringObj(content, position)
#logger.debug("name: {}".format(self._name))
# get relative path string
if (self._lnkFlags & 0x08):
(position, self._relativePath) = self.readStringObj(content, position)
#logger.debug("relPath='{}'".format(self._relativePath))
# get working dir string
if (self._lnkFlags & 0x10):
(position, self._workingDir) = self.readStringObj(content, position)
#logger.debug("workingDir='{}'".format(self._workingDir))
# get command line arguments
if (self._lnkFlags & 0x20):
(position, self._commandLineArgs) = self.readStringObj(content, position)
#logger.debug("commandLineArgs='{}'".format(self._commandLineArgs))
# get icon location
if (self._lnkFlags & 0x40):
(position, self._iconLocation) = self.readStringObj(content, position)
#logger.debug("iconLocation='{}'".format(self._iconLocation))
# look for environment properties
if (self._lnkFlags & 0x200):
while True:
size = struct.unpack('I', content[position:position+4])[0]
#logger.debug("blksize=%d" % size)
if size==0: break
signature = struct.unpack('I', content[position+4:position+8])[0]
#logger.debug("signature=0x%0.8x" % signature)
# EnvironmentVariableDataBlock
if signature == 0xA0000001:
if (self._lnkFlags & 0x80): # unicode
self._envTarget = UnpackUnicodeZ('z', content[position+268:])[0]
self._envTarget = self._envTarget.decode('utf-16')
else:
self._envTarget = UnpackZ('z', content[position+8:])[0].decode('ascii')
#logger.debug("envTarget='{}'".format(self._envTarget))
position += size
def readStringObj(self, scContent, position):
"""
returns:
tuple: (newPosition, string)
"""
strg = ''
size = struct.unpack('H', scContent[position:position+2])[0]
#logger.debug("workingDirSize={}".format(size))
if (self._lnkFlags & 0x80): # unicode
size *= 2
strg = struct.unpack(str(size)+'s', scContent[position+2:position+2+size])[0]
strg = strg.decode('utf-16')
else:
strg = struct.unpack(str(size)+'s', scContent[position+2:position+2+size])[0].decode('ascii')
#logger.debug("strg='{}'".format(strg))
position += size + 2 # 2 bytes to account for CountCharacters field
return (position, strg)
class CommonNetworkRelativeLink():
def __init__(self, scContent, linkContentPos):
self._networkProviderType = None
self._deviceName = None
self._netName = None
(linkSize, flags, netNameOffset,
devNameOffset, self._networkProviderType) = struct.unpack('IIIII', scContent[linkContentPos:linkContentPos+20])
#logger.debug("netnameOffset = {}".format(netNameOffset))
if netNameOffset > 0x014:
(netNameOffsetUnicode, devNameOffsetUnicode) = struct.unpack('II', scContent[linkContentPos+20:linkContentPos+28])
#logger.debug("netnameOffsetUnicode = {}".format(netNameOffsetUnicode))
self._netName = UnpackUnicodeZ('z', scContent[linkContentPos+netNameOffsetUnicode:])[0]
self._netName = self._netName.decode('utf-16')
self._deviceName = UnpackUnicodeZ('z', scContent[linkContentPos+devNameOffsetUnicode:])[0]
self._deviceName = self._deviceName.decode('utf-16')
else:
self._netName = UnpackZ('z', scContent[linkContentPos+netNameOffset:])[0].decode('ascii')
self._deviceName = UnpackZ('z', scContent[linkContentPos+devNameOffset:])[0].decode('ascii')
#property
def deviceName(self):
return self._deviceName
#property
def netName(self):
return self._netName
#property
def networkProviderType(self):
return self._networkProviderType
def UnpackZ (fmt, buf) :
"""
Unpack Null Terminated String
"""
#logger.debug(bytes(buf))
while True :
pos = fmt.find ('z')
if pos < 0 :
break
z_start = struct.calcsize (fmt[:pos])
z_len = buf[z_start:].find(b'\0')
#logger.debug(z_len)
fmt = '%s%dsx%s' % (fmt[:pos], z_len, fmt[pos+1:])
#logger.debug("fmt='{}', len={}".format(fmt, z_len))
fmtlen = struct.calcsize(fmt)
return struct.unpack (fmt, buf[0:fmtlen])
def UnpackUnicodeZ (fmt, buf) :
"""
Unpack Null Terminated String
"""
#logger.debug(bytes(buf))
while True :
pos = fmt.find ('z')
if pos < 0 :
break
z_start = struct.calcsize (fmt[:pos])
# look for null bytes by pairs
z_len = 0
for i in range(z_start,len(buf),2):
if buf[i:i+2] == b'\0\0':
z_len = i-z_start
break
fmt = '%s%dsxx%s' % (fmt[:pos], z_len, fmt[pos+1:])
# logger.debug("fmt='{}', len={}".format(fmt, z_len))
fmtlen = struct.calcsize(fmt)
return struct.unpack (fmt, buf[0:fmtlen])
I hope this helps others as well.
Thanks
I didn't really like any of the answers available because I didn't want to keep importing more and more libraries and the 'shell' option was spotty on my test machines. I opted for reading the ".lnk" in and then using a regular expression to read out the path. For my purposes, I am looking for pdf files that were recently opened and then reading the content of those files:
# Example file path to the shortcut
shortcut = "shortcutFileName.lnk"
# Open the lnk file using the ISO-8859-1 encoder to avoid errors for special characters
lnkFile = open(shortcut, 'r', encoding = "ISO-8859-1")
# Use a regular expression to parse out the pdf file on C:\
filePath = re.findall("C:.*?pdf", lnkFile.read(), flags=re.DOTALL)
# Close File
lnkFile.close()
# Read the pdf at the lnk Target
pdfFile = open(tmpFilePath[0], 'rb')
Comments:
Obviously this works for pdf but needs to specify other file extensions accordingly.
It's easy as opening ".exe" file. Here also, we are going to use the os module for this. You just have to create a shortcut .lnk and store it in any folder of your choice. Then, in any Python file, first import the os module (already installed, just import). Then, use a variable, say path, and assign it a string value containing the location of your .lnk file. Just create a shortcut of your desired application. At last, we will use os.startfile()
to open our shortcut.
Points to remember:
The location should be within double inverted commas.
Most important, open Properties. Then, under that, open "Details". There, you can get the exact name of your shortcut. Please write that name with ".lnk" at last.
Now, you have completed the procedure. I hope it helps you. For additional assistance, I am leaving my code for this at the bottom.
import os
path = "C:\\Users\\hello\\OneDrive\\Desktop\\Shortcuts\\OneNote for Windows 10.lnk"
os.startfile(path)
In my code, I used path as variable and I had created a shortcut for OneNote. In path, I defined the location of OneNote's shortcut. So when I use os.startfile(path), the os module is going to open my shortcut file defined in variable path.
this job is possible without any modules, doing this will return a b string having the destination of the shortcut file. Basically what you do is you open the file in read binary mode (rb mode). This is the code to accomplish this task:
with open('programs.lnk - Copy','rb') as f:
destination=f.read()
i am currently using python 3.9.2, in case you face problems with this, just tell me and i will try to fix it.
A more stable solution in python, using powershell to read the target path from the .lnk file.
using only standard libraries avoids introducing extra dependencies such as win32com
this approach works with the .lnks that failed with jared's answer, more details
we avoid directly reading the file, which felt hacky, and sometimes failed
import subprocess
def get_target(link_path) -> (str, str):
"""
Get the target & args of a Windows shortcut (.lnk)
:param link_path: The Path or string-path to the shortcut, e.g. "C:\\Users\\Public\\Desktop\\My Shortcut.lnk"
:return: A tuple of the target and arguments, e.g. ("C:\\Program Files\\My Program.exe", "--my-arg")
"""
# get_target implementation by hannes, https://gist.github.com/Winand/997ed38269e899eb561991a0c663fa49
ps_command = \
"$WSShell = New-Object -ComObject Wscript.Shell;" \
"$Shortcut = $WSShell.CreateShortcut(\"" + str(link_path) + "\"); " \
"Write-Host $Shortcut.TargetPath ';' $shortcut.Arguments "
output = subprocess.run(["powershell.exe", ps_command], capture_output=True)
raw = output.stdout.decode('utf-8')
launch_path, args = [x.strip() for x in raw.split(';', 1)]
return launch_path, args
# to test
shortcut_file = r"C:\Users\REPLACE_WITH_USERNAME\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Accessibility\Narrator.lnk"
a, args = get_target(shortcut_file)
print(a) # C:\WINDOWS\system32\narrator.exe
(you can remove -> typehinting to get it to work in older python versions)
I did notice this is slow when running on lots of shortcuts. You could use jareds method, check if the result is None, and if so, run this code to get the target path.
The nice approach with direct regex-based parsing (proposed in the answer) didn't work reliable for all shortcuts in my case. Some of them have only relative path like ..\\..\\..\\..\\..\\..\\Program Files\\ImageGlass\\ImageGlass.exe (produced by msi-installer), and it is stored with wide chars, which are tricky to handle in Python.
So I've discovered a Python module LnkParse3, which is easy to use and meets my needs.
Here is a sample script to show target of a lnk-file passed as first argument:
import LnkParse3
import sys
with open(sys.argv[1], 'rb') as indata:
lnk = LnkParse3.lnk_file(indata)
print(lnk.lnk_command)
I arrived at this thread looking for a way to parse a ".lnk" file and get the target file name.
I found another very simple solution:
pip install comtypes
Then
from comtypes.client import CreateObject
from comtypes.persist import IPersistFile
from comtypes.shelllink import ShellLink
# MAKE SURE THIS VAT CONTAINS A STRING AND NOT AN OBJECT OF 'PATH'
# I spent too much time figuring out the problem with .load(..) function ahead
pathStr="c:\folder\yourlink.lnk"
s = CreateObject(ShellLink)
p = s.QueryInterface(IPersistFile)
p.Load(pathStr, False)
print(s.GetPath())
print(s.GetArguments())
print(s.GetWorkingDirectory())
print(s.GetIconLocation())
try:
# the GetDescription create exception in some of the links
print(s.GetDescription())
except Exception as e:
print(e)
print(s.Hotkey)
print(s.ShowCmd)
Based on this great answer...
https://stackoverflow.com/a/43856809/2992810

Categories

Resources