GAE blobstore filename UTF-8 encoding problem - python

I have some filename encoding problem in GAE blobstore here.
class UploadHandler(blobstore_handlers.BlobstoreUploadHandler):
def post(self):
upload_files = self.get_uploads('file')
blob_info = upload_files[0]
#Problem right here
decoded_filename = blob_info.filename.decode("utf-8")
#
File_info = Fileinfo(
key_name=str(blob_info.key()),
filename=decoded_filename,
)
File_info.put()
self.redirect("/")
When I run in local, it function normal in SDK console,
but after upload to GAE it store it shows like non-decode string
"=?UTF-8?B?54Wn54mH5pel5pyfIDIwMTAtMDgtMDM=?="
or =?Big5?B?v8O59afWt9MgMjAxMC0xMi0wMiA=?=
I doubt the best solution might be,
stop using Chinese character filename ...
All suggestions are very welcome :)

It's an open issue: Blobstore handler breaking data encoding, check here.

The filename of BlobInfo is MIME-encoded by Google.
I do not know why Google is doing so.
It is broken for the people living in multi-byte countries.
You can get a correct filename, if you using any character code, as below:
import email
for blob_info in self.get_uploads('file'):
filename_mime = blob_info.filename
if isinstance(filename_mime, unicode):
filename_mime_utf8 = filename_mime.encode('utf-8')
else:
filename_mime_utf8 = filename_mime
filename_encoded, encoding = email.header.decode_header(filename_mime_utf8)[0]
if encoding is not None:
filename_unicode = filename_encoded.decode(encoding)
filename_utf8 = filename_unicode.encode('utf-8')
blob_info._BlobInfo__entity['filename'] = filename_utf8

Here is a tweak to ENDOH takanao solution, which you can call on each file_info object:
def get_filename_from_file_info(file_info):
filename_mime = file_info.filename
if isinstance(filename_mime, unicode):
filename_mime_utf8 = filename_mime.encode('utf-8')
else:
filename_mime_utf8 = filename_mime
filename_encoded, encoding = email.header.decode_header(filename_mime_utf8)[0]
if encoding is not None:
filename_unicode = filename_encoded.decode(encoding)
filename_utf8 = filename_unicode.encode('utf-8')
return filename_utf8
return filename_mime_utf8

Related

openai.error.InvalidRequestError: Engine not found

Tried accessing the OpenAPI example - Explain code
But it shows error as -
InvalidRequestError: Engine not found
enter code response = openai.Completion.create(
engine="code-davinci-002",
prompt="class Log:\n def __init__(self, path):\n dirname = os.path.dirname(path)\n os.makedirs(dirname, exist_ok=True)\n f = open(path, \"a+\")\n\n # Check that the file is newline-terminated\n size = os.path.getsize(path)\n if size > 0:\n f.seek(size - 1)\n end = f.read(1)\n if end != \"\\n\":\n f.write(\"\\n\")\n self.f = f\n self.path = path\n\n def log(self, event):\n event[\"_event_id\"] = str(uuid.uuid4())\n json.dump(event, self.f)\n self.f.write(\"\\n\")\n\n def state(self):\n state = {\"complete\": set(), \"last\": None}\n for line in open(self.path):\n event = json.loads(line)\n if event[\"type\"] == \"submit\" and event[\"success\"]:\n state[\"complete\"].add(event[\"id\"])\n state[\"last\"] = event\n return state\n\n\"\"\"\nHere's what the above class is doing:\n1.",
temperature=0,
max_tokens=64,
top_p=1.0,
frequency_penalty=0.0,
presence_penalty=0.0,
stop=["\"\"\""]
)
I've been trying to access the engine named code-davinci-002 which is a private beta version engine. So without access it's not possible to access the engine. It seems only the GPT-3 models are of public usage. We need to need to join the OpenAI Codex Private Beta Waitlist in order to access Codex models through API.
Please note that your code is not very readable.
However, from the given error, I think it has to do with the missing colon : in the engine name.
Change this line from:
engine="code-davinci-002",
to
engine="code-davinci:002",
If you are using a finetuned model instead of an engine, you'd want to use model= instead of engine=.
response = openai.Completion.create(
model="<finetuned model>",
prompt=

Mutagen 1.22 Encoding Issue

I am having an issue with character encoding with Mutagen.
I casted the dict[key] to Unicode, bu all I receive are errors. The character in question is U+00E9 or é, but what I prints is ├⌐. I am assuming the default character set for Mutagen is UTF-8, but is there a way to fix this?
Output:
Winter Wonderland.mp3
Album : Christmas
Album Artist: Michael Bublé
Artist : Michael Bublé
Composer : None
Disk : None
Encoded By : None
Genre : Christmas
Title : Winter Wonderland
Track : 17/19
Year : 2011
Code:
#!/usr/bin/env python
import os
import re
from mutagen.mp3 import MP3
first_cap_re = re.compile('(.)([A-Z][a-z]+)')
all_cap_re = re.compile('([a-z0-9])([A-Z])')
def convertCamelCase2Underscore(name):
s1 = first_cap_re.sub(r'\1_\2', name)
return all_cap_re.sub(r'\1_\2', s1).lower()
def convertCamelCase2CapitalizedWords(name):
return ' '.join([x.capitalize() for x in convertCamelCase2Underscore(name).split('_')])
def safeValue(dict, key):
return None if key not in dict else dict[key]
class Track:
def __init__(self, path):
audio = MP3(path)
self.title = safeValue(audio, 'TIT2')
self.artist = safeValue(audio, 'TPE1')
self.albumArtist = safeValue(audio, 'TPE2')
self.album = safeValue(audio, 'TALB')
self.genre = safeValue(audio, 'TCON')
self.year = safeValue(audio, 'TDRL')
self.encodedBy = safeValue(audio, 'TENC')
self.composer = safeValue(audio, 'TXXX:TCM')
self.track = safeValue(audio, 'TRCK')
self.disk = safeValue(audio, 'TXXX:TPA')
def __repr__(self):
ret = ''
fields = self.__dict__
for k, v in sorted(self.__dict__.iteritems()):
ret += '{:12s}: {:s}\n'.format(convertCamelCase2CapitalizedWords(k), v)
return ret
files = os.listdir('.')
for filename in files:
print filename
print Track(filename)
I am assuming the default character set for Mutagen is UTF-8
Mutagen returns Unicode strings, though wrapped in a TextFrame object. When you print that object it's an implicit str() conversion of the text property to bytes, and Mutagen (arbitrarily) chooses UTF-8 for that encoding.
Unfortunately the Windows console doesn't support UTF-8[1]. The encoding it uses varies but in your case you are getting the US DOS code page 437 where the byte sequence 0xC3 0xA9 represents ├⌐ and not é. You could try to print to the console in the encoding that it wants by explicitly encoding to it:
print unicode(audio['TIT2']).encode(sys.stdout.encoding) # 'cp437'
but this will still only allow you to print characters that are supported in that code page. 437 is OK for Michael Bublé, but not so good for 東京事変. There isn't a good way to get Unicode out to the Windows console.[2]
[1] There is code page 65001 which is supposed to be UTF-8, but there are bugs in the MS implementation which usually make it unusable.
[2] You can, if you must, call the Win32 API WriteConsoleW directly using ctypes, but then you have to take care to only do that when you are connected to a Windows console and not any other type of stream so you don't break everywhere else. It's usually not worth it; Windows users are assumed to be used to a console where non-ASCII characters just break all the time.

How can I retrieve a file object from the blobstore within a handler?

I've saved a document in the blobstore, and am trying to retrieve it within a handler (which handles a task). I've had a read of the appengine documentation regarding the how to use blobstore, but am struggling to get it to work for my case. I've tried the following within the handler but cannot seem to have the object returned as the saved file (e.g. .pdf or .txt)
class SendDocuments(webapp2.RequestHandler):
def post(self):
document_key = self.request.get("document_key")
document_key = Key(str(document_key))
the_document = DocumentsModel.all().filter("__key__ =", document_key).get()
file_data = blobstore.BlobInfo.get(str(the_document.blobstore_key)) # returns a blobinfo object
file_data.open() # returns a blobreader object
file_data.open().read() # returns a string
I've also tried
class ServeSavedDocument(blobstore_handlers.BlobstoreDownloadHandler):
def get(self, blob_key):
self.send_blob(blob_key, save_as=True)
return
class SendDocuments(webapp2.RequestHandler):
def post(self):
document_key = self.request.get("document_key")
document_key = Key(str(document_key))
the_document = DocumentsModel.all().filter("__key__ =", document_key).get()
grab_blob = ServeSavedDocument()
file_data = grab_blob.get(self, str(the_document.blobstore_key))
But the call to ServeSavedDocument fails with
'NoneType' object has no attribute 'headers'
I've had a look at the files api but the only example that's not saving the file simply seems to return the blob key i.e.
blob_key = files.blobstore.get_blob_key(file_name)
What is the best way to grab a saved file in the blobstore from within a handler?
EDIT 1:
I'm trying to retrieve the txt file or pdf file from the blobstore in a format / state that can be encoded as a file in a post request using the following code
from google.appengine.api import urlfetch
from poster.encode import multipart_encode
# assuming here that file_data is the file object
payload = {}
payload['user_id'] = '1234123412341234'
payload['test_file'] = MultipartParam('test_file', filename=file_data.filename,
filetype=file_data.type,
fileobj=file_data.file)
data,headers= multipart_encode(payload)
send_url = "http://127.0.0.0/"
t = urlfetch.fetch(url=send_url, payload="".join(data), method=urlfetch.POST, headers=headers)
Okay, after a lot of messing around I"ve found that this works! The key was to simply call open() on the blobinfo object.
class SendDocuments(webapp2.RequestHandler):
def post(self):
document_key = self.request.get("document_key")
document_key = Key(str(document_key))
the_document = DocumentsModel.all().filter("__key__ =", document_key).get()
file_data = blobstore.BlobInfo.get(str(the_document.blobstore_key))
payload = {}
payload['user_id'] = '1234123412341234'
payload['test_file'] = MultipartParam('the_file', filename="something",
filetype=file_data.content_type,
fileobj=file_data.open())
Have a look at the BlobReader class, I think that's what you are looking for:
https://developers.google.com/appengine/docs/python/blobstore/blobreaderclass
It allows file-like read access to blobstore data:
# blob_key = ..
# Instantiate a BlobReader for a given Blobstore value.
blob_reader = blobstore.BlobReader(blob_key)
# Read the entire value into memory. This may take a while depending
# on the size of the value and the size of the read buffer, and is not
# recommended for large values.
value = blob_reader.read()
Try as this,specify the file name and it's type in the send_blob() as :
def mime_type(filename):
return guess_type(filename)[0]
class ServeSavedDocument(blobstore_handlers.BlobstoreDownloadHandler):
def get(self):
blob_key = self.request.get("blob_key")
if blob_key:
blob_info = blobstore.get(blob_key)
if blob_info:
save_as1 = blob_info.filename
type1=mime_type(blob_info.filename)
self.send_blob(blob_info,content_type=type1,save_as=save_as1)

Fetching language detection from Google api

I have a CSV with keywords in one column and the number of impressions in a second column.
I'd like to provide the keywords in a url (while looping) and for the Google language api to return what type of language was the keyword in.
I have it working manually. If I enter (with the correct api key):
http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=merde
I get:
{"responseData": {"language":"fr","isReliable":false,"confidence":6.213709E-4}, "responseDetails": null, "responseStatus": 200}
which is correct, 'merde' is French.
so far I have this code but I keep getting server unreachable errors:
import time
import csv
from operator import itemgetter
import sys
import fileinput
import urllib2
import json
E_OPERATION_ERROR = 1
E_INVALID_PARAMS = 2
#not working
def parse_result(result):
"""Parse a JSONP result string and return a list of terms"""
# Deserialize JSON to Python objects
result_object = json.loads(result)
#Get the rows in the table, then get the second column's value
# for each row
return row in result_object
#not working
def retrieve_terms(seedterm):
print(seedterm)
"""Retrieves and parses data and returns a list of terms"""
url_template = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&key=myapikey&q=%(seed)s'
url = url_template % {"seed": seedterm}
try:
with urllib2.urlopen(url) as data:
data = perform_request(seedterm)
result = data.read()
except:
sys.stderr.write('%s\n' % 'Could not request data from server')
exit(E_OPERATION_ERROR)
#terms = parse_result(result)
#print terms
print result
def main(argv):
filename = argv[1]
csvfile = open(filename, 'r')
csvreader = csv.DictReader(csvfile)
rows = []
for row in csvreader:
rows.append(row)
sortedrows = sorted(rows, key=itemgetter('impressions'), reverse = True)
keys = sortedrows[0].keys()
for item in sortedrows:
retrieve_terms(item['keywords'])
try:
outputfile = open('Output_%s.csv' % (filename),'w')
except IOError:
print("The file is active in another program - close it first!")
sys.exit()
dict_writer = csv.DictWriter(outputfile, keys, lineterminator='\n')
dict_writer.writer.writerow(keys)
dict_writer.writerows(sortedrows)
outputfile.close()
print("File is Done!! Check your folder")
if __name__ == '__main__':
start_time = time.clock()
main(sys.argv)
print("\n")
print time.clock() - start_time, "seconds for script time"
Any idea how to finish the code so that it will work? Thank you!
Try to add referrer, userip as described in the docs:
An area to pay special attention to
relates to correctly identifying
yourself in your requests.
Applications MUST always include a
valid and accurate http referer header
in their requests. In addition, we
ask, but do not require, that each
request contains a valid API Key. By
providing a key, your application
provides us with a secondary
identification mechanism that is
useful should we need to contact you
in order to correct any problems. Read
more about the usefulness of having an
API key
Developers are also encouraged to make
use of the userip parameter (see
below) to supply the IP address of the
end-user on whose behalf you are
making the API request. Doing so will
help distinguish this legitimate
server-side traffic from traffic which
doesn't come from an end-user.
Here's an example based on the answer to the question "access to google with python":
#!/usr/bin/python
# -*- coding: utf-8 -*-
import json
import urllib, urllib2
from pprint import pprint
api_key, userip = None, None
query = {'q' : 'матрёшка'}
referrer = "https://stackoverflow.com/q/4309599/4279"
if userip:
query.update(userip=userip)
if api_key:
query.update(key=api_key)
url = 'http://ajax.googleapis.com/ajax/services/language/detect?v=1.0&%s' %(
urllib.urlencode(query))
request = urllib2.Request(url, headers=dict(Referer=referrer))
json_data = json.load(urllib2.urlopen(request))
pprint(json_data['responseData'])
Output
{u'confidence': 0.070496580000000003, u'isReliable': False, u'language': u'ru'}
Another issue might be that seedterm is not properly quoted:
if isinstance(seedterm, unicode):
value = seedterm
else: # bytes
value = seedterm.decode(put_encoding_here)
url = 'http://...q=%s' % urllib.quote_plus(value.encode('utf-8'))

Reading the target of a .lnk file in Python?

I'm trying to read the target file/directory of a shortcut (.lnk) file from Python. Is there a headache-free way to do it? The spec is way over my head.
I don't mind using Windows-only APIs.
My ultimate goal is to find the "(My) Videos" folder on Windows XP and Vista. On XP, by default, it's at %HOMEPATH%\My Documents\My Videos, and on Vista it's %HOMEPATH%\Videos. However, the user can relocate this folder. In the case, the %HOMEPATH%\Videos folder ceases to exists and is replaced by %HOMEPATH%\Videos.lnk which points to the new "My Videos" folder. And I want its absolute location.
Create a shortcut using Python (via WSH)
import sys
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shortcut = shell.CreateShortCut("t:\\test.lnk")
shortcut.Targetpath = "t:\\ftemp"
shortcut.save()
Read the Target of a Shortcut using Python (via WSH)
import sys
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shortcut = shell.CreateShortCut("t:\\test.lnk")
print(shortcut.Targetpath)
I know this is an older thread but I feel that there isn't much information on the method that uses the link specification as noted in the original question.
My shortcut target implementation could not use the win32com module and after a lot of searching, decided to come up with my own. Nothing else seemed to accomplish what I needed under my restrictions. Hopefully this will help other folks in this same situation.
It uses the binary structure Microsoft has provided for MS-SHLLINK.
import struct
path = 'myfile.txt.lnk'
target = ''
with open(path, 'rb') as stream:
content = stream.read()
# skip first 20 bytes (HeaderSize and LinkCLSID)
# read the LinkFlags structure (4 bytes)
lflags = struct.unpack('I', content[0x14:0x18])[0]
position = 0x18
# if the HasLinkTargetIDList bit is set then skip the stored IDList
# structure and header
if (lflags & 0x01) == 1:
position = struct.unpack('H', content[0x4C:0x4E])[0] + 0x4E
last_pos = position
position += 0x04
# get how long the file information is (LinkInfoSize)
length = struct.unpack('I', content[last_pos:position])[0]
# skip 12 bytes (LinkInfoHeaderSize, LinkInfoFlags, and VolumeIDOffset)
position += 0x0C
# go to the LocalBasePath position
lbpos = struct.unpack('I', content[position:position+0x04])[0]
position = last_pos + lbpos
# read the string at the given position of the determined length
size= (length + last_pos) - position - 0x02
temp = struct.unpack('c' * size, content[position:position+size])
target = ''.join([chr(ord(a)) for a in temp])
Alternatively, you could try using SHGetFolderPath(). The following code might work, but I'm not on a Windows machine right now so I can't test it.
import ctypes
shell32 = ctypes.windll.shell32
# allocate MAX_PATH bytes in buffer
video_folder_path = ctypes.create_string_buffer(260)
# 0xE is CSIDL_MYVIDEO
# 0 is SHGFP_TYPE_CURRENT
# If you want a Unicode path, use SHGetFolderPathW instead
if shell32.SHGetFolderPathA(None, 0xE, None, 0, video_folder_path) >= 0:
# success, video_folder_path now contains the correct path
else:
# error
Basically call the Windows API directly. Here is a good example found after Googling:
import os, sys
import pythoncom
from win32com.shell import shell, shellcon
shortcut = pythoncom.CoCreateInstance (
shell.CLSID_ShellLink,
None,
pythoncom.CLSCTX_INPROC_SERVER,
shell.IID_IShellLink
)
desktop_path = shell.SHGetFolderPath (0, shellcon.CSIDL_DESKTOP, 0, 0)
shortcut_path = os.path.join (desktop_path, "python.lnk")
persist_file = shortcut.QueryInterface (pythoncom.IID_IPersistFile)
persist_file.Load (shortcut_path)
shortcut.SetDescription ("Updated Python %s" % sys.version)
mydocs_path = shell.SHGetFolderPath (0, shellcon.CSIDL_PERSONAL, 0, 0)
shortcut.SetWorkingDirectory (mydocs_path)
persist_file.Save (shortcut_path, 0)
This is from http://timgolden.me.uk/python/win32_how_do_i/create-a-shortcut.html.
You can search for "python ishelllink" for other examples.
Also, the API reference helps too: http://msdn.microsoft.com/en-us/library/bb774950(VS.85).aspx
I also realize this question is old, but I found the answers to be most relevant to my situation.
Like Jared's answer, I also could not use the win32com module. So Jared's use of the binary structure from MS-SHLLINK got me part of the way there. I needed read shortcuts on both Windows and Linux, where the shortcuts are created on a samba share by Windows. Jared's implementation didn't quite work for me, I think only because I encountered some different variations on the shortcut format. But, it gave me the start I needed (thanks Jared).
So, here is a class named MSShortcut which expands on Jared's approach. However, the implementation is only Python3.4 and above, due to using some pathlib features added in Python3.4
#!/usr/bin/python3
# Link Format from MS: https://msdn.microsoft.com/en-us/library/dd871305.aspx
# Need to be able to read shortcut target from .lnk file on linux or windows.
# Original inspiration from: http://stackoverflow.com/questions/397125/reading-the-target-of-a-lnk-file-in-python
from pathlib import Path, PureWindowsPath
import struct, sys, warnings
if sys.hexversion < 0x03040000:
warnings.warn("'{}' module requires python3.4 version or above".format(__file__), ImportWarning)
# doc says class id =
# 00021401-0000-0000-C000-000000000046
# requiredCLSID = b'\x00\x02\x14\x01\x00\x00\x00\x00\xC0\x00\x00\x00\x00\x00\x00\x46'
# Actually Getting:
requiredCLSID = b'\x01\x14\x02\x00\x00\x00\x00\x00\xC0\x00\x00\x00\x00\x00\x00\x46' # puzzling
class ShortCutError(RuntimeError):
pass
class MSShortcut():
"""
interface to Microsoft Shortcut Objects. Purpose:
- I need to be able to get the target from a samba shared on a linux machine
- Also need to get access from a Windows machine.
- Need to support several forms of the shortcut, as they seem be created differently depending on the
creating machine.
- Included some 'flag' types in external interface to help test differences in shortcut types
Args:
scPath (str): path to shortcut
Limitations:
- There are some omitted object properties in the implementation.
Only implemented / tested enough to recover the shortcut target information. Recognized omissions:
- LinkTargetIDList
- VolumeId structure (if captured later, should be a separate class object to hold info)
- Only captured environment block from extra data
- I don't know how or when some of the shortcut information is used, only captured what I recognized,
so there may be bugs related to use of the information
- NO shortcut update support (though might be nice)
- Implementation requires python 3.4 or greater
- Tested only with Unicode data on a 64bit little endian machine, did not consider potential endian issues
Not Debugged:
- localBasePath - didn't check if parsed correctly or not.
- commonPathSuffix
- commonNetworkRelativeLink
"""
def __init__(self, scPath):
"""
Parse and keep shortcut properties on creation
"""
self.scPath = Path(scPath)
self._clsid = None
self._lnkFlags = None
self._lnkInfoFlags = None
self._localBasePath = None
self._commonPathSuffix = None
self._commonNetworkRelativeLink = None
self._name = None
self._relativePath = None
self._workingDir = None
self._commandLineArgs = None
self._iconLocation = None
self._envTarget = None
self._ParseLnkFile(self.scPath)
#property
def clsid(self):
return self._clsid
#property
def lnkFlags(self):
return self._lnkFlags
#property
def lnkInfoFlags(self):
return self._lnkInfoFlags
#property
def localBasePath(self):
return self._localBasePath
#property
def commonPathSuffix(self):
return self._commonPathSuffix
#property
def commonNetworkRelativeLink(self):
return self._commonNetworkRelativeLink
#property
def name(self):
return self._name
#property
def relativePath(self):
return self._relativePath
#property
def workingDir(self):
return self._workingDir
#property
def commandLineArgs(self):
return self._commandLineArgs
#property
def iconLocation(self):
return self._iconLocation
#property
def envTarget(self):
return self._envTarget
#property
def targetPath(self):
"""
Args:
woAnchor (bool): remove the anchor (\\server\path or drive:) from returned path.
Returns:
a libpath PureWindowsPath object for combined workingDir/relative path
or the envTarget
Raises:
ShortCutError when no target path found in Shortcut
"""
target = None
if self.workingDir:
target = PureWindowsPath(self.workingDir)
if self.relativePath:
target = target / PureWindowsPath(self.relativePath)
else: target = None
if not target and self.envTarget:
target = PureWindowsPath(self.envTarget)
if not target:
raise ShortCutError("Unable to retrieve target path from MS Shortcut: shortcut = {}"
.format(str(self.scPath)))
return target
#property
def targetPathWOAnchor(self):
tp = self.targetPath
return tp.relative_to(tp.anchor)
def _ParseLnkFile(self, lnkPath):
with lnkPath.open('rb') as f:
content = f.read()
# verify size (4 bytes)
hdrSize = struct.unpack('I', content[0x00:0x04])[0]
if hdrSize != 0x4C:
raise ShortCutError("MS Shortcut HeaderSize = {}, but required to be = {}: shortcut = {}"
.format(hdrSize, 0x4C, str(lnkPath)))
# verify LinkCLSID id (16 bytes)
self._clsid = bytes(struct.unpack('B'*16, content[0x04:0x14]))
if self._clsid != requiredCLSID:
raise ShortCutError("MS Shortcut ClassID = {}, but required to be = {}: shortcut = {}"
.format(self._clsid, requiredCLSID, str(lnkPath)))
# read the LinkFlags structure (4 bytes)
self._lnkFlags = struct.unpack('I', content[0x14:0x18])[0]
#logger.debug("lnkFlags=0x%0.8x" % self._lnkFlags)
position = 0x4C
# if HasLinkTargetIDList bit, then position to skip the stored IDList structure and header
if (self._lnkFlags & 0x01):
idListSize = struct.unpack('H', content[position:position+0x02])[0]
position += idListSize + 2
# if HasLinkInfo, then process the linkinfo structure
if (self._lnkFlags & 0x02):
(linkInfoSize, linkInfoHdrSize, self._linkInfoFlags,
volIdOffset, localBasePathOffset,
cmnNetRelativeLinkOffset, cmnPathSuffixOffset) = struct.unpack('IIIIIII', content[position:position+28])
# check for optional offsets
localBasePathOffsetUnicode = None
cmnPathSuffixOffsetUnicode = None
if linkInfoHdrSize >= 0x24:
(localBasePathOffsetUnicode, cmnPathSuffixOffsetUnicode) = struct.unpack('II', content[position+28:position+36])
#logger.debug("0x%0.8X" % linkInfoSize)
#logger.debug("0x%0.8X" % linkInfoHdrSize)
#logger.debug("0x%0.8X" % self._linkInfoFlags)
#logger.debug("0x%0.8X" % volIdOffset)
#logger.debug("0x%0.8X" % localBasePathOffset)
#logger.debug("0x%0.8X" % cmnNetRelativeLinkOffset)
#logger.debug("0x%0.8X" % cmnPathSuffixOffset)
#logger.debug("0x%0.8X" % localBasePathOffsetUnicode)
#logger.debug("0x%0.8X" % cmnPathSuffixOffsetUnicode)
# if info has a localBasePath
if (self._linkInfoFlags & 0x01):
bpPosition = position + localBasePathOffset
# not debugged - don't know if this works or not
self._localBasePath = UnpackZ('z', content[bpPosition:])[0].decode('ascii')
#logger.debug("localBasePath: {}".format(self._localBasePath))
if localBasePathOffsetUnicode:
bpPosition = position + localBasePathOffsetUnicode
self._localBasePath = UnpackUnicodeZ('z', content[bpPosition:])[0]
self._localBasePath = self._localBasePath.decode('utf-16')
#logger.debug("localBasePathUnicode: {}".format(self._localBasePath))
# get common Path Suffix
cmnSuffixPosition = position + cmnPathSuffixOffset
self._commonPathSuffix = UnpackZ('z', content[cmnSuffixPosition:])[0].decode('ascii')
#logger.debug("commonPathSuffix: {}".format(self._commonPathSuffix))
if cmnPathSuffixOffsetUnicode:
cmnSuffixPosition = position + cmnPathSuffixOffsetUnicode
self._commonPathSuffix = UnpackUnicodeZ('z', content[cmnSuffixPosition:])[0]
self._commonPathSuffix = self._commonPathSuffix.decode('utf-16')
#logger.debug("commonPathSuffix: {}".format(self._commonPathSuffix))
# check for CommonNetworkRelativeLink
if (self._linkInfoFlags & 0x02):
relPosition = position + cmnNetRelativeLinkOffset
self._commonNetworkRelativeLink = CommonNetworkRelativeLink(content, relPosition)
position += linkInfoSize
# If HasName
if (self._lnkFlags & 0x04):
(position, self._name) = self.readStringObj(content, position)
#logger.debug("name: {}".format(self._name))
# get relative path string
if (self._lnkFlags & 0x08):
(position, self._relativePath) = self.readStringObj(content, position)
#logger.debug("relPath='{}'".format(self._relativePath))
# get working dir string
if (self._lnkFlags & 0x10):
(position, self._workingDir) = self.readStringObj(content, position)
#logger.debug("workingDir='{}'".format(self._workingDir))
# get command line arguments
if (self._lnkFlags & 0x20):
(position, self._commandLineArgs) = self.readStringObj(content, position)
#logger.debug("commandLineArgs='{}'".format(self._commandLineArgs))
# get icon location
if (self._lnkFlags & 0x40):
(position, self._iconLocation) = self.readStringObj(content, position)
#logger.debug("iconLocation='{}'".format(self._iconLocation))
# look for environment properties
if (self._lnkFlags & 0x200):
while True:
size = struct.unpack('I', content[position:position+4])[0]
#logger.debug("blksize=%d" % size)
if size==0: break
signature = struct.unpack('I', content[position+4:position+8])[0]
#logger.debug("signature=0x%0.8x" % signature)
# EnvironmentVariableDataBlock
if signature == 0xA0000001:
if (self._lnkFlags & 0x80): # unicode
self._envTarget = UnpackUnicodeZ('z', content[position+268:])[0]
self._envTarget = self._envTarget.decode('utf-16')
else:
self._envTarget = UnpackZ('z', content[position+8:])[0].decode('ascii')
#logger.debug("envTarget='{}'".format(self._envTarget))
position += size
def readStringObj(self, scContent, position):
"""
returns:
tuple: (newPosition, string)
"""
strg = ''
size = struct.unpack('H', scContent[position:position+2])[0]
#logger.debug("workingDirSize={}".format(size))
if (self._lnkFlags & 0x80): # unicode
size *= 2
strg = struct.unpack(str(size)+'s', scContent[position+2:position+2+size])[0]
strg = strg.decode('utf-16')
else:
strg = struct.unpack(str(size)+'s', scContent[position+2:position+2+size])[0].decode('ascii')
#logger.debug("strg='{}'".format(strg))
position += size + 2 # 2 bytes to account for CountCharacters field
return (position, strg)
class CommonNetworkRelativeLink():
def __init__(self, scContent, linkContentPos):
self._networkProviderType = None
self._deviceName = None
self._netName = None
(linkSize, flags, netNameOffset,
devNameOffset, self._networkProviderType) = struct.unpack('IIIII', scContent[linkContentPos:linkContentPos+20])
#logger.debug("netnameOffset = {}".format(netNameOffset))
if netNameOffset > 0x014:
(netNameOffsetUnicode, devNameOffsetUnicode) = struct.unpack('II', scContent[linkContentPos+20:linkContentPos+28])
#logger.debug("netnameOffsetUnicode = {}".format(netNameOffsetUnicode))
self._netName = UnpackUnicodeZ('z', scContent[linkContentPos+netNameOffsetUnicode:])[0]
self._netName = self._netName.decode('utf-16')
self._deviceName = UnpackUnicodeZ('z', scContent[linkContentPos+devNameOffsetUnicode:])[0]
self._deviceName = self._deviceName.decode('utf-16')
else:
self._netName = UnpackZ('z', scContent[linkContentPos+netNameOffset:])[0].decode('ascii')
self._deviceName = UnpackZ('z', scContent[linkContentPos+devNameOffset:])[0].decode('ascii')
#property
def deviceName(self):
return self._deviceName
#property
def netName(self):
return self._netName
#property
def networkProviderType(self):
return self._networkProviderType
def UnpackZ (fmt, buf) :
"""
Unpack Null Terminated String
"""
#logger.debug(bytes(buf))
while True :
pos = fmt.find ('z')
if pos < 0 :
break
z_start = struct.calcsize (fmt[:pos])
z_len = buf[z_start:].find(b'\0')
#logger.debug(z_len)
fmt = '%s%dsx%s' % (fmt[:pos], z_len, fmt[pos+1:])
#logger.debug("fmt='{}', len={}".format(fmt, z_len))
fmtlen = struct.calcsize(fmt)
return struct.unpack (fmt, buf[0:fmtlen])
def UnpackUnicodeZ (fmt, buf) :
"""
Unpack Null Terminated String
"""
#logger.debug(bytes(buf))
while True :
pos = fmt.find ('z')
if pos < 0 :
break
z_start = struct.calcsize (fmt[:pos])
# look for null bytes by pairs
z_len = 0
for i in range(z_start,len(buf),2):
if buf[i:i+2] == b'\0\0':
z_len = i-z_start
break
fmt = '%s%dsxx%s' % (fmt[:pos], z_len, fmt[pos+1:])
# logger.debug("fmt='{}', len={}".format(fmt, z_len))
fmtlen = struct.calcsize(fmt)
return struct.unpack (fmt, buf[0:fmtlen])
I hope this helps others as well.
Thanks
I didn't really like any of the answers available because I didn't want to keep importing more and more libraries and the 'shell' option was spotty on my test machines. I opted for reading the ".lnk" in and then using a regular expression to read out the path. For my purposes, I am looking for pdf files that were recently opened and then reading the content of those files:
# Example file path to the shortcut
shortcut = "shortcutFileName.lnk"
# Open the lnk file using the ISO-8859-1 encoder to avoid errors for special characters
lnkFile = open(shortcut, 'r', encoding = "ISO-8859-1")
# Use a regular expression to parse out the pdf file on C:\
filePath = re.findall("C:.*?pdf", lnkFile.read(), flags=re.DOTALL)
# Close File
lnkFile.close()
# Read the pdf at the lnk Target
pdfFile = open(tmpFilePath[0], 'rb')
Comments:
Obviously this works for pdf but needs to specify other file extensions accordingly.
It's easy as opening ".exe" file. Here also, we are going to use the os module for this. You just have to create a shortcut .lnk and store it in any folder of your choice. Then, in any Python file, first import the os module (already installed, just import). Then, use a variable, say path, and assign it a string value containing the location of your .lnk file. Just create a shortcut of your desired application. At last, we will use os.startfile()
to open our shortcut.
Points to remember:
The location should be within double inverted commas.
Most important, open Properties. Then, under that, open "Details". There, you can get the exact name of your shortcut. Please write that name with ".lnk" at last.
Now, you have completed the procedure. I hope it helps you. For additional assistance, I am leaving my code for this at the bottom.
import os
path = "C:\\Users\\hello\\OneDrive\\Desktop\\Shortcuts\\OneNote for Windows 10.lnk"
os.startfile(path)
In my code, I used path as variable and I had created a shortcut for OneNote. In path, I defined the location of OneNote's shortcut. So when I use os.startfile(path), the os module is going to open my shortcut file defined in variable path.
this job is possible without any modules, doing this will return a b string having the destination of the shortcut file. Basically what you do is you open the file in read binary mode (rb mode). This is the code to accomplish this task:
with open('programs.lnk - Copy','rb') as f:
destination=f.read()
i am currently using python 3.9.2, in case you face problems with this, just tell me and i will try to fix it.
A more stable solution in python, using powershell to read the target path from the .lnk file.
using only standard libraries avoids introducing extra dependencies such as win32com
this approach works with the .lnks that failed with jared's answer, more details
we avoid directly reading the file, which felt hacky, and sometimes failed
import subprocess
def get_target(link_path) -> (str, str):
"""
Get the target & args of a Windows shortcut (.lnk)
:param link_path: The Path or string-path to the shortcut, e.g. "C:\\Users\\Public\\Desktop\\My Shortcut.lnk"
:return: A tuple of the target and arguments, e.g. ("C:\\Program Files\\My Program.exe", "--my-arg")
"""
# get_target implementation by hannes, https://gist.github.com/Winand/997ed38269e899eb561991a0c663fa49
ps_command = \
"$WSShell = New-Object -ComObject Wscript.Shell;" \
"$Shortcut = $WSShell.CreateShortcut(\"" + str(link_path) + "\"); " \
"Write-Host $Shortcut.TargetPath ';' $shortcut.Arguments "
output = subprocess.run(["powershell.exe", ps_command], capture_output=True)
raw = output.stdout.decode('utf-8')
launch_path, args = [x.strip() for x in raw.split(';', 1)]
return launch_path, args
# to test
shortcut_file = r"C:\Users\REPLACE_WITH_USERNAME\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Accessibility\Narrator.lnk"
a, args = get_target(shortcut_file)
print(a) # C:\WINDOWS\system32\narrator.exe
(you can remove -> typehinting to get it to work in older python versions)
I did notice this is slow when running on lots of shortcuts. You could use jareds method, check if the result is None, and if so, run this code to get the target path.
The nice approach with direct regex-based parsing (proposed in the answer) didn't work reliable for all shortcuts in my case. Some of them have only relative path like ..\\..\\..\\..\\..\\..\\Program Files\\ImageGlass\\ImageGlass.exe (produced by msi-installer), and it is stored with wide chars, which are tricky to handle in Python.
So I've discovered a Python module LnkParse3, which is easy to use and meets my needs.
Here is a sample script to show target of a lnk-file passed as first argument:
import LnkParse3
import sys
with open(sys.argv[1], 'rb') as indata:
lnk = LnkParse3.lnk_file(indata)
print(lnk.lnk_command)
I arrived at this thread looking for a way to parse a ".lnk" file and get the target file name.
I found another very simple solution:
pip install comtypes
Then
from comtypes.client import CreateObject
from comtypes.persist import IPersistFile
from comtypes.shelllink import ShellLink
# MAKE SURE THIS VAT CONTAINS A STRING AND NOT AN OBJECT OF 'PATH'
# I spent too much time figuring out the problem with .load(..) function ahead
pathStr="c:\folder\yourlink.lnk"
s = CreateObject(ShellLink)
p = s.QueryInterface(IPersistFile)
p.Load(pathStr, False)
print(s.GetPath())
print(s.GetArguments())
print(s.GetWorkingDirectory())
print(s.GetIconLocation())
try:
# the GetDescription create exception in some of the links
print(s.GetDescription())
except Exception as e:
print(e)
print(s.Hotkey)
print(s.ShowCmd)
Based on this great answer...
https://stackoverflow.com/a/43856809/2992810

Categories

Resources