How to programmatically calculate Chrome extension ID? - python

I'm building an automated process to produce extensions. Is there a code example of calculating the extension-ID directly and entirely bypassing interaction with the browser?
(I'm answering my own question, below.)

I was only able to find a related article with a Ruby fragment, and it's only available in the IA: http://web.archive.org/web/20120606044635/http://supercollider.dk/2010/01/calculating-chrome-extension-id-from-your-private-key-233
Important to know:
This depends on a DER-encoded public key (raw binary), not a PEM-encoded key (nice ASCII generated by base64-encoding the DER key).
The extension-IDs are base-16, but are encoded using [a-p] (called "mpdecimal"), rather than [0-9a-f].
Using a PEM-encoded public key, follow the following steps:
If your PEM-formatted public-key still has the header and footer and is split into multiple lines, reformat it by hand so that you have a single string of characters that excludes the header and footer, and runs together such that every line of the key wraps to the next.
Base64-decode the public key to render a DER-formatted public-key.
Generate a SHA256 hex-digest of the DER-formatted key.
Take the first 32-bytes of the hash. You will not need the rest.
For each character, convert it to base-10, and add the ASCII code for 'a'.
The following is a Python routine to do this:
import hashlib
from base64 import b64decode
def build_id(pub_key_pem):
pub_key_der = b64decode(pub_key_pem)
sha = hashlib.sha256(pub_key_der).hexdigest()
prefix = sha[:32]
reencoded = ""
ord_a = ord('a')
for old_char in prefix:
code = int(old_char, 16)
new_char = chr(ord_a + code)
reencoded += new_char
return reencoded
def main():
pub_key = 'MIGfMA0GCSqGSIb3DQEBAQUAA4GNADCBiQKBgQCjvF5pjuK8gRaw/2LoRYi37QqRd48B/FeO9yFtT6ueY84z/u0NrJ/xbPFc9OCGBi8RKIblVvcbY0ySGqdmp0QsUr/oXN0b06GL4iB8rMhlO082HhMzrClV8OKRJ+eJNhNBl8viwmtJs3MN0x9ljA4HQLaAPBA9a14IUKLjP0pWuwIDAQAB'
id_ = build_id(pub_key)
print(id_)
if __name__ == '__main__':
main()
You're more than welcome to test this against an existing extension and its ID. To retrieve its PEM-formatted public-key:
Go into the list of your existing extensions in Chrome. Grab the extension-ID of one.
Find the directory where the extension is hosted. On my Windows 7 box, it is: C:\Users<username>\AppData\Local\Google\Chrome\User Data\Default\Extensions<extension ID>
Grab the public-key from the manifest.json file under "key". Since the key is already ready to be base64-decoded, you can skip step (1) of the process.
The public-key in the example is from the "Chrome Reader" extension. Its extension ID is "lojpenhmoajbiciapkjkiekmobleogjc".
See also:
Google Chrome - Alphanumeric hashes to identify extensions
http://blog.roomanna.com/12-14-2010/getting-an-extensions-id

Starting with Chrome 64, Chrome changed the package format for extensions to the CRX₃ file format, which supports multiple signatures and explicitly declares its CRX ID. Extracting the CRX ID from a CRX₃ file requires parsing a protocol buffer.
Here is a small python script for extracting the ID from a CRX₃ file.
This solution should only be used with trusted CRX₃ files or in contexts where security is not a concern: unlike CRX₂, the package format does not restrict what CRX ID a CRX₃ file declares. (In practice, consumers of the file (i.e. Chrome) will place restrictions upon it, such as requiring the file to be signed with at least one key that hashes to the declared CRX ID).
import binascii
import string
import struct
import sys
def decode(proto, data):
index = 0
length = len(data)
msg = dict()
while index < length:
item = 128
key = 0
left = 0
while item & 128:
item = data[index]
index += 1
value = (item & 127) << left
key += value
left += 7
field = key >> 3
wire = key & 7
if wire == 0:
item = 128
num = 0
left = 0
while item & 128:
item = data[index]
index += 1
value = (item & 127) << left
num += value
left += 7
continue
elif wire == 1:
index += 8
continue
elif wire == 2:
item = 128
_length = 0
left = 0
while item & 128:
item = data[index]
index += 1
value = (item & 127) << left
_length += value
left += 7
last = index
index += _length
item = data[last:index]
if field not in proto:
continue
msg[proto[field]] = item
continue
elif wire == 5:
index += 4
continue
raise ValueError(
'invalid wire type: {wire}'.format(wire=wire)
)
return msg
def get_extension_id(crx_file):
with open(crx_file, 'rb') as f:
f.read(8); # 'Cr24\3\0\0\0'
data = f.read(struct.unpack('<I', f.read(4))[0])
crx3 = decode(
{10000: "signed_header_data"},
[ord(d) for d in data])
signed_header = decode(
{1: "crx_id"},
crx3['signed_header_data'])
return string.translate(
binascii.hexlify(bytearray(signed_header['crx_id'])),
string.maketrans('0123456789abcdef', string.ascii_lowercase[:16]))
def main():
if len(sys.argv) != 2:
print 'usage: %s crx_file' % sys.argv[0]
else:
print get_extension_id(sys.argv[1])
if __name__ == "__main__":
main()
(Thanks to https://github.com/thelinuxkid/python-protolite for the protobuf parser skeleton.)

A nice and simple way to get the public key from the .crx file using python, since chrome only generates the private .pem key for you. The public key is actually stored in the .crx file.
This is based on the format of the .crx file found here http://developer.chrome.com/extensions/crx.html
import struct
import hashlib
import string
def get_pub_key_from_crx(crx_file):
with open(crx_file, 'rb') as f:
data = f.read()
header = struct.unpack('<4sIII', data[:16])
pubkey = struct.unpack('<%ds' % header[2], data[16:16+header[2]])[0]
return pubkey
def get_extension_id(crx_file):
pubkey = get_pub_key_from_crx(crx_file)
digest = hashlib.sha256(pubkey).hexdigest()
trans = string.maketrans('0123456789abcdef', string.ascii_lowercase[:16])
return string.translate(digest[:32], trans)
if __name__ == '__main__':
import sys
if len(sys.argv) != 2:
print 'usage: %s crx_file' % sys.argv[0]
print get_extension_id(sys.argv[1])
Although this isn't possible to do "bypassing interaction with the browser", because you still need to generate the .crx file with a command like
chrome.exe --pack-extension=my_extension --pack-extension-key=my_extension.pem

Related

How to Read-in WPF extension in Python?

1I am trying to read-in some old text document file using python.
This file written in 1995 and has a file extension of ".WPF"
I had tried
f = open('/Users/zachary/Downloads/2R.WPF', mode = 'r')
print(f.read())
If I open it up through libreoffice, it well appears.
Any hint how to process text in .WPF using python?
linke address:
WTO Dispute Settlment DS2 Panel Report
Someone had marked it as the duplicated under the notion that the file is just wrongly named in WPF, however, it looks it's not a .doc file since the textract.process returns the error "it's not .doc"
As can be determined from the very first bytes, that file is a WordPerfect 5.x file (where x is 0, 1, or possibly 2), a file format dating back to around 1989.
According to its description, the Tika interface for Python should be able to convert this for you, but as far as word processor formats go, these older WordPerfect files are fairly easy to decode, without anything more than a plain Python installation.
The format consists of a large header (which, among other information, defines the printer that the document was formatted for, the list of fonts used, and some basic "style" information – I chose to skip it entirely in my program below), followed by plain text which is interspersed with binary codes which govern the formatting.
The binary codes appear in 3 distinct forms:
single-byte: 0x0A is a Return, 0xA9 is a breaking hyphen, 0xAA is a breaking hyphen when the line is broken at that position, and so on.
multi-byte, fixed length: the byte is followed by one or more specifications. For example, 0xC0 is a "special character code". It is followed by the character set index and the index of the actual character inside that set. The final byte of a fixed-length code is always the starting byte again.
multi-byte, variable length: the code determines a main category of formatting and is followed by a second to indicate a subcategory; after that, 2 bytes in little-endian indicate the length of the following data (excluding the first 4 bytes). This code always ends with the same items in reversed order: 2 bytes (little-endian) for the length, the subcategory, then the main category.
Codes between 0x00..0x1F and 0x7F..0xBF are single-byte control codes (not all are used). Codes between 0xC0..0xCF are fixed-length control codes, with various predefined lengths. Codes from 0xD0 onward are always variable-length codes.
With only this information, it's already possible to extract the plain text of this document, and just skip all possible formatting. Comparing the output codes against the PDF from the same site reveals the meaning of some of the codes, such as the various types of Return, the Tab, and plain text formatting such as bold and italics.
In addition, footnotes are stored in-line inside a variable-length code, so this needs some form of re-entrant parser.
The following Python 3 program is a basic framework which you can use as-is (it extracts the text, with a hint for the footnotes), or you can enable the commented-out lines at the bottom and find further information on parsing more of the formatting code.
# -*- coding: utf-8 -*-
import sys
WPType_None = 0
WPType_Text = 1
WPType_Byte = 2
WPType_Fixed = 3
WPType_Variable = 4
plain_remap = {10:'\n', 11:' ', 13:' ', 160:' ', 169:'-', 170:'-'}
WpCharacterSet = { 0x0121:'à', 0x0406:'§', 0x041c:u'’', 0x041d:u'‘', 0x041f:'”', 0x0420:'“', 0x0422:'—' }
textAttributes = [
"Extra Large",
"Very Large",
"Large",
"Small",
"Fine",
"Superscript",
"Subscript",
"Outline",
"Italic",
"Shadow",
"Redline",
"Double Underline",
"Bold",
"Strikeout",
"Underline",
"SmallCaps" ]
class WPElem:
def __init__(self, type=WPType_None, data = [], code=None):
self.type = type
self.code = code
if type == WPType_Text:
self.data = data
else:
self.data = data
class WordPerfect:
def __init__(self, filename):
with open(filename, "rb") as file:
self.data = bytearray(file.read())
sig = ''.join(chr(x) for x in self.data[1:4])
if self.data[0] != 255 or sig != 'WPC':
raise TypeError('Invalid file type')
self.data_start = self.data[4]+256*(self.data[5]+256*(self.data[6]+256*self.data[7]))
self.length = len(self.data)
self.elements = []
self.parse (self.data_start, self.length)
def parse (self, start,maxlength):
pos = start
while pos < maxlength:
byte = self.data[pos]
if byte in plain_remap:
byte = ord(plain_remap[byte])
if byte == 10 or byte >= 32 and byte <= 126:
if len(self.elements) == 0 or self.elements[-1].type != WPType_Text:
self.elements.append(WPElem(WPType_Text, ''))
self.elements[-1].data += chr(byte)
pos += 1
elif byte == 12:
self.elements.append(WPElem(WPType_Text, '\n\n'))
pos += 1
elif byte == 0x8c: # [HRt/Pg Break]
self.elements.append(WPElem(WPType_Text, '\n'))
pos += 1
elif byte == 0x8d: # [Ftn Num]
self.elements.append(WPElem(WPType_Text, '[Ftn Num]'))
pos += 1
elif byte == 0x99: # [HRt/Top of Pg]
self.elements.append(WPElem(WPType_Text, '\n'))
pos += 1
elif byte == 0xc0 and pos+3 < maxlength and self.data[pos+3] == 0xc0:
wpchar = self.data[pos+1]+256*self.data[pos+2]
if wpchar in WpCharacterSet:
self.elements.append(WPElem(WPType_Text, WpCharacterSet[wpchar]))
else:
self.elements.append(WPElem(WPType_Text, '{CHAR:%04X}' % wpchar))
pos += 4
elif byte == 0xc1 and self.data[pos+8] == 0xc1:
# self.elements.append(WPElem(WPType_Fixed, self.data[pos:pos+7]))
self.elements.append(WPElem(WPType_Text, '\t'))
pos += 9
elif byte == 0xc2 and self.data[pos+10] == 0xc2:
# self.elements.append(WPElem(WPType_Fixed, self.data[pos:pos+9]))
self.elements.append(WPElem(WPType_Text, '\t'))
pos += 11
elif byte == 0xc3:
self.elements.append(WPElem(WPType_Fixed, self.data[pos:pos+1], '%s On' % textAttributes[self.data[pos+1]]))
pos += 3
elif byte == 0xc4:
self.elements.append(WPElem(WPType_Fixed, self.data[pos:pos+1], '%s Off' % textAttributes[self.data[pos+1]]))
pos += 3
elif byte == 0xc6:
self.elements.append(WPElem(WPType_Fixed, self.data[pos:pos+5]))
pos += 6
elif byte == 0xd6 and self.data[pos+1] == 0: # Footnote
self.elements.append(WPElem(WPType_Text, '[Footnote:'))
length = self.data[pos+2]+256*self.data[pos+3]
self.parse (pos+0x13, pos+length)
pos += 4+length
self.elements.append(WPElem(WPType_Text, ']'))
else:
self.elements.append(WPElem(WPType_Byte, [byte]))
if byte >= 0xd0 and pos+4 <= maxlength:
length = self.data[pos+2]+256*self.data[pos+3]
if pos+4+length <= self.length:
if pos+4+length <= self.length and self.data[pos+4+length-1] == byte:
self.elements[-1].type = WPType_Variable
self.elements[-1].data += [x for x in self.data[pos+1:pos+length]]
pos += 4+length
else:
pos += 1
else:
pos += 1
else:
pos += 1
if len(sys.argv) != 2:
print("usage: read_wpf.py [suitably ancient WordPerfect file]")
sys.exit(1)
wpdata = WordPerfect (sys.argv[1])
for i in wpdata.elements:
if i.type == WPType_Text:
print (i.data, end='')
'''
elif i.code:
print ('[%s]' % i.code, end='')
elif i.type == WPType_Variable:
print ('[%02X:%d]' % (i.data[0],i.data[1]), end='')
else:
print ('[%02X]' % i.data[0], end='')
'''
Running it prints out the text to the console:
$ python3 read_wpf.py 2R.WPF
RESTRICTED
World Trade WT/DS2/R
29 January 1996
Organization
(96-0326)
(.. several thousands of lines omitted for brevity ..)
8.2 The Panel recommends that the Dispute Settlement Body request the
United States to bring this part of the Gasoline Rule into conformity
with its obligations under the General Agreement.
and you can either rewrite the program to store this into a plain text file, or redirect via your console into a file.
I've only added a translation for the handful of special characters that appear in the sample file. For a full-featured version, you'd need to look up a 90s data sheet somewhere, and provide Unicode translations for each of the thousands of characters.
Similarly, I've only 'parsed' some of the special formatting codes, and to a very limited extend. If you need to be able to extract particular formatting – say, tab settings, margins, font sizes, et cetera –, you must locate a full specification of the file format and add specific parsing code for these functions.

Python 3.6 script surprisingly slow on Windows 10 but not on Ubuntu 17.10

I recently had to write a challenge for a company that was to merge 3 CSV files into one based on the first attribute of each (the attributes were repeating in all files).
I wrote the code and sent it to them, but they said it took 2 minutes to run. That was funny because it ran for 10 seconds on my machine. My machine had the same processor, 16GB of RAM, and had an SSD as well. Very similar environments.
I tried optimising it and resubmitted it. This time they said they ran it on an Ubuntu machine and got 11 seconds, while the code ran for 100 seconds on the Windows 10 still.
Another peculiar thing was that when I tried profiling it with the Profile module, it went on forever, had to terminate after 450 seconds. I moved to cProfiler and it recorded it for 7 seconds.
EDIT: The exact formulation of the problem is
Write a console program to merge the files provided in a timely and
efficient manner. File paths should be supplied as arguments so that
the program can be evaluated on different data sets. The merged file
should be saved as CSV; use the id column as the unique key for
merging; the program should do any necessary data cleaning and error
checking.
Feel free to use any language you’re comfortable with – only
restriction is no external libraries as this defeats the purpose of
the test. If the language provides CSV parsing libraries (like
Python), please avoid using them as well as this is a part of the
test.
Without further ado here's the code:
#!/usr/bin/python3
import sys
from multiprocessing import Pool
HEADERS = ['id']
def csv_tuple_quotes_valid(a_tuple):
"""
checks if a quotes in each attribute of a entry (i.e. a tuple) agree with the csv format
returns True or False
"""
for attribute in a_tuple:
in_quotes = False
attr_len = len(attribute)
skip_next = False
for i in range(0, attr_len):
if not skip_next and attribute[i] == '\"':
if i < attr_len - 1 and attribute[i + 1] == '\"':
skip_next = True
continue
elif i == 0 or i == attr_len - 1:
in_quotes = not in_quotes
else:
return False
else:
skip_next = False
if in_quotes:
return False
return True
def check_and_parse_potential_tuple(to_parse):
"""
receives a string and returns an array of the attributes of the csv line
if the string was not a valid csv line, then returns False
"""
a_tuple = []
attribute_start_index = 0
to_parse_len = len(to_parse)
in_quotes = False
i = 0
#iterate through the string (line from the csv)
while i < to_parse_len:
current_char = to_parse[i]
#this works the following way: if we meet a quote ("), it must be in one
#of five cases: "" | ", | ," | "\0 | (start_of_string)"
#in case we are inside a quoted attribute (i.e. "123"), then commas are ignored
#the following code also extracts the tuples' attributes
if current_char == '\"':
if i == 0 or (to_parse[i - 1] == ',' and not in_quotes): # (start_of_string)" and ," case
#not including the quote in the next attr
attribute_start_index = i + 1
#starting a quoted attr
in_quotes = True
elif i + 1 < to_parse_len:
if to_parse[i + 1] == '\"': # "" case
i += 1 #skip the next " because it is part of a ""
elif to_parse[i + 1] == ',' and in_quotes: # ", case
a_tuple.append(to_parse[attribute_start_index:i].strip())
#not including the quote and comma in the next attr
attribute_start_index = i + 2
in_quotes = False #the quoted attr has ended
#skip the next comma - we know what it is for
i += 1
else:
#since we cannot have a random " in the middle of an attr
return False
elif i == to_parse_len - 1: # "\0 case
a_tuple.append(to_parse[attribute_start_index:i].strip())
#reached end of line, so no more attr's to extract
attribute_start_index = to_parse_len
in_quotes = False
else:
return False
elif current_char == ',':
if not in_quotes:
a_tuple.append(to_parse[attribute_start_index:i].strip())
attribute_start_index = i + 1
i += 1
#in case the last attr was left empty or unquoted
if attribute_start_index < to_parse_len or (not in_quotes and to_parse[-1] == ','):
a_tuple.append(to_parse[attribute_start_index:])
#line ended while parsing; i.e. a quote was openned but not closed
if in_quotes:
return False
return a_tuple
def parse_tuple(to_parse, no_of_headers):
"""
parses a string and returns an array with no_of_headers number of headers
raises an error if the string was not a valid CSV line
"""
#get rid of the newline at the end of every line
to_parse = to_parse.strip()
# return to_parse.split(',') #if we assume the data is in a valid format
#the following checking of the format of the data increases the execution
#time by a factor of 2; if the data is know to be valid, uncomment 3 lines above here
#if there are more commas than fields, then we must take into consideration
#how the quotes parse and then extract the attributes
if to_parse.count(',') + 1 > no_of_headers:
result = check_and_parse_potential_tuple(to_parse)
if result:
a_tuple = result
else:
raise TypeError('Error while parsing CSV line %s. The quotes do not parse' % to_parse)
else:
a_tuple = to_parse.split(',')
if not csv_tuple_quotes_valid(a_tuple):
raise TypeError('Error while parsing CSV line %s. The quotes do not parse' % to_parse)
#if the format is correct but more data fields were provided
#the following works faster than an if statement that checks the length of a_tuple
try:
a_tuple[no_of_headers - 1]
except IndexError:
raise TypeError('Error while parsing CSV line %s. Unknown reason' % to_parse)
#this replaces the use my own hashtables to store the duplicated values for the attributes
for i in range(1, no_of_headers):
a_tuple[i] = sys.intern(a_tuple[i])
return a_tuple
def read_file(path, file_number):
"""
reads the csv file and returns (dict, int)
the dict is the mapping of id's to attributes
the integer is the number of attributes (headers) for the csv file
"""
global HEADERS
try:
file = open(path, 'r');
except FileNotFoundError as e:
print("error in %s:\n%s\nexiting...")
exit(1)
main_table = {}
headers = file.readline().strip().split(',')
no_of_headers = len(headers)
HEADERS.extend(headers[1:]) #keep the headers from the file
lines = file.readlines()
file.close()
args = []
for line in lines:
args.append((line, no_of_headers))
#pool is a pool of worker processes parsing the lines in parallel
with Pool() as workers:
try:
all_tuples = workers.starmap(parse_tuple, args, 1000)
except TypeError as e:
print('Error in file %s:\n%s\nexiting thread...' % (path, e.args))
exit(1)
for a_tuple in all_tuples:
#add quotes to key if needed
key = a_tuple[0] if a_tuple[0][0] == '\"' else ('\"%s\"' % a_tuple[0])
main_table[key] = a_tuple[1:]
return (main_table, no_of_headers)
def merge_files():
"""
produces a file called merged.csv
"""
global HEADERS
no_of_files = len(sys.argv) - 1
processed_files = [None] * no_of_files
for i in range(0, no_of_files):
processed_files[i] = read_file(sys.argv[i + 1], i)
out_file = open('merged.csv', 'w+')
merged_str = ','.join(HEADERS)
all_keys = {}
#this is to ensure that we include all keys in the final file.
#even those that are missing from some files and present in others
for processed_file in processed_files:
all_keys.update(processed_file[0])
for key in all_keys:
merged_str += '\n%s' % key
for i in range(0, no_of_files):
(main_table, no_of_headers) = processed_files[i]
try:
for attr in main_table[key]:
merged_str += ',%s' % attr
except KeyError:
print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
merged_str += ',' * (no_of_headers - 1)
out_file.write(merged_str)
out_file.close()
if __name__ == '__main__':
# merge_files()
import cProfile
cProfile.run('merge_files()')
# import time
# start = time.time()
# print(time.time() - start);
Here is the profiler report I got on my Windows.
EDIT: The rest of the csv data provided is here. Pastebin was taking too long to process the files, so...
It might not be the best code and I know that, but my question is what slows down Windows so much that doesn't slow down an Ubuntu? The merge_files() function takes the longest, with 94 seconds just for itself, not including the calls to other functions. And there doesn't seem to be anything too obvious to me for why it is so slow.
Thanks
EDIT: Note: We both used the same dataset to run the code with.
It turns out that Windows and Linux handle very long strings differently. When I moved the out_file.write(merged_str) inside the outer for loop (for key in all_keys:) and stopped appending to merged_str, it ran for 11 seconds as expected. I don't have enough knowledge on either of the OS's memory management systems to be able to give a prediction on why it is so different.
But I would say that the way that the second one (the Windows one) is the more fail-safe method because it is unreasonable to keep a 30 MB string in memory. It just turns out that Linux sees that and doesn't always try to keep the string in cache, or to rebuild it every time.
Funny enough, initially I did run it a few times on my Linux machine with these same writing strategies, and the one with the large string seemed to go faster, so I stuck with it. I guess you never know.
Here's the modified code
for key in all_keys:
merged_str = '%s' % key
for i in range(0, no_of_files):
(main_table, no_of_headers) = processed_files[i]
try:
for attr in main_table[key]:
merged_str += ',%s' % attr
except KeyError:
print('NOTE: no values found for id %s in file \"%s\"' % (key, sys.argv[i + 1]))
merged_str += ',' * (no_of_headers - 1)
out_file.write(merged_str + '\n')
out_file.close()
When I run your solution on Ubuntu 16.04 with the three given files, it seems to take ~8 seconds to complete. The only modification I made was to uncomment the timing code at the bottom and use it.
$ python3 dimitar_merge.py file1.csv file2.csv file3.csv
NOTE: no values found for id "aaa5d09b-684b-47d6-8829-3dbefd608b5e" in file "file2.csv"
NOTE: no values found for id "38f79a49-4357-4d5a-90a5-18052ef03882" in file "file2.csv"
NOTE: no values found for id "766590d9-4f5b-4745-885b-83894553394b" in file "file2.csv"
8.039648056030273
$ python3 dimitar_merge.py file1.csv file2.csv file3.csv
NOTE: no values found for id "38f79a49-4357-4d5a-90a5-18052ef03882" in file "file2.csv"
NOTE: no values found for id "766590d9-4f5b-4745-885b-83894553394b" in file "file2.csv"
NOTE: no values found for id "aaa5d09b-684b-47d6-8829-3dbefd608b5e" in file "file2.csv"
7.78482985496521
I rewrote my first attempt without using csv from the standard library and am now getting times of ~4.3 seconds.
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.332579612731934
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.305467367172241
$ python3 lettuce_merge.py file1.csv file2.csv file3.csv
4.27345871925354
This is my solution code (lettuce_merge.py):
from collections import defaultdict
def split_row(csv_row):
return [col.strip('"') for col in csv_row.rstrip().split(',')]
def merge_csv_files(files):
file_headers = []
merged_headers = []
for i, file in enumerate(files):
current_header = split_row(next(file))
unique_key, *current_header = current_header
if i == 0:
merged_headers.append(unique_key)
merged_headers.extend(current_header)
file_headers.append(current_header)
result = defaultdict(lambda: [''] * (len(merged_headers) - 1))
for file_header, file in zip(file_headers, files):
for line in file:
key, *values = split_row(line)
for col_name, col_value in zip(file_header, values):
result[key][merged_headers.index(col_name) - 1] = col_value
file.close()
quotes = '"{}"'.format
with open('lettuce_merged.csv', 'w') as f:
f.write(','.join(quotes(a) for a in merged_headers) + '\n')
for key, values in result.items():
f.write(','.join(quotes(b) for b in [key] + values) + '\n')
if __name__ == '__main__':
from argparse import ArgumentParser, FileType
from time import time
parser = ArgumentParser()
parser.add_argument('files', nargs='*', type=FileType('r'))
args = parser.parse_args()
start_time = time()
merge_csv_files(args.files)
print(time() - start_time)
I'm sure this code could be optimized even further but sometimes just seeing another way to solve a problem can help spark new ideas.

Python - encryption and decryption scripts produce occasional errors

I made a Python script to encrypt plaintext files using the symmetric-key algorithm described in this video. I then created a second script to decrypt the encrypted message. Here is the original text:
I came, I saw, I conquered.
Here is the text after being encrypted and decrypted:
I came, I saw, I conquerdd.
Almost perfect, except for a single letter. For longer texts, there will be multiple letters which are just off ie the numerical representation of the character which appears is one lower than the numerical representation of the original character. I have no idea why this is.
Here's how my scripts work. First, I generated a random sequence of digits -- my PAD -- and saved it in the text file "pad.txt". I won't show the code because it is so straightforward. I then saved the text which I want to be encrypted in "text.txt". Next, I run the encryption script, which encrypts the text and saves it in the file "encryptedText.txt":
#!/usr/bin/python3.4
import string
def getPad():
padString = open("pad.txt","r").read()
pad = padString.split(" ")
return pad
def encrypt(textToEncrypt,pad):
encryptedText = ""
possibleChars = string.printable[:98] # last two elements are not used bec
# ause they don't show up well on te
# xt files.
for i in range(len(textToEncrypt)):
char = textToEncrypt[i]
if char in possibleChars:
num = possibleChars.index(char)
else:
return False
encryptedNum = num + int(pad[(i)%len(pad)])
if encryptedNum >= len(possibleChars):
encryptedNum = encryptedNum - len(possibleChars)
encryptedChar = possibleChars[encryptedNum]
encryptedText = encryptedText + encryptedChar
return encryptedText
if __name__ == "__main__":
textToEncrypt = open("text.txt","r").read()
pad = getPad()
encryptedText = encrypt(textToEncrypt,pad)
if not encryptedText:
print("""An error occurred during the encryption process. Confirm that \
there are no forbidden symbols in your text.""")
else:
open("encryptedText.txt","w").write(encryptedText)
Finally, I decrypt the text with this script:
#!/usr/bin/python3.4
import string
def getPad():
padString = open("pad.txt","r").read()
pad = padString.split(" ")
return pad
def decrypt(textToDecrypt,pad):
trueText = ""
possibleChars = string.printable[:98]
for i in range(len(textToDecrypt)):
encryptedChar = textToDecrypt[i]
encryptedNum = possibleChars.index(encryptedChar)
trueNum = encryptedNum - int(pad[i%len(pad)])
if trueNum < 0:
trueNum = trueNum + len(possibleChars)
trueChar = possibleChars[trueNum]
trueText = trueText + trueChar
return trueText
if __name__ == "__main__":
pad = getPad()
textToDecrypt = open("encryptedText.txt","r").read()
trueText = decrypt(textToDecrypt,pad)
open("decryptedText.txt","w").write(trueText)
Both scripts seem very straightforward, and they obvious work almost perfectly. However, every once in a while there is an error and I cannot see why.
I found the solution to this problem. It turns out that every character that was not decrypted properly was encrypted to \r, which my text editor changed to a \n for whatever reason. Removing \r from the list of possible characters fixed the issue.

Read the properties of HDF file in Python

I have a problem reading hdf file in pandas. As of now, I don't know the keys of the file.
How do I read the file [data.hdf] in such a case? And, my file is .hdf not .h5 , Does it make a difference it terms data fetching?
I see that you need a 'group identifier in the store'
pandas.io.pytables.read_hdf(path_or_buf, key, **kwargs)
I was able to get the metadata from pytables
File(filename=data.hdf, title='', mode='a', root_uep='/', filters=Filters(complevel=0, shuffle=False, fletcher32=False, least_significant_digit=None))
/ (RootGroup) ''
/UID (EArray(317,)) ''
atom := StringAtom(itemsize=36, shape=(), dflt='')
maindim := 0
flavor := 'numpy'
byteorder := 'irrelevant'
chunkshape := (100,)
/X Y (EArray(8319, 2, 317)) ''
atom := Float32Atom(shape=(), dflt=0.0)
maindim := 0
flavor := 'numpy'
byteorder := 'little'
chunkshape := (1000, 2, 100)
How do I make it readable via pandas?
First (.hdf or .h5) doesn't make any difference.
Second, I'm not sure about the pandas, but I read the HDF5 key like:
import h5py
h5f = h5py.File("test.h5", "r")
h5f.keys()
or
h5f.values()
Docs are here. However you will jot be able to directly read the format you show with pandas. You need to use PyTables to read it in. pandas can read in PyTables Table format directly even without the meta data that pandas uses.
pyhdf will be alternative option for hdf file in python
you can read and see keys from:
import pyhdf
hdf = pyhdf.SD.SD('file.hdf')
hdf.datasets()
I hope it will help you!
gud luck
You can use this simple function to see the variable names of any the HDF file (only works for the variables in the scientific mode)
from pyhdf.SD import *
def HDFvars(File):
"""
Extract variable names for an hdf file
"""
# hdfFile = SD.SD(File, mode=1)
hdfFile = SD(File, mode=1)
dsets = hdfFile.datasets()
k = []
for key in dsets.keys():
k.append(key)
k.sort()
hdfFile.end() # close the file
return k
If the variables aren't in the scientific mode, you can try whit pyhdf.V using the following program that shows the contents of the vgroups contained inside
any HDF file.
from pyhdf.HDF import *
from pyhdf.V import *
from pyhdf.VS import *
from pyhdf.SD import *
def describevg(refnum):
# Describe the vgroup with the given refnum.
# Open vgroup in read mode.
vg = v.attach(refnum)
print "----------------"
print "name:", vg._name, "class:",vg._class, "tag,ref:",
print vg._tag, vg._refnum
# Show the number of members of each main object type.
print "members: ", vg._nmembers,
print "datasets:", vg.nrefs(HC.DFTAG_NDG),
print "vdatas: ", vg.nrefs(HC.DFTAG_VH),
print "vgroups: ", vg.nrefs(HC.DFTAG_VG)
# Read the contents of the vgroup.
members = vg.tagrefs()
# Display info about each member.
index = -1
for tag, ref in members:
index += 1
print "member index", index
# Vdata tag
if tag == HC.DFTAG_VH:
vd = vs.attach(ref)
nrecs, intmode, fields, size, name = vd.inquire()
print " vdata:",name, "tag,ref:",tag, ref
print " fields:",fields
print " nrecs:",nrecs
vd.detach()
# SDS tag
elif tag == HC.DFTAG_NDG:
sds = sd.select(sd.reftoindex(ref))
name, rank, dims, type, nattrs = sds.info()
print " dataset:",name, "tag,ref:", tag, ref
print " dims:",dims
print " type:",type
sds.endaccess()
# VS tag
elif tag == HC.DFTAG_VG:
vg0 = v.attach(ref)
print " vgroup:", vg0._name, "tag,ref:", tag, ref
vg0.detach()
# Unhandled tag
else:
print "unhandled tag,ref",tag,ref
# Close vgroup
vg.detach()
# Open HDF file in readonly mode.
filename = 'yourfile.hdf'
hdf = HDF(filename)
# Initialize the SD, V and VS interfaces on the file.
sd = SD(filename)
vs = hdf.vstart()
v = hdf.vgstart()
# Scan all vgroups in the file.
ref = -1
while 1:
try:
ref = v.getid(ref)
print ref
except HDF4Error,msg: # no more vgroup
break
describevg(ref)

Reading the target of a .lnk file in Python?

I'm trying to read the target file/directory of a shortcut (.lnk) file from Python. Is there a headache-free way to do it? The spec is way over my head.
I don't mind using Windows-only APIs.
My ultimate goal is to find the "(My) Videos" folder on Windows XP and Vista. On XP, by default, it's at %HOMEPATH%\My Documents\My Videos, and on Vista it's %HOMEPATH%\Videos. However, the user can relocate this folder. In the case, the %HOMEPATH%\Videos folder ceases to exists and is replaced by %HOMEPATH%\Videos.lnk which points to the new "My Videos" folder. And I want its absolute location.
Create a shortcut using Python (via WSH)
import sys
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shortcut = shell.CreateShortCut("t:\\test.lnk")
shortcut.Targetpath = "t:\\ftemp"
shortcut.save()
Read the Target of a Shortcut using Python (via WSH)
import sys
import win32com.client
shell = win32com.client.Dispatch("WScript.Shell")
shortcut = shell.CreateShortCut("t:\\test.lnk")
print(shortcut.Targetpath)
I know this is an older thread but I feel that there isn't much information on the method that uses the link specification as noted in the original question.
My shortcut target implementation could not use the win32com module and after a lot of searching, decided to come up with my own. Nothing else seemed to accomplish what I needed under my restrictions. Hopefully this will help other folks in this same situation.
It uses the binary structure Microsoft has provided for MS-SHLLINK.
import struct
path = 'myfile.txt.lnk'
target = ''
with open(path, 'rb') as stream:
content = stream.read()
# skip first 20 bytes (HeaderSize and LinkCLSID)
# read the LinkFlags structure (4 bytes)
lflags = struct.unpack('I', content[0x14:0x18])[0]
position = 0x18
# if the HasLinkTargetIDList bit is set then skip the stored IDList
# structure and header
if (lflags & 0x01) == 1:
position = struct.unpack('H', content[0x4C:0x4E])[0] + 0x4E
last_pos = position
position += 0x04
# get how long the file information is (LinkInfoSize)
length = struct.unpack('I', content[last_pos:position])[0]
# skip 12 bytes (LinkInfoHeaderSize, LinkInfoFlags, and VolumeIDOffset)
position += 0x0C
# go to the LocalBasePath position
lbpos = struct.unpack('I', content[position:position+0x04])[0]
position = last_pos + lbpos
# read the string at the given position of the determined length
size= (length + last_pos) - position - 0x02
temp = struct.unpack('c' * size, content[position:position+size])
target = ''.join([chr(ord(a)) for a in temp])
Alternatively, you could try using SHGetFolderPath(). The following code might work, but I'm not on a Windows machine right now so I can't test it.
import ctypes
shell32 = ctypes.windll.shell32
# allocate MAX_PATH bytes in buffer
video_folder_path = ctypes.create_string_buffer(260)
# 0xE is CSIDL_MYVIDEO
# 0 is SHGFP_TYPE_CURRENT
# If you want a Unicode path, use SHGetFolderPathW instead
if shell32.SHGetFolderPathA(None, 0xE, None, 0, video_folder_path) >= 0:
# success, video_folder_path now contains the correct path
else:
# error
Basically call the Windows API directly. Here is a good example found after Googling:
import os, sys
import pythoncom
from win32com.shell import shell, shellcon
shortcut = pythoncom.CoCreateInstance (
shell.CLSID_ShellLink,
None,
pythoncom.CLSCTX_INPROC_SERVER,
shell.IID_IShellLink
)
desktop_path = shell.SHGetFolderPath (0, shellcon.CSIDL_DESKTOP, 0, 0)
shortcut_path = os.path.join (desktop_path, "python.lnk")
persist_file = shortcut.QueryInterface (pythoncom.IID_IPersistFile)
persist_file.Load (shortcut_path)
shortcut.SetDescription ("Updated Python %s" % sys.version)
mydocs_path = shell.SHGetFolderPath (0, shellcon.CSIDL_PERSONAL, 0, 0)
shortcut.SetWorkingDirectory (mydocs_path)
persist_file.Save (shortcut_path, 0)
This is from http://timgolden.me.uk/python/win32_how_do_i/create-a-shortcut.html.
You can search for "python ishelllink" for other examples.
Also, the API reference helps too: http://msdn.microsoft.com/en-us/library/bb774950(VS.85).aspx
I also realize this question is old, but I found the answers to be most relevant to my situation.
Like Jared's answer, I also could not use the win32com module. So Jared's use of the binary structure from MS-SHLLINK got me part of the way there. I needed read shortcuts on both Windows and Linux, where the shortcuts are created on a samba share by Windows. Jared's implementation didn't quite work for me, I think only because I encountered some different variations on the shortcut format. But, it gave me the start I needed (thanks Jared).
So, here is a class named MSShortcut which expands on Jared's approach. However, the implementation is only Python3.4 and above, due to using some pathlib features added in Python3.4
#!/usr/bin/python3
# Link Format from MS: https://msdn.microsoft.com/en-us/library/dd871305.aspx
# Need to be able to read shortcut target from .lnk file on linux or windows.
# Original inspiration from: http://stackoverflow.com/questions/397125/reading-the-target-of-a-lnk-file-in-python
from pathlib import Path, PureWindowsPath
import struct, sys, warnings
if sys.hexversion < 0x03040000:
warnings.warn("'{}' module requires python3.4 version or above".format(__file__), ImportWarning)
# doc says class id =
# 00021401-0000-0000-C000-000000000046
# requiredCLSID = b'\x00\x02\x14\x01\x00\x00\x00\x00\xC0\x00\x00\x00\x00\x00\x00\x46'
# Actually Getting:
requiredCLSID = b'\x01\x14\x02\x00\x00\x00\x00\x00\xC0\x00\x00\x00\x00\x00\x00\x46' # puzzling
class ShortCutError(RuntimeError):
pass
class MSShortcut():
"""
interface to Microsoft Shortcut Objects. Purpose:
- I need to be able to get the target from a samba shared on a linux machine
- Also need to get access from a Windows machine.
- Need to support several forms of the shortcut, as they seem be created differently depending on the
creating machine.
- Included some 'flag' types in external interface to help test differences in shortcut types
Args:
scPath (str): path to shortcut
Limitations:
- There are some omitted object properties in the implementation.
Only implemented / tested enough to recover the shortcut target information. Recognized omissions:
- LinkTargetIDList
- VolumeId structure (if captured later, should be a separate class object to hold info)
- Only captured environment block from extra data
- I don't know how or when some of the shortcut information is used, only captured what I recognized,
so there may be bugs related to use of the information
- NO shortcut update support (though might be nice)
- Implementation requires python 3.4 or greater
- Tested only with Unicode data on a 64bit little endian machine, did not consider potential endian issues
Not Debugged:
- localBasePath - didn't check if parsed correctly or not.
- commonPathSuffix
- commonNetworkRelativeLink
"""
def __init__(self, scPath):
"""
Parse and keep shortcut properties on creation
"""
self.scPath = Path(scPath)
self._clsid = None
self._lnkFlags = None
self._lnkInfoFlags = None
self._localBasePath = None
self._commonPathSuffix = None
self._commonNetworkRelativeLink = None
self._name = None
self._relativePath = None
self._workingDir = None
self._commandLineArgs = None
self._iconLocation = None
self._envTarget = None
self._ParseLnkFile(self.scPath)
#property
def clsid(self):
return self._clsid
#property
def lnkFlags(self):
return self._lnkFlags
#property
def lnkInfoFlags(self):
return self._lnkInfoFlags
#property
def localBasePath(self):
return self._localBasePath
#property
def commonPathSuffix(self):
return self._commonPathSuffix
#property
def commonNetworkRelativeLink(self):
return self._commonNetworkRelativeLink
#property
def name(self):
return self._name
#property
def relativePath(self):
return self._relativePath
#property
def workingDir(self):
return self._workingDir
#property
def commandLineArgs(self):
return self._commandLineArgs
#property
def iconLocation(self):
return self._iconLocation
#property
def envTarget(self):
return self._envTarget
#property
def targetPath(self):
"""
Args:
woAnchor (bool): remove the anchor (\\server\path or drive:) from returned path.
Returns:
a libpath PureWindowsPath object for combined workingDir/relative path
or the envTarget
Raises:
ShortCutError when no target path found in Shortcut
"""
target = None
if self.workingDir:
target = PureWindowsPath(self.workingDir)
if self.relativePath:
target = target / PureWindowsPath(self.relativePath)
else: target = None
if not target and self.envTarget:
target = PureWindowsPath(self.envTarget)
if not target:
raise ShortCutError("Unable to retrieve target path from MS Shortcut: shortcut = {}"
.format(str(self.scPath)))
return target
#property
def targetPathWOAnchor(self):
tp = self.targetPath
return tp.relative_to(tp.anchor)
def _ParseLnkFile(self, lnkPath):
with lnkPath.open('rb') as f:
content = f.read()
# verify size (4 bytes)
hdrSize = struct.unpack('I', content[0x00:0x04])[0]
if hdrSize != 0x4C:
raise ShortCutError("MS Shortcut HeaderSize = {}, but required to be = {}: shortcut = {}"
.format(hdrSize, 0x4C, str(lnkPath)))
# verify LinkCLSID id (16 bytes)
self._clsid = bytes(struct.unpack('B'*16, content[0x04:0x14]))
if self._clsid != requiredCLSID:
raise ShortCutError("MS Shortcut ClassID = {}, but required to be = {}: shortcut = {}"
.format(self._clsid, requiredCLSID, str(lnkPath)))
# read the LinkFlags structure (4 bytes)
self._lnkFlags = struct.unpack('I', content[0x14:0x18])[0]
#logger.debug("lnkFlags=0x%0.8x" % self._lnkFlags)
position = 0x4C
# if HasLinkTargetIDList bit, then position to skip the stored IDList structure and header
if (self._lnkFlags & 0x01):
idListSize = struct.unpack('H', content[position:position+0x02])[0]
position += idListSize + 2
# if HasLinkInfo, then process the linkinfo structure
if (self._lnkFlags & 0x02):
(linkInfoSize, linkInfoHdrSize, self._linkInfoFlags,
volIdOffset, localBasePathOffset,
cmnNetRelativeLinkOffset, cmnPathSuffixOffset) = struct.unpack('IIIIIII', content[position:position+28])
# check for optional offsets
localBasePathOffsetUnicode = None
cmnPathSuffixOffsetUnicode = None
if linkInfoHdrSize >= 0x24:
(localBasePathOffsetUnicode, cmnPathSuffixOffsetUnicode) = struct.unpack('II', content[position+28:position+36])
#logger.debug("0x%0.8X" % linkInfoSize)
#logger.debug("0x%0.8X" % linkInfoHdrSize)
#logger.debug("0x%0.8X" % self._linkInfoFlags)
#logger.debug("0x%0.8X" % volIdOffset)
#logger.debug("0x%0.8X" % localBasePathOffset)
#logger.debug("0x%0.8X" % cmnNetRelativeLinkOffset)
#logger.debug("0x%0.8X" % cmnPathSuffixOffset)
#logger.debug("0x%0.8X" % localBasePathOffsetUnicode)
#logger.debug("0x%0.8X" % cmnPathSuffixOffsetUnicode)
# if info has a localBasePath
if (self._linkInfoFlags & 0x01):
bpPosition = position + localBasePathOffset
# not debugged - don't know if this works or not
self._localBasePath = UnpackZ('z', content[bpPosition:])[0].decode('ascii')
#logger.debug("localBasePath: {}".format(self._localBasePath))
if localBasePathOffsetUnicode:
bpPosition = position + localBasePathOffsetUnicode
self._localBasePath = UnpackUnicodeZ('z', content[bpPosition:])[0]
self._localBasePath = self._localBasePath.decode('utf-16')
#logger.debug("localBasePathUnicode: {}".format(self._localBasePath))
# get common Path Suffix
cmnSuffixPosition = position + cmnPathSuffixOffset
self._commonPathSuffix = UnpackZ('z', content[cmnSuffixPosition:])[0].decode('ascii')
#logger.debug("commonPathSuffix: {}".format(self._commonPathSuffix))
if cmnPathSuffixOffsetUnicode:
cmnSuffixPosition = position + cmnPathSuffixOffsetUnicode
self._commonPathSuffix = UnpackUnicodeZ('z', content[cmnSuffixPosition:])[0]
self._commonPathSuffix = self._commonPathSuffix.decode('utf-16')
#logger.debug("commonPathSuffix: {}".format(self._commonPathSuffix))
# check for CommonNetworkRelativeLink
if (self._linkInfoFlags & 0x02):
relPosition = position + cmnNetRelativeLinkOffset
self._commonNetworkRelativeLink = CommonNetworkRelativeLink(content, relPosition)
position += linkInfoSize
# If HasName
if (self._lnkFlags & 0x04):
(position, self._name) = self.readStringObj(content, position)
#logger.debug("name: {}".format(self._name))
# get relative path string
if (self._lnkFlags & 0x08):
(position, self._relativePath) = self.readStringObj(content, position)
#logger.debug("relPath='{}'".format(self._relativePath))
# get working dir string
if (self._lnkFlags & 0x10):
(position, self._workingDir) = self.readStringObj(content, position)
#logger.debug("workingDir='{}'".format(self._workingDir))
# get command line arguments
if (self._lnkFlags & 0x20):
(position, self._commandLineArgs) = self.readStringObj(content, position)
#logger.debug("commandLineArgs='{}'".format(self._commandLineArgs))
# get icon location
if (self._lnkFlags & 0x40):
(position, self._iconLocation) = self.readStringObj(content, position)
#logger.debug("iconLocation='{}'".format(self._iconLocation))
# look for environment properties
if (self._lnkFlags & 0x200):
while True:
size = struct.unpack('I', content[position:position+4])[0]
#logger.debug("blksize=%d" % size)
if size==0: break
signature = struct.unpack('I', content[position+4:position+8])[0]
#logger.debug("signature=0x%0.8x" % signature)
# EnvironmentVariableDataBlock
if signature == 0xA0000001:
if (self._lnkFlags & 0x80): # unicode
self._envTarget = UnpackUnicodeZ('z', content[position+268:])[0]
self._envTarget = self._envTarget.decode('utf-16')
else:
self._envTarget = UnpackZ('z', content[position+8:])[0].decode('ascii')
#logger.debug("envTarget='{}'".format(self._envTarget))
position += size
def readStringObj(self, scContent, position):
"""
returns:
tuple: (newPosition, string)
"""
strg = ''
size = struct.unpack('H', scContent[position:position+2])[0]
#logger.debug("workingDirSize={}".format(size))
if (self._lnkFlags & 0x80): # unicode
size *= 2
strg = struct.unpack(str(size)+'s', scContent[position+2:position+2+size])[0]
strg = strg.decode('utf-16')
else:
strg = struct.unpack(str(size)+'s', scContent[position+2:position+2+size])[0].decode('ascii')
#logger.debug("strg='{}'".format(strg))
position += size + 2 # 2 bytes to account for CountCharacters field
return (position, strg)
class CommonNetworkRelativeLink():
def __init__(self, scContent, linkContentPos):
self._networkProviderType = None
self._deviceName = None
self._netName = None
(linkSize, flags, netNameOffset,
devNameOffset, self._networkProviderType) = struct.unpack('IIIII', scContent[linkContentPos:linkContentPos+20])
#logger.debug("netnameOffset = {}".format(netNameOffset))
if netNameOffset > 0x014:
(netNameOffsetUnicode, devNameOffsetUnicode) = struct.unpack('II', scContent[linkContentPos+20:linkContentPos+28])
#logger.debug("netnameOffsetUnicode = {}".format(netNameOffsetUnicode))
self._netName = UnpackUnicodeZ('z', scContent[linkContentPos+netNameOffsetUnicode:])[0]
self._netName = self._netName.decode('utf-16')
self._deviceName = UnpackUnicodeZ('z', scContent[linkContentPos+devNameOffsetUnicode:])[0]
self._deviceName = self._deviceName.decode('utf-16')
else:
self._netName = UnpackZ('z', scContent[linkContentPos+netNameOffset:])[0].decode('ascii')
self._deviceName = UnpackZ('z', scContent[linkContentPos+devNameOffset:])[0].decode('ascii')
#property
def deviceName(self):
return self._deviceName
#property
def netName(self):
return self._netName
#property
def networkProviderType(self):
return self._networkProviderType
def UnpackZ (fmt, buf) :
"""
Unpack Null Terminated String
"""
#logger.debug(bytes(buf))
while True :
pos = fmt.find ('z')
if pos < 0 :
break
z_start = struct.calcsize (fmt[:pos])
z_len = buf[z_start:].find(b'\0')
#logger.debug(z_len)
fmt = '%s%dsx%s' % (fmt[:pos], z_len, fmt[pos+1:])
#logger.debug("fmt='{}', len={}".format(fmt, z_len))
fmtlen = struct.calcsize(fmt)
return struct.unpack (fmt, buf[0:fmtlen])
def UnpackUnicodeZ (fmt, buf) :
"""
Unpack Null Terminated String
"""
#logger.debug(bytes(buf))
while True :
pos = fmt.find ('z')
if pos < 0 :
break
z_start = struct.calcsize (fmt[:pos])
# look for null bytes by pairs
z_len = 0
for i in range(z_start,len(buf),2):
if buf[i:i+2] == b'\0\0':
z_len = i-z_start
break
fmt = '%s%dsxx%s' % (fmt[:pos], z_len, fmt[pos+1:])
# logger.debug("fmt='{}', len={}".format(fmt, z_len))
fmtlen = struct.calcsize(fmt)
return struct.unpack (fmt, buf[0:fmtlen])
I hope this helps others as well.
Thanks
I didn't really like any of the answers available because I didn't want to keep importing more and more libraries and the 'shell' option was spotty on my test machines. I opted for reading the ".lnk" in and then using a regular expression to read out the path. For my purposes, I am looking for pdf files that were recently opened and then reading the content of those files:
# Example file path to the shortcut
shortcut = "shortcutFileName.lnk"
# Open the lnk file using the ISO-8859-1 encoder to avoid errors for special characters
lnkFile = open(shortcut, 'r', encoding = "ISO-8859-1")
# Use a regular expression to parse out the pdf file on C:\
filePath = re.findall("C:.*?pdf", lnkFile.read(), flags=re.DOTALL)
# Close File
lnkFile.close()
# Read the pdf at the lnk Target
pdfFile = open(tmpFilePath[0], 'rb')
Comments:
Obviously this works for pdf but needs to specify other file extensions accordingly.
It's easy as opening ".exe" file. Here also, we are going to use the os module for this. You just have to create a shortcut .lnk and store it in any folder of your choice. Then, in any Python file, first import the os module (already installed, just import). Then, use a variable, say path, and assign it a string value containing the location of your .lnk file. Just create a shortcut of your desired application. At last, we will use os.startfile()
to open our shortcut.
Points to remember:
The location should be within double inverted commas.
Most important, open Properties. Then, under that, open "Details". There, you can get the exact name of your shortcut. Please write that name with ".lnk" at last.
Now, you have completed the procedure. I hope it helps you. For additional assistance, I am leaving my code for this at the bottom.
import os
path = "C:\\Users\\hello\\OneDrive\\Desktop\\Shortcuts\\OneNote for Windows 10.lnk"
os.startfile(path)
In my code, I used path as variable and I had created a shortcut for OneNote. In path, I defined the location of OneNote's shortcut. So when I use os.startfile(path), the os module is going to open my shortcut file defined in variable path.
this job is possible without any modules, doing this will return a b string having the destination of the shortcut file. Basically what you do is you open the file in read binary mode (rb mode). This is the code to accomplish this task:
with open('programs.lnk - Copy','rb') as f:
destination=f.read()
i am currently using python 3.9.2, in case you face problems with this, just tell me and i will try to fix it.
A more stable solution in python, using powershell to read the target path from the .lnk file.
using only standard libraries avoids introducing extra dependencies such as win32com
this approach works with the .lnks that failed with jared's answer, more details
we avoid directly reading the file, which felt hacky, and sometimes failed
import subprocess
def get_target(link_path) -> (str, str):
"""
Get the target & args of a Windows shortcut (.lnk)
:param link_path: The Path or string-path to the shortcut, e.g. "C:\\Users\\Public\\Desktop\\My Shortcut.lnk"
:return: A tuple of the target and arguments, e.g. ("C:\\Program Files\\My Program.exe", "--my-arg")
"""
# get_target implementation by hannes, https://gist.github.com/Winand/997ed38269e899eb561991a0c663fa49
ps_command = \
"$WSShell = New-Object -ComObject Wscript.Shell;" \
"$Shortcut = $WSShell.CreateShortcut(\"" + str(link_path) + "\"); " \
"Write-Host $Shortcut.TargetPath ';' $shortcut.Arguments "
output = subprocess.run(["powershell.exe", ps_command], capture_output=True)
raw = output.stdout.decode('utf-8')
launch_path, args = [x.strip() for x in raw.split(';', 1)]
return launch_path, args
# to test
shortcut_file = r"C:\Users\REPLACE_WITH_USERNAME\AppData\Roaming\Microsoft\Windows\Start Menu\Programs\Accessibility\Narrator.lnk"
a, args = get_target(shortcut_file)
print(a) # C:\WINDOWS\system32\narrator.exe
(you can remove -> typehinting to get it to work in older python versions)
I did notice this is slow when running on lots of shortcuts. You could use jareds method, check if the result is None, and if so, run this code to get the target path.
The nice approach with direct regex-based parsing (proposed in the answer) didn't work reliable for all shortcuts in my case. Some of them have only relative path like ..\\..\\..\\..\\..\\..\\Program Files\\ImageGlass\\ImageGlass.exe (produced by msi-installer), and it is stored with wide chars, which are tricky to handle in Python.
So I've discovered a Python module LnkParse3, which is easy to use and meets my needs.
Here is a sample script to show target of a lnk-file passed as first argument:
import LnkParse3
import sys
with open(sys.argv[1], 'rb') as indata:
lnk = LnkParse3.lnk_file(indata)
print(lnk.lnk_command)
I arrived at this thread looking for a way to parse a ".lnk" file and get the target file name.
I found another very simple solution:
pip install comtypes
Then
from comtypes.client import CreateObject
from comtypes.persist import IPersistFile
from comtypes.shelllink import ShellLink
# MAKE SURE THIS VAT CONTAINS A STRING AND NOT AN OBJECT OF 'PATH'
# I spent too much time figuring out the problem with .load(..) function ahead
pathStr="c:\folder\yourlink.lnk"
s = CreateObject(ShellLink)
p = s.QueryInterface(IPersistFile)
p.Load(pathStr, False)
print(s.GetPath())
print(s.GetArguments())
print(s.GetWorkingDirectory())
print(s.GetIconLocation())
try:
# the GetDescription create exception in some of the links
print(s.GetDescription())
except Exception as e:
print(e)
print(s.Hotkey)
print(s.ShowCmd)
Based on this great answer...
https://stackoverflow.com/a/43856809/2992810

Categories

Resources