Python object persistence

I'm seeking advice about methods of implementing object persistence in Python. To be more precise, I wish to be able to link a Python object to a file in such a way that any Python process that opens a representation of that file shares the same information, any process can change its object and the changes will propagate to the other processes, and even if all processes "storing" the object are closed, the file will remain and can be re-opened by another process.
I found three main candidates for this in my distribution of Python - anydbm, pickle, and shelve (dbm appeared to be perfect, but it is Unix-only, and I am on Windows). However, they all have flaws:
anydbm can only handle a dictionary of string values (I'm seeking to store a list of dictionaries, all of which have string keys and string values, though ideally I would seek a module with no type restrictions)
shelve requires that a file be re-opened before changes propagate - for instance, if two processes A and B load the same file (containing a shelved empty list), and A adds an item to the list and calls sync(), B will still see the list as being empty until it reloads the file.
pickle (the module I am currently using for my test implementation) has the same "reload requirement" as shelve, and also does not overwrite previous data - if process A dumps fifteen empty strings onto a file, and then the string 'hello', process B will have to load the file sixteen times in order to get the 'hello' string. I am currently dealing with this problem by preceding any write operation with repeated reads until end of file ("wiping the slate clean before writing on it"), and by making every read operation repeated until end of file, but I feel there must be a better way.
My ideal module would behave as follows (with "A>>>" representing code executed by process A, and "B>>>" code executed by process B):
A>>> import imaginary_perfect_module as mod
B>>> import imaginary_perfect_module as mod
A>>> d = mod.load('a_file')
B>>> d = mod.load('a_file')
A>>> d
{}
B>>> d
{}
A>>> d[1] = 'this string is one'
A>>> d['ones'] = 1 #anydbm would sulk here
A>>> d['ones'] = 11
A>>> d['a dict'] = {'this dictionary' : 'is arbitrary', 42 : 'the answer'}
B>>> d['ones'] #shelve would raise a KeyError here, unless A had called d.sync() and B had reloaded d
11 #pickle (with different syntax) would have returned 1 here, and then 11 on next call
(etc. for B)
I could achieve this behaviour by creating my own module that uses pickle, and editing the dump and load behaviour so that they use the repeated reads I mentioned above - but I find it hard to believe that this problem has never occurred to, and been fixed by, more talented programmers before. Moreover, these repeated reads seem inefficient to me (though I must admit that my knowledge of operation complexity is limited, and it's possible that these repeated reads are going on "behind the scenes" in otherwise apparently smoother modules like shelve). Therefore, I conclude that I must be missing some code module that would solve the problem for me. I'd be grateful if anyone could point me in the right direction, or give advice about implementation.

Use the ZODB (the Zope Object Database) instead. Backed with ZEO it fulfills your requirements:
Transparent persistence for Python objects
ZODB uses pickles underneath so anything that is pickle-able can be stored in a ZODB object store.
Full ACID-compatible transaction support (including savepoints)
This means changes from one process propagate to all the other processes when they are good and ready, and each process has a consistent view on the data throughout a transaction.
ZODB has been around for over a decade now, so you are right in surmising this problem has already been solved before. :-)
ZODB lets you plug in storages; the most common is FileStorage, which stores everything in one Data.fs file, with optional blob storage for large objects.
Some ZODB storages are wrappers around others to add functionality; DemoStorage for example keeps changes in memory to facilitate unit testing and demonstration setups (restart and you have clean slate again). BeforeStorage gives you a window in time, only returning data from transactions before a given point in time. The latter has been instrumental in recovering lost data for me.
ZEO is one such plugin; it introduces a client-server architecture. Using ZEO lets you access a given storage from multiple processes at a time; you won't need this layer if all you need is multi-threaded access from one process only.
The same could be achieved with RelStorage, which stores ZODB data in a relational database such as PostgreSQL, MySQL or Oracle.
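For a feel of the API, here is a minimal sketch (the file name is illustrative and error handling is omitted):
import ZODB, ZODB.FileStorage
import transaction
from persistent.mapping import PersistentMapping
storage = ZODB.FileStorage.FileStorage('a_file.fs')
db = ZODB.DB(storage)
connection = db.open()
root = connection.root()
# store an arbitrary picklable structure under a key in the root mapping
root['a dict'] = PersistentMapping({'this dictionary': 'is arbitrary', 42: 'the answer'})
transaction.commit()
A second connection (or, with ZEO, another process) sees the committed value once it begins a new transaction.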

For beginners, you can port your shelve databases to ZODB databases like this:
#!/usr/bin/env python
import sys
import shelve
import ZODB, ZODB.FileStorage
import transaction
from optparse import OptionParser

parser = OptionParser()
parser.add_option("-i", "--input", dest="in_file", default=False,
                  help="original shelve database filename")
parser.add_option("-o", "--output", dest="out_file", default=False,
                  help="new ZODB database filename")
options, args = parser.parse_args()
if not options.in_file or not options.out_file:
    print "Need input and output database filenames"
    sys.exit(1)

db = shelve.open(options.in_file, writeback=True)
zstorage = ZODB.FileStorage.FileStorage(options.out_file)
zdb = ZODB.DB(zstorage)
zconnection = zdb.open()
newdb = zconnection.root()

for key, value in db.iteritems():
    print "Copying key: " + str(key)
    newdb[key] = value

transaction.commit()

I suggest using TinyDB; it's much simpler to use.
https://tinydb.readthedocs.io/en/stable/
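For reference, a minimal sketch of what TinyDB usage looks like (the file name is illustrative; the data ends up in a plain JSON file):
from tinydb import TinyDB, Query
db = TinyDB('a_file.json')
db.insert({'ones': 11, 'this string': 'is one'})
Record = Query()
print(db.search(Record.ones == 11))  # -> [{'ones': 11, 'this string': 'is one'}]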


CANoe: How to select and start test cases from XML Test Module from Python using CANoe COM interface?

Currently I am able to:
start CANoe application
load a CANoe configuration file
load a test setup file
def load_test_setup(self, canoe_test_setup_file: str = None) -> None:
    logger.info(
        f'Loading CANoe test setup file <{canoe_test_setup_file}>.')
    if self.measurement.Running:
        logger.info(
            'Simulation is currently running, so a new test setup could not be loaded!')
        return
    self.test_setup.TestEnvironments.Add(canoe_test_setup_file)
    test_environment = self.test_setup.TestEnvironments.Item(1)
    logger.info(f'Loaded test environment is <{test_environment.Name}>.')
How can I access the XML Test Module loaded with the test setup (tse) file and select tests to be executed?
The second-to-last line in your snippet is most probably causing the issue.
I have been trying to fix this issue for quite some time now and finally found the solution.
Somehow when you execute the line self.test_setup.TestEnvironments.Item(1)
win32com creates an object of type TestSetupItem, which doesn't have the necessary properties or methods to access the test cases. Instead we want to access objects of the collection types TestSetupFolders or TestModules. win32com creates an object of the TestSetupItem type even though I have a single XML Test Module (called AutomationTestSeq) in the Test Environment.
There are three possible solutions that I found.
1. Manually clearing the generated cache before each run.
Using win32com.client.DispatchWithEvents or win32com.client.gencache.EnsureDispatch generates a bunch of python files that describe CANoe's object model.
If you had used either of those before, TestEnvironments.Item(1) will always return TestSetupItem instead of the more appropriate type objects.
To remove the cache you need to delete the C:\Users\{username}\AppData\Local\Temp\gen_py\{python version} folder.
Doing this every time is of course not very practical.
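If you do want to automate it, something along these lines should work (a sketch; win32com exposes the location of the cache as win32com.__gen_path__):
import shutil
import win32com
# delete the generated gen_py cache so the next dispatch regenerates it
shutil.rmtree(win32com.__gen_path__, ignore_errors=True)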
2. Force win32com to always use dynamic dispatch.
You can do this by using:
canoe = win32com.client.dynamic.Dispatch("CANoe.Application")
Any objects you create using canoe from now on, will be dynamically dispatched.
Forcing dynamic dispatch is easier than manually clearing the cache folder every time. This gave me good results always. But doing this will not let you have any insight into the objects. You won't be able to see the acceptable properties and methods for the objects.
3. Typecast TestSetupItem to TestSetupFolders or TestModules.
This has the risk that if you typecast incorrectly, you will get unexpected results. But has worked well for me so far.
In short: win32.CastTo(test_env, "ITestEnvironment2"). This will ensure that you are using the recommended object hierarchy as per the CANoe technical reference.
Note that you will also have to typecast TestSequenceItem to TestCase to be able to access test case verdict and enable/disable test cases.
Below is a decent example script.
"""Execute XML Test Cases without a pass verdict"""
import sys
from time import sleep
import win32com.client as win32
CANoe = win32.gencache.EnsureDispatch("CANoe.Application")  # early-bound dispatch so CastTo works; DispatchWithEvents also works if you pass an event-handler class
CANoe.Open("canoe.cfg")
test_env = CANoe.Configuration.TestSetup.TestEnvironments.Item('Test Environment')
# Cast required since test_env is originally of type <ITestEnvironment>
test_env = win32.CastTo(test_env, "ITestEnvironment2")
# Get the XML TestModule (type <TSTestModule>) in the test setup
test_module = test_env.TestModules.Item('AutomationTestSeq')
# {.Sequence} property returns a collection of <TestCases> or <TestGroup>
# or <TestSequenceItem> which is more generic
seq = test_module.Sequence
for i in range(1, seq.Count + 1):
    # Cast from <ITestSequenceItem> to <ITestCase> to access {.Verdict}
    # and the {.Enabled} property
    tc = win32.CastTo(seq.Item(i), "ITestCase")
    if tc.Verdict != 1:  # Verdict 1 is pass
        tc.Enabled = True
        print(f"Enabling Test Case {tc.Ident} with verdict {tc.Verdict}")
    else:
        tc.Enabled = False
        print(f"Disabling Test Case {tc.Ident} since it has already passed")
CANoe.Measurement.Start()
sleep(5) # Sleep because measurement start is not instantaneous
test_module.Start()
sleep(1)
Just continue what you have done.
The TestEnvironment contains the TestModules. Each TestModule contains a TestSequence which in turn contains the TestCases.
Keep in mind that you cannot start individual TestCases, only the whole TestModule. But you can enable and disable individual TestCases before execution by using the COM-API.
(typing this from the top of my head, might not work 100%)
test_module = test_environment.TestModules.Item(1)  # or 2, or whatever
test_sequence = test_module.Sequence
for i in range(1, test_sequence.Count + 1):
    test_case = test_sequence.Item(i)
    if ...:
        test_case.Enabled = False  # or True
test_module.Start()
You have to keep in mind that a TestSequence can also contain other TestSequences (i.e. a TestGroup). This depends on how your TestModule is setup. If so, you have to take care of that in your loop and descend into these TestGroups while searching for your TestCase of interest.

Making a Savefile for a Text-Based Game in Python

tl;dr in bold below
I'm currently developing a text-based adventure game, and I've implemented a basic saving system.
The process takes advantage of the 'pickle' module. It generates or appends to a file with a custom extension (when it is, in reality, a text file).
The engine pickles the player's location, their inventory, and, well, the last part is where it gets a little weird.
The game loads dialog from a specially formatted script (Here I mean as in an actor's script). Some dialog changes based on certain conditions (already spoken to them before, new event, etc.). So, for that third object the engine saves, it saves ALL dialog trees in their current positions. As in, it saves the literal script in its current state.
Here is the saving routine:
with open('save.devl', 'wb') as file:
    pickle.dump((current_pos, player_inv, dia_dict), file)
    for room in save_map:
        pickle.dump(room, file)
My problem is, this process makes a very ugly, very verbose, super large text file. Now I know that text files are basically the smallest files I can generate, but I was wondering if there was any way to compress or otherwise make more efficient the process of recording the state of everything in the game. Or, less preferably but better in the long run, just a smarter way to save the player's data.
The format of dialog was requested. Here is a sample:
[Isaac]
a: Hello.|1. I'm Evan.|b|
b: Nice to meet you.|1. Where are you going?\2.Goodbye.|c,closer|
c: My cousin's wedding.|1. Interesting. Where are you from?\2. What do you know about the ship?\3. Goodbye.|e,closer|
closer: See you later.||break|
e: It's the WPT Magnus. Cruise-class zeppelin. Been in service for about three years, I believe.||c|
standing: Hello, again.|1. What do you know about the ship?\2.Goodbye.|e,closer|
The name in brackets is how the program identifies which tree to call. Each letter is a separate branch in the tree. The bars separate the branch into three parts: 1. What the character says 2. The responses you are allowed 3. Where each response goes, or if the player doesn't respond, where the player is directed afterwards.
In this example, after the player has talked to Isaac, the 'a' branch is erased from the copy of the tree that the game stores in memory. It then permanently uses the 'standing' branch.
Pickle itself has other protocols that are all more compact than the default protocol (protocol 0), which is the only text-based one; the others are binary protocols.
But even then, you would hardly get more than a 50% reduction in file size. To improve on that, we would need to know more about what you are saving and whether there are smarter ways to save your data - for example, by avoiding repeating the same sub-structure if it is present in several of your rooms (although if you are reusing the same object inside your game, pickle should take care of that).
That said, just change your pickle.dump calls to include the protocol parameter - the -1 value is equivalent to "HIGHEST_PROTOCOL", which is usually the most efficient:
pickle.dump(room, file, protocol=-1)
(Loading the pickles does not require the protocol to be passed at all.)
Additionally, you might want to use Python's zlib module to compress the pickle data. That could give you another 20-30% file size reduction. You have to chain the calls to file.write, zlib.compress and pickle.dumps, so it will be easier with a little helper code - you also need to keep track of record boundaries yourself, since zlib, unlike pickle, does not advance the file pointer past a single object:
import pickle
import zlib

def store_obj(file_, obj):
    # compress the pickled object and prefix it with its compressed length
    compressed = zlib.compress(pickle.dumps(obj, protocol=-1), 9)
    file_.write(len(compressed).to_bytes(4, "little"))
    file_.write(compressed)

def get_obj(file_):
    # read the 4-byte length prefix, then decompress and unpickle the payload
    obj_size = int.from_bytes(file_.read(4), "little")
    if obj_size == 0:
        return None
    data = zlib.decompress(file_.read(obj_size))
    return pickle.loads(data)

share variable (data from file) among multiple python scripts with not loaded duplicates

I would like to load a big matrix contained in matrix_file.mtx. This load must be made once. Once the variable matrix is loaded into memory, I would like many python scripts to share it without duplicating it, so as to have a memory-efficient multi-script program in bash (or python itself). I can imagine some pseudocode like this:
# Loading and sharing script:
import share
matrix = open("matrix_file.mtx","r")
share.send_to_shared_ram(matrix, as_variable('matrix'))
# Shared matrix variable processing script_1
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>
# Shared matrix variable processing script_2
import share
pointer_to_matrix = share.share_variable_from_ram('matrix')
type(pointer_to_matrix)
# output: <type 'numpy.ndarray'>
...
The idea is for pointer_to_matrix to point to matrix in RAM, which is loaded only once (not n times) for the n scripts. They are called separately from a bash script (or, if possible, from a python main):
$ python Load_and_share.py
$ python script_1.py -args string &
$ python script_2.py -args string &
$ ...
$ python script_n.py -args string &
I'd also be interested in solutions via hard disk, i.e. the matrix could be stored on disk while the share object accesses it as required. Nonetheless, the object (a kind of pointer) in RAM can be seen as the whole matrix.
Thank you for your help.
Between the mmap module and numpy.frombuffer, this is fairly easy:
import mmap
import numpy as np
with open("matrix_file.mtx","rb") as matfile:
mm = mmap.mmap(matfile.fileno(), 0, access=mmap.ACCESS_READ)
# Optionally, on UNIX-like systems in Py3.3+, add:
# os.posix_fadvise(matfile.fileno(), 0, len(mm), os.POSIX_FADV_WILLNEED)
# to trigger background read in of the file to the system cache,
# minimizing page faults when you use it
matrix = np.frombuffer(mm, np.uint8)
Each process would perform this work separately, and get a read only view of the same memory. You'd change the dtype to something other than uint8 as needed. Switching to ACCESS_WRITE would allow modifications to shared data, though it would require synchronization and possibly explicit calls to mm.flush to actually ensure the data was reflected in other processes.
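For completeness, a sketch of the writable variant just mentioned (coordinating concurrent writers is still up to you):
import mmap
import numpy as np
with open("matrix_file.mtx", "r+b") as matfile:
    mm = mmap.mmap(matfile.fileno(), 0, access=mmap.ACCESS_WRITE)
matrix = np.frombuffer(mm, np.uint8)
matrix[0] = 42  # modifies the shared mapping in place
mm.flush()      # push the change so other processes' mappings observe it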
A more complex solution that follows your initial design more closely might be to use multiprocessing.SyncManager to create a connectable shared "server" for data, allowing a single common store of data to be registered with the manager and returned to as many users as desired. Creating an Array (based on ctypes types) with the correct type on the manager, then register-ing a function that returns the same shared Array to all callers would work too (each caller would then convert the returned Array via numpy.frombuffer as before). It's much more involved (it would be easier to have a single Python process initialize an Array, then launch Processes that would share it automatically thanks to fork semantics), but it's the closest to the concept you describe.
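To make that concrete, here is a rough sketch of the connectable-manager idea using a BaseManager subclass (the address, authkey and loading call are illustrative, and the proxy forwards calls over the connection rather than sharing memory pages, which is part of why this route is more involved). The server script loads the matrix once and serves it:
import numpy as np
from multiprocessing.managers import BaseManager
matrix = np.loadtxt("matrix_file.mtx")  # however you actually load it
class MatrixManager(BaseManager):
    pass
# every client that calls get_matrix() receives a proxy to this same object
MatrixManager.register('get_matrix', callable=lambda: matrix)
if __name__ == '__main__':
    manager = MatrixManager(address=('127.0.0.1', 50000), authkey=b'secret')
    manager.get_server().serve_forever()
Each worker script then connects and obtains a proxy instead of its own copy:
from multiprocessing.managers import BaseManager
class MatrixManager(BaseManager):
    pass
MatrixManager.register('get_matrix')
if __name__ == '__main__':
    manager = MatrixManager(address=('127.0.0.1', 50000), authkey=b'secret')
    manager.connect()
    matrix_proxy = manager.get_matrix()  # operations go through the proxy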

How to free memory after opening a file in Python

I'm opening a 3 GB file in Python to read strings. I then store this data in a dictionary. My next goal is to build a graph using this dictionary so I'm closely monitoring memory usage.
It seems to me that Python loads the whole 3 GB file into memory and I can't get rid of it. My code looks like this:
with open(filename) as data:
    accounts = dict()
    for line in data:
        username = line.split()[1]
        IP = line.split()[0]
        try:
            accounts[username].add(IP)
        except KeyError:
            accounts[username] = set()
            accounts[username].add(IP)

print "The accounts will be deleted from memory in 5 seconds"
time.sleep(5)
accounts.clear()
print "The accounts have been deleted from memory"
time.sleep(5)
print "End of script"
The last lines are there so that I could monitor memory usage.
The script uses a bit more than 3 GB in memory. Clearing the dictionary frees around 300 MB. When the script ends, the rest of the memory is freed.
I'm using Ubuntu and I've monitored memory usage using both "System Monitor" and the "free" command in terminal.
What I don't understand is why does Python need so much memory after I've cleared the dictionary. Is the file still stored in memory ? If so, how can I get rid of it ? Is it a problem with my OS not seeing freed memory ?
EDIT : I've tried to force a gc.collect() after clearing the dictionary, to no avail.
EDIT2 : I'm running Python 2.7.3 on Ubuntu 12.04 LTS
EDIT3 : I realize I forgot to mention something quite important. My real problem is not that my OS does not "get back" the memory used by Python. It's that later on, Python does not seem to reuse that memory (it just asks for more memory to the OS).
this really does make no sense to me either, and I wanted to figure out how/why this happens. ( i thought that's how this should work too! ) i replicated it on my machine - though with a smaller file.
i saw two discrete problems here
why is Python reading the file into memory ( with lazy line reading, it shouldn't - right ? )
why isn't Python freeing up memory to the system
I'm not knowledgable at all on the Python internals, so I just did a lot of web searching. All of this could be completely off the mark. ( I barely develop anymore , have been on the biz side of tech for the past few years )
Lazy line reading...
I looked around and found this post -
http://www.peterbe.com/plog/blogitem-040312-1
it's from a much earlier version of python, but this line resonated with me:
readlines() reads in the whole file at once and splits it by line.
then i saw this , also old, effbot post:
http://effbot.org/zone/readline-performance.htm
the key takeaway was this:
For example, if you have enough memory, you can slurp the entire file into memory, using the readlines method.
and this:
In Python 2.2 and later, you can loop over the file object itself. This works pretty much like readlines(N) under the covers, but looks much better
looking at pythons docs for xreadlines [ http://docs.python.org/library/stdtypes.html?highlight=readline#file.xreadlines ]:
This method returns the same thing as iter(f)
Deprecated since version 2.3: Use for line in file instead.
it made me think that perhaps some slurping is going on.
so if we look at readlines [ http://docs.python.org/library/stdtypes.html?highlight=readline#file.readlines ]...
Read until EOF using readline() and return a list containing the lines thus read.
and it sort of seems like that's what's happening here.
readline , however, looked like what we wanted [ http://docs.python.org/library/stdtypes.html?highlight=readline#file.readline ]
Read one entire line from the file
so i tried switching this to readline, and the process never grew above 40MB ( it was growing to 200MB, the size of the log file , before )
accounts = dict()
data = open(filename)
for line in iter(data.readline, ''):  # call readline() repeatedly until EOF
    info = line.split("LOG:")
    if len(info) == 2:
        (a, b) = info
        try:
            accounts[a].add(True)
        except KeyError:
            accounts[a] = set()
            accounts[a].add(True)
my guess is that we're not really lazy-reading the file with the for x in data construct -- although all the docs and stackoverflow comments suggest that we are. readline() consumed significantly less memory for me, and readlines consumed approximately the same amount of memory as for line in data
freeing memory
in terms of freeing up memory, I'm not familiar much with Python's internals, but I recall back from when I worked with mod_perl... if I opened up a file that was 500MB, that apache child grew to that size. if I freed up the memory, it would only be free within that child -- garbage collected memory was never returned to the OS until the process exited.
so i poked around on that idea , and found a few links that suggest this might be happening:
http://effbot.org/pyfaq/why-doesnt-python-release-the-memory-when-i-delete-a-large-object.htm
If you create a large object and delete it again, Python has probably released the memory, but the memory allocators involved don’t necessarily return the memory to the operating system, so it may look as if the Python process uses a lot more virtual memory than it actually uses.
that was sort of old, and I found a bunch of random (accepted) patches afterwards into python that suggested the behavior was changed and that you could now return memory to the os ( as of 2005 when most of those patches were submitted and apparently approved ).
then i found this posting http://objectmix.com/python/17293-python-memory-handling.html -- and note the comment #4
"""- Patch #1123430: Python's small-object allocator now returns an arena to the system free() when all memory within an arena becomes unused again. Prior to Python 2.5, arenas (256KB chunks of memory) were never freed. Some applications will see a drop in virtual memory size now, especially long-running applications that, from time to time, temporarily use a large number of small objects. Note that when Python returns an arena to the platform C's free(), there's no guarantee that the platform C library will in turn return that memory to the operating system. The effect of the patch is to stop making that impossible, and in tests it appears to be effective at least on Microsoft C and gcc-based systems. Thanks to Evan Jones for hard work and patience.
So with 2.4 under linux (as you tested) you will indeed not always get
the used memory back, with respect to lots of small objects being
collected.
The difference therefore (I think) you see between doing an f.read() and
an f.readlines() is that the former reads in the whole file as one large
string object (i.e. not a small object), while the latter returns a list
of lines where each line is a python object.
if the 'for line in data:' construct is essentially wrapping readlines and not readline, maybe this has something to do with it? perhaps it's not a problem of having a single 3GB object, but instead having millions of 30k objects.
Which version of Python are you trying this on?
I did a test on Python 2.7/Win7, and it worked as expected, the memory was released.
Here I generate sample data like yours:
import random
fn = random.randint
with open('ips.txt', 'w') as f:
    for i in xrange(9000000):
        f.write('{0}.{1}.{2}.{3} username-{4}\n'.format(
            fn(0, 255),
            fn(0, 255),
            fn(0, 255),
            fn(0, 255),
            fn(0, 9000000),
        ))
And then your script. I replaced dict by defaultdict because throwing exceptions makes the code slower:
import time
from collections import defaultdict

def read_file(filename):
    with open(filename) as data:
        accounts = defaultdict(set)
        for line in data:
            IP, username = line.split()[:2]
            accounts[username].add(IP)

    print "The accounts will be deleted from memory in 5 seconds"
    time.sleep(5)
    accounts.clear()
    print "The accounts have been deleted from memory"
    time.sleep(5)
    print "End of script"

if __name__ == '__main__':
    read_file('ips.txt')
In my test, memory reached 1.4G and was then released, leaving 36MB. Using your original script I got the same results, but a bit slower.
There is a difference between when Python releases memory for reuse by Python and when it releases memory back to the OS. Python has internal pools for some kinds of objects, and it will reuse these itself but doesn't give them back to the OS.
The gc module may be useful, particularly the collect function. I have never used it myself, but from the documentation, it looks like it may be useful. I would try running gc.collect() before you run accounts.clear().

msiexec scripting in python

Most of this is background, skip the next 3 paragraphs for the question:
I have developed a tool that calls some installers, changes registry items, and moves files around to help me test a product which has a fairly fast update cycle. So far so good, I have a GUI which runs in a separate process to the business logic to prevent it locking due to the GIL, everything works etc, however I have concerns with a section of my code where I make calls to msiexec.
Specifically it's the uninstall part which gives me concerns. Currently the GUID does not change so I am able to uninstall the product using an os.system('msiexec /x "{GUID}" /passive') sort of thing. It's actually a bit more complicated as I'm using subprocess.Popen and polling it until it finishes from within an event loop to allow for concurrency with other steps.
My concern is that should the GUID change, obviously this will not work. I don't want to point msiexec directly at the installation source, as this would mean that it wouldn't work if I were to 'lose' the msi file, which I store in a temporary directory.
What I am looking for, is a way of querying by program name to get the GUID, or even a wrapper for msiexec that would do all of this, including the uninstall, for me. I thought of scanning through the registry, but the _winreg module seems very slow, so I'd prefer to avoid this if at all possible. If there's a better way to scan the registry, I'm all ears, as this would speed up other parts of the tool also.
Update0
Performance on this is critical as one of the design goals is to make the process which the tool follows faster than any other method, manual or otherwise, in order to gain wholesale adoption.
Update1
I have tried a slight variation of the registry version below; however, it consistently returns None. I'm not quite sure how this is happening - it seems like it is failing to open the appropriate key, as I have inserted a breakpoint after the with statement which is never reached...
def get_guid_by_name(name):
    from _winreg import (OpenKey,
                         QueryInfoKey,
                         EnumKey,
                         QueryValueEx,
                         HKEY_LOCAL_MACHINE,
                         )
    with OpenKey(HKEY_LOCAL_MACHINE,
                 r'SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall') as key:
        subkeys, _0, _1 = QueryInfoKey(key)  # The breakpoint here is never reached
        del _0, _1
        for i in range(subkeys):
            subkey = EnumKey(key, i)
            if subkey[0] != '{' or subkey[-1] != '}':
                continue
            with OpenKey(key, subkey) as _subkey:
                if name in QueryValueEx(_subkey, 'DisplayName')[0]:
                    return subkey
    return None

print get_guid_by_name('Microsoft Visual Studio')
Update2
Strike that - I'm a fool who doesn't check his indentation thoroughly enough - print get_guid_by_name('Microsoft Visual Studio') was within get_guid_by_name...
I'm not sure about the _winreg module being all that slow. I suppose if you were trying to enumerate the entire registry to find all instances of a string that might take a while, but with a decently targeted query it seems reasonably fast.
Here's an example:
from _winreg import *

def get_guid_by_name(name):
    # Open the uninstaller key
    with OpenKey(HKEY_LOCAL_MACHINE, r'Software\Microsoft\Windows\CurrentVersion\Uninstall') as key:
        # We only care about subkeys of the installer key
        subkeys, _, _ = QueryInfoKey(key)
        for i in range(subkeys):
            subkey = EnumKey(key, i)
            # Since we're looking for uninstallers for MSI products,
            # the key name will always be the GUID. We assume that any
            # key starting with '{' and ending with '}' is a GUID, but
            # if not the name won't match.
            if subkey[0] != '{' or subkey[-1] != '}':
                continue
            # Query the display name or other property of the key to
            # see if it's the one we want
            with OpenKey(key, subkey) as _subkey:
                if QueryValueEx(_subkey, 'DisplayName')[0] == name:
                    return subkey
    return None
On my machine, querying for ActiveState's Komodo Edit (I actually used a regular expression rather than straight-value comparison), 1000 iterations of this took 8.18 seconds (timed using timeit), which seems like a negligible amount of time to me. Better yet, you can pull the UninstallString value from the registry and pass that straight to your subprocess (though you may want to add the /passive switch to the end).
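For illustration, the two steps combine roughly like this (a sketch; the helper name is made up and error handling is omitted):
import subprocess
from _winreg import OpenKey, QueryValueEx, HKEY_LOCAL_MACHINE

def uninstall_by_guid(guid):
    key_path = r'Software\Microsoft\Windows\CurrentVersion\Uninstall' + '\\' + guid
    with OpenKey(HKEY_LOCAL_MACHINE, key_path) as key:
        uninstall_cmd = QueryValueEx(key, 'UninstallString')[0]
    # UninstallString is typically 'MsiExec.exe /X{GUID}'; run it passively
    return subprocess.Popen(uninstall_cmd + ' /passive')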
Edit
Microsoft does, of course, provide a WMI class (Win32_Product) that offers a rather convenient interface to do all of this. Using Tim Golden's excellent WMI wrapper, one could initiate an uninstall like this:
import wmi
c = wmi.WMI()
c.Win32_Product(Name = 'ProductName')[0].Uninstall()
However, as noted in this blog post, the Win32_Product class is extremely, painfully slow to use.
