Enumerating all modules for a binary using Python (pefile / win32api)

Enumerating all modules for a binary using Python (pefile / win32api) - python

I want to use PEfile or another Python library to enumerate all modules. I thought I had it, but then I go into WinDbg because some obvious ones were missing, and I saw there were a number of missing ones.
For filezilla.exe:
00400000 00fe7000 image00400000 image00400000
01c70000 01ecc000 combase C:\WINDOWS\SysWOW64\combase.dll
6f590000 6f5ac000 SRVCLI C:\WINDOWS\SysWOW64\SRVCLI.DLL
6f640000 6f844000 COMCTL32 C:\WINDOWS\WinSxS\x86_microsoft.windows.common-controls_6595b64144ccf1df_6.0.17134.472_none_42ecd1cc44e43e73\COMCTL32.DLL
70610000 7061b000 NETUTILS C:\WINDOWS\SysWOW64\NETUTILS.DLL
70720000 70733000 NETAPI32 C:\WINDOWS\SysWOW64\NETAPI32.dll
72910000 72933000 winmmbase C:\WINDOWS\SysWOW64\winmmbase.dll
729d0000 729d8000 WSOCK32 C:\WINDOWS\SysWOW64\WSOCK32.DLL
72b40000 72b64000 WINMM C:\WINDOWS\SysWOW64\WINMM.DLL
72b70000 72b88000 MPR C:\WINDOWS\SysWOW64\MPR.DLL
73c60000 73c6a000 CRYPTBASE C:\WINDOWS\SysWOW64\CRYPTBASE.dll
73c70000 73c90000 SspiCli C:\WINDOWS\SysWOW64\SspiCli.dll
74120000 741b6000 OLEAUT32 C:\WINDOWS\SysWOW64\OLEAUT32.dll
741c0000 7477a000 windows_storage C:\WINDOWS\SysWOW64\windows.storage.dll
74780000 7487c000 ole32 C:\WINDOWS\SysWOW64\ole32.dll
74880000 74908000 shcore C:\WINDOWS\SysWOW64\shcore.dll
74910000 7498d000 msvcp_win C:\WINDOWS\SysWOW64\msvcp_win.dll
74990000 74a4f000 msvcrt C:\WINDOWS\SysWOW64\msvcrt.dll
74a50000 74a72000 GDI32 C:\WINDOWS\SysWOW64\GDI32.dll
74bd0000 74bde000 MSASN1 C:\WINDOWS\SysWOW64\MSASN1.dll
74be0000 74c47000 WS2_32 C:\WINDOWS\SysWOW64\WS2_32.dll
74c70000 74d30000 RPCRT4 C:\WINDOWS\SysWOW64\RPCRT4.dll
74d30000 74d37000 Normaliz C:\WINDOWS\SysWOW64\Normaliz.dll
74d40000 74d79000 cfgmgr32 C:\WINDOWS\SysWOW64\cfgmgr32.dll
74fe0000 75025000 powrprof C:\WINDOWS\SysWOW64\powrprof.dll
75150000 7526e000 ucrtbase C:\WINDOWS\SysWOW64\ucrtbase.dll
75280000 75416000 CRYPT32 C:\WINDOWS\SysWOW64\CRYPT32.dll
75420000 75584000 gdi32full C:\WINDOWS\SysWOW64\gdi32full.dll
755c0000 755c8000 FLTLIB C:\WINDOWS\SysWOW64\FLTLIB.DLL
755d0000 755e8000 profapi C:\WINDOWS\SysWOW64\profapi.dll
755f0000 75635000 SHLWAPI C:\WINDOWS\SysWOW64\SHLWAPI.dll
75640000 7698a000 SHELL32 C:\WINDOWS\SysWOW64\SHELL32.dll
76990000 76b74000 KERNELBASE C:\WINDOWS\SysWOW64\KERNELBASE.dll
76cf0000 76cff000 kernel_appcore C:\WINDOWS\SysWOW64\kernel.appcore.dll
76d00000 76d17000 win32u C:\WINDOWS\SysWOW64\win32u.dll
76db0000 76e86000 COMDLG32 C:\WINDOWS\SysWOW64\COMDLG32.DLL
76e90000 7701d000 USER32 C:\WINDOWS\SysWOW64\USER32.dll
77020000 77064000 sechost C:\WINDOWS\SysWOW64\sechost.dll
77100000 771e0000 KERNEL32 C:\WINDOWS\SysWOW64\KERNEL32.DLL
771e0000 77238000 bcryptPrimitives C:\WINDOWS\SysWOW64\bcryptPrimitives.dll
77240000 772b8000 ADVAPI32 C:\WINDOWS\SysWOW64\ADVAPI32.dll
773b0000 77540000 ntdll ntdll.dll
This is the output that I obtained from pefile by using a simlar script:
ADVAPI32.dll
COMCTL32.DLL
COMDLG32.DLL
CRYPT32.dll
GDI32.dll
KERNEL32.dll
MPR.DLL
msvcrt.dll
NETAPI32.dll
Normaliz.dll
ole32.dll
OLEAUT32.dll
POWRPROF.dll
SHELL32.DLL
USER32.dll
WINMM.DLL
WS2_32.dll
WSOCK32.DLL
def findDLL():
pe = pefile.PE(name)
for each in pe.DIRECTORY_ENTRY_IMPORT:
print entry.dll
Is there something else in Pefile I should be looking at to obtain more complete listing of modules that will be loaded?
Is there something in win32api or win32con that could get me this information? I would prefer pefile if possible, but either works. I need to be able to output a listing of all modules that would be loaded. I am working in Python and inflexible about changing.

Modules can be loaded into a process in many ways. Imported DLLs are just one way.
Imported DLLs may import DLLs for themselves too. foobar.exe may depend on user32.dll, but user32.dll in turn also depends on kernel32.dll so that will be loaded into your process. If you want a complete list, you may want to check the imported DLLs if your executable's dependencies.
Modules can be dynamically loaded in code with LoadLibrary(). You will not see those in the import directory. You'll have to disassemble the code for that and even then the library name can be generated on the fly so it will be hard to tell.
There are some more unsupported methods of loading modules that malware can use.
As the comments mentioned, getting a list of loaded modules through debugging APIs is probably simpler. But it all depends on what you're actually trying to do with this data.

As a matter of fact, there are different types of techniques to import modules. The type is determined by the way the referenced module is bound to the executable. As far as I know, PEfile does only lists the dynamic link libraries that are statically bound to the executable through the Imports table. Other types of dynamic link libraries are: explicit (those called via LoadLibrary/GetProcAddress APIs), Forwarded (those loaded via a PE mechanism that allow forwarding API calls) and delayed (those loaded via a PE mechanism that allow delayed loading of API calls).
Please find below a schema representing these methods (which is part of my slides available https://winitor.com/pdf/DynamicLinkLibraries.pdf
I hope that helps.

Related

Specifying Exact CPU Instruction Set with Cythonized Python Wheels

I have a Python package with a native extension compiled by Cython. Due to some performance needs, the compilation is done with -march=native, -mtune=native flags. This basically enables the compiler to use any of the available ISA extensions.
Additionally, we keep a non-cythonized, pure-python version of this package. It should be used in environments which are less performance sensitive.
Hence, in total we have two versions published:
Cythonized wheel built for a very specific platform
Pure-python wheel.
Some other packages depend on this package, and some of the machines are a bit different than the one that the package was compiled on. Since we used -march=native, as a result we get SIGILL, since some ISA extension is missing on the server.
So, in essence, I'd like to somehow make pip disregard the native wheel if the host CPU is not compatible with the wheel.
The native wheel does have the cp37 and platform name, but I don't see a way to define a more granular ISA requirements here. I can always use --implementation flags for pip, but I wonder if there's a better way for pip to differentiate among different ISAs.
Thanks,

The pip infrastructure doesn't support such granularity.
I think a better approach would be to have two versions of the Cython-extension compiled: with -march=native and without, to install both and to decide at the run time which one should be loaded.
Here is a proof of concept.
The first hoop to jump: how to check at run time which instructions are supported by CPU/OS combination. For the simplicity we will check for AVX (this SO-post has more details) and I offer only a gcc-specific (see also this) solution - called impl_picker.pyx:
cdef extern from *:
"""
int cpu_supports_avx(void){
return __builtin_cpu_supports("avx");
}
"""
int cpu_supports_avx()
def cpu_has_avx_support():
return cpu_supports_avx() != 0
The second problem: the pyx-file and the module must have the same name. To avoid code duplication, the actual code is in a pxi-file:
# worker.pxi
cdef extern from *:
"""
int compiled_with_avx(void){
#ifdef __AVX__
return 1;
#else
return 0;
#endif
}
"""
int compiled_with_avx()
def compiled_with_avx_support():
return compiled_with_avx() != 0
As one can see, the function compiled_with_avx_support will yield different results, depending on whether it was compiled with -march=native or not.
And now we can define two versions of the module just by including the actual code from the *.pxi-file. One module called worker_native.pyx:
# distutils: extra_compile_args=["-march=native"]
include "worker.pxi"
and worker_fallback.pyx:
include "worker.pxi"
Building everything, e.g. via cythonize -i -3 *.pyx, it can be used as follows:
from impl_picker import cpu_has_avx_support
# overhead once when imported:
if cpu_has_avx_support():
import worker_native as worker
else:
print("using fallback worker")
import worker_fallback as worker
print("compiled_with_avx_support:", worker.compiled_with_avx_support())
On my machine the above would lead to compiled_with_avx_support: True, on older machines the "slower" worker_fallback will be used and the result will be compiled_with_avx_support: False.
The goal of this post is not to give a working setup.py, but just to outline the idea how one could achieve the goal of picking correct version at the run time. Obviously, the setup.py could be quite more complicated: e.g. one would need to compile multiple c-files with different compiler settings (see this SO-post, how this could be achieved).

RAM usage after importing numpy in python 3.7.2

I run conda 4.6.3 with python 3.7.2 win32. In python, when I import numpy, i see the RAM usage increase by 80MB. Since I am using multiprocessing, I wonder if this is normal and if there is anyway to avoid this RAM overhead? Please see below all the versions from relevant packages (from conda list):
python...........3.7.2 h8c8aaf0_2
mkl_fft...........1.0.10 py37h14836fe_0
mkl_random..1.0.2 py37h343c172_0
numpy...........1.15.4 py37h19fb1c0_0
numpy-base..1.15.4 py37hc3f5095_0
thanks!

You can't avoid this cost, but it's likely not as bad as it seems. The numpy libraries (a copy of C only libopenblasp, plus all the Python numpy extension modules) occupy over 60 MB on disk, and they're all going to be memory mapped into your Python process on import; adding on all the Python modules and the dynamically allocated memory involved in loading and initializing all of them, and 80 MB of increased reported RAM usage is pretty normal.
That said:
The C libraries and Python extension modules are memory mapped in, but that doesn't actually mean they occupy "real" RAM; if the code paths in a given page aren't exercised, the page will either never be loaded, or will be dropped under memory pressure (not even written to the page file, since it can always reload it from the original DLL).
On UNIX-like systems, when you fork (multiprocessing does this by default everywhere but Windows) that memory is shared between parent and worker processes in copy-on-write mode. Since the code itself is generally not written, the only cost is the page tables themselves (a tiny fraction of the memory they reference), and both parent and child will share that RAM.
Sadly, on Windows, fork isn't an option (unless you're running Ubuntu bash on Windows, in which case it's only barely Windows, effectively Linux), so you'll likely pay more of the memory costs in each process. But even there, libopenblasp, the C library backing large parts of numpy, will be remapped per process, but the OS should properly share that read-only memory across processes (and large parts, if not all, of the Python extension modules as well).
Basically, until this actually causes a problem (and it's unlikely to do so), don't worry about it.

[NumPy]: NumPy
is the fundamental package for scientific computing with Python.
It is a big package, designed to work with large datasets and optimized (primarily) for speed. If you look in its __init__.py (which gets executed when importing it (e.g.: import numpy)), you'll notice that it imports lots of items (packages / modules):
Those items themselves, may import others
Some of them are extension modules (.pyds (.dlls) or .sos) which get loaded into the current process (their dependencies as well)
I've prepared a demo.
code.py:
#!/usr/bin/env python3
import sys
import os
import psutil
#import pprint
def main():
display_text = "This {:s} screenshot was taken. Press <Enter> to continue ... "
pid = os.getpid()
print("Pid: {:d}\n".format(pid))
p = psutil.Process(pid=pid)
mod_names0 = set(k for k in sys.modules)
mi0 = p.memory_info()
input(display_text.format("first"))
import numpy
input(display_text.format("second"))
mi1 = p.memory_info()
for idx, mi in enumerate([mi0, mi1], start=1):
print("\nMemory info ({:d}): {:}".format(idx, mi))
print("\nExtra modules imported by `{:s}` :".format(numpy.__name__))
print(sorted(set(k for k in sys.modules) - mod_names0))
#pprint.pprint({k: v for k, v in sys.modules.items() if k not in mod_names0})
print("\nDone.")
if __name__ == "__main__":
print("Python {:s} on {:s}\n".format(sys.version, sys.platform))
main()
Output:
[cfati#CFATI-5510-0:e:\Work\Dev\StackOverflow\q054675983]> "e:\Work\Dev\VEnvs\py_064_03.06.08_test0\Scripts\python.exe" code.py
Python 3.6.8 (tags/v3.6.8:3c6b436a57, Dec 24 2018, 00:16:47) [MSC v.1916 64 bit (AMD64)] on win32
Pid: 27160
This first screenshot was taken. Press <Enter> to continue ...
This second screenshot was taken. Press <Enter> to continue ...
Memory info (1): pmem(rss=15491072, vms=8458240, num_page_faults=4149, peak_wset=15495168, wset=15491072, peak_paged_pool=181160, paged_pool=180984, peak_nonpaged_pool=13720, nonpaged_pool=13576, pagefile=8458240, peak_pagefile=8458240, private=8458240)
Memory info (2): pmem(rss=27156480, vms=253882368, num_page_faults=7283, peak_wset=27205632, wset=27156480, peak_paged_pool=272160, paged_pool=272160, peak_nonpaged_pool=21640, nonpaged_pool=21056, pagefile=253882368, peak_pagefile=253972480, private=253882368)
Extra modules imported by `numpy` :
['_ast', '_bisect', '_blake2', '_compat_pickle', '_ctypes', '_decimal', '_hashlib', '_pickle', '_random', '_sha3', '_string', '_struct', 'argparse', 'ast', 'atexit', 'bisect', 'copy', 'ctypes', 'ctypes._endian', 'cython_runtime', 'decimal', 'difflib', 'gc', 'gettext', 'hashlib', 'logging', 'mtrand', 'numbers', 'numpy', 'numpy.__config__', 'numpy._distributor_init', 'numpy._globals', 'numpy._import_tools', 'numpy.add_newdocs', 'numpy.compat', 'numpy.compat._inspect', 'numpy.compat.py3k', 'numpy.core', 'numpy.core._internal', 'numpy.core._methods', 'numpy.core._multiarray_tests', 'numpy.core.arrayprint', 'numpy.core.defchararray', 'numpy.core.einsumfunc', 'numpy.core.fromnumeric', 'numpy.core.function_base', 'numpy.core.getlimits', 'numpy.core.info', 'numpy.core.machar', 'numpy.core.memmap', 'numpy.core.multiarray', 'numpy.core.numeric', 'numpy.core.numerictypes', 'numpy.core.records', 'numpy.core.shape_base', 'numpy.core.umath', 'numpy.ctypeslib', 'numpy.fft', 'numpy.fft.fftpack', 'numpy.fft.fftpack_lite', 'numpy.fft.helper', 'numpy.fft.info', 'numpy.lib', 'numpy.lib._datasource', 'numpy.lib._iotools', 'numpy.lib._version', 'numpy.lib.arraypad', 'numpy.lib.arraysetops', 'numpy.lib.arrayterator', 'numpy.lib.financial', 'numpy.lib.format', 'numpy.lib.function_base', 'numpy.lib.histograms', 'numpy.lib.index_tricks', 'numpy.lib.info', 'numpy.lib.mixins', 'numpy.lib.nanfunctions', 'numpy.lib.npyio', 'numpy.lib.polynomial', 'numpy.lib.scimath', 'numpy.lib.shape_base', 'numpy.lib.stride_tricks', 'numpy.lib.twodim_base', 'numpy.lib.type_check', 'numpy.lib.ufunclike', 'numpy.lib.utils', 'numpy.linalg', 'numpy.linalg._umath_linalg', 'numpy.linalg.info', 'numpy.linalg.lapack_lite', 'numpy.linalg.linalg', 'numpy.ma', 'numpy.ma.core', 'numpy.ma.extras', 'numpy.matrixlib', 'numpy.matrixlib.defmatrix', 'numpy.polynomial', 'numpy.polynomial._polybase', 'numpy.polynomial.chebyshev', 'numpy.polynomial.hermite', 'numpy.polynomial.hermite_e', 'numpy.polynomial.laguerre', 'numpy.polynomial.legendre', 'numpy.polynomial.polynomial', 'numpy.polynomial.polyutils', 'numpy.random', 'numpy.random.info', 'numpy.random.mtrand', 'numpy.testing', 'numpy.testing._private', 'numpy.testing._private.decorators', 'numpy.testing._private.nosetester', 'numpy.testing._private.pytesttester', 'numpy.testing._private.utils', 'numpy.version', 'pathlib', 'pickle', 'pprint', 'random', 'string', 'struct', 'tempfile', 'textwrap', 'unittest', 'unittest.case', 'unittest.loader', 'unittest.main', 'unittest.result', 'unittest.runner', 'unittest.signals', 'unittest.suite', 'unittest.util', 'urllib', 'urllib.parse']
Done.
And the (before and after import) screenshots ([MS.Docs]: Process Explorer):
As a personal remark, I think that ~80 MiB (or whatever the exact amount is), is more than decent for the current "era", which is characterized by ridiculously high amounts of hardware resources, especially in the memories area. Besides, that would probably insignificant, compared to the amount required by the arrays themselves. If it's not the case, you should probably consider moving away from numpy.
There could be a way to reduce the memory footprint, by selectively importing only the modules containing the features that you need (my personal advice is against it), and thus going around __init__.py:
You'd have to be an expert in numpy's internals
Modules must be imported "manually" (by file name), using [Python 3]: importlib - The implementation of import (or alternatives)
Their dependents will be imported / loaded as well (and because of this, I don't know how much free memory you'd gain)

How can I partition pyspark RDDs holding R functions

import rpy2.robjects as robjects
dffunc = sc.parallelize([(0,robjects.r.rnorm),(1,robjects.r.runif)])
dffunc.collect()
Outputs
[(0, <rpy2.rinterface.SexpClosure - Python:0x7f2ecfc28618 / R:0x26abd18>), (1, <rpy2.rinterface.SexpClosure - Python:0x7f2ecfc283d8 / R:0x26aad28>)]
While the partitioned version results in an error:
dffuncpart = dffunc.partitionBy(2)
dffuncpart.collect()
RuntimeError: ('R cannot evaluate code before being initialized.', <built-in function unserialize>
It seems like this error is that R wasn't loaded on one of the partitions, which I assume implies that the first import step was not performed. Is there anyway around this?
EDIT 1 This second example causes me to think there's a bug in the timing of pyspark or rpy2.
dffunc = sc.parallelize([(0,robjects.r.rnorm), (1,robjects.r.runif)]).partitionBy(2)
def loadmodel(model):
import rpy2.robjects as robjects
return model[1](2)
dffunc.map(loadmodel).collect()
Produces the same error R cannot evaluate code before being initialized.
dffuncpickle = sc.parallelize([(0,pickle.dumps(robjects.r.rnorm)),(1,pickle.dumps(robjects.r.runif))]).partitionBy(2)
def loadmodelpickle(model):
import rpy2.robjects as robjects
import pickle
return pickle.loads(model[1])(2)
dffuncpickle.map(loadmodelpickle).collect()
Works just as expected.

I'd like to say that "this is not a bug in rpy2, this is a feature" but I'll realistically have to settle with "this is a limitation".
What is happening is that rpy2 has 2 interface levels. One is a low-level one (closer to R's C API) and available through rpy2.rinterface and the other one is a high-level interface with more bells and whistles, more "pythonic", and with classes for R objects inheriting from rinterface level-ones (that last part is important for the part about pickling below). Importing the high-level interface results in initializing (starting) the embedded R with default parameters if necessary. Importing the low-level interface rinterface does not have this side effect and the initialization of the embedded R must be performed explicitly (function initr). rpy2 was designed this way because the initialization of the embedded R can have parameters: importing first rpy2.rinterface, setting the initialization, then importing rpy2.robjects makes this possible.
In addition to that the serialization (pickling) of R objects wrapped by rpy2 is currently only defined at the rinterface level (see the documentation). Pickling robjects-level (high-level) rpy2 objects is using the rinterface-level code and when unpickling them they will remain at that lower-level (the Python pickle contains the module the class of the object is defined in and will import that module - here rinterface, which does not imply the initialization of the embedded R). The reason for things being this way are simply that it was "good enough for now": at the time this was implemented I had to simultaneously think of a good way to bridge two somewhat different languages and learn my way through Python C-API and pickling/unpickling Python objects. Given the ease with which one can write something like
import rpy2.robjects
or
import rpy2.rinterface
rpy2.rinterface.initr()
before unpickling, this was never revisited. The uses of rpy2's pickling I know about are using Python's multiprocessing (and adding something similar to the import statements in the code initializing a child process was a cheap and sufficient fix). May this is the time to look at this again. File a bug report for rpy2 if the case.
edit: this is undoubtedly an issue with rpy2. pickled robjects-level objects should unpickle back to robjects-level, not rinterface-level. I have opened an issue in the rpy2 tracker (and already pushed a rudimentary patch in the default/dev branch).
2nd edit: The patch is part of released rpy2 starting with version 2.7.7 (latest release at the time of writing is 2.7.8).

Which python module is used to read CPU temperature and processor fan speed in Windows?

Which python module is used to read CPU temperature and processor Fan speed in Windows?
I explored the WMI python module, however I am unable to find the correct option or function to capture the above mentioned info.
Actually I tried the following code snip, but it returns 'nothing'.
import wmi
w = wmi.WMI()
print w.Win32_TemperatureProbe()[0].CurrentReading
Is there a way to get this information?

As per Microsoft's MSDN:
Most of the information that the Win32_TemperatureProbe WMI class
provides comes from SMBIOS. Real-time readings for the CurrentReading
property cannot be extracted from SMBIOS tables. For this reason,
current implementations of WMI do not populate the CurrentReading
property. The CurrentReading property's presence is reserved for
future use.
You can use MSAcpi_ThermalZoneTemperature instead:
import wmi
w = wmi.WMI(namespace="root\\wmi")
print (w.MSAcpi_ThermalZoneTemperature()[0].CurrentTemperature/10.0)-273.15

This works perfectly:
import wmi
w = wmi.WMI(namespace="root\\wmi")
print (w.MSAcpi_ThermalZoneTemperature()[0].CurrentTemperature / 10.0) - 273.15
Make sure you're running the program as administrator, otherwise it will fail or give error codes when trying to test/run/execute your code.

how to find list of modules which depend upon a specific module in python

In order to reduce development time of my Python based web application, I am trying to use reload() for the modules I have recently modified. The reload() happens through a dedicated web page (part of the development version of the web app) which lists the modules which have been recently modified (and the modified time stamp of py file is later than the corresponding pyc file). The full list of modules is obtained from sys.modules (and I filter the list to focus on only those modules which are part of my package).
Reloading individual python files seems to work in some cases and not in other cases. I guess, all the modules which depend on a modified module should be reloaded and the reloading should happen in proper order.
I am looking for a way to get the list of modules imported by a specific module. Is there any way to do this kind of introspection in Python?
I understand that my approach might not be 100% guaranteed and the safest way would be to reload everything, but if a fast approach works for most cases, it would be good enough for development purposes.
Response to comments regarding DJango autoreloader
#Glenn Maynard, Thanx, I had read about DJango's autoreloader. My web app is based on Zope 3 and with the amount of packages and a lot of ZCML based initializations, the total restart takes about 10 seconds to 30 seconds or more if the database size is bigger. I am attempting to cut down on this amount of time spent during restart. When I feel I have done a lot of changes, I usually prefer to do full restart, but more often I am changing couple of lines here and there for which I do not wish to spend so much of time. The development setup is completely independent of production setup and usually if something is wrong in reload, it becomes obvious since the application pages start showing illogical information or throwing exceptions. Am very much interested in exploring whether selective reload would work or not.

So - this answers "Find a list of modules which depend on a given one" - instead of how the question was initally phrased - which I answered above.
As it turns out, this is a bit more complex: One have to find the dependency tree for all loaded modules, and invert it for each module, while preserving a loading order that would not break things.
I had also posted this to brazillian's python wiki at:
http://www.python.org.br/wiki/RecarregarModulos
#! /usr/bin/env python
# coding: utf-8
# Author: João S. O. Bueno
# Copyright (c) 2009 - Fundação CPqD
# License: LGPL V3.0
from types import ModuleType, FunctionType, ClassType
import sys
def find_dependent_modules():
"""gets a one level inversed module dependence tree"""
tree = {}
for module in sys.modules.values():
if module is None:
continue
tree[module] = set()
for attr_name in dir(module):
attr = getattr(module, attr_name)
if isinstance(attr, ModuleType):
tree[module].add(attr)
elif type(attr) in (FunctionType, ClassType):
tree[module].add(attr.__module__)
return tree
def get_reversed_first_level_tree(tree):
"""Creates a one level deep straight dependence tree"""
new_tree = {}
for module, dependencies in tree.items():
for dep_module in dependencies:
if dep_module is module:
continue
if not dep_module in new_tree:
new_tree[dep_module] = set([module])
else:
new_tree[dep_module].add(module)
return new_tree
def find_dependants_recurse(key, rev_tree, previous=None):
"""Given a one-level dependance tree dictionary,
recursively builds a non-repeating list of all dependant
modules
"""
if previous is None:
previous = set()
if not key in rev_tree:
return []
this_level_dependants = set(rev_tree[key])
next_level_dependants = set()
for dependant in this_level_dependants:
if dependant in previous:
continue
tmp_previous = previous.copy()
tmp_previous.add(dependant)
next_level_dependants.update(
find_dependants_recurse(dependant, rev_tree,
previous=tmp_previous,
))
# ensures reloading order on the final list
# by postponing the reload of modules in this level
# that also appear later on the tree
dependants = (list(this_level_dependants.difference(
next_level_dependants)) +
list(next_level_dependants))
return dependants
def get_reversed_tree():
"""
Yields a dictionary mapping all loaded modules to
lists of the tree of modules that depend on it, in an order
that can be used fore reloading
"""
tree = find_dependent_modules()
rev_tree = get_reversed_first_level_tree(tree)
compl_tree = {}
for module, dependant_modules in rev_tree.items():
compl_tree[module] = find_dependants_recurse(module, rev_tree)
return compl_tree
def reload_dependences(module):
"""
reloads given module and all modules that
depend on it, directly and otherwise.
"""
tree = get_reversed_tree()
reload(module)
for dependant in tree[module]:
reload(dependant)
This wokred nicely in all tests I made here - but I would not recoment abusing it.
But for updating a running zope2 server after editing a few lines of code, I think I would use this myself.

You might want to take a look at Ian Bicking's Paste reloader module, which does what you want already:
http://pythonpaste.org/modules/reloader?highlight=reloader
It doesn't give you specifically a list of dependent files (which is only technically possible if the packager has been diligent and properly specified dependencies), but looking at the code will give you an accurate list of modified files for restarting the process.

Some introspection to the rescue:
from types import ModuleType
def find_modules(module, all_mods = None):
if all_mods is None:
all_mods = set([module])
for item_name in dir(module):
item = getattr(module, item_name)
if isinstance(item, ModuleType) and not item in all_mods:
all_mods.add(item)
find_modules(item, all_mods)
return all_mods
This gives you a set with all loaded modules - just call the function with your first module as a sole parameter. You can then iterate over the resulting set reloading it, as simply as:
[reload (m) for m in find_modules(<module>)]

Develop Reference

Python is a programming language that lets you work quickly and integrate systems more effectively.