(Assume that: application start-up time is absolutely critical; my application is started a lot; my application runs in an environment in which importing is slower than usual; many files need to be imported; and compilation to .pyc files is not available.)
I would like to concatenate all the Python source files that define a collection of modules into a single new Python source file.
I would like the result of importing the new file to be as if I imported one of the original files (which would then import some more of the original files, and so on).
Is this possible?
Here is a rough, manual simulation of what a tool might produce when fed the source files for modules 'bar' and 'baz'. You would run such a tool prior to deploying the code.
__file__ = 'foo.py'
def _module(_name):
import types
mod = types.ModuleType(name)
mod.__file__ = __file__
sys.modules[module_name] = mod
return mod
def _bar_module():
def hello():
print 'Hello World! BAR'
mod = create_module('foo.bar')
mod.hello = hello
return mod
bar = _bar_module()
del _bar_module
def _baz_module():
def hello():
print 'Hello World! BAZ'
mod = create_module('foo.bar.baz')
mod.hello = hello
return mod
baz = _baz_module()
del _baz_module
And now you can:
from foo.bar import hello
hello()
This code doesn't take account of things like import statements and dependencies. Is there any existing code that will assemble source files using this, or some other technique?
This is very similar idea to tools being used to assemble and optimise JavaScript files before sending to the browser, where the latency of multiple HTTP requests hurts performance. In this Python case, it's the latency of importing hundreds of Python source files at startup which hurts.
If this is on google app engine as the tags indicate, make sure you are using this idiom
def main():
#do stuff
if __name__ == '__main__':
main()
Because GAE doesn't restart your app every request unless the .py has changed, it just runs main() again.
This trick lets you write CGI style apps without the startup performance hit
AppCaching
If a handler script provides a main()
routine, the runtime environment also
caches the script. Otherwise, the
handler script is loaded for every
request.
I think that due to the precompilation of Python files and some system caching, the speed up that you'll eventually get won't be measurable.
Doing this is unlikely to yield any performance benefits. You're still importing the same amount of Python code, just in fewer modules - and you're sacrificing all modularity for it.
A better approach would be to modify your code and/or libraries to only import things when needed, so that a minimum of required code is loaded for each request.
Without dealing with the question, whether or not this technique would boost up things at your environment, say you are right, here is what I would have done.
I would make a list of all my modules e.g.
my_files = ['foo', 'bar', 'baz']
I would then use os.path utilities to read all lines in all files under the source directory and writes them all into a new file, filtering all import foo|bar|baz lines since all code is now within a single file.
Of curse, at last adding the main() from __init__.py (if there is such) at the tail of the file.
Related
So I'm new to Python and I need some help on how to improve my life. I learned Python for work and need to cut my workload a little. I have three different scripts which I run around 5 copies of at the same time all the time, they read XML data and add in information etc... However, when I make a change to a script I have to change the 5 other files too, which is annoying after a while. I can't just run the same script 5 times because each file needs some different parameters which I store as variables at the start in every script (different filepaths...).
But I'm sure theres a much better way out there?
A very small example:
script1.py
xml.open('c:\file1.xls')
while True:
do script...
script2.py
xml.open('c:\file2.xls')
while True:
do exactley the same script...
etc...
You'll want to learn about Python functions and modules.
A function is the solution to your problem: it bundles some functionality and allows you to call it to run it, with only minor differences passed as a parameter:
def do_something_with_my_sheet(name):
xml.open(name)
while True:
do script...
Elsewhere in your script, you can just call the function:
do_something_with_my_sheet('c:\file1.xls')
Now, if you want to use the same function from multiple other scripts, you can put the function in a module and import it from both scripts. For example:
This is my_module.py:
def do_something_with_my_sheet(name):
xml.open(name)
while True:
do script...
This is script1.py:
import my_module
my_module.do_something_with_my_sheet('c:\file1.xls')
And this could be script2.py (showing a different style of import):
from my_module import do_something_with_my_sheet
do_something_with_my_sheet('c:\file2.xls')
Note that the examples above assume you have everything sitting in a single folder, all the scripts in one place. You can separate stuff for easier reuse by putting your module in a package, but that's beyond the scope of this answer - look into it if you're curious.
You only need one script, that takes the name of the file as an argument:
import sys
xml.open(sys.argv[1])
while True:
do script...
Then run the script. Other variables can be passed as additional arguments, accessed via sys.argv[2], etc.
If there are many such parameters, it may be easier to save them in a configuration file, the pass the name of the configuration file as the single argument. Your script would then parse the file for all the information it needs.
For example, you might have a JSON file with contents like
{
"filename": "c:\file1.xls",
"some_param": 6,
"some_other_param": True
}
and your script would look like
import json
import sys
with open(sys.argv[1]) as f:
config = json.load(f)
xml.open(config['filename'])
while True:
do stuff using config['some_param'] and config['some_other_param']
I have a bytecode document that declares functions and a logo. I also have a .py file where I call the bytecode to output the logo and strings in the functions. How do I go about actually executing the bytecode? I was able to dissemble it and see the assembly code. How can I actually run it?
question.py
import dis
import logo
def work_here():
# execute the bytecode
def main():
work_here()
if __name__ == '__main__':
main()
Try something like:
import dis
code = 'some byte code'
b_code = dis.Bytecode(code)
exec(b.codeobj)
To import a .pyc file, you just do the same thing you do with a .py file: import spam will find an appropriately-placed spam.pyc (or rather, something like __pycache__/spam.cpython-36.pyc) just as it will find an appropriately-placed spam.py. Its top-level code gets run, any functions and classes get defined so you can call them, etc., exactly the same as with a .py file; the only difference is that there isn't source text to show for things like tracebacks or debugger stepping.
If you want to programmatically import a .pyc file by explicit path, or execute one without importing it, you again do the same thing you do with a .py file.
Look at the Examples in importlib. For example:
path = 'bytecoderepo/myfile.pyc'
spec = importlib.util.spec_from_file('myfile', path)
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)
And now, the code in bytecoderepo/myfile.pyc has been executed, and the resulting module is available in the variable mod, but it isn't in sys.modules or stored as a global.
If you actually need to dig into the .pyc format and, e.g., extract the bytecode of some function so you can exec it (or build a function object out of it) without executing the main module code, the details are only documented in the source, and subject to change between Python versions. Start with importlib; being able to (validate and) skip over the header and marshal.loads the body may be as far as you need to learn, but probably not (since ultimately, that's what the module loader already does for you in the sample code above, so if that's not good enough, you need to get deeper into the internals).
When writing throwaway scripts it's often needed to load a configuration file, image, or some such thing from the same directory as the script. Preferably this should continue to work correctly regardless of the directory the script is executed from, so we may not want to simply rely on the current working directory.
Something like this works fine if defined within the same file you're using it from:
from os.path import abspath, dirname, join
def prepend_script_directory(s):
here = dirname(abspath(__file__))
return join(here, s)
It's not desirable to copy-paste or rewrite this same function into every module, but there's a problem: if you move it into a separate library, and import as a function, __file__ is now referencing some other module and the results are incorrect.
We could perhaps use this instead, but it seems like the sys.argv may not be reliable either.
def prepend_script_directory(s):
here = dirname(abspath(sys.argv[0]))
return join(here, s)
How to write prepend_script_directory robustly and correctly?
I would personally just os.chdir into the script's directory whenever I execute it. It is just:
import os
os.chdir(os.path.split(__file__)[0])
However if you did want to refactor this thing into a library, you are in essence wanting a function that is aware of its caller's state. You thus have to make it
prepend_script_directory(__file__, blah)
If you just wanted to write
prepend_script_directory(blah)
you'd have to do cpython-specific tricks with stack frames:
import inspect
def getCallerModule():
# gets globals of module called from, and prints out __file__ global
print(inspect.currentframe().f_back.f_globals['__file__'])
I think the reason it doesn't smell right is that $PYTHONPATH (or sys.path) is the proper general mechanism to use.
You want pkg_resources
import pkg_resources
foo_fname = pkg_resources.resource_filename(__name__, "foo.txt")
I have written three simple scripts (which I will not post here, as they are part of my dissertation research) that are all in working order.
What I would like to do now is write a "batch-processing" script for them. I have many (read as potentially tens of thousands) of data files on which I want these scripts to act.
My questions about this process are as follows:
What is the most efficient way to go about this sort of thing?
I am relatively new to programming. Is there a simple way to do this, or is this a very complex endeavor?
Before anyone downvotes this question as "unresearched" or whatever negative connotation comes to mind, PLEASE just offer help. I have spent days reading documentation and following leads from Google searches, and it would be most appreciated if a human being could offer some input.
If you just need to have the scripts run, probably a shell script would be the easiest thing.
If you want to stay in Python, the best way would be to have a main() (or somesuch) function in each script (and have each script importable), have the batch script import the subscript and then run its main.
If staying in Python:
- your three scripts must have the .py ending to be importable
- they should either be in Python's search path, or the batch control script can set the path
- they should each have a main function (or whatever name you choose) that will activate that script
For example:
batch_script
import sys
sys.path.insert(0, '/location/of/subscripts')
import first_script
import second_script
import third_script
first_script.main('/location/of/files')
second_script.main('/location/of/files')
third_script.main('/location/of/files')
example sub_script
import os
import sys
import some_other_stuff
SOMETHING_IMPORTANT = 'a value'
def do_frobber(a_file):
...
def main(path_to_files):
all_files = os.listdir(path_to_files)
for file in all_files:
do_frobber(os.path.join(path_to_files, file)
if __name__ == '__main__':
main(sys.argv[1])
This way, your subscript can be run on its own, or called from the main script.
You can write a batch script in python using os.walk() to generate a list of the files and then process them one by one with your existing python programs.
import os, re
for root, dir, file in os.walk(/path/to/files):
for f in file:
if re.match('.*\.dat$', f):
run_existing_script1 root + "/" file
run_existing_script2 root + "/" file
If there are other files in the directory you might want to add a regex to ensure you only process the files you're interested in.
EDIT - added regular expression to ensure only files ending ".dat" are processed.
This is a very basic question - but I haven't been able to find an answer by searching online.
I am using python to control ArcGIS, and I have a simple python script, that calls some pre-written code.
However, when I make a change to the pre-written code, it does not appear to result in any change. I import this module, and have tried refreshing it, but nothing happens.
I've even moved the file it calls to another location, and the script still works fine. One thing I did yesterday was I added the folder where all my python files are to the sys path (using sys.append('path') ), and I wonder if that made a difference.
Thanks in advance, and sorry for the sloppy terminology.
It's unclear what you mean with "refresh", but the normal behavior of Python is that you need to restart the software for it to take a new look on a Python module and reread it.
If your changes isn't taken care of even after restart, then this is due to one of two errors:
The timestamp on the pyc-file is incorrect and some time in the future.
You are actually editing the wrong file.
You can with reload re-read a file even without restarting the software with the reload() command. Note that any variable pointing to anything in the module will need to get reimported after the reload. Something like this:
import themodule
from themodule import AClass
reload(themodule)
from themodule import AClass
One way to do this is to call reload.
Example: Here is the contents of foo.py:
def bar():
return 1
In an interactive session, I can do:
>>> import foo
>>> foo.bar()
1
Then in another window, I can change foo.py to:
def bar():
return "Hello"
Back in the interactive session, calling foo.bar() still returns 1, until I do:
>>> reload(foo)
<module 'foo' from 'foo.py'>
>>> foo.bar()
'Hello'
Calling reload is one way to ensure that your module is up-to-date even if the file on disk has changed. It's not necessarily the most efficient (you might be better off checking the last modification time on the file or using something like pyinotify before you reload), but it's certainly quick to implement.
One reason that Python doesn't read from the source module every time is that loading a module is (relatively) expensive -- what if you had a 300kb module and you were just using a single constant from the file? Python loads a module once and keeps it in memory, until you reload it.
If you are running in an IPython shell, then there are some magic commands that exist.
The IPython docs cover this feature called the autoreload extension.
Originally, I found this solution from Jonathan March's blog posting on this very subject (see point 3 from that link).
Basically all you have to do is the following, and changes you make are reflected automatically after you save:
In [1]: %load_ext autoreload
In [2]: %autoreload 2
In [3]: Import MODULE
In [4]: my_class = Module.class()
my_class.printham()
Out[4]: ham
In [5]: #make changes to printham and save
In [6]: my_class.printham()
Out[6]: hamlet
I used the following when importing all objects from within a module to ensure web2py was using my current code:
import buttons
import table
reload(buttons)
reload(table)
from buttons import *
from table import *
I'm not really sure that is what you mean, so don't hesitate to correct me. You are importing a module - let's call it mymodule.py - in your program, but when you change its contents, you don't see the difference?
Python will not look for changes in mymodule.py each time it is used, it will load it a first time, compile it to bytecode and keep it internally. It will normally also save the compiled bytecode (mymodule.pyc). The next time you will start your program, it will check if mymodule.py is more recent than mymodule.pyc, and recompile it if necessary.
If you need to, you can reload the module explicitly:
import mymodule
[... some code ...]
if userAskedForRefresh:
reload(mymodule)
Of course, it is more complicated than that and you may have side-effects depending on what you do with your program regarding the other module, for example if variables depends on classes defined in mymodule.
Alternatively, you could use the execfile function (or exec(), eval(), compile())
I had the exact same issue creating a geoprocessing script for ArcGIS 10.2. I had a python toolbox script, a tool script and then a common script. I have a parameter for Dev/Test/Prod in the tool that would control which version of the code was run. Dev would run the code in the dev folder, test from test folder and prod from prod folder. Changes to the common dev script would not run when the tool was run from ArcCatalog. Closing ArcCatalog made no difference. Even though I selected Dev or Test it would always run from the prod folder.
Adding reload(myCommonModule) to the tool script resolved this issue.
The cases will be different for different versions of python.
Following shows an example of python 3.4 version or above:
hello import hello_world
#Calls hello_world function
hello_world()
HI !!
#Now changes are done and reload option is needed
import importlib
importlib.reload(hello)
hello_world()
How are you?
For earlier python versions like 2.x, use inbuilt reload function as stated above.
Better is to use ipython3 as it provides autoreload feature.