Fast-Responding Command Line Scripts - python

I have been writing command-line Python scripts for a while, but recently I felt really frustrated with speed.
I'm not necessarily talking about processing speed, dispatching tasks or other command-line-tool-specific work (that is usually a design/implementation problem), but rather about simply running a tool to get a help menu or display minimal information.
As an example, Mercurial comes in at around 0.080 seconds and Git at around 0.030 seconds.
I have looked into Mercurial's source code (it is Python after all), but the answer to how to get a fast-responding script still eludes me.
I think imports, and how you manage them, are a big reason for the initial slowdown. But is there a best practice for fast-acting, fast-responding command-line scripts in Python?
A single Python script that imports os and optparse and executes main() to parse some argument options takes 0.160 seconds on my machine just to display the help menu...
This is 5 times slower than just running git!
Edit:
I shouldn't have mentioned Git, as it is written in C. But the Mercurial point still stands, and no, .pyc files don't feel like a big improvement (to me at least).
Edit 2:
Although lazy imports are key to the speedups in Mercurial, the key to slowness in regular Python scripts is having auto-generated scripts with pkg_resources in them, like:
from pkg_resources import load_entry_point
If you have manually generated scripts that don't use pkg_resources you should see at least 2x speed increases.
However! Be warned that pkg_resources does provide a nice way of handling version dependencies, so be aware that not using it basically means possible version conflicts.
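As a concrete (hedged) illustration, a hand-written launcher that skips pkg_resources can be as small as the following; mytool and mytool.cli.main are hypothetical names standing in for your package and its entry-point function:

#!/usr/bin/env python
# Hand-written launcher: imports the entry point directly instead of asking
# pkg_resources to resolve the 'console_scripts' entry point at start-up.
# 'mytool.cli.main' is a hypothetical entry point -- substitute your own.
import sys
from mytool.cli import main

if __name__ == '__main__':
    sys.exit(main())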

In addition to compiling the Python files, Mercurial modifies importing to be on demand which does indeed reduce the start-up time. It sets __builtin__.__import__ to its own import function in the demandimport module.
If you look at the hg script in /usr/lib/ (or wherever it is on your machine), you can see this for yourself in the following lines:
try:
    from mercurial import demandimport; demandimport.enable()
except ImportError:
    import sys
    sys.stderr.write("abort: couldn't find mercurial libraries in [%s]\n" %
                     ' '.join(sys.path))
    sys.stderr.write("(check your install and PYTHONPATH)\n")
    sys.exit(-1)
If you change the demandimport line to pass, you will find that the start-up time increases substantially. On my machine, it seems to roughly double.
I recommend studying demandimport.py to see how to apply a similar technique in your own projects.
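If you would rather not copy Mercurial's demandimport, a rough modern equivalent on Python 3.5+ is importlib.util.LazyLoader, which defers executing a module's body until the first attribute access. A minimal sketch of that technique (not Mercurial's implementation):

import importlib.util
import sys

def lazy_import(name):
    # Build the module spec as usual, but wrap its loader in LazyLoader so the
    # module body only runs on first attribute access.
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module

json = lazy_import("json")       # nothing heavy happens yet
print(json.dumps({"hello": 1}))  # the real import happens here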
P.S. Git, as I'm sure you know, is written in C so I'm not surprised that it has a fast start-up time.

I am sorry, but it certainly is not the 0.08 seconds that is bothering you. Although you don't say it, it feels like you are running an "outer" shell (or other-language) script that is calling several hundred Python scripts inside a loop.
That is the only way these start-up times could make any difference. So either you are withholding this crucial information in your question, or your father is this guy.
So, assuming you have an external script that calls on the order of hundreds of Python processes: write this external script in Python, import whatever Python stuff you need into the same process, and run it from there. That way you cut out the interpreter start-up and module imports for each script execution.
That applies even to Mercurial, for example. You can import "mercurial" and the appropriate submodules and call functions inside it that perform the same actions as the equivalent command-line arguments.
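For example, instead of a shell loop that starts one Python interpreter per item, a single driver script can import the work once and call it in a loop. A minimal sketch; worker and worker.process() are hypothetical names standing in for whatever your per-item script currently does:

import sys
import worker   # hypothetical module holding the logic of the per-item script

def main(paths):
    # One interpreter start-up, one set of imports, many items.
    for path in paths:
        worker.process(path)

if __name__ == "__main__":
    main(sys.argv[1:])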

Related

Run different script with pypy3 or python

Is there a way to run the main script in pypy3, but have an import, say helper.py, be executed/interpreted by regular Python? And vice versa?
To clarify, let's say I have main.py that I want to execute with pypy3. That script imports helper, and I want the entire script in helper.py to be executed with python3. Or vice versa. I was wondering if there's something like import pyximport; pyximport.install(), where the import is then compiled and basically works/acts differently from main.py. Currently, I would use pypy3 main.py and, within main.py, use subprocess.Popen to execute python helper.py, and just pass an object or results through stdout/a pipe. Curious if there are other ways I could do this.
Yes, I know you would ask why even bother doing this. I am currently thinking of this since iterating a file with Python on Windows is much faster than iterating a file line by line with pypy3. I know they are trying to update/fix this, but since it is not yet fixed, I was wondering what I could do. On Linux, pypy3 works great, even when iterating a file.
I guess another scenario can be when a library is not supported in pypy3 yet, so you would still want to execute that script with python3, but maybe for the other parts of the script you want to use pypy3 to gain some performance. Hope this question is clear.
Subprocess seems like the right way to go. There are, however, humanized libraries for managing subprocesses that you could look at, such as:
Delegator
Envoy
Pexpect
This feels like an interesting experiment to provide fallback support for libraries or functions that are not supported in one runtime environment, but can be executed in some other supported environment and still retain the linear flow of execution of the program.
How you would scale this is an entirely different question.
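For the plain-subprocess route, a minimal sketch could look like the following; helper.py, its command-line contract and the JSON-on-stdout convention are assumptions, and "python3" is assumed to be the CPython interpreter on PATH:

# main.py -- run under pypy3; delegates one step to CPython and reads the
# result back over stdout.
import json
import subprocess
import sys

def run_helper(path):
    proc = subprocess.run(
        ["python3", "helper.py", path],   # helper.py prints a JSON document
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)

if __name__ == "__main__":
    print(run_helper(sys.argv[1]))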

What is the recommended practice in django to execute external scripts?

I'm planning to build a WebApp that will need to execute scripts based on an argument that a user will provide in a text field or in the URL.
Possible solutions that I have found:
create a lib directory in the root directory of the project, and put the scripts there, and import it from views.
using subprocess module to directly run the scripts in the following way:
subprocess.call(['python', 'somescript.py', argument_1,...])
argument_1: should be what an end user provides.
I'm planning to build a WebApp that will need to execute scripts
Why should it "execute scripts"? Turn your "scripts" into proper modules, import the relevant functions and call them. The fact that Python can be used as a "scripting language" doesn't mean it's not a proper programming language.
Approach (1) should be the default approach. Never subprocess unless you absolutely have to.
Disadvantages of subprocessing:
Depends on the underlying OS and, in your case, on Python (i.e. is the python command the same Python that runs the original script?).
Potentially harder to make safe.
Harder to pass values, return results and report errors.
Eats more memory and cpu (a side effect is that you can utilize all cpu cores but since you are writing a web app it is likely you do that anyway).
Generally harder to code and maintain.
Advantages of subprocessing:
Isolates the runtime. This is useful if for example scripts are uploaded by users. You don't want them to mess with your application.
Related to 1: potentially easier to dynamically add scripts. Not that you should do that anyway. It also becomes harder when you have more than 1 server and you need to synchronize them.
Well, you can run non-python code that way. But it doesn't apply to your case.
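A minimal sketch of approach (1), with lib.somescript and its run() function as hypothetical names for your own module and entry function:

# views.py
from django.http import JsonResponse

from lib import somescript   # your "script", turned into an importable module

def run_script(request):
    argument = request.GET.get("argument", "")
    result = somescript.run(argument)   # a plain function call, same process
    return JsonResponse({"result": result})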

Lazily download/install python submodules

Would it be possible to create a python module that lazily downloads and installs submodules as needed? I've worked with "subclassed" modules that mimic real modules, but I've never tried to do so with downloads involved. Is there a guaranteed directory that I can download source code and data to, that the module would then be able to use on subsequent runs?
To make this more concrete, here is the ideal behavior:
User runs pip install magic_module and the lightweight magic_module is installed to their system.
User runs the code import magic_module.alpha
The code goes to a predetermined URL, is told that there is an "alpha" subpackage, and is then given the URLs of the alpha.py and alpha.csv files.
The system downloads these files to somewhere that it knows about, and then loads the alpha module.
On subsequent runs, the user is able to take advantage of the downloaded files to skip the server trip.
At some point down the road, the user could run import magic_module.alpha; alpha._upgrade() from the command line to clear the cache and get the latest version.
Is this possible? Is this reasonable? What kinds of problems will I run into with permissions?
Doable, certainly. The core feature will probably be import hooks. The relevant module would be importlib in python 3.
Extending the import mechanism is needed when you want to load modules that are stored in a non-standard way. Examples include [...] modules that are loaded from a database over a network.
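To make the import-hook idea concrete, here is a hedged sketch of a meta-path finder/loader for the magic_module.* namespace; download_source() is a hypothetical helper that would fetch the module's source text (or read it from a local cache):

import importlib.abc
import importlib.util
import sys

class RemoteFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    prefix = "magic_module."

    def find_spec(self, fullname, path=None, target=None):
        if not fullname.startswith(self.prefix):
            return None                    # let the normal machinery handle it
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None                        # use the default module object

    def exec_module(self, module):
        source = download_source(module.__name__)   # hypothetical: fetch or hit the cache
        exec(compile(source, module.__name__, "exec"), module.__dict__)

sys.meta_path.insert(0, RemoteFinder())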
Convenient, probably not. The import machinery is one of the parts of python that has seen several changes over releases. It's undergoing a full refactoring right now, with most of the existing things being deprecated.
Reasonable, well it's up to you. Here are some caveats I can think of:
Tricky to get right, especially if you have to support several python versions.
What about error handling? Should the application be prepared for the import to fail in normal circumstances? Should it degrade gracefully? Or just crash and spew a traceback?
Security? Basically you're downloading code from someplace, how do you ensure the connection is not being hijacked?
How about versioning? If you update some of the remote modules, how can you make the application download the correct version?
Dependencies? Pushing of security updates? Permissions management?
Summing it up, you'll have to solve most of the issues of a package manager, along with securing downloads and permissions issues of course. All those issues are tricky to begin with, easy to get wrong with dire consequences.
So with all that in mind, it really comes down to how much resources you deem worth investing into that, and what value that adds over a regular use of readily available tools such as pip.
(the permission question cannot really be answered until you come up with a design for your package manager)

How do I systematically identify the dependencies Python has across its accessible package/module tree?

Question: How can I systematically probe into files that are involved at any time by the interpreter (like in debug mode).
When everything fails I get an error message. What I'm asking for is the opposite: everything works, but I don't know how much redundant rubbish I have compared to what is actually used, even though I can imagine that something like pynotify could probably trace it.
Context:
I've spent all morning on trial & error to get a package to work. I'm sure I have copied the relevant Python package into at least 3 directories and badly messed up my Windows setx -m path with junk. Now I'm wondering how to clean it all up without breaking any dependencies, and actually learn from the process.
I can't possibly be the only one wondering about this. Some talented test-developer must have written a script/package that:
import everything from everywhere
check for all dependencies
E = list(errorMessages)
L = list_of_stuff_that_was_used
print L
print E
so if I have something stored which is not in L, I can delete it. But of course the probing has to be thorough to exhaust all accessible files (or at least actively used directories).
What the question is NOT about:
I'm not interested in what is on the sys.path. This is trivial.
More Context:
I know from The Hitchhiker's Guide to Packaging that the future of this problem is being addressed; however, it does not probe into the past. So with the transition from Python 2.x to 3.x, won't this problem become more and more relevant?
The dynamic nature of python makes this a next to impossible task.
Functions can import too, for example. Are you going to run all code in all modules?
And then there are backward-compatibility tests: import pysqlite2 if sqlite3 is not present, use a backport module if collections.Counter isn't present in the current version of Python, etc. There are platform-specific modules (os.path is posixpath, ntpath (same code but renamed) or riscospath depending on the current platform), and wholesale imports into the main module (posix, nt, os2, ce and riscos can all be used by the os module depending on the platform to supply functions).
Packages that use setuptools declare their dependencies and are discoverable through the pkg_resources library. That's the limit of what you can reasonably discover.
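As a (hedged) starting point, pkg_resources can at least enumerate every installed distribution it knows about, together with the dependencies each one declares:

import pkg_resources

# List each installed distribution setuptools knows about, with the
# dependencies it declares. Anything not reachable from here was either
# installed by hand or has no declared metadata at all.
for dist in sorted(pkg_resources.working_set, key=lambda d: d.project_name.lower()):
    requires = [str(req) for req in dist.requires()]
    print("%s %s -> %s" % (dist.project_name, dist.version,
                           requires or "no declared dependencies"))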

Speeding up the python "import" loader

I'm getting seriously frustrated at how slow python startup is. Just importing more or less basic modules takes a second, since python runs down the sys.path looking for matching files (and generating 4 stat() calls - ["foo", "foo.py", "foo.pyc", "foo.so"] - for each check). For a complicated project environment, with tons of different directories, this can take around 5 seconds -- all to run a script that might fail instantly.
Do folks have suggestions for how to speed up this process? For instance, one hack I've seen is to set the LD_PRELOAD_32 environment variable to a library that caches the result of ENOENT calls (e.g. failed stat() calls) between runs. Of course, this has all sorts of problems (potentially confusing non-python programs, negative caching, etc.).
Zipping up as many .pyc files as feasible (with proper directory structure for packages), and putting that zipfile as the very first entry in sys.path (on the best available local disk, ideally), can speed up start-up times a lot.
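A minimal sketch of what that looks like at start-up; /opt/app/deps.zip is an assumed location for a zip of compiled packages you have built ahead of time (e.g. with compileall and a zip tool):

import sys

# Put the zip of compiled modules first so imports resolve from one file
# instead of stat()ing every directory on the path (zipimport handles the rest).
sys.path.insert(0, "/opt/app/deps.zip")

import foo   # hypothetical module stored inside deps.zip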
The first things that come to mind are:
Try a smaller path
Make sure your modules are pyc's so they'll load faster
Make sure you don't double import, or import too much
Other than that, are you sure that the disk operations are what's bogging you down? Is your disk/operating system really busy or old and slow?
Maybe a defrag is in order?
When trying to speed things up, profiling is key. Otherwise, how will you know which parts of your code are really the slow ones?
A while ago, I created the runtime and import-profile visualizer tuna, and I think it may be useful here. Simply create an import profile (with Python 3.7+) and run tuna on it:
python3.7 -X importtime -c "import scipy" 2> scipy.log
tuna scipy.log
If you run out of options, you can create a ramdisk to store your python packages. A ramdisk appears as a directory in your file system, but will actually be mapped directly to your computer's RAM. Here are some instructions for Linux/Redhat.
Beware: A ramdisk is volatile, so you'll also need to keep a backup of your files on your regular hard drive, otherwise you'll lose your data when your computer shuts down.
Something's missing from your premise--I've never seen some "more-or-less" basic modules take over a second to import, and I'm not running Python on what I would call cutting-edge hardware. Either you're running on some seriously old hardware, or you're running on an overloaded machine, or either your OS or Python installation is broken in some way. Or you're not really importing "basic" modules.
If it's any of the first three issues, you need to look at the root problem for a solution. If it's the last, we really need to know what the specific packages are to be of any help.
