If both options are available, i.e. calling a command-line tool with subprocess (say, hg) or using a native Python API (say, the Mercurial API), is there a case where the former is more favorable?
If you want to execute some third-party native code which you know is not stable and may crash with a segfault, then it is better to execute it as a subprocess: you will be able to safely handle possible crashes from your Python process.
Also, if you want to call code that is known to leak memory, leave files open, or hold on to other resources, several times from a long-running Python process, then again it may be wise to run it as a subprocess. In that case the leaked memory and other resources are reclaimed by the operating system each time the subprocess exits, instead of accumulating.
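For illustration, here is a minimal sketch of that isolation idea using subprocess.run (Python 3.7+), with hg as in the question; the helper name and the 60-second timeout are arbitrary choices:

    import subprocess

    def run_isolated(cmd):
        """Run a possibly crashing or leaking tool in its own process."""
        try:
            result = subprocess.run(cmd, capture_output=True, text=True, timeout=60)
        except subprocess.TimeoutExpired:
            return None  # the tool hung; our own process is unaffected
        if result.returncode < 0:
            # On POSIX a negative return code means the child died from a
            # signal, e.g. -11 for SIGSEGV; the crash never reaches this interpreter.
            return None
        return result.stdout

    print(run_isolated(["hg", "status"]))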
The only case where I see myself using subprocess instead of a native Python API is if some option of the program is not exposed by the API.
Related
I have a script a.py that runs some task with multiple threads; please note that I have no control over a.py.
I'm looking for a way to limit the number of threads it can use, as I found that using more threads than I have CPU cores slows the script down.
It could be something like:
python --nthread=2 a.py
Alternatively, modifying something in my OS is also acceptable.
I am using Ubuntu 16.04.
As requested:
a.py just uses the MLPRegressor module from scikit-learn.
I also asked this question here.
A more general way, not specific to python:
taskset -c 1-3 python yourProgram.py
In this case cores 1-3 (3 in total) will be used. Any parallelization invoked by your program will share those resources.
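If you would rather do the same thing from inside Python instead of the command line, a rough equivalent is os.sched_setaffinity (Linux only, Python 3.3+); the core numbers below simply mirror the taskset example:

    import os

    # Pin this process (0 means "the calling process") and every thread it
    # spawns to CPUs 1-3, like `taskset -c 1-3` does from the shell.
    os.sched_setaffinity(0, {1, 2, 3})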
For a solution that fits your exact problem, you should first identify which part of the code parallelizes. For instance, if it is due to numpy routines, you could limit it by using:
OMP_NUM_THREADS=4 python yourProgram.py
Again, the first solution is general and handled by the OS, whereas the second is Python (numpy) specific.
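The environment variable can also be set from inside Python, as long as that happens before numpy is first imported; a small sketch, with the value 4 matching the example above:

    import os

    # Must run before numpy/scipy load their BLAS/OpenMP backend,
    # otherwise the thread pool size is already fixed.
    os.environ["OMP_NUM_THREADS"] = "4"

    import numpy as np  # numpy's threaded routines are now capped at 4 threads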
Read the threading docs, which say:
CPython implementation detail: In CPython, due to the Global Interpreter Lock, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use multiprocessing. However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
If you would like to take better advantage of multi-core processing, update the code to use the multiprocessing module.
If you prefer to keep using threading anyway, one option is to pass the number of threads to the application as an argument, like:
python a.py --nthread=2
Then you can update the script code to limit the threads.
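A hedged sketch of what such a wrapper might look like; the --nthread flag and the environment-variable approach are assumptions about how you would cap the BLAS/OpenMP thread pools that MLPRegressor ends up using:

    import argparse
    import os

    parser = argparse.ArgumentParser()
    parser.add_argument("--nthread", type=int, default=2)
    args = parser.parse_args()

    # These only take effect if set before numpy/scipy load their BLAS.
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(args.nthread)

    from sklearn.neural_network import MLPRegressor  # imported after the caps are set

    model = MLPRegressor(hidden_layer_sizes=(50,), max_iter=200)
    # ... then model.fit(X, y) etc., as in the original a.py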
I use commands.getstatusoutput('some_terminal_command') to store the output of some terminal command in a variable. But I am curious to know how Python actually gets the output from the terminal. Does Python contain some part of a "shell" or something?
Ultimately, your call to run a command uses the fork and exec system calls. These are functions provided by the OS and exposed to most programming languages, which allows the language to start a new process and get its output. It's one of the fundamental building blocks of many modern operating systems.
Note that not all OSes have fork-exec, but if they don't, they will have some other system-provided function for starting processes. This is one of the benefits of using a high-level language: it hides platform-specific features behind a cross-platform API.
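To make the fork-exec-pipe idea concrete, here is a minimal Unix-only sketch in Python; real code should just use subprocess, which wraps these same primitives with far more care:

    import os

    def run_and_capture(argv):
        read_fd, write_fd = os.pipe()      # the channel for the child's output
        pid = os.fork()                    # duplicate the current process
        if pid == 0:
            # Child: send stdout into the pipe, then replace ourselves
            # with the requested program.
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.execvp(argv[0], argv)
        # Parent: read everything the child writes, then wait for it to exit.
        os.close(write_fd)
        chunks = []
        while True:
            data = os.read(read_fd, 4096)
            if not data:
                break
            chunks.append(data)
        os.close(read_fd)
        os.waitpid(pid, 0)
        return b"".join(chunks)

    print(run_and_capture(["echo", "hello"]))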
If you read the source, you'll see that it uses the os.popen call. What's intriguing, however, is that os.popen isn't defined in the os.py module that I can find.
Eventually and with some help I was able to find the posixmodule source, which verifies that the way that python interacts with Linux, at least when PYCC_GCC is defined, is via the popen function. According to man popen, that means:
The popen() function opens a process by creating a pipe, forking, and invoking the shell.
If you look through this module you'll see that the module defines some other approaches for different OSes (like OSX, NT, etc.)
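Note that the commands module is Python 2 only; in Python 3 the same convenience wrapper lives in subprocess and is still implemented on top of a shell pipe:

    import subprocess

    # Modern replacement for commands.getstatusoutput().
    status, output = subprocess.getstatusoutput("ls -l /tmp")
    print(status)   # exit status of the shell command
    print(output)   # stdout and stderr, combined into one string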
I need to validate phone numbers and there is a very good python library that will do this. My stack however is Go and I'm really not looking forward to porting a very large library. Do you think it would be better to use the python library by running a shell command from within the Go codebase or by running a daemon that I then have to communicate with somehow?
Python, being an interpreted language, requires the system to load the interpreter each time a script is run from the command line. Also:
On my particular system, after disk caching, it takes the system 20ms to execute a script with import string (which is plausible for your use case). If you're processing a lot of information and can't submit it all at once, you should consider setting up a daemon to avoid this kind of overhead.
On the other hand, a daemon is more complex to write and test, so you should probably see if a script suits your needs before optimizing prematurely.
There's no answer to your question that fits every possible case. Ultimately, you always have to try the performance with your data and on your system.
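As a starting point for such a measurement, here is a rough sketch that times the per-call interpreter startup cost; it assumes a python3 executable is on PATH, and the numbers will of course differ on your machine:

    import subprocess
    import time

    runs = 20
    start = time.perf_counter()
    for _ in range(runs):
        # Each call pays the full interpreter startup + import cost.
        subprocess.run(["python3", "-c", "import string"], check=True)
    elapsed = time.perf_counter() - start
    print(f"average per-call overhead: {elapsed / runs * 1000:.1f} ms")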
I am thinking about making a program that will need to send input and take output from the various aircrack-ng suite tools. I know of a couple of python modules like subprocess, envoy, sarge and pexpect that would provide the necessary functionality. Can anyone advise on what I should be using or not using, especially as I'm new to python.
Thanks
As the maintainer of sarge, I can tell you that its goals are broadly similar to envoy (in terms of ease of use over subprocess) and there is (IMO) more functionality in sarge with respect to:
Cross-platform support for bash-like syntax (e.g. use of &&, ||, & in command lines)
Better support for capturing subprocess output streams and working with them asynchronously
More documentation, especially about the internals and peripheral issues like threading+forking in the context of using subprocess
Support for prevention of shell injection attacks
Of course YMMV, but you can check out the docs, they're reasonably comprehensive.
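For a feel of the API, here is a minimal usage sketch (assuming sarge is installed, e.g. via pip install sarge):

    from sarge import run, Capture

    # The && operator is handled by sarge's own parser, so this works
    # the same way on Windows as on POSIX shells.
    p = run('echo first && echo second', stdout=Capture())
    print(p.returncode)
    print(p.stdout.read().decode())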
pexpect
In 2015, pexpect does not work on Windows. It is rumored to add "experimental" support in the next version, but this has been a rumor for a long time (I'm not holding my breath).
Having written many applications using pexpect (and loving it), I am now sorry because one of the things I love about python (that it is cross platform) is not true for my applications.
If you plan to ever add Windows support, avoid pexpect for the moment.
envoy
Not much activity in the last year. And few commits (12 total) since 2012. Not very promising for its future.
Internally it uses shlex in a way that is not compatible with windows paths (the commands must use '/' not '\' for directory separators). A workaround (when using pathlib) is to call as_posix() on path objects before passing them as commands. See this answer.
Getting access to the internal streams (i.e. I want to parse the output to have some updating scrollbars), seems possible but is not documented.
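A tiny sketch of that as_posix() workaround (the path here is made up):

    from pathlib import Path

    tool = Path(r"C:\tools\mytool.exe")
    command = f"{tool.as_posix()} --version"  # forward slashes survive shlex splitting
    # envoy.run(command) would then receive a path its parser can handle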
sarge
Works on Windows out of the box and has an expect() method that should provide functionality similar to pexpect (allowing me to update a scrollbar). There is recent activity, but it is hosted on both GitLab and Bitbucket (very confusing).
Personal Conclusion
I'm moving from pexpect to sarge for future development. It seems to provide a similar feature set to pexpect and supports Windows.
subprocess - a standard library module, so it will be available with any Python installation. But it has a reputation of being hard to use, since its API is non-intuitive.
envoy - a third-party module that wraps subprocess. It was written to be an easy-to-use alternative to subprocess. The author of envoy, Kenneth Reitz, is famous for his Python for Humans philosophy.
I'm not familiar with the other two.
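For a rough side-by-side of the two, under the assumption that envoy is installed; envoy's result exposes status_code and std_out, while subprocess.run returns a CompletedProcess:

    import subprocess
    # import envoy  # third-party: pip install envoy

    # subprocess (standard library)
    result = subprocess.run(["ls", "-la"], capture_output=True, text=True)
    print(result.returncode, result.stdout)

    # envoy (a thin wrapper around subprocess)
    # r = envoy.run("ls -la")
    # print(r.status_code, r.std_out)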
I would like to seek some guidance on writing a "process profiler" which runs in kernel mode. I am asking for a kernel-mode profiler because I run lots of applications and I do not want my profiler to be swapped out.
By "process profiler" I mean something that would monitor resource usage by a process, including usage of threads and their statistics.
I wish to write this in Python. Please point me to some modules or other helpful resources, and give any guidance or suggestions for doing it.
Thanks,
Edit: I would like to add that currently my interest is to write this only for Linux; however, after I build it I will have to support Windows.
It's going to be very difficult to do the process monitoring part in Python, since the python interpreter doesn't run in the kernel.
I suspect there are two easy approaches to this:
Use the /proc filesystem if you have one (you don't mention your OS)
Use dtrace if you have dtrace (again, without the OS, who knows.)
Okay, following up after the edit.
First, there's no way you're going to be able to write code that runs in the kernel, in python, and is portable between Linux and Windows. Or at least if you were to, it would be a hack that would live in glory forever.
That said, though, if your purpose is to profile Python, there are a lot of Python tools available to get information from the Python interpreter at run time.
If instead your desire is to get process information from other processes in general, you're going to need to examine the options available to you in the various OS APIs. Linux has a /proc filesystem; that's a useful start. I suspect Windows has similar APIs, but I don't know them.
If you have to write kernel code, you'll almost certainly need to write it in C or C++.
Don't try to get Python running in kernel space!
You would be much better using an existing tool and getting it to spit out XML that can be sucked into Python. I wouldn't want to port the Python interpreter to kernel-mode (it sounds grim writing it).
The /proc option does sound good.
Here is some code that reads /proc information to determine memory usage and such. It should get you going:
http://www.pixelbeat.org/scripts/ps_mem.py reads memory information of processes using Python through /proc/smaps like charlie suggested.
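As a smaller starting point, here is a sketch that pulls a couple of memory fields straight out of /proc for a given pid (Linux only; the field names are those that appear in /proc/[pid]/status):

    import os

    def memory_fields(pid):
        """Return the VmRSS/VmSize lines for a process, straight from /proc."""
        fields = {}
        with open(f"/proc/{pid}/status") as f:
            for line in f:
                if line.startswith(("VmRSS", "VmSize")):
                    key, value = line.split(":", 1)
                    fields[key] = value.strip()
        return fields

    print(memory_fields(os.getpid()))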
Some of your comments on other answers suggest that you are a relatively inexperienced programmer. Therefore I would strongly suggest that you stay away from kernel programming, as it is very hard even for experienced programmers.
Why would you want to write something that
is a very complex system (just look at existing profiling infrastructures and how complex they are)
cannot be done in Python (I don't know of any kernel that would allow execution of Python in kernel mode)
already exists (oprofile on Linux)
Have you looked at PSI? (http://www.psychofx.com/psi/)
"PSI is a Python module providing direct access to real-time system and process information. PSI is a Python C extension, providing the most efficient access to system information directly from system calls."
It might give you what you are looking for, or at least a starting point.
Edit 2014:
I'd recommend checking out psutil instead:
https://pypi.python.org/pypi/psutil
psutil is actively maintained and has some nifty process monitoring features. PSI seems to be somewhat dead (last release 2009).
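A small psutil sketch of the kind of per-process monitoring discussed above (run it against your own pid, or any process you own):

    import psutil  # third-party: pip install psutil

    proc = psutil.Process()              # no argument means the current process
    print(proc.name())
    print(proc.num_threads())            # thread count
    print(proc.cpu_percent(interval=1))  # CPU usage sampled over one second
    print(proc.memory_info().rss)        # resident set size, in bytes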